My understanding of CNNs

In the world of deep neural networks, a plethora of parameters is involved. This is due to the complicated structures that a fully connected feed-forward neural network (FNN) can sometimes take. Complicating a network is often necessary to increase prediction accuracy, but with the increase in complexity, two kinds of problem appear:

  • The trade-off between accuracy and the cost of computation becomes hard to manage.
  • Over-fitting, i.e. high variance on validation data, occurs on many occasions.

To resolve these two challenges, we take the Convolutional Neural Network (CNN) route. Here, we not only de-escalate the complexity of the network without sacrificing accuracy, but also mitigate the problem of over-fitting. Let’s see how.

There are two noteworthy aspects of CNNs – sparse connectivity and weight sharing. The concept of the CNN was born from the image-transformation technique of multiplying the pixels of an image by various types of filters, based on contributions from the nearest neighbouring pixels. So, imagine a 32×32-pixel image undergoing a transformation with an 8×8 kernel (or weight filter). In the simplest form of this convolution, the resulting transformed image has dimensions of 25×25 pixels, since 32 − 8 + 1 = 25. When you add other considerations such as padding (P) and stride (S), the output dimension becomes (W − K + 2P)/S + 1 for an input of width W and a kernel of width K.
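The sliding-filter transformation and the output-size formula above can be sketched as follows. This is a minimal illustration using NumPy, with a naive nested loop rather than an optimized convolution routine; the function name `conv2d` and the random test images are my own choices for the example:

```python
import numpy as np

def conv2d(image, kernel, padding=0, stride=1):
    """Naive 2D convolution (cross-correlation) of a single-channel image.

    Output width follows (W - K + 2P) / S + 1.
    """
    if padding:
        image = np.pad(image, padding)  # zero-pad all four borders by P
    k = kernel.shape[0]
    out_dim = (image.shape[0] - k) // stride + 1
    out = np.zeros((out_dim, out_dim))
    for i in range(out_dim):
        for j in range(out_dim):
            # Multiply the filter element-wise with the local patch and sum.
            patch = image[i * stride:i * stride + k,
                          j * stride:j * stride + k]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.random.rand(32, 32)   # the 32x32 image from the text
kernel = np.random.rand(8, 8)    # the 8x8 weight filter
print(conv2d(image, kernel).shape)             # (25, 25): (32 - 8)/1 + 1
print(conv2d(image, kernel, padding=2).shape)  # (29, 29): (32 - 8 + 2*2)/1 + 1
```

Note that the same 8×8 weight filter is reused at every position – this is the weight sharing that keeps the parameter count small compared to a fully connected layer.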

A similar technique is deployed in constructing a CNN, with some additional considerations. Multiple filters are typically used so that the transformed layer doesn’t sparse out completely (with low levels of accuracy); the number of filters is analogous to the depth of the resulting transformed image in the above example. Additionally, CNN layers pass through non-linear activation functions (just as in FNNs) to preserve the requisite complexity in the resulting network. Moreover, CNNs use max-pooling (i.e. resampling a group of neurons/pixels in a layer with an aggregate function such as the maximum) to further balance the cost of computation by shedding additional complexity from the network.
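The max-pooling step can be sketched in the same naive style. This is an illustrative implementation, not a library API: it slides a 2×2 window over a feature map with stride 2 and keeps only the maximum of each window, shrinking the layer and discarding the rest:

```python
import numpy as np

def max_pool(feature_map, pool=2, stride=2):
    """Downsample a feature map by taking the max over each pool x pool window."""
    out_dim = (feature_map.shape[0] - pool) // stride + 1
    out = np.zeros((out_dim, out_dim))
    for i in range(out_dim):
        for j in range(out_dim):
            window = feature_map[i * stride:i * stride + pool,
                                 j * stride:j * stride + pool]
            out[i, j] = window.max()  # the group-by (aggregate) function
    return out

fmap = np.array([[1., 3., 2., 4.],
                 [5., 6., 1., 2.],
                 [7., 2., 9., 1.],
                 [3., 4., 0., 8.]])
print(max_pool(fmap))
# [[6. 4.]
#  [7. 9.]]
```

A 4×4 map collapses to 2×2, so the next layer processes a quarter of the activations while the strongest responses in each neighbourhood survive.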