Why can't we use a 3x3 filter to reduce the depth instead of 1x1?

I understand that a 1x1 convolution requires less computation, and that we can also reduce the depth by choosing the number of filters.

I just want to know: if our goal is to reduce the depth, can't we do it with a filter of any other size? For example, if we used a 3x3 filter and chose fewer filters than the input depth, the output depth would also be smaller.

So, apart from 1x1 being less computationally intensive, why not use a 3x3 filter instead of a 1x1?
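The question's premise can be checked with a little arithmetic: for a stride-1, zero-padded convolution, the output depth equals the number of filters whatever the filter size, while the weight and multiply counts grow with k x k. The sketch below assumes square k x k filters, ignores biases, and uses a hypothetical helper `conv_cost` and illustrative sizes (a 28x28x192 feature map reduced to depth 32); it is not taken from any specific network.

```python
def conv_cost(h, w, c_in, num_filters, k):
    """Hypothetical helper: shape and cost of a stride-1, zero-padded k x k convolution.

    Returns (output_shape, weights, multiplies), ignoring biases.
    """
    out_shape = (h, w, num_filters)       # output depth == number of filters
    weights = k * k * c_in * num_filters  # each filter spans k x k x c_in
    mults = h * w * weights               # one k*k*c_in dot product per output value
    return out_shape, weights, mults

# Illustrative sizes: reduce a 28x28x192 feature map to depth 32.
for k in (1, 3):
    shape, weights, mults = conv_cost(28, 28, 192, 32, k)
    print(f"{k}x{k}: output {shape}, weights {weights:,}, multiplies {mults:,}")
```

Both filter sizes give a 28x28x32 output, so a 3x3 filter can reduce depth just as the question says; it simply costs 9 times as many weights and multiplies (55,296 vs 6,144 weights here).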

Hi @atulkrjha,
Thanks for this really nice question.

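  1. If we look at this carefully, a 1x1 filter has the smallest possible receptive field (lookout area): each output value is computed from a single spatial position, combining only the values across the input channels at that position. The output therefore keeps the spatial layout of the input and only learns new combinations of the channels (features) already extracted.
    Whereas, if we use a 3x3 filter, the receptive field is wider: each output value also mixes in the 8 neighboring positions, resulting in a spatially blended output that is different from the 1x1 result.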

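  2. When reducing the depth of our input, we usually only want to compress the channel information at each position, not to get a general overview of a 3x3 span; spatial patterns are typically left to the larger filters in the surrounding layers. That is why 1x1 filters are specifically used for this purpose.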
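The receptive-field argument above can be demonstrated with a naive convolution. The sketch below is a pure-Python illustration under stated assumptions: `conv2d` and `random_weights` are hypothetical helpers, the layer is stride 1 with zero padding, it computes cross-correlation as deep-learning libraries usually do, and the sizes (a 5x5 map with 8 channels reduced to 2) are illustrative. It perturbs only the pixel next to the center and checks whether the center output changes.

```python
import random

def conv2d(x, weights, k):
    """Naive stride-1 convolution with zero padding, so the output keeps H x W.

    Computes cross-correlation (no kernel flip), like most deep-learning libraries.
    x:       x[row][col][channel], shape H x W x C_in
    weights: weights[f][dy][dx][channel], shape F x k x k x C_in (k odd)
    returns: out[row][col][f], shape H x W x F (output depth == number of filters F)
    """
    h, w, c_in = len(x), len(x[0]), len(x[0][0])
    pad = k // 2
    out = [[[0.0] * len(weights) for _ in range(w)] for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for f, filt in enumerate(weights):
                total = 0.0
                # Sum over the k x k window around (i, j) and over ALL input channels.
                for dy in range(k):
                    for dx in range(k):
                        r, c = i + dy - pad, j + dx - pad
                        if 0 <= r < h and 0 <= c < w:  # positions outside the map count as zero
                            total += sum(filt[dy][dx][ch] * x[r][c][ch] for ch in range(c_in))
                out[i][j][f] = total
    return out

def random_weights(num_filters, k, c_in, rng):
    """Hypothetical helper: F x k x k x C_in weights drawn uniformly from [-1, 1]."""
    return [[[[rng.uniform(-1, 1) for _ in range(c_in)] for _ in range(k)] for _ in range(k)]
            for _ in range(num_filters)]

rng = random.Random(0)
H, W, C_IN, F = 5, 5, 8, 2  # illustrative sizes: reduce 8 channels to 2
x = [[[rng.uniform(-1, 1) for _ in range(C_IN)] for _ in range(W)] for _ in range(H)]

# Copy the input and perturb only the pixel just right of the center, (2, 3).
x2 = [[list(px) for px in row] for row in x]
x2[2][3] = [v + 1.0 for v in x2[2][3]]

for k in (1, 3):
    wts = random_weights(F, k, C_IN, rng)
    y, y2 = conv2d(x, wts, k), conv2d(x2, wts, k)
    print(f"{k}x{k}: output depth {len(y[0][0])}, center output changed: {y[2][2] != y2[2][2]}")
```

Both runs reduce the depth from 8 to 2. The 1x1 run reports that the center output is unchanged, because it only ever reads the center pixel's own channel vector, while the 3x3 run reports a change, because the perturbed neighbor falls inside its window.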

Others should feel free to add to this discussion if they find something unique.
