Say the first hidden layer after the input is a Conv2D layer, and we decide to have 16 filters of size 5×5. What is done to ensure that each of the 16 filters learns a different representation?
In the implementations of popular CNN architectures, there is no explicit mechanism to ensure that each filter learns a different representation.
The random initialization of the filters is more or less enough: since the filters start out different, they receive different gradients during training and drift apart, so most filters end up learning different representations.
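To see why random initialization matters, here is a minimal numpy sketch (a toy two-"filter" layer, not an actual Conv2D) showing the symmetry-breaking argument: if two filters start out identical, they receive identical gradients and can never diverge, whereas randomly initialized filters get different gradients from the very first step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 2 "filters" (rows of W) applied to 5*5 = 25-value patches,
# summed and compared to a target with a squared loss.
x = rng.normal(size=(8, 25))           # 8 input patches
y = rng.normal(size=8)                 # targets

def grad(W):
    # loss = mean((sum_k relu(W_k . x) - y)^2); gradient w.r.t. each filter
    z = x @ W.T                        # (8, 2) pre-activations
    a = np.maximum(z, 0)               # ReLU
    err = a.sum(axis=1) - y            # (8,)
    dz = (z > 0) * err[:, None]        # backprop through sum and ReLU
    return 2 * dz.T @ x / len(x)       # (2, 25), one gradient per filter

# Identical initialization: both filters get identical gradients forever.
W_same = np.tile(rng.normal(size=(1, 25)), (2, 1))
print(np.allclose(grad(W_same)[0], grad(W_same)[1]))   # True

# Random initialization breaks the symmetry immediately.
W_rand = rng.normal(size=(2, 25))
print(np.allclose(grad(W_rand)[0], grad(W_rand)[1]))   # False
```

The same argument carries over to real Conv2D filters: gradient descent preserves exact symmetry, so it is the random init that lets the 16 filters specialize.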
One interesting experiment you could try to understand this more deeply:
Train a two-conv-layer model on MNIST (two conv layers followed by flatten and a fully connected layer).
Across experiments, keep increasing num_filters of the first layer from, say, 16 up to 256, and visualize all the learned filters after each training run.
You will notice that, as the number of filters increases, there is a higher chance that two or more filters end up with somewhat similar receptive fields.
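A quick way to quantify "somewhat similar" when you run this experiment is the maximum pairwise cosine similarity among the flattened first-layer kernels. The sketch below uses random kernels as a stand-in (in the real experiment you would substitute the trained layer-1 weights); even then, the closest pair gets closer as the number of filters grows, simply because there are more pairs in the same 25-dimensional space.

```python
import numpy as np

rng = np.random.default_rng(1)

def max_pairwise_cosine(filters):
    # filters: (N, 25) array of flattened 5x5 kernels
    F = filters / np.linalg.norm(filters, axis=1, keepdims=True)
    S = F @ F.T                    # (N, N) cosine similarities
    np.fill_diagonal(S, -1.0)      # ignore each filter's match with itself
    return float(S.max())          # closest pair of distinct filters

# Stand-in for trained kernels; replace with your model's layer-1 weights.
W = rng.normal(size=(256, 25))
for n in [16, 64, 256]:
    print(n, round(max_pairwise_cosine(W[:n]), 3))
```

With trained weights the effect is typically stronger than with random ones, since gradient descent can pull several filters toward the same useful feature (e.g. similar edge orientations).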