Why use InceptionNet Auxiliary Loss when freezing previous layers?

In the video about Inception Network,

loss = loss_fn(outputs, labels) + 0.3 * loss_fn(aux_outputs, labels)

We use the auxiliary losses because gradients vanish as we train deeper into the network. But if we are freezing the earlier layers (i.e. not training them), why are we still taking the loss from the auxiliary classifiers and adding it to the final loss, thereby adding noise to it?
Could this slow down training and prevent us from reaching better results, or is it the other way around?

Any help would be appreciated : )

You are right: we do not need the auxiliary losses if we are freezing most of the network except the last layer or two.
That is, in such fine-tuning cases we only need the loss from the final classifier.
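A minimal sketch of that fine-tuning setup (the model and layer sizes here are illustrative, not the course notebook's): freeze every parameter, then unfreeze only the final classifier and hand just those parameters to the optimizer.

```python
import torch.nn as nn

# Hypothetical stand-in for a pretrained backbone + classifier.
model = nn.Sequential(
    nn.Linear(8, 8), nn.ReLU(),
    nn.Linear(8, 8), nn.ReLU(),
    nn.Linear(8, 3),  # the only layer we intend to train
)

# Freeze everything, then unfreeze just the last layer.
for p in model.parameters():
    p.requires_grad = False
for p in model[-1].parameters():
    p.requires_grad = True

# Only the unfrozen parameters should be given to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
print(len(trainable))  # 2 (the last layer's weight and bias)
```

With this setup there is only one classifier producing a loss, so there is nothing for an auxiliary term to contribute.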

If this is part of the course notebook, it is an erratum (probably a leftover from the full training run that was forgotten). We suggest simply removing the auxiliary loss term.

No. Since the previous layers' parameters are frozen (requires_grad=False, so their .grad stays None), PyTorch does not backpropagate into them. So the auxiliary loss term does not affect the frozen layers at all.
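This is easy to verify with a toy model (hypothetical names standing in for Inception's backbone, auxiliary head, and main head): even with the 0.3-weighted auxiliary term in the loss, the frozen backbone receives no gradient after backward().

```python
import torch
import torch.nn as nn

# Toy stand-ins for the backbone and the two classifier heads.
backbone = nn.Linear(4, 4)
aux_head = nn.Linear(4, 2)
main_head = nn.Linear(4, 2)

# Freeze the backbone, as in fine-tuning.
for p in backbone.parameters():
    p.requires_grad = False

x = torch.randn(3, 4)
labels = torch.tensor([0, 1, 0])

feats = backbone(x)
loss_fn = nn.CrossEntropyLoss()
# Combined loss, as in the question.
loss = loss_fn(main_head(feats), labels) + 0.3 * loss_fn(aux_head(feats), labels)
loss.backward()

print(backbone.weight.grad)   # None: no gradient reached the frozen layer
print(aux_head.weight.grad is None)   # False: the heads still get gradients
```

So the extra term only influences the layers that are actually training.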
