In the video about the Inception network, the loss is computed as:
loss = loss_fn(outputs, labels) + 0.3 * loss_fn(aux_outputs, labels)
We use an auxiliary loss because gradients vanish as we train deeper into the network. But if we are freezing the earlier weights (i.e. not training them), why do we take the loss from an intermediate layer's auxiliary classifier and add it to the final loss, thereby adding noise to the loss?
Could this slow down training and stop us from reaching better results, or vice versa?
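One thing worth checking directly: when a layer's parameters have `requires_grad=False`, autograd never accumulates gradients into them, from the main loss or the auxiliary loss. A minimal sketch with a toy stand-in for the architecture (the layer names here are hypothetical, not torchvision's `inception_v3` implementation) shows where the gradients actually go:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for an Inception-style network: an early block,
# an auxiliary classifier branching off its features, and a
# final classifier. (Hypothetical names, for illustration only.)
early = nn.Linear(8, 8)       # pretend "frozen backbone" layer
aux_head = nn.Linear(8, 3)    # auxiliary classifier
final_head = nn.Linear(8, 3)  # main classifier

# Freeze the early weights, as in fine-tuning.
for p in early.parameters():
    p.requires_grad = False

x = torch.randn(4, 8)
labels = torch.randint(0, 3, (4,))

feats = early(x)
outputs = final_head(feats)
aux_outputs = aux_head(feats)

loss_fn = nn.CrossEntropyLoss()
loss = loss_fn(outputs, labels) + 0.3 * loss_fn(aux_outputs, labels)
loss.backward()

# Frozen parameters receive no gradient at all; the aux loss only
# updates the trainable auxiliary head, not the frozen layers.
print(early.weight.grad)                  # None
print(aux_head.weight.grad is not None)   # True
print(final_head.weight.grad is not None) # True
```

So the auxiliary term never perturbs the frozen weights; it only trains the auxiliary head (and any unfrozen layers between it and the final output), which is presumably why the question of whether it helps or just adds noise arises at all.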