Capstone Project Doubt in Milestone 2


  • I have completed the text detection part (using detectron2).
  • In Milestone 2 (image to text), I have no issues with the CNN and RNN/LSTM parts.
  • I am facing an issue with the CTC loss part. The PyTorch documentation didn’t help me understand it. How do I provide the targets (ground truth) to the CTC loss function? If it accepts some kind of encoding as input, how do I encode the ground truth (say, a Hindi word in our case)?

Kindly clarify this doubt to help me progress further.

Thanks in advance

Hi @abhishek_kalyanarama,

This thread may help: Doubt in text recognition
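
On the CTC target question above: `nn.CTCLoss` takes targets as integer class indices, either padded to shape (N, S) or concatenated into one 1-D tensor, together with the length of each target. By default, index 0 is reserved for the blank. Below is a minimal sketch of one possible encoding, assuming a vocabulary built from Unicode code points (so a Hindi matra or halant counts as its own class). The word list and all names here are hypothetical and are not taken from the project.

```python
# Hypothetical toy labels; in practice, use every word in the training set.
words = ["नमस्ते", "भारत"]

BLANK = 0  # index reserved for the CTC blank (the nn.CTCLoss default)

# Build the vocabulary from Unicode code points; real characters start at 1.
chars = sorted({ch for word in words for ch in word})
char_to_idx = {ch: i + 1 for i, ch in enumerate(chars)}
idx_to_char = {i: ch for ch, i in char_to_idx.items()}
num_classes = len(chars) + 1  # characters plus the blank


def encode(word):
    """Map a word to a list of class indices (never BLANK)."""
    return [char_to_idx[ch] for ch in word]


def decode(indices):
    """Inverse of encode, useful for checking the labels."""
    return "".join(idx_to_char[i] for i in indices)


encoded = [encode(w) for w in words]
target_lengths = [len(e) for e in encoded]

# Pad to a rectangular (N, S) layout; target_lengths tells the loss
# how many entries of each row are real labels.
max_len = max(target_lengths)
padded_targets = [e + [BLANK] * (max_len - len(e)) for e in encoded]
```

`padded_targets` and `target_lengths` can then be wrapped with `torch.tensor(..., dtype=torch.long)` and passed as the `targets` and `target_lengths` arguments. The model's final layer would need `num_classes` outputs so that the blank has its own class.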

Hello Ishvinder,

I had posted my query on 07/12/2020 in the link below:

I have not received any reply to it, but I managed to resolve those issues.

Currently, the loss per epoch does not decrease steadily; it goes up and down from one epoch to the next. Some guidance would be very helpful.

I am working on the image-to-text part.
Summary of the Architecture:

  1. The input image size to the CNN is fixed: (N,3,128,128)
  2. The output from the CNN is (N,64,28,28)
  3. To the LSTM I am giving an input of (N,64*28,28) -> (batch size, sequence size, inputs)
  4. The output from the LSTM is (N,28,class_size)
  5. This output goes to the CTC loss

Kindly go through my code at the link below and give some insights on how to make the loss decrease steadily.
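
For reference, here is a minimal sketch of how the shapes in the summary above could be wired into `nn.CTCLoss`. It assumes the width axis of the CNN feature map is used as the time axis, so a `batch_first=True` LSTM would receive (N, 28, 64*28) in order to produce (N, 28, class_size). The hidden size, class count, bidirectional LSTM, and random targets are hypothetical stand-ins. Shape or axis-order mismatches and a missing `log_softmax` are common reasons for a CTC loss that does not settle, though that may or may not be the cause here.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

N, C, H, W = 4, 64, 28, 28  # CNN feature map shape from the summary
num_classes = 50            # hypothetical: characters plus 1 blank
hidden = 128                # hypothetical LSTM hidden size

features = torch.randn(N, C, H, W)  # stand-in for the CNN output

# Treat width as time: (N, C, H, W) -> (N, W, C*H) = (N, 28, 1792)
seq = features.permute(0, 3, 1, 2).reshape(N, W, C * H)

lstm = nn.LSTM(input_size=C * H, hidden_size=hidden,
               batch_first=True, bidirectional=True)
head = nn.Linear(2 * hidden, num_classes)

out, _ = lstm(seq)   # (N, 28, 2 * hidden)
logits = head(out)   # (N, 28, num_classes)

# nn.CTCLoss expects log-probabilities shaped (T, N, C)
log_probs = logits.log_softmax(dim=2).permute(1, 0, 2)

# Hypothetical padded targets (N, S) with indices in 1..num_classes-1
targets = torch.randint(1, num_classes, (N, 6))
target_lengths = torch.tensor([5, 3, 6, 4])
input_lengths = torch.full((N,), W, dtype=torch.long)

# blank=0 matches the index reserved for the blank in the label encoding
ctc = nn.CTCLoss(blank=0, zero_infinity=True)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()
```

With only 28 time steps, every target must be short enough to fit, including the extra blank that CTC needs between repeated characters. Here `zero_infinity=True` zeroes out the loss for samples where that is not the case, instead of returning an infinite value.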