Image augmentation for an imbalanced dataset

Hello,
I am doing a binary image classification task using transfer learning (Inception v3). But there is a caveat in the dataset: the classes are imbalanced in favour of the negative samples in both the train and test datasets. So I have 3 questions:

  1. If I use image augmentation during training (rotation, horizontal flip), won't it increase the number of samples of both classes across epochs? If it does, does the ratio between the 2 classes still remain the same?
  2. Can image augmentation alone solve the problem of an imbalanced dataset?
  3. When using Colab’s GPU to train heavy models like this one, how can one benefit from hyperparameter tuning with MLflow to see the effect of different batch sizes, epochs, etc.?

Thanks!!

Hi @rupal,

  1. You should augment data only for the minority class, as creating more samples of the already dominant class will make the imbalance even worse. If you augment both classes in the same way, the ratio between them stays the same.
  2. There are a number of techniques to overcome this problem, but augmenting is probably the most commonly used approach. If it doesn’t work in your case, you can consider reducing the number of examples of the dominant class (under-sampling), provided you already have plenty of data available.
  3. This point is a bit unclear to me. Yes, you can probably leverage MLflow on Colab, but it may take a bit more effort to set up. Please refer to “Colab synergy with MLflow: how to monitor progress and store models”.
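To make points 1 and 2 concrete, here is a minimal plain-Python sketch. It assumes toy 2-D lists as stand-ins for images and labels 0 (negative) / 1 (positive); the helper names (`random_augment`, `on_the_fly_epoch`, `oversample_minority`, `undersample_majority`, `class_counts`) are hypothetical, not from any library. Augmenting every sample once per epoch leaves the per-class counts, and so the ratio, unchanged. Adding augmented copies of only the minority class, or dropping majority samples, balances the classes.

```python
import random
from collections import Counter


def hflip(img):
    """Mirror an image left-to-right (horizontal flip)."""
    return [row[::-1] for row in img]


def rot90(img):
    """Rotate an image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def random_augment(img, rng):
    """Randomly flip and rotate an image, as an on-the-fly pipeline would."""
    if rng.random() < 0.5:
        img = hflip(img)
    for _ in range(rng.randrange(4)):  # 0 to 3 quarter turns
        img = rot90(img)
    return img


def class_counts(dataset):
    """Number of samples per label, sorted by label."""
    return dict(sorted(Counter(label for _, label in dataset).items()))


def on_the_fly_epoch(dataset, rng):
    """Augment every sample once: each epoch keeps the same size and class ratio."""
    return [(random_augment(img, rng), label) for img, label in dataset]


def oversample_minority(dataset, minority, rng):
    """Append augmented copies of minority images until both classes match."""
    counts = Counter(label for _, label in dataset)
    minority_imgs = [img for img, label in dataset if label == minority]
    n_extra = max(counts.values()) - counts[minority]
    extra = [(random_augment(rng.choice(minority_imgs), rng), minority)
             for _ in range(n_extra)]
    return dataset + extra


def undersample_majority(dataset, majority, rng):
    """Randomly keep only as many majority samples as the smallest class has."""
    target = min(Counter(label for _, label in dataset).values())
    majority_items = [s for s in dataset if s[1] == majority]
    others = [s for s in dataset if s[1] != majority]
    return others + rng.sample(majority_items, target)


rng = random.Random(0)
img = [[1, 2], [3, 4]]
dataset = [(img, 0)] * 90 + [(img, 1)] * 10  # 90 negatives, 10 positives

print(class_counts(on_the_fly_epoch(dataset, rng)))         # {0: 90, 1: 10}
print(class_counts(oversample_minority(dataset, 1, rng)))   # {0: 90, 1: 90}
print(class_counts(undersample_majority(dataset, 0, rng)))  # {0: 10, 1: 10}
```

In practice you would apply over- or under-sampling to the training split only and keep the test set untouched, so the evaluation still reflects the real class distribution.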

Yeah, I guess the best way to handle an imbalanced dataset is probably to augment only the minority class; the technical term for this is “over-sampling”. Also use an appropriate evaluation metric for this kind of dataset, such as precision, recall, or F1 score on the minority class rather than plain accuracy. And have you tried it without transfer learning, i.e. with your own CNN? Sometimes that also works!!
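To illustrate why plain accuracy can mislead on an imbalanced test set, here is a small stdlib-only sketch. `binary_report` is a hypothetical helper, and the two prediction lists are made-up toy data, not results from any real model. A model that always predicts the majority class gets higher accuracy than one that actually finds most positives, yet its recall and F1 are zero.

```python
def binary_report(y_true, y_pred, positive=1):
    """Accuracy plus precision, recall and F1 for the positive (minority) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {name: round(value, 3) for name, value in
            [("accuracy", accuracy), ("precision", precision),
             ("recall", recall), ("f1", f1)]}


y_true = [0] * 90 + [1] * 10   # imbalanced: 90 negatives, 10 positives
always_negative = [0] * 100    # never predicts the minority class
finds_positives = [0] * 80 + [1] * 10 + [1] * 8 + [0] * 2  # 10 FP, 8 TP, 2 FN

print(binary_report(y_true, always_negative))
# {'accuracy': 0.9, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}
print(binary_report(y_true, finds_positives))
# {'accuracy': 0.88, 'precision': 0.444, 'recall': 0.8, 'f1': 0.571}
```

If you would rather not write this yourself, scikit-learn's `sklearn.metrics` module provides `precision_score`, `recall_score`, `f1_score` and `classification_report`.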