Data for contest 3 from module 3 in Deep Learning course

I am working with the zipped files in this contest. I have tried everything I can think of: the file names print correctly, but I am still not able to access the file data. Can someone please tell me where I am making a mistake? The error I get at the Image.open() line says “cannot identify image file <_io.BytesIO object at 0x7f9f7e41a770>”. I used io.BytesIO because otherwise the file data wasn’t readable in the first place. Below is my code to read the train and test data and assemble them into their respective dicts:

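import fnmatch
import os
import zipfile
from io import BytesIO

import numpy as np
from PIL import Image

rootPath = r"../input/padhai-text-non-text-classification-level-3/"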
ta_language = ['ta', 'jpg']
hi_language = ['hi', 'jpg']
en_language = ['en', 'jpg']
background = ['background', 'jpg']
testFilePattern = ['kaggle_level_3', 'jpg']
pattern = '*.zip'
images_train = {}
images_test = {}


for root, dirs, files in os.walk(rootPath):
    for filename in fnmatch.filter(files, pattern):
        zip_handler = zipfile.ZipFile(os.path.join(root, filename), "r")
        print(filename)
        for files in zip_handler.namelist():
            data = zip_handler.read(files)
            dataEncoded = BytesIO(data)
            image = Image.open(dataEncoded)
            image = image.convert("L")

            #Write a logic to check file name and put the file in apt dict with apt key_prefix
            if all(x in files for x in background):
                image_index = 'bgr_'+files.split('/')[2][:-4]
                images_train[image_index] = np.array(image.copy()).flatten()
                image.close()
            elif all(x in files for x in ta_language):
                image_index = 'ta_'+files.split('/')[2][:-4]
                images_train[image_index] = np.array(image.copy()).flatten()
                image.close()
            elif all(x in files for x in hi_language):
                image_index = 'hi_'+files.split('/')[2][:-4]
                images_train[image_index] = np.array(image.copy()).flatten()
                image.close()
            elif all(x in files for x in en_language):
                image_index = 'en_'+files.split('/')[2][:-4]
                images_train[image_index] = np.array(image.copy()).flatten()
                image.close()
            elif all(x in files for x in testFilePattern):
                image_index = files.split('/')[1][:-4]
                images_test[image_index] = np.array(image.copy()).flatten()
                image.close()

        zip_handler.close()


print(len(images_train))
print(len(images_test))



I found my mistake. I was also calling Image.open() on the folder entries inside the zip archive, not only on the image files. Of course, doing that on a folder entry raises an error.
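For anyone hitting the same error, here is a minimal sketch of one way to skip the folder entries before decoding. The helper name `iter_image_members` and the demo archive layout are made up for illustration and are not from the contest data; the sketch only assumes that folder entries show up in the zip listing and that the images end in `.jpg`.

```python
import io
import zipfile


def iter_image_members(archive, extension=".jpg"):
    """Yield (name, raw_bytes) for image files only, skipping folder entries."""
    for info in archive.infolist():
        # Folder entries have names ending in "/" and carry no image data.
        if info.is_dir():
            continue
        # Skip anything that is not an image file by extension.
        if not info.filename.lower().endswith(extension):
            continue
        yield info.filename, archive.read(info)


def build_demo_zip():
    """Build a small in-memory archive (hypothetical layout) for the demo."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("train/", b"")            # folder entry
        archive.writestr("train/en/", b"")         # nested folder entry
        archive.writestr("train/en/1.jpg", b"fake-bytes")
        archive.writestr("train/notes.txt", b"x")  # non-image file
    buffer.seek(0)
    return buffer


with zipfile.ZipFile(build_demo_zip()) as archive:
    names = [name for name, data in iter_image_members(archive)]
print(names)
```

In the original loop, each yielded `raw_bytes` could then be wrapped in `BytesIO` and passed to `Image.open()` as before, since only real image files reach that call.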

The challenge now is that I keep getting a warning message that I think says I am exceeding memory usage:

/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:11: RuntimeWarning: overflow encountered in exp #This is added by InteractiveShellApp.init_path()

Try standardizing the data. We get an overflow when a value exceeds the range supported by the data type, so if you standardize the data, it should usually work fine.
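As a rough illustration of what standardizing means here, the following NumPy sketch rescales each feature column to zero mean and unit variance. The function name `standardize`, the `eps` guard, and the demo values are assumptions for this example, not code from the thread.

```python
import numpy as np


def standardize(X, eps=1e-12):
    """Rescale each column of X to zero mean and unit standard deviation."""
    X = np.asarray(X, dtype=np.float64)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    # Constant columns have std == 0; divide by 1 instead to avoid NaNs.
    std = np.where(std < eps, 1.0, std)
    return (X - mean) / std


# Hypothetical pixel-like values; the middle column is constant on purpose.
X_demo = np.array([
    [0.0, 255.0, 10.0],
    [128.0, 255.0, 20.0],
    [255.0, 255.0, 30.0],
])
X_scaled = standardize(X_demo)
print(X_scaled)
```

When there is a separate test set, the mean and standard deviation are normally computed on the training data only and then reused to scale the test data.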

I have already standardized my X values using StandardScaler.

Then try the things specified over here once:

@sanjay_rao
Thanks! Using expit (from scipy.special import expit) did suppress the warning.
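For context, the warning most likely came from a hand-written sigmoid such as 1 / (1 + np.exp(-x)), where np.exp overflows for large negative x. That is an assumption, since the sigmoid code was not posted. The sketch below shows one common way to compute the same function without overflow; it is an illustration of the idea, not scipy's actual implementation of expit, and `stable_sigmoid` is a made-up name.

```python
import numpy as np


def stable_sigmoid(x):
    """Sigmoid that never calls np.exp on a large positive argument."""
    x = np.atleast_1d(np.asarray(x, dtype=np.float64))
    out = np.empty_like(x)
    pos = x >= 0
    # For x >= 0, exp(-x) is at most 1, so it cannot overflow.
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    # For x < 0, exp(x) is below 1, so use the equivalent form e^x / (1 + e^x).
    ex = np.exp(x[~pos])
    out[~pos] = ex / (1.0 + ex)
    return out


values = np.array([-1000.0, 0.0, 1000.0])
# Turn overflow into an exception to show that none occurs.
with np.errstate(over="raise"):
    result = stable_sigmoid(values)
print(result)
```

The design choice is simply to pick, per element, whichever algebraic form keeps the exponent non-positive; very negative inputs then underflow quietly to 0 instead of overflowing.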