In the course lectures we used the train_test_split function to split the dataset into X_train, Y_train, X_test and Y_test, but in the Kaggle competition there is already a test.csv file with no Y_test. So how do we check our model on X_test?
You can split the training data 80:20 into training and validation sets.
You can use train_test_split from sklearn for this.
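A minimal sketch of that 80:20 split, assuming the features and labels from train.csv are already loaded as arrays. The toy X and Y below are hypothetical stand-ins for the real data. The labels for test.csv are hidden on Kaggle, so the held-out 20% is what you score locally; the actual test set is only scored when you submit.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for the features and labels loaded from train.csv
X = np.arange(20).reshape(10, 2)
Y = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])

# Hold out 20% of the labelled training data as a validation set;
# stratify=Y keeps the class ratio similar in both parts
X_train, X_val, Y_train, Y_val = train_test_split(
    X, Y, test_size=0.2, random_state=0, stratify=Y
)

print(X_train.shape, X_val.shape)  # (8, 2) (2, 2)
```

You would fit on X_train/Y_train, compute accuracy on X_val/Y_val, and only then predict on the features from test.csv.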
But looking at the code of the people with top scores, they are not doing a train/test split of the training dataset. Most of them are binarising each column separately.
Meanwhile, I am doing the train/test split after binarisation using a simple pd.cut. I then train the model on this new 80% training set and apply it to the 20% test (validation) set, which gives me an accuracy of around 70%.
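One possible cause worth checking (an assumption, not confirmed in this thread): when pd.cut is given an integer number of bins, it computes the bin edges from whatever data it is called on. If test.csv is binarised with a separate pd.cut call, its edges come from the test data's own min and max, so the same value can land in a different bin than it did in training. A sketch that computes the edges once on the training data (via retbins=True) and reuses them for the test data, using a hypothetical "age" column:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: one numeric column from train.csv and from test.csv
train = pd.DataFrame({"age": [22, 35, 58, 41, 19, 63, 30, 47]})
test = pd.DataFrame({"age": [25, 70, 18]})

# Compute the bin edges once, from the training data only
train["age_bin"], bins = pd.cut(train["age"], bins=3, labels=False, retbins=True)

# Open up the outer edges so test values outside the training range
# (here 70 and 18) still fall into the first or last bin instead of NaN
bins[0], bins[-1] = -np.inf, np.inf

# Reuse the training edges for the test data instead of recomputing them
test["age_bin"] = pd.cut(test["age"], bins=bins, labels=False)

print(train["age_bin"].tolist())  # [0, 1, 2, 1, 0, 2, 0, 1]
print(test["age_bin"].tolist())   # [0, 2, 0]
```

The same idea applies to any preprocessing step: fit it on the training data, then apply the fitted version unchanged to the validation and test data.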
But when I finally apply this model to the given test dataset, it does not give proper results. Could someone please shed some light on this?