Kaggle mobile like/dislike contest: question about binarisation

Hello everyone,

I have a question about binarising the output (i.e. the rating) of the training dataset.

Method 1:

y_binarised = train_new['Rating'].apply(lambda x: 1 if x >= 4.07 else 0)  # 4.07 is the mean of the Rating column
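A minimal sketch of method 1 that computes the threshold from the column instead of hard-coding 4.07. The small DataFrame below is a made-up stand-in for `train_new`, assuming only that it has a numeric `Rating` column.

```python
import pandas as pd

# Made-up stand-in for train_new; these ratings are illustrative only
train_new = pd.DataFrame({"Rating": [3.5, 4.0, 4.1, 4.3, 2.8, 4.6]})

# Use the actual mean of the column as the threshold (about 3.88 here)
threshold = train_new["Rating"].mean()

# Ratings at or above the mean become 1 ("like"), the rest become 0
y_binarised = (train_new["Rating"] >= threshold).astype(int)
print(y_binarised.tolist())  # [0, 1, 1, 1, 0, 1]
```

The vectorised comparison gives the same result as the `apply`/`lambda` version, just without a Python-level loop.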

Method 2:


y_binarised = binarised_train['Rating'].values

Both methods give different values of y_binarised. I understand that pd.cut divides the data into two halves with respect to the mean, which is why I used 4.07 (the mean) in the first method. Still, the two methods give different y_binarised values and therefore different accuracy when I deploy the model. What am I missing here? Thanks in advance.
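A minimal sketch of one likely reason the two methods disagree, using made-up ratings rather than the contest data: `pd.cut` with `bins=2` splits the interval from the minimum to the maximum into two equal-width bins, so its cut point is the midpoint of the range, not the mean. Any rating that falls between those two values is labelled differently by the two methods.

```python
import pandas as pd

# Made-up ratings with one low outlier, so the mean and the range midpoint differ
ratings = pd.Series([1.0, 3.5, 4.2, 4.4, 4.6, 5.0])

# Method 1 style: threshold at the mean (about 3.78 here)
by_mean = (ratings >= ratings.mean()).astype(int)

# pd.cut with bins=2 builds two EQUAL-WIDTH bins over [min, max];
# retbins=True also returns the bin edges so the split point is visible
cut_labels, edges = pd.cut(ratings, bins=2, labels=[0, 1], retbins=True)
by_cut = cut_labels.astype(int)

print(edges[1])          # split point = (min + max) / 2 = 3.0
print(by_mean.tolist())  # [0, 0, 1, 1, 1, 1]
print(by_cut.tolist())   # [0, 1, 1, 1, 1, 1]
```

The rating 3.5 sits above the range midpoint (3.0) but below the mean (about 3.78), so only that row changes label. If a mean-based split is what you want, thresholding against the mean directly is the straightforward way to get it.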

Try this code for method 2:


Why [1, 0]? We used that in the breast cancer example, but why here?
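A small sketch of what the `labels` argument does, with made-up ratings: `pd.cut` assigns labels in bin order, lowest bin first. So `labels=[1, 0]` maps the lower half of the range to 1 and the upper half to 0, while `labels=[0, 1]` does the opposite. For ratings, where a high rating presumably means "like" (1), `[1, 0]` would invert the target relative to method 1.

```python
import pandas as pd

# Made-up ratings; range is 2.5..5.0, so the equal-width split point is 3.75
ratings = pd.Series([2.5, 3.0, 4.5, 5.0])

# Labels are handed out in bin order: the first label goes to the lowest bin
low_is_1 = pd.cut(ratings, bins=2, labels=[1, 0]).astype(int)
high_is_1 = pd.cut(ratings, bins=2, labels=[0, 1]).astype(int)

print(low_is_1.tolist())   # [1, 1, 0, 0]
print(high_is_1.tolist())  # [0, 0, 1, 1]
```

Whichever order you choose, it should match the direction that method 1 uses (rating at or above the threshold means 1) if you want the two targets to be comparable.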