Hello everyone,

I have a question about binarising the output (i.e. the rating) of my training dataset.

Method 1:

y_binarised = train_new['Rating'].apply(lambda x: 1 if x >= 4.07 else 0)  # 4.07 is the mean of the Rating column

Method 2:

binarised_train = train_new.apply(pd.cut, bins=2, labels=[0, 1])

y_binarised = binarised_train['Rating'].values

The two methods give different values of y_binarised. My understanding is that pd.cut divides the data into two halves around the mean, which is why I used 4.07 (the mean) as the threshold in the first method. Still, the results differ, and that leads to different accuracy for the model. What am I missing here?

Thanks in advance.
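
For anyone who wants to reproduce the comparison, here is a minimal sketch on toy data. The ratings below are made up for illustration and are not my real dataset, and the names mean_rating, midpoint, y_mean, y_cut and disagree are just placeholders. It reports the mean alongside the midpoint of the min-max range, because the pandas documentation describes pd.cut with an integer bins as producing equal-width bins over the range of the values, and that boundary need not coincide with the mean.

```python
import pandas as pd

# Hypothetical toy ratings (assumption: not the real dataset)
train_new = pd.DataFrame(
    {"Rating": [2.5, 3.0, 3.5, 3.8, 4.0, 4.1, 4.3, 4.5, 4.6, 5.0]}
)

# Candidate split points: the column mean vs. the middle of the min-max range
mean_rating = train_new["Rating"].mean()
midpoint = (train_new["Rating"].min() + train_new["Rating"].max()) / 2

# Method 1: label 1 when the rating is at or above the mean
y_mean = (train_new["Rating"] >= mean_rating).astype(int)

# Method 2: pd.cut with two equal-width bins, labelled 0 and 1
y_cut = pd.cut(train_new["Rating"], bins=2, labels=[0, 1]).astype(int)

# Ratings on which the two methods assign different labels
disagree = train_new.loc[y_mean != y_cut, "Rating"].tolist()

print("mean:", mean_rating)
print("midpoint of range:", midpoint)
print("ratings labelled differently:", disagree)
```

If the two labelings need to match, one option is to pass explicit bin edges to pd.cut (for example the minimum, the chosen threshold, and the maximum) instead of an integer bins, keeping in mind that pd.cut closes intervals on the right by default while the lambda in method 1 uses >=.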