Difference in Output

Below are two pieces of code that solve the same problem. Please explain why the output differs between them.

import numpy as np

# load columns 1, 2 and 3 of the TSV, skipping the header row
cric_data=np.loadtxt("cric_data-200320-181217.tsv",skiprows=1, usecols=[1,2,3])

cric_data.shape

##Method I

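# quartiles of column 2 (the 100th percentile is the maximum)
qrs=np.percentile(cric_data[:,2],[25,50,75,100],axis=0)
mean=np.zeros(4)
for i in range(4):
    # mean of column 0 over the rows whose column-2 value is <= the i-th quartile
    mean[i]=np.mean(cric_data[cric_data[:,2]<=qrs[i]][:,0])
print(mean)

##Output: [20.9122807  27.92920354 32.0887574  39.87555556]

##Method II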

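# column 2 as a 1-D array of shape (n,)
arr=cric_data[:,2]

# the quartiles from Method I as a column of shape (4, 1)
arr1=qrs.reshape(4,1)

# broadcasting gives a (4, n) boolean mask; row i is arr < qrs[i] (strict <)
indices=arr<arr1

for i in range(4):
    # mean of column 0 over the rows selected by mask row i
    mean[i]=np.mean(cric_data[indices[i,:]][:,0])
print(mean)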
##Output: [19.67272727 28.18018018 31.68862275 39.79910714]

Link: https://colab.research.google.com/drive/1TQC-LgVz3H-c5rnHT9Q7VNf747Svs-Ep?usp=sharing

Thanks
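The broadcast comparison in Method II can be checked on synthetic data. The sketch below is a minimal illustration, assuming integer-valued columns like the cricket data (the TSV itself is not available here, and the names data, mask_le and mask_lt are hypothetical). It shows that comparing an (n,) array against a (4, 1) column yields a (4, n) mask, that the broadcast form with <= reproduces Method I's means, and that switching to < drops exactly the rows tied with each quartile.

```python
import numpy as np

# Hypothetical stand-in for cric_data (the real TSV is not available here).
# Integer-valued columns make ties with the quartiles likely, as with real scores.
rng = np.random.default_rng(0)
data = rng.integers(0, 30, size=(200, 3)).astype(float)

qrs = np.percentile(data[:, 2], [25, 50, 75, 100])

# Method I style: one boolean mask per quartile, built in a loop with <=
mean_loop = np.array([data[data[:, 2] <= q, 0].mean() for q in qrs])

# Method II style: compare an (n,) row against a (4, 1) column -> (4, n) mask
mask_le = data[:, 2] <= qrs.reshape(4, 1)
mask_lt = data[:, 2] < qrs.reshape(4, 1)
print(mask_le.shape)  # (4, 200)

# Row-wise means: masked sum of column 0 divided by the number of selected rows
mean_le = (mask_le.astype(float) @ data[:, 0]) / mask_le.sum(axis=1)
mean_lt = (mask_lt.astype(float) @ data[:, 0]) / mask_lt.sum(axis=1)

print(np.allclose(mean_loop, mean_le))  # True: same operator, same means

# How many rows are exactly equal to each quartile (these are dropped by <)
print(mask_le.sum(axis=1) - mask_lt.sum(axis=1))
```

The matrix product replaces the Python loop: each mask row selects and sums column 0, and dividing by the row's count gives the same value np.mean would on that subset.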

Hi @Pankaj_Rana,
This is because NumPy uses a different method to calculate quartiles and percentiles.
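As background on how np.percentile computes its values: by default it interpolates linearly between neighbouring sorted values, and q=100 always returns the maximum. A minimal sketch on made-up data follows; the method= keyword assumes NumPy 1.22 or later (older versions called it interpolation=). Note that both snippets in the question compute qrs once and reuse it, so whichever rule np.percentile applies, the thresholds are identical in Method I and Method II.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])  # made-up data

# Default (method="linear"): interpolate between neighbouring sorted values
linear = np.percentile(x, [25, 50, 75, 100])
print(linear)  # values 1.75, 2.5, 3.25, 4.0

# q=100 is the maximum of the data
print(linear[-1] == x.max())  # True

# A different rule: take the lower neighbour instead (NumPy >= 1.22 for method=)
lower = np.percentile(x, [25, 50, 75, 100], method="lower")
print(lower)  # values 1.0, 2.0, 3.0, 4.0
```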

What does that mean? I am using percentile in both cases!
Which one would be correct?

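Really sorry, I misunderstood your question.

In the first snippet you're using the <= comparison operator, whereas in the second snippet you're using <. That is the reason for the difference: both snippets use the same qrs thresholds, but < excludes every row whose column-2 value is exactly equal to a quartile. In particular, the 100th percentile is the maximum, so with < the row(s) holding the maximum are always left out of the last mean.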
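A small, hand-checkable sketch of this effect, using made-up values in which several rows tie with a quartile (the arrays values and key are hypothetical stand-ins for columns 0 and 2 of cric_data):

```python
import numpy as np

# Made-up data with repeated values (ties) in the quartile column
values = np.array([10.0, 20.0, 30.0, 40.0, 50.0])  # column being averaged
key = np.array([1.0, 2.0, 2.0, 3.0, 4.0])          # column the quartiles come from

qrs = np.percentile(key, [25, 50, 75, 100])
print(qrs)  # [2. 2. 3. 4.]

# <= keeps rows equal to the quartile; < drops them
le = np.array([values[key <= q].mean() for q in qrs])
lt = np.array([values[key < q].mean() for q in qrs])
print(le)  # [20. 20. 25. 30.]
print(lt)  # [10. 10. 20. 25.]
```

In the last position, <= averages all five rows, while < leaves out the row where key equals the maximum (value 50), which is the same pattern seen in the fourth mean of the two outputs in the question.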
