# Performance comparisons between pandas Series and NumPy

I am building a performance comparison table between NumPy and pandas Series, and two cases caught my eye. Any help would be appreciated.

1. We are told to avoid loops with NumPy and Series, but I came across one scenario where a `for` loop performs better.

In the code below I calculate the density of the planets with and without a `for` loop:

```
import numpy as np
import pandas as pd

planets = ['MERCURY', 'VENUS', 'EARTH', 'MOON', 'MARS',
           'JUPITER', 'SATURN', 'URANUS', 'NEPTUNE', 'PLUTO']
mass = pd.Series([0.330, 4.87, 5.97, 0.073, 0.642,
                  1898, 568, 86.8, 102, 0.0146], index=planets)
diameter = pd.Series([4879, 12104, 12756, 3475, 6792,
                      142984, 120536, 51118, 49528, 2370], index=planets)

%%timeit -n 1000
density = mass / (np.pi * np.power(diameter, 3) / 6)

1000 loops, best of 3: 617 µs per loop

%%timeit -n 1000
density = pd.Series(dtype=float)
for planet in mass.index:
    density[planet] = mass[planet] / (np.pi * np.power(diameter[planet], 3) / 6)

1000 loops, best of 3: 183 µs per loop
```
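For context (this is my own sketch, not part of the original benchmark): the loop only has a chance here because the Series holds ten elements. The following comparison, using hypothetical random data and arbitrarily chosen sizes, shows the vectorized version pulling ahead as `n` grows. Absolute numbers will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd


def compare(n, number=20):
    """Time the vectorized vs. looped density computation for n planets."""
    rng = np.random.default_rng(0)
    # Hypothetical data; only the length matters for this comparison.
    mass = pd.Series(rng.uniform(0.01, 2000.0, n))
    diameter = pd.Series(rng.uniform(1000.0, 150000.0, n))

    def vectorized():
        return mass / (np.pi * np.power(diameter, 3) / 6)

    def looped():
        density = pd.Series(np.empty(n))
        for i in mass.index:
            density[i] = mass[i] / (np.pi * np.power(diameter[i], 3) / 6)
        return density

    # Sanity check: both approaches must agree before we time them.
    assert np.allclose(vectorized(), looped())
    return (timeit.timeit(vectorized, number=number),
            timeit.timeit(looped, number=number))


# At n=10 the fixed per-call overhead of the vectorized version can
# make the loop look faster; at a few thousand elements the
# vectorized version wins by a wide margin.
print(compare(10))
print(compare(5000))
```

The crossover point depends on the pandas version, but the trend is robust: the loop's cost grows linearly with `n`, while the vectorized version's cost is nearly flat until the arrays are large.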
2. Second, I am trying to replace NaN values in a Series using two approaches.

Why does the `np.where` approach work faster? My guess is that it converts the Series into an ndarray.

```
sample2 = pd.Series([1, 2, 3, 4325, 23, 3, 4213, 102, 89, 4,
                     np.nan, 6, 803, 43, np.nan, np.nan, np.nan])

x = np.mean(sample2)  # mean of the non-NaN values

%%timeit -n 10000
sample3 = pd.Series(np.where(np.isnan(sample2), x, sample2))

10000 loops, best of 3: 166 µs per loop

%%timeit -n 10000
sample2[np.isnan(sample2)] = x

10000 loops, best of 3: 1.08 ms per loop

%%timeit -n 10000
sample2[pd.isnull(sample2)] = x

10000 loops, best of 3: 1.02 ms per loop
```
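As a side note (my suggestion, not part of the original post): pandas ships a built-in `Series.fillna` for exactly this task, and it returns a new Series instead of mutating the original:

```python
import numpy as np
import pandas as pd

sample2 = pd.Series([1, 2, 3, 4325, 23, 3, 4213, 102, 89, 4,
                     np.nan, 6, 803, 43, np.nan, np.nan, np.nan])

# Series.mean() skips NaN by default, giving the same value as
# np.mean(sample2) above.
x = sample2.mean()

# fillna returns a new Series; sample2 is left unchanged.
sample3 = sample2.fillna(x)

print(sample3.isna().sum())  # → 0
```

Beyond readability, the non-mutating form avoids the benchmark pitfall where the Series being timed has already had its NaNs replaced by an earlier run.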

Hi Abhishek,

1. Note that `%%timeit` times the entire cell body, so both numbers cover the complete task: 617 µs for the vectorized version and 183 µs for the full loop over all ten planets (`-n 1000` only controls how many repetitions are averaged, not how much work each repetition does). The loop wins here because the Series is tiny: every vectorized pandas operation carries a fixed overhead for index alignment and for building intermediate Series, and with only ten elements that overhead dominates the actual arithmetic. Rerun the same comparison on a Series with thousands of elements and the vectorized version will be far faster.

2. Yes, your guess is essentially right. `np.where(np.isnan(sample2), x, sample2)` operates on the underlying ndarrays and returns a plain array, so the only pandas overhead is wrapping the result in a new Series. The other two approaches go through pandas' boolean-indexing `__setitem__` machinery, which is much slower per call. They also mutate `sample2` in place, so after the first timing iteration there are no NaNs left to replace.
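To make that last caveat concrete (a small illustration of my own, not from the thread): once the in-place assignment has run, the NaN mask is all `False`, so later `%%timeit` iterations mostly measure boolean-indexing overhead rather than replacement work.

```python
import numpy as np
import pandas as pd

sample2 = pd.Series([1.0, np.nan, 3.0, np.nan])
x = sample2.mean()  # mean of the non-NaN values: 2.0

mask = np.isnan(sample2)
print(int(mask.sum()))  # → 2 NaNs before the assignment

sample2[mask] = x       # in-place replacement, as in the benchmark

# Any later timing iteration sees an all-False mask, so it mostly
# measures pandas boolean-indexing overhead, not actual replacement.
print(int(np.isnan(sample2).sum()))  # → 0
```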