I am building a performance comparison table between NumPy and pandas Series.
Two instances caught my eye. Any help would be much appreciated.
- We are told to avoid explicit loops with NumPy and pandas Series, but I came across one scenario where a for loop performs better. In the code below I calculate the density of planets with and without a for loop:
```python
mass = pd.Series([0.330, 4.87, 5.97, 0.073, 0.642, 1898, 568, 86.8, 102, 0.0146],
                 index=['MERCURY', 'VENUS', 'EARTH', 'MOON', 'MARS',
                        'JUPITER', 'SATURN', 'URANUS', 'NEPTUNE', 'PLUTO'])
diameter = pd.Series([4879, 12104, 12756, 3475, 6792, 142984, 120536, 51118, 49528, 2370],
                     index=['MERCURY', 'VENUS', 'EARTH', 'MOON', 'MARS',
                            'JUPITER', 'SATURN', 'URANUS', 'NEPTUNE', 'PLUTO'])
```

```python
%%timeit -n 1000
density = mass / (np.pi * np.power(diameter, 3) / 6)
```

1000 loops, best of 3: 617 µs per loop

```python
%%timeit -n 1000
density = pd.Series()
for planet in mass.index:
    density[planet] = mass[planet] / ((np.pi * np.power(diameter[planet], 3)) / 6)
```

1000 loops, best of 3: 183 µs per loop

- Second, I am trying to replace NaN values in a Series using two approaches.
Why does the first approach work faster? My guess is that the second approach converts the Series object to an N-d array.
```python
sample2 = pd.Series([1, 2, 3, 4325, 23, 3, 4213, 102, 89, 4, np.nan,
                     6, 803, 43, np.nan, np.nan, np.nan])
x = np.mean(sample2)
x
```

```python
%%timeit -n 10000
sample3 = pd.Series(np.where(np.isnan(sample2), x, sample2))
```

10000 loops, best of 3: 166 µs per loop

```python
%%timeit -n 10000
sample2[np.isnan(sample2)] = x
```

10000 loops, best of 3: 1.08 ms per loop

https://stackoverflow.com/questions/65572910/performance-comparisons-between-series-and-numpy
January 05, 2021 at 11:38AM
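On the first comparison (the density calculation): a plausible explanation is that on a 10-element Series the fixed per-call pandas overhead (index alignment, Series construction) dominates, so the explicit loop can win; the vectorized path pulls ahead as the data grows. A minimal sketch of one way to probe this, timing the same expression on the Series versus the raw underlying NumPy arrays (the sizes and random data here are my own assumptions, not from the question):

```python
import timeit

import numpy as np
import pandas as pd

# Compare the vectorized density expression on a pandas Series versus the
# raw ndarray (via .to_numpy()), at a tiny size and a larger one. The gap
# between the two timings at n=10 is roughly the pandas per-call overhead.
for n in (10, 10_000):
    mass = pd.Series(np.random.rand(n))
    diameter = pd.Series(np.random.rand(n) + 1.0)

    t_series = timeit.timeit(
        lambda: mass / (np.pi * np.power(diameter, 3) / 6), number=1000)

    m, d = mass.to_numpy(), diameter.to_numpy()
    t_ndarray = timeit.timeit(
        lambda: m / (np.pi * np.power(d, 3) / 6), number=1000)

    print(f"n={n}: Series {t_series:.4f}s, ndarray {t_ndarray:.4f}s")
```

Both paths compute identical values; only the wrapper around the arithmetic differs, which is why the hand-written loop can look competitive when n is this small.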
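On the second comparison (NaN replacement): the guess about array conversion is close but not quite it. `np.isnan(sample2)` operates on the underlying float array in both approaches; the difference is that `sample2[mask] = x` goes through pandas' `__setitem__` indexing machinery, while `np.where` does one vectorized pass over plain ndarrays. A sketch checking that the `np.where` route and the idiomatic `Series.fillna` agree on the result:

```python
import numpy as np
import pandas as pd

sample2 = pd.Series([1, 2, 3, 4325, 23, 3, 4213, 102, 89, 4, np.nan,
                     6, 803, 43, np.nan, np.nan, np.nan])
x = sample2.mean()  # NaN-aware mean, same value the question's np.mean gives here

# Approach from the question: build a new ndarray, then wrap it in a Series.
filled_where = pd.Series(np.where(np.isnan(sample2), x, sample2))

# Idiomatic pandas alternative: fillna returns a new Series, no boolean setitem.
filled_fillna = sample2.fillna(x)

print(filled_where.equals(filled_fillna))
```

If mutating the original Series is not required, `fillna` (or the `np.where` construction) avoids the in-place assignment path entirely.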