Monday, January 4, 2021

Performance Comparisons between Series and Numpy

I am building a performance comparison table between NumPy and pandas Series:

Two cases caught my eye. Any help would be much appreciated.

  1. We are told to avoid loops with NumPy and pandas Series, but I came across one scenario where a for loop performs better.

In the code below I calculate the density of the planets both with and without a for loop; a short sketch follows the timings.

    import numpy as np
    import pandas as pd

    mass = pd.Series([0.330, 4.87, 5.97, 0.073, 0.642, 1898, 568, 86.8, 102, 0.0146],
                     index=['MERCURY', 'VENUS', 'EARTH', 'MOON', 'MARS', 'JUPITER',
                            'SATURN', 'URANUS', 'NEPTUNE', 'PLUTO'])
    diameter = pd.Series([4879, 12104, 12756, 3475, 6792, 142984, 120536, 51118, 49528, 2370],
                         index=['MERCURY', 'VENUS', 'EARTH', 'MOON', 'MARS', 'JUPITER',
                                'SATURN', 'URANUS', 'NEPTUNE', 'PLUTO'])

    %%timeit -n 1000
    density = mass / (np.pi * np.power(diameter, 3) / 6)

    1000 loops, best of 3: 617 µs per loop

    %%timeit -n 1000
    density = pd.Series()
    for planet in mass.index:
        density[planet] = mass[planet] / ((np.pi * np.power(diameter[planet], 3)) / 6)

    1000 loops, best of 3: 183 µs per loop
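For context: with only ten elements, the vectorised version's cost is mostly fixed pandas overhead (each intermediate result is a new Series, and the final division aligns the two indexes), so a plain Python loop over ten labels can still come out ahead; with thousands of rows the vectorised form should win easily. Below is a minimal sketch of one way to strip that overhead out, assuming pandas >= 0.24 where Series.to_numpy() exists (the setup mirrors the code above; no timings are claimed here).

    # Sketch: the same density calculation on the underlying NumPy arrays, so every
    # intermediate result is a plain ndarray and only one Series is built at the end.
    import numpy as np
    import pandas as pd

    planets = ['MERCURY', 'VENUS', 'EARTH', 'MOON', 'MARS', 'JUPITER',
               'SATURN', 'URANUS', 'NEPTUNE', 'PLUTO']
    mass = pd.Series([0.330, 4.87, 5.97, 0.073, 0.642, 1898, 568, 86.8, 102, 0.0146],
                     index=planets)
    diameter = pd.Series([4879, 12104, 12756, 3475, 6792, 142984, 120536, 51118,
                          49528, 2370], index=planets)

    m = mass.to_numpy()        # plain ndarray, no index attached
    d = diameter.to_numpy()
    density = pd.Series(m / (np.pi * d ** 3 / 6), index=mass.index)

Timing this alongside the two versions above, and repeating the comparison with a Series of, say, a million random rows, should show whether the loop's advantage survives beyond toy sizes.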
  2. Second, I am trying to replace NaN values in a Series using two approaches.

Why does the first approach work faster? My guess is that the second approach converts the Series object to an N-d array (a sketch follows the timings below).

    sample2 = pd.Series([1, 2, 3, 4325, 23, 3, 4213, 102, 89, 4, np.nan,
                         6, 803, 43, np.nan, np.nan, np.nan])
    x = np.mean(sample2)
    x

    %%timeit -n 10000
    sample3 = pd.Series(np.where(np.isnan(sample2), x, sample2))

    10000 loops, best of 3: 166 µs per loop

    %%timeit -n 10000
    sample2[np.isnan(sample2)] = x

    10000 loops, best of 3: 1.08 ms per loop
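On the guess above: one plausible explanation (not verified here) is that the cost is not a conversion to an ndarray but the boolean-mask assignment itself, which goes through pandas' Series.__setitem__ indexing machinery, whereas the np.where version does all of its work on plain ndarrays and builds a single new Series at the end. Below is a small sketch of two further variants for comparison, assuming a reasonably recent pandas; the names filled, arr and sample4 are mine, not from the original code.

    # Sketch: two more ways to replace the NaNs, for comparison with the above.
    import numpy as np
    import pandas as pd

    sample2 = pd.Series([1, 2, 3, 4325, 23, 3, 4213, 102, 89, 4, np.nan,
                         6, 803, 43, np.nan, np.nan, np.nan])
    x = sample2.mean()                    # pandas' mean skips NaN values by default

    # Idiomatic pandas: no explicit mask, returns a new Series.
    filled = sample2.fillna(x)

    # Mask assignment on a private copy of the underlying ndarray, which bypasses
    # Series.__setitem__ entirely; only the final wrap touches pandas.
    arr = sample2.to_numpy(copy=True)
    arr[np.isnan(arr)] = x
    sample4 = pd.Series(arr, index=sample2.index)

Timing fillna and the copy-and-mask variant with %timeit on the same machine would show whether the gap really comes from the setitem overhead; I would expect both to land closer to the 166 µs figure than the 1.08 ms one, but that is a guess until measured.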
https://stackoverflow.com/questions/65572910/performance-comparisons-between-series-and-numpy
