Thursday, January 7, 2021

Order of pandas dataframe ranking does not match the order of the original dataframe

The result of rank() on a pandas DataFrame looks odd to me. Here is some sample code:

    >>> import pandas as pd
    >>> import numpy as np
    >>> df = pd.DataFrame(np.random.random((10, 5)))
    >>> df
              0         1         2         3         4
    0  0.956603  0.379341  0.268281  0.446098  0.630782
    1  0.939022  0.732704  0.892836  0.813121  0.829652
    2  0.628488  0.046074  0.344966  0.422442  0.942899
    3  0.535603  0.473202  0.885504  0.481541  0.873048
    4  0.908629  0.449296  0.740381  0.356437  0.670467
    5  0.631618  0.147706  0.381521  0.723074  0.151051
    6  0.276021  0.274220  0.812456  0.283248  0.609319
    7  0.112798  0.855934  0.198935  0.433243  0.247930
    8  0.479593  0.643699  0.068690  0.465188  0.907548
    9  0.452467  0.295931  0.629863  0.565983  0.784952
    >>> df.rank(pct=True)
         0    1    2    3    4
    0  1.0  0.5  0.3  0.5  0.4
    1  0.9  0.9  1.0  1.0  0.7
    2  0.6  0.1  0.4  0.3  1.0
    3  0.5  0.7  0.9  0.7  0.8
    4  0.8  0.6  0.7  0.2  0.5
    5  0.7  0.2  0.5  0.9  0.1
    6  0.2  0.3  0.8  0.1  0.3
    7  0.1  1.0  0.2  0.4  0.2
    8  0.4  0.8  0.1  0.6  0.9
    9  0.3  0.4  0.6  0.8  0.6
    >>> df.iloc[0, :].rank(pct=True)
    0    1.0
    1    0.4
    2    0.2
    3    0.6
    4    0.8
    Name: 0, dtype: float64

I don't understand why the first row of df.rank(pct=True) (axis defaults to 0) is not the same as the ranking computed on the first row of the DataFrame by itself.

Also, the result of df.rank(pct=True) seems weird. Looking at the first row of df, we see that col0 > col4 > col3 > col1 > col2. Since the default is ascending=True, I would expect the first row of df.rank(pct=True) to have the same order, but it gives col0 > col3 = col1 > col4 > col2. On the other hand, the order of df.iloc[0, :].rank(pct=True) seems correct. So my questions are:

  1. Why is the first row of df.rank(pct=True) different from df.iloc[0, :].rank(pct=True)?
  2. Why is the order of df.rank(pct=True) not the same as the order of df?
https://stackoverflow.com/questions/65623316/order-of-pandas-dataframe-ranking-does-not-match-the-order-of-the-original-dataf January 08, 2021 at 12:05PM
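
A sketch that may help with both questions, using the values posted above (the variable names col_ranks, row_ranks and first_row_alone are made up for illustration). With axis=0, rank() compares each value only with the other values in the same column, so the first row of df.rank(pct=True) holds five independent column-wise ranks and need not follow the order of the values within that row. Passing axis=1 ranks within each row, which should reproduce df.iloc[0, :].rank(pct=True):

```python
import numpy as np
import pandas as pd

# Values copied from the DataFrame printed in the question.
df = pd.DataFrame([
    [0.956603, 0.379341, 0.268281, 0.446098, 0.630782],
    [0.939022, 0.732704, 0.892836, 0.813121, 0.829652],
    [0.628488, 0.046074, 0.344966, 0.422442, 0.942899],
    [0.535603, 0.473202, 0.885504, 0.481541, 0.873048],
    [0.908629, 0.449296, 0.740381, 0.356437, 0.670467],
    [0.631618, 0.147706, 0.381521, 0.723074, 0.151051],
    [0.276021, 0.274220, 0.812456, 0.283248, 0.609319],
    [0.112798, 0.855934, 0.198935, 0.433243, 0.247930],
    [0.479593, 0.643699, 0.068690, 0.465188, 0.907548],
    [0.452467, 0.295931, 0.629863, 0.565983, 0.784952],
])

# axis=0 (the default): each value is ranked against the 10 values in its column.
col_ranks = df.rank(pct=True)

# axis=1: each value is ranked against the 5 values in its row.
row_ranks = df.rank(axis=1, pct=True)

# Ranking the first row on its own, as in the question.
first_row_alone = df.iloc[0, :].rank(pct=True)

print(col_ranks.iloc[0])   # column-wise ranks of the values in row 0
print(row_ranks.iloc[0])   # row-wise ranks of row 0
print(np.allclose(row_ranks.iloc[0], first_row_alone))
```

For example, 0.379341 is the fifth-smallest of the ten values in column 1, which gives 5/10 = 0.5 under axis=0, while it is the second-smallest of the five values in row 0, which gives 2/5 = 0.4 under axis=1.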
