The rank result of pandas DataFrame seems weird. This is a sample code:
```python
>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame(np.random.random((10, 5)))
>>> df
          0         1         2         3         4
0  0.956603  0.379341  0.268281  0.446098  0.630782
1  0.939022  0.732704  0.892836  0.813121  0.829652
2  0.628488  0.046074  0.344966  0.422442  0.942899
3  0.535603  0.473202  0.885504  0.481541  0.873048
4  0.908629  0.449296  0.740381  0.356437  0.670467
5  0.631618  0.147706  0.381521  0.723074  0.151051
6  0.276021  0.274220  0.812456  0.283248  0.609319
7  0.112798  0.855934  0.198935  0.433243  0.247930
8  0.479593  0.643699  0.068690  0.465188  0.907548
9  0.452467  0.295931  0.629863  0.565983  0.784952
>>> df.rank(pct=True)
     0    1    2    3    4
0  1.0  0.5  0.3  0.5  0.4
1  0.9  0.9  1.0  1.0  0.7
2  0.6  0.1  0.4  0.3  1.0
3  0.5  0.7  0.9  0.7  0.8
4  0.8  0.6  0.7  0.2  0.5
5  0.7  0.2  0.5  0.9  0.1
6  0.2  0.3  0.8  0.1  0.3
7  0.1  1.0  0.2  0.4  0.2
8  0.4  0.8  0.1  0.6  0.9
9  0.3  0.4  0.6  0.8  0.6
>>> df.iloc[0, :].rank(pct=True)
0    1.0
1    0.4
2    0.2
3    0.6
4    0.8
Name: 0, dtype: float64
```

I don't understand why the first row of the row-wise ranking of `df` (the default for `axis` is 0) is not the same as the ranking of the first row of the DataFrame.
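For comparison, here is a minimal sketch of what `axis` controls in `DataFrame.rank`, using a tiny made-up frame (not the random data above) so every number can be checked by hand. With the default `axis=0`, each value is ranked against the other values in its *column*; with `axis=1`, it is ranked against the other values in its *row*. Ranking a single row with `Series.rank` is a within-row ranking, so it lines up with the `axis=1` result rather than the default one.

```python
import pandas as pd

# Tiny made-up frame (not the question's random data) so every rank is easy to check by hand.
df = pd.DataFrame({"a": [3.0, 1.0, 2.0], "b": [1.0, 30.0, 20.0]})

# Default axis=0: each value is ranked against the other values in its COLUMN.
# Row 0 comes out as a=1.0 (3.0 is the largest of 3, 1, 2) and b=1/3 (1.0 is the smallest of 1, 30, 20).
col_wise = df.rank(pct=True)

# axis=1: each value is ranked against the other values in its ROW.
# Row 0 comes out as a=1.0 and b=0.5 (within that row, 3.0 > 1.0).
row_wise = df.rank(axis=1, pct=True)

# Ranking row 0 on its own is a within-row ranking, so it matches row 0 of the axis=1 result.
first_row = df.iloc[0, :].rank(pct=True)

print(col_wise, row_wise, first_row, sep="\n\n")
```

Passing `axis=1` (or `axis="columns"`) is therefore the whole-frame counterpart of calling `.rank()` on each row separately.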
Also, the result of `df.rank(pct=True)` seems weird. Looking at the first row of `df`, we see that col0 > col4 > col3 > col1 > col2. Since the default is `ascending=True`, I would expect the first row of `df.rank(pct=True)` to follow the same order, but it gives col0 > col3 = col1 > col4 > col2. On the other hand, the order of `df.iloc[0, :].rank(pct=True)` looks correct. So my questions are:
- Why is the first row of `df.rank(pct=True)` different from `df.iloc[0, :].rank(pct=True)`?
- Why is the order of `df.rank(pct=True)` not the same as the order of `df`?
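A sketch of what the two calls compute on the values printed above (typed in by hand, so it runs without the original random seed). It assumes the documented `DataFrame.rank` behaviour: with the default `axis=0`, each entry of `df.rank(pct=True)` is that value's percentile *within its own column*, so the five numbers in row 0 come from five different columns and need not follow row 0's order. With `axis=1` the ranking is within the row, and sorting row 0 by that rank recovers col0 > col4 > col3 > col1 > col2. The manual "count the values `<=`" check is only equivalent to the default `method="average"` when there are no ties, which holds for this data.

```python
import pandas as pd

# The 10x5 values printed in the question, copied from the output above.
df = pd.DataFrame([
    [0.956603, 0.379341, 0.268281, 0.446098, 0.630782],
    [0.939022, 0.732704, 0.892836, 0.813121, 0.829652],
    [0.628488, 0.046074, 0.344966, 0.422442, 0.942899],
    [0.535603, 0.473202, 0.885504, 0.481541, 0.873048],
    [0.908629, 0.449296, 0.740381, 0.356437, 0.670467],
    [0.631618, 0.147706, 0.381521, 0.723074, 0.151051],
    [0.276021, 0.274220, 0.812456, 0.283248, 0.609319],
    [0.112798, 0.855934, 0.198935, 0.433243, 0.247930],
    [0.479593, 0.643699, 0.068690, 0.465188, 0.907548],
    [0.452467, 0.295931, 0.629863, 0.565983, 0.784952],
])

# Default axis=0: each cell becomes its percentile rank within its own column.
col_wise = df.rank(pct=True)

# With no ties, that percentile is (values in the column <= this one) / (non-NaN values in the column).
# df.loc[0, 1] = 0.379341 is the 5th smallest of the 10 values in column 1, giving 5 / 10 = 0.5.
manual_0_1 = (df[1] <= df.loc[0, 1]).sum() / df[1].count()

# axis=1: each cell is ranked against the other cells in its row instead.
row_wise = df.rank(axis=1, pct=True)

# Sorting row 0's columns by their row-wise rank (largest first) recovers the order of the raw values.
order_row0 = row_wise.iloc[0].sort_values(ascending=False).index.tolist()

print(col_wise.iloc[0].tolist())
print(manual_0_1)
print(row_wise.iloc[0].tolist())
print(order_row0)
```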