I'm trying to make a column in a dataframe denoting the group or bin each observation belongs to. The idea is to sort the dataframe according to some column, then add another column giving the bin number for each observation. If I want deciles, then I should be able to tell a function I want 10 equal (or close to equal) groups.
I tried pandas qcut, but that just gives the upper and lower limits of each bin. I would like just 1, 2, 3, 4, etc. Take the following for example:
    import numpy as np
    import pandas as pd

    x = [1,2,3,4,5,6,7,8,5,45,64545,65,6456,564]
    y = np.random.rand(len(x))
    df_dict = {'x': x, 'y': y}
    df = pd.DataFrame(df_dict)
This gives a df of 14 observations. How could I split it into 5 bins of equal (or nearly equal) size?
The desired result would be the following:
            x         y  group
    0       1  0.926273      1
    1       2  0.678101      1
    2       3  0.636875      1
    3       4  0.802590      2
    4       5  0.494553      2
    5       6  0.874876      2
    6       7  0.607902      3
    7       8  0.028737      3
    8       5  0.493545      3
    9      45  0.498140      4
    10  64545  0.938377      4
    11     65  0.613015      4
    12   6456  0.288266      5
    13    564  0.917817      5
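For what it's worth, here is a rough sketch of two ways such integer labels could be produced; it is not necessarily the best approach. The first assigns bins purely by row position, which is what the table above shows (the second 5 lands in group 3 because of where it sits, not because of its value). The second bins by the value of x, using qcut on ranks with labels=False. The names n_bins and x_group are just illustrative choices, not anything pandas requires.

```python
import numpy as np
import pandas as pd

x = [1, 2, 3, 4, 5, 6, 7, 8, 5, 45, 64545, 65, 6456, 564]
df = pd.DataFrame({'x': x, 'y': np.random.rand(len(x))})

n_bins = 5  # illustrative name: how many groups are wanted

# Option 1: bin by row position (sort df first if a particular order is wanted).
# Integer arithmetic spreads the 14 rows over 5 bins with sizes 3, 3, 3, 3, 2.
positions = np.arange(len(df))
df['group'] = positions * n_bins // len(df) + 1

# Option 2: bin by the value of x.
# Ranking with method='first' gives every row a distinct rank, so the tied
# values (the two 5s) cannot produce duplicate bin edges in qcut.
# labels=False makes qcut return 0-based integer codes instead of intervals.
ranks = df['x'].rank(method='first')
df['x_group'] = pd.qcut(ranks, n_bins, labels=False) + 1

print(df)
```

With 14 rows and 5 bins the groups cannot all be the same size, so both options end up with a mix of 2-row and 3-row bins; which bin gets the short count differs between them.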
https://stackoverflow.com/questions/67424710/how-do-i-make-bins-of-equal-number-of-observations-in-a-pandas-dataframe May 07, 2021 at 03:21AM