Thursday, May 6, 2021

How do I make bins of equal number of observations in a pandas dataframe?

I'm trying to add a column to a dataframe indicating which group or bin each observation belongs to. The idea is to sort the dataframe according to some column, then create another column denoting which bin each observation falls in. If I want deciles, then I should be able to tell a function I want 10 equal (or close to equal) groups.

I tried pandas `qcut`, but that just gives tuples of the upper and lower limits of the bins. I would like just 1, 2, 3, 4, etc. Take the following for example:

    import numpy as np
    import pandas as pd

    x = [1,2,3,4,5,6,7,8,5,45,64545,65,6456,564]
    y = np.random.rand(len(x))

    df_dict = {'x': x, 'y': y}
    df = pd.DataFrame(df_dict)

This gives a df of 14 observations. How could I split it into 5 equal (or close to equal) bins?

The desired result would be the following:

            x         y  group
    0       1  0.926273      1
    1       2  0.678101      1
    2       3  0.636875      1
    3       4  0.802590      2
    4       5  0.494553      2
    5       6  0.874876      2
    6       7  0.607902      3
    7       8  0.028737      3
    8       5  0.493545      3
    9      45  0.498140      4
    10  64545  0.938377      4
    11     65  0.613015      4
    12   6456  0.288266      5
    13    564  0.917817      5
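
For reference, here is a minimal sketch of two ways such integer labels might be produced. It is not a confirmed answer to the question. The helper name `equal_count_bins` and the column name `group_by_x` are hypothetical, introduced only for illustration. The first option assumes the bins should follow the current row order, as in the desired result above. The second assumes the bins should follow the values of `x`, using `pd.qcut` with `labels=False` so that it returns integer codes instead of intervals.

```python
import numpy as np
import pandas as pd

x = [1, 2, 3, 4, 5, 6, 7, 8, 5, 45, 64545, 65, 6456, 564]
df = pd.DataFrame({"x": x, "y": np.random.rand(len(x))})


def equal_count_bins(n_rows, n_bins):
    """Hypothetical helper: 1-based bin labels for n_rows positions.

    Bin sizes differ by at most one; the earlier bins get the extra rows.
    """
    # np.array_split tolerates a row count that is not divisible by n_bins.
    chunks = np.array_split(np.arange(n_rows), n_bins)
    labels = np.empty(n_rows, dtype=int)
    for label, positions in enumerate(chunks, start=1):
        labels[positions] = label
    return labels


# Option 1 (assumption: bin by the current row order).
# Sort first with df.sort_values(...) if the bins should follow a column.
df["group"] = equal_count_bins(len(df), 5)

# Option 2 (assumption: bin by the value of x).
# labels=False makes qcut return 0-based integer codes; add 1 for 1-based.
df["group_by_x"] = pd.qcut(df["x"], q=5, labels=False) + 1

print(df)
```

The two options generally give different groupings. Option 1 depends only on row position, so for 14 rows and 5 bins it yields sizes 3, 3, 3, 3, 2. Option 2 depends on the quantiles of `x`, so its bin sizes can differ slightly from that pattern, and `qcut` may raise an error if ties in the data make two bin edges identical.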
https://stackoverflow.com/questions/67424710/how-do-i-make-bins-of-equal-number-of-observations-in-a-pandas-dataframe May 07, 2021 at 03:21AM
