Friday, January 1, 2021

Pandas groupby and keep the max length value

I have the following data frame:

name  state  gender  region  old_ip1  old_ip2  new_ip1  new_ip2
ABC   GA     M       East    1.2.3.4  2.3.4.5
ABC   GA     M       East
ABC   GA     M       East
ABC   GA     M       East                      3.4.5.6  4.5.6.7
ABC   A      M       South
ABC   GA     M       South   5.6.7.8  6.7.8.9
ABC   GA     M       South                     7.8.9.1  8.9.1.2
BCD   GA     M       East    9.1.2.3  1.2.3.4
BCD   GA     M       East                      2.3.4.5  3.4.5.6

I need to group the data frame by the first four columns and keep the IP values. Each group has one row containing old_ip1 and old_ip2, and a different row containing new_ip1 and new_ip2. A group may also contain rows with no values in any of the old or new IP columns.

The output should be:

name  state  gender  region  old_ip1  old_ip2  new_ip1  new_ip2
ABC   GA     M       East    1.2.3.4  2.3.4.5  3.4.5.6  4.5.6.7
ABC   GA     M       South   5.6.7.8  6.7.8.9  7.8.9.1  8.9.1.2
BCD   GA     M       East    9.1.2.3  1.2.3.4  2.3.4.5  3.4.5.6
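One way to get this shape, sketched here under some assumptions, is to take the first non-null value of each IP column within each group. The sketch assumes the blank cells are NaN (or can be converted to NaN) and leaves out the row whose state reads "A", since the expected output does not include that group. The names keys, ip_cols, and result are illustrative only.

```python
import numpy as np
import pandas as pd

keys = ["name", "state", "gender", "region"]
ip_cols = ["old_ip1", "old_ip2", "new_ip1", "new_ip2"]

# Sample rows from the question; blanks are represented as NaN (assumption).
rows = [
    ("ABC", "GA", "M", "East", "1.2.3.4", "2.3.4.5", np.nan, np.nan),
    ("ABC", "GA", "M", "East", np.nan, np.nan, np.nan, np.nan),
    ("ABC", "GA", "M", "East", np.nan, np.nan, np.nan, np.nan),
    ("ABC", "GA", "M", "East", np.nan, np.nan, "3.4.5.6", "4.5.6.7"),
    ("ABC", "GA", "M", "South", "5.6.7.8", "6.7.8.9", np.nan, np.nan),
    ("ABC", "GA", "M", "South", np.nan, np.nan, "7.8.9.1", "8.9.1.2"),
    ("BCD", "GA", "M", "East", "9.1.2.3", "1.2.3.4", np.nan, np.nan),
    ("BCD", "GA", "M", "East", np.nan, np.nan, "2.3.4.5", "3.4.5.6"),
]
df = pd.DataFrame(rows, columns=keys + ip_cols)

# If the blanks were loaded as empty strings, turn them into NaN first
# (this is a no-op for the sample above).
df[ip_cols] = df[ip_cols].replace("", np.nan)

# GroupBy.first() skips nulls, so each column keeps its first real value per group.
result = df.groupby(keys, as_index=False).first()
print(result)
```

This relies on each group having at most one non-null value per IP column, as described above. If a group could hold several different values in the same column, first() would silently keep only the earliest one.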

I was thinking of concatenating all the values within each group, but that does not work. Here is my code so far:

df.groupby(['name', 'state', 'gender', 'region'], as_index=False).agg(lambda x : ';'.join(x))  
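A possible reason the attempt above fails, assuming the blank cells are NaN, is that NaN is a float and str.join only accepts strings. The error itself is not shown, so this is a guess. A minimal sketch of the same concatenation idea that drops nulls before joining, using a small hypothetical sample rather than the full data:

```python
import numpy as np
import pandas as pd

keys = ["name", "state", "gender", "region"]

# Small hypothetical sample: one group whose values are spread over three rows.
df = pd.DataFrame(
    {
        "name": ["ABC", "ABC", "ABC"],
        "state": ["GA", "GA", "GA"],
        "gender": ["M", "M", "M"],
        "region": ["East", "East", "East"],
        "old_ip1": ["1.2.3.4", np.nan, np.nan],
        "new_ip1": [np.nan, np.nan, "3.4.5.6"],
    }
)

# Drop nulls in each column of each group before joining, so that
# str.join only ever sees strings.
joined = df.groupby(keys, as_index=False).agg(lambda s: ";".join(s.dropna()))
print(joined)
```

With at most one non-null value per column and group, the joined string is just that value. A group with no values at all in a column ends up with an empty string there rather than NaN.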

I am looking for any solution; it does not have to use concatenation.

https://stackoverflow.com/questions/65535848/pandas-groupby-and-keep-the-max-length-value January 02, 2021 at 11:46AM
