I have the following data frame:
name  state  gender  region  old_ip1  old_ip2  new_ip1  new_ip2
ABC   GA     M       East    1.2.3.4  2.3.4.5
ABC   GA     M       East
ABC   GA     M       East
ABC   GA     M       East                      3.4.5.6  4.5.6.7
ABC   GA     M       South
ABC   GA     M       South   5.6.7.8  6.7.8.9
ABC   GA     M       South                     7.8.9.1  8.9.1.2
BCD   GA     M       East    9.1.2.3  1.2.3.4
BCD   GA     M       East                      2.3.4.5  3.4.5.6
I need to group the data frame by the first four columns and keep the IP values. Each group has one row that holds old_ip1 and old_ip2, and a different row that holds new_ip1 and new_ip2. A group may also contain rows with no values in any of the old or new IP columns.
The output should be:
name  state  gender  region  old_ip1  old_ip2  new_ip1  new_ip2
ABC   GA     M       East    1.2.3.4  2.3.4.5  3.4.5.6  4.5.6.7
ABC   GA     M       South   5.6.7.8  6.7.8.9  7.8.9.1  8.9.1.2
BCD   GA     M       East    9.1.2.3  1.2.3.4  2.3.4.5  3.4.5.6
I was thinking of concatenating all the values in each group, but that does not work. Here is my code so far:
df.groupby(['name', 'state', 'gender', 'region'], as_index=False).agg(lambda x : ';'.join(x))
I am looking for any solution. It does not have to involve concatenation.
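One possible approach, offered as a sketch rather than a definitive answer: `GroupBy.first()` returns the first non-null value of each column within a group, which collapses the rows as described. The sample frame below is rebuilt by hand, and it rests on two assumptions: that the empty cells are either missing values or blank strings, and that every South row has state GA, as the expected output implies. The names `keys`, `ip_cols`, and `result` are only illustrative.

```python
import numpy as np
import pandas as pd

keys = ["name", "state", "gender", "region"]
ip_cols = ["old_ip1", "old_ip2", "new_ip1", "new_ip2"]

# Hand-built copy of the sample data; None marks an empty cell (assumption).
rows = [
    ("ABC", "GA", "M", "East", "1.2.3.4", "2.3.4.5", None, None),
    ("ABC", "GA", "M", "East", None, None, None, None),
    ("ABC", "GA", "M", "East", None, None, None, None),
    ("ABC", "GA", "M", "East", None, None, "3.4.5.6", "4.5.6.7"),
    ("ABC", "GA", "M", "South", None, None, None, None),
    ("ABC", "GA", "M", "South", "5.6.7.8", "6.7.8.9", None, None),
    ("ABC", "GA", "M", "South", None, None, "7.8.9.1", "8.9.1.2"),
    ("BCD", "GA", "M", "East", "9.1.2.3", "1.2.3.4", None, None),
    ("BCD", "GA", "M", "East", None, None, "2.3.4.5", "3.4.5.6"),
]
df = pd.DataFrame(rows, columns=keys + ip_cols)

# If the real data stores blanks as empty strings, turn them into NaN
# so that first() skips them. This is a no-op when they are already missing.
df[ip_cols] = df[ip_cols].replace(r"^\s*$", np.nan, regex=True)

# first() keeps the first non-null value per column in each group.
result = df.groupby(keys, as_index=False).first()
print(result)
```

If a group could hold more than one row with values in the same IP column, `first()` would silently keep only the earliest one, so this sketch fits only the one-old-row, one-new-row layout described above.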
https://stackoverflow.com/questions/65535848/pandas-groupby-and-keep-the-max-length-value January 02, 2021 at 11:46AM