2021年3月3日星期三

How to trim util the last row meet condition within groups in Pandas

Problem Description: Suppose I have the following dataframe

df = pd.DataFrame({"date": [1,2,3,3,4,1,2,3,3,4,1,1,1,4,4,4,1,1,1,2,2,3,3,3,4,4],                 "variable": ["A", "A", "A","A","A","A", "A", "A","A","A", "B", "B", "B","B","B","B" ,"C", "C", "C","C", "D","D","D","D","D","D"],                 "no": [1, 2.2, 3.5, 1.5, 1.5,1, 2.2, 3.5, 1.5, 1.5, 1.2, 1.3, 1.1, 2, 3,1, 2.2, 3.5, 1.5, 1.5, 1.2, 1.3, 1.1, 2, 3,9],                 "value": [0.469112, -0.282863, -1.509059, -1.135632, 1.212112,0.469112, -0.282863, -1.509059, -1.135632, 1.212112, -0.173215,                           0.119209, -1.044236, -0.861849, -0.234,0.469112, -0.282863, -1.509059, -1.135632, 1.212112, -0.173215,                           0.119209, -1.044236, -0.861849, 0.332,0.87]})  

where the df would be

print(df)      date variable   no     value  0      1        A  1.0  0.469112  1      1        A  1.0  0.469112  2      3        A  1.5 -1.135632  3      4        A  1.5  1.212112  4      3        A  1.5 -1.135632  5      4        A  1.5  1.212112  6      2        A  2.2 -0.282863  7      2        A  2.2 -0.282863  8      3        A  3.5 -1.509059  9      3        A  3.5 -1.509059  10     4        B  1.0  0.469112  11     1        B  1.1 -1.044236  12     1        B  1.2 -0.173215  13     1        B  1.3  0.119209  14     4        B  2.0 -0.861849  15     4        B  3.0 -0.234000  16     1        C  1.5 -1.135632  17     2        C  1.5  1.212112  18     1        C  2.2 -0.282863  19     1        C  3.5 -1.509059  20     3        D  1.1 -1.044236  21     2        D  1.2 -0.173215  22     3        D  1.3  0.119209  23     3        D  2.0 -0.861849  24     4        D  3.0  0.332000  25     4        D  9.0  0.870000  

And then I wanna:

  1. Sort based on columns variable and no,
  2. Trim each group until the last row meets a condition, say, I would like to trim the group (by single column, say variable) until the last row where the value in column value is greater than 0, in other words, to drop rest of rows after the last row that meets the condition.

I have tried groupby-apply

df.groupby('variable', as_index=False).apply(      lambda x: x.iloc[: x.where(x['value'] > 0).last_valid_index() + 1,  ]))  

but the result is incorrect:

      date variable   no     value  0 0      1        A  1.0  0.469112    1      1        A  1.0  0.469112    2      3        A  1.5 -1.135632    3      4        A  1.5  1.212112    4      3        A  1.5 -1.135632    5      4        A  1.5  1.212112  1 10     4        B  1.0  0.469112    11     1        B  1.1 -1.044236    12     1        B  1.2 -0.173215    13     1        B  1.3  0.119209    14     4        B  2.0 -0.861849    15     4        B  3.0 -0.234000  2 16     1        C  1.5 -1.135632    17     2        C  1.5  1.212112    18     1        C  2.2 -0.282863    19     1        C  3.5 -1.509059  3 20     3        D  1.1 -1.044236    21     2        D  1.2 -0.173215    22     3        D  1.3  0.119209    23     3        D  2.0 -0.861849    24     4        D  3.0  0.332000    25     4        D  9.0  0.870000  

as you may see the end of group B and C are not greater than 0.

Anyone who could provide a solution and explain why my solution does not work would be highly appreciated.

Plus. Since the size of dataframe is way larger than the example here, I assume we had better not reverse the dataframe.

https://stackoverflow.com/questions/66466970/how-to-trim-util-the-last-row-meet-condition-within-groups-in-pandas March 04, 2021 at 08:41AM

没有评论:

发表评论