Problem Description: Suppose I have the following dataframe
df = pd.DataFrame({"date": [1,2,3,3,4,1,2,3,3,4,1,1,1,4,4,4,1,1,1,2,2,3,3,3,4,4], "variable": ["A", "A", "A","A","A","A", "A", "A","A","A", "B", "B", "B","B","B","B" ,"C", "C", "C","C", "D","D","D","D","D","D"], "no": [1, 2.2, 3.5, 1.5, 1.5,1, 2.2, 3.5, 1.5, 1.5, 1.2, 1.3, 1.1, 2, 3,1, 2.2, 3.5, 1.5, 1.5, 1.2, 1.3, 1.1, 2, 3,9], "value": [0.469112, -0.282863, -1.509059, -1.135632, 1.212112,0.469112, -0.282863, -1.509059, -1.135632, 1.212112, -0.173215, 0.119209, -1.044236, -0.861849, -0.234,0.469112, -0.282863, -1.509059, -1.135632, 1.212112, -0.173215, 0.119209, -1.044236, -0.861849, 0.332,0.87]}) where the df would be
print(df) date variable no value 0 1 A 1.0 0.469112 1 1 A 1.0 0.469112 2 3 A 1.5 -1.135632 3 4 A 1.5 1.212112 4 3 A 1.5 -1.135632 5 4 A 1.5 1.212112 6 2 A 2.2 -0.282863 7 2 A 2.2 -0.282863 8 3 A 3.5 -1.509059 9 3 A 3.5 -1.509059 10 4 B 1.0 0.469112 11 1 B 1.1 -1.044236 12 1 B 1.2 -0.173215 13 1 B 1.3 0.119209 14 4 B 2.0 -0.861849 15 4 B 3.0 -0.234000 16 1 C 1.5 -1.135632 17 2 C 1.5 1.212112 18 1 C 2.2 -0.282863 19 1 C 3.5 -1.509059 20 3 D 1.1 -1.044236 21 2 D 1.2 -0.173215 22 3 D 1.3 0.119209 23 3 D 2.0 -0.861849 24 4 D 3.0 0.332000 25 4 D 9.0 0.870000 And then I wanna:
- Sort based on columns
variableandno, - Trim each group until the last row meets a condition, say, I would like to trim the group (by single column, say
variable) until the last row where the value in columnvalueis greater than 0, in other words, to drop rest of rows after the last row that meets the condition.
I have tried groupby-apply
df.groupby('variable', as_index=False).apply( lambda x: x.iloc[: x.where(x['value'] > 0).last_valid_index() + 1, ])) but the result is incorrect:
date variable no value 0 0 1 A 1.0 0.469112 1 1 A 1.0 0.469112 2 3 A 1.5 -1.135632 3 4 A 1.5 1.212112 4 3 A 1.5 -1.135632 5 4 A 1.5 1.212112 1 10 4 B 1.0 0.469112 11 1 B 1.1 -1.044236 12 1 B 1.2 -0.173215 13 1 B 1.3 0.119209 14 4 B 2.0 -0.861849 15 4 B 3.0 -0.234000 2 16 1 C 1.5 -1.135632 17 2 C 1.5 1.212112 18 1 C 2.2 -0.282863 19 1 C 3.5 -1.509059 3 20 3 D 1.1 -1.044236 21 2 D 1.2 -0.173215 22 3 D 1.3 0.119209 23 3 D 2.0 -0.861849 24 4 D 3.0 0.332000 25 4 D 9.0 0.870000 as you may see the end of group B and C are not greater than 0.
Anyone who could provide a solution and explain why my solution does not work would be highly appreciated.
Plus. Since the size of dataframe is way larger than the example here, I assume we had better not reverse the dataframe.
https://stackoverflow.com/questions/66466970/how-to-trim-util-the-last-row-meet-condition-within-groups-in-pandas March 04, 2021 at 08:41AM
没有评论:
发表评论