I need help making a cumulative sum that resets when a condition changes. I've used a loop but the dataset is too heavy. Also tried using both these links but I can't get my desired output. I can't remove duplicates from the dataframe because I need information that's not shown in the example table.
Cumulative sum that resets based on the year
I have the columns License Plate
and Transp Doc
and need a new one called Deliveries
, the condition would be: If it is the same License Plate AND the same Transp Doc it is considered 1 delivery, however if it's not the same Transp Doc add 1, and if it is not the same License Plate reset the cumulative sum. The excel formula (set on cell C3) on which I'm basing this approach would be =IF(A2=A3;IF(B2=B3;C2+0;C2+1);1)
A | B | C | |
---|---|---|---|
1 | License Plate | Transp Doc | Deliveries |
2 | AAA1111 | 65184 | 1 |
3 | AAA1111 | 65186 | 2 |
4 | AAA1111 | 65188 | 3 |
5 | BBB2222 | 65195 | 1 |
6 | BBB2222 | 65195 | 1 |
7 | BBB2222 | 65201 | 2 |
8 | CCC3333 | 65207 | 1 |
9 | CCC3333 | 65207 | 1 |
10 | DDD4444 | 65212 | 1 |
import pandas as pd DF = pd.DataFrame({'License Plate': ["AAA1111","AAA1111","AAA1111","BBB2222","BBB2222","BBB2222","CCC3333","CCC3333","DDD4444"], 'Transp Doc': [65184,65186,65188,65195,65195,65201,65207,65207,65212], 'Deliveries': [1,2,3,1,1,2,1,1,1], })
The furthest I got was using this line:
DF['Deliveries'] = DF['License Plate'].eq(DF['License Plate'].shift()).cumsum()
Which just sums the if upper row is equal, without considering if Transp Doc is the same, and doesn't reset when plate changes.
https://stackoverflow.com/questions/67388590/cumulative-sum-that-resets-on-new-value-without-a-loop May 05, 2021 at 12:27AM
没有评论:
发表评论