2021年5月4日星期二

Cumulative sum that resets on new value, without a loop

I need help making a cumulative sum that resets when a condition changes. I've used a loop but the dataset is too heavy. Also tried using both these links but I can't get my desired output. I can't remove duplicates from the dataframe because I need information that's not shown in the example table.

Cumulative sum that resets based on the year

Pandas cumulative count

I have the columns License Plate and Transp Doc and need a new one called Deliveries, the condition would be: If it is the same License Plate AND the same Transp Doc it is considered 1 delivery, however if it's not the same Transp Doc add 1, and if it is not the same License Plate reset the cumulative sum. The excel formula (set on cell C3) on which I'm basing this approach would be =IF(A2=A3;IF(B2=B3;C2+0;C2+1);1)

A B C
1 License Plate Transp Doc Deliveries
2 AAA1111 65184 1
3 AAA1111 65186 2
4 AAA1111 65188 3
5 BBB2222 65195 1
6 BBB2222 65195 1
7 BBB2222 65201 2
8 CCC3333 65207 1
9 CCC3333 65207 1
10 DDD4444 65212 1
import pandas as pd      DF = pd.DataFrame({'License Plate': ["AAA1111","AAA1111","AAA1111","BBB2222","BBB2222","BBB2222","CCC3333","CCC3333","DDD4444"],            'Transp Doc': [65184,65186,65188,65195,65195,65201,65207,65207,65212],            'Deliveries': [1,2,3,1,1,2,1,1,1],            })  

The furthest I got was using this line:

DF['Deliveries'] = DF['License Plate'].eq(DF['License Plate'].shift()).cumsum()

Which just sums the if upper row is equal, without considering if Transp Doc is the same, and doesn't reset when plate changes.

https://stackoverflow.com/questions/67388590/cumulative-sum-that-resets-on-new-value-without-a-loop May 05, 2021 at 12:27AM

没有评论:

发表评论