2020年12月21日星期一

Cleaning sales data for market stall in Pandas; splitting rows

My girlfriend is an illustration/artist with a stall at a market, and I'm trying to help her with her inventory management. At the moment, she puts her sales through her credit card machine (Sum Up), and I can export the transactions in a csv in approx. the following format:

Number | Description | Amount    1 | 1 x Product A | $10    2 | 1 x Product A | $10    3 | 1 x Product B | $4    4 | 1 x Product A, 2 x Product B | $18  

.......

So far, I have figured out I can do this:

n_by_product = Transactions_df.groupby("Description")['Date'].count()  n_by_product.sort_values(ascending=False)  

to get this type of output:

Description        |             #  1 x Product A      |             2                                                 1 x Product B      |             1                                               1 x Product A,  2 x Product B |  1  

While this works well for single product transactions, the issue is separating out the products when there is more than one per transaction.

Currently, my thinking is to do something like this, to identify the separate items:

Transactions_df['Description'].str.split(',')  

Then loop through the rows of the data frame, testing if the description of that row contains two items, then if it does, copy the row in place with one of the items

Transactions_df['Description'].str.split(',')[0][0])  

and append a row to the bottom of the dataframe which has the other item:

Transactions_df['Description'].str.split(',')[0][1])    

However... I'm not exactly sure how to go about doing this, and I also feel like there should probably be a better way.

Very grateful if anyone has any suggestions on what approach I should take! Cheers!

https://stackoverflow.com/questions/65401822/cleaning-sales-data-for-market-stall-in-pandas-splitting-rows December 22, 2020 at 08:35AM

没有评论:

发表评论