2021年1月20日星期三

python Pandas - add/insert rows before or after specific sequence of column values and count number of imputed rows

I have a large dataframe consisting of equipment information, ordered in sequence of the equipment name and time.

For this analysis, the proper sequence of events is from the 'event' column's "stop" to "start" values and all the values in between. I have assigned integer values corresponding to this repeating pattern and create a new column 'sequence'. A sample portion of my dataset looks like this:

data = [['abc01', 3000.0, 'transac_complete', 'system', '13:10:37', 1],          ['abc01', 100.0, 'start', 'system', '14:00:37', 2],         ['abc01', 500.0, 'stop', 'system', '14:02:17', 3],         ['abc01', 0.3, 'a_type_1', 'a', '14:10:37', 3],         ['abc01', 0.2, 'a_type_2', 'a', '14:10:37', 3],         ['abc01', 200.0, 'start', 'system', '14:10:37', 3],         ['abc01', 1120.0, 'transac_complete', 'system', '14:13:57', 3],         ['abc01', 505.0, 'transac_complete', 'system', '14:32:37', 3],         ['abc01', 600.0, 'stop', 'system', '14:41:02', 4],         ['abc01', 0.233, 'a_type_3', 'a', '14:51:02', 4],         ['abc01', 0.244, 'b_type_1', 'b', '14:51:03', 4],         ['abc01', 5.0, 'b_type_2', 'b', '14:51:03', 4],         ['abc01', 0.33299999999999996, 'c_type_1', 'c', '14:51:08', 4],         ['abc01', 1500.0, 'start', 'system', '14:51:08', 4],         ['abc01', 24.0, 'stop', 'system', '15:16:08', 5],         ['abc01', 500.0, 'start', 'system', '15:16:32', 5],         ['abc01', 1000.0, 'stop', 'system', '15:24:52', 6],         ['abc02', 9009.0, 'transac_complete', 'system', '15:41:32', 6],         ['abc02', 66.0, 'stop', 'system', '18:11:41', 6],         ['abc02', 100.0, 'stop', 'system', '18:12:47', 6],         ['abc02', 599.0, 'start', 'system', '18:14:27', 6]]    columns = ['name', 'duration', 'event', 'err_grp', 'timestamp', 'sequence']    df = pd.DataFrame.from_records(data, columns=columns)  

To this, I realised that there are missing 'start' values and my first 2 values that do not fulfil the criteria as a sequence of events are single rowed events that isn't what I want.

  1. how do I add/insert a new row that satisfy the condition of not having a 'start' value between 2 'stop' values, whether they are consecutive or not. This new 'start' row will take on the timestamp value of the 'stop' row after, and the duration value based on the difference of timestamp of the 'stop' values before and after.

  2. how do i count either the number of missing 'start' values, or the number of 'start' values imputed, since they are the same thing.

Any help is appreciated. Thanks!

https://stackoverflow.com/questions/65820240/python-pandas-add-insert-rows-before-or-after-specific-sequence-of-column-valu January 21, 2021 at 10:08AM

没有评论:

发表评论