Tuesday, January 26, 2021

Find pattern in pandas dataframe, reorder it row-wise, and reset index

This is a multipart problem. I have found solutions for each separate part, but when I try to combine these solutions, I don't get the outcome I want.

Let's say this is my dataframe:

    import pandas as pd

    df = pd.DataFrame(list(zip([1, 3, 6, 7, 7, 8, 4], [6, 7, 7, 9, 5, 3, 1])), columns=['Values', 'Vals'])
    df

       Values  Vals
    0       1     6
    1       3     7
    2       6     7
    3       7     9
    4       7     5
    5       8     3
    6       4     1

Let's say I want to find the pattern [6, 7, 7] in the 'Values' column. I can use a modified version of the second solution given here: Pandas: How to find a particular pattern in a dataframe column?

    pattern = [6, 7, 7]

    pat_i = [df[i-len(pattern):i]  # Get the matching rows
             for i in range(len(pattern), len(df) + 1)  # for each run of 3 consecutive elements (+1 so the last window is checked)
             if all(df['Values'][i-len(pattern):i] == pattern)]  # if the pattern matched
    pat_i

    [   Values  Vals
     2       6     7
     3       7     9
     4       7     5]

The only way I've found to narrow this down to just index values is the following:

    pat_i = [df.index[i-len(pattern):i]  # Get the index
             for i in range(len(pattern), len(df) + 1)  # for each run of 3 consecutive elements (+1 so the last window is checked)
             if all(df['Values'][i-len(pattern):i] == pattern)]  # if the pattern matched
    pat_i

    [RangeIndex(start=2, stop=5, step=1)]
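As a possible alternative to the list comprehension, the start position of each match can be found with a vectorized comparison. This is only a sketch: it assumes NumPy 1.20 or later (for `sliding_window_view`), and the names `windows` and `starts` are illustrative, not part of the original code.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

df = pd.DataFrame({'Values': [1, 3, 6, 7, 7, 8, 4],
                   'Vals': [6, 7, 7, 9, 5, 3, 1]})
pattern = [6, 7, 7]

# Each row of `windows` is one run of len(pattern) consecutive values.
windows = sliding_window_view(df['Values'].to_numpy(), len(pattern))

# Positions (not index labels) where every element of the window matches.
starts = np.flatnonzero((windows == pattern).all(axis=1))
print(starts.tolist())
```

`starts` holds positional offsets, so with a non-default index they would need to be mapped back through `df.index` before being used as labels.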

Once I've found the pattern, I want to reorder it to [7, 7, 6] within the original dataframe, moving the entire rows along with the values. In other words, going by the index, I want output that looks like this:

    df.reindex([0, 1, 3, 4, 2, 5, 6])

       Values  Vals
    0       1     6
    1       3     7
    3       7     9
    4       7     5
    2       6     7
    5       8     3
    6       4     1

Then, finally, I want to reset the index so that the rows keep their new order but are numbered from 0 again:

       Values  Vals
    0       1     6
    1       3     7
    2       7     9
    3       7     5
    4       6     7
    5       8     3
    6       4     1
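For this last step on its own, one option is `reset_index(drop=True)`, which renumbers the rows from 0 and discards the old labels instead of keeping them as a new column. A minimal sketch, using the hard-coded order from the example above (the names `reordered` and `result` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Values': [1, 3, 6, 7, 7, 8, 4],
                   'Vals': [6, 7, 7, 9, 5, 3, 1]})

# Reorder the rows by label, then renumber them without keeping the old index.
reordered = df.reindex([0, 1, 3, 4, 2, 5, 6])
result = reordered.reset_index(drop=True)
print(result)
```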

In order to use pat_i as a basis for re-ordering, I've tried to modify the second solution given here: Python Pandas: How to move one row to the first row of a Dataframe?

    target_row = 2
    # Move target row to first element of list.
    idx = [target_row] + [i for i in range(len(df)) if i != target_row]

However, I can't figure out how to exploit the pat_i RangeIndex object to use it with this code. The solution, when I find it, will be applied to hundreds of dataframes, each one of which will contain the [6, 7, 7] pattern that needs to be re-ordered in one place, but not the same place in each dataframe.
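One way this might be approached is to turn the matched `RangeIndex` into a plain list of labels, rotate that list so its first label moves to the end, and splice it back into the full index order. The sketch below is not a definitive solution: `move_first_to_last` is a hypothetical helper, and it assumes the index labels are unique and the matched rows are contiguous.

```python
import pandas as pd

def move_first_to_last(df, match_index):
    """Hypothetical helper: move the first matched row to the end of the match.

    `match_index` holds the labels of the matched rows, e.g. pat_i[0].
    Assumes unique index labels and a contiguous match.
    """
    matched = list(match_index)           # e.g. [2, 3, 4]
    rotated = matched[1:] + matched[:1]   # e.g. [3, 4, 2]
    order = list(df.index)
    pos = order.index(matched[0])         # position where the match starts
    order[pos:pos + len(matched)] = rotated
    # Reorder by label, then renumber the rows from 0.
    return df.reindex(order).reset_index(drop=True)

df = pd.DataFrame({'Values': [1, 3, 6, 7, 7, 8, 4],
                   'Vals': [6, 7, 7, 9, 5, 3, 1]})
pattern = [6, 7, 7]
n = len(pattern)

# Same search as in the question; the +1 makes sure the last window is checked.
pat_i = [df.index[i - n:i]
         for i in range(n, len(df) + 1)
         if (df['Values'].iloc[i - n:i] == pattern).all()]

result = move_first_to_last(df, pat_i[0])
print(result)
```

Because the helper only needs the dataframe and the matched labels, the same two steps (search, then reorder) could be run in a loop over many dataframes, wherever the pattern happens to sit in each one.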

Any help appreciated...and I'm sure there must be an elegant, pythonic way of doing this, as it seems like it should be a common enough challenge. Thank you.

https://stackoverflow.com/questions/65911081/find-pattern-in-pandas-dataframe-reorder-it-row-wise-and-reset-index January 27, 2021 at 08:02AM
