2021年5月4日星期二

Reshape wide to long for many columns with a common prefix

My frame has many pairs of identically named columns, with the only difference being the prefix. For example, player1.player.id and player2.player.id.

Here's an example (with fewer and shorter columns):

pd.DataFrame({'p1.a': {0: 4, 1: 0}, 'p1.b': {0: 1, 1: 4},                'p1.c': {0: 2, 1: 8}, 'p1.d': {0: 3, 1: 12},                'p1.e': {0: 4, 1: 16}, 'p1.f': {0: 5, 1: 20},                'p1.g': {0: 6, 1: 24},                'p2.a': {0: 0, 1: 0}, 'p2.b': {0: 3, 1: 12},                'p2.c': {0: 6, 1: 24}, 'p2.d': {0: 9, 1: 36},                'p2.e': {0: 12, 1: 48}, 'p2.f': {0: 15, 1: 60},                'p2.g': {0: 18, 1: 72}})       p1.a  p1.b  p1.c  p1.d  p1.e  p1.f  p1.g  p2.a  p2.b  p2.c  p2.d  p2.e  p2.f  p2.g  0     4     1     2     3     4     5     6     0     3     6     9    12    15    18  1     0     4     8    12    16    20    24     0    12    24    36    48    60    72  

I'd like to turn it into a long format, with a new side column denoting either p1 or p2. I have several crappy ways of doing it, for example:

df1 = df.filter(regex='^p1.*').assign(side='p1')  df2 = df.filter(regex='^p2.*').assign(side='p2')  df1.columns = [c.replace('p1.', '') for c in df1.columns]  df2.columns = [c.replace('p2.', '') for c in df2.columns]  pd.concat([df1, df2]).head()       a   b   c   d   e   f   g side  0  4   1   2   3   4   5   6   p1  1  0   4   8  12  16  20  24   p1  0  0   3   6   9  12  15  18   p2  1  0  12  24  36  48  60  72   p2  

This feels non-idiomatic, and I couldn't get pd.wide_to_long() to work here.

I'd appreciate an answer which also handles arbitrary substrings, not just prefix, i.e., I'm also interested in something like this:

   foo.p1.a  foo.p1.b  foo.p1.c  foo.p1.d  foo.p1.e  foo.p1.f  foo.p1.g  foo.p2.a  foo.p2.b  foo.p2.c  foo.p2.d  foo.p2.e  foo.p2.f  foo.p2.g  0         4         1         2         3         4         5         6         0         3         6         9        12        15        18  1         0         4         8        12        16        20        24         0        12        24        36        48        60        72  

Turning into:

   foo.a  foo.b  foo.c  foo.d  foo.e  foo.f  foo.g side  0      4      1      2      3      4      5      6   p1  1      0      4      8     12     16     20     24   p1  0      0      3      6      9     12     15     18   p2  1      0     12     24     36     48     60     72   p2  

But if there's an idiomatic way to handle prefixes whereas substrings require complexity, I'd appreciate learning about both.

What's the idiomatic (pythonic? pandonic?) way of doing this?

https://stackoverflow.com/questions/67393474/reshape-wide-to-long-for-many-columns-with-a-common-prefix May 05, 2021 at 07:41AM

没有评论:

发表评论