Tuesday, January 5, 2021

python - combined 3 data frames, but need to realign data by values in 1 column

I have several data sources that I'm trying to work with. I asked a related question a couple of days ago.

So I have 3 dataframes, each with a 'user_id' column that is common across all 3, but the dataframes are not all the same size.

I didn't realize it at first and used pd.concat to combine them, but the rows aren't lined up by user_id, and I'm not sure how to accomplish that.

Here is some sample data from each, and sample data from the resulting concat (perhaps that is helpful?):

df1:

   user_id     duration
0     1000   116.830000
1     1001   328.092000
2     1002   259.043333
3     1003  1041.000000
4     1004   327.368750
5     1005   470.220000
6     1006    32.055000
7     1007   496.830000
8     1008   491.103333
9     1009   698.710000

df2:

   user_id       mb_used
0     1000   1902.000000
1     1001  16088.200000
2     1002  13432.000000
3     1003  27045.000000
4     1004  19544.500000
5     1005  17141.000000
6     1006  17094.000000
7     1007  28770.800000
8     1008  18491.333333
9     1009  23405.125000

df3:

   user_id         id
0     1000  11.000000
1     1001  41.400000
2     1002  29.333333
3     1003  50.000000
4     1004  22.125000
5     1005  11.000000
6     1006  77.000000
7     1007  51.000000
8     1008  28.000000
9     1011  53.000000

df4 = pd.concat([df1, df2, df3], axis=1)

df4 result:

   user_id     duration  user_id       mb_used  user_id         id
0   1000.0   116.830000     1000   1902.000000   1000.0  11.000000
1   1001.0   328.092000     1001  16088.200000   1001.0  41.400000
2   1002.0   259.043333     1002  13432.000000   1002.0  29.333333
3   1003.0  1041.000000     1003  27045.000000   1003.0  50.000000
4   1004.0   327.368750     1004  19544.500000   1004.0  22.125000
5   1005.0   470.220000     1005  17141.000000   1005.0  11.000000
6   1006.0    32.055000     1006  17094.000000   1006.0  77.000000
7   1007.0   496.830000     1007  28770.800000   1007.0  51.000000
8   1008.0   491.103333     1008  18491.333333   1008.0  28.000000
**9   1009.0   698.710000     1009  23405.125000   1011.0  53.000000**
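The mismatch in row 9 is consistent with how pd.concat behaves with axis=1: it lines rows up by index label (0, 1, 2, ...), not by the values in user_id. Below is a minimal sketch of one way around that, which makes user_id the index before concatenating. The three small frames are hypothetical stand-ins built from the last rows of the samples, not the real data, and the sketch assumes each user_id appears only once per frame.

```python
import pandas as pd

# Hypothetical stand-in frames, based on the last sample rows above
df1 = pd.DataFrame({"user_id": [1008, 1009], "duration": [491.103333, 698.71]})
df2 = pd.DataFrame({"user_id": [1008, 1009], "mb_used": [18491.333333, 23405.125]})
df3 = pd.DataFrame({"user_id": [1008, 1011], "id": [28.0, 53.0]})

# concat(axis=1) aligns on the index, so move user_id into the index first
frames = [df.set_index("user_id") for df in (df1, df2, df3)]

# Rows are now matched by user_id; users missing from a frame get NaN there
aligned = (
    pd.concat(frames, axis=1)
    .rename_axis("user_id")  # make sure the index keeps its name
    .sort_index()
    .reset_index()
)
print(aligned)
```

With this layout there is a single user_id column, and user 1011 gets its own row instead of being paired with 1009.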

Is there something I've done wrong, or something I could add to line them up by that shared user_id, or should I be using a different method? I'll be honest: I started with pd.merge but quickly realized I was in over my head trying to structure that. If that is the only way (or the best way), I'll take another crack at it.
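For the pd.merge route, one possible shape is a chain of merges on user_id. This is a minimal sketch and not the only way to structure it. The frames are hypothetical stand-ins for the real data, and how="outer" is an assumption that keeps users who are missing from some frames (how="inner" would drop them).

```python
from functools import reduce

import pandas as pd

# Hypothetical stand-in frames, based on the last sample rows above
df1 = pd.DataFrame({"user_id": [1008, 1009], "duration": [491.103333, 698.71]})
df2 = pd.DataFrame({"user_id": [1008, 1009], "mb_used": [18491.333333, 23405.125]})
df3 = pd.DataFrame({"user_id": [1008, 1011], "id": [28.0, 53.0]})

# Merge the frames pairwise on user_id: (df1 + df2) + df3
merged = reduce(
    lambda left, right: pd.merge(left, right, on="user_id", how="outer"),
    [df1, df2, df3],
)

# Sort for readability; the row order after an outer merge is not relied on here
merged = merged.sort_values("user_id").reset_index(drop=True)
print(merged)
```

Because the merge key is a column, the result has one user_id column followed by duration, mb_used, and id, with NaN wherever a user has no row in a given frame.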

Thanks in advance for your time, and I apologize for what is likely a lack of proper terminology. I am quite new at Python (and programming in general).

https://stackoverflow.com/questions/65588762/python-combined-3-data-frames-but-need-to-realign-data-by-values-in-1-column January 06, 2021 at 09:01AM
