Thursday, January 28, 2021

Comparing pandas DataFrames where column values are lists

I have some chemical data that I'm trying to process using Pandas. I have two dataframes:

C_atoms_all.head()

       id_all  index_all label_all species_all                    position
    0     217          1         C           C    [6.609, 6.6024, 19.3301]
    1     218          2         C           C  [4.8792, 11.9845, 14.6312]
    2     219          3         C           C  [4.8373, 10.7563, 13.9466]
    3     220          4         C           C  [4.7366, 10.9327, 12.5408]
    4    6573          5         C           C  [1.9482, -3.8747, 19.6319]

C_atoms_a.head()

       id_a  index_a label_a species_a                    position
    0    55        1       C         C    [6.609, 6.6024, 19.3302]
    1    56        2       C         C  [4.8792, 11.9844, 14.6313]
    2    57        3       C         C  [4.8372, 10.7565, 13.9467]
    3    58        4       C         C  [4.7367, 10.9326, 12.5409]
    4    59        5       C         C  [5.1528, 15.5976, 14.1249]

What I want is a mapping from each id_all value to the id_a value whose position matches. For example, C_atoms_all.iloc[0]['id_all'] returns 217 and C_atoms_a.iloc[0]['id_a'] returns 55, and the position values of those two rows match within a small fudge factor (a tolerance), which the query should also take into account.

The problem I'm having is that I can't merge or filter on the position columns because lists aren't hashable in Python.
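To make the hashability issue concrete, here is a minimal sketch using the first two rows of each dataframe above. The `pos_key` column name is an assumption introduced only for this illustration. Converting each list to a tuple gives pandas a hashable merge key, so the merge runs, but it only pairs positions that are exactly equal, so it finds nothing here.

```python
import pandas as pd

# First two rows of each dataframe from the question.
all_df = pd.DataFrame({
    "id_all": [217, 218],
    "position": [[6.609, 6.6024, 19.3301], [4.8792, 11.9845, 14.6312]],
})
a_df = pd.DataFrame({
    "id_a": [55, 56],
    "position": [[6.609, 6.6024, 19.3302], [4.8792, 11.9844, 14.6313]],
})

# Tuples are hashable, so they can act as a merge key where lists cannot.
# "pos_key" is a hypothetical helper column, not part of the original data.
all_df["pos_key"] = all_df["position"].apply(tuple)
a_df["pos_key"] = a_df["position"].apply(tuple)

# The merge now runs, but it requires exact equality of all three coordinates.
exact = all_df.merge(a_df[["id_a", "pos_key"]], on="pos_key")
print(len(exact))  # no rows: the coordinates differ in the fourth decimal place
```

This suggests that a tuple key alone is not enough when the coordinates differ slightly, which is why the tolerance has to be part of the comparison.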

I'd ideally like to return a dataframe that looks like so:

    id_all  id_a                  position
       217    55  [6.609, 6.6024, 19.3301]
       ...   ...                       ...

for every row where the position values match.
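One possible way to get such a result is to compare every position in one dataframe against every position in the other with NumPy broadcasting. This is a sketch under stated assumptions, not a definitive answer: the tolerance `TOL = 1e-3` is a guess at the "fudge factor", it is applied per coordinate, and the result keeps the position from C_atoms_all.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question (only the columns needed here).
C_atoms_all = pd.DataFrame({
    "id_all": [217, 218, 219, 220, 6573],
    "position": [
        [6.609, 6.6024, 19.3301],
        [4.8792, 11.9845, 14.6312],
        [4.8373, 10.7563, 13.9466],
        [4.7366, 10.9327, 12.5408],
        [1.9482, -3.8747, 19.6319],
    ],
})
C_atoms_a = pd.DataFrame({
    "id_a": [55, 56, 57, 58, 59],
    "position": [
        [6.609, 6.6024, 19.3302],
        [4.8792, 11.9844, 14.6313],
        [4.8372, 10.7565, 13.9467],
        [4.7367, 10.9326, 12.5409],
        [5.1528, 15.5976, 14.1249],
    ],
})

TOL = 1e-3  # assumed per-coordinate tolerance; adjust to the real fudge factor

# Stack the list-valued columns into plain float arrays of shape (n, 3).
pos_all = np.array(C_atoms_all["position"].tolist())
pos_a = np.array(C_atoms_a["position"].tolist())

# Pairwise absolute differences, shape (n_all, n_a, 3).
diff = np.abs(pos_all[:, None, :] - pos_a[None, :, :])

# A pair matches when all three coordinates agree within TOL.
close = (diff <= TOL).all(axis=2)
i_all, i_a = np.nonzero(close)

matches = pd.DataFrame({
    "id_all": C_atoms_all["id_all"].to_numpy()[i_all],
    "id_a": C_atoms_a["id_a"].to_numpy()[i_a],
    "position": C_atoms_all["position"].iloc[i_all].tolist(),
})
print(matches)
```

The pairwise array grows with the product of the two row counts, so for large structures a spatial index such as `scipy.spatial.cKDTree` may scale better. If an atom can have more than one neighbour within the tolerance, this sketch returns every such pair rather than picking the closest one.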

https://stackoverflow.com/questions/65947119/comparing-pandas-dataframes-where-column-values-are-lists January 29, 2021 at 08:19AM
