Thursday, February 11, 2021

How to test if a df['column'] contains one of the substrings in a list, in pandas?

I have a pandas DataFrame with a column containing values such as ['cat, pet', 'dog, pet', 'dog', 'bird', 'bird, pet', 'tail', 'cat, tail']. I want to find every row where that column contains both 'cat' and 'pet', and extract the matching rows.

Note that this question is not about the specific case ['cat', 'pet']: the solution should accept any list of search terms as dynamic input, and the DataFrame has more than 10,000 rows.

My goal is to filter the rows based on the values contained in a specific column.

I know that if I want to find 'cat' OR 'pet', I just filter like:

search = ['cat', 'pet']
df[df['column'].str.contains('|'.join(search))]
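For reference, here is a minimal runnable sketch of the OR filter above, using the sample values from the question. The `re.escape` call is an addition that is not in the original snippet; it is there on the assumption that the search terms should be matched literally even if they contain regex metacharacters.

```python
import re

import pandas as pd

# Sample data taken from the question
df = pd.DataFrame(
    {"column": ["cat, pet", "dog, pet", "dog", "bird", "bird, pet", "tail", "cat, tail"]}
)
search = ["cat", "pet"]

# Build "cat|pet"; escaping keeps characters such as "." or "+" literal
pattern = "|".join(re.escape(word) for word in search)

# Keep the rows whose value contains at least one of the terms
either = df[df["column"].str.contains(pattern)]
print(either)
```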

But what if I want to match 'cat' AND 'pet', or other lists with different combinations of values?

I tried:

df[df['column'].str.contains('&'.join(search))]

But that does not work for me.
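A likely reason the '&' attempt matches nothing: '&' has no special meaning in a regular expression, so 'cat&pet' only matches the literal text "cat&pet". One possible alternative, offered as a sketch rather than the definitive answer, is to build one lookahead per term so that a single pattern requires all of them. The names `pattern` and `both` are illustrative.

```python
import re

import pandas as pd

# Sample data taken from the question
df = pd.DataFrame(
    {"column": ["cat, pet", "dog, pet", "dog", "bird", "bird, pet", "tail", "cat, tail"]}
)
search = ["cat", "pet"]

# One lookahead per term: "(?=.*cat)(?=.*pet)" matches only if every term
# occurs somewhere in the string, in any order
pattern = "".join(f"(?=.*{re.escape(word)})" for word in search)

# Keep the rows whose value contains all of the terms
both = df[df["column"].str.contains(pattern)]
print(both)
```

Because `.` does not match newlines by default, this sketch assumes single-line cell values.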

Also tried:

np.logical_and.reduce([df['column'].str.contains(word) for word in search])  
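The expression above only builds a boolean mask; it still has to be used to index the DataFrame. A minimal sketch of that step follows. The helper name `contains_all` is hypothetical, and `regex=False` and `na=False` are additions that assume the terms are plain substrings and that missing values should count as non-matches.

```python
import numpy as np
import pandas as pd


def contains_all(series, words):
    """Return a boolean array that is True where `series` contains every word.

    Assumes `words` is non-empty; with an empty list the reduction yields a
    scalar rather than a per-row mask.
    """
    # One boolean Series per term, using plain substring matching
    masks = [series.str.contains(word, regex=False, na=False) for word in words]
    # Element-wise AND across all the masks
    return np.logical_and.reduce(masks)


# Sample data taken from the question
df = pd.DataFrame(
    {"column": ["cat, pet", "dog, pet", "dog", "bird", "bird, pet", "tail", "cat, tail"]}
)
search = ["cat", "pet"]

# Index the DataFrame with the mask to extract the matching rows
both = df[contains_all(df["column"], search)]
print(both)
```

Since the list of terms is just a function argument, the same call handles any other combination, for example `contains_all(df["column"], ["cat", "tail"])`.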
https://stackoverflow.com/questions/66165025/how-to-test-if-a-dfcolumn-contains-one-of-the-substrings-in-a-list-in-panda February 12, 2021 at 08:43AM
