Python & Pandas: How to query if a list-type column contains something?
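Disclaimer: this page is a translation of a popular Stack Overflow question and is licensed under CC BY-SA 4.0. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow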
Original URL: http://stackoverflow.com/questions/41518920/
Question by cqcn1991
I have a dataframe which contains info about movies. It has a column called genre, which contains a list of the genres each movie belongs to. For example:
df['genre']
## returns
0 ['comedy', 'sci-fi']
1 ['action', 'romance', 'comedy']
2 ['documentary']
3 ['crime','horror']
...
How can I query the dataframe so that it returns the movies belonging to a certain genre?
For example, something like df['genre'].contains('comedy') that returns 0 or 1.
I know that for a list I can do something like:
'comedy' in ['comedy', 'sci-fi']
However, I didn't find anything similar in pandas. The only thing I know of is df['genre'].str.contains(), but it doesn't work for the list type.
Answer by jezrael
You can use apply to create a mask and then use boolean indexing:
mask = df.genre.apply(lambda x: 'comedy' in x)
df1 = df[mask]
print (df1)
genre
0 [comedy, sci-fi]
1 [action, romance, comedy]
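A self-contained version of the apply-and-mask approach, for anyone who wants to run it directly. The sample `df` is an assumption that mirrors the question's data; any column holding Python lists should behave the same way. Casting the mask with `astype(int)` gives the 0/1 form the question asked for.

```python
import pandas as pd

# Hypothetical sample data mirroring the question's `genre` column.
df = pd.DataFrame({
    'genre': [
        ['comedy', 'sci-fi'],
        ['action', 'romance', 'comedy'],
        ['documentary'],
        ['crime', 'horror'],
    ]
})

# apply runs the plain-Python membership test on each list -> boolean Series.
mask = df.genre.apply(lambda x: 'comedy' in x)

# Boolean indexing keeps only the rows where the mask is True.
df1 = df[mask]

print(mask.tolist())              # [True, True, False, False]
print(mask.astype(int).tolist())  # [1, 1, 0, 0]
print(df1)
```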
Answer by piRSquared
Using sets:
df.genre.map(set(['comedy']).issubset)
0 True
1 True
2 False
3 False
dtype: bool
df.genre[df.genre.map(set(['comedy']).issubset)]
0 [comedy, sci-fi]
1 [action, romance, comedy]
dtype: object
Presented in a way I like better:
comedy = set(['comedy'])
iscomedy = comedy.issubset
df[df.genre.map(iscomedy)]
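One practical upside of the set-based approach is that it extends to several genres at once. A minimal sketch, using hypothetical sample data and a hypothetical `wanted` set: `issubset` keeps movies tagged with *all* of the wanted genres, while a negated `isdisjoint` keeps movies tagged with *any* of them. Both set methods accept any iterable, so the lists in the column can be passed directly.

```python
import pandas as pd

# Hypothetical sample data mirroring the question's `genre` column.
df = pd.DataFrame({
    'genre': [
        ['comedy', 'sci-fi'],
        ['action', 'romance', 'comedy'],
        ['documentary'],
        ['crime', 'horror'],
    ]
})

wanted = {'comedy', 'romance'}  # hypothetical set of genres to look for

# ALL: every wanted genre must appear in the row's list.
has_all = df.genre.map(wanted.issubset)

# ANY: at least one wanted genre appears (the two collections are not disjoint).
has_any = ~df.genre.map(wanted.isdisjoint)

print(df[has_all])  # only row 1
print(df[has_any])  # rows 0 and 1
```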
More efficient:
comedy = set(['comedy'])
iscomedy = comedy.issubset
df[[iscomedy(l) for l in df.genre.values.tolist()]]
Using str, in two passes (first join each list into one string, then search that string). Slow, and not perfectly accurate!
df[df.genre.str.join(' ').str.contains('comedy')]
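To see why the join-then-search approach is "not perfectly accurate": `str.contains` matches substrings of the joined text, so a genre that merely *contains* the word also matches. A small sketch with hypothetical genres such as `'romantic comedy'`, compared against an exact membership test:

```python
import pandas as pd

# Hypothetical data: row 1 has no genre equal to 'comedy', only one containing it.
df = pd.DataFrame({
    'genre': [
        ['comedy', 'sci-fi'],
        ['romantic comedy'],
        ['drama'],
    ]
})

# Substring search on the joined text: also matches 'romantic comedy'.
via_str = df.genre.str.join(' ').str.contains('comedy')

# Exact element membership: only matches the genre 'comedy' itself.
via_in = df.genre.map(lambda genres: 'comedy' in genres)

print(via_str.tolist())  # [True, True, False]
print(via_in.tolist())   # [True, False, False]
```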
Answer by HYRY
According to the pandas source code, you can use .str.contains(..., regex=False).
Answer by Adrien Renaud
A complete example:
import pandas as pd
data = pd.DataFrame([[['foo', 'bar']],
[['bar', 'baz']]], columns=['list_column'])
print(data)
list_column
0 [foo, bar]
1 [bar, baz]
filtered_data = data.loc[
lambda df: df.list_column.apply(
lambda l: 'foo' in l
)
]
print(filtered_data)
list_column
0 [foo, bar]
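Passing a callable to `.loc`, as above, is mostly useful in method chains: the callable receives the DataFrame produced by the previous step, so the list filter can sit at the end of a chain without a temporary variable. A sketch under that assumption; the `n_items` column is a hypothetical extra step added only to illustrate chaining.

```python
import pandas as pd

data = pd.DataFrame({'list_column': [['foo', 'bar'], ['bar', 'baz']]})

result = (
    data
    # Hypothetical intermediate step: .str.len() also works on list elements.
    .assign(n_items=lambda df: df.list_column.str.len())
    # The lambda receives the DataFrame returned by .assign, not `data`.
    .loc[lambda df: df.list_column.apply(lambda l: 'foo' in l)]
)
print(result)
```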
Answer by bloodrootfc
A one-liner using boolean indexing and a list comprehension:
searchTerm = 'something'
df[[searchTerm in x for x in df['arrayColumn']]]
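The one-liner assumes every cell is a list. If the column has missing values, `searchTerm in x` raises `TypeError` for a float NaN. A hedged variant, assuming non-list cells (NaN, None) should simply count as "no match"; the sample data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a missing value in the list column.
df = pd.DataFrame({
    'arrayColumn': [['something', 'else'], np.nan, ['other']]
})

searchTerm = 'something'

# Only test membership on actual lists; anything else is treated as False.
mask = [isinstance(x, list) and searchTerm in x for x in df['arrayColumn']]

# A plain Python list of booleans works as a boolean indexer.
print(df[mask])
```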