Filtering in pandas - how to apply a custom method (lambda)?
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/32968747/
Asked by アレックス
I have a DataFrame where one of the columns contains a string of comma-delimited words.
>>> df['column1']
# ....
996 str1, str2, str3
997 str4, str5, str7
998 str8, str9, str10
# ...........
I need to treat the content of that column as an array of strings so I can do this:
[
# .....
& (df['column1'].isin('str2')) # should return the row #996
# ....
]
I tried this but it hasn't panned out, of course:
[
# .....
& (df['column1'].split(',').isin('str2'))
# ....
]
How can I do that? Or rather, how can I use a method (lambda) to modify the content of the column before filtering?
UPDATE1:
This is a part of my code:
for x in pd.read_csv.....
    df_item = x
    if filter1:
        df_item = df_item[(df_item['column1'] == filter1)]
    if filter2:
        df_item = df_item[(df_item['column2'].isin(subjects))]
    # .....
How can I apply df['column2'].apply(lambda x: 'str2' in x.split(',')) to
if filter2:
    df_item = df_item[(df_item['column2'].isin(subjects))]
Answered by Anand S Kumar
isin checks whether each value from the Series is in the iterable (in your case 'str2'), not whether str2 is contained in your Series' value.
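To make the distinction concrete, here is a minimal sketch with made-up sample data (the Series `s` and its values are assumptions for illustration, not the asker's real data). It shows that isin matches whole cell values against a list of candidates, so a cell holding a comma-separated string does not match one of its words.

```python
import pandas as pd

# Hypothetical sample data: two comma-separated cells and one single-word cell.
s = pd.Series(["str1, str2, str3", "str4, str5, str7", "str2"])

# isin expects a list-like of candidate values and compares each
# whole cell against them; it does not look inside the cell's string.
whole_match = s.isin(["str2"])

print(whole_match.tolist())  # [False, False, True]
```

Only the cell that is exactly 'str2' matches, which is why the row containing 'str1, str2, str3' is not selected.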
If your Series contains strings, then one way to get what you want is to use .str.contains() to check whether each string contains str2. Example:
df['column1'].str.contains('str2')
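As a rough illustration of how .str.contains() behaves, the sketch below uses made-up sample data (the Series `s` is an assumption). It passes regex=False so the pattern is treated as a literal substring, and it also shows the substring caveat: searching for 'str1' matches a cell that only contains 'str10'.

```python
import pandas as pd

# Hypothetical sample data in the same shape as the question's column.
s = pd.Series(["str1, str2, str3", "str8, str9, str10"])

# Literal substring test on each cell.
has_str2 = s.str.contains("str2", regex=False)

# 'str1' is a substring of 'str10', so the second row matches as well.
has_str1 = s.str.contains("str1", regex=False)

print(has_str2.tolist())  # [True, False]
print(has_str1.tolist())  # [True, True]
```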
If you must split the contents on ',' (that is, if str2 can be a substring of any of the other strings), you can use Series.apply. Example:
df['column1'].apply(lambda x: 'str2' in x.split(','))
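One detail worth checking against your own data: the sample rows in the question have a space after each comma, so a plain split(',') yields ' str2' (with a leading space) rather than 'str2'. The sketch below, using made-up sample data, compares the plain split with a variant that strips whitespace from each word; the stripping step is an addition for illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data copied from the shape shown in the question.
s = pd.Series(["str1, str2, str3", "str4, str5, str7", "str8, str9, str10"])

# Plain split: the pieces keep their leading spaces, e.g. ' str2'.
naive = s.apply(lambda x: "str2" in x.split(","))

# Strip each piece before testing membership.
stripped = s.apply(lambda x: "str2" in [w.strip() for w in x.split(",")])

print(naive.tolist())     # [False, False, False]
print(stripped.tolist())  # [True, False, False]
```

If the real column has no spaces after the commas, the plain split is enough.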
To apply this, simply use the resulting boolean Series to filter the DataFrame. Example:
if <somefilter>:
    df_item = df_item[df_item['column2'].apply(lambda x: 'str2' in x.split(','))]
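Putting this together with the loop from the question, the following is a sketch under several assumptions: the CSV is read in chunks (the question's read_csv call is elided, so chunksize and the inline csv_text are invented here), subjects is a set of wanted words, and has_any_subject is a hypothetical helper that is not part of pandas.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for the asker's file.
csv_text = '''column1,column2
a,"str1, str2, str3"
b,"str4, str5, str7"
a,"str8, str9, str10"
'''

filter1 = "a"                # assumed value for the first filter
subjects = {"str2", "str9"}  # assumed set of wanted words

def has_any_subject(cell, wanted):
    """Return True if any comma-separated word in cell is in wanted."""
    words = {w.strip() for w in cell.split(",")}
    return not wanted.isdisjoint(words)

parts = []
# chunksize is an assumption; it makes read_csv yield DataFrames one chunk at a time.
for df_item in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    if filter1:
        df_item = df_item[df_item["column1"] == filter1]
    if subjects:
        # Build a boolean mask per row, then use it to filter the chunk.
        mask = df_item["column2"].apply(lambda c: has_any_subject(c, subjects)).astype(bool)
        df_item = df_item[mask]
    parts.append(df_item)

result = pd.concat(parts, ignore_index=True)
print(result["column2"].tolist())  # ['str1, str2, str3', 'str8, str9, str10']
```

Using a named helper instead of an inline lambda keeps the splitting and stripping logic in one place, which helps if several filters need it.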

