Original question: http://stackoverflow.com/questions/28914078/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Filter out rows based on list of strings in Pandas
Asked by geokrowding
I have a large time series data frame (called df), and the first 5 records look like this:
df
                          stn  years_of_data  total_minutes  avg_daily  TOA_daily  K_daily
date
1900-01-14  AlberniElementary              4           5745     34.100    114.600    0.298
1900-01-14     AlberniWeather              6           7129     29.500    114.600    0.257
1900-01-14            Arbutus              8          11174     30.500    114.600    0.266
1900-01-14          Arrowview              7          10080     27.600    114.600    0.241
1900-01-14            Bayside              7           9745     33.800    114.600    0.295
Goal:
I am trying to remove rows where any of the strings in a list are present in the 'stn' column. So, I am basically trying to filter this dataset to exclude rows containing any of the strings in the following list.
Attempt:
remove_list = ['Arbutus','Bayside']
cleaned = df[df['stn'].str.contains('remove_list')]
Returns:
Out[78]:
stn years_of_data total_minutes avg_daily TOA_daily K_daily
date
Nothing!
I have tried a few combinations of quotes, brackets, and even a lambda function, though I am fairly new, so I am probably not using the syntax properly.
Answered by EdChum
Use isin:
cleaned = df[~df['stn'].isin(remove_list)]
In [7]:
remove_list = ['Arbutus','Bayside']
df[~df['stn'].isin(remove_list)]
Out[7]:
                          stn  years_of_data  total_minutes  avg_daily  \
date
1900-01-14  AlberniElementary              4           5745       34.1
1900-01-14     AlberniWeather              6           7129       29.5
1900-01-14          Arrowview              7          10080       27.6

            TOA_daily  K_daily
date
1900-01-14      114.6    0.298
1900-01-14      114.6    0.257
1900-01-14      114.6    0.241
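As a minimal, self-contained sketch of the isin approach: the small frame below is rebuilt by hand from the sample rows in the question, with only a subset of the columns, so it is an illustration rather than the asker's real data. It also shows why the original attempt came back empty: 'remove_list' in quotes is a literal pattern, not the list itself.

```python
import pandas as pd

# Hand-built stand-in for the question's df (subset of columns, assumed for illustration).
df = pd.DataFrame(
    {
        "stn": ["AlberniElementary", "AlberniWeather", "Arbutus", "Arrowview", "Bayside"],
        "years_of_data": [4, 6, 8, 7, 7],
        "avg_daily": [34.1, 29.5, 30.5, 27.6, 33.8],
    },
    index=pd.Index(["1900-01-14"] * 5, name="date"),
)

remove_list = ["Arbutus", "Bayside"]

# The original attempt searches for the literal text 'remove_list',
# which no station name contains, so the result is empty.
literal_attempt = df[df["stn"].str.contains("remove_list")]

# isin builds a boolean mask of exact matches; ~ inverts it to keep the other rows.
cleaned = df[~df["stn"].isin(remove_list)]

print(len(literal_attempt))
print(cleaned["stn"].tolist())
```

Note that isin only matches whole values, so it fits the case where the list holds the exact station names.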
Answered by rajan
I had a similar question and found this old thread. I think there are other ways to get the same result. My issue with @EdChum's solution for my particular application is that I don't have a list that will be matched exactly. If you have the same issue, .isin isn't meant for that application.
Instead, you can try a few other options, including numpy.where with a substring match:
import numpy
remove_list = ['ayside','rrowview']
# flag rows whose stn contains any of the substrings (1 = match, 0 = no match)
df['flagCol'] = numpy.where(df.stn.str.contains('|'.join(remove_list)),1,0)
Note that this solution doesn't actually remove the matching rows; it just flags them. You can copy/slice/drop as you like.
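If the goal is to drop the matching rows rather than only mark them, one possible sketch is to use the joined pattern directly as a boolean mask. The frame below is a hand-built stand-in for the question's data, and the use of re.escape is an addition that is not in the original answer: str.contains treats its pattern as a regular expression by default, so escaping guards against names that contain characters such as "." or "(".

```python
import re

import pandas as pd

# Hand-built stand-in for the question's stn column (assumed for illustration).
df = pd.DataFrame(
    {"stn": ["AlberniElementary", "AlberniWeather", "Arbutus", "Arrowview", "Bayside"]}
)
remove_list = ["ayside", "rrowview"]

# Join the escaped fragments with | so the regex matches any one of them.
pattern = "|".join(re.escape(s) for s in remove_list)

# True where stn contains any of the fragments.
mask = df["stn"].str.contains(pattern)

# Keep only the rows that did not match.
kept = df[~mask]
print(kept["stn"].tolist())
```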
This solution is useful when you don't know, for example, whether the station names are capitalized and you don't want to standardize the text beforehand (the search strings above drop each name's first letter, so they match either capitalization of that letter). numpy.where is usually pretty fast as well, probably not much different from .isin.
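When capitalization is the only uncertainty, str.contains also accepts case=False, which may be simpler than trimming letters from the search strings. A small sketch, using made-up station spellings to stand in for inconsistent data:

```python
import pandas as pd

# Made-up spellings, assumed here to mimic inconsistently capitalized station names.
df = pd.DataFrame({"stn": ["arbutus", "ARROWVIEW", "Bayside", "AlberniWeather"]})
remove_list = ["Arbutus", "Bayside"]

# case=False makes the regex match regardless of letter case.
mask = df["stn"].str.contains("|".join(remove_list), case=False)

# Keep only the rows that did not match any name in the list.
kept = df[~mask]
print(kept["stn"].tolist())
```

This still matches substrings rather than whole names, so a short search string can remove more rows than intended.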

