Filter out rows based on a list of strings in Pandas

Question

Asked by geokrowding

I have a large time series data frame (called df), and the first 5 records look like this:

df

                          stn  years_of_data  total_minutes  avg_daily  TOA_daily  K_daily
date
1900-01-14  AlberniElementary              4           5745     34.100    114.600    0.298
1900-01-14     AlberniWeather              6           7129     29.500    114.600    0.257
1900-01-14            Arbutus              8          11174     30.500    114.600    0.266
1900-01-14          Arrowview              7          10080     27.600    114.600    0.241
1900-01-14            Bayside              7           9745     33.800    114.600    0.295

Goal:

I am trying to remove rows where any of the strings in a list are present in the 'stn' column. So, I am basically trying to filter this dataset so that it does not include rows containing any of the strings in the following list.

Attempt:

remove_list = ['Arbutus','Bayside']

cleaned = df[df['stn'].str.contains('remove_list')]

Returns:

Out[78]:

stn years_of_data   total_minutes   avg_daily   TOA_daily   K_daily
date

Nothing!

I have tried a few combinations of quotes, brackets, and even a lambda function; I am fairly new, though, so I am probably not using the syntax properly.
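For reference, here is a minimal sketch of why the attempt above comes back empty. It assumes a small frame rebuilt from the first five records (only the 'stn' column matters here). Quoting 'remove_list' makes str.contains search for that literal text, which no station name contains. Two pieces are missing: the list has to be turned into a pattern, and the mask has to be negated with ~ so that matching rows are dropped rather than kept.

```python
import pandas as pd

# Small frame rebuilt from the question's sample (only 'stn' is needed here).
df = pd.DataFrame(
    {"stn": ["AlberniElementary", "AlberniWeather", "Arbutus", "Arrowview", "Bayside"]},
    index=pd.to_datetime(["1900-01-14"] * 5).rename("date"),
)
remove_list = ["Arbutus", "Bayside"]

# The quotes make this look for the literal text "remove_list" in each name.
literal = df["stn"].str.contains("remove_list")
print(literal.any())  # False, so df[literal] has no rows

# Joining the list with "|" builds a regex that matches any of the names...
pattern = "|".join(remove_list)
mask = df["stn"].str.contains(pattern)
# ...and ~ inverts the mask so the matching rows are removed instead of kept.
cleaned = df[~mask]
print(cleaned["stn"].tolist())  # ['AlberniElementary', 'AlberniWeather', 'Arrowview']
```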

Answer 1

Answered by EdChum

Use isin, and invert the resulting mask with ~:

cleaned = df[~df['stn'].isin(remove_list)]

In [7]:

remove_list = ['Arbutus','Bayside']
df[~df['stn'].isin(remove_list)]
Out[7]:
                          stn  years_of_data  total_minutes  avg_daily  \
date                                                                     
1900-01-14  AlberniElementary              4           5745       34.1   
1900-01-14     AlberniWeather              6           7129       29.5   
1900-01-14          Arrowview              7          10080       27.6   

            TOA_daily  K_daily  
date                            
1900-01-14      114.6    0.298  
1900-01-14      114.6    0.257  
1900-01-14      114.6    0.241
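A self-contained version of the isin approach, for anyone who wants to run it without the original data: the frame below is rebuilt from the five sample records in the question, so the values are just the ones shown there. isin builds a boolean mask that is True where the whole 'stn' value equals one of the list entries, and ~ flips it so those rows are excluded.

```python
import pandas as pd

# Sample data rebuilt from the five records shown in the question.
df = pd.DataFrame(
    {
        "stn": ["AlberniElementary", "AlberniWeather", "Arbutus", "Arrowview", "Bayside"],
        "years_of_data": [4, 6, 8, 7, 7],
        "total_minutes": [5745, 7129, 11174, 10080, 9745],
        "avg_daily": [34.1, 29.5, 30.5, 27.6, 33.8],
        "TOA_daily": [114.6] * 5,
        "K_daily": [0.298, 0.257, 0.266, 0.241, 0.295],
    },
    index=pd.DatetimeIndex(["1900-01-14"] * 5, name="date"),
)

remove_list = ["Arbutus", "Bayside"]

# True where the whole 'stn' value equals one of the entries in remove_list.
in_list = df["stn"].isin(remove_list)
# ~ negates the mask, keeping only rows whose station is NOT in the list.
cleaned = df[~in_list]
print(cleaned)
```

Because isin compares whole values, this is an exact match: 'Arbutus' is removed, but a station named, say, 'Arbutus2' would be kept.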

Answer 2

Answered by rajan

I had a similar question and found this old thread; I think there are other ways to get the same result. My issue with @EdChum's solution for my particular application is that I don't have a list that will be matched exactly. If you have the same issue, .isin isn't meant for that application.
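To make the exact-match limitation concrete, here is a small sketch using fragments of station names (the same fragments used in the numpy.where example in this answer). isin compares whole values, so fragments never match, while str.contains looks for the joined pattern anywhere inside each value.

```python
import pandas as pd

stn = pd.Series(["AlberniElementary", "Arbutus", "Arrowview", "Bayside"])
partial = ["ayside", "rrowview"]  # fragments, not full station names

# isin compares whole values, so none of the fragments match.
print(stn.isin(partial).tolist())  # [False, False, False, False]

# str.contains searches for the "|"-joined pattern anywhere in each value.
print(stn.str.contains("|".join(partial)).tolist())  # [False, False, True, True]
```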

Instead, you can try a few other options, including numpy.where:

import numpy
remove_list = ['ayside','rrowview']
# 1 where 'stn' contains any of the substrings, otherwise 0
df['flagCol'] = numpy.where(df.stn.str.contains('|'.join(remove_list)),1,0)

Note that this solution doesn't actually remove the matching rows, just flags them. You can copy/slice/drop as you like.
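One way to go from the flag to an actual filtered frame is sketched below, assuming a frame with only the 'stn' column from the question. It keeps the rows whose flag is 0 and then drops the helper column. The re.escape call is an addition not in the original answer: it stops characters such as '.' or '(' in a station name from being read as regex syntax when the list is joined into a pattern.

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame({"stn": ["AlberniElementary", "AlberniWeather", "Arbutus", "Arrowview", "Bayside"]})
remove_list = ["ayside", "rrowview"]

# Escape each entry so it is matched literally, then join into one alternation.
pattern = "|".join(map(re.escape, remove_list))
df["flagCol"] = np.where(df["stn"].str.contains(pattern), 1, 0)

# Keep the unflagged rows, then drop the helper column.
cleaned = df[df["flagCol"] == 0].drop(columns="flagCol")
print(cleaned["stn"].tolist())  # ['AlberniElementary', 'AlberniWeather', 'Arbutus']
```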

This solution would be useful if, for example, you don't know whether the station names are capitalized and don't want to standardize the text beforehand. numpy.where is usually pretty fast as well, probably not much different from .isin.
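If capitalization is the main concern, str.contains can also ignore case directly through its case=False parameter, so full names can be used instead of fragments. The station spellings below ('arbutus', 'ARROWVIEW') are hypothetical, chosen only to show mixed capitalization.

```python
import pandas as pd

# Hypothetical station names with inconsistent capitalization.
df = pd.DataFrame({"stn": ["AlberniElementary", "arbutus", "ARROWVIEW", "Bayside"]})
remove_list = ["Arbutus", "Arrowview"]

# case=False makes the pattern match regardless of upper/lower case.
mask = df["stn"].str.contains("|".join(remove_list), case=False)
cleaned = df[~mask]
print(cleaned["stn"].tolist())  # ['AlberniElementary', 'Bayside']
```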
