Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/26249574/

Date: 2020-09-13 22:33:22  Source: igfitidea

Python Pandas: Filtering a data frame

Tags: python, pandas, subset, data-cleaning

Asked by kk415kk

I'm pretty new to Pandas but wanted to try it out after working with R for a while.

A problem I'm having is figuring out why a filter isn't working for one of my data frames. I have a data frame data_df with multiple columns, one of which is c, which holds country names. I'm trying to filter out the rows where c == None.

My first attempt was to do this:

countries_df = data_df[data_df.c != None]

However, that yielded 0 rows. This, however, worked:

countries_df = data_df[~data_df.c.isin([None])]

Can someone explain why? It seems that from the Pandas doc, the first should be able to filter correctly.

Some sample rows:

    _heartbeat_  a                                                  al              c     cy             g
0   NaN          Mozilla/5.0 (Linux; U; Android 4.1.2; en-us; H...  en-US           US    Anaheim        15r91
1   NaN          Mozilla/4.0 (compatible; MSIE 7.0; Windows NT ...  en-us           None  NaN            ifIpBW
2   NaN          Mozilla/5.0 (Windows NT 6.1; rv:21.0) Gecko/20...  en-US,en;q=0.5  US    Fort Huachuca  10DaxOu
3   NaN          Mozilla/5.0 (Linux; U; Android 4.1.2; en-us; S...  en-US           US    Houston        TysVFU
4   NaN          Opera/9.80 (Android; Opera Mini/7.5.33286/29.3...  en              None  NaN            10IGW7m
5   NaN          Mozilla/5.0 (compatible; MSIE 10.0; Windows NT...  en-US           US    Mishawaka      13GrCeP
6   NaN          Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) G...  en-US,en;q=0.5  US    Hammond        YmtpnZ
7   NaN          Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_3_5 li...  en-us           None  NaN            13oM0hV
8   NaN          Mozilla/5.0 (iPhone; CPU iPhone OS 6_1_3 like ...  en-us           AU    Sydney         15r91
9   NaN          Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi...  en-US,en;q=0.8  None  NaN            109LtDc
10  NaN          Mozilla/5.0 (iPhone; CPU iPhone OS 6_1_3 like ...  en-us           US    Middletown     109ar5F
11  NaN          Mozilla/5.0 (iPhone; CPU iPhone OS 6_1_3 like ...  en-us           US    Germantown     107xZnW

Answered by BrenBarn

It appears that pandas and NumPy treat None specially when comparing for equality. In pandas, None is supposed to be like NaN, representing a missing value. To find rows where the value is not None (or NaN), you could do data_df[data_df.c.notnull()] (or data_df[~data_df.c.isnull()]).
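
A minimal sketch of the missing-value filter described above. The small frame here is made up for illustration (it only mimics the c and cy columns of the sample rows, and is not the asker's actual data), and the variable countries_df_alt is a hypothetical name added to show the second form:

```python
import pandas as pd

# Illustrative stand-in for data_df: "c" holds country codes,
# with None marking rows where the country is missing.
data_df = pd.DataFrame({
    "c": ["US", None, "US", None, "AU"],
    "cy": ["Anaheim", None, "Houston", None, "Sydney"],
})

# Keep only rows where "c" is present (neither None nor NaN).
countries_df = data_df[data_df["c"].notnull()]

# Equivalent form: build the missing-value mask and negate it.
countries_df_alt = data_df[~data_df["c"].isnull()]

print(countries_df)
```

Both forms build a boolean mask from the column and index the frame with it, so they do not depend on how None behaves under == or !=, which is what tripped up the first attempt in the question.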
