Notice: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/39475566/

Time: 2020-09-14 02:00:28  Source: igfitidea

python pandas: filter out records with null or empty string for a given field

python pandas dataframe

Asked by Edamame

I am trying to filter out records whose field_A is null or an empty string in a data frame, as shown below:

my_df[my_df.editions is not None]
my_df.shape

This gives me an error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-40-e1969e0af259> in <module>()
      1 my_df['editions'] = my['editions'].astype(str)
----> 2 my_df = my_df[my_df.editions is not None]
      3 my_df.shape

/home/edamame/anaconda2/lib/python2.7/site-packages/pandas/core/frame.pyc in __getitem__(self, key)
   1995             return self._getitem_multilevel(key)
   1996         else:
-> 1997             return self._getitem_column(key)
   1998 
   1999     def _getitem_column(self, key):

/home/edamame/anaconda2/lib/python2.7/site-packages/pandas/core/frame.pyc in _getitem_column(self, key)
   2002         # get column
   2003         if self.columns.is_unique:
-> 2004             return self._get_item_cache(key)
   2005 
   2006         # duplicate columns & possible reduce dimensionality

/home/edamame/anaconda2/lib/python2.7/site-packages/pandas/core/generic.pyc in _get_item_cache(self, item)
   1348         res = cache.get(item)
   1349         if res is None:
-> 1350             values = self._data.get(item)
   1351             res = self._box_item_values(item, values)
   1352             cache[item] = res

/home/edamame/anaconda2/lib/python2.7/site-packages/pandas/core/internals.pyc in get(self, item, fastpath)
   3288 
   3289             if not isnull(item):
-> 3290                 loc = self.items.get_loc(item)
   3291             else:
   3292                 indexer = np.arange(len(self.items))[isnull(self.items)]

/home/edamame/anaconda2/lib/python2.7/site-packages/pandas/indexes/base.pyc in get_loc(self, key, method, tolerance)
   1945                 return self._engine.get_loc(key)
   1946             except KeyError:
-> 1947                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   1948 
   1949         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4154)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4018)()

pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12368)()

pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12322)()

KeyError: True

or

my_df[my_df.editions != None]
my_df.shape

This one gave no error but didn't filter out any None values.

I also tried:

my_df = my_df[my_df.editions.notnull()]

This one doesn't give an error, but it doesn't filter out any None values either.

Could anyone please advise how to solve this problem? Thanks!

Accepted answer by MattR

Can you create a new dataframe from the filtering?

Dataframe before:

a     b
1     9
2    10
3    11
4    12
5    13
6    14
7    15
8  null

Example:

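import pandas

# Column "b" holds the literal string "null" in its last row
my_df = pandas.DataFrame({"a": [1, 2, 3, 4, 5, 6, 7, 8], "b": [9, 10, 11, 12, 13, 14, 15, "null"]})

# Keep only the rows where "b" is not the string "null"
my_df2 = my_df[my_df["b"] != "null"]
print(my_df2)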

Dataframe after:

a   b
1   9
2  10
3  11
4  12
5  13
6  14
7  15

This compares each value in column b with the string "null" and keeps only the rows that do not match. You could do the same thing with empty strings.
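
As a rough sketch of how the same idea could cover both cases from the question (missing values and empty strings), the two conditions can be combined into one boolean mask. The sample data and the "title" column below are made up for illustration; only the "editions" column name comes from the question.

```python
import pandas as pd

# Hypothetical sample data: "editions" mixes real values, None, and an empty string
my_df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "editions": ["1st", None, "", "2nd"],
})

# Keep rows where "editions" is neither missing nor an empty string
mask = my_df["editions"].notna() & (my_df["editions"] != "")
filtered = my_df[mask]
print(filtered)
```

The parentheses around the comparison matter, because & binds more tightly than != in Python.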

Answer by Gonzalo Ferreiro Volpi

You can negate a condition while filtering by using ~.

So in your case you should do:

my_df = my_df[~my_df.editions.isnull()]
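
A small sketch, using made-up sample data, of what ~ does here and how it relates to the attempts in the question. It also illustrates one possible reason (an assumption based on the astype(str) call visible in the first line of the traceback) why notnull() appeared to filter nothing: once values are turned into strings, a missing entry is ordinary text and no longer counts as null. The sketch uses map(str) to mimic that conversion.

```python
import pandas as pd

# Hypothetical data; object dtype keeps None as-is
my_df = pd.DataFrame({"editions": pd.Series(["1st", None, "2nd"], dtype=object)})

# ~ inverts a boolean Series element by element
kept = my_df[~my_df["editions"].isnull()]

# notnull() builds the same mask directly
same = my_df[my_df["editions"].notnull()]

# "is not None" tests the Series object itself, so it yields one plain bool;
# indexing with that bool is a lookup for a column named True
single_bool = my_df["editions"] is not None

# After converting every value with str(), the missing entry is the text "None",
# so notnull() reports every row as present
stringified = my_df["editions"].map(str)
print(stringified.notnull().all())
```

If the column has already been stringified like this, comparing against the resulting text (as in the accepted answer) or filtering before the conversion may be the more reliable route.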