python pandas: filter out records with null or empty string for a given field
Source: http://stackoverflow.com/questions/39475566/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by Edamame
I am trying to filter out records whose field_A is null or an empty string in a data frame, like below:
my_df[my_df.editions is not None]
my_df.shape
This gives me an error:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-40-e1969e0af259> in <module>()
1 my_df['editions'] = my['editions'].astype(str)
----> 2 my_df = my_df[my_df.editions is not None]
3 my_df.shape
/home/edamame/anaconda2/lib/python2.7/site-packages/pandas/core/frame.pyc in __getitem__(self, key)
1995 return self._getitem_multilevel(key)
1996 else:
-> 1997 return self._getitem_column(key)
1998
1999 def _getitem_column(self, key):
/home/edamame/anaconda2/lib/python2.7/site-packages/pandas/core/frame.pyc in _getitem_column(self, key)
2002 # get column
2003 if self.columns.is_unique:
-> 2004 return self._get_item_cache(key)
2005
2006 # duplicate columns & possible reduce dimensionality
/home/edamame/anaconda2/lib/python2.7/site-packages/pandas/core/generic.pyc in _get_item_cache(self, item)
1348 res = cache.get(item)
1349 if res is None:
-> 1350 values = self._data.get(item)
1351 res = self._box_item_values(item, values)
1352 cache[item] = res
/home/edamame/anaconda2/lib/python2.7/site-packages/pandas/core/internals.pyc in get(self, item, fastpath)
3288
3289 if not isnull(item):
-> 3290 loc = self.items.get_loc(item)
3291 else:
3292 indexer = np.arange(len(self.items))[isnull(self.items)]
/home/edamame/anaconda2/lib/python2.7/site-packages/pandas/indexes/base.pyc in get_loc(self, key, method, tolerance)
1945 return self._engine.get_loc(key)
1946 except KeyError:
-> 1947 return self._engine.get_loc(self._maybe_cast_indexer(key))
1948
1949 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4154)()
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4018)()
pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12368)()
pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12322)()
KeyError: True
or
my_df[my_df.editions != None]
my_df.shape
This one gave no error but didn't filter out any None values.
I also tried:
my_df = my_df[my_df.editions.notnull()]
This one doesn't give an error, but it doesn't filter out any None values either.
Could anyone please advise how to solve this problem? Thanks!
Accepted answer by MattR
Can you create a new dataframe from the filtering?
Dataframe before:
a b
1 9
2 10
3 11
4 12
5 13
6 14
7 15
8 null
Example:
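import pandas
# Build the example frame; the last value of "b" is the literal string "null"
my_df = pandas.DataFrame({"a": [1, 2, 3, 4, 5, 6, 7, 8], "b": [9, 10, 11, 12, 13, 14, 15, "null"]})
# Keep only the rows where "b" is not the string "null"
my_df2 = my_df[my_df["b"] != "null"]
print(my_df2)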
Dataframe after:
a b
1 9
2 10
3 11
4 12
5 13
6 14
7 15
This looks for the string "null" and excludes the rows that contain it. You could do the same thing with empty strings.
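A minimal sketch of combining both checks, assuming the column may hold genuinely missing values as well as empty strings. The frame `df` and its contents are hypothetical and not taken from the question:

```python
import pandas as pd

# Hypothetical sample data: column "b" mixes real values, None, and an empty string.
df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": ["x", None, "", "y"],
})

# Keep rows where "b" is neither missing nor an empty string.
mask = df["b"].notna() & (df["b"] != "")
filtered = df[mask]
print(filtered)
```

The two conditions are joined with `&` rather than `and`, because each side is a per-row boolean Series and not a single truth value.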
Answer by Gonzalo Ferreiro Volpi
You can negate a condition while filtering using ~.
So in your case you should do:
my_df = my_df[~my_df.editions.isnull()]
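The following sketch contrasts the negated mask with the `is not None` attempt from the question. The frame `df` and its values are hypothetical, and the sketch assumes the column holds real missing values rather than strings:

```python
import pandas as pd

# Hypothetical data with one genuinely missing value.
df = pd.DataFrame({"editions": ["first", None, "second"]})

# `is not None` tests the Series object itself, so it yields one Python bool,
# not a per-row mask. Indexing with it would look up a column named True.
scalar_check = df.editions is not None

# isnull() builds an element-wise mask; ~ inverts it.
mask = ~df.editions.isnull()
kept = df[mask]

# notnull() produces the same mask directly.
same = mask.equals(df.editions.notnull())
print(kept)
```

The traceback in the question shows the column being cast with `.astype(str)` first. That cast may have turned missing values into literal strings such as "None", in which case a null check alone would find nothing to drop and a string comparison would be needed instead.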