Python pandas apply function if a column value is not NULL
Original question: http://stackoverflow.com/questions/26614465/
Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it under the same license, but you must cite the original URL and attribute the content to the original authors on StackOverflow (not to this page).
Asked by ragesz
I have a dataframe (in Python 2.7, pandas 0.15.0):
df =
       A    B               C
0    NaN   11             NaN
1    two  NaN  ['foo', 'bar']
2  three   33             NaN
I want to apply a simple function to rows that do not contain NULL values in a specific column. My function is as simple as possible:
def my_func(row):
    print row
And my apply code is the following:
df[['A','B']].apply(lambda x: my_func(x) if(pd.notnull(x[0])) else x, axis = 1)
It works perfectly. If I check column 'B' for NULL values, pd.notnull() works perfectly as well. But if I select column 'C', which contains list objects:
df[['A','C']].apply(lambda x: my_func(x) if(pd.notnull(x[1])) else x, axis = 1)
then I get the following error message: ValueError: ('The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()', u'occurred at index 1')
Does anybody know why pd.notnull() works only for integer and string columns but not for 'list columns'?
And is there a nicer way to check for NULL values in column 'C' instead of this:
df[['A','C']].apply(lambda x: my_func(x) if(str(x[1]) != 'nan') else x, axis = 1)
Thank you!
Accepted answer by Korem
The problem is that pd.notnull(['foo', 'bar']) operates elementwise and returns array([ True, True], dtype=bool). Your if condition tries to convert that array to a single boolean, and that is when you get the exception.
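To make the difference concrete, here is a minimal sketch (written for Python 3 and a recent pandas, which is an assumption; the question used Python 2.7). It compares pd.notnull on a scalar with pd.notnull on a list, and then does explicitly what an if statement does implicitly. The variable names are only for illustration.

```python
import numpy as np
import pandas as pd

# A scalar input gives a single boolean, which is safe inside `if`.
scalar_result = pd.notnull(np.nan)

# A list input is checked element by element and gives a boolean array.
list_result = pd.notnull(['foo', 'bar'])

# `if list_result:` calls bool() on the array, which is ambiguous
# for more than one element and raises ValueError.
try:
    bool(list_result)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(type(scalar_result), scalar_result)
print(type(list_result), list_result)
print(error_message)
```

So the integer and string columns only work because each of their cells is a scalar; a list cell is treated as a collection to test item by item.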
To fix it, you could simply wrap the notnull call with np.all:
df[['A','C']].apply(lambda x: my_func(x) if(np.all(pd.notnull(x[1]))) else x, axis = 1)
Now you'll see that np.all(pd.notnull(['foo', 'bar'])) is indeed True.
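For anyone who wants to run the fix end to end, here is a small self-contained sketch under a few assumptions: it targets Python 3, rebuilds a frame shaped like the one in the question, uses label access (row['C']) instead of the positional x[1], and returns a marker string instead of printing. The helper has_value and the 'skipped' / 'processed' markers are hypothetical names, not part of the original answer.

```python
import numpy as np
import pandas as pd

# A frame shaped like the one in the question.
df = pd.DataFrame({
    'A': [np.nan, 'two', 'three'],
    'B': [11, np.nan, 33],
    'C': [np.nan, ['foo', 'bar'], np.nan],
})

def has_value(cell):
    # np.all collapses the elementwise result into one boolean,
    # so the check works for scalar cells and list cells alike.
    return bool(np.all(pd.notnull(cell)))

def my_func(row):
    # Stand-in for the question's function: return a marker instead of printing.
    return 'processed'

result = df[['A', 'C']].apply(
    lambda row: my_func(row) if has_value(row['C']) else 'skipped',
    axis=1,
)
print(result.tolist())
```

One caveat with this approach: a list that itself contains a NaN is reported as missing, because np.all requires every item to be non-null.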
Answer by Aziz Alto
Another way is to use row.notnull().all() (without numpy). Here is an example:
df.apply(lambda row: func1(row) if row.notnull().all() else func2(row), axis=1)
Here is a complete example on your df:
>>> d = {'A': [None, 2, 3, 4], 'B': [11, None, 33, 4], 'C': [None, ['a','b'], None, 4]}
>>> df = pd.DataFrame(d)
>>> df
     A     B       C
0  NaN  11.0    None
1  2.0   NaN  [a, b]
2  3.0  33.0    None
3  4.0   4.0       4
>>> def func1(r):
...     return 'No'
...
>>> def func2(r):
...     return 'Yes'
...
>>> df.apply(lambda row: func1(row) if row.notnull().all() else func2(row), axis=1)
0    Yes
1    Yes
2    Yes
3     No
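The reason row.notnull() does not hit the same error is that, as a Series method, it treats each cell as one value, so a list stored in a cell simply counts as "not null". Below is a rough sketch of that on the frame from the question (Python 3 assumed). The second expression, a column-wise notnull() followed by all(axis=1), is not from the answer; it is an equivalent check without apply, shown here only as an assumption about what may read more cleanly.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [np.nan, 'two', 'three'],
    'B': [11, np.nan, 33],
    'C': [np.nan, ['foo', 'bar'], np.nan],
})

# Row-wise check from the answer, limited to the columns of interest.
complete = df[['A', 'C']].apply(lambda row: row.notnull().all(), axis=1)

# The same check without apply: one boolean per row.
complete_vectorised = df[['A', 'C']].notnull().all(axis=1)

print(complete.tolist())
print(complete_vectorised.tolist())
```

Only row 1 has values in both 'A' and 'C', so both masks select that row alone.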
Answer by coffman21
I had a column that contained lists and NaNs, so the following worked for me:
df.C.map(lambda x: my_func(x) if type(x) == list else x)
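A runnable version of the same idea might look like the sketch below (Python 3 assumed). It uses isinstance instead of type(x) == list, which also accepts list subclasses; my_func here is a hypothetical stand-in that joins the list items, since the original function is not shown.

```python
import numpy as np
import pandas as pd

# A column that mixes list cells and NaN, as described above.
col = pd.Series([np.nan, ['foo', 'bar'], np.nan], name='C')

def my_func(items):
    # Hypothetical stand-in: join the list items into one string.
    return '-'.join(items)

# Only list cells are transformed; everything else is passed through.
mapped = col.map(lambda x: my_func(x) if isinstance(x, list) else x)
print(mapped.tolist())
```

Checking for the type you expect sidesteps the null test entirely, which is handy when the non-null cells are all of one known type.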
Answer by Andrew Monger
Try...
df['a'] = df['a'].apply(lambda x: x.replace(',', r'\,') if x is not None else x)
This example just adds an escape character before each comma if the value is not None.
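Note that a None test alone lets NaN through, and NaN is a float with no replace method, so a column holding both None and NaN would raise AttributeError. A hedged variant, assuming the non-null cells are strings, checks the type instead (Python 3; the sample data is made up for illustration):

```python
import numpy as np
import pandas as pd

# Made-up sample column with strings, None and NaN.
df = pd.DataFrame({'a': ['x,y', None, np.nan, 'plain']})

# Escape commas only in string cells; None and NaN pass through untouched.
df['a'] = df['a'].apply(
    lambda x: x.replace(',', r'\,') if isinstance(x, str) else x
)
print(df['a'].tolist())
```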


