Applying a function to a pandas DataFrame
Original question: http://stackoverflow.com/questions/21269599/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must attribute it to the original authors (not me): StackOverFlow
Asked by mikebmassey
I'm trying to perform some text analysis on a pandas dataframe, but am having some trouble with the flow. Alternatively, maybe I'm just not getting it... PS - I'm a python beginner-ish.
Dataframe example:
df = pd.DataFrame({'Document' : ['a','1','a', '6','7','N'], 'Type' : ['7', 'E', 'Y', '6', 'C', '9']})
  Document Type
0        a    7
1        1    E
2        a    Y
3        6    6
4        7    C
5        N    9
I'm trying to build a flow that does something depending on whether 'Document' or 'Type' is a number.
Here is a simple function to return whether 'Document' is a number (edited to show how I am trying some if/then flow on the field):
def fn(dfname):
    if dfname['Document'].apply(str.isdigit):
        dfname['Check'] = 'Y'
    else:
        dfname['Check'] = 'N'
Now, I apply it to the dataframe:
df.apply(fn(df), axis=0)
I get this error back:
TypeError: ("'NoneType' object is not callable", u'occurred at index Document')
From the error message, it looks like I am not handling the index correctly. Can anyone see where I am going wrong?
Lastly - this may or may not be related to the issue, but I am really struggling with how indexes work in pandas. I think I have run into more issues with the index than any other issue.
Answered by Paul H
You're close.
The thing you have to realize about apply is that you need to write functions that operate on scalar values and return the result that you want. With that in mind:
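import pandas as pd

df = pd.DataFrame({'Document' : ['a','1','a', '6','7','N'], 'Type' : ['7', 'E', 'Y', '6', 'C', '9']})

def fn(val):
    # Receives one scalar value at a time and returns 'Y' if it is all digits
    if str(val).isdigit():
        return 'Y'
    else:
        return 'N'

# Pass the function itself (no parentheses); apply calls it once per value
df['check'] = df['Document'].apply(fn)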
gives me:
  Document Type check
0        a    7     N
1        1    E     Y
2        a    Y     N
3        6    6     Y
4        7    C     Y
5        N    9     N
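One possible reading of the TypeError in the question (an interpretation, not something stated in this answer): writing df.apply(fn(df), axis=0) calls fn(df) first and hands its return value to apply, and a function that only assigns a column returns None, which is not callable. The sketch below illustrates that difference; add_check_in_place and is_digit_flag are hypothetical names introduced only for this example.

```python
import pandas as pd

df = pd.DataFrame({'Document': ['a', '1', 'a', '6', '7', 'N'],
                   'Type': ['7', 'E', 'Y', '6', 'C', '9']})

def add_check_in_place(frame):
    # Hypothetical helper: assigns a column and implicitly returns None
    frame['Check'] = 'N'

def is_digit_flag(val):
    # Hypothetical helper: works on one scalar value and returns a result
    return 'Y' if str(val).isdigit() else 'N'

# Calling the helper evaluates it immediately; the result is None
returned = add_check_in_place(df.copy())

passes_function = callable(is_digit_flag)  # a function object is callable
passes_result = callable(returned)         # None is not callable

# Pass the function itself, without parentheses
df['check'] = df['Document'].apply(is_digit_flag)
```

The key design point is that apply wants the function object so it can call it on each value; the function should return its result rather than modify the dataframe.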
Edit:
Just want to clarify that when using apply on a Series, you should write functions that accept scalar values. When using apply on a DataFrame, however, the functions should accept either full columns (when axis=0, the default) or full rows (when axis=1).
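As a rough illustration of the three cases just described, the sketch below applies one function per case to the example dataframe. The function names (flag_scalar, count_digit_strings, flag_row) are made up for this example, and the row rule (flag 'Y' if either field is a number) is only an assumption about what the question's flow might want.

```python
import pandas as pd

df = pd.DataFrame({'Document': ['a', '1', 'a', '6', '7', 'N'],
                   'Type': ['7', 'E', 'Y', '6', 'C', '9']})

def flag_scalar(val):
    # Series.apply: receives one scalar value at a time
    return 'Y' if str(val).isdigit() else 'N'

def count_digit_strings(col):
    # DataFrame.apply with axis=0: receives a whole column as a Series
    return col.str.isdigit().sum()

def flag_row(row):
    # DataFrame.apply with axis=1: receives a whole row as a Series
    # indexed by column name
    if row['Document'].isdigit() or row['Type'].isdigit():
        return 'Y'
    return 'N'

series_result = df['Document'].apply(flag_scalar)      # one value per row
column_result = df.apply(count_digit_strings, axis=0)  # one value per column
row_result = df.apply(flag_row, axis=1)                # one value per row
```

Here series_result and row_result are indexed like the rows of df, while column_result is indexed by the column names 'Document' and 'Type'.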
Answered by Andy Hayden
It's worth noting that you can do this (without using apply, so more efficiently) using str.contains:
In [11]: df['Document'].str.contains(r'^\d+$')
Out[11]:
0    False
1     True
2    False
3     True
4     True
5    False
Name: Document, dtype: bool
Here the regex anchors ^ and $ match the start and end of the string respectively, so only values made up entirely of digits match.
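If the goal is the same 'Y'/'N' column produced by the apply-based answer, the boolean result of str.contains can be mapped to labels. This is a minimal sketch of one way to do that, not the only one; the column name check and the variable name is_number are choices made for this example.

```python
import pandas as pd

df = pd.DataFrame({'Document': ['a', '1', 'a', '6', '7', 'N'],
                   'Type': ['7', 'E', 'Y', '6', 'C', '9']})

# Vectorized test: True where the whole string consists of digits
is_number = df['Document'].str.contains(r'^\d+$')

# Translate the boolean mask into the 'Y'/'N' labels used in the question
df['check'] = is_number.map({True: 'Y', False: 'N'})
```

Keeping the boolean Series around (rather than only the labels) can be handy, since it also works directly as a filter, for example df[is_number].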

