pandas: Applying a function to a pandas DataFrame

Question

Asked by mikebmassey

I'm trying to perform some text analysis on a pandas DataFrame, but am having some trouble with the flow. Alternatively, maybe I'm just not getting it... PS: I'm a Python beginner-ish.

Dataframe example:

df = pd.DataFrame({'Document' : ['a','1','a', '6','7','N'], 'Type' : ['7', 'E', 'Y', '6', 'C', '9']})


     Document   Type
0    a          7
1    1          E
2    a          Y
3    6          6
4    7          C
5    N          9

I'm trying to build a flow that does something depending on whether 'Document' or 'Type' is a number.

Here is a simple function to return whether 'Document' is a number (edited to show how I am trying some if/then flow on the field):

def fn(dfname):
    if dfname['Document'].apply(str.isdigit):
        dfname['Check'] = 'Y'
    else:
        dfname['Check'] = 'N'

Now, I apply it to the DataFrame:

df.apply(fn(df), axis=0)

I get this error back:

TypeError: ("'NoneType' object is not callable", u'occurred at index Document')

From the error message, it looks like I am not handling the index correctly. Can anyone see where I am going wrong?

Lastly, this may or may not be related to the issue, but I am really struggling with how indexes work in pandas. I think I have run into more issues with the index than with anything else.

Answer 1

Answered by Paul H

You're close.

The thing you have to realize about apply is you need to write functions that operate on scalar values and return the result that you want. With that in mind:

import pandas as pd

df = pd.DataFrame({'Document' : ['a','1','a', '6','7','N'], 'Type' : ['7', 'E', 'Y', '6', 'C', '9']})

def fn(val):
    if str(val).isdigit():
        return 'Y'
    else:
        return 'N'

df['check'] = df['Document'].apply(fn)

gives me:

  Document Type check
0        a    7     N
1        1    E     Y
2        a    Y     N
3        6    6     Y
4        7    C     Y
5        N    9     N
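
As for the TypeError in the question, a likely cause is that df.apply(fn(df), axis=0) calls fn(df) first and hands whatever it returns to apply. A function that only assigns into the frame has no return statement, so it returns None, and apply ends up with nothing callable. A minimal sketch of the difference, where fn_inplace is a hypothetical stand-in for the question's function (not the exact code that raised the error):

```python
import pandas as pd

df = pd.DataFrame({'Document': ['a', '1', 'a', '6', '7', 'N'],
                   'Type': ['7', 'E', 'Y', '6', 'C', '9']})

def fn_inplace(frame):
    # Hypothetical stand-in: modifies the frame and returns nothing (i.e. None).
    frame['Check'] = 'N'

def fn(val):
    # Scalar function from the answer above: returns a value for each element.
    return 'Y' if str(val).isdigit() else 'N'

# Writing fn_inplace(...) with parentheses runs the function immediately.
returned = fn_inplace(df.copy())
print(returned is None)    # True
print(callable(returned))  # False, so apply would have nothing to call

# Pass the function object itself (no parentheses) and let apply call it.
check = df['Document'].apply(fn)
print(check.tolist())
```

So the fix has two parts: pass fn rather than fn(...), and have fn return a value instead of assigning inside the function.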

Edit:

Just want to clarify that when using apply on a Series, you should write functions that accept scalar values. When using apply on a DataFrame, however, the functions should accept either full columns (when axis=0, the default) or full rows (when axis=1).
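
To make the three cases concrete, here is a small sketch on the same data. The helper any_digit and the variable names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Document': ['a', '1', 'a', '6', '7', 'N'],
                   'Type': ['7', 'E', 'Y', '6', 'C', '9']})

# Series.apply: the function receives one scalar value at a time.
doc_flags = df['Document'].apply(lambda val: 'Y' if str(val).isdigit() else 'N')

# DataFrame.apply with axis=0 (the default): the function receives each
# column as a Series. Here it counts the digit-only entries per column.
digits_per_column = df.apply(lambda col: col.str.isdigit().sum(), axis=0)

# DataFrame.apply with axis=1: the function receives each row as a Series,
# so it can look at several columns at once.
def any_digit(row):
    # Hypothetical helper: 'Y' if either field of the row is a number.
    return 'Y' if row['Document'].isdigit() or row['Type'].isdigit() else 'N'

either_flags = df.apply(any_digit, axis=1)

print(doc_flags.tolist())
print(digits_per_column.to_dict())
print(either_flags.tolist())
```

The axis=1 form is the one that fits the "'Document' or 'Type'" condition from the question, since each call sees both fields of a row.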

Answer 2

Answered by Andy Hayden

It's worth noting that you can do this (without using apply, so more efficiently) using str.contains:

In [11]: df['Document'].str.contains(r'^\d+$')
Out[11]: 
0    False
1     True
2    False
3     True
4     True
5    False
Name: Document, dtype: bool

Here the regex anchors ^ and $ match the start and end of the string respectively, so the pattern only matches values made up entirely of digits.
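
If you want the same 'Y'/'N' column as in the other answer without apply, one option is to feed the boolean mask to numpy.where. This step is my own addition rather than part of the original answers, and the column name check is just borrowed from the other answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Document': ['a', '1', 'a', '6', '7', 'N'],
                   'Type': ['7', 'E', 'Y', '6', 'C', '9']})

# Boolean mask: True where the whole string consists of digits.
is_number = df['Document'].str.contains(r'^\d+$')

# Map the mask to labels in one vectorized step: 'Y' where True, else 'N'.
df['check'] = np.where(is_number, 'Y', 'N')

print(df)
```

The mask itself is often all you need, for example df[is_number] selects only the rows whose 'Document' is a number.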
