Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/34962104/

Time: 2020-08-19 15:46:33. Source: igfitidea

Pandas: How can I use the apply() function for a single column?

Tags: python, pandas, dataframe, python-3.5

Asked by Amani

I have a pandas data frame with two columns. I need to change the values of the first column without affecting the second one and get back the whole data frame with just first column values changed. How can I do that using apply in pandas?

Accepted answer by Fabio Lamanna

Given a sample dataframe df as:

a,b
1,2
2,3
3,4
4,5

what you want is:

df['a'] = df['a'].apply(lambda x: x + 1)

that returns:

   a  b
0  2  2
1  3  3
2  4  4
3  5  5
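
For reference, here is a self-contained version of the same idea, as a minimal sketch. The helper name add_one is illustrative and not part of the original answer; it stands in for the lambda so that it is easier to see what apply passes to the function.

```python
import pandas as pd

# Sample frame matching the a,b data shown above
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [2, 3, 4, 5]})


def add_one(x):
    """Hypothetical helper: receives one cell value of the column at a time."""
    return x + 1


# Series.apply calls add_one on each element of column 'a';
# assigning the result back replaces only that column.
df["a"] = df["a"].apply(add_one)
print(df)
```

Column b is never touched, so the whole data frame comes back with only column a changed.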

Answer by George Petrov

For a single column it is better to use map(), like this:

df = pd.DataFrame([{'a': 15, 'b': 15, 'c': 5}, {'a': 20, 'b': 10, 'c': 7}, {'a': 25, 'b': 30, 'c': 9}])

    a   b  c
0  15  15  5
1  20  10  7
2  25  30  9



df['a'] = df['a'].map(lambda a: a / 2.)

      a   b  c
0   7.5  15  5
1  10.0  10  7
2  12.5  30  9
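
As a rough illustration of how map() relates to apply() on a single column, the sketch below checks that both give the same result for an element-wise function, and shows that map() also accepts a dictionary lookup. The variable names and the labels dictionary are made up for this example.

```python
import pandas as pd

df = pd.DataFrame([{'a': 15, 'b': 15, 'c': 5},
                   {'a': 20, 'b': 10, 'c': 7},
                   {'a': 25, 'b': 30, 'c': 9}])

# For an element-wise function, Series.map and Series.apply agree.
via_map = df['a'].map(lambda a: a / 2.)
via_apply = df['a'].apply(lambda a: a / 2.)
same_result = via_map.equals(via_apply)

# Series.map also accepts a dict and looks each value up in it.
labels = {15: 'low', 20: 'mid', 25: 'high'}  # hypothetical mapping
labelled = df['a'].map(labels)

print(same_result)
print(labelled.tolist())
```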

Answer by Mike Müller

You don't need a function at all. You can work on a whole column directly.

Example data:

>>> df = pd.DataFrame({'a': [100, 1000], 'b': [200, 2000], 'c': [300, 3000]})
>>> df

      a     b     c
0   100   200   300
1  1000  2000  3000

Halve all the values in column a:

>>> df.a = df.a / 2
>>> df

       a     b     c
0   50.0   200   300
1  500.0  2000  3000
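
To make the "no function needed" point concrete, this small sketch compares the direct column arithmetic with the apply() version on the same data. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'a': [100, 1000], 'b': [200, 2000], 'c': [300, 3000]})

# Vectorized: the division is applied to the whole column at once.
halved_direct = df['a'] / 2

# Element-wise: the lambda is called once per value.
halved_apply = df['a'].apply(lambda x: x / 2)

# Both approaches produce the same column.
print(halved_direct.equals(halved_apply))
```

On Python 3 the division yields floats in both cases, so column a becomes a float column.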

Answer by Thibaut Dubernet

Although the given responses are correct, they modify the initial data frame, which is not always desirable (and, given the OP asked for examples "using apply", it might be they wanted a version that returns a new data frame, as apply does).

This is possible using assign: it is valid to assign to existing columns, as the documentation states (emphasis is mine):

Assign new columns to a DataFrame.

Returns a new object with all original columns in addition to new ones. Existing columns that are re-assigned will be overwritten.

In short:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame([{'a': 15, 'b': 15, 'c': 5}, {'a': 20, 'b': 10, 'c': 7}, {'a': 25, 'b': 30, 'c': 9}])

In [3]: df.assign(a=lambda df: df.a / 2)
Out[3]: 
      a   b  c
0   7.5  15  5
1  10.0  10  7
2  12.5  30  9

In [4]: df
Out[4]: 
    a   b  c
0  15  15  5
1  20  10  7
2  25  30  9

Note that the function will be passed the whole dataframe, not only the column you want to modify, so you will need to make sure you select the right column in your lambda.
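
Since the question asked specifically about apply, one possible way to combine the two is to pass the result of apply on the single column to assign, instead of a lambda over the whole data frame. This is a sketch under that assumption; the name new_df is illustrative.

```python
import pandas as pd

df = pd.DataFrame([{'a': 15, 'b': 15, 'c': 5},
                   {'a': 20, 'b': 10, 'c': 7},
                   {'a': 25, 'b': 30, 'c': 9}])

# apply runs on column 'a' only; assign returns a new data frame
# in which 'a' is overwritten and 'b', 'c' are carried over.
new_df = df.assign(a=df['a'].apply(lambda x: x / 2))

print(new_df)
print(df)  # the original data frame is left unchanged
```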

Answer by durjoy

If you are concerned about the execution speed of your apply function and have a large dataset to work on, you could use swifter to speed it up. Here is an example of swifter on a pandas dataframe:

import pandas as pd
import swifter

def fnc(m):
    return m*3+4

df = pd.DataFrame({"m": [1,2,3,4,5,6], "c": [1,1,1,1,1,1], "x":[5,3,6,2,6,1]})

# apply a self created function to a single column in pandas
df["y"] = df.m.swifter.apply(fnc)

This lets swifter spread the work across all your CPU cores, so on large data it can be much faster than a plain apply. Try it and let me know if it is useful for you.

Answer by Harry_pb

Let me try a more complex computation using datetime while handling nulls and empty strings. I subtract 30 years from a datetime column using the apply method with a lambda, and convert the datetime format along the way. The "if x != '' else x" clause leaves empty strings (which is what the nulls are filled with) unchanged.

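import datetime

# Replace nulls with empty strings so the lambda can skip them
df['Date'] = df['Date'].fillna('')
# Parse MM/DD/YYYY, subtract 30*365 days, and format as YYYYMMDD
df['Date'] = df['Date'].apply(lambda x : ((datetime.datetime.strptime(str(x), '%m/%d/%Y') - datetime.timedelta(days=30*365)).strftime('%Y%m%d')) if x != '' else x)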
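
The answer does not include sample data, so here is a runnable sketch of the same logic with made-up dates. The helper name shift_back_30_years and the sample values are assumptions for illustration; the helper simply unrolls the lambda above into named steps.

```python
import datetime

import pandas as pd


def shift_back_30_years(value):
    """Hypothetical helper mirroring the lambda above, one step per line."""
    if value == '':
        return value  # empty strings (former nulls) pass through untouched
    parsed = datetime.datetime.strptime(str(value), '%m/%d/%Y')
    shifted = parsed - datetime.timedelta(days=30 * 365)
    return shifted.strftime('%Y%m%d')


# Made-up sample data with one missing value
df = pd.DataFrame({'Date': ['01/15/2020', None, '03/01/2021']})

df['Date'] = df['Date'].fillna('')
df['Date'] = df['Date'].apply(shift_back_30_years)

print(df['Date'].tolist())  # ['19900122', '', '19910309']
```

Note that 30*365 days ignores leap days, so the result lands about a week after the same calendar date 30 years earlier (for example 1990-01-22 rather than 1990-01-15).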