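Python Pandas: How to use the apply() function on a single column?
Notice: this page is a Chinese-English bilingual rendering of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. If you use it, you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/34962104/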
Pandas: How can I use the apply() function for a single column?
Asked by Amani
I have a pandas data frame with two columns. I need to change the values of the first column without affecting the second one and get back the whole data frame with just first column values changed. How can I do that using apply in pandas?
Accepted answer by Fabio Lamanna
Given a sample dataframe df as:
a,b
1,2
2,3
3,4
4,5
What you want is:
df['a'] = df['a'].apply(lambda x: x + 1)
which returns:
   a  b
0  2  2
1  3  3
2  4  4
3  5  5
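A self-contained sketch of the same approach, useful for checking that only column a changes. The sample frame is typed in by hand rather than read from a file, and the helper name add_one is an illustrative assumption, not something from the answer.

```python
import pandas as pd

# Rebuild the sample frame from the answer (values typed in by hand).
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [2, 3, 4, 5]})


def add_one(x):
    # Hypothetical helper: receives one cell value of column 'a' at a time.
    return x + 1


# Series.apply calls add_one on each element and returns a new Series;
# assigning it back replaces only column 'a' and leaves 'b' untouched.
df["a"] = df["a"].apply(add_one)

print(df)
```

A named function and a lambda behave the same way here; the named form is just easier to test and reuse.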
Answer by George Petrov
For a single column it is better to use map(), like this:
df = pd.DataFrame([{'a': 15, 'b': 15, 'c': 5}, {'a': 20, 'b': 10, 'c': 7}, {'a': 25, 'b': 30, 'c': 9}])
    a   b  c
0  15  15  5
1  20  10  7
2  25  30  9
df['a'] = df['a'].map(lambda a: a / 2.)
      a   b  c
0   7.5  15  5
1  10.0  10  7
2  12.5  30  9
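As a rough illustration of how map() relates to apply() on a single column, the sketch below runs both with the same lambda and then uses map() with a dict, which is one thing map() supports beyond plain functions. The smaller sample frame and the labels "low" and "mid" are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({"a": [15, 20, 25], "b": [15, 10, 30]})

# With a plain function, map and apply give the same element-wise result.
halved_map = df["a"].map(lambda a: a / 2.0)
halved_apply = df["a"].apply(lambda a: a / 2.0)
same = halved_map.equals(halved_apply)

# map also accepts a dict as a lookup table; values that are not
# keys of the dict (25 here) come back as NaN.
labels = df["a"].map({15: "low", 20: "mid"})

print(same)
print(labels)
```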
Answer by Mike Müller
You don't need a function at all. You can work on a whole column directly.
Example data:
>>> df = pd.DataFrame({'a': [100, 1000], 'b': [200, 2000], 'c': [300, 3000]})
>>> df
      a     b     c
0   100   200   300
1  1000  2000  3000
Halve all the values in column a:
>>> df.a = df.a / 2
>>> df
       a     b     c
0   50.0   200   300
1  500.0  2000  3000
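A small sketch, under the assumption of the same example data, comparing the direct column operation with the apply-based version. It only checks that the two agree; it makes no claim about timings.

```python
import pandas as pd

df = pd.DataFrame({"a": [100, 1000], "b": [200, 2000], "c": [300, 3000]})

# Element-by-element: the lambda is called once per value.
via_apply = df["a"].apply(lambda x: x / 2)

# Whole-column arithmetic: pandas divides the entire Series at once.
via_vector = df["a"] / 2

# Both produce the same float64 column.
agree = via_apply.equals(via_vector)

# Bracket access is used for assignment here; attribute access (df.a)
# only works when the column name is a valid Python identifier and
# does not clash with an existing DataFrame attribute.
df["a"] = via_vector
print(df)
```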
Answer by Thibaut Dubernet
Although the given responses are correct, they modify the initial data frame, which is not always desirable (and, given that the OP asked for examples "using apply", they may have wanted a version that returns a new data frame, as apply does).
This is possible using assign: it is valid to assign to existing columns, as the documentation states (emphasis is mine):
Assign new columns to a DataFrame.
Returns a new object with all original columns in addition to new ones. Existing columns that are re-assigned will be overwritten.
In short:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame([{'a': 15, 'b': 15, 'c': 5}, {'a': 20, 'b': 10, 'c': 7}, {'a': 25, 'b': 30, 'c': 9}])
In [3]: df.assign(a=lambda df: df.a / 2)
Out[3]:
      a   b  c
0   7.5  15  5
1  10.0  10  7
2  12.5  30  9
In [4]: df
Out[4]:
    a   b  c
0  15  15  5
1  20  10  7
2  25  30  9
Note that the function will be passed the whole dataframe, not only the column you want to modify, so you will need to make sure you select the right column in your lambda.
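To make the point about the lambda receiving the whole frame concrete, here is a minimal sketch that combines assign with apply on one column. The parameter name d and the result names new_df and new_df2 are chosen for this example only.

```python
import pandas as pd

df = pd.DataFrame({"a": [15, 20, 25], "b": [15, 10, 30], "c": [5, 7, 9]})

# Callable form: the outer lambda receives the whole frame (d), so the
# column has to be selected explicitly before calling apply on it.
new_df = df.assign(a=lambda d: d["a"].apply(lambda x: x / 2))

# Non-callable form: pass the already computed column directly.
new_df2 = df.assign(a=df["a"] / 2)

print(new_df)
print(df)  # the original frame still holds 15, 20, 25 in column 'a'
```

The callable form is mostly useful in method chains, where the intermediate frame has no name yet.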
Answer by durjoy
If you are concerned about the execution speed of your apply function and have a huge dataset to work on, you could use swifter to speed up execution. Here is an example of swifter on a pandas dataframe:
import pandas as pd
import swifter  # third-party package; importing it registers the .swifter accessor
def fnc(m):
    return m*3+4
df = pd.DataFrame({"m": [1,2,3,4,5,6], "c": [1,1,1,1,1,1], "x":[5,3,6,2,6,1]})
# apply a self-created function to a single column in pandas
df["y"] = df.m.swifter.apply(fnc)
This lets swifter use all of your CPU cores to compute the result, so on large datasets it can be much faster than a normal apply. Try it and let me know if it is useful for you.
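Since swifter is a separate third-party package that may not be installed, one possible way to keep a script runnable either way is to fall back to the plain apply. This is only a sketch: the HAVE_SWIFTER flag is a name invented for this example, and the last line simply shows the whole-column form of the same arithmetic for comparison.

```python
import pandas as pd

try:
    import swifter  # noqa: F401  (optional; registers the .swifter accessor)
    HAVE_SWIFTER = True
except ImportError:
    HAVE_SWIFTER = False  # hypothetical flag used only in this sketch


def fnc(m):
    return m * 3 + 4


df = pd.DataFrame({"m": [1, 2, 3, 4, 5, 6]})

if HAVE_SWIFTER:
    df["y"] = df["m"].swifter.apply(fnc)
else:
    # Same result, computed by the ordinary pandas apply.
    df["y"] = df["m"].apply(fnc)

# For simple arithmetic like this, the whole-column expression
# gives the same values without calling fnc per element.
df["y_vec"] = df["m"] * 3 + 4
print(df)
```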
Answer by Harry_pb
Let me try a more complex computation that uses datetime and handles nulls or empty strings. I subtract 30 years from a datetime column, using the apply method together with a lambda, and convert the datetime format. The part if x != '' else x takes care of all empty strings or nulls accordingly.
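import datetime
# Nulls become '' so the lambda can skip them; note that 30*365 days ignores leap days.
df['Date'] = df['Date'].fillna('')
df['Date'] = df['Date'].apply(lambda x: (datetime.datetime.strptime(str(x), '%m/%d/%Y') - datetime.timedelta(days=30*365)).strftime('%Y%m%d') if x != '' else x)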
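For comparison, here is a sketch of a different technique from the one in the answer: parsing the whole column with pd.to_datetime and shifting by calendar years with pd.DateOffset, instead of a per-element lambda. The sample dates and the column name Date_minus_30y are made up for this example. The last lines rerun the answer's fixed-day subtraction on one value to show how it differs from a calendar shift.

```python
import datetime

import pandas as pd

# Hypothetical sample data: two valid dates and one missing value.
df = pd.DataFrame({"Date": ["01/15/2020", None, "07/04/2005"]})

# Parse once for the whole column; missing or unparseable entries become NaT.
parsed = pd.to_datetime(df["Date"], format="%m/%d/%Y", errors="coerce")

# Calendar-aware shift: move every date back 30 calendar years.
shifted = parsed - pd.DateOffset(years=30)

# Format back to text; NaT turns into a missing value, replaced here by ''.
df["Date_minus_30y"] = shifted.dt.strftime("%Y%m%d").fillna("")

# The fixed-day subtraction from the answer, on a single value. 30*365 days
# skips the leap days in the interval, so the result lands a few days later.
fixed_days = (
    datetime.datetime.strptime("01/15/2020", "%m/%d/%Y")
    - datetime.timedelta(days=30 * 365)
).strftime("%Y%m%d")

print(df)
print(fixed_days)
```

Whether the calendar shift or the fixed number of days is the right behaviour depends on what "30 years" should mean for the data at hand.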