Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must likewise follow CC BY-SA, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/37283123/
Applying a function to two columns of pandas dataframe to get two new columns
Question by ahoosh
I have a pandas data frame with columns Longitude and Latitude. I'd like to get X and Y from them. There is a function in utm called from_latlon that does this: it receives Latitude and Longitude and returns X and Y as the first two elements of its result. Here's what I do:
import utm

def get_X(row):
    return utm.from_latlon(row['Latitude'], row['Longitude'])[0]

def get_Y(row):
    return utm.from_latlon(row['Latitude'], row['Longitude'])[1]

# Each apply runs utm.from_latlon once per row, so every row is converted twice.
df['X'] = df.apply(get_X, axis=1)
df['Y'] = df.apply(get_Y, axis=1)
I'd like to define a function get_XY that calls from_latlon just once per row, to save time. I looked here, here and here, but I could not find a way to create two columns with a single apply call. Thanks.
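To make the cost concrete, here is a minimal sketch that counts converter calls for both patterns. It assumes a hypothetical stand-in, fake_from_latlon, in place of utm.from_latlon (it is not a real UTM projection), plus a hypothetical call_count counter, and it uses result_type="expand" (available since pandas 0.23) so that one apply produces two columns.

```python
import pandas as pd

# Hypothetical counter: records how many times the converter runs.
call_count = {"n": 0}

def fake_from_latlon(lat, lon):
    """Hypothetical stand-in for utm.from_latlon; NOT a real UTM projection."""
    call_count["n"] += 1
    return lat * 1000.0, lon * 1000.0  # pretend (X, Y)

df = pd.DataFrame({"Latitude": [51.2, 48.9, 40.7], "Longitude": [7.5, 2.3, -74.0]})

# Pattern from the question: two apply calls, so the converter runs twice per row.
call_count["n"] = 0
df["X"] = df.apply(lambda row: fake_from_latlon(row["Latitude"], row["Longitude"])[0], axis=1)
df["Y"] = df.apply(lambda row: fake_from_latlon(row["Latitude"], row["Longitude"])[1], axis=1)
two_pass_calls = call_count["n"]

def get_XY(row):
    # One converter call per row; keep the first two elements, as the
    # question's code does with [0] and [1].
    return fake_from_latlon(row["Latitude"], row["Longitude"])[:2]

# result_type="expand" turns each returned tuple into columns 0 and 1.
call_count["n"] = 0
xy = df.apply(get_XY, axis=1, result_type="expand")
df["X"], df["Y"] = xy[0], xy[1]
one_pass_calls = call_count["n"]

print(two_pass_calls, one_pass_calls)
```

In recent pandas the first pattern calls the converter twice per row and get_XY once per row; releases before 1.1 could evaluate the first row twice, so exact counts may differ there.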
Answer by BrenBarn
You can return a list from your function:
import pandas

d = pandas.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [8, 88, 0, -8, -88]
})

def foo(row):
    return [row["A"]+row["B"], row["A"]-row["B"]]
>>> d.apply(foo, axis=1)
    A   B
0   9  -7
1  90 -86
2   3   3
3  -4  12
4 -83  93
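A version caveat worth checking: the output above matches older pandas, where a returned list whose length equalled the number of columns was mapped back onto the original column names. Since pandas 0.23, DataFrame.apply takes a result_type argument, and with the default (None) a list-like return value gives a Series of lists instead. A minimal sketch comparing the modes, reusing d and foo from the answer:

```python
import pandas as pd

d = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [8, 88, 0, -8, -88],
})

def foo(row):
    return [row["A"] + row["B"], row["A"] - row["B"]]

# Default (result_type=None): each row's list stays in one cell -> a Series of lists.
as_series = d.apply(foo, axis=1)

# "expand": the list elements become columns, labelled 0 and 1.
expanded = d.apply(foo, axis=1, result_type="expand")

# "broadcast": results keep the original index and columns (A, B),
# which reproduces the output shown in the answer.
broadcast = d.apply(foo, axis=1, result_type="broadcast")

print(as_series)
print(expanded)
print(broadcast)
```

"expand" is usually what you want for new columns; rename 0 and 1 afterwards, or return a Series as in the next part of the answer.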
You can also return a Series. This lets you specify the column names of the return value:
def foo(row):
    return pandas.Series({"X": row["A"]+row["B"], "Y": row["A"]-row["B"]})
>>> d.apply(foo, axis=1)
    X   Y
0   9  -7
1  90 -86
2   3   3
3  -4  12
4 -83  93
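The answer stops at the new two-column frame; to keep the original columns as well, one option is to join the result back on the shared index. A minimal sketch, assuming the same d and Series-returning foo as above:

```python
import pandas as pd

d = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [8, 88, 0, -8, -88],
})

def foo(row):
    # Returning a Series: its index labels become the new column names.
    return pd.Series({"X": row["A"] + row["B"], "Y": row["A"] - row["B"]})

new_cols = d.apply(foo, axis=1)  # DataFrame with columns X and Y, same index as d
d = d.join(new_cols)             # original A, B followed by X, Y
print(d)
```

Because a returned Series is expanded into columns even with the default result_type, this pattern does not depend on the version-specific handling of plain lists.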
Answer by RufusVS
I merged a couple of the answers from a similar thread and now have a generic multi-column in, multi-column out template I use in Jupyter/pandas:
# A plain old function doesn't know about rows/columns; it just does its job.
def my_func(arg1, arg2):
    return arg1 + arg2, arg1 - arg2  # return multiple responses

# zip(*...) transposes the Series of (sum, difference) tuples into two sequences.
df['sum'], df['difference'] = zip(*df.apply(lambda x: my_func(x['first'], x['second']), axis=1))
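The template assumes a frame that already has columns named first and second. Here is a minimal runnable sketch with hypothetical sample data, plus a variant (an addition, not part of the original answer) that feeds the two columns to my_func through map instead of apply:

```python
import pandas as pd

# A plain function that knows nothing about rows or columns.
def my_func(arg1, arg2):
    return arg1 + arg2, arg1 - arg2

# Hypothetical sample data; the template expects columns named "first" and "second".
df = pd.DataFrame({"first": [10, 20, 30], "second": [1, 2, 3]})

# apply(axis=1) yields a Series of (sum, difference) tuples; zip(*...) transposes
# it into one tuple of sums and one tuple of differences.
df["sum"], df["difference"] = zip(*df.apply(lambda x: my_func(x["first"], x["second"]), axis=1))

# Variant without apply: map walks the two columns in parallel, so no per-row
# Series is built; the zip(*...) unpacking works the same way.
df["sum_map"], df["difference_map"] = zip(*map(my_func, df["first"], df["second"]))

print(df)
```

The map variant avoids constructing a Series for every row, which is often faster, but it is worth timing on your own data. Since this particular my_func only uses + and -, calling it on whole columns, my_func(df["first"], df["second"]), would also work.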