
Note: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/37283123/

Time: 2020-09-14 01:14:52  Source: igfitidea

Applying a function to two columns of pandas dataframe to get two new columns

Tags: python, pandas, multiple-columns, apply

Asked by ahoosh

I have a pandas data frame with columns Longitude and Latitude. I'd like to get X and Y from them. The utm package has a function called from_latlon that does this: it receives Latitude and Longitude and gives [X, Y]. Here's what I do:


    import utm

    def get_X(row):
        return utm.from_latlon(row['Latitude'], row['Longitude'])[0]

    def get_Y(row):
        return utm.from_latlon(row['Latitude'], row['Longitude'])[1]

    df['X'] = df.apply(get_X, axis=1)
    df['Y'] = df.apply(get_Y, axis=1)

I'd like to define a function get_XY and apply from_latlon just one time to save time. I took a look here, here and here, but I could not find a way to make two columns with one apply function. Thanks.


Answered by BrenBarn

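You can return a list from your function. In pandas 0.23 and later, pass result_type="expand" so that the list is spread across columns; without it, apply returns a Series of lists:

    import pandas

    d = pandas.DataFrame({
        "A": [1, 2, 3, 4, 5],
        "B": [8, 88, 0, -8, -88]
    })

    def foo(row):
        # Compute both results in a single pass over the row.
        return [row["A"] + row["B"], row["A"] - row["B"]]

    >>> d.apply(foo, axis=1, result_type="expand")
        0   1
    0   9  -7
    1  90 -86
    2   3   3
    3  -4  12
    4 -83  93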
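
To attach the two results to the original frame as new columns, one option is to assign the expanded result to a list of labels. This is a minimal sketch rather than part of the original answer; the column names X and Y are chosen here for illustration, and it assumes pandas 0.23 or later for result_type.

```python
import pandas as pd

d = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [8, 88, 0, -8, -88],
})

def foo(row):
    # Return both results from a single call per row.
    return [row["A"] + row["B"], row["A"] - row["B"]]

# result_type="expand" spreads each returned list across columns;
# assigning to a list of labels creates both new columns in one statement.
d[["X", "Y"]] = d.apply(foo, axis=1, result_type="expand")
print(d)
```

The function runs once per row, so both new columns come from a single apply pass.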

You can also return a Series. This lets you specify the column names of the return value:


    def foo(row):
        return pandas.Series({"X": row["A"] + row["B"], "Y": row["A"] - row["B"]})

    >>> d.apply(foo, axis=1)
        X   Y
    0   9  -7
    1  90 -86
    2   3   3
    3  -4  12
    4 -83  93
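
Applied to the question's Latitude/Longitude case, the Series approach might look like the sketch below. Note the assumptions: from_latlon_stub is a hypothetical stand-in so the example runs without the utm package (it is not a real projection), and the sample coordinates are made up. With utm installed, utm.from_latlon(latitude, longitude) would take its place; it returns easting and northing as its first two elements.

```python
import pandas as pd

def from_latlon_stub(latitude, longitude):
    # Hypothetical stand-in for utm.from_latlon so this sketch is runnable
    # without the utm package. It is NOT a real coordinate conversion.
    return (longitude * 1000.0, latitude * 1000.0)

def get_XY(row):
    # One conversion call per row; the Series labels become column names.
    x, y = from_latlon_stub(row["Latitude"], row["Longitude"])[:2]
    return pd.Series({"X": x, "Y": y})

# Made-up sample data for illustration only.
df = pd.DataFrame({
    "Latitude": [51.25, 49.5],
    "Longitude": [7.5, 8.25],
})

# A single apply call fills both new columns.
df[["X", "Y"]] = df.apply(get_XY, axis=1)
print(df)
```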

Answered by RufusVS

I merged a couple of the answers from a similar thread and now have a generic multi-column in, multi-column out template I use in Jupyter/pandas:


    # A plain old function doesn't know about rows/columns; it just does its job.
    def my_func(arg1, arg2):
        return arg1 + arg2, arg1 - arg2  # return multiple responses

    df['sum'], df['difference'] = zip(*df.apply(lambda x: my_func(x['first'], x['second']), axis=1))
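
For reference, here is the template run end to end on a small frame. The sample values in the first and second columns are made up for illustration. The second assignment is a variant not in the original answer: it maps the function over the two columns directly, which skips building a row object for each call and may be faster on large frames, though that is worth measuring on your own data.

```python
import pandas as pd

def my_func(arg1, arg2):
    # Plain function: returns two results as a tuple.
    return arg1 + arg2, arg1 - arg2

# Made-up sample data for illustration only.
df = pd.DataFrame({"first": [1, 2, 3], "second": [10, 20, 30]})

# apply yields one tuple per row; zip(*...) regroups them into
# one tuple per output column.
df["sum"], df["difference"] = zip(
    *df.apply(lambda x: my_func(x["first"], x["second"]), axis=1)
)

# Variant (an assumption, not from the original answer): skip apply and
# map the function over the two columns element by element.
df["sum2"], df["difference2"] = zip(*map(my_func, df["first"], df["second"]))
print(df)
```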