Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/23690284/

Time: 2020-08-19 03:19:34  Source: igfitidea

pandas apply function that returns multiple values to rows in pandas dataframe

Tags: python, pandas, dataframe, apply, iterable-unpacking

Asked by Fra

I have a dataframe with a time index and 3 columns containing the coordinates of a 3D vector:

                         x             y             z
ts
2014-05-15 10:38         0.120117      0.987305      0.116211
2014-05-15 10:39         0.117188      0.984375      0.122070
2014-05-15 10:40         0.119141      0.987305      0.119141
2014-05-15 10:41         0.116211      0.984375      0.120117
2014-05-15 10:42         0.119141      0.983398      0.118164

I would like to apply a transformation to each row that also returns a vector:

def myfunc(a, b, c):
    # do something that computes e, f, g
    return e, f, g

But if I do:

df.apply(myfunc, axis=1)

I end up with a pandas Series whose elements are tuples. This is because apply takes the result of myfunc without unpacking it. How can I change myfunc so that I obtain a new df with 3 columns?
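
A minimal sketch that reproduces the behaviour described above. The sample frame copies the first three rows of the question's data; the arithmetic inside myfunc is a placeholder assumption, not the real transformation.

```python
import pandas as pd

# First three rows of the frame shown in the question.
df = pd.DataFrame(
    {
        "x": [0.120117, 0.117188, 0.119141],
        "y": [0.987305, 0.984375, 0.987305],
        "z": [0.116211, 0.122070, 0.119141],
    },
    index=pd.to_datetime(
        ["2014-05-15 10:38", "2014-05-15 10:39", "2014-05-15 10:40"]
    ),
)
df.index.name = "ts"


def myfunc(row):
    # Placeholder transformation (an assumption for illustration only).
    return row["x"] + 1, row["y"] + 1, row["z"] + 1


result = df.apply(myfunc, axis=1)

print(type(result).__name__)          # Series
print(type(result.iloc[0]).__name__)  # tuple
```

Each row's tuple ends up as a single cell of the resulting Series rather than being spread over three columns.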

Edit:

All of the solutions below work. The Series solution allows for column names, while the list solution seems to execute faster.

def myfunc1(args):
    e=args[0] + 2*args[1]
    f=args[1]*args[2] +1
    g=args[2] + args[0] * args[1]
    return pd.Series([e,f,g], index=['a', 'b', 'c'])

def myfunc2(args):
    e=args[0] + 2*args[1]
    f=args[1]*args[2] +1
    g=args[2] + args[0] * args[1]
    return [e,f,g]

%timeit df.apply(myfunc1 ,axis=1)

100 loops, best of 3: 4.51 ms per loop

%timeit df.apply(myfunc2 ,axis=1)

100 loops, best of 3: 2.75 ms per loop
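
%timeit is an IPython magic. Outside IPython, a comparable measurement can be sketched with the standard library timeit module. The function names series_version and list_version and the run count are assumptions for illustration, and the absolute timings will depend on the machine and the pandas version.

```python
import timeit

import pandas as pd

# First three rows of the frame shown in the question.
df = pd.DataFrame(
    {
        "x": [0.120117, 0.117188, 0.119141],
        "y": [0.987305, 0.984375, 0.987305],
        "z": [0.116211, 0.122070, 0.119141],
    }
)


def series_version(row):
    e = row["x"] + 2 * row["y"]
    f = row["y"] * row["z"] + 1
    g = row["z"] + row["x"] * row["y"]
    return pd.Series([e, f, g], index=["a", "b", "c"])


def list_version(row):
    e = row["x"] + 2 * row["y"]
    f = row["y"] * row["z"] + 1
    g = row["z"] + row["x"] * row["y"]
    return [e, f, g]


n_runs = 20  # small, arbitrary repeat count
t_series = timeit.timeit(lambda: df.apply(series_version, axis=1), number=n_runs)
t_list = timeit.timeit(lambda: df.apply(list_version, axis=1), number=n_runs)

print(f"Series: {t_series / n_runs * 1e3:.3f} ms per call")
print(f"list:   {t_list / n_runs * 1e3:.3f} ms per call")
```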

Accepted answer by Happy001

Just return a list instead of a tuple.

In [81]: df
Out[81]: 
                            x         y         z
ts                                               
2014-05-15 10:38:00  0.120117  0.987305  0.116211
2014-05-15 10:39:00  0.117188  0.984375  0.122070
2014-05-15 10:40:00  0.119141  0.987305  0.119141
2014-05-15 10:41:00  0.116211  0.984375  0.120117
2014-05-15 10:42:00  0.119141  0.983398  0.118164

[5 rows x 3 columns]

In [82]: def myfunc(args):
   ....:        e=args[0] + 2*args[1]
   ....:        f=args[1]*args[2] +1
   ....:        g=args[2] + args[0] * args[1]
   ....:        return [e,f,g]
   ....: 

In [83]: df.apply(myfunc ,axis=1)
Out[83]: 
                            x         y         z
ts                                               
2014-05-15 10:38:00  2.094727  1.114736  0.234803
2014-05-15 10:39:00  2.085938  1.120163  0.237427
2014-05-15 10:40:00  2.093751  1.117629  0.236770
2014-05-15 10:41:00  2.084961  1.118240  0.234512
2014-05-15 10:42:00  2.085937  1.116202  0.235327
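
The output above reflects the pandas version in use when this answer was written. In pandas 0.23 and later, a list returned from the applied function is kept as one object per row unless result_type is passed. A minimal sketch, assuming a recent pandas, that retains the original x, y, z labels through result_type='broadcast' (the sample frame copies the first three rows of the data above):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "x": [0.120117, 0.117188, 0.119141],
        "y": [0.987305, 0.984375, 0.987305],
        "z": [0.116211, 0.122070, 0.119141],
    },
    index=pd.to_datetime(
        ["2014-05-15 10:38", "2014-05-15 10:39", "2014-05-15 10:40"]
    ),
)
df.index.name = "ts"


def myfunc(row):
    e = row["x"] + 2 * row["y"]
    f = row["y"] * row["z"] + 1
    g = row["z"] + row["x"] * row["y"]
    return [e, f, g]


# Without result_type, recent pandas stores each list in a single cell.
as_series = df.apply(myfunc, axis=1)

# 'broadcast' writes the values back into the original shape and labels.
as_frame = df.apply(myfunc, axis=1, result_type="broadcast")
print(as_frame)
```

This only works when the function returns exactly as many values as the frame has columns.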

Answer by U2EF1

Return a Series and it will put them in a DataFrame.

def myfunc(a, b, c):
    # do something that computes e, f, g
    return pd.Series([e, f, g])

This has the bonus that you can give labels to each of the resulting columns. If you return a DataFrame it just inserts multiple rows for the group.
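
A runnable sketch of the labelling point, assuming the function receives each row as a Series when axis=1 is used. The arithmetic is borrowed from the question's edit, and the labels e, f, g are illustrative choices.

```python
import pandas as pd

# First three rows of the frame shown in the question.
df = pd.DataFrame(
    {
        "x": [0.120117, 0.117188, 0.119141],
        "y": [0.987305, 0.984375, 0.987305],
        "z": [0.116211, 0.122070, 0.119141],
    }
)


def myfunc(row):
    e = row["x"] + 2 * row["y"]
    f = row["y"] * row["z"] + 1
    g = row["z"] + row["x"] * row["y"]
    # The index of the returned Series becomes the column labels.
    return pd.Series([e, f, g], index=["e", "f", "g"])


result = df.apply(myfunc, axis=1)
print(result.columns.tolist())  # ['e', 'f', 'g']
```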

Answer by Fra

Found a possible solution: change myfunc to return an np.array like this:

import numpy as np

def myfunc(a, b, c):
    # do something that computes e, f, g
    return np.array((e, f, g))

Any better solution?
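
One alternative sketch, assuming a recent pandas in which apply keeps one array per row in a Series: collect the per-row results and build the DataFrame explicitly. The column names e, f, g and the placeholder arithmetic are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# First three rows of the frame shown in the question.
df = pd.DataFrame(
    {
        "x": [0.120117, 0.117188, 0.119141],
        "y": [0.987305, 0.984375, 0.987305],
        "z": [0.116211, 0.122070, 0.119141],
    }
)


def myfunc(row):
    # Placeholder transformation returning an array of three values.
    return np.array((row["x"] + 1, row["y"] * 2, row["z"] - 1))


# One array per row, collected into a plain Python list.
raw = df.apply(myfunc, axis=1).tolist()

# Build the three-column frame explicitly, reusing the original index.
expanded = pd.DataFrame(raw, index=df.index, columns=["e", "f", "g"])
print(expanded)
```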

Answer by Dennis Golomazov

Based on the excellent answer by @U2EF1, I've created a handy function that applies a specified function returning tuples to a dataframe field and expands the result back into the dataframe.

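import pandas as pd

def apply_and_concat(dataframe, field, func, column_names):
    # Apply func to each cell of `field`, expand the returned tuple into
    # named columns, and append those columns to the original dataframe.
    return pd.concat((
        dataframe,
        dataframe[field].apply(
            lambda cell: pd.Series(func(cell), index=column_names))), axis=1)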

Usage:

df = pd.DataFrame([1, 2, 3], index=['a', 'b', 'c'], columns=['A'])
print(df)
   A
a  1
b  2
c  3

def func(x):
    return x*x, x*x*x

print(apply_and_concat(df, 'A', func, ['x^2', 'x^3']))

   A  x^2  x^3
a  1    1    1
b  2    4    8
c  3    9   27

Hope it helps someone.

Answer by Genarito

I tried returning a tuple (I was using functions like scipy.stats.pearsonr, which return that kind of structure), but it returned a 1D Series instead of the DataFrame I expected. Creating a Series manually gave worse performance, so I fixed it using result_type, as explained in the official API documentation:

Returning a Series inside the function is similar to passing result_type='expand'. The resulting column names will be the Series index.

So you could edit your code this way:

def myfunc(row):
    a, b, c = row  # apply(axis=1) passes each row as a single Series
    # do something
    return (e, f, g)

df.apply(myfunc, axis=1, result_type='expand')
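
A self-contained sketch of the same idea, assuming pandas 0.23 or later (where result_type is available). The arithmetic is borrowed from the question's edit, and the final column names are illustrative.

```python
import pandas as pd

# First three rows of the frame shown in the question.
df = pd.DataFrame(
    {
        "x": [0.120117, 0.117188, 0.119141],
        "y": [0.987305, 0.984375, 0.987305],
        "z": [0.116211, 0.122070, 0.119141],
    }
)


def myfunc(row):
    a, b, c = row["x"], row["y"], row["z"]
    e = a + 2 * b
    f = b * c + 1
    g = c + a * b
    return (e, f, g)


result = df.apply(myfunc, axis=1, result_type="expand")
default_labels = result.columns.tolist()  # integer labels 0, 1, 2

# Rename afterwards, since a tuple carries no labels of its own.
result.columns = ["e", "f", "g"]
print(result)
```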