Python Pandas: apply a function that returns multiple values to rows in a pandas DataFrame

Question

Asked by Fra

I have a dataframe with a timeindex and 3 columns containing the coordinates of a 3D vector:

                         x             y             z
ts
2014-05-15 10:38         0.120117      0.987305      0.116211
2014-05-15 10:39         0.117188      0.984375      0.122070
2014-05-15 10:40         0.119141      0.987305      0.119141
2014-05-15 10:41         0.116211      0.984375      0.120117
2014-05-15 10:42         0.119141      0.983398      0.118164

I would like to apply a transformation to each row that also returns a vector:

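def myfunc(row):
    a, b, c = row  # apply(axis=1) passes each row as a single Series
    e, f, g = a + 2*b, b*c + 1, c + a*b  # example transformation (from the Edit below)
    return e, f, g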

but if I do:

df.apply(myfunc, axis=1)

I end up with a pandas Series whose elements are tuples. This is because apply takes the result of myfunc as-is, without unpacking it. How can I change myfunc so that I obtain a new DataFrame with 3 columns?
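A minimal sketch that reproduces the behaviour described above, assuming the first two rows of the sample data and using the transformation from the Edit below as a stand-in for `myfunc`. With the default `result_type`, each row's tuple is stored as one element of the resulting Series:

```python
import pandas as pd

# First two rows of the question's sample data
df = pd.DataFrame(
    {"x": [0.120117, 0.117188], "y": [0.987305, 0.984375], "z": [0.116211, 0.122070]},
    index=pd.DatetimeIndex(["2014-05-15 10:38", "2014-05-15 10:39"], name="ts"),
)


def myfunc(row):
    # apply(axis=1) hands each row over as a single Series
    a, b, c = row
    return a + 2 * b, b * c + 1, c + a * b  # a tuple, not three columns


result = df.apply(myfunc, axis=1)
print(type(result).__name__)  # Series
print(result.iloc[0])         # one tuple per row
```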

Edit:

All solutions below work. The Series solution allows column names; the list solution seems to execute faster.

import pandas as pd

def myfunc1(args):
    e=args[0] + 2*args[1]
    f=args[1]*args[2] +1
    g=args[2] + args[0] * args[1]
    return pd.Series([e,f,g], index=['a', 'b', 'c'])

def myfunc2(args):
    e=args[0] + 2*args[1]
    f=args[1]*args[2] +1
    g=args[2] + args[0] * args[1]
    return [e,f,g]

%timeit df.apply(myfunc1, axis=1)

100 loops, best of 3: 4.51 ms per loop

%timeit df.apply(myfunc2, axis=1)

100 loops, best of 3: 2.75 ms per loop

Answer 1

Accepted answer by Happy001

Just return a list instead of a tuple.

In [81]: df
Out[81]: 
                            x         y         z
ts                                               
2014-05-15 10:38:00  0.120117  0.987305  0.116211
2014-05-15 10:39:00  0.117188  0.984375  0.122070
2014-05-15 10:40:00  0.119141  0.987305  0.119141
2014-05-15 10:41:00  0.116211  0.984375  0.120117
2014-05-15 10:42:00  0.119141  0.983398  0.118164

[5 rows x 3 columns]

In [82]: def myfunc(args):
   ....:        e=args[0] + 2*args[1]
   ....:        f=args[1]*args[2] +1
   ....:        g=args[2] + args[0] * args[1]
   ....:        return [e,f,g]
   ....: 

In [83]: df.apply(myfunc ,axis=1)
Out[83]: 
                            x         y         z
ts                                               
2014-05-15 10:38:00  2.094727  1.114736  0.234803
2014-05-15 10:39:00  2.085938  1.120163  0.237427
2014-05-15 10:40:00  2.093751  1.117629  0.236770
2014-05-15 10:41:00  2.084961  1.118240  0.234512
2014-05-15 10:42:00  2.085937  1.116202  0.235327
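In newer pandas releases (0.23 onwards, as far as I know), a list returned from a row-wise apply is no longer expanded by default, so the plain call above comes back as a Series of lists. A sketch, assuming the question's first two sample rows, of getting the output shown above (same x, y, z columns) by passing `result_type='broadcast'`:

```python
import pandas as pd

# First two rows of the question's sample data
df = pd.DataFrame(
    {"x": [0.120117, 0.117188], "y": [0.987305, 0.984375], "z": [0.116211, 0.122070]},
    index=pd.DatetimeIndex(["2014-05-15 10:38", "2014-05-15 10:39"], name="ts"),
)


def myfunc(args):
    a, b, c = args  # unpack the row's x, y, z values
    return [a + 2 * b, b * c + 1, c + a * b]


# Default result_type: each row's list stays a single element of a Series
plain = df.apply(myfunc, axis=1)

# 'broadcast' maps each 3-element list back onto the original x, y, z columns
out = df.apply(myfunc, axis=1, result_type="broadcast")
print(out)
```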

Answer 2

Answer by U2EF1

Return a Series and apply will put the results in a DataFrame.

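import pandas as pd

def myfunc(row):
    a, b, c = row  # each row arrives as a Series
    e, f, g = a + 2*b, b*c + 1, c + a*b  # example transformation
    return pd.Series([e, f, g])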

This has the bonus that you can label each resulting column through the Series index. If you return a DataFrame instead, it just inserts multiple rows for the group.
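A short sketch of the labelled variant, assuming the question's first two sample rows and the Edit's transformation; the column names e, f, g are illustrative:

```python
import pandas as pd

# First two rows of the question's sample data
df = pd.DataFrame(
    {"x": [0.120117, 0.117188], "y": [0.987305, 0.984375], "z": [0.116211, 0.122070]},
    index=pd.DatetimeIndex(["2014-05-15 10:38", "2014-05-15 10:39"], name="ts"),
)


def myfunc(row):
    a, b, c = row  # unpack the row's x, y, z values
    # The Series index ("e", "f", "g" here) becomes the result's column labels
    return pd.Series([a + 2 * b, b * c + 1, c + a * b], index=["e", "f", "g"])


out = df.apply(myfunc, axis=1)
print(out)
```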

Answer 3

Answer by Fra

I found a possible solution by changing myfunc to return an np.array like this:

import numpy as np

def myfunc(row):
    a, b, c = row  # each row arrives as a Series
    e, f, g = a + 2*b, b*c + 1, c + a*b  # example transformation
    return np.array((e, f, g))

Any better solution?
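One alternative, not from the original thread: when the transformation is plain arithmetic (as with the e, f, g formulas in the question's Edit), it can be written as vectorized column operations and skip `apply` altogether. A sketch assuming that transformation and the question's first two sample rows; it does not help for functions that only work on scalars:

```python
import pandas as pd

# First two rows of the question's sample data
df = pd.DataFrame(
    {"x": [0.120117, 0.117188], "y": [0.987305, 0.984375], "z": [0.116211, 0.122070]},
    index=pd.DatetimeIndex(["2014-05-15 10:38", "2014-05-15 10:39"], name="ts"),
)

# Each expression operates on whole columns at once; the index is carried along
out = pd.DataFrame(
    {
        "e": df["x"] + 2 * df["y"],
        "f": df["y"] * df["z"] + 1,
        "g": df["z"] + df["x"] * df["y"],
    }
)
print(out)
```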

Answer 4

Answer by Dennis Golomazov

Based on the excellent answer by @U2EF1, I've created a handy function that applies a specified tuple-returning function to a DataFrame field and expands the result back into the DataFrame.

import pandas as pd

def apply_and_concat(dataframe, field, func, column_names):
    # Wrap each returned tuple in a labelled Series, then append as new columns
    return pd.concat((
        dataframe,
        dataframe[field].apply(
            lambda cell: pd.Series(func(cell), index=column_names))), axis=1)

Usage:

df = pd.DataFrame([1, 2, 3], index=['a', 'b', 'c'], columns=['A'])
print(df)
   A
a  1
b  2
c  3

def func(x):
    return x*x, x*x*x

print(apply_and_concat(df, 'A', func, ['x^2', 'x^3']))

   A  x^2  x^3
a  1    1    1
b  2    4    8
c  3    9   27

Hope it helps someone.

Answer 5

Answer by Genarito

I tried returning a tuple (I was using functions like scipy.stats.pearsonr, which return that kind of structure), but it returned a 1D Series instead of the DataFrame I expected. Creating a Series manually made performance worse, so I fixed it using result_type, as explained in the official API documentation:

Returning a Series inside the function is similar to passing result_type='expand'. The resulting column names will be the Series index.

So you could edit your code this way:

def myfunc(row):
    a, b, c = row  # each row arrives as a Series
    # do something, e.g.:
    e, f, g = a + 2*b, b*c + 1, c + a*b
    return (e, f, g)

df.apply(myfunc, axis=1, result_type='expand')
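A runnable sketch of the `result_type='expand'` approach, assuming the question's first two sample rows and the Edit's transformation. Since `expand` numbers the new columns 0, 1, 2, the sketch renames them afterwards (the labels e, f, g are illustrative):

```python
import pandas as pd

# First two rows of the question's sample data
df = pd.DataFrame(
    {"x": [0.120117, 0.117188], "y": [0.987305, 0.984375], "z": [0.116211, 0.122070]},
    index=pd.DatetimeIndex(["2014-05-15 10:38", "2014-05-15 10:39"], name="ts"),
)


def myfunc(row):
    a, b, c = row  # unpack the row's x, y, z values
    return (a + 2 * b, b * c + 1, c + a * b)  # plain tuple, no Series built per row


out = df.apply(myfunc, axis=1, result_type="expand")
out.columns = ["e", "f", "g"]  # expand labels the new columns 0, 1, 2
print(out)
```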
