Pandas: cumulative return function

Question

Asked by Kelaref

I have a dataframe such as the following:

  Index      Return
2008-11-21   0.153419
2008-11-24   0.037421
2008-11-25   0.077500

What's the best way to calculate the cumulative return for every column and add it as the last row?

Following is the intended result:

  Index      Return
2008-11-21   0.153419
2008-11-24   0.037421
2008-11-25   0.077500
Cumulative   0.289316

where the cumulative return is calculated as follows:

cumulative = (1 + return1) * (1 + return2) * (1 + return3) - 1
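
As a quick sanity check of this formula, the three returns from the table can be compounded in plain Python. This is only an illustrative sketch; the names `returns`, `growth`, and `cumulative` are not from the original post.

```python
# Daily returns taken from the example table in the question.
returns = [0.153419, 0.037421, 0.077500]

# Compound the returns: multiply the growth factors (1 + r), then subtract 1.
growth = 1.0
for r in returns:
    growth *= 1.0 + r

cumulative = growth - 1.0
print(round(cumulative, 6))  # 0.289316
```

The result agrees with the `Cumulative` row in the intended output.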

What is the best way to perform this in pandas?

Answer 1

Answered by Steven G

There is a pandas cumprod() method for that. This works for every column.

df.loc["Cumulative"] = ((df+1).cumprod()-1).iloc[-1]
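
A self-contained sketch of this approach, assuming the dates are the DataFrame's index so that every column is numeric. It uses `.loc` because `.ix` was removed in pandas 1.0; the intermediate name `running` is illustrative.

```python
import pandas as pd

# Sample data from the question; dates are used as the index so that
# every remaining column is numeric.
df = pd.DataFrame(
    {"Return": [0.153419, 0.037421, 0.077500]},
    index=["2008-11-21", "2008-11-24", "2008-11-25"],
)

# Running cumulative return after each period.
running = (df + 1).cumprod() - 1

# The last row of the running series is the total cumulative return.
df.loc["Cumulative"] = running.iloc[-1]
print(df)
```

Keeping `running` around is handy when the cumulative return after each date is needed, not just the final total.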

On a large dataset, this was about 2 times faster than the other solutions:

In[106]: %timeit df.ix["Cumulative"] = ((df+1).cumprod()-1).iloc[-1]
10 loops, best of 3: 18.4 ms per loop
In[107]: %timeit df.ix['Cummulative'] = df.apply(lambda x: (x+1).prod()-1)
10 loops, best of 3: 32.9 ms per loop
In[110]: %timeit df.append(df.iloc[:,1:].apply(lambda col: (col + 1).prod() - 1), ignore_index=True)
10 loops, best of 3: 37.1 ms per loop
In[113]: %timeit df.append(df.apply(lambda col: prod([(1+c) for c in col]) - 1), ignore_index=True)
1 loop, best of 3: 262 ms per loop

I would suggest never using apply if you can find a built-in method, since apply loops over the DataFrame, which makes it slow. Built-in methods are highly efficient, and normally you will not get anything faster than them by using apply.
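
To see that a built-in reduction and `apply` give the same numbers, a rough sketch on made-up data is shown below. The random data, its shape, and the variable names are assumptions for illustration; in IPython each expression can be wrapped in `%timeit` to compare speed on your own machine.

```python
import numpy as np
import pandas as pd

# Hypothetical random returns: 1,000 rows, 50 columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(0.0, 0.01, size=(1000, 50)))

# Built-in, vectorized reduction over each column.
builtin = (df + 1).prod() - 1

# Same calculation routed through apply, one Python-level call per column.
applied = df.apply(lambda col: (col + 1).prod() - 1)

# Both approaches should agree up to floating-point rounding.
print(np.allclose(builtin, applied))
```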

Answer 2

Answered by TheF1rstPancake

Another solution:

df.loc["Cumulative"] = (df['Return']+1).prod() - 1

This adds 1 to the df['Return'] column, multiplies all the rows together, and then subtracts one from the result, which yields a single float value. That value is then placed at the index "Cumulative". Since that index doesn't exist yet, it is appended to the end of the DataFrame:

               Return
2008-11-21   0.153419
2008-11-25   0.077500
2008-11-24   0.037421
Cumulative   0.289316
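
One thing to keep in mind with the scalar version: assigning a single float to a new label fills every column of the new row with that value. A small sketch, where the `Other` column is hypothetical and added only to make the effect visible:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Return": [0.153419, 0.037421, 0.077500],
        "Other": [0.01, 0.02, 0.03],  # hypothetical second column
    },
    index=["2008-11-21", "2008-11-24", "2008-11-25"],
)

# Scalar result computed from the "Return" column only.
total = (df["Return"] + 1).prod() - 1

# Assigning a scalar to a new label fills every column of the new row.
df.loc["Cumulative"] = total
print(df)
```

Here the `Other` column also receives the value computed from `Return`, so the scalar form is best suited to a single-column frame.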

If you want to apply this across multiple columns:

df.loc['Cumulative'] = df.apply(lambda x: (x+1).prod()-1)

This would output the following (I made a second column called "Return2" that is a copy of "Return"):

               Return   Return2
2008-11-21   0.153419  0.153419
2008-11-25   0.077500  0.077500
2008-11-24   0.037421  0.037421
Cumulative   0.289316  0.289316

Answer 3

Answered by Psidom

With pandas, you can use the prod() method:

pd.concat([df, df.iloc[:, 1:].apply(lambda col: (col + 1).prod() - 1).to_frame().T], ignore_index=True)

#        Index    Return
#0  2008-11-21  0.153419
#1  2008-11-24  0.037421
#2  2008-11-25  0.077500
#3         NaN  0.289316

Or as @Randy C commented, this can be further simplified to:

pd.concat([df, ((df.iloc[:, 1:] + 1).prod() - 1).to_frame().T], ignore_index=True)
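
A self-contained sketch of this approach built on `pd.concat`, which is needed on current pandas because `DataFrame.append` was removed in pandas 2.0. It assumes, as in the output above, that the dates sit in an ordinary column named `Index`; the names `total` and `result` are illustrative.

```python
import pandas as pd

# The dates live in an ordinary column named "Index", not in the index.
df = pd.DataFrame({
    "Index": ["2008-11-21", "2008-11-24", "2008-11-25"],
    "Return": [0.153419, 0.037421, 0.077500],
})

# Cumulative return for every column after the first (skips the date column).
total = (df.iloc[:, 1:] + 1).prod() - 1

# Turn the Series into a one-row frame and stack it under the original data.
result = pd.concat([df, total.to_frame().T], ignore_index=True)
print(result)
```

The new row has no value for `Index`, so that cell comes out as NaN.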

Answer 4

Answered by AlexG

Here is mine:

import pandas as pd
from numpy import prod
pd.concat([df, df.apply(lambda col: prod([1 + c for c in col]) - 1).to_frame().T], ignore_index=True)
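
As a variation on the same idea (not part of the original answer), NumPy can take the product down each column in a single call instead of building a Python list per column. The sketch assumes all columns are numeric and the dates are the index; `totals` and `cumulative` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"Return": [0.153419, 0.037421, 0.077500]},
    index=["2008-11-21", "2008-11-24", "2008-11-25"],
)

# Multiply the growth factors down each column (axis=0) in one NumPy call.
totals = np.prod(1 + df.to_numpy(), axis=0) - 1

# Label the results with the original column names.
cumulative = pd.Series(totals, index=df.columns)
print(cumulative)
```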

Answer 5

Answered by Randy

One option is to just use reduce, though others might be able to come up with faster vectorized methods:

In [10]: pd.read_clipboard()
Out[10]:
        Index    Return
0  2008-11-21  0.153419
1  2008-11-24  0.037421
2  2008-11-25  0.077500

In [11]: reduce(lambda x, y: (1+x)*(1+y)-1, _10['Return'])
Out[11]: 0.28931612705992227

Note that in Python 3, reduce is part of the functools module, though it's a builtin in Python 2.
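
A minimal Python 3 version of the same reduction, using a plain list in place of the clipboard DataFrame (the names `returns`, `acc`, and `r` are illustrative):

```python
from functools import reduce  # required in Python 3

returns = [0.153419, 0.037421, 0.077500]

# Each step folds the next return into the running cumulative return.
cumulative = reduce(lambda acc, r: (1 + acc) * (1 + r) - 1, returns)
print(round(cumulative, 6))  # 0.289316
```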
