Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/14542145/

Date: 2020-09-13 20:37:07  Source: igfitidea

Reductions down a column in Pandas

python pandas

Asked by Isaac

I'm trying to transform a (well, many) column of return data to a column of closing prices. In Clojure, I'd use reductions, which is like reduce, but returns a sequence of all the intermediate values.

e.g.

$ c

0.12
-.13
0.23
0.17
0.29
-0.11

# something like this
$ c.reductions(init=1, lambda accumulator, ret: accumulator * (1 + ret)) 

1.12
0.97
1.20
1.40
1.81
1.61

NB: The actual closing price doesn't matter, hence using 1 as the initial value. I just need a "mock" closing price.

My data's actual structure is a DataFrame of named columns of TimeSeries. I guess I'm looking for a function similar to applymap, but I'd rather not do something hacky with that function and reference the DF from within it (which I suppose is one solution to this problem?)

Additionally, what would I do if I wanted to keep the returns data, but have the closing "price" with it? Should I return a tuple instead, and have the TimeSeries be of the type (returns, closing_price)?

Accepted answer by Andy Hayden

It's worth noting that it's often faster (as well as easier to understand) to write more verbosely in pandas, rather than write as a reduce.

In your specific example I would just add and then cumprod:

In [2]: c.add(1).cumprod()
Out[2]: 
0    1.120000
1    0.974400
2    1.198512
3    1.402259
4    1.808914
5    1.609934

or perhaps init * c.add(1).cumprod().

Note: In some cases, however (for example, where memory is an issue), you may have to rewrite these in a more low-level or clever way, but it's usually worth trying the simplest method first (and testing against it, e.g. using %timeit or profiling memory).
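To address the second half of the question (keeping the returns next to the mock prices), here is a minimal sketch of the same add-then-cumprod idea applied to a whole DataFrame. The column names `asset_a` and `asset_b`, the second column's values, and the variable names are made up for illustration; only the first column's returns come from the question.

```python
import pandas as pd

# Hypothetical sample data: two named columns of periodic returns.
# "asset_a" reuses the numbers from the question; "asset_b" is invented.
returns = pd.DataFrame({
    "asset_a": [0.12, -0.13, 0.23, 0.17, 0.29, -0.11],
    "asset_b": [0.01, 0.02, -0.03, 0.04, 0.00, 0.05],
})

init = 1.0  # mock starting price, as in the question

# cumprod runs down each column independently (axis=0 is the default).
prices = init * returns.add(1).cumprod()

# Keep returns and mock prices side by side under a two-level column index,
# rather than storing (returns, closing_price) tuples in each cell.
combined = pd.concat({"returns": returns, "price": prices}, axis=1)
print(combined)
```

A single column can then be selected with something like `combined[("price", "asset_a")]`. Two aligned numeric frames are probably easier to work with than a column of tuples, since vectorized operations keep working on them.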

Answer by Zelazny7

It doesn't look like it's a well publicized feature yet, but you can use expanding_apply to achieve the returns calculation:

In [1]: s
Out[1]:
0    0.12
1   -0.13
2    0.23
3    0.17
4    0.29
5   -0.11

In [2]: pd.expanding_apply(s, lambda s: reduce(lambda x, y: x * (1 + y), s, 1))

Out[2]:
0    1.120000
1    0.974400
2    1.198512
3    1.402259
4    1.808914
5    1.609934

I'm not 100% certain, but I believe expanding_apply works on the applied series starting from the first index through the current index. I use the built-in reduce function that works exactly like your Clojure function.
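As a side note for comparison: outside pandas, the closest standard-library counterpart to Clojure's reductions is probably `itertools.accumulate`, which yields every intermediate value instead of only the final one. The sketch below is plain Python rather than part of the answer above, and it assumes Python 3.8 or later for the `initial` keyword.

```python
from itertools import accumulate

returns = [0.12, -0.13, 0.23, 0.17, 0.29, -0.11]

# accumulate yields every intermediate result of the running computation.
# The `initial` keyword requires Python 3.8+.
prices = list(
    accumulate(returns, lambda acc, ret: acc * (1 + ret), initial=1.0)
)

# prices[0] is the initial value itself, so the list holds
# len(returns) + 1 entries.
print(prices)
```

This runs in pure Python, so for large columns the vectorized cumprod approach is likely to be faster.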

Docstring for expanding_apply:

Generic expanding function application

Parameters
----------
arg : Series, DataFrame
func : function
    Must produce a single value from an ndarray input
min_periods : int
    Minimum number of observations in window required to have a value
freq : None or string alias / date offset object, default=None
    Frequency to conform to before computing statistic
center : boolean, default False
    Whether the label should correspond with center of window

Returns
-------
y : type of input argument
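As far as I know, the top-level `pd.expanding_apply` function was deprecated and later removed in newer pandas releases in favour of the `.expanding()` method. A rough equivalent of the call above, assuming a reasonably recent pandas (the `raw` argument is not available in very old versions), might look like this:

```python
from functools import reduce  # reduce is no longer a builtin in Python 3

import pandas as pd

s = pd.Series([0.12, -0.13, 0.23, 0.17, 0.29, -0.11])

# raw=True passes each expanding window to the function as a NumPy array.
result = s.expanding().apply(
    lambda window: reduce(lambda x, y: x * (1 + y), window, 1.0),
    raw=True,
)
print(result)
```

Note that each window is recomputed from scratch here, so the total work grows quadratically with the length of the series, whereas cumprod makes a single pass.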

Answer by Alexander

For readability, I prefer the following solution:

returns = pd.Series([0.12, -.13, 0.23, 0.17, 0.29, -0.11])

initial_value = 100
cum_growth = initial_value * (1 + returns).cumprod()

>>> cum_growth
0    112.000000
1     97.440000
2    119.851200
3    140.225904
4    180.891416
5    160.993360
dtype: float64

If you'd like to include the initial value in the series:

>>> pd.concat([pd.Series(initial_value), cum_growth]).reset_index(drop=True)
0    100.000000
1    112.000000
2     97.440000
3    119.851200
4    140.225904
5    180.891416
6    160.993360
dtype: float64