python pandas: how to calculate derivative/gradient

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/41780489/

Time: 2020-08-20 01:35:59  Source: igfitidea

python pandas data-analysis

Asked by nskalis

Given that I have the following two vectors:

In [99]: time_index
Out[99]: 
[1484942413,
 1484942712,
 1484943012,
 1484943312,
 1484943612,
 1484943912,
 1484944212,
 1484944511,
 1484944811,
 1484945110]

In [100]: bytes_in
Out[100]: 
[1293981210388,
 1293981379944,
 1293981549960,
 1293981720866,
 1293981890968,
 1293982062261,
 1293982227492,
 1293982391244,
 1293982556526,
 1293982722320]

where bytes_in is an increment-only counter and time_index is a list of Unix timestamps (epoch).

Objective: What I would like to calculate is the bitrate.

That means that I will build a data frame like

In [101]: timeline = pandas.to_datetime(time_index, unit="s")

In [102]: recv = pandas.Series(bytes_in, timeline).resample("300S").mean().ffill().apply(lambda i: i*8)

In [103]: recv
Out[103]: 
2017-01-20 20:00:00    10351849683104
2017-01-20 20:05:00    10351851039552
2017-01-20 20:10:00    10351852399680
2017-01-20 20:15:00    10351853766928
2017-01-20 20:20:00    10351855127744
2017-01-20 20:25:00    10351856498088
2017-01-20 20:30:00    10351857819936
2017-01-20 20:35:00    10351859129952
2017-01-20 20:40:00    10351860452208
2017-01-20 20:45:00    10351861778560
Freq: 300S, dtype: int64

Question: Now, what is strange is that calculating the gradient manually gives me:

In [104]: (bytes_in[1]-bytes_in[0])*8/300
Out[104]: 4521.493333333333

which is the correct value,

while calculating the gradient with pandas gives me:

In [124]: recv.diff()
Out[124]: 
2017-01-20 20:00:00          NaN
2017-01-20 20:05:00    1356448.0
2017-01-20 20:10:00    1360128.0
2017-01-20 20:15:00    1367248.0
2017-01-20 20:20:00    1360816.0
2017-01-20 20:25:00    1370344.0
2017-01-20 20:30:00    1321848.0
2017-01-20 20:35:00    1310016.0
2017-01-20 20:40:00    1322256.0
2017-01-20 20:45:00    1326352.0
Freq: 300S, dtype: float64

which is not the same as above: 1356448.0 is different from 4521.493333333333.

Could you please enlighten me on what I am doing wrong?

Answered by piRSquared

pd.Series.diff() only takes the differences. It doesn't also divide by the delta of the index.

This gets you the answer:

recv.diff() / recv.index.to_series().diff().dt.total_seconds()

2017-01-20 20:00:00            NaN
2017-01-20 20:05:00    4521.493333
2017-01-20 20:10:00    4533.760000
2017-01-20 20:15:00    4557.493333
2017-01-20 20:20:00    4536.053333
2017-01-20 20:25:00    4567.813333
2017-01-20 20:30:00    4406.160000
2017-01-20 20:35:00    4366.720000
2017-01-20 20:40:00    4407.520000
2017-01-20 20:45:00    4421.173333
Freq: 300S, dtype: float64
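The same division also works on the raw samples, before any resampling. The sketch below is not part of the original answer; it reuses the first five samples from the question and the variable names raw, elapsed, and bitrate are made up for illustration. Since resample("300S") relabels each sample to the start of its 5-minute bin, the 300-second spacing above is nominal, whereas here the real elapsed seconds are used:

```python
import pandas as pd

# First five samples from the question (Unix timestamps and byte counter).
time_index = [1484942413, 1484942712, 1484943012, 1484943312, 1484943612]
bytes_in = [1293981210388, 1293981379944, 1293981549960,
            1293981720866, 1293981890968]

# Counter converted to bits, indexed by the actual sample times.
raw = pd.Series(bytes_in, index=pd.to_datetime(time_index, unit="s")) * 8

# Seconds elapsed between consecutive samples (NaN for the first row).
elapsed = raw.index.to_series().diff().dt.total_seconds()

# Bits per second over each real interval.
bitrate = raw.diff() / elapsed
print(bitrate)
```

The first interval here is 299 seconds rather than 300, so its rate differs slightly from the resampled figure.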


You could also use numpy.gradient, passing bytes_in and the delta you expect to have. This will not reduce the length by one; instead it makes assumptions about the edges.

np.gradient(bytes_in, 300) * 8

array([ 4521.49333333,  4527.62666667,  4545.62666667,  4546.77333333,
        4551.93333333,  4486.98666667,  4386.44      ,  4387.12      ,
        4414.34666667,  4421.17333333])

Answered by Zitzero

A naive explanation would be that diff literally subtracts following entries while np.gradient uses a central difference scheme.
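To make that concrete, here is a small sketch (not from the original answer) that rebuilds the np.gradient output by hand, assuming the default edge_order=1 and a uniform spacing of 300 seconds. The names backward and central are chosen for illustration only:

```python
import numpy as np

bytes_in = np.array([
    1293981210388, 1293981379944, 1293981549960, 1293981720866,
    1293981890968, 1293982062261, 1293982227492, 1293982391244,
    1293982556526, 1293982722320,
], dtype=np.float64)
h = 300.0  # assumed uniform spacing in seconds

# What diff() does: each entry minus the previous one (9 values).
backward = np.diff(bytes_in) / h

# Central difference at interior points: (f[i+1] - f[i-1]) / (2h) (8 values).
central = (bytes_in[2:] - bytes_in[:-2]) / (2 * h)

grad = np.gradient(bytes_in, h)

# Interior values of np.gradient are the central differences ...
print(np.allclose(grad[1:-1], central))
# ... and the two edge values fall back to one-sided differences.
print(np.isclose(grad[0], backward[0]), np.isclose(grad[-1], backward[-1]))
```

Each interior central difference is the average of the two neighbouring one-step differences, which is why the np.gradient values sit between adjacent diff() values.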

Answered by scls

As there is no built-in derivative method in pandas Series / DataFrame, you can use https://github.com/scls19fr/pandas-helper-calc.

It provides a new accessor called calc on pandas Series and DataFrames to calculate numerical derivatives and integrals.

So you will be able to simply do

recv.calc.derivative()

It uses diff() under the hood.
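For readers curious how an accessor of this kind can be wired up, here is a minimal sketch using pandas' register_series_accessor. It is not the pandas-helper-calc implementation; the accessor name rate and the class RateAccessor are hypothetical, chosen to avoid clashing with the real calc accessor:

```python
import pandas as pd


# "rate" is a made-up accessor name for this sketch only.
@pd.api.extensions.register_series_accessor("rate")
class RateAccessor:
    def __init__(self, series):
        self._s = series

    def derivative(self):
        """Difference of the values divided by the difference of the index."""
        s = self._s
        dx = s.index.to_series().diff()
        if isinstance(s.index, pd.DatetimeIndex):
            # Express time steps in seconds so the result is "per second".
            dx = dx.dt.total_seconds()
        return s.diff() / dx


idx = pd.to_datetime([0, 300, 600], unit="s")
s = pd.Series([0.0, 600.0, 1500.0], index=idx)
d = s.rate.derivative()
print(d)
```

For a datetime index this reproduces the diff()-over-elapsed-seconds calculation shown earlier on this page.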

Answered by Messypuddle

Can you explain why np.gradient doesn't produce the same results as the first proposed answer? – Darthtrader May 5 at 9:58

np.gradient uses a 2nd order scheme while .diff() uses a 1st order scheme. This means that the results from np.gradient will be continuous as will the derivative. The results from .diff() do not have to have a continuous derivative. Essentially np.gradient gives 'smoother' results.
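One way to see the "order" of each scheme is to measure how the error shrinks when the step is halved. The sketch below is an illustration added here, not part of the answer; it uses sin(x) as a test function because its exact derivative is known, and the helper max_errors is a made-up name. It only addresses truncation error, not the smoothness remark:

```python
import numpy as np


def max_errors(n):
    """Largest derivative error on [0, 1] with n samples, for both schemes."""
    x = np.linspace(0.0, 1.0, n)
    h = x[1] - x[0]
    f = np.sin(x)
    exact = np.cos(x)
    # One-step (backward) difference, compared at x[1:], like diff().
    one_step = np.diff(f) / h
    # Central difference from np.gradient, interior points only.
    central = np.gradient(f, h)[1:-1]
    return (np.abs(one_step - exact[1:]).max(),
            np.abs(central - exact[1:-1]).max())


coarse = max_errors(101)  # h = 0.01
fine = max_errors(201)    # h = 0.005

# Halving h roughly halves a 1st-order error and quarters a 2nd-order one.
ratio_one_step = coarse[0] / fine[0]
ratio_central = coarse[1] / fine[1]
print(ratio_one_step, ratio_central)
```

The two printed ratios should come out close to 2 and 4 respectively.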

Answered by Merv Merzoug

Or if you'd like to calculate the rate of change you can just use df.pct_change()

As a parameter you can enter df.pct_change(n), where n is the lookback period, assuming you have a datetime-indexed dataframe.
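A short sketch of what that looks like, using made-up values on a 5-minute index (the data and variable names are illustrative only). Note that pct_change returns the relative change between rows, so it is not divided by elapsed time, and n counts rows rather than seconds:

```python
import pandas as pd

idx = pd.date_range("2017-01-20 20:00", periods=4, freq="5min")
s = pd.Series([100.0, 110.0, 121.0, 133.1], index=idx)

# Relative change versus the previous row: (current - previous) / previous.
step = s.pct_change()

# Relative change versus the value two rows back.
two_back = s.pct_change(periods=2)

print(step)
print(two_back)
```

With these values each row is 10% above the previous one, so the two-row lookback gives about 21%.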
