Python pandas: how to calculate the derivative/gradient

Question

Asked by nskalis

Given that I have the following two vectors:

In [99]: time_index
Out[99]: 
[1484942413,
 1484942712,
 1484943012,
 1484943312,
 1484943612,
 1484943912,
 1484944212,
 1484944511,
 1484944811,
 1484945110]

In [100]: bytes_in
Out[100]: 
[1293981210388,
 1293981379944,
 1293981549960,
 1293981720866,
 1293981890968,
 1293982062261,
 1293982227492,
 1293982391244,
 1293982556526,
 1293982722320]

where bytes_in is an increment-only counter and time_index is a list of Unix timestamps (epoch).

Objective: What I would like to calculate is the bitrate.

That means that I will build a time-indexed pandas Series like this:

In [101]: timeline = pandas.to_datetime(time_index, unit="s")

In [102]: recv = pandas.Series(bytes_in, timeline).resample("300S").mean().ffill().apply(lambda i: i*8)

In [103]: recv
Out[103]: 
2017-01-20 20:00:00    10351849683104
2017-01-20 20:05:00    10351851039552
2017-01-20 20:10:00    10351852399680
2017-01-20 20:15:00    10351853766928
2017-01-20 20:20:00    10351855127744
2017-01-20 20:25:00    10351856498088
2017-01-20 20:30:00    10351857819936
2017-01-20 20:35:00    10351859129952
2017-01-20 20:40:00    10351860452208
2017-01-20 20:45:00    10351861778560
Freq: 300S, dtype: int64

Question: What is strange is that calculating the gradient manually gives me:

In [104]: (bytes_in[1]-bytes_in[0])*8/300
Out[104]: 4521.493333333333

which is the correct value, while calculating the gradient with pandas gives me:

In [124]: recv.diff()
Out[124]: 
2017-01-20 20:00:00          NaN
2017-01-20 20:05:00    1356448.0
2017-01-20 20:10:00    1360128.0
2017-01-20 20:15:00    1367248.0
2017-01-20 20:20:00    1360816.0
2017-01-20 20:25:00    1370344.0
2017-01-20 20:30:00    1321848.0
2017-01-20 20:35:00    1310016.0
2017-01-20 20:40:00    1322256.0
2017-01-20 20:45:00    1326352.0
Freq: 300S, dtype: float64

which is not the same as above: 1356448.0 is different from 4521.493333333333.

Could you please enlighten me on what I am doing wrong?

Answer 1

Answered by piRSquared

pd.Series.diff() only takes the differences between consecutive values. It does not also divide by the delta of the index.

This gets you the answer:

recv.diff() / recv.index.to_series().diff().dt.total_seconds()

2017-01-20 20:00:00            NaN
2017-01-20 20:05:00    4521.493333
2017-01-20 20:10:00    4533.760000
2017-01-20 20:15:00    4557.493333
2017-01-20 20:20:00    4536.053333
2017-01-20 20:25:00    4567.813333
2017-01-20 20:30:00    4406.160000
2017-01-20 20:35:00    4366.720000
2017-01-20 20:40:00    4407.520000
2017-01-20 20:45:00    4421.173333
Freq: 300S, dtype: float64
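The same idea also works without resampling onto a 300-second grid. The following is a minimal sketch, assuming you would rather divide by the elapsed seconds actually recorded between samples (the first interval in the question's data is 299 s, not 300 s). It reuses the first four samples from the question; the variable names `bits`, `elapsed`, and `bitrate` are illustrative only.

```python
import pandas as pd

# First four samples from the question.
time_index = [1484942413, 1484942712, 1484943012, 1484943312]
bytes_in = [1293981210388, 1293981379944, 1293981549960, 1293981720866]

# Counter converted to bits, indexed by the raw (unresampled) timestamps.
bits = pd.Series(bytes_in, index=pd.to_datetime(time_index, unit="s")) * 8

# Seconds elapsed between consecutive samples (NaN for the first row).
elapsed = bits.index.to_series().diff().dt.total_seconds()

# Bits per second over each interval.
bitrate = bits.diff() / elapsed
print(bitrate)
```

Because the divisor is the real interval length, the first rate comes out as 1356448 / 299 rather than 1356448 / 300, so it differs slightly from the resampled result.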

You could also use numpy.gradient, passing bytes_in and the delta you expect to have. This will not reduce the length by one; instead it makes assumptions about the edges.

np.gradient(bytes_in, 300) * 8

array([ 4521.49333333,  4527.62666667,  4545.62666667,  4546.77333333,
        4551.93333333,  4486.98666667,  4386.44      ,  4387.12      ,
        4414.34666667,  4421.17333333])
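If the sampling interval is not exactly constant, np.gradient can also take the sample coordinates instead of a single spacing (supported since NumPy 1.13). A small sketch under that assumption, using the first four samples from the question and explicit float arrays:

```python
import numpy as np

# First four samples from the question, as float arrays.
time_index = np.array([1484942413, 1484942712, 1484943012, 1484943312], dtype=float)
bytes_in = np.array([1293981210388, 1293981379944,
                     1293981549960, 1293981720866], dtype=float)

# Passing the timestamps as coordinates makes np.gradient use the real
# spacing between samples; multiplying by 8 converts bytes/s to bits/s.
bitrate = np.gradient(bytes_in, time_index) * 8
print(bitrate)
```

With the default edge_order=1, the first and last values are plain one-sided differences, so the result has the same length as the input.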

Answer 2

Answered by Zitzero

A naive explanation would be that diff literally subtracts consecutive entries, while np.gradient uses a central difference scheme.
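To make the two schemes concrete, here is a small sketch on a toy function, y = x², whose true derivative is 2x. The data are made up for illustration and are not from the question.

```python
import numpy as np

x = np.arange(5, dtype=float)   # 0, 1, 2, 3, 4
y = x ** 2                      # true derivative is 2x

# One-sided (backward) difference, which is what diff() divided by the step gives.
backward = np.full_like(y, np.nan)
backward[1:] = np.diff(y) / np.diff(x)

# np.gradient: central differences inside, one-sided differences at the edges.
central = np.gradient(y, x)

print(backward)  # [nan  1.  3.  5.  7.]
print(central)   # [1. 2. 4. 6. 7.]
```

At the interior points the central difference reproduces 2x exactly for this quadratic, while the backward difference is off by one everywhere.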

Answer 3

Answered by scls

As there is no built-in derivative method in pandas Series / DataFrame, you can use https://github.com/scls19fr/pandas-helper-calc.

It provides a new accessor called calc on pandas Series and DataFrames to calculate numerical derivatives and integrals.

So you will be able to simply do:

recv.calc.derivative()

It uses diff() under the hood.
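For readers curious how such an accessor could be put together, the sketch below uses pandas' public register_series_accessor hook. It is not the pandas-helper-calc implementation; the accessor name `timederiv`, the class `TimeDerivAccessor`, and the method `per_second` are hypothetical names chosen for this illustration, and the sample data are made up.

```python
import pandas as pd


@pd.api.extensions.register_series_accessor("timederiv")  # hypothetical name
class TimeDerivAccessor:
    """Illustrative accessor: rate of change per second on a DatetimeIndex."""

    def __init__(self, series):
        self._series = series

    def per_second(self):
        # Value difference divided by elapsed seconds, both built on diff().
        elapsed = self._series.index.to_series().diff().dt.total_seconds()
        return self._series.diff() / elapsed


# Made-up data: samples taken 300 s apart.
idx = pd.to_datetime([0, 300, 600], unit="s")
s = pd.Series([0.0, 1500.0, 4500.0], index=idx)

result = s.timederiv.per_second()
print(result)  # NaN, 5.0, 10.0
```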

Answer 4

Answered by Messypuddle

"Can you explain why np.gradient doesn't produce the same results as the first proposed answer?" (comment by Darthtrader, May 5 at 9:58)

np.gradient uses a second-order scheme (central differences at the interior points), while .diff() uses a first-order scheme (one-sided differences). With evenly spaced samples, each interior np.gradient value is the average of the two neighbouring .diff() slopes. Essentially, np.gradient gives "smoother" results.
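This relationship can be checked directly on the question's data. The sketch below assumes the same constant 300-second step used in the first answer and compares the np.gradient output with the one-sided slopes from np.diff.

```python
import numpy as np

bytes_in = np.array([1293981210388, 1293981379944, 1293981549960,
                     1293981720866, 1293981890968, 1293982062261,
                     1293982227492, 1293982391244, 1293982556526,
                     1293982722320], dtype=float)
step = 300  # assumed constant sampling interval in seconds

one_sided = np.diff(bytes_in) * 8 / step        # 9 slopes, like diff()
smoothed = np.gradient(bytes_in, step) * 8      # 10 values, like the answer

# Interior np.gradient values are the mean of the two adjacent slopes.
print(np.allclose(smoothed[1:-1], (one_sided[:-1] + one_sided[1:]) / 2))

# The two edge values fall back to the nearest one-sided slope.
print(np.isclose(smoothed[0], one_sided[0]), np.isclose(smoothed[-1], one_sided[-1]))
```

That averaging is why the two answers agree only at the first and last points and differ slightly everywhere else.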

Answer 5

Answered by Merv Merzoug

Or, if you'd like to calculate the rate of change, you can just use df.pct_change().

You can pass a parameter, df.pct_change(n), where n is the lookback period in rows, assuming you have a datetime-indexed DataFrame.
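A short sketch of what pct_change returns, on made-up values that grow by 10% per row. Note that it reports the relative change between rows and does not divide by elapsed time, so it is not a derivative in the sense of the other answers.

```python
import pandas as pd

# Made-up series growing by 10% per row.
s = pd.Series([100.0, 110.0, 121.0, 133.1])

one_row = s.pct_change()    # change versus the previous row
two_rows = s.pct_change(2)  # change versus the row two steps back

print(one_row)   # NaN, then roughly 0.10 for each row
print(two_rows)  # NaN, NaN, then roughly 0.21 for each row
```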
