python pandas: how to calculate derivative/gradient
Original question: http://stackoverflow.com/questions/41780489/
Notice: this page is a copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverFlow
Asked by nskalis
Given that I have the following two vectors:
In [99]: time_index
Out[99]:
[1484942413,
1484942712,
1484943012,
1484943312,
1484943612,
1484943912,
1484944212,
1484944511,
1484944811,
1484945110]
In [100]: bytes_in
Out[100]:
[1293981210388,
1293981379944,
1293981549960,
1293981720866,
1293981890968,
1293982062261,
1293982227492,
1293982391244,
1293982556526,
1293982722320]
Where bytes_in is an increment-only counter, and time_index is a list of unix timestamps (epoch).
Objective: What I would like to calculate is the bitrate.
That means that I will build a data frame like:
In [101]: timeline = pandas.to_datetime(time_index, unit="s")
In [102]: recv = pandas.Series(bytes_in, timeline).resample("300S").mean().ffill().apply(lambda i: i*8)
In [103]: recv
Out[103]:
2017-01-20 20:00:00 10351849683104
2017-01-20 20:05:00 10351851039552
2017-01-20 20:10:00 10351852399680
2017-01-20 20:15:00 10351853766928
2017-01-20 20:20:00 10351855127744
2017-01-20 20:25:00 10351856498088
2017-01-20 20:30:00 10351857819936
2017-01-20 20:35:00 10351859129952
2017-01-20 20:40:00 10351860452208
2017-01-20 20:45:00 10351861778560
Freq: 300S, dtype: int64
Question: Now, what is strange is that calculating the gradient manually gives me:
In [104]: (bytes_in[1]-bytes_in[0])*8/300
Out[104]: 4521.493333333333
which is the correct value,
while calculating the gradient with pandas gives me
In [124]: recv.diff()
Out[124]:
2017-01-20 20:00:00 NaN
2017-01-20 20:05:00 1356448.0
2017-01-20 20:10:00 1360128.0
2017-01-20 20:15:00 1367248.0
2017-01-20 20:20:00 1360816.0
2017-01-20 20:25:00 1370344.0
2017-01-20 20:30:00 1321848.0
2017-01-20 20:35:00 1310016.0
2017-01-20 20:40:00 1322256.0
2017-01-20 20:45:00 1326352.0
Freq: 300S, dtype: float64
which is not the same as above: 1356448.0 is different from 4521.493333333333.
Could you please enlighten me on what I am doing wrong?
Answered by piRSquared
pd.Series.diff() only takes the differences. It doesn't divide by the delta of the index as well.
This gets you the answer:
recv.diff() / recv.index.to_series().diff().dt.total_seconds()
2017-01-20 20:00:00 NaN
2017-01-20 20:05:00 4521.493333
2017-01-20 20:10:00 4533.760000
2017-01-20 20:15:00 4557.493333
2017-01-20 20:20:00 4536.053333
2017-01-20 20:25:00 4567.813333
2017-01-20 20:30:00 4406.160000
2017-01-20 20:35:00 4366.720000
2017-01-20 20:40:00 4407.520000
2017-01-20 20:45:00 4421.173333
Freq: 300S, dtype: float64
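For reference, here is a self-contained sketch of the same calculation, starting from the raw lists in the question. It assumes the resampling step from the question is kept; the alias "5min" is used in place of "300S" only because the uppercase "S" alias is deprecated in recent pandas releases. The variable names dt and bitrate are illustrative.

```python
import pandas as pd

time_index = [1484942413, 1484942712, 1484943012, 1484943312, 1484943612,
              1484943912, 1484944212, 1484944511, 1484944811, 1484945110]
bytes_in = [1293981210388, 1293981379944, 1293981549960, 1293981720866,
            1293981890968, 1293982062261, 1293982227492, 1293982391244,
            1293982556526, 1293982722320]

timeline = pd.to_datetime(time_index, unit="s")

# Counter converted to bits, one value per 5-minute bin (same width as "300S").
recv = pd.Series(bytes_in, index=timeline).resample("5min").mean().ffill() * 8

# Seconds elapsed between consecutive index entries (NaN for the first row).
dt = recv.index.to_series().diff().dt.total_seconds()

# Difference of the counter divided by the difference of the index: bits per second.
bitrate = recv.diff() / dt
print(bitrate)
```

Because the index has been resampled onto a regular grid, dt is 300 seconds everywhere, so this matches the manual (bytes_in[1]-bytes_in[0])*8/300 from the question.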
You could also use numpy.gradient, passing the bytes_in and the delta you expect to have. This will not reduce the length by one, instead making assumptions about the edges.
np.gradient(bytes_in, 300) * 8
array([ 4521.49333333, 4527.62666667, 4545.62666667, 4546.77333333,
4551.93333333, 4486.98666667, 4386.44 , 4387.12 ,
4414.34666667, 4421.17333333])
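If the 300-second spacing is only approximate (the raw timestamps in the question are 299 or 300 seconds apart), a possible variant is to pass the sample times themselves as the second argument, which np.gradient accepts as a coordinate array in NumPy 1.13 and later. This is a sketch under that assumption, not part of the original answer; the names t, b and bitrate are illustrative.

```python
import numpy as np

time_index = [1484942413, 1484942712, 1484943012, 1484943312, 1484943612,
              1484943912, 1484944212, 1484944511, 1484944811, 1484945110]
bytes_in = [1293981210388, 1293981379944, 1293981549960, 1293981720866,
            1293981890968, 1293982062261, 1293982227492, 1293982391244,
            1293982556526, 1293982722320]

t = np.array(time_index, dtype=float)  # sample times in seconds
b = np.array(bytes_in, dtype=float)    # byte counter

# Passing the coordinates lets np.gradient use the real, slightly uneven spacing
# instead of an assumed constant 300 s. Multiply by 8 to get bits per second.
bitrate = np.gradient(b, t) * 8
print(bitrate)
```

The result has the same length as the input; the first and last entries are one-sided differences over the actual 299-second gaps.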
Answered by Zitzero
A naive explanation would be that diff literally subtracts consecutive entries while np.gradient uses a central difference scheme.
Answered by scls
As there is no builtin derivative method in Pandas Series / DataFrame you can use https://github.com/scls19fr/pandas-helper-calc.
It will provide a new accessor called calc to Pandas Series and DataFrames to calculate numerical derivatives and integrals.
So you will be able to simply do
recv.calc.derivative()
It's using diff() under the hood.
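To illustrate the idea of an accessor built on diff(), here is a minimal sketch using pandas' own pd.api.extensions.register_series_accessor decorator. It is not the code of pandas-helper-calc; the accessor name rate_sketch and the class RateAccessor are hypothetical names chosen for this example, and the Series is assumed to have a DatetimeIndex.

```python
import pandas as pd


@pd.api.extensions.register_series_accessor("rate_sketch")  # hypothetical accessor name
class RateAccessor:
    """Illustrative accessor: derivative of a datetime-indexed Series per second."""

    def __init__(self, series):
        self._s = series

    def derivative(self):
        # Elapsed seconds between consecutive index entries.
        dt = self._s.index.to_series().diff().dt.total_seconds()
        # Change in value divided by change in time; first entry is NaN.
        return self._s.diff() / dt


s = pd.Series([0.0, 10.0, 40.0], index=pd.to_datetime([0, 10, 20], unit="s"))
d = s.rate_sketch.derivative()
print(d)
```

Once the class is registered, every Series in the session exposes the accessor, so the call reads much like recv.calc.derivative() above.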
Answered by Messypuddle
Can you explain why np.gradient doesn't produce the same results as the first proposed answer? – Darthtrader May 5 at 9:58
np.gradient uses a 2nd order (central difference) scheme at interior points, while .diff() uses a 1st order (one-sided) scheme. For evenly spaced data, each interior value from np.gradient is the average of the two neighbouring one-step slopes that .diff() would give, so it varies more gradually from point to point; only at the two edges does np.gradient fall back to a one-sided difference by default. The results from .diff() are not averaged in this way. Essentially np.gradient gives 'smoother' results.
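The relationship between the two schemes can be checked numerically on the data from the question. The sketch below assumes a uniform spacing of 300 seconds, as in the first answer; the names step, grad, interior_matches and edges_match are illustrative.

```python
import numpy as np

bytes_in = np.array([1293981210388, 1293981379944, 1293981549960, 1293981720866,
                     1293981890968, 1293982062261, 1293982227492, 1293982391244,
                     1293982556526, 1293982722320], dtype=float)
h = 300.0  # assumed uniform spacing in seconds

# One-step slopes in bits per second (length n-1), i.e. what diff() / 300 yields.
step = np.diff(bytes_in) / h * 8
# np.gradient output in bits per second (length n).
grad = np.gradient(bytes_in, h) * 8

# Interior points: the central difference equals the mean of the two adjacent slopes.
interior_matches = np.allclose(grad[1:-1], (step[:-1] + step[1:]) / 2)
# Edges: with the default edge_order=1, np.gradient uses one-sided differences.
edges_match = bool(np.isclose(grad[0], step[0]) and np.isclose(grad[-1], step[-1]))
print(interior_matches, edges_match)
```

Both checks hold, which is why the first and last values of the np.gradient output coincide with the diff-based answer while the interior values differ.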
Answered by Merv Merzoug
Or if you'd like to calculate the rate of change you can just use df.pct_change().
As a parameter you can enter df.pct_change(n), where n is the lookback period, counted in rows, so on a regularly sampled datetime-indexed dataframe it corresponds to a fixed time span. Note that the result is the fractional change relative to the earlier value, not a change per unit of time.
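A small sketch of what pct_change returns, using made-up values on a 5-minute datetime index (the data and the names s, one_step and two_step are illustrative only):

```python
import pandas as pd

# Illustrative values sampled every 5 minutes.
idx = pd.date_range("2017-01-20 20:00", periods=3, freq="5min")
s = pd.Series([100.0, 110.0, 121.0], index=idx)

# Fractional change versus the previous row: (x[i] - x[i-1]) / x[i-1].
one_step = s.pct_change()
# Fractional change versus the value two rows earlier.
two_step = s.pct_change(2)

print(one_step)
print(two_step)
```

The lookback is expressed in rows, so the values are dimensionless ratios; to get a rate per second they would still have to be divided by the elapsed time.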