pandas: calculate the local time derivative of a Series

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/39235712/

Time: 2020-09-14 01:55:20  Source: igfitidea

Calculate local time derivative of Series

Tags: python, pandas, datetime, numpy

Asked by Adam

I have data that I'm importing from an HDF5 file, so it comes in looking like this:

import pandas as pd
tmp=pd.Series([1.,3.,4.,3.,5.],['2016-06-27 23:52:00','2016-06-27 23:53:00','2016-06-27 23:54:00','2016-06-27 23:55:00','2016-06-27 23:59:00'])
tmp.index=pd.to_datetime(tmp.index)

>>> tmp
2016-06-27 23:52:00    1.0
2016-06-27 23:53:00    3.0
2016-06-27 23:54:00    4.0
2016-06-27 23:55:00    3.0
2016-06-27 23:59:00    5.0
dtype: float64

I would like to find the local slope of the data. If I just do tmp.diff(), I get the local change in value, but I want the change in value per second (the time derivative). I would like to do something like the following, but this is the wrong way to do it and gives an error:

tmp.diff()/tmp.index.diff()

I have figured out that I can do it by converting all the data to a DataFrame, but that seems inefficient, especially since I'm going to have to work through a large on-disk file in chunks. Is there a better way to do it than this:

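import numpy as np
df = tmp.to_frame('Value')  # name the column so it can be referenced below
df['secvalue'] = df.index.astype(np.int64) / 1e9  # epoch nanoseconds -> seconds (assumes a nanosecond-resolution index)
df['slope'] = df['Value'].diff() / df['secvalue'].diff()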
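For comparison, here is a minimal sketch of the same backward difference computed directly on the Series, without the intermediate DataFrame. It is not part of the original question; the names `dt` and `slope` are chosen here for illustration, and the sample data is repeated so the snippet runs on its own.

```python
import pandas as pd

tmp = pd.Series(
    [1., 3., 4., 3., 5.],
    index=pd.to_datetime([
        '2016-06-27 23:52:00', '2016-06-27 23:53:00', '2016-06-27 23:54:00',
        '2016-06-27 23:55:00', '2016-06-27 23:59:00',
    ]),
)

# Turn the index into a Series so that .diff() yields timedeltas,
# then express each gap between consecutive samples in seconds.
dt = tmp.index.to_series().diff().dt.total_seconds()

# Change in value divided by elapsed seconds; the first entry is NaN
# because it has no predecessor.
slope = tmp.diff() / dt
```

When the data is read in chunks, the first row of every chunk would come out as NaN unless the last row of the previous chunk is carried over and prepended before differencing.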

Answered by piRSquared

Use numpy.gradient


import numpy as np
import pandas as pd

slope = pd.Series(np.gradient(tmp.values), tmp.index, name='slope')
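Called with only the values, `np.gradient` assumes unit spacing between samples, so the result is a change per sample rather than per second. As a hedged alternative that the original answer does not use, `np.gradient` also accepts an array of sample coordinates (NumPy 1.13 or later), which lets it account for the uneven timestamps directly. The names `seconds` and `slope_per_sec` below are illustrative, and the sample data is repeated so the sketch is self-contained.

```python
import numpy as np
import pandas as pd

tmp = pd.Series(
    [1., 3., 4., 3., 5.],
    index=pd.to_datetime([
        '2016-06-27 23:52:00', '2016-06-27 23:53:00', '2016-06-27 23:54:00',
        '2016-06-27 23:55:00', '2016-06-27 23:59:00',
    ]),
)

# Elapsed seconds since the first sample, used as the x-coordinates.
seconds = (tmp.index - tmp.index[0]).total_seconds().to_numpy()

# Passing the coordinates makes np.gradient use the actual (uneven) spacing:
# central differences in the interior, one-sided differences at the ends.
slope_per_sec = pd.Series(
    np.gradient(tmp.to_numpy(), seconds), tmp.index, name='slope'
)
```

Unlike `diff()`, this returns a value for every row, including the first and last.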

To address the unequal temporal index, I'd resample to one-minute intervals and interpolate. Then the gradients would be taken over equal intervals.

tmp_ = tmp.resample('min').interpolate()  # 'min' is the one-minute alias; the older 'T' alias is deprecated in recent pandas

slope = pd.Series(np.gradient(tmp_.values), tmp_.index, name='slope')

df = pd.concat([tmp_.rename('values'), slope], axis=1)
df
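With one-minute resampling, the default unit spacing of `np.gradient` gives a slope in units of value per minute, whereas the question asked for change per second. A minimal sketch of one way to get per-second units, assuming the regular one-minute grid produced by the resampling step, is to pass the sample spacing in seconds as the second argument. The name `slope_per_sec` is illustrative and not from the original answer.

```python
import numpy as np
import pandas as pd

tmp = pd.Series(
    [1., 3., 4., 3., 5.],
    index=pd.to_datetime([
        '2016-06-27 23:52:00', '2016-06-27 23:53:00', '2016-06-27 23:54:00',
        '2016-06-27 23:55:00', '2016-06-27 23:59:00',
    ]),
)

# Regular one-minute grid; the gaps are filled by linear interpolation.
tmp_ = tmp.resample('min').interpolate()

# A scalar second argument is the uniform spacing between samples:
# 60 seconds here, so the result is in value per second.
slope_per_sec = pd.Series(
    np.gradient(tmp_.to_numpy(), 60.0), tmp_.index, name='slope_per_sec'
)
```

Note that the interpolated stretch between 23:55 and 23:59 has a constant slope by construction, since the missing points lie on a straight line.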


df.plot()
