Pandas TimeSeries resample produces NaNs

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/33364590/

Tags: python, pandas, time-series, resampling

Asked by Peter Lenaers

I am resampling a Pandas TimeSeries. The time series consists of binary values (it is a categorical variable) with no missing values, but NaNs appear after resampling. How is this possible?

I can't post any example data here since it is sensitive info, but I create and resample the series as follows:

series = pd.Series(data, ts)
series_rs = series.resample('60T', how='mean')

Answered by jezrael

Upsampling converts the series to a regular time interval, so any interval that contains no samples gets NaN.

You can fill missing values backward with fill_method='bfill', or forward with fill_method='ffill' or fill_method='pad'.

import pandas as pd

ts = pd.date_range('1/1/2015', periods=10, freq='100T')
data = range(10)
series = pd.Series(data, ts)
print(series)
#2015-01-01 00:00:00    0
#2015-01-01 01:40:00    1
#2015-01-01 03:20:00    2
#2015-01-01 05:00:00    3
#2015-01-01 06:40:00    4
#2015-01-01 08:20:00    5
#2015-01-01 10:00:00    6
#2015-01-01 11:40:00    7
#2015-01-01 13:20:00    8
#2015-01-01 15:00:00    9
#Freq: 100T, dtype: int64
# how= is the pre-0.18 pandas API
series_rs = series.resample('60T', how='mean')
print(series_rs)
#2015-01-01 00:00:00     0
#2015-01-01 01:00:00     1
#2015-01-01 02:00:00   NaN
#2015-01-01 03:00:00     2
#2015-01-01 04:00:00   NaN
#2015-01-01 05:00:00     3
#2015-01-01 06:00:00     4
#2015-01-01 07:00:00   NaN
#2015-01-01 08:00:00     5
#2015-01-01 09:00:00   NaN
#2015-01-01 10:00:00     6
#2015-01-01 11:00:00     7
#2015-01-01 12:00:00   NaN
#2015-01-01 13:00:00     8
#2015-01-01 14:00:00   NaN
#2015-01-01 15:00:00     9
#Freq: 60T, dtype: float64
# fill_method= is the pre-0.18 pandas API
series_rs = series.resample('60T', how='mean', fill_method='bfill')
print(series_rs)
#2015-01-01 00:00:00    0
#2015-01-01 01:00:00    1
#2015-01-01 02:00:00    2
#2015-01-01 03:00:00    2
#2015-01-01 04:00:00    3
#2015-01-01 05:00:00    3
#2015-01-01 06:00:00    4
#2015-01-01 07:00:00    5
#2015-01-01 08:00:00    5
#2015-01-01 09:00:00    6
#2015-01-01 10:00:00    6
#2015-01-01 11:00:00    7
#2015-01-01 12:00:00    8
#2015-01-01 13:00:00    8
#2015-01-01 14:00:00    9
#2015-01-01 15:00:00    9
#Freq: 60T, dtype: float64
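
To check which hourly bins are empty, one option is to count the samples per bin. This is only a minimal sketch: it uses the method-style resample API rather than the how= keyword shown above, and it spells the frequency as 'min' on the assumption that the 'T' alias may not be accepted by newer pandas versions. The names counts and empty_bins are illustrative.

```python
import pandas as pd

# Same data as the example above: 10 samples spaced 100 minutes apart.
ts = pd.date_range('2015-01-01', periods=10, freq='100min')
series = pd.Series(range(10), index=ts)

# Number of original samples that fall into each 60-minute bin.
counts = series.resample('60min').count()

# Bins with zero samples are exactly the ones where the mean is NaN.
empty_bins = counts[counts == 0].index
means = series.resample('60min').mean()

print(empty_bins)
print(means.loc[empty_bins])
```

A bin with a count of zero has nothing to average, which is why the mean for that bin is NaN even though the input series has no missing values.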

Answered by Bart Bisschops

Please note that fill_method has now been deprecated. resample() now returns a resampling object on which you can perform operations, just like a groupby object.

Common downsampling operations:

.mean()
.sum()
.agg()
.apply()

Upsampling operations:

.ffill()
.bfill()
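
The operations listed above can be sketched on a small binary series. The data and variable names here are made up for illustration, and the 'min' frequency alias is an assumption about what the installed pandas version accepts.

```python
import pandas as pd

# Hypothetical binary series sampled every 2 hours.
idx = pd.date_range('2015-01-01', periods=4, freq='120min')
s = pd.Series([1, 0, 1, 1], index=idx)

# Downsampling: aggregate the 2-hour samples into 4-hour bins.
r_down = s.resample('240min')
total = r_down.sum()                  # one value per 4-hour bin
stats = r_down.agg(['mean', 'sum'])   # DataFrame with one column per function

# Upsampling: place the 2-hour samples on a 1-hour grid and fill the gaps.
r_up = s.resample('60min')
fwd = r_up.ffill()   # each gap takes the previous observed value
bwd = r_up.bfill()   # each gap takes the next observed value

print(stats)
print(fwd)
print(bwd)
```

For a binary variable, forward or backward filling keeps the values at 0 or 1, whereas a mean over a bin can produce fractions.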

See the what's-new notes in the documentation: https://pandas.pydata.org/pandas-docs/stable/whatsnew.html#whatsnew-0180-breaking-resample

So the example would become:

series_rs = series.resample('60T').mean()
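
As a fuller sketch of the same idea, the filled variant from the first answer could be written by chaining a fill onto the resampled mean. This assumes a pandas version with the method-style API, and it uses the 'min' alias in place of 'T' on the assumption that newer versions prefer it. The names filled_back and filled_fwd are illustrative.

```python
import pandas as pd

# Same data as the first answer: 10 samples spaced 100 minutes apart.
ts = pd.date_range('2015-01-01', periods=10, freq='100min')
series = pd.Series(range(10), index=ts)

# Hourly means; bins without any sample come out as NaN.
means = series.resample('60min').mean()

# Fill the empty bins from the next or the previous non-empty bin.
filled_back = means.bfill()
filled_fwd = means.ffill()

print(filled_back)
```

Here filled_back plays the role of the how='mean', fill_method='bfill' call in the older API, and filled_fwd corresponds to fill_method='ffill'.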