Pandas TimeSeries resampling produces NaN

Question

Asked by Peter Lenaers

I am resampling a Pandas TimeSeries. The time series consists of binary values (it is a categorical variable) with no missing values, but after resampling, NaNs appear. How is this possible?


I can't post any example data here since it is sensitive info, but I create and resample the series as follows:


series = pd.Series(data, ts)
series_rs = series.resample('60T', how='mean')

Answer 1

Answered by jezrael

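Upsampling converts the series to a regular time interval. Every bin on that regular grid that contains no original samples has nothing to average, so its value is NaN (which is also why the result's dtype becomes float64).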
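A minimal sketch of that explanation, using a small hypothetical binary series (not the asker's data) with a gap of several hours and the newer resample(...).mean() syntax. The hourly bins that receive no observations are exactly the ones that come out as NaN. The alias '60min' is used here; it means the same 60 minutes as '60T'.

```python
import pandas as pd

# Hypothetical binary series: no observations between 01:15 and 05:05.
ts = pd.to_datetime([
    "2015-01-01 00:10", "2015-01-01 00:40", "2015-01-01 01:15",
    "2015-01-01 05:05", "2015-01-01 05:50",
])
series = pd.Series([0, 1, 1, 0, 1], index=ts)

# Mean per 60-minute bin: for 0/1 data this is the share of 1s in that hour.
hourly_mean = series.resample("60min").mean()

# Number of original observations that fell into each bin.
hourly_count = series.resample("60min").count()

print(pd.DataFrame({"mean": hourly_mean, "count": hourly_count}))

# The NaN rows line up with the bins that received zero observations.
print((hourly_mean.isna() == (hourly_count == 0)).all())
```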


You can fill the missing values backward with fill_method='bfill', or forward with fill_method='ffill' (or its alias fill_method='pad').


import pandas as pd

ts = pd.date_range('1/1/2015', periods=10, freq='100T')
data = range(10)
series = pd.Series(data, ts)
print(series)
#2015-01-01 00:00:00    0
#2015-01-01 01:40:00    1
#2015-01-01 03:20:00    2
#2015-01-01 05:00:00    3
#2015-01-01 06:40:00    4
#2015-01-01 08:20:00    5
#2015-01-01 10:00:00    6
#2015-01-01 11:40:00    7
#2015-01-01 13:20:00    8
#2015-01-01 15:00:00    9
#Freq: 100T, dtype: int64
# Old API (deprecated since pandas 0.18.0); newer equivalent: series.resample('60T').mean()
series_rs = series.resample('60T', how='mean')
print(series_rs)
#2015-01-01 00:00:00     0
#2015-01-01 01:00:00     1
#2015-01-01 02:00:00   NaN
#2015-01-01 03:00:00     2
#2015-01-01 04:00:00   NaN
#2015-01-01 05:00:00     3
#2015-01-01 06:00:00     4
#2015-01-01 07:00:00   NaN
#2015-01-01 08:00:00     5
#2015-01-01 09:00:00   NaN
#2015-01-01 10:00:00     6
#2015-01-01 11:00:00     7
#2015-01-01 12:00:00   NaN
#2015-01-01 13:00:00     8
#2015-01-01 14:00:00   NaN
#2015-01-01 15:00:00     9
#Freq: 60T, dtype: float64
# Old API; newer equivalent: series.resample('60T').mean().bfill()
series_rs = series.resample('60T', how='mean', fill_method='bfill')
print(series_rs)
#2015-01-01 00:00:00    0
#2015-01-01 01:00:00    1
#2015-01-01 02:00:00    2
#2015-01-01 03:00:00    2
#2015-01-01 04:00:00    3
#2015-01-01 05:00:00    3
#2015-01-01 06:00:00    4
#2015-01-01 07:00:00    5
#2015-01-01 08:00:00    5
#2015-01-01 09:00:00    6
#2015-01-01 10:00:00    6
#2015-01-01 11:00:00    7
#2015-01-01 12:00:00    8
#2015-01-01 13:00:00    8
#2015-01-01 14:00:00    9
#2015-01-01 15:00:00    9
#Freq: 60T, dtype: float64

Answer 2

Answered by Bart Bisschops

Please note that the how= and fill_method= arguments have been deprecated since pandas 0.18.0. resample() now returns a Resampler object on which you perform operations just like on a groupby object.
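To make the groupby comparison concrete, here is a small sketch reusing the ten-point series from the first answer, assuming a pandas version with the 0.18+ resample API. resample() on its own only builds a Resampler object; calling .mean() on it gives the same result as grouping the index into 60-minute bins with pd.Grouper(freq=...), empty bins included.

```python
import pandas as pd

# Same construction as the first answer: ten points, 100 minutes apart.
ts = pd.date_range("2015-01-01", periods=10, freq="100min")
series = pd.Series(range(10), index=ts)

# resample() returns a Resampler object, not a Series; nothing is aggregated yet.
resampler = series.resample("60min")
print(type(resampler).__name__)

# Aggregating it behaves like a groupby over 60-minute time bins.
via_resample = resampler.mean()
via_groupby = series.groupby(pd.Grouper(freq="60min")).mean()
print(via_resample.equals(via_groupby))
```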


Common downsampling operations:


.mean()
.sum()
.agg()
.apply()
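A sketch of those four operations on hypothetical binary data (alternating 0/1 values on the same 100-minute grid as the first answer), downsampled to 4-hour bins so each bin holds several observations. For a 0/1 variable, .mean() is the share of 1s and .sum() the number of 1s. The function passed to .apply() (max minus min) is only an illustrative choice that flags bins in which the value changed.

```python
import pandas as pd

# Hypothetical 0/1 values on a 100-minute grid.
ts = pd.date_range("2015-01-01", periods=10, freq="100min")
binary = pd.Series([i % 2 for i in range(10)], index=ts)

# Downsample to 4-hour (240-minute) bins.
r = binary.resample("240min")

share_of_ones = r.mean()                          # fraction of 1s per bin
number_of_ones = r.sum()                          # count of 1s per bin
summary = r.agg(["mean", "sum", "count"])         # several aggregations at once
changed = r.apply(lambda s: s.max() - s.min())    # 1 if both 0 and 1 occur in the bin

print(summary)
print(changed)
```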

Upsampling operations:


.ffill()
.bfill()
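A sketch contrasting these upsampling methods with the aggregate-then-fill pattern, reusing the first answer's ten-point series. Called directly on the Resampler, .ffill() and .bfill() reindex the original values onto the hourly grid (each grid point takes the nearest original value at or before it, or at or after it), whereas .mean().ffill() first averages what falls inside each bin and only then fills the empty bins. The two can disagree: at 01:00 the direct forward fill still sees the 00:00 value, while the mean of the 01:00 bin already includes the 01:40 observation.

```python
import pandas as pd

ts = pd.date_range("2015-01-01", periods=10, freq="100min")
series = pd.Series(range(10), index=ts)

# Upsampling: put the data on an hourly grid and fill each grid point from the
# nearest original timestamp at or before it (ffill) or at or after it (bfill).
up_ffill = series.resample("60min").ffill()
up_bfill = series.resample("60min").bfill()

# Aggregate-then-fill: average each hourly bin, then fill the empty (NaN) bins.
agg_ffill = series.resample("60min").mean().ffill()

at_one = pd.Timestamp("2015-01-01 01:00")
print(up_ffill.loc[at_one], agg_ffill.loc[at_one])
print(up_bfill.tolist())
```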

See the "What's new" entry in the documentation: https://pandas.pydata.org/pandas-docs/stable/whatsnew.html#whatsnew-0180-breaking-resample


So the example would become:


series_rs = series.resample('60T').mean()
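With the newer API, the fill step from the first answer becomes a separate call chained after the aggregation. A minimal sketch, assuming pandas 0.18 or later, rebuilding the first answer's series; it uses '100min'/'60min' instead of '100T'/'60T' because recent pandas releases deprecate the 'T' alias. The backward-filled values match the fill_method='bfill' output shown in the first answer.

```python
import pandas as pd

ts = pd.date_range("2015-01-01", periods=10, freq="100min")
series = pd.Series(range(10), index=ts)

hourly = series.resample("60min").mean()   # still NaN in the empty hourly bins
backfilled = hourly.bfill()                # counterpart of how='mean', fill_method='bfill'
forwardfilled = hourly.ffill()             # counterpart of how='mean', fill_method='ffill'

print(backfilled)
```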
