pandas: filling in missing index values

Question

Asked by qua

I have data like the following:

import pandas as pd
from datetime import datetime

x = pd.Series([1, 2, 4], [datetime(2013,11,1), datetime(2013,11, 2), datetime(2013, 11, 4)])

The missing index at November 3rd corresponds to a zero value, and I want it to look like this:

y = pd.Series([1,2,0,4], pd.date_range('2013-11-01', periods = 4))

What's the best way to convert x to y? I've tried:

y = pd.Series(x, pd.date_range('2013-11-1', periods = 4)).fillna(0)

This sometimes throws an index error that I can't interpret ("Index length did not match values"), even though the index and data have the same length. Is there a better way to do this?

Answer 1

Answered by Roman Pekar

You can use pandas.Series.resample() for this:

>>> x.resample('D').asfreq().fillna(0)
2013-11-01    1.0
2013-11-02    2.0
2013-11-03    0.0
2013-11-04    4.0

Older pandas versions had a fill_method parameter in resample(), but I don't know whether it could be used to replace NaN during resampling. It looks like you can instead pass a function that handles empty bins (through the how argument in older versions, through apply() in current ones), like:

>>> x.resample('D').apply(lambda g: g.mean() if len(g) > 0 else 0)
2013-11-01    1.0
2013-11-02    2.0
2013-11-03    0.0
2013-11-04    4.0

I don't know which method is preferred. Please also take a look at @AndyHayden's answer: reindex() with fill_value=0 is probably the most efficient way to do this, but you should run your own tests.
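One way to run such a test is sketched below with timeit. The benchmark series, its size, and the helper names with_reindex and with_resample are assumptions made for illustration, not part of the original answer, and the timings will depend on your data and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data: 50,000 daily timestamps with roughly 10% removed.
rng = np.random.default_rng(0)
full_range = pd.date_range("2000-01-01", periods=50_000, freq="D")
keep = rng.random(len(full_range)) > 0.1
keep[0] = keep[-1] = True  # keep both endpoints so both approaches span the same range
s = pd.Series(np.arange(len(full_range))[keep], index=full_range[keep])


def with_reindex():
    # Align to the full daily range, filling missing days with 0.
    return s.reindex(full_range, fill_value=0)


def with_resample():
    # Upsample to daily frequency, then replace the resulting NaN with 0.
    return s.resample("D").asfreq().fillna(0)


for fn in (with_reindex, with_resample):
    seconds = timeit.timeit(fn, number=20)
    print(f"{fn.__name__}: {seconds / 20:.6f} s per call")
```

Both helpers return the same values here (as integers from reindex, as floats from resample), so the comparison is only about speed.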

Answer 2

Answered by Andy Hayden

I think I would use resample (note that if there are dupes, mean() averages them):

In [11]: x.resample('D').mean()  # you could use .first() instead
Out[11]: 
2013-11-01    1.0
2013-11-02    2.0
2013-11-03    NaN
2013-11-04    4.0
Freq: D, dtype: float64

In [12]: x.resample('D').mean().fillna(0)
Out[12]: 
2013-11-01    1.0
2013-11-02    2.0
2013-11-03    0.0
2013-11-04    4.0
Freq: D, dtype: float64
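To make the point about duplicates concrete, here is a small sketch. The series dupes is made-up data (2013-11-02 appears twice, 2013-11-03 is missing), and the choice of aggregations is only an illustration of how the result depends on which one you pick.

```python
from datetime import datetime

import pandas as pd

# Hypothetical data: 2013-11-02 appears twice, 2013-11-03 is missing.
dupes = pd.Series(
    [1, 2, 6, 4],
    index=pd.DatetimeIndex([
        datetime(2013, 11, 1),
        datetime(2013, 11, 2),
        datetime(2013, 11, 2),
        datetime(2013, 11, 4),
    ]),
)

daily = dupes.resample("D")

averaged = daily.mean().fillna(0)     # duplicates are averaged: (2 + 6) / 2
first_seen = daily.first().fillna(0)  # keeps the first value seen on each day
summed = daily.sum()                  # empty days already sum to 0

print(averaged)
print(first_seen)
print(summed)
```

With sum() no fillna is needed, but it is only appropriate if adding duplicate observations together makes sense for your data.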

If you would prefer duplicates to raise an error, use reindex:

In [13]: x.reindex(pd.date_range('2013-11-1', periods=4), fill_value=0)
Out[13]: 
2013-11-01    1
2013-11-02    2
2013-11-03    0
2013-11-04    4
Freq: D, dtype: int64
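A slightly more general sketch of the reindex approach follows. Deriving the range from the data's first and last timestamps, the names full_range and dupes, and the asfreq alternative are assumptions added for illustration rather than part of the original answer.

```python
from datetime import datetime

import pandas as pd

x = pd.Series(
    [1, 2, 4],
    index=[datetime(2013, 11, 1), datetime(2013, 11, 2), datetime(2013, 11, 4)],
)

# Build the full daily range from the data instead of hard-coding periods=4.
full_range = pd.date_range(x.index.min(), x.index.max(), freq="D")
filled = x.reindex(full_range, fill_value=0)

# asfreq with fill_value is a shorter spelling of the same idea.
same = x.asfreq("D", fill_value=0)

# With a duplicated date in the source index, reindex raises instead of aggregating.
dupes = pd.Series(
    [1, 2, 6, 4],
    index=pd.DatetimeIndex(["2013-11-01", "2013-11-02", "2013-11-02", "2013-11-04"]),
)
try:
    dupes.reindex(full_range, fill_value=0)
    raised = False
except ValueError:
    raised = True

print(filled)
print(raised)
```

Because fill_value=0 is an integer and no NaN is ever introduced, the result keeps the integer dtype of x.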
