pandas: filling missing indices

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/20392265/

Time: 2020-09-13 21:24:30  Source: igfitidea

fill missing indices in pandas

python pandas

Asked by qua

I have data like the following:

import pandas as pd
from datetime import datetime

x = pd.Series([1, 2, 4], [datetime(2013, 11, 1), datetime(2013, 11, 2), datetime(2013, 11, 4)])

The missing index at November 3rd corresponds to a zero value, and I want it to look like this:

y = pd.Series([1, 2, 0, 4], pd.date_range('2013-11-01', periods=4))

What's the best way to convert x to y? I've tried:

y = pd.Series(x, pd.date_range('2013-11-1', periods=4)).fillna(0)

This sometimes throws an index error that I can't interpret ("Index length did not match values"), even though the index and the data have the same length. Is there a better way to do this?

Answered by Roman Pekar

You can use pandas.Series.resample() for this:

>>> x.resample('D').fillna(0)
2013-11-01    1
2013-11-02    2
2013-11-03    0
2013-11-04    4
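
The call above relies on resample() returning an aggregated Series directly, which was the behaviour when this answer was written. As a hedged sketch for more recent pandas versions, where resample() returns a Resampler object, the same result can be obtained either by aggregating explicitly and then filling, or by upsampling with asfreq(fill_value=0). The variable names filled_mean and filled_asfreq are only illustrative:

```python
import pandas as pd
from datetime import datetime

x = pd.Series([1, 2, 4],
              [datetime(2013, 11, 1), datetime(2013, 11, 2), datetime(2013, 11, 4)])

# Aggregate each daily bin explicitly, then replace the NaN of the empty bin.
# The result is float because the intermediate series contains NaN.
filled_mean = x.resample('D').mean().fillna(0)

# Or upsample without aggregating and fill the newly created rows with 0.
filled_asfreq = x.resample('D').asfreq(fill_value=0)

print(filled_mean)
print(filled_asfreq)
```

The first form averages duplicate timestamps; the second does no aggregation at all, so it is closer in spirit to reindex().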

There is a fill_method parameter in the resample() function, but I don't know whether it can be used to replace NaN during resampling. It looks like you can use the how parameter to take care of it, though:

>>> x.resample('D', how=lambda x: x.mean() if len(x) > 0 else 0)
2013-11-01    1
2013-11-02    2
2013-11-03    0
2013-11-04    4
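
As far as I know, newer pandas versions no longer accept how= in resample(). One possible single-step alternative, assuming that summing (rather than averaging) duplicate timestamps is acceptable for your data, is to aggregate with sum(), which yields 0 for an empty bin. This is a sketch of a swapped-in technique, not a drop-in replacement for the lambda above:

```python
import pandas as pd
from datetime import datetime

x = pd.Series([1, 2, 4],
              [datetime(2013, 11, 1), datetime(2013, 11, 2), datetime(2013, 11, 4)])

# The sum of an empty daily bin is 0, so no separate fill step is needed.
# Note: duplicate timestamps would be added together, not averaged.
daily = x.resample('D').sum()

print(daily)
```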

I don't know which method is preferred. Please also take a look at @AndyHayden's answer: reindex() with fill_value=0 would probably be the most efficient way to do this, but you should run your own tests.

Answered by Andy Hayden

I think I would use resample (note that if there are duplicates, it takes the mean by default):

In [11]: x.resample('D')  # you could use how='first'
Out[11]: 
2013-11-01     1
2013-11-02     2
2013-11-03   NaN
2013-11-04     4
Freq: D, dtype: float64

In [12]: x.resample('D').fillna(0)
Out[12]: 
2013-11-01    1
2013-11-02    2
2013-11-03    0
2013-11-04    4
Freq: D, dtype: float64
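
To make the remark about duplicates concrete, here is a small sketch with a hypothetical series x_dup that has two values on November 1st. It is written for recent pandas versions, where the aggregation has to be named explicitly (mean() or first()) instead of being implied or passed through how=:

```python
import pandas as pd
from datetime import datetime

# Hypothetical input: November 1st appears twice.
x_dup = pd.Series([1, 3, 4],
                  [datetime(2013, 11, 1), datetime(2013, 11, 1), datetime(2013, 11, 4)])

# mean() averages the two values that fall on the same day.
by_mean = x_dup.resample('D').mean().fillna(0)

# first() keeps the first value of each day instead.
by_first = x_dup.resample('D').first().fillna(0)

print(by_mean)
print(by_first)
```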

If you would prefer duplicates to raise an error, use reindex:

In [13]: x.reindex(pd.date_range('2013-11-1', periods=4), fill_value=0)
Out[13]: 
2013-11-01   1
2013-11-02   2
2013-11-03   0
2013-11-04   4
Freq: D, dtype: float64
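
A self-contained sketch of the reindex approach, assuming you want every day between the first and last timestamp rather than a hard-coded periods=4. The names full_range, x_dup, and raised are only illustrative. It also shows that a series with duplicate timestamps makes reindex raise instead of silently aggregating:

```python
import pandas as pd
from datetime import datetime

x = pd.Series([1, 2, 4],
              [datetime(2013, 11, 1), datetime(2013, 11, 2), datetime(2013, 11, 4)])

# Build the complete daily index from the data itself.
full_range = pd.date_range(x.index.min(), x.index.max(), freq='D')

# Missing days are created and filled with 0.
y = x.reindex(full_range, fill_value=0)
print(y)

# Hypothetical input with a duplicated timestamp: reindex refuses it.
x_dup = pd.Series([1, 3, 4],
                  [datetime(2013, 11, 1), datetime(2013, 11, 1), datetime(2013, 11, 4)])
try:
    x_dup.reindex(full_range, fill_value=0)
    raised = False
except ValueError:
    raised = True
print(raised)
```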