Disclaimer: this page reproduces a popular StackOverflow question and its answer under the CC BY-SA 4.0 license. If you use it, you must also follow CC BY-SA, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/28466772/

Date: 2020-09-13 22:56:31  Source: igfitidea

Filling date gaps in pandas dataframe

Tags: python, datetime, csv, pandas, pad

Asked by sten

I have a pandas DataFrame (loaded from a .csv file) with a datetime index, where there is, or should be, one entry per day. The problem is that there are gaps, i.e. days for which I have no data at all. What is the easiest way to insert rows (days) into the gaps? Also, is there a way to control what gets inserted into the data columns? For example: 0, OR a copy of the previous day's values, OR values that slide up or down from the previous date's values toward the next date's values.

thanks

Here is an example where 01-03 and 01-04 are missing:

In [60]: df['2015-01-06':'2015-01-01']
Out[60]: 
           Rate  High (est)  Low (est)
Date                                      
2015-01-06  1.19643      0.0000     0.0000
2015-01-05  1.20368      1.2186     1.1889
2015-01-02  1.21163      1.2254     1.1980
2015-01-01  1.21469      1.2282     1.2014
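
A minimal sketch for the sample above, assuming the index is already a DatetimeIndex: build one label per calendar day with pd.date_range, reindex onto it, and then apply one of the three fill policies the question asks about: fillna(0), ffill() to copy the previous day, or interpolate(method="time") for a straight-line ramp between the neighbouring dates. This reindex approach is an alternative to resample(), not the asker's own code.

```python
import pandas as pd

# The question's sample rows, newest first as in the printout above
df = pd.DataFrame(
    {
        "Rate": [1.19643, 1.20368, 1.21163, 1.21469],
        "High (est)": [0.0, 1.2186, 1.2254, 1.2282],
        "Low (est)": [0.0, 1.1889, 1.1980, 1.2014],
    },
    index=pd.DatetimeIndex(
        ["2015-01-06", "2015-01-05", "2015-01-02", "2015-01-01"], name="Date"
    ),
)

# One label per calendar day between the first and last date
full_range = pd.date_range(df.index.min(), df.index.max(), freq="D", name="Date")

# Sort oldest first, then reindex: 2015-01-03 and 2015-01-04 appear as all-NaN rows
daily = df.sort_index().reindex(full_range)

zeros = daily.fillna(0)                    # option 1: insert 0
padded = daily.ffill()                     # option 2: copy the previous day's values
sloped = daily.interpolate(method="time")  # option 3: straight line between neighbours

print(sloped["Rate"])
```

With a daily index, method="time" gives the same result as plain linear interpolation; it only differs when the labels are unevenly spaced.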


Still experimenting, but this seems to solve the problem:

df.set_index(pd.DatetimeIndex(df.Date),inplace=True)

and then resample. The reason is that importing the .csv with a header column named Date does not actually create a DatetimeIndex but a FrozenList, whatever that means, and resample() expects: if isinstance(ax, DatetimeIndex): .....
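
The index problem described above usually comes from the dates being read as plain strings. A minimal sketch, using a small in-memory CSV whose contents are a hypothetical stand-in mirroring the question's columns: passing parse_dates together with index_col to pd.read_csv yields a DatetimeIndex directly, so resample() accepts it without rebuilding the index by hand.

```python
import io

import pandas as pd

# Small in-memory stand-in for the .csv file (hypothetical, same columns as the question)
csv_text = """Date,Rate,High (est),Low (est)
2015-01-06,1.19643,0.0000,0.0000
2015-01-05,1.20368,1.2186,1.1889
2015-01-02,1.21163,1.2254,1.1980
2015-01-01,1.21469,1.2282,1.2014
"""

# Without parse_dates the Date values stay plain strings, so this is not a DatetimeIndex
raw = pd.read_csv(io.StringIO(csv_text), index_col="Date")
print(type(raw.index).__name__)

# With parse_dates the Date column becomes a DatetimeIndex that resample() accepts
df = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=["Date"])
print(type(df.index).__name__)  # DatetimeIndex

# Sort oldest first, then upsample to one row per day, padding from the previous day
daily = df.sort_index().resample("D").ffill()
print(len(daily))  # 6
```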

Here is my final solution:

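  # make the dates the index
  self.df.set_index(pd.DatetimeIndex(self.df.Date), inplace=True)
  # fill the gaps, padding each new day with the previous day's values
  # (current pandas no longer accepts fill_method=; chain .ffill() instead)
  self.df = self.df.resample('D').ffill()
  # fix the Date column, which was padded along with the data
  self.df.Date = self.df.index.values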

I had to fix the Date column because resample() only pads it, so each new row gets the previous day's Date value. It does build the index correctly, though, so I could use the index to fix the Date column.
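
A sketch of the same three steps on current pandas, using the question's sample rows as data. It assumes a recent pandas release, where resample() no longer takes fill_method and the padding is chained as .ffill(); the Date column is then rebuilt from the index as described above.

```python
import pandas as pd

# The question's sample rows, with Date kept as an ordinary string column
df = pd.DataFrame(
    {
        "Date": ["2015-01-01", "2015-01-02", "2015-01-05", "2015-01-06"],
        "Rate": [1.21469, 1.21163, 1.20368, 1.19643],
    }
)

# make the parsed dates the index (the Date column itself is kept)
df = df.set_index(pd.DatetimeIndex(df["Date"]))

# fill the gaps: one row per day, padded with the previous day's values
df = df.resample("D").ffill()

# ffill also copied the old Date strings into the new rows, so rebuild them from the index
df["Date"] = df.index
print(df)
```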

Here is a snippet of the data after the correction:

2015-01-29 2015-01-29  1.13262      0.0000     0.0000
2015-01-30 2015-01-30  1.13161      1.1450     1.1184
2015-01-31 2015-01-31  1.13161      1.1450     1.1184
2015-02-01 2015-02-01  1.13161      1.1450     1.1184

01-31 and 02-01 (a weekend) are the newly generated rows; their values are copied from 01-30.
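
To check which rows were generated rather than read from the file, one option (a sketch, not part of the original solution) is to test each date of the filled index against the original index with Index.isin. The "generated" column name is hypothetical; the data are the question's sample rows.

```python
import pandas as pd

# The question's sample rows (Rate column only) before filling
original = pd.DataFrame(
    {"Rate": [1.21469, 1.21163, 1.20368, 1.19643]},
    index=pd.DatetimeIndex(["2015-01-01", "2015-01-02", "2015-01-05", "2015-01-06"]),
)
filled = original.resample("D").ffill()

# True for days that resample() inserted, False for days that came from the data
filled["generated"] = ~filled.index.isin(original.index)

print(filled[filled["generated"]].index.strftime("%Y-%m-%d").tolist())
# ['2015-01-03', '2015-01-04']
```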

Answer by Andy Hayden

You could resample by day, e.g. using the mean if there are multiple entries per day:

df.resample('D').mean()  # older pandas: df.resample('D', how='mean')

You can then ffill to replace the NaNs with the previous day's result.
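
A minimal sketch of the two steps together, assuming hypothetical intraday readings (two on 2015-01-01, none on 2015-01-02): resample('D').mean() averages each day and leaves a NaN for the empty day, and ffill() then carries the previous day's mean forward. The chained .mean() form is the spelling current pandas accepts; the how= keyword belongs to older releases.

```python
import pandas as pd

# Hypothetical readings: two on 2015-01-01, none on 2015-01-02, one on 2015-01-03
df = pd.DataFrame(
    {"Rate": [1.20, 1.22, 1.19]},
    index=pd.DatetimeIndex(
        ["2015-01-01 09:00", "2015-01-01 17:00", "2015-01-03 09:00"]
    ),
)

# Downsample to one row per day; days with several entries are averaged,
# and the empty day 2015-01-02 comes out as NaN
daily = df.resample("D").mean()

# Replace the NaN with the previous day's result
daily = daily.ffill()
print(daily)
```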

See up and down sampling in the docs.
