Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/13370525/

Time: 2020-09-13 20:29:23  Source: igfitidea

Filling continuous pandas dataframe from sparse dataframe

python, python-2.7, pandas

Asked by Brian Keegan

I have a dictionary named date_dict, keyed by datetime.date objects, with values corresponding to integer counts of observations. I convert this to a sparse series/dataframe with censored observations that I would like to join or convert to a series/dataframe with continuous dates. The nasty list comprehension is my hack to get around the fact that pandas apparently won't automatically convert datetime.date objects to an appropriate DateTime index.

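import datetime
import pandas as pd

# Promote each datetime.date key to a midnight datetime so pandas builds a DateTime index
df1 = pd.DataFrame(data=list(date_dict.values()),
                   index=[datetime.datetime.combine(i, datetime.time())
                          for i in date_dict.keys()],
                   columns=['Name'])
df1 = df1.sort_index()  # DataFrame.sort() was removed in later pandas releases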

This example has 1258 observations and the DateTime index runs from 2003-06-24 to 2012-11-07.

df1.head()
             Name
Date
2003-06-24   2
2003-08-13   1
2003-08-19   2
2003-08-22   1
2003-08-24   5

I can create an empty dataframe with a continuous DateTime index, but this introduces an unneeded column and seems clunky. I feel as though I'm missing a more elegant solution involving a join.

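# pd.DateRange no longer exists in pandas; pd.date_range replaces it.
# freq='B' (business days) matches the weekend gaps in the output below.
df2 = pd.DataFrame(data=None, columns=['Empty'],
                   index=pd.date_range(min(date_dict.keys()),
                                       max(date_dict.keys()), freq='B'))
df3 = df1.join(df2, how='right')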
df3.head()
            Name  Empty
2003-06-24     2    NaN
2003-06-25   NaN    NaN
2003-06-26   NaN    NaN
2003-06-27   NaN    NaN
2003-06-30   NaN    NaN

Is there a simpler or more elegant way to fill a continuous dataframe from a sparse dataframe so that (1) the index is continuous, (2) the NaNs are replaced with 0s, and (3) no leftover empty column remains in the dataframe?

            Name
2003-06-24   2
2003-06-25   0
2003-06-26   0
2003-06-27   0
2003-06-30   0

Answered by Matti John

You can just call reindex on a time series with your date range. It also looks like you would be better off using a TimeSeries (in current pandas, a Series with a DatetimeIndex) instead of a DataFrame (see documentation), although reindexing is the correct method for adding missing index values to DataFrames as well.

For example, starting with:

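import datetime
import pandas as pd

# pd.datetime was removed from pandas; use the standard library's datetime instead
date_index = pd.DatetimeIndex([datetime.datetime(2003,6,24), datetime.datetime(2003,8,13),
        datetime.datetime(2003,8,19), datetime.datetime(2003,8,22), datetime.datetime(2003,8,24)])

ts = pd.Series([2,1,2,1,5], index=date_index)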

This gives you a time series like the head of your example dataframe:

2003-06-24    2
2003-08-13    1
2003-08-19    2
2003-08-22    1
2003-08-24    5

Simply doing

ts.reindex(pd.date_range(min(date_index), max(date_index)))

then gives you a complete index, with NaNs for your missing values (you can use fillna if you want to fill the missing values with some other values - see here):

2003-06-24     2
2003-06-25   NaN
2003-06-26   NaN
2003-06-27   NaN
2003-06-28   NaN
2003-06-29   NaN
2003-06-30   NaN
2003-07-01   NaN
2003-07-02   NaN
2003-07-03   NaN
2003-07-04   NaN
2003-07-05   NaN
2003-07-06   NaN
2003-07-07   NaN
2003-07-08   NaN
2003-07-09   NaN
2003-07-10   NaN
2003-07-11   NaN
2003-07-12   NaN
2003-07-13   NaN
2003-07-14   NaN
2003-07-15   NaN
2003-07-16   NaN
2003-07-17   NaN
2003-07-18   NaN
2003-07-19   NaN
2003-07-20   NaN
2003-07-21   NaN
2003-07-22   NaN
2003-07-23   NaN
2003-07-24   NaN
2003-07-25   NaN
2003-07-26   NaN
2003-07-27   NaN
2003-07-28   NaN
2003-07-29   NaN
2003-07-30   NaN
2003-07-31   NaN
2003-08-01   NaN
2003-08-02   NaN
2003-08-03   NaN
2003-08-04   NaN
2003-08-05   NaN
2003-08-06   NaN
2003-08-07   NaN
2003-08-08   NaN
2003-08-09   NaN
2003-08-10   NaN
2003-08-11   NaN
2003-08-12   NaN
2003-08-13     1
2003-08-14   NaN
2003-08-15   NaN
2003-08-16   NaN
2003-08-17   NaN
2003-08-18   NaN
2003-08-19     2
2003-08-20   NaN
2003-08-21   NaN
2003-08-22     1
2003-08-23   NaN
2003-08-24     5
Freq: D, Length: 62
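
To tie the reindex approach back to the three requirements in the question (continuous index, zeros instead of NaNs, no extra column), here is a minimal sketch applied to a DataFrame. The small date_dict below is a made-up stand-in for the asker's data, and the names sparse, full_index and dense are illustrative only. It assumes a reasonably recent pandas, where pd.to_datetime accepts a list of datetime.date objects and reindex takes a fill_value argument.

```python
import datetime
import pandas as pd

# Hypothetical stand-in for the question's date_dict (datetime.date -> count).
date_dict = {
    datetime.date(2003, 6, 24): 2,
    datetime.date(2003, 6, 27): 1,
    datetime.date(2003, 6, 30): 5,
}

# Build the sparse frame; pd.to_datetime turns the date keys into a DatetimeIndex.
sparse = pd.DataFrame({'Name': list(date_dict.values())},
                      index=pd.to_datetime(list(date_dict.keys()))).sort_index()

# Every calendar day between the first and last observation.
full_index = pd.date_range(sparse.index.min(), sparse.index.max(), freq='D')

# fill_value=0 fills the newly created rows directly, so the column stays integer.
dense = sparse.reindex(full_index, fill_value=0)
print(dense)
```

Passing freq='B' to pd.date_range instead would keep only business days, as in the question's sample output. Calling reindex without fill_value and then fillna(0) should give the same zeros, but the column becomes floating point because of the intermediate NaNs.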