pandas - 扩展 DataFrame 的索引将新行的所有列设置为 NaN?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/19119039/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
pandas - Extend Index of a DataFrame setting all columns for new rows to NaN?
提问by paul
I have time-indexed data:
我有时间索引数据:
df2 = pd.DataFrame({ 'day': pd.Series([date(2012, 1, 1), date(2012, 1, 3)]), 'b' : pd.Series([0.22, 0.3]) })
df2 = df2.set_index('day')
df2
b
day
2012-01-01 0.22
2012-01-03 0.30
What is the best way to extend this data frame so that it has one row for every day in January 2012 (say), where all columns are set to NaN(here only b) where we don't have data?
扩展此数据框的最佳方法是什么,使其在 2012 年 1 月的每一天都有一行(例如),其中所有列都设置为NaN(仅b在此处)我们没有数据的地方?
So the desired result would be:
所以想要的结果是:
b
day
2012-01-01 0.22
2012-01-02 NaN
2012-01-03 0.30
2012-01-04 NaN
...
2012-01-31 NaN
Many thanks!
非常感谢!
回答by Mark
Use this:
用这个:
ix = pd.DatetimeIndex(start=date(2012, 1, 1), end=date(2012, 1, 31), freq='D')
df2.reindex(ix)
Which gives:
这使:
b
2012-01-01 0.22
2012-01-02 NaN
2012-01-03 0.30
2012-01-04 NaN
2012-01-05 NaN
[...]
2012-01-29 NaN
2012-01-30 NaN
2012-01-31 NaN
回答by EdChum
You can resample passing day as frequency, without specifying a fill_methodparameter missing values will be NaNfilled as you desired
您可以重新采样过去的日期作为频率,而不指定fill_method参数缺失值将根据需要NaN填充
df3 = df2.asfreq('D')
df3
Out[16]:
b
2012-01-01 0.22
2012-01-02 NaN
2012-01-03 0.30
To answer your second part, I can't think of a more elegant way at the moment:
回答你的第二部分,我目前想不出更优雅的方式:
df3 = DataFrame({ 'day': Series([date(2012, 1, 4), date(2012, 1, 31)])})
df3.set_index('day',inplace=True)
merged = df2.append(df3)
merged = merged.asfreq('D')
merged
Out[46]:
b
2012-01-01 0.22
2012-01-02 NaN
2012-01-03 0.30
2012-01-04 NaN
2012-01-05 NaN
2012-01-06 NaN
2012-01-07 NaN
2012-01-08 NaN
2012-01-09 NaN
2012-01-10 NaN
2012-01-11 NaN
2012-01-12 NaN
2012-01-13 NaN
2012-01-14 NaN
2012-01-15 NaN
2012-01-16 NaN
2012-01-17 NaN
2012-01-18 NaN
2012-01-19 NaN
2012-01-20 NaN
2012-01-21 NaN
2012-01-22 NaN
2012-01-23 NaN
2012-01-24 NaN
2012-01-25 NaN
2012-01-26 NaN
2012-01-27 NaN
2012-01-28 NaN
2012-01-29 NaN
2012-01-30 NaN
2012-01-31 NaN
This constructs a second time series and then we just append and call asfreq('D')as before.
这构建了第二个时间序列,然后我们asfreq('D')像以前一样追加和调用。
回答by Jacob
Here's another option:
First add a NaNrecord on the last day you want, then resample. This way resampling will fill the missing dates for you.
这是另一种选择:首先NaN在您想要的最后一天添加一条记录,然后重新采样。通过这种方式,重新采样将为您填补缺失的日期。
Starting Frame:
起始帧:
import pandas as pd
import numpy as np
from datetime import date
df2 = pd.DataFrame({ 'day': pd.Series([date(2012, 1, 1), date(2012, 1, 3)]), 'b' : pd.Series([0.22, 0.3]) })
df2= df2.set_index('day')
df2
Out:
b
day
2012-01-01 0.22
2012-01-03 0.30
Filled Frame:
填充框架:
df2 = df2.set_value(date(2012,1,31),'b',np.float('nan'))
df2.asfreq('D')
Out:
b
day
2012-01-01 0.22
2012-01-02 NaN
2012-01-03 0.30
2012-01-04 NaN
2012-01-05 NaN
2012-01-06 NaN
2012-01-07 NaN
2012-01-08 NaN
2012-01-09 NaN
2012-01-10 NaN
2012-01-11 NaN
2012-01-12 NaN
2012-01-13 NaN
2012-01-14 NaN
2012-01-15 NaN
2012-01-16 NaN
2012-01-17 NaN
2012-01-18 NaN
2012-01-19 NaN
2012-01-20 NaN
2012-01-21 NaN
2012-01-22 NaN
2012-01-23 NaN
2012-01-24 NaN
2012-01-25 NaN
2012-01-26 NaN
2012-01-27 NaN
2012-01-28 NaN
2012-01-29 NaN
2012-01-30 NaN
2012-01-31 NaN
回答by deckard
Not exactly the question since here you know that the second index is all days in January, but suppose you have another index say from another data frame df1, which might be disjoint and with a random frequency. Then you can do this:
不完全是问题,因为您知道第二个索引是一月份的所有天数,但假设您有另一个来自另一个数据框 df1 的索引,它可能不相交且频率随机。然后你可以这样做:
ix = pd.DatetimeIndex(list(df2.index) + list(df1.index)).unique().sort_values()
df2.reindex(ix)
Converting indices to lists allows one to create a longer list in a natural way.
将索引转换为列表允许以自然的方式创建更长的列表。
回答by Hunaphu
def extendframe(df, ndays):
"""
(df, ndays) -> df that is padded by ndays in beginning and end
"""
ixd = df.index - datetime.timedelta(ndays)
ixu = df.index + datetime.timedelta(ndays)
ixx = df.index.union(ixd.union(ixu))
df_ = df.reindex(ixx)
return df_

