pandas resampling dataframe and keep datetime index as a column
Original question: http://stackoverflow.com/questions/17185942/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): Stack Overflow
Question by ybb
I'm trying to resample daily data to weekly data using pandas.
I'm using the following:
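import pandas as pd
weekly_start_date = pd.Timestamp('01/05/2011')
weekly_end_date = pd.Timestamp('05/28/2013')
# keep only the rows inside the date window
daily_data = daily_data[(daily_data["date"] >= weekly_start_date) & (daily_data["date"] <= weekly_end_date)]
# use 'date' as the DatetimeIndex but keep it as a column too (drop=False)
daily_data = daily_data.set_index('date', drop=False)
# modern form of resample('7D', how=np.sum, closed='left', label='left'); like how=np.sum, it skips non-numeric columns
weekly_data = daily_data.resample('7D', closed='left', label='left').sum(numeric_only=True)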
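A self-contained reproduction of the symptom, as a sketch: the toy `daily_data` below (one numeric `value` column, 14 days starting 2011-01-05) is hypothetical, and the code assumes pandas 2.x, where `how=` no longer exists and `numeric_only=True` has to be passed explicitly to get the old behaviour of skipping non-numeric columns.

```python
import pandas as pd

# Hypothetical stand-in for the asker's daily_data: a datetime 'date'
# column plus one numeric column.
daily_data = pd.DataFrame({
    "date": pd.date_range("2011-01-05", periods=14, freq="D"),
    "value": range(14),
})

# Same step as the question: index by date but keep the column (drop=False).
daily_data = daily_data.set_index("date", drop=False)

# numeric_only=True mimics the old how=np.sum: non-numeric columns are skipped,
# so the datetime 'date' column is not part of the result.
weekly_data = daily_data.resample("7D", closed="left", label="left").sum(numeric_only=True)

print(weekly_data)
print("date" in weekly_data.columns)  # False
```

The weekly bins are still labelled by the resampled DatetimeIndex; only the duplicate `date` column disappears.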
The problem is weekly_data doesn't have the date column anymore.
What did I miss?
Thanks,
Answer by Briford Wylie
If I understand your question, it looks like you're doing the resampling correctly (pandas docs on resampling: http://pandas.pydata.org/pandas-docs/stable/timeseries.html).
weekly_data = daily_data.resample('7D', closed='left', label='left').sum(numeric_only=True)
If the only issue is that you'd like the DatetimeIndex replicated in a column, you can just do this:
weekly_data['date'] = weekly_data.index.values
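A runnable sketch of that fix on hypothetical toy data (14 daily values, dates held only in the index), assuming a recent pandas. The `reset_index()` variant is an alternative not mentioned in the answer.

```python
import pandas as pd

# Hypothetical daily data; the dates live only in the DatetimeIndex.
daily_data = pd.DataFrame(
    {"value": range(14)},
    index=pd.date_range("2011-01-05", periods=14, freq="D", name="date"),
)

weekly_data = daily_data.resample("7D").sum()

# The answer's approach: copy the resampled DatetimeIndex into a column
# (weekly_data.index.values, as in the answer, gives the same dates).
weekly_data["date"] = weekly_data.index

# Alternative not in the answer: move the index back into a column.
# reset_index() raises if a 'date' column already exists, so it is
# applied to a fresh resample result here.
weekly_flat = daily_data.resample("7D").sum().reset_index()

print(weekly_data)
print(weekly_flat)
```

Copying keeps the DatetimeIndex for further time-based operations; `reset_index()` gives a plain integer index instead.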
Apologies if I misunderstood the question. :)
Answer by Andy Hayden
You can only resample numeric columns; non-numeric ones are dropped from the result:
In [11]: df = pd.DataFrame([[pd.Timestamp('1/1/2012'), 1, 'a', [1]], [pd.Timestamp('1/2/2012'), 2, 'b', [2]]], columns=['date', 'no', 'letter', 'li'])
In [12]: df1 = df.set_index('date', drop=False)
In [13]: df1
Out[13]:
                           date  no  letter   li
date
2012-01-01  2012-01-01 00:00:00   1       a  [1]
2012-01-02  2012-01-02 00:00:00   2       b  [2]
In [15]: df1.resample('M', how=np.sum)
Out[15]:
            no
date
2012-01-31   3
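The session above uses the pre-0.18 `how=` keyword. A sketch of the same experiment on pandas 2.x (an assumption about the reader's version), where `numeric_only=True` reproduces the column dropping. The list column `li` is left out to keep the toy frame simple, and `"MS"` (month start) is used instead of `"M"` to avoid the `"M"` to `"ME"` alias change in recent pandas, so the bin is labelled 2012-01-01 rather than 2012-01-31.

```python
import pandas as pd

# The toy frame from the session, minus the list column.
df = pd.DataFrame(
    [[pd.Timestamp("2012-01-01"), 1, "a"], [pd.Timestamp("2012-01-02"), 2, "b"]],
    columns=["date", "no", "letter"],
)
df1 = df.set_index("date", drop=False)

# numeric_only=True keeps only numeric-dtype columns, matching what
# how=np.sum did: 'date' and 'letter' are left out.
monthly = df1.resample("MS").sum(numeric_only=True)
print(monthly)
```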
We can see that it uses the dtype to determine whether it's numeric:
In [16]: df1.no = df1.no.astype(object)
In [17]: df1.resample('M', how=sum)
Out[17]:
            date  no  letter  li
date
2012-01-31     0   0       0   0
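To see the dtype rule directly, here is a small sketch using `pandas.api.types.is_numeric_dtype` on a hypothetical frame (assuming pandas 2.x): an object-dtype copy of the integer column is treated as non-numeric even though it holds the same integers.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

df1 = pd.DataFrame(
    {"no": [1, 2], "letter": ["a", "b"]},
    index=pd.DatetimeIndex(["2012-01-01", "2012-01-02"], name="date"),
)
# Same integers as 'no', but stored with object dtype (as in In [16]).
df1["no_obj"] = df1["no"].astype(object)

# pandas decides "numeric" from the column dtype, not from the values.
print({col: is_numeric_dtype(df1[col]) for col in df1.columns})
# {'no': True, 'letter': False, 'no_obj': False}

# So the object column is skipped even though it holds integers.
print(df1.resample("MS").sum(numeric_only=True).columns.tolist())
# ['no']
```

Converting the column back, e.g. with `df1["no_obj"].astype("int64")` or `df1.infer_objects()`, makes it numeric again.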
An awful hack for actual summing:
In [21]: rng = pd.date_range(weekly_start_date, weekly_end_date, freq='M')
In [22]: g = df1.groupby(rng.asof)
In [23]: g.apply(lambda t: t.apply(lambda x: x.sum(1))).unstack()
Out[23]:
                           date  no  letter      li
2011-12-31  2650838400000000000   3      ab  [1, 2]
The date is the sum of the two timestamps as epoch nanoseconds (1325376000000000000 + 1325462400000000000)...
(Hopefully I'm doing something silly, and there is an easier way!)
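One possibly easier route on current pandas (an assumption; this answer predates the API) is to pass `resample(...).agg()` a dict mapping each column to its own aggregation, so the datetime column gets something meaningful such as `"first"` instead of being summed. A sketch on a toy frame where `date` is both the index and a column:

```python
import pandas as pd

# Toy frame like the one in the session: 'date' is both the index and a column.
df1 = pd.DataFrame(
    {"date": pd.to_datetime(["2012-01-01", "2012-01-02"]), "no": [1, 2]}
).set_index("date", drop=False)

# One aggregation per column: sum the numbers and keep the first timestamp
# of each bin, so the date survives as an ordinary datetime column.
out = df1.resample("MS").agg({"no": "sum", "date": "first"})
print(out)
```

`"min"` or `"last"` work the same way for the date column if a different representative timestamp is wanted.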

