Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/17185942/

Time: 2020-09-13 20:55:54  Source: igfitidea

pandas resampling dataframe and keep datetime index as a column

pandas pivot

Asked by ybb

I'm trying to resample daily data to weekly data using pandas.

I'm using the following:

weekly_start_date = pd.Timestamp('01/05/2011')
weekly_end_date = pd.Timestamp('05/28/2013')

daily_data = daily_data[(daily_data["date"] >= weekly_start_date) & (daily_data["date"] <= weekly_end_date)]

daily_data = daily_data.set_index('date', drop=False)
weekly_data = daily_data.resample('7D', how=np.sum, closed='left', label='left')

The problem is weekly_data doesn't have the date column anymore.

What did I miss?

Thanks,

Answered by Briford Wylie

If I understand your question, it looks like you're doing the resampling correctly (pandas docs on resampling: http://pandas.pydata.org/pandas-docs/stable/timeseries.html).

  weekly_data = daily_data.resample('7D',how=np.sum,closed='left',label='left')

If the only issue is that you'd like the DatetimeIndex replicated in a column, you can just do this:

  weekly_data['date'] = weekly_data.index.values
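
For readers on newer pandas releases, where the aggregation is written as a method call on the resampler instead of a how= argument, the same idea might look roughly like the sketch below. The daily_data frame and its "value" column are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical daily data: one row per day with a single numeric column.
daily_data = pd.DataFrame({
    "date": pd.date_range("2011-01-05", periods=14, freq="D"),
    "value": range(14),
})

# Sum into 7-day bins; the left edge of each bin becomes the index label.
weekly_data = (
    daily_data.set_index("date")
    .resample("7D", closed="left", label="left")
    .sum()
)

# Option 1: copy the index into a column (the index itself is kept too).
with_column = weekly_data.copy()
with_column["date"] = with_column.index

# Option 2: move the index back into a regular column.
flat = weekly_data.reset_index()
print(flat)
```

Option 1 keeps the DatetimeIndex for further time-based operations, while option 2 gives a plain integer index with "date" as an ordinary column.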

Apologies if I misunderstood the question. :)

Answered by Andy Hayden

Resampling only aggregates the numeric columns:

In [11]: df = pd.DataFrame([[pd.Timestamp('1/1/2012'), 1, 'a', [1]], [pd.Timestamp('1/2/2012'), 2, 'b', [2]]], columns=['date', 'no', 'letter', 'li'])

In [12]: df1 = df.set_index('date', drop=False)

In [13]: df1
Out[13]:
                          date  no letter   li
date
2012-01-01 2012-01-01 00:00:00   1      a  [1]
2012-01-02 2012-01-02 00:00:00   2      b  [2]

In [15]: df1.resample('M', how=np.sum)
Out[15]:
            no
date
2012-01-31   3
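
As an aside, on newer pandas releases one way to get a comparable result is to select the numeric column explicitly and then call reset_index() to bring the dates back as a column. This is only a sketch: it drops the list column for brevity and uses the "MS" (month start) frequency, so the bin label differs from the month-end label in the session above.

```python
import pandas as pd

df = pd.DataFrame(
    [[pd.Timestamp("2012-01-01"), 1, "a"], [pd.Timestamp("2012-01-02"), 2, "b"]],
    columns=["date", "no", "letter"],
)

# Resample on the "date" column and aggregate only the numeric column.
monthly = df.resample("MS", on="date")[["no"]].sum()

# reset_index() turns the DatetimeIndex back into a "date" column.
monthly_flat = monthly.reset_index()
print(monthly_flat)
```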

We can see that it uses the dtype to determine whether it's numeric:

In [16]: df1.no = df1.no.astype(object)

In [17]: df1.resample('M', how=sum)
Out[17]:
            date  no  letter  li
date
2012-01-31     0   0       0   0
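
To check which columns pandas treats as numeric, select_dtypes can be used. The small frame below is an illustrative stand-in for df1, not the original data.

```python
import pandas as pd

df = pd.DataFrame({"no": [1, 2], "letter": ["a", "b"]})

# Columns with a numeric dtype before any casting.
numeric_before = list(df.select_dtypes(include="number").columns)

# Casting to object hides the integers from dtype-based checks.
df["no"] = df["no"].astype(object)
numeric_after = list(df.select_dtypes(include="number").columns)

# pd.to_numeric restores a numeric dtype.
df["no"] = pd.to_numeric(df["no"])
numeric_restored = list(df.select_dtypes(include="number").columns)

print(numeric_before, numeric_after, numeric_restored)
```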

An awful hack for actual summing:

In [21]: rng = pd.date_range(weekly_start_date, weekly_end_date, freq='M')

In [22]: g = df1.groupby(rng.asof)

In [23]: g.apply(lambda t: t.apply(lambda x: x.sum(1))).unstack()
Out[23]:
                           date no letter      li
2011-12-31  2650838400000000000  3     ab  [1, 2]

The date is the sum of the epoch nanoseconds...
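
The figure in the date column can be reproduced by adding the two timestamps' nanosecond values directly:

```python
import pandas as pd

# Timestamp.value is the number of nanoseconds since 1970-01-01.
first = pd.Timestamp("2012-01-01").value
second = pd.Timestamp("2012-01-02").value

total = first + second
print(total)
```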

(Hopefully I'm doing something silly, and there is an easier way!)
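
One possibly simpler route on newer pandas releases is to pass a per-column mapping to agg, so each column gets an aggregation that suits its type. This is a sketch under assumptions rather than the answer's method: it leaves out the list column and uses "MS" (month start) bins.

```python
import pandas as pd

df = pd.DataFrame(
    [[pd.Timestamp("2012-01-01"), 1, "a"], [pd.Timestamp("2012-01-02"), 2, "b"]],
    columns=["date", "no", "letter"],
)

# Sum the numbers and concatenate the strings within each monthly bin.
monthly = df.resample("MS", on="date").agg({"no": "sum", "letter": "".join})

# Bring the bin label back as a regular "date" column.
monthly_flat = monthly.reset_index()
print(monthly_flat)
```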
