Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/37842260/

Time: 2020-09-14 01:24:18  Source: igfitidea

pandas dataframe resample per day without date time index

Tags: python, pandas, dataframe, time-series

Asked by Nikhil

I have a dataframe in pandas of the following form:

             timestamps   light
7   2004-02-28 00:58:45  150.88
26  2004-02-28 00:59:45  143.52
34  2004-02-28 01:00:45  150.88
42  2004-02-28 01:01:15  150.88
59  2004-02-28 01:02:15  150.88

Note that the index is not the timestamps column. I want to resample (or otherwise bin) the data to get the average value of the light column per minute, hour, day, etc. I have looked into the resample method that pandas offers, and it requires the dataframe to have a datetime index to work (unless I've misunderstood this).

  1. So my first question is: can I re-index the dataframe to have timestamps as the index? (Note that the timestamps are not unique: for each timestamp there are about 30 rows, each representing a sensor.)

  2. If not, is there some other way to get another dataframe that holds the average value of light per hour, per day, per month, etc.?

Any help would be appreciated.

Accepted answer by jezrael

You are right: resample needs a DatetimeIndex, TimedeltaIndex or PeriodIndex, otherwise it raises an error:

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'
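
The error can be reproduced with a minimal sketch like the one below. It assumes the five rows shown in the question and that the timestamps column has already been parsed with pd.to_datetime; the class name at the end of the message may differ between pandas versions.

```python
import pandas as pd

# Rebuild the question's frame: integer index, separate 'timestamps' column.
df = pd.DataFrame(
    {
        "timestamps": pd.to_datetime(
            [
                "2004-02-28 00:58:45",
                "2004-02-28 00:59:45",
                "2004-02-28 01:00:45",
                "2004-02-28 01:01:15",
                "2004-02-28 01:02:15",
            ]
        ),
        "light": [150.88, 143.52, 150.88, 150.88, 150.88],
    },
    index=[7, 26, 34, 42, 59],
)

# Resampling directly fails because the index holds plain integers,
# not datetimes.
raised = False
try:
    df.resample("1D").mean()
except TypeError as exc:
    raised = True
    print(exc)
```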

So you first have to call reset_index and then set_index if the original index is important:

print (df.reset_index().set_index('timestamps'))
                     index   light
timestamps                        
2004-02-28 00:58:45      7  150.88
2004-02-28 00:59:45     26  143.52
2004-02-28 01:00:45     34  150.88
2004-02-28 01:01:15     42  150.88
2004-02-28 01:02:15     59  150.88

If it is not important, set_index alone is enough:

print (df.set_index('timestamps'))
                      light
timestamps                 
2004-02-28 00:58:45  150.88
2004-02-28 00:59:45  143.52
2004-02-28 01:00:45  150.88
2004-02-28 01:01:15  150.88
2004-02-28 01:02:15  150.88

and then resample:

print (df.reset_index().set_index('timestamps').resample('1D').mean())
            index    light
timestamps                
2004-02-28   33.6  149.408
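
The question also mentions that timestamps repeat (one row per sensor). As far as I can tell, that is not a problem: set_index accepts duplicate values, and resample simply averages every row that falls into a bin. The sketch below uses made-up readings (hypothetical values, not the asker's data) and the "60min" and "1D" frequency strings to illustrate this.

```python
import pandas as pd

# Hypothetical data: two sensors share each of the first two timestamps.
df = pd.DataFrame(
    {
        "timestamps": pd.to_datetime(
            [
                "2004-02-28 00:58:45",
                "2004-02-28 00:58:45",
                "2004-02-28 01:00:45",
                "2004-02-28 01:00:45",
                "2004-02-29 01:00:45",
            ]
        ),
        "light": [100.0, 200.0, 10.0, 30.0, 50.0],
    }
)

# Duplicate timestamps are allowed in a DatetimeIndex.
indexed = df.set_index("timestamps")

# All rows in the same bin are averaged together.
per_hour = indexed.resample("60min").mean()
per_day = indexed.resample("1D").mean()

print(per_hour.dropna())  # hours without any rows come back as NaN
print(per_day)
```

Bins that contain no rows appear as NaN in the hourly result, so dropna() is one way to keep only the hours that actually have readings.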

Answer by Stef

For pandas version 0.19.0 and newer you can use the on keyword:

df.resample('H', on='timestamps').mean()

Result:

                      light
timestamps                 
2004-02-28 00:00:00  147.20
2004-02-28 01:00:00  150.88
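
For comparison, here is a self-contained sketch of the on keyword next to groupby with pd.Grouper. The Grouper variant is an alternative I am adding for illustration, not part of the answer. It assumes the question's five rows and uses "60min" as the hourly frequency string, since newer pandas releases deprecate the uppercase 'H' alias.

```python
import pandas as pd

# The question's frame, with 'timestamps' kept as an ordinary column.
df = pd.DataFrame(
    {
        "timestamps": pd.to_datetime(
            [
                "2004-02-28 00:58:45",
                "2004-02-28 00:59:45",
                "2004-02-28 01:00:45",
                "2004-02-28 01:01:15",
                "2004-02-28 01:02:15",
            ]
        ),
        "light": [150.88, 143.52, 150.88, 150.88, 150.88],
    },
    index=[7, 26, 34, 42, 59],
)

# Resample on a column instead of the index.
via_on = df.resample("60min", on="timestamps").mean()

# Alternative: group by a time-based Grouper keyed on the same column.
via_grouper = df.groupby(pd.Grouper(key="timestamps", freq="60min")).mean()

print(via_on)
print(via_grouper)
```

Both calls leave the original integer index out of the result and use the binned timestamps as the new index.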

Answer by Arjjun

Here is an approach to resample.

You can use the following method to resample at a T (minute) interval.

If the original data was recorded every minute, the resampled data will be at a 2-minute interval. You can use 3T, 4T, or any other T value that fits your needs.

df_2T = df.resample('2T', on='timestamps').mean()

For hourly:

df_hourly = df.resample('60T', on='timestamps').mean()

For daily:

df_daily = df.resample('1440T', on='timestamps').mean()

Note: one day has 60 * 24 = 1440 minutes.
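
As a quick check of that arithmetic, the sketch below compares a 1440-minute resample with the named daily alias on the question's five rows. It spells minutes as "min" rather than "T", because newer pandas releases deprecate the 'T' alias. The monthly line is my own addition for the per-month case the question asks about: a calendar month is not a fixed number of minutes, so it uses the "MS" (month start) alias instead.

```python
import pandas as pd

# The question's five rows, with 'timestamps' as a regular column.
df = pd.DataFrame(
    {
        "timestamps": pd.to_datetime(
            [
                "2004-02-28 00:58:45",
                "2004-02-28 00:59:45",
                "2004-02-28 01:00:45",
                "2004-02-28 01:01:15",
                "2004-02-28 01:02:15",
            ]
        ),
        "light": [150.88, 143.52, 150.88, 150.88, 150.88],
    }
)

# 60 * 24 = 1440 minutes, so these two should produce the same daily bins.
daily_minutes = df.resample("1440min", on="timestamps").mean()
daily_alias = df.resample("1D", on="timestamps").mean()

# Months vary in length, so use a calendar-aware alias for them.
monthly = df.resample("MS", on="timestamps").mean()

print(daily_minutes)
print(daily_alias)
print(monthly)
```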
