Original source: http://stackoverflow.com/questions/37842260/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
pandas dataframe resample per day without date time index
Asked by Nikhil
I have a dataframe in pandas of the following form:
             timestamps   light
7   2004-02-28 00:58:45  150.88
26  2004-02-28 00:59:45  143.52
34  2004-02-28 01:00:45  150.88
42  2004-02-28 01:01:15  150.88
59  2004-02-28 01:02:15  150.88
Note that the index is not the timestamps column. I want to resample (or bin the data somehow) to get the average value of the light column per minute, hour, day, etc. I have looked into the resample method that pandas offers, and it requires the dataframe to have a datetime index for the method to work (unless I've misunderstood this).
So my first question is: can I re-index the dataframe to have timestamps as the index? Note that the timestamps are not unique: for each timestamp there are about 30 rows, each representing a sensor.
If not, is there some other way to get another dataframe that has the average value of light per hour, per day, per month, etc.?
Any help would be appreciated.
Accepted answer by jezrael
You are right: resample needs a DatetimeIndex, TimedeltaIndex or PeriodIndex, otherwise it raises an error:
TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'
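As a minimal sketch of how this error shows up, the snippet below builds a small frame shaped like the one in the question (the values are copied from the question, the variable names are my own) and calls resample while the index is still plain integers:

```python
import pandas as pd

# Sample frame modeled on the question: integer index, datetime column.
df = pd.DataFrame(
    {
        "timestamps": pd.to_datetime(
            ["2004-02-28 00:58:45", "2004-02-28 00:59:45", "2004-02-28 01:00:45"]
        ),
        "light": [150.88, 143.52, 150.88],
    },
    index=[7, 26, 34],
)

try:
    # The index is not datetime-like, so pandas refuses to resample.
    df.resample("1D").mean()
    raised = False
except TypeError as exc:
    raised = True
    print(exc)
```

The exact wording of the message may differ between pandas versions, but the exception type should be TypeError.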
So if the original index is important, you have to call reset_index first and then set_index:
print(df.reset_index().set_index('timestamps'))
                     index   light
timestamps
2004-02-28 00:58:45      7  150.88
2004-02-28 00:59:45     26  143.52
2004-02-28 01:00:45     34  150.88
2004-02-28 01:01:15     42  150.88
2004-02-28 01:02:15     59  150.88
If it is not important, set_index alone is enough:
print(df.set_index('timestamps'))
                      light
timestamps
2004-02-28 00:58:45  150.88
2004-02-28 00:59:45  143.52
2004-02-28 01:00:45  150.88
2004-02-28 01:01:15  150.88
2004-02-28 01:02:15  150.88
And then resample:
print(df.reset_index().set_index('timestamps').resample('1D').mean())
            index    light
timestamps
2004-02-28   33.6  149.408
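The question also asks whether timestamps that repeat across sensors are a problem. A small sketch suggests they are not: set_index accepts duplicate values, and resample simply pools all rows that fall into the same bin. The data below is the question's sample plus one extra row that I made up to create a duplicate timestamp, and it assumes the column may arrive as strings, hence the pd.to_datetime call.

```python
import pandas as pd

# The question's five rows plus one hypothetical row (index 8) that
# repeats the first timestamp, as a second sensor would.
df = pd.DataFrame(
    {
        "timestamps": [
            "2004-02-28 00:58:45",
            "2004-02-28 00:58:45",  # hypothetical duplicate timestamp
            "2004-02-28 00:59:45",
            "2004-02-28 01:00:45",
            "2004-02-28 01:01:15",
            "2004-02-28 01:02:15",
        ],
        "light": [150.88, 100.0, 143.52, 150.88, 150.88, 150.88],
    },
    index=[7, 8, 26, 34, 42, 59],
)

# Make sure the column is a real datetime dtype before indexing on it.
df["timestamps"] = pd.to_datetime(df["timestamps"])

# A non-unique DatetimeIndex is allowed.
indexed = df.set_index("timestamps")

# All six rows fall on the same day, so the result has a single row.
daily = indexed["light"].resample("1D").mean()
print(daily)
```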
Answer by Stef
For pandas version 0.19.0 and newer you can use the on keyword:
df.resample('H', on='timestamps').mean()
Result:
                      light
timestamps
2004-02-28 00:00:00  147.20
2004-02-28 01:00:00  150.88
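For completeness, here is a self-contained sketch of the on keyword using the question's sample rows. It also shows pd.Grouper inside groupby, which is not part of the answer above but should give the same hourly means. I write the frequency as "60min" rather than "H" because, as far as I know, recent pandas releases warn about the upper-case alias.

```python
import pandas as pd

# The question's sample rows; the index is left as plain integers.
df = pd.DataFrame(
    {
        "timestamps": pd.to_datetime(
            [
                "2004-02-28 00:58:45",
                "2004-02-28 00:59:45",
                "2004-02-28 01:00:45",
                "2004-02-28 01:01:15",
                "2004-02-28 01:02:15",
            ]
        ),
        "light": [150.88, 143.52, 150.88, 150.88, 150.88],
    },
    index=[7, 26, 34, 42, 59],
)

# Resample on a column instead of the index.
hourly = df.resample("60min", on="timestamps").mean()

# Alternative spelling with groupby and pd.Grouper.
grouped = df.groupby(pd.Grouper(key="timestamps", freq="60min"))["light"].mean()

print(hourly)
print(grouped)
```

The Grouper form can be handy when you also want to group by another column (for example a sensor id) in the same call.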
Answer by Arjjun
Here is an approach to resample.
You can use the following method to sample at a T (minute) interval.
If the original data was recorded every minute, your new resampled data will be at a 2 min interval.
You can use 3T, 4T, ... or any T value that fits your need.
df_2T = df.resample('2T', on='timestamps').mean()
For hourly:
df_hourly = df.resample('60T', on='timestamps').mean()
For daily:
df_daily = df.resample('1440T', on='timestamps').mean()
Note: One day has 60*24 = 1440 min
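As a rough check of the minute arithmetic, the sketch below compares a 1440-minute rule with the dedicated daily alias "D" on a small two-day frame. The data is made up for illustration. I spell the minute alias as "min" because, to my knowledge, pandas 2.2 and later deprecate "T"; on older versions "1440T" should behave the same way.

```python
import pandas as pd

# Hypothetical readings spread over two calendar days.
df = pd.DataFrame(
    {
        "timestamps": pd.to_datetime(
            [
                "2004-02-28 00:58:45",
                "2004-02-28 23:59:45",
                "2004-02-29 01:00:45",
                "2004-02-29 12:00:00",
            ]
        ),
        "light": [150.88, 143.52, 150.88, 140.0],
    }
)

# 60 * 24 = 1440 minutes per day.
daily_minutes = df.resample("1440min", on="timestamps").mean()

# The same binning written with the daily alias.
daily_alias = df.resample("D", on="timestamps").mean()

print(daily_minutes)
print(daily_alias)
```

For this timezone-naive data both rules start their bins at midnight, so the two results should match. The named alias is usually easier to read than counting minutes.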