pandas: Resample a pandas DataFrame daily without a datetime index

Question

Asked by Nikhil

I have a dataframe in pandas of the following form:

      timestamps         light
7   2004-02-28 00:58:45 150.88
26  2004-02-28 00:59:45 143.52
34  2004-02-28 01:00:45 150.88
42  2004-02-28 01:01:15 150.88
59  2004-02-28 01:02:15 150.88

Note that the index is not the timestamps column. I want to resample (or otherwise bin) the data to get the average value of the light column per minute, hour, day, etc. I have looked into the resample method that pandas offers, and it requires the dataframe to have a datetime index to work (unless I've misunderstood this).

So my first question is: can I re-index the dataframe to use timestamps as the index? (Note that timestamps are not unique: about 30 rows share each timestamp, each row representing a sensor.)
If not, is there some other way to get another dataframe with the average value of light per hour, per day, per month, etc.?

Any help would be appreciated.

Answer 1

Accepted answer by jezrael

You are right: resample needs a DatetimeIndex, TimedeltaIndex or PeriodIndex, otherwise it raises an error:

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'
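
As a minimal sketch of how this error comes about, the snippet below rebuilds two rows of the question's data with a plain integer index and calls resample on it directly. The exact wording of the message may vary between pandas versions; only the TypeError itself is relied on here.

```python
import pandas as pd

# Two rows from the question, keeping the plain integer index.
df = pd.DataFrame(
    {
        "timestamps": pd.to_datetime(["2004-02-28 00:58:45", "2004-02-28 00:59:45"]),
        "light": [150.88, 143.52],
    },
    index=[7, 26],
)

try:
    # The index is not datetime-like, so resample has nothing to bin on.
    df.resample("D").mean()
    raised = False
except TypeError as exc:
    raised = True
    print(exc)
```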

So if the original index is important, first call reset_index and then set_index:

print (df.reset_index().set_index('timestamps'))
                     index   light
timestamps                        
2004-02-28 00:58:45      7  150.88
2004-02-28 00:59:45     26  143.52
2004-02-28 01:00:45     34  150.88
2004-02-28 01:01:15     42  150.88
2004-02-28 01:02:15     59  150.88

If it is not important, set_index alone is enough:

print (df.set_index('timestamps'))
                      light
timestamps                 
2004-02-28 00:58:45  150.88
2004-02-28 00:59:45  143.52
2004-02-28 01:00:45  150.88
2004-02-28 01:01:15  150.88
2004-02-28 01:02:15  150.88

and then resample:

print (df.reset_index().set_index('timestamps').resample('1D').mean())
            index    light
timestamps                
2004-02-28   33.6  149.408

Answer 2

Answered by Stef

For pandas version 0.19.0 and newer you can use the on keyword:

df.resample('H', on='timestamps').mean()

Result:

                      light
timestamps                 
2004-02-28 00:00:00  147.20
2004-02-28 01:00:00  150.88
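
The question mentions that several sensors share each timestamp. The sketch below uses made-up readings and a hypothetical sensor_id column (neither is from the original post) to illustrate that the on keyword still averages across repeated timestamps. It writes the hourly frequency as '60min' rather than 'H', because recent pandas releases deprecate the uppercase 'H' alias in favour of 'h'.

```python
import pandas as pd

# Hypothetical readings: two sensors report at each timestamp, so timestamps repeat.
df = pd.DataFrame(
    {
        "timestamps": pd.to_datetime(
            [
                "2004-02-28 00:58:45",
                "2004-02-28 00:58:45",
                "2004-02-28 01:00:45",
                "2004-02-28 01:00:45",
                "2004-02-29 01:00:45",
                "2004-02-29 01:00:45",
            ]
        ),
        "sensor_id": [1, 2, 1, 2, 1, 2],  # hypothetical column, for illustration only
        "light": [150.0, 140.0, 160.0, 150.0, 100.0, 120.0],
    }
)

# Select the light column so that sensor_id is not averaged as well.
per_hour = df.resample("60min", on="timestamps")["light"].mean()
per_day = df.resample("D", on="timestamps")["light"].mean()

# Hours without any readings show up as NaN; dropna() hides them.
print(per_hour.dropna())
print(per_day)
```

The integer index of df is left untouched, since on only tells resample which column to bin by.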

Answer 3

Answered by Arjjun

Here is an approach to resample.

You can use the following to resample at a T (minute) interval.

If the original data has one row per minute, the resampled data will be at 2-minute intervals. You can use 3T, 4T, or any other T value that fits your needs.

df_2T = df.resample('2T', on='timestamps').mean()

For hourly:

df_hourly = df.resample('60T', on='timestamps').mean()

For daily:

df_daily = df.resample('1440T', on='timestamps').mean()

Note: One day has 60*24 = 1440 min
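
A small sketch on the question's five rows, checking that a 1440-minute bin gives the same daily average as the 'D' alias. It spells minutes as 'min' instead of 'T', since recent pandas releases deprecate the 'T' alias; on older versions the two spellings should behave the same.

```python
import pandas as pd

# Sample data from the question, with timestamps as a regular column.
df = pd.DataFrame(
    {
        "timestamps": pd.to_datetime(
            [
                "2004-02-28 00:58:45",
                "2004-02-28 00:59:45",
                "2004-02-28 01:00:45",
                "2004-02-28 01:01:15",
                "2004-02-28 01:02:15",
            ]
        ),
        "light": [150.88, 143.52, 150.88, 150.88, 150.88],
    }
)

# 2-minute bins, written with the 'min' alias.
every_2_min = df.resample("2min", on="timestamps").mean()

# One day expressed in minutes (60 * 24 = 1440) versus the 'D' alias.
daily_from_minutes = df.resample("1440min", on="timestamps").mean()
daily = df.resample("D", on="timestamps").mean()

print(every_2_min)
print(daily_from_minutes)
print(daily)
```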
