Changing time frequency in Pandas Dataframe

Note: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must comply with the same license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/26342713/

Time: 2020-09-13 22:34:48  Source: igfitidea

python, pandas, time-series, time-frequency

Asked by Zhubarb

I have a Pandas DataFrame as below.

df
                              A           B
date_time                                    
2014-07-01 06:03:59.614000  62.1250       NaN
2014-07-01 06:03:59.692000  62.2500       NaN
2014-07-01 06:13:34.524000  62.2500  241.0625
2014-07-01 06:13:34.602000  62.2500  241.5000
2014-07-01 06:15:05.399000  62.2500  241.3750
2014-07-01 06:15:05.399000  62.2500  241.2500
2014-07-01 06:15:42.004000  62.2375  241.2500
2014-07-01 06:15:42.082000  62.2375  241.3750
2014-07-01 06:15:42.082000  62.2375  240.2500

I want to change the frequency of this to regular 1 minute intervals, but I get the error below:

new = df.asfreq('1Min')
>>error: cannot reindex from a duplicate axis

Now, I understand why this is happening. Since my time granularity is high (in milliseconds) but irregular, I get multiple readings per minute, even per second. So I tried to combine these millisecond readings to minutes and get rid of duplicates as below.
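The diagnosis above can be checked directly. The following is a minimal sketch, assuming the sample rows shown earlier are rebuilt by hand into a DataFrame named df; the variable names has_unique_index, repeated and rows_per_minute are illustrative and not part of the original question.

```python
import pandas as pd

nan = float("nan")

# Rebuild the sample data from the question (values copied from the printout).
stamps = pd.to_datetime([
    "2014-07-01 06:03:59.614", "2014-07-01 06:03:59.692",
    "2014-07-01 06:13:34.524", "2014-07-01 06:13:34.602",
    "2014-07-01 06:15:05.399", "2014-07-01 06:15:05.399",
    "2014-07-01 06:15:42.004", "2014-07-01 06:15:42.082",
    "2014-07-01 06:15:42.082",
])
df = pd.DataFrame(
    {
        "A": [62.125, 62.25, 62.25, 62.25, 62.25, 62.25, 62.2375, 62.2375, 62.2375],
        "B": [nan, nan, 241.0625, 241.5, 241.375, 241.25, 241.25, 241.375, 240.25],
    },
    index=stamps,
)
df.index.name = "date_time"

# asfreq reindexes, which needs a unique index; this one is not unique.
has_unique_index = df.index.is_unique

# Timestamps that occur more than once (each listed a single time).
repeated = df.index[df.index.duplicated(keep=False)].unique()

# Number of readings that fall into each calendar minute.
rows_per_minute = df.groupby(df.index.floor("min")).size()

print(has_unique_index)
print(repeated)
print(rows_per_minute)
```

In this sample, two timestamps repeat exactly, which is enough to trigger the reindexing error, and every populated minute holds more than one reading.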

# try to convert the index to minutes and drop duplicates
df['index'] = df.index
df['minute_index']= df['index'].apply( lambda x: x.strftime('%Y-%m-%d %H:%M'))
df.drop_duplicates(cols = 'minute_index', inplace = True, take_last = True)
df_by_minute = df.set_index('minute_index')
df_by_minute
                        A                B               index
minute_index                                                     
2014-07-01 06:03    62.2500        NaN 2014-07-01 06:03:59.692000
2014-07-01 06:13    62.2500     241.50 2014-07-01 06:13:34.602000
2014-07-01 06:15    62.2375     240.25 2014-07-01 06:15:42.082000

# now change the frequency to 1 minute but I just get NaNs (!)
df_by_minute.asfreq('1Min')
                            A          B   index
2014-07-01 06:03:00        NaN        NaN   NaT
2014-07-01 06:04:00        NaN        NaN   NaT
2014-07-01 06:05:00        NaN        NaN   NaT
2014-07-01 06:06:00        NaN        NaN   NaT
2014-07-01 06:07:00        NaN        NaN   NaT
2014-07-01 06:08:00        NaN        NaN   NaT
2014-07-01 06:09:00        NaN        NaN   NaT
2014-07-01 06:10:00        NaN        NaN   NaT
2014-07-01 06:11:00        NaN        NaN   NaT
2014-07-01 06:12:00        NaN        NaN   NaT
2014-07-01 06:13:00        NaN        NaN   NaT
2014-07-01 06:14:00        NaN        NaN   NaT
2014-07-01 06:15:00        NaN        NaN   NaT
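A likely reason for the all-NaN result is that minute_index holds strings produced by strftime rather than timestamps, so none of the generated 1-minute timestamps match an existing label. Note also that in recent pandas versions the drop_duplicates keywords are subset and keep='last' rather than cols and take_last. The sketch below is one possible way to do the same "keep the last reading per minute" step while staying on a real DatetimeIndex; the names floored, last_per_minute and by_minute are illustrative assumptions.

```python
import pandas as pd

nan = float("nan")

# Rebuild the sample data from the question (values copied from the printout).
stamps = pd.to_datetime([
    "2014-07-01 06:03:59.614", "2014-07-01 06:03:59.692",
    "2014-07-01 06:13:34.524", "2014-07-01 06:13:34.602",
    "2014-07-01 06:15:05.399", "2014-07-01 06:15:05.399",
    "2014-07-01 06:15:42.004", "2014-07-01 06:15:42.082",
    "2014-07-01 06:15:42.082",
])
df = pd.DataFrame(
    {
        "A": [62.125, 62.25, 62.25, 62.25, 62.25, 62.25, 62.2375, 62.2375, 62.2375],
        "B": [nan, nan, 241.0625, 241.5, 241.375, 241.25, 241.25, 241.375, 240.25],
    },
    index=stamps,
)
df.index.name = "date_time"

# Truncate each timestamp to its minute, keeping a DatetimeIndex (not strings).
floored = df.copy()
floored.index = floored.index.floor("min")

# Keep only the last reading recorded within each minute.
last_per_minute = floored[~floored.index.duplicated(keep="last")]

# The index is now unique and made of timestamps, so asfreq can align on it.
by_minute = last_per_minute.asfreq("min")
print(by_minute)
```

Minutes without any reading still come out as NaN here, because asfreq only places existing rows on the regular grid and does not fill the gaps.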

As you can see, it does not work. Can someone help? What I am trying to achieve is a function that returns A or B as of a given DateTime, where DateTime would be in 1Min increments.
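For illustration, the stated goal could be sketched with Series.asof, which looks up the last non-NaN value at or before a given time on a sorted index. The helper name value_as_of and the variable grid are hypothetical, introduced only for this example; this is one reading of "as of", not necessarily what the asker had in mind.

```python
import pandas as pd

nan = float("nan")

# Rebuild the sample data from the question (values copied from the printout).
stamps = pd.to_datetime([
    "2014-07-01 06:03:59.614", "2014-07-01 06:03:59.692",
    "2014-07-01 06:13:34.524", "2014-07-01 06:13:34.602",
    "2014-07-01 06:15:05.399", "2014-07-01 06:15:05.399",
    "2014-07-01 06:15:42.004", "2014-07-01 06:15:42.082",
    "2014-07-01 06:15:42.082",
])
df = pd.DataFrame(
    {
        "A": [62.125, 62.25, 62.25, 62.25, 62.25, 62.25, 62.2375, 62.2375, 62.2375],
        "B": [nan, nan, 241.0625, 241.5, 241.375, 241.25, 241.25, 241.375, 240.25],
    },
    index=stamps,
)
df.index.name = "date_time"


def value_as_of(frame, column, when):
    """Hypothetical helper: last non-NaN value of `column` at or before `when`.

    Assumes the frame's index is sorted in increasing time order.
    """
    return frame[column].asof(pd.Timestamp(when))


# Evaluate the helper on a regular 1-minute grid.
grid = pd.date_range("2014-07-01 06:13", "2014-07-01 06:16", freq="min")
for moment in grid:
    print(moment, value_as_of(df, "A", moment), value_as_of(df, "B", moment))
```

Unlike grouping readings by calendar minute, this lookup never uses a reading that arrives after the requested time, so the value at 06:15:00 comes from the 06:13 readings.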

Accepted answer by Jihun

I think resample, not asfreq, fits your needs:

new = df.resample('T', how='mean')

For the how option, you can also use 'last' or 'first'.
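As far as I am aware, newer pandas releases no longer accept the how keyword in resample, and the aggregation is called as a method on the resampler instead. The sketch below shows that spelling for 'mean', 'last' and 'first' on the sample data from the question. It uses the 'min' frequency alias, which should behave the same as 'T' here; the trailing ffill step is an optional addition that is not part of the original answer.

```python
import pandas as pd

nan = float("nan")

# Rebuild the sample data from the question (values copied from the printout).
stamps = pd.to_datetime([
    "2014-07-01 06:03:59.614", "2014-07-01 06:03:59.692",
    "2014-07-01 06:13:34.524", "2014-07-01 06:13:34.602",
    "2014-07-01 06:15:05.399", "2014-07-01 06:15:05.399",
    "2014-07-01 06:15:42.004", "2014-07-01 06:15:42.082",
    "2014-07-01 06:15:42.082",
])
df = pd.DataFrame(
    {
        "A": [62.125, 62.25, 62.25, 62.25, 62.25, 62.25, 62.2375, 62.2375, 62.2375],
        "B": [nan, nan, 241.0625, 241.5, 241.375, 241.25, 241.25, 241.375, 240.25],
    },
    index=stamps,
)
df.index.name = "date_time"

# Method-style equivalents of how='mean', how='last' and how='first'.
per_minute_mean = df.resample("min").mean()
per_minute_last = df.resample("min").last()
per_minute_first = df.resample("min").first()

# Optional (not in the original answer): carry the last known reading
# forward into minutes that had no data.
filled = per_minute_last.ffill()
print(filled)
```

Resampling handles the duplicate timestamps because every reading in a minute is aggregated into one row, so the duplicate-axis error from asfreq does not arise.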