Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/43400331/

Remove 'seconds' and 'minutes' from a Pandas dataframe column

Tags: python, pandas, dataframe, time-series

Asked by Dustin Helliwell

Given a dataframe like:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Date': pd.date_range('1/1/2011', periods=5, freq='3675S'),
    'Num': np.random.rand(5),
})

                 Date       Num
0 2011-01-01 00:00:00  0.580997
1 2011-01-01 01:01:15  0.407332
2 2011-01-01 02:02:30  0.786035
3 2011-01-01 03:03:45  0.821792
4 2011-01-01 04:05:00  0.807869

I would like to remove the 'minutes' and 'seconds' information.

The following (mostly stolen from: How to remove the 'seconds' of Pandas dataframe index?) works okay,

df = df.assign(Date=lambda x: pd.to_datetime(x['Date'].dt.strftime('%Y-%m-%d %H')))

                 Date       Num
0 2011-01-01 00:00:00  0.580997
1 2011-01-01 01:00:00  0.407332
2 2011-01-01 02:00:00  0.786035
3 2011-01-01 03:00:00  0.821792
4 2011-01-01 04:00:00  0.807869

but it feels strange to convert a datetime to a string then back to a datetime. Is there a way to do this more directly?

Answered by piRSquared

dt.round

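This is how it should be done: use dt.round. Be aware that dt.round rounds to the nearest hour rather than truncating. It gives the desired result here only because every sample timestamp falls within the first few minutes of its hour; a value such as 01:45:00 would become 02:00:00. To always drop the minutes and seconds, use dt.floor('H') in the same way.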

df.assign(Date=df.Date.dt.round('H'))

                 Date       Num
0 2011-01-01 00:00:00  0.577957
1 2011-01-01 01:00:00  0.995748
2 2011-01-01 02:00:00  0.864013
3 2011-01-01 03:00:00  0.468762
4 2011-01-01 04:00:00  0.866827
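
The difference between rounding and truncating only shows up once a timestamp lies in the second half of an hour. The following is a minimal sketch of that comparison; the sample timestamps are made up for illustration and are not from the question. It uses the lowercase "h" alias because recent pandas versions deprecate the uppercase "H".

```python
import pandas as pd

# Hypothetical timestamps (not from the question); two of them lie
# in the second half of an hour, where round and floor disagree.
s = pd.Series(pd.to_datetime([
    "2011-01-01 01:01:15",
    "2011-01-01 01:45:00",
    "2011-01-01 02:59:59",
]))

rounded = s.dt.round("h")  # nearest hour: may move forward to the next hour
floored = s.dt.floor("h")  # start of the containing hour: drops minutes/seconds

print(pd.DataFrame({"original": s, "round": rounded, "floor": floored}))
```

If the goal is strictly to remove the minutes and seconds, the floor column is the one that matches; round is only equivalent when every timestamp is less than 30 minutes past the hour.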

OLD ANSWER

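One approach is to set the index and use resample. Note that this aggregates rather than edits in place: last() keeps only the final row in each hourly bin, and hours with no rows appear as NaN rows, so the result matches the approach above only when every hour contains exactly one row, as it does here.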

df.set_index('Date').resample('H').last().reset_index()

                 Date       Num
0 2011-01-01 00:00:00  0.577957
1 2011-01-01 01:00:00  0.995748
2 2011-01-01 02:00:00  0.864013
3 2011-01-01 03:00:00  0.468762
4 2011-01-01 04:00:00  0.866827
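
To illustrate how resample behaves when the hours are not evenly populated, here is a small sketch with made-up data (two rows in one hour and one empty hour; none of these values come from the question). It compares the resample approach with a plain dt.floor on the column.

```python
import pandas as pd

# Hypothetical data: two rows share hour 00, and hour 01 has no rows.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2011-01-01 00:10:00",
        "2011-01-01 00:50:00",
        "2011-01-01 02:30:00",
    ]),
    "Num": [1.0, 2.0, 3.0],
})

# One row per hourly bin: hour 00 keeps only its last row, hour 01 is NaN.
resampled = df.set_index("Date").resample("h").last().reset_index()

# One row per original row: only the timestamps change.
floored = df.assign(Date=df["Date"].dt.floor("h"))

print(resampled)
print(floored)
```

Both results have three rows in this example, but for different reasons: resample dropped one original row and added a placeholder for the empty hour, while the floored frame still holds all three original values.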

Another alternative is to extract the date and hour components and add them back together

df.assign(
    Date=pd.to_datetime(df.Date.dt.date) +
         pd.to_timedelta(df.Date.dt.hour, unit='H'))

                 Date       Num
0 2011-01-01 00:00:00  0.577957
1 2011-01-01 01:00:00  0.995748
2 2011-01-01 02:00:00  0.864013
3 2011-01-01 03:00:00  0.468762
4 2011-01-01 04:00:00  0.866827
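
As a sanity check, the component-based approach can be compared with dt.floor on the question's data. This is a sketch rather than part of the original answer: it swaps dt.date for dt.normalize(), which stays in datetime64 instead of going through Python date objects, and it uses the lowercase "h" and "s" aliases because recent pandas versions deprecate the uppercase forms.

```python
import numpy as np
import pandas as pd

# Same shape of data as in the question (Num values are random).
df = pd.DataFrame({
    "Date": pd.date_range("1/1/2011", periods=5, freq="3675s"),
    "Num": np.random.rand(5),
})

# Midnight of each day plus the whole hours elapsed since midnight.
rebuilt = df["Date"].dt.normalize() + pd.to_timedelta(df["Date"].dt.hour, unit="h")

# Direct truncation to the start of the hour.
floored = df["Date"].dt.floor("h")

# Element-wise comparison of the two results.
print((rebuilt == floored).all())
```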