Disclaimer: this page is derived from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/26531109/

Date: 2020-09-13 22:36:23  Source: igfitidea

Pandas 0.15 DataFrame: Remove or reset time portion of a datetime64

python pandas

Asked by n4cer500

I have imported a CSV file into a pandas DataFrame and have a datetime64 column with values such as:

2014-06-30 21:50:00

I simply want to either remove the time or set the time to midnight:

2014-06-30 00:00:00 

What is the easiest way of doing this?

Answered by Frank

Pandas has a built-in function, pd.datetools.normalize_date, for that purpose:

df['date_col'] = df['date_col'].apply(pd.datetools.normalize_date)

It's implemented in Cython and does the following:

if PyDateTime_Check(dt):
    return dt.replace(hour=0, minute=0, second=0, microsecond=0)
elif PyDate_Check(dt):
    return datetime(dt.year, dt.month, dt.day)
else:
    raise TypeError('Unrecognized type: %s' % type(dt))
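
For readers without the Cython source at hand, the same branching can be sketched in plain Python. This is only an illustration of the logic quoted above, not the pandas implementation; the function name normalize_date_py is made up for this example.

```python
from datetime import date, datetime


def normalize_date_py(dt):
    """Hypothetical pure-Python sketch of the logic shown above."""
    # datetime is a subclass of date, so it has to be checked first
    if isinstance(dt, datetime):
        return dt.replace(hour=0, minute=0, second=0, microsecond=0)
    elif isinstance(dt, date):
        return datetime(dt.year, dt.month, dt.day)
    else:
        raise TypeError('Unrecognized type: %s' % type(dt))


from_datetime = normalize_date_py(datetime(2014, 6, 30, 21, 50))
from_date = normalize_date_py(date(2014, 6, 30))
```

Both calls should give a datetime at midnight on 2014-06-30.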

Answered by Kathirmani Sukumar

Use the dt accessor methods, which are vectorized and therefore faster.

# There are better ways to convert the column to datetime;
# they are ignored here to keep the example simple.
data['date_column'] = pd.to_datetime(data['date_column'])
# .dt.date returns datetime.date objects (object dtype), dropping the time
data['date_column'].dt.date
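
One thing worth checking with this approach is the resulting dtype. The sketch below uses made-up sample data and assumes the goal is to compare "remove the time" with "reset the time to midnight": .dt.date yields plain Python date objects, while .dt.normalize() keeps the column as datetime64.

```python
import datetime

import pandas as pd

# Made-up sample data for illustration only
data = pd.DataFrame({'date_column': ['2014-06-30 21:50:00',
                                     '2014-07-01 08:15:30']})
data['date_column'] = pd.to_datetime(data['date_column'])

# Removes the time: the result holds datetime.date objects (object dtype)
as_dates = data['date_column'].dt.date

# Resets the time to midnight: the result stays a datetime64 column
as_midnight = data['date_column'].dt.normalize()
```

If later steps rely on datetime operations such as resampling or the .dt accessor, the second form is probably the more convenient one.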

Answered by phil

pd.datetools.normalize_date has been deprecated. Use df['date_col'] = df['date_col'].dt.normalize() instead.

See https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.dt.normalize.html
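
A minimal sketch of that call on made-up data follows. The comparison with .dt.floor('D') is an addition for illustration, not part of the original answer; for timezone-naive data the two should agree.

```python
import pandas as pd

# Made-up sample frame; 'date_col' mirrors the column name used above
df = pd.DataFrame({'date_col': pd.to_datetime(['2014-06-30 21:50:00',
                                               '2014-12-31 23:59:59'])})

# Vectorized reset of the time portion to midnight
normalized = df['date_col'].dt.normalize()

# Flooring to whole days gives the same result for naive timestamps
floored = df['date_col'].dt.floor('D')

df['date_col'] = normalized
```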

Answered by EdChum

I can think of two ways: assign just the result of date() to the column (or to a new column), or call replace on each datetime object, passing hour=0, minute=0:

In [106]:
import io
import pandas as pd

# example data
t = """datetime
2014-06-30 21:50:00"""
df = pd.read_csv(io.StringIO(t), parse_dates=[0])
df

Out[106]:
             datetime
0 2014-06-30 21:50:00

In [107]:
# apply a lambda accessing just the date() attribute
df['datetime'] = df['datetime'].apply(lambda x: x.date())
print(df)
# reset df
df = pd.read_csv(io.StringIO(t), parse_dates=[0])
# call replace with params hour=0, minute=0 (seconds are already 0 in this data)
df['datetime'] = df['datetime'].apply(lambda x: x.replace(hour=0, minute=0))
df

     datetime
0  2014-06-30

Out[107]:
    datetime
0 2014-06-30
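
Note that replace only resets the fields it is given. The sketch below uses a made-up timestamp with non-zero seconds to show the difference between passing hour and minute only and passing all four time fields.

```python
import pandas as pd

# Made-up timestamp with non-zero seconds and microseconds
ts = pd.Timestamp('2014-06-30 21:50:45.500')

# Only hour and minute are reset; seconds and microseconds survive
partial = ts.replace(hour=0, minute=0)

# Resetting all four fields gives midnight
full = ts.replace(hour=0, minute=0, second=0, microsecond=0)
```

For data like the example above, where seconds are always zero, both calls give the same result.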

Answered by Sebastian N

Since pd.datetools.normalize_date has been deprecated and you are working with the datetime64 data type, use:

df.your_date_col = df.your_date_col.apply(lambda x: x.replace(hour=0, minute=0, second=0, microsecond=0))

This way you don't need to convert to a pandas datetime first. If the column is already a pandas datetime, see phil's answer:

df.your_date_col = df.your_date_col.dt.normalize()
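
If it is unclear whether the column is already datetime64, one option is to check the dtype before choosing. The helper below is a hypothetical sketch (strip_time is not a pandas function), and it assumes that non-datetime input consists of strings pd.to_datetime can parse.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype


def strip_time(series):
    """Hypothetical helper: return the series with times reset to midnight."""
    # Assumption: anything that is not datetime64 yet can be parsed
    if not is_datetime64_any_dtype(series):
        series = pd.to_datetime(series)
    return series.dt.normalize()


from_strings = strip_time(pd.Series(['2014-06-30 21:50:00']))
from_datetimes = strip_time(pd.Series(pd.to_datetime(['2014-06-30 21:50:00'])))
```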

Answered by Kevin S

The fastest way I have found to strip everything but the date is to use the underlying NumPy structure of pandas Timestamps.

import numpy as np
import pandas as pd
dates = pd.to_datetime(['1990-1-1 1:00:11',
                        '1991-1-1',
                        '1999-12-31 12:59:59.999'])
dates

DatetimeIndex(['1990-01-01 01:00:11', '1991-01-01 00:00:00',
           '1999-12-31 12:59:59.999000'],
           dtype='datetime64[ns]', freq=None)

dates = dates.astype(np.int64)
ns_in_day = 24*60*60*np.int64(1e9)
dates //= ns_in_day
dates *= ns_in_day
dates = dates.astype(np.dtype('<M8[ns]'))
dates = pd.Series(dates)
dates

0   1990-01-01
1   1991-01-01
2   1999-12-31
dtype: datetime64[ns]

This might not work when data have timezone information.
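
To illustrate the caveat: timezone-aware values are stored as UTC instants, so truncating the raw integers cuts at UTC midnight rather than local midnight. The sketch below uses a made-up timestamp in America/New_York and stands in for the integer arithmetic with tz_convert('UTC') followed by normalize(), which is an assumption about what that arithmetic effectively computes.

```python
import pandas as pd

# Made-up timezone-aware sample: 21:50 local time on 2014-06-30
ts = pd.Series(pd.to_datetime(['2014-06-30 21:50:00']))
ts = ts.dt.tz_localize('America/New_York')

# normalize() on timezone-aware data resets to local midnight
local_midnight = ts.dt.normalize()

# Cutting at the UTC day boundary lands on a different calendar day,
# because 21:50 in New York is already 01:50 the next day in UTC
utc_midnight = ts.dt.tz_convert('UTC').dt.normalize()
```

For timezone-aware columns, dt.normalize() is therefore likely the safer choice.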