Original URL: http://stackoverflow.com/questions/26531109/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Pandas 0.15 DataFrame: Remove or reset time portion of a datetime64
Question by n4cer500
I have imported a CSV file into a pandas DataFrame and have a datetime64 column with values such as:
2014-06-30 21:50:00
I simply want to either remove the time or set the time to midnight:
2014-06-30 00:00:00 
What is the easiest way of doing this?
Answer by Frank
Pandas has a built-in function, pd.datetools.normalize_date, for that purpose:
df['date_col'] = df['date_col'].apply(pd.datetools.normalize_date)
It's implemented in Cython and does the following:
if PyDateTime_Check(dt):
    return dt.replace(hour=0, minute=0, second=0, microsecond=0)
elif PyDate_Check(dt):
    return datetime(dt.year, dt.month, dt.day)
else:
    raise TypeError('Unrecognized type: %s' % type(dt))
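Since pd.datetools has since been deprecated, here is a minimal pure-Python sketch of the same logic, assuming you only need it for plain datetime/date values or pandas Timestamps (a Timestamp is a datetime subclass). The name normalize_date_py is hypothetical, not a pandas API.

```python
from datetime import date, datetime


def normalize_date_py(dt):
    """Hypothetical pure-Python stand-in mirroring the Cython logic above."""
    # Check datetime first: datetime is a subclass of date
    if isinstance(dt, datetime):
        return dt.replace(hour=0, minute=0, second=0, microsecond=0)
    elif isinstance(dt, date):
        # Promote a plain date to a datetime at midnight
        return datetime(dt.year, dt.month, dt.day)
    else:
        raise TypeError('Unrecognized type: %s' % type(dt))


print(normalize_date_py(datetime(2014, 6, 30, 21, 50)))  # 2014-06-30 00:00:00
print(normalize_date_py(date(2014, 6, 30)))              # 2014-06-30 00:00:00
```

It can be used the same way as in the answer, e.g. df['date_col'].apply(normalize_date_py), but it runs in Python one row at a time.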
Answer by Kathirmani Sukumar
Use the .dt accessor methods, which are vectorized and therefore faster.
# There are better ways of converting it into a datetime column.
# Ignore those to keep it simple
data['date_column'] = pd.to_datetime(data['date_column'])
# .dt.date returns Python datetime.date objects (the column dtype becomes object)
data['date_column'].dt.date
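A short sketch of the practical difference, using the question's sample value: .dt.date drops the time but leaves an object column of Python dates, while .dt.normalize() (used in another answer below) sets the time to midnight and keeps a datetime64 column, so the other .dt methods keep working.

```python
import pandas as pd

s = pd.to_datetime(pd.Series(['2014-06-30 21:50:00']))

# .dt.date gives Python datetime.date objects, so the dtype becomes object
as_dates = s.dt.date
print(as_dates.dtype)  # object

# .dt.normalize() sets the time to midnight but keeps a datetime64 dtype
normalized = s.dt.normalize()
print(normalized[0])   # 2014-06-30 00:00:00
```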
Answer by phil
pd.datetools.normalize_date has been deprecated. Use df['date_col'] = df['date_col'].dt.normalize() instead.
See https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.dt.normalize.html
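If the datetimes live in the index rather than in a column, DatetimeIndex has a normalize() method of its own, so no .dt accessor is needed. A minimal sketch with made-up example values:

```python
import pandas as pd

# Example frame with a DatetimeIndex (values made up for illustration)
df = pd.DataFrame({'value': [1, 2]},
                  index=pd.to_datetime(['2014-06-30 21:50:00',
                                        '2014-07-01 08:15:30']))

# Move every index entry to midnight of its own day
df.index = df.index.normalize()
print(df.index)
```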
Answer by EdChum
I can think of two ways: assign just the result of the date() method (to the same or a new column), or call replace on the datetime object, passing hour=0, minute=0:
In [106]:
import io
import pandas as pd

# example data
t = """datetime
2014-06-30 21:50:00"""
df = pd.read_csv(io.StringIO(t), parse_dates=[0])
df
Out[106]:
             datetime
0 2014-06-30 21:50:00
In [107]:
# apply a lambda that keeps just the date() part
df['datetime'] = df['datetime'].apply(lambda x: x.date())
print(df)
# reset df
df = pd.read_csv(io.StringIO(t), parse_dates=[0])
# call replace with hour=0, minute=0 (also pass second=0, microsecond=0 if the data has them)
df['datetime'] = df['datetime'].apply(lambda x: x.replace(hour=0, minute=0))
df
     datetime
0  2014-06-30
Out[107]:
    datetime
0 2014-06-30
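Resetting only hour and minute is enough for the sample value, but not in general. A sketch with a made-up value that has non-zero seconds (pandas Timestamps also carry a nanosecond field, which replace accepts as well):

```python
import pandas as pd

# Example value with non-zero seconds (made up for illustration)
ts = pd.Timestamp('2014-06-30 21:50:30')

# Resetting only hour and minute keeps the seconds
print(ts.replace(hour=0, minute=0))  # 2014-06-30 00:00:30

# Resetting every time field gives true midnight
print(ts.replace(hour=0, minute=0, second=0, microsecond=0))  # 2014-06-30 00:00:00
```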
Answer by Sebastian N
Since pd.datetools.normalize_date has been deprecated and you are working with the datetime64 data type, use:
df.your_date_col = df.your_date_col.apply(lambda x: x.replace(hour=0, minute=0, second=0, microsecond=0))
This way you don't need to convert to a pandas datetime first. If the column is already a pandas datetime, see Phil's answer:
df.your_date_col = df.your_date_col.dt.normalize()
Answer by Kevin S
The fastest way I have found to strip everything but the date is to use the underlying NumPy structure of pandas Timestamps.
import numpy as np
import pandas as pd
dates = pd.to_datetime(['1990-1-1 1:00:11',
                        '1991-1-1',
                        '1999-12-31 12:59:59.999'])
dates
DatetimeIndex(['1990-01-01 01:00:11', '1991-01-01 00:00:00',
           '1999-12-31 12:59:59.999000'],
           dtype='datetime64[ns]', freq=None)
dates = dates.astype(np.int64)
ns_in_day = 24*60*60*np.int64(1e9)
dates //= ns_in_day
dates *= ns_in_day
dates = dates.astype(np.dtype('<M8[ns]'))
dates = pd.Series(dates)
dates
0   1990-01-01
1   1991-01-01
2   1999-12-31
dtype: datetime64[ns]
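A variant of the same NumPy-level idea (a swapped-in technique, not the answer's own code): instead of integer floor division, cast the underlying datetime64 array to day resolution and back, which truncates the time of day. This sketch assumes timezone-naive data; the Timestamp-based construction is only there to avoid mixed-format string parsing.

```python
import pandas as pd

dates = pd.DatetimeIndex([pd.Timestamp('1990-01-01 01:00:11'),
                          pd.Timestamp('1991-01-01'),
                          pd.Timestamp('1999-12-31 12:59:59.999')])

# Cast to day resolution (drops the time), then back to nanoseconds
day_values = dates.values.astype('datetime64[D]').astype('datetime64[ns]')
dates = pd.Series(day_values)
print(dates)
```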
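This may not work correctly when the data has timezone information: the int64 values count nanoseconds since the epoch in UTC, so the floor division truncates to UTC midnight rather than local midnight.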
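To illustrate the timezone caveat, here is a sketch with a single made-up timestamp in America/New_York (UTC-4 in June). Timestamp.value is the UTC epoch in nanoseconds, so flooring it lands on UTC midnight, which is 20:00 local time, while normalize() keeps local midnight.

```python
import pandas as pd

ts = pd.Timestamp('2014-06-30 21:50:00', tz='America/New_York')
ns_in_day = 24 * 60 * 60 * 10**9

# Floor the UTC epoch nanoseconds, then view the result in local time again
floored = pd.Timestamp((ts.value // ns_in_day) * ns_in_day, tz='UTC').tz_convert('America/New_York')
print(floored)         # 2014-06-30 20:00:00-04:00

# normalize() is timezone-aware and gives local midnight
print(ts.normalize())  # 2014-06-30 00:00:00-04:00
```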

