Keep only date part when using pandas.to_datetime
Note: this page is adapted from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Original: http://stackoverflow.com/questions/16176996/
Question
I use pandas.to_datetime to parse the dates in my data. By default, pandas represents the dates as datetime64[ns] even though the dates are all daily only.
I wonder whether there is an elegant/clever way to convert the dates to datetime.date or datetime64[D] so that, when I write the data to CSV, the dates do not have 00:00:00 appended. I know I can convert the type manually, element by element:
[dt.to_datetime().date() for dt in df.dates]
But this is really slow since I have many rows, and it sort of defeats the purpose of using pandas.to_datetime. Is there a way to convert the dtype of the entire column at once? Or alternatively, does pandas.to_datetime support a precision specification so that I can get rid of the time part while working with daily data?
Answer by Dale Jung
Converting to datetime64[D]:
df.dates.values.astype('M8[D]')
Note that re-assigning the result to a DataFrame column will convert it back to datetime64[ns].
If you wanted actual datetime.date:
dt = pd.DatetimeIndex(df.dates)
dates = np.array([datetime.date(*date_tuple) for date_tuple in zip(dt.year, dt.month, dt.day)])
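The two snippets above assume that datetime, numpy, and pandas are already imported and that df has a datetime column named dates. A minimal self-contained sketch, using made-up sample data, might look like this:

```python
import datetime

import numpy as np
import pandas as pd

# Hypothetical sample data; the column name `dates` follows the question.
df = pd.DataFrame(
    {"dates": pd.to_datetime(["2015-01-08 22:44:09", "2015-01-09 03:15:00"])}
)

# Day-precision NumPy array: the time of day is truncated.
days = df["dates"].values.astype("M8[D]")
print(days.dtype)

# Object array of plain datetime.date values, built element by element.
dt = pd.DatetimeIndex(df["dates"])
dates = np.array(
    [datetime.date(y, m, d) for y, m, d in zip(dt.year, dt.month, dt.day)]
)
print(dates[0])
```

The first result stays a compact NumPy datetime array, while the second is an object array of Python dates, so it is the slower of the two on large columns.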
Answer by EdChum
Since version 0.15.0, this can be done easily by using .dt to access just the date component:
df['just_date'] = df['dates'].dt.date
The above returns a column of datetime.date objects (object dtype). If you want to keep a datetime64 dtype, you can instead normalize the time component to midnight, which sets all the time values to 00:00:00:
df['normalised_date'] = df['dates'].dt.normalize()
This keeps the dtype as datetime64, but the display shows just the date value.
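To make the difference between the two approaches concrete, here is a small sketch on made-up sample data (the DataFrame contents are an assumption; the column names follow the answer):

```python
import datetime

import pandas as pd

# Hypothetical sample data with a time-of-day component.
df = pd.DataFrame(
    {"dates": pd.to_datetime(["2015-01-08 22:44:09", "2015-01-09 03:15:00"])}
)

# Python date objects: convenient, but the column is no longer a datetime64 column.
df["just_date"] = df["dates"].dt.date

# Still a datetime64 column, with every time set to midnight.
df["normalised_date"] = df["dates"].dt.normalize()

print(type(df["just_date"].iloc[0]))
print(df["normalised_date"].dtype)
print(df)
```

Only the normalised column keeps access to the vectorized .dt accessor afterwards.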
Answer by j08lue
Pandas DatetimeIndex and Series have a method called normalize that does exactly what you want.
You can read more about it in this answer.
It can be used as ser.dt.normalize()
Answer by Pietro Battiston
While I upvoted EdChum's answer, which is the most direct answer to the question the OP posed, it does not really solve the performance problem: it still relies on Python datetime objects, so any operation on them will not be vectorized (that is, it will be slow).
A better-performing alternative is to use df['dates'].dt.floor('d'). Strictly speaking, it does not "keep only date part", since it just sets the time to 00:00:00. But it does work as desired by the OP when, for instance:
- printing to screen
- saving to csv
- using the column to groupby
... and it is much more efficient, since the operation is vectorized.
EDIT: in fact, the answer the OP would have preferred is probably "recent versions of pandas do not write the time to csv if it is 00:00:00 for all observations".
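As a rough illustration of the floor approach used for grouping, here is a sketch on made-up data. The value column and the variable names are assumptions, and the uppercase "D" alias is used in place of the equivalent 'd' from the answer:

```python
import pandas as pd

# Hypothetical sample data: two observations on the same day, one on the next.
df = pd.DataFrame(
    {
        "dates": pd.to_datetime(
            ["2015-01-08 22:44:09", "2015-01-08 03:15:00", "2015-01-09 12:00:00"]
        ),
        "value": [1, 2, 3],
    }
)

# Vectorized: the column stays datetime64, with times set to midnight.
df["day"] = df["dates"].dt.floor("D")

# The floored column can be used directly as a grouping key.
daily_totals = df.groupby("day")["value"].sum()
print(daily_totals)
```

Because the column never leaves the datetime64 representation, later operations such as resampling or date arithmetic remain vectorized as well.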
Answer by Mani Abi Anand
This is a simple way to extract the date:
import pandas as pd
d='2015-01-08 22:44:09'
date=pd.to_datetime(d).date()
print(date)
Answer by jpp
Pandas v0.13+: Use to_csv with the date_format parameter
Avoid, where possible, converting your datetime64[ns] series to an object dtype series of datetime.date objects. The latter, often constructed using pd.Series.dt.date, is stored as an array of pointers and is inefficient relative to a pure NumPy-based series.
Since your concern is the format when writing to CSV, just use the date_format parameter of to_csv. For example:
df.to_csv(filename, date_format='%Y-%m-%d')
See Python's strftime directives for formatting conventions.
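A small sketch of what this produces, using made-up data and returning the CSV as a string rather than writing to a file (the column names and the csv_text variable are assumptions for illustration):

```python
import pandas as pd

# Hypothetical sample data; the datetime column keeps its time of day in memory.
df = pd.DataFrame(
    {
        "dates": pd.to_datetime(["2015-01-08 22:44:09", "2015-01-09 03:15:00"]),
        "value": [1, 2],
    }
)

# With no path given, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv(index=False, date_format="%Y-%m-%d")
print(csv_text)
```

Only the written text is affected; the column in df is left untouched and still holds the full timestamps.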
Answer by Gil Baggio
Simple Solution:
df['date_only'] = df['date_time_column'].dt.date
Answer by Katekarin
Just giving a more up-to-date answer in case someone sees this old post.
Adding "utc=False" when converting to datetime will remove the timezone component and keep only the date in a datetime64[ns] data type.
pd.to_datetime(df['Date'], utc=False)
You will then be able to save it to Excel without getting the error "ValueError: Excel does not support datetimes with timezones. Please ensure that datetimes are timezone unaware before writing to Excel."
Answer by Climbs_lika_Spyder
I wanted to be able to change the type for a set of columns in a data frame and then remove the time, keeping the day. round(), floor(), and ceil() all work:
df[date_columns] = df[date_columns].apply(pd.to_datetime)
df[date_columns] = df[date_columns].apply(lambda t: t.dt.floor('d'))


