Pandas: Change day
Original question: http://stackoverflow.com/questions/28888730/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by FooBar
I have a series in datetime format and need to change the day to 1 for each entry. I have thought of numerous simple solutions, but none of them works for me. For now, the only thing that actually works is:
- Set the series as the index
- Query month and year from the index
- Reconstruct a new time series using year, month and 1
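For illustration, the three steps above might look roughly like the following. This is only a sketch of the workaround as described, not the asker's actual code; the sample dates and the names `s`, `idx`, and `rebuilt` are assumptions.

```python
import pandas as pd

# Hypothetical sample data
s = pd.Series(pd.to_datetime(['2014-01-07', '2014-03-13', '2014-06-12']))

# Step 1: treat the datetime values as an index
idx = pd.DatetimeIndex(s)

# Step 2: query year and month from the index
years, months = idx.year, idx.month

# Step 3: reconstruct a new series from year, month and a fixed day of 1
rebuilt = pd.to_datetime({'year': years, 'month': months, 'day': 1})
print(rebuilt)
```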
It can't really be that complicated, can it? There is month start, but it is unfortunately an offset, which is of no use here. There seems to be no set() function for this, and even less functionality when the series is a column rather than (part of) the index itself.
The only related question was this one, but the trick used there is not applicable here.
Answered by Jon Clements
You can use .apply and datetime.replace, e.g.:
import pandas as pd
from datetime import datetime
ps = pd.Series([datetime(2014, 1, 7), datetime(2014, 3, 13), datetime(2014, 6, 12)])
new = ps.apply(lambda dt: dt.replace(day=1))
Gives:
0 2014-01-01
1 2014-03-01
2 2014-06-01
dtype: datetime64[ns]
Answered by Kyle Barron
The other answer works, but apply calls a Python function once per element, which slows your code down a lot on a large series. I was able to get an 8.5x speedup by writing a quick vectorized datetime replace for a series.
import pandas as pd

def vec_dt_replace(series, year=None, month=None, day=None):
    # Rebuild each datetime from its components, overriding any that were passed in
    return pd.to_datetime(
        {'year': series.dt.year if year is None else year,
         'month': series.dt.month if month is None else month,
         'day': series.dt.day if day is None else day})
Apply:
%timeit dtseries.apply(lambda dt: dt.replace(day=1))
# 4.17 s ± 38.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Vectorized:
%timeit vec_dt_replace(dtseries, day=1)
# 491 ms ± 6.48 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
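Before relying on the faster version, it may be worth confirming that both approaches return the same values. The following is a minimal check on a small hand-picked sample; the `sample` data is an assumption, and `vec_dt_replace` is repeated here only so the snippet runs on its own.

```python
import pandas as pd

def vec_dt_replace(series, year=None, month=None, day=None):
    # Same function as in the answer, repeated so this snippet is self-contained
    return pd.to_datetime(
        {'year': series.dt.year if year is None else year,
         'month': series.dt.month if month is None else month,
         'day': series.dt.day if day is None else day})

# Hypothetical sample, including a month end and a leap day
sample = pd.Series(pd.to_datetime(['2015-01-31', '2016-02-29', '2017-12-15']))

via_apply = sample.apply(lambda dt: dt.replace(day=1))
via_vec = vec_dt_replace(sample, day=1)

# Element-wise comparison of the two results
same = bool((via_apply == via_vec).all())
print(same)
```

Note that the vectorized version rebuilds dates from year, month and day only, so the comparison is meaningful for series without a time component, as in the benchmark data.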
Note that you could face errors by trying to change dates to ones that don't exist, like trying to change 2012-02-29 to 2013-02-29. Use the errors argument of pd.to_datetime to ignore or coerce them.
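As a sketch of how that might look, the variant below forwards an `errors` argument to `pd.to_datetime`. The name `vec_dt_replace_safe` and its extra parameter are hypothetical additions for illustration, not part of the original answer. With `errors='coerce'`, a date that cannot exist becomes `NaT` instead of raising.

```python
import pandas as pd

# Hypothetical variant of vec_dt_replace that passes `errors` through
def vec_dt_replace_safe(series, year=None, month=None, day=None, errors='raise'):
    return pd.to_datetime(
        {'year': series.dt.year if year is None else year,
         'month': series.dt.month if month is None else month,
         'day': series.dt.day if day is None else day},
        errors=errors)

# 2012 is a leap year, 2013 is not, so the first entry has no valid target date
s = pd.Series(pd.to_datetime(['2012-02-29', '2012-03-15']))

result = vec_dt_replace_safe(s, year=2013, errors='coerce')
print(result)  # the first entry is coerced to NaT, the second becomes 2013-03-15
```

With the default `errors='raise'`, the same call raises a `ValueError` instead.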
Data generation: generate a series with 1 million random dates:
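import pandas as pd
import numpy as np

# Generate random dates. Modified from: https://stackoverflow.com/a/50668285
def pp(start, end, n):
    # Timestamp.value is nanoseconds since the epoch; convert to whole seconds
    start_u = start.value // 10 ** 9
    end_u = end.value // 10 ** 9
    # Draw random seconds as int64 (avoids overflow where the default int is 32-bit),
    # scale back to nanoseconds, and reinterpret as datetime64[ns]
    return pd.Series(
        (10 ** 9 * np.random.randint(start_u, end_u, n, dtype=np.int64)).view('M8[ns]'))

start = pd.to_datetime('2015-01-01')
end = pd.to_datetime('2018-01-01')
dtseries = pp(start, end, 1000000)
# Remove time component
dtseries = dtseries.dt.normalize()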

