Original source: http://stackoverflow.com/questions/27989120/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Get week start date (Monday) from a date column in Python (pandas)?
Asked by dev28
I have seen a lot of posts about how to do this with a date string, but I am trying to do it for a dataframe column and haven't had any luck so far. My current method is: get the weekday from 'myday' and then offset to get Monday.
df['myday'] is a column of dates.
mydays = pd.DatetimeIndex(df['myday']).weekday
df['week_start'] = pd.DatetimeIndex(df['myday']) - pd.DateOffset(days=mydays)
But I get TypeError: unsupported type for timedelta days component: numpy.ndarray
How can I get week start date from a df column?
Accepted answer by knightofni
It fails because pd.DateOffset expects a single integer as a parameter (and you are feeding it an array). You can only use DateOffset to shift a date column by the same offset for every row.
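To illustrate the point, here is a minimal sketch of a scalar offset applied to a whole column. The sample dates are made up for illustration; they are not from the original question.

```python
import pandas as pd

# Hypothetical sample data: a Wednesday and a Friday.
dates = pd.Series(pd.to_datetime(["2015-01-14", "2015-01-16"]))

# A scalar DateOffset moves every row back by the same two days,
# so it cannot express a different shift for each row.
shifted = dates - pd.DateOffset(days=2)
print(shifted.dt.strftime("%Y-%m-%d").tolist())
```

Only the Wednesday lands on a Monday here, which is why a per-row offset is needed.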
Try this:
import datetime as dt
import pandas as pd
# Convert 'myday' so that it contains dates as datetime objects
df['myday'] = pd.to_datetime(df['myday'])
# 'daysoffset' will contain the weekday as integers (Monday=0)
df['daysoffset'] = df['myday'].apply(lambda x: x.weekday())
# We apply, row by row (axis=1), a timedelta operation
df['week_start'] = df.apply(lambda x: x['myday'] - dt.timedelta(days=int(x['daysoffset'])), axis=1)
I haven't actually tested this code (there was no sample data), but it should work for what you have described.
However, you might want to look at pandas' resample (DataFrame.resample), which might provide a better solution, depending on exactly what you are looking for.
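As a rough sketch of what a resample-based approach could look like, assuming the goal is to aggregate rows per Monday-based week rather than to add a column: the 'value' column and the sample dates below are hypothetical, and the answer itself does not say which resample options it had in mind.

```python
import pandas as pd

# Hypothetical sample data: Wednesday, Friday, and the following Monday.
df = pd.DataFrame({
    "myday": pd.to_datetime(["2015-01-14", "2015-01-16", "2015-01-19"]),
    "value": [1, 2, 3],
})

# 'W-MON' with closed='left' and label='left' gives bins that run from
# Monday (inclusive) to the next Monday (exclusive), labelled by the
# starting Monday.
weekly = df.resample("W-MON", on="myday", closed="left", label="left")["value"].sum()
print(weekly)
```

Note that this produces one row per week instead of one week start per original row, so it fits summaries better than the column the question asks for.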
Answer by n8yoder
While both @knightofni's and @Paul's solutions work, I tend to stay away from using apply in Pandas because it is usually quite slow compared to array-based methods. To avoid it, after casting to a datetime column (via pd.to_datetime) we can modify the weekday-based method and simply convert the day of the week to a numpy timedelta64[D], either by casting it directly:
df['week_start'] = df['myday'] - df['myday'].dt.weekday.astype('timedelta64[D]')
or by using to_timedelta, as @ribitskiyb suggested:
df['week_start'] = df['myday'] - pd.to_timedelta(df['myday'].dt.weekday, unit='D')
Using test data with 60,000 datetimes, I got the following timings for the suggested answers with the newly released Pandas 1.0.1.
%timeit df.apply(lambda x: x['myday'] - datetime.timedelta(days=x['myday'].weekday()), axis=1)
>>> 1.33 s ± 28.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit df['myday'].dt.to_period('W').apply(lambda r: r.start_time)
>>> 5.59 ms ± 138 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit df['myday'] - df['myday'].dt.weekday.astype('timedelta64[D]')
>>> 3.44 ms ± 106 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit df['myday'] - pd.to_timedelta(df['myday'].dt.weekday, unit='D')
>>> 3.47 ms ± 170 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
These results show that Pandas 1.0.1 has dramatically improved the speed of the to_period apply-based method (vs. Pandas <= 0.25), but converting directly to a timedelta (either by casting the type directly with .astype('timedelta64[D]') or by using pd.to_timedelta) is still faster. Based on these results I would suggest using pd.to_timedelta going forward.
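For reference, a small worked sketch of the pd.to_timedelta approach on made-up data (the sample timestamps and the 'week_start_midnight' column name are illustrative assumptions). It also shows that the subtraction keeps the time of day, which can be reset with .dt.normalize() if a midnight timestamp is wanted.

```python
import pandas as pd

# Hypothetical sample data: a Wednesday, a Sunday, and a Monday.
df = pd.DataFrame({
    "myday": pd.to_datetime(
        ["2015-01-14 15:30", "2015-01-18 08:00", "2015-01-19 00:00"]
    ),
})

# Subtract each row's weekday (Monday=0 ... Sunday=6) as a timedelta.
offset = pd.to_timedelta(df["myday"].dt.weekday, unit="D")
df["week_start"] = df["myday"] - offset

# The subtraction keeps the time of day; normalize() resets it to midnight.
df["week_start_midnight"] = df["week_start"].dt.normalize()
print(df)
```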
Answer by Paul
Another alternative:
df['week_start'] = df['myday'].dt.to_period('W').apply(lambda r: r.start_time)
This will set 'week_start' to the Monday on or before the date in 'myday' (at midnight), because the default 'W' period ends on Sunday.
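A short sketch of the same idea on made-up timestamps. It uses the vectorized .dt.start_time accessor in place of the apply call; this is a swapped-in variant, not the code from the answer.

```python
import pandas as pd

# Hypothetical sample data: a Wednesday, a Sunday, and a Monday.
s = pd.Series(pd.to_datetime(
    ["2015-01-14 15:30", "2015-01-18 08:00", "2015-01-19 00:00"]
))

# 'W' is shorthand for 'W-SUN': weekly periods that end on Sunday,
# so each period starts on a Monday at midnight.
week_start = s.dt.to_period("W").dt.start_time
print(week_start)
```

A date that already falls on a Monday maps to itself, and the time of day is dropped.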
Answer by ribitskiyb
(Just adding to n8yoder's answer)
Using .astype('timedelta64[D]') seems not so readable to me; I found an alternative using just the functionality of pandas:
df['myday'] - pd.to_timedelta(arg=df['myday'].dt.weekday, unit='D')
Answer by Rohan R. Pawar
from datetime import timedelta
import pandas as pd
# Convert column to pandas datetime equivalent
df['myday'] = pd.to_datetime(df['myday'])
# Create function to calculate the week start date
week_start_date = lambda date: date - timedelta(days=date.weekday())
# Apply the function above to the DataFrame column
df['week_start_date'] = df['myday'].apply(week_start_date)

