Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/45436873/

Time: 2020-09-14 04:08:44  Source: igfitidea

Pandas: How to create a datetime object from Week and Year?

python pandas datetime numpy

Asked by Khris

I have a dataframe that provides two integer columns with the Year and Week of the year:

import pandas as pd
import numpy as np
L1 = [43,44,51,2,5,12]
L2 = [2016,2016,2016,2017,2017,2017]
df = pd.DataFrame({"Week":L1,"Year":L2})

df
Out[72]: 
   Week  Year
0    43  2016
1    44  2016
2    51  2016
3     2  2017
4     5  2017
5    12  2017

I need to create a datetime-object from these two numbers.

I tried this, but it throws an error:

df["DT"] = df.apply(lambda x: np.datetime64(x.Year,'Y') + np.timedelta64(x.Week,'W'),axis=1)

Then I tried this. It runs, but gives the wrong result: the week is ignored completely:

df["S"] = df.Week.astype(str)+'-'+df.Year.astype(str)
df["DT"] = df["S"].apply(lambda x: pd.to_datetime(x,format='%W-%Y'))

df
Out[74]: 
   Week  Year        S         DT
0    43  2016  43-2016 2016-01-01
1    44  2016  44-2016 2016-01-01
2    51  2016  51-2016 2016-01-01
3     2  2017   2-2017 2017-01-01
4     5  2017   5-2017 2017-01-01
5    12  2017  12-2017 2017-01-01

I'm really getting lost between Python's datetime, NumPy's datetime64, and pandas' Timestamp. Can you tell me how to do this correctly?

I'm using Python 3, if that is relevant in any way.

EDIT:

Starting with Python 3.8 the problem is easily solved with a newly introduced method on datetime.date objects: https://docs.python.org/3/library/datetime.html#datetime.date.fromisocalendar
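
As a minimal sketch of that approach (assuming Python 3.8 or later, and assuming the week numbers are ISO 8601 weeks, which the question does not state), `datetime.date.fromisocalendar(year, week, day)` returns the date for a given ISO year, ISO week and weekday (1 = Monday). The helper name `monday_of_iso_week` is made up for this illustration:

```python
import datetime

def monday_of_iso_week(year, week):
    """Return the Monday (ISO weekday 1) of the given ISO year and week."""
    return datetime.date.fromisocalendar(year, week, 1)

# Same week/year values as in the question's DataFrame
weeks = [43, 44, 51, 2, 5, 12]
years = [2016, 2016, 2016, 2017, 2017, 2017]

mondays = [monday_of_iso_week(y, w) for y, w in zip(years, weeks)]
for y, w, d in zip(years, weeks, mondays):
    print(y, w, d.isoformat())
```

On a DataFrame, the same helper could be applied row by row, for example with `df.apply(lambda r: monday_of_iso_week(int(r.Year), int(r.Week)), axis=1)`, and the result passed to `pd.to_datetime` if a datetime64 column is wanted.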

Accepted answer by MaxU

Try this:

In [19]: pd.to_datetime(df.Year.astype(str), format='%Y') + \
             pd.to_timedelta(df.Week.mul(7).astype(str) + ' days')
Out[19]:
0   2016-10-28
1   2016-11-04
2   2016-12-23
3   2017-01-15
4   2017-02-05
5   2017-03-26
dtype: datetime64[ns]
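
The same computation can be written as a plain script. This is a sketch of an equivalent variant rather than the answer's exact code: the day offsets are passed to `pd.to_timedelta` as numbers with `unit="D"` instead of as strings. Note that the result is simply `Week * 7` days after 1 January of each year, so it is not aligned to a particular weekday or to ISO week boundaries.

```python
import pandas as pd

df = pd.DataFrame({"Week": [43, 44, 51, 2, 5, 12],
                   "Year": [2016, 2016, 2016, 2017, 2017, 2017]})

# 1 January of each year ...
year_start = pd.to_datetime(df["Year"].astype(str), format="%Y")

# ... plus Week * 7 days, given as a numeric offset in days
df["DT"] = year_start + pd.to_timedelta(df["Week"] * 7, unit="D")

print(df)
```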

Initially I have timestamps in s

It's much easier to parse them from a UNIX epoch timestamp:

df['Date'] = pd.to_datetime(df['UNIX_Time'], unit='s')

Timing for a 10M-row DataFrame:

Setup:

In [26]: df = pd.DataFrame(pd.date_range('1970-01-01', freq='1min', periods=10**7), columns=['date'])

In [27]: df.shape
Out[27]: (10000000, 1)

In [28]: df['unix_ts'] = df['date'].astype(np.int64)//10**9

In [30]: df
Out[30]:
                       date    unix_ts
0       1970-01-01 00:00:00          0
1       1970-01-01 00:01:00         60
2       1970-01-01 00:02:00        120
3       1970-01-01 00:03:00        180
4       1970-01-01 00:04:00        240
5       1970-01-01 00:05:00        300
6       1970-01-01 00:06:00        360
7       1970-01-01 00:07:00        420
8       1970-01-01 00:08:00        480
9       1970-01-01 00:09:00        540
...                     ...        ...
9999990 1989-01-05 10:30:00  599999400
9999991 1989-01-05 10:31:00  599999460
9999992 1989-01-05 10:32:00  599999520
9999993 1989-01-05 10:33:00  599999580
9999994 1989-01-05 10:34:00  599999640
9999995 1989-01-05 10:35:00  599999700
9999996 1989-01-05 10:36:00  599999760
9999997 1989-01-05 10:37:00  599999820
9999998 1989-01-05 10:38:00  599999880
9999999 1989-01-05 10:39:00  599999940

[10000000 rows x 2 columns]

Check:

In [31]: pd.to_datetime(df.unix_ts, unit='s')
Out[31]:
0         1970-01-01 00:00:00
1         1970-01-01 00:01:00
2         1970-01-01 00:02:00
3         1970-01-01 00:03:00
4         1970-01-01 00:04:00
5         1970-01-01 00:05:00
6         1970-01-01 00:06:00
7         1970-01-01 00:07:00
8         1970-01-01 00:08:00
9         1970-01-01 00:09:00
                  ...
9999990   1989-01-05 10:30:00
9999991   1989-01-05 10:31:00
9999992   1989-01-05 10:32:00
9999993   1989-01-05 10:33:00
9999994   1989-01-05 10:34:00
9999995   1989-01-05 10:35:00
9999996   1989-01-05 10:36:00
9999997   1989-01-05 10:37:00
9999998   1989-01-05 10:38:00
9999999   1989-01-05 10:39:00
Name: unix_ts, Length: 10000000, dtype: datetime64[ns]

Timing:

In [32]: %timeit pd.to_datetime(df.unix_ts, unit='s')
10 loops, best of 3: 156 ms per loop

Conclusion: I think 156 milliseconds to convert 10,000,000 rows is not that slow.

Answer by jezrael

As @Gianmario Spacagna mentioned, for later years such as 2018 and beyond, use %V with %G (and %u for the weekday):

L1 = [43,44,51,2,5,12,52,53,1,2,5,52]
L2 = [2016,2016,2016,2017,2017,2017,2018,2018,2019,2019,2019,2019]
df = pd.DataFrame({"Week":L1,"Year":L2})


df['new'] = pd.to_datetime(df.Week.astype(str)+
                           df.Year.astype(str).add('-1') ,format='%V%G-%u')
print (df)
    Week  Year        new
0     43  2016 2016-10-24
1     44  2016 2016-10-31
2     51  2016 2016-12-19
3      2  2017 2017-01-09
4      5  2017 2017-01-30
5     12  2017 2017-03-20
6     52  2018 2018-12-24
7     53  2018 2018-12-31
8      1  2019 2018-12-31
9      2  2019 2019-01-07
10     5  2019 2019-01-28
11    52  2019 2019-12-23

Answer by Gianmario Spacagna

Something odd happens with weeks from 2019 onward. The ISO 8601 standard assigns 31 December 2018 to week 1 of 2019. The other approaches, based on:

pd.to_datetime(df.Week.astype(str)+
                  df.Year.astype(str).add('-2') ,format='%W%Y-%w')

will give shifted results starting from 2019.
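
The shift can be reproduced with the standard library alone. The following is a sketch, assuming Python 3.6 or later (needed for `%G`, `%V` and `%u` in `strptime`). Both helper names are made up for this illustration, and both ask for the Monday of the given week so the two conventions can be compared directly:

```python
import datetime

def monday_from_percent_w(year, week):
    # %W: week of the year with Monday as the first day; days before the
    # first Monday of the year belong to week 0. %w: 1 means Monday.
    return datetime.datetime.strptime(f"{week} {year} 1", "%W %Y %w").date()

def monday_from_iso(year, week):
    # %G / %V / %u: ISO 8601 year, week and weekday (1 means Monday).
    return datetime.datetime.strptime(f"{year} {week} 1", "%G %V %u").date()

for year in (2018, 2019):
    print(year, monday_from_percent_w(year, 1), monday_from_iso(year, 1))
```

For 2018 both conventions give 2018-01-01, because 1 January 2018 is a Monday. For 2019, `%W` puts week 1 at 2019-01-07 (the first Monday of the year), while ISO week 1 starts on 2018-12-31.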

To comply with the ISO 8601 standard, you would have to do the following:

import pandas as pd
import datetime

L1 = [52,53,1,2,5,52]
L2 = [2018,2018,2019,2019,2019,2019]
df = pd.DataFrame({"Week":L1,"Year":L2})
df['ISO'] = df['Year'].astype(str) + '-W' + df['Week'].astype(str) + '-1'
df['DT'] = df['ISO'].map(lambda x: datetime.datetime.strptime(x, "%G-W%V-%u"))
print(df)

It prints:

   Week  Year         ISO         DT
0    52  2018  2018-W52-1 2018-12-24
1    53  2018  2018-W53-1 2018-12-31
2     1  2019   2019-W1-1 2018-12-31
3     2  2019   2019-W2-1 2019-01-07
4     5  2019   2019-W5-1 2019-01-28
5    52  2019  2019-W52-1 2019-12-23

2018 has only 52 ISO weeks, so week 53 of 2018 is not a real week: it is parsed anyway and lands on the same date as week 1 of 2019.
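
One way to catch such rows before parsing, sketched with the standard library only. The helper names `iso_weeks_in_year` and `is_valid_iso_week` are hypothetical. The check relies on the fact that 28 December always falls in the last ISO week of its year:

```python
import datetime

def iso_weeks_in_year(year):
    # 28 December always lies in the last ISO week of its ISO year,
    # so its week number equals the number of ISO weeks (52 or 53).
    return datetime.date(year, 12, 28).isocalendar()[1]

def is_valid_iso_week(year, week):
    # True only if the week number actually exists in that ISO year.
    return 1 <= week <= iso_weeks_in_year(year)

print(iso_weeks_in_year(2018))      # 52
print(is_valid_iso_week(2018, 53))  # False
print(iso_weeks_in_year(2020))      # 53
```

Rows that fail the check could then be dropped or reported instead of being mapped silently to the following year.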

You can verify this yourself at https://www.epochconverter.com/weeks/2019.
