Converting year and day of year into datetime index in pandas

Notice: this content comes from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/34258892/

Time: 2020-09-14 00:23:02  Source: igfitidea

python pandas

Asked by user308827

I have a dataframe:

 year  doy
 2000   49   
 2000   65   
 2000   81   
 2001   97   
 2001  113   
 2001  129   
 2001  145   
 2001  161 

I want to create a datetime index for this dataframe. Here is what I am doing:

df.index = pandas.DatetimeIndex(df['doy'].apply(lambda x: date(2000, 1, 1)+ relativedelta(days=int(x)-1)))

However, this creates a datetime index which only uses 2000 as year. How can I fix that?

Accepted answer by unutbu

You can use NumPy datetime64/timedelta64 arithmetic to find the desired dates:

In [97]: (np.asarray(df['year'], dtype='datetime64[Y]')-1970)+(np.asarray(df['doy'], dtype='timedelta64[D]')-1)
Out[97]: 
array(['2000-02-18', '2000-03-05', '2000-03-21', '2001-04-07',
       '2001-04-23', '2001-05-09', '2001-05-25', '2001-06-10'], dtype='datetime64[D]')
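
For anyone who wants to run this outside an IPython session, here is a minimal standalone sketch of the same arithmetic. It assumes the year and day-of-year values are plain integer arrays, and it uses explicit astype casts rather than the dtype argument; the variable names are only illustrative.

```python
import numpy as np

years = np.array([2000, 2000, 2000, 2001, 2001, 2001, 2001, 2001])
doys = np.array([49, 65, 81, 97, 113, 129, 145, 161])

# datetime64[Y] counts whole years since 1970, so subtract the epoch first.
year_start = (years - 1970).astype('datetime64[Y]')

# Day 1 of the year is January 1 itself, hence the -1 offset.
offsets = (doys - 1).astype('timedelta64[D]')

# Adding a day-resolution timedelta promotes the result to datetime64[D].
dates = year_start + offsets
print(dates)
```

The two -1970 and -1 shifts are the whole trick: both datetime64[Y] and the day of year are offsets from a fixed origin, so each has to be rebased before the sum makes sense.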

Since composing dates given various parts of dates (e.g. years, months, days, weeks, hours, etc.) is a common problem, here is a utility function to make it easier:

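import numpy as np
import pandas as pd

def compose_date(years, months=1, days=1, weeks=None, hours=None, minutes=None,
                 seconds=None, milliseconds=None, microseconds=None, nanoseconds=None):
    # Years become an offset from the 1970 epoch; months and days are 1-based,
    # so shift them to 0-based offsets.
    years = np.asarray(years) - 1970
    months = np.asarray(months) - 1
    days = np.asarray(days) - 1
    # One NumPy dtype per part: a datetime64 for the year, timedelta64 for the rest.
    types = ('<M8[Y]', '<m8[M]', '<m8[D]', '<m8[W]', '<m8[h]',
             '<m8[m]', '<m8[s]', '<m8[ms]', '<m8[us]', '<m8[ns]')
    vals = (years, months, days, weeks, hours, minutes, seconds,
            milliseconds, microseconds, nanoseconds)
    # Sum the parts that were supplied; NumPy promotes to the finest unit used.
    return sum(np.asarray(v, dtype=t) for t, v in zip(types, vals)
               if v is not None)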

df = pd.DataFrame({'doy': [49, 65, 81, 97, 113, 129, 145, 161],
                   'year': [2000, 2000, 2000, 2001, 2001, 2001, 2001, 2001]})

df.index = compose_date(df['year'], days=df['doy'])

yields

            doy  year
2000-02-18   49  2000
2000-03-05   65  2000
2000-03-21   81  2000
2001-04-07   97  2001
2001-04-23  113  2001
2001-05-09  129  2001
2001-05-25  145  2001
2001-06-10  161  2001
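
If you would rather stay within pandas, the same index can probably be built with pd.to_datetime and pd.to_timedelta. This is a different technique from the compose_date function above, offered only as a sketch: it assumes the year and doy columns hold integers and that every year has four digits.

```python
import pandas as pd

df = pd.DataFrame({'doy': [49, 65, 81, 97, 113, 129, 145, 161],
                   'year': [2000, 2000, 2000, 2001, 2001, 2001, 2001, 2001]})

# January 1 of each row's year, parsed from the year as a 4-digit string.
year_start = pd.to_datetime(df['year'].astype(str), format='%Y')

# Day 1 maps to January 1, so the offset is doy - 1 days.
offset = pd.to_timedelta(df['doy'] - 1, unit='D')

df.index = pd.DatetimeIndex(year_start + offset)
print(df)
```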

Answer by Alex

You can use the date format specifier %j to parse the day of year. So combine the two columns into a single number (multiplying the year by 1000 shifts it to the left of the three day-of-year digits), and convert to datetime!

pd.to_datetime(df['year'] * 1000 + df['doy'], format='%Y%j')

returns

0   2000-02-18
1   2000-03-05
2   2000-03-21
3   2001-04-07
4   2001-04-23
5   2001-05-09
6   2001-05-25
7   2001-06-10
dtype: datetime64[ns]
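
To see what the combined number looks like before it is parsed, here is a small sketch using the standard library's datetime.strptime as a stand-in for pd.to_datetime. The %Y and %j directives mean the same thing in both; the list names are just for illustration.

```python
from datetime import datetime

years = [2000, 2000, 2000, 2001, 2001, 2001, 2001, 2001]
doys = [49, 65, 81, 97, 113, 129, 145, 161]

# year * 1000 + doy gives e.g. 2000049: four year digits followed by the
# day of year zero-padded to three digits, which keeps the parse unambiguous.
keys = [str(y * 1000 + d) for y, d in zip(years, doys)]

dates = [datetime.strptime(k, '%Y%j') for k in keys]
for k, d in zip(keys, dates):
    print(k, d.date())
```

The multiplication by 1000 is what does the zero-padding here: day 49 ends up as 049, so the year digits and the day digits cannot run together.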