pandas: convert year and month name into a datetime column for a pandas dataframe

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/50663700/

Time: 2020-09-14 05:38:36  Source: igfitidea

convert year and month name into datetime column for pandas dataframe

python, pandas

Asked by user308827

How do I convert the year and month name into a datetime column for this dataframe:

    region  year    Months
0  alabama  2018   January
1  alabama  2018  February
2  alabama  2018     March
3  alabama  2018     April
4  alabama  2018       May

When I do this:

pd.to_datetime(df_sub['year'] * 10000 + df_sub['Months'] * 100, format='%Y%m')

I get this error:

*** TypeError: unsupported operand type(s) for +: 'int' and 'str'

Answer by jezrael

You can convert the year column to string, add the Months column, and use the format parameter in to_datetime, with format codes from http://strftime.org/:

print(pd.to_datetime(df_sub['year'].astype(str) + df_sub['Months'], format='%Y%B'))
0   2018-01-01
1   2018-02-01
2   2018-03-01
3   2018-04-01
4   2018-05-01
dtype: datetime64[ns]
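
A self-contained sketch of the same approach, for readers who want to run it end to end. It assumes the question's sample data is rebuilt by hand as df_sub and stores the result in a new column called date (a name chosen here for illustration, not taken from the answer). It also assumes English month names, since %B matches full month names.

```python
import pandas as pd

# Rebuild the question's sample frame (assumed data, copied from the question).
df_sub = pd.DataFrame({
    'region': ['alabama'] * 5,
    'year': [2018] * 5,
    'Months': ['January', 'February', 'March', 'April', 'May'],
})

# Concatenate year and month name as strings, e.g. '2018January',
# then parse with %Y (four-digit year) and %B (full month name).
df_sub['date'] = pd.to_datetime(
    df_sub['year'].astype(str) + df_sub['Months'], format='%Y%B'
)

print(df_sub)
```

Because no day is given, each parsed date falls on the first of the month.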

Answer by piRSquared

f-string in a comprehension (Python 3.6+)

pd.to_datetime([f'{y}-{m}-01' for y, m in zip(df.year, df.Months)])

DatetimeIndex(['2018-01-01', '2018-02-01', '2018-03-01', '2018-04-01',
               '2018-05-01'],
              dtype='datetime64[ns]', freq=None)


str.format

pd.to_datetime(['{}-{}-01'.format(y, m) for y, m in zip(df.year, df.Months)])

DatetimeIndex(['2018-01-01', '2018-02-01', '2018-03-01', '2018-04-01',
               '2018-05-01'],
              dtype='datetime64[ns]', freq=None)
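
Both variants above return a DatetimeIndex rather than a column. A minimal sketch of writing the result back into the frame follows; the column name date and the explicit format argument are additions made here, not part of the answer. Passing format avoids relying on pandas inferring the layout of strings such as '2018-January-01'.

```python
import pandas as pd

# Sample frame rebuilt from the question (assumed data).
df = pd.DataFrame({
    'region': ['alabama'] * 5,
    'year': [2018] * 5,
    'Months': ['January', 'February', 'March', 'April', 'May'],
})

# Build one string per row, e.g. '2018-January-01'.
date_strings = [f'{y}-{m}-01' for y, m in zip(df['year'], df['Months'])]

# Parse with an explicit format and store the dates as a new column.
df['date'] = pd.to_datetime(date_strings, format='%Y-%B-%d')

print(df['date'])
```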

Answer by Simeon Ikudabo

Here is a simple program that gets the output you are looking for:

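import pandas as pd

data_frame = pd.DataFrame({
    'Region': ['alabama', 'alabama', 'alabama', 'alabama', 'alabama'],
    'Year': [2018, 2018, 2018, 2018, 2018],
    'Months': ['January', 'February', 'March', 'April', 'May'],
})

# Start of the range: the first row's month and year, e.g. 'January-2018'.
date_1 = '{}-{}'.format(data_frame['Months'].iloc[0], data_frame['Year'].iloc[0])
# End of the range: one month past the last row, so May's month end is included.
date_2 = '{}-{}'.format('June', data_frame['Year'].iloc[4])

# freq='M' yields month-end dates (newer pandas versions spell this alias 'ME').
data_frame.index = pd.date_range(date_1, date_2, freq='M')
print(data_frame)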

We format date_1 from the first row of the range, then format date_2 as the final row's month plus one month, so that the generated range has as many entries as the frame has rows and assigning it as the index does not raise an error. Formatting these values as strings lets pandas parse them into dates in the date_range() function. We set the index to this range because you said you wanted a column with these values; if you don't want the dates to be your index, we could instead create a column called dates and use an insert statement to add it wherever you want. In date_range, date_1 is the start of the range and date_2 is its end. We also set the frequency to monthly so that the dates line up row by row with the other columns. Below is our output:

              Months   Region  Year
2018-01-31   January  alabama  2018
2018-02-28  February  alabama  2018
2018-03-31     March  alabama  2018
2018-04-30     April  alabama  2018
2018-05-31       May  alabama  2018
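
The column alternative mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the answer's own code: it uses periods=len(data_frame) instead of a hardcoded end month, the 'MS' (month start) frequency so dates fall on the first of each month, and DataFrame.insert to place a column named dates at position 0. Like the original, it assumes the rows are consecutive months.

```python
import pandas as pd

data_frame = pd.DataFrame({
    'Region': ['alabama'] * 5,
    'Year': [2018] * 5,
    'Months': ['January', 'February', 'March', 'April', 'May'],
})

# Parse the first row's month and year, e.g. 'January 2018' -> 2018-01-01.
start = pd.to_datetime(
    '{} {}'.format(data_frame['Months'].iloc[0], data_frame['Year'].iloc[0]),
    format='%B %Y',
)

# One month-start date per row; 'MS' is the month-start frequency alias.
# This assumes the rows are consecutive months, as in the sample data.
dates = pd.date_range(start=start, periods=len(data_frame), freq='MS')

# Insert the new column at position 0 instead of replacing the index.
data_frame.insert(0, 'dates', dates)

print(data_frame)
```

If the months are not consecutive, parsing each row individually with to_datetime is the safer choice, since a generated range ignores the actual values in the Months column.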