pandas: Convert year and month name into a datetime column for a pandas DataFrame

Question

Asked by user308827

How do I convert the year and month name into a datetime column for this dataframe:

    region  year    Months
0  alabama  2018   January
1  alabama  2018  February
2  alabama  2018     March
3  alabama  2018     April
4  alabama  2018       May

When I do this:

pd.to_datetime(df_sub['year'] * 10000 + df_sub['Months'] * 100, format='%Y%m')

I get this error:

*** TypeError: unsupported operand type(s) for +: 'int' and 'str'
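
The failure does not really depend on pandas: for a column of Python strings the arithmetic is applied element by element, so the same error appears with a single year and month name. A minimal sketch with plain Python values (the variable names are illustrative and not taken from the question):

```python
year = 2018
month = 'January'

# Multiplying a string by an integer repeats it; it does not produce a number.
repeated = month * 3
print(repeated)  # JanuaryJanuaryJanuary

# Adding an int to a str is what raises the TypeError from the question.
try:
    year * 10000 + month * 100
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # unsupported operand type(s) for +: 'int' and 'str'
```

So the month name has to be turned into something to_datetime can parse (or into a month number) before it is combined with the year.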

Answer 1

Answered by jezrael

You can convert the year column to strings, append the Months column, and pass the format parameter to to_datetime (the format codes are listed at http://strftime.org/):

print (pd.to_datetime(df_sub['year'].astype(str)  + df_sub['Months'], format='%Y%B'))
0   2018-01-01
1   2018-02-01
2   2018-03-01
3   2018-04-01
4   2018-05-01
dtype: datetime64[ns]
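
To keep the result next to the original data, the same expression can be assigned to a new column. The following is a minimal sketch that rebuilds the sample data from the question; the column name date is an assumption chosen for illustration. Note that %B matches full month names, so abbreviated names such as Jan would need %b instead.

```python
import pandas as pd

# Sample data rebuilt from the question.
df_sub = pd.DataFrame({
    'region': ['alabama'] * 5,
    'year': [2018] * 5,
    'Months': ['January', 'February', 'March', 'April', 'May'],
})

# '2018' + 'January' -> '2018January', parsed with %Y (4-digit year)
# and %B (full month name). The missing day defaults to the 1st.
df_sub['date'] = pd.to_datetime(
    df_sub['year'].astype(str) + df_sub['Months'], format='%Y%B'
)

print(df_sub)
```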

Answer 2

Answered by piRSquared

f-string in a comprehension (Python 3.6+)

pd.to_datetime([f'{y}-{m}-01' for y, m in zip(df.year, df.Months)])

DatetimeIndex(['2018-01-01', '2018-02-01', '2018-03-01', '2018-04-01',
               '2018-05-01'],
              dtype='datetime64[ns]', freq=None)

str.format

pd.to_datetime(['{}-{}-01'.format(y, m) for y, m in zip(df.year, df.Months)])

DatetimeIndex(['2018-01-01', '2018-02-01', '2018-03-01', '2018-04-01',
               '2018-05-01'],
              dtype='datetime64[ns]', freq=None)
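
Both variants return a DatetimeIndex rather than a Series, and they leave the date format for pandas to infer. A hedged sketch that passes the format explicitly and stores the result as a column (the column name date and the explicit format string are additions, not part of the answer):

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    'region': ['alabama'] * 5,
    'year': [2018] * 5,
    'Months': ['January', 'February', 'March', 'April', 'May'],
})

# Build strings such as '2018-January-01', then parse them with a matching format.
date_strings = [f'{y}-{m}-01' for y, m in zip(df.year, df.Months)]
df['date'] = pd.to_datetime(date_strings, format='%Y-%B-%d')

print(df['date'])
```

Stating the format avoids relying on inference and makes a malformed month name fail loudly instead of being guessed at.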

Answer 3

Answered by Simeon Ikudabo

Here is a simple program that gets the output you are looking for:

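import pandas as pd

data_frame = pd.DataFrame({'Region': ['alabama', 'alabama', 'alabama', 'alabama', 'alabama'],
                           'Year': [2018, 2018, 2018, 2018, 2018],
                           'Months': ['January', 'February', 'March', 'April', 'May']})

# Start of the range: month and year from the first row ('January-2018').
date_1 = '{}-{}'.format(data_frame['Months'].iloc[0], data_frame['Year'].iloc[0])
# End of the range: one month past the last row ('June-2018'), so May's month end is included.
date_2 = '{}-{}'.format('June', data_frame['Year'].iloc[4])

# freq='M' gives month-end dates; pandas 2.2 and later spell this alias 'ME'.
data_frame.index = pd.date_range(date_1, date_2, freq='M')
print(data_frame)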

We build date_1 from the month and year in the first row, and date_2 from the year in the last row plus the month after the final one (June), so that the range covers all five rows and we avoid an index (length mismatch) error. Formatting these values as strings lets pandas parse them into dates inside the date_range() function. Here the range is set as the index; since you said you wanted a column with these values, we could instead create a column called dates and use an insert call (for example DataFrame.insert) to place it wherever you want. In the date_range call, date_1 is the first date and date_2 is the last date, and the monthly frequency produces one date per row, so the dates line up with the rows of the other columns. Note that this frequency yields month-end dates, as the output below shows:

              Months   Region  Year
2018-01-31   January  alabama  2018
2018-02-28  February  alabama  2018
2018-03-31     March  alabama  2018
2018-04-30     April  alabama  2018
2018-05-31       May  alabama  2018
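
The alternative mentioned above, keeping the dates in a regular column instead of the index, could look like the sketch below. It is a variation rather than the answer's own code: it uses DataFrame.insert, replaces the hard-coded end date with periods=len(data_frame), and uses the month-start alias 'MS', so the dates fall on the first of each month. It also assumes the rows are consecutive months in order.

```python
import pandas as pd

data_frame = pd.DataFrame({'Region': ['alabama'] * 5,
                           'Year': [2018] * 5,
                           'Months': ['January', 'February', 'March', 'April', 'May']})

# Parse the first row's month and year into a Timestamp (2018-01-01).
first = '{}-{}'.format(data_frame['Months'].iloc[0], data_frame['Year'].iloc[0])
start = pd.to_datetime(first, format='%B-%Y')

# One month-start date per row; assumes the rows are consecutive months.
dates = pd.date_range(start, periods=len(data_frame), freq='MS')

# Insert the dates as the first column, leaving the default integer index unchanged.
data_frame.insert(0, 'dates', dates)
print(data_frame)
```

If the months are not consecutive, this shortcut no longer applies and each row's year and month name should be parsed individually, as in the earlier answers.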
