pandas: change dataframe column names from string format to datetime

Question

Asked by gtroupis

I have a dataframe whose column names are dates (year-month) stored as strings. How can I convert these names to datetime format? I tried doing this:

new_cols = pd.to_datetime(df.columns)
df = df[new_cols]

but I get the error:


KeyError: "DatetimeIndex(
['2000-01-01', '2000-02-01',
 '2000-03-01', '2000-04-01',
 '2000-05-01', '2000-06-01',
 '2000-07-01', '2000-08-01',
 '2000-09-01', '2000-10-01',
 '2015-11-01', '2015-12-01',
 '2016-01-01', '2016-02-01',
 '2016-03-01', '2016-04-01',
 '2016-05-01', '2016-06-01',
 '2016-07-01', '2016-08-01'],
dtype='datetime64[ns]', length=200, freq=None) not in index"

Thanks!


Answer 1

Answered by jezrael

Selecting (with [] or loc) does not change the column values. The columns are still strings, so looking up the datetime values raises a KeyError.

So you need to assign the output to columns:

df.columns = pd.to_datetime(df.columns)

Sample:


import numpy as np
import pandas as pd

cols = ['2000-01-01', '2000-02-01', '2000-03-01', '2000-04-01', '2000-05-01']
vals = np.arange(5)
df = pd.DataFrame(columns=cols, data=[vals])
print(df)
   2000-01-01  2000-02-01  2000-03-01  2000-04-01  2000-05-01
0           0           1           2           3           4

print(df.columns)
Index(['2000-01-01', '2000-02-01', '2000-03-01', '2000-04-01', '2000-05-01'], dtype='object')

# Assign the converted labels back to the columns
df.columns = pd.to_datetime(df.columns)

print(df.columns)
DatetimeIndex(['2000-01-01', '2000-02-01', '2000-03-01', '2000-04-01',
               '2000-05-01'],
              dtype='datetime64[ns]', freq=None)
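The question mentions year-month strings, while the sample above uses full dates. As a minimal sketch, assuming the labels look like '2000-01' (the frame and its values here are made up for illustration), the layout can be stated explicitly with the format argument of pd.to_datetime:

```python
import pandas as pd

# Hypothetical frame whose column labels are year-month strings
df = pd.DataFrame([[10, 20, 30]], columns=['2000-01', '2000-02', '2000-03'])

# format='%Y-%m' spells out the layout; each label becomes the first day of its month
df.columns = pd.to_datetime(df.columns, format='%Y-%m')

# The datetime labels can now be looked up with a Timestamp
feb = df[pd.Timestamp('2000-02-01')]
```

Passing the format is optional when pandas can infer it, but it makes the intended parsing explicit.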

It is also possible to convert to a period:

# Starting again from the original string column names
print(df.columns)
Index(['2000-01-01', '2000-02-01', '2000-03-01', '2000-04-01', '2000-05-01'], dtype='object')

df.columns = pd.to_datetime(df.columns).to_period('M')

print(df.columns)
PeriodIndex(['2000-01', '2000-02', '2000-03', '2000-04', '2000-05'],
             dtype='period[M]', freq='M')
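A short sketch of working with period labels once they are in place, again on a made-up frame: a column can be selected with a pd.Period, and PeriodIndex.to_timestamp() converts the labels back to month-start timestamps if those are needed later.

```python
import pandas as pd

# Hypothetical frame whose column labels are year-month strings
df = pd.DataFrame([[10, 20, 30]], columns=['2000-01', '2000-02', '2000-03'])
df.columns = pd.to_datetime(df.columns, format='%Y-%m').to_period('M')

# Select one month by its Period label
feb = df[pd.Period('2000-02', freq='M')]

# Convert the monthly periods back to timestamps (start of each month by default)
as_timestamps = df.columns.to_timestamp()
```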

Answer 2

Answered by Fred Cascarini

As an expansion of jezrael's answer: the original code tries to select from df the columns listed in new_cols and store the result as df. Since those values don't exist among df's columns yet, it raises an error saying it can't find those labels in the index.

As such, you need to assign the new names to the columns rather than select by them, as in jezrael's answer.
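The difference between selecting and assigning can be sketched side by side. This is an illustration on a small made-up frame, not the asker's data; the lookup_failed flag is only there to record what happened.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with string column labels
df = pd.DataFrame([np.arange(3)], columns=['2000-01-01', '2000-02-01', '2000-03-01'])

new_cols = pd.to_datetime(df.columns)

# Selecting looks the labels up; the columns are still strings, so nothing matches
try:
    df[new_cols]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Assigning replaces the labels themselves
df.columns = new_cols
```

After the assignment, df[new_cols] succeeds because the labels now exist in df.columns.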
