Python pandas - pd.melt a dataframe with datetime index results in NaN

Notice: This page is a Chinese-English side-by-side translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/30984167/

Time: 2020-09-13 23:30:35  Source: igfitidea

Tags: python, datetime, pandas

Asked by lucfr

I have the following dataframe (sim_2005):

Date         ELEM1 ELEM2 ... ELEM1133
2005-01-01   0.021 2.455 ... 345.2
2005-01-02   0.321 2.331 ... 355.1
...          ...   ...   ... ...
2005-12-31   0.789 3.456 ... 459.9
[365 rows x 1133 columns]

with Date being a pandas.tseries.index.DatetimeIndex. I transformed it, with the help of @ami-tavory, using the pandas melt function:

 sim_2005_melted = pd.melt(sim_2005, id_vars=sim_2005.index.name, value_vars=list(sim_2005.columns.values), var_name='ELEM', value_name='Q_sim').sort(columns='Date')

Which results in:

ID     Date   ELEM     Q_sim
1      NaN    ELEM1    0.021
2      NaN    ELEM1    0.321
...
366    NaN    ELEM2    2.455
367    NaN    ELEM2    2.331
...
402983 NaN    ELEM1133 345.2
402984 NaN    ELEM1133 355.1

For some reason the datetime index is not carried over, and the Date column is filled with NaNs. Any help or ideas about what's wrong?

Answered by Alexander

Assuming Date is the index of your DataFrame, you can get a date column in your melted DataFrame as follows:

sim_2005_melted['Date'] = pd.concat([sim_2005.reset_index().Date 
                                     for _ in range(sim_2005.shape[1])], 
                                    ignore_index=True).values
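The idea above can be checked on a small frame. This is only a sketch: `toy` is a hypothetical stand-in for `sim_2005`, and `np.tile` is swapped in as a shorter way to repeat the dates once per column, relying on melt laying the columns out one after another.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for sim_2005 (3 days x 2 elements).
toy = pd.DataFrame(
    {"ELEM1": [0.1, 0.2, 0.3], "ELEM2": [1.1, 1.2, 1.3]},
    index=pd.date_range("2005-01-01", periods=3, freq="D", name="Date"),
)

# Without id_vars, melt() ignores the index and lays the columns out
# one after another: all ELEM1 rows first, then all ELEM2 rows.
melted = pd.melt(toy, var_name="ELEM", value_name="Q_sim")

# Repeat the full date index once per original column so that each
# melted row gets the date it originally belonged to.
melted["Date"] = np.tile(toy.index.values, toy.shape[1])

print(melted)
```

This only lines up because the dates are repeated in the same column-by-column order that melt produces; sorting or filtering `melted` before assigning the dates would break the alignment.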

Answered by Jianxun Li

Here is one way to use .stack() to solve your question.

import pandas as pd
import numpy as np

# try to simulate your data
columns = ['ELEM' + str(x) for x in np.arange(1, 1134, 1)]
sim_2005 = pd.DataFrame(np.random.randn(365, 1133), index=pd.date_range('2005-01-01', periods=365, freq='D'), columns=columns)

processed_sim_2005 = sim_2005.stack().reset_index()
processed_sim_2005.columns = ['Date', 'ELEM', 'Q_sim']

Out[82]: 
             Date      ELEM   Q_sim
0      2005-01-01     ELEM1  0.6221
1      2005-01-01     ELEM2  0.1862
2      2005-01-01     ELEM3 -1.0736
3      2005-01-01     ELEM4 -0.9756
4      2005-01-01     ELEM5  0.8397
...           ...       ...     ...
413540 2005-12-31  ELEM1129  0.0345
413541 2005-12-31  ELEM1130  0.5522
413542 2005-12-31  ELEM1131 -0.6900
413543 2005-12-31  ELEM1132 -0.2269
413544 2005-12-31  ELEM1133  0.1243

[413545 rows x 3 columns]
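As a smaller variant of the same approach, the column names can be set while reshaping instead of being assigned afterwards. This is a sketch on a hypothetical toy frame (`toy` is not from the original post); `rename_axis` and `reset_index(name=...)` are used here as one possible way to label the result.

```python
import pandas as pd

# Hypothetical toy frame standing in for sim_2005 (2 days x 2 elements).
toy = pd.DataFrame(
    {"ELEM1": [0.1, 0.2], "ELEM2": [1.1, 1.2]},
    index=pd.date_range("2005-01-01", periods=2, freq="D"),
)

# stack() moves the column labels into an inner index level, giving a
# Series indexed by (date, element). Naming both levels first lets
# reset_index() produce the final column names directly.
long_form = (
    toy.stack()
    .rename_axis(["Date", "ELEM"])
    .reset_index(name="Q_sim")
)

print(long_form)
```

Note that the rows come out ordered by date first and element second, whereas melt orders them by element first, so a sort may be needed before comparing the two results.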

Answered by Joseph Clark

A possibly simpler solution that still uses .melt() is to first pull your date index out into a column with .reset_index():

sim_2005_melted = pd.melt(sim_2005.reset_index(), id_vars=sim_2005.index.name, value_vars=list(sim_2005.columns.values), var_name='ELEM', value_name='Q_sim')

You get the same result as with .stack(), but this way is a bit more flexible if you want all the extra melty goodness.
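A minimal sketch of the reset_index approach on a hypothetical toy frame (`toy` is not from the original post), assuming the index is named Date. It also shows an alternative that, as far as I know, needs pandas 1.1 or later: passing `ignore_index=False` so that melt keeps the original index instead of discarding it.

```python
import pandas as pd

# Hypothetical toy frame standing in for sim_2005 (2 days x 2 elements).
toy = pd.DataFrame(
    {"ELEM1": [0.1, 0.2], "ELEM2": [1.1, 1.2]},
    index=pd.date_range("2005-01-01", periods=2, freq="D", name="Date"),
)

# Approach from the answer: make the index a regular column, then use it
# as the identifier variable. The remaining columns are melted.
via_reset = pd.melt(
    toy.reset_index(), id_vars="Date", var_name="ELEM", value_name="Q_sim"
)

# Alternative (assumes pandas >= 1.1): keep the index during the melt,
# then turn it into a column afterwards.
via_keep = (
    pd.melt(toy, var_name="ELEM", value_name="Q_sim", ignore_index=False)
    .rename_axis("Date")
    .reset_index()
)

print(via_reset)
print(via_keep)
```

Both variants keep melt's options available, such as restricting `value_vars` to a subset of the element columns.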
