逆透视 Pandas 数据
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 
原文地址: http://stackoverflow.com/questions/18259067/
Warning: these are provided under cc-by-sa 4.0 license.  You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Unpivot Pandas Data
提问by Alex Rothberg
I currently have a DataFramelaid out as:
我目前有一个DataFrame布局为:
        Jan Feb Mar Apr ...
2001    1   12  12  19  
2002    9   ...
2003    ...
and I would like to "unpivot" the data to look like:
我想“反透视”数据看起来像:
Date    Value
Jan 2001    1
Feb 2001    1
Mar 2001    12
...
Jan 2002    9
What is the best way to accomplish this using pandas/NumPy?
使用 pandas/NumPy 完成此任务的最佳方法是什么?
回答by Viktor Kerkez
You just have to do df.unstack()and that will create a MultiIndexed Series with month as a first level and the year as the second level index. If you want them to be columns then just call reset_index()after that.
你只需要这样做df.unstack(),这将创建一个多索引系列,以月份为第一级,年份为第二级索引。如果您希望它们成为列,则只需reset_index()在此之后调用。
>>> df
      Jan  Feb
2001    3    4
2002    2    7
>>> df.unstack()
Jan  2001    3
     2002    2
Feb  2001    4
     2002    7
>>> df = df.unstack().reset_index(name='value')
>>> df
  level_0  level_1  value
0     Jan     2001      3
1     Jan     2002      2
2     Feb     2001      4
3     Feb     2002      7
>>> df.rename(columns={'level_0': 'month', 'level_1': 'year'}, inplace=True)
>>> df
  month  year  value
0   Jan  2001      3
1   Jan  2002      2
2   Feb  2001      4
3   Feb  2002      7
回答by Phillip Cloud
Another solution would be to use pandas.meltto avoid unnecessary creation of a MultiIndex, though this isn't thatexpensive if your frame is small and with my solution you still have to create a temporary for the "molten" data. The guts of meltsuggest that both id_varsand valueare copied since id_varscreation uses tileand valuecreation uses df.values.ravel('F')which I believe makes a copy if your data are not in Fortran order. 
另一种解决方案是pandas.melt避免不必要地创建 a MultiIndex,尽管如果您的框架很小,这并不是那么昂贵,并且使用我的解决方案您仍然必须为“熔化”数据创建一个临时数据。胆量melt表明,id_vars和value都被复制,因为id_vars创建使用tile和value创建使用df.values.ravel('F'),如果您的数据不是 Fortran 顺序,我相信会复制。
EDIT:I'm not exactly sure when a copy is made when ravelis called since the orderparameter only indicates how you want your data readand the docstring says a copy is made only when needed.
编辑:我不确定何时在ravel调用时制作副本,因为该order参数仅指示您希望如何读取数据,并且文档字符串表示仅在需要时制作副本。
In [99]: mons
Out[99]:
['Jan',
 'Feb',
 'Mar',
 'Apr',
 'May',
 'Jun',
 'Jul',
 'Aug',
 'Sep',
 'Oct',
 'Nov',
 'Dec']
In [100]: df = DataFrame(randn(201, len(mons)), columns=mons, index=map(str, arange(1901, 2102)))
In [101]: df.head()
Out[101]:
        Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct  \
1901  1.141 -0.270  0.329  0.214 -1.030  0.324 -1.448  2.003 -0.061  0.477
1902  0.136  0.151  0.447 -0.493  1.329  1.410  0.020 -0.705  0.870  0.478
1903 -0.000  0.689  1.768 -0.057 -1.471  0.515 -0.315  0.703  2.511  0.592
1904  1.199  1.246 -0.255  0.182 -0.454 -0.452  1.074  0.178  2.495 -0.543
1905  1.073  1.375 -1.837  1.048 -0.139 -0.273 -0.958 -1.164 -1.012  0.950
        Nov    Dec
1901  0.102  0.122
1902  2.941  0.654
1903  0.347 -1.636
1904 -0.047  0.457
1905  1.277 -0.284
In [102]: df.reset_index(inplace=True)
In [103]: df.head()
Out[103]:
  index    Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct  \
0  1901  1.141 -0.270  0.329  0.214 -1.030  0.324 -1.448  2.003 -0.061  0.477
1  1902  0.136  0.151  0.447 -0.493  1.329  1.410  0.020 -0.705  0.870  0.478
2  1903 -0.000  0.689  1.768 -0.057 -1.471  0.515 -0.315  0.703  2.511  0.592
3  1904  1.199  1.246 -0.255  0.182 -0.454 -0.452  1.074  0.178  2.495 -0.543
4  1905  1.073  1.375 -1.837  1.048 -0.139 -0.273 -0.958 -1.164 -1.012  0.950
     Nov    Dec
0  0.102  0.122
1  2.941  0.654
2  0.347 -1.636
3 -0.047  0.457
4  1.277 -0.284
In [104]: res = pd.melt(df, id_vars=['index'], var_name=['months'])
In [105]: res['date'] = res['months'] + ' ' + res['index']
In [106]: res.head()
Out[106]:
  index months  value      date
0  1901    Jan  1.141  Jan 1901
1  1902    Jan  0.136  Jan 1902
2  1903    Jan -0.000  Jan 1903
3  1904    Jan  1.199  Jan 1904
4  1905    Jan  1.073  Jan 1905

