逆透视 Pandas 数据

Question

提问by Alex Rothberg

I currently have a DataFramelaid out as:

我目前有一个DataFrame布局为：

        Jan Feb Mar Apr ...
2001    1   12  12  19  
2002    9   ...
2003    ...

and I would like to "unpivot" the data to look like:

我想“反透视”数据看起来像：

Date    Value
Jan 2001    1
Feb 2001    1
Mar 2001    12
...
Jan 2002    9

What is the best way to accomplish this using pandas/NumPy?

使用 pandas/NumPy 完成此任务的最佳方法是什么？

Answer 1

回答by Viktor Kerkez

You just have to do df.unstack()and that will create a MultiIndexed Series with month as a first level and the year as the second level index. If you want them to be columns then just call reset_index()after that.

你只需要这样做df.unstack()，这将创建一个多索引系列，以月份为第一级，年份为第二级索引。如果您希望它们成为列，则只需reset_index()在此之后调用。

>>> df
      Jan  Feb
2001    3    4
2002    2    7
>>> df.unstack()
Jan  2001    3
     2002    2
Feb  2001    4
     2002    7
>>> df = df.unstack().reset_index(name='value')
>>> df
  level_0  level_1  value
0     Jan     2001      3
1     Jan     2002      2
2     Feb     2001      4
3     Feb     2002      7
>>> df.rename(columns={'level_0': 'month', 'level_1': 'year'}, inplace=True)
>>> df
  month  year  value
0   Jan  2001      3
1   Jan  2002      2
2   Feb  2001      4
3   Feb  2002      7

Answer 2

回答by Phillip Cloud

Another solution would be to use pandas.meltto avoid unnecessary creation of a MultiIndex, though this isn't thatexpensive if your frame is small and with my solution you still have to create a temporary for the "molten" data. The guts of meltsuggest that both id_varsand valueare copied since id_varscreation uses tileand valuecreation uses df.values.ravel('F')~~which I believe makes a copy if your data are not in Fortran order~~.

另一种解决方案是pandas.melt避免不必要地创建 a MultiIndex，尽管如果您的框架很小，这并不是那么昂贵，并且使用我的解决方案您仍然必须为“熔化”数据创建一个临时数据。胆量melt表明，id_vars和value都被复制，因为id_vars创建使用tile和value创建使用df.values.ravel('F')~~，如果您的数据不是 Fortran 顺序，我相信会复制~~。

EDIT:I'm not exactly sure when a copy is made when ravelis called since the orderparameter only indicates how you want your data readand the docstring says a copy is made only when needed.

编辑：我不确定何时在ravel调用时制作副本，因为该order参数仅指示您希望如何读取数据，并且文档字符串表示仅在需要时制作副本。

In [99]: mons
Out[99]:
['Jan',
 'Feb',
 'Mar',
 'Apr',
 'May',
 'Jun',
 'Jul',
 'Aug',
 'Sep',
 'Oct',
 'Nov',
 'Dec']

In [100]: df = DataFrame(randn(201, len(mons)), columns=mons, index=map(str, arange(1901, 2102)))

In [101]: df.head()
Out[101]:
        Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct  \
1901  1.141 -0.270  0.329  0.214 -1.030  0.324 -1.448  2.003 -0.061  0.477
1902  0.136  0.151  0.447 -0.493  1.329  1.410  0.020 -0.705  0.870  0.478
1903 -0.000  0.689  1.768 -0.057 -1.471  0.515 -0.315  0.703  2.511  0.592
1904  1.199  1.246 -0.255  0.182 -0.454 -0.452  1.074  0.178  2.495 -0.543
1905  1.073  1.375 -1.837  1.048 -0.139 -0.273 -0.958 -1.164 -1.012  0.950

        Nov    Dec
1901  0.102  0.122
1902  2.941  0.654
1903  0.347 -1.636
1904 -0.047  0.457
1905  1.277 -0.284

In [102]: df.reset_index(inplace=True)

In [103]: df.head()
Out[103]:
  index    Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct  \
0  1901  1.141 -0.270  0.329  0.214 -1.030  0.324 -1.448  2.003 -0.061  0.477
1  1902  0.136  0.151  0.447 -0.493  1.329  1.410  0.020 -0.705  0.870  0.478
2  1903 -0.000  0.689  1.768 -0.057 -1.471  0.515 -0.315  0.703  2.511  0.592
3  1904  1.199  1.246 -0.255  0.182 -0.454 -0.452  1.074  0.178  2.495 -0.543
4  1905  1.073  1.375 -1.837  1.048 -0.139 -0.273 -0.958 -1.164 -1.012  0.950

     Nov    Dec
0  0.102  0.122
1  2.941  0.654
2  0.347 -1.636
3 -0.047  0.457
4  1.277 -0.284

In [104]: res = pd.melt(df, id_vars=['index'], var_name=['months'])

In [105]: res['date'] = res['months'] + ' ' + res['index']

In [106]: res.head()
Out[106]:
  index months  value      date
0  1901    Jan  1.141  Jan 1901
1  1902    Jan  0.136  Jan 1902
2  1903    Jan -0.000  Jan 1903
3  1904    Jan  1.199  Jan 1904
4  1905    Jan  1.073  Jan 1905

逆透视 Pandas 数据

提问by Alex Rothberg

回答by Viktor Kerkez

回答by Phillip Cloud

相关推荐

最近更新

标签

逆透视 Pandas 数据

提问by Alex Rothberg

回答by Viktor Kerkez

回答by Phillip Cloud

相关推荐

如何并行执行对 Pandas 数据帧的多个 SQL 查询

pandas 熊猫导入错误

pandas 熊猫系列的 groupby 不起作用

Pandas：使用 groupby 和函数进行数据帧过滤

相关推荐

最近更新

标签