pandas pivot dataframe to 3d data

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/13261175/

Time: 2020-09-13 20:29:08  Source: igfitidea

python pandas

Asked by mathtick

There seem to be a lot of possibilities to pivot flat table data into a 3d array but I'm somehow not finding one that works: Suppose I have some data with columns=['name', 'type', 'date', 'value']. When I try to pivot via

pivot(index='name', columns=['type', 'date'], values='value')

I get

ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

Am I reading docs from dev pandas maybe? It seems like this is the usage described there. I am running 0.8 pandas.

I guess, I'm wondering if I have a MultiIndex ['x', 'y', 'z'] Series, is there a pandas way to put that in a panel? I can use groupby and get the job done, but then this is almost like what I would do in numpy to assemble an n-d array. Seems like a fairly generic operation so I would imagine it might be implemented already.

Answered by Matti John

pivot only supports using a single column to generate your columns. You probably want to use pivot_table to generate a pivot table using multiple columns, e.g.

pandas.pivot_table(your_dataframe, values='value', index='name', columns=['type', 'date'], aggfunc='sum')

The hierarchical columns mentioned in the API reference and documentation for pivot relate to cases where you have multiple value fields rather than multiple categories.

Assuming 'type' and 'date' are categories, whose values should be used as the column names, then you should use pivot_table.
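
One practical difference is worth a small sketch: pivot_table aggregates rows that land in the same cell, while pivot refuses to reshape them. The data below is hypothetical (it is not from the original answer) and deliberately repeats the combination ('A', 1, '2012-01-01'):

```python
import pandas as pd

# Hypothetical data: the combination ('A', 1, '2012-01-01') appears twice.
df = pd.DataFrame({
    'name': ['A', 'A', 'B'],
    'type': [1, 1, 1],
    'date': ['2012-01-01', '2012-01-01', '2012-01-01'],
    'value': [1, 10, 2],
})

# pivot_table aggregates duplicate cells; with aggfunc='sum' they are added up.
pt = pd.pivot_table(df, values='value', index='name',
                    columns=['type', 'date'], aggfunc='sum')
a_total = pt.loc['A', (1, '2012-01-01')]

# pivot has no aggregation step, so duplicate entries raise a ValueError.
try:
    df.pivot(index='name', columns='date', values='value')
    pivot_failed = False
except ValueError:
    pivot_failed = True

print(a_total, pivot_failed)
```

If the cells are known to be unique, the choice of aggfunc does not change the values, only possibly their dtype.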

However, if you want separate columns for different value fields for the same category (e.g. 'type'), then you should use pivot without specifying the value column, passing your category as the columns parameter.

For example, suppose you have this DataFrame:

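from pandas import DataFrame

df = DataFrame({'name': ['A', 'B', 'A', 'B'], 'type': [1, 1, 2, 2], 'date': ['2012-01-01', '2012-01-01', '2012-02-01', '2012-02-01'],  'value': [1, 2, 3, 4]})

# several category columns become hierarchical column labels -> pivot_table
pt = df.pivot_table(values='value', index='name', columns=['type', 'date'])
# one category column; every remaining column becomes a value field -> pivot
p = df.pivot(index='name', columns='type')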

pt will be:

type           1           2
date  2012-01-01  2012-02-01
name                        
A              1           3
B              2           4

and p will be:

          date              value   
type           1           2      1  2
name                                  
A     2012-01-01  2012-02-01      1  3
B     2012-01-01  2012-02-01      2  4
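
Both results carry hierarchical (MultiIndex) columns, so it may help to see how a slice is pulled back out. The following is a minimal sketch that rebuilds the same df; the names type1, feb and values_only are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({'name': ['A', 'B', 'A', 'B'], 'type': [1, 1, 2, 2],
                   'date': ['2012-01-01', '2012-01-01', '2012-02-01', '2012-02-01'],
                   'value': [1, 2, 3, 4]})
pt = df.pivot_table(values='value', index='name', columns=['type', 'date'])
p = df.pivot(index='name', columns='type')

# Selecting a top-level label drops that level: columns are now dates.
type1 = pt[1]

# xs selects on an inner level: columns are now types.
feb = pt.xs('2012-02-01', level='date', axis=1)

# In p the top level holds the value fields ('date', 'value').
values_only = p['value']

print(type1, feb, values_only, sep='\n\n')
```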

NOTE: For pandas version < 0.14.0, the index and columns keyword arguments should be replaced with rows and cols respectively.

Answered by eldad-a

The original post ended with the question:

"I'm wondering if I have a MultiIndex ['x', 'y', 'z'] Series, is there a pandas way to put that in a panel?"

to which I was looking for a solution myself.

I ended up with the following:

In [1]: import pandas as pd

In [2]: import numpy as np

## generate xyz example:
In [3]: df = pd.DataFrame({col: np.random.randint(0, 10, 10)
                               for col in ['x','y','z','data']})

## set all x,y,z coordinates as indices
In [5]: df.set_index(['x','y','z'], inplace=True)

## set the z coordinate as headers of the columns 
# NB: this will turn the data "dense", with NaNs where there was no 'data'
In [7]: df = df['data'].unstack()

## now it is ready to be "pivot"ed into a panel
# NB: to_panel() needs pandas < 0.25; Panel was removed in 0.25
In [9]: data_panel = df.to_panel()

In [10]: df
Out[10]: 
     data                        
z       1   3   4   5   6   7   9
x y                              
1 5   NaN NaN NaN NaN NaN NaN   1
  6   NaN NaN NaN NaN NaN NaN   0
2 9   NaN NaN NaN NaN NaN   1 NaN
3 9     6 NaN NaN NaN NaN NaN NaN
5 9   NaN NaN NaN NaN NaN NaN   8
7 1   NaN NaN NaN NaN   8 NaN NaN
  3   NaN NaN NaN NaN NaN NaN   5
  7   NaN NaN NaN   1 NaN NaN NaN
  9   NaN   0 NaN NaN NaN NaN NaN
9 5   NaN NaN   1 NaN NaN NaN NaN

[10 rows x 7 columns]

In [11]: data_panel
Out[11]: 
<class 'pandas.core.panel.Panel'>
Dimensions: 7 (items) x 6 (major_axis) x 6 (minor_axis)
Items axis: 1 to 9
Major_axis axis: 1 to 9
Minor_axis axis: 1 to 9

The column headers will be the Items of the Panel, the first-level index will be the MajorAxis (rows), and the second-level index will be the MinorAxis (columns).
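
Panel was removed in pandas 0.25, so to_panel() is not available on current versions. As a rough alternative (a sketch, not the method from this answer), a MultiIndex ['x', 'y', 'z'] Series can be reindexed onto the full coordinate grid and reshaped into a 3-D NumPy array. The sample data and the names levels, full and cube are assumptions for illustration, and the index is assumed to hold no duplicate coordinates:

```python
import numpy as np
import pandas as pd

# Hypothetical sparse data indexed by unique (x, y, z) coordinates.
s = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.MultiIndex.from_tuples(
        [(0, 0, 0), (0, 1, 2), (1, 1, 1)], names=['x', 'y', 'z']),
    name='data',
)

# Sorted unique labels along each axis.
levels = [s.index.get_level_values(n).unique().sort_values()
          for n in s.index.names]

# Full x * y * z grid; from_product varies the last level fastest,
# which matches NumPy's default row-major reshape.
full = pd.MultiIndex.from_product(levels, names=s.index.names)

# Missing coordinates become NaN, then the flat values are folded into 3-D.
cube = s.reindex(full).to_numpy().reshape(tuple(len(l) for l in levels))

print(cube.shape)
```

The axis labels live in levels, so cube[i, j, k] corresponds to x = levels[0][i], y = levels[1][j], z = levels[2][k].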