pandas: pivoting a DataFrame into 3D data

Question

Asked by mathtick

There seem to be a lot of possibilities to pivot flat table data into a 3d array but I'm somehow not finding one that works: Suppose I have some data with columns=['name', 'type', 'date', 'value']. When I try to pivot via

pivot(index='name', columns=['type', 'date'], values='value')

I get

ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

Am I perhaps reading the docs for the dev version of pandas? This seems to be the usage described there. I am running pandas 0.8.

I guess, I'm wondering if I have a MultiIndex ['x', 'y', 'z'] Series, is there a pandas way to put that in a panel? I can use groupby and get the job done, but then this is almost like what I would do in numpy to assemble an n-d array. Seems like a fairly generic operation so I would imagine it might be implemented already.

Answer 1

Answered by Matti John

pivot only supports using a single column to generate your columns. You probably want to use pivot_table to generate a pivot table using multiple columns, e.g.

pandas.pivot_table(your_dataframe, values='value', index='name', columns=['type', 'date'], aggfunc='sum')
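As a rough illustration of what aggfunc does in that call, here is a minimal sketch using made-up data (the rows below are hypothetical, not from the question). When several rows share the same name/type/date combination, pivot_table combines them with the aggregation function instead of failing:

```python
import pandas as pd

# Hypothetical data: 'A' appears twice for the same (type, date) pair
df = pd.DataFrame({
    'name': ['A', 'A', 'B'],
    'type': [1, 1, 1],
    'date': ['2012-01-01', '2012-01-01', '2012-01-01'],
    'value': [1, 10, 2],
})

# Both category columns become levels of the hierarchical column index;
# duplicate entries are summed because aggfunc='sum'
pt = pd.pivot_table(df, values='value', index='name',
                    columns=['type', 'date'], aggfunc='sum')

print(pt)
```

The two 'A' rows end up in a single cell holding their sum, which is the behaviour you get from pivot_table but not from plain pivot.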

The hierarchical columns mentioned in the API reference and documentation for pivot relate to cases where you have multiple value fields rather than multiple categories.

Assuming 'type' and 'date' are categories, whose values should be used as the column names, then you should use pivot_table.

However, if you want separate columns for different value fields for the same category (e.g. 'type'), then you should use pivot without specifying the values column, passing your category as the columns parameter.

For example, suppose you have this DataFrame:

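import pandas as pd

df = pd.DataFrame({'name': ['A', 'B', 'A', 'B'], 'type': [1, 1, 2, 2], 'date': ['2012-01-01', '2012-01-01', '2012-02-01', '2012-02-01'],  'value': [1, 2, 3, 4]})

# pivot_table: both category columns become levels of the column index
pt = df.pivot_table(values='value', index='name', columns=['type', 'date'])
# pivot without a values column: every remaining column is spread over 'type'
p = df.pivot(index='name', columns='type')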

pt will be:

type           1           2
date  2012-01-01  2012-02-01
name                        
A              1           3
B              2           4

and p will be:

          date              value   
type           1           2      1  2
name                                  
A     2012-01-01  2012-02-01      1  3
B     2012-01-01  2012-02-01      2  4

NOTE: For pandas versions < 0.14.0, the index and columns keyword arguments should be replaced with rows and cols respectively.
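To get from a pivot table like pt above to an actual 3D array, one possible approach (a sketch, not something the answer itself proposes) is to reindex the columns onto the full type x date grid and then reshape the values with NumPy. The names full_cols, dense and cube are introduced here for illustration only:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['A', 'B', 'A', 'B'],
    'type': [1, 1, 2, 2],
    'date': ['2012-01-01', '2012-01-01', '2012-02-01', '2012-02-01'],
    'value': [1, 2, 3, 4],
})

pt = df.pivot_table(values='value', index='name',
                    columns=['type', 'date'], aggfunc='sum')

# Build every (type, date) combination so the columns form a full grid;
# combinations missing from the data become NaN
types = sorted(df['type'].unique())
dates = sorted(df['date'].unique())
full_cols = pd.MultiIndex.from_product([types, dates], names=['type', 'date'])
dense = pt.reindex(columns=full_cols)

# Axes of the result: (name, type, date)
cube = dense.to_numpy(dtype=float).reshape(len(dense.index), len(types), len(dates))

print(cube.shape)
```

The reshape relies on the columns being ordered with 'type' varying slowest and 'date' fastest, which is the order from_product produces.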

Answer 2

Answered by eldad-a

The original post ended with the question:

"I'm wondering if I have a MultiIndex ['x', 'y', 'z'] Series, is there a pandas way to put that in a panel?"

to which I was looking for a solution myself.

I ended up with the following:

In [1]: import pandas as pd
In [2]: import numpy as np

## generate xyz example:
In [3]: df = pd.DataFrame({col: np.random.randint(0, 10, 10)
                               for col in ['x','y','z','data']})

## set all x,y,z coordinates as indices
In [5]: df.set_index(['x','y','z'], inplace=True)

## set the z coordinate as headers of the columns
# NB: this will make the data "dense", with NaNs where there was no 'data'
In [7]: df = df['data'].unstack()

## now it is ready to be "pivot"ed into a panel
## (to_panel needs pandas < 0.25, where Panel still exists)
In [9]: data_panel = df.to_panel()

In [10]: df
Out[10]: 
     data                        
z       1   3   4   5   6   7   9
x y                              
1 5   NaN NaN NaN NaN NaN NaN   1
  6   NaN NaN NaN NaN NaN NaN   0
2 9   NaN NaN NaN NaN NaN   1 NaN
3 9     6 NaN NaN NaN NaN NaN NaN
5 9   NaN NaN NaN NaN NaN NaN   8
7 1   NaN NaN NaN NaN   8 NaN NaN
  3   NaN NaN NaN NaN NaN NaN   5
  7   NaN NaN NaN   1 NaN NaN NaN
  9   NaN   0 NaN NaN NaN NaN NaN
9 5   NaN NaN   1 NaN NaN NaN NaN

[10 rows x 7 columns]

In [11]: data_panel
Out[11]: 
<class 'pandas.core.panel.Panel'>
Dimensions: 7 (items) x 6 (major_axis) x 6 (minor_axis)
Items axis: 1 to 9
Major_axis axis: 1 to 9
Minor_axis axis: 1 to 9

The column headers will be the Items of the Panel, the first-level index will be the MajorAxis (rows), and the second-level index will be the MinorAxis (columns).
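Panel was removed in pandas 0.25, so to_panel is not available on current versions. A minimal sketch of a similar result without Panel, assuming a plain NumPy array is an acceptable target: reindex the MultiIndex Series onto the full x/y/z grid and reshape. The data below is hypothetical and the names s, full and cube are illustrative only:

```python
import numpy as np
import pandas as pd

# Hypothetical sparse x/y/z data, using the same column names as above
df = pd.DataFrame({
    'x': [1, 1, 2, 3],
    'y': [5, 6, 9, 9],
    'z': [9, 9, 7, 1],
    'data': [1, 0, 1, 6],
})
s = df.set_index(['x', 'y', 'z'])['data']

# Sorted coordinate values along each axis
xs = sorted(df['x'].unique())
ys = sorted(df['y'].unique())
zs = sorted(df['z'].unique())

# Full grid of coordinates; positions with no data become NaN
full = pd.MultiIndex.from_product([xs, ys, zs], names=['x', 'y', 'z'])
cube = s.reindex(full).to_numpy(dtype=float).reshape(len(xs), len(ys), len(zs))

print(cube.shape)
```

This assumes each (x, y, z) combination occurs at most once; with duplicate coordinates the reindex step would fail, and the data would need aggregating first (for example with groupby).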
