Original question: http://stackoverflow.com/questions/17698975/
Notice: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Pandas: Convert DataFrame Column Values Into New Dataframe Indices and Columns
Asked by Mike
I have a dataframe that looks like this:
a b c
0 1 10
1 2 10
2 2 20
3 3 30
4 1 40
4 3 10
The dataframe above has the default (0, 1, 2, 3, 4, ...) index. I would like to convert it into a dataframe that looks like this:
1 2 3
0 10 0 0
1 0 10 0
2 0 20 0
3 0 0 30
4 40 0 10
Here, column 'a' of the first dataframe becomes the index of the second dataframe, the values of 'b' become the column names, and the values of 'c' are copied over, with 0 or NaN filling the missing entries. The original dataset is large and will result in a very sparse second dataframe. I then intend to add this dataframe to a much larger one, which is straightforward.
Can anyone advise the best way to achieve this please?
Answered by joris
You can use the pivot method for this.
See the docs: http://pandas.pydata.org/pandas-docs/stable/reshaping.html#reshaping-by-pivoting-dataframe-objects
An example:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'a':[0,1,2,3,4,4], 'b':[1,2,2,3,1,3], 'c':[10,10,20,30,40,10]})
In [3]: df
Out[3]:
a b c
0 0 1 10
1 1 2 10
2 2 2 20
3 3 3 30
4 4 1 40
5 4 3 10
In [4]: df.pivot(index='a', columns='b', values='c')
Out[4]:
b 1 2 3
a
0 10 NaN NaN
1 NaN 10 NaN
2 NaN 20 NaN
3 NaN NaN 30
4 40 NaN 10
If you want zeros instead of NaNs, as in your example, you can use fillna:
In [5]: df.pivot(index='a', columns='b', values='c').fillna(0)
Out[5]:
b 1 2 3
a
0 10 0 0
1 0 10 0
2 0 20 0
3 0 0 30
4 40 0 10
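The same steps can be written as a standalone script. The sketch below is not part of the original answer and rests on two assumptions: that integer output is wanted (pivot upcasts the values to float once NaN appears, and fillna(0) leaves them as floats), and that duplicate (a, b) pairs, which make pivot raise a ValueError, should be summed. For that second case it uses pivot_table, a different pandas method from the one the answer recommends.

```python
import pandas as pd

# Same data as in the answer above.
df = pd.DataFrame({
    "a": [0, 1, 2, 3, 4, 4],
    "b": [1, 2, 2, 3, 1, 3],
    "c": [10, 10, 20, 30, 40, 10],
})

# pivot leaves NaN where an (a, b) pair is absent, which upcasts the values
# to float; fill the gaps with 0 and cast back to int (assumption: ints wanted).
wide = df.pivot(index="a", columns="b", values="c").fillna(0).astype(int)

# Alternative (assumption: repeated (a, b) pairs should be summed).
# pivot_table aggregates duplicates instead of raising, and fill_value
# replaces the missing cells in the same call.
wide_pt = df.pivot_table(
    index="a", columns="b", values="c", aggfunc="sum", fill_value=0
)

print(wide)
print(wide_pt)
```

On this data both results hold the same values, because no (a, b) pair repeats. If the real dataset can contain repeated pairs, plain pivot will fail, so the choice of aggregation function is something to decide deliberately.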

