Turn a pandas dataframe into a two dimensional array
Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/32553976/
Asked by ben890
I have a dataframe with three columns: X, Y, and counts, where counts is the number of occurrences in which X and Y appear together. My goal is to transform this dataframe into a two-dimensional array where the X values are the row names, the Y values are the column names, and the counts make up the entries of the table.
Is this possible? I can elaborate if needed.
Answered by Alexander
To get the same result as a pivot table, you can also perform a groupby operation and then unstack one of the index levels:
import numpy as np
import pandas as pd
df = pd.DataFrame({'color': ['red', 'blue', 'black'] * 2,
                   'vehicle': ['car', 'truck'] * 3,
                   'value': np.arange(1, 7)})
>>> df
   color  value vehicle
0    red      1     car
1   blue      2   truck
2  black      3     car
3    red      4   truck
4   blue      5     car
5  black      6   truck
>>> df.groupby(['color', 'vehicle']).sum().unstack('vehicle')
        value
vehicle   car truck
color
black       3     6
blue        5     2
red         1     4
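The result above is still a DataFrame whose columns carry an extra 'value' level. A minimal sketch of getting from there to a plain two-dimensional array, assuming the same example data: selecting the 'value' column before unstacking keeps the column labels flat, and to_numpy() (available in pandas 0.24 and later) returns the underlying array. The names table, array, row_labels and col_labels are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'color': ['red', 'blue', 'black'] * 2,
                   'vehicle': ['car', 'truck'] * 3,
                   'value': np.arange(1, 7)})

# Selecting 'value' first gives a Series, so unstack() returns a frame
# with flat column labels instead of a ('value', vehicle) MultiIndex.
table = df.groupby(['color', 'vehicle'])['value'].sum().unstack('vehicle')

# The 2-D array of counts, plus the labels that name its rows and columns.
array = table.to_numpy()
row_labels = table.index.tolist()
col_labels = table.columns.tolist()
```

The array itself carries no labels, so row_labels and col_labels are kept alongside it in case the row and column names are needed later.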
Answered by daedalus
Here is an IPython session that may be a good simulation of what you are trying to do:
In [17]: import pandas as pd
In [18]: from random import randint
In [19]: x = ['a', 'b', 'c'] * 4
In [20]: y = ['i', 'j', 'k', 'l'] * 3
In [21]: counts = [randint(10, 20) for i in range(12)]
In [22]: df = pd.DataFrame(dict(x=x, y=y, counts=counts))
In [23]: df.head()
Out[23]:
   counts  x  y
0      16  a  i
1      10  b  j
2      16  c  k
3      15  a  l
4      19  b  i
In [24]: df.pivot(index='x', columns='y', values='counts')
Out[24]:
y   i   j   k   l
x
a  16  14  18  15
b  19  10  15  20
c  10  18  16  16
In [25]: df.pivot(index='x', columns='y', values='counts').values
Out[25]:
array([[16, 14, 18, 15],
       [19, 10, 15, 20],
       [10, 18, 16, 16]], dtype=int64)
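The session above works because every (x, y) pair occurs exactly once; df.pivot raises a ValueError when a pair is repeated. As a hedged sketch for data where that assumption may not hold, pivot_table can aggregate repeated pairs and fill missing ones. The small data set below is hypothetical, and summing duplicates and filling gaps with 0 are choices made for this illustration, not something stated in the question.

```python
import pandas as pd

# Hypothetical data: the (a, i) pair appears twice and (b, j) never appears.
df = pd.DataFrame({'x': ['a', 'a', 'a', 'b'],
                   'y': ['i', 'i', 'j', 'i'],
                   'counts': [1, 2, 3, 4]})

# aggfunc='sum' adds up repeated (x, y) pairs; fill_value=0 replaces
# combinations that never occur in the data.
table = df.pivot_table(index='x', columns='y', values='counts',
                       aggfunc='sum', fill_value=0)

# to_numpy() is the currently recommended spelling of .values.
array = table.to_numpy()
```

Here the row for 'a' sums the two (a, i) entries, and the missing (b, j) cell becomes 0.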

