Turn a pandas DataFrame into a two-dimensional array

Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Stack Overflow original: http://stackoverflow.com/questions/32553976/

Time: 2020-09-13 23:53:10  Source: igfitidea


Tags: arrays, pandas, dataframe

Asked by ben890

I have a DataFrame with three columns: X, Y, and counts, where counts is the number of times x and y appear together. My goal is to transform it into a two-dimensional array where the X values label the rows, the Y values label the columns, and the counts fill the cells of the table.

Is this possible? I can elaborate if needed.


Answered by Alexander

To get the same result as a pivot table, you can also perform a groupby operation and then unstack one of the columns:

import numpy as np
import pandas as pd

df = pd.DataFrame({'color': ['red', 'blue', 'black'] * 2, 
                   'vehicle': ['car', 'truck'] * 3, 
                   'value': np.arange(1, 7)})

>>> df
   color  value vehicle
0    red      1     car
1   blue      2   truck
2  black      3     car
3    red      4   truck
4   blue      5     car
5  black      6   truck

>>> df.groupby(['color', 'vehicle']).sum().unstack('vehicle')
         value       
vehicle    car  truck
color                
black        3      6
blue         5      2
red          1      4
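
The result above is still a DataFrame, and its columns carry an extra 'value' level. One way to get a plain two-dimensional array is to select the 'value' column before aggregating, so that the unstacked frame has flat column labels, and then call to_numpy(). This is a minimal sketch reusing the example data from this answer; the names table, arr, row_labels and col_labels are illustrative and not part of the original answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'color': ['red', 'blue', 'black'] * 2,
                   'vehicle': ['car', 'truck'] * 3,
                   'value': np.arange(1, 7)})

# Selecting 'value' first gives a Series with a (color, vehicle) index,
# so unstack() returns a frame whose columns are just the vehicle names.
table = df.groupby(['color', 'vehicle'])['value'].sum().unstack('vehicle')

arr = table.to_numpy()               # 2-D NumPy array of the summed values
row_labels = table.index.tolist()    # row names, sorted by groupby
col_labels = table.columns.tolist()  # column names

print(arr)
print(row_labels, col_labels)
```

Keeping row_labels and col_labels alongside the array is optional, but the array itself no longer records which color or vehicle each row and column refers to.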

Answered by daedalus

Here is an IPython session that may be a good simulation of what you are trying to do:


In [17]: import pandas as pd

In [18]: from random import randint

In [19]: x = ['a', 'b', 'c'] * 4

In [20]: y = ['i', 'j', 'k', 'l'] * 3

In [21]: counts = [randint(10, 20) for i in range(12)]

In [22]: df = pd.DataFrame(dict(x=x, y=y, counts=counts))

In [23]: df.head()
Out[23]:
   counts  x  y
0      16  a  i
1      10  b  j
2      16  c  k
3      15  a  l
4      19  b  i

In [24]: df.pivot(index='x', columns='y', values='counts')
Out[24]:
y   i   j   k   l
x
a  16  14  18  15
b  19  10  15  20
c  10  18  16  16

In [25]: df.pivot(index='x', columns='y', values='counts').values
Out[25]:
array([[16, 14, 18, 15],
       [19, 10, 15, 20],
       [10, 18, 16, 16]], dtype=int64)
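
The session above works because every (x, y) pair occurs exactly once. DataFrame.pivot raises a ValueError if a pair is repeated, and it leaves NaN wherever a pair is missing. If the real data might have either problem, one possible approach is pivot_table with an aggregation function and a fill value. The sketch below uses made-up data with fixed counts (not the random ones from the session) so the behavior is reproducible; to_numpy() is the newer spelling of .values:

```python
import pandas as pd

# Illustrative data: the pair ('a', 'i') appears twice,
# and the pair ('b', 'j') does not appear at all.
df = pd.DataFrame({'x': ['a', 'a', 'a', 'b'],
                   'y': ['i', 'i', 'j', 'i'],
                   'counts': [1, 2, 3, 4]})

# aggfunc='sum' combines duplicate pairs; fill_value=0 fills missing ones.
table = df.pivot_table(index='x', columns='y', values='counts',
                       aggfunc='sum', fill_value=0)

arr = table.to_numpy()  # rows follow table.index, columns follow table.columns
print(table)
print(arr)
```

Whether summing duplicates and filling gaps with zero is appropriate depends on what the counts mean in your data; both choices are assumptions made here for illustration.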