How to create a square dataframe/matrix given 3 columns - Python

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/47683642/

Time: 2020-09-14 04:52:01  Source: igfitidea

python pandas reformatting

Asked by WolVes

I am struggling to figure out how to develop a square matrix given a format like

a a 0
a b 3
a c 4
a d 12
b a 3 
b b 0
b c 2
...

To something like:

    a   b   c   d   e
a   0   3   4  12 ...
b   3   0   2   7 ...
c   4   3   0 ...
d  12 ...
e ...

in pandas. I developed a method which I think works, but it takes forever to run because, for every value, it iterates through each column and row from the beginning using for loops. I feel like I'm definitely reinventing the wheel here. It also isn't realistic for my dataset given how many columns and rows there are. Is there something similar to R's cast function in Python which can do this significantly faster?

Answered by unutbu

You could use df.pivot:

import pandas as pd

df = pd.DataFrame([['a', 'a', 0],
                   ['a', 'b', 3],
                   ['a', 'c', 4],
                   ['a', 'd', 12],
                   ['b', 'a', 3],
                   ['b', 'b', 0],
                   ['b', 'c', 2]], columns=['X','Y','Z'])

print(df.pivot(index='X', columns='Y', values='Z'))

yields

Y    a    b    c     d
X                     
a  0.0  3.0  4.0  12.0
b  3.0  0.0  2.0   NaN

Here, index='X' tells df.pivot to use the column labeled 'X' as the index, and columns='Y' tells it to use the column labeled 'Y' as the column index.
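
Note that the pivoted result above is 2 x 4 rather than square, because only 'a' and 'b' appear in column 'X' of the sample data. A minimal sketch of one way to force a square shape, assuming you want every label that occurs in either column on both axes (the names wide, labels and square are illustrative, not part of the original answer):

```python
import pandas as pd

df = pd.DataFrame([['a', 'a', 0],
                   ['a', 'b', 3],
                   ['a', 'c', 4],
                   ['a', 'd', 12],
                   ['b', 'a', 3],
                   ['b', 'b', 0],
                   ['b', 'c', 2]], columns=['X', 'Y', 'Z'])

wide = df.pivot(index='X', columns='Y', values='Z')

# Collect every label that appears in either key column, in sorted order.
labels = sorted(set(df['X']) | set(df['Y']))

# Reindex both axes so the result is square; pairs absent from df become NaN.
square = wide.reindex(index=labels, columns=labels)
print(square)
```

Rows for 'c' and 'd' are all NaN here only because the sample data is truncated; with the full data they would be filled in.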

See the docs for more on pivot and other reshaping methods.

Alternatively, you could use pd.crosstab:

print(pd.crosstab(index=df.iloc[:,0], columns=df.iloc[:,1], 
                  values=df.iloc[:,2], aggfunc='sum'))

Unlike df.pivot, which expects each (a1, a2) pair to be unique, pd.crosstab (with aggfunc='sum') will aggregate duplicate pairs by summing the associated values. Although there are no duplicate pairs in your posted example, specifying how duplicates are supposed to be aggregated is required when the values parameter is used.
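
To illustrate the difference, here is a small sketch using made-up data (the dup frame is hypothetical and not from the question) in which the ('a', 'b') pair occurs twice. It assumes df.pivot raises a ValueError on duplicate pairs, which is how it behaves in the pandas versions I am aware of:

```python
import pandas as pd

# Hypothetical data: the ('a', 'b') pair appears twice.
dup = pd.DataFrame([['a', 'b', 3],
                    ['a', 'b', 5],
                    ['b', 'a', 3]], columns=['X', 'Y', 'Z'])

# df.pivot cannot decide which of the two values to keep, so it fails.
try:
    dup.pivot(index='X', columns='Y', values='Z')
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print('pivot failed:', err)

# pd.crosstab combines the duplicates using the given aggfunc (3 + 5 here).
summed = pd.crosstab(index=dup['X'], columns=dup['Y'],
                     values=dup['Z'], aggfunc='sum')
print(summed)
```

If summing is not what you want for repeated pairs, another aggfunc such as 'mean' or 'max' can be passed instead.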

Also, whereas df.pivot is passed column labels, pd.crosstab is passed array-likes (such as whole columns of df). df.iloc[:, i] is the ith column of df.
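
Since pd.crosstab only needs array-likes, selecting the columns by label should work just as well as selecting them by position. A short sketch comparing the two on the sample data (by_position and by_label are illustrative names):

```python
import pandas as pd

df = pd.DataFrame([['a', 'a', 0],
                   ['a', 'b', 3],
                   ['a', 'c', 4],
                   ['a', 'd', 12],
                   ['b', 'a', 3],
                   ['b', 'b', 0],
                   ['b', 'c', 2]], columns=['X', 'Y', 'Z'])

# Columns selected by integer position, as in the answer.
by_position = pd.crosstab(index=df.iloc[:, 0], columns=df.iloc[:, 1],
                          values=df.iloc[:, 2], aggfunc='sum')

# The same columns selected by label.
by_label = pd.crosstab(index=df['X'], columns=df['Y'],
                       values=df['Z'], aggfunc='sum')

# DataFrame.equals treats NaNs in matching positions as equal.
print(by_position.equals(by_label))
```

Selecting by label tends to be more robust if the column order of df might change.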
