pandas: How to create a square dataframe/matrix given 3 columns - Python

Question

Asked by WolVes

I am struggling to figure out how to build a square matrix from data in a format like:

a a 0
a b 3
a c 4
a d 12
b a 3 
b b 0
b c 2
...

To something like:

  a b c d e
a 0 3 4 12 ... 
b 3 0 2 7 ... 
c 4 3 0 .. .
d 12 ...  
e . ..

in pandas. I developed a method which I think works, but it takes forever to run because it uses for loops that iterate through each column and row for every value, starting from the beginning each time. I feel like I'm definitely reinventing the wheel here. This also isn't realistic for my dataset given how many columns and rows there are. Is there something similar to R's cast function in Python which can do this significantly faster?

Answer 1

Answered by unutbu

You could use df.pivot:

import pandas as pd

df = pd.DataFrame([['a', 'a', 0],
                   ['a', 'b', 3],
                   ['a', 'c', 4],
                   ['a', 'd', 12],
                   ['b', 'a', 3],
                   ['b', 'b', 0],
                   ['b', 'c', 2]], columns=['X','Y','Z'])

print(df.pivot(index='X', columns='Y', values='Z'))

yields

Y    a    b    c     d
X                     
a  0.0  3.0  4.0  12.0
b  3.0  0.0  2.0   NaN

Here, index='X' tells df.pivot to use the column labeled 'X' as the index, and columns='Y' tells it to use the column labeled 'Y' as the column index.
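
The pivoted result above is only as wide as the labels that actually occur in 'Y', so it is not necessarily square. One possible way to force matching row and column labels is to reindex both axes; this is a minimal sketch rather than part of the original answer, and the names wide, labels and square are introduced here purely for illustration:

```python
import pandas as pd

df = pd.DataFrame([['a', 'a', 0],
                   ['a', 'b', 3],
                   ['a', 'c', 4],
                   ['a', 'd', 12],
                   ['b', 'a', 3],
                   ['b', 'b', 0],
                   ['b', 'c', 2]], columns=['X', 'Y', 'Z'])

wide = df.pivot(index='X', columns='Y', values='Z')

# Every label that appears in either column, in sorted order
# (assumption: the square matrix should cover the union of both).
labels = sorted(set(df['X']) | set(df['Y']))

# Reindex both axes so rows and columns share the same labels;
# pairs that are absent from the input remain NaN.
square = wide.reindex(index=labels, columns=labels)
print(square)
```

If the missing pairs should hold a default value instead of NaN, reindex also accepts a fill_value argument.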

See the docs for more on pivot and other reshaping methods.

Alternatively, you could use pd.crosstab:

print(pd.crosstab(index=df.iloc[:,0], columns=df.iloc[:,1], 
                  values=df.iloc[:,2], aggfunc='sum'))

Unlike df.pivot, which expects each (a1, a2) pair to be unique, pd.crosstab (with aggfunc='sum') will aggregate duplicate pairs by summing the associated values. Although there are no duplicate pairs in your posted example, specifying how duplicates are to be aggregated is required when the values parameter is used.
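
To make the difference concrete, here is a small sketch using hypothetical data (the frame dup is not from the question) in which the pair ('a', 'b') appears twice. With this input df.pivot should raise a ValueError, while pd.crosstab sums the two values:

```python
import pandas as pd

# Hypothetical data: the pair ('a', 'b') appears twice.
dup = pd.DataFrame([['a', 'a', 0],
                    ['a', 'b', 3],
                    ['a', 'b', 5],
                    ['b', 'a', 3]], columns=['X', 'Y', 'Z'])

# pivot refuses to reshape when an (index, column) pair is repeated.
try:
    dup.pivot(index='X', columns='Y', values='Z')
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print('pivot failed:', err)

# crosstab aggregates the duplicates with the given aggfunc instead.
summed = pd.crosstab(index=dup['X'], columns=dup['Y'],
                     values=dup['Z'], aggfunc='sum')
print(summed)
```

Summing is only one choice; whether 'sum', 'mean', or some other aggregation is appropriate depends on what the duplicated values mean in your data.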

Also, whereas df.pivot is passed column labels, pd.crosstab is passed array-likes (such as whole columns of df). df.iloc[:, i] is the ith column of df.
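
As a quick illustration of that point, the sketch below passes the same columns to pd.crosstab once by position and once by label. The names by_position and by_label are introduced here for illustration only, and the frame assumes the same 'X', 'Y', 'Z' column names as the example above:

```python
import pandas as pd

df = pd.DataFrame([['a', 'a', 0],
                   ['a', 'b', 3],
                   ['a', 'c', 4],
                   ['a', 'd', 12],
                   ['b', 'a', 3],
                   ['b', 'b', 0],
                   ['b', 'c', 2]], columns=['X', 'Y', 'Z'])

# Columns selected by integer position, as in the answer.
by_position = pd.crosstab(index=df.iloc[:, 0], columns=df.iloc[:, 1],
                          values=df.iloc[:, 2], aggfunc='sum')

# The same columns selected by label; each is still passed as a Series,
# not as a bare column name.
by_label = pd.crosstab(index=df['X'], columns=df['Y'],
                       values=df['Z'], aggfunc='sum')

same = by_position.equals(by_label)
print(same)
```

Selecting by position can be convenient when the column names are unknown or unwieldy; selecting by label tends to be easier to read.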
