Python Pandas: difference between groupby and pivot_table

Question

Asked by user4943236

I just started learning Pandas and was wondering whether there is any difference between the pandas groupby and pandas pivot_table functions. Can anyone help me understand the difference between them? Help would be appreciated.

Answer 1

Accepted answer by David Maust

Both pivot_table and groupby are used to aggregate your DataFrame. The difference lies mainly in the shape of the result.

Using pd.pivot_table(df, index=["a"], columns=["b"], values="c", aggfunc="sum"), a table is created where a is on the row axis, b is on the column axis, and the values are the sums of c.

Example:

import numpy as np
import pandas as pd
df = pd.DataFrame({"a": [1, 2, 3, 1, 2, 3], "b": [1, 1, 1, 2, 2, 2], "c": np.random.rand(6)})
pd.pivot_table(df, index=["a"], columns=["b"], values="c", aggfunc="sum")

b         1         2
a                    
1  0.528470  0.484766
2  0.187277  0.144326
3  0.866832  0.650100
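Because the example above uses random data, here is a sketch with made-up integer values (hypothetical, chosen only so the sums can be checked by hand). It also shows one detail of the values argument: passing a plain column name gives columns that are just the unique values of b, while passing a list such as ["c"] adds an extra top column level holding the name "c".

```python
import pandas as pd

# Hypothetical deterministic data so the results are easy to verify.
df = pd.DataFrame({"a": [1, 2, 3, 1, 2, 3],
                   "b": [1, 1, 1, 2, 2, 2],
                   "c": [10, 20, 30, 40, 50, 60]})

# values as a scalar: the columns are simply the unique values of b.
flat = pd.pivot_table(df, index="a", columns="b", values="c", aggfunc="sum")

# values as a list: the columns become a MultiIndex with "c" on top.
nested = pd.pivot_table(df, index="a", columns="b", values=["c"], aggfunc="sum")

print(flat)
print(nested.columns.tolist())  # [('c', 1), ('c', 2)]
```

Each cell of flat is the sum of c for one (a, b) pair; here every pair occurs once, so the cell for a=1, b=2 is 40.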

Using groupby, the given dimensions become the levels of the row index (a MultiIndex), and one row is created for each combination of those dimensions that occurs in the data.

In this example, we create a Series containing the sum of c, grouped by all unique combinations of a and b.

df.groupby(['a','b'])['c'].sum()

a  b
1  1    0.528470
   2    0.484766
2  1    0.187277
   2    0.144326
3  1    0.866832
   2    0.650100
Name: c, dtype: float64
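A short sketch of how that MultiIndex result can be inspected and indexed, again using hypothetical integer values in place of the random ones: a single group is addressed with a tuple of key values, and selecting only the outer key returns the inner groups for that key.

```python
import pandas as pd

# Hypothetical deterministic data standing in for np.random.rand(6).
df = pd.DataFrame({"a": [1, 2, 3, 1, 2, 3],
                   "b": [1, 1, 1, 2, 2, 2],
                   "c": [10, 20, 30, 40, 50, 60]})

s = df.groupby(["a", "b"])["c"].sum()

# The grouping keys are the levels of the row MultiIndex.
print(list(s.index.names))  # ['a', 'b']

# One group, addressed by a tuple of (a, b) values.
print(s.loc[(1, 2)])  # 40

# Only the outer key: a Series indexed by b for a == 2.
print(s.loc[2])
```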

A similar usage of groupby omits the ['c']. In this case, it creates a DataFrame (not a Series) containing the sums of all remaining columns, grouped by the unique combinations of a and b.

print(df.groupby(["a", "b"]).sum())
            c
a b          
1 1  0.528470
  2  0.484766
2 1  0.187277
  2  0.144326
3 1  0.866832
  2  0.650100
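A sketch comparing the result types, assuming the same hypothetical integer data as before. Selecting a single column gives a Series, selecting a list of columns (or none) gives a DataFrame, and passing as_index=False keeps a and b as ordinary columns instead of moving them into the index.

```python
import pandas as pd

# Hypothetical deterministic data.
df = pd.DataFrame({"a": [1, 2, 3, 1, 2, 3],
                   "b": [1, 1, 1, 2, 2, 2],
                   "c": [10, 20, 30, 40, 50, 60]})

as_series = df.groupby(["a", "b"])["c"].sum()      # one column selected: Series
as_frame = df.groupby(["a", "b"]).sum()            # all remaining columns: DataFrame
as_frame_2 = df.groupby(["a", "b"])[["c"]].sum()   # a list of columns: DataFrame

# as_index=False: the group keys stay as regular columns a and b.
flat = df.groupby(["a", "b"], as_index=False)["c"].sum()

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(flat.columns.tolist())     # ['a', 'b', 'c']
```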

Answer 2

Answer by kyramichel

It's more appropriate to use .pivot_table() instead of .groupby() when you need to show aggregates with both row and column labels.

.pivot_table() makes it easy to create row and column labels at the same time and is preferable, even though you can get similar results using .groupby() with a few extra steps.
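One way to get the same table from .groupby() is to aggregate and then call unstack() to move the b level from the row index into the columns. A minimal sketch on hypothetical integer data:

```python
import pandas as pd

# Hypothetical deterministic data.
df = pd.DataFrame({"a": [1, 2, 3, 1, 2, 3],
                   "b": [1, 1, 1, 2, 2, 2],
                   "c": [10, 20, 30, 40, 50, 60]})

# One call: a on the rows, b on the columns, sums of c in the cells.
pivot = pd.pivot_table(df, index="a", columns="b", values="c", aggfunc="sum")

# The groupby route: aggregate into a MultiIndex Series,
# then pivot the "b" level out into columns.
via_groupby = df.groupby(["a", "b"])["c"].sum().unstack("b")

print(pivot.equals(via_groupby))  # True
```

The extra unstack() step is the "few extra steps" in question; with more row or column keys, pivot_table keeps the call to a single line while the groupby version needs the right levels listed in unstack().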