Python Pandas: the difference between groupby and pivot_table

Notice: this page is based on a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/34702815/

Time: 2020-08-19 15:24:05  Source: igfitidea

Pandas: group by and Pivot table difference

Tags: python, pandas

Asked by user4943236

I just started learning Pandas and was wondering whether there is any difference between the pandas groupby and pivot_table functions. Can anyone help me understand the difference between them? Help would be appreciated.

Accepted answer by David Maust

Both pivot_table and groupby are used to aggregate your dataframe. The difference is only with regard to the shape of the result.

Using pd.pivot_table(df, index=["a"], columns=["b"], values=["c"], aggfunc=np.sum), a table is created where a is on the row axis, b is on the column axis, and the values are the sum of c.

Example:

import numpy as np
import pandas as pd
df = pd.DataFrame({"a": [1,2,3,1,2,3], "b":[1,1,1,2,2,2], "c":np.random.rand(6)})
pd.pivot_table(df, index=["a"], columns=["b"], values=["c"], aggfunc=np.sum)

b         1         2
a                    
1  0.528470  0.484766
2  0.187277  0.144326
3  0.866832  0.650100
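
To make the shape of the pivoted result concrete, here is a minimal sketch that swaps np.random.rand for fixed values so the result is reproducible. The numbers 10 through 60 are made up for illustration, and the names wide and "sum" (the string form of the aggregation) are choices made here, not part of the original answer.

```python
import pandas as pd

# Same layout as the example above, but with fixed, made-up values for "c".
df = pd.DataFrame({
    "a": [1, 2, 3, 1, 2, 3],
    "b": [1, 1, 1, 2, 2, 2],
    "c": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

# Passing values="c" as a string (rather than a list) keeps the column labels flat.
wide = pd.pivot_table(df, index="a", columns="b", values="c", aggfunc="sum")

print(wide.shape)         # (rows, columns): one row per a, one column per b
print(wide.index.name)    # the name of the row axis
print(wide.columns.name)  # the name of the column axis
print(wide.loc[1, 2])     # the sum of c where a == 1 and b == 2
```

Each cell of the result is addressed by a row label (a value of a) and a column label (a value of b).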

Using groupby, the dimensions given become levels of the row index, and one row is created for each combination of those dimensions.

In this example, we create a series of the sums of c, grouped by all unique combinations of a and b.

df.groupby(['a','b'])['c'].sum()

a  b
1  1    0.528470
   2    0.484766
2  1    0.187277
   2    0.144326
3  1    0.866832
   2    0.650100
Name: c, dtype: float64
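
The grouped result is a Series whose index has two levels, a and b. A minimal sketch of how such a result can be inspected, assuming made-up fixed values for c (the variable name long_form is hypothetical):

```python
import pandas as pd

# Fixed, made-up values for "c" so the lookups below are reproducible.
df = pd.DataFrame({
    "a": [1, 2, 3, 1, 2, 3],
    "b": [1, 1, 1, 2, 2, 2],
    "c": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

long_form = df.groupby(["a", "b"])["c"].sum()

print(type(long_form).__name__)     # the result type
print(list(long_form.index.names))  # the grouping keys become index levels
print(len(long_form))               # one entry per (a, b) combination
print(long_form.loc[(1, 2)])        # look up a single (a, b) combination
```

A single combination is looked up with a tuple of index values rather than with a row label and a column label.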

A similar usage of groupby is to omit the ['c']. In this case, it creates a dataframe (not a series) of the sums of all remaining columns, grouped by unique values of a and b.

print(df.groupby(["a","b"]).sum())
            c
a b          
1 1  0.528470
  2  0.484766
2 1  0.187277
  2  0.144326
3 1  0.866832
  2  0.650100
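
The Series and DataFrame variants can be compared side by side. The sketch below assumes made-up fixed values for c. It also shows reset_index, which is not mentioned in the answer but is one common way to turn the index levels a and b back into ordinary columns. The variable names are hypothetical.

```python
import pandas as pd

# Fixed, made-up values for "c" so the comparison is reproducible.
df = pd.DataFrame({
    "a": [1, 2, 3, 1, 2, 3],
    "b": [1, 1, 1, 2, 2, 2],
    "c": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

as_series = df.groupby(["a", "b"])["c"].sum()  # selecting "c" first gives a Series
as_frame = df.groupby(["a", "b"]).sum()        # omitting the selection gives a DataFrame

print(type(as_series).__name__, type(as_frame).__name__)
print(list(as_frame.columns))  # only the non-grouping columns remain as columns

# reset_index moves the index levels a and b back into regular columns.
flat = as_frame.reset_index()
print(list(flat.columns))
```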

Answer by kyramichel

It's more appropriate to use .pivot_table() instead of .groupby() when you need to show aggregates with both row and column labels.

.pivot_table() makes it easy to create row and column labels at the same time and is preferable, even though you can get similar results using .groupby() with a few extra steps.
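
One way to read the "few extra steps" is a groupby followed by unstack, which moves one index level onto the column axis. This is a sketch of that reading, not necessarily what the answer's author had in mind; the fixed values for c are made up, and the variable names are hypothetical.

```python
import pandas as pd

# Fixed, made-up values for "c" so both results can be compared.
df = pd.DataFrame({
    "a": [1, 2, 3, 1, 2, 3],
    "b": [1, 1, 1, 2, 2, 2],
    "c": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

# Direct route: rows labelled by a, columns labelled by b.
by_pivot = pd.pivot_table(df, index="a", columns="b", values="c", aggfunc="sum")

# groupby route: aggregate first, then move the "b" level to the columns.
by_groupby = df.groupby(["a", "b"])["c"].sum().unstack("b")

print(by_pivot)
print(by_groupby)
print(by_pivot.to_dict() == by_groupby.to_dict())  # compare cell by cell
```

For this small dataset both routes give the same sums in the same layout; pivot_table simply expresses the row and column layout in a single call.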
