Pandas: difference between groupby and pivot_table
Original question: http://stackoverflow.com/questions/34702815/
Note: this page is a translated copy of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL, and attribute it to the original authors on Stack Overflow (not to this page).
Asked by user4943236
I just started learning Pandas and was wondering if there is any difference between the pandas groupby and pandas pivot_table functions. Can anyone help me understand the difference between them? Help would be appreciated.
Accepted answer by David Maust
Both pivot_table and groupby are used to aggregate your dataframe. The difference is only with regard to the shape of the result.
Using pd.pivot_table(df, index=["a"], columns=["b"], values=["c"], aggfunc=np.sum), a table is created where a is on the row axis, b is on the column axis, and the values are the sum of c.
Example:
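import numpy as np
import pandas as pd

# Six rows: every combination of a in {1, 2, 3} and b in {1, 2}, with random values in c
df = pd.DataFrame({"a": [1,2,3,1,2,3], "b":[1,1,1,2,2,2], "c":np.random.rand(6)})
pd.pivot_table(df, index=["a"], columns=["b"], values=["c"], aggfunc=np.sum)

b         1         2
a
1  0.528470  0.484766
2  0.187277  0.144326
3  0.866832  0.650100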
Using groupby, the dimensions given are placed into columns, and rows are created for each combination of those dimensions.
In this example, we create a series containing the sums of c, grouped by all unique combinations of a and b.
df.groupby(['a','b'])['c'].sum()

a  b
1  1    0.528470
   2    0.484766
2  1    0.187277
   2    0.144326
3  1    0.866832
   2    0.650100
Name: c, dtype: float64
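To make the difference in shape concrete, here is a small sketch that builds both results from the same data and compares their shapes. It swaps the random values from the answer for fixed ones (an assumption, made only so the numbers are reproducible), and the names wide and long are illustrative.

```python
import pandas as pd

# Fixed values instead of np.random.rand(6): an assumption for reproducibility.
df = pd.DataFrame({
    "a": [1, 2, 3, 1, 2, 3],
    "b": [1, 1, 1, 2, 2, 2],
    "c": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

# pivot_table: a on the rows, b on the columns, one cell per (a, b) pair.
wide = pd.pivot_table(df, index="a", columns="b", values="c", aggfunc="sum")

# groupby: one row per (a, b) pair, indexed by both keys.
long = df.groupby(["a", "b"])["c"].sum()

print(wide.shape)  # (3, 2)
print(long.shape)  # (6,)
```

Both objects hold the same six sums. The pivot table spreads them over a 3x2 grid, while the groupby result stacks them in a single column under a two-level index.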
A similar usage of groupby is to omit the ['c']. In this case, it creates a dataframe (not a series) of the sums of all remaining columns, grouped by unique values of a and b.
print(df.groupby(["a","b"]).sum())

            c
a b
1 1  0.528470
  2  0.484766
2 1  0.187277
  2  0.144326
3 1  0.866832
  2  0.650100
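The series-versus-dataframe distinction can be checked directly. The sketch below again uses fixed values rather than random ones (an assumption for reproducibility), and the variable names are illustrative.

```python
import pandas as pd

# Fixed values instead of np.random.rand(6): an assumption for reproducibility.
df = pd.DataFrame({
    "a": [1, 2, 3, 1, 2, 3],
    "b": [1, 1, 1, 2, 2, 2],
    "c": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

# Selecting a single column with ['c'] yields a Series.
as_series = df.groupby(["a", "b"])["c"].sum()

# Omitting the selection aggregates every remaining column into a DataFrame.
as_frame = df.groupby(["a", "b"]).sum()

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(list(as_frame.columns))    # ['c']
```

Here c is the only non-key column, so the dataframe has a single column whose contents match the series.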
Answer by kyramichel
It's more appropriate to use .pivot_table() instead of .groupby() when you need to show aggregates with both row and column labels.
.pivot_table() makes it easy to create row and column labels at the same time and is preferable, even though you can get similar results using .groupby() with a few extra steps.
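One way to read "a few extra steps" is to group by both keys and then move one key from the row index to the columns with unstack. This is a sketch under that assumption, not necessarily what the answer's author had in mind. The data uses fixed values for reproducibility.

```python
import pandas as pd

# Fixed values: an assumption so the result is reproducible.
df = pd.DataFrame({
    "a": [1, 2, 3, 1, 2, 3],
    "b": [1, 1, 1, 2, 2, 2],
    "c": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

# Direct route: row and column labels in one call.
pivoted = pd.pivot_table(df, index="a", columns="b", values="c", aggfunc="sum")

# groupby route: aggregate first, then pivot the inner index level b into columns.
via_groupby = df.groupby(["a", "b"])["c"].sum().unstack("b")

print(pivoted)
print(via_groupby)
```

For this data the two tables hold the same values under the same labels. The groupby route needs the extra unstack call to get there.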