Original URL: http://stackoverflow.com/questions/36267745/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverFlow
How is a Pandas crosstab different from a Pandas pivot_table?
Question by root
Both pandas.crosstab and pandas.pivot_table seem to provide exactly the same functionality. Are there any differences?
Answer by root
The main difference between the two is that pivot_table expects your input data to already be a DataFrame: you pass a DataFrame to pivot_table and specify the index/columns/values by passing the column names as strings. With crosstab, you don't necessarily need a DataFrame going in, as you just pass array-like objects for index/columns/values.
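A minimal sketch of the two calling styles, using made-up data (the names city, kind and sales are hypothetical). pivot_table receives a DataFrame plus column names as strings; crosstab receives plain Python lists, and rownames/colnames are optional labels for the result axes.

```python
import pandas as pd

# pivot_table: the input is a DataFrame and the axes are named by column strings.
df = pd.DataFrame({
    "city": ["A", "A", "B", "B", "B"],
    "kind": ["x", "y", "x", "x", "y"],
    "sales": [10, 20, 30, 40, 50],
})
pt = pd.pivot_table(df, index="city", columns="kind", values="sales", aggfunc="sum")

# crosstab: plain array-like objects (here Python lists) are enough; no DataFrame needed.
cities = ["A", "A", "B", "B", "B"]
kinds = ["x", "y", "x", "x", "y"]
sales = [10, 20, 30, 40, 50]
ct = pd.crosstab(cities, kinds, values=sales, aggfunc="sum",
                 rownames=["city"], colnames=["kind"])

print(pt)
print(ct)
```

Both calls produce the same 2x2 table of summed sales; only the way the inputs are supplied differs.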
Looking at the source code for crosstab, it essentially takes the array-like objects you pass, creates a DataFrame, then calls pivot_table as appropriate.
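To illustrate the idea, here is a hand-rolled imitation of that approach for a plain frequency table: wrap the arrays in a DataFrame, add a placeholder column, and count it with pivot_table(aggfunc=len, fill_value=0). This is a sketch rather than a copy of the pandas source; the names row, col and placeholder are hypothetical choices made here.

```python
import numpy as np
import pandas as pd

# Two array-like inputs (illustrative values).
rows = np.array(["a", "a", "b", "b", "b"])
cols = np.array(["u", "u", "u", "v", "v"])

# Built-in frequency table.
built_in = pd.crosstab(rows, cols)

# Hand-rolled version of the same idea: wrap the arrays in a DataFrame,
# add a placeholder column to count, and let pivot_table count it.
# "row", "col" and "placeholder" are hypothetical names chosen here.
tmp = pd.DataFrame({"row": rows, "col": cols, "placeholder": 0})
manual = pd.pivot_table(tmp, index="row", columns="col", values="placeholder",
                        aggfunc=len, fill_value=0)

print(built_in)
print(manual)
```

The combination ("a", "v") never occurs, so fill_value=0 is what turns that cell into 0 instead of NaN, matching crosstab's output.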
In general, use pivot_table if you already have a DataFrame, so you don't have the additional overhead of creating the same DataFrame again. If you're starting from array-like objects and are only concerned with the pivoted data, use crosstab. In most cases, I don't think it will really make a difference which function you decide to use.
Answer by jezrael
It is the same if you use aggfunc=len and fill_value=0 in pivot_table:
pd.crosstab(df['Col X'], df['Col Y'])
pd.pivot_table(df, index=['Col X'], columns=['Col Y'], aggfunc=len, fill_value=0)
EDIT: There are more differences:
- The default aggfunc differs: pivot_table uses np.mean, crosstab uses len.
- The margins_name parameter is only in pivot_table (at the time of this answer; newer pandas versions also accept margins_name in crosstab).
- In pivot_table you can use Grouper for the index and columns keywords.
I think if you simply need a frequency table, the crosstab function is better.
Answer by yzerman
Unfortunately, pivot_table does not have the normalize argument.
In crosstab, the normalize argument turns counts into proportions by dividing each cell by a sum of cells, as described below:
- normalize = 'index' divides each cell by the sum of its row
- normalize = 'columns' divides each cell by the sum of its column
- normalize = True divides each cell by the total of all cells in the table
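A short sketch of the three normalize modes on made-up data (the columns g and h are hypothetical). Because pivot_table lacks normalize, the last part shows one possible manual equivalent for the row-wise case: build a count table by summing a helper column n (also a hypothetical name) and divide each row by its own total.

```python
import pandas as pd

df = pd.DataFrame({"g": ["a", "a", "a", "b"], "h": ["x", "y", "y", "x"]})

# normalize="index": every row sums to 1.
by_row = pd.crosstab(df["g"], df["h"], normalize="index")
# normalize="columns": every column sums to 1.
by_col = pd.crosstab(df["g"], df["h"], normalize="columns")
# normalize=True: the whole table sums to 1.
overall = pd.crosstab(df["g"], df["h"], normalize=True)

# pivot_table has no normalize argument; a manual row-wise equivalent is to
# build the count table and divide each row by its own sum.
counts = pd.pivot_table(df.assign(n=1), index="g", columns="h", values="n",
                        aggfunc="sum", fill_value=0)
manual_by_row = counts.div(counts.sum(axis=1), axis=0)

print(by_row)
print(manual_by_row)
```

The results are fractions between 0 and 1 rather than percentages out of 100; multiply by 100 if percentages are wanted.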