Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/36267745/

Time: 2020-09-14 00:57:07  Source: igfitidea

How is a Pandas crosstab different from a Pandas pivot_table?

Tags: pandas, numpy, scipy, pivot-table, crosstab

Asked by root

Both the pandas.crosstab and the Pandas pivot table seem to provide the exact same functionality. Are there any differences?

Answered by root

The main difference between the two is that pivot_table expects your input data to already be a DataFrame; you pass a DataFrame to pivot_table and specify the index/columns/values by passing the column names as strings. With crosstab, you don't necessarily need to have a DataFrame going in, as you just pass array-like objects for index/columns/values.

Looking at the source code for crosstab, it essentially takes the array-like objects you pass, creates a DataFrame, then calls pivot_table as appropriate.

In general, use pivot_table if you already have a DataFrame, so you don't have the additional overhead of creating the same DataFrame again. If you're starting from array-like objects and are only concerned with the pivoted data, use crosstab. In most cases, I don't think it will really make a difference which function you decide to use.
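
A minimal sketch of the input difference described above, assuming small made-up arrays (the names x, y, and the helper column n are illustrative and not from the original answer). crosstab accepts the arrays directly, while pivot_table needs them wrapped in a DataFrame first and then refers to columns by name.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; not from the original answer.
x = np.array(["a", "a", "b", "b", "c"])
y = np.array(["u", "u", "u", "v", "v"])

# crosstab takes array-like objects directly and counts co-occurrences.
from_arrays = pd.crosstab(x, y, rownames=["x"], colnames=["y"])

# pivot_table needs a DataFrame and refers to its columns by name.
# The helper column "n" (all ones) is summed to reproduce the counts.
df = pd.DataFrame({"x": x, "y": y, "n": 1})
from_frame = pd.pivot_table(
    df, index="x", columns="y", values="n", aggfunc="sum", fill_value=0
)

print(from_arrays)
print(from_frame)
```

Both tables should hold the same counts; the only real difference here is whether a DataFrame had to exist beforehand.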

Answered by jezrael

They are the same if you use aggfunc=len and fill_value=0 in pivot_table:

pd.crosstab(df['Col X'], df['Col Y'])
pd.pivot_table(df, index=['Col X'], columns=['Col Y'], aggfunc=len, fill_value=0)
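
The two calls above assume an existing df. A self-contained sketch of the same comparison, assuming a made-up frame with an extra column 'Col Z' for pivot_table to aggregate (it is passed as values here, which the original call omits, so that the result has a single column level):

```python
import pandas as pd

# Hypothetical data; 'Col Z' exists only so pivot_table has something to aggregate.
df = pd.DataFrame({
    "Col X": ["a", "a", "b", "b", "c"],
    "Col Y": ["u", "u", "u", "v", "v"],
    "Col Z": [10, 20, 30, 40, 50],
})

# Frequency table: number of rows for each (Col X, Col Y) pair.
ct = pd.crosstab(df["Col X"], df["Col Y"])

# Same counts through pivot_table: len counts the rows in each group,
# and fill_value=0 replaces the missing combinations.
pt = pd.pivot_table(
    df, index="Col X", columns="Col Y", values="Col Z",
    aggfunc=len, fill_value=0,
)

# Compare cell values only; the dtypes of the two results may differ.
print((ct.to_numpy() == pt.to_numpy()).all())
```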

EDIT: There are more differences:

The default aggfunc is different: pivot_table uses np.mean, crosstab uses len.

The margins_name parameter is only in pivot_table (this was the case when the answer was written; later pandas releases also accept margins_name in crosstab).

In pivot_table you can use Grouper for the index and columns keywords.
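
A rough sketch of the default-aggregation and Grouper points from the list above, assuming a made-up frame (the column names date, group, kind, and val are illustrative). The month-start frequency "MS" is just one possible choice for the Grouper.

```python
import pandas as pd

# Hypothetical data; not from the original answer.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2020-01-05", "2020-01-20", "2020-02-03", "2020-02-10"]
    ),
    "group": ["a", "a", "a", "b"],
    "kind": ["u", "u", "v", "v"],
    "val": [1.0, 3.0, 5.0, 7.0],
})

# With no aggfunc given, pivot_table averages the values in each cell...
means = pd.pivot_table(df, index="group", columns="kind", values="val")

# ...while crosstab, given no values, counts the rows in each cell.
counts = pd.crosstab(df["group"], df["kind"])

# pivot_table also accepts a Grouper, here bucketing rows by calendar month.
monthly = pd.pivot_table(
    df,
    index=pd.Grouper(key="date", freq="MS"),
    columns="kind",
    values="val",
    aggfunc="sum",
)

print(means)
print(counts)
print(monthly)
```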



I think if you simply need a frequency table, the crosstab function is better.

Answered by yzerman

Unfortunately, pivot_table does not have the normalize argument.

In crosstab, the normalize argument calculates proportions by dividing each cell by a sum of cells, as described below:

  • normalize = 'index' divides each cell by the sum of its row
  • normalize = 'columns' divides each cell by the sum of its column
  • normalize = True divides each cell by the total of all cells in the table
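
The three normalize options listed above can be sketched as follows, assuming a small made-up frame (the column names x and y are illustrative):

```python
import pandas as pd

# Hypothetical data; the raw counts are a: [2, 0], b: [1, 1], c: [0, 1].
df = pd.DataFrame({
    "x": ["a", "a", "b", "b", "c"],
    "y": ["u", "u", "u", "v", "v"],
})

# Each row sums to 1.
by_row = pd.crosstab(df["x"], df["y"], normalize="index")

# Each column sums to 1.
by_col = pd.crosstab(df["x"], df["y"], normalize="columns")

# The whole table sums to 1.
overall = pd.crosstab(df["x"], df["y"], normalize=True)

print(by_row)
print(by_col)
print(overall)
```

The results are fractions between 0 and 1; multiply by 100 if percentages are wanted.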