How is a Pandas crosstab different from a Pandas pivot_table?

Question

Asked by root

Both pandas.crosstab and the pandas pivot table seem to provide exactly the same functionality. Are there any differences?

Answer 1

Answered by root

The main difference between the two is that pivot_table expects your input data to already be a DataFrame: you pass a DataFrame to pivot_table and specify index/columns/values by passing column names as strings. With crosstab, you don't necessarily need a DataFrame going in, since you just pass array-like objects for index/columns/values.
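
A minimal sketch of that difference, using made-up data (the arrays `colors` and `sizes` and the helper value column `qty` are illustrative assumptions, not part of the original answer): crosstab takes the arrays directly, while pivot_table needs them inside a DataFrame and refers to them by column name.

```python
import numpy as np
import pandas as pd

# Illustrative data: two arrays that do not belong to any DataFrame
colors = np.array(["red", "blue", "red", "red", "blue"], dtype=object)
sizes = np.array(["S", "S", "L", "S", "L"], dtype=object)

# crosstab accepts the array-likes directly (rownames/colnames label the axes)
ct = pd.crosstab(colors, sizes, rownames=["color"], colnames=["size"])

# pivot_table needs a DataFrame and takes column names as strings;
# "qty" is an assumed helper column that gives pivot_table values to count
df = pd.DataFrame({"color": colors, "size": sizes, "qty": 1})
pt = df.pivot_table(index="color", columns="size", values="qty", aggfunc="count")

print(ct)
print(pt)
```

Both tables hold the same counts (blue: 1 L, 1 S; red: 1 L, 2 S); only the way the data is handed in differs.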

Looking at the source code for crosstab, it essentially takes the array-like objects you pass, creates a DataFrame from them, and then calls pivot_table as appropriate.
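
To make that description concrete, here is a rough sketch of the "build a DataFrame, then pivot" idea. `crosstab_via_pivot` is a hypothetical helper written for illustration, not pandas' actual implementation; the constant dummy column and `aggfunc=len` are our assumptions about one simple way to count pairs.

```python
import numpy as np
import pandas as pd

def crosstab_via_pivot(index, columns):
    """Hypothetical helper (not pandas' real code): count how often each
    (index, columns) pair occurs by first building a DataFrame from the
    array-likes and then calling pivot_table on it."""
    df = pd.DataFrame({"row_0": np.asarray(index), "col_0": np.asarray(columns)})
    # A constant dummy column gives pivot_table a value column to count
    df["__dummy__"] = 0
    return df.pivot_table(
        index="row_0",
        columns="col_0",
        values="__dummy__",
        aggfunc=len,
        fill_value=0,  # pairs that never occur become 0 instead of NaN
    )

a = np.array(["x", "x", "y"], dtype=object)
b = np.array(["p", "q", "p"], dtype=object)

sketch = crosstab_via_pivot(a, b)
reference = pd.crosstab(a, b)
print(sketch)
print(reference)
```

The pair ("y", "q") never occurs, so `fill_value=0` is what turns that cell into a count of 0, matching crosstab's output.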

In general, use pivot_table if you already have a DataFrame, so you avoid the extra overhead of building the same DataFrame again. If you're starting from array-like objects and only care about the pivoted data, use crosstab. In most cases, I don't think it will make much difference which function you use.

Answer 2

Answered by jezrael

They are the same if you use aggfunc=len and fill_value=0 in pivot_table:

import pandas as pd
# df is an existing DataFrame with columns 'Col X' and 'Col Y'
pd.crosstab(df['Col X'], df['Col Y'])
pd.pivot_table(df, index=['Col X'], columns=['Col Y'], aggfunc=len, fill_value=0)
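
A small self-contained check of that equivalence, with made-up data. The extra column `Col Z` and passing it as `values=` are our assumptions: they give pivot_table a column whose entries it can count with `len`, and keep the result to a single column level so it lines up with the crosstab output.

```python
import pandas as pd

# Hypothetical data; the pair ("b", "v") never occurs
df = pd.DataFrame({
    "Col X": ["a", "a", "b", "b"],
    "Col Y": ["u", "v", "u", "u"],
    "Col Z": [10, 20, 30, 40],
})

ct = pd.crosstab(df["Col X"], df["Col Y"])

# Count entries of "Col Z" per cell; fill_value=0 fills pairs that never occur
pt = pd.pivot_table(df, index="Col X", columns="Col Y", values="Col Z",
                    aggfunc=len, fill_value=0)

# Without fill_value, the missing pair stays NaN instead of 0
pt_nan = pd.pivot_table(df, index="Col X", columns="Col Y", values="Col Z",
                        aggfunc=len)

print(ct)
print(pt)
```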

EDIT: There are more differences:

The default aggfunc differs: pivot_table uses np.mean, while crosstab counts occurrences (len).
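
A minimal sketch of the different defaults, on made-up data (the `team`, `kind` and `score` columns are illustrative). With no aggfunc, pivot_table averages the value column, while crosstab builds a frequency table; crosstab can average too, but it requires `values` and `aggfunc` to be passed together.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["A", "A", "A", "B", "B"],
    "kind": ["x", "x", "y", "x", "y"],
    "score": [1.0, 3.0, 4.0, 5.0, 7.0],
})

# pivot_table with no aggfunc: averages "score" within each cell
avg = df.pivot_table(index="team", columns="kind", values="score")

# crosstab with no values/aggfunc: counts rows within each cell
counts = pd.crosstab(df["team"], df["kind"])

# crosstab can reproduce the average when given values and aggfunc together
avg_ct = pd.crosstab(df["team"], df["kind"], values=df["score"], aggfunc="mean")

print(avg)
print(counts)
```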

The margins_name parameter was originally only in pivot_table (newer pandas versions also accept it in crosstab).
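
A short sketch of `margins_name` in pivot_table, using made-up data. `margins=True` appends a totals row and column, and `margins_name` sets their label; the label `"Total"` and the `"count"` aggregation are our choices for illustration (the default label is `"All"`).

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["A", "A", "A", "B", "B"],
    "kind": ["x", "x", "y", "x", "y"],
    "score": [1.0, 3.0, 4.0, 5.0, 7.0],
})

# margins=True adds a row and a column of totals; margins_name labels them
table = df.pivot_table(index="team", columns="kind", values="score",
                       aggfunc="count", margins=True, margins_name="Total")

print(table)
```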

In pivot_table you can pass a Grouper to the index and columns keywords.
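
For example, a `pd.Grouper` with `key=` and `freq="D"` can bucket timestamps by calendar day directly inside pivot_table. This is a minimal sketch; the `when`, `store` and `sales` columns and the daily frequency are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "when": pd.to_datetime(["2024-01-01 09:00", "2024-01-01 17:00",
                            "2024-01-02 10:00"]),
    "store": ["north", "south", "north"],
    "sales": [5, 7, 11],
})

# Group the "when" timestamps into daily bins, one row per day
table = df.pivot_table(
    index=pd.Grouper(key="when", freq="D"),
    columns="store",
    values="sales",
    aggfunc="sum",
    fill_value=0,  # the south store has no sales on 2024-01-02
)

print(table)
```

With crosstab you would instead pass an already-bucketed array-like, for example `df["when"].dt.floor("D")`.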

I think if you simply need a frequency table, the crosstab function is better.

Answer 3

Answered by yzerman

Unfortunately, pivot_table does not have the normalize argument.

In crosstab, the normalize argument calculates proportions by dividing each cell by a sum of cells, as described below:

normalize='index' divides each cell by the sum of its row
normalize='columns' divides each cell by the sum of its column
normalize=True (or 'all') divides each cell by the total of all cells in the table
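
A sketch of the three normalize modes on made-up data, followed by one possible manual workaround for pivot_table (build a count table, then divide by row, column, or grand totals). The workaround is our own illustration, not a pivot_table feature; the helper column `n` is an assumption.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"row": ["x", "x", "x", "y"], "col": ["p", "p", "q", "q"]})

# crosstab: counts are x -> p=2, q=1 and y -> p=0, q=1 before normalizing
by_row = pd.crosstab(df["row"], df["col"], normalize="index")
by_col = pd.crosstab(df["row"], df["col"], normalize="columns")
overall = pd.crosstab(df["row"], df["col"], normalize=True)

# Manual equivalent with pivot_table: count via a helper column "n" of ones
counts = df.assign(n=1).pivot_table(index="row", columns="col", values="n",
                                    aggfunc="sum", fill_value=0)
by_row_manual = counts.div(counts.sum(axis=1), axis=0)  # divide by row sums
by_col_manual = counts / counts.sum()                   # divide by column sums
overall_manual = counts / counts.to_numpy().sum()       # divide by grand total

print(by_row)
print(by_row_manual)
```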
