How to create a mosaic plot from a Pandas dataframe with the Statsmodels library?

Question

Asked by Dirk

Using Python 3.4, Pandas 0.15 and Statsmodels 0.6.0, I am trying to create a mosaic plot from a dataframe as described in the Statsmodels documentation. However, I don't understand how the input passed to the mosaic() function has to be formatted.

Given a simple dataframe:

In:
import pandas as pd
myDataframe = pd.DataFrame({'size' : ['small', 'large', 'large', 'small', 'large', 'small'], 'length' : ['long', 'short', 'short', 'long', 'long', 'short']})

Out:
  length   size
0   long  small
1  short  large
2  short  large
3   long  small
4   long  large
5  short  small

When trying to create a mosaic plot of this data:

from statsmodels.graphics.mosaicplot import mosaic
mosaic(data=myDataframe, title='Mosaic Plot')

this raises ValueError: cannot label index with a null key

As mosaic plots are a visualization of contingency tables, I first tried to create one with

In:
myCrosstable = pd.crosstab(myDataframe['size'], myDataframe['length'])

Out:
length  long  short
size               
large      1      2
small      2      1

Still, using myCrosstable as the data argument gives the same error.

How does the dataframe have to be formatted to be accepted by the mosaic() function? The documentation explains the data argument as follows:

data : dict, pandas.Series, np.ndarray, pandas.DataFrame
The contingency table that contains the data. Each category should contain a non-negative number with a tuple as index.

Isn't that what the pd.crosstab function returns? If not, how can I convert the dataframe accordingly?

Answer 1

Answered by Primer

I used your data and this code:

mosaic(myDataframe, ['size', 'length'])

and got the chart like this:

[Image: mosaic chart]
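For reference, the call above can be run as a self-contained script. This is a minimal sketch that assumes a reasonably recent Statsmodels and reuses the dataframe from the question. The second positional argument is the index parameter, which names the columns to cross-tabulate. The call in the question omitted it, which is most likely what triggered the ValueError.

```python
import pandas as pd
from statsmodels.graphics.mosaicplot import mosaic

# Same data as in the question.
myDataframe = pd.DataFrame({
    'size': ['small', 'large', 'large', 'small', 'large', 'small'],
    'length': ['long', 'short', 'short', 'long', 'long', 'short'],
})

# `index` lists the columns to cross-tabulate; mosaic() counts the rows
# for each (size, length) combination on its own.
fig, rects = mosaic(myDataframe, ['size', 'length'], title='Mosaic Plot')

# `rects` maps each category tuple to the rectangle drawn for it.
print(sorted(rects))
```

Calling matplotlib.pyplot.show() afterwards, or fig.savefig(...), displays or stores the figure.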

Answer 2

Answered by Emile Bres

You can also use the stack function on the crosstab to avoid recomputing the contingency table.

With your code, mosaic(myCrosstable.stack()) works.
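To illustrate why stacking helps, here is a short sketch using the crosstab from the question. stack() turns the two-dimensional table into a Series whose index entries are (size, length) tuples, which appears to be the "tuple as index" layout the documentation asks for. The names stacked and counts are introduced here for illustration only.

```python
import pandas as pd
from statsmodels.graphics.mosaicplot import mosaic

myDataframe = pd.DataFrame({
    'size': ['small', 'large', 'large', 'small', 'large', 'small'],
    'length': ['long', 'short', 'short', 'long', 'long', 'short'],
})
myCrosstable = pd.crosstab(myDataframe['size'], myDataframe['length'])

# stack() yields a Series indexed by (size, length) tuples.
stacked = myCrosstable.stack()

# The same counts as a plain dict keyed by tuples (illustrative only).
counts = stacked.to_dict()
print(counts)

# The stacked Series can be passed directly; no `index` argument is needed.
fig, rects = mosaic(stacked, title='Mosaic Plot')
```

Since the documentation also lists dict as an accepted type, passing counts instead of stacked should give the same plot.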
