Hierarchic pie/donut chart from Pandas DataFrame using bokeh or matplotlib

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/33019879/

Date: 2020-09-14 00:00:26  Source: igfitidea

Tags: python, pandas, matplotlib, bokeh

Asked by Adrià Cereto i Massagué

I have the following pandas DataFrame ("A" is the last column's header; the rest of columns are a combined hierarchical index):

    A
kingdom      phylum            class             order                family                        genus              species             
No blast hit                                                                                                                           2496
k__Archaea   p__Euryarchaeota  c__Thermoplasmata o__E2                f__[Methanomassiliicoccaceae] g__vadinCA11       s__                6
k__Bacteria  p__               c__               o__                  f__                           g__                s__                5
             p__Actinobacteria c__Acidimicrobiia o__Acidimicrobiales  f__                           g__                s__                0
                               c__Actinobacteria o__Actinomycetales   f__Corynebacteriaceae         g__Corynebacterium s__stationis       2
                                                                      f__Micrococcaceae             g__Arthrobacter    s__                8
                                                 o__Bifidobacteriales f__Bifidobacteriaceae         g__Bifidobacterium s__              506
                                                                                                                       s__animalis       48
                               c__Coriobacteriia o__Coriobacteriales  f__Coriobacteriaceae          g__                s__              734
                                                                                                    g__Collinsella     s__aerofaciens     3

(a CSV with the data is available here)

I want to plot this as a pie/donut chart, where each concentric ring is a level (kingdom, phylum, etc.) and is divided according to the sum of column A for that level, so that I end up with something similar to this, but with my data:

(image: disk usage chart)

I've looked into matplotlib and bokeh, but the most similar thing I've found so far is the bokeh Donut chart example, which uses a deprecated chart and which I don't know how to extend to more than 2 levels.

Answered by Ajean

I don't know if there is anything pre-defined that does this, but it's possible to construct your own using groupby and overlapping pie plots. I constructed the following script to take your data and get something at least similar to what you specified.
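
As a rough illustration of the groupby part, here is a minimal sketch of how the totals for each ring could be computed. The `toy` frame, its values, and the `ring_totals` name are made up for this example (they are not the question's data); only the level names and column "A" follow the question.

```python
import pandas as pd

# Hypothetical toy data: two hierarchy levels plus a count column "A"
toy = pd.DataFrame({
    "kingdom": ["k__Bacteria", "k__Bacteria", "k__Bacteria", "k__Archaea"],
    "phylum": ["p__Actinobacteria", "p__Actinobacteria",
               "p__Firmicutes", "p__Euryarchaeota"],
    "A": [2, 8, 5, 6],
})

levels = ["kingdom", "phylum"]

# One Series of totals per ring: depth 1 groups by kingdom only,
# depth 2 groups by (kingdom, phylum), and so on for deeper hierarchies.
ring_totals = {}
for depth in range(1, len(levels) + 1):
    ring_totals[depth] = toy.groupby(levels[:depth], sort=False)["A"].sum()

print(ring_totals[1])
print(ring_totals[2])
```

Every ring sums to the same grand total, which is what lets the pies be overlaid: each wedge of an inner ring spans the same angle as its children in the next ring out.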

Note that the groupby calls (which are used to calculate the totals at each level) must have sorting turned off for things to line up correctly. Your dataset is also very non-uniform, so I just made some random data to spread out the resulting chart a bit for the sake of illustration.
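
To make the effect of the sort flag concrete, the sketch below compares the group order with and without sorting on a made-up three-row frame (the `toy` data is hypothetical). It also checks that the wedge boundaries of the inner ring coincide with boundaries of the outer ring, which in this sketch relies on rows belonging to the same parent being contiguous, as they appear to be in the question's table.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; rows of the same kingdom are contiguous
toy = pd.DataFrame({
    "kingdom": ["k__Bacteria", "k__Bacteria", "k__Archaea"],
    "phylum": ["p__Firmicutes", "p__Actinobacteria", "p__Euryarchaeota"],
    "A": [5, 10, 6],
})

# sort=False keeps groups in order of first appearance;
# the default (sort=True) orders them by key instead.
in_file_order = toy.groupby("kingdom", sort=False)["A"].sum()
by_key = toy.groupby("kingdom")["A"].sum()
print(list(in_file_order.index))  # ['k__Bacteria', 'k__Archaea']
print(list(by_key.index))         # ['k__Archaea', 'k__Bacteria']

# Cumulative sums mark where wedges end; every inner boundary
# should also be an outer boundary for the rings to line up.
inner = toy.groupby(["kingdom"], sort=False)["A"].sum()
outer = toy.groupby(["kingdom", "phylum"], sort=False)["A"].sum()
inner_edges = set(np.cumsum(inner.values).tolist())
outer_edges = set(np.cumsum(outer.values).tolist())
rings_line_up = inner_edges <= outer_edges
print(rings_line_up)
```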

You'll probably have to tweak colors and label positions, but it may be a start.
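
For the label positions, it may help to know that matplotlib's `labeldistance` is given as a fraction of the pie's radius. The helper below is a small sketch (the name `ring_geometry` is made up here) of the arithmetic for placing a label in the middle of ring `i` when the unit radius is split into equal-width rings.

```python
def ring_geometry(i, n_rings):
    """Return (radius, labeldistance) for ring i, where 0 is the innermost.

    Assumes n_rings rings of equal width filling a pie of radius 1.
    """
    radius = (i + 1) / n_rings          # outer edge of ring i
    mid = (2 * i + 1) / (2 * n_rings)   # midpoint of ring i
    return radius, mid / radius         # labeldistance is relative to radius


for i in range(7):
    radius, labeldistance = ring_geometry(i, 7)
    print(i, round(radius, 3), round(labeldistance, 3))
```

Moving a label toward the inner or outer edge of its ring is then a matter of nudging `mid` before dividing by the radius.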

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

df = pd.read_csv('species.csv')
df = df.dropna()  # Drop the "No blast hit" line
df['A'] = np.random.rand(len(df)) * 100 + 1  # Random data, for illustration only

# Sum column A at each level of the hierarchy to get the values for each layer
def nested_pie(df):
    cols = df.columns.tolist()  # Level columns first, 'A' last
    outd = {}
    for lev in range(len(cols) - 1):
        # sort=False keeps the groups in file order so the layers line up
        gb = df.groupby(cols[:(lev+1)], sort=False)['A'].sum()
        outd[lev] = {'names': gb.index.get_level_values(lev).tolist(),
                     'values': gb.values}
    return outd

outd = nested_pie(df)
nlev = len(outd)
diff = 1.0 / nlev  # Width of each ring
colors = plt.style.library['bmh']['axes.prop_cycle'].by_key()['color']

# This first pie chart fills the plot; it's the lowest level
plt.pie(outd[nlev-1]['values'], labels=outd[nlev-1]['names'], labeldistance=0.9,
        colors=colors)
ax = plt.gca()
# For each successive plot, change the max radius so that they overlay
for i in range(nlev-2, -1, -1):
    # labeldistance is relative to the radius; this puts labels mid-ring
    ax.pie(outd[i]['values'], labels=outd[i]['names'],
           radius=(i+1)*diff, labeldistance=(2*i+1)/(2.0*(i+1)),
           colors=colors)
ax.set_aspect('equal')
plt.show()

Apart from small differences caused by the random data, this yields a plot like this:

(image: layered pie chart, random data)

On your real data it looks like this:

(image: layered pie chart, user data)
