Original question: http://stackoverflow.com/questions/35692781/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Python: Plotting percentage in seaborn bar plot
Asked by PagMax
Given the following dataframe:
import pandas as pd
df=pd.DataFrame({'group':list("AADABCBCCCD"),'Values':[1,0,1,0,1,0,0,1,0,1,0]})
I am trying to draw a bar plot showing the percentage of times each group (A, B, C, D) takes the value zero (or one).
I have a roundabout way that works, but I think there must be a more straightforward one:
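import seaborn as sns
# Count rows per (group, Values) pair, with one column per value
tempdf = df.groupby(['group', 'Values']).Values.count().unstack().fillna(0)
tempdf['total'] = df['group'].value_counts()
# Percentage of zeros within each group
tempdf['percent'] = tempdf[0] / tempdf['total'] * 100
tempdf.reset_index(inplace=True)
print(tempdf)
sns.barplot(x='group', y='percent', data=tempdf)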
If I were just plotting the mean value, I could simply call sns.barplot on the df dataframe rather than on tempdf. I am not sure how to do this elegantly when I want to plot percentages.
Thanks,
Answered by mgoldwasser
You can use Pandas in conjunction with seaborn to make this easier:
import pandas as pd
import seaborn as sns
df = sns.load_dataset("tips")
x, y, hue = "day", "proportion", "sex"
hue_order = ["Male", "Female"]
# Proportion of each day within each sex, reshaped into a tidy dataframe
(df[x]
 .groupby(df[hue])
 .value_counts(normalize=True)
 .rename(y)
 .reset_index()
 .pipe((sns.barplot, "data"), x=x, y=y, hue=hue, hue_order=hue_order))
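As a rough sketch, the same chain can be applied to the dataframe from the question. The names percent_df and zero_percent are illustrative and not part of the original answer, and the plotting call is left as a comment on the assumption that seaborn is installed:

```python
import pandas as pd

# The dataframe from the question
df = pd.DataFrame({'group': list("AADABCBCCCD"),
                   'Values': [1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0]})

# Share of each value within each group, as a tidy dataframe
percent_df = (df['Values']
              .groupby(df['group'])
              .value_counts(normalize=True)
              .mul(100)
              .rename('percent')
              .reset_index())

# Keep only the rows describing how often each group takes the value 0
zero_percent = percent_df[percent_df['Values'] == 0]
print(zero_percent)

# Plotting step (assumes seaborn is installed):
# import seaborn as sns
# sns.barplot(x='group', y='percent', data=zero_percent)
```

Filtering on the value before plotting gives one bar per group. Passing hue='Values' on the unfiltered percent_df instead would show the zero and one shares side by side.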
Answered by Anton Protopopov
You could pass your own function as the estimator argument of sns.barplot. From the docs:
estimator: callable that maps vector -> scalar, optional
Statistical function to estimate within each categorical bin.
For your case, you could define the function as a lambda:
sns.barplot(x='group', y='Values', data=df, estimator=lambda x: sum(x==0)*100.0/len(x))
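To check the numbers the estimator produces, the same computation can be run with a plain pandas groupby. This is only a sketch: percent_zero is a hypothetical named version of the lambda above, and the seaborn call is left as a comment on the assumption that seaborn is installed.

```python
import pandas as pd

# The dataframe from the question
df = pd.DataFrame({'group': list("AADABCBCCCD"),
                   'Values': [1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0]})

def percent_zero(values):
    """Return the percentage of entries equal to zero."""
    return (values == 0).mean() * 100.0

# Apply the estimator per group; these are the bar heights to expect
percent_by_group = df.groupby('group')['Values'].apply(percent_zero)
print(percent_by_group)

# Plotting step (assumes seaborn is installed):
# import seaborn as sns
# sns.barplot(x='group', y='Values', data=df, estimator=percent_zero)
```

Note that seaborn also draws error bars around an estimator's value by default, which may not be meaningful for a percentage like this one.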
Answered by Ted Petrou
You can use the Dexplot library, which can return relative frequencies for categorical variables. Its API is similar to seaborn's. Pass the column whose relative frequency you want to the agg parameter. To subdivide the result by another column, use the hue parameter. The following returns raw counts:
import dexplot as dxp
dxp.aggplot(agg='group', data=df, hue='Values')
To get relative frequencies, set the normalize parameter to the column you want to normalize over. Use 'all' to normalize over the overall total count.
dxp.aggplot(agg='group', data=df, hue='Values', normalize='group')
Normalizing over the 'Values' column produces a graph in which the heights of all the '0' bars sum to 1:
dxp.aggplot(agg='group', data=df, hue='Values', normalize='Values')
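The three normalization choices can also be inspected as tables without Dexplot. The sketch below uses pandas crosstab as a stand-in, on the assumption that its 'index', 'columns', and 'all' options correspond to normalizing over 'group', over 'Values', and over the total count. It is not Dexplot's own implementation.

```python
import pandas as pd

# The dataframe from the question
df = pd.DataFrame({'group': list("AADABCBCCCD"),
                   'Values': [1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0]})

# Raw counts: one row per group, one column per value
counts = pd.crosstab(df['group'], df['Values'])

# Normalize within each group: every row sums to 1
by_group = pd.crosstab(df['group'], df['Values'], normalize='index')

# Normalize within each value: every column sums to 1
by_value = pd.crosstab(df['group'], df['Values'], normalize='columns')

# Normalize over the overall total: the whole table sums to 1
overall = pd.crosstab(df['group'], df['Values'], normalize='all')

print(counts, by_group, by_value, overall, sep='\n\n')
```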
Answered by Deepak Natarajan
You can use the following helper functions to show percentages on top of the bars in your plot.
The with_hue function plots percentages on the bars when your plot uses the hue parameter. It takes the plot itself, the feature, Number_of_categories (the number of categories in the feature), and hue_categories (the number of categories in the hue feature) as parameters.
The without_hue function plots percentages on the bars of a plot that has no hue. It takes the plot itself and the feature as parameters.
import matplotlib.pyplot as plt

def with_hue(plot, feature, Number_of_categories, hue_categories):
    # Assumes the bars are stored hue level by hue level, and that the
    # x categories follow the order of feature.value_counts()
    patch = [p for p in plot.patches]
    for i in range(Number_of_categories):
        total = feature.value_counts().values[i]
        for j in range(hue_categories):
            p = patch[j * Number_of_categories + i]
            percentage = '{:.1f}%'.format(100 * p.get_height() / total)
            x = p.get_x() + p.get_width() / 2 - 0.15
            y = p.get_y() + p.get_height()
            # Annotate the plot that was passed in, not a global ax
            plot.annotate(percentage, (x, y), size=12)
    plt.show()

def without_hue(plot, feature):
    # Each bar's height is divided by the total number of rows
    total = len(feature)
    for p in plot.patches:
        percentage = '{:.1f}%'.format(100 * p.get_height() / total)
        x = p.get_x() + p.get_width() / 2 - 0.05
        y = p.get_y() + p.get_height()
        plot.annotate(percentage, (x, y), size=12)
    plt.show()
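As an alternative sketch (not the answerer's method), Matplotlib 3.4 and later provide Axes.bar_label, which places the labels without manual x and y offsets. The example below uses plain Matplotlib bars on the question's dataframe. The Agg backend is chosen only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# The dataframe from the question
df = pd.DataFrame({'group': list("AADABCBCCCD"),
                   'Values': [1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0]})

# Number of rows per group, in alphabetical order of the group label
counts = df['group'].value_counts().sort_index()
total = counts.sum()

# One percentage label per bar, relative to the total number of rows
labels = ['{:.1f}%'.format(100 * c / total) for c in counts]

fig, ax = plt.subplots()
bars = ax.bar(counts.index, counts.values)
ax.bar_label(bars, labels=labels)  # requires Matplotlib 3.4+
ax.set_xlabel('group')
ax.set_ylabel('count')
print(labels)
```

With a seaborn plot, the bar containers are typically available as ax.containers and can be passed to bar_label in the same way, one container per hue level.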