Python Seaborn: countplot() with frequencies

Notice: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/33179122/

Time: 2020-08-19 12:57:07  Source: igfitidea

Seaborn: countplot() with frequencies

python pandas matplotlib data-visualization seaborn

Asked by marillion

I have a Pandas DataFrame with a column called "AXLES", which can take integer values from 3 to 12. I am trying to use Seaborn's countplot() function to achieve the following plot:

  1. The left y axis shows the frequencies of these values occurring in the data. The axis spans [0%-100%], with tick marks at every 10%.
  2. The right y axis shows the actual counts, with values corresponding to the tick marks determined by the left y axis (marked at every 10%).
  3. The x axis shows the categories for the bar plots [3, 4, 5, 6, 7, 8, 9, 10, 11, 12].
  4. Annotations on top of the bars show the actual percentage of each category.

The following code gives me the plot below, with actual counts, but I could not find a way to convert them into frequencies. I can get the frequencies using df.AXLES.value_counts()/len(df.index) but I am not sure how to plug this information into Seaborn's countplot().
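
For reference, the frequency calculation mentioned above can also be written with value_counts(normalize=True). The following is a minimal sketch, assuming a small made-up DataFrame (the data is illustrative and not from the original post):

```python
import pandas as pd

# Made-up data for illustration only
df = pd.DataFrame({'AXLES': [3, 3, 4, 5, 5, 5, 5, 12]})

# Manual version, as in the question: counts divided by number of rows
freq_manual = df.AXLES.value_counts() / len(df.index)

# Equivalent built-in version: normalize=True returns proportions
freq_builtin = df.AXLES.value_counts(normalize=True)

# Convert to percentages and order by axle count
percent = (freq_builtin * 100).sort_index()
print(percent)
```

Both versions give the same proportions; only the resulting Series name may differ between pandas versions.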

I also found a workaround for the annotations, but I am not sure if that is the best implementation.

Any help would be appreciated!

Thanks

plt.figure(figsize=(12,8))
ax = sns.countplot(x="AXLES", data=dfWIM, order=[3,4,5,6,7,8,9,10,11,12])
plt.title('Distribution of Truck Configurations')
plt.xlabel('Number of Axles')
plt.ylabel('Frequency [%]')

for p in ax.patches:
        ax.annotate('%{:.1f}'.format(p.get_height()), (p.get_x()+0.1, p.get_height()+50))

[Image: countplot showing the actual counts]

EDIT:

I got closer to what I need with the following code, using Pandas' bar plot, ditching Seaborn. Feels like I'm using so many workarounds, and there has to be an easier way to do it. The issues with this approach:

  • Pandas' bar plot function has no order keyword like Seaborn's countplot() does, so I cannot plot all categories from 3-12 as I did with countplot(). I need to have them shown even if there is no data in that category.
  • The secondary y-axis messes up the bars and the annotations for some reason (see the white gridlines drawn over the text and bars).

    plt.figure(figsize=(12,8))
    plt.title('Distribution of Truck Configurations')
    plt.xlabel('Number of Axles')
    plt.ylabel('Frequency [%]')
    
    ax = (dfWIM.AXLES.value_counts()/len(dfWIM)*100).sort_index().plot(kind="bar", rot=0)
    ax.set_yticks(np.arange(0, 110, 10))
    
    ax2 = ax.twinx()
    ax2.set_yticks(np.arange(0, 110, 10)*len(dfWIM)/100)
    
    for p in ax.patches:
        ax.annotate('{:.2f}%'.format(p.get_height()), (p.get_x()+0.15, p.get_height()+1))
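
One possible way around the missing order keyword is to reindex the frequencies onto the full list of categories before plotting, so that empty categories show up as zero-height bars. This is a minimal sketch, assuming made-up data and a hypothetical categories list; it is not part of the original post:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import pandas as pd

# Made-up data for illustration only; several categories are missing
dfWIM = pd.DataFrame({'AXLES': [3, 3, 5, 5, 5, 9, 12, 12]})

# Hypothetical full list of categories that should appear on the x axis
categories = list(range(3, 13))

# Percentages per category; categories without data are filled with 0
pct = (dfWIM.AXLES.value_counts(normalize=True) * 100).reindex(categories, fill_value=0)

# One bar per category, including the empty ones
ax = pct.plot(kind="bar", rot=0)
```

Because reindex() fixes both the order and the set of categories, it plays roughly the role that order plays in countplot().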
    
[Image: Pandas bar plot with white gridlines drawn over the bars and annotations]

Accepted answer by tmdavison

You can do this by making a twinx axes for the frequencies. You can switch the two y axes around so the frequencies stay on the left and the counts on the right, without having to recalculate the counts axis (here we use tick_left() and tick_right() to move the ticks and set_label_position to move the axis labels).

You can then set the ticks using the matplotlib.ticker module, specifically ticker.MultipleLocator and ticker.LinearLocator.
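
To see why these two locators line up, it can help to look at the tick positions they produce. The sketch below is only an illustration, assuming a hypothetical row count ncount of 5000; it calls tick_values() directly instead of attaching the locators to an axis:

```python
import numpy as np
import matplotlib.ticker as ticker

ncount = 5000  # hypothetical number of rows in the DataFrame

# LinearLocator(11): exactly 11 evenly spaced ticks across the given range
count_ticks = ticker.LinearLocator(11).tick_values(0, ncount)

# MultipleLocator(10): ticks at multiples of 10. It may also return
# candidates just outside the range, so keep only those within [0, 100]
pct_ticks = ticker.MultipleLocator(10).tick_values(0, 100)
pct_ticks = pct_ticks[(pct_ticks >= 0) & (pct_ticks <= 100)]

# Each count tick corresponds to one of the percentage ticks
aligned = np.allclose(count_ticks / ncount * 100, pct_ticks)
print(count_ticks, pct_ticks, aligned)
```

With the count axis limited to [0, ncount] and the percentage axis to [0, 100], the 11 count ticks fall at the same heights as the 10% steps.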

As for your annotations, you can get the x and y locations of the bar's corners with patch.get_bbox().get_points(), which returns two opposite corners of the bar as a 2x2 array. This, along with setting the horizontal and vertical alignment correctly, means you don't need to add any arbitrary offsets to the annotation location.
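
As a small illustration of what get_points() gives you, here is a sketch using plain matplotlib bars with made-up counts (the values and the total of 50 are assumptions for the example). The array has the form [[x0, y0], [x1, y1]]:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
bars = ax.bar([3, 4, 5], [10, 25, 15], width=0.8)  # made-up counts, total 50

p = bars[1]                       # the bar centered at x = 4
pts = p.get_bbox().get_points()   # 2x2 array: [[x0, y0], [x1, y1]]
x_center = pts[:, 0].mean()       # midpoint between left and right edges
y_top = pts[1, 1]                 # top edge of the bar

# Centered label sitting on top of the bar, no manual offsets needed
ax.annotate('{:.1f}%'.format(100. * y_top / 50), (x_center, y_top),
            ha='center', va='bottom')
```

The ha='center' and va='bottom' arguments anchor the text at the middle of the bar's top edge.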

Finally, you need to turn the grid off for the twinned axis to prevent grid lines from showing up on top of the bars (ax2.grid(None)).

Here is a working script:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import matplotlib.ticker as ticker

# Some random data
dfWIM = pd.DataFrame({'AXLES': np.random.normal(8, 2, 5000).astype(int)})
ncount = len(dfWIM)

plt.figure(figsize=(12,8))
ax = sns.countplot(x="AXLES", data=dfWIM, order=[3,4,5,6,7,8,9,10,11,12])
plt.title('Distribution of Truck Configurations')
plt.xlabel('Number of Axles')

# Make twin axis
ax2=ax.twinx()

# Switch so count axis is on right, frequency on left
ax2.yaxis.tick_left()
ax.yaxis.tick_right()

# Also switch the labels over
ax.yaxis.set_label_position('right')
ax2.yaxis.set_label_position('left')

ax2.set_ylabel('Frequency [%]')

for p in ax.patches:
    x=p.get_bbox().get_points()[:,0]
    y=p.get_bbox().get_points()[1,1]
    ax.annotate('{:.1f}%'.format(100.*y/ncount), (x.mean(), y), 
            ha='center', va='bottom') # set the alignment of the text

# Use a LinearLocator to ensure the correct number of ticks
ax.yaxis.set_major_locator(ticker.LinearLocator(11))

# Fix the frequency range to 0-100
ax2.set_ylim(0,100)
ax.set_ylim(0,ncount)

# And use a MultipleLocator to ensure a tick spacing of 10
ax2.yaxis.set_major_locator(ticker.MultipleLocator(10))

# Need to turn the grid on ax2 off, otherwise the gridlines end up on top of the bars
ax2.grid(None)

plt.savefig('snscounter.pdf')
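
As an alternative to the annotate loop, newer matplotlib versions (3.4 and later, as far as I know) provide Axes.bar_label(), which places one label per bar. This is a different technique from the one in the script above, shown here as a sketch with plain matplotlib bars and made-up counts:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

counts = [10, 25, 15]  # made-up counts for illustration
total = sum(counts)

fig, ax = plt.subplots()
bars = ax.bar([3, 4, 5], counts)

# One percentage label per bar, placed at the bar's top edge
labels = ['{:.1f}%'.format(100. * c / total) for c in counts]
texts = ax.bar_label(bars, labels=labels)
```

With a Seaborn countplot, the bar container should be reachable through ax.containers, but that part is an assumption and is not exercised here.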

[Image: plot produced by the script above]

Answer by spfrnd

I got it to work using core matplotlib's bar plot. I obviously didn't have your data, but adapting it to yours should be straightforward.

[Image: resulting plot]

Approach

I used matplotlib's twin axis and plotted the data as bars on the second Axes object. The rest is just some fiddling around to get the ticks right and add the annotations.

Hope this helps.

Code

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker
import seaborn as sns

# Random total (a scalar) and 12 random values rescaled to sum to tot
tot = np.random.rand() * 100
data = np.random.rand( 1, 12 )
data = data / data.sum() * tot

df = pd.DataFrame( data )
palette = sns.husl_palette(9, s=0.7 )

### Left Axis
# Plot nothing here; this axis only carries the percentage scale.

fig, ax1 = plt.subplots()
ax1.set_ylim( [0,100] )

# Remove grid lines.
ax1.grid( False )
# Set ticks and add percentage sign.
ax1.yaxis.set_ticks( np.arange(0,101,10) )
fmt = '%.0f%%'
yticks = matplotlib.ticker.FormatStrFormatter( fmt )
ax1.yaxis.set_major_formatter( yticks )

### Right Axis
# Plot data as bars.
x = np.arange(0,9,1)
ax2 = ax1.twinx()
# align='edge' keeps the bars centered on the ticks given the x-0.4 offset
rects = ax2.bar( x-0.4, np.asarray(df.loc[0,3:]), width=0.8, align='edge' )

# Set ticks on x-axis and remove grid lines.
ax2.set_xlim( [-0.5,8.5] )
ax2.xaxis.set_ticks( x )
ax2.xaxis.grid( False )

# Set ticks on y-axis in 10% steps.
ax2.set_ylim( [0,tot] )
ax2.yaxis.set_ticks( np.linspace( 0, tot, 11 ) )

# Add labels and change colors.
for i,r in enumerate(rects):
    h = r.get_height()
    r.set_color( palette[ i % len(palette) ] )
    ax2.text( r.get_x() + r.get_width()/2.0, \
              h + 0.01*tot,                  \
              r'%d%%'%int(100*h/tot), ha = 'center' )

Answer by CT Zhu

I think you can first set the y major ticks manually and then modify each label:

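import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Some random data
dfWIM = pd.DataFrame({'AXLES': np.random.randint(3, 10, 1000)})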
total = len(dfWIM)*1.
plt.figure(figsize=(12,8))
ax = sns.countplot(x="AXLES", data=dfWIM, order=[3,4,5,6,7,8,9,10,11,12])
plt.title('Distribution of Truck Configurations')
plt.xlabel('Number of Axles')
plt.ylabel('Frequency [%]')

for p in ax.patches:
        ax.annotate('{:.1f}%'.format(100*p.get_height()/total), (p.get_x()+0.1, p.get_height()+5))

#put 11 ticks (therefore 10 steps), from 0 to the total number of rows in the dataframe
ax.yaxis.set_ticks(np.linspace(0, total, 11))

#adjust the ticklabel to the desired format, without changing the position of the ticks. 
_ = ax.set_yticklabels(['{:.1f}%'.format(100*t/total) for t in ax.yaxis.get_majorticklocs()])
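
Instead of rewriting the tick labels by hand, matplotlib's ticker.PercentFormatter can do the conversion from counts to percentages. This is a sketch of that alternative, not the approach used above; it assumes a hypothetical total of 1000 rows and made-up counts drawn with plain matplotlib bars:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np

total = 1000  # hypothetical number of rows

fig, ax = plt.subplots()
ax.bar([3, 4, 5], [500, 300, 200])  # made-up counts summing to total
ax.set_ylim(0, total)

# 11 ticks (10 steps) from 0 to the total number of rows
ax.yaxis.set_ticks(np.linspace(0, total, 11))

# Show each tick as a percentage of total, with one decimal place
formatter = ticker.PercentFormatter(xmax=total, decimals=1)
ax.yaxis.set_major_formatter(formatter)

fig.canvas.draw()
labels = [t.get_text() for t in ax.get_yticklabels()]
```

The tick positions stay in count units; only the displayed text changes, so the bars do not need to be rescaled.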

[Image: resulting plot]