pandas: How to make a Pareto chart in Python?

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/53577630/

Date: 2020-09-14 06:11:18  Source: igfitidea

How to make a Pareto chart in Python?

Tags: python, pandas, matplotlib, seaborn, pareto-chart

Asked by ImportanceOfBeingErnest

The Pareto chart is a very popular diagram in Excel and Tableau. In Excel we can easily draw a Pareto diagram, but I found no easy way to draw one in Python.

I have a pandas dataframe like this:

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.DataFrame({'country': [177.0, 7.0, 4.0, 2.0, 2.0, 1.0, 1.0, 1.0]})
df.index = ['USA', 'Canada', 'Russia', 'UK', 'Belgium', 'Mexico', 'Germany', 'Denmark']
print(df)

         country
USA        177.0
Canada       7.0
Russia       4.0
UK           2.0
Belgium      2.0
Mexico       1.0
Germany      1.0
Denmark      1.0

How can I draw the Pareto diagram, perhaps using pandas, seaborn, matplotlib, etc.?

So far I have been able to make a bar chart in descending order, but I still need to put a cumulative-sum line plot on top of it.

My attempt: df.sort_values(by='country',ascending=False).plot.bar()

Required plot:

Answered by ImportanceOfBeingErnest

You would probably want to create a new column holding the cumulative percentage, then plot one column as a bar chart and the other as a line chart on twin axes.

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter

df = pd.DataFrame({'country': [177.0, 7.0, 4.0, 2.0, 2.0, 1.0, 1.0, 1.0]})
df.index = ['USA', 'Canada', 'Russia', 'UK', 'Belgium', 'Mexico', 'Germany', 'Denmark']
df = df.sort_values(by='country',ascending=False)
df["cumpercentage"] = df["country"].cumsum()/df["country"].sum()*100


fig, ax = plt.subplots()
ax.bar(df.index, df["country"], color="C0")
ax2 = ax.twinx()
ax2.plot(df.index, df["cumpercentage"], color="C1", marker="D", ms=7)
ax2.yaxis.set_major_formatter(PercentFormatter())

ax.tick_params(axis="y", colors="C0")
ax2.tick_params(axis="y", colors="C1")
plt.show()
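
As a quick check of the cumpercentage column, the same numbers can be computed without plotting. This is only a minimal sketch using the data from the question; the names counts and cum_pct are chosen here for illustration and do not appear in the answer.

```python
import pandas as pd

counts = pd.Series(
    [177.0, 7.0, 4.0, 2.0, 2.0, 1.0, 1.0, 1.0],
    index=['USA', 'Canada', 'Russia', 'UK', 'Belgium', 'Mexico', 'Germany', 'Denmark'],
    name='country',
)

# Sort descending first so the running total follows the bar order.
counts = counts.sort_values(ascending=False)

# Running total divided by the grand total, expressed in percent.
cum_pct = counts.cumsum() / counts.sum() * 100
print(cum_pct.round(2))
```

The first value is the share of the largest category and the last value is always 100, which is why the line in the chart ends at the top of the percentage axis.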

Answered by crinix

A more generalized version of ImportanceOfBeingErnest's code:

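import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter


def create_pareto_chart(df, by_variable, quant_variable):
    # by_variable: category labels; quant_variable: pandas Series of values,
    # expected to be sorted in descending order already.
    df = df.copy()  # do not modify the caller's DataFrame
    df.index = by_variable
    # .values assigns by position, so a differing index cannot produce NaN
    df["cumpercentage"] = (quant_variable.cumsum()/quant_variable.sum()*100).values

    fig, ax = plt.subplots()
    ax.bar(df.index, quant_variable, color="C0")
    ax2 = ax.twinx()
    ax2.plot(df.index, df["cumpercentage"], color="C1", marker="D", ms=7)
    ax2.yaxis.set_major_formatter(PercentFormatter())

    # Color each y axis like the series it belongs to.
    ax.tick_params(axis="y", colors="C0")
    ax2.tick_params(axis="y", colors="C1")
    plt.show()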
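
One possible variation on this function is sketched below. The helper name pareto_axes is hypothetical; it follows the same steps but sorts the values itself and returns the figure and axes instead of calling plt.show(), so the caller can adjust or save the chart. The Agg backend line is an assumption for running in a script without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; drop this in a notebook
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import PercentFormatter


def pareto_axes(labels, values):
    """Hypothetical helper: draw bars plus a cumulative-percentage line."""
    # Combine labels and values by position, then sort descending.
    data = pd.Series(list(values), index=list(labels)).sort_values(ascending=False)
    cum_pct = data.cumsum() / data.sum() * 100

    fig, ax = plt.subplots()
    ax.bar(data.index, data.values, color="C0")
    ax2 = ax.twinx()  # second y axis sharing the same x axis
    ax2.plot(data.index, cum_pct.values, color="C1", marker="D", ms=7)
    ax2.yaxis.set_major_formatter(PercentFormatter())
    return fig, ax, ax2


fig, ax, ax2 = pareto_axes(['USA', 'Canada', 'Russia'], [177.0, 7.0, 4.0])
ax2.set_ylim(0, 110)  # leave some headroom above the 100% point
```

Returning the axes is a design choice rather than a requirement; it mainly makes the function easier to reuse and to check.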

This version also groups categories according to a threshold. For example, if you set it to 70, the minor categories beyond a cumulative share of 70% are merged into one group called "OTHERS".

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter


def create_pareto_chart(by_variable, quant_variable, threshold):
    # Build the frame from plain lists so labels and values pair up by position.
    df = pd.DataFrame({'by_var': list(by_variable), 'quant_var': list(quant_variable)})
    total = df['quant_var'].sum()
    # Sort first, then accumulate, so the percentages follow the bar order.
    df = df.sort_values(by='quant_var', ascending=False, ignore_index=True)
    df["cumpercentage"] = df['quant_var'].cumsum()/total*100
    # Keep the categories whose cumulative share stays below the threshold.
    df_above_threshold = df[df['cumpercentage'] < threshold]
    # Everything else becomes one "OTHERS" bar that closes the line at 100%.
    rest_sum = total - df_above_threshold['quant_var'].sum()
    rest = pd.DataFrame({'by_var': ['OTHERS'], 'quant_var': [rest_sum],
                         'cumpercentage': [100.0]})
    # DataFrame.append was removed in pandas 2.0; pd.concat replaces it.
    df = pd.concat([df_above_threshold, rest], ignore_index=True)
    df.index = df['by_var']

    fig, ax = plt.subplots()
    ax.bar(df.index, df["quant_var"], color="C0")
    ax2 = ax.twinx()
    ax2.plot(df.index, df["cumpercentage"], color="C1", marker="D", ms=7)
    ax2.yaxis.set_major_formatter(PercentFormatter())

    ax.tick_params(axis="x", colors="C0", labelrotation=70)
    ax.tick_params(axis="y", colors="C0")
    ax2.tick_params(axis="y", colors="C1")

    plt.show()
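
The grouping step can also be looked at on its own, without any plotting. The sketch below uses a hypothetical helper, group_beyond_threshold, and assumes that the merged "OTHERS" row should carry a cumulative percentage of 100 so that it ends up last; it uses pd.concat, which also works on pandas versions where DataFrame.append is no longer available.

```python
import pandas as pd


def group_beyond_threshold(labels, values, threshold):
    """Hypothetical helper: keep categories whose cumulative share is below
    `threshold` percent and merge the rest into one 'OTHERS' row."""
    df = pd.DataFrame({'by_var': list(labels), 'quant_var': list(values)})
    df = df.sort_values(by='quant_var', ascending=False, ignore_index=True)
    total = df['quant_var'].sum()
    df['cumpercentage'] = df['quant_var'].cumsum() / total * 100

    kept = df[df['cumpercentage'] < threshold]
    rest = pd.DataFrame({
        'by_var': ['OTHERS'],
        'quant_var': [total - kept['quant_var'].sum()],
        'cumpercentage': [100.0],  # assumption: the merged bar closes the line at 100%
    })
    return pd.concat([kept, rest], ignore_index=True)


countries = ['USA', 'Canada', 'Russia', 'UK', 'Belgium', 'Mexico', 'Germany', 'Denmark']
counts = [177.0, 7.0, 4.0, 2.0, 2.0, 1.0, 1.0, 1.0]
grouped = group_beyond_threshold(countries, counts, threshold=95)
print(grouped)
```

With the data from the question and a threshold of 95, only USA and Canada stay below the threshold, and the remaining six countries are merged into the "OTHERS" row.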

Answered by Lucas Aimaretto

Another way is to use the secondary_y parameter instead of calling twinx():

df['pareto'] = 100 *df.country.cumsum() / df.country.sum()
fig, axes = plt.subplots()
ax1 = df.plot(use_index=True, y='country',  kind='bar', ax=axes)
ax2 = df.plot(use_index=True, y='pareto', marker='D', color="C1", kind='line', ax=axes, secondary_y=True)
ax2.set_ylim([0,110])

The parameter use_index=True is needed because your index is your x axis in this case. Otherwise you could have used x='x_Variable'.
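
For the second case, where the labels live in a regular column rather than in the index, a minimal sketch might look like the following. The column names name and count are hypothetical, and the Agg backend line is an assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: running without a display
import matplotlib.pyplot as plt
import pandas as pd

# Same counts as in the question, but with the labels in a regular column.
df = pd.DataFrame({
    'name': ['USA', 'Canada', 'Russia', 'UK', 'Belgium', 'Mexico', 'Germany', 'Denmark'],
    'count': [177.0, 7.0, 4.0, 2.0, 2.0, 1.0, 1.0, 1.0],
})
df['pareto'] = 100 * df['count'].cumsum() / df['count'].sum()

fig, axes = plt.subplots()
# x='name' takes the tick labels from a column, so use_index is not needed.
df.plot(x='name', y='count', kind='bar', ax=axes)
ax2 = df.plot(x='name', y='pareto', kind='line', marker='D', color='C1',
              ax=axes, secondary_y=True)
ax2.set_ylim([0, 110])
```

The data is already in descending order here; for unsorted data, a sort_values call before the cumsum would be needed.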

Answered by venergiac

A Pareto chart for a pandas.DataFrame, grouping by a column first:

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter


def _plot_pareto_by(df_, group_by, column):

    df = df_.groupby(group_by)[column].sum().reset_index()
    df = df.sort_values(by=column,ascending=False)

    df["cumpercentage"] = df[column].cumsum()/df[column].sum()*100


    fig, ax = plt.subplots(figsize=(20,5))
    ax.bar(df[group_by], df[column], color="C0")
    ax2 = ax.twinx()
    ax2.plot(df[group_by], df["cumpercentage"], color="C1", marker="D", ms=7)
    ax2.yaxis.set_major_formatter(PercentFormatter())

    ax.tick_params(axis="y", colors="C0")
    ax2.tick_params(axis="y", colors="C1")

    for tick in ax.get_xticklabels():
        tick.set_rotation(45)
    plt.show()
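
What sets this version apart is the groupby step: it accepts one row per record and aggregates before plotting. The data preparation can be tried on its own, as in the sketch below; the records and the column names region and sales are made up purely for illustration.

```python
import pandas as pd

# Made-up records, one row per sale (column names are illustrative only).
records = pd.DataFrame({
    'region': ['north', 'south', 'north', 'east', 'south', 'north'],
    'sales': [10, 5, 20, 3, 7, 5],
})

# Aggregate per group, then sort so the largest group comes first.
totals = records.groupby('region')['sales'].sum().reset_index()
totals = totals.sort_values(by='sales', ascending=False)

# Cumulative share of the grand total, in percent.
totals['cumpercentage'] = totals['sales'].cumsum() / totals['sales'].sum() * 100
print(totals)
```

The resulting frame has one row per group, which is the shape the bar and line calls in the function expect.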
