Note: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/22635110/

Time: 2020-08-19 01:19:49  Source: igfitidea

Sorting the order of bars in pandas/matplotlib bar plots

Tags: python, matplotlib, pandas

Asked by psychemedia

What is the Pythonic/pandas way of sorting the 'levels' within a column so that the bars in a bar plot appear in a specific order?

For example, given:

import pandas as pd
df = pd.DataFrame({
    'group': ['a', 'a', 'a', 'a', 'a', 'a', 'a', 
              'b', 'b', 'b', 'b', 'b', 'b', 'b'],
    'day': ['Mon', 'Tues', 'Fri', 'Thurs', 'Sat', 'Sun', 'Weds',
            'Fri', 'Sun', 'Thurs', 'Sat', 'Weds', 'Mon', 'Tues'],
    'amount': [1, 2, 4, 2, 1, 1, 2, 4, 5, 3, 4, 2, 1, 3]})
dfx = df.groupby(['group'])
dfx.plot(kind='bar', x='day')

I can generate the following pair of plots:

[Image: disordered bar charts]

The order of the bars follows the row order.

What's the best way of reordering the data so that the bar charts have bars ordered Mon-Sun?

UPDATE: this rubbish solution works - but it's far from elegant in the way it uses an extra sorting column:

df2 = pd.DataFrame({
    'day': ['Mon', 'Tues', 'Weds', 'Thurs', 'Fri', 'Sat', 'Sun'],
    'num': [0, 1, 2, 3, 4, 5, 6]})
df = pd.merge(df, df2, on='day')
df = df.sort_values('num')
dfx = df.groupby(['group'])
dfx.plot(kind='bar', x='day')

FURTHER GENERALISATION:

Is there a solution that also fixes the order of bars in a 'dodged' bar plot, such as this one?

df.pivot(index='day', columns='group', values='amount').plot(kind='bar')

Accepted answer by Dan Allan

You'll have to provide a mapping to specify how to order the day names. (If they were stored as proper dates, there would be other ways to do this.)

Updated:

Build the key. You could write out a dictionary explicitly or use something clever like this dict comprehension.

weekdays = ['Mon', 'Tues', 'Weds', 'Thurs', 'Fri', 'Sat', 'Sun']
mapping = {day: i for i, day in enumerate(weekdays)}
key = df['day'].map(mapping)

And the sorting is simple:

df.iloc[key.argsort()]
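
To make the two steps above concrete, here is a minimal, self-contained sketch on a one-group sample. The names `df_sorted` and `df_sorted_alt` are introduced only for illustration and are not part of the original answer. Note that `df.iloc[...]` returns a new frame, so the result has to be assigned before plotting. The `sort_values(..., key=...)` variant is an alternative spelling that assumes pandas 1.1 or later.

```python
import pandas as pd

df = pd.DataFrame({
    'day': ['Mon', 'Tues', 'Fri', 'Thurs', 'Sat', 'Sun', 'Weds'],
    'amount': [1, 2, 4, 2, 1, 1, 2],
})

weekdays = ['Mon', 'Tues', 'Weds', 'Thurs', 'Fri', 'Sat', 'Sun']
mapping = {day: i for i, day in enumerate(weekdays)}

# Integer rank of each row's day: Mon -> 0, ..., Sun -> 6
key = df['day'].map(mapping)

# argsort() yields the row positions in ascending rank order;
# iloc then picks the rows in that order and returns a new frame.
df_sorted = df.iloc[key.argsort()]

# Alternative spelling (assumes pandas >= 1.1): sort with a key function.
df_sorted_alt = df.sort_values('day', key=lambda s: s.map(mapping))

print(df_sorted['day'].tolist())

# To plot (requires matplotlib):
# df_sorted.plot(kind='bar', x='day')
```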

Answer by Saul Berardo

The code below extends Dan's answer to address the "FURTHER GENERALISATION" section of the OP's question. First, a complete example for the simple case (just one variable), based on Dan's solution:

import pandas as pd

# Create dataframe 
df=pd.DataFrame({
    'group':['a','a','a','a','a','a','a','b','b','b','b','b','b','b'],
    'day':['Mon','Tues','Fri','Thurs','Sat','Sun','Weds','Fri','Sun','Thurs','Sat','Weds','Mon','Tues'],
    'amount':[1,2,4,2,1,1,2,4,5,3,4,2,1,3]
})


# Calculate the total amount for each day (select 'amount' first so the text column 'group' is not summed)
df_grouped = df.groupby('day')['amount'].sum().reset_index()

# Use Dan's trick to order days names in the table created by groupby
weekdays = ['Mon', 'Tues', 'Weds', 'Thurs', 'Fri', 'Sat', 'Sun']
mapping = {day: i for i, day in enumerate(weekdays)}
key = df_grouped['day'].map(mapping)    
df_grouped = df_grouped.iloc[key.argsort()]

# Draw the bar chart
df_grouped.plot(kind='bar', x='day')

And now, we use the same ordering technique to order the rows of the pivot table (instead of the rows created by groupby).

import pandas as pd

# Create dataframe 
df=pd.DataFrame({
    'group':['a','a','a','a','a','a','a','b','b','b','b','b','b','b'],
    'day':['Mon','Tues','Fri','Thurs','Sat','Sun','Weds','Fri','Sun','Thurs','Sat','Weds','Mon','Tues'],
    'amount':[1,2,4,2,1,1,2,4,5,3,4,2,1,3]
})

# Get the amount for each day AND EACH GROUP
df_grouped = df.groupby(['group', 'day'])['amount'].sum().reset_index()

# Pivot to one row per day and one column per group, the wide format pandas needs to plot multiple series
df_pivot = df_grouped.pivot(index='day', columns='group', values='amount').reset_index()

# Use Dan's trick to order days names in the table created by PIVOT (not the table created by groupby, in the previous example)
weekdays = ['Mon', 'Tues', 'Weds', 'Thurs', 'Fri', 'Sat', 'Sun']
mapping = {day: i for i, day in enumerate(weekdays)}
key = df_pivot['day'].map(mapping)    
df_pivot = df_pivot.iloc[key.argsort()]

# Draw the bar chart
df_pivot.plot(kind='bar', x='day')

The resulting chart shows the bars for both groups side by side, with the days ordered Mon-Sun.

Answer by djakubosky

I know this response is late, but a simple solution to the two cases presented, without using a dictionary or mapping, is something like what I've posted below.

Setting 'day' as the index lets you use .loc to select rows in a specific order.

1) For the two separate plots

import pandas as pd

df = pd.DataFrame({
    'group': ['a','a','a','a','a','a','a','b','b','b','b','b','b','b'],
    'day': ['Mon','Tues','Fri','Thurs','Sat','Sun','Weds','Fri','Sun','Thurs','Sat','Weds','Mon','Tues'],
    'amount': [1,2,4,2,1,1,2,4,5,3,4,2,1,3]})

order = ['Mon', 'Tues', 'Weds', 'Thurs', 'Fri', 'Sat', 'Sun']
# Index by day, select rows in weekday order, then draw one bar chart per group
df.set_index('day').loc[order].groupby('group').plot(kind='bar')

2) For the pivot example with the dodged plot:

order = ['Mon', 'Tues', 'Weds', 'Thurs', 'Fri', 'Sat', 'Sun']
df.pivot(index='day', columns='group', values='amount').loc[order].plot(kind='bar')

Note that pivot already puts day in the index, so you can use .loc here again.
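
One caveat with the `.loc[order]` approach: in recent pandas versions, `.loc` raises a `KeyError` if any label in the list is absent from the index, which can happen when a day has no data. The sketch below is not part of the original answer; it uses a deliberately incomplete sample and `reindex`, which tolerates missing labels and fills them with NaN. The names `wide` and `ordered` are illustrative.

```python
import pandas as pd

# Sample with some weekdays missing on purpose
df = pd.DataFrame({
    'group': ['a', 'a', 'b', 'b'],
    'day': ['Tues', 'Mon', 'Fri', 'Mon'],
    'amount': [2, 1, 4, 1],
})
order = ['Mon', 'Tues', 'Weds', 'Thurs', 'Fri', 'Sat', 'Sun']

wide = df.pivot(index='day', columns='group', values='amount')

# wide.loc[order] would raise KeyError here, because 'Weds', 'Thurs',
# 'Sat' and 'Sun' are not in the index. reindex inserts them as NaN rows.
ordered = wide.reindex(order)

print(ordered)

# To plot (requires matplotlib):
# ordered.plot(kind='bar')
```

If empty days should be drawn as zero-height bars rather than left as NaN, `wide.reindex(order, fill_value=0)` is one option.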

Edit: it is best practice to use .loc instead of .ix in these solutions. .ix was deprecated and then removed in pandas 1.0, and it could give surprising results when column names and index labels were numbers.
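
Another mapping-free option, which none of the answers above use, is to store the order in the column itself by converting 'day' to an ordered categorical. This is a sketch under that assumption rather than the method from any answer; `df_sorted` and `wide` are illustrative names. Sorting then follows the category order instead of alphabetical order.

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['a', 'a', 'a', 'a', 'a', 'a', 'a',
              'b', 'b', 'b', 'b', 'b', 'b', 'b'],
    'day': ['Mon', 'Tues', 'Fri', 'Thurs', 'Sat', 'Sun', 'Weds',
            'Fri', 'Sun', 'Thurs', 'Sat', 'Weds', 'Mon', 'Tues'],
    'amount': [1, 2, 4, 2, 1, 1, 2, 4, 5, 3, 4, 2, 1, 3],
})

weekdays = ['Mon', 'Tues', 'Weds', 'Thurs', 'Fri', 'Sat', 'Sun']

# Declare the weekday order once; it travels with the column from here on.
df['day'] = pd.Categorical(df['day'], categories=weekdays, ordered=True)

# Case 1: rows sorted by group, then by weekday order (not alphabetically)
df_sorted = df.sort_values(['group', 'day'])

# Case 2: wide table for the dodged plot; sort_index() on the
# categorical index orders the rows Mon..Sun.
wide = df.pivot(index='day', columns='group', values='amount').sort_index()

print(wide)

# To plot (requires matplotlib):
# df_sorted.groupby('group').plot(kind='bar', x='day')
# wide.plot(kind='bar')
```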
