pandas / matplotlib: faceted bar plots

Question

Asked by ako

I am making a series of bar plots of data with two categorical variables and one numeric variable. What I have is below, but what I would love to do is facet by one of the categorical variables, as with facet_wrap in ggplot. I have a somewhat working example, but I get the wrong plot type (lines, not bars), and I subset the data in a loop, which can't be the best way.

## first try--plain vanilla
import pandas as pd
import numpy as np
N = 100

## generate toy data
ind = np.random.choice(['a','b','c'], N)
cty = np.random.choice(['x','y','z'], N)
jobs = np.random.randint(low=1,high=250,size=N)

## prep data frame
df_city = pd.DataFrame({'industry':ind,'city':cty,'jobs':jobs})
df_city_grouped = df_city.groupby(['city','industry']).jobs.sum().unstack()
df_city_grouped.plot(kind='bar',stacked=True,figsize=(9, 6))

This gives something like this:

  city industry  jobs
0    z        b   180
1    z        c   121
2    x        a    33
3    z        a   121
4    z        c   236

[Figure: first plot (stacked bars of jobs per city, split by industry)]

However, what I would like to see is something like this:

## R code
library(plyr)
library(ggplot2)
df_city<-read.csv('/home/aksel/Downloads/mockcity.csv',sep='\t')

## summarize
df_city_grouped <- ddply(df_city, .(city,industry), summarise, jobstot = sum(jobs))

## plot
ggplot(df_city_grouped, aes(x=industry, y=jobstot)) +
  geom_bar(stat='identity') +
  facet_wrap(~city)

The closest I get with matplotlib is something like this:

import matplotlib.pyplot as plt

## one panel per city
cols = df_city.city.value_counts().shape[0]
fig, axes = plt.subplots(1, cols, figsize=(8, 8))

for x, city in enumerate(df_city.city.value_counts().index.values):
    data = df_city[(df_city['city'] == city)]
    data = data.groupby(['industry']).jobs.sum()
    axes[x].plot(data)

So two questions:

1. Can I draw bar plots (rather than the lines shown here) using the AxesSubplot object and end up with something along the lines of the facet_wrap example from ggplot?
2. In loops that generate charts, as in this attempt, I subset the data on each iteration. I can't imagine that is the 'proper' way to do this type of faceting?

Answer 1

Accepted answer by Phlya

Second example here: http://pandas-docs.github.io/pandas-docs-travis/visualization.html#bar-plots
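
A minimal sketch of one pandas-only route to panels, assuming the goal is one bar panel per city with industries on the x-axis. It pivots the totals so that cities become columns and passes `subplots=True` to `DataFrame.plot`. Whether this matches the linked example exactly is an assumption; the seed, `layout`, `sharey`, and figure size are illustrative choices, not part of the original post.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import numpy as np
import pandas as pd

# Toy data shaped like the question's df_city (seed chosen arbitrarily).
rng = np.random.default_rng(0)
N = 100
df_city = pd.DataFrame({
    "industry": rng.choice(["a", "b", "c"], N),
    "city": rng.choice(["x", "y", "z"], N),
    "jobs": rng.integers(1, 250, N),
})

# Rows = industry (x-axis of each panel), columns = city (one panel each).
totals = (
    df_city.groupby(["industry", "city"])["jobs"]
    .sum()
    .unstack("city", fill_value=0)
)

# subplots=True draws each column on its own Axes instead of one shared plot.
axes = totals.plot(
    kind="bar",
    subplots=True,
    layout=(1, totals.shape[1]),
    sharey=True,
    legend=False,
    figsize=(9, 4),
)

# Label each panel explicitly with its city.
panels = np.asarray(axes).ravel()
for ax, city in zip(panels, totals.columns):
    ax.set_title(f"city {city}")
```

Sharing the y-axis keeps the panels comparable, which is roughly what facet_wrap does by default in ggplot.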

Anyway, you can always do that by hand, as you did yourself.

EDIT: BTW, you can always use rpy2 in Python, so you can do all the same things as in R.

Also, have a look at this: http://pandas.pydata.org/pandas-docs/stable/rplot.html. I am not sure, but it should be helpful for creating plots over many panels, though it might require further reading.

Answer 2

Answered by ako

@tcasell suggested the bar call in the loop. Here is a working, if not elegant, example.

## second try--facet by city
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

N = 100
industry = ['a','b','c']
city = ['x','y','z']
ind = np.random.choice(industry, N)
cty = np.random.choice(city, N)
jobs = np.random.randint(low=1,high=250,size=N)
df_city = pd.DataFrame({'industry':ind,'city':cty,'jobs':jobs})

## how many panels do we need?
cols = df_city.city.value_counts().shape[0]
fig, axes = plt.subplots(1, cols, figsize=(8, 8))

for x, city in enumerate(df_city.city.value_counts().index.values):
    ## total jobs per industry for this city
    data = df_city[(df_city['city'] == city)]
    data = data.groupby(['industry']).jobs.sum()
    print(data)
    print(type(data.index))
    left = list(range(len(data)))   ## bar positions 0, 1, 2, ...
    height = data.values            ## bar heights

    axes[x].bar(left, height, label="%s" % (city))
    axes[x].set_xticks(left, minor=False)
    axes[x].set_xticklabels(data.index.values)

    axes[x].legend(loc='best')
    axes[x].grid(True)

## one title for the whole figure, set once after the loop
fig.suptitle('Employment By Industry By City', fontsize=20)
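
On the second question (subsetting inside the loop), here is a hedged sketch of one alternative: aggregate once, then let `groupby` hand out each city's totals. The data, seed, figure size, and `sharey=True` are assumptions for illustration. It relies on matplotlib accepting string categories as bar positions, which recent versions do.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Toy data shaped like the question's df_city (seed chosen arbitrarily).
rng = np.random.default_rng(1)
N = 100
df_city = pd.DataFrame({
    "industry": rng.choice(["a", "b", "c"], N),
    "city": rng.choice(["x", "y", "z"], N),
    "jobs": rng.integers(1, 250, N),
})

# Aggregate once: a Series indexed by (city, industry).
totals = df_city.groupby(["city", "industry"])["jobs"].sum()
n_cities = totals.index.get_level_values("city").nunique()

# squeeze=False keeps axes two-dimensional even with a single city.
fig, axes = plt.subplots(1, n_cities, figsize=(9, 4), sharey=True, squeeze=False)

# groupby(level="city") yields (city, sub-Series) pairs, so no boolean masks.
for ax, (city, city_totals) in zip(axes.ravel(), totals.groupby(level="city")):
    city_totals = city_totals.droplevel("city")  # index is now industry only
    ax.bar(list(city_totals.index), city_totals.values)
    ax.set_title(str(city))
    ax.grid(True)

fig.suptitle("Employment By Industry By City")
```

The loop over panels remains, but the per-city filtering and the manual tick bookkeeping are gone.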
