'Could not interpret input' error with Seaborn when plotting groupbys

Note: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/32908315/

Time: 2020-08-19 12:27:38  Source: igfitidea

python pandas grouping aggregate seaborn

Asked by marshallbanana

Say I have this dataframe

from pandas import DataFrame

d = {     'Path'   : ['abc', 'abc', 'ghi','ghi', 'jkl','jkl'],
          'Detail' : ['foo', 'bar', 'bar','foo','foo','foo'],
          'Program': ['prog1','prog1','prog1','prog2','prog3','prog3'],
          'Value'  : [30, 20, 10, 40, 40, 50],
          'Field'  : [50, 70, 10, 20, 30, 30] }


df = DataFrame(d)
df.set_index(['Path', 'Detail'], inplace=True)
df

               Field Program  Value
Path Detail                      
abc  foo        50   prog1     30
     bar        70   prog1     20
ghi  bar        10   prog1     10
     foo        20   prog2     40
jkl  foo        30   prog3     40
     foo        30   prog3     50

I can aggregate it no problem (if there's a better way to do this, by the way, I'd like to know!)

df_count = df.groupby('Program').count().sort_values(['Value'], ascending=False)[['Value']]
df_count

Program   Value
prog1    3
prog3    2
prog2    1

df_mean = df.groupby('Program').mean().sort_values(['Value'], ascending=False)[['Value']]
df_mean

Program  Value
prog3    45
prog2    40
prog1    20

I can plot it from Pandas no problem...

df_mean.plot(kind='bar')

But why do I get this error when I try it in seaborn?

sns.factorplot('Program',data=df_mean)
    ---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-26-23c2921627ec> in <module>()
----> 1 sns.factorplot('Program',data=df_mean)

C:\Anaconda3\lib\site-packages\seaborn\categorical.py in factorplot(x, y, hue, data, row, col, col_wrap, estimator, ci, n_boot, units, order, hue_order, row_order, col_order, kind, size, aspect, orient, color, palette, legend, legend_out, sharex, sharey, margin_titles, facet_kws, **kwargs)
   2673     # facets to ensure representation of all data in the final plot
   2674     p = _CategoricalPlotter()
-> 2675     p.establish_variables(x_, y_, hue, data, orient, order, hue_order)
   2676     order = p.group_names
   2677     hue_order = p.hue_names

C:\Anaconda3\lib\site-packages\seaborn\categorical.py in establish_variables(self, x, y, hue, data, orient, order, hue_order, units)
    143                 if isinstance(input, string_types):
    144                     err = "Could not interperet input '{}'".format(input)
--> 145                     raise ValueError(err)
    146 
    147             # Figure out the plotting orientation

ValueError: Could not interperet input 'Program'

Accepted answer by lrnzcig

The reason for the exception you are getting is that Program becomes the index of the dataframes df_mean and df_count after your groupby operation, so it is no longer a column that seaborn can look up by name.
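
A quick way to confirm this is to inspect the index and the columns of the aggregated frame. The sketch below rebuilds the question's data and selects only the Value column before aggregating (a simplification, not the exact expression from the question):

```python
import pandas as pd

# Same data as in the question.
d = {'Path': ['abc', 'abc', 'ghi', 'ghi', 'jkl', 'jkl'],
     'Detail': ['foo', 'bar', 'bar', 'foo', 'foo', 'foo'],
     'Program': ['prog1', 'prog1', 'prog1', 'prog2', 'prog3', 'prog3'],
     'Value': [30, 20, 10, 40, 40, 50],
     'Field': [50, 70, 10, 20, 30, 30]}
df = pd.DataFrame(d).set_index(['Path', 'Detail'])

# Default groupby (as_index=True): the grouping key becomes the index.
df_mean = df.groupby('Program')[['Value']].mean()

print(df_mean.index.name)            # name of the index
print('Program' in df_mean.columns)  # is 'Program' available as a column?
print(list(df_mean.columns))         # the columns that remain
```

If the second line prints False, a plotting call that refers to 'Program' by column name has nothing to resolve it against.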

If you wanted to get the factorplot from df_mean, an easy solution is to add the index as a column,

In [7]:

df_mean['Program'] = df_mean.index

In [8]:

%matplotlib inline
import seaborn as sns
sns.factorplot(x='Program', y='Value', data=df_mean)
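
As an alternative to assigning the index to a new column by hand, pandas' reset_index() moves the index into a regular column. This is a minimal sketch on a reduced copy of the question's data (only the Program and Value columns are kept, and the name df_plot is just for illustration):

```python
import pandas as pd

# Reduced version of the question's data: only the columns used for plotting.
df = pd.DataFrame({
    'Program': ['prog1', 'prog1', 'prog1', 'prog2', 'prog3', 'prog3'],
    'Value': [30, 20, 10, 40, 40, 50],
})

# After the default groupby, 'Program' is the index.
df_mean = df.groupby('Program')[['Value']].mean()

# reset_index() turns the 'Program' index into an ordinary column
# and replaces the index with a default integer range.
df_plot = df_mean.reset_index()

print(list(df_plot.columns))
print(df_plot)
```

The resulting frame has Program as a column exactly once, so it can be passed as data= with x='Program' and y='Value'.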

However, you could even more simply let factorplot do the calculations for you,

sns.factorplot(x='Program', y='Value', data=df)

You'll obtain the same result. Hope it helps.
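
To see why the result matches, here is a rough pandas-only sketch of the aggregation involved. It assumes the plot's estimator is the mean (the estimator parameter appears in the factorplot signature in the traceback; the mean default is stated here as an assumption), and it uses a reduced copy of the question's data:

```python
import pandas as pd

# Reduced version of the question's data: Program and Value only.
df = pd.DataFrame({
    'Program': ['prog1', 'prog1', 'prog1', 'prog2', 'prog3', 'prog3'],
    'Value': [30, 20, 10, 40, 40, 50],
})

# Per-category mean of 'Value': the quantity a mean-based categorical
# plot with x='Program', y='Value' would draw for each program.
means = df.groupby('Program')['Value'].mean()

print(means.to_dict())
```

These are the same numbers as in df_mean, which is why plotting from the raw df and plotting from the pre-aggregated frame show the same central values.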

EDIT after comments

Indeed, you make a very good point about the parameter as_index; by default it is set to True, and in that case Program becomes part of the index, as in your question.

In [14]:

df_mean = df.groupby('Program', as_index=True).mean().sort_values(['Value'], ascending=False)[['Value']]
df_mean

Out[14]:
        Value
Program 
prog3   45
prog2   40
prog1   20

Just to be clear, this way Program is not a column anymore; it becomes the index. The trick df_mean['Program'] = df_mean.index actually keeps the index as it is and adds a new column holding the index values, so Program is now duplicated.

In [15]:

df_mean['Program'] = df_mean.index
df_mean

Out[15]:
         Value Program
Program
prog3       45   prog3
prog2       40   prog2
prog1       20   prog1

However, if you set as_index to False, you get Program as a column, plus a new auto-incrementing integer index,

In [16]:

df_mean = df.groupby('Program', as_index=False).mean().sort_values(['Value'], ascending=False)[['Program', 'Value']]
df_mean

Out[16]:
  Program  Value
2   prog3     45
1   prog2     40
0   prog1     20

This way you could feed it directly to seaborn. Still, you could use df and get the same result.
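
For reference, here is a compact sketch of the as_index=False route on a reduced copy of the question's data. It selects the Value column before aggregating and sorts with sort_values; the variable names are only illustrative:

```python
import pandas as pd

# Reduced version of the question's data: Program and Value only.
df = pd.DataFrame({
    'Program': ['prog1', 'prog1', 'prog1', 'prog2', 'prog3', 'prog3'],
    'Value': [30, 20, 10, 40, 40, 50],
})

# as_index=False keeps 'Program' as a regular column and gives the
# result a default integer index.
df_mean = df.groupby('Program', as_index=False)['Value'].mean()

# Sort by mean value, largest first; the integer labels travel with the rows.
df_sorted = df_mean.sort_values('Value', ascending=False)

print(list(df_sorted.columns))
print(df_sorted)
```

The integer labels end up in the order 2, 1, 0 after sorting, matching the output above; they are only row labels and play no role when Program and Value are passed by column name.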

Hope it helps.
