Python: plotting a histogram for a DataFrame using seaborn

Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/32923301/

Time: 2020-08-19 12:29:05  Source: igfitidea

Plotting histogram using seaborn for a dataframe

Tags: python, numpy, pandas, seaborn

Asked by user1017373

I have a DataFrame with multiple columns and many rows. Many rows have no value for some columns, so in the DataFrame those cells are represented as NaN. An example DataFrame is as follows:

df.head()
GEN Sample_1    Sample_2    Sample_3    Sample_4    Sample_5    Sample_6    Sample_7    Sample_8    Sample_9    Sample_10   Sample_11   Sample_12   Sample_13   Sample_14
A123    9.4697  3.19689 4.8946  8.54594 13.2568 4.93848 3.16809 NAN NAN NAN NAN NAN NAN NAN
A124    6.02592 4.0663  3.9218  2.66058 4.38232         NAN NAN NAN NAN NAN NAN NAN
A125    7.88999 2.51576 4.97483 5.8901  21.1346 5.06414 15.3094 2.68169 8.12449 NAN NAN NAN NAN NAN
A126    5.99825 10.2186 15.2986 7.53729 4.34196 8.75048 16.9358 5.52708 NAN NAN NAN NAN NAN NAN
A127    28.5014 4.86702 NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN

I wanted to plot a histogram for this DataFrame using seaborn in Python, so I tried the following lines:

sns.set(color_codes=True)
sns.set(style="white", palette="muted")
sns.distplot(df)

But it throws the following error:

    ValueError                                Traceback (most recent call last)
    <ipython-input-80-896d7fe85ef3> in <module>()
          1 sns.set(color_codes=True)
          2 sns.set(style="white", palette="muted")
    ----> 3 sns.distplot(df)

    /anaconda3/lib/python3.4/site-packages/seaborn/distributions.py in distplot(a, bins, hist, kde, rug, fit, hist_kws, kde_kws, rug_kws, fit_kws, color, vertical, norm_hist, axlabel, label, ax)
        210         hist_color = hist_kws.pop("color", color)
        211         ax.hist(a, bins, orientation=orientation,
    --> 212                 color=hist_color, **hist_kws)
        213         if hist_color != color:
        214             hist_kws["color"] = hist_color

   /anaconda3/lib/python3.4/site-packages/matplotlib/axes/_axes.py in hist(self, x, bins, range, normed, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, **kwargs)
       5627             color = mcolors.colorConverter.to_rgba_array(color)
       5628             if len(color) != nx:
    -> 5629                 raise ValueError("color kwarg must have one color per dataset")
       5630 
       5631         # We need to do to 'weights' what was done to 'x'

    ValueError: color kwarg must have one color per dataset
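
The traceback ends in matplotlib's `hist`, which treats each column of a 2-D input as a separate dataset, while `distplot` hands it a single color. A minimal sketch of the same failure, assuming made-up random data rather than the DataFrame above (the name `values` is illustrative only):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Made-up data: 20 rows, 3 columns, which matplotlib reads as 3 datasets
values = np.random.default_rng(0).random((20, 3))

fig, ax = plt.subplots()
try:
    ax.hist(values, color="b")  # one color supplied for three datasets
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The exact wording of the message may differ between matplotlib versions.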

Any help or suggestions for getting rid of this error would be greatly appreciated!

Accepted answer by Mike Williamson

I had also thought the seaborn documentation mentioned that multiple columns could be plotted simultaneously, and highlighted by color by default.

But upon re-reading, I did not see anything. Instead, I think I inferred it from this tutorial, where part of the way through, the tutorial plots a data frame with multiple columns.

However, the "solution" is trivial, and hopefully exactly what you're looking for:

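import seaborn as sns

sns.set(color_codes=True)
sns.set(style="white", palette="muted")

# Plot one column at a time; passing the whole DataFrame to distplot
# raises the ValueError shown in the question.
for col_id in df.columns:
    sns.distplot(df[col_id])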

By default, each call picks the next color in the palette, "knowing" which ones have already been used, so every column gets its own color.

Generated image from code above (using different data set)

Note: I used a different data set, since I wasn't sure how to re-create yours.

Answer by Sergey Bushmanov

Let's assume I have the excerpt of the data you showed above (the only difference being that on my machine NAN is NaN).

Then, the best graphical representation I can think of is a grouped bar plot: one group for every sample, and within every group a bar for each gene (some people occasionally call this a histogram).

To do that, you first need to "melt" your data, in R parlance, i.e. make it "long". Then you can proceed with plotting.
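
As a rough illustration of what "long" means here, pandas also offers `DataFrame.melt`, which is one possible alternative to the `unstack` approach used in this answer. The tiny frame below copies a few values from the question and is only a sketch:

```python
import pandas as pd

# Two rows and two sample columns taken from the question's excerpt
wide = pd.DataFrame({
    "GEN": ["A123", "A124"],
    "Sample_1": [9.4697, 6.02592],
    "Sample_2": [3.19689, 4.0663],
})

# Wide -> long: one row per (GEN, sample) pair
long = wide.melt(id_vars="GEN", var_name="sample", value_name="value")
print(long)
```

Each of the four numeric cells becomes its own row, with the former column name stored in `sample`.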

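import seaborn as sns

# Reshape wide -> long: one row per (sample, GEN) pair
data = df.set_index('GEN').unstack().reset_index()
data.columns = ['sample', 'GEN', 'value']

sns.set(style="white")
# Note: factorplot was renamed to catplot in seaborn 0.9
g = sns.factorplot(x='sample', y='value', hue='GEN',
                   data=data, kind='bar', aspect=2)
g.set_xticklabels(rotation=30)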
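
On seaborn 0.9 and later, `factorplot` goes by the name `catplot`. A hedged sketch of the same grouped bar plot with the newer name, assuming seaborn 0.11 or later (for `set_theme`) and a small made-up excerpt in place of the full data:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import pandas as pd
import seaborn as sns

# Small excerpt with values copied from the question
wide = pd.DataFrame({
    "GEN": ["A123", "A124"],
    "Sample_1": [9.4697, 6.02592],
    "Sample_2": [3.19689, 4.0663],
})
data = wide.melt(id_vars="GEN", var_name="sample", value_name="value")

sns.set_theme(style="white")
# One group of bars per sample, one bar per gene within each group
g = sns.catplot(data=data, x="sample", y="value", hue="GEN",
                kind="bar", aspect=2)
g.set_xticklabels(rotation=30)
```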

Please, let us know if this is the type of plot you were after.

Answer by Ivan Zhovannik

I had a similar problem because my pandas.DataFrame had elements of type object in a column I wanted to plot (my_column), so the command:

print(df[my_column])

gave me:

Length: 150, dtype: object

The solution was

sns.distplot(df[my_column].astype(float))

since this converts the datatype of my_column to:

Length: 150, dtype: float64
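
`astype(float)` fails if any entry cannot be parsed as a number. One possible variation, sketched here on a hypothetical column of strings (the values and the `"n/a"` marker are made up), is `pd.to_numeric` with `errors="coerce"`, which turns unparseable entries into NaN instead:

```python
import pandas as pd

# Hypothetical column stored as strings, with one unparseable entry
df = pd.DataFrame({"my_column": ["5.1", "4.9", "n/a", "6.3"]})

# Unparseable strings become NaN rather than raising an error
numeric = pd.to_numeric(df["my_column"], errors="coerce")

# Drop the NaN entries before handing the column to a plotting function
clean = numeric.dropna()
print(numeric.dtype, len(clean))
```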
