Note: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must also follow CC BY-SA, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/50952133/

Time: 2020-09-14 05:43:20  Source: igfitidea

Distribution probabilities for each column of a data frame, in one plot

Tags: pandas, visualization, seaborn

Question by Annalix

I am creating probability distributions for each column of my data frame with distplot from the seaborn library (sns.distplot()). For one plot I do:

x = df['A']
sns.distplot(x);

I am trying to use FacetGrid and map to get the plots for all columns at once, as below, but it doesn't work at all:

  g = sns.FacetGrid(df, col = 'A','B','C','D','E')
  g.map(sns.distplot())

Answer by Scott Boston

I think you need to use melt to reshape your dataframe into long format; see this MVCE:

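import numpy as np
import pandas as pd
import seaborn as sns

df = pd.DataFrame(np.random.random((100,5)), columns = list('ABCDE'))
dfm = df.melt(var_name='columns')  # long format: one 'columns' label and one 'value' per row
g = sns.FacetGrid(dfm, col='columns')
g.map(sns.distplot, 'value')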
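To see why the long format is what FacetGrid needs, here is a minimal check of what melt returns for the same kind of random data. Every original column name ends up as a label in a single 'columns' column and all the numbers land in one 'value' column, so col='columns' yields one facet per original column.

```python
import numpy as np
import pandas as pd

# Same shape of data as the answer: 100 rows, 5 columns named A..E
df = pd.DataFrame(np.random.random((100, 5)), columns=list("ABCDE"))

# Wide -> long: one row per (original column name, value) pair
dfm = df.melt(var_name="columns")

print(dfm.columns.tolist())              # ['columns', 'value']
print(dfm.shape)                         # (500, 2)
print(dfm["columns"].unique().tolist())  # ['A', 'B', 'C', 'D', 'E']
```

melt stacks the columns one after another, so the first 100 rows of dfm hold column A, the next 100 hold column B, and so on.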

Answer by ImportanceOfBeingErnest

You're getting this wrong on two levels.

  • Python syntax.
    FacetGrid(df, col = 'A','B','C','D','E') is invalid: col = 'A' is a keyword argument, and the remaining strings 'B', 'C', 'D', 'E' are parsed as additional positional arguments. Python does not allow positional arguments after a keyword argument, so this line is a SyntaxError before seaborn ever runs.

  • Seaborn concepts.

    • Seaborn expects a single column name as input for the col or row argument. This means the dataframe needs to be in a format with one column that determines which facet column or row each datum belongs to.

    • You should not call the function that map uses. Pass the function itself (sns.distplot, not sns.distplot()); map then calls it for each facet.
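Both points can be checked without seaborn. The sketch below parses the question's call with Python's ast module (only the source text is parsed; nothing is executed), then uses a made-up helper, map_over_groups, which is hypothetical and far simpler than FacetGrid.map, to show why map wants the function object rather than the result of calling it.

```python
import ast

# The call from the question, as source text only (nothing is executed)
bad_call = "sns.FacetGrid(df, col = 'A','B','C','D','E')"
try:
    ast.parse(bad_call)
    parsed_ok = True
except SyntaxError as err:
    parsed_ok = False
    print("SyntaxError:", err.msg)

# A single column name for col is valid syntax
ast.parse("sns.FacetGrid(dfm, col='columns')")


def map_over_groups(func, groups):
    """Hypothetical stand-in for FacetGrid.map: calls func once per group."""
    return [func(values) for values in groups.values()]


groups = {"A": [1, 2, 3], "B": [4, 5]}

# Pass the function object; the helper decides when to call it
print(map_over_groups(len, groups))  # [3, 2]
# map_over_groups(len(), groups) would fail: len() runs immediately with no argument
```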

Solutions:

  • Loop over columns:

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    
    df = pd.DataFrame(np.random.randn(14,5), columns=list("ABCDE"))
    
    # one row of five axes; draw each column's distribution on its own axes
    fig, axes = plt.subplots(ncols=5)
    for ax, col in zip(axes, df.columns):
        sns.distplot(df[col], ax=ax)
    
    plt.show()
    
  • Melt the dataframe:

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    
    df = pd.DataFrame(np.random.randn(14,5), columns=list("ABCDE"))
    
    # df.melt() stacks the columns into "variable" (old column name) and "value"
    g = sns.FacetGrid(df.melt(), col="variable")
    g.map(sns.distplot, "value")
    
    plt.show()
    
Answer by nishant

I think the easiest approach is to just loop over the columns and create a plot for each one.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.random((100,5)), columns = list('ABCDE'))
# one histogram per column, shown one after another
for col in df.columns:
    hist = df[col].hist(bins=10)
    print("Plotting for column {}".format(col))
    plt.show()

Answer by E.Zolduoarrati

You can use the following:

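import matplotlib.pyplot as plt

# df is your existing DataFrame
# list the distinct column dtypes
list(set(df.dtypes.tolist()))
# keep only float64 and int64 columns
df_num = df.select_dtypes(include = ['float64', 'int64'])
# display what has been selected
df_num.head()
# one histogram per numeric column, arranged in a grid on a single figure
df_num.hist(figsize=(16, 20), bins=50, xlabelsize=8, ylabelsize=8);
plt.show()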
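The explicit ['float64', 'int64'] list skips other numeric dtypes such as int32 or float32. If that matters for your data, pandas also accepts include='number', which selects every numeric dtype. A small check with a made-up frame (the column names and values are purely illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical frame mixing several numeric dtypes with a text column
df = pd.DataFrame({
    "a": np.array([1.0, 2.0, 3.0], dtype="float64"),
    "b": np.array([1, 2, 3], dtype="int64"),
    "c": np.array([1, 2, 3], dtype="int32"),
    "label": ["x", "y", "z"],
})

explicit = df.select_dtypes(include=["float64", "int64"]).columns.tolist()
any_number = df.select_dtypes(include="number").columns.tolist()

print(explicit)    # ['a', 'b']
print(any_number)  # ['a', 'b', 'c']
```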