pandas: probability distribution of each data frame column, in one figure

Question

Asked by Annalix

I am creating probability distributions for each column of my data frame with sns.distplot() from the seaborn library. For a single plot I do:

x = df['A']
sns.distplot(x);

I am trying to use FacetGrid and map to draw the plots for all columns at once, as follows, but it doesn't work at all.

  g = sns.FacetGrid(df, col = 'A','B','C','D','E')
  g.map(sns.distplot())

Answer 1

Answered by Scott Boston

I think you need to use melt to reshape your dataframe to long format; see this MCVE:

import numpy as np
import pandas as pd
import seaborn as sns

df = pd.DataFrame(np.random.random((100,5)), columns = list('ABCDE'))
dfm = df.melt(var_name='columns')
g = sns.FacetGrid(dfm, col='columns')
g = (g.map(sns.distplot, 'value'))

Output: (image not included here; the grid shows one distribution plot per column, A to E, side by side)
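
To make the long format concrete, here is a minimal sketch of what melt does to a small frame. The two-column toy data and the names wide and long are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Hypothetical toy data: two columns, three rows (wide format).
wide = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]})

# melt stacks every column into a single 'value' column and records
# the originating column name in the column named by var_name.
long = wide.melt(var_name="columns")

print(long)
```

Each row of the melted frame carries one value plus the name of the column it came from, which is the shape FacetGrid needs for its col argument.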

Answer 2

Answered by ImportanceOfBeingErnest

You're getting this wrong on two levels.

Python syntax.
FacetGrid(df, col = 'A','B','C','D','E') is invalid: col gets set to 'A', and the remaining strings are interpreted as further positional arguments. Positional arguments may not follow a keyword argument, so this is a Python syntax error.
Seaborn concepts.
- Seaborn expects a single column name as input for the col or row argument. This means the dataframe needs to be in a long format, with one column that determines which facet column or row each datum belongs to.
- You do not call the function that map is to use; you pass the function itself, and map calls it.
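
Both mistakes can be reproduced without seaborn. The sketch below is only an analogy: apply_to_values is a hypothetical stand-in for map, and abs stands in for the plotting function.

```python
# The call from the question, kept as a string so it can be compiled
# without importing seaborn.
bad_call = "FacetGrid(df, col='A', 'B', 'C', 'D', 'E')"

try:
    compile(bad_call, "<example>", "eval")
    syntax_ok = True
except SyntaxError:
    # Positional arguments after a keyword argument are rejected
    # at compile time, before any seaborn code runs.
    syntax_ok = False


def apply_to_values(func, values):
    """Hypothetical stand-in for map: it receives a function and calls it."""
    return [func(v) for v in values]


# Correct: pass the function object itself.
magnitudes = apply_to_values(abs, [-1, 2, -3])

# Incorrect: abs() is called immediately, with no arguments, and fails
# before apply_to_values is ever entered.
try:
    apply_to_values(abs(), [-1, 2, -3])
    called_early_ok = True
except TypeError:
    called_early_ok = False

print(syntax_ok, magnitudes, called_early_ok)
```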

Solutions:

Loop over columns:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.DataFrame(np.random.randn(14,5), columns=list("ABCDE"))

fig, axes = plt.subplots(ncols=5)
for ax, col in zip(axes, df.columns):
    sns.distplot(df[col], ax=ax)

plt.show()

Melt the dataframe:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.DataFrame(np.random.randn(14,5), columns=list("ABCDE"))

g = sns.FacetGrid(df.melt(), col="variable")
g.map(sns.distplot, "value")

plt.show()

Answer 3

Answered by nishant

I think the easiest approach is to just loop over the columns and create a plot for each.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.random((100,5)), columns = list('ABCDE'))
for col in df.columns:
    hist = df[col].hist(bins=10)
    print("Plotting for column {}".format(col))
    plt.show()

Answer 4

Answered by E.Zolduoarrati

You can use the following:

# list the distinct dtypes present in the dataframe
list(set(df.dtypes.tolist()))
# keep only the float64 and int64 columns
df_num = df.select_dtypes(include = ['float64', 'int64'])
# display the first rows of the selection
df_num.head()
# plot one histogram per numeric column in a single figure
df_num.hist(figsize=(16, 20), bins=50, xlabelsize=8, ylabelsize=8);
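
To show what the dtype filter keeps, here is a small sketch on a mixed-type frame. The column names and values are hypothetical; only select_dtypes itself comes from the answer.

```python
import pandas as pd

# Hypothetical mixed-type frame to show what the filter keeps.
df = pd.DataFrame({
    "count": [1, 2, 3],          # int64
    "score": [0.5, 1.5, 2.5],    # float64
    "label": ["x", "y", "z"],    # non-numeric, dropped by the filter
})

df_num = df.select_dtypes(include=["float64", "int64"])
print(list(df_num.columns))

# "number" is a broader selector that also matches other numeric widths
# such as int32 or float32.
df_any_num = df.select_dtypes(include="number")
print(list(df_any_num.columns))
```

If the data may contain narrower numeric types, include="number" avoids silently dropping them before plotting.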
