Column histograms in Pandas
Note: this page is a side-by-side Chinese/English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/25447208/
Asked by Amelio Vazquez-Reina
Say I have a dataframe like the following:
    A  B  C  D
s1  1  2  4  2
s2  2  1  4  3
s3  1  4  1  3
I would like to get a bar plot that shows the histogram of values per column. That is, a bar plot that shows the per-column histograms next to each other on the x axis, with spacing between the histograms (columns). In other words, it would be a two-level bar chart, where for each column in the dataframe we have bars representing the histogram of that column.
In case it matters, we can assume that the set of possible values is known and the same for every column (e.g. the range [0, 5]).
When I try doing:
df.plot(kind='bar')
I get something completely different from what I want (the x ticks correspond to the rows, instead of columns: [value0, value1, ..., valueN]). The closest "in spirit" to what I want is:
df.plot(kind='density')
But I am looking for a histogram-like description per column, more than an overlay of PDFs.
Update
Hopefully an example helps. I am looking for something like the plot below (code here), but instead of showing two scores per group, it would show a histogram of values for each column in my dataframe:


Answered by BKay
This presentation doesn't rescale the data. Instead, it translates the individual histograms horizontally so that they don't overlap, and then labels the x axis with the column names (placed at each column's median value) rather than with a numeric scale.
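import matplotlib.pyplot as plt
from pandas import DataFrame
from numpy.random import randn

sample = 1000
df = DataFrame(randn(sample, 8))

accum1 = 0   # running sum of the column minima seen so far
accum2 = 0   # running sum of the column maxima (plus spacing) seen so far
spacer = 1   # extra horizontal gap between neighbouring histograms
MyTics = []  # x position (median of the shifted data) for each column label

for colname in df.columns:
    # Shift the column to the right so it does not overlap the previous ones
    TransformedValues = df[colname] - accum1 + accum2
    MyTics.extend([TransformedValues.median()])
    # Each call draws onto the same current axes
    axs = TransformedValues.hist()
    accum1 += df[colname].min()
    accum2 += df[colname].max() + spacer

# Label each histogram with its column name instead of a numeric scale
axs.set_xticks(MyTics)
axs.set_xticklabels(df.columns)
plt.show()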
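The translation idea can also be tried on the small dataframe from the question. The sketch below is a variant, not the exact arithmetic of the answer: it places each column a fixed gap to the right of the largest shifted value of the previous column. The helper name `shift_columns` and its `spacer` parameter are made up for this illustration.

```python
import pandas as pd


def shift_columns(df, spacer=1.0):
    """Translate each column so it starts `spacer` to the right of the previous one.

    Hypothetical helper for illustration; returns the shifted data and one
    tick position (the median of the shifted values) per column.
    """
    shifted = {}
    ticks = []
    right_edge = None  # largest shifted value placed so far
    for col in df.columns:
        values = df[col]
        # The first column stays where it is; later ones are moved right.
        offset = 0.0 if right_edge is None else right_edge + spacer - values.min()
        moved = values + offset
        shifted[col] = moved
        ticks.append(moved.median())
        right_edge = moved.max()
    return pd.DataFrame(shifted), ticks


# The example dataframe from the question
df = pd.DataFrame(
    {"A": [1, 2, 1], "B": [2, 1, 4], "C": [4, 4, 1], "D": [2, 3, 3]},
    index=["s1", "s2", "s3"],
)
shifted, ticks = shift_columns(df, spacer=1.0)
```

Each column of `shifted` can then be drawn with `.hist()` on a shared axes, and `ticks` passed to `set_xticks` together with `df.columns` as the tick labels, as in the answer above.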


Answered by weemattisnot
There is numpy's histogram function, and matplotlib's histogram plotting function 'hist'.
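As a rough sketch of how these pieces might be combined for the dataframe in the question: `numpy.histogram` counts the values in each column, and the transposed table of counts is drawn as a grouped bar chart. This assumes the values are integers in the range [0, 5], as the question suggests; the names `values`, `bin_edges` and `counts` are chosen only for this example.

```python
import numpy as np
import pandas as pd

# The example dataframe from the question
df = pd.DataFrame(
    {"A": [1, 2, 1], "B": [2, 1, 4], "C": [4, 4, 1], "D": [2, 3, 3]},
    index=["s1", "s2", "s3"],
)

# Assumption: every column holds integers in [0, 5].
values = np.arange(0, 6)
# One bin centred on each integer: edges at -0.5, 0.5, ..., 5.5
bin_edges = np.arange(-0.5, 6.5, 1.0)

# One histogram per column; rows are the possible values, columns are A..D
counts = pd.DataFrame(
    {col: np.histogram(df[col], bins=bin_edges)[0] for col in df.columns},
    index=values,
)

# Transposing puts the dataframe columns on the x axis,
# with one bar per possible value inside each group.
ax = counts.T.plot(kind="bar")
ax.set_xlabel("column")
ax.set_ylabel("count")
```

Calling `matplotlib.pyplot.show()` afterwards displays the figure. The x ticks are then the column names, and the legend lists the values 0 to 5.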

