Numpy & Pandas: Return histogram values from pandas histogram plot?

Disclaimer: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/38451407/

Time: 2020-09-14 01:36:46  Source: igfitidea


Tags: python, numpy, pandas, matplotlib

Asked by cqcn1991

I know that I can plot a histogram with pandas:


import numpy as np
import pandas as pd

df4 = pd.DataFrame({'a': np.random.randn(1000) + 1})
df4['a'].hist()  # draws 10 bins by default


But how can I retrieve the histogram count from such a plot?


I know I can do it with np.histogram (from the question "Histogram values of a Pandas Series"):


count, division = np.histogram(df4['a'])
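To make the numpy route concrete, here is a minimal sketch of what np.histogram returns for the question's data (the np.random.seed call is an addition so the run is reproducible): an array of counts, one per bin, and an array of bin edges with one more entry than there are bins.

```python
import numpy as np
import pandas as pd

# Same data as the question; the seed is added only for reproducibility
np.random.seed(0)
df4 = pd.DataFrame({'a': np.random.randn(1000) + 1})

# Default bins=10: 10 counts and 11 edges spanning [min, max]
count, division = np.histogram(df4['a'])

print(len(count), len(division))
print(count.sum())  # every value lands in some bin, so this is len(df4)
```

Because the default range runs from the minimum to the maximum value and the last bin is closed on both sides, the counts always add up to the number of values.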

But computing the counts this way after already calling df.hist() feels very redundant. Is it possible to get the frequency values directly from pandas?


Answer by piRSquared

The quick answer is:


pd.cut(df4['a'], 10).value_counts().sort_index()

From the Series.hist documentation:


bins: integer, default 10
Number of histogram bins to be used

So look at pd.cut(df4['a'], 10).value_counts()


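You will see that the counts are the same as those from np.histogram; sort_index() only puts the bins back in left-to-right order, since value_counts() sorts by count.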
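A minimal sketch of that check, assuming the question's df4 with a seed added for reproducibility. It compares the sorted pd.cut counts with np.histogram using the same 10 bins. The two bin rules differ only for a value lying exactly on an interior bin edge (pd.cut bins are closed on the right, np.histogram bins on the left), which is vanishingly unlikely with continuous random data like this.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # added only so the example is reproducible
df4 = pd.DataFrame({'a': np.random.randn(1000) + 1})

# pandas: 10 equal-width bins; value_counts() orders by count, so
# sort_index() restores the left-to-right bin order
cut_counts = pd.cut(df4['a'], 10).value_counts().sort_index()

# numpy: 10 equal-width bins over [min, max]
np_counts, edges = np.histogram(df4['a'], bins=10)

print(cut_counts.to_numpy())
print(np_counts)
print((cut_counts.to_numpy() == np_counts).all())
```

Note that pd.cut rounds the interval labels it displays (its precision parameter), so compare the counts rather than the printed bin labels.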


Answer by Alex Spangher

This is another way to calculate a histogram in pandas. It is more complicated, but IMO better, since it avoids the weird interval bin labels that pd.cut returns, which wreck any plot. You will also get style points for using .pipe():


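import numpy as np
import pandas as pd
# df: any DataFrame with a numeric column 'a'; values outside range=(0, 100) are ignored
(df['a']
 .pipe(lambda s: pd.Series(np.histogram(s, range=(0, 100), bins=20)))  # [counts, edges]
 .pipe(lambda s: pd.Series(s[0], index=s[1][:-1]))  # counts indexed by left bin edge
)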

You can then pipe on more things at the end, like:


.pipe(lambda s: s/s.sum())

which rescales the counts so they sum to 1, giving you a distribution.
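A minimal sketch of the same idea without .pipe() for the histogram step, assuming the question's df4 (seed added for reproducibility) and the default 10 bins rather than the fixed range=(0, 100) above. It unpacks np.histogram directly, normalizes the counts, and checks the result against value_counts(normalize=True) applied to the pd.cut bins from the first answer.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # added only so the example is reproducible
df4 = pd.DataFrame({'a': np.random.randn(1000) + 1})

# Unpack (counts, edges) instead of wrapping the tuple in a Series
counts, edges = np.histogram(df4['a'], bins=10)
hist = pd.Series(counts, index=edges[:-1])  # counts indexed by left bin edge

# Normalize so the values sum to 1 (relative frequencies)
dist = hist.pipe(lambda s: s / s.sum())

# pandas can also normalize the pd.cut counts directly
cut_dist = pd.cut(df4['a'], 10).value_counts(normalize=True).sort_index()

print(dist.sum())
print(np.allclose(dist.to_numpy(), cut_dist.to_numpy()))
```

Unpacking the tuple keeps the numeric bin edges as the index, which plots cleanly, while value_counts(normalize=True) is the shortest route if the interval labels are acceptable.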


Ideally, there'd be a sensible density option in Series.hist that could do this for you. Pandas does accept a density=False keyword (passed through to matplotlib), but it's nonsensical. I've read explanations a thousand times, like this one, but I've never understood it nor understood who would actually use it. 99.9% of the time when you see fractions on a histogram, you think "distribution" (each bar's share of the total), not a density scaled so that np.sum(pdf * np.diff(bins)) equals 1, which is what density=True actually calculates. Makes you want to weep.
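To see the difference numerically, here is a minimal sketch using np.histogram's density flag on data like the question's (seed added for reproducibility). With density=True the bar heights are scaled so that the total area is 1; multiplying each height by its bin width recovers the relative frequencies that sum to 1.

```python
import numpy as np

np.random.seed(0)  # added only so the example is reproducible
x = np.random.randn(1000) + 1

counts, edges = np.histogram(x, bins=10)
pdf, _ = np.histogram(x, bins=10, density=True)
widths = np.diff(edges)

# density=True: heights are scaled so the *area* under the bars is 1
area = np.sum(pdf * widths)

# What people usually mean by "fractions": heights that sum to 1
rel_freq = counts / counts.sum()

print(area)            # approximately 1.0
print(rel_freq.sum())  # approximately 1.0
print(np.allclose(pdf * widths, rel_freq))
```

The density heights themselves only sum to 1 when every bin has width 1, which is why they rarely match the "fraction of values" reading.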
