Numpy & Pandas: Return histogram values from pandas histogram plot?
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/38451407/
Asked by cqcn1991
I know that I can plot a histogram with pandas:
import numpy as np
import pandas as pd
df4 = pd.DataFrame({'a': np.random.randn(1000) + 1})
df4['a'].hist()
But how can I retrieve the histogram counts from such a plot?
I know I can do it with the following (from "Histogram values of a Pandas Series"):
count, division = np.histogram(df4['a'])
But computing the counts this way after calling df.hist() feels very redundant. Is it possible to get the frequency values directly from pandas?
Answered by piRSquared
The quick answer is:
pd.cut(df4['a'], 10).value_counts().sort_index()
From the documentation:
bins: integer, default 10. Number of histogram bins to be used.
So look at pd.cut(df4['a'], 10).value_counts()
You will see that the counts are the same as those from np.histogram.
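To make the comparison concrete, here is a minimal sketch that computes both results on the same data. The seeded generator and the variable names (rng, binned, same) are assumptions added for reproducibility; they are not part of the original answer. One caveat: pd.cut builds right-closed intervals while np.histogram uses left-closed bins, so a value lying exactly on an interior bin edge could land in a different bin. With continuous random data that essentially never happens.

```python
import numpy as np
import pandas as pd

# Seeded data so the comparison is reproducible (the seed is an arbitrary choice).
rng = np.random.default_rng(0)
df4 = pd.DataFrame({"a": rng.standard_normal(1000) + 1})

# Counts per bin from pandas: 10 equal-width bins, ordered by interval.
binned = pd.cut(df4["a"], 10).value_counts().sort_index()

# Counts per bin from NumPy, using the same number of bins.
counts, edges = np.histogram(df4["a"], bins=10)

print(binned)
print(counts)

# True when both approaches put every observation in the same bin.
same = np.array_equal(binned.to_numpy(), counts)
print(same)
```

The index of binned holds Interval objects, which is what the next answer refers to as awkward for plotting.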
Answered by Alex Spangher
This is another way to calculate a histogram in pandas. It is more complicated but, in my opinion, better, since you avoid the awkward interval-labelled bins that pd.cut returns, which wreck any plot. You will also get style points for using .pipe():
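(df['a']
 # np.histogram returns (counts, bin_edges); wrap the pair in a Series so it can be piped again
 .pipe(lambda s: pd.Series(np.histogram(s, range=(0, 100), bins=20)))
 # keep the counts as values and use the left edge of each bin as the index
 .pipe(lambda s: pd.Series(s[0], index=s[1][:-1]))
)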
You can then pipe on more things at the end, like:
.pipe(lambda s: s/s.sum())
which will give you a distribution, that is, the fraction of observations that falls in each bin.
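As a self-contained variant of the chain above, the following sketch moves the np.histogram call into a small helper and applies it with .pipe(). The helper name hist_series, its parameters, and the seeded data are hypothetical choices made for illustration; this is not the answerer's exact code. Leaving value_range as None lets np.histogram use the data's own minimum and maximum, so no observations are dropped.

```python
import numpy as np
import pandas as pd

# Seeded data standing in for the question's df4 (the seed is an arbitrary choice).
rng = np.random.default_rng(0)
df4 = pd.DataFrame({"a": rng.standard_normal(1000) + 1})


def hist_series(s, bins=20, value_range=None):
    """Hypothetical helper: bin counts as a Series indexed by each bin's left edge."""
    counts, edges = np.histogram(s, bins=bins, range=value_range)
    return pd.Series(counts, index=edges[:-1], name="count")


# Raw counts per bin, with a plain numeric index instead of Interval labels.
counts = df4["a"].pipe(hist_series, bins=20)

# Dividing by the total turns counts into per-bin fractions that sum to 1.
fractions = counts.pipe(lambda s: s / s.sum())

print(counts.head())
print(fractions.sum())
```

Because the index is numeric, the result can be plotted directly, for example with a bar plot.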
Ideally, there would be a sensible density option in the pandas hist method that could do this for you. Pandas does accept a density keyword (default density=False, passed through to matplotlib), but it's nonsensical. I've read explanations a thousand times, like this one, but I've never understood it, nor who would actually use it. 99.9% of the time, when you see fractions on a histogram you think "distribution" (per-bin fractions that sum to 1), not bar heights scaled so that np.sum(pdf * np.diff(bins)) equals 1, which is what density=True actually calculates. Makes you want to weep.
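The difference between the two normalizations can be checked numerically. The sketch below uses np.histogram with density=True, assuming the same convention as the density keyword discussed above; the seeded data and variable names are illustrative assumptions.

```python
import numpy as np

# Seeded sample data (the seed is an arbitrary choice).
rng = np.random.default_rng(0)
a = rng.standard_normal(1000) + 1

counts, edges = np.histogram(a, bins=20)
pdf, _ = np.histogram(a, bins=20, density=True)
widths = np.diff(edges)

# Per-bin fractions: what most readers expect from a "distribution".
fractions = counts / counts.sum()

# density=True divides each fraction by its bin width, so the bar areas
# (height times width) sum to 1, while the heights themselves generally do not.
area = np.sum(pdf * widths)
matches = np.allclose(pdf * widths, fractions)

print(fractions.sum())
print(area)
print(pdf.sum())
```

In other words, multiplying the density values by the bin widths recovers the per-bin fractions, which is why the two only coincide when every bin has width 1.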