NumPy & Pandas: return histogram values from a Pandas histogram?

Question

Asked by cqcn1991

I know that I can plot a histogram with pandas:

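import numpy as np
import pandas as pd
df4 = pd.DataFrame({'a': np.random.randn(1000) + 1})  # 1000 samples from N(1, 1)
df4['a'].hist()  # draws the histogram (10 bins by default)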

But how can I retrieve the histogram count from such a plot?

I know I can do it with the following (from "Histogram values of a Pandas Series"):

count,division = np.histogram(df4['a'])

But computing the counts this way after calling df.hist() feels very redundant. Is it possible to get the frequency values directly from pandas?

Answer 1

Answered by piRSquared

The quick answer is:

pd.cut(df4['a'], 10).value_counts().sort_index()

From the documentation:

bins: integer, default 10
Number of histogram bins to be used

So look at pd.cut(df4['a'], 10).value_counts()

You will see that the counts are the same as those returned by np.histogram.
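
As a rough check of that claim, the sketch below computes the counts both ways on the same data. The seeded generator and the variable names pandas_counts, numpy_counts and same are assumptions made for this illustration, not part of the original answer. The two methods close their intervals on opposite sides, so a value lying exactly on an interior bin edge could in principle be counted differently; with continuous random data that is very unlikely.

```python
import numpy as np
import pandas as pd

# Seeded only so that the sketch is reproducible (an assumption of this example).
rng = np.random.default_rng(0)
df4 = pd.DataFrame({'a': rng.standard_normal(1000) + 1})

# pandas: cut the column into 10 equal-width intervals, then count rows per interval.
pandas_counts = pd.cut(df4['a'], 10).value_counts().sort_index()

# NumPy: 10 equal-width bins between the min and max by default.
numpy_counts, edges = np.histogram(df4['a'])

print(pandas_counts)
print(numpy_counts)

# Compare the raw counts, ignoring the interval labels.
same = np.array_equal(pandas_counts.to_numpy(), numpy_counts)
print(same)
```

The pandas result is indexed by Interval labels, while NumPy returns the counts and the bin edges as separate arrays.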

Answer 2

Answered by Alex Spangher

This is another way to calculate a histogram in pandas. It is more complicated but, IMO, better, since you avoid the weird string-like interval bins that pd.cut returns, which wreck any plot. You will also get style points for using .pipe():

(df['a']
 .pipe(lambda s: pd.Series(np.histogram(s, range=(0, 100), bins=20)))
 .pipe(lambda s: pd.Series(s[0], index=s[1][:-1]))
)

You can then pipe on more things at the end, like:

.pipe(lambda s: s/s.sum())

which will give you a distribution.
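
Put together, the chain might look like the sketch below. The helper hist_series is a hypothetical name introduced here for readability, and the bins are derived from the data rather than from the fixed range=(0, 100) used above; both are assumptions of this example rather than part of the original answer.

```python
import numpy as np
import pandas as pd

# Seeded sample data, assumed for illustration only.
rng = np.random.default_rng(0)
s = pd.Series(rng.standard_normal(1000) + 1, name='a')

def hist_series(values, bins=20):
    """Hypothetical helper: counts per bin, indexed by each bin's left edge."""
    counts, edges = np.histogram(values, bins=bins)
    return pd.Series(counts, index=edges[:-1])

distribution = (
    s.pipe(hist_series, bins=20)      # raw counts per bin
     .pipe(lambda h: h / h.sum())     # fractions of all observations
)
print(distribution)
```

Because the index holds plain floats (the left bin edges), the result can be plotted or joined without dealing with interval labels.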

Ideally, there would be a sensible density option in pd.hist that could do this for you. Pandas does have a density=False keyword, but it's nonsensical. I've read explanations a thousand times, like this one, but I've never understood it, nor understood who would actually use it. 99.9% of the time, when you see fractions on a histogram you think "distribution" (bar heights that sum to 1), not a density normalized so that np.sum(pdf * np.diff(bins)) equals 1, which is what density=True actually calculates. Makes you want to weep.
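
To make the distinction concrete, here is a small sketch using np.histogram directly, whose density argument behaves as described. The variable names and the seeded data are assumptions for illustration. It shows that the density values integrate to 1 over the bin widths, and that multiplying them by the widths recovers the per-bin fractions.

```python
import numpy as np

# Seeded sample data, assumed for illustration only.
rng = np.random.default_rng(0)
data = rng.standard_normal(1000) + 1

counts, edges = np.histogram(data, bins=20)
pdf, _ = np.histogram(data, bins=20, density=True)

widths = np.diff(edges)

# What most readers expect from "fractions": bar heights that sum to 1.
fractions = counts / counts.sum()

# What density=True normalizes: the area under the bars.
area = np.sum(pdf * widths)

# The two views differ only by the bin widths.
recovered = pdf * widths

print(fractions.sum(), area)
print(np.allclose(recovered, fractions))
```

With equal-width bins the two differ by a constant factor, so the shape of the plot is the same and only the scale of the y-axis changes.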
