Original question: http://stackoverflow.com/questions/17148787/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Are there functions to retrieve the histogram counts of a Series in pandas?
Asked by Rafael S. Calsaverini
There is a method to plot Series histograms, but is there a function to retrieve the histogram counts so that further calculations can be done on top of them?
I keep using numpy's functions to do this and converting the result to a DataFrame or Series when I need this. It would be nice to stay with pandas objects the whole time.
Accepted answer by Andy Hayden
If your Series is discrete, you can use value_counts:
In [11]: s = pd.Series([1, 1, 2, 1, 2, 2, 3])
In [12]: s.value_counts()
Out[12]:
2    3
1    3
3    1
dtype: int64
You can see that s.hist() is essentially equivalent to s.value_counts().plot().
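As a small sketch of how the counts can feed further calculations (the sort_index and normalize steps are illustrative additions, not part of the original answer), the result of value_counts can be reordered by value and turned into relative frequencies:

```python
import pandas as pd

s = pd.Series([1, 1, 2, 1, 2, 2, 3])

# value_counts() orders by frequency; sort_index() reorders by value,
# which is the order a histogram would show along its x-axis.
counts = s.value_counts().sort_index()

# Relative frequencies instead of raw counts.
freqs = s.value_counts(normalize=True).sort_index()

print(counts)
print(freqs)
```

Both results are ordinary Series indexed by the distinct values, so they can be combined with other pandas objects directly.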
If the Series holds floats, an awful, hacky solution could be to use groupby:
s.groupby(lambda i: np.floor(2*s[i]) / 2).count()
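The one-liner above maps each value to the left edge of a 0.5-wide bin and counts the values per bin. A vectorized variant of the same idea might look like the following sketch; the sample data and the name left_edges are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen so the bins are easy to check by hand.
s = pd.Series([0.1, 0.2, 0.6, 0.7, 0.8, 1.3])

# Left edge of the 0.5-wide bin each value falls into
# (floor(2 * x) / 2 rounds x down to the nearest multiple of 0.5).
left_edges = np.floor(2 * s) / 2

# Group the values by their bin's left edge and count them.
counts = s.groupby(left_edges).count()

print(counts)
```

Computing the edges for the whole Series at once avoids calling a Python function per index label, and changing the factor 2 changes the bin width.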
Answer by Dan Allan
Since hist and value_counts don't use the Series' index, you may as well treat the Series like an ordinary array and use np.histogram directly. Then build a Series from the result.
In [4]: s = Series(randn(100))
In [5]: counts, bins = np.histogram(s)
In [6]: Series(counts, index=bins[:-1])
Out[6]:
-2.968575     1
-2.355032     4
-1.741488     5
-1.127944    26
-0.514401    23
 0.099143    23
 0.712686    12
 1.326230     5
 1.939773     0
 2.553317     1
dtype: int32
This is a really convenient way to organize the result of a histogram for subsequent computation.
To index by the center of each bin instead of the left edge, you could use bins[:-1] + np.diff(bins)/2.
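Putting the pieces of this answer together, a self-contained sketch could look like this. The fixed seed, the explicit bins=10, and the names centers and hist are assumptions added for reproducibility and readability:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
s = pd.Series(rng.standard_normal(100))

# np.histogram returns the counts and the bin edges (one more edge than counts).
counts, bins = np.histogram(s, bins=10)

# Midpoint of each bin: left edge plus half the bin width.
centers = bins[:-1] + np.diff(bins) / 2

# Series of counts indexed by bin center.
hist = pd.Series(counts, index=centers, name="count")

print(hist)
```

Because hist is a regular Series, follow-up computations such as hist.cumsum() or hist / hist.sum() stay within pandas.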
Answer by IanS
If you know the number of bins you want, you can use pandas' cut function, which is now accessible via value_counts. Using the same random example:
s = pd.Series(np.random.randn(100))
s.value_counts(bins=5)
Out[55]:
(-0.512, 0.311]     40
(0.311, 1.133]      25
(-1.335, -0.512]    14
(1.133, 1.956]      13
(-2.161, -1.335]     8
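The output above is ordered by count. If the bins should appear in ascending order instead, one option is to pass sort=False, as in the sketch below; the seed and variable names are illustrative assumptions. The second computation calls pd.cut directly for comparison:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
s = pd.Series(rng.standard_normal(100))

# sort=False skips the ordering by count, leaving the bins in ascending order.
by_bin = s.value_counts(bins=5, sort=False)

# Calling pd.cut directly: assign each value to one of 5 equal-width
# intervals, count per interval, then order by interval.
via_cut = pd.cut(s, bins=5).value_counts().sort_index()

print(by_bin)
print(via_cut)
```

The two results should hold the same counts; the label of the lowest interval may differ slightly because value_counts includes the lowest edge when it bins.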

