Python: Frequency of occurrences
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Source: http://stackoverflow.com/questions/22127769/
Asked by user40
I have a list of integers and want to get the frequency of each integer. This was discussed here
The problem is that the approach I'm using gives me the frequency of floating-point numbers, even though my data set consists of integers only. Why does that happen, and how can I get the frequency of integers from my data?
I'm using pyplot.hist to plot a histogram of the frequency of occurrences:
import numpy as np
import matplotlib.pyplot as plt
# Load the 5th column of the csv file into an array named data.
data = np.loadtxt('data.txt', dtype=int, usecols=(4,))
plt.hist(data)  # plot the column as a histogram
I'm getting the histogram, but I've noticed that if I print the result of np.histogram(data):
hist = np.histogram(data)
print(hist)
I get this:
(array([ 2323, 16338, 1587, 212, 26, 14, 3, 2, 2, 2]),
array([ 1. , 2.8, 4.6, 6.4, 8.2, 10. , 11.8, 13.6, 15.4,
17.2, 19. ]))
Here the second array represents the values and the first array represents the number of occurrences.
In my data set all values are integers, so how does the second array end up with floating-point numbers, and how should I get the frequency of integers?
UPDATE:
This solves the problem, thank you Lev for the reply.
plt.hist(data, bins=np.arange(data.min(), data.max()+1))
To avoid creating a new question: how can I plot the columns "in the middle" for each integer? Say, I want the column for integer 3 to take the space between 2.5 and 3.5, not between 3 and 4.


Accepted answer by Lev Levitsky
If you don't specify which bins to use, np.histogram and pyplot.hist fall back to a default setting, which is 10 equal-width bins. The left border of the first bin is the smallest value in the data and the right border of the last bin is the largest.
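To see where the fractional borders come from, here is a small sketch with made-up integer data (the values are an assumption, not the asker's file). With no bins argument, the range from the minimum to the maximum is split into 10 equal-width bins, so the 11 borders match np.linspace(min, max, 11):

```python
import numpy as np

# Hypothetical integer data spanning 1..19, standing in for the asker's column.
data = np.array([1, 2, 2, 3, 3, 3, 4, 10, 19])

counts, edges = np.histogram(data)  # default: bins=10

# The 11 borders are evenly spaced between data.min() and data.max().
expected_edges = np.linspace(data.min(), data.max(), 11)

print(edges)         # floating-point borders, even though data is all integers
print(counts.sum())  # every data point is counted exactly once
```

With a minimum of 1 and a maximum of 19 the bin width is (19 - 1) / 10 = 1.8, which is the spacing seen in the output quoted in the question.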
This is why the bin borders are floating-point numbers. You can use the bins keyword argument to enforce another choice of bins, e.g.:
plt.hist(data, bins=np.arange(data.min(), data.max()+1))
Edit: the easiest way to shift all bins to the left is probably just to subtract 0.5 from all bin borders:
plt.hist(data, bins=np.arange(data.min(), data.max()+1)-0.5)
Another way to achieve the same effect (not equivalent if non-integers are present):
plt.hist(data, bins=np.arange(data.min(), data.max()+1), align='left')
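One detail that may be worth checking with these borders: np.arange(data.min(), data.max()+1) stops at data.max(), and the last bin of a histogram is closed on both sides, so the two largest integers end up sharing one bin. A minimal sketch, assuming the goal is exactly one bin per integer centered on that integer, extends the range by one before shifting by 0.5. The data below is made up for illustration, and the same edges array could be passed as bins to plt.hist:

```python
import numpy as np

# Made-up integer data for illustration.
data = np.array([1, 1, 2, 3, 3, 3, 5])

# Borders at min-0.5, min+0.5, ..., max+0.5: one bin per integer, centered on it.
edges = np.arange(data.min(), data.max() + 2) - 0.5

counts, _ = np.histogram(data, bins=edges)
values = np.arange(data.min(), data.max() + 1)

# Print each integer next to how often it occurs.
for value, count in zip(values, counts):
    print(value, count)
```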
Answer by Ondro
You can use groupby from itertools, as shown in "How to count the frequency of the elements in a list?":
import numpy as np
from itertools import groupby
freq = {key:len(list(group)) for key, group in groupby(np.sort(data))}
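If only the counts are needed and not a plot, two other common options are collections.Counter from the standard library and np.unique with return_counts=True. A short sketch with made-up data (the sample values are an assumption):

```python
from collections import Counter

import numpy as np

data = np.array([3, 5, 3, 7, 5, 3])  # made-up sample

# Counter maps each value to its count; tolist() converts to plain Python ints.
freq = Counter(data.tolist())

# np.unique returns the sorted distinct values and how often each one occurs.
values, counts = np.unique(data, return_counts=True)
freq_np = dict(zip(values.tolist(), counts.tolist()))

print(freq)
print(freq_np)
```

Unlike groupby, neither of these needs the data to be sorted first.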
Answer by RK1
(Late to the party, just thought I would add a seaborn implementation.)
Seaborn implementation for the above question:
seaborn.__version__ = 0.9.0 at the time of writing.
Load the libraries and set up mock data.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
data = np.array([3]*10 + [5]*20 + [7]*5 + [9]*27 + [11]*2)
Plot the data using seaborn.distplot:
Using specified bins, calculated as in the answer above.
sns.distplot(data,bins=np.arange(data.min(), data.max()+1),kde=False,hist_kws={"align" : "left"})
plt.show()
Trying numpy built-in binning methods
I used the doane binning method below, which produced integer bins. It might be worth trying out the standard binning methods from numpy.histogram_bin_edges, as this is how matplotlib's hist() bins the data.
sns.distplot(data,bins="doane",kde=False,hist_kws={"align" : "left"})
plt.show()
This produces the following histogram:
[Figure: histogram of the mock data using the doane bins]
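To compare what the different estimators do with this mock data, the bin borders can also be inspected directly without plotting. This is only a sketch: the choice of the three method names below is an assumption for illustration, and whether a given method yields integer borders depends on the data.

```python
import numpy as np

# Same mock data as above.
data = np.array([3]*10 + [5]*20 + [7]*5 + [9]*27 + [11]*2)

# Collect the borders each estimator proposes for this data.
edges_by_method = {}
for method in ("doane", "sturges", "sqrt"):
    edges_by_method[method] = np.histogram_bin_edges(data, bins=method)
    print(method, edges_by_method[method])
```

Each result starts at data.min() and ends at data.max(); only the number of bins in between differs from one method to the next.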

