How to make a histogram from a list of strings in Python?
Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/28418988/
Asked by Gray
I have a list of strings:
a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']
I want to make a histogram that displays the frequency distribution of the letters. I can make a list that contains the count of each letter using the following code:
from itertools import groupby
b = [len(list(group)) for key, group in groupby(a)]
How do I make the histogram? I may have a million such elements in list a.
Accepted answer by notconfusing
Very easy with Pandas.
import pandas
from collections import Counter
a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']
letter_counts = Counter(a)
df = pandas.DataFrame.from_dict(letter_counts, orient='index')
df.plot(kind='bar')
Notice that Counter has already made the frequency count, so our plot type is 'bar', not 'hist'.
Answer by tommyo
Check out matplotlib.pyplot.bar. There is also numpy.histogram, which is more flexible if you want wider bins.
Answer by Martijn Pieters
Rather than use groupby() (which requires your input to be sorted), use collections.Counter(); this doesn't have to create intermediary lists just to count inputs:
from collections import Counter
counts = Counter(a)
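To see why the sorting requirement matters, here is a minimal sketch comparing the two approaches. The unsorted sample list `data` is an assumption made up for illustration; the list in the question happens to be sorted already.

```python
from collections import Counter
from itertools import groupby

# Unsorted sample data (hypothetical; not the list from the question).
data = ['b', 'a', 'b', 'a', 'a']

# groupby() only merges *adjacent* equal items, so unsorted input
# produces one run per change of value instead of one total per key.
runs = [(key, len(list(group))) for key, group in groupby(data)]

# Sorting first gives correct totals, at the cost of an extra sort.
sorted_runs = [(key, len(list(group))) for key, group in groupby(sorted(data))]

# Counter tallies in a single pass and does not depend on input order.
counts = Counter(data)

print(runs)         # split runs: 'a' and 'b' each appear more than once
print(sorted_runs)  # one entry per key
print(counts)
```

For a list with a million elements, the single pass made by Counter also avoids both the sort and the temporary list built for each group.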
You haven't really specified what you consider to be a 'histogram'. Let's assume you wanted to do this on the terminal:
width = 120 # Adjust to desired width
longest_key = max(len(key) for key in counts)
graph_width = width - longest_key - 2
widest = counts.most_common(1)[0][1]
scale = graph_width / float(widest)
for key, size in sorted(counts.items()):
    print('{}: {}'.format(key, int(size * scale) * '*'))
Demo:
>>> from collections import Counter
>>> a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']
>>> counts = Counter(a)
>>> width = 120 # Adjust to desired width
>>> longest_key = max(len(key) for key in counts)
>>> graph_width = width - longest_key - 2
>>> widest = counts.most_common(1)[0][1]
>>> scale = graph_width / float(widest)
>>> for key, size in sorted(counts.items()):
...     print('{}: {}'.format(key, int(size * scale) * '*'))
...
a: *********************************************************************************************
b: **********************************************
c: **********************************************************************
d: ***********************
e: *********************************************************************************************************************
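The terminal approach can also be wrapped in a reusable function. The following is only a sketch: the name `text_histogram` and its `width` parameter are hypothetical, and it swaps the float scale factor for integer arithmetic so that bar lengths do not depend on floating-point rounding. It assumes the keys are strings.

```python
from collections import Counter


def text_histogram(counts, width=120):
    """Return one 'key: ****' line per key, with the longest bar filling `width` columns.

    `text_histogram` is a hypothetical helper, not part of any library.
    """
    longest_key = max(len(key) for key in counts)
    graph_width = width - longest_key - 2  # leave room for the key, ':' and a space
    widest = max(counts.values())
    lines = []
    for key, size in sorted(counts.items()):
        # Integer scaling: the most frequent key gets exactly graph_width stars.
        bar = '*' * (size * graph_width // widest)
        lines.append('{}: {}'.format(key, bar))
    return lines


a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']
for line in text_histogram(Counter(a), width=23):
    print(line)
```

Returning the lines instead of printing them makes the output easy to write to a file or to check in a test.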
More sophisticated tools are found in the numpy.histogram() and matplotlib.pyplot.hist() functions. These do the tallying for you, with matplotlib.pyplot.hist() also providing you with graph output.
Answer by Ramon Martinez
As @notconfusing pointed out above, this can be solved with Pandas and Counter. If for any reason you need to avoid Pandas, you can get by with only matplotlib, using the function in the following code:
from collections import Counter
import numpy as np
import matplotlib.pyplot as plt
a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']
letter_counts = Counter(a)
def plot_bar_from_counter(counter, ax=None):
    """
    This function creates a bar plot from a counter.
    :param counter: a counter object, a dictionary with the item as the key
        and the frequency as the value
    :param ax: a matplotlib axis
    :return: the axis with the bar plot in it
    """
    if ax is None:
        fig = plt.figure()
        ax = fig.add_subplot(111)
    # Materialize the views as lists: FixedFormatter indexes into its labels,
    # which dictionary views do not support on Python 3.
    frequencies = list(counter.values())
    names = list(counter.keys())
    x_coordinates = np.arange(len(counter))
    ax.bar(x_coordinates, frequencies, align='center')
    ax.xaxis.set_major_locator(plt.FixedLocator(x_coordinates))
    ax.xaxis.set_major_formatter(plt.FixedFormatter(names))
    return ax
plot_bar_from_counter(letter_counts)
plt.show()
Answer by Mitul Panchal
A simple and effective way to make a character histogram in Python:
import numpy as np
import matplotlib.pyplot as plt
from collections import Counter
d = dict()
filename = input("Enter file name: ")  # use raw_input() on Python 2
with open(filename, 'r') as f:
    for word in f:  # each iteration yields one line of the file
        for letter in word:
            if letter not in d:
                d[letter] = 1
            else:
                d[letter] += 1
num = Counter(d)
x = list(num.values())  # bar heights
y = list(num.keys())  # bar labels
x_coordinates = np.arange(len(num.keys()))
plt.bar(x_coordinates, x)
plt.xticks(x_coordinates, y)
plt.show()
print(x, y)
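The manual dictionary bookkeeping above can be left to Counter itself. A minimal sketch, assuming any iterable of text lines as input: the helper name `count_characters` is hypothetical, and an in-memory `io.StringIO` stands in for the opened file. As in the loop above, newline characters are counted too.

```python
import io
from collections import Counter


def count_characters(lines):
    """Tally every character in an iterable of text lines (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        # Updating a Counter with a string counts each character in it.
        counts.update(line)
    return counts


# An in-memory stand-in for a file object opened with open(filename, 'r').
sample = io.StringIO("abba\ncab\n")
counts = count_characters(sample)

labels = sorted(counts)                         # bar labels, in sorted order
heights = [counts[label] for label in labels]   # matching bar heights
print(labels, heights)
```

The `labels` and `heights` lists can then be passed to plt.xticks() and plt.bar() in the same way as `y` and `x` above.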
Answer by drammock
Answer by G M
Using numpy
Using numpy 1.9 or greater:
import numpy as np
a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']
labels, counts = np.unique(a,return_counts=True)
This can be plotted using:
import matplotlib.pyplot as plt
ticks = range(len(counts))
plt.bar(ticks,counts, align='center')
plt.xticks(ticks, labels)
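If numpy is older than 1.9 or not installed, the same pair of sorted labels and matching counts can be sketched with the standard library alone. This is an assumed equivalent for illustration, not part of the original answer; it yields tuples rather than numpy arrays.

```python
from collections import Counter

a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'e', 'e', 'e', 'e', 'e']

# Sort the (label, count) pairs by label, then split them into two tuples,
# mirroring the sorted output of np.unique(a, return_counts=True).
pairs = sorted(Counter(a).items())
labels, counts = zip(*pairs)

# Positions for the bars, as in the plotting code above.
ticks = range(len(counts))

print(labels)
print(counts)
```

These `ticks`, `counts` and `labels` can be handed to plt.bar() and plt.xticks() exactly as shown above.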