原文地址: http://stackoverflow.com/questions/28585367/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Python Pandas: How can I determine the distribution of my dataset?
Asked by UserYmY
This is my dataset, which has two columns: NS and count.
NS count
0 ns18.dnsdhs.com. 1494
1 ns0.relaix.net. 1835
2 ns2.techlineindia.com. 383
3 ns2.microwebsys.com. 1263
4 ns2.holy-grail-body-transformation-program.com. 1
5 ns2.chavano.com. 1
6 ns1.x10host.ml. 17
7 ns1.amwebaz.info. 48
8 ns2.guacirachocolates.com.br. 1
9 ns1.clicktodollars.com. 2
Now I would like to plot it to see how many NSs have the same count. My guess is that I can use a histogram for that, but I am not sure how. Can anyone help?
Answered by will
From your comment, I'm guessing your data table is actually much longer, and you want to see the distribution of name server counts (whatever count means here).
I think you should just be able to do this:
df.hist(column="count")
And you'll get what you want, if that is what you want.
pandas has decent documentation for all of its functions, though, including histograms.
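As a rough illustration of the call above, here is a minimal sketch that rebuilds the ten sample rows from the question and draws the histogram. The `bins=20` value is an arbitrary choice for this example, not something from the original answer, and matplotlib is assumed to be installed because pandas uses it for plotting.

```python
import pandas as pd

# Sample "count" values copied from the question; the real table is presumably longer.
df = pd.DataFrame({"count": [1494, 1835, 383, 1263, 1, 1, 17, 48, 1, 2]})

# DataFrame.hist draws one histogram per selected column and returns the axes.
# bins=20 is an arbitrary choice for this sketch.
axes = df.hist(column="count", bins=20)

# In a script, matplotlib.pyplot.show() or Figure.savefig() would display or save the plot.
```

With data this skewed (many tiny counts and a few large ones), most bars end up in the first bin, which is one reason the exact tally discussed next can be more informative.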
If you actually want to see "how many have the same count", rather than a representation of the distribution, then you'll either need to set the bins kwarg to df["count"].max() - df["count"].min(), or do as you said: count the number of times each count occurs and then create a bar chart.
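One way to get that exact tally without leaving pandas is `Series.value_counts()`. This is a swapped-in alternative rather than the approach the answer goes on to show, and the sample rows below are just the ten from the question.

```python
import pandas as pd

# Sample "count" values copied from the question; the real table is presumably longer.
df = pd.DataFrame({"count": [1494, 1835, 383, 1263, 1, 1, 17, 48, 1, 2]})

# value_counts() tallies how many rows share each distinct value of "count";
# sort_index() orders the result by the count value instead of by frequency.
same_count = df["count"].value_counts().sort_index()
print(same_count)
```

Calling `same_count.plot.bar()` on the result (with matplotlib installed) should then draw one bar per distinct count value.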
Maybe something like:
from collections import Counter

# Tally how many name servers share each value of "count"
counts = Counter()
for count in df["count"]:
    counts[count] += 1
print(counts)
An alternative and cleaner approach, which I completely missed and wwii pointed out below, is to pass the column straight to the Counter constructor:
count_counter = Counter(df['count'])
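To show what that one-liner produces, here is a small self-contained sketch using the ten sample rows from the question. The `pairs` variable and the printing loop are additions for illustration only.

```python
from collections import Counter

import pandas as pd

# Sample "count" values copied from the question; the real table is presumably longer.
df = pd.DataFrame({"count": [1494, 1835, 383, 1263, 1, 1, 17, 48, 1, 2]})

# Counter consumes the column as an iterable and tallies each distinct value.
count_counter = Counter(df["count"])

# (count value, number of name servers with that count), ordered by count value.
pairs = sorted(count_counter.items())
for value, n_servers in pairs:
    print(value, n_servers)
```

The sorted pairs could then be handed to any bar chart routine, with the count values on the x axis and the tallies as bar heights.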

