Python Pandas: How to determine the distribution of a dataset?

Question

Asked by UserYmY

This is my dataset, with two columns: NS and count.

    NS                                                count
0   ns18.dnsdhs.com.                                  1494
1   ns0.relaix.net.                                   1835
2   ns2.techlineindia.com.                            383
3   ns2.microwebsys.com.                              1263
4   ns2.holy-grail-body-transformation-program.com.   1
5   ns2.chavano.com.                                  1
6   ns1.x10host.ml.                                   17
7   ns1.amwebaz.info.                                 48
8   ns2.guacirachocolates.com.br.                     1
9   ns1.clicktodollars.com.                           2

Now I would like to see, by plotting, how many NSs have the same count. My own guess is that I can use a histogram for that, but I am not sure how. Can anyone help?

Answer 1

Answered by will

From your comment, I'm guessing your data table is actually much longer, and you want to see the distribution of name server counts (whatever count is here).

I think you should just be able to do this:

df.hist(column="count")
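A minimal, self-contained version of that one-liner, built from the ten sample rows in the question. The non-interactive Agg backend and the output filename count_hist.png are assumptions so that it runs without a display. DataFrame.hist returns a 2-D array of matplotlib Axes, one per plotted column, and uses 10 bins unless told otherwise.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no display needed
import matplotlib.pyplot as plt
import pandas as pd

# The ten sample rows from the question
df = pd.DataFrame({
    "NS": [
        "ns18.dnsdhs.com.", "ns0.relaix.net.", "ns2.techlineindia.com.",
        "ns2.microwebsys.com.", "ns2.holy-grail-body-transformation-program.com.",
        "ns2.chavano.com.", "ns1.x10host.ml.", "ns1.amwebaz.info.",
        "ns2.guacirachocolates.com.br.", "ns1.clicktodollars.com.",
    ],
    "count": [1494, 1835, 383, 1263, 1, 1, 17, 48, 1, 2],
})

# Histogram of the "count" column (default of 10 equal-width bins)
axes = df.hist(column="count")
ax = axes[0, 0]  # the single Axes for the "count" column
ax.set_xlabel("count")
ax.set_ylabel("number of NSs")
plt.savefig("count_hist.png")  # hypothetical output file
```

With this sample most rows land in the first bin, which is exactly the situation where a plain histogram hides how many NSs share one specific count.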

And you'll get what you want. If that is what you want.

pandas has decent documentation for all of its functions, though, and histograms are covered in its plotting documentation.

If you actually want to see "how many have the same count", rather than a representation of the distribution, then you'll either need to set the bins kwarg to df["count"].max() - df["count"].min(), or do as you said: count the number of times each count occurs and then create a bar chart.
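A sketch of the bins idea, assuming the counts are integers. Instead of a bin count, it passes explicit integer edges min, min+1, ..., max+1, so every possible count gets its own unit-wide bin (with bins=max-min the edges stop at max, and NumPy's closed last bin would merge max-1 and max). numpy.histogram is used with the same edges to read back the per-value tallies the plot draws; the small frame reuses only the count column from the question.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import numpy as np
import pandas as pd

# The "count" column from the question's sample rows
df = pd.DataFrame({"count": [1494, 1835, 383, 1263, 1, 1, 17, 48, 1, 2]})

lo, hi = int(df["count"].min()), int(df["count"].max())
# Edges lo, lo+1, ..., hi+1: one unit-wide bin per integer count value
edges = np.arange(lo, hi + 2)

axes = df.hist(column="count", bins=edges)

# Same binning without plotting, to inspect the tallies directly
tallies, _ = np.histogram(df["count"], bins=edges)
per_value = {int(v): int(n) for v, n in zip(edges[:-1], tallies) if n > 0}
print(per_value)
# {1: 3, 2: 1, 17: 1, 48: 1, 383: 1, 1263: 1, 1494: 1, 1835: 1}
```

For long-tailed data like this, most of those bins are empty, which is why a bar chart of only the counts that actually occur is often easier to read.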

Maybe something like:

from collections import Counter

# Tally how many NSs share each count value
counts = Counter()
for count in df["count"]:
    counts[count] += 1

print(counts)  # Python 3 print function
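To finish the "then create a bar chart" step, the tally can be handed to matplotlib directly. This sketch assumes the Agg backend and a hypothetical output file count_bar.png; the distinct counts are sorted and passed as string labels so that each occurring count gets one evenly spaced bar, regardless of how far apart the values are.

```python
from collections import Counter

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import pandas as pd

# The "count" column from the question's sample rows
df = pd.DataFrame({"count": [1494, 1835, 383, 1263, 1, 1, 17, 48, 1, 2]})

# Tally: count value -> number of NSs with that count
counts = Counter()
for count in df["count"]:
    counts[count] += 1

# Sort by count value so the bars read left to right
values = sorted(counts)
freqs = [counts[v] for v in values]

fig, ax = plt.subplots()
# String labels make the x axis categorical: one bar per distinct count
ax.bar([str(v) for v in values], freqs)
ax.set_xlabel("count")
ax.set_ylabel("number of NSs with that count")
fig.savefig("count_bar.png")  # hypothetical output file
```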

An alternative and cleaner approach, which I completely missed and wwii pointed out below, is simply to pass the column to the standard Counter constructor:

count_counter = Counter(df['count'])
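pandas can produce the same tally itself with Series.value_counts, which returns a Series indexed by the distinct count values; sort_index orders it by count, and the result can be plotted with .plot(kind="bar"). This is a swapped-in pandas-native alternative rather than part of the original answer, shown here against the question's sample counts to check that both approaches agree.

```python
from collections import Counter

import pandas as pd

# The "count" column from the question's sample rows
df = pd.DataFrame({"count": [1494, 1835, 383, 1263, 1, 1, 17, 48, 1, 2]})

count_counter = Counter(df["count"])

# pandas-native tally: index = distinct count value, values = number of NSs
freq = df["count"].value_counts().sort_index()

# Both approaches give the same mapping of count -> number of NSs
assert freq.to_dict() == dict(count_counter)
print(freq)
```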