This page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors: Stack Overflow
Original question: http://stackoverflow.com/questions/28822698/
Histogram of a pandas dataframe
Asked by ca_san
I couldn't find a similar question anywhere on the site.
I have a fairly large file, with over 100000 lines, and I read it using pandas:
df = pd.read_excel("somefile.xls",index_col='Offense Type')
I ended up with a dataframe consisting of the first column (the index column) and one other column, 'Offense Type' and 'Hour' respectively.
'Offense Type' consists of a series of categories, say cat1, cat2, cat3, etc. 'Hour' consists of a series of integers between 1 and 24.
What I would like to do is obtain a histogram of the occurrences of each number in the dataframe (there aren't that many categories; there are at most 10 of them).
Here's an ASCII representation of what I want to get:
(the x's represent the bars in the histogram; they will surely reach much higher values than 1, 2 or 3)
x x # And so on
x x x x x x #
x x x x x x x #
1 2 11 20 5 8 18 #
Cat1 Cat2 #
But I'm getting a separate bar for every row in df using:
df.plot(kind='bar')
which is basically unreadable:


I've also tried the hist() and Histogram() functions, with no luck.
Here's some sample data:


Answered by ca_san
After a long night, I got the answer. Since every event occurred only once, I added an extra column to the file containing the number one, and then indexed the dataframe by it:
df = pd.read_excel("somefile.xls",index_col='Numberone')
And then simply tried this:
df.hist(by=df['Offense Type'])
finally getting exactly what I wanted.
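For readers without the original spreadsheet, the approach can be sketched on a small made-up dataframe. This is only an illustration under assumptions: the values below are invented, the helper names hour_counts and plot_hour_histograms are hypothetical, and the dataframe is built in memory with a default index rather than read from Excel with a 'Numberone' index. The counts table is not part of the original answer; it is included only as a way to check the numbers the histograms would show.

```python
import pandas as pd

# Hypothetical sample standing in for the Excel file. The column names
# 'Offense Type' and 'Hour' come from the question; the values are made up.
df = pd.DataFrame({
    "Offense Type": ["cat1", "cat1", "cat1", "cat2", "cat2", "cat1", "cat2"],
    "Hour": [1, 2, 2, 5, 8, 11, 5],
})


def hour_counts(frame):
    """Count how often each hour occurs within each category."""
    return pd.crosstab(frame["Offense Type"], frame["Hour"])


def plot_hour_histograms(frame):
    """Draw one histogram of 'Hour' per category (requires matplotlib)."""
    # Bin edges 1..25 give one bin per hour from 1 to 24.
    return frame.hist(column="Hour", by="Offense Type", bins=list(range(1, 26)))


# Tabulate the counts that each per-category histogram would display.
counts = hour_counts(df)
print(counts)
```

Calling plot_hour_histograms(df) should then produce one subplot per category, similar in spirit to df.hist(by=df['Offense Type']) above. Keeping 'Offense Type' as a regular column rather than the index is what lets it be passed to the by argument.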

