pandas: Histogram of a pandas dataframe

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original URL: http://stackoverflow.com/questions/28822698/

Time: 2020-09-13 23:00:20  Source: igfitidea

Histogram of a pandas dataframe

pandas histogram dataframe

Asked by ca_san

I couldn't find a similar question anywhere on the site.

I have a fairly large file, with over 100000 lines and I read it using pandas:

import pandas as pd
df = pd.read_excel("somefile.xls", index_col='Offense Type')

I ended up with a dataframe consisting of the index column, 'Offense Type', and one other column, 'Hour'.

'Offense Type' consists of a series of categories, say cat1, cat2, cat3, etc. 'Hour' consists of a series of integers between 1 and 24.

What I would like to do is obtain a histogram of the occurrences of each number in the dataframe (there aren't that many categories; at most 10 of them).

Here's an ASCII representation of what I want to get:

(The x's represent the bars in the histogram; they will surely reach much higher values than 1, 2 or 3.)

   x        x         # And so on
 x x  x     x x  x    #
 x x  x  x  x x  x    #
 1 2 11 20  5 8 18    #
   Cat1      Cat2     #

But I'm getting a single bar for every row in df when using:

df.plot(kind='bar')

which is basically unreadable:

[image: histogram_of_dataframe]

I've also tried the hist() and Histogram() functions, with no luck.

Here's some sample data:

[image: sample_data]
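
As a rough illustration of the counts such a histogram would display, here is a minimal sketch using pd.crosstab. The data is made up for the example; only the column names 'Offense Type' and 'Hour' come from the question.

```python
import pandas as pd

# Made-up sample data; the column names follow the question.
df = pd.DataFrame({
    "Offense Type": ["cat1", "cat1", "cat1", "cat2", "cat2", "cat2", "cat2"],
    "Hour": [1, 2, 2, 5, 8, 8, 18],
})

# Rows: category, columns: hour, values: number of occurrences.
counts = pd.crosstab(df["Offense Type"], df["Hour"])
print(counts)
```

Calling counts.plot(kind='bar') on a table like this would draw one group of bars per category rather than one bar per row of the original dataframe.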

Answered by ca_san

After a long night, I got the answer. Since every event occurred only once, I added an extra column to the file containing the number one, and then indexed the dataframe by that column:

df = pd.read_excel("somefile.xls",index_col='Numberone')

And then simply tried this:

df.hist(by=df['Offense Type'])

finally getting exactly what I wanted
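
A possible variation, not the approach described above: if 'Offense Type' was read in as the index, as in the question, it can be turned back into an ordinary column with reset_index() instead of adding a column to the file. The sketch below assumes made-up data and uses a non-interactive matplotlib backend so it runs without a display; the name flat is just an illustrative choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data shaped like the question's frame: 'Offense Type' is the index.
df = pd.DataFrame(
    {"Hour": [1, 2, 2, 11, 20, 5, 8, 8, 18]},
    index=pd.Index(["cat1"] * 5 + ["cat2"] * 4, name="Offense Type"),
)

# Turn the index back into an ordinary column so it can be passed to `by`.
flat = df.reset_index()

# One histogram of 'Hour' per category; bin edges 1..25 give one bin per hour
# (the last bin covers 24 and the closing edge 25).
axes = flat["Hour"].hist(by=flat["Offense Type"], bins=list(range(1, 26)))

# plt.show() or plt.savefig(...) would then display or save the figure.
```

Each subplot then corresponds to one category, which is roughly the layout sketched in the ASCII drawing in the question.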
