Frequency plot in Python/Pandas DataFrame

Notice: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/26476668/

Date: 2020-08-19 00:31:57  Source: igfitidea

Tags: python, numpy, matplotlib, pandas

Asked by SMU

I have parsed a very large dataframe with several columns and values like these:

Name Age Points ...
XYZ  42  32pts  ...
ABC  41  32pts  ...
DEF  32  35pts
GHI  52  35pts
JHK  72  35pts
MNU  43  42pts
LKT  32  32pts
LKI  42  42pts
JHI  42  35pts
JHP  42  42pts
XXX  42  42pts
XYY  42  35pts

I have imported numpy and matplotlib.

I need to plot how many times each value in the 'Points' column occurs. I don't need any bins for the plot; it is simply a way to see how often the same score occurs across a large dataset.

So essentially the bar plot (or histogram, if you can call it that) should show that 32pts occurs 3 times, 35pts occurs 5 times and 42pts occurs 4 times. If I can plot the values in sorted order, all the better. I have tried df.hist() but it is not working for me. Any clues? Thanks.

Answered by Paul H

Just plot the result of the column's value_counts method directly:

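import matplotlib.pyplot as plt
import pandas

data = load_my_data()  # placeholder: load your own DataFrame here
fig, ax = plt.subplots()
# count each distinct value in 'Points' and draw the counts as bars
data['Points'].value_counts().plot(ax=ax, kind='bar')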
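
The question also asks for the bars in sorted order. A minimal sketch, assuming the sample rows from the question stand in for the real data (the `data` frame below is built by hand for illustration): `value_counts()` orders its result by frequency, so `sort_index()` is chained to order the bars by label instead.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows copied from the question; a stand-in for the real DataFrame.
data = pd.DataFrame({
    "Points": ["32pts", "32pts", "35pts", "35pts", "35pts", "42pts",
               "32pts", "42pts", "35pts", "42pts", "42pts", "35pts"],
})

# value_counts() sorts by count (descending); sort_index() re-sorts by label.
counts = data["Points"].value_counts().sort_index()
print(counts)

fig, ax = plt.subplots()
counts.plot(ax=ax, kind="bar")
ax.set_xlabel("Points")
ax.set_ylabel("Number of occurrences")
# plt.show()  # uncomment when running interactively
```

Sorting the labels as strings works here because every value has two digits; if the widths differ, converting the column to integers first gives a true numeric order.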

If you want to remove the string 'pts' from all of the elements in your column, you can do something like this:

df['points_int'] = df['Points'].str.replace('pts', '').astype(int)

That assumes they all end with 'pts'. If the suffix varies from row to row, you will need to look into regular expressions, as in this question: Split columns using pandas

And the official docs: http://pandas.pydata.org/pandas-docs/stable/text.html#text-string-methods
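
When the suffix is not always the same, one option is to capture the leading digits with a regular expression rather than strip a fixed string. A minimal sketch using `Series.str.extract`; the mixed suffixes in the sample frame are invented for illustration and do not come from the question.

```python
import pandas as pd

# Hypothetical values with inconsistent suffixes (not from the question).
df = pd.DataFrame({"Points": ["32pts", "35 points", "42pnts", "32"]})

# Capture the leading run of digits; expand=False returns a Series.
df["points_int"] = df["Points"].str.extract(r"^(\d+)", expand=False).astype(int)

# Counting the integers and sorting the index gives numeric order.
counts = df["points_int"].value_counts().sort_index()
print(counts)
```

A row with no leading digits would come back as NaN and make `astype(int)` raise, so such rows need to be dropped or filled before the conversion.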

Answered by Yogesh Kumar

The Seaborn package has a countplot function that can be used to make a frequency plot:

import seaborn as sns

ax = sns.countplot(x="Points",data=df)
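
To get the bars in sorted order with Seaborn, `countplot` accepts an `order` argument listing the categories in the order they should be drawn. A minimal sketch, assuming the sample rows from the question (the `df` frame below is built by hand for illustration):

```python
import pandas as pd
import seaborn as sns

# Sample rows copied from the question; a stand-in for the real DataFrame.
df = pd.DataFrame({
    "Points": ["32pts", "32pts", "35pts", "35pts", "35pts", "42pts",
               "32pts", "42pts", "35pts", "42pts", "42pts", "35pts"],
})

# Sort the distinct labels and pass them to countplot to fix the bar order.
order = sorted(df["Points"].unique())
ax = sns.countplot(x="Points", data=df, order=order)
ax.set_ylabel("Number of occurrences")
```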