Frequency plot in Python/Pandas DataFrame
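Disclaimer: This page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow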
Original URL: http://stackoverflow.com/questions/26476668/
Question by SMU
I have a very large parsed DataFrame with several columns and values like these:
Name  Age  Points  ...
XYZ   42   32pts   ...
ABC   41   32pts   ...
DEF   32   35pts
GHI   52   35pts
JHK   72   35pts
MNU   43   42pts
LKT   32   32pts
LKI   42   42pts
JHI   42   35pts
JHP   42   42pts
XXX   42   42pts
XYY   42   35pts
I have imported numpy and matplotlib.
I need to plot how many times each value in the 'Points' column occurs. I don't need any bins for the plot; I just want to see how often each score appears across a large dataset.
So essentially the bar plot (or histogram, if you can call it that) should show that 32pts occurs 3 times, 35pts occurs 5 times and 42pts occurs 4 times. If I can plot the values in sorted order, even better. I have tried df.hist(), but it doesn't work for me. Any clues? Thanks.
Answer by Paul H
Just plot the result of the column's value_counts() method directly:
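import matplotlib.pyplot as plt
import pandas as pd

df = load_my_data()  # hypothetical placeholder: load your DataFrame here
fig, ax = plt.subplots()
# value_counts() returns the count of each distinct value, most frequent first
df['Points'].value_counts().plot(ax=ax, kind='bar')
plt.show()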
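A self-contained sketch of the same idea, rebuilt from the sample rows in the question (the elided extra columns are left out). DataFrame.hist() bins numeric columns, so string values such as '32pts' are skipped, which is likely why it did not help here. Because value_counts() orders bars by frequency, sort_index() is added to address the request for sorted order. The Agg backend and the output file name are assumptions so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows copied from the question (other columns omitted)
df = pd.DataFrame({
    "Name": ["XYZ", "ABC", "DEF", "GHI", "JHK", "MNU",
             "LKT", "LKI", "JHI", "JHP", "XXX", "XYY"],
    "Age": [42, 41, 32, 52, 72, 43, 32, 42, 42, 42, 42, 42],
    "Points": ["32pts", "32pts", "35pts", "35pts", "35pts", "42pts",
               "32pts", "42pts", "35pts", "42pts", "42pts", "35pts"],
})

# Count each distinct value, then order the bars by label instead of by count
counts = df["Points"].value_counts().sort_index()

fig, ax = plt.subplots()
counts.plot(ax=ax, kind="bar")  # one bar per distinct score, no binning
ax.set_xlabel("Points")
ax.set_ylabel("Occurrences")
fig.tight_layout()
fig.savefig("points_frequency.png")  # hypothetical output file name
```

Since the labels are strings, sort_index() sorts them lexicographically; that matches numeric order here only because every score has two digits (a label like '100pts' would sort before '32pts'). Converting the column to integers before counting avoids that.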
If you want to remove the string 'pts' from every element in the column, you can do something like this:
df['points_int'] = df['Points'].str.replace('pts', '', regex=False).astype(int)
This assumes every value ends with 'pts'. If the suffix varies from row to row, you'll need a regular expression instead; see, for example: Split columns using pandas
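A minimal sketch of the regular-expression route, assuming each value contains exactly one run of digits (the mixed suffixes below are hypothetical example data). It uses Series.str.extract to keep only the digits, so the counts can then be sorted numerically.

```python
import pandas as pd

# Hypothetical values whose suffix varies from row to row
df = pd.DataFrame({"Points": ["32pts", "35 points", "42pt", "32pts", "35pts"]})

# Keep the first run of digits in each string and convert it to int.
# expand=False returns a Series instead of a one-column DataFrame.
# Assumption: every row contains digits; otherwise astype(int) fails on NaN.
df["points_int"] = df["Points"].str.extract(r"(\d+)", expand=False).astype(int)

# Counts per score, ordered by the numeric score rather than by frequency
counts = df["points_int"].value_counts().sort_index()
print(counts)
```

The resulting Series can be plotted with counts.plot(kind="bar") exactly as in the answer above.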
And the official docs: http://pandas.pydata.org/pandas-docs/stable/text.html#text-string-methods
Answer by Yogesh Kumar
The Seaborn package has a countplot function that can be used to make a frequency plot:
import matplotlib.pyplot as plt
import seaborn as sns
ax = sns.countplot(x="Points", data=df)  # one bar per distinct value in 'Points'
plt.show()
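To get the sorted bar order the question asks for, countplot accepts an order argument listing the categories to draw. A small sketch using the 'Points' values from the question; the headless Agg backend and the output file name are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# 'Points' values copied from the question's sample rows
df = pd.DataFrame({"Points": ["32pts", "32pts", "35pts", "35pts", "35pts", "42pts",
                              "32pts", "42pts", "35pts", "42pts", "42pts", "35pts"]})

# countplot tallies each category itself; order= fixes the left-to-right bar order
order = sorted(df["Points"].unique())
ax = sns.countplot(x="Points", data=df, order=order)
ax.set_ylabel("Occurrences")
plt.gcf().savefig("points_countplot.png")  # hypothetical output file name
```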

