Plot a histogram using two columns (values, counts) in a pandas DataFrame
Original question: http://stackoverflow.com/questions/41675931/
License: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by zingsy
I have a dataframe with multiple columns in pairs: if one column holds values, the adjacent column holds the corresponding counts. I want to plot a histogram using the values as the x variable and the counts as the frequency.
For example, I have the following columns:
Age Counts
60 1204
45 700
21 400
. .
. .
34 56
10 150
I want my code to bin the Age values in ten-year intervals between the minimum and maximum values, get the cumulative frequency for each interval from the Counts column, and then plot a histogram. Is there a way to do this using matplotlib?
I have tried the following, but in vain:
patient_dets.plot(x='PatientAge', y='PatientAgecounts', kind='hist')
(patient_dets is the dataframe with 'PatientAge' and 'PatientAgecounts' as columns)
Answered by jezrael
I think you need Series.plot.bar:
patient_dets.set_index('PatientAge')['PatientAgecounts'].plot.bar()
If you need bins, one possible solution is pd.cut:
import numpy as np
import pandas as pd

# helper df with the label, min age and max age of each group
df1 = pd.DataFrame({'G': ['14 yo and younger', '15-19', '20-24', '25-29', '30-34',
                          '35-39', '40-44', '45-49', '50-54', '55-59', '60-64', '65+'],
                    'Min': [0, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65],
                    'Max': [14, 19, 24, 29, 34, 39, 44, 49, 54, 59, 64, 120]})
print(df1)
                    G  Max  Min
0   14 yo and younger   14    0
1               15-19   19   15
2               20-24   24   20
3               25-29   29   25
4               30-34   34   30
5               35-39   39   35
6               40-44   44   40
7               45-49   49   45
8               50-54   54   50
9               55-59   59   55
10              60-64   64   60
11                65+  120   65
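# bin edges: the lowest Min followed by every group's Max
cutoff = np.hstack([np.array(df1.Min[0]), df1.Max.values])
labels = df1.G.values
# right=True closes each bin on the right; include_lowest=True keeps age 0 in the first bin
patient_dets['Groups'] = pd.cut(patient_dets.PatientAge, bins=cutoff, labels=labels, right=True, include_lowest=True)
print(patient_dets)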
   PatientAge  PatientAgecounts             Groups
0          60              1204              60-64
1          45               700              45-49
2          21               400              20-24
3          34                56              30-34
4          10               150  14 yo and younger
# group by the bin label only, so the counts of all ages in a bin are summed together
patient_dets.groupby('Groups', observed=False)['PatientAgecounts'].sum().plot.bar()
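The question asks for ten-year intervals between the minimum and maximum age rather than fixed five-year groups. A minimal sketch of the same pd.cut idea under that reading is below; the sample rows come from the question, while the rounding of the edges to multiples of ten and the choice of left-closed bins (right=False) are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question; the remaining rows are elided there.
patient_dets = pd.DataFrame({
    "PatientAge": [60, 45, 21, 34, 10],
    "PatientAgecounts": [1204, 700, 400, 56, 150],
})

# Ten-year edges covering the observed range (assumption: edges snap to multiples of 10).
# For this sample that gives 10, 20, ..., 70.
low = (patient_dets["PatientAge"].min() // 10) * 10
high = (patient_dets["PatientAge"].max() // 10 + 1) * 10
edges = np.arange(low, high + 1, 10)

# right=False makes each bin closed on the left: [10, 20), [20, 30), ...
groups = pd.cut(patient_dets["PatientAge"], bins=edges, right=False)

# Sum the counts that fall into each bin; observed=False keeps empty bins with a total of 0.
totals = patient_dets.groupby(groups, observed=False)["PatientAgecounts"].sum()
print(totals)
```

Calling totals.plot.bar() on the result would then draw one bar per ten-year interval, including a zero-height bar for any interval with no rows.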
Answered by Laura
You can use pd.cut() to bin your data, and then plot the result with plot.bar():
import numpy as np
import pandas as pd

nBins = 10
# nBins + 1 evenly spaced edges give nBins equal-width bins
my_bins = np.linspace(patient_dets.Age.min(), patient_dets.Age.max(), nBins + 1)
# include_lowest=True keeps the minimum age inside the first bin
binned = pd.cut(patient_dets.Age, bins=my_bins, include_lowest=True)
patient_dets.groupby(binned, observed=False)['Counts'].sum().plot.bar()
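Since the question asks specifically about matplotlib, one alternative that neither answer uses is the weights parameter of Axes.hist, which adds each row's count to its bin instead of adding 1. The sketch below assumes the sample rows from the question, hard-codes ten-year edges from 10 to 70 for that sample, and selects the Agg backend so it runs without a display; all three are illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this also runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Sample rows copied from the question; the remaining rows are elided there.
ages = np.array([60, 45, 21, 34, 10])
counts = np.array([1204, 700, 400, 56, 150])

# Ten-year bin edges covering the sample ages (assumption: 10 to 70).
edges = np.arange(10, 71, 10)

fig, ax = plt.subplots()
# weights=counts makes every age contribute its count to the bin it falls in.
heights, used_edges, _ = ax.hist(ages, bins=edges, weights=counts)
ax.set_xlabel("Age")
ax.set_ylabel("Counts")
```

Here heights holds the summed count per bin and used_edges echoes the edges passed in. Unlike pd.cut with right=False, matplotlib treats the last bin as closed on both ends.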