pandas: plot a histogram using two columns (values, counts) of a Python dataframe

Question

Asked by zingsy

I have a dataframe with multiple columns in pairs: if one column holds values, the adjacent column holds the corresponding counts. I want to plot a histogram using the values as the x variable and the counts as the frequency.

For example, I have the following columns:

   Age    Counts
   60     1204
   45      700
   21      400
   .       .
   .       .
   34       56
   10      150

I want my code to bin the Age values in ten-year intervals between the minimum and maximum values, get the cumulative frequencies for each interval from the Counts column, and then plot a histogram. Is there a way to do this using matplotlib?

I have tried the following but in vain:

patient_dets.plot(x='PatientAge', y='PatientAgecounts', kind='hist')

(patient_dets is the dataframe with 'PatientAge' and 'PatientAgecounts' as columns)

Answer 1

Answered by jezrael

I think you need Series.plot.bar:

patient_dets.set_index('PatientAge')['PatientAgecounts'].plot.bar()
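As a rough, self-contained sketch of the line above, the following builds a small frame from the five sample rows shown in the question and draws one bar per age. The `sort_index()` call, the axis labels, and the `Agg` backend line are additions for illustration, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; remove to open a plot window
import pandas as pd

# Sample rows taken from the question; the real frame has more rows.
patient_dets = pd.DataFrame({
    "PatientAge": [60, 45, 21, 34, 10],
    "PatientAgecounts": [1204, 700, 400, 56, 150],
})

# Use the ages as the index (sorted so bars run from youngest to oldest),
# then draw one bar per age with the count as the bar height.
counts_by_age = patient_dets.set_index("PatientAge")["PatientAgecounts"].sort_index()
ax = counts_by_age.plot.bar()
ax.set_xlabel("PatientAge")
ax.set_ylabel("PatientAgecounts")
```

This gives one bar per distinct age rather than per age range, which is why binning is the next step.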

If you need bins, one possible solution is pd.cut:

import numpy as np
import pandas as pd

# helper df with the label, minimum age and maximum age of each group
df1 = pd.DataFrame({'G':['14 yo and younger','15-19','20-24','25-29','30-34',
                         '35-39','40-44','45-49','50-54','55-59','60-64','65+'], 
                     'Min':[0, 15,20,25,30,35,40,45,50,55,60,65], 
                     'Max':[14,19,24,29,34,39,44,49,54,59,64,120]})

print (df1)
                    G  Max  Min
0   14 yo and younger   14    0
1               15-19   19   15
2               20-24   24   20
3               25-29   29   25
4               30-34   34   30
5               35-39   39   35
6               40-44   44   40
7               45-49   49   45
8               50-54   54   50
9               55-59   59   55
10              60-64   64   60
11                65+  120   65

cutoff = np.hstack([np.array(df1.Min[0]), df1.Max.values])
labels = df1.G.values

patient_dets['Groups'] = pd.cut(patient_dets.PatientAge, bins=cutoff, labels=labels, right=True, include_lowest=True)
print (patient_dets)
   PatientAge  PatientAgecounts             Groups
0          60              1204              60-64
1          45               700              45-49
2          21               400              20-24
3          34                56              30-34
4          10               150  14 yo and younger

# sum the counts within each age group; observed=False keeps empty groups as zero-height bars
patient_dets.groupby('Groups', observed=False)['PatientAgecounts'].sum().plot.bar()
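Since the question asks specifically about matplotlib, here is a different route from the pd.cut approach above, offered as a sketch only: `Axes.hist` accepts a `weights` argument, so each age can be weighted by its count. The ten-year edge computation (`low`, `high`, `edges`) and the sample data are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; remove to open a plot window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Sample rows taken from the question; the real frame has more rows.
patient_dets = pd.DataFrame({
    "PatientAge": [60, 45, 21, 34, 10],
    "PatientAgecounts": [1204, 700, 400, 56, 150],
})

ages = patient_dets["PatientAge"].to_numpy()
counts = patient_dets["PatientAgecounts"].to_numpy()

# Ten-year bin edges that cover the observed age range (10, 20, ..., 70 here).
low = (ages.min() // 10) * 10
high = (ages.max() // 10 + 1) * 10
edges = np.arange(low, high + 1, 10)

# weights= makes each age contribute its count instead of 1 to its bin.
fig, ax = plt.subplots()
heights, _, _ = ax.hist(ages, bins=edges, weights=counts)
ax.set_xlabel("PatientAge")
ax.set_ylabel("Total count")
```

Note that matplotlib bins are closed on the left and open on the right, except the last one, so an age of 20 falls in the 20 to 30 bin.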

Answer 2

Answered by Laura

You can use pd.cut() to bin your data, then sum the counts per bin and plot them with plot.bar():

import pandas as pd

nBins = 10
# passing an integer to pd.cut splits the Age range into nBins equal-width intervals
age_bins = pd.cut(patient_dets.Age, bins=nBins)

# sum the counts in each interval and draw one bar per interval
patient_dets.groupby(age_bins, observed=False)['Counts'].sum().plot.bar()
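If the bins should be exactly ten years wide, as the question asks, rather than ten equal-width intervals, explicit edges can be passed to pd.cut instead of an integer. The following is a minimal sketch under that assumption; the edge computation, the `right=False` choice, and the sample data are illustrative and not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; remove to open a plot window
import numpy as np
import pandas as pd

# Sample rows taken from the question, using its Age / Counts column names.
patient_dets = pd.DataFrame({
    "Age": [60, 45, 21, 34, 10],
    "Counts": [1204, 700, 400, 56, 150],
})

# Ten-year edges covering the observed range: 10, 20, ..., 70 for this sample.
low = (patient_dets["Age"].min() // 10) * 10
high = (patient_dets["Age"].max() // 10 + 1) * 10
edges = np.arange(low, high + 1, 10)

# right=False makes intervals closed on the left, e.g. [10, 20), so age 10 is included.
age_bins = pd.cut(patient_dets["Age"], bins=edges, right=False)

# observed=False keeps intervals with no rows, which then sum to 0.
totals = patient_dets.groupby(age_bins, observed=False)["Counts"].sum()
ax = totals.plot.bar()
ax.set_ylabel("Total count")
```

Keeping the empty intervals means the x axis stays evenly spaced, which reads more like a histogram than a bar chart of only the occupied bins.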
