pandas - Plotting the distribution of a column variable

Question

Asked by Mike S

I'm trying to visualize some data, but I'm not very experienced with the subject, and am having trouble finding the best way to get what I'm looking for. I've searched around and found similar questions, but nothing that answers exactly what I want, so hopefully I'm not duplicating a common question.

Anyway, I have a DataFrame with a column for patient_id (and others, but this is the relevant one). For example:

   patient_id  other_stuff
0      000001          ...
1      000001          ...
2      000001          ...
3      000002          ...
4      000003          ...
5      000003          ...
6      000004          ...
etc

Where each row represents a specific episode that patient had. I want to plot the distribution in which the x axis is the number of episodes a patient had, and the y axis is the number of patients that have had said number of episodes. For example, based on the above, there's one patient with three episodes, one patient with two episodes, and two patients with one episode each, i.e. x = [1, 2, 3], y = [2, 1, 1]. Currently, I do the following:

episode_count_distribution = (
    patients.patient_id
    .value_counts() # the number of rows for each patient_id (i.e. episodes per patient)
    .value_counts() # the number of patients for each possible row count above (i.e. distribution of episodes per patient)
    .sort_index()
)
episode_count_distribution.plot()

This method does what I want, but strikes me as a bit opaque and hard to follow, so I'm wondering if there's a better way.
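For reference, the same pipeline can be checked against the sample rows above by naming each intermediate step. This is only a minimal sketch: the `patients` frame is rebuilt by hand from the example, the IDs are assumed to be strings (to keep the leading zeros), and `episodes_per_patient` is a name introduced here for illustration.

```python
import pandas as pd

# Sample rows from the question; patient_id is assumed to be stored as strings.
patients = pd.DataFrame({
    "patient_id": ["000001", "000001", "000001", "000002",
                   "000003", "000003", "000004"],
})

# Step 1: episodes per patient (number of rows for each patient_id).
episodes_per_patient = patients["patient_id"].value_counts()

# Step 2: number of patients for each episode count, ordered by episode count.
episode_count_distribution = episodes_per_patient.value_counts().sort_index()

x = episode_count_distribution.index.tolist()  # episode counts
y = episode_count_distribution.tolist()        # patients with that many episodes
print(x, y)  # [1, 2, 3] [2, 1, 1]
```

Splitting the chain into two named steps does not change the result; it only makes each `value_counts()` call easier to read.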

Answer 1

Answered by Ami Tavory

You might be looking for something like

df.procedure_id.groupby(df.patient_id).nunique().hist();

Explanation:

df.procedure_id.groupby(df.patient_id).nunique() finds the number of unique procedures per patient.
hist() plots a histogram.
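One point worth checking before using this: `nunique()` counts distinct `procedure_id` values, while the question counts rows (episodes) per patient. The two differ whenever a patient repeats a procedure. The sketch below, on toy data matching this answer's example, compares `nunique()` with `size()`; the variable names are illustrative only.

```python
import pandas as pd

# Toy data: patient 2 has four rows but only three distinct procedures.
df = pd.DataFrame({
    "procedure_id": [3, 2, 3, 2, 4, 1, 2, 3],
    "patient_id":   [1, 2, 3, 2, 1, 2, 3, 2],
})

grouped = df["procedure_id"].groupby(df["patient_id"])
unique_procedures = grouped.nunique()  # distinct procedure_id values per patient
row_counts = grouped.size()            # rows per patient, repeats included

print(unique_procedures.to_dict())  # {1: 2, 2: 3, 3: 2}
print(row_counts.to_dict())         # {1: 2, 2: 4, 3: 2}
```

If each row is one episode, as in the question, `size()` is probably the closer match. Calling `.hist()` on either result plots it the same way.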

Example

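import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'procedure_id': [3, 2, 3, 2, 4, 1, 2, 3], 'patient_id': [1, 2, 3, 2, 1, 2, 3, 2]})
df.procedure_id.groupby(df.patient_id).nunique().hist()
plt.xlabel('num treatments')  # x axis: unique procedures per patient
plt.ylabel('num patients')    # y axis: patients with that count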
