Python Pandas: Plotting the Results of a GroupBy

Question

Asked by Maxim Zaslavsky

I'm starting to learn Pandas and am trying to find the most Pythonic (or panda-thonic?) ways to do certain tasks.

Suppose we have a DataFrame with columns A, B, and C.

Column A contains boolean values: each row's A value is either true or false.
Column B has some important values we want to plot.

We want to discover the subtle differences between the B values of rows where A is false and the B values of rows where A is true.

In other words, how can I group by the value of column A (either true or false), then plot the values of column B for both groups on the same graph? The two datasets should be colored differently so that the points can be told apart.

Next, let's add another feature to this program: before graphing, we want to compute another value for each row and store it in column D. This value is the mean of all data stored in B for the entire five minutes before a record - but we only include rows that have the same boolean value stored in A.

In other words, if I have a row where A=True and time=t, I want to compute a value for column D that is the mean of B for all records from time t-5 to t that have the same A=True.

In this case, how can we execute the groupby on values of A, then apply this computation to each individual group, and finally plot the D values for the two groups?

Answer 1

Accepted answer by unutbu

I think @herrfz hit all the high points. I'll just flesh out the details:

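import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

sin = np.sin
cos = np.cos
pi = np.pi
N = 100

# Sample data: a sine curve for the A=True rows, a cosine curve for A=False
x = np.linspace(0, pi, N)
a = sin(x)
b = cos(x)

df = pd.DataFrame({
    'A': [True]*N + [False]*N,
    'B': np.hstack((a, b))
    })

# Group by the scalar column name so that key is True/False rather than a tuple
for key, grp in df.groupby('A'):
    plt.plot(grp['B'], label=key)
    # pd.rolling_mean was removed from pandas; Series.rolling(...).mean() replaces it.
    # assign() returns a new frame, which avoids writing into a slice of df.
    grp = grp.assign(D=grp['B'].rolling(window=5).mean())
    plt.plot(grp['D'], label='rolling ({k})'.format(k=key))
plt.legend(loc='best')
plt.show()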
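
Note that window=5 in the loop above means five rows, not five minutes. If the data carries timestamps, a time-based window may be closer to what the question asks for. The following is only a sketch under assumptions: the `time` column, the one-minute spacing, and the sample values are hypothetical and not part of the original question or answer. It uses a `'5min'` offset window, which by default covers the interval (t-5min, t], so the row at exactly t-5 is excluded.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one record per minute, A alternating between True and False
times = pd.date_range("2024-01-01 00:00", periods=20, freq="min")
df = pd.DataFrame({
    "time": times,
    "A": [True, False] * 10,
    "B": np.arange(20, dtype=float),
})

# Offset-based windows need a sorted datetime index
df = df.sort_values("time").set_index("time")

# For each row, D is the mean of B over the preceding five minutes,
# computed only from rows that share the same value of A
df["D"] = df.groupby("A")["B"].transform(lambda s: s.rolling("5min").mean())

print(df.head(8))
```

Because `transform` returns a result aligned with the original index, D lands directly in `df`, and the groups can then be plotted with the same `for key, grp in df.groupby('A')` loop as above.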
