Pandas groupby mean - into a dataframe?

Question

Asked by Craig

Say my data looks like this:

date,name,id,dept,sale1,sale2,sale3,total_sale
1/1/17,John,50,Sales,50.0,60.0,70.0,180.0
1/1/17,Mike,21,Engg,43.0,55.0,2.0,100.0
1/1/17,Jane,99,Tech,90.0,80.0,70.0,240.0
1/2/17,John,50,Sales,60.0,70.0,80.0,210.0
1/2/17,Mike,21,Engg,53.0,65.0,12.0,130.0
1/2/17,Jane,99,Tech,100.0,90.0,80.0,270.0
1/3/17,John,50,Sales,40.0,50.0,60.0,150.0
1/3/17,Mike,21,Engg,53.0,55.0,12.0,120.0
1/3/17,Jane,99,Tech,80.0,70.0,60.0,210.0
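To reproduce the examples in this thread, the sample data above can be loaded from a string with io.StringIO and pandas.read_csv. This is a minimal sketch: the variable name csv_text is illustrative, and in practice you would pass a file path to read_csv instead.

```python
import io

import pandas as pd

# The sample data from the question, as CSV text
csv_text = """date,name,id,dept,sale1,sale2,sale3,total_sale
1/1/17,John,50,Sales,50.0,60.0,70.0,180.0
1/1/17,Mike,21,Engg,43.0,55.0,2.0,100.0
1/1/17,Jane,99,Tech,90.0,80.0,70.0,240.0
1/2/17,John,50,Sales,60.0,70.0,80.0,210.0
1/2/17,Mike,21,Engg,53.0,65.0,12.0,130.0
1/2/17,Jane,99,Tech,100.0,90.0,80.0,270.0
1/3/17,John,50,Sales,40.0,50.0,60.0,150.0
1/3/17,Mike,21,Engg,53.0,55.0,12.0,120.0
1/3/17,Jane,99,Tech,80.0,70.0,60.0,210.0
"""

# read_csv accepts any file-like object, so StringIO stands in for a real file
df = pd.read_csv(io.StringIO(csv_text))
print(df.dtypes)
```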

I want a new column, average, which is the average of total_sale for each name, id, dept tuple.

I tried

df.groupby(['name', 'id', 'dept'])['total_sale'].mean()

This does return a Series with the means:

name  id  dept 
Jane  99  Tech     240.000000
John  50  Sales    180.000000
Mike  21  Engg     116.666667
Name: total_sale, dtype: float64

But how would I reference the data? The result is a one-dimensional Series of shape (3,). Ideally I would like to put this back into a DataFrame with proper columns, so that I can reference it by name/id/dept.

Answer 1

Answered by Nathan

If you call .reset_index() on the Series you have, you will get a DataFrame like the one you want (each level of the index is converted into a column):

df.groupby(['name', 'id', 'dept'])['total_sale'].mean().reset_index()
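A self-contained sketch of this, using a trimmed copy of the question's data (only the grouping keys and total_sale; date and sale1 to sale3 are omitted because they are not needed here). Once the keys are ordinary columns, rows can be picked out with a boolean mask:

```python
import pandas as pd

# Trimmed copy of the question's data (only the columns used here)
df = pd.DataFrame({
    "name": ["John", "Mike", "Jane"] * 3,
    "id": [50, 21, 99] * 3,
    "dept": ["Sales", "Engg", "Tech"] * 3,
    "total_sale": [180.0, 100.0, 240.0, 210.0, 130.0, 270.0, 150.0, 120.0, 210.0],
})

# reset_index() moves each index level (name, id, dept) into its own column
means = df.groupby(["name", "id", "dept"])["total_sale"].mean().reset_index()
print(means)

# The keys are now ordinary columns, so boolean filtering works
john = means.loc[means["name"] == "John", "total_sale"].iloc[0]
print(john)  # 180.0
```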

EDIT: To respond to the OP's comment, adding this column back to your original DataFrame is a little trickier. The grouped result does not have the same number of rows as the original DataFrame, so you can't assign it as a new column directly. However, if you give both the same index, pandas is smart enough to align them and fill in the values for you. Try this:

import pandas as pd

cols = ['date','name','id','dept','sale1','sale2','sale3','total_sale']
data = [
['1/1/17', 'John', 50, 'Sales', 50.0, 60.0, 70.0, 180.0],
['1/1/17', 'Mike', 21, 'Engg', 43.0, 55.0, 2.0, 100.0],
['1/1/17', 'Jane', 99, 'Tech', 90.0, 80.0, 70.0, 240.0],
['1/2/17', 'John', 50, 'Sales', 60.0, 70.0, 80.0, 210.0],
['1/2/17', 'Mike', 21, 'Engg', 53.0, 65.0, 12.0, 130.0],
['1/2/17', 'Jane', 99, 'Tech', 100.0, 90.0, 80.0, 270.0],
['1/3/17', 'John', 50, 'Sales', 40.0, 50.0, 60.0, 150.0],
['1/3/17', 'Mike', 21, 'Engg', 53.0, 55.0, 12.0, 120.0],
['1/3/17', 'Jane', 99, 'Tech', 80.0, 70.0, 60.0, 210.0]
]
df = pd.DataFrame(data, columns=cols)

mean_col = df.groupby(['name', 'id', 'dept'])['total_sale'].mean() # don't reset the index!
df = df.set_index(['name', 'id', 'dept']) # make the same index here
df['mean_col'] = mean_col
df = df.reset_index() # to take the hierarchical index off again
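A common alternative, not part of the original answer, is GroupBy.transform('mean'). It returns a result aligned to the original rows (each row gets its group's mean), so no index juggling is needed. A minimal sketch on a trimmed copy of the question's data; the column name average follows the question:

```python
import pandas as pd

# Trimmed copy of the question's data (only the columns used here)
df = pd.DataFrame({
    "name": ["John", "Mike", "Jane"] * 3,
    "id": [50, 21, 99] * 3,
    "dept": ["Sales", "Engg", "Tech"] * 3,
    "total_sale": [180.0, 100.0, 240.0, 210.0, 130.0, 270.0, 150.0, 120.0, 210.0],
})

# transform returns one value per original row (that row's group mean),
# so the result can be assigned directly as a new column
df["average"] = df.groupby(["name", "id", "dept"])["total_sale"].transform("mean")
print(df)
```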

Answer 2

Answered by A.Kot

You are very close. You simply need to add an extra set of brackets, writing [['total_sale']], to tell pandas to select a DataFrame rather than a Series:

df.groupby(['name', 'id', 'dept'])[['total_sale']].mean()
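The difference between single and double brackets can be checked directly. A small sketch on a trimmed copy of the question's data: selecting a single label gives a Series result, while selecting a list of labels (even a one-element list) gives a DataFrame.

```python
import pandas as pd

# Trimmed copy of the question's data (only the columns used here)
df = pd.DataFrame({
    "name": ["John", "Mike", "Jane"] * 3,
    "id": [50, 21, 99] * 3,
    "dept": ["Sales", "Engg", "Tech"] * 3,
    "total_sale": [180.0, 100.0, 240.0, 210.0, 130.0, 270.0, 150.0, 120.0, 210.0],
})

grouped = df.groupby(["name", "id", "dept"])

# Single brackets select one column, so the aggregate is a Series
as_series = grouped["total_sale"].mean()
# Double brackets select a list of columns, so the aggregate is a DataFrame
as_frame = grouped[["total_sale"]].mean()

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
```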

If you also want the grouping keys as ordinary columns, pass as_index=False:

df.groupby(['name', 'id', 'dept'], as_index=False)['total_sale'].mean()
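Because the as_index=False result keeps the keys as ordinary columns, it can also be merged back onto the original rows to produce the per-row average column the question asks for. A sketch assuming a left merge on the three key columns (which keeps the original row order); the rename to average is illustrative:

```python
import pandas as pd

# Trimmed copy of the question's data (only the columns used here)
df = pd.DataFrame({
    "name": ["John", "Mike", "Jane"] * 3,
    "id": [50, 21, 99] * 3,
    "dept": ["Sales", "Engg", "Tech"] * 3,
    "total_sale": [180.0, 100.0, 240.0, 210.0, 130.0, 270.0, 150.0, 120.0, 210.0],
})

keys = ["name", "id", "dept"]
# as_index=False keeps the keys as regular columns in the aggregated result
avg = df.groupby(keys, as_index=False)["total_sale"].mean()
avg = avg.rename(columns={"total_sale": "average"})

# A left merge on the key columns copies each group's average onto its rows
df = df.merge(avg, on=keys, how="left")
print(df)
```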

Answer 3

Answered by YOBEN_S

Call .to_frame() on the result to convert the Series into a single-column DataFrame:

df.groupby(['name', 'id', 'dept'])['total_sale'].mean().to_frame()
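Series.to_frame also accepts a name argument, which sets the label of the resulting column; this gives the column the name average that the question wants. A minimal sketch on a trimmed copy of the question's data:

```python
import pandas as pd

# Trimmed copy of the question's data (only the columns used here)
df = pd.DataFrame({
    "name": ["John", "Mike", "Jane"] * 3,
    "id": [50, 21, 99] * 3,
    "dept": ["Sales", "Engg", "Tech"] * 3,
    "total_sale": [180.0, 100.0, 240.0, 210.0, 130.0, 270.0, 150.0, 120.0, 210.0],
})

# to_frame turns the Series into a one-column DataFrame; name= sets the column label
avg = df.groupby(["name", "id", "dept"])["total_sale"].mean().to_frame(name="average")
print(avg)

# Chaining reset_index() additionally turns the index levels into columns
flat = avg.reset_index()
print(list(flat.columns))  # ['name', 'id', 'dept', 'average']
```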

Answer 4

Answered by Tahir Ahmad

The answer is in two lines of code:

The first line creates a DataFrame with a hierarchical (MultiIndex) index of name, id and dept:

df_mean = df.groupby(['name', 'id', 'dept'])[['total_sale']].mean()

The second line converts it to a DataFrame with four columns ('name', 'id', 'dept', 'total_sale'):

df_mean = df_mean.reset_index()
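To address the original question of referencing values by name/id/dept, here is a sketch of both styles on a trimmed copy of the question's data: before reset_index() a group can be looked up with a tuple of its index values via .loc, and afterwards it can be found by filtering on the ordinary columns.

```python
import pandas as pd

# Trimmed copy of the question's data (only the columns used here)
df = pd.DataFrame({
    "name": ["John", "Mike", "Jane"] * 3,
    "id": [50, 21, 99] * 3,
    "dept": ["Sales", "Engg", "Tech"] * 3,
    "total_sale": [180.0, 100.0, 240.0, 210.0, 130.0, 270.0, 150.0, 120.0, 210.0],
})

df_mean = df.groupby(["name", "id", "dept"])[["total_sale"]].mean()

# Before reset_index(): look up a group by the full tuple of its index values
before = df_mean.loc[("John", 50, "Sales"), "total_sale"]
print(before)  # 180.0

# After reset_index(): filter on the now-ordinary columns
df_mean = df_mean.reset_index()
row = df_mean[(df_mean["name"] == "John") & (df_mean["dept"] == "Sales")]
after = row["total_sale"].iloc[0]
print(after)  # 180.0
```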