Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/47360510/

Time: 2020-09-14 04:47:09  Source: igfitidea

Pandas groupby and aggregation output should include all the original columns (including the ones not aggregated on)

python pandas dataframe group-by pandas-groupby

Asked by Growler

I have the following data frame and want to:

  • Group records by month
  • Sum QTY_SOLD and NET_AMT of each unique UPC_ID (per month)
  • Include the rest of the columns as well in the resulting dataframe

The way I thought I could do this is to first create a month column to aggregate the D_DATE values, then sum QTY_SOLD by UPC_ID.

Script:

# Imports the snippet relies on (assumed: dt is the standard datetime module)
import datetime as dt

import pandas as pd

# Convert the date strings to datetime values
df['D_DATE'] = pd.to_datetime(df['D_DATE'])

# Create the month column used for grouping
df['month'] = df['D_DATE'].apply(dt.date.strftime, args=('%Y.%m',))

# Group by month and sum up quantity sold by UPC_ID
df = df.groupby(['month', 'UPC_ID'])['QTY_SOLD'].sum()
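
For anyone who wants to reproduce the behaviour, here is a minimal sketch of the same steps on a hand-typed copy of the sample rows from the question's table. The literal frame and the name `result` are assumptions made for illustration, and `.dt.strftime` is swapped in for the `apply` call as an equivalent way to build the month string.

```python
import pandas as pd

# Sample rows typed in by hand from the question's table (an assumption,
# not the asker's real data).
df = pd.DataFrame({
    "UPC_ID": [111, 222, 333, 111, 111],
    "UPC_DSC": ["desc1", "desc2", "desc3", "desc1", "desc1"],
    "D_DATE": ["2/26/2017", "2/26/2017", "2/26/2017", "3/1/2017", "3/3/2017"],
    "QTY_SOLD": [2, 3, 1, 1, 4],
    "NET_AMT": [10, 15, 4, 5, 20],
})

# Parse the dates and derive a "YYYY.MM" month key with the .dt accessor.
df["D_DATE"] = pd.to_datetime(df["D_DATE"], format="%m/%d/%Y")
df["month"] = df["D_DATE"].dt.strftime("%Y.%m")

# Selecting a single column before sum() yields a Series whose index is
# the (month, UPC_ID) pair, so every other column is dropped.
result = df.groupby(["month", "UPC_ID"])["QTY_SOLD"].sum()
print(result)
```

Because `result` is a Series indexed by (month, UPC_ID), only one value column appears and the month label is printed once per group.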


Current data frame:

UPC_ID | UPC_DSC | D_DATE | QTY_SOLD | NET_AMT
----------------------------------------------
111      desc1    2/26/2017   2         10 (2 x )
222      desc2    2/26/2017   3         15
333      desc3    2/26/2017   1         4
111      desc1    3/1/2017    1         5
111      desc1    3/3/2017    4         20

Desired Output:

MONTH | UPC_ID | QTY_SOLD | NET_AMT | UPC_DSC
----------------------------------------------
2017-2      111     2         10       etc...
2017-2      222     3         15
2017-2      333     1         4
2017-3      111     5         25

Actual Output:

MONTH | UPC_ID  
----------------------------------------------
2017-2      111     2
            222     3
            333     1
2017-3      111     5
...  

Questions:

  • How do I include the month for each row?
  • How do I include the rest of the columns of the dataframe?
  • How do I also sum NET_AMT in addition to QTY_SOLD?

Answered by cs95

agg with a dict of functions

Create a dict of functions and pass it to agg. You'll also need as_index=False to prevent the group columns from becoming the index in your output.

f = {'NET_AMT': 'sum', 'QTY_SOLD': 'sum', 'UPC_DSC': 'first'}
df.groupby(['month', 'UPC_ID'], as_index=False).agg(f)

     month  UPC_ID UPC_DSC  NET_AMT  QTY_SOLD
0  2017.02     111   desc1       10         2
1  2017.02     222   desc2       15         3
2  2017.02     333   desc3        4         1
3  2017.03     111   desc1       25         5
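
As a self-contained variant of the dict approach, the sketch below uses the keyword form of agg (named aggregation, available since pandas 0.25), which also pins the order of the output columns. The sample frame is typed in by hand from the question and the name `out` is made up for illustration.

```python
import pandas as pd

# Hand-typed copy of the question's sample rows (an assumption).
df = pd.DataFrame({
    "UPC_ID": [111, 222, 333, 111, 111],
    "UPC_DSC": ["desc1", "desc2", "desc3", "desc1", "desc1"],
    "D_DATE": ["2/26/2017", "2/26/2017", "2/26/2017", "3/1/2017", "3/3/2017"],
    "QTY_SOLD": [2, 3, 1, 1, 4],
    "NET_AMT": [10, 15, 4, 5, 20],
})
df["month"] = pd.to_datetime(df["D_DATE"], format="%m/%d/%Y").dt.strftime("%Y.%m")

# Each keyword is an output column: (source column, aggregation function).
out = df.groupby(["month", "UPC_ID"], as_index=False).agg(
    QTY_SOLD=("QTY_SOLD", "sum"),
    NET_AMT=("NET_AMT", "sum"),
    UPC_DSC=("UPC_DSC", "first"),
)
print(out)
```

Taking "first" for UPC_DSC assumes the description is constant within each UPC_ID, as it is in the sample rows.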


Blanket sum

Just call sum without any column names. This handles the numeric columns. For UPC_DSC, you'll need to handle it separately.

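g = df.groupby(['month', 'UPC_ID'])
# Sum only the numeric columns (explicit, since newer pandas no longer skips the rest by default)
i = g.sum(numeric_only=True)
# Take the first description seen in each group
j = g[['UPC_DSC']].first()

# Join the two results side by side; axis must be passed by keyword in recent pandas
pd.concat([i, j], axis=1).reset_index()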

     month  UPC_ID  QTY_SOLD  NET_AMT UPC_DSC
0  2017.02     111         2       10   desc1
1  2017.02     222         3       15   desc2
2  2017.02     333         1        4   desc3
3  2017.03     111         5       25   desc1
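
The blanket-sum idea can be sketched end to end as follows. This assumes a recent pandas release, where, as far as I know (2.0 and later), groupby sum no longer skips non-numeric columns by default, so `numeric_only=True` is passed explicitly. The sample frame is typed in by hand and the names `sums`, `firsts` and `out` are made up for illustration.

```python
import pandas as pd

# Hand-typed copy of the question's sample rows (an assumption).
df = pd.DataFrame({
    "UPC_ID": [111, 222, 333, 111, 111],
    "UPC_DSC": ["desc1", "desc2", "desc3", "desc1", "desc1"],
    "D_DATE": pd.to_datetime(
        ["2/26/2017", "2/26/2017", "2/26/2017", "3/1/2017", "3/3/2017"],
        format="%m/%d/%Y",
    ),
    "QTY_SOLD": [2, 3, 1, 1, 4],
    "NET_AMT": [10, 15, 4, 5, 20],
})
df["month"] = df["D_DATE"].dt.strftime("%Y.%m")

g = df.groupby(["month", "UPC_ID"])

# Sum the numeric columns only; the string and datetime columns are skipped.
sums = g.sum(numeric_only=True)

# Carry the description along by taking its first value per group.
firsts = g[["UPC_DSC"]].first()

# Both results share the (month, UPC_ID) index, so they align column-wise.
out = pd.concat([sums, firsts], axis=1).reset_index()
print(out)
```

The two intermediate frames share the same (month, UPC_ID) index, which is what lets concat line them up before reset_index turns the keys back into columns.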

Answered by YOBEN_S

I have been thinking about this for a long time; thanks for your question, which pushed me to work it out. Use agg with an if...else:

df.groupby(['month', 'UPC_ID'],as_index=False).agg(lambda x : x.sum() if x.dtype=='int64' else x.head(1))
Out[1221]: 
   month  UPC_ID UPC_DSC     D_DATE  QTY_SOLD  NET_AMT
0      2     111   desc1 2017-02-26         2       10
1      2     222   desc2 2017-02-26         3       15
2      2     333   desc3 2017-02-26         1        4
3      3     111   desc1 2017-03-01         5       25
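
A hedged variant of the same idea, in case it helps: the sketch below replaces the 'int64' comparison with `pandas.api.types.is_numeric_dtype`, so float columns are summed too, and returns a scalar via `iloc[0]` instead of a one-row Series from `head(1)`. The helper name `sum_or_first` and the hand-typed sample frame are assumptions for illustration, not part of the original answer.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hand-typed copy of the question's sample rows (an assumption).
df = pd.DataFrame({
    "UPC_ID": [111, 222, 333, 111, 111],
    "UPC_DSC": ["desc1", "desc2", "desc3", "desc1", "desc1"],
    "D_DATE": pd.to_datetime(
        ["2/26/2017", "2/26/2017", "2/26/2017", "3/1/2017", "3/3/2017"],
        format="%m/%d/%Y",
    ),
    "QTY_SOLD": [2, 3, 1, 1, 4],
    "NET_AMT": [10, 15, 4, 5, 20],
})
df["month"] = df["D_DATE"].dt.strftime("%Y.%m")


def sum_or_first(col: pd.Series):
    """Hypothetical helper: sum numeric columns, else keep the first value."""
    return col.sum() if is_numeric_dtype(col) else col.iloc[0]


# agg calls the helper once per column within each (month, UPC_ID) group.
out = df.groupby(["month", "UPC_ID"], as_index=False).agg(sum_or_first)
print(out)
```

Note that any numeric identifier column left outside the group keys would be summed as well, so this shortcut fits best when the non-key numeric columns are all genuine measures.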