Python Pandas pivot table row subtotals

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/15570099/

Time: 2020-08-18 20:26:02  Source: igfitidea

Pandas Pivot tables row subtotals

python pandas pivot-table

Asked by balsagoth

I'm using Pandas 0.10.1

Consider this DataFrame:

Date       State   City    SalesToday  SalesMTD  SalesYTD
20130320     stA    ctA            20       400      1000
20130320     stA    ctB            30       500      1100
20130320     stB    ctC            10       500       900
20130320     stB    ctD            40       200      1300
20130320     stC    ctF            30       300       800

How can I get subtotals per state?

State   City  SalesToday  SalesMTD  SalesYTD
  stA    ALL          50       900      2100
  stA    ctA          20       400      1000
  stA    ctB          30       500      1100

I tried a pivot table, but I can only get subtotals in columns:

table = pivot_table(df, values=['SalesToday', 'SalesMTD','SalesYTD'],\
                     rows=['State','City'], aggfunc=np.sum, margins=True)

I can achieve this in Excel with a pivot table.

Accepted answer by Wes McKinney

If you don't put both State and City in the rows, you'll get separate margins. Reshape and you get the table you're after:

In [10]: table = pivot_table(df, values=['SalesToday', 'SalesMTD','SalesYTD'],\
                     rows=['State'], cols=['City'], aggfunc=np.sum, margins=True)


In [11]: table.stack('City')
Out[11]: 
            SalesMTD  SalesToday  SalesYTD
State City                                
stA   All        900          50      2100
      ctA        400          20      1000
      ctB        500          30      1100
stB   All        700          50      2200
      ctC        500          10       900
      ctD        200          40      1300
stC   All        300          30       800
      ctF        300          30       800
All   All       1900         130      5100
      ctA        400          20      1000
      ctB        500          30      1100
      ctC        500          10       900
      ctD        200          40      1300
      ctF        300          30       800

I admit this isn't totally obvious.
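
For readers on a recent pandas release, the same idea can be sketched with the newer keyword names. This assumes a version where pivot_table takes index= and columns= (as used in a later answer on this page) in place of rows= and cols=. The frame below only re-creates the question's data, and the names value_cols, table and stacked are illustrative.

```python
import pandas as pd

# Re-create the question's data (Date omitted; it is not needed for the totals).
df = pd.DataFrame({
    'State': ['stA', 'stA', 'stB', 'stB', 'stC'],
    'City': ['ctA', 'ctB', 'ctC', 'ctD', 'ctF'],
    'SalesToday': [20, 30, 10, 40, 30],
    'SalesMTD': [400, 500, 500, 200, 300],
    'SalesYTD': [1000, 1100, 900, 1300, 800],
})

value_cols = ['SalesToday', 'SalesMTD', 'SalesYTD']

# State on the rows, City on the columns: margins=True then adds an 'All'
# column per value (the per-state subtotal) and an 'All' row (per-city totals).
table = pd.pivot_table(df, values=value_cols, index=['State'],
                       columns=['City'], aggfunc='sum', margins=True)

# Move City from the columns into the row index. Depending on the pandas
# version, State/City combinations with no data may be kept as all-NaN rows,
# so they are dropped explicitly here.
stacked = table.stack('City').dropna(how='all')

print(stacked)
```

Each state's 'All' row is its subtotal, and the rows under the 'All' state hold the per-city and grand totals.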

Answer by Rutger Kassies

You can get the summarized values by using groupby() on the State column.

Let's make some sample data first:

import pandas as pd
import StringIO

incsv = StringIO.StringIO("""Date,State,City,SalesToday,SalesMTD,SalesYTD
20130320,stA,ctA,20,400,1000
20130320,stA,ctB,30,500,1100
20130320,stB,ctC,10,500,900
20130320,stB,ctD,40,200,1300
20130320,stC,ctF,30,300,800""")

df = pd.read_csv(incsv, index_col=['Date'], parse_dates=True)

Then apply the groupby function and add a column City:

dfsum = df.groupby('State', as_index=False).sum()
dfsum['City'] = 'All'

print(dfsum)

  State  SalesToday  SalesMTD  SalesYTD City
0   stA          50       900      2100  All
1   stB          50       700      2200  All
2   stC          30       300       800  All

We can append the original data to the summed df by using append:

dfsum = dfsum.append(df).set_index(['State','City']).sort_index()

print(dfsum)

            SalesMTD  SalesToday  SalesYTD
State City                                
stA   All        900          50      2100
      ctA        400          20      1000
      ctB        500          30      1100
stB   All        700          50      2200
      ctC        500          10       900
      ctD        200          40      1300
stC   All        300          30       800
      ctF        300          30       800

I added the set_index and sort_index to make it look more like your example output; they are not strictly necessary to get the results.
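
DataFrame.append was removed in pandas 2.0, so on current versions the same steps could be written with pd.concat. The following is a minimal sketch under that assumption, written for Python 3. The names value_cols, subtotals and result are chosen here for illustration and are not part of the original answer.

```python
import io
import pandas as pd

incsv = io.StringIO("""Date,State,City,SalesToday,SalesMTD,SalesYTD
20130320,stA,ctA,20,400,1000
20130320,stA,ctB,30,500,1100
20130320,stB,ctC,10,500,900
20130320,stB,ctD,40,200,1300
20130320,stC,ctF,30,300,800""")
df = pd.read_csv(incsv)

value_cols = ['SalesToday', 'SalesMTD', 'SalesYTD']

# Per-state subtotals, labelled 'All' in the City column.
subtotals = df.groupby('State', as_index=False)[value_cols].sum()
subtotals['City'] = 'All'

# Combine the subtotal rows with the detail rows, then index and sort so that
# each state's 'All' row comes first (uppercase 'A' sorts before the
# lowercase city names).
result = (
    pd.concat([subtotals, df[['State', 'City'] + value_cols]], ignore_index=True)
    .set_index(['State', 'City'])
    .sort_index()
)

print(result)
```

Selecting value_cols before calling sum() keeps the aggregation limited to the numeric sales columns.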

Answer by Richard Mao

How about this one?

table = pd.pivot_table(data, index=['State'],columns = ['City'],values=['SalesToday', 'SalesMTD','SalesYTD'],\
                      aggfunc=np.sum, margins=True)

Answer by hs moon

I think this subtotal example code is what you want (similar to Excel's subtotal).

I assume that you want to group by columns A, B, C and D, then count the values of column E:

main_df.groupby(['A', 'B', 'C']).apply(lambda sub_df: sub_df\
       .pivot_table(index=['D'], values=['E'], aggfunc='count', margins=True))

output:

A B C  D  E
       a  1 
a a a  b  2
       c  2
     all  5
       a  3 
b b a  b  2
       c  2
     all  7
       a  3 
b b b  b  6
       c  2
       d  3
     all 14
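
As a hedged illustration of the pattern above, here is a small self-contained sketch. The data in main_df is made up for the example (it is not the data behind the output shown), and selecting [['D', 'E']] before apply is a choice made here so that only the columns the pivot needs are passed to the function. The helper name count_with_subtotal is hypothetical.

```python
import pandas as pd

# Hypothetical sample data, invented for this sketch.
main_df = pd.DataFrame({
    'A': ['a', 'a', 'a', 'b', 'b'],
    'B': ['a', 'a', 'a', 'b', 'b'],
    'C': ['a', 'a', 'a', 'a', 'a'],
    'D': ['a', 'b', 'b', 'a', 'a'],
    'E': [1, 2, 3, 4, 5],
})

def count_with_subtotal(sub_df):
    # Count the E values for each D; margins=True appends an 'All' row
    # holding the count for the whole (A, B, C) group.
    return sub_df.pivot_table(index=['D'], values=['E'],
                              aggfunc='count', margins=True)

# One small pivot table per (A, B, C) group, stacked into a single frame
# indexed by A, B, C and D.
result = main_df.groupby(['A', 'B', 'C'])[['D', 'E']].apply(count_with_subtotal)

print(result)
```

The 'All' row inside each group plays the role of the Excel-style subtotal line.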