Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original URL: http://stackoverflow.com/questions/53414960/

Time: 2020-08-19 20:18:06  Source: igfitidea

How do I create a sum row and sum column in pandas?

python pandas

Asked by Wayne Werner

I'm going through the Khan Academy course on Statistics as a bit of a refresher from my college days, and as a way to get me up to speed on pandas & other scientific Python.

I've got a table that looks like this from Khan Academy:

             | Undergraduate | Graduate | Total
-------------+---------------+----------+------
Straight A's |           240 |       60 |   300
-------------+---------------+----------+------
Not          |         3,760 |      440 | 4,200
-------------+---------------+----------+------
Total        |         4,000 |      500 | 4,500

I would like to recreate this table using pandas. Of course I could create a DataFrame using something like

"Graduate": {...},
"Undergraduate": {...},
"Total": {...},

But that seems like a naive approach that would both fall over quickly and just not really be extensible.

I've got the non-totals part of the table like this:

import pandas as pd

df = pd.DataFrame(
    {
        "Undergraduate": {"Straight A's": 240, "Not": 3_760},
        "Graduate": {"Straight A's": 60, "Not": 440},
    }
)
df

I've been looking and found a couple of promising things, like:

df['Total'] = df.sum(axis=1)

But I didn't find anything terribly elegant.

I did find the crosstab function, which looks like it should do what I want, but it seems that in order to use it I'd have to create a dataframe consisting of 1/0 for all of these values, which seems silly because I've already got an aggregate.

I have found some approaches that seem to manually build a new totals row, but it seems like there should be a better way, something like:

totals(df, rows=True, columns=True)

or something.

Does this exist in pandas, or do I have to just cobble together my own approach?

Answered by Archie

Or in two steps, using the .sum() function as you suggested (which might be a bit more readable as well):

import pandas as pd

df = pd.DataFrame(
    {
        "Undergraduate": {"Straight A's": 240, "Not": 3_760},
        "Graduate": {"Straight A's": 60, "Not": 440},
    }
)

# Total sum per column:
df.loc['Total', :] = df.sum(axis=0)

# Total sum per row:
df.loc[:, 'Total'] = df.sum(axis=1)

Output:

              Graduate  Undergraduate  Total
Not                440           3760   4200
Straight A's        60            240    300
Total              500           4000   4500
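
The two steps above can be wrapped into something like the `totals(df, rows=True, columns=True)` helper the question wishes for. This is only a minimal sketch: the function name and the `label` parameter are hypothetical (pandas has no built-in with this name), and it returns a copy instead of modifying `df` in place.

```python
import pandas as pd


def totals(df, rows=True, columns=True, label="Total"):
    """Return a copy of df with an optional totals row and/or totals column."""
    out = df.copy()
    if rows:
        # Column sums become a new row named `label`.
        out.loc[label, :] = out.sum(axis=0)
    if columns:
        # Row sums (including the new totals row, if any) become a new column.
        out.loc[:, label] = out.sum(axis=1)
    return out


df = pd.DataFrame(
    {
        "Undergraduate": {"Straight A's": 240, "Not": 3_760},
        "Graduate": {"Straight A's": 60, "Not": 440},
    }
)
with_totals = totals(df)
```

Depending on the pandas version, adding the row through `.loc` may upcast the integers to floats, so the values can display as `4500.0` rather than `4500`.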

Answered by piRSquared

append and assign

The point of this answer is to provide an inline, and not an in-place, solution.

append

I use append to stack a Series or DataFrame vertically. It also creates a copy, so that I can continue to chain.

assign

I use assign to add a column. However, the DataFrame I'm working on is in the in-between nether space: it is an intermediate result of the chain and has no name. So I pass a lambda as the assign argument, which tells pandas to apply it to the calling DataFrame.

df.append(df.sum().rename('Total')).assign(Total=lambda d: d.sum(1))

              Graduate  Undergraduate  Total
Not                440           3760   4200
Straight A's        60            240    300
Total              500           4000   4500
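
Note that `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so the one-liner above fails on current versions. A sketch of the same inline chain using `pd.concat` instead (this is a swapped-in replacement, not the original answer's code; the names `total_row` and `result` are only for illustration):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Undergraduate": {"Straight A's": 240, "Not": 3_760},
        "Graduate": {"Straight A's": 60, "Not": 440},
    }
)

# Column sums as a one-row DataFrame whose index label is 'Total'.
total_row = df.sum().rename("Total").to_frame().T

# concat returns a new object, so the chain can continue with assign;
# the lambda receives the concatenated frame, totals row included.
result = pd.concat([df, total_row]).assign(Total=lambda d: d.sum(axis=1))
```

As with the original chain, `df` itself is left unchanged.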


Fun alternative

Uses drop with errors='ignore' to get rid of potentially pre-existing Total rows and columns.

Also, still inline.

def tc(d):
  return d.assign(Total=d.drop('Total', errors='ignore', axis=1).sum(1))

df.pipe(tc).T.pipe(tc).T

              Graduate  Undergraduate  Total
Not                440           3760   4200
Straight A's        60            240    300
Total              500           4000   4500
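
Because `tc` drops any existing Total before summing, running the chain on a frame that already has totals should leave it unchanged. A small sketch to illustrate this (the helper name `add_totals` and the variables `once` and `twice` are assumptions added for the example):

```python
import pandas as pd


def tc(d):
    # Drop an existing 'Total' column (if any) before summing across each row.
    return d.assign(Total=d.drop('Total', errors='ignore', axis=1).sum(axis=1))


def add_totals(d):
    # Totals column first, then transpose so the same function adds the totals row.
    return d.pipe(tc).T.pipe(tc).T


df = pd.DataFrame(
    {
        "Undergraduate": {"Straight A's": 240, "Not": 3_760},
        "Graduate": {"Straight A's": 60, "Not": 440},
    }
)

once = add_totals(df)
twice = add_totals(once)  # totals are recomputed, not double counted
```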

Answered by YOBEN_S

With the original (unaggregated) data you would use crosstab directly; if you only have the aggregated input shown in the question, you just need melt before crosstab:

s = df.reset_index().melt('index')
pd.crosstab(index=s['index'], columns=s.variable, values=s.value, aggfunc='sum', margins=True)
Out[33]: 
variable      Graduate  Undergraduate   All
index                                      
Not                440           3760  4200
Straight A's        60            240   300
All                500           4000  4500
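
If the margins should be labelled Total, as in the table from the question, rather than the default All, `crosstab` accepts a `margins_name` argument. A self-contained sketch of the same melt-then-crosstab approach (the variable name `table` is only for illustration):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Undergraduate": {"Straight A's": 240, "Not": 3_760},
        "Graduate": {"Straight A's": 60, "Not": 440},
    }
)

# Long format: one row per (index, variable) pair with its value.
s = df.reset_index().melt('index')

table = pd.crosstab(
    index=s['index'],
    columns=s['variable'],
    values=s['value'],
    aggfunc='sum',
    margins=True,
    margins_name='Total',  # label for the margin row and column
)
```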


Toy data

df = pd.DataFrame({'c1': [1, 2, 2, 3, 4], 'c2': [2, 2, 3, 3, 3], 'c3': [1, 2, 3, 4, 5]})
# Raw data before aggregation; I assume your input is the result of a `groupby`
df
Out[37]: 
   c1  c2  c3
0   1   2   1
1   2   2   2
2   2   3   3
3   3   3   4
4   4   3   5


pd.crosstab(df.c1, df.c2, df.c3, aggfunc='sum', margins=True)
Out[38]: 
c2     2     3  All
c1                 
1    1.0   NaN    1
2    2.0   3.0    5
3    NaN   4.0    4
4    NaN   5.0    5
All  3.0  12.0   15

Answered by TimeSeam

The original data is:

>>> df = pd.DataFrame(dict(Undergraduate=[240, 3760], Graduate=[60, 440]), index=["Straight A's", "Not"])
>>> df
Out: 
              Graduate  Undergraduate
Straight A's        60            240
Not                440           3760

You only need df.T to recreate this table:

>>> df_new = df.T
>>> df_new
Out: 
               Straight A's   Not
Graduate                 60   440
Undergraduate           240  3760

After computing the Total by rows and columns:

>>> df_new.loc['Total',:]= df_new.sum(axis=0)
>>> df_new.loc[:,'Total'] = df_new.sum(axis=1)
>>> df_new
Out: 
               Straight A's     Not   Total
Graduate               60.0   440.0   500.0
Undergraduate         240.0  3760.0  4000.0
Total                 300.0  4200.0  4500.0
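
The totals above display as floats because adding the new row through `.loc` upcast the integer columns. If integer output is preferred, one option is to cast back at the end. A minimal sketch, assuming every value is a whole number (the `.copy()` on the transpose is an extra precaution that is not in the original):

```python
import pandas as pd

df = pd.DataFrame(
    dict(Undergraduate=[240, 3760], Graduate=[60, 440]),
    index=["Straight A's", "Not"],
)

df_new = df.T.copy()
df_new.loc['Total', :] = df_new.sum(axis=0)
df_new.loc[:, 'Total'] = df_new.sum(axis=1)

# All values are whole numbers, so casting back to int loses nothing.
df_new = df_new.astype(int)
```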