Original question: http://stackoverflow.com/questions/53414960/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
How do I create a sum row and sum column in pandas?
Asked by Wayne Werner
I'm going through the Khan Academy course on Statistics as a bit of a refresher from my college days, and as a way to get me up to speed on pandas & other scientific Python.
I've got a table that looks like this from Khan Academy:
             | Undergraduate | Graduate | Total
-------------+---------------+----------+------
Straight A's |           240 |       60 |   300
-------------+---------------+----------+------
Not          |         3,760 |      440 | 4,200
-------------+---------------+----------+------
Total        |         4,000 |      500 | 4,500
I would like to recreate this table using pandas. Of course I could create a DataFrame using something like
"Graduate": {...},
"Undergraduate": {...},
"Total": {...},
But that seems like a naive approach that would both fall over quickly and just not really be extensible.
I've got the non-totals part of the table like this:
import pandas as pd

df = pd.DataFrame(
    {
        "Undergraduate": {"Straight A's": 240, "Not": 3_760},
        "Graduate": {"Straight A's": 60, "Not": 440},
    }
)
df
I've been looking and found a couple of promising things, like:
df['Total'] = df.sum(axis=1)
But I didn't find anything terribly elegant.
I did find the crosstab function that looks like it should do what I want, but it seems like in order to do that I'd have to create a dataframe consisting of 1/0 for all of these values, which seems silly because I've already got an aggregate.
I have found some approaches that seem to manually build a new totals row, but it seems like there should be a better way, something like:
totals(df, rows=True, columns=True)
or something.
Does this exist in pandas, or do I have to just cobble together my own approach?
Answered by Archie
Or in two steps, using the .sum() function as you suggested (which might be a bit more readable as well):
import pandas as pd
df = pd.DataFrame(
    {
        "Undergraduate": {"Straight A's": 240, "Not": 3_760},
        "Graduate": {"Straight A's": 60, "Not": 440},
    }
)
# Total sum per column:
df.loc['Total', :] = df.sum(axis=0)
# Total sum per row:
df.loc[:, 'Total'] = df.sum(axis=1)
Output:
              Graduate  Undergraduate  Total
Not                440           3760   4200
Straight A's        60            240    300
Total              500           4000   4500
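The question asks for something like totals(df, rows=True, columns=True). As far as this thread shows, pandas has no such built-in, but the two steps above could be wrapped in a small helper. The sketch below is only an illustration: the name add_totals and its parameters are hypothetical, not part of pandas.

```python
import pandas as pd


def add_totals(df, rows=True, columns=True, name="Total"):
    """Hypothetical helper: return a copy of df with a totals row and/or column."""
    out = df.copy()
    if rows:
        # Column sums become a new bottom row labelled `name`.
        out.loc[name] = out.sum(axis=0)
    if columns:
        # Row sums (including the totals row, if added) become a new column.
        out[name] = out.sum(axis=1)
    return out


df = pd.DataFrame(
    {
        "Undergraduate": {"Straight A's": 240, "Not": 3_760},
        "Graduate": {"Straight A's": 60, "Not": 440},
    }
)
result = add_totals(df)
print(result)
```

Because the helper works on a copy, the original df is left without a Total row or column.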
Answered by piRSquared
append and assign
The point of this answer is to provide an inline, not an in-place, solution.
append
I use append to stack a Series or DataFrame vertically. It also creates a copy so that I can continue to chain.
assign
I use assign to add a column. However, the DataFrame I'm working on is in the in-between nether space: an intermediate result in the chain that has no name yet. So I use a lambda in the assign argument, which tells pandas to apply it to the calling DataFrame.
df.append(df.sum().rename('Total')).assign(Total=lambda d: d.sum(1))
              Graduate  Undergraduate  Total
Not                440           3760   4200
Straight A's        60            240    300
Total              500           4000   4500
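Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the one-liner above only runs on older versions. A possible equivalent that keeps the same inline, non-mutating style is sketched below using pd.concat. This is a substitution, not the answer author's code, and the variable names total_row and result are just for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Undergraduate": {"Straight A's": 240, "Not": 3_760},
        "Graduate": {"Straight A's": 60, "Not": 440},
    }
)

# df.sum() is a Series of column totals; to_frame().T turns it into
# a one-row DataFrame whose index label is "Total".
total_row = df.sum().rename("Total").to_frame().T

# pd.concat stacks the totals row under df and returns a new object;
# assign with a lambda then adds the row totals to that intermediate result.
result = pd.concat([df, total_row]).assign(Total=lambda d: d.sum(axis=1))
print(result)
```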
Fun alternative
Uses drop with errors='ignore' to get rid of potentially pre-existing Total rows and columns.
Also, still inline.
def tc(d):
    return d.assign(Total=d.drop('Total', errors='ignore', axis=1).sum(1))

df.pipe(tc).T.pipe(tc).T
              Graduate  Undergraduate  Total
Not                440           3760   4200
Straight A's        60            240    300
Total              500           4000   4500
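One way to see what drop with errors='ignore' buys here is to run the pipeline twice: the existing Total column is dropped and recomputed on each pass, so the totals should not be double-counted. A minimal check, where the names once and twice are illustrative only:

```python
import pandas as pd


def tc(d):
    # Drop any existing "Total" column before summing, then (re)assign it.
    return d.assign(Total=d.drop("Total", errors="ignore", axis=1).sum(axis=1))


df = pd.DataFrame(
    {
        "Undergraduate": {"Straight A's": 240, "Not": 3_760},
        "Graduate": {"Straight A's": 60, "Not": 440},
    }
)

once = df.pipe(tc).T.pipe(tc).T
twice = once.pipe(tc).T.pipe(tc).T  # re-running on a frame that already has totals

print(twice)
print((once == twice).all().all())
```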
Answered by YOBEN_S
With the original data you would use crosstab; working only from your input, you just need melt before crosstab:
s=df.reset_index().melt('index')
pd.crosstab(index=s['index'],columns=s.variable,values=s.value,aggfunc='sum',margins=True)
Out[33]:
variable      Graduate  Undergraduate   All
index
Not                440           3760  4200
Straight A's        60            240   300
All                500           4000  4500
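The margins are labelled All by default. If the table should say Total, as in the question, crosstab accepts a margins_name argument. A short sketch of the same melt-then-crosstab approach with that one change (the variable name table is illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Undergraduate": {"Straight A's": 240, "Not": 3_760},
        "Graduate": {"Straight A's": 60, "Not": 440},
    }
)

# Long format: one row per (index, variable) pair with its value.
s = df.reset_index().melt("index")

table = pd.crosstab(
    index=s["index"],
    columns=s["variable"],
    values=s["value"],
    aggfunc="sum",
    margins=True,
    margins_name="Total",  # label for the margin row and column instead of "All"
)
print(table)
```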
Toy data
df=pd.DataFrame({'c1':[1,2,2,3,4],'c2':[2,2,3,3,3],'c3':[1,2,3,4,5]})
# before `agg`, I think your input is the result after `groupby`
df
Out[37]:
   c1  c2  c3
0   1   2   1
1   2   2   2
2   2   3   3
3   3   3   4
4   4   3   5
pd.crosstab(df.c1, df.c2, df.c3, aggfunc='sum', margins=True)
Out[38]:
c2     2     3  All
c1
1    1.0   NaN    1
2    2.0   3.0    5
3    NaN   4.0    4
4    NaN   5.0    5
All  3.0  12.0   15
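For comparison, the same toy table can also be built with DataFrame.pivot_table, which takes a margins flag as well. This is offered as an alternative spelling under the same toy data, not as part of the original answer:

```python
import pandas as pd

df = pd.DataFrame(
    {"c1": [1, 2, 2, 3, 4], "c2": [2, 2, 3, 3, 3], "c3": [1, 2, 3, 4, 5]}
)

# Sum c3 for every (c1, c2) pair and append "All" margins for rows and columns.
table = df.pivot_table(
    index="c1", columns="c2", values="c3", aggfunc="sum", margins=True
)
print(table)
```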
Answered by TimeSeam
The original data is:
>>> df = pd.DataFrame(dict(Undergraduate=[240, 3760], Graduate=[60, 440]), index=["Straight A's", "Not"])
>>> df
Out:
              Graduate  Undergraduate
Straight A's        60            240
Not                440           3760
You only need df.T to recreate this table:
>>> df_new = df.T
>>> df_new
Out:
               Straight A's   Not
Graduate                 60   440
Undergraduate           240  3760
After computing the Total by rows and columns:
>>> df_new.loc['Total',:]= df_new.sum(axis=0)
>>> df_new.loc[:,'Total'] = df_new.sum(axis=1)
>>> df_new
Out:
               Straight A's     Not   Total
Graduate               60.0   440.0   500.0
Undergraduate         240.0  3760.0  4000.0
Total                 300.0  4200.0  4500.0
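The totals above are displayed as floats (60.0, 4500.0) because adding a row through .loc can upcast the integer columns. If whole numbers are wanted, one option is to cast back once the totals are in place. This is a small sketch that assumes the frame contains no missing values, since astype(int) fails on NaN:

```python
import pandas as pd

df = pd.DataFrame(
    dict(Undergraduate=[240, 3760], Graduate=[60, 440]),
    index=["Straight A's", "Not"],
)

df_new = df.T
df_new.loc["Total", :] = df_new.sum(axis=0)   # totals row
df_new.loc[:, "Total"] = df_new.sum(axis=1)   # totals column

# Cast every column back to integers (safe here: no NaN values present).
df_new = df_new.astype(int)
print(df_new)
```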