原文地址: http://stackoverflow.com/questions/21752399/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Pandas dataframe total row
Question by Daniel
I have a dataframe, something like:
foo bar qux
0 a 1 3.14
1 b 3 2.72
2 c 2 1.62
3 d 9 1.41
4 e 3 0.58
and I would like to add a 'total' row to the end of the dataframe:
foo bar qux
0 a 1 3.14
1 b 3 2.72
2 c 2 1.62
3 d 9 1.41
4 e 3 0.58
5 tot 18 9.47
I've tried to use the sum command, but I end up with a Series which, although I can convert it back to a DataFrame, doesn't maintain the data types:
tot_row = pd.DataFrame(df.sum()).T
tot_row['foo'] = 'tot'
tot_row.dtypes:
foo object
bar object
qux object
I would like to maintain the data types from the original data frame as I need to apply other operations to the total row, something like:
baz = 2*tot_row['qux'] + 3*tot_row['bar']
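A minimal sketch of one way to recover the numeric dtypes, assuming pandas 0.21 or later (where DataFrame.infer_objects exists) and the sample data from the question: build the total row as above, let pandas re-infer the object columns, then combine the frames with pd.concat.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'foo': ['a', 'b', 'c', 'd', 'e'],
    'bar': [1, 3, 2, 9, 3],
    'qux': [3.14, 2.72, 1.62, 1.41, 0.58],
})

# Same start as the question: summing a mixed frame yields an object Series,
# so every column of the transposed one-row frame has object dtype.
tot_row = pd.DataFrame(df.sum()).T
tot_row['foo'] = 'tot'

# infer_objects() re-infers dtypes for object columns
# (bar -> int64, qux -> float64; foo stays object).
tot_row = tot_row.infer_objects()

# The arithmetic from the question now runs on numeric columns.
baz = 2 * tot_row['qux'] + 3 * tot_row['bar']

# Concatenating frames whose dtypes match keeps bar as int64.
result = pd.concat([df, tot_row], ignore_index=True)
print(result)
print(result.dtypes)
```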
Accepted answer by jmz
Append a totals row with
df.append(df.sum(numeric_only=True), ignore_index=True)
The conversion is necessary only if you have a column of strings or objects.
It's a bit of a fragile solution, though, so I'd recommend sticking to operations on the dataframe, e.g.:
baz = 2*df['qux'].sum() + 3*df['bar'].sum()
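DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0. A rough equivalent of the append line above using pd.concat, assuming the question's sample data, might look like this. Because the summed Series mixes an int and a float total, it is float64, so bar comes back as float64 in the combined frame and foo is NaN in the new row.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'foo': ['a', 'b', 'c', 'd', 'e'],
    'bar': [1, 3, 2, 9, 3],
    'qux': [3.14, 2.72, 1.62, 1.41, 0.58],
})

# Sum only the numeric columns; foo is skipped and becomes NaN in the new row.
totals = df.sum(numeric_only=True)

# to_frame().T turns the Series into a one-row DataFrame that concat can stack.
with_total = pd.concat([df, totals.to_frame().T], ignore_index=True)
print(with_total)

# The approach recommended above: compute directly from the column sums
# instead of reading values back out of a total row.
baz = 2 * df['qux'].sum() + 3 * df['bar'].sum()
print(baz)
```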
Answer by rmschne
Use DataFrame.pivot_table with margins=True:
import pandas as pd
data = [('a',1,3.14),('b',3,2.72),('c',2,1.62),('d',9,1.41),('e',3,.58)]
df = pd.DataFrame(data, columns=('foo', 'bar', 'qux'))
Original df:
foo bar qux
0 a 1 3.14
1 b 3 2.72
2 c 2 1.62
3 d 9 1.41
4 e 3 0.58
Since pivot_table requires some sort of grouping (without the index argument, it'll raise a ValueError: No group keys passed!), and your original index is vacuous, we'll use the foo column:
df.pivot_table(index='foo',
margins=True,
margins_name='total', # defaults to 'All'
aggfunc='sum') # newer pandas versions warn when the builtin sum is passed
Voilà!
bar qux
foo
a 1 3.14
b 3 2.72
c 2 1.62
d 9 1.41
e 3 0.58
total 18 9.47
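One caveat worth checking: pivot_table groups by foo, so rows that share a foo value are merged into a single row. A small sketch using made-up data with a repeated key (the values are hypothetical, not from the question), plus reset_index() to turn foo back into an ordinary column so the layout resembles the question's desired output:

```python
import pandas as pd

# Hypothetical data in which the key 'a' appears twice.
df = pd.DataFrame({
    'foo': ['a', 'a', 'b'],
    'bar': [1, 2, 3],
    'qux': [0.5, 0.25, 1.0],
})

table = df.pivot_table(index='foo',
                       margins=True,
                       margins_name='tot',
                       aggfunc='sum')

# The two 'a' rows are summed into one; the margin row is labelled 'tot'.
flat = table.reset_index()
print(flat)
```

If every foo value is unique, as in the question, nothing is merged and only the margin row is added.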
Answer by frishrash
Alternative way (verified on Pandas 0.18.1):
import numpy as np
total = df.apply(np.sum)
total['foo'] = 'tot'
df.append(pd.DataFrame(total.values, index=total.keys()).T, ignore_index=True)
Result:
foo bar qux
0 a 1 3.14
1 b 3 2.72
2 c 2 1.62
3 d 9 1.41
4 e 3 0.58
5 tot 18 9.47
Answer by ihightower
The following helped me add a column total and a row total to a dataframe.
Assume dft1 is your original dataframe. Add a column total and a row total with the following steps:
from io import StringIO
import pandas as pd
#create dataframe string
dfstr = StringIO(u"""
a;b;c
1;1;1
2;2;2
3;3;3
4;4;4
5;5;5
""")
#create dataframe dft1 from string
dft1 = pd.read_csv(dfstr, sep=";")
## add a column total to dft1
dft1['Total'] = dft1.sum(axis=1)
## add a row total to dft1 with the following steps
sum_row = dft1.sum(axis=0) #get sum_row first
dft1_sum=pd.DataFrame(data=sum_row).T #change it to a dataframe
dft1_sum=dft1_sum.reindex(columns=dft1.columns) #line up the col index to dft1
dft1_sum.index = ['row_total'] #change row index to row_total
dft1 = pd.concat([dft1, dft1_sum]) # append the row to dft1 (DataFrame.append was removed in pandas 2.0)
Answer by Matthias Kauer
df.loc["Total"] = df.sum()
works for me, and I find it easier to remember. Am I missing something? It probably wasn't possible in earlier versions.
I'd actually like to add the total row only temporarily, though. Adding it permanently is good for display but makes further calculations a hassle.
I just found:
df.append(df.sum().rename('Total'))
This prints what I want in a Jupyter notebook and appears to leave the df itself untouched.
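A minimal sketch contrasting the permanent and temporary approaches on the question's data, with pd.concat standing in for DataFrame.append (removed in pandas 2.0). It passes numeric_only=True because a plain df.sum() would also "sum" the string column foo by concatenating its values into 'abcde'.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'foo': ['a', 'b', 'c', 'd', 'e'],
    'bar': [1, 3, 2, 9, 3],
    'qux': [3.14, 2.72, 1.62, 1.41, 0.58],
})

# Permanent: setting with enlargement adds a 'Total' row to the frame itself
# (done on a copy here so df stays available for comparison).
permanent = df.copy()
permanent.loc['Total'] = permanent.sum(numeric_only=True)

# Temporary: build a new frame just for display; df keeps its 5 rows.
totals = df.sum(numeric_only=True).rename('Total')
display_only = pd.concat([df, totals.to_frame().T])

print(display_only)
print(len(df))  # 5
```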
Answer by SammyRod
This is how I do it: transpose, use the assign method with a lambda function, then transpose back. It keeps things simple for me.
df.T.assign(GrandTotal = lambda x: x.sum(axis=1)).T
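One thing to be aware of with this approach: transposing a frame whose columns have different dtypes forces a common dtype, so after the round trip an int column can come back as float (and a string column would turn every column into object). A small sketch, assuming numeric-only data taken from the question's bar and qux columns:

```python
import pandas as pd

# Numeric columns from the question's example.
df = pd.DataFrame({'bar': [1, 3, 2, 9, 3],
                   'qux': [3.14, 2.72, 1.62, 1.41, 0.58]})

# Transpose, add a column holding each original column's sum, transpose back.
result = df.T.assign(GrandTotal=lambda x: x.sum(axis=1)).T

print(result)
print(result.dtypes)  # bar is now float64 instead of int64
```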
Answer by Pedro Moisés Camacho Ureña
Building on jmz's answer:
df.append(df.sum(numeric_only=True), ignore_index=True)
If you want to keep your current index, you can name the sum Series using .rename() as follows:
df.append(df.sum().rename('Total'))
This will add a row at the bottom of the table.
Answer by Sarah
Building on the answer from Matthias Kauer:
To add a row total:
df.loc["Row_Total"] = df.sum()
To add a column total:
df.loc[:,"Column_Total"] = df.sum(axis=1)
Answer by Poudel
This gives totals on both rows and columns:
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [10,20],'b':[100,200],'c': ['a','b']})
df.loc['Column_Total']= df.sum(numeric_only=True, axis=0)
df.loc[:,'Row_Total'] = df.sum(numeric_only=True, axis=1)
print(df)
a b c Row_Total
0 10.0 100.0 a 110.0
1 20.0 200.0 b 220.0
Column_Total 30.0 300.0 NaN 330.0
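In the output above, the integer columns a and b are printed as floats after the total row is added with .loc. If keeping integer dtypes matters, one possible variant (a sketch, not the answer author's method) builds the total row as its own one-row frame and concatenates it:

```python
import pandas as pd

df = pd.DataFrame({'a': [10, 20], 'b': [100, 200], 'c': ['a', 'b']})

# One-row frame of column sums; the non-numeric column 'c' is skipped.
col_total = df.sum(numeric_only=True).rename('Column_Total').to_frame().T

# Stack it under the data; 'c' is absent from col_total, so it is NaN there.
result = pd.concat([df, col_total])

# Add the per-row totals, again summing only the numeric columns.
result['Row_Total'] = result.sum(axis=1, numeric_only=True)

print(result)
print(result.dtypes)
```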

