Python: get the total of a Pandas column
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/41286569/
Get total of Pandas column
Asked by LearningToJava
Target
I have a Pandas data frame, shown below, with multiple columns, and I would like to get the total of the column MyColumn.
Data frame df:
print df
   X  MyColumn      Y      Z
0  A        84   13.0   69.0
1  B        76   77.0  127.0
2  C        28   69.0   16.0
3  D        28   28.0   31.0
4  E        19   20.0   85.0
5  F        84  193.0   70.0
My attempt:
I tried to get the sum of the column using groupby and .sum():
Total = df.groupby['MyColumn'].sum()
print Total
This causes the following error:
TypeError: 'instancemethod' object has no attribute '__getitem__'
Expected Output
I expected the output to be as follows:
319
Alternatively, I would like df to be edited with a new row entitled TOTAL containing the total:
       X  MyColumn      Y      Z
0      A        84   13.0   69.0
1      B        76   77.0  127.0
2      C        28   69.0   16.0
3      D        28   28.0   31.0
4      E        19   20.0   85.0
5      F        84  193.0   70.0
TOTAL          319
Answered by jezrael
You should use sum:
Total = df['MyColumn'].sum()
print (Total)
319
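For anyone who wants to run this end to end, here is a minimal sketch. The data frame construction is my own reconstruction from the values printed in the question, not part of the original answer. It also shows why the attempt in the question fails: groupby is a method, so subscripting it with square brackets raises a TypeError before any grouping happens.

```python
import pandas as pd

# Rebuild the question's data frame by hand (values copied from the question).
df = pd.DataFrame({
    "X": list("ABCDEF"),
    "MyColumn": [84, 76, 28, 28, 19, 84],
    "Y": [13.0, 77.0, 69.0, 28.0, 20.0, 193.0],
    "Z": [69.0, 127.0, 16.0, 31.0, 85.0, 70.0],
})

# The question's attempt: df.groupby is a bound method, and methods
# cannot be indexed with [...], so this raises a TypeError.
try:
    df.groupby["MyColumn"]
    groupby_failed = False
except TypeError as exc:
    groupby_failed = True
    print("TypeError:", exc)

# Select the column as a Series, then reduce it with Series.sum().
total = df["MyColumn"].sum()
print(total)  # 319
```

No grouping is needed here because the goal is a single total over the whole column rather than one total per group.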
Then you can use loc with a Series; in that case the Series index should be set to the specific column you need to sum:
df.loc['Total'] = pd.Series(df['MyColumn'].sum(), index = ['MyColumn'])
print (df)
         X  MyColumn      Y      Z
0        A      84.0   13.0   69.0
1        B      76.0   77.0  127.0
2        C      28.0   69.0   16.0
3        D      28.0   28.0   31.0
4        E      19.0   20.0   85.0
5        F      84.0  193.0   70.0
Total  NaN     319.0    NaN    NaN
The Series is needed because a scalar would fill every column of the new row:
df.loc['Total'] = df['MyColumn'].sum()
print (df)
         X  MyColumn      Y      Z
0        A        84   13.0   69.0
1        B        76   77.0  127.0
2        C        28   69.0   16.0
3        D        28   28.0   31.0
4        E        19   20.0   85.0
5        F        84  193.0   70.0
Total  319       319  319.0  319.0
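The difference between the two loc assignments can be checked side by side. This is a sketch under an assumption: it uses a trimmed, numeric-only copy of the question's data (my own construction) to keep it short.

```python
import pandas as pd

# Trimmed, numeric-only stand-in for the question's frame (an assumption).
df = pd.DataFrame({
    "MyColumn": [84, 76],
    "Y": [13.0, 77.0],
    "Z": [69.0, 127.0],
})

# Series with a one-label index: only 'MyColumn' gets a value,
# the remaining columns of the new row become NaN.
with_series = df.copy()
with_series.loc["Total"] = pd.Series(with_series["MyColumn"].sum(), index=["MyColumn"])

# Scalar: the same number is broadcast to every column of the new row.
with_scalar = df.copy()
with_scalar.loc["Total"] = with_scalar["MyColumn"].sum()

print(with_series)
print(with_scalar)
```

Working on copies keeps the two variants independent, so the second assignment does not include the row added by the first.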
Two other solutions use at and ix; see the applications below:
df.at['Total', 'MyColumn'] = df['MyColumn'].sum()
print (df)
         X  MyColumn      Y      Z
0        A      84.0   13.0   69.0
1        B      76.0   77.0  127.0
2        C      28.0   69.0   16.0
3        D      28.0   28.0   31.0
4        E      19.0   20.0   85.0
5        F      84.0  193.0   70.0
Total  NaN     319.0    NaN    NaN
df.ix['Total', 'MyColumn'] = df['MyColumn'].sum()
print (df)
         X  MyColumn      Y      Z
0        A      84.0   13.0   69.0
1        B      76.0   77.0  127.0
2        C      28.0   69.0   16.0
3        D      28.0   28.0   31.0
4        E      19.0   20.0   85.0
5        F      84.0  193.0   70.0
Total  NaN     319.0    NaN    NaN
Note: Since Pandas v0.20, ix has been deprecated (and it was removed in pandas 1.0). Use loc or iloc instead.
Answered by Psidom
Another option you can go with here:
df.loc["Total", "MyColumn"] = df.MyColumn.sum()
#         X  MyColumn      Y      Z
#0        A      84.0   13.0   69.0
#1        B      76.0   77.0  127.0
#2        C      28.0   69.0   16.0
#3        D      28.0   28.0   31.0
#4        E      19.0   20.0   85.0
#5        F      84.0  193.0   70.0
#Total  NaN     319.0    NaN    NaN
You can also use the append() method (deprecated in pandas 1.4 and removed in 2.0, where pd.concat takes its place):
df.append(pd.DataFrame(df.MyColumn.sum(), index = ["Total"], columns=["MyColumn"]))
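On pandas versions without DataFrame.append, roughly the same result can be had with pd.concat. The following is a sketch of that substitution, not the original answer's code; the data frame is rebuilt by hand from the question and the name total_row is my own.

```python
import pandas as pd

# Rebuild the question's data frame (values copied from the question).
df = pd.DataFrame({
    "X": list("ABCDEF"),
    "MyColumn": [84, 76, 28, 28, 19, 84],
    "Y": [13.0, 77.0, 69.0, 28.0, 20.0, 193.0],
    "Z": [69.0, 127.0, 16.0, 31.0, 85.0, 70.0],
})

# A one-row frame labelled 'Total' that only has the 'MyColumn' column.
total_row = pd.DataFrame({"MyColumn": [df["MyColumn"].sum()]}, index=["Total"])

# pd.concat returns a new frame; columns missing from total_row become NaN.
result = pd.concat([df, total_row])
print(result)
```

Like append(), this leaves df itself unchanged, so the result has to be assigned to a name to be kept.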
Update:
In case you need to append the sum for all numeric columns, you can do one of the following:
Use append to do this in a functional manner (doesn't change the original data frame):
# select numeric columns and calculate the sums
sums = df.select_dtypes('number').sum().rename('total')
# append sums to the data frame
df.append(sums)
#         X  MyColumn      Y      Z
#0        A      84.0   13.0   69.0
#1        B      76.0   77.0  127.0
#2        C      28.0   69.0   16.0
#3        D      28.0   28.0   31.0
#4        E      19.0   20.0   85.0
#5        F      84.0  193.0   70.0
#total  NaN     319.0  400.0  398.0
Use loc to mutate the data frame in place:
df.loc['total'] = df.select_dtypes('number').sum()
df
#         X  MyColumn      Y      Z
#0        A      84.0   13.0   69.0
#1        B      76.0   77.0  127.0
#2        C      28.0   69.0   16.0
#3        D      28.0   28.0   31.0
#4        E      19.0   20.0   85.0
#5        F      84.0  193.0   70.0
#total  NaN     319.0  400.0  398.0
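Here is a sketch of the "totals for all numeric columns" idea that avoids both pd.np and append(), for versions where those are no longer available. It assumes the question's six-row data frame, rebuilt by hand; passing the string 'number' to select_dtypes and reshaping with to_frame().T are my substitutions rather than the original answer's code.

```python
import pandas as pd

# Rebuild the question's data frame (values copied from the question).
df = pd.DataFrame({
    "X": list("ABCDEF"),
    "MyColumn": [84, 76, 28, 28, 19, 84],
    "Y": [13.0, 77.0, 69.0, 28.0, 20.0, 193.0],
    "Z": [69.0, 127.0, 16.0, 31.0, 85.0, 70.0],
})

# Sum only the numeric columns; the non-numeric column X is left out.
sums = df.select_dtypes(include="number").sum().rename("total")

# Turn the Series into a one-row frame and stack it under the data.
result = pd.concat([df, sums.to_frame().T])
print(result)
```

Note that running an in-place total assignment a second time on a frame that already holds a total row would add that row into the new sums, so it is safer to compute totals from the original rows only.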
Answered by Jeff Crites
Similar to getting the length of a data frame with len(df), the following worked for pandas and blaze:
Total = sum(df['MyColumn'])
or alternatively
Total = sum(df.MyColumn)
print(Total)
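One caveat with the built-in sum(), sketched below: it adds the values one by one in Python and does not skip missing values, whereas Series.sum() skips NaN by default. The small Series here is invented purely for illustration.

```python
import pandas as pd

# A made-up column with one missing value.
s = pd.Series([84.0, float("nan"), 28.0])

# Built-in sum: NaN propagates through the addition.
builtin_total = sum(s)

# Series.sum: NaN is skipped by default (skipna=True).
pandas_total = s.sum()

print(builtin_total)
print(pandas_total)
```

For a column without missing values, as in the question, both approaches give the same total.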
Answered by Suraj Verma
There are two ways to sum a column:
dataset = pd.read_csv("data.csv")
1: sum(dataset.Column_Name)
2: dataset['Column_Name'].sum()
If there is any issue with this, please correct me.
Answered by Ghanshyam Savaliya
As another option, you can do something like the below:
   Group  Valuation  amount
0    BKB       Tube     156
1    BKB       Tube     143
2    BKB       Tube      67
3    BAC       Tube     176
4    BAC       Tube      39
5    JDK       Tube      75
6    JDK       Tube      35
7    JDK       Tube     155
8    ETH       Tube      38
9    ETH       Tube      56
You can use the script below for the above data:
import pandas as pd
data = pd.read_csv("daata1.csv")
bytreatment = data.groupby('Group')
bytreatment['amount'].sum()
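To try the script without a CSV file, the table can be built inline. This is a sketch that assumes the ten rows shown above are the entire file; the names per_group and grand_total are mine.

```python
import pandas as pd

# Inline copy of the table above, in place of reading the CSV file.
data = pd.DataFrame({
    "Group": ["BKB"] * 3 + ["BAC"] * 2 + ["JDK"] * 3 + ["ETH"] * 2,
    "Valuation": ["Tube"] * 10,
    "amount": [156, 143, 67, 176, 39, 75, 35, 155, 38, 56],
})

# One total per group: groupby is called with (...), then the column is selected.
per_group = data.groupby("Group")["amount"].sum()
print(per_group)

# The overall total of the column, with no grouping at all.
grand_total = data["amount"].sum()
print(grand_total)  # 940
```

This gives one sum per Group value; for the single overall total asked about in the question, the plain column sum on the last lines is enough.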