Python: Get the total of a Pandas column

Question

Asked by LearningToJava

Target

I have a Pandas data frame, shown below, with multiple columns, and I would like to get the total of one column, MyColumn.

Data frame df:

print(df)

   X  MyColumn      Y      Z
0  A        84   13.0   69.0
1  B        76   77.0  127.0
2  C        28   69.0   16.0
3  D        28   28.0   31.0
4  E        19   20.0   85.0
5  F        84  193.0   70.0

My attempt:

I have attempted to get the sum of the column using groupby and .sum():

Total = df.groupby['MyColumn'].sum()

print(Total)

This causes the following error:

TypeError: 'instancemethod' object has no attribute '__getitem__'

Expected Output

I expected the output to be as follows:

319

Alternatively, I would like df to be edited with a new row entitled TOTAL containing the total:

       X  MyColumn      Y      Z
0      A        84   13.0   69.0
1      B        76   77.0  127.0
2      C        28   69.0   16.0
3      D        28   28.0   31.0
4      E        19   20.0   85.0
5      F        84  193.0   70.0
TOTAL          319

Answer 1

Answered by jezrael

You should use sum:

Total = df['MyColumn'].sum()
print (Total)
319
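
For context, the attempt in the question appears to fail because groupby is a method, so subscripting it with square brackets raises a TypeError before any summing happens. The sketch below rebuilds the question's data frame by hand (the literal construction is an assumption, since the question only shows the printed frame) and contrasts the failing call with the direct column sum. On Python 3 the error text differs from the Python 2 message quoted in the question.

```python
import pandas as pd

# Hand-built copy of the data frame printed in the question (assumed construction).
df = pd.DataFrame({
    "X": list("ABCDEF"),
    "MyColumn": [84, 76, 28, 28, 19, 84],
    "Y": [13.0, 77.0, 69.0, 28.0, 20.0, 193.0],
    "Z": [69.0, 127.0, 16.0, 31.0, 85.0, 70.0],
})

# df.groupby is a bound method; indexing it with [...] is what raises the error.
try:
    df.groupby["MyColumn"]
    failed = False
except TypeError as exc:
    failed = True
    print("TypeError:", exc)

# Selecting the column first and then calling .sum() gives the total directly.
total = df["MyColumn"].sum()
print(total)  # 319
```

No groupby is needed here, because the goal is a single total rather than one total per group.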

Then use loc with a Series. In that case, the index of the Series should be set to the name of the column you need to sum:

df.loc['Total'] = pd.Series(df['MyColumn'].sum(), index = ['MyColumn'])
print (df)
         X  MyColumn      Y      Z
0        A      84.0   13.0   69.0
1        B      76.0   77.0  127.0
2        C      28.0   69.0   16.0
3        D      28.0   28.0   31.0
4        E      19.0   20.0   85.0
5        F      84.0  193.0   70.0
Total  NaN     319.0    NaN    NaN

The Series is needed because, if you pass a scalar, it is written to every column of the new row:

df.loc['Total'] = df['MyColumn'].sum()
print (df)
         X  MyColumn      Y      Z
0        A        84   13.0   69.0
1        B        76   77.0  127.0
2        C        28   69.0   16.0
3        D        28   28.0   31.0
4        E        19   20.0   85.0
5        F        84  193.0   70.0
Total  319       319  319.0  319.0
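
The difference between the two assignments can be checked side by side. This is a minimal sketch that assumes the question's data frame is rebuilt by hand; the names with_series and with_scalar are only illustrative.

```python
import pandas as pd

# Hand-built copy of the data frame printed in the question (assumed construction).
df = pd.DataFrame({
    "X": list("ABCDEF"),
    "MyColumn": [84, 76, 28, 28, 19, 84],
    "Y": [13.0, 77.0, 69.0, 28.0, 20.0, 193.0],
    "Z": [69.0, 127.0, 16.0, 31.0, 85.0, 70.0],
})
total = df["MyColumn"].sum()

# Series indexed by column name: only MyColumn is filled, the other columns get NaN.
with_series = df.copy()
with_series.loc["Total"] = pd.Series(total, index=["MyColumn"])

# Bare scalar: the same value is broadcast to every column of the new row.
with_scalar = df.copy()
with_scalar.loc["Total"] = total

print(with_series.loc["Total"])
print(with_scalar.loc["Total"])
```

Working on copies keeps the original df unchanged, which makes the two results easy to compare.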

Two other solutions use at and ix; see the examples below:

df.at['Total', 'MyColumn'] = df['MyColumn'].sum()
print (df)
         X  MyColumn      Y      Z
0        A      84.0   13.0   69.0
1        B      76.0   77.0  127.0
2        C      28.0   69.0   16.0
3        D      28.0   28.0   31.0
4        E      19.0   20.0   85.0
5        F      84.0  193.0   70.0
Total  NaN     319.0    NaN    NaN

df.ix['Total', 'MyColumn'] = df['MyColumn'].sum()
print (df)
         X  MyColumn      Y      Z
0        A      84.0   13.0   69.0
1        B      76.0   77.0  127.0
2        C      28.0   69.0   16.0
3        D      28.0   28.0   31.0
4        E      19.0   20.0   85.0
5        F      84.0  193.0   70.0
Total  NaN     319.0    NaN    NaN

Note: Since Pandas v0.20, ix has been deprecated, and it was removed in pandas 1.0. Use loc or iloc instead.

Answer 2

Answered by Psidom

Another option you can use here:

df.loc["Total", "MyColumn"] = df.MyColumn.sum()

#         X  MyColumn      Y       Z
#0        A     84.0    13.0    69.0
#1        B     76.0    77.0   127.0
#2        C     28.0    69.0    16.0
#3        D     28.0    28.0    31.0
#4        E     19.0    20.0    85.0
#5        F     84.0   193.0    70.0
#Total  NaN    319.0     NaN     NaN

You can also use the append() method (note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0):

df.append(pd.DataFrame(df.MyColumn.sum(), index = ["Total"], columns=["MyColumn"]))
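
On pandas versions without DataFrame.append, pd.concat can play the same role. The following is a sketch of that substitution, not part of the original answer; it assumes the question's data frame is rebuilt by hand, and total_row and result are illustrative names.

```python
import pandas as pd

# Hand-built copy of the data frame printed in the question (assumed construction).
df = pd.DataFrame({
    "X": list("ABCDEF"),
    "MyColumn": [84, 76, 28, 28, 19, 84],
    "Y": [13.0, 77.0, 69.0, 28.0, 20.0, 193.0],
    "Z": [69.0, 127.0, 16.0, 31.0, 85.0, 70.0],
})

# One-row frame holding the total, labelled "Total" in the index.
total_row = pd.DataFrame({"MyColumn": [df["MyColumn"].sum()]}, index=["Total"])

# concat returns a new frame; df itself is left unchanged.
result = pd.concat([df, total_row])
print(result)
```

Columns that are missing from total_row (X, Y and Z) come out as NaN in the new row.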

Update:

If you need to append the sum for all numeric columns, you can do one of the following:

Use append to do this in a functional manner (it doesn't change the original data frame):

# select numeric columns and calculate the sums
sums = df.select_dtypes(include='number').sum().rename('total')

# append sums to the data frame
df.append(sums)
#         X  MyColumn      Y      Z
#0        A      84.0   13.0   69.0
#1        B      76.0   77.0  127.0
#2        C      28.0   69.0   16.0
#3        D      28.0   28.0   31.0
#4        E      19.0   20.0   85.0
#5        F      84.0  193.0   70.0
#total  NaN     319.0  400.0  398.0

Use loc to mutate the data frame in place:

df.loc['total'] = df.select_dtypes(include='number').sum()
df
#         X  MyColumn      Y      Z
#0        A      84.0   13.0   69.0
#1        B      76.0   77.0  127.0
#2        C      28.0   69.0   16.0
#3        D      28.0   28.0   31.0
#4        E      19.0   20.0   85.0
#5        F      84.0  193.0   70.0
#total  NaN     319.0  400.0  398.0
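
One thing to watch for with the in-place version: if the statement is run again on a frame that already has a total row, that row is included in the next sum and the totals double (638, 800 and 796 for this data). The sketch below shows the effect and one possible guard, dropping any existing total row before summing; the guard is an addition for illustration, and the data frame is rebuilt by hand.

```python
import pandas as pd

# Hand-built copy of the data frame printed in the question (assumed construction).
df = pd.DataFrame({
    "X": list("ABCDEF"),
    "MyColumn": [84, 76, 28, 28, 19, 84],
    "Y": [13.0, 77.0, 69.0, 28.0, 20.0, 193.0],
    "Z": [69.0, 127.0, 16.0, 31.0, 85.0, 70.0],
})

# First run: the totals are correct.
df.loc["total"] = df.select_dtypes(include="number").sum()
first = df.loc["total", "MyColumn"]

# Second run on the same frame: the existing total row is summed as well.
df.loc["total"] = df.select_dtypes(include="number").sum()
second = df.loc["total", "MyColumn"]

# Guarded version: ignore any existing total row before summing.
data_rows = df.drop(index="total", errors="ignore")
df.loc["total"] = data_rows.select_dtypes(include="number").sum()
guarded = df.loc["total", "MyColumn"]

print(first, second, guarded)
```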

Answer 3

Answered by Jeff Crites

Similar to getting the length of a data frame with len(df), the following worked for both pandas and blaze:

Total = sum(df['MyColumn'])

Or alternatively:

Total = sum(df.MyColumn)
print(Total)
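
Python's built-in sum and the pandas .sum() method agree on the question's data, but they differ when the column contains missing values: the built-in propagates NaN, while Series.sum() skips NaN by default (skipna=True). A small sketch with a made-up Series illustrates this.

```python
import math
import pandas as pd

# Made-up data with one missing value, only to show the difference.
s = pd.Series([1.0, float("nan"), 2.0])

builtin_total = sum(s)   # plain float addition, so NaN propagates
pandas_total = s.sum()   # skipna=True by default, so NaN is ignored

print(builtin_total)  # nan
print(pandas_total)   # 3.0
```

If NaN should propagate with the pandas method as well, s.sum(skipna=False) gives that behaviour.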

Answer 4

Answered by Suraj Verma

There are two ways to sum a column:

dataset = pd.read_csv("data.csv")
# 1: Python's built-in sum
sum(dataset.Column_name)
# 2: the pandas Series method
dataset['Column_name'].sum()

If there is any issue with this, please correct me.

Answer 5

Answered by Ghanshyam Savaliya

As another option, you can do something like the following:

  Group Valuation  amount
0   BKB      Tube     156
1   BKB      Tube     143
2   BKB      Tube      67
3   BAC      Tube     176
4   BAC      Tube      39
5   JDK      Tube      75
6   JDK      Tube      35
7   JDK      Tube     155
8   ETH      Tube      38
9   ETH      Tube      56

You can use the script below for the data above:

import pandas as pd

# Load the data, group the rows by 'Group', and sum 'amount' within each group
data = pd.read_csv("daata1.csv")
bytreatment = data.groupby('Group')
bytreatment['amount'].sum()
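
To try this without a CSV file, the table can be built inline. This sketch assumes the columns are Group, Valuation and amount as laid out in the table; the variable names are illustrative.

```python
import pandas as pd

# Inline copy of the table above (assumed column layout), instead of reading a CSV.
data = pd.DataFrame({
    "Group": ["BKB", "BKB", "BKB", "BAC", "BAC", "JDK", "JDK", "JDK", "ETH", "ETH"],
    "Valuation": ["Tube"] * 10,
    "amount": [156, 143, 67, 176, 39, 75, 35, 155, 38, 56],
})

# One total per group, returned as a Series indexed by the group label.
per_group = data.groupby("Group")["amount"].sum()
print(per_group)

# The overall total of the column, for comparison.
overall = data["amount"].sum()
print(overall)  # 940
```

groupby is the right tool when one total per group is wanted; for a single column total, the plain .sum() on the column is enough.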
