How can I use my Pandas DataFrame to create a pivot table that shows the sum() of grouped values?

Question

Asked by dot.Py

My df1:

               cnpj     num_doc    bc_icms
0    02817342000124  0000010154   17827.07
1    54921580000189  0000112428  108000.00
2    08953538000122  0000012865     232.00
3    08953538000122  0000012865     239.00
4    08953538000122  0000012865     215.00
5    07374346000107  0000014224     320.12
6    07374346000107  0000014231     385.04
7    07374346000107  0000014263     401.28
8    07374346000107  0000014279     391.26
9    02364118000124  0000015263   37353.10
10   02364118000124  0000015264   56214.14

The output of df1.dtypes:

cnpj        object
num_doc     object
bc_icms    float64
dtype: object

I'm trying to create a pivot table to answer the following question:

What is the sum of bc_icms for each cnpj?

This is what I wrote:

indexes = [np.array(df1['cnpj']), np.array(df1['num_doc'])]
pt1 = pd.DataFrame(df1['bc_icms'], index=indexes)
print pt1

And here's the output:

                           bc_icms
02817342000124 0000010154      NaN
54921580000189 0000112428      NaN
08953538000122 0000012865      NaN
               0000012865      NaN
               0000012865      NaN
07374346000107 0000014224      NaN
               0000014231      NaN
               0000014263      NaN
               0000014279      NaN
02364118000124 0000015263      NaN
               0000015264      NaN
               0000015265      NaN
07720786000160 0000020128      NaN

I think this is the pivot table structure that I want, but:

How can I fix these NaNs?
How can I create a "sum" line for each cnpj?

Example in Excel:

Answer 1

Answered by Fabio Lamanna

IIUC, you need the sum of bc_icms for each cnpj value, so I would use groupby:

g = df.groupby('cnpj')['bc_icms'].sum().reset_index(name='sum')

that returns:

             cnpj        sum
0   2364118000124   93567.24
1   2817342000124   17827.07
2   7374346000107    1497.70
3   8953538000122     686.00
4  54921580000189  108000.00
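For reference, here is a self-contained sketch of the same groupby, assuming the frame is rebuilt by hand from the rows shown in the question and named df1 as it is there. The leading zeros are missing from the output above, presumably because cnpj was read as an integer; keeping the column as strings (object dtype, as in the question) preserves them. The pivot_table call is an equivalent way to get the same totals, not something the original answer used.

```python
import pandas as pd

# Rows copied from the question; cnpj and num_doc are kept as strings
# so the leading zeros survive.
df1 = pd.DataFrame({
    'cnpj': ['02817342000124', '54921580000189', '08953538000122',
             '08953538000122', '08953538000122', '07374346000107',
             '07374346000107', '07374346000107', '07374346000107',
             '02364118000124', '02364118000124'],
    'num_doc': ['0000010154', '0000112428', '0000012865', '0000012865',
                '0000012865', '0000014224', '0000014231', '0000014263',
                '0000014279', '0000015263', '0000015264'],
    'bc_icms': [17827.07, 108000.00, 232.00, 239.00, 215.00, 320.12,
                385.04, 401.28, 391.26, 37353.10, 56214.14],
})

# One row per cnpj with the summed bc_icms.
g = df1.groupby('cnpj')['bc_icms'].sum().reset_index(name='sum')

# The same totals via pivot_table, indexed by cnpj.
pt = df1.pivot_table(index='cnpj', values='bc_icms', aggfunc='sum')

print(g)
print(pt)
```

groupby returns a flat frame after reset_index, while pivot_table keeps cnpj as the index; which one is more convenient depends on what is done with the result next.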

Hope that helps.

EDIT:

You can also use:

g = df.groupby(['cnpj','num_doc'])['bc_icms'].sum()

That returns the sum for each cnpj and num_doc pair, as a Series with a two-level index:

cnpj            num_doc
2364118000124   15263       37353.10
                15264       56214.14
2817342000124   10154       17827.07
7374346000107   14224         320.12
                14231         385.04
                14263         401.28
                14279         391.26
8953538000122   12865         686.00
54921580000189  112428     108000.00
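On the two points raised in the question: the NaNs most likely appear because a Series passed to pd.DataFrame together with a new index is aligned on its existing labels (0 to 10), none of which exist in the new two-level index. The sketch below, again assuming df1 is rebuilt from the question's rows, uses set_index to get the same layout with the values intact, then appends one subtotal row per cnpj to the grouped result. The 'Total' label is a name chosen here for illustration only.

```python
import pandas as pd

# Rows copied from the question, with the key columns kept as strings.
df1 = pd.DataFrame({
    'cnpj': ['02817342000124', '54921580000189', '08953538000122',
             '08953538000122', '08953538000122', '07374346000107',
             '07374346000107', '07374346000107', '07374346000107',
             '02364118000124', '02364118000124'],
    'num_doc': ['0000010154', '0000112428', '0000012865', '0000012865',
                '0000012865', '0000014224', '0000014231', '0000014263',
                '0000014279', '0000015263', '0000015264'],
    'bc_icms': [17827.07, 108000.00, 232.00, 239.00, 215.00, 320.12,
                385.04, 401.28, 391.26, 37353.10, 56214.14],
})

# Moving the two key columns into the index gives the layout from the
# question while keeping the bc_icms values (no NaN).
pt1 = df1.set_index(['cnpj', 'num_doc'])[['bc_icms']]

# Sum per (cnpj, num_doc) pair, plus one subtotal per cnpj.
detail = df1.groupby(['cnpj', 'num_doc'])['bc_icms'].sum()
totals = df1.groupby('cnpj')['bc_icms'].sum()

# Give the subtotals the same two-level index; 'Total' is a
# hypothetical label standing in for num_doc.
totals.index = pd.MultiIndex.from_arrays(
    [totals.index, ['Total'] * len(totals)], names=['cnpj', 'num_doc'])

# 'Total' sorts after the digit-only num_doc strings, so each subtotal
# lands below the detail rows of its group.
report = pd.concat([detail, totals]).sort_index()

print(pt1)
print(report)
```

The sort relies on num_doc being a string column made of digits; with integer num_doc values the mixed index would need a different ordering step.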
