How can I create a pivot table that shows the sum() of group values, using my Pandas DataFrame?
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same CC BY-SA license, cite the original address and author information, and attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/36363127/
Asked by dot.Py
My df1:
cnpj num_doc bc_icms
0 02817342000124 0000010154 17827.07
1 54921580000189 0000112428 108000.00
2 08953538000122 0000012865 232.00
3 08953538000122 0000012865 239.00
4 08953538000122 0000012865 215.00
5 07374346000107 0000014224 320.12
6 07374346000107 0000014231 385.04
7 07374346000107 0000014263 401.28
8 07374346000107 0000014279 391.26
9 02364118000124 0000015263 37353.10
10 02364118000124 0000015264 56214.14
The output of df1.dtypes:
cnpj object
num_doc object
bc_icms float64
dtype: object
So I'm trying to create a pivot table to answer the following question:
What is the sum of bc_icms for each cnpj?
This is what I wrote:
indexes = [np.array(df1['cnpj']), np.array(df1['num_doc'])]
pt1 = pd.DataFrame(df1['bc_icms'], index=indexes)
print pt1
And here's the output:
bc_icms
02817342000124 0000010154 NaN
54921580000189 0000112428 NaN
08953538000122 0000012865 NaN
0000012865 NaN
0000012865 NaN
07374346000107 0000014224 NaN
0000014231 NaN
0000014263 NaN
0000014279 NaN
02364118000124 0000015263 NaN
0000015264 NaN
0000015265 NaN
07720786000160 0000020128 NaN
I think this is the pivot table structure that I want! Good! But...
How can I fix these NaNs?
How can I create a "sum" line for each cnpj?
Example in Excel:
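On the NaN part of the question: the NaNs most likely appear because the DataFrame constructor aligns the existing row labels of df1['bc_icms'] (0, 1, 2, ...) against the new index, and none of them match. A minimal sketch of one way to get the same two-level structure without NaNs, using set_index and only a few of the rows shown above (this is an illustration, not the asker's code):

```python
import pandas as pd

# A few rows copied from the question; cnpj and num_doc are kept as strings
# so the leading zeros survive.
df1 = pd.DataFrame({
    "cnpj": ["02817342000124", "08953538000122", "08953538000122"],
    "num_doc": ["0000010154", "0000012865", "0000012865"],
    "bc_icms": [17827.07, 232.00, 239.00],
})

# set_index moves the two columns into a MultiIndex and keeps every value
# attached to its own row, so no label alignment takes place.
pt1 = df1.set_index(["cnpj", "num_doc"])[["bc_icms"]]
print(pt1)
```

This only restructures the rows; it does not sum anything yet.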
Answered by Fabio Lamanna
IIUC, you need the sum of the bc_icms values for each cnpj, so I would use groupby as:
g = df.groupby('cnpj')['bc_icms'].sum().reset_index(name='sum')
which returns:
cnpj sum
0 2364118000124 93567.24
1 2817342000124 17827.07
2 7374346000107 1497.70
3 8953538000122 686.00
4 54921580000189 108000.00
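For anyone who wants to run this end to end, here is a self-contained sketch of the same groupby call. It assumes the eleven rows shown in the question and builds cnpj and num_doc as strings (the output above shows them without leading zeros, presumably because they were parsed as integers there):

```python
import pandas as pd

# The eleven rows shown in the question, with the id columns as strings.
df1 = pd.DataFrame({
    "cnpj": [
        "02817342000124", "54921580000189", "08953538000122",
        "08953538000122", "08953538000122", "07374346000107",
        "07374346000107", "07374346000107", "07374346000107",
        "02364118000124", "02364118000124",
    ],
    "num_doc": [
        "0000010154", "0000112428", "0000012865", "0000012865",
        "0000012865", "0000014224", "0000014231", "0000014263",
        "0000014279", "0000015263", "0000015264",
    ],
    "bc_icms": [
        17827.07, 108000.00, 232.00, 239.00, 215.00, 320.12,
        385.04, 401.28, 391.26, 37353.10, 56214.14,
    ],
})

# Group the rows by cnpj, sum bc_icms within each group, and turn the
# result back into a regular DataFrame with a column named "sum".
g = df1.groupby("cnpj")["bc_icms"].sum().reset_index(name="sum")
print(g)
```

With string keys the groups come out in lexicographic order, which happens to match the order shown above.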
Hope that helps.
EDIT:
You can also use:
g = df.groupby(['cnpj','num_doc'])['bc_icms'].sum()
which returns the sums grouped by both cnpj and num_doc:
cnpj num_doc
2364118000124 15263 37353.10
15264 56214.14
2817342000124 10154 17827.07
7374346000107 14224 320.12
14231 385.04
14263 401.28
14279 391.26
8953538000122 12865 686.00
54921580000189 112428 108000.00
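Since the question asks for a pivot table specifically, the same numbers can also be produced with pd.pivot_table and aggfunc="sum". This is not what the answer above uses; it is a sketch of an alternative, again assuming the eleven rows from the question with the id columns as strings:

```python
import pandas as pd

# The eleven rows shown in the question, with the id columns as strings.
df1 = pd.DataFrame({
    "cnpj": [
        "02817342000124", "54921580000189", "08953538000122",
        "08953538000122", "08953538000122", "07374346000107",
        "07374346000107", "07374346000107", "07374346000107",
        "02364118000124", "02364118000124",
    ],
    "num_doc": [
        "0000010154", "0000112428", "0000012865", "0000012865",
        "0000012865", "0000014224", "0000014231", "0000014263",
        "0000014279", "0000015263", "0000015264",
    ],
    "bc_icms": [
        17827.07, 108000.00, 232.00, 239.00, 215.00, 320.12,
        385.04, 401.28, 391.26, 37353.10, 56214.14,
    ],
})

# One row per (cnpj, num_doc) pair, with bc_icms summed within each pair.
pt = pd.pivot_table(df1, index=["cnpj", "num_doc"],
                    values="bc_icms", aggfunc="sum")

# One row per cnpj: the per-cnpj "sum" line the question asks about.
by_cnpj = pd.pivot_table(df1, index="cnpj",
                         values="bc_icms", aggfunc="sum")

print(pt)
print(by_cnpj)
```

Both tables keep bc_icms as the column name; the groupby version is usually a bit more direct when only one aggregate is needed.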