Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/20888954/

Date: 2020-09-13 21:30:46  Source: igfitidea

Add subtotal columns in pandas with multi-index

python pandas

Asked by 8one6

I have a dataframe with a 3-level MultiIndex on the columns. I would like to compute subtotals across rows (sum(axis=1)), summing over one of the levels while preserving the others. I think I know how to do this using the level keyword argument of pd.DataFrame.sum. However, I'm having trouble working out how to incorporate the result of this sum back into the original table.

Setup:

import numpy as np
import pandas as pd
from itertools import product

np.random.seed(0)

colors = ['red', 'green']
shapes = ['square', 'circle']
obsnum = range(5)

rows = list(product(colors, shapes, obsnum))
idx = pd.MultiIndex.from_tuples(rows)
idx.names = ['color', 'shape', 'obsnum']

df = pd.DataFrame({'attr1': np.random.randn(len(rows)), 
                   'attr2': 100 * np.random.randn(len(rows))},
                  index=idx)

df.columns.names = ['attribute']

df = df.unstack(['color', 'shape'])

Gives a nice frame like so:

[image: original frame]

Say I wanted to reduce the shape level. I could run:

tots = df.sum(axis=1, level=['attribute', 'color'])

to get my totals like so:

[image: totals]
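As far as I know, the level argument of DataFrame.sum was deprecated in pandas 1.3 and removed in pandas 2.0, so the call above may fail on recent versions. A minimal sketch of an equivalent, assuming the question's setup code, groups the transposed frame by the levels to keep and transposes back:

```python
import numpy as np
import pandas as pd
from itertools import product

# Rebuild the example frame from the question's setup.
np.random.seed(0)
rows = list(product(['red', 'green'], ['square', 'circle'], range(5)))
idx = pd.MultiIndex.from_tuples(rows, names=['color', 'shape', 'obsnum'])
df = pd.DataFrame({'attr1': np.random.randn(len(rows)),
                   'attr2': 100 * np.random.randn(len(rows))},
                  index=idx)
df.columns.names = ['attribute']
df = df.unstack(['color', 'shape'])

# Sum over 'shape' by grouping the transposed frame on the levels to keep,
# then transpose back so the totals are columns again.
tots = df.T.groupby(level=['attribute', 'color']).sum().T

print(tots.columns.names)  # level names of the totals frame
print(tots.shape)          # (rows, columns) of the totals frame
```

The result has one column per (attribute, color) pair, which is the same layout the level-based call was meant to produce.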

Once I have this, I'd like to tack it on to the original frame. I think I can do this in a somewhat cumbersome way:

tots = df.sum(axis=1, level=['attribute', 'color'])
newcols = pd.MultiIndex.from_tuples(list((i[0], i[1], 'sum(shape)') for i in tots.columns))
tots.columns = newcols
bigframe = pd.concat([df, tots], axis=1).sort_index(axis=1)

[image: aggregated]

Is there a more natural way to do this?

Answered by cronos

Here is a way without loops:

s = df.sum(axis=1, level=[0,1]).T
s["shape"] = "sum(shape)"
s.set_index("shape", append=True, inplace=True)
df.combine_first(s.T)

The trick is to use the transposed sum. That lets us insert another column (a row, once transposed back) for the additional level, which we name exactly like the level we summed over. This column can be converted to a level in the index with set_index. Then we combine df with the transposed sum. If the summed level is not the last one, you might need some level reordering.
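The same idea can be sketched for recent pandas versions, including the level reordering mentioned above. This is a variation rather than the answer's exact code: it adds the extra level through MultiIndex.to_frame and MultiIndex.from_frame instead of set_index, and joins with pd.concat instead of combine_first. The helper name add_subtotal and its parameters are hypothetical.

```python
import numpy as np
import pandas as pd
from itertools import product

# Rebuild the example frame from the question's setup.
np.random.seed(0)
rows = list(product(['red', 'green'], ['square', 'circle'], range(5)))
idx = pd.MultiIndex.from_tuples(rows, names=['color', 'shape', 'obsnum'])
df = pd.DataFrame({'attr1': np.random.randn(len(rows)),
                   'attr2': 100 * np.random.randn(len(rows))},
                  index=idx)
df.columns.names = ['attribute']
df = df.unstack(['color', 'shape'])


def add_subtotal(frame, level, label):
    """Hypothetical helper: append subtotals over one column level."""
    names = list(frame.columns.names)
    keep = [name for name in names if name != level]
    # Sum over `level` by grouping the transposed frame on the other levels.
    tots = frame.T.groupby(level=keep).sum().T
    # Add a column named like the summed level, then turn it into a level.
    new_cols = tots.columns.to_frame().assign(**{level: label})
    tots.columns = pd.MultiIndex.from_frame(new_cols)
    # The new level is appended last; restore the original level order.
    tots = tots.reorder_levels(names, axis=1)
    return pd.concat([frame, tots], axis=1).sort_index(axis=1)


big = add_subtotal(df, 'shape', 'sum(shape)')       # summed level is last
by_color = add_subtotal(df, 'color', 'sum(color)')  # summed level is in the middle
print(big.columns.names)
print(by_color.columns.names)
```

Sorting the columns places each subtotal after the columns it summarizes here only because the labels 'sum(shape)' and 'sum(color)' happen to sort after the existing shape and color names.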

Answered by Paul H

Here's my brute-force way of doing it.

After running your well-written (thank you) sample code, I did this:

attributes = pd.unique(df.columns.get_level_values('attribute'))
colors = pd.unique(df.columns.get_level_values('color'))

for attr in attributes:
    for clr in colors:
        df[(attr, clr, 'sum')] = df.xs((attr, clr), level=['attribute', 'color'], axis=1).sum(axis=1)

df

Which gives me:

[image: big table]
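A self-contained sketch of the loop approach, assuming the question's setup. It passes the (attribute, color) pair to xs as a tuple, which as far as I know is the form recent pandas versions expect. It also works on a copy named out and adds a sort_index step, neither of which is in the answer, so each 'sum' column lands next to its group.

```python
import numpy as np
import pandas as pd
from itertools import product

# Rebuild the example frame from the question's setup.
np.random.seed(0)
rows = list(product(['red', 'green'], ['square', 'circle'], range(5)))
idx = pd.MultiIndex.from_tuples(rows, names=['color', 'shape', 'obsnum'])
df = pd.DataFrame({'attr1': np.random.randn(len(rows)),
                   'attr2': 100 * np.random.randn(len(rows))},
                  index=idx)
df.columns.names = ['attribute']
df = df.unstack(['color', 'shape'])

out = df.copy()  # keep the original frame untouched
attributes = out.columns.get_level_values('attribute').unique()
colors = out.columns.get_level_values('color').unique()

for attr in attributes:
    for clr in colors:
        # xs selects the 'shape' columns for this (attribute, color) pair;
        # summing row-wise gives the subtotal for that pair.
        subtotal = out.xs((attr, clr), level=['attribute', 'color'], axis=1).sum(axis=1)
        out[(attr, clr, 'sum')] = subtotal

# New columns are appended at the end; sorting moves each 'sum' beside its group.
out = out.sort_index(axis=1)
print(out.columns.tolist())
```

Running the loop a second time on the same frame would fold the existing 'sum' columns into the new totals, so it is meant to be applied once to a frame without subtotals.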
