pandas: Adding subtotal columns with a multi-index

Question

Asked by 8one6

I have a dataframe with a 3-level deep multi-index on the columns. I would like to compute subtotals across rows (sum(axis=1)), summing across one of the levels while preserving the others. I think I know how to do this using the level keyword argument of pd.DataFrame.sum. However, I'm having trouble working out how to incorporate the result of this sum back into the original table.

Setup:

import numpy as np
import pandas as pd
from itertools import product

np.random.seed(0)

colors = ['red', 'green']
shapes = ['square', 'circle']
obsnum = range(5)

rows = list(product(colors, shapes, obsnum))
idx = pd.MultiIndex.from_tuples(rows)
idx.names = ['color', 'shape', 'obsnum']

df = pd.DataFrame({'attr1': np.random.randn(len(rows)), 
                   'attr2': 100 * np.random.randn(len(rows))},
                  index=idx)

df.columns.names = ['attribute']

df = df.unstack(['color', 'shape'])

Gives a nice frame like so:

[Image: original frame]

Say I wanted to reduce the shape level. I could run:

tots = df.sum(axis=1, level=['attribute', 'color'])

to get my totals like so:

[Image: totals]
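
A note for readers on current pandas: the level argument of DataFrame.sum was deprecated in pandas 1.3 and removed in 2.0, so the call above fails there. A minimal sketch of an equivalent, assuming the same setup as in the question, is to group the transposed frame by the levels you want to keep and transpose back:

```python
import numpy as np
import pandas as pd
from itertools import product

# Same setup as in the question, repeated so this block runs on its own.
np.random.seed(0)
rows = list(product(['red', 'green'], ['square', 'circle'], range(5)))
idx = pd.MultiIndex.from_tuples(rows, names=['color', 'shape', 'obsnum'])
df = pd.DataFrame({'attr1': np.random.randn(len(rows)),
                   'attr2': 100 * np.random.randn(len(rows))},
                  index=idx)
df.columns.names = ['attribute']
df = df.unstack(['color', 'shape'])

# Transpose so the column levels become row levels, group by the levels
# to keep (which sums over 'shape'), then transpose back.
tots = df.T.groupby(level=['attribute', 'color']).sum().T
```

The result has one column per (attribute, color) pair. Note that groupby sorts the group keys by default, so the column order may differ from the order in the original frame.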

Once I have this, I'd like to tack it on to the original frame. I think I can do this in a somewhat cumbersome way:

tots = df.sum(axis=1, level=['attribute', 'color'])
newcols = pd.MultiIndex.from_tuples(list((i[0], i[1], 'sum(shape)') for i in tots.columns))
tots.columns = newcols
bigframe = pd.concat([df, tots], axis=1).sort_index(axis=1)

[Image: aggregated]

Is there a more natural way to do this?

Answer 1

Answered by cronos

Here is a way without loops:

s = df.sum(axis=1, level=[0,1]).T
s["shape"] = "sum(shape)"
s.set_index("shape", append=True, inplace=True)
df.combine_first(s.T)

The trick is to use the transposed sum. That way we can insert another column (i.e. a row of the untransposed result) holding the label for the additional level, and we name that column exactly like the level we summed over. The column can then be converted to a level in the index with set_index. Finally we combine df with the transposed sum. If the summed level is not the last one, you might need some level reordering.
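
For reference, here is a sketch of the same idea on pandas 2.x, where sum(..., level=...) no longer exists. Only the first step differs from the snippet above: the transposed subtotal comes from a groupby on the transposed frame, which is my substitution and not part of the original answer. The setup from the question is repeated so the block runs on its own, and the name combined is just for illustration.

```python
import numpy as np
import pandas as pd
from itertools import product

# Setup from the question.
np.random.seed(0)
rows = list(product(['red', 'green'], ['square', 'circle'], range(5)))
idx = pd.MultiIndex.from_tuples(rows, names=['color', 'shape', 'obsnum'])
df = pd.DataFrame({'attr1': np.random.randn(len(rows)),
                   'attr2': 100 * np.random.randn(len(rows))},
                  index=idx)
df.columns.names = ['attribute']
df = df.unstack(['color', 'shape'])

# Transposed subtotals: rows are (attribute, color), columns are obsnum.
s = df.T.groupby(level=['attribute', 'color']).sum()

# Add a helper column named after the summed level, then move it into
# the row index so the rows become (attribute, color, shape).
s['shape'] = 'sum(shape)'
s = s.set_index('shape', append=True)

# Transpose back and merge with the original frame; the subtotal columns
# slot in next to the columns they summarize.
combined = df.combine_first(s.T)
```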

Answer 2

Answered by Paul H

Here's my brute-force way of doing it.

After running your well written (thank you) sample code, I did this:

attributes = pd.unique(df.columns.get_level_values('attribute'))
colors = pd.unique(df.columns.get_level_values('color'))

for attr in attributes:
    for clr in colors:
        # xs takes the key as a tuple; list keys are rejected by pandas 2.0+
        df[(attr, clr, 'sum')] = df.xs((attr, clr), level=['attribute', 'color'], axis=1).sum(axis=1)

df

Which gives me:

[Image: big table]
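
The loop can also be folded into a small helper. This is only a sketch: add_subtotals and its label parameter are hypothetical names, it assumes every column level is named, and it uses groupby plus pd.concat rather than the xs loop above.

```python
import numpy as np
import pandas as pd
from itertools import product


def add_subtotals(frame, level, label=None):
    """Return a copy of `frame` with subtotal columns summed over `level`.

    Hypothetical helper; assumes all column levels of `frame` are named.
    """
    names = list(frame.columns.names)
    keep = [n for n in names if n != level]
    label = label or f"sum({level})"
    # Sum over `level` by grouping the transposed frame on the other levels.
    sums = frame.T.groupby(level=keep).sum()
    # Re-attach the summed level with a constant label, then restore the
    # original level order so the result lines up with `frame`.
    sums = pd.concat({label: sums}, names=[level]).reorder_levels(names)
    return pd.concat([frame, sums.T], axis=1).sort_index(axis=1)


# Setup from the question.
np.random.seed(0)
rows = list(product(['red', 'green'], ['square', 'circle'], range(5)))
idx = pd.MultiIndex.from_tuples(rows, names=['color', 'shape', 'obsnum'])
df = pd.DataFrame({'attr1': np.random.randn(len(rows)),
                   'attr2': 100 * np.random.randn(len(rows))},
                  index=idx)
df.columns.names = ['attribute']
df = df.unstack(['color', 'shape'])

by_shape = add_subtotals(df, 'shape')   # subtotal over the last level
by_color = add_subtotals(df, 'color')   # subtotal over a middle level
```

Because the helper reorders the levels itself, it works whether or not the summed level is the last one. It returns a new frame instead of modifying df in place, which differs from the loop in the answer.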
