Python pandas: how to run a pivot with a multi-index?

Question

Asked by Pythonista anonymous

I would like to run a pivot on a pandas DataFrame, with the index being two columns, not one. For example, one field for the year, one for the month, an 'item' field which shows 'item 1' and 'item 2' and a 'value' field with numerical values. I want the index to be year + month.

The only way I managed to get this to work was to combine the two fields into one, then separate them again. Is there a better way?

Minimal code copied below. Thanks a lot!

PS Yes, I am aware there are other questions with the keywords 'pivot' and 'multi-index', but I did not understand if/how they can help me with this question.

import pandas as pd
import numpy as np

df= pd.DataFrame()
month = np.arange(1, 13)
values1 = np.random.randint(0, 100, 12)
values2 = np.random.randint(200, 300, 12)


df['month'] = np.hstack((month, month))
df['year'] = 2004
df['value'] = np.hstack((values1, values2))
df['item'] = np.hstack((np.repeat('item 1', 12), np.repeat('item 2', 12)))

# This doesn't work: 
# ValueError: Wrong number of items passed 24, placement implies 2
# mypiv = df.pivot(['year', 'month'], 'item', 'value')

# This doesn't work, either:
# df.set_index(['year', 'month'], inplace=True)
# ValueError: cannot label index with a null key
# mypiv = df.pivot(columns='item', values='value')

# This below works but is not ideal: 
# I have to first concatenate then separate the fields I need
df['new field'] = df['year'] * 100 + df['month']

mypiv = df.pivot(index='new field', columns='item', values='value').reset_index()
mypiv['year'] = mypiv['new field'] // 100  # integer division recovers the year
mypiv['month'] = mypiv['new field'] % 100

Answer 1

Accepted answer by Alexander

You can group and then unstack.

>>> df.groupby(['year', 'month', 'item'])['value'].sum().unstack('item')
item        item 1  item 2
year month                
2004 1          33     250
     2          44     224
     3          41     268
     4          29     232
     5          57     252
     6          61     255
     7          28     254
     8          15     229
     9          29     258
     10         49     207
     11         36     254
     12         23     209

Or use pivot_table:

>>> df.pivot_table(
...     values='value',
...     index=['year', 'month'],
...     columns='item',
...     aggfunc='sum')
item        item 1  item 2
year month                
2004 1          33     250
     2          44     224
     3          41     268
     4          29     232
     5          57     252
     6          61     255
     7          28     254
     8          15     229
     9          29     258
     10         49     207
     11         36     254
     12         23     209
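
As a rough, self-contained check that the two approaches above agree, the sketch below rebuilds the question's frame and compares the results. The seeded generator and the variable names `via_groupby` and `via_pivot_table` are assumptions made here for reproducibility; they are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame. The seed is an arbitrary choice made here
# so that the example is reproducible.
rng = np.random.default_rng(0)
month = np.arange(1, 13)
df = pd.DataFrame({
    "year": 2004,
    "month": np.tile(month, 2),
    "item": np.repeat(["item 1", "item 2"], 12),
    "value": np.concatenate([rng.integers(0, 100, 12),
                             rng.integers(200, 300, 12)]),
})

# Approach 1: group on all three keys, then move 'item' into the columns.
via_groupby = df.groupby(["year", "month", "item"])["value"].sum().unstack("item")

# Approach 2: let pivot_table do the grouping and reshaping in one call.
via_pivot_table = df.pivot_table(values="value", index=["year", "month"],
                                 columns="item", aggfunc="sum")

# Both frames are indexed by (year, month) with one column per item.
print(via_groupby.index.names)
print((via_groupby.values == via_pivot_table.values).all())
```

Both calls aggregate with a sum, so they should also behave the same way if a (year, month, item) combination occurs more than once.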

Answer 2

Answer by Ajean

I believe if you include item in your MultiIndex, then you can just unstack:

df.set_index(['year', 'month', 'item']).unstack(level=-1)

This yields:

                value      
item       item 1 item 2
year month              
2004 1         21    277
     2         43    244
     3         12    262
     4         80    201
     5         22    287
     6         52    284
     7         90    249
     8         14    229
     9         52    205
     10        76    207
     11        88    259
     12        90    200

It's a bit faster than using pivot_table, and about the same speed or slightly slower than using groupby.
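
Speed aside, one behavioural difference is worth keeping in mind: set_index followed by unstack has no aggregation step, so it raises if the same (year, month, item) combination appears twice, whereas pivot_table aggregates the duplicates. A minimal sketch, using a small hypothetical frame named `dup` that is not from the question:

```python
import pandas as pd

# A tiny hypothetical frame in which (2004, 1, "item 1") appears twice.
dup = pd.DataFrame({
    "year": [2004, 2004, 2004],
    "month": [1, 1, 1],
    "item": ["item 1", "item 1", "item 2"],
    "value": [10, 5, 200],
})

# pivot_table aggregates the duplicated rows (here by summing them).
summed = dup.pivot_table(values="value", index=["year", "month"],
                         columns="item", aggfunc="sum")

# set_index + unstack cannot decide which duplicate to keep, so it raises.
try:
    dup.set_index(["year", "month", "item"]).unstack(level=-1)
    unstack_failed = False
except ValueError:
    unstack_failed = True

print(summed)
print(unstack_failed)
```

If the data is known to have no duplicates, as in the question, both approaches give the same table.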

Answer 3

Answer by moshevi

Thanks to gmoutso's comment, you can use this:

def multiindex_pivot(df, index=None, columns=None, values=None):
    if index is None:
        names = list(df.index.names)
        df = df.reset_index()
    else:
        names = index
    list_index = df[names].values
    tuples_index = [tuple(i) for i in list_index] # hashable
    df = df.assign(tuples_index=tuples_index)
    df = df.pivot(index="tuples_index", columns=columns, values=values)
    tuples_index = df.index  # reduced
    index = pd.MultiIndex.from_tuples(tuples_index, names=names)
    df.index = index
    return df

Usage:

df.pipe(multiindex_pivot, index=['idx_column1', 'idx_column2'], columns='foo', values='bar')

You might want a simple flat column structure, with each column converted to its intended type. To get that, simply add this:

(df
   .infer_objects()  # coerce to the intended column type
   .rename_axis(None, axis=1))  # flatten column headers
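
On more recent pandas versions the helper above may not be needed: as far as I know, DataFrame.pivot has accepted a list for index since pandas 1.1. The sketch below assumes such a version is installed, and it uses placeholder values (np.arange) rather than the question's random data. The names `piv` and `flat` are made up for this example.

```python
import numpy as np
import pandas as pd

# Same layout as the question's frame, with placeholder values.
month = np.arange(1, 13)
df = pd.DataFrame({
    "year": 2004,
    "month": np.tile(month, 2),
    "item": np.repeat(["item 1", "item 2"], 12),
    "value": np.arange(24),
})

# Assumes pandas >= 1.1, where pivot accepts a list of index columns.
piv = df.pivot(index=["year", "month"], columns="item", values="value")

# Drop the 'item' header name and turn year/month back into ordinary columns.
flat = piv.rename_axis(None, axis=1).reset_index()

print(piv.index.names)
print(flat.columns.tolist())
```

Unlike pivot_table, pivot does not aggregate, so it still raises if an index/column combination is duplicated.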
