pandas: Convert a MultiIndex Series to a DataFrame by using the second index level as columns

Question

Asked by s5s

Hi, I have a DataFrame/Series with a 2-level MultiIndex and one column. I would like to take the second index level and use its values as columns. For example (code taken from the MultiIndex docs):

import pandas as pd
import numpy as np

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
s = pd.Series(np.random.randn(8), index=index)

Which looks like:

first  second
bar    one      -0.982656
       two      -0.078237
baz    one      -0.345640
       two      -0.160661
foo    one      -0.605568
       two      -0.140384
qux    one       1.434702
       two      -1.065408
dtype: float64

What I would like is a DataFrame with index [bar, baz, foo, qux] and columns [one, two].

Answer 1

Answered by AChampion

You just need to unstack your series:

>>> s.unstack(level=1)
second       one       two
first                     
bar    -0.713374  0.556993
baz     0.523611  0.328348
foo     0.338351 -0.571854
qux     0.036694 -0.161852
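The question mentions both a DataFrame and a Series, and the two behave slightly differently under unstack. The following is a minimal sketch, assuming the data sits in a one-column DataFrame whose column is named `col`. It uses deterministic values instead of random ones so the result is easy to check.

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])

# One-column DataFrame; np.arange is used only to make the values predictable.
df = pd.DataFrame(np.arange(8.0), index=index, columns=['col'])

# Selecting the column first yields a Series, so the result has flat columns.
wide = df['col'].unstack(level='second')

# Unstacking the DataFrame directly keeps 'col' as an extra top column level.
wide_nested = df.unstack(level='second')

print(wide)
print(wide_nested.columns.nlevels)
```

Passing the level name (`'second'`) instead of its position (`1`) is equivalent here and tends to read more clearly.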

Answer 2

Answered by Divakar

Here's a solution using array reshaping:

>>> idx = s.index.levels
>>> c = len(idx[1])
>>> pd.DataFrame(s.values.reshape(-1,c),index=idx[0].values, columns=idx[1].values)
          one       two
bar  2.225401  1.624866
baz  1.067359  0.349440
foo -0.468149 -0.352303
qux  1.215427  0.429146

If you don't mind the level names appearing above the index and columns, pass the levels directly:

>>> pd.DataFrame(s.values.reshape(-1,c), index=idx[0], columns=idx[1])
second       one       two
first                     
bar     2.225401  1.624866
baz     1.067359  0.349440
foo    -0.468149 -0.352303
qux     1.215427  0.429146
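The reshape approach only gives correct results when the Series holds every (first, second) combination exactly once, laid out in the same order as the index levels. The following sketch adds a guard for that assumption. The helper name `reshape_to_frame` is hypothetical (it is not part of pandas), and falling back to `unstack` is one possible choice among several.

```python
import numpy as np
import pandas as pd

def reshape_to_frame(s: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: reshape a 2-level MultiIndex Series when safe."""
    rows, cols = s.index.levels
    codes0, codes1 = s.index.codes
    # Layout a complete, row-major ordered index would have.
    expected0 = np.repeat(np.arange(len(rows)), len(cols))
    expected1 = np.tile(np.arange(len(cols)), len(rows))
    if np.array_equal(codes0, expected0) and np.array_equal(codes1, expected1):
        # Safe: plain reshape of the underlying values.
        return pd.DataFrame(s.values.reshape(-1, len(cols)),
                            index=rows, columns=cols)
    # Missing or reordered entries: let pandas align the labels instead.
    return s.unstack(level=1)

index = pd.MultiIndex.from_product(
    [['bar', 'baz', 'foo', 'qux'], ['one', 'two']], names=['first', 'second'])
s = pd.Series(np.arange(8.0), index=index)

fast = reshape_to_frame(s)                                   # reshape path
shuffled = reshape_to_frame(s.sample(frac=1, random_state=0))  # fallback path
partial = reshape_to_frame(s.iloc[:-1])                      # one entry missing
```

With a shuffled or incomplete Series, a bare `reshape` would either silently mislabel values or raise an error, which is why the guard compares the index codes before reshaping.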

Timings for the given dataset size:

# @AChampion's solution
In [201]: %timeit s.unstack(level=1)
1000 loops, best of 3: 444 μs per loop

# Using array reshaping step-1
In [199]: %timeit s.index.levels
1000000 loops, best of 3: 214 ns per loop

# Using array reshaping step-2    
In [202]: %timeit pd.DataFrame(s.values.reshape(-1,2), index=idx[0], columns=idx[1])
10000 loops, best of 3: 47.3 μs per loop
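The `%timeit` magic above only works inside IPython. As a rough sketch for reproducing the comparison in a plain script, the standard library `timeit` module can be used. The function names `via_unstack` and `via_reshape` and the repetition count are arbitrary choices here, and the absolute numbers will vary with machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [['bar', 'baz', 'foo', 'qux'], ['one', 'two']], names=['first', 'second'])
s = pd.Series(np.random.randn(8), index=index)
idx = s.index.levels

def via_unstack():
    return s.unstack(level=1)

def via_reshape():
    return pd.DataFrame(s.values.reshape(-1, len(idx[1])),
                        index=idx[0], columns=idx[1])

n = 200  # number of calls per measurement
t_unstack = timeit.timeit(via_unstack, number=n) / n
t_reshape = timeit.timeit(via_reshape, number=n) / n

print(f"unstack: {t_unstack * 1e6:.1f} us per call")
print(f"reshape: {t_reshape * 1e6:.1f} us per call")
```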

Answer 3

Answered by Chaoste

Another powerful solution is to use .reset_index and .pivot:

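levels = [['bar', 'baz'], ['one', 'two', 'three']]
index = pd.MultiIndex.from_product(levels, names=['first', 'second'])
series = pd.Series(np.random.randn(6), index)

df = series.reset_index()
# The columns are now ['first', 'second', 0]: use them as index, columns and values.
# pandas 2.0+ requires keyword arguments here; the shorthand
# df.pivot(*df.columns) only works on older versions.
index_col, columns_col, values_col = df.columns
df = df.pivot(index=index_col, columns=columns_col, values=values_col)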

Result:

second       one     three       two
first                               
bar     1.047692  1.209063  0.891820
baz     0.083602 -0.303528 -1.385458
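Note that the result above lists the columns in sorted order (one, three, two) rather than in the order they were defined. If the original order matters, one option is to reindex the columns afterwards. This is a minimal sketch with deterministic values. The Series name `value` is an assumption added here so the pivoted column has a readable label.

```python
import numpy as np
import pandas as pd

levels = [['bar', 'baz'], ['one', 'two', 'three']]
index = pd.MultiIndex.from_product(levels, names=['first', 'second'])

# Naming the Series gives the value column a label after reset_index().
series = pd.Series(np.arange(6.0), index=index, name='value')

wide = series.reset_index().pivot(index='first', columns='second', values='value')

# pivot sorts the column labels; restore the order used when building the index.
wide = wide.reindex(columns=levels[1])

print(wide)
```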
