Converting a pandas multi-index series to a dataframe by using second index as columns
Original question: http://stackoverflow.com/questions/44142591/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by s5s
Hi, I have a DataFrame/Series with a 2-level multi-index and one column. I would like to take the second-level index and use it as the columns. For example (code taken from the multi-index docs):
import pandas as pd
import numpy as np

# Two parallel lists of labels: one per index level
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
# A Series, as in the pandas multi-index docs and the printed output that follows
s = pd.Series(np.random.randn(8), index=index)
Which looks like:
first  second
bar    one      -0.982656
       two      -0.078237
baz    one      -0.345640
       two      -0.160661
foo    one      -0.605568
       two      -0.140384
qux    one       1.434702
       two      -1.065408
dtype: float64
What I would like is a DataFrame with index [bar, baz, foo, qux] and columns [one, two].
Answered by AChampion
You just need to unstack your series:
>>> s.unstack(level=1)
second       one       two
first
bar    -0.713374  0.556993
baz     0.523611  0.328348
foo     0.338351 -0.571854
qux     0.036694 -0.161852
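As a small self-contained sketch of the same call, the block below uses fixed values instead of random ones so the result is predictable. The names `wide`, `frame` and `wide_from_frame` are illustrative and not from the original posts. It also covers the one-column DataFrame case the question mentions: selecting the column first gives a Series, so unstack yields flat column labels.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [['bar', 'baz', 'foo', 'qux'], ['one', 'two']],
    names=['first', 'second'],
)
# Fixed values 0.0 .. 7.0 instead of random numbers
s = pd.Series(np.arange(8, dtype=float), index=index)

# Move the second index level into the columns
wide = s.unstack(level='second')

# If the data is a one-column DataFrame, select the column first
frame = s.to_frame('col')
wide_from_frame = frame['col'].unstack(level='second')
```

Passing the level by name ('second') rather than by position (1) is a design choice for readability; both refer to the same level here.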
Answered by Divakar
Here's a solution using array reshaping:
>>> idx = s.index.levels
>>> c = len(idx[1])
>>> pd.DataFrame(s.values.reshape(-1, c), index=idx[0].values, columns=idx[1].values)
          one       two
bar  2.225401  1.624866
baz  1.067359  0.349440
foo -0.468149 -0.352303
qux  1.215427  0.429146
If you don't mind the level names (first, second) appearing above the index and columns, pass the index levels directly:
>>> pd.DataFrame(s.values.reshape(-1, c), index=idx[0], columns=idx[1])
second       one       two
first
bar     2.225401  1.624866
baz     1.067359  0.349440
foo    -0.468149 -0.352303
qux     1.215427  0.429146
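The reshape relies on the rows being laid out as a complete, sorted product of the two levels, so that every first-level label has the same number of rows in the same order. A minimal sketch of a guarded version follows; the helper name `reshape_to_frame` and the sample values are made up for illustration. It falls back to unstack when that assumption does not hold.

```python
import numpy as np
import pandas as pd


def reshape_to_frame(s):
    """Reshape a 2-level Series into a DataFrame (hypothetical helper).

    Falls back to unstack() unless the index is exactly the sorted
    product of its two levels, which is what a plain reshape assumes.
    """
    rows, cols = s.index.levels
    full = pd.MultiIndex.from_product([rows, cols], names=s.index.names)
    if not s.index.equals(full):
        return s.unstack(level=1)
    return pd.DataFrame(s.values.reshape(len(rows), len(cols)),
                        index=rows, columns=cols)


index = pd.MultiIndex.from_product(
    [['bar', 'baz', 'foo', 'qux'], ['one', 'two']],
    names=['first', 'second'],
)
s = pd.Series(np.arange(8, dtype=float), index=index)

fast = reshape_to_frame(s)                  # takes the reshape path
incomplete = reshape_to_frame(s.iloc[:-1])  # ('qux', 'two') is missing
```

With the last row dropped, a bare reshape(-1, 2) would fail because seven values cannot fill a 4x2 array; the fallback leaves a NaN in that cell instead.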
Timings for the given dataset size:
# @AChampion's solution
In [201]: %timeit s.unstack(level=1)
1000 loops, best of 3: 444 μs per loop
# Using array reshaping step-1
In [199]: %timeit s.index.levels
1000000 loops, best of 3: 214 ns per loop
# Using array reshaping step-2
In [202]: %timeit pd.DataFrame(s.values.reshape(-1,2), index=idx[0], columns=idx[1])
10000 loops, best of 3: 47.3 μs per loop
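The %timeit lines above are IPython magics. Outside IPython, a rough equivalent can be sketched with the standard timeit module. The function names and the loop count of 200 are arbitrary choices for this sketch, and absolute numbers depend on the machine and pandas version, so none are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [['bar', 'baz', 'foo', 'qux'], ['one', 'two']],
    names=['first', 'second'],
)
s = pd.Series(np.random.randn(8), index=index)
idx = s.index.levels


def via_unstack():
    return s.unstack(level=1)


def via_reshape():
    return pd.DataFrame(s.values.reshape(-1, len(idx[1])),
                        index=idx[0], columns=idx[1])


loops = 200  # arbitrary; raise it for steadier numbers
t_unstack = timeit.timeit(via_unstack, number=loops) / loops
t_reshape = timeit.timeit(via_reshape, number=loops) / loops
print(f"unstack: {t_unstack * 1e6:.1f} us per loop")
print(f"reshape: {t_reshape * 1e6:.1f} us per loop")
```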
Answered by Chaoste
Another powerful solution is to use .reset_index and .pivot:
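import numpy as np
import pandas as pd

levels = [['bar', 'baz'], ['one', 'two', 'three']]
index = pd.MultiIndex.from_product(levels, names=['first', 'second'])
series = pd.Series(np.random.randn(6), index)
df = series.reset_index()
# df.columns is ['first', 'second', 0]; pass them to pivot by keyword,
# since recent pandas versions no longer accept positional arguments here
index_col, columns_col, values_col = df.columns
df = df.pivot(index=index_col, columns=columns_col, values=values_col)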
Result:
second       one     three       two
first
bar     1.047692  1.209063  0.891820
baz     0.083602 -0.303528 -1.385458
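Note that the columns in the result above come out as one, three, two: the new column labels are sorted rather than kept in the order they were given. The sketch below uses made-up values and the illustrative names `long_df`, `wide` and `ordered`; it names the pivot arguments explicitly and then restores the original column order with reindex.

```python
import pandas as pd

levels = [['bar', 'baz'], ['one', 'two', 'three']]
index = pd.MultiIndex.from_product(levels, names=['first', 'second'])
series = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=index, name='value')

# Naming the Series gives the value column a readable label after reset_index
long_df = series.reset_index()
wide = long_df.pivot(index='first', columns='second', values='value')

# The pivoted column labels are sorted; reindex restores the original order
ordered = wide.reindex(columns=levels[1])
```

Spelling out index, columns and values is longer than unpacking df.columns, but it does not depend on the column order of the intermediate frame.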