Pandas Melt on Multi-index Columns Without Manually Specifying Levels (Python 3.5.1)
Original question: http://stackoverflow.com/questions/36431413/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by Vincent
I have a Pandas DataFrame that looks something like:
import pandas as pd

df = pd.DataFrame({'col1': {0: 'a', 1: 'b', 2: 'c'},
                   'col2': {0: 1, 1: 3, 2: 5},
                   'col3': {0: 2, 1: 4, 2: 6},
                   'col4': {0: 3, 1: 6, 2: 2},
                   'col5': {0: 7, 1: 2, 2: 3},
                   'col6': {0: 2, 1: 9, 2: 5},
                   })
# Replace the flat column names with a 3-level MultiIndex.
df.columns = [list('AAAAAA'), list('BBCCDD'), list('EFGHIJ')]
   A
   B     C     D
   E  F  G  H  I  J
0  a  1  2  3  7  2
1  b  3  4  6  2  9
2  c  5  6  2  3  5
I basically just want to melt the data frame so that each column level becomes a new column. In other words, I can achieve what I want pretty simply with pd.melt():
pd.melt(df, value_vars=[('A', 'B', 'E'),
                        ('A', 'B', 'F'),
                        ('A', 'C', 'G'),
                        ('A', 'C', 'H'),
                        ('A', 'D', 'I'),
                        ('A', 'D', 'J')])
However, in my real use case there are many initial columns (a lot more than 6), and it would be great if I could make this generalizable so that I don't have to spell out every tuple in value_vars. Is there a way to do this in a generalizable way? I'm basically looking for a way to tell pd.melt that I just want to set value_vars to a list of tuples where, in each tuple, the first element is the first column level, the second is the second column level, and the third is the third column level.
Accepted answer by unutbu
If you don't specify value_vars, then all columns (that are not specified as id_vars) are used by default:
In [10]: pd.melt(df)
Out[10]:
  variable_0 variable_1 variable_2 value
0          A          B          E     a
1          A          B          E     b
2          A          B          E     c
3          A          B          F     1
4          A          B          F     3
...
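As a self-contained illustration of the default behaviour described above, here is a minimal sketch that rebuilds the question's frame and melts it without value_vars. The friendlier column names in the rename step (level_0, level_1, level_2) are hypothetical and only there to show how the generated variable_N columns could be relabelled.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    'col1': ['a', 'b', 'c'],
    'col2': [1, 3, 5],
    'col3': [2, 4, 6],
    'col4': [3, 6, 2],
    'col5': [7, 2, 3],
    'col6': [2, 9, 5],
})
df.columns = [list('AAAAAA'), list('BBCCDD'), list('EFGHIJ')]

# With no value_vars, every column is melted and each (unnamed)
# column level becomes its own variable_N column.
melted = pd.melt(df)

# Optional relabelling; these names are illustrative assumptions.
renamed = melted.rename(columns={
    'variable_0': 'level_0',
    'variable_1': 'level_1',
    'variable_2': 'level_2',
})
print(renamed.head())
```

Six columns of three rows each give 18 rows in the melted result.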
However, if for some reason you do need to generate the list of column tuples, you could use df.columns.tolist():
In [57]: df.columns.tolist()
Out[57]:
[('A', 'B', 'E'),
('A', 'B', 'F'),
('A', 'C', 'G'),
('A', 'C', 'H'),
('A', 'D', 'I'),
('A', 'D', 'J')]
In [56]: pd.melt(df, value_vars=df.columns.tolist())
Out[56]:
  variable_0 variable_1 variable_2 value
0          A          B          E     a
1          A          B          E     b
2          A          B          E     c
3          A          B          F     1
4          A          B          F     3
...
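One case where the generated tuple list may be useful is when some columns should stay as identifiers. The sketch below is an assumption about such a use case, not part of the original answer: it slices df.columns.tolist() so the first column is passed as id_vars and the rest as value_vars. With MultiIndex columns, both arguments need to be lists of tuples.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    'col1': ['a', 'b', 'c'],
    'col2': [1, 3, 5],
    'col3': [2, 4, 6],
    'col4': [3, 6, 2],
    'col5': [7, 2, 3],
    'col6': [2, 9, 5],
})
df.columns = [list('AAAAAA'), list('BBCCDD'), list('EFGHIJ')]

# Generate the column tuples once, then split them by position.
cols = df.columns.tolist()
id_cols = cols[:1]      # [('A', 'B', 'E')], kept as an identifier column
value_cols = cols[1:]   # the remaining five columns are melted

melted = pd.melt(df, id_vars=id_cols, value_vars=value_cols)
print(melted.head())
```

Here the split is positional; in a real frame the tuples could just as well be filtered by level value, for example keeping only those whose second element is 'C'.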
Answer by pyrocarm
I had this same question, but my base dataset was actually just a Series with a 3-level MultiIndex. I found this answer for 'melting' a Series into a DataFrame in this post: https://discuss.analyticsvidhya.com/t/how-to-convert-the-multi-index-series-into-a-data-frame-in-python/5119/2
Basically, you just use the DataFrame constructor on the Series, and it does exactly what you want melt to do.
pd.DataFrame(series)
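A small sketch of the Series case follows, with made-up data and hypothetical level names (l0, l1, l2). Note that pd.DataFrame(series) keeps the three levels in the row index; if the levels are wanted as ordinary columns, as in the pd.melt output, calling reset_index() afterwards is one way to get there.

```python
import pandas as pd

# A Series with a 3-level MultiIndex (illustrative data and names).
index = pd.MultiIndex.from_tuples(
    [('A', 'B', 'E'), ('A', 'B', 'F'), ('A', 'C', 'G')],
    names=['l0', 'l1', 'l2'],
)
series = pd.Series([1, 2, 3], index=index, name='value')

# The constructor yields a one-column frame; the levels stay in the index.
frame = pd.DataFrame(series)

# Moving the index levels into columns gives a melt-like long layout.
long_form = frame.reset_index()
print(long_form)
```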