Python Pandas DataFrame: stack multiple column values into a single column
Original URL: http://stackoverflow.com/questions/34376053/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by borice
Assuming the following DataFrame:
  key.0 key.1 key.2  topic
1   abc   def   ghi      8
2   xab   xcd   xef      9
How can I combine the values of all the key.* columns into a single column, key, with each value still paired with the topic of the row it came from? This is the result I want:
   topic  key
1      8  abc
2      8  def
3      8  ghi
4      9  xab
5      9  xcd
6      9  xef
Note that the number of key.N columns is not fixed: it depends on some external N.
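To try the answers below, the sample frame has to be rebuilt first. What follows is a minimal sketch, assuming every key column is named `key.` followed by digits; it selects those columns with `DataFrame.filter(regex=...)` so the code does not depend on how large N is. The name `key_cols` is illustrative, not from the question.

```python
import pandas as pd

# Rebuild the sample frame from the question (row labels 1 and 2, as shown there).
df = pd.DataFrame(
    {
        "key.0": ["abc", "xab"],
        "key.1": ["def", "xcd"],
        "key.2": ["ghi", "xef"],
        "topic": [8, 9],
    },
    index=[1, 2],
)

# Pick every key.N column by name pattern, whatever N happens to be.
key_cols = df.filter(regex=r"^key\.\d+$").columns.tolist()
print(key_cols)  # ['key.0', 'key.1', 'key.2']
```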
Accepted answer by Alexander
You can melt your dataframe:
>>> keys = [c for c in df if c.startswith('key.')]
>>> pd.melt(df, id_vars='topic', value_vars=keys, value_name='key')
   topic variable  key
0      8    key.0  abc
1      9    key.0  xab
2      8    key.1  def
3      9    key.1  xcd
4      8    key.2  ghi
5      9    key.2  xef
It also tells you where each key came from: the variable column holds the name of the source key.* column.
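melt emits rows one source column at a time (all key.0 values, then all key.1 values, and so on), whereas the question lists the keys grouped by topic. Here is a minimal sketch of one way to get that order, assuming a stable sort on topic is acceptable; `long` and `result` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "key.0": ["abc", "xab"],
        "key.1": ["def", "xcd"],
        "key.2": ["ghi", "xef"],
        "topic": [8, 9],
    },
    index=[1, 2],
)

keys = [c for c in df if c.startswith("key.")]
long = pd.melt(df, id_vars="topic", value_vars=keys, value_name="key")

# mergesort is stable: rows with the same topic keep their melt order,
# so key.0, key.1, key.2 stay in sequence within each topic.
result = (
    long.sort_values("topic", kind="mergesort")
    .drop(columns="variable")
    .reset_index(drop=True)
)
print(result)
```

Dropping `variable` is optional; keep it if you need to know which key.N column each value came from.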
Since pandas v0.20, melt is also available as a method of pd.DataFrame:
>>> df.melt('topic', value_name='key').drop(columns='variable')
   topic  key
0      8  abc
1      9  xab
2      8  def
3      9  xcd
4      8  ghi
5      9  xef
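Called without value_vars, DataFrame.melt unpivots every column except the id column. That works for the sample frame, but if the real frame has other columns, their values end up in key too. The sketch below illustrates this with a hypothetical extra column `note` (not in the question) and restricts the melt to the key.N columns with `DataFrame.filter(like='key.')`.

```python
import pandas as pd

# Sample frame plus a hypothetical extra column "note".
df = pd.DataFrame(
    {
        "key.0": ["abc", "xab"],
        "key.1": ["def", "xcd"],
        "key.2": ["ghi", "xef"],
        "topic": [8, 9],
        "note": ["n1", "n2"],
    }
)

# Without value_vars, every non-id column is melted, including "note".
everything = df.melt("topic", value_name="key")

# Listing the key.N columns explicitly keeps only the keys.
keys_only = df.melt(
    "topic",
    value_vars=df.filter(like="key.").columns.tolist(),
    value_name="key",
).drop(columns="variable")
print(keys_only)
```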
Answer by miraculixx
After trying various ways, I find the following more or less intuitive, provided you understand what stack does:
# keep topic as index, stack other columns 'against' it
stacked = df.set_index('topic').stack()
# set the name of the new series created
df = stacked.reset_index(name='key')
# drop the 'source' level (key.*)
df.drop('level_1', axis=1, inplace=True)
The resulting dataframe is as required:
   topic  key
0      8  abc
1      8  def
2      8  ghi
3      9  xab
4      9  xcd
5      9  xef
You may want to print the intermediate results to understand the process fully. If you don't mind having more columns than needed, the key steps are set_index('topic'), stack() and reset_index(name='key').
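Following the suggestion to print the intermediate results, here is a self-contained sketch of the same steps. It swaps the final `drop('level_1', ...)` for `Series.droplevel(1)`, an alternative rather than the answer's original code, which removes the key.* index level before `reset_index`, so no `level_1` column is ever created.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "key.0": ["abc", "xab"],
        "key.1": ["def", "xcd"],
        "key.2": ["ghi", "xef"],
        "topic": [8, 9],
    },
    index=[1, 2],
)

# Step 1: make topic the row index, leaving only key.N as columns.
indexed = df.set_index("topic")

# Step 2: stack moves the column labels into a second index level,
# giving a Series indexed by (topic, key.N).
stacked = indexed.stack()
print(stacked)

# Step 3: discard the key.N level and turn topic back into a column.
result = stacked.droplevel(1).reset_index(name="key")
print(result)
```

Because stack walks each row in turn, the result already comes out grouped by topic, matching the order asked for in the question.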
Answer by YOBEN_S
OK, since one of the current questions has been marked as a duplicate of this one, I will answer here.
Using wide_to_long:
pd.wide_to_long(df, ['key'], 'topic', 'age', sep='.').reset_index().drop(columns='age')
Out[123]:
   topic  key
0      8  abc
1      9  xab
2      8  def
3      9  xcd
4      8  ghi
5      9  xef
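The answer drops the `age` column, but it plays the same role as melt's variable column: it holds the numeric suffix of the source key.N column. Here is a minimal sketch that keeps it, with the keyword arguments spelled out. `sep='.'` is needed because the stub `key` and the suffix are joined by a dot, and `age` is simply the label the answer chose for `j`.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "key.0": ["abc", "xab"],
        "key.1": ["def", "xcd"],
        "key.2": ["ghi", "xef"],
        "topic": [8, 9],
    },
    index=[1, 2],
)

# stubnames=["key"]: reshape the columns named key<sep><suffix>.
# i="topic": identifies each original row (its values must be unique).
# j="age": name of the new index level holding the suffixes 0, 1, 2.
# sep=".": the stub and the numeric suffix are separated by a dot.
long = pd.wide_to_long(df, stubnames=["key"], i="topic", j="age", sep=".")

# long is indexed by (topic, age); reset_index turns both into columns.
result = long.reset_index()
print(result)
```

Note that `i` must uniquely identify rows: if two rows shared the same topic, wide_to_long would raise a ValueError, whereas melt and stack have no such requirement.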