Python Pandas DataFrame: stacking multiple column values into a single column

Question

Asked by borice

Assuming the following DataFrame:

  key.0 key.1 key.2  topic
1   abc   def   ghi      8
2   xab   xcd   xef      9

How can I combine the values of all the key.* columns into a single column 'key', that's associated with the topic value corresponding to the key.* columns? This is the result I want:

   topic  key
1      8  abc
2      8  def
3      8  ghi
4      9  xab
5      9  xcd
6      9  xef

Note that the number of key.N columns is not fixed; it depends on some external N.

Answer 1

Accepted answer by Alexander

You can melt your dataframe:

>>> keys = [c for c in df if c.startswith('key.')]
>>> pd.melt(df, id_vars='topic', value_vars=keys, value_name='key')

   topic variable  key
0      8    key.0  abc
1      9    key.0  xab
2      8    key.1  def
3      9    key.1  xcd
4      8    key.2  ghi
5      9    key.2  xef

It also gives you the source of the key.

From v0.20, melt is a first-class method of the pd.DataFrame class:

>>> df.melt('topic', value_name='key').drop(columns='variable')

   topic  key
0      8  abc
1      9  xab
2      8  def
3      9  xcd
4      8  ghi
5      9  xef
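
The melt output above is grouped by source column rather than by topic, while the question asks for rows grouped by topic. A minimal sketch of one way to get that order, assuming the sample frame from the question; the stable sort and the index reset are additions that are not part of the original answer:

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame(
    {
        "key.0": ["abc", "xab"],
        "key.1": ["def", "xcd"],
        "key.2": ["ghi", "xef"],
        "topic": [8, 9],
    },
    index=[1, 2],
)

# Pick up however many key.N columns are present
keys = [c for c in df.columns if c.startswith("key.")]

result = (
    df.melt(id_vars="topic", value_vars=keys, value_name="key")
    # mergesort is stable, so the key.0, key.1, ... order within each topic is kept
    .sort_values("topic", kind="mergesort")
    .drop(columns="variable")
    .reset_index(drop=True)
)
print(result)
```

Sorting on topic alone with a stable algorithm avoids sorting the column names as strings, which would place key.10 before key.2.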

Answer 2

Answer by miraculixx

After trying various ways, I find the following is more or less intuitive, provided stack's magic is understood:

# keep topic as index, stack other columns 'against' it
stacked = df.set_index('topic').stack()
# set the name of the new series created
df = stacked.reset_index(name='key')
# drop the 'source' level (key.*)
df.drop('level_1', axis=1, inplace=True)

The resulting dataframe is as required:

   topic  key
0      8  abc
1      8  def
2      8  ghi
3      9  xab
4      9  xcd
5      9  xef

You may want to print intermediate results to understand the process in full. If you don't mind having more columns than needed, the key steps are set_index('topic'), stack(), and reset_index(name='key').
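
To make the intermediate results concrete, here is a small sketch that runs the three steps one at a time on the sample frame from the question. The variable names indexed, stacked, and tidy are illustrative and do not come from the answer:

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame(
    {
        "key.0": ["abc", "xab"],
        "key.1": ["def", "xcd"],
        "key.2": ["ghi", "xef"],
        "topic": [8, 9],
    },
    index=[1, 2],
)

# Step 1: topic becomes the row index, leaving only the key.* columns
indexed = df.set_index("topic")

# Step 2: stack() pivots the columns into an inner index level,
# giving a Series indexed by (topic, original column name)
stacked = indexed.stack()
print(stacked)

# Step 3: reset_index(name=...) turns the index levels back into columns;
# the unnamed inner level shows up as 'level_1'
tidy = stacked.reset_index(name="key")
print(tidy)

# Optional: drop the column that records which key.N each value came from
result = tidy.drop(columns="level_1")
```

Keeping level_1 instead of dropping it preserves the source column of each value, much like the variable column in the melt answer.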

Answer 3

Answer by YOBEN_S

OK, since another question has been marked as a duplicate of this one, I will answer here.

Using wide_to_long

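# sep='.' is needed to match the 'key.N' column names; 'age' is only an arbitrary name for the suffix column
pd.wide_to_long(df, ['key'], 'topic', 'age', sep='.').reset_index().drop(columns='age')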
Out[123]: 
   topic  key
0      8  abc
1      9  xab
2      8  def
3      9  xcd
4      8  ghi
5      9  xef
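
As a self-contained sketch of the wide_to_long approach on the question's data: the suffix column name "n" is a hypothetical choice, and the sort_index step is an addition to get rows grouped by topic. Note that wide_to_long requires the id column (topic here) to identify each row uniquely, which holds for the sample frame but is an assumption about real data.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame(
    {
        "key.0": ["abc", "xab"],
        "key.1": ["def", "xcd"],
        "key.2": ["ghi", "xef"],
        "topic": [8, 9],
    },
    index=[1, 2],
)

# stubnames='key' with sep='.' matches columns named key.<digits>;
# the numeric suffix lands in an index level named 'n' (hypothetical name)
long_df = pd.wide_to_long(df, stubnames="key", i="topic", j="n", sep=".")

result = (
    long_df.sort_index()      # order by topic, then by numeric suffix
    .reset_index()
    .drop(columns="n")
)
print(result)
```

Because the suffix is parsed as a number, sorting the index keeps key.2 ahead of key.10 when there are many key columns.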
