Unmelt Pandas DataFrame

Question

Asked by slaw

I have a pandas dataframe with two id variables:

import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 3],
                   'num': [10, 10, 12, 13, 14, 15],
                   'q': ['a', 'b', 'd', 'a', 'b', 'z'],
                   'v': [2, 4, 6, 8, 10, 12]})

   id  num  q   v
0   1   10  a   2
1   1   10  b   4
2   1   12  d   6
3   2   13  a   8
4   2   14  b  10
5   3   15  z  12

I can pivot the table with:

df.pivot(index='id', columns='q', values='v')

And end up with something close:

q    a   b   d   z
id                
1    2   4   6 NaN
2    8  10 NaN NaN
3  NaN NaN NaN  12

However, what I really want is (the original unmelted form):

id  num    a    b    d    z
 1   10    2    4  NaN  NaN
 1   12  NaN  NaN    6  NaN
 2   13    8  NaN  NaN  NaN
 2   14  NaN   10  NaN  NaN
 3   15  NaN  NaN  NaN   12

In other words:

'id' and 'num' are my indices (normally I've only seen either 'id' or 'num' used as the index, but I need both since I'm trying to recover the original unmelted form)
'q' are my columns
'v' are my values in the table

Update

I found a close solution on Wes McKinney's blog:

df.pivot_table(index=['id','num'], columns='q')

         v            
q        a   b   d   z
id num                
1  10    2   4 NaN NaN
   12  NaN NaN   6 NaN
2  13    8 NaN NaN NaN
   14  NaN  10 NaN NaN
3  15  NaN NaN NaN  12

However, the format is not quite the same as what I want above.

Answer 1

Accepted answer by khammel

You're really close, slaw. Just drop the extra column level and rename your column index to None, and you've got what you want.

df2 = df.pivot_table(index=['id','num'], columns='q')
# Drop the outer 'v' level, then clear the 'q' name on the column index
df2.columns = df2.columns.droplevel().rename(None)
df2.reset_index().fillna("null").to_csv("test.csv", sep="\t", index=False)

Note that the 'v' column is expected to be numeric by default so that it can be aggregated. Otherwise, pandas will raise:

DataError: No numeric types to aggregate

To resolve this, you can specify your own aggregation function by using a custom lambda function:

df2 = df.pivot_table(index=['id','num'], columns='q', aggfunc=lambda x: x)
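
Depending on the pandas version, an identity lambda may be rejected because it does not reduce each group to a single value. As a hedged alternative, the built-in 'first' aggregation keeps the one value found in each (id, num, q) cell and also works for non-numeric data. The sketch below assumes each cell holds at most one value; the frame `df_text` and its string values are made up for illustration.

```python
import pandas as pd

# Same layout as the question, but with string values in 'v' (illustrative data).
df_text = pd.DataFrame({'id': [1, 1, 1, 2, 2, 3],
                        'num': [10, 10, 12, 13, 14, 15],
                        'q': ['a', 'b', 'd', 'a', 'b', 'z'],
                        'v': ['x2', 'x4', 'x6', 'x8', 'x10', 'x12']})

wide_text = (df_text
             .pivot_table(index=['id', 'num'], columns='q',
                          values='v', aggfunc='first')  # keep the single value per cell
             .rename_axis(columns=None)                 # clear the 'q' label
             .reset_index())                            # 'id' and 'num' back as columns
print(wide_text)
```

If a cell could contain several values, 'first' would silently keep only one of them, so a different aggregation would be needed in that case.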

Answer 2

Answered by Zero

You could use set_index and unstack:

In [18]: df.set_index(['id', 'num', 'q'])['v'].unstack().reset_index()
Out[18]:
q  id  num    a     b    d     z
0   1   10  2.0   4.0  NaN   NaN
1   1   12  NaN   NaN  6.0   NaN
2   2   13  8.0   NaN  NaN   NaN
3   2   14  NaN  10.0  NaN   NaN
4   3   15  NaN   NaN  NaN  12.0

Answer 3

Answered by johnInHome

You can remove the column-index name q:

df1.columns=df1.columns.tolist()

Zero's answer plus removing q gives:

df1 = df.set_index(['id', 'num', 'q'])['v'].unstack().reset_index()
df1.columns=df1.columns.tolist()

   id  num    a     b    d     z
0   1   10  2.0   4.0  NaN   NaN
1   1   12  NaN   NaN  6.0   NaN
2   2   13  8.0   NaN  NaN   NaN
3   2   14  NaN  10.0  NaN   NaN
4   3   15  NaN   NaN  NaN  12.0
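
The same result can probably be written as a single chain by clearing the column-index name with rename_axis instead of reassigning the columns. The sketch below also melts the wide frame back as a sanity check that the reshaping is a true inverse; the names `wide` and `long_again` are illustrative, not from the answers above.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 3],
                   'num': [10, 10, 12, 13, 14, 15],
                   'q': ['a', 'b', 'd', 'a', 'b', 'z'],
                   'v': [2, 4, 6, 8, 10, 12]})

wide = (df.set_index(['id', 'num', 'q'])['v']
          .unstack()                   # 'q' values become columns
          .rename_axis(columns=None)   # drop the 'q' label from the column index
          .reset_index())              # 'id' and 'num' back as ordinary columns

# Round trip: melting the wide frame and dropping the NaN fillers
# should give back the original long rows.
long_again = (wide.melt(id_vars=['id', 'num'], var_name='q', value_name='v')
                  .dropna(subset=['v'])
                  .sort_values(['id', 'num', 'q'])
                  .reset_index(drop=True))
long_again['v'] = long_again['v'].astype(int)  # unstack introduced floats via NaN
print(long_again)
```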

Answer 4

Answered by Hillary Murefu

This might work just fine:

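Pivot

df2 = df.pivot_table(index=['id', 'num'], columns='q', values=['v']).reset_index()

Concatenate the 1st-level column names with the 2nd (passing values as a list keeps the two-level column index that this step relies on)

df2.columns = [s1 + str(s2) for (s1, s2) in df2.columns.tolist()]

The resulting column names are id, num, va, vb, vd and vz.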

Answer 5

Answered by slaw

Came up with a close solution:

df2 = df.pivot_table(index=['id','num'], columns='q')
df2.columns = df2.columns.droplevel()
df2.reset_index().fillna("null").to_csv("test.csv", sep="\t", index=None)

Still can't figure out how to drop 'q' from the dataframe.

Answer 6

Answered by Quant Christo

It can be done in three steps:

#1: Prepare auxiliary column 'id_num':
df['id_num'] = df[['id', 'num']].apply(tuple, axis=1)
df = df.drop(columns=['id', 'num'])

#2: 'pivot' is almost an inverse of melt:
df, df.columns.name = df.pivot(index='id_num', columns='q', values='v').reset_index(), ''

#3: Bring back 'id' and 'num' columns:
df['id'], df['num'] = zip(*df['id_num'])
df = df.drop(columns=['id_num'])

This gives the result, but with a different column order:

     a     b    d     z  id  num
0  2.0   4.0  NaN   NaN   1   10
1  NaN   NaN  6.0   NaN   1   12
2  8.0   NaN  NaN   NaN   2   13
3  NaN  10.0  NaN   NaN   2   14
4  NaN   NaN  NaN  12.0   3   15

Alternatively, with the proper column order:

def multiindex_pivot(df, columns=None, values=None):
    # Inspired by: https://github.com/pandas-dev/pandas/issues/23955
    names = list(df.index.names)
    df = df.reset_index()
    list_index = df[names].values
    tuples_index = [tuple(i) for i in list_index]  # tuples are hashable
    df = df.assign(tuples_index=tuples_index)
    df = df.pivot(index="tuples_index", columns=columns, values=values)
    tuples_index = df.index  # unique tuples after pivoting
    index = pd.MultiIndex.from_tuples(tuples_index, names=names)
    df.index = index
    df = df.reset_index()  # added: turn 'id' and 'num' back into columns
    df.columns.name = ''   # added: blank out the 'q' column-index name
    return df

df = df.set_index(['id', 'num'])
df = multiindex_pivot(df, columns='q', values='v')
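
On pandas 1.1 or later, DataFrame.pivot accepts a list for index, so the tuple workaround above may no longer be needed. A minimal sketch under that version assumption (the name `wide` is illustrative):

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 3],
                   'num': [10, 10, 12, 13, 14, 15],
                   'q': ['a', 'b', 'd', 'a', 'b', 'z'],
                   'v': [2, 4, 6, 8, 10, 12]})

# Assumes pandas >= 1.1, where `index` may be a list of column names.
wide = (df.pivot(index=['id', 'num'], columns='q', values='v')
          .rename_axis(columns=None)   # clear the 'q' label on the columns
          .reset_index())              # 'id' and 'num' come first, in order
print(wide)
```

Unlike pivot_table, pivot does not aggregate: if the same (id, num, q) combination appears more than once, it raises a ValueError instead of combining the values.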
