Unmelt Pandas DataFrame
Notice: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original address, and attribute it to the original authors (not me): Stack Overflow
Original address: http://stackoverflow.com/questions/31306741/
Asked by slaw
I have a pandas dataframe with two id variables:
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 3],
                   'num': [10, 10, 12, 13, 14, 15],
                   'q': ['a', 'b', 'd', 'a', 'b', 'z'],
                   'v': [2, 4, 6, 8, 10, 12]})
   id  num  q   v
0   1   10  a   2
1   1   10  b   4
2   1   12  d   6
3   2   13  a   8
4   2   14  b  10
5   3   15  z  12
I can pivot the table with:
df.pivot(index='id', columns='q', values='v')
And end up with something close:
q    a   b   d   z
id                
1    2   4   6 NaN
2    8  10 NaN NaN
3  NaN NaN NaN  12
However, what I really want is (the original unmelted form):
id   num   a   b   d   z               
1    10   2   4 NaN NaN
1    12 NaN NaN   6 NaN  
2    13   8 NaN NaN NaN
2    14 NaN  10 NaN NaN
3    15 NaN NaN NaN  12
In other words:
- 'id' and 'num' are my indices (normally, I've only seen either 'id' or 'num' used as the index, but I need both since I'm trying to retrieve the original unmelted form)
- 'q' holds my column names
- 'v' holds the values in the table
Update
I found a close solution from Wes McKinney's blog:
df.pivot_table(index=['id','num'], columns='q')
         v            
q        a   b   d   z
id num                
1  10    2   4 NaN NaN
   12  NaN NaN   6 NaN
2  13    8 NaN NaN NaN
   14  NaN  10 NaN NaN
3  15  NaN NaN NaN  12
However, the format is not quite the same as what I want above.
Accepted answer by khammel
You're really close, slaw. Just rename your column index to None and you've got what you want.
df2 = df.pivot_table(index=['id','num'], columns='q')
df2.columns = df2.columns.droplevel().rename(None)
df2.reset_index().fillna("null").to_csv("test.csv", sep="\t", index=False)
Note that the 'v' column is expected to be numeric by default so that it can be aggregated. Otherwise, pandas will raise:
DataError: No numeric types to aggregate
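To resolve this, you can specify your own aggregation function by using a custom lambda function. The lambda has to reduce each group to a single value (a plain identity lambda x: x does not, and recent pandas versions may reject it), so here it picks the first value:
df2 = df.pivot_table(index=['id','num'], columns='q', aggfunc=lambda x: x.iloc[0])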
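As a rough illustration of the non-numeric case, the sketch below swaps in a made-up string-valued 'v' column (the name df_str and its values are assumptions, not part of the original question) and uses the built-in 'first' aggregation instead of a lambda. It is one possible way to get around the DataError, not necessarily the one the answer had in mind.

```python
import pandas as pd

# Hypothetical variant of the question's data: 'v' holds strings, not numbers.
df_str = pd.DataFrame({'id': [1, 1, 1, 2, 2, 3],
                       'num': [10, 10, 12, 13, 14, 15],
                       'q': ['a', 'b', 'd', 'a', 'b', 'z'],
                       'v': ['two', 'four', 'six', 'eight', 'ten', 'twelve']})

# 'first' keeps the first value of each (id, num, q) group, so no numeric
# aggregation is required.
wide_str = df_str.pivot_table(index=['id', 'num'], columns='q',
                              values='v', aggfunc='first')
wide_str.columns.name = None      # drop the 'q' label from the column axis
wide_str = wide_str.reset_index()
print(wide_str)
```

Because every (id, num, q) combination occurs only once in this data, 'first' simply passes the single value through.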
Answer by Zero
You could use set_index and unstack:
In [18]: df.set_index(['id', 'num', 'q'])['v'].unstack().reset_index()
Out[18]:
q  id  num    a     b    d     z
0   1   10  2.0   4.0  NaN   NaN
1   1   12  NaN   NaN  6.0   NaN
2   2   13  8.0   NaN  NaN   NaN
3   2   14  NaN  10.0  NaN   NaN
4   3   15  NaN   NaN  NaN  12.0
Answer by johnInHome
You can remove the name q from the column axis:
df1.columns=df1.columns.tolist()
Zero's answer plus removing q:
df1 = df.set_index(['id', 'num', 'q'])['v'].unstack().reset_index()
df1.columns=df1.columns.tolist()
   id  num    a     b    d     z
0   1   10  2.0   4.0  NaN   NaN
1   1   12  NaN   NaN  6.0   NaN
2   2   13  8.0   NaN  NaN   NaN
3   2   14  NaN  10.0  NaN   NaN
4   3   15  NaN   NaN  NaN  12.0
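One way to convince yourself that this really is the "unmelted" form is to melt it again and compare with the starting data. The sketch below does that round trip; the variable names wide and long_again are illustrative only, and the cast back to int assumes the original 'v' values were integers, as in the question.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 3],
                   'num': [10, 10, 12, 13, 14, 15],
                   'q': ['a', 'b', 'd', 'a', 'b', 'z'],
                   'v': [2, 4, 6, 8, 10, 12]})

# Unmelt: set_index + unstack, then flatten the column axis.
wide = df.set_index(['id', 'num', 'q'])['v'].unstack().reset_index()
wide.columns = wide.columns.tolist()

# Melt again, drop the NaN cells that unstack introduced, restore int values.
long_again = wide.melt(id_vars=['id', 'num'], var_name='q', value_name='v')
long_again = long_again.dropna(subset=['v'])
long_again['v'] = long_again['v'].astype(int)
long_again = long_again.sort_values(['id', 'num', 'q']).reset_index(drop=True)

print(long_again)
```

After sorting, long_again holds the same rows as the original df.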
Answer by Hillary Murefu
This might work just fine:
- Pivot (leave out values= so that the columns keep two levels: 'v' on top, the q labels below)
df2 = df.pivot_table(index=['id', 'num'], columns='q').reset_index()
- Concatenate the 1st-level column names with the 2nd, which gives id, num, va, vb, vd, vz
df2.columns = [s1 + str(s2) for (s1, s2) in df2.columns.tolist()]
Answer by slaw
I came up with a close solution:
df2 = df.pivot_table(index=['id','num'], columns='q')
df2.columns = df2.columns.droplevel()
df2.reset_index().fillna("null").to_csv("test.csv", sep="\t", index=False)
I still can't figure out how to drop 'q' from the dataframe.
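The leftover 'q' is not a column but the name of the column axis, so one possible way to drop it is to set that name to None. The sketch below assumes the question's df and passes values='v' so that no droplevel() call is needed; the variable name out is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 3],
                   'num': [10, 10, 12, 13, 14, 15],
                   'q': ['a', 'b', 'd', 'a', 'b', 'z'],
                   'v': [2, 4, 6, 8, 10, 12]})

# values='v' keeps the columns single-level, so only the axis name remains.
df2 = df.pivot_table(index=['id', 'num'], columns='q', values='v')

# 'q' is the name of the column Index; clearing it removes the label.
df2.columns.name = None

out = df2.reset_index()
print(out)
```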
Answer by Quant Christo
It can be done in three steps:
#1: Prepare auxiliary column 'id_num':
df['id_num'] = df[['id', 'num']].apply(tuple, axis=1)
df = df.drop(columns=['id', 'num'])
#2: 'pivot' is almost an inverse of melt:
df = df.pivot(index='id_num', columns='q', values='v').reset_index()
df.columns.name = ''
#3: Bring back 'id' and 'num' columns:
df['id'], df['num'] = zip(*df['id_num'])
df = df.drop(columns=['id_num'])
This gives the result, but with a different column order:
     a     b    d     z  id  num
0  2.0   4.0  NaN   NaN   1   10
1  NaN   NaN  6.0   NaN   1   12
2  8.0   NaN  NaN   NaN   2   13
3  NaN  10.0  NaN   NaN   2   14
4  NaN   NaN  NaN  12.0   3   15
Alternatively, with the proper column order:
def multiindex_pivot(df, columns=None, values=None):
    #inspired by: https://github.com/pandas-dev/pandas/issues/23955
    names = list(df.index.names)
    df = df.reset_index()
    list_index = df[names].values
    tuples_index = [tuple(i) for i in list_index] # hashable
    df = df.assign(tuples_index=tuples_index)
    df = df.pivot(index="tuples_index", columns=columns, values=values)
    tuples_index = df.index  # reduced
    index = pd.MultiIndex.from_tuples(tuples_index, names=names)
    df.index = index
    df = df.reset_index() #me
    df.columns.name = ''  #me
    return df
# Start again from the original df (before the three steps above).
df = df.set_index(['id', 'num'])
df = multiindex_pivot(df, columns='q', values='v')
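For comparison, here is a shorter sketch that assumes pandas 1.1 or later, where DataFrame.pivot accepts a list for index, so the tuple workaround may no longer be needed. The variable name wide is illustrative, and this is an alternative to the helper above rather than a restatement of it.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 3],
                   'num': [10, 10, 12, 13, 14, 15],
                   'q': ['a', 'b', 'd', 'a', 'b', 'z'],
                   'v': [2, 4, 6, 8, 10, 12]})

# Assumes pandas >= 1.1: 'index' may be a list of column names.
wide = df.pivot(index=['id', 'num'], columns='q', values='v').reset_index()
wide.columns.name = None   # remove the 'q' label from the column axis
print(wide)
```

Unlike pivot_table, pivot does not aggregate, so it raises an error if an (id, num, q) combination appears more than once.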

