Original question: http://stackoverflow.com/questions/34417685/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Melt the Upper Triangular Matrix of a Pandas Dataframe
Asked by Ramón J Romero y Vigil
Given a square pandas DataFrame of the following form:
     a    b    c
a    1   .5   .3
b   .5    1   .4
c   .3   .4    1
How can I melt only the upper triangle to get
Row  Column  Value
a    a       1
a    b       .5
a    c       .3
b    b       1
b    c       .4
c    c       1
#Note the combination a,b is only listed once. There is no b,a listing
I'm more interested in an idiomatic pandas solution; a custom indexer would be easy enough to write by hand.
Thank you in advance for your consideration and response.
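For readers who want to reproduce the question, here is a minimal sketch that builds the example frame and produces the desired output with plain loops, that is, the hand-written approach the asker mentions and would like to avoid. The helper name melt_upper_by_hand is hypothetical and not part of pandas; it is only a baseline to compare the answers against.

```python
import pandas as pd

# The example frame from the question (symmetric, labels a/b/c).
labels = list("abc")
df = pd.DataFrame(
    [[1.0, 0.5, 0.3],
     [0.5, 1.0, 0.4],
     [0.3, 0.4, 1.0]],
    index=labels, columns=labels,
)

def melt_upper_by_hand(frame):
    """Hypothetical baseline: walk the upper triangle with plain loops."""
    records = []
    for i, row in enumerate(frame.index):
        # Only the columns on or to the right of the diagonal.
        for col in frame.columns[i:]:
            records.append((row, col, frame.at[row, col]))
    return pd.DataFrame(records, columns=["Row", "Column", "Value"])

expected = melt_upper_by_hand(df)
print(expected)
```

This assumes the row and column labels appear in the same order, which holds for the example frame.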
Answered by jezrael
First I convert the values of df below the diagonal to NaN using where and numpy.triu, then apply stack and reset_index, and finally set the column names:
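import numpy as np
import pandas as pd

# the example frame from the question
df = pd.DataFrame([[1, .5, .3], [.5, 1, .4], [.3, .4, 1]],
                  index=list('abc'), columns=list('abc'))
print(df)
     a    b    c
a  1.0  0.5  0.3
b  0.5  1.0  0.4
c  0.3  0.4  1.0

# boolean mask that is True on and above the diagonal
# (np.bool was removed from recent NumPy releases; the builtin bool works)
print(np.triu(np.ones(df.shape)).astype(bool))
[[ True  True  True]
 [False  True  True]
 [False False  True]]

# keep the upper triangle; everything below the diagonal becomes NaN
df = df.where(np.triu(np.ones(df.shape)).astype(bool))
print(df)
     a    b    c
a  1.0  0.5  0.3
b  NaN  1.0  0.4
c  NaN  NaN  1.0

# stack to long format (the NaN cells are dropped by stack's classic default)
df = df.stack().reset_index()
df.columns = ['Row','Column','Value']
print(df)
  Row Column  Value
0   a      a    1.0
1   a      b    0.5
2   a      c    0.3
3   b      b    1.0
4   b      c    0.4
5   c      c    1.0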
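The where / stack / reset_index steps can be wrapped into one chained expression. The sketch below is a hedged variant, not the answerer's own code: the function name melt_upper_triangle is hypothetical, and the explicit dropna() is added on the assumption that some pandas versions (the newer stack implementation, to my understanding) keep NaN entries instead of dropping them. Note that dropna() would also remove any genuine NaN values in the upper triangle.

```python
import numpy as np
import pandas as pd

def melt_upper_triangle(frame):
    """Hypothetical helper: long-format view of a square frame's upper triangle."""
    # True on and above the main diagonal, False below it.
    mask = np.triu(np.ones(frame.shape, dtype=bool))
    return (
        frame.where(mask)                     # lower triangle -> NaN
             .stack()                         # wide -> long, indexed by (row, column)
             .dropna()                        # explicit, in case stack keeps NaN
             .rename_axis(["Row", "Column"])  # name the two index levels
             .reset_index(name="Value")       # index levels -> ordinary columns
    )

df = pd.DataFrame([[1, .5, .3], [.5, 1, .4], [.3, .4, 1]],
                  index=list("abc"), columns=list("abc"))
result = melt_upper_triangle(df)
print(result)
```

Naming the index levels before reset_index avoids the separate df.columns assignment, and the original frame is left unmodified.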
Answered by Matthew Davis
Building on the solution by @jezrael, boolean indexing would be a more explicit approach:
import numpy as np
from pandas import DataFrame

df = DataFrame({'a':[1,.5,.3],'b':[.5,1,.4],'c':[.3,.4,1]},index=list('abc'))
print(df, '\n')
# flatten the upper-triangle mask so it lines up with the stacked (row-major) values
keep = np.triu(np.ones(df.shape)).astype('bool').reshape(df.size)
print(df.stack()[keep])
Output:
     a    b    c
a  1.0  0.5  0.3
b  0.5  1.0  0.4
c  0.3  0.4  1.0

a  a    1.0
   b    0.5
   c    0.3
b  b    1.0
   c    0.4
c  c    1.0
dtype: float64
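A related option, sketched here as an alternative rather than as part of either answer, is to skip the mask entirely and take the integer positions of the upper triangle from numpy.triu_indices. This yields the Row / Column / Value layout from the question directly and never passes through NaN. It assumes a square frame whose row and column labels are in the same order.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, .5, .3], 'b': [.5, 1, .4], 'c': [.3, .4, 1]},
                  index=list('abc'))

# Integer positions of the cells on and above the diagonal, in row-major order.
rows, cols = np.triu_indices(len(df))

melted = pd.DataFrame({
    "Row": df.index[rows],                # row label of each kept cell
    "Column": df.columns[cols],           # column label of each kept cell
    "Value": df.to_numpy()[rows, cols],   # the cell values themselves
})
print(melted)
```

Passing k=1 to np.triu_indices would leave out the diagonal, which can be handy for correlation matrices where the diagonal is always 1.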