Notice: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow. Original question: http://stackoverflow.com/questions/34417685/

Time: 2020-09-14 00:25:03  Source: igfitidea

Melt the Upper Triangular Matrix of a Pandas Dataframe

python pandas numpy reshape melt

Asked by Ramón J Romero y Vigil

Given a square pandas DataFrame of the following form:

   a  b  c
a  1 .5 .3
b .5  1 .4
c .3 .4  1

How can I melt only the upper triangle to get

 Row     Column    Value
  a        a       1
  a        b       .5 
  a        c       .3
  b        b       1
  b        c       .4
  c        c       1 

# Note: the combination a,b is listed only once. There is no b,a listing.

I'm more interested in an idiomatic pandas solution; a custom indexer would be easy enough to write by hand.

Thank you in advance for your consideration and response.

Answered by jezrael

First I convert the values of df below the diagonal to NaN using where and numpy.triu, then stack, reset_index, and set the column names:

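import numpy as np

# df is the square DataFrame from the question
print(df)
     a    b    c
a  1.0  0.5  0.3
b  0.5  1.0  0.4
c  0.3  0.4  1.0

# Boolean mask that is True on and above the diagonal
# (np.bool was removed from recent NumPy releases; the builtin bool works)
print(np.triu(np.ones(df.shape)).astype(bool))
[[ True  True  True]
 [False  True  True]
 [False False  True]]

# Keep the upper triangle; every cell below the diagonal becomes NaN
df = df.where(np.triu(np.ones(df.shape)).astype(bool))
print(df)
     a    b    c
a  1.0  0.5  0.3
b  NaN  1.0  0.4
c  NaN  NaN  1.0

# Stack into long form (stack() dropped NaN cells by default in the pandas
# versions this answer was written for), then turn the labels into columns
df = df.stack().reset_index()
df.columns = ['Row','Column','Value']
print(df)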

  Row Column  Value
0   a      a    1.0
1   a      b    0.5
2   a      c    0.3
3   b      b    1.0
4   b      c    0.4
5   c      c    1.0
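
For convenience, the steps above can be wrapped in a small helper. This is only a sketch: the function name melt_upper_triangle and its names parameter are made up for illustration, not part of pandas. It also calls dropna() explicitly after stack(), on the assumption that you may be running a pandas version where stack() no longer removes NaN cells on its own. Note that this would also drop any genuine NaN values sitting in the upper triangle.

```python
import numpy as np
import pandas as pd


def melt_upper_triangle(square, names=("Row", "Column", "Value")):
    """Hypothetical helper: upper triangle (diagonal included) in long form."""
    # True on and above the diagonal, False below it
    mask = np.triu(np.ones(square.shape, dtype=bool))
    # Blank out the lower triangle, go to long form, drop the blanked cells
    long_form = square.where(mask).stack().dropna().reset_index()
    long_form.columns = list(names)
    return long_form


df = pd.DataFrame(
    [[1, .5, .3], [.5, 1, .4], [.3, .4, 1]],
    index=list("abc"),
    columns=list("abc"),
)
result = melt_upper_triangle(df)
print(result)
```

If the diagonal is not wanted, np.triu accepts an offset, so np.triu(np.ones(square.shape, dtype=bool), k=1) keeps only the cells strictly above it.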

Answered by Matthew Davis

Building on the solution by @jezrael, boolean indexing would be a more explicit approach:

import numpy as np
from pandas import DataFrame

df = DataFrame({'a':[1,.5,.3],'b':[.5,1,.4],'c':[.3,.4,1]},index=list('abc'))
print(df, '\n')
# Flatten the upper-triangle mask so it lines up with the stacked values
keep = np.triu(np.ones(df.shape)).astype('bool').reshape(df.size)
print(df.stack()[keep])

output:

     a    b    c
a  1.0  0.5  0.3
b  0.5  1.0  0.4
c  0.3  0.4  1.0 

a  a    1.0
   b    0.5
   c    0.3
b  b    1.0
   c    0.4
c  c    1.0
dtype: float64
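
The boolean-indexing result above is a Series with a two-level index rather than the Row/Column/Value table asked for in the question. A rough sketch of the last step, assuming the same df as in this answer (the variable names stacked, upper and long_df are just illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": [1, .5, .3], "b": [.5, 1, .4], "c": [.3, .4, 1]},
    index=list("abc"),
)

# stack() walks the frame row by row, so a row-major flattening of the
# mask lines up element for element with the stacked Series.
keep = np.triu(np.ones(df.shape, dtype=bool)).ravel()
stacked = df.stack()
upper = stacked[keep]

# Name the two index levels, then move them into ordinary columns
long_df = upper.rename_axis(["Row", "Column"]).reset_index(name="Value")
print(long_df)
```

This relies on the frame having no missing values, so that the stacked Series has exactly df.size entries and the flattened mask has the same length.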