Original question: http://stackoverflow.com/questions/24475094/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Set values on the diagonal of pandas.DataFrame
Asked by Tim
I have a pandas DataFrame and I would like to set its diagonal to 0:
import numpy
import pandas
df = pandas.DataFrame(numpy.random.rand(5,5))
df
Out[6]:
0 1 2 3 4
0 0.536596 0.674319 0.032815 0.908086 0.215334
1 0.735022 0.954506 0.889162 0.711610 0.415118
2 0.119985 0.979056 0.901891 0.687829 0.947549
3 0.186921 0.899178 0.296294 0.521104 0.638924
4 0.354053 0.060022 0.275224 0.635054 0.075738
5 rows × 5 columns
Now I want to set the diagonal to 0:
for i in range(len(df.index)):
    for j in range(len(df.columns)):
        if i==j:
            df.loc[i,j] = 0
df
Out[9]:
0 1 2 3 4
0 0.000000 0.674319 0.032815 0.908086 0.215334
1 0.735022 0.000000 0.889162 0.711610 0.415118
2 0.119985 0.979056 0.000000 0.687829 0.947549
3 0.186921 0.899178 0.296294 0.000000 0.638924
4 0.354053 0.060022 0.275224 0.635054 0.000000
5 rows × 5 columns
But there must be a more Pythonic way than that!?
Answer by unutbu
In [21]: df.values[tuple([np.arange(df.shape[0])] * 2)] = 0  # tuple of index arrays; recent NumPy no longer treats a list as a multi-axis index
In [22]: df
Out[22]:
0 1 2 3 4
0 0.000000 0.931374 0.604412 0.863842 0.280339
1 0.531528 0.000000 0.641094 0.204686 0.997020
2 0.137725 0.037867 0.000000 0.983432 0.458053
3 0.594542 0.943542 0.826738 0.000000 0.753240
4 0.357736 0.689262 0.014773 0.446046 0.000000
Note that this will only work if df has the same number of rows as columns. Another way, which will work for arbitrary shapes, is to use np.fill_diagonal:
In [36]: np.fill_diagonal(df.values, 0)
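As a rough sketch of the np.fill_diagonal route on a non-square frame: the version below fills an explicit copy of the data and rebuilds the frame, so it does not rely on df.values being a writable view of the DataFrame (which, as far as I know, depends on the dtypes and the pandas configuration). The 3 x 4 example data and the names arr and result are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative 3 x 4 frame (more columns than rows), values 1.0 .. 12.0
df = pd.DataFrame(np.arange(1.0, 13.0).reshape(3, 4))

# Fill the diagonal on an explicit copy, then rebuild with the same labels
arr = df.to_numpy(copy=True)
np.fill_diagonal(arr, 0)
result = pd.DataFrame(arr, index=df.index, columns=df.columns)
```

Here df itself is left untouched and result carries the zeroed diagonal; assigning result back to df gives the in-place effect of the one-liner above.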
Answer by Pietro Battiston
Both approaches in unutbu's answer assume that labels are irrelevant (they operate on the underlying values).
The OP's code works with .loc and so is label-based instead (i.e. it puts a 0 in cells whose row and column have the same label, rather than in cells located on the diagonal; admittedly, this makes no difference in the specific example given, in which labels are just positions).
Being in need of "label-based" diagonal filling (I was working with a DataFrame describing an incomplete adjacency matrix), the simplest approach I could come up with was:
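import pandas as pd

def pd_fill_diagonal(df, value):
    # Labels that appear in both the index and the columns
    idces = df.index.intersection(df.columns)
    # Long format: one entry per (row label, column label) pair, NaNs kept
    stacked = df.stack(dropna=False)
    # Overwrite the entries whose row and column labels match
    stacked.update(pd.Series(value,
                             index=pd.MultiIndex.from_arrays([idces,
                                                              idces])))
    df.loc[:, :] = stacked.unstack()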
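To make the label-versus-position distinction concrete, here is a small alternative sketch that avoids stack/unstack by looking up the positions of the shared labels. The helper name fill_label_diagonal and the toy adjacency frame are hypothetical, and the sketch assumes unique labels and a single numeric dtype; it returns a new frame rather than modifying the input.

```python
import numpy as np
import pandas as pd

def fill_label_diagonal(df, value):
    """Hypothetical helper: return a copy with df.loc[k, k] = value for shared labels k."""
    shared = df.index.intersection(df.columns)
    rows = df.index.get_indexer(shared)    # positions of shared labels in the index
    cols = df.columns.get_indexer(shared)  # positions of shared labels in the columns
    arr = df.to_numpy(copy=True)           # assumes one numeric dtype
    arr[rows, cols] = value
    return pd.DataFrame(arr, index=df.index, columns=df.columns)

# Incomplete adjacency matrix: rows a, b, c but columns b, c, d
adj = pd.DataFrame(np.ones((3, 3)), index=["a", "b", "c"], columns=["b", "c", "d"])
filled = fill_label_diagonal(adj, 0)
```

In this toy frame the cells (b, b) and (c, c) are zeroed, while the positional diagonal cell (a, b) keeps its value, which is exactly where the label-based and value-based approaches differ.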
Answer by Philipp Schwarz
This solution is vectorized and very fast and, unlike the other suggested solutions, works for any column names and any size of the df matrix.
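import pandas as pd

def pd_fill_diagonal(df_matrix, value=0):
    mat = df_matrix.values
    # Length of the main diagonal, also for non-square frames
    n = min(mat.shape)
    mat[range(n), range(n)] = value
    # Keep the original row and column labels
    return pd.DataFrame(mat, index=df_matrix.index, columns=df_matrix.columns)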
Performance on a DataFrame with 507 rows and 507 columns:
%timeit pd_fill_diagonal(df, 0)
1000 loops, best of 3: 145 μs per loop
Answer by Andrew Louw
Using np.fill_diagonal(df.values, 1) is the easiest, but you need to make sure your columns all have the same data type. I had a mixture of np.float64 and Python floats, and it would only affect the NumPy values. To fix this, you have to cast everything to a NumPy type.
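A minimal sketch of that cast-first workaround, assuming the mixed columns can all be converted to float64. The two-column frame is invented for illustration, with one object column standing in for the "Python floats" case.

```python
import numpy as np
import pandas as pd

# Mixed dtypes: a float64 column and an object column holding Python floats
df = pd.DataFrame({
    "a": np.array([1.0, 2.0]),
    "b": pd.Series([3.0, 4.0], dtype=object),
})

# Cast everything to one NumPy dtype, fill the diagonal on a copy, and rebuild
arr = df.astype(np.float64).to_numpy(copy=True)
np.fill_diagonal(arr, 1)
result = pd.DataFrame(arr, index=df.index, columns=df.columns)
```

After the cast every column of result is float64, so the diagonal fill reaches all of them.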
Answer by qed
Here is a hack that worked for me:
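import numpy as np
import pandas as pd

def set_diag(self, values):
    n = min(len(self.index), len(self.columns))
    # Tuple of index arrays; recent NumPy no longer treats a list as a multi-axis index
    self.values[tuple([np.arange(n)] * 2)] = values

pd.DataFrame.set_diag = set_diag
x = pd.DataFrame(np.random.randn(10, 5))
x.set_diag(0)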

