pandas: use np.where but keep the existing value if the condition is False

Question

Asked by Woody Pride

I like np.where, but have never fully got to grips with it.

I have a dataframe; let's say it looks like this:

import pandas as pd
import numpy as np
from numpy import nan as NA
DF = pd.DataFrame({'a' : [ 3, 0, 1, 0, 1, 14, 2, 0, 0, 0, 0],
                   'b' : [ 3, 0, 1, 0, 1, 14, 2, 0, 0, 0, 0],
                   'c' : [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                   'd' : [ 5, 1, 2, 1, 1, 22, 30, 1, 0, 0, 0]})

Now what I want to do is replace the 0 values with NaN when all values in a row are zero. Critically, I want to keep whatever values are already in the row whenever the row is not all zeros.

I want to do something like this:

cols = ['a', 'b', 'c', 'd']
condition = (DF[cols] == 0).all(axis=1)
for col in cols:
    DF[col] = np.where(condition, NA, ???)

I put the ??? to indicate that I do not know what value to place there if the condition is False, I just want to preserve whatever is there already. Is this possible with np.where, or should I use another technique?

Answer 1

Answered by JaminSore

There is a pandas.Series method (where, incidentally) for exactly this kind of task. It seems a little backward at first, but from the documentation:

Series.where(cond, other=nan, inplace=False, axis=None, level=None, try_cast=False, raise_on_error=True)
Return an object of same shape as self and whose corresponding entries are from self where cond is True and otherwise are from other.
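
To make the "backward" feel concrete, here is a minimal sketch on a made-up Series (the sample values are not from the question). The condition passed to where describes what to keep, not what to replace; Series.mask, shown for contrast, is the inverse.

```python
import numpy as np
import pandas as pd

s = pd.Series([3, 0, 1, 0])  # made-up sample data

# Keep values where the condition is True; everything else becomes `other`
# (NaN by default), so the condition describes what to KEEP.
kept = s.where(s != 0)

# Series.mask is the inverse: it replaces values where the condition is True.
masked = s.mask(s == 0)

print(kept.tolist())
print(kept.equals(masked))
```

Because NaN is a float, the result is upcast to float64 even though the input held integers.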

So, your example would become

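cols = ['a', 'b', 'c', 'd']
condition = (DF[cols] == 0).all(axis=1)
for col in cols:
    # keep values where the row is NOT all zeros; assign the result back,
    # since inplace=True on DF[col] may not update DF in recent pandas
    DF[col] = DF[col].where(~condition, np.nan)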

But if all you're trying to do is replace rows of all zeros with NA for a specific set of columns, you could do this instead:

DF.loc[condition, cols] = NA
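
A small, self-contained sketch of the .loc assignment on a made-up frame. The up-front cast to float is an addition here, not part of the answer; it is assumed to be a safe way to make sure NaN fits the column dtype regardless of pandas version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [3, 0, 1], 'b': [0, 0, 2]})  # made-up sample data
cols = ['a', 'b']

# Cast to float first (an assumption of this sketch) so NaN is a valid value.
df = df.astype(float)

# True only for rows where every selected column is 0 (row 1 here).
condition = (df[cols] == 0).all(axis=1)

# Overwrite just those rows; all other rows keep their existing values.
df.loc[condition, cols] = np.nan
print(df)
```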

EDIT

To answer your original question, np.where follows the same broadcasting rules as other array operations, so you would replace ??? with DF[col], changing your example to:

cols = ['a', 'b', 'c', 'd']
condition = (DF[cols] == 0).all(axis=1)
for col in cols:
    DF[col] = np.where(condition, NA, DF[col])
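
Since broadcasting is what makes this work, the loop can probably be dropped altogether. The following is a sketch under that assumption, on a shortened made-up frame: the row mask is reshaped to a column so it broadcasts across all selected columns in a single np.where call. The names row_mask, values, and result are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [3, 0, 1],
                   'b': [3, 0, 1],
                   'c': [0, 0, 0],
                   'd': [5, 0, 2]})  # made-up sample data
cols = ['a', 'b', 'c', 'd']

# True for rows in which every selected column is 0 (only row 1 here).
condition = (df[cols] == 0).all(axis=1)

# Reshape the row mask to (n_rows, 1) so it broadcasts across the columns.
row_mask = condition.to_numpy()[:, None]
values = np.where(row_mask, np.nan, df[cols].to_numpy())

# Build a new frame rather than assigning in place; the NaN forces float dtype.
result = pd.DataFrame(values, index=df.index, columns=cols)
print(result)
```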

Answer 2

Answered by Arvind Subramaniam

You can do something like this:

    array_binary = np.where(array < threshold, 0, 1)
    array_sparse = np.multiply(array_binary, array)

This does an element-wise multiplication of the binary mask and the original array using np.multiply. Elements at or above the threshold are therefore kept, and the rest become 0. array_sparse is the sparse version of array.
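
A runnable sketch of this masking idea, assuming the goal is to zero out entries below a threshold while keeping the others. The sample array and threshold are made up. For the surviving values to be preserved, the mask has to be multiplied by the original array; the last line shows what appears to be an equivalent single np.where call.

```python
import numpy as np

array = np.array([0.2, 1.5, 0.05, 3.0])  # made-up sample data
threshold = 1.0                          # made-up cutoff

# 0 where the element is below the threshold, 1 elsewhere.
array_binary = np.where(array < threshold, 0, 1)

# Multiplying the mask by the original array keeps the surviving values.
array_sparse = np.multiply(array_binary, array)

# Same result in one call: pass the array itself as the "else" branch.
same = np.where(array < threshold, 0, array)
print(array_sparse)
```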
