pandas: using np.where but keeping existing values if the condition is False
Original question: http://stackoverflow.com/questions/25717397/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Using np.where but maintaining existing values if condition is False
Asked by Woody Pride
I like np.where, but have never fully got to grips with it.
I have a DataFrame; let's say it looks like this:
import pandas as pd
import numpy as np
from numpy import nan as NA
DF = pd.DataFrame({'a': [3, 0, 1, 0, 1, 14, 2, 0, 0, 0, 0],
                   'b': [3, 0, 1, 0, 1, 14, 2, 0, 0, 0, 0],
                   'c': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                   'd': [5, 1, 2, 1, 1, 22, 30, 1, 0, 0, 0]})
Now what I want to do is replace the 0 values with NaN when every value in the row is zero. Critically, I want to keep whatever values are in the row when not all of its values are zero.
I want to do something like this:
cols = ['a', 'b', 'c', 'd']
condition = (DF[cols] == 0).all(axis=1)
for col in cols:
    DF[col] = np.where(condition, NA, ???)
I put the ??? to indicate that I do not know what value to place there if the condition is False; I just want to preserve whatever is already there. Is this possible with np.where, or should I use another technique?
Answered by JaminSore
There is a pandas.Series method (where, incidentally) for exactly this kind of task. It seems a little backward at first, but from the documentation:
Series.where(cond, other=nan, inplace=False, axis=None, level=None, try_cast=False, raise_on_error=True)
Return an object of same shape as self and whose corresponding entries are from self where cond is True and otherwise are from other.
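To make the "backward" semantics concrete, here is a minimal sketch on a small made-up Series (the data is an assumption, not from the question). It relies only on Series.where and its inverse, Series.mask:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, chosen only for illustration.
s = pd.Series([3, 0, 1, 0])

# where KEEPS entries for which the condition is True and replaces
# the rest with `other` (NaN by default). The result is upcast to
# float so that it can hold NaN.
kept = s.where(s != 0)

# mask is the inverse: it REPLACES entries for which the condition is True.
masked = s.mask(s == 0)
```

Both calls give the same result here, so the choice between them is mostly about which condition reads more naturally.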
So, your example would become
cols = ['a', 'b', 'c', 'd']
condition = (DF[cols] == 0).all(axis=1)
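for col in cols:
    # Assign the result back: where(..., inplace=True) on DF[col] may act on a temporary copy in newer pandas
    DF[col] = DF[col].where(~condition, np.nan)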
But if all you're trying to do is replace rows of all zeros with NA for a specific set of columns, you could do this instead:
DF.loc[condition, cols] = NA
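As a rough end-to-end check of the .loc approach, the sketch below rebuilds the question's DataFrame and applies the assignment. The cast to float beforehand is an assumption added here: recent pandas versions complain when NaN is written into integer columns, so casting first sidesteps that.

```python
import numpy as np
import pandas as pd

DF = pd.DataFrame({'a': [3, 0, 1, 0, 1, 14, 2, 0, 0, 0, 0],
                   'b': [3, 0, 1, 0, 1, 14, 2, 0, 0, 0, 0],
                   'c': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                   'd': [5, 1, 2, 1, 1, 22, 30, 1, 0, 0, 0]})
cols = ['a', 'b', 'c', 'd']

# True only for rows in which every selected column is zero.
condition = (DF[cols] == 0).all(axis=1)

# Assumption: cast to float first so the columns can hold NaN.
DF[cols] = DF[cols].astype(float)

# Overwrite only the all-zero rows; every other row is left untouched.
DF.loc[condition, cols] = np.nan
```

Only the last three rows are all zeros, so only those become NaN; rows such as index 1 (0, 0, 0, 1) keep their zeros.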
EDIT
To answer your original question: np.where follows the same broadcasting rules as other array operations, so you would replace ??? with DF[col], changing your example to:
cols = ['a', 'b', 'c', 'd']
condition = (DF[cols] == 0).all(axis=1)
for col in cols:
    DF[col] = np.where(condition, NA, DF[col])
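Because of those broadcasting rules, the loop can also be dropped. The following is a sketch, not part of the original answer: it reshapes the row-wise condition to a column vector so that a single np.where call covers all four columns. The names values and result are introduced here for illustration.

```python
import numpy as np
import pandas as pd

DF = pd.DataFrame({'a': [3, 0, 1, 0, 1, 14, 2, 0, 0, 0, 0],
                   'b': [3, 0, 1, 0, 1, 14, 2, 0, 0, 0, 0],
                   'c': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                   'd': [5, 1, 2, 1, 1, 22, 30, 1, 0, 0, 0]})
cols = ['a', 'b', 'c', 'd']
condition = (DF[cols] == 0).all(axis=1)

# Reshape the condition from (n_rows,) to (n_rows, 1) so it broadcasts
# against the (n_rows, n_cols) block of values.
values = np.where(condition.to_numpy()[:, None], np.nan, DF[cols].to_numpy())

# Wrap the float array back into a DataFrame with the original labels.
result = pd.DataFrame(values, index=DF.index, columns=cols)
```

Note that np.where returns a float array here, since NaN cannot be stored in an integer array.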
Answered by Arvind Subramaniam
You can do something like this:
array_binary = np.where(array[i] < threshold, 0, 1)
array_sparse = np.multiply(array_binary, np.ones_like(array))
This does an element-wise multiplication of the binary array and an array of ones using np.multiply. Hence, the non-zero elements will be recovered/maintained. array_sparse is the sparse version of array.
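A small sketch of this masking idea, using made-up data and a made-up threshold (both are assumptions). One caveat worth checking: multiplying the binary mask by an array of ones appears to return the mask itself, whereas multiplying it by the original array is what keeps the original values. The variant below shows both, plus the equivalent single np.where call.

```python
import numpy as np

# Hypothetical data and cutoff, chosen only for illustration.
array = np.array([0.2, 1.5, 0.05, 3.0])
threshold = 0.5

# 0 where the value is below the threshold, 1 otherwise.
array_binary = np.where(array < threshold, 0, 1)

# Multiplying the mask by ones just reproduces the mask (as floats).
mask_times_ones = np.multiply(array_binary, np.ones_like(array))

# Multiplying the mask by the original array keeps the values that are
# at or above the threshold and zeroes out the rest.
array_sparse = np.multiply(array_binary, array)

# The same result in one step: pass the array itself as the "else" value.
array_sparse_direct = np.where(array < threshold, 0, array)
```

The last line is the same pattern as in the accepted answer: the existing values are preserved by passing them as the third argument to np.where.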

