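Disclaimer: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/20917703/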
Deleting certain elements from numpy array using conditional checks
Asked by Abhinav Kumar
I want to remove some entries from a numpy array that is about a million entries long.
This code would do it but take a long time:
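import numpy as np

a = np.array([1,45,23,23,1234,3432,-1232,-34,233])
# Walk the indices backwards so each deletion does not shift the elements still to be visited.
# np.delete returns a new array on every call, which is why this is slow (O(n^2) overall).
for i in range(len(a) - 1, -1, -1):
    if a[i] < -100 or a[i] > 100:
        a = np.delete(a, i)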
Can I do this any other way?
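A minimal sketch of why a per-element Python loop is slow compared with a NumPy boolean mask, assuming randomly generated integers as stand-in data (the seed, the value range, and the names kept_loop and kept_vec are illustrative, not from the question). It prints the measured times rather than quoting any, since they depend on the machine:

```python
import time

import numpy as np

# Hypothetical test data: about a million random integers.
rng = np.random.default_rng(0)
a = rng.integers(-5000, 5000, size=1_000_000)

t0 = time.perf_counter()
# One Python-level comparison per element.
kept_loop = np.array([x for x in a if -100 <= x <= 100])
t1 = time.perf_counter()
# The comparisons and the selection run inside NumPy.
kept_vec = a[(a >= -100) & (a <= 100)]
t2 = time.perf_counter()

assert np.array_equal(kept_loop, kept_vec)
print(f"loop: {t1 - t0:.3f} s, vectorized: {t2 - t1:.3f} s")
```

The loop here builds a list of kept values instead of calling np.delete repeatedly; repeated deletion would be slower still, because each call copies the remaining array.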
Accepted answer by qwwqwwq
I'm assuming you mean a < -100 or a > 100. The most concise way is to use logical (boolean) indexing:
a = a[(a >= -100) & (a <= 100)]
This does not literally "delete" the entries. It builds a new array containing only the wanted values and rebinds the variable a to it. Once nothing else references the old array, it is garbage collected and its memory is freed.
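To illustrate the copy semantics described above, here is a small sketch on the question's sample data. The extra name original is hypothetical and exists only so the old array can still be inspected after a is rebound:

```python
import numpy as np

a = np.array([1, 45, 23, 23, 1234, 3432, -1232, -34, 233])
original = a  # a second reference, kept only to inspect the old array

keep = (a >= -100) & (a <= 100)  # boolean mask, one entry per element
a = a[keep]                      # boolean indexing returns a NEW array

print(keep.tolist())  # [True, True, True, True, False, False, False, True, False]
print(a.tolist())     # [1, 45, 23, 23, -34]
print(np.shares_memory(a, original))  # False: a no longer refers to the old data
print(original.size)  # 9: the old array survives while `original` still references it
```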
It's worth noting that this method does not use constant memory: because it builds a new array, its extra memory grows linearly with the size of the array. That can be a problem if the array is so large that it approaches the memory limit of your machine. Removing each element truly "in place", i.e. with constant extra memory, would be a very different operation, because the remaining elements would have to be moved around and the block of memory resized. I'm not sure you can do this with a numpy array, but one thing you can do to avoid copying the data is to use a numpy masked array:
import numpy.ma as ma
# Hide the out-of-range values behind a mask instead of copying the in-range ones
mx = ma.masked_array(a, mask=((a < -100) | (a > 100)))
All operations on the masked array behave as if the "deleted" elements did not exist. They were not really deleted, though: they are still in memory, and the array simply carries a record (the mask) of which elements to skip, so the data never has to be copied. Also, if we ever want the deleted values back, we can just remove the mask like so:
mx.mask = ma.nomask
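To make the masked-array behaviour concrete, here is a small sketch on the question's sample data (not part of the original answer). It uses only existing numpy.ma features: count(), sum(), compressed(), the data attribute, and assigning ma.nomask to the mask:

```python
import numpy as np
import numpy.ma as ma

a = np.array([1, 45, 23, 23, 1234, 3432, -1232, -34, 233])
mx = ma.masked_array(a, mask=((a < -100) | (a > 100)))

# Reductions skip the masked ("deleted") entries
print(mx.count())  # 5
print(mx.sum())    # 58, i.e. 1 + 45 + 23 + 23 - 34

# The data buffer is shared with a, not copied
print(np.shares_memory(mx.data, a))  # True

# compressed() returns a plain ndarray of the unmasked values (this one IS a copy)
print(mx.compressed().tolist())  # [1, 45, 23, 23, -34]

# Removing the mask brings every original value back
mx.mask = ma.nomask
print(mx.count())  # 9
```

Note that the mask itself is a boolean array with one byte per element, so the masked approach avoids copying the data but is not entirely free of extra memory.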
Answer by falsetru
You can use boolean (mask) indexing with the inverted condition:
>>> a = np.array([1,45,23,23,1234,3432,-1232,-34,233])
>>> a[~((a < -100) | (a > 100))]
array([ 1, 45, 23, 23, -34])
>>> a[(a >= -100) & (a <= 100)]
array([ 1, 45, 23, 23, -34])
>>> a[abs(a) <= 100]
array([ 1, 45, 23, 23, -34])
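The three expressions agree on the integer data above, but they are not interchangeable for floating-point data that contains NaN, because every ordered comparison with NaN is False. A small sketch with made-up float data (the array x is hypothetical):

```python
import numpy as np

x = np.array([1.0, 250.0, np.nan, -34.0])

inverted = x[~((x < -100) | (x > 100))]  # NaN fails both tests, so ~False KEEPS it
direct = x[(x >= -100) & (x <= 100)]     # NaN fails both tests, so it is dropped
absolute = x[abs(x) <= 100]              # abs(NaN) is NaN, so it is dropped too

print(inverted.size, direct.size, absolute.size)  # 3 2 2
```

If NaN values may be present, pick the form whose NaN handling you actually want, or filter them explicitly with np.isnan.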
Answer by zhangxaochen
In [140]: a = np.array([1,45,23,23,1234,3432,-1232,-34,233])
In [141]: b=a[(-100<=a)&(a<=100)]
In [142]: b
Out[142]: array([ 1, 45, 23, 23, -34])
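The parentheses and the & operator in this answer matter. Python's chained comparison -100 <= a <= 100 expands to (-100 <= a) and (a <= 100), and `and` calls bool() on an array, which NumPy refuses for arrays with more than one element. Because & binds more tightly than the comparison operators, each comparison also needs its own parentheses. A short sketch; np.logical_and is shown as an equivalent spelling of &:

```python
import numpy as np

a = np.array([1, 45, 23, 23, 1234, 3432, -1232, -34, 233])

try:
    a[-100 <= a <= 100]  # chained comparison needs the truth value of a whole array
except ValueError as err:
    print("ValueError:", err)

b = a[(-100 <= a) & (a <= 100)]             # element-wise AND, parentheses required
c = a[np.logical_and(-100 <= a, a <= 100)]  # same mask, function form
print(b.tolist())            # [1, 45, 23, 23, -34]
print(np.array_equal(b, c))  # True
```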

