Python: Filter rows of a numpy array?
Original source: http://stackoverflow.com/questions/26154711/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverflow
Filter rows of a numpy array?
Asked by killajoule
I am looking to apply a function to each row of a numpy array. If this function evaluates to true I will keep the row, otherwise I will discard it. For example, my function might be:
def f(row):
    if sum(row) > 10: return True
    else: return False
I was wondering if there was something similar to:
np.apply_over_axes()
which applies a function to each row of a numpy array and returns the result. I was hoping for something like:
np.filter_over_axes()
which would apply a function to each row of a numpy array and only return the rows for which the function returned true. Is there anything like this? Or should I just use a for loop?
Accepted answer by Roger Fan
Ideally, you would be able to implement a vectorized version of your function and use that to do boolean indexing. For the vast majority of problems this is the right solution. Numpy provides quite a few functions that can act over various axes as well as all the basic operations and comparisons, so most useful conditions should be vectorizable.
import numpy as np
x = np.random.randn(20, 3)
x_new = x[np.sum(x, axis=1) > .5]
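As a small illustration of the same idea applied to the condition from the question (row sum greater than 10), here is a sketch on a tiny deterministic array. The names `mask` and `filtered` are just illustrative choices, not anything from the original posts.

```python
import numpy as np

# Small deterministic array so the result is easy to check by hand.
# Rows: [0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]
x = np.arange(12).reshape(4, 3)

# Row sums are 3, 12, 21, 30, so only the first row fails the test.
mask = x.sum(axis=1) > 10

# Boolean indexing keeps exactly the rows where the mask is True.
filtered = x[mask]
print(filtered)
```

The mask is an ordinary boolean array, so several conditions can be combined with `&` and `|` before indexing.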
If you are absolutely sure that you can't do the above, I would suggest using a list comprehension (or np.apply_along_axis) to create an array of bools to index with.
def myfunc(row):
    return sum(row) > .5
bool_arr = np.array([myfunc(row) for row in x])
x_new = x[bool_arr]
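The np.apply_along_axis alternative mentioned above might look roughly like this. It is only a sketch on a small hand-written array (the values are made up for illustration), and it still calls the Python function once per row, so it should not be expected to approach vectorized speed.

```python
import numpy as np

def myfunc(row):
    # Same row predicate as above: keep rows whose sum exceeds 0.5.
    return sum(row) > .5

# Illustrative data: row sums are roughly 6.0, -0.5 and 0.6.
x = np.array([[1.0, 2.0, 3.0],
              [-1.0, 0.0, 0.5],
              [0.2, 0.2, 0.2]])

# Apply myfunc to every 1-D slice along axis 1 (that is, to each row).
bool_arr = np.apply_along_axis(myfunc, 1, x)

# Index with the resulting boolean array to keep the matching rows.
x_new = x[bool_arr]
print(bool_arr)
print(x_new)
```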
This will get the job done in a relatively clean way, but will be significantly slower than a vectorized version. An example:
x = np.random.randn(5000, 200)
%timeit x[np.sum(x, axis=1) > .5]
# 100 loops, best of 3: 5.71 ms per loop
%timeit x[np.array([myfunc(row) for row in x])]
# 1 loops, best of 3: 217 ms per loop
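The %timeit lines above only work inside IPython. Outside of it, a comparable measurement could be sketched with the standard timeit module as below. The array size and repeat count are arbitrary assumptions, and the absolute numbers will differ from machine to machine.

```python
import timeit
import numpy as np

def myfunc(row):
    return sum(row) > .5

# Smaller than the array in the original timing, to keep the run short.
x = np.random.randn(500, 200)

def vectorized():
    # Boolean mask computed in one NumPy call.
    return x[np.sum(x, axis=1) > .5]

def looped():
    # Boolean mask built by calling myfunc once per row.
    return x[np.array([myfunc(row) for row in x])]

# Total time in seconds for 10 calls of each variant.
t_vec = timeit.timeit(vectorized, number=10)
t_loop = timeit.timeit(looped, number=10)
print(f"vectorized: {t_vec:.4f} s, list comprehension: {t_loop:.4f} s")
```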

