Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/3379301/

Date: 2020-08-18 10:44:23  Source: igfitidea

Using Numpy Vectorize on Functions that Return Vectors

python arrays numpy vectorization

Asked by prodigenius

numpy.vectorize takes a function f:a->b and turns it into g:a[]->b[].

This works fine when a and b are scalars, but I can't think of a reason why it wouldn't work with b as an ndarray or list, i.e. f:a->b[] and g:a[]->b[][].

For example:

import numpy as np
def f(x):
    return x * np.array([1,1,1,1,1], dtype=np.float32)
g = np.vectorize(f, otypes=[np.ndarray])
a = np.arange(4)
print(g(a))

This yields:

array([[ 0.  0.  0.  0.  0.],
       [ 1.  1.  1.  1.  1.],
       [ 2.  2.  2.  2.  2.],
       [ 3.  3.  3.  3.  3.]], dtype=object)

Ok, so that gives the right values, but the wrong dtype. And even worse:

g(a).shape

yields:

(4,)

So this array is pretty much useless. I know I can convert it with:

np.array(list(map(list, g(a))), dtype=np.float32)

to give me what I want:

array([[ 0.,  0.,  0.,  0.,  0.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 2.,  2.,  2.,  2.,  2.],
       [ 3.,  3.,  3.,  3.,  3.]], dtype=float32)

but that is neither efficient nor pythonic. Can any of you guys find a cleaner way to do this?

Thanks in advance!

Answered by unutbu

np.vectorize is just a convenience function. It doesn't actually make code run any faster. If it isn't convenient to use np.vectorize, simply write your own function that works as you wish.

The purpose of np.vectorize is to transform functions which are not numpy-aware (e.g. take floats as input and return floats as output) into functions that can operate on (and return) numpy arrays.

Your function f is already numpy-aware -- it uses a numpy array in its definition and returns a numpy array. So np.vectorize is not a good fit for your use case.

The solution therefore is just to roll your own function f that works the way you desire.
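As a minimal sketch of what "rolling your own" could look like, the wrapper below loops explicitly and stacks the per-element results. The name g_manual is hypothetical (it is not from the answer), and the sketch assumes a non-empty input and that f always returns arrays of the same shape.

```python
import numpy as np

def f(x):
    return x * np.array([1, 1, 1, 1, 1], dtype=np.float32)

def g_manual(a):
    # Hypothetical helper, not part of the original answer: call f once
    # per element of a, then stack the results into one regular array.
    a = np.asarray(a)
    rows = [f(x) for x in a.ravel()]
    # Restore the input's shape and append the shape of f's output.
    return np.stack(rows).reshape(a.shape + rows[0].shape)

b = g_manual(np.arange(4))
print(b.shape)  # (4, 5)
```

Like np.vectorize, this still loops in Python, so it is about convenience rather than speed.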

Answered by Aniq Ahsan

import numpy as np
def f(x):
    return x * np.array([1,1,1,1,1], dtype=np.float32)
g = np.vectorize(f, otypes=[np.ndarray])
a = np.arange(4)
b = g(a)
b = np.array(b.tolist())
print(b)  # b.shape == (4, 5)
c = np.ones((2,3,4))
d = g(c)
d = np.array(d.tolist())
print(d)  # d.shape == (2, 3, 4, 5)

This should fix the problem, and it will work regardless of the shape of your input. "map" only works for one-dimensional inputs. Using ".tolist()" and creating a new ndarray solves the problem more completely and nicely (I believe). Hope this helps.

Answered by bburks832

The best way to solve this would be to use a 2-D NumPy array (in this case a column array) as an input to the original function, which will then generate a 2-D output with the results I believe you were expecting.

Here is what it might look like in code:

import numpy as np
def f(x):
    return x*np.array([1, 1, 1, 1, 1], dtype=np.float32)

a = np.arange(4).reshape((4, 1))
b = f(a)
# b is a 2-D array with shape (4, 5)
print(b)

This is a much simpler and less error-prone way to complete the operation. Rather than trying to transform the function with numpy.vectorize, this method relies on NumPy's natural ability to broadcast arrays. The trick is to make sure the shapes are broadcast-compatible: comparing dimensions from the last one backwards, each pair must either be equal or contain a 1. Here the (4, 1) column broadcasts against the (5,) vector to give (4, 5).
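To illustrate the broadcasting approach for inputs of any shape, here is a small sketch. The names weights and f_broadcast are illustrative, not from the answer; the idea is simply to append a length-1 axis instead of reshaping by hand.

```python
import numpy as np

weights = np.array([1, 1, 1, 1, 1], dtype=np.float32)

def f_broadcast(x):
    # Append a length-1 axis: shape (...,) becomes (..., 1), which
    # broadcasts against weights of shape (5,) to give (..., 5).
    x = np.asarray(x)
    return x[..., np.newaxis] * weights

print(f_broadcast(np.arange(4)).shape)        # (4, 5)
print(f_broadcast(np.ones((2, 3, 4))).shape)  # (2, 3, 4, 5)

# np.broadcast reports the shape two arrays would broadcast to.
print(np.broadcast(np.empty((4, 1)), np.empty(5)).shape)  # (4, 5)
```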

Answered by Syrtis Major

I've written a function that seems to fit your need.

import numpy as np

def amap(func, *args):
    '''array version of built-in map
    amap(function, sequence[, sequence, ...]) -> array
    Examples
    --------
    >>> amap(lambda x: x**2, 1)
    array(1)
    >>> amap(lambda x: x**2, [1, 2])
    array([1, 4])
    >>> amap(lambda x,y: y**2 + x**2, 1, [1, 2])
    array([2, 5])
    >>> amap(lambda x: (x, x), 1)
    array([1, 1])
    >>> amap(lambda x,y: [x**2, y**2], [1,2], [3,4])
    array([[1, 9], [4, 16]])
    '''
    args = np.broadcast(None, *args)
    res = np.array([func(*arg[1:]) for arg in args])
    shape = args.shape + res.shape[1:]
    return res.reshape(shape)

Let's try it:

def f(x):
    return x * np.array([1,1,1,1,1], dtype=np.float32)
amap(f, np.arange(4))

Output:

array([[ 0.,  0.,  0.,  0.,  0.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 2.,  2.,  2.,  2.,  2.],
       [ 3.,  3.,  3.,  3.,  3.]], dtype=float32)

You may also wrap it with lambda or partial for convenience:

g = lambda x:amap(f, x)
g(np.arange(4))


Note that the docstring of vectorize says:

"The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop."

Thus we would expect amap here to have similar performance to vectorize. I didn't check it; any performance tests are welcome.

If performance is really important, you should consider something else, e.g. direct array calculation with reshape and broadcast to avoid looping in pure Python (both vectorize and amap fall into the latter case).
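A rough way to check this is sketched below. It is not a rigorous benchmark: the array size and repeat count are arbitrary assumptions, timings vary by machine, and it uses the signature parameter of np.vectorize (NumPy 1.12 and later) to obtain a regular 2-D result from the looped version.

```python
import timeit
import numpy as np

weights = np.ones(5, dtype=np.float32)

def f(x):
    return x * weights

# Looped version: np.vectorize calls f once per element.
g = np.vectorize(f, signature='()->(n)')
a = np.arange(1000, dtype=np.float32)

looped = g(a)
# Direct array calculation: one broadcast multiplication, no Python loop.
broadcast = a[:, np.newaxis] * weights

# Both approaches should produce the same values.
assert np.array_equal(looped, broadcast)

t_loop = timeit.timeit(lambda: g(a), number=10)
t_bcast = timeit.timeit(lambda: a[:, np.newaxis] * weights, number=10)
print(f"vectorize: {t_loop:.4f} s, broadcasting: {t_bcast:.4f} s")
```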

Answered by Cosyn

A new parameter, signature, in 1.12.0 does exactly what you want.

def f(x):
    return x * np.array([1,1,1,1,1], dtype=np.float32)

g = np.vectorize(f, signature='()->(n)')

Then g(np.arange(4)).shape will give (4L, 5L) (Python 2 long notation; on Python 3 it is shown as (4, 5)).

Here the signature of f is specified. The (n) is the shape of the return value, and the () is the shape of the parameter, which is a scalar. The parameters can be arrays too. For more complex signatures, see the Generalized Universal Function API.
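As a sketch of signatures whose parameters are arrays, the example below uses two illustrative wrappers (center_rows and outer are names made up here, not from the answer). Core dimensions in the signature are taken from the last axes of each argument, and the remaining axes are looped over.

```python
import numpy as np

def center(v):
    # Operates on one 1-D vector at a time.
    return v - v.mean()

# '(n)->(n)': consume the last axis as a vector, return a vector of the same length.
center_rows = np.vectorize(center, signature='(n)->(n)')

# '(m),(n)->(m,n)': two vector arguments, one matrix result per pair.
outer = np.vectorize(np.outer, signature='(m),(n)->(m,n)')

x = np.arange(6, dtype=float).reshape(2, 3)
print(center_rows(x))                   # each row has its own mean subtracted
print(outer(x, np.ones((2, 4))).shape)  # (2, 3, 4)
```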

Answered by DerWeh

You want to vectorize the function

import numpy as np
def f(x):
    return x * np.array([1,1,1,1,1], dtype=np.float32)

Assuming that you want to get single np.float32 arrays as the result, you have to specify this in otypes. In your question, however, you specified otypes=[np.ndarray], which means you want every element to be an np.ndarray. Thus, you correctly get a result of dtype=object.

The correct call would be

np.vectorize(f, signature='()->(n)', otypes=[np.float32])

For such a simple function, however, it is better to leverage numpy's ufuncs; np.vectorize just loops over it. So in your case just rewrite your function as

def f(x):
    return np.multiply.outer(x, np.array([1,1,1,1,1], dtype=np.float32))

This is faster and produces less obscure errors (note, however, that the result's dtype will depend on x: if you pass a complex or quad-precision number, the result will be of that type too).
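The dtype dependence can be seen in a short sketch. The inputs below are arbitrary examples, and array inputs are used throughout so that only NumPy's ordinary array-to-array type promotion is involved.

```python
import numpy as np

weights = np.array([1, 1, 1, 1, 1], dtype=np.float32)

def f(x):
    return np.multiply.outer(x, weights)

# The result dtype follows the promotion of x's dtype with float32.
print(f(np.arange(4, dtype=np.float32)).dtype)  # float32
print(f(np.arange(4, dtype=np.float64)).dtype)  # float64
print(f(np.array([1 + 2j, 3 - 1j])).dtype)      # complex128

# The output shape is x.shape followed by weights.shape.
print(f(np.ones((2, 3), dtype=np.float32)).shape)  # (2, 3, 5)
```

If a fixed output dtype is needed, casting the result with astype is one option.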