Python: How to normalize an array in NumPy?

Disclaimer: this page is a translation of a popular StackOverflow question, licensed under CC BY-SA 4.0. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original URL: http://stackoverflow.com/questions/21030391/

Time: 2020-08-18 21:51:17  Source: igfitidea

Tags: python, numpy, scikit-learn, statistics, normalization

Question by Donbeo

I would like to normalize a NumPy array (divide it by its norm). More specifically, I am looking for an equivalent of this function:

import numpy as np

def normalize(v):
    norm = np.linalg.norm(v)
    if norm == 0:
        return v  # the zero vector cannot be scaled to unit length
    return v / norm

Is there something like that in sklearn or numpy?

This function also handles the case where v is the zero vector.

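A quick check of the behavior described above, using the question's normalize function unchanged (repeated here so the snippet runs on its own). Note that for the zero vector it returns the very same array object, not a copy:

```python
import numpy as np

def normalize(v):
    norm = np.linalg.norm(v)
    if norm == 0:
        return v  # zero vector: nothing to scale, returned as-is
    return v / norm

unit = normalize(np.array([3.0, 4.0]))  # Euclidean norm is 5
zero = normalize(np.zeros(3))           # norm is 0
print(unit)  # [0.6 0.8]
print(zero)  # [0. 0. 0.]
```
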
Accepted answer by ali_m

If you're using scikit-learn you can use sklearn.preprocessing.normalize:

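import numpy as np
from sklearn.preprocessing import normalize

x = np.random.rand(1000)*10
norm1 = x / np.linalg.norm(x)
# sklearn's normalize expects 2-D input: treat x as a single column
# and normalize along axis 0
norm2 = normalize(x[:,np.newaxis], axis=0).ravel()
print(np.allclose(norm1, norm2))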
# True
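
For readers without scikit-learn, here is a NumPy-only sketch of row-wise L2 normalization. As far as I know it matches what sklearn.preprocessing.normalize does with its defaults (norm='l2', axis=1), including leaving all-zero rows as zeros; the helper name l2_normalize_rows is hypothetical:

```python
import numpy as np

def l2_normalize_rows(X):
    """Divide each row of a 2-D array by its Euclidean (L2) norm.

    Rows whose norm is 0 are divided by 1 instead, so they stay all-zero.
    """
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)  # shape (n_rows, 1)
    norms[norms == 0] = 1.0                           # avoid division by zero
    return X / norms

X = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [1.0, 1.0]])
print(l2_normalize_rows(X))
```

To normalize a single 1-D vector x this way, pass it as one row and flatten the result: l2_normalize_rows(x[np.newaxis, :]).ravel().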

Answer by Eelco Hoogendoorn

I agree it would be nice if such a function were included in the batteries, but as far as I know it isn't. Here is a vectorized version that works along an arbitrary axis and with any norm order.

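import numpy as np

def normalized(a, axis=-1, order=2):
    # norm of every slice along `axis` (made at least 1-D so it can be indexed)
    l2 = np.atleast_1d(np.linalg.norm(a, order, axis))
    # leave all-zero slices unchanged instead of dividing by zero
    l2[l2==0] = 1
    # re-insert the reduced axis so the norms broadcast against `a`
    return a / np.expand_dims(l2, axis)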

A = np.random.randn(3,3,3)
print(normalized(A,0))
print(normalized(A,1))
print(normalized(A,2))

print(normalized(np.arange(3)[:,None]))
print(normalized(np.arange(3)))
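
A variant of the same idea, assuming the keepdims=True argument of np.linalg.norm: it keeps the reduced axis so the norms broadcast directly, which makes np.expand_dims and np.atleast_1d unnecessary. Unlike the version above, a 1-D input also keeps its 1-D shape instead of coming back as shape (1, n). The name normalized_keepdims is hypothetical:

```python
import numpy as np

def normalized_keepdims(a, axis=-1, order=2):
    """Scale `a` so that its `order`-norm along `axis` is 1.

    Slices whose norm is 0 are left unchanged (divided by 1).
    """
    a = np.asarray(a, dtype=float)
    norms = np.linalg.norm(a, ord=order, axis=axis, keepdims=True)
    norms[norms == 0] = 1.0
    return a / norms

A = np.random.default_rng(0).standard_normal((3, 3, 3))
for ax in range(3):
    # every 1-D slice along `ax` should now have unit L2 norm
    print(np.allclose(np.linalg.norm(normalized_keepdims(A, ax), axis=ax), 1.0))

print(normalized_keepdims(np.arange(3)).shape)  # (3,)
```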

Answer by Eduard Feicho

You can pass ord=1 to get the L1 norm. To avoid division by zero I substitute the machine epsilon (eps), but that may not be ideal.

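import numpy as np

def normalize(v):
    norm=np.linalg.norm(v, ord=1)  # L1 norm: sum of absolute values
    if norm==0:
        # substitute the machine epsilon of v's (floating-point) dtype
        norm=np.finfo(v.dtype).eps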
    return v/norm
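
With ord=1 the absolute values of the result sum to 1. Below is a small sketch assuming you would rather return the zero vector unchanged than divide by eps. It casts to float first because np.finfo raises a ValueError for integer dtypes, so the version above fails on an all-zero integer array. The name normalize_l1 is hypothetical:

```python
import numpy as np

def normalize_l1(v):
    """Scale v so that the sum of its absolute values is 1."""
    v = np.asarray(v, dtype=float)   # float cast also makes int input safe
    norm = np.linalg.norm(v, ord=1)  # same as np.abs(v).sum() for 1-D v
    if norm == 0:
        return v                     # all-zero input: return it unchanged
    return v / norm

print(normalize_l1([1, 3]))  # [0.25 0.75]
print(normalize_l1([-2, 2]))
print(normalize_l1([0, 0]))  # [0. 0.]
```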

Answer by Joe

There is also the function unit_vector() to normalize vectors in the popular transformations module by Christoph Gohlke:

import transformations as trafo
import numpy as np

data = np.array([[1.0, 1.0, 0.0],
                 [1.0, 1.0, 1.0],
                 [1.0, 2.0, 3.0]])

print(trafo.unit_vector(data, axis=1))
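
If you would rather not add a dependency, the same row-wise result can be obtained with plain NumPy. This sketch assumes that unit_vector(data, axis=1) divides each row by its Euclidean norm, which is how it is used here:

```python
import numpy as np

data = np.array([[1.0, 1.0, 0.0],
                 [1.0, 1.0, 1.0],
                 [1.0, 2.0, 3.0]])

# Euclidean length of each row, kept as a column so it broadcasts over rows
row_norms = np.linalg.norm(data, axis=1, keepdims=True)
unit_rows = data / row_norms
print(unit_rows)
```

An all-zero row would produce nan here, so guard the norms as in the other answers if that can happen.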

Answer by Jaden Travnik

If you have multidimensional data and want each column (axis 0) shifted to start at 0 and then scaled by its sum or by its range (max - min):

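import numpy as np

def normalize(_d, to_sum=True, copy=True):
    # d is a (n x dimension) float np array
    d = _d if not copy else np.copy(_d)
    d -= np.min(d, axis=0)  # shift each column so its minimum is 0
    # then scale each column by its sum, or by its range (max - min)
    d /= (np.sum(d, axis=0) if to_sum else np.ptp(d, axis=0))
    return d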

This uses NumPy's peak-to-peak function, np.ptp (max minus min along an axis).

a = np.random.random((5, 3))

b = normalize(a)
b.sum(axis=0) # array([1., 1., 1.]), each column sums to 1

c = normalize(a, to_sum=False)
c.max(axis=0) # array([1., 1., 1.]), the max of each column is 1

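Note that d -= ... and d /= ... modify the array in place: with copy=False the caller's array is overwritten, and an integer input array makes d /= ... raise a casting error. Below is a sketch of the min-max (to_sum=False) branch that always returns a new float array and, as an added assumption, maps constant columns (where np.ptp is 0) to 0 instead of producing nan. The name minmax_columns is hypothetical:

```python
import numpy as np

def minmax_columns(x):
    """Rescale each column of a 2-D array to the range [0, 1].

    Returns a new float array; the input is never modified.
    Constant columns (max == min) become all zeros.
    """
    x = np.asarray(x, dtype=float)
    col_min = x.min(axis=0)
    col_range = np.ptp(x, axis=0)  # max - min per column
    safe_range = np.where(col_range == 0, 1.0, col_range)
    return (x - col_min) / safe_range

a = np.array([[1, 10, 5],
              [3, 20, 5],
              [2, 30, 5]])
print(minmax_columns(a))
```
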
Answer by mrk

This might also work for you

import numpy as np
normalized_v = v / np.sqrt(np.sum(v**2))

but it fails when v has length 0 (the zero vector): the division is then 0/0, which yields nan.

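One way to keep the one-liner style while guarding against the zero vector is np.divide with its out= and where= arguments. This is a sketch; returning zeros for the zero vector is an assumption about what you want, and the name safe_normalize is hypothetical:

```python
import numpy as np

def safe_normalize(v):
    v = np.asarray(v, dtype=float)
    norm = np.sqrt(np.sum(v ** 2))
    # Divide only where norm != 0; otherwise keep the pre-filled zeros in `out`.
    return np.divide(v, norm, out=np.zeros_like(v), where=norm != 0)

print(safe_normalize([3.0, 4.0]))  # [0.6 0.8]
print(safe_normalize([0.0, 0.0]))  # [0. 0.]

# The unguarded expression on a zero vector: 0/0 gives nan
with np.errstate(invalid="ignore"):
    naive = np.zeros(2) / np.sqrt(np.sum(np.zeros(2) ** 2))
print(naive)  # [nan nan]
```
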
Answer by max0r

If you want to normalize n-dimensional feature vectors stored in a 3D tensor, you could also use PyTorch:

import numpy as np
from torch import FloatTensor
from torch.nn.functional import normalize

vecs = np.random.rand(3, 16, 16, 16)
norm_vecs = normalize(FloatTensor(vecs), dim=0, eps=1e-16).numpy()
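
torch.nn.functional.normalize divides by max(||v||_p, eps), so eps only matters for (near-)zero vectors. Assuming you want the same formula without a PyTorch dependency, here is a NumPy sketch along axis 0 (the helper name normalize_like_torch is hypothetical):

```python
import numpy as np

def normalize_like_torch(x, p=2, axis=0, eps=1e-12):
    """Divide by max(||x||_p, eps) along `axis`, mirroring F.normalize's formula."""
    x = np.asarray(x, dtype=float)
    norms = np.linalg.norm(x, ord=p, axis=axis, keepdims=True)
    return x / np.maximum(norms, eps)

vecs = np.random.default_rng(0).random((3, 16, 16, 16))
norm_vecs = normalize_like_torch(vecs, axis=0, eps=1e-16)
print(norm_vecs.shape)  # (3, 16, 16, 16)
```

Also note that FloatTensor converts the float64 array to float32, so results from the PyTorch version are only accurate to single precision.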

Answer by paulmelnikow

If you're working with 3D vectors, you can do this concisely using the toolbelt vg. It's a light layer on top of numpy and it supports single values and stacked vectors.

import numpy as np
import vg

x = np.random.rand(1000)*10
norm1 = x / np.linalg.norm(x)
norm2 = vg.normalize(x)
print(np.all(norm1 == norm2))
# True

I created the library at my last startup, where it was motivated by uses like this: simple ideas which are way too verbose in NumPy.

Answer by WY Hsu

You mentioned scikit-learn, so I want to share another solution.

scikit-learn MinMaxScaler

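scikit-learn has an API called MinMaxScaler that rescales each feature (column) to a value range you choose, [0, 1] by default. Unlike the answers above, this is min-max scaling rather than division by a norm.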

It also handles NaN values for us:

NaNs are treated as missing values: disregarded in fit, and maintained in transform. ... see reference [1]
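
The quoted NaN behavior can be imitated in NumPy with np.nanmin and np.nanmax, which ignore NaNs when computing each column's range; the arithmetic then leaves the NaNs in place. This is a sketch under that assumption, with a feature_range argument named after MinMaxScaler's parameter; the function nan_minmax is hypothetical and is not scikit-learn's implementation:

```python
import numpy as np

def nan_minmax(X, feature_range=(0.0, 1.0)):
    """Rescale each column of X to feature_range, ignoring NaNs when fitting.

    NaNs in the input stay NaN in the output. Constant columns map to the
    lower bound of feature_range.
    """
    X = np.asarray(X, dtype=float)
    lo, hi = feature_range
    col_min = np.nanmin(X, axis=0)
    col_range = np.nanmax(X, axis=0) - col_min
    col_range[col_range == 0] = 1.0  # avoid dividing by zero
    return lo + (X - col_min) / col_range * (hi - lo)

X = np.array([[1.0, np.nan],
              [2.0, 4.0],
              [3.0, 8.0]])
print(nan_minmax(X, feature_range=(-1.0, 1.0)))
```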

Code sample

The code is simple:

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Let's say X_train is your input DataFrame
# create a MinMaxScaler object (default feature_range is (0, 1))
min_max_scaler = MinMaxScaler()
# feed in a numpy array
X_train_norm = min_max_scaler.fit_transform(X_train.values)
# wrap it up if you need a DataFrame, keeping the original labels
df = pd.DataFrame(X_train_norm, columns=X_train.columns, index=X_train.index)

Reference

Answer by sergio verduzco

If you don't need utmost precision, your function can be reduced to:

v_norm = v / (np.linalg.norm(v) + 1e-16)
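
The trade-off: for ordinary vectors the added 1e-16 is below float64 resolution and changes nothing, and the zero vector comes back as zeros instead of nan, but vectors whose norm is itself tiny are noticeably distorted. A quick illustration, wrapping the one-liner in a hypothetical helper eps_normalize:

```python
import numpy as np

def eps_normalize(v, eps=1e-16):
    return v / (np.linalg.norm(v) + eps)

print(eps_normalize(np.array([3.0, 4.0])))
print(eps_normalize(np.zeros(2)))  # [0. 0.]

# norm 1e-17 is divided by 1.1e-16, so the result has norm 1/11, not 1
tiny = eps_normalize(np.array([1e-17, 0.0]))
print(np.linalg.norm(tiny))  # about 0.09, far from 1
```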