Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow CC BY-SA, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/29661574/

Date: 2020-08-19 04:51:22  Source: igfitidea

Normalize numpy array columns in python

python, numpy, normalize

Question by ahajib

I have a numpy array where each cell of a specific row represents a value for a feature. I store all of them in a 100×4 matrix.

A     B   C
1000  10  0.5
765   5   0.35
800   7   0.09  

Any idea how I can normalize the columns of this numpy.array so that each value is between 0 and 1?

My desired output is:

A     B    C
1     1    1
0.765 0.5  0.7
0.8   0.7  0.18  (which is 0.09/0.5)

Thanks in advance :)

Accepted answer by ali_m

If I understand correctly, what you want to do is divide by the maximum value in each column. You can do this easily using broadcasting.

Starting with your example array:

import numpy as np

x = np.array([[1000,  10,   0.5],
              [ 765,   5,  0.35],
              [ 800,   7,  0.09]])

x_normed = x / x.max(axis=0)

print(x_normed)
# [[ 1.     1.     1.   ]
#  [ 0.765  0.5    0.7  ]
#  [ 0.8    0.7    0.18 ]]

x.max(0) takes the maximum over the 0th dimension (i.e. down the rows). This gives you a vector of shape (ncols,) containing the maximum value in each column. You can then divide x by this vector to normalize your values so that the maximum value in each column is scaled to 1.
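To make the broadcasting step concrete, the sketch below reuses the example array from the question, prints the shapes involved, and checks the result against an explicit column-by-column loop. It uses only standard NumPy calls; the loop is there purely for comparison and is not part of the original answer.

```python
import numpy as np

x = np.array([[1000, 10, 0.5],
              [765, 5, 0.35],
              [800, 7, 0.09]])

col_max = x.max(axis=0)          # one maximum per column
print(x.shape, col_max.shape)    # (3, 3) (3,)

# Broadcasting stretches the (3,) vector across every row of x,
# so each column is divided by its own maximum.
x_normed = x / col_max

# Equivalent explicit loop over columns, shown only for comparison.
looped = np.empty_like(x)
for j in range(x.shape[1]):
    looped[:, j] = x[:, j] / x[:, j].max()

print(np.allclose(looped, x_normed))  # True
```

The broadcast version avoids the Python-level loop entirely, which is why it is the idiomatic choice for arrays of any size.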

If x contains negative values, you would need to subtract the minimum first:

x_normed = (x - x.min(0)) / np.ptp(x, axis=0)

Here, np.ptp(x, axis=0) returns the "peak-to-peak" value (i.e. the range, max - min) along axis 0. (The older method form x.ptp(0) was removed from ndarray in NumPy 2.0.) This normalization also guarantees that the minimum value in each column will be 0.
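One case the min-max formula does not handle: if a column is constant, its peak-to-peak range is 0 and the division yields nan (with a RuntimeWarning). The following is a minimal sketch of a guarded version; the helper name minmax_columns and the choice to map constant columns to 0 are assumptions made for illustration, not part of the original answer.

```python
import numpy as np

def minmax_columns(x):
    """Hypothetical helper: scale each column of x to [0, 1].

    Constant columns (range 0) are mapped to 0 instead of nan;
    this choice is an assumption, not a NumPy convention.
    """
    x = np.asarray(x, dtype=float)
    col_min = x.min(axis=0)
    col_range = np.ptp(x, axis=0)  # max - min per column
    # Substitute 1 for zero ranges so (x - min) / range is 0, not nan.
    safe_range = np.where(col_range == 0, 1.0, col_range)
    return (x - col_min) / safe_range

# Column 0 has negative values, column 1 is constant.
x = np.array([[-2.0, 5.0, 3.0],
              [ 0.0, 5.0, 1.0],
              [ 2.0, 5.0, 2.0]])
result = minmax_columns(x)
print(result)
```

Column 0 becomes [0, 0.5, 1], the constant column 1 becomes all zeros, and column 2 becomes [1, 0, 0.5]. Whether 0, 0.5, or nan is the most sensible value for a constant column depends on what the normalized data is used for afterwards.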

Answer by Marcin Mrugas

You can use sklearn.preprocessing:

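import numpy as np
from sklearn.preprocessing import normalize

data = np.array([
    [1000, 10, 0.5],
    [765, 5, 0.35],
    [800, 7, 0.09],
])
# axis=0 normalizes each column independently;
# norm='max' divides each column by its largest (absolute) value
data = normalize(data, axis=0, norm='max')
print(data)
# [[ 1.     1.     1.   ]
#  [ 0.765  0.5    0.7  ]
#  [ 0.8    0.7    0.18 ]]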