Original source: http://stackoverflow.com/questions/29661574/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverFlow
Normalize numpy array columns in python
Asked by ahajib
I have a numpy array where each cell of a specific row represents a value for a feature. I store all of them in a 100*4 matrix.
A     B    C
1000  10   0.5
765   5    0.35
800   7    0.09
Any idea how I can normalize rows of this numpy.array where each value is between 0 and 1?
My desired output is:
A      B    C
1      1    1
0.765  0.5  0.7
0.8    0.7  0.18 (which is 0.09/0.5)
Thanks in advance :)
Accepted answer by ali_m
If I understand correctly, what you want to do is divide by the maximum value in each column. You can do this easily using broadcasting.
Starting with your example array:
import numpy as np
x = np.array([[1000, 10, 0.5],
              [ 765,  5, 0.35],
              [ 800,  7, 0.09]])
x_normed = x / x.max(axis=0)
print(x_normed)
# [[ 1. 1. 1. ]
# [ 0.765 0.5 0.7 ]
# [ 0.8 0.7 0.18 ]]
x.max(0) takes the maximum over the 0th dimension (i.e. rows). This gives you a vector of size (ncols,) containing the maximum value in each column. You can then divide x by this vector in order to normalize your values such that the maximum value in each column will be scaled to 1.
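To make the broadcasting step concrete, here is a minimal sketch that reuses the example array and prints the shapes involved. The variable name col_max is introduced only for illustration.

```python
import numpy as np

x = np.array([[1000, 10, 0.5],
              [ 765,  5, 0.35],
              [ 800,  7, 0.09]])

# Reduce over axis 0 (down the rows): one maximum per column.
col_max = x.max(axis=0)
print(col_max.shape)   # (3,), i.e. (ncols,)

# Broadcasting stretches the (3,) vector across every row of the (3, 3) array,
# so each element is divided by the maximum of its own column.
x_normed = x / col_max
print(x_normed.shape)  # (3, 3)
```

If you wanted to scale each row by its own maximum instead, x / x.max(axis=1, keepdims=True) keeps the reduced axis as a (nrows, 1) column so the shapes still line up for broadcasting.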
If x contains negative values you would need to subtract the minimum first:
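x_normed = (x - x.min(0)) / np.ptp(x, axis=0)
Here, np.ptp(x, axis=0) returns the "peak-to-peak" (i.e. the range, max - min) along axis 0. The original answer used the equivalent method form x.ptp(0), which was removed from arrays in NumPy 2.0; the function form works in both older and newer versions. This normalization also guarantees that the minimum value in each column will be 0.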
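As a minimal sketch of this min-max scaling, the helper below uses np.ptp(x, axis=0), the function form of the peak-to-peak range. The name min_max_scale_columns is hypothetical (it is not a NumPy function), and mapping constant columns to 0 instead of dividing by zero is an added assumption that the answer does not discuss.

```python
import numpy as np

def min_max_scale_columns(x):
    """Scale each column of a 2-D array to the range [0, 1]."""
    x = np.asarray(x, dtype=float)
    col_min = x.min(axis=0)
    col_range = np.ptp(x, axis=0)  # max - min for each column
    # Assumption: a constant column has range 0; divide by 1 instead,
    # which leaves that column as all zeros.
    safe_range = np.where(col_range == 0, 1.0, col_range)
    return (x - col_min) / safe_range

data = np.array([[-2.0, 10.0, 5.0],
                 [ 0.0, 20.0, 5.0],
                 [ 2.0, 40.0, 5.0]])
scaled = min_max_scale_columns(data)
print(scaled)
```

Without the guard, a column whose values are all equal would produce a division by zero and fill that column with nan.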
Answer by Marcin Mrugas
You can use sklearn.preprocessing:
import numpy as np
from sklearn.preprocessing import normalize

data = np.array([
    [1000, 10, 0.5],
    [765, 5, 0.35],
    [800, 7, 0.09],
])
# axis=0 normalizes each column (feature); norm='max' divides by the column maximum
data = normalize(data, axis=0, norm='max')
print(data)
# [[ 1.     1.     1.   ]
#  [ 0.765  0.5    0.7  ]
#  [ 0.8    0.7    0.18 ]]