Normalize columns of a numpy array in Python

Question

Asked by ahajib

I have a numpy array where each cell of a specific row represents a value for a feature. I store all of them in a 100*4 matrix.

A     B   C
1000  10  0.5
765   5   0.35
800   7   0.09

Any idea how I can normalize the columns of this numpy.array so that each value is between 0 and 1?

My desired output is:

A     B    C
1     1    1
0.765 0.5  0.7
0.8   0.7  0.18  (which is 0.09/0.5)

Thanks in advance :)

Answer 1

Accepted answer by ali_m

If I understand correctly, what you want to do is divide by the maximum value in each column. You can do this easily using broadcasting.

Starting with your example array:

import numpy as np

x = np.array([[1000,  10,   0.5],
              [ 765,   5,  0.35],
              [ 800,   7,  0.09]])

x_normed = x / x.max(axis=0)

print(x_normed)
# [[ 1.     1.     1.   ]
#  [ 0.765  0.5    0.7  ]
#  [ 0.8    0.7    0.18 ]]

x.max(0) takes the maximum over the 0th dimension (i.e. down the rows). This gives you a vector of shape (ncols,) containing the maximum value in each column. You can then divide x by this vector to normalize your values so that the maximum value in each column is scaled to 1.
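
To make the shapes concrete, here is a small sketch that reuses the example array, prints the shape of the reduced vector, and adds a row-wise variant for comparison. The row-wise part goes beyond the original answer and assumes you want each row scaled by its own maximum. That case needs keepdims=True so the maximum has shape (nrows, 1); without it, a square array like this 3x3 one would silently broadcast x.max(axis=1) along the wrong axis instead of raising an error.

```python
import numpy as np

x = np.array([[1000,  10,   0.5],
              [ 765,   5,  0.35],
              [ 800,   7,  0.09]])

# Reducing over axis 0 collapses the rows, leaving one maximum per column.
col_max = x.max(axis=0)
print(col_max.shape)  # (3,)

# Broadcasting aligns trailing dimensions: (3, 3) / (3,) divides each
# row element-wise by the vector of column maxima.
x_normed = x / col_max

# Row-wise variant (an assumption, not from the answer): keepdims=True keeps
# the reduced axis, giving shape (3, 1), so each row is divided by its own max.
row_max = x.max(axis=1, keepdims=True)
print(row_max.shape)  # (3, 1)
x_row_normed = x / row_max
```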

If x contains negative values, you need to subtract the minimum first:

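x_normed = (x - x.min(0)) / np.ptp(x, axis=0)

Here, np.ptp(x, axis=0) returns the "peak-to-peak" value (i.e. the range, max - min) along axis 0. This normalization also guarantees that the minimum value in each column will be 0. (The ndarray method form x.ptp(0) was removed in NumPy 2.0, so the function np.ptp is used instead.)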

Answer 2

Answer by Marcin Mrugas

You can use sklearn.preprocessing:

import numpy as np
from sklearn.preprocessing import normalize

data = np.array([
    [1000, 10, 0.5],
    [765, 5, 0.35],
    [800, 7, 0.09],
])
# axis=0 normalizes each column; norm='max' scales it by its maximum (absolute) value
data = normalize(data, axis=0, norm='max')
print(data)
# [[ 1.     1.     1.   ]
#  [ 0.765  0.5    0.7  ]
#  [ 0.8    0.7    0.18 ]]
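
If scikit-learn is not available, the same column scaling can be written in plain NumPy. As far as I know, recent scikit-learn versions implement norm='max' by dividing by the largest absolute value, so this hypothetical max_abs_scale_columns helper mirrors that; leaving all-zero columns unchanged is an assumption of this sketch. One consequence worth noting: with negative entries the result lies in [-1, 1] rather than [0, 1], so for the range the question asks for, min-max scaling (or scikit-learn's MinMaxScaler) is the better fit.

```python
import numpy as np

def max_abs_scale_columns(x):
    """Hypothetical helper: divide each column by its largest absolute value."""
    max_abs = np.abs(x).max(axis=0)
    # Assumption: leave all-zero columns unchanged instead of dividing by zero.
    max_abs = np.where(max_abs == 0, 1, max_abs)
    return x / max_abs

data = np.array([[1000, 10, 0.5],
                 [765, 5, 0.35],
                 [800, 7, 0.09]])
print(max_abs_scale_columns(data))

# With a negative entry the result spans [-1, 1], not [0, 1].
signed = np.array([[-4.0, 1.0],
                   [ 2.0, 2.0]])
print(max_abs_scale_columns(signed))
```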
