Python: normalization to bring data into the range [0,1]
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/18380419/
Normalization to bring in the range of [0,1]
Asked by pypro
I have a huge data set from which I derive two sets of data points, which I then have to plot and compare. The two plots differ in their range, so I want both to be in the range [0,1]. With the following code and one specific data set I get a constant line at 1 as the plot, but this normalization works well for other sets:
plt.plot(range(len(rvalue)),np.array(rvalue)/(max(rvalue)))
and for this code:
oldrange = max(rvalue)-min(rvalue) #NORMALIZING
newmin=0
newrange = 1 + 0.9999999999 - newmin
normal = map(lambda x, r=float(rvalue[-1] - rvalue[0]): ((x - rvalue[0]) / r)*1 - 0, rvalue)
plt.plot(range(len(rvalue)),normal)
I get the error:
ZeroDivisionError: float division by zero
for all the data sets. I am unable to figure out how to get both the plots in one range for comparison.
Answered by Brionius
I tried to simplify things a little. Try this:
oldmin = min(rvalue)
oldmax = max(rvalue)
oldrange = oldmax - oldmin
newmin = 0.
newmax = 1.
newrange = newmax - newmin
if oldrange == 0:            # Deal with the case where rvalue is constant:
    if oldmin < newmin:      # If rvalue < newmin, set all rvalue values to newmin
        newval = newmin
    elif oldmin > newmax:    # If rvalue > newmax, set all rvalue values to newmax
        newval = newmax
    else:                    # If newmin <= rvalue <= newmax, keep rvalue the same
        newval = oldmin
    normal = [newval for v in rvalue]
else:
    scale = newrange / oldrange
    normal = [(v - oldmin) * scale + newmin for v in rvalue]
plt.plot(range(len(rvalue)),normal)
The only reason I can see for the ZeroDivisionError is if the data in rvalue were constant (all values are the same). Is that the case?
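To illustrate the constant-data case, here is a minimal sketch that wraps the same logic in a function. The name `rescale` and the sample lists are hypothetical, chosen only for illustration, and the plotting call is left out so the snippet runs on its own. Note also that the lambda in the question divides by `rvalue[-1] - rvalue[0]`, so it fails whenever the first and last values are equal, even if the data in between varies.

```python
def rescale(values, newmin=0.0, newmax=1.0):
    """Linearly map values onto [newmin, newmax] (hypothetical helper)."""
    oldmin = min(values)
    oldmax = max(values)
    oldrange = oldmax - oldmin
    if oldrange == 0:
        # Constant input: clamp the single value into [newmin, newmax]
        newval = min(max(oldmin, newmin), newmax)
        return [newval for _ in values]
    scale = (newmax - newmin) / oldrange
    return [(v - oldmin) * scale + newmin for v in values]


print(rescale([2, 4, 6]))   # varying data is stretched onto [0, 1]
print(rescale([5, 5, 5]))   # constant data no longer raises ZeroDivisionError
```

Clamping the constant value is the choice made in the answer above; mapping constant data to 0 or to the midpoint of the range would be equally defensible, depending on how the plot should look.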
Answered by CT Zhu
The range of an array is provided by the NumPy built-in function numpy.ptp(), so your question can be addressed by:
import numpy as np

# First we should filter input_array so that it does not contain NaN or Inf.
input_array = np.array(some_data)
if np.unique(input_array).shape[0] == 1:
    pass  # do something else if the input_array is constant
else:
    result_array = (input_array - np.min(input_array)) / np.ptp(input_array)
# To extend it to higher dimensions, add the axis= kwarg to np.min and np.ptp
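The `axis=` remark can be sketched as follows for a 2-D array, scaling each column independently. The function name `minmax_columns` and the sample data are assumptions for illustration; constant columns are mapped to 0 here, which is one possible choice rather than the only one.

```python
import numpy as np


def minmax_columns(arr):
    """Scale each column of a 2-D array to [0, 1] (hypothetical helper)."""
    arr = np.asarray(arr, dtype=float)
    col_min = np.min(arr, axis=0)     # per-column minimum
    col_range = np.ptp(arr, axis=0)   # per-column max - min
    # Avoid 0/0 for constant columns by dividing by 1 instead (result is 0)
    safe_range = np.where(col_range == 0, 1.0, col_range)
    return (arr - col_min) / safe_range


data = np.array([[1.0, 10.0, 7.0],
                 [2.0, 30.0, 7.0],
                 [3.0, 20.0, 7.0]])
print(minmax_columns(data))
```

The third column is constant, so it comes back as all zeros instead of NaN.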
Answered by Marissa Novak
Use scikit-learn: http://scikit-learn.org/stable/modules/preprocessing.html#scaling-features-to-a-range
It has built-in functions to scale features to a specified range. You'll find other functions to normalize and standardize in the same preprocessing module.
See this example:
>>> import numpy as np
>>> from sklearn import preprocessing
>>> X_train = np.array([[ 1., -1.,  2.],
...                     [ 2.,  0.,  0.],
...                     [ 0.,  1., -1.]])
...
>>> min_max_scaler = preprocessing.MinMaxScaler()
>>> X_train_minmax = min_max_scaler.fit_transform(X_train)
>>> X_train_minmax
array([[ 0.5       ,  0.        ,  1.        ],
       [ 1.        ,  0.5       ,  0.33333333],
       [ 0.        ,  1.        ,  0.        ]])
Answered by user3284005
Use the following method to normalize your data to the range 0 to 1, using the min and max values from the data sequence:
import numpy as np

def NormalizeData(data):
    return (data - np.min(data)) / (np.max(data) - np.min(data))
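As a usage sketch, the function can be applied to two series with very different ranges so that both can be drawn on the same axes, which is what the question asks for. The two sample arrays are made up for illustration.

```python
import numpy as np


def NormalizeData(data):
    return (data - np.min(data)) / (np.max(data) - np.min(data))


# Two made-up series with very different ranges
series_a = np.array([10.0, 20.0, 15.0, 30.0])
series_b = np.array([0.001, 0.004, 0.002, 0.003])

norm_a = NormalizeData(series_a)
norm_b = NormalizeData(series_b)

# Both normalized series now span 0 to 1
print(norm_a.min(), norm_a.max())
print(norm_b.min(), norm_b.max())
```

With NumPy arrays, constant input does not raise ZeroDivisionError here; the 0/0 division yields NaN values along with a runtime warning, so the constant case still needs separate handling.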
Answered by Jay Dangar
A simple way to normalize values to between 0 and 1 is to divide every value by the maximum value. This brings the values into the range 0 to 1, provided all values are non-negative and the maximum is positive.
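A short sketch of dividing by the maximum, using made-up sample arrays: it keeps non-negative data within [0, 1], but data containing negative values ends up partly outside that range unless the minimum is subtracted first.

```python
import numpy as np

non_negative = np.array([2.0, 4.0, 8.0])
with_negatives = np.array([-4.0, 2.0, 8.0])

by_max = non_negative / non_negative.max()
print(by_max)                 # all values lie within [0, 1]

by_max_neg = with_negatives / with_negatives.max()
print(by_max_neg)             # the first value is negative, outside [0, 1]

# Subtracting the minimum first handles negative values as well
shifted = with_negatives - with_negatives.min()
by_min_max = shifted / shifted.max()
print(by_min_max)
```

Note that dividing by the maximum alone does not move the smallest value to 0, so two series scaled this way may still cover different parts of the [0, 1] interval.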
Answered by R Zhang
scikit-learn has a function for this: sklearn.preprocessing.minmax_scale(X, feature_range=(0, 1), axis=0, copy=True)
It is more convenient than using the class MinMaxScaler.
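A minimal usage sketch of `minmax_scale`, assuming scikit-learn is installed. The 2-D array is the one from the MinMaxScaler example earlier on this page; the 1-D list is made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import minmax_scale

X_train = np.array([[1., -1., 2.],
                    [2., 0., 0.],
                    [0., 1., -1.]])

# Scales each column (axis=0) to the default feature_range of (0, 1)
X_scaled = minmax_scale(X_train)
print(X_scaled)

# A 1-D sequence is accepted as well
scaled_1d = minmax_scale([10.0, 20.0, 15.0])
print(scaled_1d)
```

Unlike the class, the function keeps no fitted state, so it suits one-off scaling for a plot; when the same scaling must later be applied to new data, the MinMaxScaler class is the better fit.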

