Python: normalization to bring data into the range [0,1]
Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/18380419/
Normalization to bring in the range of [0,1]
Asked by pypro
I have a huge data set from which I derive two sets of data points, which I then have to plot and compare. The two plots differ in their range, so I want both of them to be in the range [0,1]. With the following code and one specific data set I get a constant line at 1 as the plot, although this normalization works well for other sets:
plt.plot(range(len(rvalue)),np.array(rvalue)/(max(rvalue)))
And for this code:
oldrange = max(rvalue)-min(rvalue) #NORMALIZING
newmin=0
newrange = 1 + 0.9999999999 - newmin
normal = map(lambda x, r=float(rvalue[-1] - rvalue[0]): ((x - rvalue[0]) / r)*1 - 0, rvalue)
plt.plot(range(len(rvalue)),normal)
I get the error:
ZeroDivisionError: float division by zero
for all the data sets. I am unable to figure out how to get both the plots in one range for comparison.
Answered by Brionius
I tried to simplify things a little. Try this:
oldmin = min(rvalue)
oldmax = max(rvalue)
oldrange = oldmax - oldmin
newmin = 0.
newmax = 1.
newrange = newmax - newmin
if oldrange == 0:            # Deal with the case where rvalue is constant:
    if oldmin < newmin:      # If rvalue < newmin, set all rvalue values to newmin
        newval = newmin
    elif oldmin > newmax:    # If rvalue > newmax, set all rvalue values to newmax
        newval = newmax
    else:                    # If newmin <= rvalue <= newmax, keep rvalue the same
        newval = oldmin
    normal = [newval for v in rvalue]
else:
    scale = newrange / oldrange
    normal = [(v - oldmin) * scale + newmin for v in rvalue]
plt.plot(range(len(rvalue)),normal)
The only reason I can see for the ZeroDivisionError is if the data in rvalue were constant (all values are the same). Is that the case?
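It may also be worth noting that the lambda in the question divides by rvalue[-1] - rvalue[0] rather than by max(rvalue) - min(rvalue), so the divisor is zero whenever the first and last values happen to be equal, even if the data in between varies. The sketch below wraps the logic of this answer in a function and shows that case. The name rescale and the sample data are made up for illustration; they are not part of the original answer.

```python
def rescale(values, newmin=0.0, newmax=1.0):
    """Linearly map values onto [newmin, newmax] (hypothetical helper name)."""
    oldmin, oldmax = min(values), max(values)
    oldrange = oldmax - oldmin
    if oldrange == 0:
        # Constant input: clamp the single value into the target range.
        return [min(max(oldmin, newmin), newmax) for _ in values]
    scale = (newmax - newmin) / oldrange
    return [(v - oldmin) * scale + newmin for v in values]


# Made-up data that is not constant but starts and ends on the same value.
rvalue = [3.0, 7.0, 5.0, 3.0]
print(rvalue[-1] - rvalue[0])  # 0.0, the divisor used in the question's lambda
print(rescale(rvalue))         # [0.0, 1.0, 0.5, 0.0]
print(rescale([4.0, 4.0]))     # [1.0, 1.0], constant data clamped to newmax
```

Using max minus min as the divisor means only truly constant data needs special handling.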
Answered by CT Zhu
NumPy's built-in function numpy.ptp() returns the range of an array (peak to peak, that is, maximum minus minimum), so your question can be addressed by:
# First, filter input_array so that it does not contain NaN or Inf.
input_array = np.array(some_data)
if np.unique(input_array).shape[0] == 1:
    pass  # do something else if input_array is constant
else:
    result_array = (input_array - np.min(input_array)) / np.ptp(input_array)
# To extend this to higher dimensions, add the axis= keyword argument to np.min and np.ptp
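The two comments in that snippet (filtering out NaN/Inf and using axis= for higher dimensions) could be sketched roughly as follows. The function name minmax_columns, the sample data, and the choice to map constant columns to 0 are assumptions made for illustration, not something this answer specifies.

```python
import numpy as np


def minmax_columns(data):
    """Scale each column of a 2-D array to [0, 1] (hypothetical helper name)."""
    arr = np.asarray(data, dtype=float)
    # Drop every row that contains NaN or Inf before computing min and range.
    arr = arr[np.isfinite(arr).all(axis=1)]
    col_min = np.min(arr, axis=0)
    col_range = np.ptp(arr, axis=0)
    # Assumption: a constant column maps to 0 instead of dividing by zero.
    safe_range = np.where(col_range == 0, 1.0, col_range)
    return (arr - col_min) / safe_range


# Made-up sample: one row with NaN, and a constant third column.
data = [[1.0, 10.0, 5.0],
        [2.0, 30.0, 5.0],
        [np.nan, 20.0, 5.0],
        [3.0, 20.0, 5.0]]
print(minmax_columns(data))
```

With axis=0 the minimum and range are computed per column, and broadcasting applies them to every row.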
Answered by Marissa Novak
Use scikit-learn: http://scikit-learn.org/stable/modules/preprocessing.html#scaling-features-to-a-range
It has built-in functions to scale features to a specified range. The same documentation page covers other functions for normalizing and standardizing data.
See this example:
>>> import numpy as np
>>> from sklearn import preprocessing
>>> X_train = np.array([[ 1., -1.,  2.],
...                     [ 2.,  0.,  0.],
...                     [ 0.,  1., -1.]])
>>> min_max_scaler = preprocessing.MinMaxScaler()
>>> X_train_minmax = min_max_scaler.fit_transform(X_train)
>>> X_train_minmax
array([[ 0.5       ,  0.        ,  1.        ],
       [ 1.        ,  0.5       ,  0.33333333],
       [ 0.        ,  1.        ,  0.        ]])
Answered by user3284005
Use the following function to normalize your data to the range 0 to 1, using the minimum and maximum values of the data sequence:
import numpy as np

def NormalizeData(data):
    return (data - np.min(data)) / (np.max(data) - np.min(data))
Answered by Jay Dangar
A simple way to normalize values to between 0 and 1 is to divide every value by the maximum value. This brings the values into the range 0 to 1, provided they are all non-negative and the maximum is positive.
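Dividing by the maximum alone does not move the smallest value to 0, which is one possible explanation for the flat line at 1 described in the question (an assumption, since the question's data is not shown). The small comparison below uses made-up values that are large relative to their spread.

```python
# Made-up data: large values with a small spread.
values = [1000.0, 1000.2, 1000.5]

# Dividing by the maximum only: the largest value maps to 1, but the smallest
# stays very close to 1 too, so a plot looks like a flat line near 1.
by_max = [v / max(values) for v in values]

# Subtracting the minimum first stretches the data over the whole interval.
lo, hi = min(values), max(values)
min_max = [(v - lo) / (hi - lo) for v in values]

print(by_max)
print(min_max)
```

The second form is the min-max scaling used in the other answers; it still needs a guard for the case hi == lo.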
Answered by R Zhang
scikit-learn has a function for this: sklearn.preprocessing.minmax_scale(X, feature_range=(0, 1), axis=0, copy=True)
This is more convenient than using the MinMaxScaler class.
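A minimal usage sketch, assuming scikit-learn is installed; the variable names and the sample data are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import minmax_scale

# Made-up 1-D sample data standing in for rvalue from the question.
rvalue = np.array([3.0, 7.0, 5.0, 3.0])

# Scale to [0, 1]; no scaler object has to be created or fitted first.
normal = minmax_scale(rvalue, feature_range=(0, 1))
print(normal)
```

For a 2-D array the default axis=0 scales each column independently.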