pandas: normalise between 0 and 1, ignoring NaN

Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/39758449/

Date: 2020-09-14 02:06:15  Source: igfitidea

Normalise between 0 and 1 ignoring NaN

Tags: python, pandas, numpy, scikit-learn

Asked by RockJake28

For a list of numbers ranging from x to y that may contain NaN, how can I normalise between 0 and 1, ignoring the NaN values (they stay as NaN)?

Typically I would use MinMaxScaler (ref page) from sklearn.preprocessing, but this cannot handle NaN and recommends imputing the values based on the mean, median, etc. It doesn't offer the option to ignore all the NaN values.

Accepted answer by piRSquared

Consider the pd.Series s:

import numpy as np
import pandas as pd
s = pd.Series(np.random.choice([3, 4, 5, 6, np.nan], 100))
s.hist()



Option 1
Min Max Scaling

new = s.sub(s.min()).div((s.max() - s.min()))
new.hist()
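
As a quick check of Option 1, here is a minimal sketch on a small hand-made Series (the name `demo` and its values are illustrative, not from the original answer). It relies on `Series.min()` and `Series.max()` skipping NaN by default, so the NaN entry stays NaN while every other value lands in [0, 1].

```python
import numpy as np
import pandas as pd

# Illustrative data, not from the original answer
demo = pd.Series([3, 4, np.nan, 6, 5])

# min() and max() skip NaN by default, so the range is taken from 3 and 6
lo, hi = demo.min(), demo.max()
scaled = demo.sub(lo).div(hi - lo)

# The NaN at position 2 is preserved; the smallest value maps to 0, the largest to 1
print(scaled.tolist())
```

Note that if every non-NaN value is identical, `hi - lo` is zero and the division produces NaN for all entries, so that case would need separate handling.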



NOT WHAT OP ASKED FOR
I included the following options because I wanted to.

Option 2
sigmoid

sigmoid = lambda x: 1 / (1 + np.exp(-x))

new = sigmoid(s.sub(s.mean()))
new.hist()
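
To see what the sigmoid option does to NaN and to the range, here is a small sketch on illustrative data (the Series `demo` is an assumption, not from the original answer). NaN propagates through `np.exp`, and, unlike min-max scaling, the outputs fall strictly inside (0, 1) without reaching either endpoint.

```python
import numpy as np
import pandas as pd

def sigmoid(x):
    """Logistic function; NaN inputs come out as NaN."""
    return 1 / (1 + np.exp(-x))

# Illustrative data, not from the original answer
demo = pd.Series([3, 4, np.nan, 6, 5])

# mean() skips NaN by default: (3 + 4 + 6 + 5) / 4 = 4.5
centred = demo.sub(demo.mean())
squashed = sigmoid(centred)

# Values below the mean map below 0.5, values above it map above 0.5
print(squashed.tolist())
```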



Option 3
tanh (hyperbolic tangent)

new = np.tanh(s.sub(s.mean())).add(1).div(2)
new.hist()
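
The rescaled tanh is closely related to the sigmoid from Option 2: since (tanh(x) + 1) / 2 = 1 / (1 + exp(-2x)), Option 3 is the same logistic curve applied to twice the centred value, so it is simply steeper. A minimal numerical check on illustrative data (the Series `demo` is an assumption):

```python
import numpy as np
import pandas as pd

# Illustrative data, not from the original answer
demo = pd.Series([3, 4, np.nan, 6, 5])
centred = demo.sub(demo.mean())

# Option 3 as written in the answer
via_tanh = np.tanh(centred).add(1).div(2)

# The logistic function evaluated at 2 * x
via_sigmoid = 1 / (1 + np.exp(-2 * centred))

# equal_nan=True treats the NaN positions as matching
same = np.allclose(via_tanh, via_sigmoid, equal_nan=True)
print(same)
```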


Answer by Chris Farr

Here's a different approach, and one that I believe answers the OP correctly. The only difference is that it works on a DataFrame instead of a list, but you can easily put your list in a DataFrame, as done below. The other options didn't work for me because I needed to store the MinMaxScaler in order to reverse the transform after a prediction was made. So instead of passing the entire column to the MinMaxScaler, you can filter out the NaNs for both the target and the input.

Solution Example

import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(0, 1))
d = pd.DataFrame({'A': [0, 1, 2, 3, np.nan, 3, 2]})

# Boolean mask marking the NaN rows in column 'A'
null_index = d['A'].isnull()

# Fit and transform only the non-NaN rows; the NaN rows are left untouched
d.loc[~null_index, ['A']] = scaler.fit_transform(d.loc[~null_index, ['A']])
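
The reason given for keeping the fitted MinMaxScaler is to reverse the transform later. A minimal sketch of that round trip, assuming the same masking approach (the names `mask`, `scaled` and `restored` are illustrative, not from the original answer):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

d = pd.DataFrame({'A': [0, 1, 2, 3, np.nan, 3, 2]})
mask = d['A'].notna()  # True where a real value is present

# Fit on the non-NaN rows only and keep the fitted scaler around
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = d.copy()
scaled.loc[mask, ['A']] = scaler.fit_transform(d.loc[mask, ['A']])

# Later: map the scaled values back to the original units
restored = scaled.copy()
restored.loc[mask, ['A']] = scaler.inverse_transform(scaled.loc[mask, ['A']])

print(scaled['A'].tolist())
print(restored['A'].tolist())
```

Recent scikit-learn releases document that MinMaxScaler disregards NaN in `fit` and keeps it in `transform`, so the mask may be unnecessary there; check the documentation for your installed version before relying on that.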