pandas: normalise between 0 and 1, ignoring NaN
Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/39758449/
Normalise between 0 and 1 ignoring NaN
Asked by RockJake28
For a list of numbers ranging from x to y that may contain NaN, how can I normalise between 0 and 1, ignoring the NaN values (they stay as NaN)?
Typically I would use MinMaxScaler from sklearn.preprocessing, but this cannot handle NaN and recommends imputing the values based on the mean, median, etc. It doesn't offer the option to ignore all the NaN values.
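For a plain list, one minimal sketch (not from the original thread) uses NumPy's NaN-aware reductions np.nanmin and np.nanmax. The helper name normalise_ignoring_nan and the sample values are made up for illustration, and the sketch assumes the data contain at least two distinct non-NaN values.

```python
import numpy as np

def normalise_ignoring_nan(values):
    """Scale values to [0, 1]; NaN entries stay NaN (hypothetical helper)."""
    arr = np.asarray(values, dtype=float)
    lo = np.nanmin(arr)   # minimum over the non-NaN entries
    hi = np.nanmax(arr)   # maximum over the non-NaN entries
    # Assumes hi > lo; a constant input would divide by zero here.
    return (arr - lo) / (hi - lo)

data = [2.0, np.nan, 4.0, 6.0]           # illustrative values only
scaled = normalise_ignoring_nan(data)    # the NaN position is preserved
print(scaled)
```

Because NaN propagates through subtraction and division, no masking step is needed: only the minimum and maximum have to be computed with NaN excluded.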
Accepted answer by piRSquared
Consider the pd.Series s:
import numpy as np
import pandas as pd
s = pd.Series(np.random.choice([3, 4, 5, 6, np.nan], 100))
s.hist()
Option 1
Min Max Scaling
new = s.sub(s.min()).div((s.max() - s.min()))
new.hist()
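As a small check of Option 1 with concrete numbers (the values below are illustrative, not from the answer): Series.min() and Series.max() skip NaN by default (skipna=True), so the NaN entries take no part in computing the range and remain NaN after the arithmetic.

```python
import numpy as np
import pandas as pd

s_demo = pd.Series([3, 4, np.nan, 6, 5])   # illustrative data

# min() and max() default to skipna=True, so NaN is ignored here
lo, hi = s_demo.min(), s_demo.max()

# NaN - lo is NaN, so missing entries pass through unchanged
new_demo = s_demo.sub(lo).div(hi - lo)
print(new_demo.tolist())
```

The smallest non-NaN value maps to 0 and the largest to 1. If all non-NaN values were equal, hi - lo would be zero, so that case would need separate handling.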
NOT WHAT OP ASKED FOR
I put these in because I wanted to
Option 2
sigmoid
sigmoid = lambda x: 1 / (1 + np.exp(-x))
new = sigmoid(s.sub(s.mean()))
new.hist()
Option 3
tanh (hyperbolic tangent)
new = np.tanh(s.sub(s.mean())).add(1).div(2)
new.hist()
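Options 2 and 3 are closely related: (tanh(x) + 1) / 2 equals sigmoid(2x), so the tanh version is simply a steeper sigmoid. The sketch below checks this numerically on made-up data (s_demo and the variable names are for illustration only) and confirms that NaN stays in place. Unlike min-max scaling, neither option maps the minimum to exactly 0 or the maximum to exactly 1.

```python
import numpy as np
import pandas as pd

s_demo = pd.Series([3, 4, np.nan, 6, 5])       # illustrative data
centred = s_demo.sub(s_demo.mean())            # mean() skips NaN by default

sigmoid = lambda x: 1 / (1 + np.exp(-x))
via_tanh = np.tanh(centred).add(1).div(2)      # Option 3
via_sigmoid = sigmoid(2 * centred)             # sigmoid with doubled input

# Both land strictly inside (0, 1) and agree up to rounding
print(np.allclose(via_tanh, via_sigmoid, equal_nan=True))
```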
Answered by Chris Farr
Here's a different approach, and one that I believe answers the OP correctly. The only difference is that it works on a DataFrame instead of a list; you can easily put your list in a DataFrame, as done below. The other options didn't work for me because I needed to store the MinMaxScaler in order to reverse the transform after a prediction was made. So instead of passing the entire column to the MinMaxScaler, you can filter out the NaNs for both the target and the input.
Solution Example
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(0, 1))
d = pd.DataFrame({'A': [0, 1, 2, 3, np.nan, 3, 2]})

# Boolean mask marking the NaN rows
null_index = d['A'].isnull()

# Fit and transform only the non-NaN rows; the NaN rows are left untouched
d.loc[~null_index, ['A']] = scaler.fit_transform(d.loc[~null_index, ['A']])
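The answer mentions keeping the fitted scaler so that values can be reverse-transformed later. A minimal sketch of that round trip, assuming the same toy DataFrame and using MinMaxScaler.inverse_transform; the names original, scaled and restored are introduced only for this illustration.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(0, 1))
d = pd.DataFrame({'A': [0, 1, 2, 3, np.nan, 3, 2]})
original = d['A'].copy()
null_index = d['A'].isnull()

# Fit and scale only the non-NaN rows, as in the answer above
d.loc[~null_index, ['A']] = scaler.fit_transform(d.loc[~null_index, ['A']])
scaled = d['A'].copy()

# Later, map the scaled values back with the same fitted scaler
d.loc[~null_index, ['A']] = scaler.inverse_transform(d.loc[~null_index, ['A']])
restored = d['A']
```

The same boolean mask is reused for the inverse step, so the NaN row is never passed to the scaler in either direction.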