pandas: normalise between 0 and 1, ignoring NaN

Question

Asked by RockJake28

For a list of numbers ranging from x to y that may contain NaN, how can I normalise between 0 and 1 while ignoring the NaN values (they stay as NaN)?

Typically I would use MinMaxScaler from sklearn.preprocessing, but it cannot handle NaN and recommends imputing the values based on the mean, median, etc. It doesn't offer the option to ignore all the NaN values.

Answer 1

Accepted answer by piRSquared

Consider the pd.Series s:

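import numpy as np
import pandas as pd
s = pd.Series(np.random.choice([3, 4, 5, 6, np.nan], 100))  # 100 random draws, some of them NaN
s.hist()  # plotting requires matplotlib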

Option 1
Min Max Scaling

# Series.min() and Series.max() skip NaN by default; NaN entries stay NaN after sub/div
new = s.sub(s.min()).div(s.max() - s.min())
new.hist()
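Option 1 works because pandas reductions such as `Series.min()` and `Series.max()` skip NaN by default, while arithmetic involving NaN yields NaN, so missing entries pass through untouched. Here is a minimal, deterministic sketch of the same idea; the helper name `min_max_ignore_nan` is hypothetical (not a pandas API), and the constant-series guard is an added assumption to avoid a 0/0 division.

```python
import numpy as np
import pandas as pd


def min_max_ignore_nan(s: pd.Series) -> pd.Series:
    """Scale s linearly onto [0, 1]; NaN entries are left as NaN.

    Hypothetical helper for illustration, not part of pandas.
    """
    lo, hi = s.min(), s.max()  # skipna=True by default, so NaN is ignored here
    span = hi - lo
    if span == 0:
        # All non-NaN values are equal: map them to 0.0 instead of dividing by zero
        return s.where(s.isna(), 0.0)
    return (s - lo) / span  # NaN - lo and NaN / span both stay NaN


s = pd.Series([3.0, np.nan, 4.0, 6.0, np.nan, 5.0])
scaled = min_max_ignore_nan(s)
print(scaled.tolist())
```

The smallest non-NaN value maps to 0, the largest to 1, and the NaN positions are unchanged.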

NOT WHAT OP ASKED FOR
I put these in because I wanted to

Option 2
sigmoid

sigmoid = lambda x: 1 / (1 + np.exp(-x))

new = sigmoid(s.sub(s.mean()))
new.hist()
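Unlike Option 1, the sigmoid maps into the open interval (0, 1) rather than onto [0, 1]: a value equal to the mean becomes exactly 0.5, and the observed minimum and maximum are not pinned to 0 and 1. Because `Series.mean()` skips NaN and `np.exp` propagates it, NaN entries stay NaN. A small check on a hand-picked Series (the values are chosen only for illustration):

```python
import numpy as np
import pandas as pd


def sigmoid(x):
    # Logistic function; works element-wise on a pandas Series
    return 1 / (1 + np.exp(-x))


s = pd.Series([3.0, 4.0, np.nan, 5.0, 6.0])
new = sigmoid(s.sub(s.mean()))  # mean() skips NaN, so the centre here is 4.5
print(new.round(3).tolist())
```

Values symmetric around the mean land symmetrically around 0.5, which is the main difference from a true min-max rescaling.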

Option 3
tanh (hyperbolic tangent)

new = np.tanh(s.sub(s.mean())).add(1).div(2)
new.hist()
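Option 3 is the sigmoid of Option 2 with a steeper slope: since tanh(x) = (1 - e^(-2x)) / (1 + e^(-2x)), adding 1 and halving gives 1 / (1 + e^(-2x)), i.e. sigmoid(2x). A quick numerical check of that identity, again on an illustrative Series:

```python
import numpy as np
import pandas as pd


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


s = pd.Series([3.0, 4.0, np.nan, 5.0, 6.0])
centred = s.sub(s.mean())  # NaN stays NaN

via_tanh = np.tanh(centred).add(1).div(2)  # Option 3
via_sigmoid = sigmoid(2 * centred)         # Option 2 applied to 2x

# Same values up to floating-point error, with NaN in the same positions
print(np.allclose(via_tanh, via_sigmoid, equal_nan=True))
```

So choosing between Options 2 and 3 is really a choice of how sharply values near the mean are spread out.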

Answer 2

Answer by Chris Farr

Here's a different approach, one that I believe answers the OP correctly. The only difference is that it works on a DataFrame instead of a list, but you can easily put your list in a DataFrame, as done below. The other options didn't work for me because I needed to store the fitted MinMaxScaler in order to inverse-transform after a prediction was made. So instead of passing the entire column to the MinMaxScaler, you can filter out the NaNs for both the target and the input.

Solution Example

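import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(0, 1))
d = pd.DataFrame({'A': [0, 1, 2, 3, np.nan, 3, 2]})

# Boolean mask of the rows where column A is NaN
null_index = d['A'].isnull()

# Fit and transform only the non-NaN rows; the NaN rows are left untouched.
# The fitted scaler can later be reused with scaler.inverse_transform(...).
d.loc[~null_index, ['A']] = scaler.fit_transform(d.loc[~null_index, ['A']])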
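The same "fit on the non-NaN values, keep the parameters, invert later" idea can also be sketched in plain pandas, without scikit-learn. The class `NanMinMaxScaler` below is hypothetical: it only loosely mirrors MinMaxScaler's fit / transform / inverse_transform pattern, works column-wise on a DataFrame, and assumes no column is constant.

```python
import numpy as np
import pandas as pd


class NanMinMaxScaler:
    """Hypothetical column-wise min-max scaler that ignores NaN.

    Not a scikit-learn class; it only mimics the fit / transform /
    inverse_transform pattern. Assumes every column has max > min.
    """

    def fit(self, df: pd.DataFrame) -> "NanMinMaxScaler":
        self.min_ = df.min()                 # per-column minimum, NaN skipped
        self.range_ = df.max() - self.min_   # per-column range, NaN skipped
        return self

    def transform(self, df: pd.DataFrame) -> pd.DataFrame:
        # Series aligns on column names; NaN entries stay NaN
        return (df - self.min_) / self.range_

    def inverse_transform(self, df: pd.DataFrame) -> pd.DataFrame:
        return df * self.range_ + self.min_


d = pd.DataFrame({'A': [0, 1, 2, 3, np.nan, 3, 2],
                  'B': [10, np.nan, 20, 30, 40, np.nan, 25]})
scaler = NanMinMaxScaler().fit(d)
scaled = scaler.transform(d)
restored = scaler.inverse_transform(scaled)
print(scaled)
```

Because the per-column minimum and range are stored on the object, predictions made in the scaled space can be mapped back to the original units, which is the reason given above for keeping the fitted scaler around.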
