Python: Apply StandardScaler to part of a data set

Question

Asked by mitsi

I want to use sklearn's StandardScaler. Is it possible to apply it to some feature columns but not others?

For instance, say my data is:

import pandas as pd
data = pd.DataFrame({'Name' : [3, 4,6], 'Age' : [18, 92,98], 'Weight' : [68, 59,49]})

   Age  Name  Weight
0   18     3      68
1   92     4      59
2   98     6      49


col_names = ['Name', 'Age', 'Weight']
features = data[col_names]

I fit and transform the data

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler().fit(features.values)
features = scaler.transform(features.values)
scaled_features = pd.DataFrame(features, columns = col_names)

       Name       Age    Weight
0 -1.069045 -1.411004  1.202703
1 -0.267261  0.623041  0.042954
2  1.336306  0.787964 -1.245657

But of course the names are not really integers but strings, and I don't want to standardize them. How can I apply the fit and transform methods only to the columns Age and Weight?

Answer 1

Answered by ayhan

Update:

Currently, the best way to handle this is to use ColumnTransformer, as explained in the ColumnTransformer answer below.

First create a copy of your dataframe:

scaled_features = data.copy()

Don't include the Name column in the transformation:

from sklearn.preprocessing import StandardScaler

col_names = ['Age', 'Weight']
features = scaled_features[col_names]
scaler = StandardScaler().fit(features.values)
features = scaler.transform(features.values)

Now, don't create a new dataframe but assign the result to those two columns:

scaled_features[col_names] = features
print(scaled_features)


        Age  Name    Weight
0 -1.411004     3  1.202703
1  0.623041     4  0.042954
2  0.787964     6 -1.245657
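
StandardScaler standardizes each column independently as (x - mean) / std, where std is the population standard deviation (ddof=0). As a minimal sketch, assuming only pandas is available, the same two columns can be scaled by hand while Name is left untouched; it should reproduce the values shown above:

```python
import pandas as pd

data = pd.DataFrame({'Name': [3, 4, 6], 'Age': [18, 92, 98], 'Weight': [68, 59, 49]})
cols = ['Age', 'Weight']

scaled = data.copy()
# StandardScaler divides by the population standard deviation (ddof=0),
# whereas pandas' .std() defaults to the sample version (ddof=1)
scaled[cols] = (data[cols] - data[cols].mean()) / data[cols].std(ddof=0)
print(scaled)
```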

Answer 2

Answered by Guy C

Introduced in scikit-learn v0.20, ColumnTransformer applies transformers to a specified set of columns of an array or pandas DataFrame.

import pandas as pd
data = pd.DataFrame({'Name' : [3, 4,6], 'Age' : [18, 92,98], 'Weight' : [68, 59,49]})

col_names = ['Name', 'Age', 'Weight']
features = data[col_names]

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

ct = ColumnTransformer([
        ('somename', StandardScaler(), ['Age', 'Weight'])
    ], remainder='passthrough')

ct.fit_transform(features)

NB: Like Pipeline, it also has a shorthand version, make_column_transformer, which doesn't require naming the transformers.

Output

array([[-1.41100443,  1.20270298,  3.        ],
       [ 0.62304092,  0.04295368,  4.        ],
       [ 0.78796352, -1.24565666,  6.        ]])

Answer 3

Answered by Danil

Another option is to drop the Name column before scaling and then add it back afterwards:

import pandas as pd
from sklearn.preprocessing import StandardScaler

data = pd.DataFrame({'Name' : [3, 4,6], 'Age' : [18, 92,98], 'Weight' : [68, 59,49]})

# Save the variable you don't want to scale
name_var = data['Name']

# Fit the scaler to every column except Name
scaler = StandardScaler()
scaler.fit(data.drop('Name', axis=1))

# Calculate scaled values and store them in a separate object
scaled_values = scaler.transform(data.drop('Name', axis=1))

# Rebuild the DataFrame from the scaled values, then add Name back
data = pd.DataFrame(scaled_values, index=data.index, columns=data.drop('Name', axis=1).columns)
data['Name'] = name_var

print(data)
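
Keeping a fitted scaler object, as this answer does, also means its learned statistics can be inspected and reused. A minimal sketch, where new_rows is a hypothetical example DataFrame: mean_ and scale_ hold the per-column means and standard deviations learned by fit, and transform applies those same statistics to rows that were not part of the fit.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

data = pd.DataFrame({'Name': [3, 4, 6], 'Age': [18, 92, 98], 'Weight': [68, 59, 49]})
to_scale = ['Age', 'Weight']

scaler = StandardScaler().fit(data[to_scale])
print(scaler.mean_)   # per-column means learned during fit
print(scaler.scale_)  # per-column population standard deviations

# Hypothetical new rows, scaled with the statistics learned above
new_rows = pd.DataFrame({'Name': [7], 'Age': [40], 'Weight': [60]})
new_scaled = new_rows.copy()
new_scaled[to_scale] = scaler.transform(new_rows[to_scale])
print(new_scaled)
```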

Answer 4

Answered by hashcode55

A more Pythonic way to do this:

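from sklearn.preprocessing import StandardScaler

# Each column arrives as a 1-D Series; StandardScaler needs 2-D input,
# so reshape to a single-column array and flatten the result again
data[['Age','Weight']] = data[['Age','Weight']].apply(
    lambda x: StandardScaler().fit_transform(x.values.reshape(-1, 1)).ravel())
data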

Output:

         Age  Name    Weight
0 -1.411004     3  1.202703
1  0.623041     4  0.042954
2  0.787964     6 -1.245657
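
Scaling column by column like this gives the same numbers as fitting one scaler on both columns, because StandardScaler computes a separate mean and standard deviation for every column anyway. A small check, assuming the same example data (the list-comprehension loop below is only for comparison and is not part of the original answer):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

data = pd.DataFrame({'Name': [3, 4, 6], 'Age': [18, 92, 98], 'Weight': [68, 59, 49]})
cols = ['Age', 'Weight']

# One fresh scaler per column; data[[c]] keeps the 2-D shape sklearn expects
per_column = np.column_stack(
    [StandardScaler().fit_transform(data[[c]]).ravel() for c in cols]
)

# A single scaler fitted on both columns at once
joint = StandardScaler().fit_transform(data[cols])

print(np.allclose(per_column, joint))  # True: the statistics are per column
```

So data[cols] = StandardScaler().fit_transform(data[cols]) is an equivalent, simpler alternative to the per-column apply.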
