Scaling / Normalizing a pandas column

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/50027959/

Time: 2020-09-14 05:30:36  Source: igfitidea

Scaling / Normalizing pandas column

Tags: python, pandas, scikit-learn

Asked by machump

I have a dataframe like:

TOTAL | Name
3232     Jane
382      Hyman
8291     Jones

I'd like to create a new scaled column in the dataframe called SIZE, where SIZE is a number between 5 and 50.

For Example:

TOTAL | Name | SIZE
3232     Jane   24.413
382      Hyman   10
8291     Jones  50

I've tried

from sklearn.preprocessing import MinMaxScaler
import pandas as pd

scaler=MinMaxScaler(feature_range=(10,50))
df["SIZE"]=scaler.fit_transform(df["TOTAL"])

but got the error: "Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample."

I've tried other things, such as creating a list, transforming it, and appending it back to the dataframe, among other things.

What is the easiest way to do this?

Thanks!

Answered by cs95

Option 1
sklearn
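This problem comes up time and time again, and the error message really does indicate what you need to do. The scaler expects a 2D input (rows are samples, columns are features), but df["TOTAL"] is a 1D Series, so the input is missing a dimension. Change df["TOTAL"] to df[["TOTAL"]], which selects a one-column DataFrame.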

df['SIZE'] = scaler.fit_transform(df[["TOTAL"]])

df
   TOTAL   Name       SIZE
0   3232   Jane  24.413959
1    382  Hyman  10.000000
2   8291  Jones  50.000000
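
To see why the double brackets matter, it may help to compare the shapes of the two selections. The following is a minimal sketch that rebuilds the example dataframe from the question; the variable names one_d and two_d are only illustrative.

```python
import pandas as pd

# Rebuild the example dataframe from the question.
df = pd.DataFrame({"TOTAL": [3232, 382, 8291],
                   "Name": ["Jane", "Hyman", "Jones"]})

one_d = df["TOTAL"]    # single brackets: a 1D Series
two_d = df[["TOTAL"]]  # double brackets: a 2D, one-column DataFrame

print(type(one_d).__name__, one_d.shape)  # Series (3,)
print(type(two_d).__name__, two_d.shape)  # DataFrame (3, 1)
```

The (3, 1) shape is the "one feature, three samples" layout that the error message asks for.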


Option 2
pandas
Preferably, I would bypass sklearn and just do the min-max scaling myself.

a, b = 10, 50
x, y = df.TOTAL.min(), df.TOTAL.max()
df['SIZE'] = (df.TOTAL - x) / (y - x) * (b - a) + a

df
   TOTAL   Name       SIZE
0   3232   Jane  24.413959
1    382  Hyman  10.000000
2   8291  Jones  50.000000

This is essentially what the min-max scaler does, but without the overhead of importing scikit-learn (don't import it unless you have to; it's a heavy library).
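
If the manual formula is needed in more than one place, it could be wrapped in a small helper. The sketch below is one possible way to do that, not part of the original answer: the function name min_max_scale and its low/high parameters are hypothetical. The handling of a constant column, where max equals min and the formula would divide by zero, is an arbitrary choice made for illustration.

```python
import pandas as pd

def min_max_scale(s: pd.Series, low: float = 10.0, high: float = 50.0) -> pd.Series:
    """Linearly map the values of s onto the interval [low, high]."""
    lo, hi = s.min(), s.max()
    if hi == lo:
        # Constant column: (hi - lo) is zero, so the formula is undefined.
        # As an assumption for this sketch, map every value to `low`.
        return pd.Series(float(low), index=s.index, name=s.name)
    return (s - lo) / (hi - lo) * (high - low) + low

df = pd.DataFrame({"TOTAL": [3232, 382, 8291],
                   "Name": ["Jane", "Hyman", "Jones"]})
df["SIZE"] = min_max_scale(df["TOTAL"])
print(df)
```

With the example data this should reproduce the SIZE column shown above, with the smallest TOTAL mapped to 10 and the largest to 50.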

Answered by Yehia Elshater

In case you want to scale only one column in the dataframe, you have to reshape the column values as follows:

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
df['SIZE'] = scaler.fit_transform(df['TOTAL'].values.reshape(-1,1))
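
Note that MinMaxScaler() with no arguments uses the default feature_range of (0, 1), so the snippet above yields values between 0 and 1 rather than the 10 to 50 range from the question. A minimal sketch of the same reshape approach with the question's range is given below. It assumes the example dataframe from the question, and flattening the result with ravel() before assignment is a stylistic choice of this sketch.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({"TOTAL": [3232, 382, 8291],
                   "Name": ["Jane", "Hyman", "Jones"]})

# Request the target range explicitly; the default is (0, 1).
scaler = MinMaxScaler(feature_range=(10, 50))

# reshape(-1, 1) turns the 1D column into a 2D array with a single feature.
scaled = scaler.fit_transform(df["TOTAL"].to_numpy().reshape(-1, 1))

# fit_transform returns a 2D array of shape (3, 1); flatten it for the column.
df["SIZE"] = scaled.ravel()
print(df)
```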