Original URL: http://stackoverflow.com/questions/50027959/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Scaling / Normalizing pandas column
Asked by machump
I have a dataframe like:
TOTAL | Name
3232  | Jane
382   | Hyman
8291  | Jones
I'd like to create a newly scaled column in the dataframe called SIZE, where SIZE is a number between 5 and 50.
For example:
TOTAL | Name  | SIZE
3232  | Jane  | 24.413
382   | Hyman | 10
8291  | Jones | 50
I've tried
from sklearn.preprocessing import MinMaxScaler
import pandas as pd
scaler=MinMaxScaler(feature_range=(10,50))
df["SIZE"]=scaler.fit_transform(df["TOTAL"])
but got the error: "Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample."
I've tried other things, such as creating a list, transforming it, and appending it back to the dataframe, among other things.
What is the easiest way to do this?
Thanks!
Answered by cs95
Option 1: sklearn
This problem comes up time and time again, and the error message tells you what to do: the input is missing a (seemingly superfluous) dimension. MinMaxScaler expects a 2D array, but df["TOTAL"] is a 1D Series. Change df["TOTAL"] to df[["TOTAL"]], which selects a one-column DataFrame.
df['SIZE'] = scaler.fit_transform(df[["TOTAL"]])
df
TOTAL Name SIZE
0 3232 Jane 24.413959
1 382 Hyman 10.000000
2 8291 Jones 50.000000
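For reference, here is a self-contained sketch of Option 1 that can be run as-is. The dataframe is rebuilt from the question's sample data, and the .ravel() call is an addition (not part of the answer above) that flattens the 2D result before assignment.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Sample data rebuilt from the question.
df = pd.DataFrame({"TOTAL": [3232, 382, 8291], "Name": ["Jane", "Hyman", "Jones"]})

# Single brackets select a 1D Series; double brackets select a 2D one-column DataFrame.
shape_1d = df["TOTAL"].shape    # (3,)
shape_2d = df[["TOTAL"]].shape  # (3, 1)

scaler = MinMaxScaler(feature_range=(10, 50))
# fit_transform returns a 2D array of shape (3, 1); ravel() flattens it to 1D.
df["SIZE"] = scaler.fit_transform(df[["TOTAL"]]).ravel()
print(df)
```

The 2D requirement exists because scikit-learn treats rows as samples and columns as features, so a single column still has to be passed as a table with one column.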
Option 2: pandas
Preferably, I would bypass sklearn and just do the min-max scaling myself.
a, b = 10, 50
x, y = df.TOTAL.min(), df.TOTAL.max()
df['SIZE'] = (df.TOTAL - x) / (y - x) * (b - a) + a
df
TOTAL Name SIZE
0 3232 Jane 24.413959
1 382 Hyman 10.000000
2 8291 Jones 50.000000
This is essentially what the min-max scaler does, but without the overhead of importing scikit-learn. It's a heavy library, so don't import it unless you have to.
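If the same scaling is needed for several columns, the formula can be wrapped in a small helper. This is only a sketch: the function name min_max_scale and its handling of a constant column (returning the lower bound everywhere) are assumptions made for illustration, not part of the answer.

```python
import pandas as pd

def min_max_scale(series: pd.Series, low: float, high: float) -> pd.Series:
    """Linearly rescale a numeric Series so its min maps to low and its max to high."""
    smin, smax = series.min(), series.max()
    if smax == smin:
        # Assumed behaviour: a constant column would divide by zero,
        # so map every value to the lower bound instead.
        return pd.Series(float(low), index=series.index)
    return (series - smin) / (smax - smin) * (high - low) + low

# Sample data rebuilt from the question.
df = pd.DataFrame({"TOTAL": [3232, 382, 8291], "Name": ["Jane", "Hyman", "Jones"]})
df["SIZE"] = min_max_scale(df["TOTAL"], 10, 50)
print(df)
```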
Answered by Yehia Elshater
In case you want to scale only one column in the dataframe, you have to reshape the column values as follows:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
df['SIZE'] = scaler.fit_transform(df['TOTAL'].values.reshape(-1,1))
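Note that MinMaxScaler() with no arguments scales to its default feature_range of (0, 1). A minimal sketch of the reshape approach that also reproduces the 10 to 50 range from the question's example might look like this. The dataframe is rebuilt from the question's sample data, and the variable names column_2d and unit are illustrative only.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Sample data rebuilt from the question.
df = pd.DataFrame({"TOTAL": [3232, 382, 8291], "Name": ["Jane", "Hyman", "Jones"]})

# reshape(-1, 1) turns the 1D array of 3 values into a 2D array of shape (3, 1).
column_2d = df["TOTAL"].to_numpy().reshape(-1, 1)

# With no arguments, the scaler maps the column onto the default range (0, 1).
unit = MinMaxScaler().fit_transform(column_2d)

# Passing feature_range=(10, 50) maps it onto the range used in the question.
df["SIZE"] = MinMaxScaler(feature_range=(10, 50)).fit_transform(column_2d).ravel()
print(df)
```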