Python: Scaling numbers column by column with pandas
Original question: http://stackoverflow.com/questions/21764475/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by Rodolphe
I have a pandas data frame 'df' on which I'd like to perform some scaling, column by column.
- In column 'a', I need the maximum number to be 1, the minimum number to be 0, and all others to be spread accordingly.
- In column 'b', however, I need the minimum number to be 1, the maximum number to be 0, and all others to be spread accordingly.
Is there a pandas function to perform these two operations? If not, NumPy would certainly do.
    a    b
A  14  103
B  90  107
C  90  110
D  96  114
E  91  114
Accepted answer by Andy Hayden
You can subtract each column's minimum, then divide by each column's maximum (beware of 0/0, which occurs when a column is constant). Note that after subtracting the min, the new max is the original max minus the min.
In [11]: df
Out[11]:
    a    b
A  14  103
B  90  107
C  90  110
D  96  114
E  91  114

In [12]: df -= df.min()  # equivalent to df = df - df.min()

In [13]: df /= df.max()  # equivalent to df = df / df.max()

In [14]: df
Out[14]:
          a         b
A  0.000000  0.000000
B  0.926829  0.363636
C  0.926829  0.636364
D  1.000000  1.000000
E  0.939024  1.000000
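The 0/0 caveat bites when a column is constant: its range is zero, so the division produces NaN. Here is a minimal sketch of one way to guard against it. Mapping a constant column to 0 is my assumption, not part of the answer, and the helper name `minmax_safe` and the extra column `c` are hypothetical.

```python
import numpy as np
import pandas as pd


def minmax_safe(frame: pd.DataFrame) -> pd.DataFrame:
    """Min-max scale each column; constant columns become 0.0 (an assumption)."""
    shifted = frame - frame.min()
    span = frame.max() - frame.min()
    # Replace a zero range with NaN, so constant columns come out as NaN
    # and can then be filled with 0.0.
    return shifted.div(span.replace(0, np.nan)).fillna(0.0)


df = pd.DataFrame(
    {"a": [14, 90, 90, 96, 91], "b": [103, 107, 110, 114, 114], "c": [5, 5, 5, 5, 5]},
    index=list("ABCDE"),
)
scaled = minmax_safe(df)
```

Be aware that `fillna(0.0)` would also fill NaNs that come from missing values in the input, so this sketch only suits data without gaps.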
To switch the order of a column (from 1 to 0 rather than 0 to 1):
In [15]: df['b'] = 1 - df['b']
An alternative is to negate column b first (df['b'] = -df['b']), before scaling.
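As a sketch of the negate-first alternative: flipping the sign of column b before scaling turns its original minimum into the new maximum, so the same two scaling steps then map the original min to 1 and the original max to 0. The frame below just rebuilds the question's data.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [14, 90, 90, 96, 91], "b": [103, 107, 110, 114, 114]},
    index=list("ABCDE"),
)

df["b"] = -df["b"]   # the original minimum of b is now its maximum
df = df - df.min()   # shift each column so its minimum is 0
df = df / df.max()   # divide by the new maximum (original max - min)
```

Column a is scaled from 0 to 1 as before, while column b now runs from 1 (at the original minimum, 103) down to 0 (at the original maximum, 114).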
Answer by Falcon9
This is not very elegant, but the following works for this two-column case:
import pandas as pd

# Create the data frame
df = pd.DataFrame({'A': [14, 90, 90, 96, 91], 'B': [103, 107, 110, 114, 114]})
# apply() runs the lambda on each column (axis=0) or each row (axis=1);
# x is the whole column or row
# This line scales each column so that its minimum is 0 and its maximum is 1
df2 = df.apply(lambda x: (x.astype(float) - min(x)) / (max(x) - min(x)), axis=0)
# Now invert the order in column 'B':
# apply again to reverse the numbers, select column 'B' only, and
# assign it back to column 'B' of the scaled data frame
df2['B'] = df2.apply(lambda x: 1 - x, axis=1)['B']
If I find a more elegant way (for example, using the column index, (0 or 1) mod 2 - 1, to select the sign in the apply operation so that it can be done with a single apply call), I'll let you know.
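One possible way to get this down to a single apply call is sketched below. It assumes the columns to invert are listed by name rather than chosen by index parity; the `inverted` set and the `scale_column` helper are hypothetical names, not something from the answer.

```python
import pandas as pd

df = pd.DataFrame({"A": [14, 90, 90, 96, 91], "B": [103, 107, 110, 114, 114]})

inverted = {"B"}  # columns whose minimum should map to 1 instead of 0


def scale_column(col: pd.Series) -> pd.Series:
    """Min-max scale one column, flipping it if its name is in `inverted`."""
    scaled = (col - col.min()) / (col.max() - col.min())
    # apply() passes each column as a Series whose .name is the column label
    return 1 - scaled if col.name in inverted else scaled


df2 = df.apply(scale_column, axis=0)
```

Selecting by name keeps the rule explicit if columns are later reordered or added.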
Answer by Zelazny7
This is how you can do it using sklearn and its preprocessing module. scikit-learn has many preprocessing functions for scaling and centering data.
In [0]: from sklearn.preprocessing import MinMaxScaler

In [1]: df = pd.DataFrame({'A':[14,90,90,96,91],
                           'B':[103,107,110,114,114]}).astype(float)

In [2]: df
Out[2]:
    A    B
0  14  103
1  90  107
2  90  110
3  96  114
4  91  114

In [3]: scaler = MinMaxScaler()

In [4]: df_scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)

In [5]: df_scaled
Out[5]:
          A         B
0  0.000000  0.000000
1  0.926829  0.363636
2  0.926829  0.636364
3  1.000000  1.000000
4  0.939024  1.000000
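Note that `fit_transform` returns a plain NumPy array, so row labels are lost unless they are passed back in. A small sketch, assuming the A to E labels from the question should be kept:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame(
    {"a": [14, 90, 90, 96, 91], "b": [103, 107, 110, 114, 114]},
    index=list("ABCDE"),
).astype(float)

scaler = MinMaxScaler()
# Pass both index and columns so the result lines up with the input frame
df_scaled = pd.DataFrame(scaler.fit_transform(df), index=df.index, columns=df.columns)
```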
Answer by Alejandro Andrade
Given a data frame:
import numpy as np
import pandas as pd
df = pd.DataFrame({'A':[14,90,90,96,91], 'B':[103,107,110,114,114]})
Scale to mean 0 and variance 1:
df.apply(lambda x: (x - np.mean(x)) / np.std(x), axis=0)
Scale so that the maximum is 1 (the result spans the full range 0 to 1 only if the column minimum is 0; otherwise the minimum maps to min/max):
df.apply(lambda x: x / np.max(x), axis=0)
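Two details are worth making explicit here. `np.std` uses the population formula (ddof=0) by default, whereas the pandas `std` method defaults to the sample formula (ddof=1). And dividing by the maximum alone does not move the minimum to 0. A minimal sketch that spells out both choices, using vectorized operations in place of apply (my substitution, not the answer's code):

```python
import pandas as pd

df = pd.DataFrame({"A": [14, 90, 90, 96, 91], "B": [103, 107, 110, 114, 114]})

# Standardize: mean 0 and variance 1, using the population std (ddof=0)
standardized = (df - df.mean()) / df.std(ddof=0)

# Min-max scale: subtracting the minimum first guarantees the range [0, 1]
minmax = (df - df.min()) / (df.max() - df.min())
```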
Answer by Yehia Elshater
If you want to scale only one column of the data frame, you can do the following:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
df['Col1_scaled'] = scaler.fit_transform(df['Col1'].values.reshape(-1,1))
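A self-contained sketch of the single-column case follows. The frame contents and the `Col2` column are made up for illustration, and `ravel()` is my addition: it flattens the (n, 1) array returned by `fit_transform` back to one dimension before assignment.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical frame; only 'Col1' gets a scaled copy
df = pd.DataFrame({"Col1": [14, 90, 90, 96, 91], "Col2": [103, 107, 110, 114, 114]})

scaler = MinMaxScaler()
# The scaler expects 2-D input, hence the double brackets (a one-column frame)
df["Col1_scaled"] = scaler.fit_transform(df[["Col1"]]).ravel()
```

The other columns are left untouched, and the original `Col1` values remain available next to the scaled copy.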
Answer by Markus Dutschke
I think Acumenus' comment on this answer should be mentioned explicitly as an answer, as it is a one-liner.
>>> import pandas as pd
>>> from sklearn.preprocessing import minmax_scale
>>> df = pd.DataFrame({'A':[14,90,90,96,91], 'B':[103,107,110,114,114]})
>>> minmax_scale(df)
array([[0.        , 0.        ],
       [0.92682927, 0.36363636],
       [0.92682927, 0.63636364],
       [1.        , 1.        ],
       [0.93902439, 1.        ]])
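Since `minmax_scale` returns a NumPy array, a sketch of wrapping the result back into a data frame and flipping column B, as the question asks, might look like this:

```python
import pandas as pd
from sklearn.preprocessing import minmax_scale

df = pd.DataFrame({"A": [14, 90, 90, 96, 91], "B": [103, 107, 110, 114, 114]})

# Rebuild a frame with the original labels around the scaled array
scaled = pd.DataFrame(minmax_scale(df), index=df.index, columns=df.columns)
# The question wants B's minimum at 1 and its maximum at 0
scaled["B"] = 1 - scaled["B"]
```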

