Python: Scaling numbers column by column with pandas

Question

Asked by Rodolphe

I have a Pandas data frame 'df' in which I'd like to perform some scalings column by column.

In column 'a', I need the maximum number to be 1, the minimum number to be 0, and all others to be spread accordingly.
In column 'b', however, I need the minimum number to be 1, the maximum number to be 0, and all others to be spread accordingly.

Is there a Pandas function to perform these two operations? If not, numpy would certainly do.

    a    b
A   14   103
B   90   107
C   90   110
D   96   114
E   91   114

Answer 1

Accepted answer by Andy Hayden

You could subtract the min, then divide by the max (beware 0/0, which occurs when a column is constant). Note that after subtracting the min, the new max is the original max - min.

In [11]: df
Out[11]:
    a    b
A  14  103
B  90  107
C  90  110
D  96  114
E  91  114

In [12]: df -= df.min()  # equivalent to df = df - df.min()

In [13]: df /= df.max()  # equivalent to df = df / df.max()

In [14]: df
Out[14]:
          a         b
A  0.000000  0.000000
B  0.926829  0.363636
C  0.926829  0.636364
D  1.000000  1.000000
E  0.939024  1.000000

To switch the order of a column (from 1 to 0 rather than 0 to 1):

In [15]: df['b'] = 1 - df['b']

An alternative method is to negate column b first (df['b'] = -df['b']) and then scale as above.
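
The two steps above (min-max scaling, then flipping selected columns) can be wrapped in a small helper. This is only a sketch: the function name `min_max_scale` and its `invert` parameter are made up for illustration and are not part of pandas.

```python
import pandas as pd


def min_max_scale(df, invert=()):
    """Scale each column to [0, 1]; columns listed in `invert` run from 1 down to 0.

    `min_max_scale` and `invert` are hypothetical names used for this sketch.
    """
    col_min = df.min()
    col_range = df.max() - col_min
    # A constant column gives 0/0 here, which pandas evaluates to NaN.
    scaled = (df - col_min) / col_range
    for col in invert:
        scaled[col] = 1 - scaled[col]
    return scaled


df = pd.DataFrame(
    {"a": [14, 90, 90, 96, 91], "b": [103, 107, 110, 114, 114]},
    index=list("ABCDE"),
)
result = min_max_scale(df, invert=["b"])
print(result)
```

The original `df` is left untouched because the arithmetic returns a new frame, unlike the in-place `-=` and `/=` used in the session above.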

Answer 2

Answer by Falcon9

This is not very elegant, but the following works for this two-column case:

import pandas as pd

# Create the dataframe
df = pd.DataFrame({'A':[14,90,90,96,91], 'B':[103,107,110,114,114]})

# apply() passes each column (axis=0) or each row (axis=1) to the lambda,
# so x is a whole column here.
# This line scales each column so that its minimum is 0 and its maximum is 1
df2 = df.apply(lambda x: (x.astype(float) - min(x)) / (max(x) - min(x)), axis=0)

# Now invert the order of column 'B': apply 1 - x, select column 'B' only,
# and assign it back to column 'B' of the scaled dataframe
df2['B'] = df2.apply(lambda x: 1 - x, axis=1)['B']

If I find a more elegant way (for example, using the column index, (0 or 1) mod 2 - 1, to select the sign in the apply operation so it can be done with just one apply command), I'll let you know.
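
One possible way to get down to a single apply call, offered as a sketch rather than as the answerer's intended solution, is to decide the direction from the column name instead of the column index. The names `descending` and `scale_column` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"A": [14, 90, 90, 96, 91], "B": [103, 107, 110, 114, 114]})

# Hypothetical set of columns whose minimum should map to 1 instead of 0.
descending = {"B"}


def scale_column(col):
    """Min-max scale one column; flip it if its name is in `descending`."""
    scaled = (col - col.min()) / (col.max() - col.min())
    return 1 - scaled if col.name in descending else scaled


# With axis=0, apply() passes each column as a Series whose .name is the label.
df2 = df.apply(scale_column, axis=0)
print(df2)
```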

Answer 3

Answer by Zelazny7

This is how you can do it using sklearn and its preprocessing module. scikit-learn has many preprocessing functions for scaling and centering data.

In [0]: import pandas as pd
   ...: from sklearn.preprocessing import MinMaxScaler

In [1]: df = pd.DataFrame({'A':[14,90,90,96,91],
                           'B':[103,107,110,114,114]}).astype(float)

In [2]: df
Out[2]:
      A      B
0  14.0  103.0
1  90.0  107.0
2  90.0  110.0
3  96.0  114.0
4  91.0  114.0

In [3]: scaler = MinMaxScaler()

In [4]: df_scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)

In [5]: df_scaled
Out[5]:
          A         B
0  0.000000  0.000000
1  0.926829  0.363636
2  0.926829  0.636364
3  1.000000  1.000000
4  0.939024  1.000000

Answer 4

Answer by Alejandro Andrade

Given a data frame:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A':[14,90,90,96,91], 'B':[103,107,110,114,114]})

Scale to mean 0 and variance 1:

df.apply(lambda x: (x - np.mean(x)) / np.std(x), axis=0)
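
When standardizing, it can matter which standard deviation is used: NumPy's `np.std` defaults to the population form (`ddof=0`), while the pandas `.std()` method defaults to the sample form (`ddof=1`). The sketch below spells out both with explicit pandas calls; the variable names `z_pop` and `z_sample` are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [14, 90, 90, 96, 91], "B": [103, 107, 110, 114, 114]})

# Population standard deviation (ddof=0), which is NumPy's default.
z_pop = (df - df.mean()) / df.std(ddof=0)

# Sample standard deviation (ddof=1), which is the pandas default for .std().
z_sample = (df - df.mean()) / df.std()

print(z_pop)
print(z_sample)
```

With only five rows the two results differ noticeably, so it is worth choosing one convention deliberately.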

Scale to the range 0 to 1 (subtract the minimum first; dividing by the maximum alone would not map the minimum to 0):

df.apply(lambda x: (x - np.min(x)) / (np.max(x) - np.min(x)), axis=0)

Answer 5

Answer by Yehia Elshater

In case you want to scale only one column in the dataframe, you can do the following:

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
df['Col1_scaled'] = scaler.fit_transform(df['Col1'].values.reshape(-1,1))

Answer 6

Answer by Markus Dutschke

I think Acumenus' comment on this answer should be mentioned explicitly as an answer, as it is a one-liner.

>>> import pandas as pd
>>> from sklearn.preprocessing import minmax_scale
>>> df = pd.DataFrame({'A':[14,90,90,96,91], 'B':[103,107,110,114,114]})
>>> minmax_scale(df)
array([[0.        , 0.        ],
       [0.92682927, 0.36363636],
       [0.92682927, 0.63636364],
       [1.        , 1.        ],
       [0.93902439, 1.        ]])
