Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/31036148/

Date: 2020-09-13 23:31:33  Source: igfitidea

How to standardize/normalize a date with pandas/numpy?

python numpy pandas

Asked by user1587451

With the following code snippet:

import pandas as pd
data = pd.read_csv('train.csv', parse_dates=['dates'])
print(data['dates'])

I load and inspect the data.

My question is: how can I standardize/normalize data['dates'] so that all elements lie between -1 and 1 (linear or Gaussian)?

Answered by bakkal

import time

import pandas as pd
from sklearn.preprocessing import MinMaxScaler


def convert_to_timestamp(x):
    """Convert a pandas Timestamp to a UNIX timestamp (seconds, local time)"""
    # Timestamp.to_datetime() is no longer available in current pandas;
    # timetuple() can be called on the Timestamp directly
    return time.mktime(x.timetuple())


def normalize(df):
    """Normalize the 'dates' column to (-1, 1) using min/max"""
    scaler = MinMaxScaler(feature_range=(-1, 1))
    # MinMaxScaler expects 2D input, so select the column as a DataFrame
    dates_scaled = scaler.fit_transform(df[['dates']])

    # Flatten the (n, 1) result back to a 1D array
    return dates_scaled.ravel()


if __name__ == '__main__':
    # Create a sample series of dates
    df = pd.DataFrame({
        'dates':
            ['1980-01-01', '1980-02-02', '1980-03-02', '1980-01-21',
             '1981-01-21', '1991-02-21', '1991-03-23']
    })

    # Convert to date objects
    df['dates'] = pd.to_datetime(df['dates'])

    # Now that df holds date objects like yours, convert them to UNIX timestamps
    df['dates'] = df['dates'].apply(convert_to_timestamp)

    # Call the normalization function and show the result
    dates_scaled = normalize(df)
    print(dates_scaled)

Sample:

Date objects that we convert using convert_to_timestamp:

       dates
0 1980-01-01
1 1980-02-02
2 1980-03-02
3 1980-01-21
4 1981-01-21
5 1991-02-21
6 1991-03-23

UNIX timestamps that we can normalize using MinMaxScaler from sklearn:

       dates
0  315507600
1  318272400
2  320778000
3  317235600
4  348858000
5  667069200
6  669661200

Normalized to (-1, 1), the final result:

[-1.         -0.98438644 -0.97023664 -0.99024152 -0.81166138  0.98536228
  1.        ]
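One caveat with the approach above: time.mktime interprets each date in the machine's local time zone, so the raw UNIX timestamps can differ from one machine to another. The following is only a sketch of a time-zone-independent variant, under the assumption that the naive dates can be treated as UTC. The names epoch_seconds, restored and restored_dates are illustrative and not part of the original answer. It also shows how inverse_transform can map scaled values back to dates.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({
    'dates': pd.to_datetime(['1980-01-01', '1980-02-02', '1980-03-02', '1980-01-21',
                             '1981-01-21', '1991-02-21', '1991-03-23'])
})

# Seconds since the UNIX epoch, computed without consulting the local time zone
# (assumption: the naive timestamps are treated as UTC)
epoch_seconds = (df['dates'] - pd.Timestamp('1970-01-01')) // pd.Timedelta(seconds=1)

# MinMaxScaler expects a 2D array of shape (n_samples, n_features)
scaler = MinMaxScaler(feature_range=(-1, 1))
values = epoch_seconds.to_numpy(dtype='float64').reshape(-1, 1)
scaled = scaler.fit_transform(values).ravel()

# inverse_transform maps the scaled values back to epoch seconds;
# rounding guards against floating-point error before converting to dates
restored = scaler.inverse_transform(scaled.reshape(-1, 1)).ravel()
restored_dates = pd.to_datetime(restored.round().astype('int64'), unit='s')

print(scaled)
print(restored_dates)
```

Keeping the fitted scaler around is what makes the round trip possible, since it stores the minimum and range learned from the data.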

Answered by Jianxun Li

import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame(np.random.randint(1, 100, (1000, 2)).astype(np.float64), columns=['A', 'B'])

      A   B
0    87  95
1    15  12
2    85  88
3    33  61
4    33  29
5    33  91
6    67  19
7    68  20
8    79  18
9    29  93
..   ..  ..
990  70  84
991  37  24
992  91  12
993  92  13
994   4  64
995  32  98
996  97  62
997  38  40
998  12  56
999  48   8

[1000 rows x 2 columns]

# specify your desired range (-1, 1)
scaler = MinMaxScaler(feature_range=(-1, 1))
scaled = scaler.fit_transform(df.values)
print(scaled)

[[ 0.7551  0.9184]
 [-0.7143 -0.7755]
 [ 0.7143  0.7755]
 ..., 
 [-0.2449 -0.2041]
 [-0.7755  0.1224]
 [-0.0408 -0.8571]]

df[['A', 'B']] = scaled

Out[30]: 
          A       B
0    0.7551  0.9184
1   -0.7143 -0.7755
2    0.7143  0.7755
3   -0.3469  0.2245
4   -0.3469 -0.4286
5   -0.3469  0.8367
6    0.3469 -0.6327
7    0.3673 -0.6122
8    0.5918 -0.6531
9   -0.4286  0.8776
..      ...     ...
990  0.4082  0.6939
991 -0.2653 -0.5306
992  0.8367 -0.7755
993  0.8571 -0.7551
994 -0.9388  0.2857
995 -0.3673  0.9796
996  0.9592  0.2449
997 -0.2449 -0.2041
998 -0.7755  0.1224
999 -0.0408 -0.8571

[1000 rows x 2 columns]
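To make it clearer what the scaler does here, the sketch below compares MinMaxScaler against the same linear mapping written out by hand, column by column. The seed, the variable names (X, lo, hi, manual) and the use of numpy's default_rng are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
X = rng.integers(1, 100, size=(1000, 2)).astype(np.float64)

lo, hi = -1.0, 1.0  # desired output range
scaled = MinMaxScaler(feature_range=(lo, hi)).fit_transform(X)

# The same mapping by hand: shift each column to start at 0,
# divide by its range, then stretch and shift into [lo, hi]
col_min = X.min(axis=0)
col_max = X.max(axis=0)
manual = (X - col_min) / (col_max - col_min) * (hi - lo) + lo

# Compare the two results up to floating-point tolerance
print(np.allclose(scaled, manual))
```

Each column is scaled independently, so a column's own minimum maps to -1 and its own maximum maps to 1.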

Answered by steboc

A solution with pandas:

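import pandas as pd

df = pd.DataFrame({
    'A':
        ['1980-01-01', '1980-02-02', '1980-03-02', '1980-01-21',
         '1981-01-21', '1991-02-21', '1991-03-23']
})
# Convert the dates to integers (nanoseconds since the UNIX epoch)
df['A'] = pd.to_datetime(df['A']).astype('int64')
max_a = df.A.max()
min_a = df.A.min()
# Target range for the normalized values
min_norm = -1
max_norm = 1
# Linear min/max rescaling into [min_norm, max_norm]
df['NORMA'] = (df.A - min_a) * (max_norm - min_norm) / (max_a - min_a) + min_norm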
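The question also mentions a Gaussian option. A minimal pandas-only sketch of z-score standardization follows, assuming the population standard deviation (ddof=0) is wanted; the names days and z are illustrative. Unlike min/max scaling, z-scores have zero mean and unit variance but are not confined to [-1, 1].

```python
import pandas as pd

dates = pd.to_datetime(pd.Series(
    ['1980-01-01', '1980-02-02', '1980-03-02', '1980-01-21',
     '1981-01-21', '1991-02-21', '1991-03-23']
))

# Express each date as a float number of days since the earliest date;
# z-scores do not depend on the choice of origin or unit
days = (dates - dates.min()) / pd.Timedelta(days=1)

# Standardize: subtract the mean and divide by the standard deviation
# (ddof=0 is an assumption; pandas defaults to ddof=1)
z = (days - days.mean()) / days.std(ddof=0)

print(z)
```

If values strictly inside [-1, 1] are required, the min/max rescaling above is the one that guarantees it.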