如何使用 pandas/numpy 标准化/规范化日期?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/31036148/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-13 23:31:33 来源:igfitidea点击:
How to standardize/normalize a date with pandas/numpy?
提问by user1587451
With following code snippet
使用以下代码片段
import pandas as pd
train = pd.read_csv('train.csv',parse_dates=['dates'])
print(data['dates'])
I load and control the data.
我加载和控制数据。
My question is, how can I standardize/normalize data['dates'] to make all the elements lie between -1 and 1 (linear or gaussian)??
我的问题是,如何标准化/标准化 data['dates'] 以使所有元素都位于 -1 和 1(线性或高斯)之间??
回答by bakkal
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
import time
def convert_to_timestamp(x):
"""Convert date objects to integers"""
return time.mktime(x.to_datetime().timetuple())
def normalize(df):
"""Normalize the DF using min/max"""
scaler = MinMaxScaler(feature_range=(-1, 1))
dates_scaled = scaler.fit_transform(df['dates'])
return dates_scaled
if __name__ == '__main__':
# Create a random series of dates
df = pd.DataFrame({
'dates':
['1980-01-01', '1980-02-02', '1980-03-02', '1980-01-21',
'1981-01-21', '1991-02-21', '1991-03-23']
})
# Convert to date objects
df['dates'] = pd.to_datetime(df['dates'])
# Now df has date objects like you would, we convert to UNIX timestamps
df['dates'] = df['dates'].apply(convert_to_timestamp)
# Call normalization function
df = normalize(df)
Sample:
样本:
Date objects that we convert using convert_to_timestamp
我们转换使用的日期对象 convert_to_timestamp
dates
0 1980-01-01
1 1980-02-02
2 1980-03-02
3 1980-01-21
4 1981-01-21
5 1991-02-21
6 1991-03-23
UNIX timestamps that we can normalize using a MinMaxScalerfrom sklearn
我们可以使用MinMaxScalerfrom标准化的 UNIX 时间戳sklearn
dates
0 315507600
1 318272400
2 320778000
3 317235600
4 348858000
5 667069200
6 669661200
Normalized to (-1, 1), the final result
归一化为(-1, 1),最终结果
[-1. -0.98438644 -0.97023664 -0.99024152 -0.81166138 0.98536228
1. ]
回答by Jianxun Li
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
df = pd.DataFrame(np.random.randint(1, 100, (1000, 2)).astype(float64), columns=['A', 'B'])
A B
0 87 95
1 15 12
2 85 88
3 33 61
4 33 29
5 33 91
6 67 19
7 68 20
8 79 18
9 29 93
.. .. ..
990 70 84
991 37 24
992 91 12
993 92 13
994 4 64
995 32 98
996 97 62
997 38 40
998 12 56
999 48 8
[1000 rows x 2 columns]
# specify your desired range (-1, 1)
scaler = MinMaxScaler(feature_range=(-1, 1))
scaled = scaler.fit_transform(df.values)
print(scaled)
[[ 0.7551 0.9184]
[-0.7143 -0.7755]
[ 0.7143 0.7755]
...,
[-0.2449 -0.2041]
[-0.7755 0.1224]
[-0.0408 -0.8571]]
df[['A', 'B']] = scaled
Out[30]:
A B
0 0.7551 0.9184
1 -0.7143 -0.7755
2 0.7143 0.7755
3 -0.3469 0.2245
4 -0.3469 -0.4286
5 -0.3469 0.8367
6 0.3469 -0.6327
7 0.3673 -0.6122
8 0.5918 -0.6531
9 -0.4286 0.8776
.. ... ...
990 0.4082 0.6939
991 -0.2653 -0.5306
992 0.8367 -0.7755
993 0.8571 -0.7551
994 -0.9388 0.2857
995 -0.3673 0.9796
996 0.9592 0.2449
997 -0.2449 -0.2041
998 -0.7755 0.1224
999 -0.0408 -0.8571
[1000 rows x 2 columns]
回答by steboc
a solution with Pandas
Pandas 的解决方案
df = pd.DataFrame({
'A':
['1980-01-01', '1980-02-02', '1980-03-02', '1980-01-21',
'1981-01-21', '1991-02-21', '1991-03-23'] })
df['A'] = pd.to_datetime(df['A']).astype('int64')
max_a = df.A.max()
min_a = df.A.min()
min_norm = -1
max_norm =1
df['NORMA'] = (df.A- min_a) *(max_norm - min_norm) / (max_a-min_a) + min_norm

