pandas: How can I cleanly normalize data and then "unnormalize" it later?
Note: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors: Stack Overflow
Original source: http://stackoverflow.com/questions/43382716/
Asked by maxbfuer
I am using Anaconda with a Tensorflow neural network. Most of my data is stored with pandas.
I am attempting to predict cryptocurrency markets. I am aware that lots of people are probably doing this and that it is most likely not going to be very effective; I'm mostly doing it to familiarize myself with Tensorflow and the Anaconda tools.
I am fairly new to this, so if I am doing something wrong or suboptimally, please let me know.
Here is how I acquire and handle the data:
- Download datasets from quandl.com into pandas DataFrames
- Select the desired columns from each downloaded dataset
- Concatenate the DataFrames
- Drop all NaNs from the new, merged DataFrame
- Normalize each column (independently) to 0.0-1.0 in the new DataFrame using the code df = (df - df.min()) / (df.max() - df.min())
- Feed the normalized data into my neural network
- Unnormalize the data (this is the part that I haven't implemented)
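Steps 2 through 5 of the list above can be sketched roughly as follows. The two small frames, their column names, and their values are made up for illustration and stand in for the Quandl downloads; only the pandas calls themselves are taken as given.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for two downloaded datasets (all values are made up).
dates = pd.date_range("2017-01-01", periods=4, freq="D")
prices = pd.DataFrame(
    {"Close": [100.0, 110.0, np.nan, 130.0], "Open": [99.0, 101.0, 111.0, 121.0]},
    index=dates,
)
volume = pd.DataFrame({"Volume": [10.0, 30.0, 20.0, 40.0]}, index=dates)

# Select the desired columns, then concatenate side by side on the shared index.
merged = pd.concat([prices[["Close"]], volume[["Volume"]]], axis=1)

# Drop every row that contains a NaN.
merged = merged.dropna()

# Min-max normalize each column independently to the range 0.0-1.0.
normalized = (merged - merged.min()) / (merged.max() - merged.min())
```

One caveat with this formula: a column whose values are all identical has max equal to min, so the division yields NaN for that column.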
Now, my question is: how can I cleanly normalize and then unnormalize this data? I realize that if I want to unnormalize the data, I'm going to need to store the initial df.min() and df.max() values, but this looks ugly and feels cumbersome.
I am aware that I can normalize data with sklearn.preprocessing.MinMaxScaler, but as far as I know I can't unnormalize data using this.
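For reference, the manual approach described above (storing the initial minimum and maximum and inverting the formula) amounts to something like the sketch below. The frame, its column names, and the helper variable names are invented for illustration.

```python
import pandas as pd

# Hypothetical data; the column names and values are made up.
df = pd.DataFrame({"price": [100.0, 150.0, 200.0], "volume": [10.0, 40.0, 20.0]})

# Keep the per-column statistics so the scaling can be reversed later.
col_min = df.min()
col_max = df.max()
col_range = col_max - col_min

# Forward step: x_norm = (x - min) / (max - min)
normalized = (df - col_min) / col_range

# Inverse step: x = x_norm * (max - min) + min
restored = normalized * col_range + col_min
```

The round trip is exact only up to floating-point rounding, so comparisons against the original values should use a tolerance.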
It might be that I'm doing something fundamentally wrong here, but I'll be very surprised if there isn't a clean way to normalize and unnormalize data with Anaconda or other libraries.
Answered by tmrlvi
All the scalers in sklearn.preprocessing have an inverse_transform method designed just for that.
For example, to scale and un-scale your DataFrame with MinMaxScaler you could do:
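from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()  # scales each column to the range 0.0-1.0 by default
scaled = scaler.fit_transform(df)  # learns each column's min/max, then scales
unscaled = scaler.inverse_transform(scaled)  # maps scaled values back to the original units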
Just bear in mind that the transform function (and fit_transform as well) returns a numpy.ndarray, not a pandas.DataFrame.
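If the column labels and index are needed afterwards, one possible way to get a DataFrame back is to wrap the returned array yourself. This is a minimal sketch with a made-up frame; the column names, index labels, and values are assumptions for illustration.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data; the labels and values are made up.
df = pd.DataFrame(
    {"price": [100.0, 150.0, 200.0], "volume": [10.0, 40.0, 20.0]},
    index=["a", "b", "c"],
)

scaler = MinMaxScaler()

# fit_transform returns a plain array, so reattach the original labels.
scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns, index=df.index)

# inverse_transform also returns an array; wrap it the same way.
unscaled = pd.DataFrame(
    scaler.inverse_transform(scaled.to_numpy()), columns=df.columns, index=df.index
)
```

Reusing the same fitted scaler object is what makes the inverse possible, since it holds the per-column minimum and maximum learned during fitting.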