Python Pandas Series: Log Normalize
原文地址: http://stackoverflow.com/questions/37890849/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Pandas Series: Log Normalize
Asked by Benni
I have a Pandas Series that needs to be log-transformed to be normally distributed. But I can't log-transform it yet, because there are values equal to 0 and values below 1 (the range is 0-4000). Therefore I want to normalize the Series first. I have heard of StandardScaler (scikit-learn), Z-score standardization and Min-Max scaling (normalization). I want to cluster the data later; which would be the best method? StandardScaler and Z-score standardization use the mean, variance, etc. Can I use them on data that is not yet normally distributed?
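For reference, here is a minimal sketch of what the two scalings named in the question compute, written in plain pandas. The sample values are made up for illustration (only the 0-4000 range comes from the question). Both are simple shift-and-rescale operations, so they can be computed on data that is not normally distributed, but they do not change the shape of the distribution.

```python
import pandas as pd

# Hypothetical sample in the 0-4000 range mentioned in the question.
s = pd.Series([0.0, 0.5, 2.0, 100.0, 4000.0])

# Z-score standardization: subtract the mean, divide by the standard deviation.
# ddof=0 is the population standard deviation, which is what
# scikit-learn's StandardScaler uses.
z = (s - s.mean()) / s.std(ddof=0)

# Min-Max scaling: map the smallest value to 0 and the largest to 1.
minmax = (s - s.min()) / (s.max() - s.min())

print(pd.DataFrame({"raw": s, "zscore": z, "minmax": minmax}))
```

Note that Min-Max scaling still maps the smallest value to exactly 0, so on its own it does not make a plain log transform possible.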
Answered by mtadd
To transform to logarithms, you need positive values, so translate your range of values (-1,1] to a normalized range (0,1] as follows:
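import numpy as np
import pandas as pd
# Sample data: 10 random values between -1 and 1, stored in column 0
df = pd.DataFrame(np.random.uniform(-1,1,(10,1)))
# Shift and rescale so that every value is positive
df['norm'] = (1+df[0])/2 # (-1,1] -> (0,1]
# Natural logarithm of the rescaled values
df['lognorm'] = np.log(df['norm'])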
This results in a DataFrame like:
          0      norm   lognorm
0  0.360660  0.680330 -0.385177
1  0.973724  0.986862 -0.013225
2  0.329130  0.664565 -0.408622
3  0.604727  0.802364 -0.220193
4  0.416732  0.708366 -0.344795
5  0.085439  0.542719 -0.611163
6 -0.964246  0.017877 -4.024232
7  0.738281  0.869141 -0.140250
8  0.558220  0.779110 -0.249603
9  0.485144  0.742572 -0.297636
Answered by Has QUIT--Anony-Mousse
If your data is in the range (-1;+1) (assuming you lost the minus in your question), then a log transform is probably not what you need. At least from a theoretical point of view, it's obviously the wrong thing to do.
Maybe your data has already been preprocessed (inadequately)? Can you get the raw data? Why do you think log transform will help?
If you don't care about what is the meaningful thing to do, you can call log1p, which is the same as log(1+x) and which will thus work on (-1;∞).
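A minimal sketch of the log1p suggestion, assuming NumPy's np.log1p and its inverse np.expm1. The sample Series is hypothetical; it only mimics the 0-4000 range from the question, including an exact zero and a value below 1.

```python
import numpy as np
import pandas as pd

# Hypothetical data in the 0-4000 range described in the question.
s = pd.Series([0.0, 0.5, 1.0, 40.0, 4000.0])

# log1p(x) computes log(1 + x), so 0 maps to 0 instead of -inf.
log_s = np.log1p(s)

# expm1(y) computes exp(y) - 1, which undoes log1p.
restored = np.expm1(log_s)

print(pd.DataFrame({"raw": s, "log1p": log_s, "restored": restored}))
```

Because the transform can be inverted with expm1, cluster centers found on the transformed values can be mapped back to the original scale if needed.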