Python Pandas Series: Log Normalization

Question

Asked by Benni

I have a Pandas Series that needs to be log-transformed to be normally distributed. But I can't log-transform it yet, because there are values equal to 0 and values below 1 (the range is 0-4000). Therefore I want to normalize the Series first. I have heard of StandardScaler (scikit-learn), Z-score standardization and Min-Max scaling (normalization). I want to cluster the data later; which would be the best method? StandardScaler and Z-score standardization use the mean, variance, etc. Can I use them on data that is not yet normally distributed?

Answer 1

Answered by mtadd

To take logarithms you need positive values, so first shift and scale your range of values from (-1,1] to a normalized (0,1], as follows:

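import numpy as np
import pandas as pd

# 10 sample values drawn uniformly from [-1, 1)
df = pd.DataFrame(np.random.uniform(-1, 1, (10, 1)))
df['norm'] = (1 + df[0]) / 2         # shift and scale: (-1,1] -> (0,1]
df['lognorm'] = np.log(df['norm'])   # natural log of the normalized values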

This results in a DataFrame like:

          0      norm   lognorm
0  0.360660  0.680330 -0.385177
1  0.973724  0.986862 -0.013225
2  0.329130  0.664565 -0.408622
3  0.604727  0.802364 -0.220193
4  0.416732  0.708366 -0.344795
5  0.085439  0.542719 -0.611163
6 -0.964246  0.017877 -4.024232
7  0.738281  0.869141 -0.140250
8  0.558220  0.779110 -0.249603
9  0.485144  0.742572 -0.297636
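
The same shift-and-scale step can be written for any known bounds, not only (-1,1]. The sketch below is an illustration rather than part of the original answer: the helper name `shift_to_unit` and its `lo`/`hi` parameters are hypothetical, and it assumes every value is strictly greater than `lo`, since a value equal to `lo` would map to 0 and its logarithm would be -inf.

```python
import numpy as np
import pandas as pd

def shift_to_unit(s, lo, hi):
    """Map values from (lo, hi] onto (0, 1] linearly (hypothetical helper)."""
    return (s - lo) / (hi - lo)

s = pd.Series([-0.5, 0.0, 0.5, 1.0])
norm = shift_to_unit(s, lo=-1.0, hi=1.0)   # equals (1 + s) / 2 for these bounds
lognorm = np.log(norm)                     # finite because every norm value is > 0
```

With `lo=-1.0` and `hi=1.0` this reproduces the `(1+df[0])/2` expression used above.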

Answer 2

Answered by Has QUIT--Anony-Mousse

If your data is in the range (-1;+1) (assuming you lost the minus in your question), then a log transform is probably not what you need. At least from a theoretical point of view, it's obviously the wrong thing to do.

Maybe your data has already been preprocessed (inadequately)? Can you get the raw data? Why do you think log transform will help?

If you don't care about what the meaningful thing to do is, you can call log1p, which is the same as log(1+x) and which will thus work on (-1;∞).
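
As a minimal sketch of the log1p suggestion, the snippet below applies NumPy's `np.log1p` to a Pandas Series. The sample values are made up to resemble the 0-4000 range from the question (including an exact zero); they are not the asker's data. `np.expm1` is used only to show that the transform can be undone.

```python
import numpy as np
import pandas as pd

# Made-up values in the 0-4000 range from the question, including an exact zero.
s = pd.Series([0.0, 0.5, 1.0, 100.0, 4000.0])

logged = np.log1p(s)         # log(1 + x); defined for every x > -1, so 0 maps to 0
restored = np.expm1(logged)  # inverse transform, exp(y) - 1
```

Because log1p(0) is 0, zeros need no special handling here, unlike with a plain `np.log`.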
