Weighted correlation coefficient with pandas

Question

Asked by Yehuda Karlinsky

Is there any way to compute a weighted correlation coefficient with pandas? I saw that R has such a method. I'd also like to get the p-value of the correlation, which I did not find in R either. Wikipedia's explanation of weighted correlation: https://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient#Weighted_correlation_coefficient

Answer 1

Answered by root

I don't know of any Python packages that implement this, but it should be fairly straightforward to roll your own implementation. Using the naming conventions of the Wikipedia article:

import numpy as np

def m(x, w):
    """Weighted Mean"""
    return np.sum(x * w) / np.sum(w)

def cov(x, y, w):
    """Weighted Covariance"""
    return np.sum(w * (x - m(x, w)) * (y - m(y, w))) / np.sum(w)

def corr(x, y, w):
    """Weighted Correlation"""
    return cov(x, y, w) / np.sqrt(cov(x, x, w) * cov(y, y, w))

I tried to make the functions above match the formulas in the Wikipedia article as closely as possible, but there are some potential simplifications and performance improvements. For example, as pointed out by @Alberto Garcia-Raboso, m(x, w) is really just np.average(x, weights=w), so there's no need to actually write a function for it.
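
As a rough sketch of that simplification, the three helpers can be collapsed into a single function built on np.average. The name weighted_corr and the sample data are my own choices for illustration, not part of the answer. Because the 1 / sum(w) normalization cancels in the ratio, np.cov with aweights should give the same coefficient, which makes for a handy cross-check:

```python
import numpy as np

def weighted_corr(x, y, w):
    """Weighted Pearson correlation (hypothetical helper built on np.average)."""
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    dx = x - np.average(x, weights=w)
    dy = y - np.average(y, weights=w)
    # The 1 / sum(w) factors cancel between numerator and denominator.
    return np.sum(w * dx * dy) / np.sqrt(np.sum(w * dx * dx) * np.sum(w * dy * dy))

# Made-up sample data, only for demonstration.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 0.5 * x + rng.normal(size=1000)
w = rng.random(size=1000)

r = weighted_corr(x, y, w)

# Cross-check against NumPy's weighted covariance matrix.
c = np.cov(x, y, aweights=w)
r_np = c[0, 1] / np.sqrt(c[0, 0] * c[1, 1])
```

With all weights equal, this should reduce to the ordinary Pearson coefficient from np.corrcoef.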

The functions are pretty bare-bones, just doing the calculations. You may want to consider forcing inputs to be arrays prior to doing the calculations, i.e. x = np.asarray(x), as these functions will not work if lists are passed. Additional checks to verify all inputs have equal length, non-null values, etc. could also be implemented.
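
One way those checks might look is sketched below. check_inputs is a hypothetical helper name, and the rule that weights must be non-negative and not all zero is my own assumption rather than something stated above:

```python
import numpy as np

def check_inputs(x, y, w):
    """Hypothetical validation helper: returns x, y, w as 1-D float arrays."""
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    if not (x.ndim == y.ndim == w.ndim == 1):
        raise ValueError("x, y and w must be one-dimensional")
    if not (len(x) == len(y) == len(w)):
        raise ValueError("x, y and w must have the same length")
    if np.isnan(x).any() or np.isnan(y).any() or np.isnan(w).any():
        raise ValueError("inputs must not contain NaN")
    # Assumption: negative or all-zero weights are treated as invalid.
    if (w < 0).any() or w.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")
    return x, y, w

# Plain lists are accepted and come back as arrays.
x, y, w = check_inputs([1, 2, 3], [2, 4, 7], [0.2, 0.3, 0.5])
```

Calling this at the top of corr would let the arithmetic in m and cov stay as it is.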

Example usage:

import numpy as np
import pandas as pd

# Initialize a DataFrame.
np.random.seed([3,1415])
n = 10**6
df = pd.DataFrame({
    'x': np.random.choice(3, size=n),
    'y': np.random.choice(4, size=n),
    'w': np.random.random(size=n)
    })

# Compute the correlation.
r = corr(df['x'], df['y'], df['w'])

There's a discussion here regarding the p-value. It doesn't look like there's a generic calculation, and it depends on how you're actually getting the weights.

Answer 2

Answered by drevicko

The statsmodels package has an implementation of weighted correlation.
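
The answer doesn't say which part of statsmodels it means. My assumption is that it refers to DescrStatsW in statsmodels.stats.weightstats, whose corrcoef attribute returns the weighted correlation matrix. A minimal sketch with made-up data:

```python
import numpy as np
from statsmodels.stats.weightstats import DescrStatsW

# Made-up sample data, only for demonstration.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 0.5 * x + rng.normal(size=500)
w = rng.random(size=500)

# Stack the variables as columns; each weight applies to one row (observation).
stats = DescrStatsW(np.column_stack([x, y]), weights=w)

# corrcoef is the full weighted correlation matrix; [0, 1] is the x-y entry.
r = stats.corrcoef[0, 1]
```

With a DataFrame, passing df[['x', 'y']] as the data and df['w'] as the weights should work the same way.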
