pandas: how to compute correlation between categorical columns

Question

I have a set of columns (col1, col2, col3) in dataframe df1 and another set of columns (col4, col5, col6) in dataframe df2. Assume the two dataframes have the same number of rows.

How do I generate a correlation table that holds the pairwise correlations between the columns of df1 and the columns of df2?

The table would look like this:

      col1  col2  col3
col4   ..    ..    ..
col5   ..    ..    ..
col6   ..    ..    ..

I tried df1.corrwith(df2), but it does not seem to generate the table I need.

I asked a similar question here: "How to perform Correlation between two dataframes with different column names", but now I am dealing with categorical columns.

If the columns are not directly comparable, is there a standard way to make them comparable (for example, using get_dummies)? And is there a faster way to automatically process all fields (assume all are categorical) and calculate their correlations?

Answer 1

Answered by Ted Petrou

You are correct that pd.get_dummies is needed to get the correlation. Below, I create some fake data with two categorical columns and then use corrwith:

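import numpy as np
import pandas as pd

# Two fake frames, each with two categorical columns (random, so results vary per run)
df = pd.DataFrame({'col1':np.random.choice(list('abcde'),100),
                  'col2':np.random.choice(list('xyz'),100)}, dtype='category')
df1 = pd.DataFrame({'col1':np.random.choice(list('abcde'),100),
                   'col2':np.random.choice(list('xyz'),100)}, dtype='category')

# One-hot encode each frame; dtype=float keeps the indicator columns numeric
dfa = pd.get_dummies(df, dtype=float)
dfb = pd.get_dummies(df1, dtype=float)
# Correlate each indicator column with the same-named column in the other frame
dfa.corrwith(dfb)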

col1_a   -0.057735
col1_b    0.002513
col1_c    0.137956
col1_d   -0.095050
col1_e   -0.114022
col2_x    0.022568
col2_y   -0.081699
col2_z   -0.128350
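
Note that corrwith pairs columns by label, so the result above has one value per same-named indicator column and is not the cross table from the question. One possible way to get every df1 indicator against every df2 indicator is to concatenate the two one-hot frames, call corr once, and slice out the off-diagonal block. The sketch below is not part of the original answer; the frames, the seed, and the names `full` and `cross` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen only so the sketch is repeatable

# Hypothetical frames mirroring the question: same row count, different column names
df1 = pd.DataFrame({'col1': rng.choice(list('abc'), 100),
                    'col2': rng.choice(list('xy'), 100)})
df2 = pd.DataFrame({'col4': rng.choice(list('abc'), 100),
                    'col5': rng.choice(list('xy'), 100)})

# One-hot encode; dtype=float keeps the indicator columns numeric
dfa = pd.get_dummies(df1, dtype=float)
dfb = pd.get_dummies(df2, dtype=float)

# Correlate every indicator with every other, then keep only the df2-by-df1 block
full = pd.concat([dfa, dfb], axis=1).corr()
cross = full.loc[dfb.columns, dfa.columns]
print(cross.round(3))
```

Each cell of `cross` is the Pearson correlation between two 0/1 indicator columns, so it describes the association between individual category levels rather than a single number per pair of original columns.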
