pandas.DataFrame corrwith() method

Question

Asked by Nikita Sivukhin

I recently started working with pandas. Can anyone explain the difference in the behaviour of .corrwith() when it is given a Series versus a DataFrame?

Suppose I have one DataFrame:

import pandas as pd
frame = pd.DataFrame(data={'a': [1, 2, 3], 'b': [-1, -2, -3], 'c': [10, -10, 10]})

I want to calculate the correlation between feature 'a' and all the other features. I can do it in the following way:

frame.drop(labels='a', axis=1).corrwith(frame['a'])

And the result will be:

b   -1.0
c    0.0

But this very similar code:

frame.drop(labels='a', axis=1).corrwith(frame[['a']])

produces a completely different and unacceptable result:

a   NaN
b   NaN
c   NaN

So, my question is: why do we get such strange output when the second argument is a DataFrame?

Answer 1

Answered by piRSquared

What I think you're looking for:

Let's say your frame is:

frame = pd.DataFrame(np.random.rand(10, 6), columns=['cost', 'amount', 'day', 'month', 'is_sale', 'hour'])

You want the 'cost' and 'amount' columns to be correlated with all other columns in every combination.

focus_cols = ['cost', 'amount']
frame.corr().filter(focus_cols).drop(focus_cols)
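
As a rough, self-contained sketch of the two lines above (the seeded random generator and the names `others` and `alt` are illustrative assumptions, not part of the original answer), the same table can also be obtained by label-based selection on the full correlation matrix:

```python
import numpy as np
import pandas as pd

# Seeded generator so the illustration is reproducible (an assumption for this sketch).
rng = np.random.default_rng(0)
cols = ['cost', 'amount', 'day', 'month', 'is_sale', 'hour']
frame = pd.DataFrame(rng.random((10, 6)), columns=cols)

focus_cols = ['cost', 'amount']

# Full correlation matrix, keep only the focus columns,
# then drop the focus rows so they are not correlated with themselves.
result = frame.corr().filter(focus_cols).drop(focus_cols)

# Equivalent selection with .loc: rows are the other columns, columns are the focus columns.
others = [c for c in cols if c not in focus_cols]
alt = frame.corr().loc[others, focus_cols]

print(result)
```

The result has one row per remaining column and one column per focus column.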

Answering what you asked:

Compute pairwise correlation between rows or columns of two DataFrame objects.
Parameters:
other : DataFrame
axis : {0 or 'index', 1 or 'columns'}, default 0
    0 or 'index' to compute column-wise, 1 or 'columns' for row-wise
drop : boolean, default False
    Drop missing indices from result, default returns union of all
Returns:
correls : Series

corrwith behaves similarly to add, sub, mul, and div in that it accepts either a DataFrame or a Series passed as other, despite the documentation saying just DataFrame.

When other is a Series, it broadcasts that series and matches along the axis specified by axis (the default is 0). This is why the following worked:

frame.drop(labels='a', axis=1).corrwith(frame.a)

b   -1.0
c    0.0
dtype: float64

When other is a DataFrame, it matches along the axis specified by axis and correlates each pair identified by the other axis. If we did:

frame.drop('a', axis=1).corrwith(frame.drop('b', axis=1))

a    NaN
b    NaN
c    1.0
dtype: float64

Only c was in common, and only c had its correlation calculated.
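
A small sketch of this label matching, using the question's frame (the variable names `left`, `right`, `with_union`, and `only_common` are illustrative assumptions). It also shows the drop parameter from the quoted documentation, which should leave out the labels that have no partner instead of reporting them as NaN:

```python
import pandas as pd

frame = pd.DataFrame({'a': [1, 2, 3], 'b': [-1, -2, -3], 'c': [10, -10, 10]})

left = frame.drop('a', axis=1)    # columns: b, c
right = frame.drop('b', axis=1)   # columns: a, c

# Default drop=False: the result covers the union of column labels,
# and labels present in only one frame come back as NaN.
with_union = left.corrwith(right)

# drop=True: only labels present in both frames are kept.
only_common = left.corrwith(right, drop=True)

print(with_union)
print(only_common)
```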

In the case you specified:

frame.drop(labels='a', axis=1).corrwith(frame[['a']])

frame[['a']] is a DataFrame because of the [['a']], so it now plays by the DataFrame rules, in which its columns must match up with the columns it is being correlated with. But you explicitly dropped a from the first frame and then correlated it with a DataFrame containing nothing but a. The result is NaN for every column.
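
If a single-column DataFrame is all you have, one possible workaround (a sketch, not something the original answer proposes) is to turn it into a Series first, for example with squeeze, so that the Series rules apply. The names `features`, `as_frame`, and `as_series` are illustrative:

```python
import pandas as pd

frame = pd.DataFrame({'a': [1, 2, 3], 'b': [-1, -2, -3], 'c': [10, -10, 10]})
features = frame.drop(labels='a', axis=1)

# DataFrame rules: no column label is shared, so every entry is NaN.
as_frame = features.corrwith(frame[['a']])

# Series rules: squeeze(axis=1) turns the one-column DataFrame into a Series,
# which is then correlated with every column of `features`.
as_series = features.corrwith(frame[['a']].squeeze(axis=1))

print(as_frame)
print(as_series)
```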

Answer 2

Answered by MaxU

corrwith is defined as DataFrame.corrwith(other, axis=0, drop=False), so axis=0 is the default, i.e. "Compute pairwise correlation between columns of two **DataFrame** objects".

So the column names / labels must be the same in both DFs:

In [134]: frame.drop(labels='a', axis=1).corrwith(frame[['a']].rename(columns={'a':'b'}))
Out[134]:
b   -1.0
c    NaN
dtype: float64

NaN means (in this case) that there is nothing to compare or correlate with, because there is no column named c in the other DataFrame.

If you pass a Series as other, it will be translated (per the link you posted in a comment) into:

In [142]: frame.drop(labels='a', axis=1).apply(frame.a.corr)
Out[142]:
b   -1.0
c    0.0
dtype: float64
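
The equivalence can be checked with a short sketch on the question's frame (the names `features`, `via_corrwith`, and `via_apply` are illustrative assumptions; this compares results only and makes no claim about pandas internals beyond what the answer states):

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({'a': [1, 2, 3], 'b': [-1, -2, -3], 'c': [10, -10, 10]})
features = frame.drop(labels='a', axis=1)

# Passing a Series to corrwith ...
via_corrwith = features.corrwith(frame['a'])

# ... should match correlating that Series with each column in turn.
via_apply = features.apply(frame['a'].corr)

print(np.allclose(via_corrwith, via_apply))
```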

Answer 3

Answered by Zahoor Ahmad

A simple output

Answer 4

Answered by Zahoor Ahmad

Sorry, a bit late. There is no corrwith for a Series, while pandas DataFrames can only be correlated with each other on columns that have the same name, for example:

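import numpy as np
import pandas as pd
x = np.array([2, 4, 6, 8.2]).reshape(-1, 1)
y = np.array([2.3, 3.11, .5, 7, 10, 11, 12]).reshape(-1, 1)
a = pd.DataFrame(x, columns=['aa'])
b = pd.DataFrame(y, columns=['aa'])
a.corrwith(b)  # both frames share the column name 'aa'; rows are matched on the index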
