pandas.DataFrame corrwith() method

Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/38422001/

Date: 2020-09-14 01:36:07  Source: igfitidea

Tags: python, pandas, dataframe

Asked by Nikita Sivukhin

I recently started working with pandas. Can anyone explain the difference in behaviour of the .corrwith() function when it is given a Series versus a DataFrame?

Suppose I have one DataFrame:

import pandas as pd

frame = pd.DataFrame(data={'a': [1, 2, 3], 'b': [-1, -2, -3], 'c': [10, -10, 10]})

I want to calculate the correlation between feature 'a' and all other features. I can do it in the following way:

frame.drop(labels='a', axis=1).corrwith(frame['a'])

And the result will be:

b   -1.0
c    0.0

But this very similar code:

frame.drop(labels='a', axis=1).corrwith(frame[['a']])

generates a completely different and unacceptable table:

a   NaN
b   NaN
c   NaN

So, my question is: why do we get such strange output when a DataFrame is passed as the second argument?

Answered by piRSquared

What I think you're looking for:

Let's say your frame is:

frame = pd.DataFrame(np.random.rand(10, 6), columns=['cost', 'amount', 'day', 'month', 'is_sale', 'hour'])

You want the 'cost' and 'amount' columns to be correlated with all other columns in every combination.

focus_cols = ['cost', 'amount']
frame.corr().filter(focus_cols).drop(focus_cols)
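
Since the image of the result did not survive, here is a minimal runnable sketch of the same idea. The seeded generator and the variable names `rng`, `table` and `cost_via_corrwith` are assumptions added for illustration; the random values will differ from whatever the original screenshot showed.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a seeded generator so the run is repeatable.
rng = np.random.default_rng(0)
cols = ['cost', 'amount', 'day', 'month', 'is_sale', 'hour']
frame = pd.DataFrame(rng.random((10, 6)), columns=cols)

focus_cols = ['cost', 'amount']

# Full correlation matrix, keep only the focus columns,
# then drop the focus rows so each focus column is paired with the others.
table = frame.corr().filter(focus_cols).drop(focus_cols)

# The 'cost' column of that table can also be obtained with corrwith and a Series.
cost_via_corrwith = frame.drop(columns=focus_cols).corrwith(frame['cost'])

print(table)
```

The result has one row per remaining column ('day', 'month', 'is_sale', 'hour') and one column per focus column.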

Answering what you asked:

Compute pairwise correlation between rows or columns of two DataFrame objects.

Parameters:

other : DataFrame

axis : {0 or 'index', 1 or 'columns'}, default 0
    0 or 'index' to compute column-wise, 1 or 'columns' for row-wise

drop : boolean, default False
    Drop missing indices from result, default returns union of all

Returns:

correls : Series

corrwith behaves similarly to add, sub, mul and div in that it expects either a DataFrame or a Series to be passed as other, despite the documentation saying just DataFrame.

When other is a Series, corrwith broadcasts that Series and matches it along the axis specified by axis (the default is 0). This is why the following worked:

frame.drop(labels='a', axis=1).corrwith(frame.a)

b   -1.0
c    0.0
dtype: float64

When other is a DataFrame, corrwith matches along the axis specified by axis and correlates each pair identified by the other axis. If we did:

frame.drop('a', axis=1).corrwith(frame.drop('b', axis=1))

a    NaN
b    NaN
c    1.0
dtype: float64

Only c was in common, so only c had its correlation calculated.
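
The drop parameter from the docstring quoted earlier controls whether those unmatched labels appear at all. A small sketch using the question's frame; the names `left`, `right`, `with_nan` and `common_only` are just illustrative:

```python
import pandas as pd

frame = pd.DataFrame({'a': [1, 2, 3], 'b': [-1, -2, -3], 'c': [10, -10, 10]})

left = frame.drop(columns='a')    # columns: b, c
right = frame.drop(columns='b')   # columns: a, c

# Default drop=False: labels found in only one frame come back as NaN.
with_nan = left.corrwith(right)

# drop=True: only labels present in both frames are returned.
common_only = left.corrwith(right, drop=True)

print(with_nan)
print(common_only)
```

With drop=True the result should contain only c, since that is the only column label the two frames share.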

In the case you specified:

frame.drop(labels='a', axis=1).corrwith(frame[['a']])

frame[['a']] is a DataFrame because of the [['a']], so it now plays by the DataFrame rules, under which its columns must match up with the columns it is being correlated with. But you explicitly dropped a from the first frame and then correlated it with a DataFrame containing nothing but a. No column labels match, so the result is NaN for every column.
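
To see the two cases side by side, here is a short sketch on the question's frame. The variable names `others`, `as_series` and `as_frame` are made up for the example:

```python
import pandas as pd

frame = pd.DataFrame({'a': [1, 2, 3], 'b': [-1, -2, -3], 'c': [10, -10, 10]})
others = frame.drop(columns='a')

# frame['a'] is a Series: it is correlated against every column of `others`.
as_series = others.corrwith(frame['a'])

# frame[['a']] is a one-column DataFrame: column labels are aligned first,
# and 'a' matches neither 'b' nor 'c'.
as_frame = others.corrwith(frame[['a']])

print(type(frame['a']).__name__, type(frame[['a']]).__name__)
print(as_series)
print(as_frame)
```

So the only difference between the two calls is single versus double brackets, which decides whether other is a Series or a DataFrame.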

Answered by MaxU

corrwith is defined as DataFrame.corrwith(other, axis=0, drop=False), so axis=0 by default, i.e. it computes pairwise correlation between columns of two DataFrame objects.

So the column names / labels must be the same in both DFs:

In [134]: frame.drop(labels='a', axis=1).corrwith(frame[['a']].rename(columns={'a':'b'}))
Out[134]:
b   -1.0
c    NaN
dtype: float64

NaN means (in this case) that there is nothing to compare or correlate with, because there is no column named c in the other DataFrame.

If you pass a Series as other, the call will be translated (according to the link you posted in a comment) into:

In [142]: frame.drop(labels='a', axis=1).apply(frame.a.corr)
Out[142]:
b   -1.0
c    0.0
dtype: float64
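
A quick way to check both points from this answer is the sketch below, again on the question's frame. It only compares results; it does not claim to reproduce the exact pandas internals of any particular version, and the variable names are illustrative:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({'a': [1, 2, 3], 'b': [-1, -2, -3], 'c': [10, -10, 10]})
others = frame.drop(columns='a')

# Passing a Series gives the same numbers as applying Series.corr column by column.
via_corrwith = others.corrwith(frame['a'])
via_apply = others.apply(frame['a'].corr)

# Passing a DataFrame only pairs columns with the same label,
# so renaming 'a' to 'b' makes exactly one pair line up.
renamed = others.corrwith(frame[['a']].rename(columns={'a': 'b'}))

print(via_corrwith)
print(via_apply)
print(renamed)
```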

Answered by Zahoor Ahmad

Sorry, a bit late. There is no way to use corrwith with a Series, while pandas DataFrames can only be analyzed when they have the same columns, like:

x = np.array([2, 4, 6, 8.2]).reshape(-1, 1)

y = np.array([2.3, 3.11, .5, 7, 10, 11, 12]).reshape(-1, 1)

a = pd.DataFrame(x, columns=['aa'])
b = pd.DataFrame(y, columns=['aa'])

a.corrwith(b)