Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original address and author information, and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/35095249/
Run a basic correlation between two columns of a dataframe
Asked by Tiberius
I am trying to produce a correlation matrix from a pandas dataframe using data from specified columns.
Here is my csv data:
col0,col1,col2,col3,col4
122468.9071,1417464.203,3546600,151804924,10839476
14691.1139,170036.0407,103847,19208604,2365065
Here are the two dataframes I created:
import pandas as pd
df1 = pd.read_csv('c:/temp/test_1.csv', usecols=[0])
df2 = pd.read_csv('c:/temp/test_1.csv', usecols=[1])
I tried the corr and corrwith functions and got the following errors:
Corr Function:
print df1.corr(df2)
Result:
Error: Could not compare ['pearson'] with block values
Corrwith:
print df1.corrwith(df2)
Result:
col0 NaN
col1 NaN
dtype: float64
As you can see, there are no null values in the data set, and float64 should be able to handle decimals.
Any help with a solution would be greatly appreciated.
Tiberius
Answered by Josh Baker
If you are trying to create a correlation matrix between the two columns, I would suggest bringing them into the same dataframe, like so:
df = pd.read_csv('c:/temp/test_1.csv', usecols=[0,1])
df.corr()
I loaded your data into a csv myself and got a 2x2 correlation matrix of all 1s, which is expected, since with only two rows any pair of columns lies exactly on a straight line.
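For anyone who wants to reproduce this without the file on disk, here is a minimal sketch of the same approach. It assumes the two rows from the question are inlined through io.StringIO instead of being read from c:/temp/test_1.csv; the variable names csv_text and corr_matrix are only illustrative.

```python
import io

import pandas as pd

# The sample rows from the question, inlined so no file path is needed.
csv_text = """col0,col1,col2,col3,col4
122468.9071,1417464.203,3546600,151804924,10839476
14691.1139,170036.0407,103847,19208604,2365065
"""

# Read both columns of interest into a single dataframe.
df = pd.read_csv(io.StringIO(csv_text), usecols=[0, 1])

# DataFrame.corr() computes pairwise correlations (Pearson by default)
# between the columns of that one dataframe.
corr_matrix = df.corr()
print(corr_matrix)
```

The result is a 2x2 dataframe indexed by col0 and col1 on both axes, so a single coefficient can be read with corr_matrix.loc['col0', 'col1'].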
You can find the pandas documentation on correlation here: http://pandas.pydata.org/pandas-docs/stable/computation.html#correlation
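If only a single coefficient is needed rather than a matrix, Series.corr may be a simpler fit. The sketch below also hints at why the attempts in the question likely misbehaved: the first argument of DataFrame.corr is the method name (such as 'pearson'), not another dataframe, and DataFrame.corrwith pairs columns by label, so a frame holding only col0 finds no partner in a frame holding only col1. The data is inlined as an assumption, and the names r and mismatched are illustrative.

```python
import io

import pandas as pd

# The sample rows from the question, inlined so no file path is needed.
csv_text = """col0,col1,col2,col3,col4
122468.9071,1417464.203,3546600,151804924,10839476
14691.1139,170036.0407,103847,19208604,2365065
"""

df = pd.read_csv(io.StringIO(csv_text))

# Series.corr compares two columns directly and returns a single number.
r = df["col0"].corr(df["col1"])
print(r)

# corrwith aligns on column labels: col0 and col1 never line up,
# so every entry of the result is NaN, as in the question.
mismatched = df[["col0"]].corrwith(df[["col1"]])
print(mismatched)
```

Selecting columns with df["col0"] yields a Series, while df[["col0"]] yields a one-column dataframe; that difference decides which corr variant is being called.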