pandas DataFrame correlation produces NaN although its values are all integers

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow. Original URL: http://stackoverflow.com/questions/22655667/

Time: 2020-09-13 21:51:21  Source: igfitidea

DataFrame correlation produces NaN although its values are all integers

Tags: python, pandas, nan, correlation, series

Asked by user2366975

I have a dataframe df:

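import pandas as pd

# loggerfile, datetimeIdx and Columns come from the asker's script (not shown)
df = pd.read_csv(loggerfile, header=2)  # read_csv already returns a DataFrame

values = df.to_numpy()  # replaces as_matrix(), which was removed in pandas 1.0

df2 = pd.DataFrame.from_records(values, index=datetimeIdx, columns=Columns)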

EDIT:

Now reading the data this way as suggested:

df2 = pd.read_csv(loggerfile, header = None, skiprows = [0,1,2])

Sample:

                         0              1       2   3   4   5   6   7   8   \
0  2014-03-19T12:44:32.695Z  1395233072695  703425   0   2   1  13   5  21   
1  2014-03-19T12:44:32.727Z  1395233072727  703425   0   2   1  13   5  21   

   9   10  11   12  13   14  15  16  
0  25   0  25  209   0  145   0   0  
1  25   0  25  209   0  146   0   0

The columns are all type int (except the first one):

print(df2.dtypes)

0     object
1      int64
2      int64
3      int64
4      int64
5      int64
6      int64
7      int64
8      int64
9      int64
10     int64
11     int64
12     int64
13     int64
14     int64
15     int64
16     int64

But in my correlation, some columns seem to be NaN.

df2.corr()

          1   2         3         4   5   6         7         8  ...
1  1.000000 NaN  0.018752 -0.550307 NaN NaN  0.075191  0.775725
2       NaN NaN       NaN       NaN NaN NaN       NaN       NaN
3  0.018752 NaN  1.000000 -0.067293 NaN NaN -0.579651  0.004593
...

Answered by Karl D.

"Those columns do not change in value right now, yes"

As Joris points out, you would expect NaN if the values do not vary. To see why, take a look at the correlation formula:

cor(i,j) = cov(i,j)/[stdev(i)*stdev(j)]

If the values of the ith or jth variable do not vary, the corresponding standard deviation is zero, so the denominator of the fraction is zero. The covariance in the numerator is zero as well, so the ratio is 0/0 and the correlation is NaN.
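
To make this concrete, here is a minimal sketch that reproduces the effect on toy data and applies the formula by hand. The frame and its column names (a, b, c) are made up for illustration and are not the asker's logger data. Filtering with nunique() is one possible way to leave constant columns out before correlating, not the only one.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: column "b" never changes, like column 2 in the question.
df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [7, 7, 7, 7],
    "c": [2, 1, 4, 3],
})

corr = df.corr()  # every entry in row/column "b" comes out as NaN

# Apply cor(i,j) = cov(i,j) / (stdev(i) * stdev(j)) by hand for "a" and "b".
cov_ab = df["a"].cov(df["b"])          # zero: "b" contributes no variation
denom = df["a"].std() * df["b"].std()  # zero: stdev of "b" is zero
with np.errstate(invalid="ignore"):    # silence the 0/0 RuntimeWarning
    manual = np.float64(cov_ab) / np.float64(denom)  # 0/0 gives nan

# Keep only columns that take more than one distinct value, then correlate.
varying = df.loc[:, df.nunique() > 1]
clean_corr = varying.corr()

print(corr)
print(manual)
print(clean_corr)
```

If the constant columns start to vary once more data is logged, they will no longer be filtered out and will get ordinary correlation values.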
