pandas: getting ValueError: The indices for endog and exog are not aligned

Question

Asked by Sanoj

I am getting the above error while running a for loop that builds multiple models. The first two models, which have similar data sets, build fine, but the error appears while building the third. It is thrown when I call sm.Logit() from Python's statsmodels package:

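import pandas as pd
import statsmodels.api as sm

# convert_objects() is deprecated and no longer exists in current pandas; pd.to_numeric(errors='coerce') replaces it
y = pd.to_numeric(y_mort, errors='coerce')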

#Building Logistic model_LSVC
print("Shape of y:", y.shape, " &&Shape of X_selected_lsvc:", X.shape)
print("y values:",y.head())
logit = sm.Logit(y,X,missing='drop')

The error that appears:

Shape of y: (9018,)  &&Shape of X_selected_lsvc: (9018, 59)
y values: 0    0
1    1
2    0
3    0
4    0
Name: mort, dtype: int64
ValueError                                Traceback (most recent call last)
<ipython-input-8-fec746e2ee99> in <module>()
    160     print("Shape of y:", y.shape, " &&Shape of X_selected_lsvc:", X.shape)
    161     print("y values:",y.head())
--> 162     logit = sm.Logit(y,X,missing='drop')
    163     # fit the model
    164     est = logit.fit(method='cg')

D:\Anaconda3\lib\site-packages\statsmodels\discrete\discrete_model.py in __init__(self, endog, exog, **kwargs)
    399 
    400     def __init__(self, endog, exog, **kwargs):
--> 401         super(BinaryModel, self).__init__(endog, exog, **kwargs)
    402         if (self.__class__.__name__ != 'MNLogit' and
    403                 not np.all((self.endog >= 0) & (self.endog <= 1))):

D:\Anaconda3\lib\site-packages\statsmodels\discrete\discrete_model.py in __init__(self, endog, exog, **kwargs)
    152     """
    153     def __init__(self, endog, exog, **kwargs):
--> 154         super(DiscreteModel, self).__init__(endog, exog, **kwargs)
    155         self.raise_on_perfect_prediction = True
    156 

D:\Anaconda3\lib\site-packages\statsmodels\base\model.py in __init__(self, endog, exog, **kwargs)
    184 
    185     def __init__(self, endog, exog=None, **kwargs):
--> 186         super(LikelihoodModel, self).__init__(endog, exog, **kwargs)
    187         self.initialize()
    188 

D:\Anaconda3\lib\site-packages\statsmodels\base\model.py in __init__(self, endog, exog, **kwargs)
     58         hasconst = kwargs.pop('hasconst', None)
     59         self.data = self._handle_data(endog, exog, missing, hasconst,
---> 60                                       **kwargs)
     61         self.k_constant = self.data.k_constant
     62         self.exog = self.data.exog

D:\Anaconda3\lib\site-packages\statsmodels\base\model.py in _handle_data(self, endog, exog, missing, hasconst, **kwargs)
     82 
     83     def _handle_data(self, endog, exog, missing, hasconst, **kwargs):
---> 84         data = handle_data(endog, exog, missing, hasconst, **kwargs)
     85         # kwargs arrays could have changed, easier to just attach here
     86         for key in kwargs:

D:\Anaconda3\lib\site-packages\statsmodels\base\data.py in handle_data(endog, exog, missing, hasconst, **kwargs)
    564     klass = handle_data_class_factory(endog, exog)
    565     return klass(endog, exog=exog, missing=missing, hasconst=hasconst,
--> 566                  **kwargs)

D:\Anaconda3\lib\site-packages\statsmodels\base\data.py in __init__(self, endog, exog, missing, hasconst, **kwargs)
     74         # this has side-effects, attaches k_constant and const_idx
     75         self._handle_constant(hasconst)
---> 76         self._check_integrity()
     77         self._cache = resettable_cache()
     78 

D:\Anaconda3\lib\site-packages\statsmodels\base\data.py in _check_integrity(self)
    450                 (hasattr(endog, 'index') and hasattr(exog, 'index')) and
    451                 not self.orig_endog.index.equals(self.orig_exog.index)):
--> 452             raise ValueError("The indices for endog and exog are not aligned")
    453         super(PandasData, self)._check_integrity()
    454 

ValueError: The indices for endog and exog are not aligned

y and X have shapes (9018,) and (9018, 59), so the number of rows in the dependent and independent variables matches. Any ideas?
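Matching shapes do not rule this error out. The check that fails in the traceback is `self.orig_endog.index.equals(self.orig_exog.index)`, which compares the pandas index labels (and their order), not the row counts. A likely cause, assumed here rather than confirmed by the asker, is that y and X end up with different index labels inside the loop (for example, one of them was filtered or had its index reset while the other was not). The minimal sketch below reproduces that situation with made-up data; `align_xy` is a hypothetical helper showing two ways to give both objects the same index before calling sm.Logit.

```python
import pandas as pd


def align_xy(y, X, by="label"):
    """Hypothetical helper: give y and X identical indexes before sm.Logit(y, X).

    by="label":    keep only rows whose index label appears in both objects
                   (correct when the labels identify the same observation).
    by="position": discard both indexes and pair rows by order
                   (correct only when the rows are already in the same order).
    """
    if by == "label":
        common = y.index.intersection(X.index)
        return y.loc[common], X.loc[common]
    if by == "position":
        if len(y) != len(X):
            raise ValueError(f"row counts differ: {len(y)} vs {len(X)}")
        return y.reset_index(drop=True), X.reset_index(drop=True)
    raise ValueError(f"unknown mode: {by!r}")


# Same shapes, different index labels: X kept the labels of a filtered frame,
# while y was rebuilt with a fresh 0..n-1 index (made-up example data).
X = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [0.5, 0.1, 0.9]}, index=[3, 5, 7])
y = pd.Series([0, 1, 0], name="mort")

print(y.shape, X.shape)         # (3,) (3, 2)
print(y.index.equals(X.index))  # False: the comparison that raises in the traceback

y_lab, X_lab = align_xy(y, X, by="label")
print(len(y_lab))               # 0: no shared labels, so label alignment is not an option here

y_pos, X_pos = align_xy(y, X, by="position")
print(y_pos.index.equals(X_pos.index))  # True
```

Which mode is right depends on where the labels diverged: if they still identify the same observations, align by label; if one side was simply re-indexed without reordering, positional alignment is enough.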

Answer 1

Answered by yper

Try converting y into a list before the sm.Logit() line:

y = list(y)
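A plain list carries no pandas index, so after this conversion y and X are paired purely by row position. That is only safe if the rows of y and X are already in the same order. The sketch below uses made-up data in which X was sorted but y was not (an assumed scenario, not taken from the question) to show how positional pairing can silently attach the wrong outcome to a row.

```python
import pandas as pd

# Made-up data: X was sorted inside the loop, y was not.
y = pd.Series([0, 1, 0, 1], name="mort")
X = pd.DataFrame({"age": [30, 50, 40, 60]}).sort_values("age")

print(y.index.equals(X.index))  # False: same labels, different order

# Label-based pairing keeps each outcome with its own row of X.
by_label = X.join(y)

# Positional pairing, which is what passing list(y) amounts to.
by_position = X.assign(mort=list(y))

print(by_label.loc[2, "mort"], by_position.loc[2, "mort"])  # 0 1
```

The row labelled 2 has outcome 0, but positional pairing gives it 1, so checking the row order before dropping the index is worthwhile.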

Answer 2

Answered by Ashish Anand

The error message indicates that endog and exog have different shapes. This is a common error in Python, and it can be solved by applying the reshape function to the dependent variable so that its shape lines up with that of the independent variables:

y_train.values.reshape(-1,1)

The line above fixes the number of columns at 1 and leaves the number of rows unspecified (-1) for NumPy to infer, so we get a single column with one row per element of y_train, i.e. as many rows as X.

Let's take an example:

import numpy as np

z = np.array([[1, 2], [3, 4]])
print(z.shape)    # (2, 2)

Now apply reshape(-1, 1) to this array. The new array has 4 rows and 1 column:

new_z = z.reshape(-1, 1)
print(new_z)        # [[1] [2] [3] [4]], printed one row per line
print(new_z.shape)  # (4, 1)
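Note that `y_train.values.reshape(-1, 1)` also returns a plain NumPy array rather than a pandas object, so the result no longer carries the pandas index and is paired with X by position. A short check with a hypothetical y_train that has a non-default index:

```python
import numpy as np
import pandas as pd

# Hypothetical y_train with a non-default index, e.g. left over from a train/test split.
y_train = pd.Series([0, 1, 0, 1], index=[10, 11, 12, 13], name="mort")

col = y_train.values.reshape(-1, 1)

print(isinstance(col, np.ndarray), col.shape)  # True (4, 1)
print(hasattr(col, "index"))                   # False: the pandas index is gone
```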

Answer 3

Answered by Andreas Hsieh

Have you checked whether your data contains NaN? You can use np.isnan(X) and np.isnan(y). I see you turned on the option missing='drop', so I suspect that if your data contains NaN, dropping those rows will change the shape of your input.
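One way to count the rows that missing='drop' would remove is sketched below. It swaps in pandas' `isna` for `np.isnan`, because `np.isnan` raises a TypeError on object-dtype columns while `isna` handles them. `rows_with_nan` is a hypothetical helper and the data are made up; it compares by position, so it still works when y and X have different indexes.

```python
import numpy as np
import pandas as pd


def rows_with_nan(y, X):
    """Hypothetical helper: True for each row where y or any column of X is NaN.

    Uses .to_numpy() so the comparison is positional and does not depend
    on y and X sharing the same index.
    """
    y_nan = y.isna().to_numpy()
    X_nan = X.isna().any(axis=1).to_numpy()
    return y_nan | X_nan


# Made-up example data.
y = pd.Series([0, 1, np.nan, 0], name="mort")
X = pd.DataFrame({"a": [1.0, np.nan, 2.0, 3.0], "b": [0.1, 0.2, 0.3, 0.4]})

mask = rows_with_nan(y, X)
print(mask.tolist())    # [False, True, True, False]
print(int(mask.sum()))  # 2 rows contain a NaN in y or X
```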
