Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/44184834/

Time: 2020-08-19 23:48:13  Source: igfitidea

Value error: Input arrays should have the same number of samples as target arrays. Found 1600 input samples and 6400 target samples

Tags: python, arrays, numpy, keras

Asked by shiva

I'm trying to do an 8-class classification. Here is the code:

import keras
import numpy as np
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dropout, Flatten, Dense
from keras import applications
from keras.optimizers import SGD
from keras import backend as K
K.set_image_dim_ordering('tf')
img_width, img_height = 48,48
top_model_weights_path = 'modelom.h5'
train_data_dir = 'chCdata1/train'
validation_data_dir = 'chCdata1/validation'
nb_train_samples = 6400
nb_validation_samples = 1600
epochs = 50
batch_size = 10
def save_bottlebeck_features():
   datagen = ImageDataGenerator(rescale=1. / 255)
   model = applications.VGG16(include_top=False, weights='imagenet', input_shape=(48,48,3))
   generator = datagen.flow_from_directory(
               train_data_dir,
               target_size=(img_width, img_height),
               batch_size=batch_size,
               class_mode='categorical',
               shuffle=False)
   bottleneck_features_train = model.predict_generator(
               generator, nb_train_samples // batch_size)
   np.save(open('bottleneck_features_train', 'wb'),bottleneck_features_train)

   generator = datagen.flow_from_directory(
               validation_data_dir,
               target_size=(img_width, img_height),
               batch_size=batch_size,
               class_mode='categorical',
               shuffle=False)
   bottleneck_features_validation = model.predict_generator(
               generator, nb_validation_samples // batch_size)
   np.save(open('bottleneck_features_validation', 'wb'),bottleneck_features_validation)

def train_top_model():
   train_data = np.load(open('bottleneck_features_train', 'rb'))
   train_labels = np.array([0] * (nb_train_samples // 8) + [1] * (nb_train_samples // 8) + [2] * (nb_train_samples // 8) + [3] * (nb_train_samples // 8) + [4] * (nb_train_samples // 8) + [5] * (nb_train_samples // 8) + [6] * (nb_train_samples // 8) + [7] * (nb_train_samples // 8))
   validation_data = np.load(open('bottleneck_features_validation', 'rb'))
   validation_labels = np.array([0] * (nb_train_samples // 8) + [1] * (nb_train_samples // 8) + [2] * (nb_train_samples // 8) + [3] * (nb_train_samples // 8) + [4] * (nb_train_samples // 8) + [5] * (nb_train_samples // 8) + [6] * (nb_train_samples // 8) + [7] * (nb_train_samples // 8))
   train_labels = keras.utils.to_categorical(train_labels, num_classes = 8)
   validation_labels = keras.utils.to_categorical(validation_labels, num_classes = 8)
   model = Sequential()
   model.add(Flatten(input_shape=train_data.shape[1:]))
   model.add(Dense(512, activation='relu'))
   model.add(Dropout(0.5))
   model.add(Dense(8, activation='softmax'))
   sgd = SGD(lr=1e-2, decay=0.00371, momentum=0.9, nesterov=False)
   model.compile(optimizer=sgd,
         loss='categorical_crossentropy', metrics=['accuracy'])
   model.fit(train_data, train_labels,
          epochs=epochs,
          batch_size=batch_size,
   validation_data=(validation_data, validation_labels))
   model.save_weights(top_model_weights_path)

save_bottlebeck_features()
train_top_model()

I've added the full error traceback here:

Traceback (most recent call last):

  File "<ipython-input-14-1d34826b5dd5>", line 1, in <module>
    runfile('C:/Users/rajaramans2/codes/untitled15.py', wdir='C:/Users/rajaramans2/codes')

  File "C:\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile
    execfile(filename, namespace)

  File "C:\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/rajaramans2/codes/untitled15.py", line 71, in <module>
    train_top_model()

  File "C:/Users/rajaramans2/codes/untitled15.py", line 67, in train_top_model
    validation_data=(validation_data, validation_labels))

  File "C:\Anaconda3\lib\site-packages\keras\models.py", line 856, in fit
    initial_epoch=initial_epoch)

  File "C:\Anaconda3\lib\site-packages\keras\engine\training.py", line 1449, in fit
    batch_size=batch_size)

  File "C:\Anaconda3\lib\site-packages\keras\engine\training.py", line 1317, in _standardize_user_data
    _check_array_lengths(x, y, sample_weights)

  File "C:\Anaconda3\lib\site-packages\keras\engine\training.py", line 235, in _check_array_lengths
    'and ' + str(list(set_y)[0]) + ' target samples.')

ValueError: Input arrays should have the same number of samples as target arrays. Found 1600 input samples and 6400 target samples.

The error "ValueError: Input arrays should have the same number of samples as target arrays. Found 1600 input samples and 6400 target samples" pops up. Please help with a solution and the necessary modifications to the code. Thanks in advance.

Answered by Jayant Sahewal

It looks like the number of examples in X_train (i.e. train_data) doesn't match the number of examples in y_train (i.e. train_labels). Can you double-check it? Also, in the future, please attach the full error, since it helps in debugging the issue.
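
One way to double-check this before calling fit is to compare the number of samples in each input/target pair. The sketch below is illustrative only: `check_same_length` is a hypothetical helper, not part of Keras, and the two lists are stand-ins for the arrays in the question (only their lengths matter). It relies on `len()` alone, so it should work for plain lists and NumPy arrays alike.

```python
def check_same_length(inputs, targets, name="data"):
    """Raise a ValueError early if inputs and targets differ in sample count."""
    n_inputs, n_targets = len(inputs), len(targets)
    if n_inputs != n_targets:
        raise ValueError(
            f"{name}: found {n_inputs} input samples and {n_targets} target samples"
        )
    return n_inputs


# Stand-ins for the arrays in the question (assumed lengths, dummy contents).
validation_data = [None] * 1600
validation_labels = [0] * 6400

try:
    check_same_length(validation_data, validation_labels, name="validation")
except ValueError as err:
    print(err)
```

Running the same check on both the training pair and the validation pair would show which one is mismatched; the 1600 vs 6400 in the traceback is consistent with the validation pair.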

Answered by Daniel Möller

Looks like you have 1600 examples for training. And your 8 classes are not separated in samples, so you have an array with 8 x 1600 = 6400 values.

That array must have a shape such as (1600, 8), that is, 1600 samples with 8 possible classes.

Now you need to know how your train_labels array is organized. Maybe a simple reshape((1600,8)) is enough, if the array is properly ordered.

If not, you have to organize it yourself into 1600 samples of eight labels each.
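
To illustrate the (samples, classes) shape described in this answer, here is a minimal pure-Python sketch of one-hot encoding. `one_hot` is a hypothetical stand-in for what `keras.utils.to_categorical` does in the question's code, written without Keras so it runs on its own; the 200-per-class split is an assumed balanced example.

```python
def one_hot(labels, num_classes):
    """Return one row per sample, with a 1 in the column of that sample's class."""
    return [[1 if c == label else 0 for c in range(num_classes)] for label in labels]


# 1600 integer labels, 200 per class (assumption made for this example).
labels = [c for c in range(8) for _ in range(200)]
encoded = one_hot(labels, num_classes=8)

print(len(encoded), len(encoded[0]))  # number of rows, number of columns
```

The number of rows in the encoded labels is what has to match the number of input samples.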

Answered by Parvatharajan

It is not about len(X_train) != len(y_train).

Split the data into equal sizes for training and testing (validation). Make sure that the input data size is even. If it is not, try trimming the data by omitting the last observation in the input data.

train_test_split(X,y, test_size = 0.5, random_state=42)

This is working for me.

Answered by kon psych

The problem in this case is on this line:

validation_labels = np.array([0] * (nb_train_samples // 8) + [1] * (nb_train_samples // 8) + [2] * (nb_train_samples // 8) + [3] * (nb_train_samples // 8) + [4] * (nb_train_samples // 8) + [5] * (nb_train_samples // 8) + [6] * (nb_train_samples // 8) + [7] * (nb_train_samples // 8))

There is certainly a better way of writing this, since every occurrence of nb_train_samples on that line should now be replaced with nb_validation_samples.
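
One possible way to avoid repeating the expression is a small helper that takes the sample count as a parameter. This is a sketch, not the answerer's code: `make_labels` is a hypothetical name, and it assumes, as the question's code does, that every class has the same number of samples and that the samples are ordered by class (the generators use shuffle=False).

```python
def make_labels(n_samples, n_classes=8):
    """Build integer labels [0,...,0, 1,...,1, ...] with n_samples // n_classes per class."""
    per_class, remainder = divmod(n_samples, n_classes)
    if remainder:
        raise ValueError(
            f"{n_samples} samples cannot be split evenly into {n_classes} classes"
        )
    labels = []
    for class_index in range(n_classes):
        labels += [class_index] * per_class
    return labels


nb_train_samples = 6400
nb_validation_samples = 1600

train_labels = make_labels(nb_train_samples)
validation_labels = make_labels(nb_validation_samples)  # not nb_train_samples

print(len(train_labels), len(validation_labels))
```

The resulting lists can be wrapped in np.array(...) and passed to keras.utils.to_categorical as in the question.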

Answered by 0x539

I know you already have an answer, but for other travelers: make sure the number of training samples is divisible by your batch_size.
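
The reason this matters in a setup like the question's: the number of generator steps is computed with floor division (for example nb_train_samples // batch_size), so any remainder is dropped and the saved features end up with fewer rows than the label array. A minimal arithmetic sketch follows; the helper name `rows_from_floor_steps` and the batch size of 30 are made up for illustration.

```python
def rows_from_floor_steps(n_samples, batch_size):
    """Rows produced when running n_samples // batch_size full batches."""
    steps = n_samples // batch_size
    return steps * batch_size


for n_samples, batch_size in [(6400, 10), (1600, 10), (1600, 30)]:
    rows = rows_from_floor_steps(n_samples, batch_size)
    print(n_samples, batch_size, rows, "dropped:", n_samples - rows)
```

With the question's own values (6400 and 1600 samples, batch size 10) nothing is dropped, so this particular pitfall only shows up with other sample counts or batch sizes.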