Python 用于回归的 tensorflow 深度神经网络总是在一批中预测相同的结果

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/38399609/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-19 20:44:52  来源:igfitidea点击:

tensorflow deep neural network for regression always predict same results in one batch

pythonneural-networkregressiontensorflow

提问by Sufeng Niu

I use a tensorflow to implement a simple multi-layer perceptron for regression. The code is modified from standard mnist classifier, that I only changed the output cost to MSE (use tf.reduce_mean(tf.square(pred-y))), and some input, output size settings. However, if I train the network using regression, after several epochs, the output batch are totally the same. for example:

我使用张量流来实现一个简单的多层感知器进行回归。代码是从标准 mnist 分类器修改而来的,我只将输出成本更改为 MSE(使用tf.reduce_mean(tf.square(pred-y))),以及一些输入、输出大小设置。但是,如果我使用回归训练网络,经过几个时期后,输出批次完全相同。例如:

target: 48.129, estimated: 42.634
target: 46.590, estimated: 42.634
target: 34.209, estimated: 42.634
target: 69.677, estimated: 42.634
......

I have tried different batch size, different initialization, input normalization using sklearn.preprocessing.scale (my inputs range are quite different). However, none of them worked. I have also tried one of sklearn example from Tensorflow (Deep Neural Network Regression with Boston Data). But I got another error in line 40:

我使用 sklearn.preprocessing.scale 尝试了不同的批量大小、不同的初始化、输入规范化(我的输入范围非常不同)。然而,他们都没有工作。我还尝试了 Tensorflow 中的 sklearn 示例之一(使用波士顿数据进行深度神经网络回归)。但是我在第 40 行遇到了另一个错误:

'module' object has no attribute 'infer_real_valued_columns_from_input'

“模块”对象没有属性“infer_real_valued_columns_from_input”

Anyone has clues on where the problem is? Thank you

任何人都知道问题出在哪里?谢谢

My code is listed below, may be a little bit long, but very straghtforward:

下面列出了我的代码,可能有点长,但非常简单:

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf
from tensorflow.contrib import learn
import matplotlib.pyplot as plt

from sklearn.pipeline import Pipeline
from sklearn import datasets, linear_model
from sklearn import cross_validation
import numpy as np

boston = learn.datasets.load_dataset('boston')
x, y = boston.data, boston.target
X_train, X_test, Y_train, Y_test = cross_validation.train_test_split(
x, y, test_size=0.2, random_state=42)

total_len = X_train.shape[0]

# Parameters
learning_rate = 0.001
training_epochs = 500
batch_size = 10
display_step = 1
dropout_rate = 0.9
# Network Parameters
n_hidden_1 = 32 # 1st layer number of features
n_hidden_2 = 200 # 2nd layer number of features
n_hidden_3 = 200
n_hidden_4 = 256
n_input = X_train.shape[1]
n_classes = 1

# tf Graph input
x = tf.placeholder("float", [None, 13])
y = tf.placeholder("float", [None])

# Create model
def multilayer_perceptron(x, weights, biases):
    # Hidden layer with RELU activation
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.relu(layer_1)

    # Hidden layer with RELU activation
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    layer_2 = tf.nn.relu(layer_2)

    # Hidden layer with RELU activation
    layer_3 = tf.add(tf.matmul(layer_2, weights['h3']), biases['b3'])
    layer_3 = tf.nn.relu(layer_3)

    # Hidden layer with RELU activation
    layer_4 = tf.add(tf.matmul(layer_3, weights['h4']), biases['b4'])
    layer_4 = tf.nn.relu(layer_4)

    # Output layer with linear activation
    out_layer = tf.matmul(layer_4, weights['out']) + biases['out']
    return out_layer

# Store layers weight & bias
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1], 0, 0.1)),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2], 0, 0.1)),
    'h3': tf.Variable(tf.random_normal([n_hidden_2, n_hidden_3], 0, 0.1)),
    'h4': tf.Variable(tf.random_normal([n_hidden_3, n_hidden_4], 0, 0.1)),
    'out': tf.Variable(tf.random_normal([n_hidden_4, n_classes], 0, 0.1))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1], 0, 0.1)),
    'b2': tf.Variable(tf.random_normal([n_hidden_2], 0, 0.1)),
    'b3': tf.Variable(tf.random_normal([n_hidden_3], 0, 0.1)),
    'b4': tf.Variable(tf.random_normal([n_hidden_4], 0, 0.1)),
    'out': tf.Variable(tf.random_normal([n_classes], 0, 0.1))
}

# Construct model
pred = multilayer_perceptron(x, weights, biases)

# Define loss and optimizer
cost = tf.reduce_mean(tf.square(pred-y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# Launch the graph
with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())

    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(total_len/batch_size)
        # Loop over all batches
        for i in range(total_batch-1):
            batch_x = X_train[i*batch_size:(i+1)*batch_size]
            batch_y = Y_train[i*batch_size:(i+1)*batch_size]
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c, p = sess.run([optimizer, cost, pred], feed_dict={x: batch_x,
                                                          y: batch_y})
            # Compute average loss
            avg_cost += c / total_batch

        # sample prediction
        label_value = batch_y
        estimate = p
        err = label_value-estimate
        print ("num batch:", total_batch)

        # Display logs per epoch step
        if epoch % display_step == 0:
            print ("Epoch:", '%04d' % (epoch+1), "cost=", \
                "{:.9f}".format(avg_cost))
            print ("[*]----------------------------")
            for i in xrange(3):
                print ("label value:", label_value[i], \
                    "estimated value:", estimate[i])
            print ("[*]============================")

    print ("Optimization Finished!")

    # Test model
    correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print ("Accuracy:", accuracy.eval({x: X_test, y: Y_test}))

采纳答案by CNugteren

Short answer:

简短的回答

Transpose your predvector using tf.transpose(pred).

移调您pred使用向量tf.transpose(pred)

Longer answer:

更长的答案

The problem is that pred(the predictions) and y(the labels) are not of the same shape: one is a row vector and the other a column vector. Apparently when you apply an element-wise operation on them, you'll get a matrix, which is not what you want.

问题在于pred(预测)和y(标签)的形状不同:一个是行向量,另一个是列向量。显然,当您对它们应用逐元素操作时,您会得到一个矩阵,这不是您想要的。

The solution is to transpose the prediction vector using tf.transpose()to get a proper vector and thus a proper loss function. Actually, if you set the batch size to 1 in your example you'll see that it works even without the fix, because transposing a 1x1 vector is a no-op.

解决方案是使用转置预测向量tf.transpose()以获得合适的向量,从而获得合适的损失函数。实际上,如果您在示例中将批量大小设置为 1,您会发现即使没有修复它也能工作,因为转置 1x1 向量是无操作的。

I applied this fix to your example code and observed the following behaviour. Before the fix:

我将此修复程序应用于您的示例代码并观察到以下行为。修复前:

Epoch: 0245 cost= 84.743440580
[*]----------------------------
label value: 23 estimated value: [ 27.47437096]
label value: 50 estimated value: [ 24.71126747]
label value: 22 estimated value: [ 23.87785912]

And after the fix at the same point in time:

在同一时间点修复后:

Epoch: 0245 cost= 4.181439120
[*]----------------------------
label value: 23 estimated value: [ 21.64333534]
label value: 50 estimated value: [ 48.76105118]
label value: 22 estimated value: [ 24.27996063]

You'll see that the cost is much lower and that it actually learned the value 50 properly. You'll have to do some fine-tuning on the learning rate and such to improve your results of course.

您会看到成本要低得多,而且它实际上正确地学习了值 50。当然,您必须对学习率进行一些微调,以改善您的结果。

回答by ahaque

There is likely a problem with your dataset loading or indexing implementation. If you only modified the cost to MSE, make sure predand yare correctly being updated and you did not overwrite them with a different graph operation.

您的数据集加载或索引实现可能存在问题。如果你只修改了成本MSE,确保predy正确地更新和你没有用不同的图形操作覆盖它们。

Another thing to help debug would be to predict the actual regression outputs. It would also help if you posted more of your code so we can see your specific data loading implementation, etc.

帮助调试的另一件事是预测实际的回归输出。如果您发布更多代码,以便我们可以查看您的特定数据加载实现等,这也会有所帮助。