Getting a simple Neural Network to work from scratch in C++

Disclaimer: this page mirrors a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you reuse this content, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/2019056/

Getting a simple Neural Network to work from scratch in C++

c++ neural-network

Asked by Matthew FL

I have been trying to get a simple double XOR neural network to work, and I am having problems getting backpropagation to train a really simple feed-forward neural network.
I have mostly been trying to follow this guide in getting a neural network going, but at best I have made programs that learn at an extremely slow rate.

As I understand neural networks:

  1. Values are computed by taking the result of a sigmoid function of the sum of all inputs to that neuron. This is then fed to the next layer using the weight for each neuron.
  2. At the end of a run, the error is computed for the output neurons; then, using the weights, the error is propagated backwards by simply multiplying the values and then summing at each neuron.
  3. When all of the errors are computed, the weights are adjusted by delta = weight of the connection * derivative of the sigmoid (at the value of the neuron the weight goes to) * value of the neuron that the connection is to * error of the neuron * amount of output error of the neuron it goes to * beta (some constant for the learning rate)

This is my current muck of code that I am trying to get working. I have a lot of other attempts somewhat mixed in, but the main backpropagation function that I am trying to get working is on line 293 in Net.cpp.

Answered by Gregory Pakosz

Have a look at 15 Steps to implement a Neural Network; it should get you started.

Answered by Panos

I wrote a simple "Tutorial" that you can check out below.

It is a simple implementation of the perceptron model. You can imagine a perceptron as a neural network with only one neuron. There is of course code, written in C++, that you can test out. I go through the code step by step, so you shouldn't have any issues.

Although the perceptron isn't really a "Neural Network", it is really helpful if you want to get started and might help you better understand how a full Neural Network works.

Hope that helps! Cheers! ^_^

In this example I will go through the implementation of the perceptron model in C++ so that you can get a better idea of how it works.

First things first, it is good practice to write down a simple algorithm of what we want to do.

Algorithm:

  1. Make a vector for the weights and initialize it to 0 (don't forget to add the bias term).
  2. Keep adjusting the weights until we get 0 errors or a low error count.
  3. Make predictions on unseen data.

Having written a super simple algorithm, let's now write some of the functions that we will need.

  • We will need a function to calculate the net's input (i.e. x * wT, multiplying the inputs by the weights)
  • A step function so that we get a prediction of either 1 or -1
  • And a function that finds the ideal values for the weights.

So without further ado let's get right into it.

Let's start simple by creating a perceptron class:

class perceptron
{
public:

private:

};

Now let's add the functions that we will need.

class perceptron
{
public:
    perceptron(float eta,int epochs);
    float netInput(vector<float> X);
    int predict(vector<float> X);
    void fit(vector< vector<float> > X, vector<float> y);
private:

};

Notice how the function fit takes as an argument a vector of vector<float>. That is because our training dataset is a matrix of inputs. Essentially we can imagine that matrix as a couple of vectors x stacked one on top of another, with each column of that matrix being a feature.

Finally, let's add the values that our class needs to hold: the vector w that holds the weights, the number of epochs, which is the number of passes we will make over the training dataset, and the constant eta, the learning rate by which we multiply each weight update. Dialing this value up can make the training procedure faster, and if eta is too high we can dial it down to get the ideal result (for most applications of the perceptron I would suggest an eta value of 0.1).

class perceptron
{
public:
    perceptron(float eta,int epochs);
    float netInput(vector<float> X);
    int predict(vector<float> X);
    void fit(vector< vector<float> > X, vector<float> y);
private:
    float m_eta;
    int m_epochs;
    vector < float > m_w;
};

Now, with our class set up, it's time to write each of the functions.

We will start with the constructor (perceptron(float eta, int epochs);).

perceptron::perceptron(float eta, int epochs)
{
    m_epochs = epochs; // We set the private variable m_epochs to the user selected value
    m_eta = eta; // We do the same thing for eta
}

As you can see, what we will be doing is very simple stuff, so let's move on to another simple function: the predict function (int predict(vector<float> X);). Remember that all the predict function does is take the net input and return a value of 1 if the netInput is bigger than 0, and -1 otherwise.

int perceptron::predict(vector<float> X)
{
    return netInput(X) > 0 ? 1 : -1; //Step Function
}

Notice that we used an inline if statement to make our lives easier. Here's how the inline if statement works:

condition ? value_if_true : value_if_false

So far so good. Let's move on to implementing the netInput function (float netInput(vector<float> X);).

The netInput function does the following: it multiplies the input vector by the transpose of the weights vector:

x * wT

In other words, it multiplies each element of the input vector x by the corresponding element of the vector of weights w, then takes their sum and adds the bias:

(x1 * w1 + x2 * w2 + ... + xn * wn) + bias

bias = 1 * w0

float perceptron::netInput(vector<float> X)
{
    // Sum(Vector of weights * Input vector) + bias
    float probabilities = m_w[0]; // In this example I am adding the bias weight (w0) first
    for (int i = 0; i < X.size(); i++)
    {
        probabilities += X[i] * m_w[i + 1]; // Notice that for the weights I am counting
        // from the 2nd element since w0 is the bias and I already added it first.
    }
    return probabilities;
}

Alright, we are now pretty much done; the last thing we need to do is write the fit function, which modifies the weights.

void perceptron::fit(vector< vector<float> > X, vector<float> y)
{
    for (int i = 0; i < X[0].size() + 1; i++) // X[0].size() + 1 -> I am using +1 to add the bias term
    {
        m_w.push_back(0); // Setting each weight to 0 and making the size of the vector
        // The same as the number of features (X[0].size()) + 1 for the bias term
    }
    for (int i = 0; i < m_epochs; i++) // Iterating through each epoch
    {
        for (int j = 0; j < X.size(); j++) // Iterating though each vector in our training Matrix
        {
            float update = m_eta * (y[j] - predict(X[j])); //we calculate the change for the weights
            for (int w = 1; w < m_w.size(); w++){ m_w[w] += update * X[j][w - 1]; } // we update each weight by the update * the training sample
            m_w[0] += update; // We update the bias term; its input is always 1, so the change is just the update
        }
    }
}

So that was essentially it. With only 3 functions we now have a working perceptron class that we can use to make predictions!

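If you just want a quick, self-contained check before the full Iris example further down, here is a minimal usage sketch (my own addition, not part of the original answer) that trains the class on an AND gate; AND is used rather than XOR because a single perceptron can only learn linearly separable functions. The eta value and epoch count are arbitrary, the brace initialization assumes a C++11 compiler (which the Iris example below also relies on), and note that fit must be called before predict, since the weight vector is only sized inside fit.

#include <iostream>
#include <vector>
using namespace std;

// ... paste the perceptron class from above here
// (it also needs <fstream> and <string> if you include the import/export helpers) ...

int main()
{
    // Training data for an AND gate: only (1,1) belongs to the positive class.
    vector< vector<float> > X = { {0, 0}, {0, 1}, {1, 0}, {1, 1} };
    vector<float> y = { -1, -1, -1, 1 };

    perceptron clf(0.1f, 20); // eta = 0.1, 20 epochs
    clf.fit(X, y);

    for (size_t i = 0; i < X.size(); i++)
        cout << X[i][0] << " AND " << X[i][1] << " -> " << clf.predict(X[i]) << endl;

    return 0;
}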

In case you want to copy-paste the code and try it out, here is the entire class (I added some extra functionality, such as printing the weights vector and the errors in each epoch, as well as the option to import/export weights).

Here is the code:

The class header:

class perceptron
{
public:
    perceptron(float eta,int epochs);
    float netInput(vector<float> X);
    int predict(vector<float> X);
    void fit(vector< vector<float> > X, vector<float> y);
    void printErrors();
    void exportWeights(string filename);
    void importWeights(string filename);
    void printWeights();
private:
    float m_eta;
    int m_epochs;
    vector < float > m_w;
    vector < float > m_errors;
};

The class .cpp file with the functions:

perceptron::perceptron(float eta, int epochs)
{
    m_epochs = epochs;
    m_eta = eta;
}

void perceptron::fit(vector< vector<float> > X, vector<float> y)
{
    for (int i = 0; i < X[0].size() + 1; i++) // X[0].size() + 1 -> I am using +1 to add the bias term
    {
        m_w.push_back(0);
    }
    for (int i = 0; i < m_epochs; i++)
    {
        int errors = 0;
        for (int j = 0; j < X.size(); j++)
        {
            float update = m_eta * (y[j] - predict(X[j]));
            for (int w = 1; w < m_w.size(); w++){ m_w[w] += update * X[j][w - 1]; }
            m_w[0] += update; // accumulate into the bias weight
            errors += update != 0 ? 1 : 0;
        }
        m_errors.push_back(errors);
    }
}

float perceptron::netInput(vector<float> X)
{
    // Sum(Vector of weights * Input vector) + bias
    float probabilities = m_w[0];
    for (int i = 0; i < X.size(); i++)
    {
        probabilities += X[i] * m_w[i + 1];
    }
    return probabilities;
}

int perceptron::predict(vector<float> X)
{
    return netInput(X) > 0 ? 1 : -1; //Step Function
}

void perceptron::printErrors()
{
    printVector(m_errors); // printVector is a small helper (not shown here) that prints a vector's elements
}

void perceptron::exportWeights(string filename)
{
    ofstream outFile;
    outFile.open(filename);

    for (int i = 0; i < m_w.size(); i++)
    {
        outFile << m_w[i] << endl;
    }

    outFile.close();
}

void perceptron::importWeights(string filename)
{
    ifstream inFile;
    inFile.open(filename);

    for (int i = 0; i < m_w.size(); i++)
    {
        inFile >> m_w[i];
    }
}

void perceptron::printWeights()
{
    cout << "weights: ";
    for (int i = 0; i < m_w.size(); i++)
    {
        cout << m_w[i] << " ";
    }
    cout << endl;
}

Also, if you want to try out an example, here is one I made:

main.cpp:

#include <iostream>
#include <vector>
#include <algorithm>
#include <fstream>
#include <string>
#include <math.h> 

#include "MachineLearning.h"

using namespace std;
using namespace MachineLearning;

vector< vector<float> > getIrisX();
vector<float> getIrisy();

int main()
{
    vector< vector<float> > X = getIrisX();
    vector<float> y = getIrisy();
    vector<float> test1;
    test1.push_back(5.0);
    test1.push_back(3.3);
    test1.push_back(1.4);
    test1.push_back(0.2);

    vector<float> test2;
    test2.push_back(6.0);
    test2.push_back(2.2);
    test2.push_back(5.0);
    test2.push_back(1.5);
    //printVector(X);
    //for (int i = 0; i < y.size(); i++){ cout << y[i] << " "; }cout << endl;

    perceptron clf(0.1, 14);
    clf.fit(X, y);
    clf.printErrors();
    cout << "Now Predicting: 5.0,3.3,1.4,0.2(CorrectClass=-1,Iris-setosa) -> " << clf.predict(test1) << endl;
    cout << "Now Predicting: 6.0,2.2,5.0,1.5(CorrectClass=1,Iris-virginica) -> " << clf.predict(test2) << endl;

    system("PAUSE");
    return 0;
}

vector<float> getIrisy()
{
    vector<float> y;

    ifstream inFile;
    inFile.open("y.data");
    string sampleClass;
    for (int i = 0; i < 100; i++)
    {
        inFile >> sampleClass;
        if (sampleClass == "Iris-setosa")
        {
            y.push_back(-1);
        }
        else
        {
            y.push_back(1);
        }
    }

    return y;
}

vector< vector<float> > getIrisX()
{
    ifstream af;
    ifstream bf;
    ifstream cf;
    ifstream df;
    af.open("a.data");
    bf.open("b.data");
    cf.open("c.data");
    df.open("d.data");

    vector< vector<float> > X;

    for (int i = 0; i < 100; i++)
    {
        char scrap;
        int scrapN;
        af >> scrapN;
        bf >> scrapN;
        cf >> scrapN;
        df >> scrapN;

        af >> scrap;
        bf >> scrap;
        cf >> scrap;
        df >> scrap;
        float a, b, c, d;
        af >> a;
        bf >> b;
        cf >> c;
        df >> d;
        X.push_back(vector < float > {a, b, c, d});
    }

    af.close();
    bf.close();
    cf.close();
    df.close();

    return X;
}

The way I imported the iris dataset isn't really ideal but I just wanted something that worked.

The data files can be found here.

I hope that you found this helpful!

Note: The code above is there only as an example. As noted by juzzlin, it is important that you use const vector<float> &X and, in general, pass the vector/vector<vector> objects by reference, because the data can be very large and passing it by value will make a copy of it (which is inefficient).

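As a sketch of what that change might look like for the declarations above (my own illustration, not code from the answer), only the parameter types change; the function bodies stay the same:

class perceptron
{
public:
    perceptron(float eta, int epochs);
    // Passing the data by const reference avoids copying the (possibly large)
    // training matrix on every call.
    float netInput(const vector<float> &X);
    int predict(const vector<float> &X);
    void fit(const vector< vector<float> > &X, const vector<float> &y);
private:
    float m_eta;
    int m_epochs;
    vector < float > m_w;
    vector < float > m_errors;
};

The definitions in the .cpp file would need the same const-reference parameter types.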

Answered by Simon

Sounds to me like you are struggling with backprop and what you describe above doesn't quite match how I understand it to work, and your description is a bit ambiguous.

You calculate the output error term to backpropagate as the difference between the prediction and the actual value, multiplied by the derivative of the transfer function. It is that error value which you then propagate backwards. The derivative of a sigmoid is calculated quite simply as y(1-y), where y is your output value. There are lots of proofs of that available on the web.

For a node on the inner layer, you multiply that output error by the weight between the two nodes, and sum all those products as the total error from the outer layer being propagated to the node in the inner layer. The error associated with the inner node is then multiplied by the derivative of the transfer function applied to the original output value. Here's some pseudocode:

total_error = sum(output_errors * weights)
node_error = sigmoid_derivative(node_output) * total_error

This error is then propagated backwards in the same manner right back through the input layer weights.

The weights are adjusted using these error terms and the output values of the nodes:

weight_change = outer_error * inner_output_value

The learning rate is important because the weight change is calculated for every pattern/row/observation in the input data. You want to moderate the weight change for each row so the weights don't get unduly changed by any single row and so that all rows have an effect on the weights. The learning rate gives you that, and you adjust the weight change by multiplying by it:

weight_change = outer_error * inner_output_value * learning_rate

It is also normal to remember these changes between epochs (iterations) and to add a fraction of them to the new change. The fraction added is called momentum and is supposed to speed you up through regions of the error surface where there is not much change and slow you down where there is detail:

weight_change = (outer_error*inner_output_value*learning_rate) + (last_change*momentum)

There are algorithms for adjusting the learning rate and momentum as the training proceeds.

The weight is then updated by adding the change:

new_weight = old_weight + weight_change
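
To make those pieces concrete, here is a small C++ sketch of the same steps (my own illustration with made-up names, assuming a sigmoid transfer function; it is not the code from the question's Net.cpp):

#include <cstddef>
#include <vector>

// Derivative of the sigmoid written in terms of its output: y * (1 - y).
double sigmoid_derivative(double y) { return y * (1.0 - y); }

// Error term for an output node: (target - prediction) * f'(output).
double output_delta(double target, double output)
{
    return (target - output) * sigmoid_derivative(output);
}

// Error term for a hidden node: the sum of (downstream delta * connecting weight),
// multiplied by the derivative at the hidden node's own output.
double hidden_delta(const std::vector<double> &output_deltas,
                    const std::vector<double> &weights_to_outputs,
                    double hidden_output)
{
    double total_error = 0.0;
    for (std::size_t k = 0; k < output_deltas.size(); ++k)
        total_error += output_deltas[k] * weights_to_outputs[k];
    return sigmoid_derivative(hidden_output) * total_error;
}

// Change for one weight: delta of the receiving node * output of the sending node
// * learning rate, plus a fraction (momentum) of the previous change.
double weight_change(double receiving_delta, double sending_output,
                     double learning_rate, double last_change, double momentum)
{
    return receiving_delta * sending_output * learning_rate + last_change * momentum;
}

Each weight is then updated as new_weight = old_weight + weight_change(...), and the returned change is remembered so it can be fed back in as last_change on the next pass.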

I had a look through your code, but rather than correct it and post that, I thought it was better to describe backprop for you so you can code it up yourself. If you understand it, you'll be able to tune it for your circumstances too.

HTH and good luck.

Answered by Oleg Shirokikh

How about this open-source code? It defines a simple one-hidden-layer net (2 inputs, 2 hidden, 1 output) and solves the XOR problem:

http://www.sylbarth.com/mlp.php

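In case that link goes away, here is a rough, self-contained sketch of a net with the same shape (2 inputs, 2 hidden nodes, 1 output, sigmoid activations) trained on XOR with plain backpropagation. This is my own illustration, not the code from that page; the learning rate, epoch count and seed are arbitrary, and with only two hidden nodes the training can occasionally settle in a local minimum depending on the random initialization.

#include <cstdio>
#include <cstdlib>
#include <cmath>

double sigmoid(double x)  { return 1.0 / (1.0 + exp(-x)); }
double dsigmoid(double y) { return y * (1.0 - y); } // derivative expressed via the output

int main()
{
    const double X[4][2] = { {0, 0}, {0, 1}, {1, 0}, {1, 1} };
    const double T[4]    = { 0, 1, 1, 0 };   // XOR targets
    const double lr = 0.5;                    // learning rate

    double wh[2][2], bh[2] = {0, 0};          // input->hidden weights and biases
    double wo[2], bo = 0;                     // hidden->output weights and bias
    srand(42);
    for (int i = 0; i < 2; ++i) {
        wo[i] = 2.0 * rand() / RAND_MAX - 1.0;
        for (int j = 0; j < 2; ++j)
            wh[i][j] = 2.0 * rand() / RAND_MAX - 1.0;
    }

    for (int epoch = 0; epoch < 10000; ++epoch) {
        for (int s = 0; s < 4; ++s) {
            // Forward pass
            double h[2];
            for (int i = 0; i < 2; ++i)
                h[i] = sigmoid(bh[i] + wh[i][0] * X[s][0] + wh[i][1] * X[s][1]);
            double o = sigmoid(bo + wo[0] * h[0] + wo[1] * h[1]);

            // Backward pass: output delta first, then the hidden deltas
            double dO = (T[s] - o) * dsigmoid(o);
            double dH[2];
            for (int i = 0; i < 2; ++i)
                dH[i] = dO * wo[i] * dsigmoid(h[i]);

            // Weight and bias updates
            for (int i = 0; i < 2; ++i) {
                wo[i] += lr * dO * h[i];
                bh[i] += lr * dH[i];
                for (int j = 0; j < 2; ++j)
                    wh[i][j] += lr * dH[i] * X[s][j];
            }
            bo += lr * dO;
        }
    }

    // Print what the trained net produces for each input pair
    for (int s = 0; s < 4; ++s) {
        double h[2];
        for (int i = 0; i < 2; ++i)
            h[i] = sigmoid(bh[i] + wh[i][0] * X[s][0] + wh[i][1] * X[s][1]);
        double o = sigmoid(bo + wo[0] * h[0] + wo[1] * h[1]);
        printf("%g XOR %g -> %.3f\n", X[s][0], X[s][1], o);
    }
    return 0;
}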

Answered by Pe Dro

What about a simple function-approximation network, like one that predicts and fits a sine function? Also, I think avoiding classes in the implementation is a must for getting the basics easily. Let's consider a single-hidden-layer network.

//Had a lot of trouble with shuffle

#include <iostream>
#include<vector>
#include <list>
#include <cstdlib>
#include <math.h>

#define PI 3.141592653589793238463
#define N

#define epsilon 0.1
#define epoch 2000

using namespace std;
// Just for GNU Plot issues
extern "C" FILE *popen(const char *command, const char *mode);

// Defining activation functions
//double sigmoid(double x) { return 1.0f / (1.0f + exp(-x)); }
//double dsigmoid(double x) { return x * (1.0f - x); }
double tanh(double x) { return (exp(x)-exp(-x))/(exp(x)+exp(-x)) ;}
double dtanh(double x) {return 1.0f - x*x ;}

double lin(double x) { return x;}
double dlin(double x) { return 1.0f;}

double init_weight() { return (2.0 * rand() / RAND_MAX - 1); } // random value in [-1, 1]
double MAXX = -9999999999999999; //maximum value of input example

// Network Configuration
static const int numInputs = 1;
static const int numHiddenNodes = 7;
static const int numOutputs = 1;

// Learning Rate
const double lr = 0.05f;

double hiddenLayer[numHiddenNodes];
double outputLayer[numOutputs];

double hiddenLayerBias[numHiddenNodes];
double outputLayerBias[numOutputs];

double hiddenWeights[numInputs][numHiddenNodes];
double outputWeights[numHiddenNodes][numOutputs];

static const int numTrainingSets = 50;
double training_inputs[numTrainingSets][numInputs];
double training_outputs[numTrainingSets][numOutputs];

// Shuffling the data with each epoch
void shuffle(int *array, size_t n)
{
    if (n > 1) //If no. of training examples > 1
    {
        size_t i;
        for (i = 0; i < n - 1; i++)
        {
            size_t j = i + rand() / (RAND_MAX / (n - i) + 1);
            int t = array[j];
            array[j] = array[i];
            array[i] = t;
        }
    }
}

// Forward Propagation. Only used after training is done.
void predict(double test_sample[])
{
    for (int j=0; j<numHiddenNodes; j++)
    {
        double activation=hiddenLayerBias[j];
        for (int k=0; k<numInputs; k++)
        {
            activation+=test_sample[k]*hiddenWeights[k][j];
        }
        hiddenLayer[j] = tanh(activation);
    }

    for (int j=0; j<numOutputs; j++)
    {
        double activation=outputLayerBias[j];
        for (int k=0; k<numHiddenNodes; k++)
        {
            activation+=hiddenLayer[k]*outputWeights[k][j];
        }
        outputLayer[j] = lin(activation);
    }
    //std::cout<<outputLayer[0]<<"\n";
    //return outputLayer[0];
    //std::cout << "Input:" << training_inputs[i][0] << " " << training_inputs[i][1] << "    Output:" << outputLayer[0] << "    Expected Output: " << training_outputs[i][0] << "\n";
}

int main(int argc, const char * argv[])
{
    ///TRAINING DATA GENERATION
    for (int i = 0; i < numTrainingSets; i++)
    {
        double p = (2*PI*(double)i/numTrainingSets);
        training_inputs[i][0] = (p);
        training_outputs[i][0] = sin(p);

        ///FINDING NORMALIZING FACTOR
        for(int m=0; m<numInputs; ++m)
            if(MAXX < training_inputs[i][m])
                MAXX = training_inputs[i][m];
        for(int m=0; m<numOutputs; ++m)
            if(MAXX < training_outputs[i][m])
                MAXX = training_outputs[i][m];

    }

    ///NORMALIZING
    for (int i = 0; i < numTrainingSets; i++)
    {
        for(int m=0; m<numInputs; ++m)
            training_inputs[i][m] /= 1.0f*MAXX;

        for(int m=0; m<numOutputs; ++m)
            training_outputs[i][m] /= 1.0f*MAXX;

        cout<<"In: "<<training_inputs[i][0]<<"  out: "<<training_outputs[i][0]<<endl;
    }
    ///WEIGHT & BIAS INITIALIZATION
    for (int i=0; i<numInputs; i++) {
        for (int j=0; j<numHiddenNodes; j++) {
            hiddenWeights[i][j] = init_weight();
        }
    }
    for (int i=0; i<numHiddenNodes; i++) {
        hiddenLayerBias[i] = init_weight();
        for (int j=0; j<numOutputs; j++) {
            outputWeights[i][j] = init_weight();
        }
    }
    for (int i=0; i<numOutputs; i++) {
        //outputLayerBias[i] = init_weight();
        outputLayerBias[i] = 0;
    }

    ///FOR INDEX SHUFFLING
    int trainingSetOrder[numTrainingSets];
    for(int j=0; j<numTrainingSets; ++j) // note: numTrainingSets, so every index gets initialized
        trainingSetOrder[j] = j;


    ///TRAINING
    //std::cout<<"start train\n";
    vector<double> performance, epo; ///STORE MSE, EPOCH
    for (int n=0; n < epoch; n++)
    {
        double MSE = 0;
        shuffle(trainingSetOrder,numTrainingSets);
        std::cout<<"epoch :"<<n<<"\n";
        for (int i=0; i<numTrainingSets; i++)
        {
            //int i = trainingSetOrder[x];
            int x=i;
            //std::cout<<"Training Set :"<<x<<"\n";
            /// Forward pass
            for (int j=0; j<numHiddenNodes; j++)
            {
                double activation=hiddenLayerBias[j];
                //std::cout<<"Training Set :"<<x<<"\n";
                 for (int k=0; k<numInputs; k++) {
                    activation+=training_inputs[x][k]*hiddenWeights[k][j];
                }
                hiddenLayer[j] = tanh(activation);
            }

            for (int j=0; j<numOutputs; j++) {
                double activation=outputLayerBias[j];
                for (int k=0; k<numHiddenNodes; k++)
                {
                    activation+=hiddenLayer[k]*outputWeights[k][j];
                }
                outputLayer[j] = lin(activation);
            }

            //std::cout << "Input:" << training_inputs[x][0] << " " << "    Output:" << outputLayer[0] << "    Expected Output: " << training_outputs[x][0] << "\n";
            for(int k=0; k<numOutputs; ++k)
                MSE += (1.0f/numOutputs)*pow( training_outputs[x][k] - outputLayer[k], 2);

           /// Backprop
           ///   For V
            double deltaOutput[numOutputs];
            for (int j=0; j<numOutputs; j++) {
                double errorOutput = (training_outputs[i][j]-outputLayer[j]);
                deltaOutput[j] = errorOutput*dlin(outputLayer[j]);
            }

            ///   For W
            double deltaHidden[numHiddenNodes];
            for (int j=0; j<numHiddenNodes; j++) {
                double errorHidden = 0.0f;
                for(int k=0; k<numOutputs; k++) {
                    errorHidden+=deltaOutput[k]*outputWeights[j][k];
                }
                deltaHidden[j] = errorHidden*dtanh(hiddenLayer[j]);
            }

            ///Updation
            ///   For V and b
            for (int j=0; j<numOutputs; j++) {
                //b
                outputLayerBias[j] += deltaOutput[j]*lr;
                for (int k=0; k<numHiddenNodes; k++)
                {
                    outputWeights[k][j]+= hiddenLayer[k]*deltaOutput[j]*lr;
                }
            }

            ///   For W and c
            for (int j=0; j<numHiddenNodes; j++) {
                //c
                hiddenLayerBias[j] += deltaHidden[j]*lr;
                //W
                for(int k=0; k<numInputs; k++) {
                  hiddenWeights[k][j]+=training_inputs[i][k]*deltaHidden[j]*lr;
                }
            }
        }
        //Averaging the MSE
        MSE /= 1.0f*numTrainingSets;
        //cout<< "  MSE: "<< MSE<<endl;
        ///Steps to PLOT PERFORMANCE PER EPOCH
        performance.push_back(MSE*100);
        epo.push_back(n);
    }

    // Print weights
    std::cout << "Final Hidden Weights\n[ ";
    for (int j=0; j<numHiddenNodes; j++) {
        std::cout << "[ ";
        for(int k=0; k<numInputs; k++) {
            std::cout << hiddenWeights[k][j] << " ";
        }
        std::cout << "] ";
    }
    std::cout << "]\n";

    std::cout << "Final Hidden Biases\n[ ";
    for (int j=0; j<numHiddenNodes; j++) {
        std::cout << hiddenLayerBias[j] << " ";

    }
    std::cout << "]\n";
    std::cout << "Final Output Weights";
    for (int j=0; j<numOutputs; j++) {
        std::cout << "[ ";
        for (int k=0; k<numHiddenNodes; k++) {
            std::cout << outputWeights[k][j] << " ";
        }
        std::cout << "]\n";
    }
    std::cout << "Final Output Biases\n[ ";
    for (int j=0; j<numOutputs; j++) {
        std::cout << outputLayerBias[j] << " ";
    }
    std::cout << "]\n";


    /* This part is just for plotting the results.
       This requires installing GNU Plot. You can also comment it out.
    */
    //Plot the results
    vector<float> x;
    vector<float> y1, y2;
    //double test_input[1000][numInputs];
    int numTestSets = numTrainingSets;
    for (float i = 0; i < numTestSets; i=i+0.25)
    {
        double p = (2*PI*(double)i/numTestSets);
        x.push_back(p);
        y1.push_back(sin(p));
        double test_input[1];
        test_input[0] = p/MAXX;
        predict(test_input);
        y2.push_back(outputLayer[0]*MAXX);
    }

    FILE * gp = popen("gnuplot", "w");
    fprintf(gp, "set terminal wxt size 600,400 \n");
    fprintf(gp, "set grid \n");
    fprintf(gp, "set title '%s' \n", "f(x) = sin(x)");
    fprintf(gp, "set style line 1 lt 3 pt 7 ps 0.1 lc rgb 'green' lw 1 \n");
    fprintf(gp, "set style line 2 lt 3 pt 7 ps 0.1 lc rgb 'red' lw 1 \n");
    fprintf(gp, "plot '-' w p ls 1, '-' w p ls 2 \n");

    ///Exact f(x) = sin(x) -> Green Graph
    for (int k = 0; k < x.size(); k++) {
        fprintf(gp, "%f %f \n", x[k], y1[k]);
    }
    fprintf(gp, "e\n");

    ///Neural Network Approximation of f(x) = sin(x) -> Red Graph
    for (int k = 0; k < x.size(); k++) {
        fprintf(gp, "%f %f \n", x[k], y2[k]);
    }
    fprintf(gp, "e\n");

    fflush(gp);
    ///FILE POINTER FOR SECOND PLOT (PERFORMANCE GRAPH)
    FILE * gp1 = popen("gnuplot", "w");
    fprintf(gp1, "set terminal wxt size 600,400 \n");
    fprintf(gp1, "set grid \n");
    fprintf(gp1, "set title '%s' \n", "Performance");
    fprintf(gp1, "set style line 1 lt 3 pt 7 ps 0.1 lc rgb 'green' lw 1 \n");
    fprintf(gp1, "set style line 2 lt 3 pt 7 ps 0.1 lc rgb 'red' lw 1 \n");
    fprintf(gp1, "plot '-' w p ls 1 \n");

    for (int k = 0; k < epo.size(); k++) {
        fprintf(gp1, "%f %f \n", epo[k], performance[k]);
    }
    fprintf(gp1, "e\n");

    fflush(gp1);

    system("pause");
    //_pclose(gp);

    return 0;
}

I too have been trying to learn simple (shallow) Neural Networks while avoiding any high-level tools. I have tried to maintain some of my learning at this repository.
