Programming a Mediocre Neural Network From Scratch

This was a project I took on after watching 3Blue1Brown's video series on neural networks. I programmed a neural network entirely from scratch in MATLAB, and it was ultimately 38% successful at recognizing handwritten digits. For me, with less than a year of real programming experience, this was a huge accomplishment. I've moved on to other projects, but looking back, I learned more from this one than from any of the other programming projects I've done. When I started, I didn't even know how to open and import files in MATLAB; by the time I was done, I had a basic understanding of how neural networks work.

Simplified diagram of a neural network

I used the MNIST Database as my data set for convenience. It includes 70,000 handwritten digits, all the same size, making it a common data set for beginners trying to build AI.

Examples of the handwritten digits

As I got further into the project I was surprised by how simple and elegant a neural network really is. What I thought was some complex magical algorithm turned out to just be weighted averages and matrix multiplication. 

The neural network I made is separated into 4 layers, running from the input of the pixels from the database to the output prediction of which digit the network sees. The first layer has 784 inputs, one for each pixel: each image in the data set is 28x28 pixels. This is followed by two hidden layers; I gave both of mine 16 nodes, following 3Blue1Brown's video. The final layer has 10 outputs, one for each possible digit 0-9.

In my network, layers 1-4 had 784, 16, 16, and 10 nodes respectively.
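
For a sense of what that structure looks like in code, here's a rough MATLAB sketch (the variable names are made up for this post, not taken from my original code): one weight matrix and one bias vector for each pair of adjacent layers.

```matlab
% Layer sizes: 784 inputs, two hidden layers of 16, and 10 outputs.
sizes = [784 16 16 10];

% One weight matrix and bias vector per pair of adjacent layers.
% Small random starting values give training something to adjust.
W = cell(1, numel(sizes) - 1);
b = cell(1, numel(sizes) - 1);
for k = 1:numel(W)
    W{k} = 0.01 * randn(sizes(k+1), sizes(k));   % e.g. W{1} is 16 x 784
    b{k} = zeros(sizes(k+1), 1);
end
```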

Each node is connected to every node in the adjacent layers, and each of those connections stores a weight for how significant its contribution is when trying to determine what the digit is. When the 784 pixels are fed into the program, a weighted sum is taken over those connections and squashed to assign a value between 0 and 1 to each of the 16 nodes in the first hidden layer. This happens two more times, and ultimately the program has a value between 0 and 1 for each of the 10 outputs. The one with the strongest signal, that is, the largest value between 0 and 1, is the predicted answer.
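
Here's a sketch of that forward pass, using the W and b arrays from the sketch above and assuming a sigmoid squashing function, which is what keeps every node's value between 0 and 1:

```matlab
% Forward pass: push one 784x1 image vector through the network.
% Returns the activations of the output layer, a 10x1 vector in (0, 1).
% The prediction is the index of the largest output, where index 1
% corresponds to the digit 0:  [~, guess] = max(forwardpass(W, b, x))
function a = forwardpass(W, b, x)
    a = x;                           % activations of the input layer
    for k = 1:numel(W)
        z = W{k} * a + b{k};         % weighted sum feeding layer k+1
        a = 1 ./ (1 + exp(-z));      % sigmoid squashes each node into (0, 1)
    end
end
```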

Importing data code snippet
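
In rough terms, importing the data means reading the raw MNIST files, which use a simple big-endian binary format. Something like this (a sketch, not my exact code):

```matlab
% Reading the MNIST training images (the idx3-ubyte format): a magic
% number, then the counts, then raw uint8 pixels, all big-endian.
f = fopen('train-images-idx3-ubyte', 'r');
magic = fread(f, 1, 'int32', 0, 'ieee-be');    % 2051 for image files
nimg  = fread(f, 1, 'int32', 0, 'ieee-be');    % 60,000 training images
nrows = fread(f, 1, 'int32', 0, 'ieee-be');    % 28
ncols = fread(f, 1, 'int32', 0, 'ieee-be');    % 28
pixels = fread(f, nimg * nrows * ncols, 'uint8');
fclose(f);

% One 784x1 column per image, pixel values scaled to [0, 1].
images = reshape(pixels, nrows * ncols, nimg) / 255;
% The labels file works the same way, with magic number 2049.
```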

When the neural network is inevitably wrong, it can see how close it was to guessing the right answer and tweak the weights on each of its connections accordingly; this adjustment step is called backpropagation. By guessing tens of thousands of times, the neural network gets better and better at predicting the handwritten digits.

Backpropagation / training code snippet
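
For the curious, here is a simplified sketch of one training step: plain gradient descent with the standard sigmoid backpropagation equations for a squared-error cost (again illustrative, not my exact code):

```matlab
% One training step: forward pass storing every layer's activations,
% then backpropagate the error and nudge each weight downhill.
% x is a 784x1 image, y is a 10x1 one-hot vector for the correct digit,
% lr is the learning rate.
function [W, b] = trainstep(W, b, x, y, lr)
    L = numel(W);
    a = cell(1, L + 1);
    a{1} = x;
    for k = 1:L
        a{k+1} = 1 ./ (1 + exp(-(W{k} * a{k} + b{k})));
    end

    % Output-layer error for a squared-error cost, times sigmoid's slope.
    delta = (a{L+1} - y) .* a{L+1} .* (1 - a{L+1});
    for k = L:-1:1
        gradW = delta * a{k}';       % gradient with respect to W{k}
        gradb = delta;               % gradient with respect to b{k}
        if k > 1
            % Pass the error back through the (pre-update) weights.
            delta = (W{k}' * delta) .* a{k} .* (1 - a{k});
        end
        W{k} = W{k} - lr * gradW;
        b{k} = b{k} - lr * gradb;
    end
end
```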

A portion of the original data set is set aside to verify the accuracy of the neural network. Mine managed ~38% accuracy at reading the 10,000 non-training images. That's significantly better than random guessing (10%), which I consider a success. The code is available here.
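
Measuring that accuracy is simple once the forward pass exists: run every test image through the network and count how often the strongest output matches the label. A sketch, assuming hypothetical testimages (784x10000) and testlabels (digits 0-9) arrays and the forwardpass function from earlier:

```matlab
% Count how often the strongest output matches the true label.
correct = 0;
for i = 1:size(testimages, 2)
    [~, guess] = max(forwardpass(W, b, testimages(:, i)));
    correct = correct + (guess - 1 == testlabels(i));  % index 1 = digit 0
end
fprintf('Accuracy: %.1f%%\n', 100 * correct / size(testimages, 2));
```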

So why did my network only reach 38% accuracy when modern networks achieve upwards of 99.8%? I suspect it could be one of a few reasons:
  1. There's a major bug in my code that I never worked out.
  2. I'm not fully comfortable with the backpropagation equations that change the weights on each of the connections; I may have gotten some of them wrong.
  3. I needed to train the program for longer, and I just didn't give it enough time to tweak the connections properly. This is unlikely, though: I let the program run for a few days straight without significantly better results.