Deep Neural Net with Keras
Hey! I know I've been quiet these days, but believe me, I was really swamped. I should probably be posting about weather data analysis with Spark, but I've been working on neural networks, and I basically write about whatever I've been doing lately. I promise I'll definitely write about Spark! So today we're going to see how to create a deep neural network with Keras. There are a few things (actually many things) you need to know before getting started with Keras, so I'll cover them very briefly. By the way, I'm preparing another, in-depth post about neural networks from scratch! Here's what we'll see today:
  1. What is Deep learning
  2. What is Keras
  3. How to create a deep neural net
  4. Training a deep neural net
  5. Tuning
  6. Demo: Digits recognition
 

1 - What is Deep Learning

 
Basically, deep learning means using a neural network with many hidden layers, hence the name "deep". Now I feel like I have to talk about neural nets lol. Well, a neural net is a model inspired by biology: it tries to mimic the human brain, which is so good at making decisions and processing events.
Figure: what a deep neural net looks like
Now you might ask, what are those layers? Good question. Those layers represent the flow through which data is processed in order to make a prediction (an output).
Going back to machine learning, by convention we have a feature vector X which represents the input. Regardless of the algorithm used, the algorithm tries to fit a model that captures the pattern explaining how the features X affect the output y.
Similarly, the input layer (in red) is the feature vector, and the lines connecting each neuron (yes, neurons, like brain cells) from one layer to the next are weights. They quantify the pattern we just talked about, and the data flows through the network until it reaches the output.
There's more to it: each neuron in every layer except the input layer returns an output value computed by a function called the activation function. And if you're familiar with machine learning, you know the weights need to be updated, because their initial values won't be accurate for sure. The updates rely on a process called backpropagation, which works out how much each weight contributed to the error.
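To make the flow from input to output a bit more concrete, here's a tiny sketch in plain Python (no Keras yet). The weights, biases and input values are made up for illustration, and the helper name dense is my own, not a library function.

```python
import math

def relu(z):
    """Rectified linear unit: negative values become 0."""
    return max(0.0, z)

def sigmoid(z):
    """Squashes any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """One layer: every neuron takes a weighted sum of all inputs,
    adds its bias, then applies the activation function."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(z))
    return outputs

# Hypothetical feature vector with 2 features
x = [1.0, 2.0]

# Hypothetical weights: one hidden layer with 2 neurons, one output neuron
hidden_weights = [[0.5, -1.0], [1.0, 1.0]]
hidden_biases = [0.0, -1.0]
output_weights = [[1.0, -1.0]]
output_biases = [0.0]

hidden = dense(x, hidden_weights, hidden_biases, relu)
prediction = dense(hidden, output_weights, output_biases, sigmoid)

print(hidden)      # [0.0, 2.0]
print(prediction)  # a single value, roughly 0.119
```

Training is then the business of nudging those weights so the prediction gets closer to the expected output.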
I know I sped through that last paragraph, but you can google these things or wait for the in-depth blog post, since this is a Keras tutorial, not a neural networks one.

2 - What is Keras

Keras is a framework that uses other libraries as its backend (TensorFlow, Theano...). Then why should you use it? Because it makes building a deep neural network fast, so it's good for bootstrapping stuff, or if you're new to deep learning like myself.

3 - How to create a Deep Neural net

Creating a deep neural net means stacking layers of neurons and attaching an activation function to each layer, except the input layer.
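Activation functions are just small mathematical functions. Here's a rough plain-Python sketch of three common ones, written by hand for illustration only (Keras ships its own implementations, so you'd never write these yourself in practice).

```python
import math

def relu(z):
    """Rectified linear unit: keeps positive values, zeroes out the rest."""
    return max(0.0, z)

def tanh(z):
    """Hyperbolic tangent: squashes values into the range (-1, 1)."""
    return math.tanh(z)

def softmax(values):
    """Turns a list of scores into probabilities that sum to 1."""
    largest = max(values)
    # Subtracting the largest score keeps exp() from overflowing
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

print(relu(-3.0), relu(2.5))  # 0.0 2.5
print(tanh(0.0))              # 0.0
probabilities = softmax([2.0, 1.0, 0.1])
print(probabilities)          # three probabilities, the largest one first
```

relu and tanh work on one value at a time, while softmax looks at a whole layer's outputs at once, which is why it usually shows up on the output layer of a classifier.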
Okay, now let's walk through the code together, line by line. First thing, as usual with Python, we import the necessary packages. Sequential creates a basic model, which is fine for what we're doing (you'll probably use other models later), and Dense helps us create layers.
Awesome. Now, if you were following along, we mentioned the feature vector X, right? So we store the number of features in n_cols, then we go on to create the layers. This bit here, model.add(Dense(20, input_shape=(n_cols, ))), tells the model how many features to expect as input and creates the first layer with 20 units, meaning 20 neurons. Great, we're almost there. You guessed it: we create the hidden layers the same way, using the relu (rectified linear unit) activation function, and finally the output layer with a different activation function.
Let's take a step back. I might have been unclear about those activation functions, but here's what you need to know: there are a few of them (tanh, relu, softmax...), and each one is a mathematical function applied to the output of each neuron. Depending on your use case, you might want a specific activation function.
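Here's a minimal sketch of the kind of model described above. Only the Dense(20, input_shape=(n_cols, )) line comes from the walkthrough; the random toy data, the number of hidden layers, their sizes and the sigmoid output are my assumptions. Recent Keras releases prefer an explicit Input object and may print a warning about input_shape, so treat this as a sketch rather than the one true way.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Hypothetical toy data: 100 samples with 8 features each
X = np.random.rand(100, 8)

# Number of features in the feature vector X
n_cols = X.shape[1]

model = Sequential()

# First layer: declares the input shape and creates 20 neurons
# (no activation is given here, so this layer stays linear)
model.add(Dense(20, input_shape=(n_cols,)))

# Hidden layers with the relu activation (sizes are assumptions)
model.add(Dense(20, activation='relu'))
model.add(Dense(20, activation='relu'))

# Output layer with a different activation: sigmoid for a yes/no prediction
model.add(Dense(1, activation='sigmoid'))

# Prints the layers and the number of weights in each one
model.summary()
```

For a multi-class problem you'd swap the last layer for something like Dense(n_classes, activation='softmax'), where n_classes is whatever number of classes you have.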

4 - Train a Deep Neural net

Going back to the previous code, we first had to compile the deep neural net model and then fit it (train it). I know what we wrote earlier was the most basic deep learning model; in the last section we'll go through a more detailed demo. For now, let's keep things simple!
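The compile-then-fit step can be sketched like this. The toy data, the binary target, the layer sizes and the choice of 'sgd' with 'binary_crossentropy' are all assumptions on my side, picked only so the snippet runs on its own.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Hypothetical toy data: 200 samples, 8 features, binary target
rng = np.random.default_rng(0)
X = rng.random((200, 8))
y = (X.sum(axis=1) > 4).astype("float32")
n_cols = X.shape[1]

# Same kind of basic model as before
model = Sequential()
model.add(Dense(20, activation='relu', input_shape=(n_cols,)))
model.add(Dense(20, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# Compile: pick how to minimize (optimizer) and what to minimize (loss)
model.compile(optimizer='sgd', loss='binary_crossentropy')

# Fit: train the model for a few passes (epochs) over the data
history = model.fit(X, y, epochs=5, verbose=0)

# One loss value per epoch
print(history.history['loss'])
```

The history object returned by fit is handy later on, since it keeps the loss for every epoch.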
We pass two arguments to the compile method: the optimizer we want to use and the loss function. For a regression problem we would most likely go with something like MSE or SSE; similarly, classification calls for something else, like binary_crossentropy or categorical_crossentropy. As for the optimizer, there are plenty to choose from, for example Stochastic Gradient Descent (SGD). So you got it: the optimizer is the way you want to minimize the loss function, in other words make training converge. Of course, you can customize many arguments of the fit method, like the number of epochs, the validation data and so on...
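If "minimizing the loss" sounds abstract, here's a hand-rolled sketch of stochastic gradient descent on the smallest model I could think of: a single weight w fitted with a squared-error loss. The data, the learning rate and the function names are made up for illustration; Keras optimizers do the same job on every weight of the network at once.

```python
import random

def mse(w, xs, ys):
    """Mean squared error of the one-weight model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def sgd_fit(xs, ys, learning_rate=0.01, epochs=50, seed=0):
    """Stochastic gradient descent: update w after every single sample."""
    rng = random.Random(seed)
    samples = list(zip(xs, ys))
    w = 0.0  # deliberately bad starting value
    for _ in range(epochs):
        rng.shuffle(samples)  # visit the samples in a new order each epoch
        for x, y in samples:
            # Derivative of (w * x - y) ** 2 with respect to w
            gradient = 2 * (w * x - y) * x
            # Step against the gradient to reduce the loss
            w -= learning_rate * gradient
    return w

# Hypothetical data generated by y = 2 * x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = sgd_fit(xs, ys)
print(w)               # very close to 2.0
print(mse(w, xs, ys))  # very close to 0.0
```

Try a learning rate of 0.1 instead and watch it blow up, which is exactly why the learning rate is worth tuning.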

5 - Tuning a Deep Neural net 

Like any learning scenario, you need to tune the model's hyper-parameters to get the best prediction accuracy you can, and of course tuning might help you avoid overfitting.
Tuning in the real world is a hard problem: you'll find yourself optimizing many parameters with complex relationships. Sometimes the updates aren't really meaningful, or you might run into what we call the dying neuron phenomenon. Basically, that's when a neuron's weights stop updating because the activation function (typically relu) keeps setting the neuron's output to zero, so the gradient flowing back through it is zero too.
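Here's a toy illustration of a dying neuron, in plain Python. Everything in it is hypothetical: a single relu neuron with a weight and bias chosen so its input is always negative, and an upstream gradient fixed at 1.0 to stand in for whatever the loss would send back.

```python
def relu(z):
    """Rectified linear unit."""
    return max(0.0, z)

def relu_gradient(z):
    """Slope of relu: 1 for positive inputs, 0 otherwise."""
    return 1.0 if z > 0 else 0.0

# Hypothetical neuron whose weighted sum is negative for every input below
inputs = [0.5, 1.0, 2.0]
weight, bias = -1.0, -0.5
learning_rate = 0.1
upstream_gradient = 1.0  # assumed gradient coming back from the loss

outputs = []
for x in inputs:
    z = weight * x + bias
    outputs.append(relu(z))
    # Chain rule: the relu slope is 0 here, so the whole gradient is 0
    weight_gradient = upstream_gradient * relu_gradient(z) * x
    weight -= learning_rate * weight_gradient

print(outputs)  # [0.0, 0.0, 0.0]
print(weight)   # -1.0, the weight never moved
```

No matter how many samples you feed it, this neuron outputs zero and learns nothing. A learning rate that is too high is one common way to end up here.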
Anyways, let's get started with some simple tuning scripts.
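A simple tuning script could look roughly like the sketch below. The toy data, the list of learning rates, the patience value and the build_model helper are my assumptions. I monitor the validation loss (the Keras default for EarlyStopping) and spell the argument learning_rate, which is what current Keras versions expect.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
from keras.callbacks import EarlyStopping

# Hypothetical toy data: 200 samples, 8 features, binary target
rng = np.random.default_rng(0)
X = rng.random((200, 8))
y = (X.sum(axis=1) > 4).astype("float32")
n_cols = X.shape[1]

def build_model():
    """Hypothetical helper: returns a fresh, untrained model each time."""
    model = Sequential()
    model.add(Dense(20, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(20, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    return model

results = {}
for lr in [0.001, 0.01, 0.1]:
    model = build_model()
    # Same optimizer every time, only the learning rate changes
    model.compile(optimizer=SGD(learning_rate=lr),
                  loss='binary_crossentropy',
                  metrics=['accuracy'])
    # Stop if the validation loss hasn't improved for 2 epochs in a row
    early_stopping = EarlyStopping(monitor='val_loss', patience=2)
    history = model.fit(X, y,
                        validation_split=0.3,  # hold out 30% for validation
                        epochs=20,
                        callbacks=[early_stopping],
                        verbose=0)
    # Keep the best validation loss reached with this learning rate
    results[lr] = min(history.history['val_loss'])

for lr, val_loss in results.items():
    print(lr, val_loss)
```

Building a fresh model inside the loop matters: otherwise each learning rate would start from the weights left behind by the previous one.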
So, what have we done here?
  • We've seen how to tune the learning rate of an optimizer like SGD (google that if you don't know what it means).
  • We've used a stopping condition (EarlyStopping) to halt training if the model stops improving over a few epochs.
  • We've also added a few arguments to the fit function, like validation_split (which plays a role similar to train_test_split) and the number of epochs.
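The idea behind that stopping condition fits in a few lines of plain Python. This is a simplified sketch of the logic, not Keras's actual code, and the function name and loss values are made up.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the index of the epoch at which training would stop."""
    best = float('inf')
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            # New best score: remember it and reset the counter
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    # Never ran out of patience: training goes to the last epoch
    return len(val_losses) - 1

# Hypothetical validation losses, one per epoch
losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.5]

print(early_stopping_epoch(losses, patience=2))  # 4
print(early_stopping_epoch(losses, patience=3))  # 5
```

Notice the trade-off: with a patience of 2 we stop at epoch 4 and never see the better loss at epoch 5, so a tiny patience can cut training short.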
There are other experimental hacks to make a model yield better predictions, namely changing the model architecture: adding neurons, adding layers and so on. Creating different models in the experimental phase calls for a comparison, so here's a comparison between two models.
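One way to set up such a comparison, sketched under my own assumptions: two hypothetical architectures (named "small" and "large" here), the same toy data for both, and the validation loss per epoch as the yardstick.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Hypothetical toy data: 200 samples, 8 features, binary target
rng = np.random.default_rng(0)
X = rng.random((200, 8))
y = (X.sum(axis=1) > 4).astype("float32")
n_cols = X.shape[1]

def build_model(units, hidden_layers):
    """Hypothetical helper: a model with the given width and depth."""
    model = Sequential()
    model.add(Dense(units, activation='relu', input_shape=(n_cols,)))
    for _ in range(hidden_layers - 1):
        model.add(Dense(units, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(optimizer='sgd', loss='binary_crossentropy')
    return model

# Two architectures to compare: (neurons per layer, number of hidden layers)
architectures = {"small": (10, 2), "large": (50, 3)}

val_losses = {}
for name, (units, hidden_layers) in architectures.items():
    model = build_model(units, hidden_layers)
    history = model.fit(X, y, validation_split=0.3, epochs=15, verbose=0)
    # Validation loss after each epoch, ready to be plotted
    val_losses[name] = history.history['val_loss']

for name, losses in val_losses.items():
    print(name, min(losses))
```

Plotting both val_losses curves against the epoch number (with matplotlib, for instance) gives the usual side-by-side comparison chart.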
Figure: Model comparison
Obviously the blue model performs better here, since it reaches a lower loss and needs only a few epochs to get there. See, tuning deep neural nets takes some effort and experimentation. We've covered the general idea of how to get started, so now it's your turn to get your hands dirty! One last thing before closing: the next section is about building a model that classifies handwritten digits.

6 - Demo: Digits recognition

Coming soon, stay tuned !
