Learning neural networks with Keras and linear regression

Learning TensorFlow/Keras with linear regression

This is my attempt to learn artificial neural networks (ANNs) by breaking them down
into their constituent parts and running the simplest code possible.

Biological neurons

If my memory from my undergraduate days is correct, neurons are activated only once there is enough 'neurotransmitter' at the synapse. Then an electrical signal travels along the axon to whatever is at the other end - probably another neuron.

Artificial neurons (AN)

Artificial neurons were designed to replicate biological neurons, so they essentially do the same thing, but with maths! If you need more than a layman's half-remembered explanation, Google is your friend.

Learning how ANs work

First I need to make some data. I am going to create some data that makes sense to me and isn't just noise.

            
                def generate_data():
                    x = np.linspace(0, 1, 100)                              # 100 evenly spaced points between 0 and 1
                    y = 1.35 * x + np.random.randn(*x.shape) * 0.33 + 0.5   # a straight line (slope 1.35, intercept 0.5) plus Gaussian noise
                    return x, y
                
                X, y = generate_data()
            
            

So, we can see we have some data that are positively correlated.
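If you want to see that for yourself, a quick scatter plot does the job (this snippet is my own addition; 'raw-data.png' is just a filename I made up):

                    plt.scatter(X, y, s=10)                # the noisy points generated above
                    plt.xlabel('x')
                    plt.ylabel('y')
                    plt.savefig('raw-data.png', dpi=100)   # assumed filename, not from the original post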

Now I am going to make a model using TensorFlow and the Keras library.
If you don't understand Python, perhaps this will not be for you.
All I am doing is creating a neural network with a single neuron.

            
                def build_model():

                    activation = 'linear'

                    input_layer = Input(shape=[1])                          # one input feature, no weights or bias
                    output = Dense(1, activation=activation)(input_layer)   # one neuron: a weight (w) and a bias (b)

                    model = Model(inputs=input_layer, outputs=output)

                    return model
            
            

My 'explanation' of the model and the basics of how it works

input_layer - literally the input: a one-dimensional vector, i.e. 1 feature, with no bias or weight. This is where we feed in the data that we think may help us predict whatever our target is - in this case the value of y. Output (Dense) - there is no separate 'output_layer'; you use a Dense layer, which has a weight (w) and a bias (b).

Weight = slope or gradient

Bias = intercept

Essentially, weight and bias are both trainable parameters, and we adjust them as needed. This is why model.summary() gives a value of two. If we had a hidden Dense layer in the middle with one unit, how many trainable parameters would we have? Yep, 4.

As far as I am aware, this is the smallest possible functioning ANN that can fit two-dimensional data. If the line passed through the origin we would not need the bias parameter and could keep only the weight (use_bias = False). A quick way to sanity-check these parameter counts is sketched below.
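Here is a minimal sketch for checking those counts with model.count_params() (the layer arrangements are my own illustration, not from the original post):

                    def count_params(use_bias=True, hidden=False):
                        inp = Input(shape=[1])
                        x = inp
                        if hidden:
                            x = Dense(1, activation='linear')(x)              # hidden layer: 1 weight + 1 bias
                        out = Dense(1, activation='linear', use_bias=use_bias)(x)
                        return Model(inputs=inp, outputs=out).count_params()

                    print(count_params())                  # 2 -> one weight, one bias
                    print(count_params(hidden=True))       # 4 -> two weights, two biases
                    print(count_params(use_bias=False))    # 1 -> weight only, no bias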

Just like the equation of a line, y = mx + c.
Each value passed into the neuron gets multiplied by w and b is added to the product - w(x) + b.
Hopefully the value the function returns is close to y.
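As a toy worked example (the numbers are mine, chosen to match the line the data were generated from), with w = 1.35 and b = 0.5 an input of x = 0.4 gives:

                    w, b = 1.35, 0.5      # hypothetical learned parameters
                    x = 0.4
                    y_hat = w * x + b     # 1.35 * 0.4 + 0.5 = 1.04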

In this case I am using a linear activation function, which just returns the input tensor - it doesn't do anything.
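You can convince yourself of that with a quick check (my own snippet; relu is only there for contrast):

                    x = tf.constant([-2.0, 0.0, 3.5])
                    print(tf.keras.activations.linear(x))   # identical to x: [-2.  0.  3.5]
                    print(tf.keras.activations.relu(x))     # for contrast, relu zeroes the negatives: [0.  0.  3.5]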

As to how and when the activation is used, I believe that if the value of w(x) + b exceeds a certain value (it depends on the function) then the neuron is activated - it gives an output. Just like a biological neuron.


Running the model

                
                optimizer = Adam(learning_rate = 0.6)            # large learning rate so 30 epochs is plenty
                model = build_model()
                model.compile(loss='mean_squared_error', optimizer=optimizer)
                history = model.fit(X, y, epochs=30, verbose=0)  # history stores the loss for each epoch
                model.summary()
                
                

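Since model.fit() returns a history object holding the loss after each epoch, a quick plot of it shows whether training has levelled off (this snippet is my own addition; 'loss-curve.png' is an assumed filename):

                plt.plot(history.history['loss'])
                plt.xlabel('epoch')
                plt.ylabel('mean squared error')
                plt.savefig('loss-curve.png', dpi=100)   # assumed filename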

Now to check the performance

                
                    slope, intercept, r_value, p_value, std_err = stats.linregress(X, y)   # ordinary least-squares fit for comparison
                    plt.scatter(X, y)
                    plt.plot(X, model.predict(X), color='r', label='neural-net')
                    plt.plot(X, X*slope + intercept, color='g', label='lin-regress')
                    plt.legend()
                    plt.savefig('final-fig.png', dpi=100)
                
                


The artificial neuron fit almost perfectly!

                    
                    w, b = model.layers[1].get_weights()   # layers[0] is the Input layer, layers[1] is the Dense neuron
                    m, c = slope, intercept

                    print(f'weight {w[0][0]:.2f}, gradient {m:.2f},\nbias {b[0]:.2f}, intercept {c:.2f}')

                    # output:
                    # weight 1.25, gradient 1.27,
                    # bias 0.52, intercept 0.51
                    
                    

So now the question is, how are the weights and biases updated?

My assumption: after each epoch, the weights and biases are adjusted, randomly-ish; if the accuracy improves it carries on adjusting in that direction, and if it gets worse it changes direction. How these values are changed depends on the optimizer - in my case, Adam?
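For comparison, here is a rough sketch (entirely my own, not from the original post) of what a single plain gradient-descent update on the mean squared error would look like for w and b; Adam builds on this idea with adaptive, momentum-like step sizes:

                    def gradient_step(w, b, x, y, lr=0.1):
                        # one gradient-descent update for y_hat = w*x + b with MSE loss (a sketch, not Adam)
                        error = (w * x + b) - y
                        dw = 2 * np.mean(error * x)   # derivative of MSE with respect to w
                        db = 2 * np.mean(error)       # derivative of MSE with respect to b
                        return w - lr * dw, b - lr * db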

The parameter learning_rate is how big the steps are in the changes the network makes to the weights and biases; the default is 0.001. When I left it at the default with 1000 epochs, the weight and bias were equal to the results of linear regression. I have seen people adjust this dynamically so the learning rate becomes slower over time - it takes smaller steps to home in on the true value.
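One common way to do that in Keras is a decaying learning-rate schedule; a minimal sketch, assuming tf.keras's ExponentialDecay schedule (the decay numbers are arbitrary illustration values):

                    # learning rate starts at 0.6 and shrinks by 10% every 10 steps (arbitrary values)
                    schedule = tf.keras.optimizers.schedules.ExponentialDecay(
                        initial_learning_rate=0.6,
                        decay_steps=10,
                        decay_rate=0.9)

                    optimizer = tf.keras.optimizers.Adam(learning_rate=schedule)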

I guess it is a trade-off between speed and accuracy - yet another parameter to optimise!

In the next post about this I shall try to fit some polynomial data, and hopefully learn how the weights and biases are changed.

                    
                        #packages used
                        import tensorflow as tf
                        import numpy as np
                        import matplotlib.pyplot as plt
                        import keras
                        from keras.models import Sequential, Model
                        from keras.layers import Dense, Input
                        from keras.optimizers import Adam
                        from scipy import stats
                    
                    

Find me on ...