## Series Introduction

This blog is part of "A Guide To TensorFlow", where we explore the TensorFlow API and use it to build machine learning models for real-life examples. Up until now we have explored much of the TensorFlow API; in this guide we will use that knowledge to build simple machine learning models. This guide is about logistic regression.

Check out the other parts of the series: Part 1, Part 2, Part 3, Part 4 and Part 5

## Logistic Regression

In the previous blog we highlighted a few points on why linear regression doesn't always work. There needs to be a way to add non-linearity to our algorithm. Logistic Regression is one of the ways to do that. It's borrowed from statistics and usually is the go-to algorithm for yes-no type questions, or to put it in more general terms, binary classification.

There is a function commonly used in machine learning called the logistic function. It is also known as the sigmoid function, because its shape is an S (and sigma is the Greek letter equivalent to s).

Mathematically sigmoid is expressed as: $$\sigma(x) = \frac{1}{1+e^{-x}}$$

It is an S-shaped curve that approaches 0 as the input becomes more negative and rises gradually towards 1 as the input becomes more positive. The logistic function behaves like a cumulative distribution function: given a specific input value, it returns a number between 0 and 1 that we interpret as the probability of the output being a success, and thus the probability that the answer to the question is “yes.”

In TensorFlow you can simply use `tf.sigmoid()` to apply the sigmoid to a particular input.
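To get a feel for the curve without building a graph, here is a minimal plain-Python sketch of the same formula. The helper name `sigmoid` and the sample inputs are chosen only for illustration; this is not how `tf.sigmoid` is implemented internally.

```python
import math

def sigmoid(x):
    """Logistic function: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Large negative inputs approach 0, zero maps to 0.5,
# and large positive inputs approach 1.
for x in (-6.0, -2.0, 0.0, 2.0, 6.0):
    print(f"sigmoid({x:+.1f}) = {sigmoid(x):.4f}")
```

Note the symmetry: `sigmoid(x) + sigmoid(-x)` is always 1, which is why a single output is enough to describe both the "yes" and the "no" probability.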

### Sigmoid for Yes or No

Logistic regression models the probability of the default class (e.g. the first class). For example, if we are modeling people’s gender as male or female from the length of their hair, then the first class could be male, and the logistic regression model could be written as the probability of male given a person’s hair length, or more formally:

$$P(gender = male \mid hair\text{-}length)$$

This value ranges between 0 and 1. For any given value of `hair-length`, a prediction can be made for `gender`.

Given \(X\) as the explanatory variable and \(Y\) as the response variable, a purely linear model would represent the relationship between \(P(X)\), the probability that \(Y\) is "yes" given \(X\), and \(X\) as: $$P(X)=W\cdot X + B$$

Now this has to be turned into a valid probability. In principle a linear function can output values greater than 1 and even less than 0, so in order to make an actual probability prediction we need to use the logistic function to squash the linear output into a value between 0 and 1:

$$P(X) = \frac{1}{1+e^{-(W\cdot X + B)}}$$

This can be rewritten as

$$\ln\Bigl(\frac{P(X)}{1-P(X)}\Bigr) = W\cdot X + B$$

The term \(\frac{P(X)}{1-P(X)}\) inside the logarithm is called the *odds*; it can take any value between 0 and \(\infty\). The natural logarithm of the odds, \(\ln(odds)\), is called the *logit*.
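The relationship between the probability, the odds and the logit can be checked numerically. The sketch below is plain Python; the values of `w`, `b` and `x` are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(z):
    """Logistic function applied to the linear output z."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Natural log of the odds p / (1 - p); defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))

# Hypothetical weight, bias and input, made up for this example.
w, b, x = 0.8, -1.5, 3.0

z = w * x + b            # linear part, can be any real number
p = sigmoid(z)           # probability, always between 0 and 1
odds = p / (1.0 - p)     # odds, always positive

print("linear output:", z)
print("probability:  ", p)
print("odds:         ", odds)
print("logit:        ", logit(p))  # recovers the linear output
```

Taking the logit of the sigmoid output gives back the linear output, which is exactly what the rewritten equation states.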

## The Dataset

We are going to use the Titanic survivor Kaggle contest dataset. The model will have to infer, from the passenger's age, sex and ticket class, whether the passenger survived or not.

The following is the data dictionary for our dataset:

Variable | Definition | Key |
---|---|---|
survival | Survival Prediction | 0 = No, 1 = Yes |
pclass | Ticket class | 1 = 1st, 2 = 2nd, 3 = 3rd |
sex | Sex | |
Age | Age in years | |
sibsp | # of siblings / spouses aboard the Titanic | |
parch | # of parents / children aboard the Titanic | |
ticket | Ticket number | |
fare | Passenger fare | |
cabin | Cabin number | |
embarked | Port of Embarkation | C = Cherbourg, Q = Queenstown, S = Southampton |

## The Code

The following template gives the gist of the overall code skeleton of our graph.

```
import tensorflow as tf

...  # Declare variables here

def combine_inputs(X):
    ...  # Multiplies the input and weight matrix, adds bias and returns the value

def inference(X):
    ...  # Returns the sigmoid of the combined inputs

def loss(X, Y):
    ...  # Implementation of the loss function

def read_csv(batch_size, file_name, record_defaults):
    ...  # Function to import data from a csv file

def inputs():
    ...  # Define inputs, convert categorical data to float and stack it all up in a matrix with one example per row

def train(total_loss):
    ...  # Uses the gradient descent optimizer to minimize the total loss

def evaluate(sess, X, Y):
    ...  # Evaluate the regression model

# Launch the graph in a session, setup boilerplate
with tf.Session() as sess:
    ...
```

Let's start by importing TensorFlow and declaring the variables we are going to use. The first variable is `W`, which stores the weights; it is a matrix of shape [5, 1], one weight per input feature, initialized with all values equal to zero. The next variable is the bias `b`. We use `tf.Variable` to create each of them.

```
import tensorflow as tf
import os
# weights (one per input feature) and bias, both initialized to zero
W = tf.Variable(tf.zeros([5, 1]), name="weights")
b = tf.Variable(0., name="bias")
```

### Defining Input

```
def inputs():
    passenger_id, survived, pclass, name, sex, age, sibsp, parch, ticket, fare, cabin, embarked = \
        read_csv(100, "train.csv", [[0.0], [0.0], [0], [""], [""], [0.0], [0.0], [0.0], [""], [0.0], [""], [""]])
```

First we use a custom function, `read_csv`, to read our training samples. In this dataset, the only columns that are relevant for our problem are `pclass`, `sex`, `age` and `survived`.

Here `pclass` and `sex` are categorical data, which we need to represent in a numerical form that can be used for computation. A naive way to do this is to assign a numerical value to each label. With this method, for `pclass` we would assign 1 to First Class, 2 to Second Class and 3 to Third Class. There is one major issue with this approach: it assumes a linear relationship among the labels that does not really exist.

What this encoding presupposes is an ordering, with equal spacing, among the categorical values (First Class, then Second Class, then Third Class). Suppose our model internally calculates an average: for two 2nd Class tickets, one 1st Class ticket and one 3rd Class ticket we get (1 + 2 + 3 + 2) / 4 = 2. This implies that the average of two 2nd Class tickets, one 1st Class ticket and one 3rd Class ticket is a 2nd Class ticket. This is definitely a recipe for disaster, and the model's predictions would have a lot of errors. Intuitively it may seem okay to do this with tickets, but suppose these labels were shoe brand preferences (say Nike, Adidas, Asics): such a scheme would give the model the false impression that a linear relationship exists between these shoe brands.

To represent these classes appropriately we use a technique called *one-hot encoding*, which "binarizes" the data. What we essentially do is convert each categorical label into its own binary feature. In the case of passenger class, we create three new features, `is_first_class`, `is_second_class` and `is_third_class`, each having a value of either 1 or 0. For gender, with only two values, it is okay to go with a single variable, because you can express a linear relationship between the values. For instance, if the possible values are `female = 1` and `male = 0`, then `male = 1 - female`, so a single weight can learn to represent both possible states.

```
    # convert categorical data
    is_first_class = tf.to_float(tf.equal(pclass, [1]))
    is_second_class = tf.to_float(tf.equal(pclass, [2]))
    is_third_class = tf.to_float(tf.equal(pclass, [3]))
    gender = tf.to_float(tf.equal(sex, ["female"]))
```
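The same encoding can be sketched in plain Python to show what each passenger turns into. The `one_hot` helper and the sample passengers are hypothetical, made up for illustration; they are not part of the tutorial's graph.

```python
def one_hot(value, categories):
    """Return a list with 1.0 at the position of `value` and 0.0 elsewhere."""
    return [1.0 if value == category else 0.0 for category in categories]

# Hypothetical (pclass, sex) pairs, made up for this example.
passengers = [(1, "female"), (3, "male"), (2, "male")]

for pclass, sex in passengers:
    is_first_class, is_second_class, is_third_class = one_hot(pclass, [1, 2, 3])
    # Two possible values need only one binary column.
    gender = 1.0 if sex == "female" else 0.0
    print(pclass, sex, "->", [is_first_class, is_second_class, is_third_class, gender])
```

Each ticket class now has its own column, so the model can learn a separate weight per class instead of being forced to treat 3rd class as "three times" 1st class.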

Finally we stack all the features into a single matrix and then transpose it, so that we have one example per row and one feature per column. `tf.stack` stacks the desired tensors into one tensor, and `tf.transpose` then performs a 2D transpose on the result. We save this in a variable `features`; we also reshape `survived` into a column that holds the survival status of each passenger.

```
    features = tf.transpose(tf.stack([is_first_class, is_second_class, is_third_class, gender, age]))
    survived = tf.reshape(survived, [100, 1])
    return features, survived
```
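To see what stacking followed by transposing does to the shape of the data, here is a rough plain-Python equivalent using lists. The column values are invented for three hypothetical passengers.

```python
# Hypothetical per-feature columns for three passengers (values made up).
is_first_class = [1.0, 0.0, 0.0]
is_second_class = [0.0, 0.0, 1.0]
is_third_class = [0.0, 1.0, 0.0]
gender = [1.0, 0.0, 0.0]
age = [29.0, 35.0, 4.0]

# Stacking gives one row per feature: 5 rows x 3 passengers.
stacked = [is_first_class, is_second_class, is_third_class, gender, age]

# Transposing gives one row per passenger: 3 rows x 5 features.
features = [list(row) for row in zip(*stacked)]

for row in features:
    print(row)
```

After the transpose each row has five entries, which matches the [5, 1] shape of the weight matrix so the two can be multiplied.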

### Importing The Dataset

In the previous tutorial we entered the data in the code itself; here we write a generic function to read CSV data:

```
def read_csv(batch_size, file_name, record_defaults):
    filename_queue = tf.train.string_input_producer(
        [os.path.join(os.getcwd(), file_name)])
    reader = tf.TextLineReader(skip_header_lines=1)
    key, value = reader.read(filename_queue)
    decoded = tf.decode_csv(value, record_defaults=record_defaults)
    return tf.train.shuffle_batch(decoded,
                                  batch_size=batch_size,
                                  capacity=batch_size * 50,
                                  min_after_dequeue=batch_size)
```

This function takes the parameters `batch_size`, `file_name` and `record_defaults`. We will see what they are soon, but first let's look at a few new functions used above.

- `tf.train.string_input_producer`: Outputs strings (e.g. filenames) to a queue for an input pipeline.
- `tf.TextLineReader`: Outputs the lines of a file delimited by newlines. We use its `read` function to read lines from the CSV file.
- `tf.decode_csv`: Converts CSV records to tensors such that each column maps to one tensor. It converts a tensor of type string (the text line) into a tuple of tensor columns with the specified defaults, which also set the data type for each column.
- `tf.train.shuffle_batch`: Creates batches by randomly shuffling tensors. It collects `batch_size` rows into a single tensor per column and returns them.
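The steps above (skip the header, decode each line with per-column defaults, shuffle, return one batch per column) can be sketched with the Python standard library. This is only a conceptual analogue, not the queue-based TensorFlow pipeline; the `read_batch` helper, the `seed` argument and the CSV rows are all made up for illustration.

```python
import csv
import io
import random

# Made-up rows using a subset of the Titanic columns, for illustration only.
CSV_TEXT = """survived,pclass,sex,age
1,1,female,29
0,3,male,
0,2,male,35
1,3,female,4
"""

def read_batch(text, batch_size, defaults, seed=0):
    """Parse CSV text and return one shuffled batch as a list of columns.

    `defaults` plays the role of record_defaults: each entry supplies the
    value for an empty field and its type decides the column's type.
    """
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header line
    rows = []
    for record in reader:
        row = []
        for field, default in zip(record, defaults):
            row.append(type(default)(field) if field != "" else default)
        rows.append(row)
    random.Random(seed).shuffle(rows)          # shuffle the examples
    batch = rows[:batch_size]                  # keep batch_size of them
    return [list(column) for column in zip(*batch)]  # one list per column

survived, pclass, sex, age = read_batch(CSV_TEXT, 2, [0.0, 0, "", 0.0])
print(survived, pclass, sex, age)
```

The second row has an empty age, so it falls back to the default `0.0`, which mirrors what the defaults passed to `tf.decode_csv` are for.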

With `read_csv` defined, simply calling the `inputs` function gives us access to the data whenever required. We can store these values using the following statement: `X, Y = inputs()`

### Combine Inputs Method

Before applying the sigmoid function in inference we need to combine the inputs. This function multiplies the input and weight matrices, adds the bias and returns the result.

```
def combine_inputs(X):
    return tf.matmul(X, W) + b
```

### Inference Method

This method applies the sigmoid function to the value returned by the `combine_inputs` function.

```
def inference(X):
    return tf.sigmoid(combine_inputs(X))
```
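For a single passenger, the two functions boil down to a weighted sum followed by a sigmoid. The plain-Python sketch below mirrors that computation; the feature row and the non-zero weights are hypothetical and are not values learned by the model.

```python
import math

def combine_inputs(row, weights, bias):
    """Weighted sum of one example's features plus the bias."""
    return sum(x * w for x, w in zip(row, weights)) + bias

def inference(row, weights, bias):
    """Sigmoid of the combined inputs: a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-combine_inputs(row, weights, bias)))

# Hypothetical passenger: first class, female, 29 years old.
row = [1.0, 0.0, 0.0, 1.0, 29.0]

# With the all-zero initialization every prediction starts at 0.5.
zero_weights = [0.0] * 5
print(inference(row, zero_weights, 0.0))

# Hypothetical non-zero weights, made up for this example.
example_weights = [0.5, 0.1, -0.4, 1.2, -0.01]
print(combine_inputs(row, example_weights, 0.2))
print(inference(row, example_weights, 0.2))
```

Starting from 0.5 for every passenger means the untrained model is maximally undecided; training then moves the weights away from zero.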

### The Loss Function

We could have used the L2 Loss function just like in our previous tutorial, however since the output we expect from our model is a probability value between 0 and 1, we will use a much more suited loss function called *cross entropy*.

Consider two scenarios. Suppose that for a particular example the expected answer is "yes", but our model predicts a very low probability for it, close to 0. This means that our model is almost 100% sure that the answer is "no". Now consider a scenario where our model predicts 20%, 30% or even 50% for the "no" output. L2 loss does not separate these scenarios strongly: the squared error of a probability can never exceed 1, so even the confidently wrong prediction receives only a bounded penalty.

If we plot the cross-entropy loss against the L2 loss function, we see that cross entropy penalizes much more heavily as the output moves further from the expected value.

With cross entropy, as the predicted probability comes closer to 0 for the “yes” example, the penalty grows towards infinity. This makes it very unlikely for the model to make that misprediction after training, which is why cross entropy is better suited as a loss function for this model.
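The difference is easy to see with a few numbers. The sketch below compares the two penalties for a "yes" example (`y = 1`) in plain Python; cross entropy is written here with a leading minus sign so that lower is better, and the sample probabilities are chosen only for illustration.

```python
import math

def squared_error(y, p):
    """L2-style penalty for a single example."""
    return (y - p) ** 2

def cross_entropy(y, p):
    """Cross-entropy penalty for a single example, with 0 < p < 1."""
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))

y = 1.0  # the expected answer is "yes"
for p in (0.5, 0.2, 0.01, 0.0001):
    print(f"p = {p:<7} squared error = {squared_error(y, p):.4f}"
          f"   cross entropy = {cross_entropy(y, p):.4f}")
```

The squared error flattens out just below 1, while the cross entropy keeps growing as the predicted probability for the correct answer shrinks.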

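$$Loss = -\sum_i \bigl( y_i \cdot \log(y_{predicted_i}) + (1-y_i) \cdot \log(1-y_{predicted_i}) \bigr)$$

In TensorFlow, we can implement this as follows. Note that `tf.nn.sigmoid_cross_entropy_with_logits` expects the raw output of `combine_inputs` rather than the output of `inference`, because it applies the sigmoid internally, and that `tf.reduce_mean` averages the loss over the batch instead of summing it.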

```
def loss(X, Y):
    return tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=combine_inputs(X), labels=Y))
```
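One reason to pass logits instead of probabilities is numerical stability. The TensorFlow documentation for this op describes the per-example loss as `max(x, 0) - x * z + log(1 + exp(-abs(x)))` for a logit `x` and label `z`. The plain-Python sketch below compares that form with the textbook formula; the function names `naive_loss` and `stable_loss` are made up for this example.

```python
import math

def naive_loss(logit, label):
    """Textbook cross entropy: apply the sigmoid, then take logs."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))

def stable_loss(logit, label):
    """Equivalent form that never takes the log of 0 or overflows exp."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

# Both forms agree for moderate logits.
for logit in (-3.0, 0.0, 2.5):
    for label in (0.0, 1.0):
        print(logit, label, naive_loss(logit, label), stable_loss(logit, label))

# For an extreme logit the naive form fails (log of zero),
# while the stable form still returns a finite value.
print(stable_loss(1000.0, 0.0))
```

This is a sketch of the idea only; the actual TensorFlow kernel operates on whole tensors.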

### Training and Evaluation

We define the training function just like in the previous tutorial, as follows:

```
def train(total_loss):
    learning_rate = 0.01
    return tf.train.GradientDescentOptimizer(learning_rate).minimize(total_loss)
```
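To make the optimizer less of a black box, here is a hand-written gradient descent loop for logistic regression in plain Python. It assumes the mean cross-entropy loss, for which the gradient with respect to the linear output is `prediction - label`. The toy dataset, the learning rate of 0.1 and the helper names are made up for illustration and are not the tutorial's settings.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, w, b):
    return sigmoid(sum(xi * wi for xi, wi in zip(x, w)) + b)

def mean_loss(X, Y, w, b):
    """Mean cross-entropy loss over all examples."""
    total = 0.0
    for x, y in zip(X, Y):
        p = predict(x, w, b)
        total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / len(X)

def gradient_step(X, Y, w, b, learning_rate):
    """One gradient descent update of the weights and the bias."""
    n = len(X)
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    for x, y in zip(X, Y):
        error = predict(x, w, b) - y  # derivative of the loss w.r.t. the linear output
        for j, xj in enumerate(x):
            grad_w[j] += error * xj / n
        grad_b += error / n
    new_w = [wj - learning_rate * gj for wj, gj in zip(w, grad_w)]
    return new_w, b - learning_rate * grad_b

# Toy dataset with one feature: larger values tend to mean "yes".
X = [[0.0], [1.0], [2.0], [3.0]]
Y = [0.0, 0.0, 1.0, 1.0]

w, b = [0.0], 0.0
losses = []
for step in range(200):
    losses.append(mean_loss(X, Y, w, b))
    w, b = gradient_step(X, Y, w, b, learning_rate=0.1)

print("first loss:", losses[0])
print("last loss: ", losses[-1])
print("weights:", w, "bias:", b)
```

`GradientDescentOptimizer.minimize` does the same kind of update, except that TensorFlow derives the gradients automatically from the graph.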

To evaluate the results we are going to run the inference against a batch of the training set and count the number of examples that were correctly predicted. We call that measuring the accuracy.

```
def evaluate(sess, X, Y):
    predicted = tf.cast(inference(X) > 0.5, tf.float32)
    print(sess.run(tf.reduce_mean(tf.cast(tf.equal(predicted, Y), tf.float32))))
```

As the model computes the probability of the answer being yes, we convert that to a positive answer if the output for an example is greater than 0.5. Then we check equality with the actual value using `tf.equal`. Finally, we use `tf.reduce_mean`, which adds up all of the correct answers (each of them contributes 1) and divides by the total number of samples in the batch, giving the fraction of right answers.
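The same three steps (threshold, compare, average) look like this in plain Python. The `accuracy` helper and the sample probabilities and labels are hypothetical, made up for illustration.

```python
def accuracy(probabilities, labels, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    predicted = [1.0 if p > threshold else 0.0 for p in probabilities]
    correct = [1.0 if p == y else 0.0 for p, y in zip(predicted, labels)]
    return sum(correct) / len(correct)

# Made-up model outputs and true labels for four passengers.
probabilities = [0.9, 0.2, 0.6, 0.4]
labels = [1.0, 0.0, 0.0, 1.0]

print(accuracy(probabilities, labels))
```

Here the first two passengers are classified correctly and the last two are not, so half of the predictions are right.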

### Launching the Session

The following piece of code creates a session, initializes all variables, and trains and then evaluates our model.

```
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    X, Y = inputs()
    total_loss = loss(X, Y)
    train_op = train(total_loss)
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    # actual training loop
    training_steps = 1000
    for step in range(training_steps):
        sess.run([train_op])
        # for debugging and learning purposes, see how the loss gets decremented through training steps
        if step % 10 == 0:
            print("loss: ", sess.run([total_loss]))
    evaluate(sess, X, Y)
    import time
    time.sleep(5)
    coord.request_stop()
    coord.join(threads)
    sess.close()
```

## Wrapping Up

In the next guide we will try to build our own neural network using TensorFlow. Before that, it is very important to understand the math behind neural networks. In my blog Understanding Neural Networks, you will find a comprehensive explanation of what neural networks are, and a very intuitive explanation of how they work.