Automatic hyperparameter optimization using Talos on an MLP-based regression model

A data analysis project

Posted by Jiayin Guo on February 28, 2019

1. Introduction

This post is about automatic hyperparameter optimization using Talos on an MLP-based regression model. It can be regarded as the metadata for the MLP_crime_model data analysis report in my GitHub.

Talos is an automatic hyperparameter optimization tool for machine learning. To demonstrate its usage, I chose the community crime data set provided by the UCI machine learning dataset repository and use an MLP model for the regression task.

We use mean squared error as the loss function and 1 − r as the metric for models, where r is the correlation coefficient. The best correlation coefficient obtained in the experiments (Section 6) is 0.78, an improvement over the initial MLP model's value of 0.60.

While explaining the process of this data science project, some small tricks are included. Some further questions are asked at the end of this post and will, hopefully, be solved in future posts.

2. Experiment setup

The data set has 1994 items and 128 attributes, 127 of numeric type and one, communityname, of string type. ViolentCrimePerPop is the attribute we want to predict. In the data preparation section I did the following:

  • Convert the original data set into a DataFrame indexed by communityname.
  • Separate the data into features x and target y, normalize x, and replace every np.NaN with 0 (the normalization was added after experiment 6).
  • Split x and y into training data, validation data, and test data.
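The preparation steps above can be sketched as follows. This is a minimal illustration, not the code from the report: the column names follow the post, but the split fractions and the helper names are assumptions.

```python
import numpy as np
import pandas as pd

def prepare(df):
    """Index by communityname, split off the target, normalize x, NaN -> 0."""
    df = df.set_index("communityname")
    y = df["ViolentCrimePerPop"].to_numpy()
    x = df.drop(columns=["ViolentCrimePerPop"]).apply(pd.to_numeric, errors="coerce")
    x = (x - x.mean()) / x.std()   # column-wise normalization (added after experiment 6)
    x = x.fillna(0).to_numpy()     # replace all np.NaN with 0
    return x, y

def split(x, y, val=0.2, test=0.2, seed=0):
    """Shuffle once, then cut into training / validation / test parts."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    n_test, n_val = int(len(x) * test), int(len(x) * val)
    test_i = idx[:n_test]
    val_i = idx[n_test:n_test + n_val]
    train_i = idx[n_test + n_val:]
    return (x[train_i], y[train_i]), (x[val_i], y[val_i]), (x[test_i], y[test_i])
```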

Following the Talos tutorial, we set up the model, define the parameter space, and run the experiment in sections 2, 3, and 4 of the report, respectively. I use three dictionary or set variables, best_param_each_round, data_each_round, and exp_nums, to store the result of each round of experiments; they are used to generate the summary report in Section 6.
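Following the Talos tutorial, the setup might look roughly like the sketch below: a model function that reads every hyperparameter from `params`, plus a parameter-space dictionary. The layer sizes, value ranges, and the experiment name are illustrative assumptions, not the exact values from the report.

```python
# Illustrative parameter space; Talos will try combinations from these lists.
p = {
    "first_neuron": [32, 64, 128],
    "hidden_layers": [1, 2],
    "hidden_neuron": [32, 64],
    "activation": ["relu", "tanh"],
    "batch_size": [32, 64],
    "epochs": [100, 200],
    "optimizer": ["adam", "sgd"],
}

def crime_model(x_train, y_train, x_val, y_val, params):
    # Imports kept local so the sketch is readable without Keras installed.
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense

    model = Sequential()
    model.add(Dense(params["first_neuron"], activation=params["activation"],
                    input_dim=x_train.shape[1]))
    for _ in range(params["hidden_layers"]):
        model.add(Dense(params["hidden_neuron"], activation=params["activation"]))
    model.add(Dense(1))  # single regression output
    model.compile(optimizer=params["optimizer"], loss="mse")
    out = model.fit(x_train, y_train,
                    validation_data=(x_val, y_val),
                    batch_size=params["batch_size"],
                    epochs=params["epochs"], verbose=0)
    return out, model

# The scan itself would then be something like:
#   import talos
#   scan = talos.Scan(x=x_train, y=y_train, x_val=x_val, y_val=y_val,
#                     params=p, model=crime_model, experiment_name="crime")
```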

For the display part, I use section 5 to show the result of the current experiment. Since mse is the loss function for each model, I use r_value as the metric for comparing different models and plot the correlation graph for the best model by this metric in each round of experiments. The reason for considering both mse and r_value is that they indicate different aspects of model fit; it is worth noting that they are closely related when the predictions and the ground truth have the same mean and variance. I use val_r_value, the r_value on validation data, to select the best model, since, as one can expect, validation data tends to have a larger (worse) metric than training data.
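A plain-NumPy sketch of the metric, here taken as 1 − r so that smaller is better (consistent with val_r_value 0.40 corresponding to a correlation coefficient of 0.60 in experiment 1); the function name mirrors the report's terminology but is otherwise an assumption:

```python
import numpy as np

def r_value(y_true, y_pred):
    """1 - r, where r is the Pearson correlation between targets and predictions.
    Smaller is better; 0 means perfect positive linear correlation."""
    r = np.corrcoef(y_true, y_pred)[0, 1]
    return 1.0 - r
```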

I use Section 6 to display the results of all experiments.

3. Experiment process

In the first experiment, I just test whether this setup works by putting a single set of parameters in the parameter space, and get val_r_value equal to 0.40, which corresponds to a correlation coefficient of 0.60.


In experiments 2-6 I vary first_neuron, first_activation, hidden_layer, hidden_neuron, batch_size, epochs, kernel_initializer, optimizers, and activation.

The best result appears in experiment 3, with val_r_value equal to 0.27.
We observe the following points.

  • There is no special preference for individual values of optimizers, kernel_initializer, first_activation, or activation, but some combinations give a good val_r_value.
  • batch_size and epochs need to be bigger to stabilize val_r_value.
  • val_r_value is highly sensitive to the hyperparameters.

In experiment 7 I normalized x and y and added one more hyperparameter, lr, the learning rate for SGD, since the optimizers I tried each have their recommended learning rates. We observe the following points.

  • val_r_value is lowered
  • val_r_value is not sensitive to the hyperparameter lr

So it seems that normalizing the data improves val_r_value to 0.22.
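Exposing lr to Talos only requires constructing the SGD optimizer inside the model function instead of passing its string name; a minimal sketch, with an assumed value grid around SGD's common defaults:

```python
# Only the new entry is shown; the previous hyperparameters stay as before.
p = {
    "lr": [0.001, 0.01, 0.1],  # candidate learning rates for SGD
}

def compile_with_lr(model, params):
    # Build SGD with the sampled learning rate, then compile as usual.
    from tensorflow.keras.optimizers import SGD
    model.compile(optimizer=SGD(learning_rate=params["lr"]), loss="mse")
    return model
```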


In experiments 8-11 I vary hyperparameters on the normalized data and add one more hyperparameter, dropout, finally improving val_r_value to 0.20.
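The extra dropout hyperparameter just inserts a Dropout layer after each hidden layer; a sketch of that fragment of the model function, with the rate grid being an assumption (e.g. dropout in [0.0, 0.25, 0.5]):

```python
def add_hidden_block(model, params):
    # One hidden Dense layer followed by Dropout, repeated hidden_layers times.
    from tensorflow.keras.layers import Dense, Dropout
    for _ in range(params["hidden_layers"]):
        model.add(Dense(params["hidden_neuron"], activation=params["activation"]))
        model.add(Dropout(params["dropout"]))
    return model
```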

4. Problems left to do

  • I did not compare the MLP model with other models for regression.
  • In this data analysis, I did not use the cross-validation technique.
  • From the final model we can see that the overfitting problem still exists.
