1. Introduction
This post is about automatic hyperparameter optimization using Talos for a regression task based on an MLP model. It can be regarded as the metadata for the MLP_crime_model data analysis report in my GitHub.
Talos is an automatic hyperparameter optimization tool for machine learning. To demonstrate the usage of Talos, I chose the Communities and Crime data set provided by the UCI Machine Learning Repository and used an MLP model for the regression task.
We use mean squared error as the loss function and r_value as the metric for models, where r_value is the correlation coefficient. The best r_value obtained in the experiments (Section 6) is 0.78, which is an improvement over the initial MLP model's r_value of 0.60.
While explaining the process of the data science project, some small tricks are included. Some further questions are asked at the end of this post and will hopefully be solved in a future post.
2. Experiment setup
The data set has 1994 items and 128 attributes, 127 of which are numeric and one of which, `communityname`, is a string. The `ViolentCrimePerPop` attribute is the one we want to predict. In the data preparation section I did the following:
- Convert the original data set to a DataFrame `data` indexed by `communityname`.
- Separate `data` into data `x` and target `y`.
- Normalize `x` and replace all `np.NaN` with 0 (added after experiment 6).
- Split `x` and `y` into training data, validation data, and test data.
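The preparation steps above can be sketched with NumPy alone. Note this is a minimal sketch, not the report's code: the random arrays stand in for the real DataFrame built from the UCI file, and the 80/10/10 split ratio and seed are placeholder assumptions.

```python
import numpy as np

# Hypothetical stand-ins for the 127 numeric attributes and the target
# ViolentCrimePerPop; in the report these come from the DataFrame `data`.
rng = np.random.default_rng(0)
x = rng.normal(size=(1994, 127))
x[0, 0] = np.nan                  # the data set contains missing values
y = rng.uniform(size=1994)

# Normalize each column to zero mean / unit variance, then replace NaN
# with 0 (the step added after experiment 6).
x = (x - np.nanmean(x, axis=0)) / np.nanstd(x, axis=0)
x = np.nan_to_num(x, nan=0.0)

# Split into training / validation / test data (the 80/10/10 ratio is an
# assumption, not stated in the report).
n = len(x)
idx = rng.permutation(n)
tr = idx[: int(0.8 * n)]
va = idx[int(0.8 * n): int(0.9 * n)]
te = idx[int(0.9 * n):]
x_train, y_train = x[tr], y[tr]
x_val, y_val = x[va], y[va]
x_test, y_test = x[te], y[te]
```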
Following the tutorial of Talos, we set the model, set the parameter space, and run the experiment in sections 2, 3, and 4 of our report respectively. I set three dictionary or set variables, `best_param_each_round`, `data_each_round`, and `exp_nums`, to store the experiment results of each round; they are used to generate the summary report in Section 6.
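The Talos setup can be sketched as follows. The parameter names are the ones varied in the experiments, but the candidate values are placeholders (the actual ranges are in the report), and the commented `Scan` call assumes Talos's usual interface with a Keras model function.

```python
# Parameter space for talos.Scan: each key maps to a list of candidate
# values. The values below are illustrative assumptions, not the ranges
# actually used in the report.
p = {
    'first_neuron': [32, 64, 128],
    'first_activation': ['relu', 'elu'],
    'hidden_layer': [1, 2],
    'hidden_neuron': [32, 64],
    'batch_size': [64, 128],
    'epochs': [100, 200],
    'kernel_initializer': ['glorot_uniform', 'he_normal'],
    'optimizers': ['adam', 'sgd'],
    'activation': ['relu', 'elu'],
}

# The model function receives one combination of parameters per round and
# returns (history, model); the actual architecture lives in the report.
# scan = talos.Scan(x=x_train, y=y_train, x_val=x_val, y_val=y_val,
#                   params=p, model=crime_model, experiment_name='crime')
```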
For the display part, I use section 5 to show the results of the current experiment. While `mse` is the loss function for each model, I use `r_value` as the metric for comparing different models and plot the correlation graph for the best model by that metric in each round of the experiment. The reason for considering both `mse` and `r_value` is that they indicate different aspects of the fit of the model; it is worth noting that they are closely related when the predictions and the ground truth have the same mean and variance. I use `val_r_value`, the `r_value` on the validation data, to select the best model, since one can expect the metric on the training data to be optimistically larger than on the validation data.
I use Section 6 to display the results of all experiments.
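The link between the two metrics noted above can be made precise: when both series are standardized to the same mean and variance, mse = 2(1 − r_value). A small NumPy check, with a hypothetical noisy predictor standing in for a model:

```python
import numpy as np

def r_value(y_true, y_pred):
    """Pearson correlation coefficient, the metric used to compare models."""
    return np.corrcoef(y_true, y_pred)[0, 1]

rng = np.random.default_rng(1)
y_true = rng.normal(size=10_000)
y_pred = 0.7 * y_true + rng.normal(scale=0.5, size=10_000)  # hypothetical model

# Standardize both so they share the same mean (0) and variance (1).
z = lambda v: (v - v.mean()) / v.std()
yt, yp = z(y_true), z(y_pred)

r = r_value(yt, yp)
mse = np.mean((yt - yp) ** 2)
# mean((a-b)^2) = mean(a^2) + mean(b^2) - 2*mean(ab) = 1 + 1 - 2r,
# so mse = 2 * (1 - r) for standardized series: the metrics move together.
assert abs(mse - 2 * (1 - r)) < 1e-9
```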
3. Experiment process
In the first experiment, I just tested whether this setup works by putting a single set of parameters in the parameter space, and got `val_r_value` equal to 0.40, which corresponds to a correlation coefficient of 0.60.
In experiments 2-6 I tried varying `first_neuron`, `first_activation`, `hidden_layer`, `hidden_neuron`, `batch_size`, `epochs`, `kernel_initializer`, `optimizers`, and `activation`.
The best result appears in experiment 3, with `val_r_value` equal to 0.27.
We observe the following points.
- There is no special preference among individual choices of `optimizers`, `kernel_initializer`, `first_activation`, and `activation`, but some combinations give good `val_r_value`.
- `batch_size` and `epochs` need to be bigger for `val_r_value` to stabilize.
- `val_r_value` is highly sensitive to the hyperparameters.
In experiment 7 I tried normalizing `x` and `y` and added one more hyperparameter, `lr`, the learning rate for `SGD`, since the optimizers I tried have their own recommended learning rates. We observe the following points.
- `val_r_value` is lowered.
- `val_r_value` is not sensitive to the hyperparameter `lr`.
So it seems normalizing the data improves `val_r_value` to 0.22.
In experiments 8-11 I tried varying hyperparameters using the normalized data and added one more hyperparameter, `dropout`. This finally improved `val_r_value` to 0.20.
4. Remaining problems
- I did not compare the MLP model with other models for regression.
- In this data analysis, I did not use the cross-validation technique.
From the final model we can see the overfitting problem still exists.
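As a sketch of how cross-validation could be added later, here is a minimal k-fold index generator in NumPy; the fold count of 5 and the idea of averaging `val_r_value` over folds are assumptions about future work, not part of the current report.

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    idx = np.random.default_rng(seed).permutation(n)
    for fold in np.array_split(idx, k):
        yield np.setdiff1d(idx, fold), fold

# Each Talos round could then be scored by its mean val_r_value over the
# k folds instead of a single validation split.
folds = list(kfold_indices(1994, 5))
```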