In this section we will build models with three algorithms: k-nearest neighbors, logistic regression, and a decision tree. We will use the dataset we prepared in the previous steps of the project.

First, let's import the data sets containing the predictor features:
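A minimal sketch of this step, assuming the prepared predictor sets were saved as CSV files in the previous part of the project (the file names here are hypothetical placeholders for whatever names were actually used):

```python
import pandas as pd

# Hypothetical file names; substitute the paths saved in the previous step.
X_train = pd.read_csv("X_train.csv")
X_test = pd.read_csv("X_test.csv")
```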

Now let's import the data sets containing the target features:
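Continuing under the same assumption about file names:

```python
# Again, the file names are placeholders for the files saved earlier.
# squeeze() turns a single-column DataFrame into a Series, the shape
# scikit-learn expects for a target.
y_train = pd.read_csv("y_train.csv").squeeze()
y_test = pd.read_csv("y_test.csv").squeeze()
```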

Let's recall the names of the features:
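For example, listing the columns of the predictor DataFrame loaded above:

```python
# Print the predictor column names to refresh our memory of the feature set.
print(X_train.columns.tolist())
```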

Now let's prepare a pipeline to train and test the three models. First, we will define a dictionary mapping the name of each algorithm to the corresponding Python estimator; we will update this dictionary with the accuracy values obtained during training and testing. Then we will define a function to visually compare the performance of the three methods.
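A sketch of what this pipeline could look like, assuming scikit-learn and matplotlib are available. The dictionary layout, the helper-function names (`train_and_score`, `plot_scores`), and the estimator settings are illustrative choices, not prescribed by the project:

```python
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Map each method name to an untrained estimator; accuracies are
# added to each entry once the models are fit.
models = {
    "KNN": {"model": KNeighborsClassifier()},
    "LogReg": {"model": LogisticRegression(max_iter=1000)},
    "Tree": {"model": DecisionTreeClassifier()},
}

def train_and_score(models, X_train, y_train, X_test, y_test):
    """Fit each model and store its train/test accuracy in the dictionary."""
    for entry in models.values():
        entry["model"].fit(X_train, y_train)
        entry["train_acc"] = entry["model"].score(X_train, y_train)
        entry["test_acc"] = entry["model"].score(X_test, y_test)

def plot_scores(models, title="Model comparison"):
    """Draw a bar chart of train vs. test accuracy for each method."""
    names = list(models.keys())
    train = [models[n]["train_acc"] for n in names]
    test = [models[n]["test_acc"] for n in names]
    x = range(len(names))
    width = 0.35
    plt.bar([i - width / 2 for i in x], train, width, label="train")
    plt.bar([i + width / 2 for i in x], test, width, label="test")
    plt.xticks(list(x), names)
    plt.ylabel("accuracy")
    plt.title(title)
    plt.legend()
    plt.show()

train_and_score(models, X_train, y_train, X_test, y_test)
plot_scores(models)
```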

The result of the 'Tree' method indicates that the model is overfitting. This occurs because we implemented the algorithm with no arguments controlling the complexity of the model. Therefore, we will now introduce the `max_depth` argument to limit the depth of the tree, setting it equal to 5 (an arbitrary choice). First, we will create a copy of the dictionary we created earlier and update its elements. Then we will use the functions defined above to train and test the models and to display the results.
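Continuing the sketch above, the depth-limited run could look like this:

```python
import copy

# Copy the dictionary and swap in a depth-limited tree;
# the other two entries are left unchanged.
models_depth = copy.deepcopy(models)
models_depth["Tree"]["model"] = DecisionTreeClassifier(max_depth=5)

train_and_score(models_depth, X_train, y_train, X_test, y_test)
plot_scores(models_depth, title="Model comparison (max_depth=5)")
```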

As we can see, limiting the depth of the tree reduces overfitting: this leads to a lower accuracy on the training set, but an improvement on the test set. We can see this more clearly by comparing the two runs side by side on plots that share the same y-axis scale:
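One possible way to draw this comparison with matplotlib, reusing the two dictionaries from the sketches above:

```python
# Plot the two runs side by side with a shared y-axis, so the
# train/test gap of the two trees can be compared directly.
fig, axes = plt.subplots(1, 2, sharey=True, figsize=(10, 4))
for ax, (run, label) in zip(axes, [(models, "unlimited depth"),
                                   (models_depth, "max_depth=5")]):
    names = list(run.keys())
    train = [run[n]["train_acc"] for n in names]
    test = [run[n]["test_acc"] for n in names]
    x = range(len(names))
    width = 0.35
    ax.bar([i - width / 2 for i in x], train, width, label="train")
    ax.bar([i + width / 2 for i in x], test, width, label="test")
    ax.set_xticks(list(x))
    ax.set_xticklabels(names)
    ax.set_title(label)
axes[0].set_ylabel("accuracy")
axes[0].legend()
plt.tight_layout()
plt.show()
```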