My Experience With Flatiron School’s Immersive Data Science Bootcamp

Part 10

Hamilton Chang
5 min read · May 7, 2020

Welcome back one and all to my journey through Flatiron School. This week we will be diving into Week 9 of my Flatiron School experience, which introduces us to some of the standard machine learning models used in Data Science today.

However, we started this week with a second review of Time Series, as the Time Series project from the previous week ran into many issues. Of the 10 groups that embarked on the project, only one managed to successfully build a model and forecast results from it. Unfortunately, we were left with some very basic questions that threw the efficacy of our lectures into doubt.

For example, do rolling means and rolling standard deviations actually induce stationarity in our data, or do they just help us check for it? Do other transformations like logs, squares, and cubes affect the trend at all, given that the data is sequential? Also, a lot of us were constantly getting different error messages when running ARIMA, and none of us knew why. Sometimes ARIMA just didn’t like a d value greater than 1.
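In hindsight, the rolling statistics are there to help us spot non-stationarity, while differencing (the d in the ARIMA order) is what actually removes a trend. Here’s a rough sketch of that workflow on a synthetic series, made up purely for illustration:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller

# A synthetic trending monthly series standing in for our project data
idx = pd.date_range("2015-01-01", periods=60, freq="MS")
series = pd.Series(np.linspace(10, 50, 60) + np.random.normal(0, 2, 60), index=idx)

# Rolling statistics help us *see* non-stationarity; they don't remove it
rolling_mean = series.rolling(window=12).mean()
rolling_std = series.rolling(window=12).std()

# The Augmented Dickey-Fuller test gives a more formal stationarity check
print("ADF p-value before differencing:", adfuller(series)[1])
print("ADF p-value after differencing: ", adfuller(series.diff().dropna())[1])

# Differencing once corresponds to d=1 in the ARIMA order
model = ARIMA(series, order=(1, 1, 1)).fit()
print(model.summary())
```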

This one-hour question-and-answer session left us somewhat dissatisfied, and inspired a number of us to write blogs and do side projects on Time Series to further our understanding.

To start off our run into Machine Learning, our lectures began with KNN, or K Nearest Neighbors. As an entrée, it is quite possibly the simplest form of Machine Learning to grasp. KNN is a classification algorithm whose basic premise is that if you plot your data, the data points closest to each other are most likely to be similar to each other. Below is an example:

Each value of K determines how many neighbors get a say in deciding whether a data point belongs to Class 1 or Class 2. By calculating the distance (usually Euclidean) from the green circle to the points around it, the algorithm finds its K nearest neighbors and classifies the new data point by majority vote.
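To make that concrete, here’s a rough sketch using scikit-learn’s KNeighborsClassifier on a toy two-class dataset (the data and the choice of K=5 are just for illustration):

```python
# A toy KNN example: classify a new point by majority vote of its K nearest neighbors
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Two well-separated clusters standing in for Class 1 and Class 2
X, y = make_blobs(n_samples=200, centers=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# K=5 means the five closest training points (by Euclidean distance) get a vote
knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(X_train, y_train)

print("Test accuracy:", knn.score(X_test, y_test))
print("Prediction for a new 'green circle':", knn.predict([[0.0, 0.0]]))
```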

The next machine learning algorithm we learned was Decision Trees. Decision Trees are broken down into two types, Regression Trees and Classification Trees. Regression Trees are a supervised learning algorithm that allows us to make predictions on a numeric or continuous outcome. They do this by recursively partitioning the feature space into progressively smaller segments, as seen below:

Left: Raw Data, Right: Data separated by Regression Trees through Feature Space

We can evaluate the success of our predictions by tuning the tree to achieve the smallest mean squared error (MSE) via k-fold cross validation. Cross validation helps us determine the appropriate depth of the tree, that is, how many partitions we need for accuracy without overfitting.
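Here’s a rough sketch of what that tuning might look like with scikit-learn, using a made-up dataset and depth range purely for illustration:

```python
# Use k-fold cross validation to choose a regression tree depth that minimizes MSE
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)

best_depth, best_mse = None, np.inf
for depth in range(1, 11):
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0)
    # cross_val_score returns negative MSE, so flip the sign
    mse = -cross_val_score(tree, X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    if mse < best_mse:
        best_depth, best_mse = depth, mse

print(f"Best depth: {best_depth}, cross-validated MSE: {best_mse:.1f}")
```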

Classification Trees operate in much the same way as Regression Trees, but they are applied to qualitative or categorical outcomes rather than numerical or continuous ones. And since we are dealing with categorical data, we evaluate them with classification error rates instead of MSE.
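The same idea works for the classification case, scored by error rate instead of MSE (again on made-up data):

```python
# Classification tree scored by error rate (1 - accuracy) instead of MSE
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=5, random_state=0)

clf = DecisionTreeClassifier(max_depth=3, random_state=0)
accuracy = cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()
print(f"Cross-validated error rate: {1 - accuracy:.3f}")
```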

The follow-on to Decision Trees is, of course, Random Forests. A Random Forest is Decision Trees blown out to an even greater degree. Instead of just one tree with multiple branches, a Random Forest builds many trees, each with multiple branches, and then aggregates or averages all of their decisions to come up with a final prediction or classification. In addition, each tree is trained on a random bootstrap sample of the data and considers only a random subset of features at each split, hence the Random in Random Forests.

Source: https://towardsdatascience.com/from-a-single-decision-tree-to-a-random-forest-b9523be65147

Having multiple trees helps us create a more accurate model than any single tree, because averaging many somewhat different trees smooths out the errors any one of them would make on its own.
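Here’s a rough sketch comparing a single tree to a forest on a toy dataset, just to show the aggregation idea (all the numbers are placeholders):

```python
# Compare a single decision tree to a random forest on the same toy dataset
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, n_informative=8, random_state=1)

tree = DecisionTreeClassifier(random_state=1)
forest = RandomForestClassifier(n_estimators=200,     # number of trees to aggregate
                                max_features="sqrt",  # random subset of features per split
                                random_state=1)

print("Single tree accuracy:  ", cross_val_score(tree, X, y, cv=5).mean())
print("Random forest accuracy:", cross_val_score(forest, X, y, cv=5).mean())
```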

Beyond Random Forests, tree ensembles can also be built with boosting, another method for increasing accuracy depending on the situation. There are three popular varieties: AdaBoost, Gradient Boosting, and XGBoost. All of them build trees sequentially and weight them based on their ability to correctly classify the data: each new tree focuses on the examples the previous trees got wrong, so the model becomes more refined with every round.
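As a sketch of the idea, scikit-learn ships AdaBoost and Gradient Boosting out of the box (XGBoost lives in its own xgboost package); the settings below are just illustrative:

```python
# Boosting builds trees sequentially, each one focusing on what the previous trees got wrong
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, n_informative=8, random_state=2)

ada = AdaBoostClassifier(n_estimators=200, random_state=2)          # re-weights misclassified samples
gbm = GradientBoostingClassifier(n_estimators=200, random_state=2)  # fits each tree to residual errors

print("AdaBoost accuracy:         ", cross_val_score(ada, X, y, cv=5).mean())
print("Gradient boosting accuracy:", cross_val_score(gbm, X, y, cv=5).mean())
```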

Lastly, we learned about Support Vector Machines, also known as SVM. I found SVM particularly interesting, if challenging to actually put into practice. The idea behind SVM is that if you were to plot your data, you should be able to classify it by drawing a line through it to act as a boundary: data falls on one side of the boundary or the other. SVM solves for that line, but also tries to create the greatest possible margin between the two groups of data to ensure accuracy. The margin is the distance between the line and the closest data points on either side of the boundary.

source: https://commons.wikimedia.org/wiki/File:SVM_margin.png
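For a concrete feel, here’s a minimal linear SVM sketch on toy data; the support vectors it reports are the points sitting right on the margin:

```python
# A linear SVM: the support vectors are the points closest to the separating line
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=6)

svm = SVC(kernel="linear", C=1.0)  # larger C = narrower margin, fewer violations allowed
svm.fit(X, y)

print("Support vectors per class:", svm.n_support_)
print("Margin-defining points:\n", svm.support_vectors_)
```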

This seems simple, but it does require a bit of work because the math can be quite intense. The boundary doesn’t have to be a straight line either; it can curve any number of ways to find the best fit and margin. SVM achieves this by implicitly projecting the data into a higher-dimensional space using what’s known as the kernel trick. Simply described, the kernel trick helps us separate data that we would ordinarily have trouble separating. For example, take a look at the data plotted below:

We’d probably have to draw a circle to properly categorize this data. But if we use the kernel trick, we can project our data into a 3rd dimension like so:

From there, we can simply draw a plane to easily separate the two classes. For a more in-depth exploration of SVM, I refer you to this video, https://www.youtube.com/watch?v=N1vOgolbjSc, which many of my classmates found very helpful and informative.
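Here’s a rough sketch of the kernel trick in action: on made-up concentric-circle data, a linear boundary struggles, while an RBF kernel (which implicitly lifts the data into a higher dimension) separates it easily:

```python
# The kernel trick: a linear boundary can't separate concentric circles,
# but an RBF kernel implicitly lifts the data into a space where a plane can
from sklearn.datasets import make_circles
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=3)

linear_svm = SVC(kernel="linear")
rbf_svm = SVC(kernel="rbf", gamma="scale")

print("Linear kernel accuracy:", cross_val_score(linear_svm, X, y, cv=5).mean())
print("RBF kernel accuracy:   ", cross_val_score(rbf_svm, X, y, cv=5).mean())
```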

I think we’ll end here for the week, tune in next week when we move into Naïve Bayes!!

Click the link below for Part 11!

https://hammychang.medium.com/my-experience-with-flatiron-schools-immersive-data-science-bootcamp-bc2d1cadfcf2
