My Experience with Flatiron School’s Immersive Data Science Boot Camp
Part 8
Hello all and welcome back to the stream of consciousness babble that I call writing. This week we’ll be going into what we covered in Week 7 of boot camp. Bear in mind, as always, that Flatiron’s Data Science curriculum is a constant work in progress, evolving alongside the needs of the Data Science industry.
So what did we do this week? Well, we started with a gentle introduction to the foundations every data scientist needs when getting started with machine learning. No, we didn’t talk about machine learning itself yet, but we did learn about the importance of some key concepts. We’ll get to those in a bit.
To start, we left behind a lot of the basic mathematical concepts and Python learning and truly started down the path to becoming Data Scientists. The first lecture of the week was about the Data Science process. We were invited to discuss any Data Science related projects we had worked on in our previous lives, and how we felt they might have been “Data Science-y.” Then we were introduced to the Data Scientist Venn Diagram:
This diagram nicely illustrates all the hats data scientists are perceived to need to wear. I think it also nicely illustrates the future divisions the job TITLE of Data Scientist will eventually separate into. If I may digress a bit here, every company is looking for something different from their data scientist: some only need data analysts, some need machine learning engineers, and some need someone to translate all this crazy stuff their data scientists turn out. The Data Science role is changing, and I suspect it will eventually break down into 8–10 different titles as companies shake out what they actually need from data scientists.
We were next introduced to one of the many frameworks data scientists use in their day to day, the OSEMN model (Obtain, Scrub, Explore, Model, iNterpret):
We would adopt this method for the remainder of our time at Flatiron.
The rest of the week, we were introduced to new topics that would become important later for handling data and choosing appropriate machine learning models. Our first lecture after the data science process was about Bias and Variance, and the tradeoff between them. For those who are unfamiliar, bias is the difference between a model’s predictions and the true values it is trying to predict. Variance is how much the model’s predictions scatter, for instance when it is trained on different samples of the data. The dartboard image below illustrates this nicely:
As you can see, the dartboard on the left exhibits high bias. We are trying to hit the bullseye, but all the darts land far from it, hence a large difference between our shots and the target. They are also clustered tightly together, and as such exhibit low variance. The dartboard on the right shows the opposite: the bias is low, as the darts are centered around the bullseye, but they are spread out around the center, thus exhibiting high variance.
Bias and variance are constant companions when developing machine learning models. Since no model can ever predict the future with perfect accuracy, we must constantly balance the two. Too much bias, and our model is consistently and irreparably wrong. Too much variance, and we may land in the right ballpark on average, but any individual prediction can still miss. This leads into what’s known as the bias-variance tradeoff:
As you can see from the graph, a model’s total error grows as either bias or variance grows. The goal of a data scientist when building predictive models is to find the sweet spot between bias and variance that minimizes total error and produces the most accurate model. This also plays a major role when we evaluate whether a model is overfit or underfit.
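To make this concrete, here’s a quick toy sketch of my own (not from the curriculum) using scikit-learn. A degree-1 polynomial underfits a noisy sine wave (high bias), a degree-15 polynomial overfits it (high variance), and something in between hits the sweet spot:

```python
# Toy example: fit polynomials of increasing degree to a noisy sine wave
# and compare training error to validation error.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)  # noisy sine wave

X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=42)

for degree in [1, 4, 15]:
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(X_train))
    val_err = mean_squared_error(y_val, model.predict(X_val))
    print(f"degree {degree:>2}: train MSE {train_err:.3f}, val MSE {val_err:.3f}")
```

The underfit model has high error everywhere (bias), while the overfit model has tiny training error but much worse validation error (variance).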
Speaking of overfit or underfit, the next step in our foundation building involved learning how to address overfitting. Underfitting is usually the easier problem to deal with, one with ready-made or obvious solutions, like using a more complex model or adding more features. Overfitting, however, needs to be addressed in a more deliberate manner. Overfitting, as you can see from the graph, is a symptom of model complexity: the more complicated our model gets, whether from too many features or too much noise in our data, the more variance it takes on, and as the model attempts to fit that variance, it overfits.
To compensate, we use a number of methods, collectively known as regularization. Regularization helps us lower the variance at the cost of some bias, shifting us closer to the center of the bias/variance plot. It typically works by applying a penalty term to the coefficients of our regression models so that they do not affect the outcome as much as they originally would.
There are two popular methods of regularization. The first is Ridge regression, and the second is known as Lasso. I’m not going to get into the math, but here’s how they work in essence:
Ridge (L2): Ridge applies a penalty to each predictor’s coefficient in order to reduce the effect that predictors have on the outcome. It shrinks the coefficients that make up our regression model, keeping the model from being too sensitive to any one predictor. It’s important to note that Ridge will reduce the size of our coefficients, but will never shrink them all the way to 0. This means Ridge will not kick out irrelevant features, but it will reduce their overall impact on the model.
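Just to make that tangible, here’s a toy sketch of my own (not Flatiron’s material) showing Ridge shrinking coefficients as the penalty strength grows. Note that scikit-learn calls the penalty strength alpha rather than lambda:

```python
# Toy example: only the first two of five predictors actually matter.
# Ridge shrinks every coefficient as alpha grows, but none reach zero.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=100)

for alpha in [0.01, 1.0, 100.0]:
    coefs = Ridge(alpha=alpha).fit(X, y).coef_
    print(f"alpha={alpha:>6}: {np.round(coefs, 3)}")
```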
Lasso (L1): Lasso is similar to Ridge, but instead of squaring the coefficients in its penalty term, it uses the absolute value of the coefficients, multiplied by lambda. I know, I got a bit mathy there. The most important thing to know about Lasso, however, is that it differs from Ridge in that it will reduce some coefficients all the way to 0, effectively removing them from our model. It performs well on higher-dimensional data where some predictors are useless.
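And here’s the same toy setup with Lasso (again, my own sketch): as alpha grows, the coefficients of the three useless predictors are driven to exactly zero, the feature-elimination behavior described above:

```python
# Same toy data as the Ridge sketch: five predictors, two of which matter.
# Lasso zeroes out the irrelevant coefficients as alpha grows.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=100)

for alpha in [0.01, 0.1, 1.0]:
    coefs = Lasso(alpha=alpha).fit(X, y).coef_
    print(f"alpha={alpha:>4}: {np.round(coefs, 3)}")
```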
The last three lessons were spent reviewing linear regression and logistic regression, and applying what we learned about the bias/variance tradeoff, Ridge, and Lasso to our previous projects to see whether they made our results better or worse. We were also introduced to the beginnings of machine learning evaluation: confusion matrices, evaluation metrics, ROC curves, AUC, and cross validation.
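To give a flavor of those evaluation tools (a sketch of my own, not an example from the lessons), scikit-learn makes each of them nearly a one-liner:

```python
# Toy binary classification problem to demo the evaluation tools:
# a confusion matrix, the area under the ROC curve, and cross validation.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import confusion_matrix, roc_auc_score

X, y = make_classification(n_samples=500, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

print(confusion_matrix(y_test, clf.predict(X_test)))           # counts of right/wrong calls per class
print(roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))  # area under the ROC curve
print(cross_val_score(clf, X, y, cv=5))                        # accuracy across 5 folds
```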
I’m going to end our little jaunt through the week here. I hope you found this week’s entry useful. Farewell and see you all next week!
Click the link below for Part 9!