My Experience with Flatiron School’s Immersive Data Science Bootcamp — Part 11

Naïve Bayes and Machine Learning

Hamilton Chang
5 min read · May 14, 2020

Welcome all to another round of me telling you about my Data Science bootcamp experience. Last week we went into covering the basics of machine learning and how they work. This week, we’ll wrap up with a final machine learning algorithm and then dive into the machine learning project we embarked upon.

The last machine learning algorithm we learned about was Naïve Bayes. Naïve Bayes is an extension of Bayes’ Theorem, and is a popular model in the field of Natural Language Processing. The essence of Bayes’ Theorem is that we can describe the probability of an event based on observations or knowledge related to that event. Take its most famous application, sorting spam email: the word “offer” appears in some fraction of our email, and some fraction of our email is spam. Bayes’ Theorem lets us determine the probability that a given email containing the word “offer” is spam. Here’s what that looks like in mathematical terms:

P(A|B) = P(B|A) * P(A) / P(B)

P(A|B) — What we are solving for: the probability of A given B, or, in the hypothetical we discussed earlier, the probability that an email is spam given that the word “offer” is in it.

P(B|A) — The Probability of B given A, or as before, the probability of the word “offer” given that the email is spam.

P(A) — The Probability of receiving an email that is spam.

P(B) — The Probability of receiving an email with the word offer. This is further broken down into subsets of spam and not spam to encompass all instances of the word “offer”:

P(Offer) = P(Spam)*P(Offer|Spam)+P(Not Spam)*P(Offer|Not Spam)

This can all be rearranged algebraically to solve for any one term, and once you have the known quantities the rest follows easily. For example, if we knew that the word “offer” occurred in 70% of spam messages, that 20% of non-spam emails included the word “offer”, and that 30% of your emails ARE spam, we could compute the probability that an email containing “offer” is spam.

P(A) = .30

P(B) = P(Spam)* P(Offer|Spam) + P(Not Spam) * P(Offer|Not Spam) =(.30 * .70) + (.70 * .20) = .21 + .14 = .35

P(B|A) = .70

Therefore:

P(A|B) = .70*.3/.35

P(A|B) = .60

The probability that an email containing the word “offer” is spam is thus 60%.
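Plugging the numbers above into Python confirms the arithmetic (a quick sketch, nothing more):

```python
# Worked example from above: P(Spam | "offer") via Bayes' Theorem.
# All numbers are the hypothetical rates from the text.
p_spam = 0.30                 # P(A): prior probability an email is spam
p_offer_given_spam = 0.70     # P(B|A): "offer" appears in 70% of spam
p_offer_given_ham = 0.20      # P(B|not A): "offer" appears in 20% of non-spam

# Total probability of seeing "offer" in any email:
p_offer = p_spam * p_offer_given_spam + (1 - p_spam) * p_offer_given_ham

# Bayes' Theorem:
p_spam_given_offer = p_offer_given_spam * p_spam / p_offer

print(round(p_offer, 2))             # 0.35
print(round(p_spam_given_offer, 2))  # 0.6
```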

Naïve Bayes follows Bayes’ Theorem in structure, but the formula becomes more complicated. The essence of Naïve Bayes is that instead of conditioning on a single variable, it extends to multiple variables, assuming each one is independent of the others. In practice these features are rarely truly independent, which is why the model is called “naïve.” The formula, for those interested, looks something like this:

P(A|B1, B2, …, Bn) = P(A) * P(B1|A) * P(B2|A) * … * P(Bn|A) / P(B1, B2, …, Bn)
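To make the independence assumption concrete, here is a minimal pure-Python sketch extending the spam example to several words. The per-word probabilities are made-up illustrative values, not real data:

```python
from math import prod

# Hypothetical per-word likelihoods (illustrative values only).
p_word_given_spam = {"offer": 0.70, "free": 0.50, "meeting": 0.05}
p_word_given_ham = {"offer": 0.20, "free": 0.10, "meeting": 0.40}
p_spam = 0.30  # prior probability an email is spam

def naive_bayes_spam_prob(words):
    # "Naively" multiply the per-word likelihoods as if independent.
    like_spam = p_spam * prod(p_word_given_spam[w] for w in words)
    like_ham = (1 - p_spam) * prod(p_word_given_ham[w] for w in words)
    # Normalize so the two hypotheses sum to 1.
    return like_spam / (like_spam + like_ham)

print(round(naive_bayes_spam_prob(["offer"]), 2))      # 0.6, the single-word case
print(naive_bayes_spam_prob(["offer", "free"]) > 0.6)  # True: more evidence, more spammy
```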

After covering Naïve Bayes and its potential distributions, we moved on to creating our own machine learning projects. This particular project was to be our first solo project. Prior to this, our projects were entirely group based, but as we were shortly to embark on our final projects, Flatiron wanted us to get some experience working on our own. This included generating our own idea, gathering our own data, and experimenting with our models on our own.

I chose to do my project on classifying whiskies by national origin based on their flavor profile. I have significant experience with whiskey, and while I don’t have a refined enough palate to be able to discern brand or region, I have enough experience to be able to tell if I’m drinking a Scotch, Bourbon or Irish Whiskey.

The problem with my project was finding data. I had initially attempted to scrape a dozen or so whiskey review sites and gather tasting notes that way. The problem is that whiskey is a complicated beverage. While there are certainly obvious standout traits to each whiskey, like strong peat, caramel notes, or honey, there are also a myriad of other factors that can make tasting notes complicated. Flavors such as cheese, or beef, or plastic were also thrown into the mix, complicating my variables. If I took them all, I would essentially have every single possible flavor in the world. Whiskey flavor profiles are so diverse, one could walk into Grand Central Station, lick everything and everyone in there, and all would be valid flavors for whiskey.

Fortunately, I was able to find help in the website WhiskyAnalysis.com. Its author had done the work of scraping tasting notes from a handful of whiskey review sites. Like me, he was confronted with millions of possible whiskey flavors. To simplify all this information, he decided to run a Principal Component Analysis.

For those who don’t know, Principal Component Analysis is a statistical technique for reducing the dimensionality of large data sets. Without diving into terminology like eigenvectors and eigenvalues, what it does is try to establish relationships between similar values, and thus it simplifies and reclassifies our data along those lines. For whiskey, with the help of Dr. David Wishart of the University of St Andrews, the author of the page was able to reduce the myriad flavors into a select few categories, as seen below:

source: https://whiskyanalysis.com/index.php/methodology-introduction/methodology-flavour-comparison/

Using these clusters, we are better able to handle the millions of variables that encompass the flavors of whiskey.
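As a rough illustration of what PCA does (this is not the site’s actual pipeline), here is a minimal NumPy sketch that projects made-up flavor scores onto their top two principal components:

```python
import numpy as np

# Made-up flavor scores for six whiskies across four flavor notes
# (rows = whiskies; columns might be peat, caramel, honey, fruit).
X = np.array([
    [8.0, 1.0, 1.0, 2.0],
    [7.0, 2.0, 1.0, 1.0],
    [1.0, 7.0, 6.0, 2.0],
    [2.0, 8.0, 7.0, 1.0],
    [1.0, 2.0, 2.0, 8.0],
    [2.0, 1.0, 3.0, 7.0],
])

# Center the data, then take the SVD; the rows of Vt are the
# principal components (directions of maximum variance).
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Project onto the top 2 components: 4 flavor dimensions -> 2.
X_reduced = Xc @ Vt[:2].T
print(X_reduced.shape)  # (6, 2)
```

Each whiskey is now described by two numbers instead of four, while the directions chosen preserve as much of the original variance as possible.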

From there, it was a matter of testing different machine learning models and tuning their parameters to find the one that best classifies the whiskies. More details of that particular project can be found here. One method I really wanted to try was SVM with the polynomial kernel; alas, my poor laptop couldn’t handle running it. I may give it a shot again sometime in the near future using Google Colab to see if I might get a more accurate result.
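That model-testing loop can be sketched with scikit-learn. This is not my actual project code: the flavor features and the two whiskey classes below are synthetic, generated just to show the pattern of fitting and comparing several models on the same split:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Synthetic flavor-cluster scores: Scotch skews peaty, Bourbon skews
# sweet (illustrative means only, not real tasting data).
n = 60
scotch = rng.normal([8.0, 2.0, 3.0], 1.0, size=(n, 3))
bourbon = rng.normal([2.0, 8.0, 4.0], 1.0, size=(n, 3))
X = np.vstack([scotch, bourbon])
y = np.array(["scotch"] * n + ["bourbon"] * n)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit each candidate model on the same split and compare accuracy.
results = {}
for model in (GaussianNB(), RandomForestClassifier(random_state=0)):
    model.fit(X_tr, y_tr)
    results[type(model).__name__] = accuracy_score(y_te, model.predict(X_te))

for name, acc in results.items():
    print(name, round(acc, 2))
```

On real tasting-note data the accuracies would be far less tidy, and each model would also get a round of hyperparameter tuning before comparison.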

That’s all for this week, I hope you found this helpful. I’ll see you all next week with something new I’m sure!

Click the link below for Part 12!

https://medium.com/@hammychang/my-experience-with-flatiron-schools-immersive-data-science-bootcamp-b9ae9a33480b
