My Experience With Flatiron School’s Immersive Data Science Program — Part 5

Hamilton Chang
Feb 13, 2020


Hello and welcome back to my Flatiron School experience.

We’re on week 4 of the program, and week 4 is all about Pandas and Probability. We started off the week learning how to unlock the full functionality of Pandas. This was a bit annoying, because we had been USING Pandas since the first week without really knowing how to do more with it. Our learn labs used it regularly to present data and derive statistics from it, with basic commands like df.head() and df.describe(), but those commands were offered in a vacuum. Useful, to be sure, but using them required some lookup to figure out the parameters. I recall once asking why df.head() only gave me 5 rows. I thought something was wrong with the data frame, as if I had accidentally deleted the data, and was then told that it returns the first 5 rows by default and that I can pass a number in the parentheses to change that. All this came before we were introduced to the concept of parameters, let alone methods. Hopefully the recent curriculum changes have resolved that particular issue.
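For anyone who hits the same df.head() confusion, here is a minimal sketch of the default and the parameter that changes it. The toy data frame and its column names are made up for illustration; they are not from the Flatiron labs.

```python
import pandas as pd

# Hypothetical toy data, only here to have more than 5 rows to look at.
df = pd.DataFrame({
    "player": [f"player_{i}" for i in range(8)],
    "points": list(range(8)),
})

first_five = df.head()     # n defaults to 5, so only 5 rows come back
first_three = df.head(3)   # pass n to change how many rows are returned
summary = df.describe()    # count, mean, std, min, quartiles, max for numeric columns

print(len(first_five), len(first_three), len(df))
```

The full data frame is untouched; head() only returns a view of the top rows, so nothing has been deleted.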

That week we dispensed with the standard project and instead did an in-depth assessment, where we were given a JSON file to convert into a dataframe. We were then asked to answer questions based on that dataframe. Our subject was the NBA, which was a bit difficult for me because I lack domain knowledge when it comes to basketball. Some of the abbreviations in the dataframe made sense to the basketball fans in my cohort but were completely incomprehensible to me, so I found myself asking the instructor what some of them meant. Who knew baskets were called Field Goals in basketball statistics? I didn’t! Some questions also used basketball terms, like “What is the probability that a randomly selected game had a player record a triple-double in that game?” I had to ask what a triple-double was, of course.
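To give a flavor of that kind of question, here is a rough sketch of loading JSON records into a dataframe and estimating the triple-double probability. The records, the column names (game_id, PTS, REB, AST, STL, BLK), and the four-row data set are all invented for illustration; the real assessment file is not shown here. It assumes the usual definition of a triple-double: double digits in at least three of the five counting stats.

```python
import json
import pandas as pd

# Hypothetical box-score records; the actual assessment data looked different.
raw = """[
  {"game_id": 1, "player": "A", "PTS": 25, "REB": 11, "AST": 10, "STL": 1, "BLK": 0},
  {"game_id": 1, "player": "B", "PTS": 12, "REB": 4,  "AST": 3,  "STL": 2, "BLK": 1},
  {"game_id": 2, "player": "A", "PTS": 30, "REB": 8,  "AST": 5,  "STL": 0, "BLK": 0},
  {"game_id": 2, "player": "C", "PTS": 9,  "REB": 12, "AST": 2,  "STL": 1, "BLK": 3}
]"""
df = pd.DataFrame(json.loads(raw))

# A player line is a triple-double when at least 3 of the 5 stats are >= 10.
stats = ["PTS", "REB", "AST", "STL", "BLK"]
df["triple_double"] = (df[stats] >= 10).sum(axis=1) >= 3

# A game counts if any player in it recorded a triple-double.
game_has_triple_double = df.groupby("game_id")["triple_double"].any()

# Share of games with at least one triple-double.
p_triple_double = game_has_triple_double.mean()
print(p_triple_double)
```

The key step is grouping by game before taking the mean, since the question is about games, not individual player lines.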

This assessment wasn’t that difficult, but I did cheat a bit by using the SQLDF library, which allowed me to use SQL queries to interrogate the dataframe. Good old buddy SQL helping me out again. Pandas queries are very similar to SQL queries, but they’re a bit stricter and more persnickety. I’ve since strengthened my Pandas queries, and fortunately they’re much easier to interpret when visualized.
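To show how close the two styles are, here is a small side-by-side sketch. It does not use the SQLDF library from the assessment; as a stand-in, it uses Python's built-in sqlite3 module with Pandas' to_sql and read_sql_query, so it runs without extra installs. The table name box_scores and the data are hypothetical.

```python
import sqlite3
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "team": ["NYK", "NYK", "BOS", "BOS"],
    "player": ["A", "B", "C", "D"],
    "PTS": [25, 12, 30, 9],
})

# SQL version: copy the dataframe into an in-memory SQLite table and query it.
conn = sqlite3.connect(":memory:")
df.to_sql("box_scores", conn, index=False)
sql_result = pd.read_sql_query(
    """
    SELECT team, SUM(PTS) AS total_pts
    FROM box_scores
    WHERE PTS >= 10
    GROUP BY team
    ORDER BY team
    """,
    conn,
)
conn.close()

# Pandas version: filter (WHERE), group and sum (GROUP BY), then sort (ORDER BY).
pandas_result = (
    df[df["PTS"] >= 10]
    .groupby("team", as_index=False)["PTS"].sum()
    .rename(columns={"PTS": "total_pts"})
    .sort_values("team")
    .reset_index(drop=True)
)

print(sql_result)
print(pandas_result)
```

Each SQL clause maps onto one Pandas step, which is roughly how I ended up translating my SQL habits over time.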

The other half of the week was spent learning probability. This was a bit tough for me, as I’d never done probability before. Learning new mathematical notation was frustrating, and even at this point, having gotten a solid grasp of probability, I still harbor resentment for it.

In any case, we learned basic probability concepts such as the probability mass function, the cumulative distribution function, and Bayes’ Theorem, and delved a bit into discrete random variables. I’ll be honest, this material was tough to grasp, and a lot of our class struggled with wrapping our heads around it. For me, when I was working through probability questions, the uncertainty in contemplating a bag of dice or the number of infected patients sometimes made me question my sanity. Fortunately, there was a lot of practical use. Learning math via a series of formulas is stressful and a bit abstruse, but putting the formulas to work made the concepts much easier to understand.
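Here is the kind of practical exercise that made it click for me, as a small sketch. It builds the probability mass function and cumulative distribution function for the sum of two fair dice, then applies Bayes’ Theorem to a testing scenario. The infection rate and test accuracy figures are invented numbers for illustration, not from any course material.

```python
from fractions import Fraction
from itertools import product

# PMF: probability of each possible sum of two fair six-sided dice.
pmf = {}
for a, b in product(range(1, 7), repeat=2):
    pmf[a + b] = pmf.get(a + b, Fraction(0)) + Fraction(1, 36)

# CDF: running total of the PMF, i.e. P(sum <= x).
cdf = {}
running = Fraction(0)
for total in sorted(pmf):
    running += pmf[total]
    cdf[total] = running

# Bayes' Theorem with made-up numbers:
# P(infected | positive) = P(positive | infected) * P(infected) / P(positive)
p_infected = Fraction(1, 100)             # assumed 1% of patients are infected
p_pos_given_infected = Fraction(95, 100)  # assumed true positive rate
p_pos_given_healthy = Fraction(5, 100)    # assumed false positive rate

# Law of total probability gives the overall chance of a positive test.
p_pos = (p_pos_given_infected * p_infected
         + p_pos_given_healthy * (1 - p_infected))
p_infected_given_pos = p_pos_given_infected * p_infected / p_pos

print(pmf[7], cdf[7], p_infected_given_pos)
```

With these assumed numbers, a positive test still leaves the patient more likely healthy than infected, which is exactly the kind of result that made me question my sanity.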

That pretty much wraps up week 4. We do a lot more probability and linear regression next week!

Click the link below for Part 6!

https://medium.com/@hammychang/my-experience-with-flatiron-schools-immersive-data-science-boot-camp-f4e914f21a53
