Normal Distributions

Hamilton Chang
Jul 16, 2020

Hello everyone, I hope you’re all well. Today we’re going to be talking about normal distributions, their purpose, and what they mean. Distributions in general are an important component of statistics, and their analysis is frequently used in Data Science, as well as in areas such as finance, engineering, and medicine. Basically, any field that works with numbers taps into statistics and uses distributions to some extent.

Let’s start from the beginning then. What’s a normal distribution?

That’s a normal distribution, whoooo so fancy right? It is also known as a Gaussian distribution or a bell curve. In more precise terms, a normal distribution is a distribution of data or probabilities that is centered on the mean: values occur most frequently near the mean and taper off the further they are from it. Normal distributions are symmetrical, though not all symmetrical distributions are normal.

We often find normal distributions when we collect large amounts of data, especially about populations. Here’s a fun example:

As you can see, this is the distribution of heights in a particular college class, and it is approximately normally distributed. Many things in nature follow an approximately normal distribution, such as height, blood pressure, and finger length, and IQ scores are scaled to follow one by design. Not everything does, though: income, for example, is usually strongly skewed to the right. So gathering a bigger sample won’t turn every measurement into a bell curve, but it will give you a clearer picture of whatever distribution is really there.

The example above also brings up an important result associated with the normal distribution, known as the Central Limit Theorem. The Central Limit Theorem states that if you repeatedly draw samples and take the mean of each one, the distribution of those sample means gets closer to a normal distribution as the sample size increases, whatever the shape of the underlying data (as long as it has a finite variance). A related but separate idea is at work in the picture above. If you had a sample of only two people, one from the shortest and one from the tallest, you wouldn’t see much of a distribution at all, would you? But as you increase your sample size, people of “average” height slowly fill in the middle until the sample looks like the population it came from. That second effect is closely tied to the law of large numbers!
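To see the Central Limit Theorem at work, here is a minimal Python sketch. It assumes a deliberately lopsided population (an exponential distribution, which is my choice for illustration and not part of the height example), draws many samples from it, and measures how asymmetric the distribution of the sample means is. The helper names `sample_means` and `skewness` are made up for this sketch.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable


def sample_means(sample_size, n_samples=2000):
    """Draw n_samples samples from a skewed (exponential) population
    and return the mean of each sample."""
    return [
        statistics.fmean(random.expovariate(1.0) for _ in range(sample_size))
        for _ in range(n_samples)
    ]


def skewness(values):
    """Population skewness: close to 0 for a symmetric distribution."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return statistics.fmean(((v - mu) / sigma) ** 3 for v in values)


# As the sample size grows, the sample means should look more symmetric,
# even though the underlying population is heavily skewed.
for n in (1, 5, 30, 200):
    means = sample_means(n)
    print(f"sample size {n:>3}: skewness of sample means = {skewness(means):.2f}")
```

If the theorem holds up, the printed skewness should shrink toward 0 as the sample size grows, meaning the sample means look more and more like a bell curve.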

Now the other cool thing about normal distributions is that, using the standard deviation, we can determine approximately what percentage of our data will fall into each band along the distribution!

Each number along the x-axis represents a standard deviation. So if we read this correctly, in a normal distribution approximately 34% of all our data will fall within 1 standard deviation above the mean, and 34% will fall within 1 standard deviation below the mean. About 13.5% of the data will fall between 1 and 2 standard deviations above the mean, and the same is true between 1 and 2 standard deviations below the mean! Put together, roughly 68% of the data lies within 1 standard deviation of the mean and roughly 95% within 2.
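You don’t have to take these percentages on faith. Here is a small sketch that computes them from the standard normal distribution (mean 0, standard deviation 1) using `NormalDist` from Python’s standard library (Python 3.8 or later). The helper `share_between` is a name I made up for this example.

```python
from statistics import NormalDist

# Standard normal: mean 0, standard deviation 1, so x values are
# already measured in standard deviations from the mean.
standard_normal = NormalDist(mu=0, sigma=1)


def share_between(low, high):
    """Fraction of a normal distribution lying between low and high,
    both expressed in standard deviations from the mean."""
    return standard_normal.cdf(high) - standard_normal.cdf(low)


print(f"mean to +1 sd:  {share_between(0, 1):.1%}")
print(f"+1 sd to +2 sd: {share_between(1, 2):.1%}")
print(f"within 1 sd:    {share_between(-1, 1):.1%}")
print(f"within 2 sd:    {share_between(-2, 2):.1%}")
print(f"within 3 sd:    {share_between(-3, 3):.1%}")
```

Because the distribution is symmetric, the bands below the mean hold the same shares as their mirror images above it.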

Other things to know

There are a few other technical terms to know when we talk about normal distributions, and they actually apply to distributions in general:

Skew: Skewness measures the asymmetry of a distribution. A normal distribution has a skewness of 0, since the data is laid out symmetrically around the mean. However, real world data is seldom perfect, so we will often see a distribution skewed left (negative skew, with a longer tail below the mean) or right (positive skew, with a longer tail above the mean).

Kurtosis: Kurtosis is often described as the “fatness of tails.” It measures how heavy or light the tails of our distribution are, specifically as compared to the tails of a normal distribution. Below is a helpful diagram of the types of kurtosis you may see:

source: https://itfeature.com/statistics/measure-of-dispersion/measure-of-kurtosis
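Both measures can be computed straight from the data. The sketch below is one common way to do it (the moment-based formulas, with kurtosis reported as “excess” kurtosis so that a normal distribution scores about 0). The two simulated datasets and the function names are assumptions made for illustration only.

```python
import random
import statistics


def skewness(values):
    """Moment-based skewness: negative = left skew, positive = right skew."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return statistics.fmean(((v - mu) / sigma) ** 3 for v in values)


def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal distribution scores ~0.
    Positive = fatter tails than normal, negative = thinner tails."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return statistics.fmean(((v - mu) / sigma) ** 4 for v in values) - 3


random.seed(0)  # fixed seed so the sketch is repeatable
normal_like = [random.gauss(0, 1) for _ in range(50_000)]
right_skewed = [random.expovariate(1.0) for _ in range(50_000)]

for name, data in (("normal-like", normal_like), ("right-skewed", right_skewed)):
    print(f"{name:>12}: skewness = {skewness(data):.2f}, "
          f"excess kurtosis = {excess_kurtosis(data):.2f}")
```

The normal-like data should score close to 0 on both measures, while the right-skewed data should show a clearly positive skewness and fatter tails.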

Normal Distribution and Data Science

So how do we use normal distributions in data science? Well, there are a couple of ways. The first, as we saw in the previous examples, is in analyzing our data. Seeing a normal distribution in our data is somewhat gratifying. If our data were skewed instead, we might have to question our sources, and should they all pan out, the skew itself tells us something about the nature of our data.

The other thing we use normal distributions for is statistical and probability analysis. Using normal distributions, analysts can test things like poll data to determine whether a candidate is likely to win an election. They do so with statistical tests that use z-scores and p-values to estimate how likely a result would be if it were due to chance alone. Having fitted a normal distribution to our data, we can also use it to estimate the probability of a new observation falling within a given range and how far from the mean it is likely to be.
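As a rough illustration of the poll example, here is a sketch of a one-sided z-test for a proportion. The poll numbers (530 supporters out of 1,000 respondents) are invented, and `poll_z_test` is a hypothetical helper, not a function from any library. It asks: if the candidate’s true support were exactly 50%, how surprising would this poll be?

```python
from math import sqrt
from statistics import NormalDist


def poll_z_test(supporters, respondents, null_share=0.5):
    """One-sided z-test: is the observed support above null_share?

    Returns the z-score and the p-value, i.e. the probability of seeing
    a result at least this high if the true share were null_share.
    Relies on the normal approximation, so it assumes a reasonably
    large random sample.
    """
    observed_share = supporters / respondents
    # Standard error of the sample share under the null hypothesis.
    standard_error = sqrt(null_share * (1 - null_share) / respondents)
    z = (observed_share - null_share) / standard_error
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


# Hypothetical poll: 530 of 1,000 respondents back the candidate.
z, p_value = poll_z_test(supporters=530, respondents=1000)
print(f"z-score = {z:.2f}, one-sided p-value = {p_value:.3f}")
```

A small p-value suggests the lead is unlikely to be a fluke of sampling, though a real election forecast would also need to account for things like polling bias and turnout, which this sketch ignores.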

This is just the surface of what you can use normal distributions for, but I’m going to end here and pick it up in more detail on my next blog!

Hamilton Chang

Data Scientist, Financial Planner. Trying to educate and make information accessible to EVERYONE. Let’s Connect! shorturl.at/aBGY5