Some univariate statistics notation

Pearson’s Correlation Coefficient

Correlation: linear association (a specific kind of dependence) between the values of two variables \(X\) and \(Y\)

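Pearson’s way to measure correlation between variables \(X\) and \(Y\), denoted as \(\rho(X,Y)\), builds on the fact that independent variables satisfy

\(E[XY] = E[X]E[Y]\)

so the difference \(E[XY] - E[X]E[Y]\) (the covariance) is divided by the standard deviations \(\sigma_X\) and \(\sigma_Y\) to give a coefficient between -1 and 1: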

\(\rho(X,Y) = \frac{E[XY] - E[X]E[Y]}{\sigma_X\sigma_Y}\)

Some examples of Pearson’s Correlation Coefficient:

Independent variables have a correlation of exactly zero (and a sample correlation close to zero), but a correlation close to zero does not imply independence

Anscombe’s quartet

Anscombe’s quartet has four examples of scatter plots for two variables. All four cases have nearly identical means and standard deviations for both variables, and the same positive correlation coefficient to three decimal places: 0.816

The first case fits what we expect of linear correlation. The second shows a nonlinear relationship in which the upward trend turns downward. The third is a case where a single outlier lowers the correlation coefficient of an otherwise almost perfectly linear relationship. In the fourth, the correlation is produced entirely by a single outlier. Always look at scatter plots to know what your correlations mean!

The Datasaurus dozen

The Datasaurus dozen shows 12 (+1, the dinosaur-shaped original) datasets with the same means and standard deviations, to two decimal places, and the same correlation coefficient of -0.06:

Remember, not everything that has a correlation of zero is independent! There are many kinds of relationships between variables beyond linear ones.