class: center, middle, inverse, title-slide

.title[
# Measuring correlation
]
.author[
### David Garcia
ETH Zurich
]
.date[
### Social Data Science
]

---
layout: true

<div class="my-footer"><span>David Garcia - Social Data Science - ETH Zurich</span></div>

---

# Some univariate statistics notation

- `\(X\)` is a random variable
  - In data: `\(X_i\)` is the value of the variable for entry `\(i\)`
  - For example, the GDP of a country
- `\(E[X]\)` is the expected value of `\(X\)`
  - We estimate the expected value as the mean of `\(X\)`: `$$\mu_X = \frac{1}{N}\sum_i X_i$$`
- `\(N\)` is the number of data points, for example the number of countries

---

# Some more univariate statistics notation

- `\(V[X]\)` is the variance of `\(X\)`
  - We calculate it as the mean squared difference from the mean `\(\mu_X\)`: `$$V[X] = \frac{1}{N}\sum_i (X_i - \mu_X)^2$$`
  - It is measured in squared units of `\(X\)`
- `\(\sigma_X\)` is the standard deviation of `\(X\)`
  - `\(\sigma_X = \sqrt{V[X]}\)`, which is convenient because it measures dispersion in the same units as `\(X\)`
  - In R you can calculate it with the function `sd()`. Note that `sd()` divides by `\(N-1\)` instead of `\(N\)`, so it differs slightly from the formula above when `\(N\)` is small

---

## Pearson's Correlation Coefficient `\(\rho(X,Y)\)`

> **Correlation:** Linear association or dependence between the values of variables `\(X\)` and `\(Y\)`

- If `\(X\)` and `\(Y\)` are independent, the expectation of their product equals the product of their expectations: `$$E[XY] = E[X]E[Y]$$`
- The principle: correlation measures the deviation from `\(E[XY] - E[X]E[Y] = 0\)`
- The absolute value of this difference can be at most `\(\sigma_X\sigma_Y\)`
- `\(\rho(X,Y)\)` rescales the difference to lie between −1 and 1: `$$\rho(X,Y) = \frac{E[XY] - E[X]E[Y]}{\sigma_X\sigma_Y}$$`

---

### Some examples of Pearson's Correlation Coefficient

data:image/s3,"s3://crabby-images/2aeb2/2aeb299c04561cd52a98463a4b957cd9aec69fd1" alt=""

---

data:image/s3,"s3://crabby-images/23567/23567bd411adb47f82f5bd5370d2ca11bd54a18a" alt=""

Independent variables will have a correlation close to zero, but a correlation close to zero does not imply independence.

---

## Anscombe's quartet ( `\(\rho=0.816\)` )
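
The shared correlation can be checked with a minimal sketch in R. It assumes the `anscombe` data frame that ships with R's `datasets` package (columns `x1` to `x4` and `y1` to `y4`); the variable name `rho` is chosen here only for illustration.

```r
# Built-in Anscombe data: columns x1..x4 and y1..y4
data(anscombe)

# Pearson correlation of each (x, y) pair in the quartet
rho <- sapply(1:4, function(k) {
  cor(anscombe[[paste0("x", k)]], anscombe[[paste0("y", k)]])
})

# All four values are nearly identical, even though the scatterplots differ
print(round(rho, 2))
```

The four datasets look very different when plotted, which is why the coefficient alone should not replace a look at the data.
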
.center[data:image/s3,"s3://crabby-images/598b9/598b9f5b8914f22cfcad03537d3243ac17287f98" alt=":scale 65%"] --- ## The Datasaurus dozen .center[data:image/s3,"s3://crabby-images/18b92/18b92a267506c3d86a19946f30f8bd0aed30ebbc" alt=":scale 68%"]
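
---

## Checking the definition in R

The definition of `\(\rho(X,Y)\)` from the earlier slide can be checked numerically. The sketch below is only an illustration: the data are simulated, and `pop_sd`, `rho_manual` and `rho_builtin` are names made up for this example. It uses the `\(1/N\)` formulas from these slides and compares the result with R's built-in `cor()`.

```r
# Hypothetical toy data, simulated only for illustration
set.seed(1)
x <- rnorm(100)
y <- 0.5 * x + rnorm(100)

# Standard deviation with the 1/N denominator used in these slides
pop_sd <- function(v) sqrt(mean((v - mean(v))^2))

# rho(X,Y) = (E[XY] - E[X]E[Y]) / (sigma_X * sigma_Y)
rho_manual <- (mean(x * y) - mean(x) * mean(y)) / (pop_sd(x) * pop_sd(y))

# Built-in Pearson correlation for comparison
rho_builtin <- cor(x, y)

# The two agree up to floating point error
print(all.equal(rho_manual, rho_builtin))

# sd() divides by N - 1, so it needs rescaling to match pop_sd()
n <- length(x)
print(all.equal(pop_sd(x), sd(x) * sqrt((n - 1) / n)))
```

The `\(N\)` versus `\(N-1\)` choice cancels in the ratio, so `cor()` matches the manual calculation even though `sd()` on its own does not match `pop_sd()`.
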