Probability and Statistics


We will be following Seeing Theory, a fantastic tool created by students at Brown University.

  • Introduction to Probability
  • Compound Probability
  • Probability Distributions
  • Frequentist Inference
  • Bayesian Inference
  • Regression Analysis

Introduction to Probability

Chance Events

🐑 You flip a coin twice. Assume the probability that the coin lands on heads is $p$. What is the probability of getting at least one head?
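A quick Monte Carlo sanity check for your answer (a sketch; the closed form to compare against is $1 - (1-p)^2$, which is $0.75$ for a fair coin):

```python
import random

def p_at_least_one_head(p, trials=100_000, seed=0):
    """Estimate P(at least one head in two flips) by simulation."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        first = rng.random() < p   # True with probability p
        second = rng.random() < p
        if first or second:
            hits += 1
    return hits / trials

# Closed form: 1 - (1 - p)**2, so 0.75 when p = 0.5.
p_at_least_one_head(0.5)
```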

What is the mathematical definition of independence?

$$ P(A \cap B) = P(A)P(B) $$


🐑 Using the definition of expectation, calculate the expectation of a single coin flip.

$$ E(X) = \sum_{x \in X(\Omega)} xP(X=x) $$


🐑 Compute the variance of a die roll, i.e. a uniform random variable over the sample space $\Omega = \{1, 2, 3, 4, 5, 6\}$.
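A numerical check for this exercise, using $\mathrm{Var}(X) = E[(X - E[X])^2]$ with each face having probability $1/6$:

```python
from fractions import Fraction

omega = [1, 2, 3, 4, 5, 6]
p = Fraction(1, 6)                              # uniform: each face has probability 1/6

mean = sum(x * p for x in omega)                # E[X] = 7/2
var = sum((x - mean) ** 2 * p for x in omega)   # Var(X) = E[(X - E[X])^2] = 35/12
```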

Compound Probability

Set Theory

🐑 Prove that $(A\cup B)^c = A^c \cap B^c$


🐑 In Poker, here are examples of possible hands:

  1. Royal Flush: A, K, Q, J, 10 all in the same suit.
  2. Straight Flush: Five cards in a sequence, all in the same suit.
  3. Four of a Kind: Four cards of the same rank.
  4. Full House: 3 of a kind with a pair.

Calculate the probabilities of the above hands.
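The counting arguments can be checked with `math.comb`. One convention assumed here: "straight flush" excludes the four royal flushes, which are counted separately.

```python
from math import comb

total = comb(52, 5)                              # 2,598,960 five-card hands

royal_flush = 4                                  # one per suit
straight_flush = 4 * 10 - 4                      # 10 high-card ranks per suit, minus royals
four_of_a_kind = 13 * comb(4, 4) * 48            # pick the quad's rank, then any 5th card
full_house = 13 * comb(4, 3) * 12 * comb(4, 2)   # trip rank and suits, then pair rank and suits

probs = {
    "royal flush": royal_flush / total,
    "straight flush": straight_flush / total,
    "four of a kind": four_of_a_kind / total,
    "full house": full_house / total,
}
```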

Conditional Probability and Bayes Rule

🐑 What is the definition of conditional probability?

🐑 Using the definition of conditional probability, prove Bayes Rule:

$$ P(A \mid B) = \dfrac{P(B \mid A) P(A)}{P(B)} $$

🐑 You have two coins in a bag: a biased coin and a fair coin. The biased coin lands on heads with probability 0.95. Define the following events:

$ A = \{ \text{Picking the biased coin} \}$

$ B = \{ \text{Flipping 3 heads out of 3 total flips} \}$

Compute $P(A \mid B)$ using Bayes Rule.
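A numerical check of Bayes Rule for this problem, assuming each coin is equally likely to be drawn and the fair coin lands heads with probability 0.5:

```python
p_A = 0.5                    # each coin drawn with equal probability (assumption)
p_B_given_A = 0.95 ** 3      # biased coin: three heads in three flips
p_B_given_notA = 0.5 ** 3    # fair coin: three heads in three flips

# Law of total probability, then Bayes Rule.
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)
p_A_given_B = p_B_given_A * p_A / p_B
```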

Probability Distributions

Random Variables

Discrete vs. continuous (countable vs. uncountable)

Central Limit Theorem

🐑 We load on a plane 100 packages whose weights are independent random variables that are uniformly distributed between 5 and 50 kilograms. What is the probability that the total weight will exceed 3000 kilograms?
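By the CLT, the total weight is approximately normal with mean $100 \times 27.5 = 2750$ and variance $100 \times 45^2/12 = 16875$. A sketch of the approximation using only the standard library:

```python
from statistics import NormalDist

n = 100
mean_w = (5 + 50) / 2            # Uniform(5, 50): mean 27.5
var_w = (50 - 5) ** 2 / 12       # Uniform(5, 50): variance 168.75

# CLT: the sum of the n weights is approximately N(n * mean_w, n * var_w).
total = NormalDist(mu=n * mean_w, sigma=(n * var_w) ** 0.5)
p_exceed = 1 - total.cdf(3000)   # roughly 0.027
```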

Frequentist Inference

Point Estimation

How could we estimate $\pi$?
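One classic point-estimation approach is Monte Carlo: the fraction of uniform points in the unit square that land inside the quarter circle estimates $\pi/4$. A minimal sketch:

```python
import random

def estimate_pi(n=1_000_000, seed=0):
    """Monte Carlo estimate of pi from points in the unit square."""
    rng = random.Random(seed)
    inside = sum(
        1 for _ in range(n)
        if rng.random() ** 2 + rng.random() ** 2 <= 1  # point falls in the quarter circle
    )
    return 4 * inside / n
```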

Confidence interval

Why is the 95% confidence interval commonly used?

The Bootstrap

Where do you often see bootstrapping used in econometrics?
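A minimal percentile-bootstrap sketch (the resample count and percentile method are implementation choices here, not the only way to build a bootstrap interval):

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.mean, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic."""
    rng = random.Random(seed)
    # Resample the data with replacement and recompute the statistic each time.
    reps = sorted(stat(rng.choices(data, k=len(data))) for _ in range(n_boot))
    lo = reps[int(n_boot * alpha / 2)]
    hi = reps[int(n_boot * (1 - alpha / 2))]
    return lo, hi
```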

Bayesian Inference

Bayes Theorem

$ P(A \mid B) = \dfrac{P(B \mid A) P(A)}{P(B)} $

Terms: prior, likelihood, posterior

Example exercises

Likelihood Function

How does the likelihood function change as we choose larger samples?

Regression Analysis

Ordinary Least Squares

Graph your data! Check out the Datasaurus Dozen, inspired by Anscombe's Quartet.

Derivation of matrix form of OLS.

We can also fit OLS by hand.

You have data on the grades of 10 students in primary school and high school. You would like to estimate the relationship between the grades. Assume that high school grades are linearly related to primary school grades with some idiosyncratic error, $\epsilon$.

Primary school: 5, 2, 3, 4, 8, 9, 10, 8, 5, 6

High school: 6, 4, 3, 4, 6, 7, 8, 9, 3, 5

The model is High = $\alpha$ + $\beta$ Primary + $\epsilon$

Estimate the value of $\alpha$ and $\beta$ by hand.
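With a single regressor, the OLS estimates reduce to $\hat\beta = S_{xy}/S_{xx}$ and $\hat\alpha = \bar y - \hat\beta \bar x$. Plugging in the data above:

```python
primary = [5, 2, 3, 4, 8, 9, 10, 8, 5, 6]
high = [6, 4, 3, 4, 6, 7, 8, 9, 3, 5]

n = len(primary)
x_bar = sum(primary) / n                                              # 6.0
y_bar = sum(high) / n                                                 # 5.5

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(primary, high))  # 41.0
s_xx = sum((x - x_bar) ** 2 for x in primary)                         # 64.0

beta = s_xy / s_xx               # 0.640625
alpha = y_bar - beta * x_bar     # 1.65625
```

These match the regression output below (coefficients 1.6562 and 0.6406).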


  • Covariance
  • Correlation coefficient

🐑 Argue that for given random variables X and Y, the correlation lies between −1 and 1.

Correlation does not imply causation. See this website for examples.

🐑 Show that independence implies zero correlation, but that zero correlation does not imply independence.
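A standard counterexample for the second claim: take $X$ symmetric about 0 and $Y = X^2$. $Y$ is a deterministic function of $X$, so they are clearly not independent, yet the covariance vanishes by symmetry. A quick check:

```python
# X symmetric about 0; Y = X^2 is fully determined by X, yet Cov(X, Y) = 0.
x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]

n = len(x)
mx = sum(x) / n
my = sum(y) / n
cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n  # 0.0
```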

```python
import seaborn as sns
import statsmodels.formula.api as smf

# grades is a DataFrame holding the "primary" and "high" columns above
sns.regplot(x="primary", y="high", data=grades)
results = smf.ols('high ~ primary', data=grades).fit()
results.summary()
```
```
                            OLS Regression Results
==============================================================================
Dep. Variable:                   high   R-squared:                       0.682
Model:                            OLS   Adj. R-squared:                  0.643
Method:                 Least Squares   F-statistic:                     17.17
Date:                Sun, 09 Sep 2018   Prob (F-statistic):            0.00323
Time:                        18:12:14   Log-Likelihood:                -15.198
No. Observations:                  10   AIC:                             34.40
Df Residuals:                       8   BIC:                             35.00
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      1.6562      1.007      1.645      0.138      -0.665       3.977
primary        0.6406      0.155      4.144      0.003       0.284       0.997
==============================================================================
Omnibus:                        0.901   Durbin-Watson:                   2.214
Prob(Omnibus):                  0.637   Jarque-Bera (JB):                0.411
Skew:                           0.465   Prob(JB):                        0.814
Kurtosis:                       2.655   Cond. No.                        17.1
==============================================================================
```