# Probability and Statistics¶

## Outline¶

We will be following Seeing Theory, a fantastic tool created by students at Brown University.

• Introduction to Probability
• Compound Probability
• Probability Distributions
• Frequentist Inference
• Bayesian Inference
• Regression Analysis

## Introduction to Probability¶

### Chance Events¶

🐑 You flip a coin twice. Assume the probability that the coin lands on heads is $p$. What is the probability of getting at least one heads?
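The complement trick answers this directly: $P(\text{at least one heads}) = 1 - (1-p)^2$. A minimal sketch (using an illustrative value $p = 0.6$, not given in the exercise) that checks the closed form against brute-force enumeration of the four outcomes:

```python
p = 0.6  # illustrative heads probability, not specified in the exercise

# Enumerate the four outcomes of two independent flips and sum the
# probabilities of those containing at least one heads.
outcomes = [(a, b) for a in ("H", "T") for b in ("H", "T")]
prob = lambda s: p if s == "H" else 1 - p
at_least_one = sum(prob(a) * prob(b) for a, b in outcomes if "H" in (a, b))

print(at_least_one)        # 0.84
print(1 - (1 - p) ** 2)    # 0.84, the closed form
```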

What is the mathematical definition of independence?

$$P(A \cap B) = P(A)P(B)$$

### Expectation¶

🐑 Using the definition of expectation, calculate the expectation of a single coin flip.

$$E(X) = \sum_{x \in X(\Omega)} xP(X=x)$$
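Applied to a single flip ($X = 1$ for heads with probability $p$, $X = 0$ for tails), the definition gives $E(X) = 1 \cdot p + 0 \cdot (1 - p) = p$. A minimal check, again with an illustrative $p = 0.6$:

```python
p = 0.6  # illustrative heads probability

# E[X] = sum over the support of x * P(X = x)
support = {1: p, 0: 1 - p}
ex = sum(x * prob for x, prob in support.items())

print(ex)   # 0.6, i.e. E[X] = p
```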

### Variance¶

🐑 Compute the variance of a die roll, i.e. a uniform random variable over the sample space $\Omega = \{1, 2, 3, 4, 5, 6\}$.
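Using $\operatorname{Var}(X) = E(X^2) - E(X)^2$, the die roll gives $E(X) = 7/2$, $E(X^2) = 91/6$, and so $\operatorname{Var}(X) = 35/12 \approx 2.92$. A sketch with exact rational arithmetic:

```python
from fractions import Fraction

# Fair die: uniform over Omega = {1, ..., 6}
omega = range(1, 7)
p = Fraction(1, 6)                    # probability of each face

ex = sum(x * p for x in omega)        # E[X]   = 7/2
ex2 = sum(x * x * p for x in omega)   # E[X^2] = 91/6
var = ex2 - ex ** 2                   # Var(X) = 35/12

print(ex, ex2, var)                   # 7/2 91/6 35/12
```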

## Compound Probability¶

### Set Theory¶

🐑 Prove that $(A\cup B)^c = A^c \cap B^c$
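One standard approach is an element-wise argument; a sketch of the chain of equivalences:

```latex
% De Morgan's law, proved element-wise:
x \in (A \cup B)^c
  \iff x \notin A \cup B
  \iff (x \notin A) \wedge (x \notin B)
  \iff (x \in A^c) \wedge (x \in B^c)
  \iff x \in A^c \cap B^c
```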

### Counting¶

🐑 In Poker, here are examples of possible hands:

1. Royal Flush: A, K, Q, J, 10, all in the same suit.
2. Straight Flush: five cards in sequence, all in the same suit.
3. Four of a Kind: four cards of the same rank.
4. Full House: three of a kind together with a pair.

Calculate the probabilities of the above hands.
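The counts below follow the usual conventions (an ace may be high or low in a straight, and "straight flush" excludes the four royal flushes so the categories don't overlap). A sketch of the counting arguments:

```python
from math import comb

total = comb(52, 5)                      # 2,598,960 five-card hands

royal_flush = 4                          # one per suit
straight_flush = 4 * 10 - 4              # 10 sequences per suit, minus the royals
four_of_a_kind = 13 * 48                 # rank of the quad, then any fifth card
full_house = 13 * comb(4, 3) * 12 * comb(4, 2)   # triple rank, then pair rank

for name, count in [("royal flush", royal_flush),
                    ("straight flush", straight_flush),
                    ("four of a kind", four_of_a_kind),
                    ("full house", full_house)]:
    print(f"{name}: {count}/{total} = {count / total:.7f}")
```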

### Conditional Probability and Bayes Rule¶

🐑 What is the definition of conditional probability?

🐑 Using the definition of conditional probability, prove Bayes Rule:

$$P(A \mid B) = \dfrac{P(B \mid A) P(A)}{P(B)}$$
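The proof is two applications of the definition of conditional probability; a sketch (assuming $P(A) > 0$ and $P(B) > 0$):

```latex
% Definition applied twice:
P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad
P(B \mid A) = \frac{P(A \cap B)}{P(A)}
% Solve the second for P(A \cap B) and substitute into the first:
P(A \cap B) = P(B \mid A)\,P(A)
\;\Longrightarrow\;
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}
```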

🐑 You have two coins in a bag: a biased coin and a fair coin. The biased coin lands on heads with probability 0.95. Define the following events:

$A = \{ \text{Picking the biased coin} \}$

$B = \{ \text{Flipping 3 heads out of 3 total flips} \}$

Compute $P(A \mid B)$ using Bayes Rule.
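A numerical sketch, assuming the coin is drawn uniformly at random (so the prior is $P(A) = 1/2$; the exercise does not state this explicitly) and that flips are independent given the coin:

```python
# Bayes rule for the two-coin problem.
p_a = 0.5                        # prior: assumed uniform draw from the bag
p_b_given_a = 0.95 ** 3          # biased coin: three heads in three flips
p_b_given_not_a = 0.5 ** 3       # fair coin: three heads in three flips

# Law of total probability for the denominator
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
p_a_given_b = p_b_given_a * p_a / p_b

print(round(p_a_given_b, 4))     # 0.8728
```

Three heads shift the posterior from the 0.5 prior to roughly 0.87 in favor of the biased coin.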

## Probability Distributions¶

### Random Variables¶

Discrete vs. continuous (countable vs. uncountable sample spaces)

### Central Limit Theorem¶

🐑 We load on a plane 100 packages whose weights are independent random variables that are uniformly distributed between 5 and 50 kilograms. What is the probability that the total weight will exceed 3000 kilograms?
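Each weight is Uniform(5, 50), so it has mean 27.5 and variance $(50-5)^2/12 = 168.75$. The CLT approximates the total by a normal with mean 2750 and standard deviation $\sqrt{16875} \approx 129.9$; a sketch using the error function for the normal CDF:

```python
from math import erf, sqrt

n = 100
mu = n * 27.5                    # total mean: 2750
sigma = sqrt(n * 45 ** 2 / 12)   # total standard deviation: ~129.9

# P(total > 3000) ~ 1 - Phi(z), with Phi written via the error function
z = (3000 - mu) / sigma          # ~1.92 standard deviations above the mean
p_exceed = 0.5 * (1 - erf(z / sqrt(2)))

print(round(p_exceed, 4))
```

The answer comes out to roughly 0.027: exceeding 3000 kg is unlikely but far from negligible.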

## Frequentist Inference¶

### Point Estimation¶

How could we estimate $\pi$ ?
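One common approach is Monte Carlo: sample points uniformly in the unit square and use the fraction landing inside the quarter circle $x^2 + y^2 \le 1$, which estimates $\pi/4$. A minimal sketch:

```python
import random

random.seed(0)   # fixed seed so the estimate is reproducible
n = 100_000

# Count points falling inside the quarter circle of radius 1
inside = sum(1 for _ in range(n)
             if random.random() ** 2 + random.random() ** 2 <= 1)

pi_hat = 4 * inside / n
print(pi_hat)    # close to 3.14159 for large n
```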

### Confidence interval¶

Why is the 95% confidence interval commonly used?

### The Bootstrap¶

Where do you often see bootstrapping used in econometrics?
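A common econometric use is computing standard errors and confidence intervals when analytic formulas are unreliable (small samples, complicated estimators). As a sketch, a percentile bootstrap interval for a sample mean, using made-up data:

```python
import random
import statistics

random.seed(1)
data = [5, 2, 3, 4, 8, 9, 10, 8, 5, 6]   # illustrative sample

# Resample with replacement many times and record the statistic each time
boot_means = []
for _ in range(5000):
    resample = [random.choice(data) for _ in data]
    boot_means.append(statistics.mean(resample))

# Percentile method: take the empirical 2.5% and 97.5% quantiles
boot_means.sort()
lo, hi = boot_means[int(0.025 * 5000)], boot_means[int(0.975 * 5000)]
print(lo, hi)   # an approximate 95% CI for the mean
```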

## Bayesian Inference¶

### Bayes Theorem¶

$P(A \mid B) = \dfrac{P(B \mid A) P(A)}{P(B)}$

Terms: prior, likelihood, posterior

Example exercises

### Likelihood Function¶

How does the likelihood function change as we choose larger samples?

## Regression Analysis¶

### Ordinary Least Squares¶

Graph your data! Check out the Datasaurus Dozen, inspired by Anscombe's Quartet.

Derivation of the matrix form of OLS.

We can also fit OLS by hand:

You have data on the grades of 10 students in primary school and high school. You would like to estimate the relationship between the grades. Assume that high school grades are linearly related to primary school grades with some idiosyncratic error, $\epsilon$.

| | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| Primary school | 5 | 2 | 3 | 4 | 8 | 9 | 10 | 8 | 5 | 6 |
| High school | 6 | 4 | 3 | 4 | 6 | 7 | 8 | 9 | 3 | 5 |

The model is $\text{High} = \alpha + \beta \, \text{Primary} + \epsilon$

Estimate the value of $\alpha$ and $\beta$ by hand.
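For simple regression, the least-squares estimates reduce to $\hat\beta = \operatorname{Cov}(x, y)/\operatorname{Var}(x)$ and $\hat\alpha = \bar y - \hat\beta \bar x$; a sketch of the hand calculation:

```python
primary = [5, 2, 3, 4, 8, 9, 10, 8, 5, 6]
high = [6, 4, 3, 4, 6, 7, 8, 9, 3, 5]

n = len(primary)
mx = sum(primary) / n            # mean of primary: 6.0
my = sum(high) / n               # mean of high: 5.5

# Sums of squared/cross deviations
sxy = sum((x - mx) * (y - my) for x, y in zip(primary, high))   # 41.0
sxx = sum((x - mx) ** 2 for x in primary)                       # 64.0

beta = sxy / sxx                 # 0.640625
alpha = my - beta * mx           # 1.65625
print(alpha, beta)
```

These match the coefficients reported by the statsmodels fit below.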

### Correlation¶

• Covariance
• Correlation coefficient

🐑 Argue that for given random variables $X$ and $Y$, the correlation lies between $-1$ and $1$.

Correlation does not imply causation. See this website for examples.

🐑 Show that independence implies zero correlation, but that zero correlation does not imply independence.
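The classic counterexample for the second half: $X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$. Then $\operatorname{Cov}(X, Y) = E(X^3) - E(X)E(X^2) = 0$, yet $Y$ is a deterministic function of $X$. A check with exact arithmetic:

```python
from fractions import Fraction

# X uniform on {-1, 0, 1}; Y = X^2 is a function of X, so clearly dependent
support = [-1, 0, 1]
p = Fraction(1, 3)

ex = sum(x * p for x in support)            # E[X]   = 0
ey = sum(x ** 2 * p for x in support)       # E[Y]   = 2/3
exy = sum(x * x ** 2 * p for x in support)  # E[XY] = E[X^3] = 0

cov = exy - ex * ey
print(cov)    # 0: uncorrelated despite full dependence

# Not independent: P(Y = 0 | X = 0) = 1, while P(Y = 0) = 1/3
```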

In [10]:
import pandas as pd
import seaborn as sns

grades = pd.DataFrame({"primary": [5, 2, 3, 4, 8, 9, 10, 8, 5, 6],
                       "high": [6, 4, 3, 4, 6, 7, 8, 9, 3, 5]})
sns.regplot(x="primary", y="high", data=grades);

In [17]:
import statsmodels.formula.api as smf

results = smf.ols('high ~ primary', data=grades).fit()
results.summary()

/Applications/anaconda/lib/python3.6/site-packages/scipy/stats/stats.py:1390: UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n=10

Out[17]:

| Dep. Variable: | high | R-squared: | 0.682 |
|---|---|---|---|
| Model: | OLS | Adj. R-squared: | 0.643 |
| Method: | Least Squares | F-statistic: | 17.17 |
| Date: | Sun, 09 Sep 2018 | Prob (F-statistic): | 0.00323 |
| Time: | 18:12:14 | Log-Likelihood: | -15.198 |
| No. Observations: | 10 | AIC: | 34.40 |
| Df Residuals: | 8 | BIC: | 35.00 |
| Df Model: | 1 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 1.6562 | 1.007 | 1.645 | 0.138 | -0.665 | 3.977 |
| primary | 0.6406 | 0.155 | 4.144 | 0.003 | 0.284 | 0.997 |

| Omnibus: | 0.901 | Durbin-Watson: | 2.214 |
|---|---|---|---|
| Prob(Omnibus): | 0.637 | Jarque-Bera (JB): | 0.411 |
| Skew: | 0.465 | Prob(JB): | 0.814 |
| Kurtosis: | 2.655 | Cond. No. | 17.1 |