Course grade: 8.0. Extensive summary for the course Bayesian Multilevel Models. The summary contains the content of all lectures and tutorials, including additional notes and explanations. The course is taught by dr. B. Nicenboim, as part of the MSc Data Science & Society at Tilburg University. Not...
Bayesian Multilevel Models
MSc Data Science & Society
Tilburg University
Week 1. Introduction to Bayesian Modelling
Lecture 1. Introduction to Bayesian Modelling
Uncertainty
“In April 1997, the Red River of the North flooded Grand Forks, North Dakota, overtopping the town’s
levees and spilling more than two miles into the city. Although there was no loss of life, nearly all of
the city’s 50,000 residents had to be evacuated, cleanup costs ran into the billions of dollars, and 75
percent of the city’s homes were damaged or destroyed. Unlike a hurricane or an earthquake, the Grand
Forks flood may have been a preventable disaster.” – The Signal and the Noise, Nate Silver.
The problem:
- The levees in Grand Forks had been built to handle a flood of 15.5 m;
- The Weather Service did not communicate the uncertainty in its forecast to the public,
emphasizing only the 15 m prediction;
- The river crested at 16.5 m;
- However, the 95% predictive interval was [12.3, 17.7] m (and a 99% interval would
have been even wider).
What is Bayesian Data Analysis
“Bayesian statistics is an approach to data analysis based on Bayes’ theorem, where available
knowledge about parameters in a statistical model is updated with the information in observed data.
The background knowledge is expressed as a prior distribution and combined with observational data
in the form of a likelihood function to determine the posterior distribution. The posterior can also be
used for making predictions about future events.” – van de Schoot et al., 2021.
Advantages of Bayesian Modeling for Data Science
1. It yields predictions with their associated uncertainty.
2. It allows for the use of our domain knowledge or prior information (when available) to
ensure that our estimates and predictions fall within a reasonable range. (Multilevel
models are one way to include prior information.)
3. It’s not “data eager”. Bayesian methods especially shine with small to medium-sized data
sets. If you have a lot of data available, and you don’t know a lot about the data, you should
not be using Bayesian methods.
What is Probability?
Examples:
- Probability of being killed on a single flight with one of the 78 major world airlines
= 2.13 ⋅ 10^−7 (in odds: 1 in 4.7 million);
- Probability of getting heads with a regular coin = 0.5;
- Probability of Hillary Clinton winning the 2016 US election, as assessed on September
23rd = 0.603.
Two Interpretations of Probabilities
1. Frequentist probability: probability is always defined in connection to a countable event
and its frequency in very large samples. E.g.: tossing a coin many times.
2. Bayesian probability:
o Probability describes uncertainty.
o Uncertainty is based on our limited information. If we knew everything, there
would be no uncertainty and no probability. This is never the case, hence the
importance of Bayesian probability.
o Parameters (and models and measurements) can have probability distributions.
Properties of Probability (Kolmogorov’s axioms)
- The probability of an event must lie between 0 and 1, where 0 means that the event is
impossible and cannot happen, and 1 means that the event is certain to happen.
- For any two mutually exclusive events, the probability that one or the other occurs is the
sum of their individual probabilities.
- Two events are independent if and only if the probability of both events happening is
equal to the product of the probabilities of each event happening. (Strictly speaking, this
is the definition of independence rather than one of the axioms.)
- The probabilities of all possible outcomes in the entire sample space must sum to 1.
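As a small illustration, these properties can be checked in R on a fair six-sided die. The die and the four events defined below are hypothetical choices made for this sketch, not an example from the lecture.

```r
# Fair six-sided die (assumed example): each outcome has probability 1/6
outcomes <- 1:6
p <- rep(1 / 6, length(outcomes))

# Every probability lies between 0 and 1, and all outcomes together sum to 1
all_in_range <- all(p >= 0 & p <= 1)
total <- sum(p)

# Mutually exclusive events: "roll a 1" and "roll a 2"
ev_one <- outcomes == 1
ev_two <- outcomes == 2
p_one_or_two <- sum(p[ev_one | ev_two])
additive <- isTRUE(all.equal(p_one_or_two, sum(p[ev_one]) + sum(p[ev_two])))

# Independent events: "even number" and "1 or 2" (they share only the outcome 2)
ev_even <- outcomes %% 2 == 0
ev_low <- outcomes <= 2
p_even_and_low <- sum(p[ev_even & ev_low])
independent <- isTRUE(all.equal(p_even_and_low, sum(p[ev_even]) * sum(p[ev_low])))

c(all_in_range = all_in_range, additive = additive, independent = independent)
```

All three checks should come out TRUE: P(even and low) = 1/6, which equals P(even) ⋅ P(low) = 1/2 ⋅ 1/3.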
Conditional probability
- 𝑃(𝑎|𝑏) stands for the probability of an event 𝑎 given 𝑏 (order is important for the
conditional probability, i.e., in general 𝑃(𝑎|𝑏) ≠ 𝑃(𝑏|𝑎)).
𝑃(𝑑𝑖𝑒|𝑠ℎ𝑎𝑟𝑘 𝑏𝑖𝑡𝑒𝑠 𝑜𝑓𝑓 𝑦𝑜𝑢𝑟 ℎ𝑒𝑎𝑑) = 1; 𝑃(𝑠ℎ𝑎𝑟𝑘 𝑏𝑖𝑡𝑒𝑠 𝑜𝑓𝑓 𝑦𝑜𝑢𝑟 ℎ𝑒𝑎𝑑|𝑑𝑖𝑒) ≈ 0
- 𝑃(𝑎, 𝑏) stands for the joint probability of 𝑎 and 𝑏 (order is not important), i.e.,
𝑃(𝑎, 𝑏) = 𝑃(𝑏, 𝑎).
Bayes’ rule
Given that a and b are events, the conditional probability is defined as follows:
𝑃(𝑎|𝑏) = 𝑃(𝑎, 𝑏) / 𝑃(𝑏); 𝑃(𝑏) > 0
This means that 𝑃(𝑎, 𝑏) = 𝑃(𝑎|𝑏)𝑃(𝑏). Since 𝑃(𝑏, 𝑎) = 𝑃(𝑎, 𝑏), we can write:
𝑃(𝑏, 𝑎) = 𝑃(𝑏|𝑎)𝑃(𝑎) = 𝑃(𝑎|𝑏)𝑃(𝑏) = 𝑃(𝑎, 𝑏)
Rearranging terms leads to Bayes’ rule:
𝑃(𝑏|𝑎) = 𝑃(𝑎|𝑏)𝑃(𝑏) / 𝑃(𝑎); 𝑃(𝑎) > 0
Bayes’ rule can also be used by frequentists; in itself it is not “really” Bayesian. Frequentist
statistics traditionally do not incorporate prior beliefs or probabilities. Instead, frequentist
methods rely on concepts such as maximum likelihood estimation and hypothesis testing based
on p-values. These methods do not explicitly involve Bayes’ rule.
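The relation between the definition of conditional probability and Bayes’ rule can be checked numerically. The sketch below uses made-up probabilities for two events a and b (the three input numbers are assumptions for illustration only) and confirms that Bayes’ rule gives the same 𝑃(𝑏|𝑎) as the direct definition.

```r
# Hypothetical probabilities for two events a and b (illustrative numbers only)
p_a_and_b <- 0.12  # joint probability P(a, b)
p_a <- 0.30        # P(a)
p_b <- 0.20        # P(b)

# Definition of conditional probability: P(a|b) = P(a, b) / P(b)
p_a_given_b <- p_a_and_b / p_b

# P(b|a) computed directly from the definition ...
p_b_given_a_direct <- p_a_and_b / p_a

# ... and via Bayes' rule: P(b|a) = P(a|b) P(b) / P(a)
p_b_given_a_bayes <- p_a_given_b * p_b / p_a

c(p_a_given_b = p_a_given_b,
  direct = p_b_given_a_direct,
  bayes = p_b_given_a_bayes)
```

With these numbers 𝑃(𝑎|𝑏) = 0.6 while 𝑃(𝑏|𝑎) = 0.4, which also shows that the order of conditioning matters.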
Probability distributions
What is a random variable (RV)? A function that maps each outcome in a sample space to a
number; its probability distribution then describes how likely each value is (it assigns
probabilities to all possible outcomes). A sample space: the possible outcomes of a
non-deterministic process (given our knowledge). Examples:
- heads or tails after tossing a coin;
- customer stays or leaves;
- reaction time in an experiment under certain conditions;
- height of the water for a potential flood.
Random variable theory
RVs allow us to think about:
- 𝑃(𝑋 < 𝑥); what is the probability that the RV X takes a value lower than 𝑥?
- 𝐸[𝑋]; what is the expected value (mean) of X?
- Or in general to summarize X somehow (e.g., using summary statistics such as mean,
median, quantiles)
To do that we need to understand how X is distributed:
- Every discrete random variable X has associated with it a probability mass function (pmf).
- Every continuous random variable X has associated with it a probability density function
(pdf).
Probability mass functions (discrete case) assign a probability to every possible outcome in a
sample space. Probability density functions (continuous case) assign a density to every
possible value, so that probabilities correspond to areas under the curve.
The expression
𝑋 ~ 𝑓(∙)
means that the random variable X has pdf/pmf 𝑓(∙).
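For a concrete discrete case, the summaries listed above can be computed in R for a binomially distributed RV. The parameter values n = 10 and theta = 0.5 are assumptions chosen for this sketch.

```r
# Assumed parameters of a binomial random variable X
n <- 10
theta <- 0.5

# All possible values of X and their probabilities (the pmf)
k <- 0:n
pmf <- dbinom(k, size = n, prob = theta)

# P(X < 4) is the same as P(X <= 3) for a discrete RV
p_less_than_4 <- pbinom(3, size = n, prob = theta)

# E[X]: sum over all values weighted by their probability (equals n * theta)
expected_value <- sum(k * pmf)

# Median: smallest value whose cumulative probability reaches 0.5
median_value <- qbinom(0.5, size = n, prob = theta)

c(p_less_than_4 = p_less_than_4,
  expected_value = expected_value,
  median_value = median_value)
```

Here `pbinom` is the cumulative distribution function and `qbinom` the quantile function of the binomial distribution.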
Discrete random variables:
Example with the binomial distribution.
n = total number of customers;
theta = probability of success (e.g., customers staying).
In figure 1a: the probability of 4 customers staying, given the parameters, is equal to 0.20.
The heights of all possible outcomes sum to 1.
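The binomial pmf can also be written out by hand as choose(n, k) ⋅ theta^k ⋅ (1 − theta)^(n − k). The sketch below compares this formula with R's built-in `dbinom` and checks that the heights sum to 1. The values n = 10 and theta = 0.5 are assumptions for illustration and are not necessarily the parameters used in figure 1a.

```r
# Assumed parameters (not necessarily those of figure 1a)
n <- 10
theta <- 0.5
k <- 0:n

# Binomial pmf written out by hand
pmf_manual <- choose(n, k) * theta^k * (1 - theta)^(n - k)

# The same pmf from the built-in function
pmf_builtin <- dbinom(k, size = n, prob = theta)

# Both versions agree, and the heights of all outcomes sum to 1
same_pmf <- isTRUE(all.equal(pmf_manual, pmf_builtin))
total_height <- sum(pmf_builtin)

# Probability of exactly 4 customers staying under these assumed parameters
p_four_stay <- pmf_builtin[k == 4]
```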
Likelihood associated with the binomial distribution
The total number of possibilities is known. What is the most likely value of theta? → Use the
Maximum Likelihood Estimate (MLE). If 7 customers stay out of 10, the MLE of the probability
of customers staying is equal to 0.70. Key point: the function is the same. Before, theta and n
were fixed and the function gave the probability of each k. In the second case, k and n are
fixed (they are observed) and the function is evaluated over the possible values of theta.
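The likelihood for the example of 7 customers staying out of 10 can be sketched in R by evaluating `dbinom` over a grid of candidate theta values. The grid search is a choice made here for illustration; for the binomial, the MLE is also available analytically as k / n.

```r
# Observed data from the example: 7 of 10 customers stay
k <- 7
n <- 10

# Candidate values for theta (grid resolution is an arbitrary choice)
theta_grid <- seq(0, 1, by = 0.01)

# Likelihood: same function as the pmf, but k and n fixed and theta varying
likelihood <- dbinom(k, size = n, prob = theta_grid)

# The grid value with the highest likelihood is the (approximate) MLE
theta_mle <- theta_grid[which.max(likelihood)]
theta_mle
```

Note that the likelihood values over theta do not need to sum to 1, unlike the pmf over k.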
Maximum Likelihood Estimate (MLE)
For only 10 customers, the MLE will vary a lot. As n increases, the estimate converges to the
true value of 0.70.
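This behaviour can be explored with a small simulation. The sketch below assumes a true theta of 0.70 and simulates the number of customers staying for increasingly large n. The seed and the sample sizes are arbitrary choices for illustration.

```r
set.seed(123)  # arbitrary seed, only for reproducibility

theta_true <- 0.70
n_values <- c(10, 100, 1000, 10000, 100000)

# Simulate one data set per sample size: number of customers staying
k_sim <- rbinom(length(n_values), size = n_values, prob = theta_true)

# MLE of theta for each simulated data set is k / n
theta_hat <- k_sim / n_values

data.frame(n = n_values, k = k_sim, theta_hat = theta_hat)
```

Rerunning with different seeds should show the estimates for n = 10 jumping around considerably, while those for the largest n stay close to 0.70.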
What information does a probability distribution provide?
1. Compute the probability (pmf) of a particular outcome (discrete case only).
This line of code uses the `dbinom` function to compute the probability of a specific
outcome in a binomial distribution. The arguments are the number of successes (5 in this
case), the total number of trials (`size`), and the probability of success in each trial
(`prob`). The output is approximately 0.25 (more precisely, 0.246). This means that in a
binomial distribution with 10 trials and a success probability of 0.5 in each trial, the
probability of observing exactly 5 successes is about 0.25. `dbinom` is the probability mass
function (PMF) of the binomial distribution: it gives the probability of each possible
outcome.
2. Compute the cumulative probability of k or fewer (or more than k) successes.
The first line of code calculates the cumulative probability of obtaining 0, 1, or 2 successes
out of 10 trials in a binomial distribution. `dbinom` gives the probability of each specific
number of successes (k) given the total number of trials (`size`) and the probability of
success in a single trial (`prob`). Summing these probabilities gives the cumulative
probability. The output is 0.055: the probability of getting 2 or fewer successes in 10 trials,
with a success probability of 0.5 in each trial.
The second line of code uses the `pbinom` function to calculate the same cumulative
probability directly. The arguments specify the number of successes (2), the total number of
trials (`size`), the success probability in each trial (`prob`), and `lower.tail = TRUE` to
indicate that we want the probability in the lower tail of the distribution. The output is
0.055, which matches the previous result and confirms that both approaches yield the same
cumulative probability.
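The R calls described in points 1 and 2 are not reproduced in this summary. A reconstruction consistent with the description (10 trials, success probability 0.5) might look as follows. The variable names are hypothetical, and the last call, with `lower.tail = FALSE`, is an addition here to cover the "more than k" case.

```r
# 1. Probability of exactly 5 successes in 10 trials (pmf)
p_exactly_5 <- dbinom(5, size = 10, prob = 0.5)   # about 0.246

# 2a. Cumulative probability of 0, 1, or 2 successes by summing the pmf
p_at_most_2_sum <- sum(dbinom(0:2, size = 10, prob = 0.5))   # about 0.055

# 2b. The same cumulative probability directly from the cdf
p_at_most_2_cdf <- pbinom(2, size = 10, prob = 0.5, lower.tail = TRUE)

# Upper tail: probability of more than 2 successes
p_more_than_2 <- pbinom(2, size = 10, prob = 0.5, lower.tail = FALSE)

round(c(p_exactly_5, p_at_most_2_sum, p_at_most_2_cdf, p_more_than_2), 3)
```

The lower and upper tail probabilities add up to 1, which is a quick sanity check on the two `pbinom` calls.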