Summary of the course Bayesian Multilevel Models (880661-M-6), covering the content of all lectures and tutorials with additional notes and explanations. The course is taught by dr. B. Nicenboim as part of the MSc Data Science & Society at Tilburg University (academic year 2022/2023).

Bayesian Multilevel Models
MSc Data Science & Society
Tilburg University




Week 1. Introduction to Bayesian Modelling

Lecture 1. Introduction to Bayesian Modelling
Uncertainty
“In April 1997, the Red River of the North flooded Grand Forks, North Dakota, overtopping the town’s
levees and spilling more than two miles into the city. Although there was no loss of life, nearly all of
the city’s 50,000 residents had to be evacuated, cleanup costs ran into the billions of dollars, and 75
percent of the city’s homes were damaged or destroyed. Unlike a hurricane or an earthquake, the Grand
Forks flood may have been a preventable disaster.” – The Signal and the Noise, Nate Silver.

The problem:
- The levees in Grand Forks had been built to handle a flood of 15.5 m;
- The Weather Service did not communicate the uncertainty in its forecast to the public, emphasizing only the 15 m point prediction;
- The river crested at 16.5 m;
- However, the 95% prediction interval of the forecast was [12.3, 17.7] m, which includes both the levee height and the actual crest (a 99% interval would be wider still).
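To see what the interval implies for the levees, here is a minimal sketch. It assumes, purely for illustration (the source does not say this), that the forecast was a normal distribution centred on the 15 m prediction, with its standard deviation backed out of the reported 95% interval. Under that assumption it estimates the probability that the crest exceeds the 15.5 m levees.

```python
from statistics import NormalDist

# Assumption (illustrative only): a symmetric normal forecast centred on 15 m.
point_forecast = 15.0        # m, the prediction communicated to the public
interval_95 = (12.3, 17.7)   # m, the reported 95% prediction interval
levee_height = 15.5          # m

# The half-width of a 95% normal interval is about 1.96 standard deviations.
z_975 = NormalDist().inv_cdf(0.975)
sigma = (interval_95[1] - interval_95[0]) / 2 / z_975
forecast = NormalDist(mu=point_forecast, sigma=sigma)

# Reconstructed interval: should reproduce the reported [12.3, 17.7] m.
lower, upper = forecast.inv_cdf(0.025), forecast.inv_cdf(0.975)

# Probability that the crest exceeds the levees under this assumed forecast.
p_overtop = 1 - forecast.cdf(levee_height)

print(f"sigma = {sigma:.2f} m, 95% interval = [{lower:.1f}, {upper:.1f}] m")
print(f"P(crest > {levee_height} m) = {p_overtop:.2f}")
```

Under these assumptions the chance of overtopping comes out at roughly 0.36, more than one in three, which a bare 15 m point forecast does not convey.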

What is Bayesian Data Analysis
“Bayesian statistics is an approach to data analysis based on Bayes’ theorem, where available
knowledge about parameters in a statistical model is updated with the information in observed data.
The background knowledge is expressed as a prior distribution and combined with observational data
in the form of a likelihood function to determine the posterior distribution. The posterior can also be
used for making predictions about future events.” – van de Schoot et al., 2021.
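The prior-times-likelihood update described in the quote can be sketched numerically with a grid approximation. This is a minimal illustration under stated assumptions: hypothetical data of 7 out of 10 customers staying, and an assumed prior proportional to θ(1 − θ) that mildly favours values near 0.5. A grid is only one of several ways to compute a posterior; it is used here because it needs nothing beyond the standard library.

```python
from math import comb

# Hypothetical data: k of n customers stay.
k, n = 7, 10

# Grid of candidate values for theta, the probability of staying.
grid = [i / 1000 for i in range(1001)]

# Assumed prior: proportional to theta * (1 - theta) (a Beta(2, 2) shape).
# It does not need to be normalised on a grid.
prior = [t * (1 - t) for t in grid]

# Binomial likelihood of the observed data at each candidate theta.
likelihood = [comb(n, k) * t**k * (1 - t) ** (n - k) for t in grid]

# Posterior is proportional to likelihood x prior; normalise to sum to 1.
unnormalised = [lik * pri for lik, pri in zip(likelihood, prior)]
total = sum(unnormalised)
posterior = [u / total for u in unnormalised]

# Summarise the posterior by its mean.
post_mean = sum(t * p for t, p in zip(grid, posterior))
print(f"posterior mean of theta = {post_mean:.3f}")
```

With this prior the posterior mean is about 0.64, between the prior's centre (0.5) and the observed proportion (0.7): the prior pulls the estimate slightly toward 0.5.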

Advantages of Bayesian Modeling for Data Science
1. It yields predictions with their associated uncertainty.
2. It allows us to use domain knowledge or prior information (when available) to ensure that our estimates and predictions fall within a reasonable range (multilevel models are one way to include prior information).
3. It is not “data hungry”: Bayesian methods especially shine with small to medium-sized data sets. If you have a lot of data available and little prior knowledge about it, Bayesian methods offer little advantage and you may not need them.

What is Probability?
Examples:
- Probability of being killed on a single flight of one of the 78 major world airlines = 2.13 ⋅ 10⁻⁷ (i.e., 1 in 4.7 million);
- Probability of getting heads when tossing a fair coin = 0.5;
- Probability of Hillary Clinton winning the 2016 US presidential election, as assessed on September 23rd = .603.

Two Interpretations of Probabilities
1. Frequentist probability: probability is always defined in connection to a countable event and its relative frequency in very large samples, e.g., the proportion of heads when a coin is tossed many times.
2. Bayesian probability:
   o Probability describes uncertainty.
   o Uncertainty is based on our limited information. If we knew everything, there would be no uncertainty and no probability. This is never the case, hence the importance of Bayesian probability.
   o Parameters (and models and measurements) can have probability distributions.
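The frequentist reading can be made concrete with a small simulation. This is an illustrative sketch, not course code. It tosses a simulated fair coin and shows the relative frequency of heads settling near 0.5 as the number of tosses grows. The seed value is arbitrary.

```python
import random

random.seed(1)  # arbitrary fixed seed so the simulation is reproducible


def share_of_heads(n_tosses: int) -> float:
    """Toss a simulated fair coin n_tosses times; return the relative frequency of heads."""
    heads = sum(random.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


# The relative frequency fluctuates for few tosses and settles near 0.5 for many.
for n in (10, 1_000, 100_000):
    print(f"{n:>7} tosses: share of heads = {share_of_heads(n):.3f}")
```

The Bayesian reading does not need such a repeatable experiment: a probability like the election forecast above describes uncertainty about a one-off event.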

Properties of Probability (Kolmogorov’s axioms)
- The probability of an event must lie between 0 and 1, where 0 means that the event is impossible and 1 means that the event is certain to happen.
- For any two mutually exclusive events, the probability that one or the other occurs is the sum of their individual probabilities.
- Two events are independent if and only if the probability of both events happening is equal to the product of their individual probabilities (strictly speaking, this is a definition rather than one of the axioms).
- The probabilities of all possible outcomes in the sample space must sum to 1.
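These properties can be checked exactly on a small discrete example. The following is an illustrative sketch that is not from the course. It uses a fair six-sided die and, for independence, two rolls where all 36 ordered pairs are assumed equally likely. Exact fractions avoid floating-point noise in the equality checks.

```python
from fractions import Fraction
from itertools import product

# Fair six-sided die: each face has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

# Two rolls: assume all 36 ordered pairs are equally likely.
two_dice = {pair: Fraction(1, 36) for pair in product(range(1, 7), repeat=2)}


def prob(event, space):
    """Probability of an event (a set of outcomes) in a discrete sample space."""
    return sum(space[outcome] for outcome in event)


# Every probability lies in [0, 1]; the whole sample space has probability 1.
assert all(0 <= p <= 1 for p in die.values())
assert prob(die.keys(), die) == 1

# Mutually exclusive events: P(A or B) = P(A) + P(B).
low, six = {1, 2}, {6}
assert prob(low | six, die) == prob(low, die) + prob(six, die)

# Independence: "first roll is even" and "second roll is a 6".
first_even = {pr for pr in two_dice if pr[0] % 2 == 0}
second_six = {pr for pr in two_dice if pr[1] == 6}
p_both = prob(first_even & second_six, two_dice)
assert p_both == prob(first_even, two_dice) * prob(second_six, two_dice)
print(p_both)  # 1/12
```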

Conditional probability
- 𝑃(𝑎|𝑏) stands for the probability of an event 𝑎 given 𝑏 (order matters for the conditional probability: in general, 𝑃(𝑎|𝑏) ≠ 𝑃(𝑏|𝑎)).
𝑃(𝑑𝑖𝑒|𝑠ℎ𝑎𝑟𝑘 𝑏𝑖𝑡𝑒𝑠 𝑜𝑓𝑓 𝑦𝑜𝑢𝑟 ℎ𝑒𝑎𝑑) = 1; 𝑃(𝑠ℎ𝑎𝑟𝑘 𝑏𝑖𝑡𝑒𝑠 𝑜𝑓𝑓 𝑦𝑜𝑢𝑟 ℎ𝑒𝑎𝑑|𝑑𝑖𝑒) ≈ 0
- 𝑃(𝑎, 𝑏) stands for the joint probability of 𝑎 and 𝑏, i.e., the probability that both occur. For the joint probability, order does not matter: 𝑃(𝑎, 𝑏) = 𝑃(𝑏, 𝑎).

Bayes’ rule
Given two events 𝑎 and 𝑏, the conditional probability of 𝑎 given 𝑏 is defined as:

$$P(a \mid b) = \frac{P(a, b)}{P(b)}, \qquad P(b) > 0$$

This means that 𝑃(𝑎, 𝑏) = 𝑃(𝑎|𝑏)𝑃(𝑏). Since the joint probability is symmetric, 𝑃(𝑏, 𝑎) = 𝑃(𝑎, 𝑏), we can write:

$$P(b, a) = P(b \mid a)\,P(a) = P(a \mid b)\,P(b) = P(a, b)$$

Rearranging terms leads to Bayes’ rule:

$$P(b \mid a) = \frac{P(a \mid b)\,P(b)}{P(a)}, \qquad P(a) > 0$$
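A small numeric example helps to see Bayes’ rule at work. All numbers here are hypothetical, chosen only for illustration. Let 𝑏 = "customer leaves" and 𝑎 = "customer filed a complaint". The denominator 𝑃(𝑎) is obtained with the law of total probability, a step the derivation itself does not need but any concrete calculation does.

```python
from fractions import Fraction

# Hypothetical numbers (not from the course):
# b = "customer leaves", a = "customer filed a complaint".
p_b = Fraction(1, 10)              # P(b): prior probability of leaving
p_a_given_b = Fraction(3, 5)       # P(a|b): complaint rate among leavers
p_a_given_not_b = Fraction(1, 10)  # P(a|not b): complaint rate among stayers

# P(a) via the law of total probability.
p_a = p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)

# Bayes' rule: P(b|a) = P(a|b) P(b) / P(a)
p_b_given_a = p_a_given_b * p_b / p_a
print(p_a, p_b_given_a)  # 3/20 2/5
```

Note that 𝑃(𝑎|𝑏) = 3/5 while 𝑃(𝑏|𝑎) = 2/5, another reminder that the order of conditioning matters.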

Bayes’ rule itself is not “really” Bayesian: it follows directly from the definition of conditional probability, so it is equally valid in frequentist statistics. What frequentist statistics traditionally does not do is incorporate prior beliefs or assign probability distributions to parameters. Instead, frequentist methods rely on concepts such as maximum likelihood estimation and hypothesis testing based on p-values, which do not explicitly involve Bayes’ rule.

Probability distributions
What is a random variable (RV)? A function that maps each outcome in a sample space to a number (e.g., heads → 1, tails → 0); its probability distribution then describes how likely each possible value is. The sample space is the set of possible outcomes of a non-deterministic process (given our knowledge). Examples:
- heads or tails after tossing a coin;
- customer stays or leaves;
- reaction time in an experiment under certain conditions;
- height of the water for a potential flood.


Random variable theory
RVs allow us to think about:
- 𝑃(𝑋 < 𝑥): what is the probability that X takes a value lower than 𝑥?
- 𝐸[𝑋]: what is the expected value (mean) of X?
- Or, more generally, how to summarize X (e.g., using summary statistics such as the mean, median and quantiles).

To do that we need to understand how X is distributed:
- Every discrete random variable X has associated with it a probability mass function (pmf).
- Every continuous random variable X has associated with it a probability density function
(pdf).
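To make the idea of a random variable and its pmf concrete, here is a small illustrative sketch (the example is not from the course). The sample space is two tosses of a fair coin, the random variable X maps each outcome to its number of heads, and the pmf, 𝑃(𝑋 < 2) and 𝐸[𝑋] are computed exactly with fractions.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Sample space: two tosses of a fair coin; each of the 4 outcomes has probability 1/4.
outcomes = ["".join(o) for o in product("HT", repeat=2)]  # HH, HT, TH, TT


def X(outcome: str) -> int:
    """The random variable: map an outcome to its number of heads."""
    return outcome.count("H")


# The pmf of X: add up the probability of every outcome mapped to each value.
pmf = defaultdict(Fraction)
for o in outcomes:
    pmf[X(o)] += Fraction(1, 4)

p_less_than_2 = sum(p for x, p in pmf.items() if x < 2)  # P(X < 2)
expected = sum(x * p for x, p in pmf.items())           # E[X]
print(dict(pmf), p_less_than_2, expected)
```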

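Probability mass functions (discrete case) assign a probability to each possible outcome in the sample space. Probability density functions (continuous case) assign a density to each value; probabilities are obtained by integrating the density over an interval, so any single exact value has probability 0.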

The expression

𝑋 ∼ 𝑓(⋅)

means that the random variable X has pdf/pmf 𝑓(⋅).

Discrete random variables:
Example with the binomial distribution, where n = total number of customers and theta = probability of success (e.g., a customer staying). In figure 1a, the probability of 4 customers staying, given the parameters, is 0.20. The heights of the bars for all possible outcomes sum to 1.
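The binomial pmf can be written out in a few lines. The parameters of figure 1a are not recoverable from the text, so n = 10 and theta = 0.5 below are assumptions for illustration only. The sketch lists the probability of every possible number of customers staying and checks that these probabilities sum to 1.

```python
from math import comb


def binom_pmf(k: int, n: int, theta: float) -> float:
    """P(k successes out of n trials) for a binomial with success probability theta."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


n, theta = 10, 0.5  # assumed values; not necessarily those of figure 1a
pmf = [binom_pmf(k, n, theta) for k in range(n + 1)]

for k, p in enumerate(pmf):
    print(f"P(k = {k:2d}) = {p:.3f}")
print("sum over all outcomes:", round(sum(pmf), 10))  # 1.0
```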

Likelihood associated with the binomial distribution
The total number of trials n is known and the number of successes k is observed. What is the most likely value of theta? → use the Maximum Likelihood Estimate (MLE). If 7 out of 10 customers stay, the MLE of the probability of staying is 7/10 = 0.70. Key point: the likelihood uses the same formula as the pmf. In the pmf, theta and n are fixed and the function gives the probability of each possible k; in the likelihood, the observed k and n are fixed and the function is evaluated over values of theta.
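The switch from pmf to likelihood can be shown directly: keep the binomial formula, fix the observed k = 7 and n = 10, and evaluate it over a grid of theta values. The sketch below (an illustration, with an arbitrarily chosen grid step of 0.01) picks the grid value with the highest likelihood, which coincides with the analytical MLE k/n.

```python
from math import comb

k, n = 7, 10  # observed: 7 of 10 customers stay


def likelihood(theta: float) -> float:
    """Binomial formula viewed as a function of theta, with k and n fixed at the data."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


grid = [i / 100 for i in range(101)]      # candidate values of theta
theta_hat = max(grid, key=likelihood)     # grid value with the highest likelihood
print(theta_hat)  # 0.7
```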




Maximum Likelihood Estimate (MLE)
With only 10 customers, the MLE varies a lot from sample to sample. As n increases, the estimate converges to the true value (0.70 in this example).
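The behaviour described above can be reproduced with a short simulation. This is a sketch that assumes a true staying probability of 0.70; the seed and the number of repetitions are arbitrary. For each sample size it simulates a few data sets and computes the MLE k/n for each, so the spread across repetitions shrinks as n grows.

```python
import random

random.seed(42)     # arbitrary fixed seed for reproducibility
true_theta = 0.7    # assumed true probability of a customer staying


def simulate_mle(n: int) -> float:
    """Simulate n customers and return the MLE k/n of the staying probability."""
    k = sum(random.random() < true_theta for _ in range(n))
    return k / n


# Five simulated data sets per sample size: estimates vary less as n grows.
for n in (10, 100, 10_000):
    estimates = [simulate_mle(n) for _ in range(5)]
    print(n, [round(e, 3) for e in estimates])
```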




What information does a probability distribution provide?
1. Compute the probability (pmf) of a particular outcome (discrete case only).
The R function `dbinom` computes the probability mass function (PMF) of the binomial distribution, i.e., the probability of each possible outcome. Its arguments are the number of successes (here 5), the total number of trials (`size`, here 10) and the probability of success in each trial (`prob`, here 0.5). The output is about 0.25 (more precisely 252/1024 ≈ 0.246): in a binomial distribution with 10 trials and a success probability of 0.5, the probability of observing exactly 5 successes is about 0.25.
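For readers working outside R, the same computation can be sketched in Python with only the standard library. The helper `dbinom` below is a hypothetical function written to mirror the arguments of R's `dbinom(x, size, prob)`; it is not part of any Python library.

```python
from math import comb


def dbinom(x: int, size: int, prob: float) -> float:
    """Binomial PMF; hypothetical helper mirroring the arguments of R's dbinom."""
    return comb(size, x) * prob**x * (1 - prob) ** (size - x)


print(round(dbinom(5, size=10, prob=0.5), 3))  # 0.246
```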




2. Compute the cumulative probability of k or fewer (or more than k) successes.
The probability of obtaining 0, 1 or 2 successes out of 10 trials, with a success probability of 0.5 in each trial, can be computed by summing the `dbinom` probabilities of each of these outcomes. The result is 0.055: the cumulative probability of getting 2 or fewer successes is 0.055.

Alternatively, the `pbinom` function computes the cumulative probability directly. Its arguments are the number of successes (2), the total number of trials (`size`), the success probability in each trial (`prob`), and `lower.tail = TRUE` (the default), which returns the lower tail, P(X ≤ 2); with `lower.tail = FALSE` it returns the upper tail, P(X > 2). The output is again 0.055, confirming that both approaches yield the same cumulative probability.
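Both routes to the cumulative probability can be sketched in Python as well. `dbinom` and `pbinom` below are hypothetical standard-library helpers that mirror R's argument names (with `lower_tail` standing in for `lower.tail`); they are illustrations, not R's implementation.

```python
from math import comb


def dbinom(x: int, size: int, prob: float) -> float:
    """Binomial PMF (hypothetical helper mirroring R's dbinom)."""
    return comb(size, x) * prob**x * (1 - prob) ** (size - x)


def pbinom(q: int, size: int, prob: float, lower_tail: bool = True) -> float:
    """Binomial CDF (hypothetical helper): P(X <= q) if lower_tail, else P(X > q)."""
    p_lower = sum(dbinom(x, size, prob) for x in range(q + 1))
    return p_lower if lower_tail else 1 - p_lower


by_summing = sum(dbinom(k, 10, 0.5) for k in range(3))  # 0, 1 or 2 successes
by_cdf = pbinom(2, size=10, prob=0.5)
print(round(by_summing, 3), round(by_cdf, 3))  # 0.055 0.055
```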



