BIOL 300 Fundamentals of Biostatistics (Summary "The Analysis of Biological Data")
Written for
Universiteit Leiden (UL)
Molecular Biology
Statistics for biologists II
Summary of the book, The Analysis of Biological Data
Chapter 7-14
Authors: Whitlock and Schluter
Second edition
Chapter 7, analysing proportions
In this chapter, we’ll describe how best to estimate a population proportion using a random
sample, including how to calculate its confidence interval.
Consider a measurement made on individuals that divides them into two mutually exclusive
groups, such as success or failure, alive or dead, left-handed or right-handed, or diabetic or
nondiabetic. In the population, a fixed proportion p of individuals fall into one of the two
groups (call it “success”) and the remaining individuals fall into the other group (call it
“failure”).
If we take a random sample of n individuals from this population, the sampling distribution
for the number of individuals falling into the success category is described by the binomial
distribution. The term “binomial” reveals its meaning: there are only two (bi-) possible
outcomes, and both are named (-nomial) categories.
The binomial distribution provides the probability distribution for the number of
“successes” in a fixed number of independent trials, when the probability of success is the
same in each trial.
The binomial distribution assumes that
- The number of trials (n) is fixed,
- Separate trials are independent, and
- The probability of success (p) is the same in every trial.
For example, suppose that p = 0.25 of the individuals in the population are successes and
1 − p = 0.75 of the individuals are failures.
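As a rough illustration of the binomial distribution described above, the sketch below computes the probability of each possible number of successes using the standard formula Pr[X] = C(n, X) p^X (1 − p)^(n − X). The sample size n = 10 is an assumption chosen for the example; only p = 0.25 comes from the text.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# p = 0.25 as in the text; n = 10 is a hypothetical sample size.
n, p = 10, 0.25
for x in range(n + 1):
    print(x, round(binomial_pmf(x, n, p), 4))
```

The probabilities over all possible values of X (0 to n) sum to one, which is a quick sanity check on any such calculation.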
The sample size (n) is in the denominator of the standard error equation, so the standard
error decreases as the sample size increases. Larger samples yield more precise estimates.
The improvement in precision as sample size increases is called the law of large numbers.
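The effect of sample size on precision can be sketched as follows, assuming the usual estimate of the standard error of a proportion, SE = sqrt(p̂(1 − p̂)/n), where p̂ is the sample proportion. The counts are hypothetical and chosen so that p̂ = 0.25 in every sample.

```python
from math import sqrt


def proportion_se(successes: int, n: int) -> float:
    """Estimated standard error of a sample proportion."""
    p_hat = successes / n
    return sqrt(p_hat * (1 - p_hat) / n)


# Hypothetical samples, each with one quarter successes.
for n in (20, 80, 320):
    successes = n // 4
    print(n, round(proportion_se(successes, n), 4))
```

Because n sits under a square root, quadrupling the sample size halves the standard error.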
The binomial test is used when a variable in a population has two possible states (i.e.,
“success” and “failure”), and we wish to test whether the relative frequency of successes in
the population (p) matches a null expectation (p0).
The hypothesis statements look like this:
- H0: The relative frequency of successes in the population is p0.
- HA: The relative frequency of successes in the population is not p0.
The null expectation (p0) can be any specific proportion between zero and one, inclusive.
The binomial test uses data to test whether a population proportion (p) matches a null
expectation (p0) for the proportion.
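A minimal sketch of a two-sided binomial test is given below. It assumes one common convention for the two-sided P-value: double the smaller of the two tail probabilities under H0 and cap the result at one. Other software may instead sum the probabilities of all outcomes that are as unlikely as, or less likely than, the observed one. The counts (14 successes in 18 trials) and p0 = 0.5 are hypothetical.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_test_two_sided(x: int, n: int, p0: float) -> float:
    """Two-sided P-value: twice the smaller tail probability, capped at 1."""
    upper_tail = sum(binomial_pmf(k, n, p0) for k in range(x, n + 1))
    lower_tail = sum(binomial_pmf(k, n, p0) for k in range(0, x + 1))
    return min(1.0, 2 * min(upper_tail, lower_tail))


# Hypothetical data: 14 successes out of 18 trials, null expectation 0.5.
p_value = binomial_test_two_sided(14, 18, 0.5)
print(round(p_value, 3))
```

If the resulting P-value is below the chosen significance level (commonly 0.05), H0 is rejected.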
The standard deviation of the sampling distribution for an estimate is known as the standard
error of that estimate.
A confidence interval is the range of most-plausible values of the parameter we are trying to
estimate, based on the data. If the 95% confidence interval of a proportion is recalculated
from many new random samples, it will enclose the true value of the proportion in about 95%
of those samples.
The most commonly used method to determine a confidence interval for a proportion is
called the Wald method.
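A minimal sketch of the Wald interval, assuming its usual form p̂ ± 1.96 × SE with SE = sqrt(p̂(1 − p̂)/n). The counts (30 successes out of 100) are hypothetical.

```python
from math import sqrt


def wald_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval for a proportion (z = 1.96 gives 95%)."""
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se


# Hypothetical data: 30 successes in a sample of 100.
low, high = wald_ci(30, 100)
print(round(low, 3), round(high, 3))
```

The Wald interval is an approximation. It can be inaccurate when n is small or when the proportion is close to zero or one, and its limits can even fall outside the range 0 to 1.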
Chapter 8, Fitting probability models to frequency data
A goodness-of-fit test is a method for comparing an observed frequency distribution with
the frequency distribution that would be expected under a simple probability model
governing the occurrence of different outcomes.
Under the proportional model, each day of the week should have the same probability of a
birth, that is, 1/7 (see Example 8.1). This is the simplest possible model, so it’s our null
hypothesis:
H0: The probability of birth is the same on every day of the week.
HA: The probability of birth is not the same on every day of the week.
The χ2 statistic measures the discrepancy between observed frequencies from the data and
expected frequencies from the null hypothesis. It’s important to notice that the χ2
calculations use the absolute frequencies (i.e., counts) for the observed and expected
frequencies, not proportions or relative frequencies. Using proportions in the calculation of
χ2 will give the wrong answer.
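The calculation can be sketched as below, assuming the standard formula χ2 = Σ (Observed − Expected)² / Expected, summed over all categories. The weekday counts are hypothetical and are not the data from the book's example.

```python
def chi_square_statistic(observed, expected_probs):
    """Return the chi-square statistic and the expected counts.

    observed: counts per category (absolute frequencies, not proportions).
    expected_probs: probability of each category under the null hypothesis.
    """
    total = sum(observed)
    expected = [total * p for p in expected_probs]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, expected


# Hypothetical numbers of births on each of the seven days of the week.
observed = [12, 18, 15, 20, 17, 10, 13]
chi2, expected = chi_square_statistic(observed, [1 / 7] * 7)
print(round(chi2, 2), [round(e, 1) for e in expected])
```

The expected counts are obtained by multiplying the null probabilities by the total sample size, so they need not be whole numbers.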
The number of degrees of freedom of a χ2 statistic specifies which χ2 distribution to use as
the null distribution.
A critical value is the value of a test statistic that marks the boundary of a specified area in
the tail (or tails) of the sampling distribution under H0. Because our observed χ2 value
(15.05) is greater than 12.59, the critical value for six degrees of freedom at a significance
level of 0.05 (i.e., it lies further out in the right tail of the distribution), χ2 values
of 15.05 or greater occur less than 5% of the time under the null hypothesis.
Therefore, our P-value must be less than 0.05, that is, P = Pr[χ2 ≥ 15.05] < 0.05 for df = 6,
so we reject the null hypothesis.
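The tail probabilities quoted above can be checked with a short calculation. The sketch below relies on a closed-form expression for the right-tail area of the χ2 distribution that holds only when the number of degrees of freedom is even, which is the case here (seven days of the week minus one gives df = 6). The function name is hypothetical; statistical software would normally be used instead.

```python
from math import exp, factorial


def chi2_sf_even_df(x: float, df: int) -> float:
    """Pr[chi-square >= x] for a positive, even number of degrees of freedom."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    return exp(-half) * sum(half**i / factorial(i) for i in range(df // 2))


# Area to the right of the critical value and of the observed statistic, df = 6.
print(round(chi2_sf_even_df(12.59, 6), 3))
print(round(chi2_sf_even_df(15.05, 6), 3))
```

The first value should come out close to 0.05, confirming that 12.59 marks off 5% of the right tail, and the second value is the P-value for the observed statistic.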
Assumptions of the χ2 goodness-of-fit test:
- Individuals in the data set are a random sample from the whole population (this
assumption applies to every test)
- None of the categories should have an expected frequency less than 1.
- No more than 20% of the categories should have expected frequencies less than 5.
If one of these conditions is not met, one option, if possible, is
to combine some of the categories having small expected frequencies to yield fewer
categories having larger expected frequencies (remember to change the degrees of freedom
accordingly).
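The two rules about expected frequencies listed above can be checked mechanically before running the test. The sketch below is one way to do so; the function name and the example values are hypothetical.

```python
def expected_frequencies_ok(expected) -> bool:
    """Check the expected-frequency rules of the chi-square goodness-of-fit test.

    Rule 1: no category has an expected frequency less than 1.
    Rule 2: no more than 20% of categories have expected frequencies less than 5.
    """
    n_below_1 = sum(e < 1 for e in expected)
    fraction_below_5 = sum(e < 5 for e in expected) / len(expected)
    return n_below_1 == 0 and fraction_below_5 <= 0.20


# Hypothetical expected counts for five categories.
print(expected_frequencies_ok([4, 10, 10, 10, 10]))    # one of five is below 5
print(expected_frequencies_ok([0.5, 10, 10, 10, 10]))  # one category is below 1
```

Note that the check applies to the expected frequencies, not to the observed counts.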
The Poisson distribution describes the number of successes in blocks of time or space, when
successes happen independently of each other and occur with equal probability at every
instant in time or point in space. Rejecting a null hypothesis of a Poisson distribution of
successes implies that successes are not independent or that the probability of a success
occurring is not constant over time or space.
One unusual property of the Poisson distribution is that the variance in the number of
successes per block of time (the square of the standard deviation) is equal to the mean (μ).
In an observed frequency distribution, if the variance is greater than the mean, then the
distribution is clumped. If the variance is less than the mean, successes are more evenly
distributed than expected by the Poisson distribution.
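A rough sketch of both ideas follows, assuming the standard Poisson formula Pr[X] = e^(−μ) μ^X / X! and the sample variance with n − 1 in the denominator. The counts per block are hypothetical.

```python
from math import exp, factorial


def poisson_pmf(x: int, mu: float) -> float:
    """Probability of exactly x successes in a block when the mean is mu."""
    return exp(-mu) * mu**x / factorial(x)


def variance_to_mean_ratio(counts) -> float:
    """Sample variance divided by sample mean of the counts per block."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return variance / mean


# Hypothetical numbers of successes in ten blocks of time or space.
clumped = [0, 0, 1, 0, 7, 0, 0, 6, 0, 1]
even = [1, 2, 1, 2, 1, 2, 2, 1, 2, 1]
print(round(variance_to_mean_ratio(clumped), 2))  # greater than 1: clumped
print(round(variance_to_mean_ratio(even), 2))     # less than 1: evenly spread
```

A ratio near one is what the Poisson distribution predicts. A formal conclusion still requires a goodness-of-fit test, because the ratio varies by chance from sample to sample.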
Chapter 9, Contingency analysis (associations between categorical variables).
Contingency analysis estimates and tests for an association between two or more
categorical variables. Contingency analysis allows us to determine whether, and to what
degree, two (or more) categorical variables are associated. In other words, a contingency
analysis helps us to decide whether the proportion of individuals falling into different
categories of a response variable is the same for all groups.
The odds ratio measures the magnitude of association between two categorical variables
when each variable has only two categories. One of the variables is the response variable—
let’s call its two categories “success” and “failure,” where success just refers to the focal
outcome of interest. The other variable is the explanatory variable, whose two categories
identify the two groups whose probability of success is being compared. The odds ratio
compares the proportion of successes and failures between the two groups.
The odds of success are the probability of success divided by the probability of failure.
The odds ratio is the odds of success in one group divided by the odds of success in a second
group.
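The two definitions above translate directly into a short calculation. In the sketch below the counts for the two groups are hypothetical, and the proportion of successes in each sample is used as the estimate of the probability of success.

```python
def odds(p: float) -> float:
    """Odds of success: probability of success divided by probability of failure."""
    return p / (1 - p)


def odds_ratio(success_1: int, failure_1: int, success_2: int, failure_2: int) -> float:
    """Odds of success in group 1 divided by the odds of success in group 2."""
    p1 = success_1 / (success_1 + failure_1)
    p2 = success_2 / (success_2 + failure_2)
    return odds(p1) / odds(p2)


# Hypothetical 2 x 2 table: group 1 has 30 successes and 70 failures,
# group 2 has 15 successes and 85 failures.
print(round(odds_ratio(30, 70, 15, 85), 2))
```

The same number is obtained from the cross-product of the table, (30 × 85) / (70 × 15).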
If the odds ratio is equal to one, then the odds of success in the response variable are
independent of treatment; the odds of success are the same for both groups. If the odds
ratio is greater than one, then the event has higher odds in the first group than in the
second group.