Summary Introduction to the Practice of Statistics (Extended Version) - Statistics
Course: Statistics
Institution: Vrije Universiteit Amsterdam (VU)
Book: Introduction to the Practice of Statistics
This summary, based on the classic statistics textbook 'Introduction to the Practice of Statistics', helps students to correctly produce and interpret data found in a real-world context. The summary can be seen as a guide through the different types of data gathering and analysis. U...
Written for: Vrije Universiteit Amsterdam (VU), Master's Global Health, Statistics
Seller: Myrtevdbergh
Statistics Exam Notes
CHAPTER 1 - Looking at Data Distributions
Terms:
- cases: the objects described by a set of data → usually people in global health, but could also be villages, tractors, etc.
- variable: a characteristic of a case → e.g. height
- value: the specific value a variable takes for a particular case; different cases have different values of a variable → e.g. the height in cm
- label (unique ID): a variable used to distinguish or uniquely identify the cases within the dataset → e.g. a participant ID number
- the key characteristics of a data set answer the questions: who, what and why?
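To make the terms concrete, here is a minimal sketch of a small data set in plain Python. The data, the field names (`participant_id`, `village`, `age`, `height_cm`) and the check for unique labels are all hypothetical and only illustrate how cases, variables, values and a label relate.

```python
# Hypothetical data set: each dict is one case (a person in a health survey).
# "participant_id" is the label; "village", "age" and "height_cm" are variables.
cases = [
    {"participant_id": "P001", "village": "A", "age": 34, "height_cm": 168.0},
    {"participant_id": "P002", "village": "B", "age": 29, "height_cm": 175.5},
    {"participant_id": "P003", "village": "A", "age": 41, "height_cm": 160.2},
]

# The variables are the characteristics recorded for every case (label excluded).
variables = [key for key in cases[0] if key != "participant_id"]

# A label must uniquely identify each case.
ids = [case["participant_id"] for case in cases]
assert len(ids) == len(set(ids)), "labels must be unique"

# The value of the variable "height_cm" for case P002:
height_p002 = next(c["height_cm"] for c in cases if c["participant_id"] == "P002")

print(variables)    # ['village', 'age', 'height_cm']
print(height_p002)  # 175.5
```

Two cases may share a value of "village", which is why a variable like that cannot serve as a label.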
Examining distributions:
- overall pattern:
● shape: e.g. normally distributed
● center
● spread
- deviations from the overall pattern, e.g. outliers
- symmetry: symmetric, skewed to the left, or skewed to the right
● In statistics, a negatively skewed (also known as left-skewed) distribution is one in which most values are concentrated on the right side of the graph, while the left tail of the distribution is longer.
Measuring center:
1. The mean
- symbolized by x̄
- sensitive to outliers and skew
2. The median
- represented by M
- midpoint of a distribution
● half of the observations are smaller, the other half larger
- resistant to outliers and skew
- if the number of observations is even, there are two numbers in the middle → take their average: e.g. 3, 4 → M = 3.5
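The difference in sensitivity to outliers can be checked with Python's standard `statistics` module. A minimal sketch; the observations are hypothetical, and adding one extreme value also makes n even, so the median becomes the average of the two middle numbers.

```python
from statistics import mean, median

# Hypothetical observations, before and after adding one extreme value.
data = [2, 3, 4, 5, 6]
with_outlier = data + [60]

print(f"mean={mean(data):.2f}, median={median(data):.2f}")
# mean=4.00, median=4.00
print(f"mean={mean(with_outlier):.2f}, median={median(with_outlier):.2f}")
# mean=13.33, median=4.50
```

The single outlier more than triples the mean, while the median only moves from 4 to 4.5.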
Measuring spread: the quartiles
● works with the median (not the mean)
● splitting data into quartiles means splitting the ordered data into 4 parts, each holding about a quarter of the observations
● the median splits the data into 2 halves; the first quartile Q1 is the median of the observations below the overall median, and the third quartile Q3 is the median of the observations above it
● IQR (interquartile range) = Q3 - Q1
● 1.5 x IQR rule for identifying outliers: an observation is a suspected outlier if it falls more than 1.5 x IQR above Q3 or below Q1
- in other words, any value greater than Q3 + 1.5 x IQR, or smaller than Q1 - 1.5 x IQR, is flagged as an outlier.
● Five-number summary, in order: minimum - quartile 1 - median/quartile 2 - quartile 3 - maximum
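The quartiles, the five-number summary and the 1.5 x IQR rule above can be sketched as follows. This assumes the "median of each half" method described in these notes (for odd n the overall median is left out of both halves); software such as `statistics.quantiles` or NumPy uses other interpolation rules by default, so their quartiles can differ slightly. The function names and the data are hypothetical.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-each-half method.
    For odd n the overall median is excluded from both halves.
    Needs at least 2 observations."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return median(lower), median(xs), median(upper)

def five_number_summary(data):
    """Minimum, Q1, median, Q3, maximum."""
    q1, m, q3 = quartiles(data)
    return min(data), q1, m, q3, max(data)

def outliers_15_iqr(data):
    """Values more than 1.5 x IQR below Q1 or above Q3."""
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]

scores = [1, 3, 4, 5, 7, 8, 9, 10, 30]  # hypothetical data, n = 9
print(five_number_summary(scores))  # (1, 3.5, 7, 9.5, 30)
print(outliers_15_iqr(scores))      # [30]
```

Here IQR = 9.5 - 3.5 = 6, so the fences are 3.5 - 9 = -5.5 and 9.5 + 9 = 18.5, and only 30 falls outside them.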
Boxplots: a graph of the five-number summary
Measuring spread: the standard deviation
- works with the mean (not the median)
- symbolized by s (or sₓ); its square s² is the variance
- roughly the average distance of the observations from the mean; formally s = √( Σ(xᵢ − x̄)² / (n − 1) )
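The formula for s can be written out directly and compared with `statistics.stdev`, which also divides by n − 1. A minimal sketch with hypothetical data; `sample_sd` is an illustrative helper, not a library function.

```python
from math import sqrt
from statistics import mean, stdev

def sample_sd(data):
    """Standard deviation s: square root of the sum of squared deviations / (n - 1)."""
    n = len(data)
    x_bar = mean(data)
    squared_deviations = [(x - x_bar) ** 2 for x in data]
    return sqrt(sum(squared_deviations) / (n - 1))

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations, mean 5
print(round(sample_sd(values), 3))  # 2.138
print(round(stdev(values), 3))      # 2.138
```

The squared deviations sum to 32, so s = √(32/7) ≈ 2.138. Dividing by n instead (`statistics.pstdev`) would give exactly 2.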
Choosing measures of center and spread:
NOTE: The median and IQR are usually better than the mean and standard deviation for
describing a skewed distribution or a distribution with outliers.
→ use mean and standard deviation only for reasonably symmetric distributions that
do not have outliers
Models
A model: a simplified representation of something more complex that helps us understand it
1. density curve:
- smooth curve drawn over the distribution
- it is a model of the distribution
- it is a model of what value the variable takes and how often
- a curve that is always on or above the horizontal axis and has an area of exactly 1 underneath it is a density curve
Area under the curve:
● total area under a density curve is 1
● EXAMPLE: in a density curve modelling the vocabulary scores of 947 seventh graders, the shaded area to the left of 6 (scores below 6) equals 0.293 → how to interpret? About 29.3% of the 947 seventh graders have a vocabulary score below 6.
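The link between area and proportion can be checked numerically. A minimal sketch: it assumes a hypothetical Normal density with mean 7 and standard deviation 2 as the model (these are NOT the textbook's values for the vocabulary data) and approximates areas with a simple midpoint sum; `area` is an illustrative helper.

```python
from statistics import NormalDist

# Hypothetical density-curve model for a set of scores.
model = NormalDist(mu=7, sigma=2)

def area(pdf, a, b, steps=20_000):
    """Approximate the area under pdf between a and b (midpoint rule)."""
    width = (b - a) / steps
    return sum(pdf(a + (i + 0.5) * width) for i in range(steps)) * width

# Integrate from 10 standard deviations below the mean (effectively minus infinity).
total = area(model.pdf, 7 - 10 * 2, 7 + 10 * 2)  # total area, should be ~1
below_6 = area(model.pdf, 7 - 10 * 2, 6)         # proportion of scores below 6

print(round(total, 4), round(below_6, 4))  # 1.0 0.3085
print(round(model.cdf(6), 4))              # 0.3085
```

The cumulative distribution function `cdf` gives the area to the left of a value directly, which is why it agrees with the numerical sum.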
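Greek letters
● When the mean and standard deviation describe a model of the data (a density curve) rather than the observed data, Greek letters are used: μ (mu) for the mean and σ (sigma) for the standard deviation, instead of x̄ and s.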
Normal density curve:
- mathematical model for normally distributed data
- symmetric, single-peaked, and bell-shaped
- completely described by two numbers: μ (mean) and σ (standard deviation)
- notation: N(μ, σ)
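For reference, the Normal density curve N(μ, σ) is given by the standard formula below (not written out in these notes); μ only shifts the curve along the x-axis and σ controls how spread out it is.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}\right)
```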
The 68-95-99.7 rule
In the Normal distribution with mean μ and standard deviation σ:
- approximately 68% of the observations fall within 1σ of μ
- approximately 95% of the observations fall within 2σ of μ
- approximately 99.7% of the observations fall within 3σ of μ
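The three percentages can be reproduced with `statistics.NormalDist` from the Python standard library. A minimal sketch; the parameters μ = 100 and σ = 15 are hypothetical, and any other values give the same proportions. `proportion_within` is an illustrative helper.

```python
from statistics import NormalDist

def proportion_within(mu, sigma, k):
    """Proportion of a N(mu, sigma) distribution within k standard deviations of mu."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

# Hypothetical parameters; the result does not depend on mu or sigma.
for k in (1, 2, 3):
    print(k, round(proportion_within(100, 15, k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```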
Standard normal distribution
● N(0, 1): mean 0, standard deviation 1
● Simply easier to work with
● All Normal distributions can be transformed (standardized) to N(0, 1)
--> the standardized value of x, or z-score, is z = (x − μ)/σ; standard Normal probabilities for z then give the probabilities for x
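Standardizing can be sketched in a few lines. The model N(64.5, 2.5) and the value x = 68 are hypothetical, and `z_score` is an illustrative helper; the point is that the proportion below x in N(μ, σ) equals the proportion below z in N(0, 1).

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standardize x: number of standard deviations x lies above (+) or below (-) mu."""
    return (x - mu) / sigma

# Hypothetical model N(64.5, 2.5) and observation x = 68.
mu, sigma = 64.5, 2.5
x = 68
z = z_score(x, mu, sigma)
print(z)  # 1.4

# Same proportion below x in N(mu, sigma) as below z in N(0, 1).
print(round(NormalDist(mu, sigma).cdf(x), 4))  # 0.9192
print(round(NormalDist().cdf(z), 4))           # 0.9192
```

So an observation of 68 lies 1.4 standard deviations above the mean, and about 91.9% of the distribution falls below it.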