Lecture 1: Measurement, scaling and norms
Psychological construct (not observable = latent variable) -> observable behaviour
(operational definitions)
Degree of depression -> response to item
Measuring psychological attributes
- Observable behaviour is sensitive to psychological construct
- Determined in systematic way (response to test item)
- For the purpose of making comparisons
1. Intra-individual differences (within one person, over time)
2. Inter-individual differences (different people)
Scaling
= the way numerical values are assigned to psychological attributes
- More practical: how is a test score or category determined from the
observations?
- 4 scales of measurement (nominal, ordinal, interval, and ratio)
Interpretation of test scores
- Test: systematic behavioural sample
- Scaling: assigning quantitative test score
- Norming: interpretation of test scores
Norming
- Distribution of test scores
- Standard score Z: $Z = (X - \bar{X}) / s_X$
1. Number of standard deviations from the mean
2. Positive and negative values
3. Mean = 0, SD = 1
- Converted standard score TX (mean = 50, SD = 10)
$T_X = 10 \times Z_X + 50$
- Percentile ranks PX: percentage of scores below or equal to a specific test score ->
for X = 10, PX is the % of people with score ≤ 10
PX = (number of scores ≤ X / total number of scores) x 100 -> look up cumulative PX value in output table
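The three norming scores above can be sketched in a few lines of Python. This is a minimal illustration, not the course's own procedure: the function names and the example scores are made up, and the population SD (`pstdev`) is an assumption, since the notes do not say whether sx is the population or the sample standard deviation.

```python
from statistics import mean, pstdev

def z_scores(scores):
    """Standard scores: distance from the mean in SD units."""
    m = mean(scores)
    s = pstdev(scores)  # assumption: population SD; swap in statistics.stdev for the sample SD
    return [(x - m) / s for x in scores]

def t_scores(scores):
    """Converted standard scores: T = 10 * Z + 50."""
    return [10 * z + 50 for z in z_scores(scores)]

def percentile_rank(scores, x):
    """Percentage of scores that are less than or equal to x."""
    at_or_below = sum(1 for s in scores if s <= x)
    return 100 * at_or_below / len(scores)

# Hypothetical test scores, for illustration only.
scores = [4, 6, 8, 10, 12]

print(z_scores(scores))
print(t_scores(scores))
print(percentile_rank(scores, 10))
```

Whatever the raw scale, the Z scores end up with mean 0 and SD 1, and the T scores with mean 50 and SD 10, which is what makes scores from different tests comparable.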
Lecture 2: Reliability
2.1
Reliability = the extent to which differences in test scores are a function of real individual
differences (true scores), and the extent to which a test is free of random errors
Validity = the extent to which the test measures what it’s intended to measure, and the
extent to which the test is free of systematic errors
Classical test theory:
= for every subject, the observed score is the sum of the true score and random error
True score: not directly observable (= latent variable -> must be estimated)
Error: difference between observed and true score (Xe = Xo – Xt). Can be positive or
negative and is also a latent variable
Assumptions:
1. µe = 0 -> mean error in population is zero (no systematic over- or underestimation of
true scores for population as a whole)
2. ret = 0 -> errors are completely uncorrelated with true scores (no systematic over- or
underestimation of true scores in subpopulations)
3. reiej = 0 -> errors are completely uncorrelated with each other (error of subject 1 says
nothing about subject 2 etc.)
Variance:
Variance of Xo as a composite variable (Xo = Xt + Xe)
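In general, the variance of a composite is $s_o^2 = s_t^2 + s_e^2 + 2 r_{et} s_t s_e$.
Because of assumption 2 (ret = 0) the last term drops out, so variance of observed scores is
equal to true score variance plus error variance:
$s_o^2 = s_t^2 + s_e^2$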
Reliability coefficient
= (Rxx) proportion of variance of observed scores explained by true scores
$R_{xx} = s_t^2 / s_o^2$
If CTT assumptions are valid, then in all cases 0 ≤ Rxx ≤ 1
Proportion of explained variance is a squared correlation, therefore alternative definition of
reliability is squared correlation of observed scores with true scores:
$R_{xx} = r_{ot}^2$
Unsquared correlation rot is called reliability index (not used often)
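Standard error of measurement (two ways):
1. As the standard deviation of the error scores: $s_{em} = s_e$
2. From the observed SD and the reliability: $s_{em} = s_o \sqrt{1 - R_{xx}}$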
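A small simulation can make the classical test theory quantities concrete. The sketch below is only an illustration under stated assumptions: true scores and errors are drawn from normal distributions with made-up SDs (10 and 5), and all variable names are hypothetical. In real data the true scores are latent, so none of this can be computed directly; that is why reliability has to be estimated.

```python
import random
from math import sqrt
from statistics import mean, pvariance

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ssx * ssy)

random.seed(0)  # fixed seed so the run is repeatable
n = 20000       # hypothetical number of subjects

# Assumed values for illustration: true-score SD 10, error SD 5.
true = [random.gauss(50, 10) for _ in range(n)]
error = [random.gauss(0, 5) for _ in range(n)]   # mean error 0, drawn independently of true scores
observed = [t + e for t, e in zip(true, error)]  # Xo = Xt + Xe

var_t = pvariance(true)
var_e = pvariance(error)
var_o = pvariance(observed)

rxx = var_t / var_o                # reliability coefficient: true variance / observed variance
r_ot = pearson(observed, true)     # reliability index: correlation of observed with true scores
sem = sqrt(var_o) * sqrt(1 - rxx)  # standard error of measurement

print("observed variance:", var_o, "true + error variance:", var_t + var_e)
print("Rxx:", rxx, "squared reliability index:", r_ot ** 2)
print("SEM:", sem, "SD of errors:", sqrt(var_e))
```

With these assumed SDs the theoretical reliability is 100 / 125 = 0.8 and the theoretical SEM is 5; the simulated values should land close to those, with small deviations because the sample correlation between true scores and errors is never exactly zero.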
2.2 Estimating reliability
Parallel measurements
1. alternate forms: two different tests for the same construct
2. test-retest: same test at two different times
3. split-half: two parallel half-tests
Alternate forms:
Requirement: two measurements must be parallel, they should:
- measure exactly the same true scores
- have identical error variances
Consequences:
- identical observed variances: $s_{o1}^2 = s_{o2}^2$
- identical correlations with true score: $r_{o1t} = r_{o2t}$
Problems:
1. Are tests really parallel? Never certain
Partial solutions: Domain sampling & consequences of parallelness
2. Carry-over effects (taking test 1 can influence results of test 2 -> can lead to
correlation between tests being too high -> overestimation of reliability)
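Why the correlation between two parallel forms can serve as a reliability estimate can be sketched with simulated data. The example below assumes perfectly parallel forms (same true scores, independent errors with equal variance) and no carry-over effects, which is exactly the situation the problems above say is never guaranteed in practice. The SDs and names are made up for illustration.

```python
import random
from math import sqrt
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ssx * ssy)

random.seed(1)  # fixed seed so the run is repeatable
n = 20000       # hypothetical number of subjects

# Assumed values: true-score SD 10, error SD 5 on both forms,
# so the theoretical reliability of each form is 100 / 125 = 0.8.
true = [random.gauss(50, 10) for _ in range(n)]
form_a = [t + random.gauss(0, 5) for t in true]  # same true scores, own random errors
form_b = [t + random.gauss(0, 5) for t in true]  # same error variance, errors independent of form A

r_ab = pearson(form_a, form_b)  # correlation between the two forms
print("correlation between forms:", r_ab)
```

Under these ideal conditions the correlation between the forms comes out close to the theoretical reliability. If the errors of the two forms were positively correlated, as carry-over effects can cause, the same correlation would overestimate reliability.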
Test-retest
= parallelness is more plausible with test-retest than with alternate forms. Shouldn’t a test be
parallel to itself? Yes, but..
- People change: lower rxy -> underestimation of reliability -> short time between
test and retest!
- Carry-over effects: perhaps even stronger than with parallel tests -> can lead to
correlated errors or change in error variance (over- or underestimation) -> long
time between test and retest!
Split-half (from half-test to total test)
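= correlation between (parallel) half-tests -> reliability of half-tests, but we want reliability for
whole test, so..
Spearman-Brown formula
= gives effect on reliability of lengthening (or shortening) the test
“What would be the reliability of the lengthened test if a test with known reliability is made
n times as long?”
$R_{new} = \dfrac{n R_{xx}}{1 + (n - 1) R_{xx}}$
Rxx = known reliability (for split-half: the correlation between the two half-tests)
n = factor by which the test is lengthened, i.e. the number of test halves (n = 2 for split-half)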
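The Spearman-Brown correction is simple enough to sketch as a one-line function: the new reliability is n times the known reliability, divided by 1 plus (n - 1) times the known reliability. The function name and the example correlations below are hypothetical.

```python
def spearman_brown(r, n):
    """Predicted reliability when a test with reliability r is made n times as long.

    n > 1 lengthens the test, 0 < n < 1 shortens it.
    Assumes the added (or removed) parts are parallel to the existing ones.
    """
    return n * r / (1 + (n - 1) * r)

# Split-half case: hypothetical correlation of 0.60 between two half-tests,
# stepped up to the full test length (n = 2).
full_test = spearman_brown(0.60, 2)
print(full_test)

# Shortening: a test with hypothetical reliability 0.80 cut to half its length.
half_test = spearman_brown(0.80, 0.5)
print(half_test)
```

Lengthening always raises the predicted reliability and shortening lowers it, but with diminishing returns: each doubling adds less than the one before.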
Problems:
- Parallelness
- Many splits are possible
Limited solutions:
- Most parallel half-tests
- Parallel item-pairs
- Evaluation of solution (split-half is not used often)