Summary Test Construction - A Conceptual Introduction to Psychometrics, ISBN: 9789490947293 (PSMM-6)
Summary PSMM-6 Test Construction (A conceptual introduction to psychometrics by Gideon J. Mellenbergh)
Written for
Rijksuniversiteit Groningen (RuG)
Psychologie
Test construction (PSMM6)
taravanderveen
Summary Test Construction 2024
Chapter 1: introduction
Psychological and educational tests
- Test construction - development and application.
- What does the test look like?
- Instructions for administration, scoring and interpretation.
- Actual administrations of tests: what information does the test give, how useful is this information, and for whom (individuals, policy)?
When you want to develop (construct) a test, you need to think about the points above: what does the test look like, and what do I want to measure?
- Test theory – statistical theory about behaviour of item scores and test scores.
- Examples: classical test theory, item response theory
- Important issues: quantitative measures for the quality of items and tests for target groups of respondents.
You cannot develop a test without test theory: are the items correct indicators of the construct you want to measure?
Both test construction and test theory are needed for a sensible use of tests.
Use of tests: in practice
- Human Resource management: personnel selection and development
- Education: development and performance of students: identify deviating patterns of development by means of
pupil assessment system, prediction of most suitable type of high school at the end of primary school
- Psychodiagnostics – Neuropsychology, clinical psychology, developmental psychology
Judgments on individuals. Characterization.
Use of tests: in research
- Testing of hypotheses and theories; theory building
- e.g., ‘Location and size of brain damage determines type and severity of behavioural difficulties in the long term’
- Variables: indicators of location and size of brain damage; behavioural difficulties (e.g., anxiety, aggression, childish behaviour, apathy, lack of insight)
Judgments on populations. Group level. More often in research.
Test definitions:
A psychological or educational test is an instrument for the measurement of a person’s maximum or typical
performance under standardized conditions, where the performance is assumed to reflect one or more latent attributes.
Important aspects:
1. Measurement instrument
A test is used for measurement. Other uses are considered applications of the measurement, e.g., predicting job success using an intelligence test: the test measures the level of intelligence, and the prediction is an application.
2. Test types.
Typical performance test:
The responses to the task are typical for the person (they typify the person); there are no correct answers. E.g., personality, attitude, mental health. Three main types:
- Personality test (questionnaires): measure personality characteristics of a person.
- Interest inventories: measure a person's interest in teaching, gardening, etc.
- Attitude questionnaires: measure a person's attitude towards a certain topic.
Maximum performance test:
The responses to the task are the best a person can do (the person’s achievement), e.g., intelligence, ability level.
Distinctions:
- pure power test: focus on accuracy, no time limit. The test taker tries to solve problems.
- time-limited power test: focus on accuracy, but time is limited.
- speed test: focus on the time taken to solve problems.
Another distinction, based on which attributes they measure:
- ability test: measures a person’s best performance in an area that is not taught through education or training.
- achievement test: measures a person's best performance in an area that is taught through education or training.
3. Standardization
- Test conditions are fixed: e.g., test material, instructions, administration procedure, score computing are the same
for every testee. User manual for every test administrator, so that they do the exact same thing for every person.
- Aim: to ensure comparability of test performances between persons and test occasions. If two people have
different instructions, you cannot compare the test scores with each other.
- Difficult to achieve perfect standardization.
- The specific aspects to standardize depend on, for example, the test or the target population.
4. Latent attribute
Attribute that cannot be measured directly: e.g., verbal ability, arithmetic skills, severity of depression. This is why we need a test. We come up with items that are indicators of the construct we would like to measure.
- The test score (S) should reflect the latent attribute of interest (T; true score). You cannot know the true score for sure, but you would like to approximate it as closely as possible.
- There is a causal relationship between the attribute (true score) and the test score (S).
- Thus: if two persons differ on the attribute, their test scores differ as well, and the other way around.
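The relation between true score (T) and test score (S) can be illustrated with a small simulation. This is only a sketch: it assumes the additive model of classical test theory (observed score = true score + random error), which these notes mention but do not write out. The error term, its normal distribution, and the function name `simulate_test_scores` are assumptions made for illustration.

```python
import random


def simulate_test_scores(true_scores, error_sd, seed=0):
    """Return observed scores S = T + E for each true score T.

    Assumption (classical test theory): E is random measurement error,
    drawn here from a normal distribution with mean 0 and sd `error_sd`.
    """
    rng = random.Random(seed)
    return [t + rng.gauss(0.0, error_sd) for t in true_scores]


# Hypothetical true scores of three persons (unknowable in practice).
true_scores = [10.0, 15.0, 20.0]

# Without measurement error, the test score equals the true score.
noise_free = simulate_test_scores(true_scores, error_sd=0.0)

# With measurement error, the test score only approximates the true score.
noisy = simulate_test_scores(true_scores, error_sd=2.0)

print("true scores :", true_scores)
print("S (no error):", noise_free)
print("S (error)   :", noisy)
```

The smaller the error, the more closely differences in S follow differences in T.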
Some important terminology
- Item: the smallest test unit on which a person is scored; the score can be the same as the person’s response. An item is just a question or task in a questionnaire.
- Subtest (also denoted as subscale, or just scale): independent part of a test, indicative of an attribute, consisting of various items. A subtest is a cluster of multiple items.
- Dimensionality: the number of latent attributes (variables) that affect test performance, i.e., how many latent attributes the test measures.
- unidimensional (1 latent attribute influences performance) vs multidimensional test (more than 1).
- Two types of testing situations: one where the test taker is the same person as the one who is measured (often a self-report questionnaire), and one where the test taker is someone other than the person who is measured (observation test).
Chapter 2: Developing maximum performance tests. / Chapter 3: Developing typical performance tests.
The test construction process always follows these steps:
1. Define the construct of interest:
Specify the latent variable that needs to be measured by the test. This is also called the construct. The construct can, for example, be defined using a literature search. The test developer can choose among definitions of the construct, depending on the purpose of the test.
Constructs for Maximum Performance tests:
- Mental abilities to psychomotor skills and physical abilities.
- Scope (domain).
- Educational to psychological variables.
Constructs for Typical performance tests:
- Attitudes
- Interests
- Values
- Opinions
- Personality characteristics.
2. Develop the test:
Essential aspects:
1. Measurement mode of the test:
Maximum performance test:
- self-performance mode: somebody performs a mental or physical task.
- self-evaluation mode: ask somebody how good they are in a certain task.
- other-evaluation mode: ask others to evaluate a person's ability to perform a certain task.
Typical performance test:
- Self-report: the test taker answers questions on a typical performance construct.
- Other-report: a person answers questions about another person’s construct.
- Somatic indicators: uses somatic signs to measure constructs (e.g., anxiety through galvanic skin response).
- Physical traces: uses traces that are left behind to measure constructs.
The typical performance modes can occur in two varieties (this does not apply to maximum performance tests, because there test takers are assumed to perform at their best):
o Reactive: when test takers can deliberately distort their construct value (e.g., an unmotivated person pretends to be highly motivated in a self-report).
o Nonreactive: when test takers cannot do the above.
Think about: how would you like to administer the test? How is it filled out? Who fills it out?
2. Objectives of the test
The test developer must specify objectives of the test, what is the test used for? (same for Maximum and Typical).
- research (study human intellectual functioning) vs practice (select job applicant).
- Individual (select applicant) or group level (compare scores of groups of students).
- description vs diagnosis vs decision-making
- to describe performances (description). Just describe performance, monitor the skills.
- to add a conclusion to a description (diagnosis). Add a conclusion to the test score.
- to make decisions based on tests (decision-making). Based on the conclusion, a program is used.
3. Population and subpopulations of testees
- Target population: the set of persons to whom the test has to be applied. The group that you make the test for
(target group) could be different from your norm group (people without a problem that needs to be measured).
You must define this:
- Be as specific as possible in describing for whom the test is meant. Not just “Dutch people”, because not all ages can take it.
- Inclusion and exclusion criteria: who is included or excluded?
- A definition that is too broad has implications for the norm groups and their representativeness.
4. Conceptual framework of the test
The conceptual framework gives the item writer a handle to write items. More specific than just definition; it helps to
write items. Based on the description of the framework. How are topics related to each other?
For Typical performance: three broad classes of strategies.
- Intuitive class: the relation between construct and items is of an intuitive nature. Just think about the construct, see what comes to mind, and write the items.
o Rational method: loose description based on expert or target population knowledge.
o Prototypical method: ask people from your target population what is prototypical for this construct.
- Deductive class: start from theoretical or conceptual notions of the construct. Start from theory and write items based on that.
o Construct method: use of a theoretical framework (e.g. Koster et al). The construct is embedded in a network with other constructs, and items are written based on this framework.
o Facet design method: conceptual analysis of the construct. Observe behavior that applies to the construct; the behavior is classified according to aspects (facets), and these contain a number of elements (facet elements). Based on that, items can be written.
- Inductive class: the constructs to be measured cannot be defined beforehand, but are identified using empirical data (i.e., association measures). Often done with personality.
o Internal method: associations among items. Which items cluster together? Take them together. Which construct describes the association between the items? (factor analysis?)
o External method: associations between items and an external criterion (predictive validity). How well do the items associate with the criterion?
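As a rough illustration of the internal method, the sketch below computes Pearson correlations between item scores to see which items go together. The item names and scores are made-up illustration data, and using plain pairwise correlations (rather than, say, a factor analysis) is an assumption chosen to keep the example short.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of item scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical scores of five respondents on three items.
item_scores = {
    "item_a": [1, 2, 3, 4, 5],
    "item_b": [2, 3, 3, 5, 5],
    "item_c": [5, 4, 3, 2, 1],
}

# Print the correlation for every pair of items.
names = list(item_scores)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        r = pearson(item_scores[first], item_scores[second])
        print(f"{first} vs {second}: r = {r:.2f}")
```

In this made-up data, item_a and item_b correlate strongly and positively, so they would be candidates for the same cluster, while item_c runs in the opposite direction.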
5. Item response mode
How can people answer? This needs to be specified before item writing starts. Main distinction:
- Free response mode: no answer options are given; you need to fill in the answer yourself.
Short-answer items:
Essay items:
- Choice response mode: multiple choice.
Conventional multiple-choice items: consist of a stem and two or more options. The options are divided into one correct answer and one or more distractors. Distractors can be used to show that the person lacks certain knowledge about a subskill of the question.
Frequently used scales:
- Dichotomous = binary: e.g., yes/no, true/false, correct/incorrect - typically coded as 0, 1.
- Ordinal polytomous: e.g., never/sometimes/often.
- Partial ordinal polytomous: the correct option is ordered above the distractors, but the distractors are not ordered among themselves.
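The coding of these scales can be sketched as follows. Only the 0/1 coding of dichotomous items comes from the notes; the codes 0, 1, 2 for never/sometimes/often and the helper names are assumptions made for illustration.

```python
# Codebooks mapping response categories to numeric item scores.
DICHOTOMOUS_CODES = {"incorrect": 0, "correct": 1}
# Assumed coding: consecutive integers that preserve the category order.
ORDINAL_CODES = {"never": 0, "sometimes": 1, "often": 2}


def code_responses(responses, codebook):
    """Translate raw response labels into numeric item scores."""
    return [codebook[response] for response in responses]


def score_multiple_choice(chosen_options, answer_key):
    """Score each multiple-choice item 1 if the chosen option matches the key, else 0."""
    return [1 if chosen == key else 0
            for chosen, key in zip(chosen_options, answer_key)]


ordinal_scores = code_responses(["never", "often", "sometimes"], ORDINAL_CODES)
mc_scores = score_multiple_choice(["A", "C", "B"], ["A", "B", "B"])

print(ordinal_scores)  # [0, 2, 1]
print(mc_scores)       # [1, 0, 1]
```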
6. Administration mode
- Oral: test is presented orally by a single test administrator to a single test taker.
- Paper-and-pencil: test is presented in a booklet; test can be administered in groups. Same order for every person.
- Computerized: same as above, but on a computer instead of on paper.
- Computerized adaptive test administration: the computer program searches for the items that best fit the test taker.
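One simple way to picture how an adaptive test "searches for the items that best fit the test taker" is sketched below. It assumes each item has a known difficulty on the same scale as the ability estimate, and that the best-fitting item is the unused one whose difficulty is closest to the current estimate. This selection rule, the function name, and the item bank are illustrative assumptions, not the procedure described in the textbook.

```python
def select_next_item(ability_estimate, item_difficulties, administered):
    """Pick the not-yet-administered item whose difficulty is closest to the ability estimate.

    item_difficulties: dict mapping item id -> difficulty (hypothetical values).
    administered: set of item ids that were already presented.
    Returns None when every item has been used.
    """
    candidates = [item for item in item_difficulties if item not in administered]
    if not candidates:
        return None
    return min(candidates,
               key=lambda item: abs(item_difficulties[item] - ability_estimate))


# Hypothetical item bank with difficulties on an arbitrary scale.
item_bank = {"item1": -1.0, "item2": 0.0, "item3": 1.5}

first = select_next_item(0.2, item_bank, administered=set())
second = select_next_item(0.2, item_bank, administered={"item2"})

print(first)   # item2
print(second)  # item1
```

In a real adaptive test the ability estimate would be updated after every response, so the next selected item changes as the test proceeds.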
7. Item writing
When you have all this information, you can write the items. There are guidelines for this.
Maximum performance test:
- Focus on one relevant aspect, so on one part of the construct.
- Use independent item content, one item should not influence the other.
- Avoid overly specific and overly general content.
- Avoid items that deliberately deceive test takers, so deliberately take their attention away from the problem they
have to solve.
- Keep vocabulary simple for the population of test takers.
- Put item options vertically.
- Minimize reading time and avoid unnecessary information.
- Use correct/ non-sensitive language: correct spelling and not offensive to target audience.
- Use a clear stem and include the central idea in the stem. So, if we need more than 1 option selected, make sure
that is clear.
- Word the item positively, avoid negatives.
- Use three options unless it is easy to write plausible distractors.
- Use one option that is unambiguously the correct or best answer.
- Place the options in alphabetical, logical or numerical order.
- Vary the location of correct options across the test.
- Keep options homogeneous in length, content, grammar etc.
- Avoid “all of the above” as the last answer. Could cause confusion.
- Make distractors plausible: they should look like a correct answer.
- Avoid giving clues to the correct option.
Typical performance tests:
- Elicit different answers at different construct positions: if people have different construct positions, they should give different answers.
- Focus on one aspect per item. No double-barrelled questions.
- Avoid making assumptions about the test takers.
- Correct language.
- Clear and comprehensible wording.
- Use non-sensitive language.
- Put the situational or conditional part of a statement at the beginning and the behavioral part at the end (e.g., “At examinations, I feel stressed”).
- Use positive statements.
- 5-7 categories in ordinal-polytomous scales.