Examining the validity of an analytic rating scale for a Spanish test
for academic purposes using the argument-based approach to
validation
Arturo Mendoza a,⁎, Ute Knoch b

a Department of Applied Linguistics, School of Languages, Linguistics and Translation, Universidad Nacional Autónoma de México, Circuito interior s/n, CP 04510, Mexico City, Mexico
b Director, Language Testing Research Centre, University of Melbourne, Parkville 3010, Victoria, Australia

⁎ Corresponding author at: Department of Applied Linguistics, School of Languages, Linguistics and Translation, Universidad Nacional Autónoma de México, Circuito interior s/n, CP 04510, Mexico City, Mexico.
E-mail addresses: a.mendoza@enallt.unam.mx (A. Mendoza), uknoch@unimelb.edu.au (U. Knoch).
ARTICLE INFO

Keywords: Analytic rating scales; Writing assessment for academic purposes; Argument-based approach to validation; Many-facet Rasch measurement

ABSTRACT

Rating scales are used to assess the performance of examinees presented with open-ended tasks. Drawing on an argument-based approach to validation, this study reports on the development of an analytic rating scale designed for a Spanish test for academic purposes. The study is one of the first that sets out the detailed scale development and validation activities for a rating scale for Spanish as a second language. The rating scale was grounded in a communicative competence model and developed and validated over two phases. The first version was trialed by five raters, and its quality was analyzed by means of many-facet Rasch measurement. Based on the raters' experience and on the statistical results, the rating scale was modified and a second version was trialed by six raters. After the rating process, raters were sent an online questionnaire in order to collect their opinions and perceptions of the rating scale, the training and the feedback provided during the rating process. The results suggest the rating scale was of good quality and raters' comments were generally positive, although they mentioned that more samples and training were needed. The study has implications for rating scale development and validation for languages other than English.
1. Introduction
Students wishing to study at a university where the medium of instruction is different from their mother tongue are often required to prove their proficiency by taking a language test for academic purposes. These tests are considered high-stakes because their results are used to make decisions with important consequences for students' lives (Bachman & Palmer, 2010; Kane, 2013). To guarantee that scores are fair, language tests must be carefully scrutinized and validated to ensure that the scores, and the interpretations based on them, are valid and fair. When examinees are presented with open-ended writing tasks, the scripts they produce are usually assessed by trained raters who use a rating scale to assign a score to the examinee's performance. Rating such performances is a complex undertaking. A score on such a writing test is not always purely a reflection of the writer's performance, but the outcome of the interaction between the rater, the rating scale and the script (Crusan, 2014; McNamara, 1996; Weigle, 2002). This interaction can introduce undesired sources of variability that threaten the reliability of the exam and its results (East, 2009). This is why rater training and monitoring are essential (Knoch, 2009, 2011; Weigle, 2002), as is studying the quality of the scoring process and
the scores (Montee & Malone, 2014).
While publications on rating processes, including scale development and rater functioning, are abundant in the assessment of English as a second or foreign language, very little has been written about similar endeavors in the assessment of other languages, for example Spanish (but see, e.g., Ducasse & Hill, 2015, for the development of a rating scale to assess the writing of Spanish-speaking graduate students). In this paper, we describe the development and validation of a rating scale for Spanish for academic purposes. This study is important as it sets out in detail the kind of procedures that other researchers involved in developing rating scales for languages other than English may want to follow or adapt. In particular, we argue that rating scales for languages other than English cannot simply be adapted from scales developed for English in such contexts, as there are clear differences in the languages and in the way second language ability develops. In the literature review that follows, we describe the existing literature on rating scale development and validation, the assessment of language for academic purposes more generally, and the assessment of Spanish for academic purposes. We then describe the context of the study and the current project in more detail.
1.1. Rating scale development and validation
The development and validation of rating scales for academic writing is no simple undertaking. Scales should be conceived and designed with the purpose of the assessment in mind (Crusan, 2014; Fulcher, 2010; Knoch, 2009; Montee & Malone, 2014; Weigle, 2002) and should be a good representation of the construct of the assessment (McNamara, 2002). In the Anglophone context, rating scales are often adapted or adopted from existing scales (Becker, 2011). For instance, in an academic setting, rating scales might be derived from those used in large-scale language tests for academic purposes. However, East (2009) cautions against the perils of adapting rating scales from existing similar ones, especially across languages, arguing that rating scales should take the target language into account.
Rating scale developers have a number of decisions to make in the development process, all of which have been described in detail in the literature. The type of rating scale selected (e.g., holistic, analytic, checklist) needs to closely reflect the purpose of the test (Crusan, 2014; Hamp-Lyons, 1991; Montee & Malone, 2014; Weigle, 2002) and the outcome reported to users (Knoch, 2009). The criteria in a scale are usually a reflection of the test construct and can either be based on a theory of language learning or development, or on a careful empirical analysis of written data produced by students (Fulcher, 2010; Knoch, 2009, 2011; Montee & Malone, 2014). Scale designers also need to ensure that a scale is not so context-dependent that it cannot be generalized to other testing contexts (Fulcher, 2010). Further decisions involve the number of band levels included in a scale (see, e.g., Alderson, Clapham, & Wall, 1995; Attali, Lewis, & Steier, 2012).
Rating scale validation is often not clearly articulated in scale development reports, which makes it difficult to conduct comparisons between studies or replication research. Scale validation projects are also rarely framed within a theoretical model of scale validation or of validation in assessment. A brief review of recently published scale development and validation studies in language assessment shows that very few of these studies were grounded within a theoretical model of validation (but see Deygers & Van Gorp, 2015; Janssen, Meier, & Trace, 2015; Knoch, 2009; Lallmamode, Daud, & Kassim, 2016; Youn, 2015). In a recent paper integrating rating processes into an argument-based framework for validation, Knoch and Chapelle (2017) put forward a range of warrants, assumptions and possible sources of backing, many of which are directly relevant to the validation of rating scales. Drawing on Kane's (2001, 2006, 2013) conceptualization of inferences, warrants and assumptions, they were able to show that rating processes are not only located within the evaluation inference, as commonly conceptualized, but have relevance throughout most inferences described in validation work. The warrants and assumptions relating to rating scales thus focus not only narrowly on the scoring inference (as previously conceptualized), but show that rating scales relate more broadly to all inferences in an argument-based approach to validation, including the explanation inference (which examines the theoretical construct underlying the test and the scale), as well as test consequences and decisions. Their framework provides a useful starting point for rating scale validation, and we draw on it to situate our validation work, as outlined in the description of the current study below. Due to the scope of this study, we focus only on parts of the evaluation and explanation inferences in this paper; however, in the final section of the paper we also provide suggestions for future work to broaden the validation activities. The specific warrants and assumptions for which we sought backing in this study are listed in Table 1 in the methodology section.
1.2. Language tests for academic purposes
Tests designed for academic purposes should authentically reflect the writing skills needed by students for academic success (Cumming, 2013, 2014). These skills vary from field to field, making the selection of writing tasks a difficult endeavor. Studies conducted in Anglophone contexts have shown the diversity of genres and writing tasks required of university students across academic disciplines (Canseco & Byrd, 1989; Cooper & Bikowski, 2007; Gardner & Nesi, 2012; Hale et al., 1996; Horowitz, 1986). Research has also been conducted with faculty members and students regarding the importance of different academic writing skills (Rosenfeld, Courtney, & Fowles, 2004; Rosenfeld, Leung, & Oltman, 2001), and these studies have highlighted skills such as paraphrasing and the ability to cite appropriately from a range of sources. For Spanish, such studies are few, but they largely reflect what has been found in Anglophone contexts (Castelló et al., 2012; Hernández & Castelló, 2014; Mendoza, 2014). Without a careful examination of the setting under assessment – in this case, academic writing – there is a risk of under-representing or ill-defining the construct (Cumming, 2014).