Automated Assessment with Multiple-choice Questions using Weighted
Answers
Francisco de Assis Zampirolli a,∗, Valério Ramos Batista b, Carla Rodriguez c,
Rafaela Vilela da Rocha d and Denise Goya e
Centro de Matemática, Computação e Cognição, Universidade Federal do ABC (UFABC),
09210-580, Santo André, SP, Brazil
a https://orcid.org/0000-0002-7707-1793, b https://orcid.org/0000-0002-8761-2450, c https://orcid.org/0000-0002-1522-3130,
d https://orcid.org/0000-0003-4573-3016, e https://orcid.org/0000-0003-0852-6456
∗ Grant #2018/23561-1, São Paulo Research Foundation (FAPESP)
Keywords: Automated Assessment, Multiple Choice Questions, Parametrized Quizzes.
Abstract: Multiple-choice questions have been used increasingly as a resource to assess people. However, in many cases
some test alternatives are wrong only because of a detail, and scoring nought for them can be counter-pedagogical.
For this reason, we propose an adaptation of the open-source system MCTest that considers weighted test
alternatives. The automatic correction is carried out by a spreadsheet that stores the students' responses and
compares them with the individual answer keys of the corresponding test issues. Applicable to exams either in
hardcopy or online, this study was validated with a total of 607 students from three different courses: Networks
& Communications, Nature of Information, and Compilers.
1 INTRODUCTION
In the case of multiple-choice questions, teachers and professors are expected to devote extra effort to
elaborating questions that fairly assess their students' competencies and skills. There are widely accepted
methods to evaluate and classify a large number of candidates, for instance the Item Response Theory
(Aybek and Demirtasli, 2017). As an example, let us consider the Brazilian National High School Exam
(ENEM), elaborated by Instituto Nacional de Estudos e Pesquisas Educacionais Anísio Teixeira (INEP).
In January 2021 ENEM had almost 5.8 million students enrolled for the classroom tests, but the absence
rate was 55.3%, mostly due to the Coronavirus pandemic. The reader can see enem.inep.gov.br for details,
but here we highlight that for the first time applicants could sit this exam online in some venues. This
option was available to 93,079 candidates, but precisely this group showed an absence rate of around 70%.
In any case, INEP foresees that 100% of the tests will be online already in 2026.
In fact, it is following the same trend as many other exams, e.g. the TOEFL language exam, which is now
online (ets.org), and such a trend boosts more sophisticated studies devoted to the elaboration of
applicable questions.
In (Burton, 2001) the author presents a study on improving the reliability of multiple-choice
questions by deterring examinees from simply
guessing the right answer. The paper states that pure
guessing can be discouraged by fractional marks at-
tributed to wrong answers, namely ‘negative mark-
ing’ or ‘penalty scoring’. However, the final perfor-
mance can be damaged by the examinee’s uncertainty
in case they have solved a question just partially. The
same author cites some works that debate such penalties, but he focuses on quantifying the percentage
unreliability of a test by studying three scenarios:
Q, where the only random element is the drawing
of some items from Question Banks (QB) in which
scope and difficulty are equally levelled, and the final
mark is exactly the number of correct answers; G, in
which the only random element is the drawing of the
alternatives; QG, which uses both random elements.
All questions must be answered. For Q and QG one
must have a QB with at least five times the number of
questions in the exam. In his model, the author con-
siders an exam with sixty questions and four alterna-
tives per question. By taking the average knowledge
of 50%, the mean scores in cases Q, G and QG were
30, 37.5 and 37.5, respectively. He concludes that a
60-question four-choice test is rather unreliable, and
one of the main reasons is that G and QG allow guess-
ing, which is not the case of Q.
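To make the origin of these figures explicit, the short calculation below (our own sketch, not code
from (Burton, 2001)) reproduces the expected marks under one plausible reading of the model: an
examinee of average knowledge answers half of the 60 questions from knowledge and, in G and QG,
guesses uniformly among the four alternatives of the rest.

    # Expected marks in Burton's 60-question, four-alternative model, assuming
    # 50% knowledge: 30 questions answered from knowledge, the rest guessed
    # uniformly whenever guessing is possible (scenarios G and QG).
    N_QUESTIONS = 60
    N_ALTERNATIVES = 4
    known = N_QUESTIONS // 2                # questions answered from knowledge

    # Scenario Q: guessing gives no advantage, so the expected mark is
    # simply the number of questions the examinee knows.
    score_Q = known                         # 30

    # Scenarios G and QG: each remaining question is guessed with
    # probability 1/4 of success, adding 30 * 0.25 = 7.5 marks on average.
    score_G_QG = known + (N_QUESTIONS - known) / N_ALTERNATIVES   # 37.5

    print(score_Q, score_G_QG)              # prints: 30 37.5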
The authors in (Oliveira Neto and Nascimento, 2012) adapted the Learning Management System (LMS)
Moodle to carry out formative assessment during the teaching-learning process, with high-quality
feedback, in a distance learning course of 40h per week in Mathematical Finance. Such evaluations
can better direct the student's performance if the feedback is quick and precise at pointing out
their difficulties. Moreover, the feedback can guide the teacher regarding the adopted teaching
process, so that the students' understanding of topics not yet assimilated can be reinforced. By
analysing the students' answers in previous classes, the authors improved the QB with additional
rules for the tests, error messages and links to either theoretical topics or extra exercises.
In the elaboration of multiple-choice questions, it is also important to consider suitable wrong
options among the alternatives, also called distractors. Unsuitable distractors enable the examinee
to guess the correct answer by elimination, as discussed in (Moser et al., 2012), where the authors
present a text processing algorithm for the automatic selection of distractors. A more recent work
is (Susanti et al., 2018), which is devoted to the automatic production of distractors for English
vocabulary. In (Ali and Ruit, 2015) the authors present an empirical study on flawed alternatives
and low distractor functioning. They conclude that the removal or replacement of such defective
distractors, together with an increase in cognitive level, improves the discrimination between
high- and low-ability examinees.
Our present work introduces an automatic gen-
erator and corrector devoted to exams that consist
of multiple-choice questions with weighted alter-
natives. It is adapted from the open-source sys-
tem MCTest available on GitHub. For such exams
MCTest stores the correction in a CSV file and emails
it to the professor. This file contains each student’s
responses compared with the individual answer key
of the exam issue received by that student. Common
programs like Excel and LibreOffice open the file in a
spreadsheet with built-in formulas that give each stu-
dent’s final mark according to the weights, as we shall
detail in this paper.
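As a minimal sketch of this weighted correction (our own illustration under a simplified,
hypothetical layout; the actual CSV columns and spreadsheet formulas produced by MCTest are
detailed later in the paper), a student's final mark is simply the sum of the weights of the
alternatives they chose:

    # Minimal sketch of the weighted correction, not MCTest's actual code.
    # Hypothetical weights for one 3-question exam issue: the right
    # alternative weighs 1.0 and "almost right" distractors weigh a fraction.
    weights = [
        {"a": 1.0, "b": 0.5, "c": 0.0, "d": 0.0, "e": 0.0},
        {"a": 0.0, "b": 0.0, "c": 1.0, "d": 0.25, "e": 0.0},
        {"a": 0.0, "b": 1.0, "c": 0.0, "d": 0.0, "e": 0.5},
    ]

    def final_mark(responses):
        """Sum the weight of each chosen alternative; blanks score zero."""
        return sum(w.get(r, 0.0) for w, r in zip(weights, responses))

    print(final_mark(["a", "d", "e"]))   # 1.0 + 0.25 + 0.5 = 1.75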
As a related work we cite (Presedo et al., 2015), in which the authors use Moodle to create
multiple-choice questions with weighted alternatives. Their system also enables the user to give an
exam in hardcopy, but with neither the student's id nor variations of the exam. Moreover, it
requires the plugin Offline Quiz (moodle.org/plugins/mod_offlinequiz). Moodle provides the
Calculated question type, which we call a parametric question in MCTest, in which the statement and
the alternatives accept wildcards, though in Moodle only for simple mathematical operations. By
contrast, MCTest enables nominal exams, numerous variations and wildcards that accept complex
formulas written in Python and its libraries. Details on parametric questions with MCTest can be
found in (Zampirolli et al., 2021; Zampirolli et al., 2020; Zampirolli et al., 2019).
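To make the contrast concrete, the fragment below sketches what a parametric question boils down
to: a Python block draws the parameters and computes the correct alternative and the distractors,
which are then substituted into a LaTeX statement. This is a generic illustration under our own
naming, not MCTest's actual template syntax, which is described in the works cited above.

    # Generic sketch of a parametric question: Python draws the parameters
    # and fills wildcards of a LaTeX statement (illustrative only; MCTest's
    # real template format is documented in the cited references).
    import random

    # Ranges chosen so the four alternative values below are always distinct.
    a, b = random.randint(3, 9), random.randint(2, 9)
    correct = a * b

    statement = rf"Compute the product $ {a} \times {b} $."
    alternatives = {
        "a": correct,        # right answer
        "b": a + b,          # distractor: confusing product with sum
        "c": correct + 1,    # distractor: off-by-one slip
        "d": a * (b + 1),    # distractor: wrong operand
    }

    print(statement)
    print(alternatives)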
The paper is organized as follows. Section 2 de-
scribes the adapted MCTest for an automated assess-
ment with multiple-choice questions using weighted
answers; Section 3 shows the obtained results and dis-
cusses them; finally, Section 4 presents our main con-
clusions and opportunities for future work.
2 USING ADAPTED MCTest: MATERIALS AND STEPS
This work applies the open-source Information and Communication Technology (ICT) tool MCTest,
available on GitHub (https://github.com/fzampirolli/mctest). We have adapted MCTest to enable the
weighting of answers of multiple-choice questions. In this section, we explain how to create exams
that include such questions with weighted answers.
2.1 Creating Multiple-choice Questions
After downloading MCTest from GitHub, the system administrator must install it on a server. Before
creating a question, they have to include Institution, Course and Discipline, and also associate a
professor as Discipline Coordinator. The coordinator can then create discipline Topics and add more
professors. See vision.ufabc.edu.br for details. Afterwards, any of these professors can add a
Class and also questions associated with a Topic of the discipline. An example would be setting
[ED]<template-figure> at Choose Topic in Figure 1. Namely, this topic belongs to a discipline
called ED, a mnemonic for Example Discipline. In that figure we have Short Description:
template-fig-tiger-en, which is optional but makes it easier to locate questions in Question Banks
(QB), as we shall explain in Subsection 2.2. The field Group is also optional and lets the user
define a group of questions, so that in each exam MCTest will always draw only one question from
that group. The most relevant field is Description, where we can insert paragraphs in LaTeX and
also combine them with Python code, as explained later in another example.