Question types in English language diagnostic testing
English language proficiency testing, like large-scale testing in many other domains, often uses multiple-choice questions (MCQs) to exploit the efficiency of automatic marking. An experiment supplementing established MCQ tests with very short free-text answer questions in English diagnostic testing has shown that the latter are better discriminators at the lower end of student ability. Although such questions are not economical to mark on paper on a large scale, the Assess By Computer e-assessment software offers marking options for such answers which make constructed-answer tests a realistic option.
Constructed vs selected answers in large-scale testing
In many domains, there is a need for quick and efficient large-scale testing of straightforward material. Selected-answer¹ tests can be automatically marked, and are thus widely used (for example, in the UK, for the DVLA automobile Driving Theory test). Diagnostic testing of English language proficiency is another such domain, with global scope. Locally at the University of Manchester (UoM), the University Language Centre (ULC) tests over 1,000 students per academic year - around 850 in a single week each September - to assess their linguistic ability to follow an academic course. (This is additional to the standard TOEFL / IELTS admission requirements.) The UoM ULC tests consist mainly of MCQs, as described below. However, selected-answer questions tightly constrain the extent to which a candidate can give evidence of incompetence, even in this intellectually limited domain. Our hypothesis was that, given the chance to answer freely, the weakest candidates would give evidence of greater weakness than could be seen from MCQ results. This was confirmed by the data.

¹ We reserve the term “objective” to mean questions to which the answer, rather than the marking judgement, is a matter of objective fact - as opposed to “subjective”. Objective questions can require constructed answers, and subjective questions selected answers (“Give your opinion on a scale from 1 to 5 …”).
English language diagnostic testing
The test employed by UoM ULC for English language diagnostic testing is the Chaplen Speeded Grammar and Vocabulary Test (Chaplen, 1970), which has been used at Manchester University since the early 1970s. It is a well tried and tested gauge of a learner’s knowledge of the English language system and its formal or “educated” vocabulary. The total number of correct answers is presented as a percentage score. The test discriminates well at the upper intermediate and advanced levels of language proficiency, with students at these levels typically attaining scores ranging from 50% to 90%. A score of more than 90% indicates that the learner is approaching native speaker level. Below intermediate levels (approximately 40%), however, it is not a useful instrument as it begins to lose its discriminatory power. High marks on the Chaplen typically correlate well with the number of years studying English as a foreign language in formal settings, though the strength of this relationship has not been tested. Originally developed in the late 1960s, the test reflects the structuralist description of language and methods of language testing. Using a multiple-choice format, the Grammar (10 mins) and Vocabulary (18 mins) sections test students’ knowledge of a range of individual items of structure and lexis in “everyday educated English” (Chaplen, 1970: 174). For each section, there are 50 questions, each consisting of a sentence with a word or phrase omitted. The test taker must choose the correct filler from the list of possible answers provided. There is a choice of three possible answers in the Grammar section and five possible answers in the Vocabulary section. The short amount of time allowed for each section means that students work under considerable pressure of time and only the more proficient students manage to complete all the questions. The test is quick to administer and quick to mark.
This is one of its major advantages, since it permits the rapid processing of very large numbers of students at low cost. Combining this rapidity of administration with an OMR marking system means that up to 1000 students can be tested and given their mark within a few days. The theoretical assumption which underpins the use of the test at Manchester is that adequate knowledge of the general language system can serve as a reliable indication of a student’s ability to apply this knowledge in academic situations. Its principal use is to identify recently arrived overseas students who would benefit from attending classes in academic writing provided by the University, or who will probably experience difficulties in their academic work due to less than adequate levels of English language proficiency in reading and writing. A score of 40% or less broadly indicates that a student has an inadequate level of English language proficiency for academic study. The extensive trials that the test underwent during its development would appear to support this (Chaplen, 1970). In addition, in two follow-up studies, the test has been shown to have reasonable predictive validity (James, 1980; O’Brien, 1993). Because Chaplen is basically a test of discrete item recognition, it has to be complemented by a piece of continuous writing. The writing test consists of three questions to which short “essay” answers are expected, to be completed in 30 minutes. Morley (2000), who made a number of improvements to the test, has shown that the test scores correlate quite strongly with assessments of students’ continuous writing made by trained assessors. Despite all its advantages, it needs to be emphasised that Chaplen is not a test of language production, and it is not a test of language skills. In fact, it assesses a fairly narrow aspect of language competence through the recognition of correct lexical and grammatical choices provided as part of an artificially restricted set of choices.
In this sense, it is less of a finely tuned instrument than the much more sophisticated, and much more expensive, internationally recognised university entry tests (e.g. IELTS and TOEFL), which take very much longer but which also test a broad range of language skills. Furthermore, despite the strong correlations with writing scores mentioned above, it is still not uncommon to come across cases of good spoken and written communicators who do not score well on Chaplen, and of good Chaplen scorers who are not good communicators. Finally, because of the multiple-choice design, the discriminatory power of the test below a certain level is weak (around 40%) and even non-existent (around 25%). We therefore sought ways of maintaining the efficiency of the instrument, whilst at the same time endeavouring to measure students’ ability to produce correct language rather than simply to choose it. The aim was to fine-tune the instrument and to increase its discriminatory power without any loss of efficiency.
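One plausible reading of the 25% figure, which the text itself does not spell out, is that it lies close to the score a candidate could expect from guessing alone. The sketch below works through that arithmetic using the section sizes and option counts given above (50 three-option Grammar items and 50 five-option Vocabulary items). It assumes, hypothetically, that the candidate attempts every item, guesses uniformly at random, and is not penalised for wrong answers; none of these conditions is stated in the source, and the names `SECTIONS` and `expected_chance_score` are illustrative only.

```python
from fractions import Fraction

# Section sizes and numbers of answer options, as described in the text.
SECTIONS = {
    "Grammar": {"questions": 50, "options": 3},
    "Vocabulary": {"questions": 50, "options": 5},
}


def expected_chance_score(sections):
    """Expected percentage score under uniform random guessing.

    Assumption (not from the source): every item is attempted and
    wrong answers carry no penalty.
    """
    total_questions = sum(s["questions"] for s in sections.values())
    # Each item with k options contributes an expected 1/k correct answers.
    expected_correct = sum(
        Fraction(s["questions"], s["options"]) for s in sections.values()
    )
    return float(100 * expected_correct / total_questions)


chance = expected_chance_score(SECTIONS)
print(f"Expected score from pure guessing: {chance:.1f}%")
```

Under these assumptions the expected score is about 26.7%, so a mark near 25% carries little information about what the candidate actually knows. Because the test is speeded and weaker candidates may not reach every item, the real guessing baseline could well be lower.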
The experimental tests
Our hypothesis was that, if a practical way could be found of testing with free-text questions - even with single-word answers - (a) all the students would be more effectively challenged by what would become, in effect, a production rather than a recognition task; (b) the weakest students would make more extreme errors than any of the MCQ distractors, and we would thus have more effective discrimination at the bottom end of the range. The ABC (Assess By Computer) e-assessment software (Sargeant et al., 2004) developed at UoM looked promising, and has so far been used in two trial runs of free-text question tests, with a third scheduled.
The original UoM English language proficiency diagnostic assessment, as described above, consists of three separately timed tests: “Grammar and Usage” (10 minutes), “Vocabulary” (18 minutes), and “Writing” (30 minutes). These tests were set up in the ABC software and first taken in this form in February 2007 by 23 students (January entrants to postgraduate programs in the School of Computer Science, UoM). Although the students were unfamiliar with the software, none showed any signs of difficulty in using it, and results were as expected from previous experience with similar groups, i.e. there was no evidence of bias caused by use of the software. The clear difference lay in the speed of marking. The MCQ tests were marked automatically, with marking complete within minutes of the last student submitting their answers, rather than after a wait of several days for a scanning service. The answers to the “Writing” test were output as a PDF file and marked on paper: the saving here lay in the greater ease of reading typescript than handwriting.