A Good Test

You’ve set the tests and graded the scripts. What next? Dr Lee Ong Kim, Vice-President of the World Educational Research Association, explains the finer points of assessment and tells us what to look out for when testing our students.

Q: Why is it important to have a proper assessment system in schools?

We all know that the whole idea of teaching and learning in schools is for students’ holistic development. This means that students should be assessed in all three domains – cognitive, affective and physical. Let’s talk about cognitive assessment, since that is the main focus of schools.

Assessments should effectively inform teachers of students’ progress, of their strengths, and of areas that are still not clearly understood. The assessment must therefore validly indicate to the teacher each student’s problems in conceptual understanding of the material taught, so that a plan for remedial action may be made.

Also, students gain additional learning through the assessment itself, even if they do not perform well on it. Many talk about assessment of learning, assessment for learning and assessment as learning. In schools, however, it is mainly assessment for learning, because classroom assessment is invariably formative.

So a proper assessment system is needed to safeguard test integrity and validity in order to enhance the teaching-learning process.

Q: What is assessment validity?

There are several types of test validity – namely, face validity, content validity, construct validity, and criterion-related validity. Face validity is simple. As long as a test consists of questions on the subject it is claiming to be testing, then it has face validity. However, this is clearly not sufficient. There has to be content validity as well.

A test has content validity if the questions are on topics already taught to the students and on areas required by the curriculum. In addition, the levels of difficulty of the questions should follow the test plan; that is, the test must reflect the proportion of questions planned for each level of the cognitive taxonomy.

Thirdly, the test must also have construct validity. This means that the questions must elicit the psychological constructs that they are purported to be testing. For example, if the test is to measure students’ ability to use proper grammar in the English Language, then the questions must be on their grammatical skills. Test questions must not deviate from the intended purpose.

Sometimes we may find a question on a Math test that is based on a baseball game. The teacher must ensure that the question can be answered based on knowledge of, and ability in, the intended mathematical construct, and not on knowledge of baseball. Likewise, the language used on the Math test must not be at such a high level that the construct being tested shifts from mathematical ability to language ability. Such a shift would make the test lose its construct validity.

Criterion-related validity is not so crucial for classrooms. In short, there are two types of criterion-related validity – concurrent validity and predictive validity. Concurrent validity means that performance on the test reflects the students’ level of ability on a second, related criterion measured at about the same time. Predictive validity is the ability of the test to predict the future success of the students, either on a job or at a higher level of study.

Q: How then should classroom tests be planned?

Classroom tests are usually planned through the drawing up of a “test blueprint”, sometimes also referred to as the “table of specifications”. The essential ingredients of the blueprint are the topics and sub-topics to be tested, the proportion of the number of questions on each sub-topic, the proportion of the number of questions at each level of the cognitive taxonomy for each topic and sub-topic, and the stated objective of each question.
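Purely as an illustration – the topics, question counts and objectives below are invented, not taken from Dr Lee – one way to picture what a blueprint records for a hypothetical mathematics test is sketched here:

```python
# A toy sketch of a test blueprint / table of specifications, with invented
# topics, question counts and objectives. Each row records the ingredients
# mentioned above: topic, sub-topic, the number of questions planned at each
# level of the cognitive taxonomy, and the objective the questions target.
blueprint = [
    {
        "topic": "Fractions",
        "sub_topic": "Addition with unlike denominators",
        "questions_by_level": {"knowledge": 2, "application": 3, "analysis": 1},
        "objective": "Add two fractions with different denominators",
    },
    {
        "topic": "Fractions",
        "sub_topic": "Word problems",
        "questions_by_level": {"application": 2, "analysis": 2},
        "objective": "Solve one-step word problems involving fractions",
    },
]

total = sum(sum(row["questions_by_level"].values()) for row in blueprint)
print(f"Total questions planned: {total}")
```

Laid out this way, any teacher can see at a glance what proportion of questions sits at each taxonomy level and what each question is meant to assess.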

A test blueprint is also very useful because if the teacher who is supposed to construct the test is unable to do so for some reason, another teacher will be able to take over the task without deviating much from what the first teacher would have set.

Q: What else is important in the assessment process for teaching and learning?

A test, no matter how well planned and prepared, will be rendered useless if the scores obtained by students cannot be properly interpreted. Testing is a measurement process where we measure the status of the students’ learning at the time of testing. Measurement is always for the purpose of comparison.

Comparisons require an understanding of distributions, with their Means and Standard Deviations, which can tell teachers how their class stands as a group compared to other classes, and how their students are spread out in ability compared to other classes. Likewise, they allow a teacher to see how an individual student performs in comparison with the others who took the same test. Such interpretations of performance through comparison are also termed norm-referenced interpretations.
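As a small, made-up illustration of this kind of norm-referenced comparison (all marks below are invented), a few lines of Python show class Means and Standard Deviations, and where one student’s mark sits relative to the whole cohort:

```python
# Illustrative sketch only: invented marks for two classes, used to show the
# kind of norm-referenced comparison described above.
from statistics import mean, stdev

class_a = [62, 68, 70, 71, 73, 75, 78, 80, 84, 89]
class_b = [55, 60, 66, 70, 74, 77, 81, 85, 90, 92]

print(f"Class A: Mean {mean(class_a):.1f}, SD {stdev(class_a):.1f}")
print(f"Class B: Mean {mean(class_b):.1f}, SD {stdev(class_b):.1f}")

# A z-score places one student's mark relative to the whole cohort:
# how many Standard Deviations above or below the cohort Mean it lies.
cohort = class_a + class_b
student_mark = 80
z = (student_mark - mean(cohort)) / stdev(cohort)
print(f"A mark of {student_mark} lies {z:+.2f} SDs from the cohort Mean.")
```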

Another form of performance interpretation is the criterion-referenced interpretation. This form of interpretation is like answering the question, “Is the student able to add two double-digit numbers where regrouping is involved?” and other such questions. As long as the answer to such questions is “Yes”, the student has made the grade. Hence, a good teacher will be able to get all (that is, 100%) of his or her students to learn so well that they achieve a grade A. This is in contrast to using each student cohort as its own norm group, where even a group of A students will be spread out so that a certain percentage is graded below A.

Q: What are some common errors teachers make when making comparisons?

Teachers tend to take test scores as “absolute”, and conclude that a student who scored 86 marks is better than one who scored 84 marks. This is a flawed interpretation if the Standard Error of the test is, say, 3 raw-score points. Sometimes we even hear of a student with an average of 92.5 marks being given the book prize on a school’s speech day while a student with an average of 92.3 is not awarded anything, having been interpreted as falling into “second” place. If we think about Standard Errors, it could well be that the so-called second-place student is in reality in first place, and vice versa.
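To see why, take the 86-versus-84 example with a Standard Error of 3 raw-score points. The short sketch below (an illustration only, using the conventional ±1.96 × SE band for an approximate 95% interval) shows that the plausible score bands of the two students overlap substantially:

```python
# A minimal illustration of the Standard Error point: with an SE of 3
# raw-score points, the plausible score bands around 86 and 84 overlap,
# so the two observed marks do not clearly separate the two students.
SE = 3.0

def band(observed, se=SE, z=1.96):
    """Approximate 95% band around an observed score: observed +/- z * SE."""
    return observed - z * se, observed + z * se

for score in (86, 84):
    low, high = band(score)
    print(f"Observed {score}: plausible true score roughly {low:.1f} to {high:.1f}")
```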

Another common error made by some teachers is to conclude that two classes with equal Mean scores are two equal groups. This is not necessarily the case: in one class the students may be more homogeneous, while the other class may have a bigger spread of abilities about the same Mean. The class with the bigger spread has some weak students who need to be helped; its Mean is the same simply because some better students have “balanced off” the poorer performances of the weaker students.
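A made-up numerical example shows the point – the two hypothetical classes below have exactly the same Mean but very different spreads:

```python
# Two invented classes with the same Mean but very different Standard
# Deviations, illustrating why equal Means do not imply equal groups.
from statistics import mean, stdev

homogeneous = [68, 69, 70, 70, 71, 71, 72, 72, 73, 74]
spread_out = [45, 52, 60, 66, 70, 72, 78, 84, 90, 93]

for name, marks in [("Homogeneous class", homogeneous), ("Spread-out class", spread_out)]:
    print(f"{name}: Mean {mean(marks):.1f}, SD {stdev(marks):.1f}")
```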

Comparisons also have to be made using proper scaling, which requires an understanding of scale linearization and measurement error. But teachers need only know the basics of such issues – the basic statistics of Means, Standard Deviations and Standard Errors.

Q: How can a teacher tell if a student has improved over time?

This appears to be much harder for teachers to do in schools, because to compare students’ performances across different tests – at different levels or at different time points – the tests first have to be equated. Equating is required because the tests may not be of the same level of difficulty, and hence equal scores on them do not reflect equal abilities.

Equating means putting the tests on a single common scale. We all know that to compare the lengths of two pieces of string, we should measure them on the same scale – a metre rule, for example. It would be really great if school teachers were taught how to equate tests, so that their interpretations of students’ growth could be made more accurately.
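Purely to illustrate the idea of a common scale – and not as the specific equating method Dr Lee has in mind – the sketch below applies simple linear (mean-sigma) equating, which maps a score from one test onto the scale of another by matching the two score distributions’ Means and Standard Deviations; the score lists are invented:

```python
# A simplified sketch of linear (mean-sigma) equating, shown only to
# illustrate putting two tests on a common scale. Real equating designs
# (common items, anchor tests, IRT models) are considerably more involved.
from statistics import mean, stdev

def linear_equate(score_x, scores_x, scores_y):
    """Map a score from Test X onto Test Y's scale by matching the Means
    and Standard Deviations of the two score distributions."""
    z = (score_x - mean(scores_x)) / stdev(scores_x)
    return mean(scores_y) + z * stdev(scores_y)

# Invented score distributions: Test Y turned out harder than Test X,
# so raw scores on it run lower for students of the same ability.
test_x = [55, 60, 64, 68, 70, 73, 77, 80, 85, 88]
test_y = [48, 52, 57, 60, 63, 66, 70, 73, 78, 83]

raw_x = 70
print(f"A raw score of {raw_x} on Test X is roughly "
      f"{linear_equate(raw_x, test_x, test_y):.1f} on Test Y's scale")
```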

In fact, if teachers are to do research – such as to see which teaching method is better for a given subject or topic, with students’ growth as the outcome variable – it becomes all the more important that the tests be equated. Otherwise, the research conclusions will not be defensible.

Q: What assessment skills should teachers develop?

Teachers should develop skills in test planning, test-item construction, and interpretation of students’ performance on the tests, in order to be able to take the next step in their lesson planning.

We should remember that the teacher’s job consists of three main aspects – the curriculum, the pedagogy, and the assessment. It is not sufficient just to be able to interpret, plan and implement a curriculum, or to be excellent pedagogically. Without excellent skills in assessment, teachers will not know how well their students have learned, nor how well they themselves have taught.

Strength in all three aspects will increase the teacher’s professionalism tremendously.
