Standardising a test score is fairly straightforward and I walked readers through the process here. As I said, however, “deciding whether test scores reported in this way are meaningful depends on a number of other variables which come into play – and that requires an understanding of a whole raft of issues”.
One of the key issues is whether a test itself has been standardised. This blog will look at how tests are standardised, why so much effort is taken to ensure this happens, and some frequently heard comments from teachers concerning standardised tests.
A standardised test is designed so that the questions which are asked, the conditions in which the test is taken, the way in which the test is marked and the interpretations of the test results are the same for all candidates who sit the test.
Standardising questions may seem straightforward, in that all those who take the test are given the same questions. Whilst this is true for a standardised test, the questions themselves must also be as clear as possible for every candidate who sits the test. Test questions (or items, as they are known) have to be assessed for clarity; test designers will strip out any ambiguous or unnecessary carrier language from test items.
Standardised tests are published alongside administrative guides to ensure the test is taken in the same conditions by all candidates. Areas covered include how to keep materials secure prior to the test, how to prepare spaces in which candidates will take the tests and which, if any, equipment candidates should have access to during the test. Guides describe when and how to distribute materials to candidates, and they specify the required conditions which should be in place whilst the test is being taken.
The administrative guide for a standardised test must also provide test administrators with instructions for what they should do if the test is disrupted, or if a candidate is not able to continue during a test administration window.
A guide for scoring tests is also required for a standardised test, indicating how those marking the tests should allocate marks for candidates’ answers.
Finally, results of a standardised test will be reported in a specified manner. Results may be reported as grades, with a grade awarded for raw scores within given ranges of marks. For some standardised tests, mark boundaries for grades will be issued after the test has been taken; for others, grade boundaries are published in advance alongside the test. Many standardised tests report standardised scores. Yet others, such as Key Stage 2 tests, report in a bespoke form such as KS2 Scaled Scores.
Standardised tests are designed to estimate a student’s position within a cohort of similar students. The limitations of testing mean that there is an element of measurement error inherent in any test. Whilst some of this error cannot easily be minimised, a substantial amount of error can be mitigated by ensuring that conditions are, as far as possible, the same for all of those who sit the test.
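The conversion from raw marks to standardised scores can be sketched in a few lines. This is a minimal illustration, assuming the common convention of reporting on a scale with a mean of 100 and a standard deviation of 15; the raw scores below are invented, and real tests standardise against a large national norm sample rather than the candidates' own cohort:

```python
# Sketch: converting raw marks into standardised scores on a
# mean-100, SD-15 scale. Illustrative only: a published test would
# use norms from a large representative trial sample, not the
# cohort itself.

def standardise(raw_scores):
    """Return scores rescaled to mean 100, standard deviation 15."""
    n = len(raw_scores)
    mean = sum(raw_scores) / n
    sd = (sum((x - mean) ** 2 for x in raw_scores) / n) ** 0.5
    return [round(100 + 15 * (x - mean) / sd) for x in raw_scores]

cohort = [12, 15, 9, 18, 14, 11, 16, 13]
print(standardise(cohort))  # a student on the mean raw mark scores 100
```

A student one standard deviation above the cohort mean lands at 115, one standard deviation below at 85, which is what makes scores comparable across tests with different raw-mark totals.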
Whilst to a large extent the standardisation of testing has become commonplace, at least for externally provided tests which have been standardised, it is worth considering what is known about the effects of not standardising a test. Where tests are administered in differing conditions, students’ outcomes differ systematically not due to the test itself, but due to the different conditions in place.
Those administering tests often worry about the content of the tests, and there are some common concerns which are often raised by those in schools.
It is important to note that standardised tests aim to place students within a distribution of similar students, and test items are therefore chosen to discriminate between students rather than to test specific knowledge. Items which are correctly answered by roughly a third to two-thirds of students are therefore selected for their discriminatory power rather than by their ability to test recall of a particular fact or concept. This means that the ideal test item will be too difficult for a substantial minority of those taking the test to answer correctly.
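The selection rule described above, keeping items answered correctly by roughly a third to two-thirds of candidates, can be sketched as a simple filter on item facility (the proportion answering correctly). The trial data below is invented:

```python
# Sketch: filtering trialled items by facility, i.e. the proportion
# of candidates answering correctly. Items outside roughly the
# one-third to two-thirds band discriminate poorly. Data is invented.

def facility(responses):
    """Proportion of correct (True) responses for one item."""
    return sum(responses) / len(responses)

trial = {
    "item_a": [True, True, False, True, True, True],      # too easy
    "item_b": [True, False, True, False, False, True],    # discriminating
    "item_c": [False, False, False, True, False, False],  # too hard
}

kept = [name for name, r in trial.items() if 1/3 <= facility(r) <= 2/3]
print(kept)  # → ['item_b']
```

Real test development also examines how well each item correlates with overall performance, but facility alone shows why a well-targeted item must be too hard for a substantial minority.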
Tests will have been trialled with a representative sample of the population prior to your students taking the test, and this sample will have been taught in a number of different schools across the country. Therefore, whilst your students may have been taught a particular curriculum in ways which may be systematically different to others taking the test, this will be true to a greater or lesser extent for all of those who are assessed using a nationally standardised test.
Tests are never entirely free from measurement error. Any raw score, and consequently any standardised score, is simply an approximation of a student’s true score. In this blog, Rob Coe discusses the differences required to be confident that one score is really better than another; even on a simple 20-mark test, a score would have to differ by eight marks to allow you to be confident that a higher score is really better than a lower one. Whilst this margin will be smaller on a standardised test, there will always be a margin of error within any reported test score.
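The size of that margin can be estimated under a classical test theory model. This is a sketch, not a reproduction of Coe's calculation: the reliability figure below is illustrative, and published tests report their own reliability coefficients:

```python
# Sketch: the smallest gap between two observed scores that exceeds
# a 95% margin of error, under classical test theory. The
# reliability value used below is illustrative, not from any
# particular test.
import math

def min_reliable_difference(sd, reliability, z=1.96):
    """Smallest score gap unlikely to be measurement error alone."""
    sem = sd * math.sqrt(1 - reliability)  # standard error of measurement
    se_diff = sem * math.sqrt(2)           # SE of a difference of two scores
    return z * se_diff

# e.g. a standardised score scale (SD 15) with reliability 0.90:
print(round(min_reliable_difference(15, 0.90), 1))  # → 13.1
```

On these assumed figures, two standardised scores would need to differ by about 13 points before the gap is unlikely to be noise, which is why single-point differences between students should not be over-read.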
Furthermore, the limitations of testing mean that scores at the tails of the distribution (those close to zero or to full marks) are more likely to be problematic, with ‘floor’ and ‘ceiling’ effects coming into play. For those closer to the middle of the distribution, however, test scores provide a good indication of the student’s approximate position within the cohort.
Grades are an acknowledgement that standardised scores are estimates rather than precise measurements and are often preferred when reporting results for this reason.
It is also well understood that measurement error is often introduced when tests have high stakes attached to them. Test scores from cohorts which have been ‘taught to the test’ may not fairly reflect those students’ underlying attainment. Equally, where tests are not given the appropriate level of importance when administered, students may score lower than their true score.
Where a test has been properly standardised and administered, the results can give a useful indication of the relative position of a student within the population. This independently assessed information can be a powerful tool to support teaching and learning in school as it provides a check against the working assumptions made by teachers about students’ abilities. Whilst care has to be taken not to over-interpret the results of standardised tests, understanding how and why they have been standardised should help to explain why it is worth knowing where students appear to be within the wider student population.
Richard is an experienced primary school teacher and data specialist. He has worked as an intervention teacher, focusing on supporting individuals and small groups across the primary age range. Since 2014, Richard has written about education for The Guardian, The Times Educational Supplement and Schools Week, and has spoken at national education conferences including Northern Rocks and ResearchEd. Richard is part of ASCL’s Expert Panel on Primary Assessment and has worked closely with bodies responsible for education at a national level. His first book, Databusting for Schools, was published by Sage Publications in 2018.
Find out how CEM’s standardised assessments can help you.