It has become common, although I still find it surprising, to hear teachers use the word ‘data’ as if it were a bad thing.
‘Data drops’ have come to epitomise a pointless exercise in collecting meaningless numbers and feeding them into a system that can have no possible benefit for learners. People even say that Ofsted is ‘too reliant on data’, as if a judgement process could – or should – rely on anything other than data.
The problem is that data is often not actually data. The numbers that are typed into spreadsheets or tracking systems every six weeks don’t signify anything; they have no informational content, so cannot really be described as data.
If we are talking about assessment data, then some basic knowledge about assessment can help us decide whether these numbers have any meaning or value. As a starting point, I offer here five criteria for assessment. In my view, if a ‘data’ production process does not meet all five, then it is NOT AN ASSESSMENT and no one should waste any time on it.
Assessment must be informative, accurate, unconstrained, generalisable and replicable.
Assessment must contain information. In practice, that means it could surprise you: it could tell you something you don’t already know. It follows that simply recording an overall holistic judgement about the level at which a student is working is NOT AN ASSESSMENT.
If you just write down what is already your impression, it can’t surprise you.
On the other hand, if you have just taught a topic and then give students an assignment on it, you might find that some of them have not actually understood it. Or you might ask a child a very hard question and be surprised to find that they can do something well beyond what they have ever shown or been required to do before.
If you know your pupils well, you should hope not to be too surprised too often: mostly assessment will confirm or be consistent with what you already know. The point is not that all assessment has to surprise you, just that in principle surprise has to be possible.
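One way to make ‘surprise has to be possible’ concrete is Shannon’s measure of information. The sketch below is an illustration rather than part of the original argument: it assumes you can put rough probabilities on an assessment’s possible outcomes before you see them, and the probabilities used are invented. An outcome you were certain of in advance carries zero bits, which is exactly the problem with recording a judgement you already hold.

```python
import math


def surprisal_bits(p: float) -> float:
    """Information, in bits, from observing an outcome you gave prior probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return math.log2(1 / p)


def expected_information_bits(probs: list[float]) -> float:
    """Shannon entropy of the possible outcomes: the information an assessment
    is expected to convey before you see the result."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)


# Hypothetical case 1: recording your own holistic judgement.
# The outcome is certain in advance, so it can never surprise you.
print(expected_information_bits([1.0]))  # 0.0

# Hypothetical case 2: a test you expect a pupil to pass with probability 0.8.
print(round(expected_information_bits([0.8, 0.2]), 3))  # 0.722
print(round(surprisal_bits(0.2), 3))  # 2.322: a fail would be informative
```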
Of course, if the result of an assessment is surprising, it doesn’t necessarily follow that it is right and you are wrong. All assessment is imprecise and can be wrong: we know that there are many reasons why someone might give a wrong answer to a question whose answer they know and should get right, or a right answer to a question they don’t actually understand.
So the information in an assessment also has a weight, depending on how reliable it is – how much information it conveys. An accurate, reliable assessment should probably make you question your judgement if assessment and prior judgement disagree, but an unreliable assessment (for example, the answer to a single question) may contain very little information and should not override an existing well-formed judgement.
So if you can’t say something about the weight, trustworthiness and precision of an assessment outcome then it is NOT AN ASSESSMENT.
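The idea that information has a weight can be made precise in a standard statistical way. If the teacher’s prior judgement and the assessment result are treated as independent estimates on the same scale, each with a known standard error, then weighting each by its precision (one over its variance) gives a combined estimate. This is a common technique, not necessarily what is intended here, and the common scale, the standard errors and the numbers below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    value: float      # attainment on some common scale (hypothetical)
    std_error: float  # how uncertain that estimate is (assumed known)


def combine(prior: Estimate, assessment: Estimate) -> tuple[Estimate, float]:
    """Precision-weighted (inverse-variance) combination of two independent
    estimates. Returns the combined estimate and the share of weight the
    assessment received."""
    w_prior = 1 / prior.std_error ** 2
    w_assess = 1 / assessment.std_error ** 2
    total = w_prior + w_assess
    value = (w_prior * prior.value + w_assess * assessment.value) / total
    return Estimate(value, total ** -0.5), w_assess / total


judgement = Estimate(value=60, std_error=5)  # a well-formed prior judgement

# A reliable test that disagrees: it dominates, so question the judgement.
combined, weight = combine(judgement, Estimate(value=45, std_error=3))
print(round(combined.value, 1), round(weight, 2))  # 49.0 0.74

# A single question that disagrees: it barely moves the judgement.
combined, weight = combine(judgement, Estimate(value=30, std_error=20))
print(round(combined.value, 1), round(weight, 2))  # 58.2 0.06
```

The design choice is deliberate: a disagreeing result only shifts the estimate in proportion to its precision, so an unreliable assessment cannot override an existing well-formed judgement.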
This is really just a corollary of the need for assessment to contain information. If the result of an assessment is pre-constrained in any way, then it is NOT AN ASSESSMENT.
For example, suppose you ask teachers to assign pupils to levels (eg ‘approaching’, ‘expected’ or ‘exceeding’) that carry clear normative expectations (for example, any child allocated ‘approaching’ attracts additional scrutiny and teacher workload, along with the implication that the teacher has not done a good job). Don’t be surprised if very few pupils end up in that category, but this is NOT AN ASSESSMENT.
Other examples are the Y1 ‘phonics screening check’, whose score distribution shows a massive bulge just above the ‘pass’ mark, and the Early Years Foundation Stage Profile, in which most children are given the ‘expected’ level on all 17 early learning goals¹.
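A bulge like the one in the phonics check can be spotted with a very crude check: compare how many pupils score exactly the pass mark with how many score one mark below it. This sketch is an illustration only; the pass mark and scores are invented, and a serious analysis would model the whole distribution rather than two adjacent marks.

```python
from collections import Counter


def threshold_ratio(scores: list[int], cut: int) -> float:
    """Pupils scoring exactly the cut mark, divided by those scoring one mark below.

    In an unconstrained distribution, adjacent marks have broadly similar
    frequencies, so a ratio far above 1 suggests outcomes were pushed over the line.
    """
    counts = Counter(scores)  # marks nobody scored count as 0
    at_cut, just_below = counts[cut], counts[cut - 1]
    if just_below == 0:
        return float("inf") if at_cut else float("nan")
    return at_cut / just_below


CUT = 32  # hypothetical pass mark

# Invented score lists, for illustration only.
smooth = [30] * 10 + [31] * 11 + [32] * 12 + [33] * 12
bulged = [30] * 10 + [31] * 3 + [32] * 25 + [33] * 12

print(round(threshold_ratio(smooth, CUT), 2))  # 1.09
print(round(threshold_ratio(bulged, CUT), 2))  # 8.33
```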
When we assess pupils we are almost never interested in knowing that something has been done once, in one particular context. Instead, we want to know that they will be able to do it again, respond similarly to similar tasks and transfer that performance to other contexts.
One way this can fail is an assessment in which students are given hints about which questions will be asked. We can’t then generalise to performance on any other questions, so this is NOT AN ASSESSMENT.
Equally, an assessment where the assessed work has been directly shaped by comments and feedback from the teacher is NOT AN ASSESSMENT.
A key part of generalisability is replicability, often referred to in assessment contexts as reliability.
This usually refers to the interchangeability of arbitrary aspects of the assessment process that we want to be able to ignore, such as the time or occasion of testing (which could equally well have been morning or afternoon, or even on another day), the particular questions presented (which may be thought of as taken from a pool of possible questions on the topic that could have been asked) or the particular marker who assessed it (where other markers were, or might have been, involved).
If it isn’t replicable then it is NOT AN ASSESSMENT (or at least not a useful one). If the outcome would vary massively with a different occasion, different questions or a different marker, then it doesn’t actually tell us anything about the candidate’s knowledge or abilities.
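For the ‘different questions’ part of replicability, a widely used index is Cronbach’s alpha, which compares the variance of individual questions with the variance of pupils’ total scores. The sketch below implements the standard formula as an illustration; the data are invented, and alpha says nothing about variation between occasions or markers, which need other designs such as re-testing or double marking.

```python
from statistics import pvariance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a pupils-by-questions table of scores.

    Rows are pupils, columns are questions. Values near 1 mean the questions
    rank pupils consistently; lower values mean the total depends heavily on
    which questions happened to be asked.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two questions")
    questions = list(zip(*scores))         # one tuple of scores per question
    totals = [sum(row) for row in scores]  # each pupil's total score
    total_var = pvariance(totals)
    if total_var == 0:
        raise ValueError("total scores do not vary, so alpha is undefined")
    item_var = sum(pvariance(q) for q in questions)
    return k / (k - 1) * (1 - item_var / total_var)


# Invented data: every question ranks the pupils identically.
consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
# Invented data: the two questions are unrelated to each other.
unrelated = [[1, 0], [0, 1], [1, 1], [0, 0]]

print(round(cronbach_alpha(consistent), 2))  # 1.0
print(round(cronbach_alpha(unrelated), 2))   # 0.0
```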
Replicability is closely related to accuracy (as discussed above), both of which are often included under the heading of reliability.
If an assessment outcome is consistently replicable with little variation, then we can take it as a precise, accurate estimate of likely future performance and give it significant weight of evidence in drawing inferences about the knowledge and abilities of the student to whom it relates. For all these reasons, knowing the reliability of an assessment is a vital part of judging its quality.
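Classical test theory turns a reliability coefficient into a statement about precision: the standard error of measurement (SEM) is the standard deviation of scores multiplied by the square root of one minus the reliability. The sketch below assumes normally distributed errors and uses invented figures (a standard deviation of 15 and reliabilities of 0.9 and 0.5); centring the band on the observed score is a common simplification.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical test theory SEM: typical spread of observed scores around a
    pupil's 'true' score, given the score SD and the test's reliability."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1 - reliability)


def score_band(score: float, sd: float, reliability: float,
               z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% band around an observed score, assuming normal errors."""
    half_width = z * standard_error_of_measurement(sd, reliability)
    return score - half_width, score + half_width


for r in (0.9, 0.5):
    low, high = score_band(100, sd=15, reliability=r)
    print(f"reliability {r}: {low:.1f} to {high:.1f}")
# reliability 0.9: 90.7 to 109.3
# reliability 0.5: 79.2 to 120.8
```

The same observed score of 100 carries very different weight of evidence: with low reliability the plausible range is more than twice as wide.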
In schools, data has become a four-letter word. But data is a good thing and we need to reclaim it.