Katharine Bailey is Director of Policy here at CEM, and for many years she has been working with schools and governments in the UK and around the world, helping them to use assessment data for pupil and school improvement.
At the recent Schools North East’s Evidence-based Excellence event, Katharine shared some of the research she is involved with, focussing on how assessment data is reported and exploring how data can affect student outcomes.
Her presentation at the event centred on a major ongoing study involving over 37,000 children in more than 1,000 primary schools in England.
Results from this study show that the type of assessment data provided to teachers at the end of the school year, as well as cultural factors, levels of engagement, and assessment and data literacy can impact children’s outcomes.
Download the presentation slides here
00:00:00 Katharine Bailey
What we hope is that the data we get out of an assessment will impact on pupil outcomes. If we don't think that, then there's no real reason to be looking at that data in the first place. So, the assumption is that by having this data and by using it, we're going to improve the rate at which our pupils learn, or the amount
00:00:20 Katharine Bailey
which our pupils learn. But of course it isn't as simple as that. There's actually little evidence in the research base about the impact that assessment data directly has on pupil outcomes.
00:00:32 Katharine Bailey
A lot of that is because it's such a complex thing to unpack. It's not just about the data in that report; there's a whole load of other factors, and I'll just touch on those. But as Stuart was saying this morning, a lot of it is also around causality. It's quite easy to identify correlations, so
00:00:52 Katharine Bailey
you might see particularly strong leadership alongside good data outcomes, but that is not necessarily a causal relationship; it's a correlational relationship. So, there's quite a lot
00:01:03 Katharine Bailey
of investigation needed. What I'm talking about are the mediating factors between the assessment data and the outcomes, and that's what we'll explore a little bit.
00:01:15 Katharine Bailey
But because there's that lack of evidence, there is a real need for rigorous randomised controlled trials to explore what actual benefit the reports have, and whether some ways of reporting, or some ways of using reports, have additional benefit on pupil outcomes over others. And getting a study design that will answer that question
00:01:36 Katharine Bailey
is not an easy thing to do.
00:01:38 Katharine Bailey
So, those mediating factors. Between the data or the reports (that's the little bar chart at the top) and the outcomes sit a series of mediating factors, and I've grouped them into three sorts. On the left-hand side there's a school building; you really can't see what's on that screen very well, but it's supposed to indicate things around the school
00:01:57 Katharine Bailey
culture. So, you might have a culture in the school which is particularly supportive of data use, or you might not, but there are cultural factors at play.
00:02:06 Katharine Bailey
There are interactions or practices within the school, interactions between colleagues, that can also have an impact, and I'll say a little bit more about that. And then there's the report itself, the understanding of that report, and how it's used. All of those things in the middle are factors at play when we're looking at how well our assessment data, or our decisions around assessment,
00:02:26 Katharine Bailey
impact on pupil outcomes.
00:02:29 Katharine Bailey
Let's just start at the top of that, with the report. So, in the rest of what I say, I'm going to make an assumption that the reports are based on robust, well-designed, well-evaluated assessment tools and instruments.
00:02:45 Katharine Bailey
And that is not always the case, but let's assume that there is a robust assessment underpinning it. Then the right level of thought has to have been given to the design of those reports. We design and develop reports, and a lot of the responsibility we bear in supporting you to make accurate interpretations
00:03:05 Katharine Bailey
of the data is that those reports are designed to do exactly that: to support you in making the right interpretation, not just any old interpretation. So, a lot of the work we do goes into making sure that the data is easily understandable,
00:03:22 Katharine Bailey
that you can see what the pertinent issues are straight away from that report, and that you know where that report needs challenging. The reports need to be validated. So, once we've developed something and we think it's the best we can do based on the research evidence we've got, we need to test that out. We would go into schools and ask them: what did this report show you? Does it show you the types of things that we think it should be showing you? And if not, that's not your problem,
00:03:45 Katharine Bailey
that's my problem, because we haven't designed them properly.
00:03:49 Katharine Bailey
The uses of those reports should be made absolutely clear and, more importantly, the things they shouldn't be used for should be made clear. So, we might say that a report is perfectly appropriate for identifying particular areas of strength and weakness for a child, but what we would say is that it is not appropriate for holding teachers to account. So, we need to make sure that those uses are absolutely clear.
00:04:12 Katharine Bailey
There's some interesting work on the desirable difficulty of pieces of data and analysis. If a report is too simple, it's very easy for us to make very quick decisions around it without thinking about it too much.
00:04:26 Katharine Bailey
If a report is too hard, we don't look at it, because it's just too inaccessible and we think, right, that's not for me, I don't have the skills to do that. But there's somewhere in between where you need to think to work out what it's telling you, and you're engaging lots of different processes to interpret that report. You're also realising what other information you need to bring to
00:04:45 Katharine Bailey
bear on it, and that is where you're likely to make the best decisions. So, there's something again for us as assessment designers: to make sure that there is an optimum level of desirable difficulty within those reports.
00:04:57 Katharine Bailey
All of that thinking that goes into reports is not commonly applied; in fact, it's quite rare. So, some of the new pupil-tracking systems, where lots of data goes in and some reports come out, won't have gone through that rigorous process. That doesn't mean you shouldn't use them; it means you should be careful to triangulate them with evidence from other sources to make sure that you get the best decisions out of that data.
00:05:21 Katharine Bailey
OK, so let's look at the mediating factors now. So, in your school you may have a particular data-rich culture, which means you're more likely to engage with the data.
00:05:30 Katharine Bailey
You may have very strong leadership; that strong leadership will impact on your data use, but it will also impact on various other elements of teaching and learning. So,
00:05:40 Katharine Bailey
a school which chooses to use data of the type that we provide
00:05:45 Katharine Bailey
may see impacts or improvements that won't be specifically about the use of our data or our assessment. They will be about the fact that the culture within the school, or the school leadership, has chosen to use that sort of method. So that complicates any evaluation of the impact of that assessment data.
00:06:01 Katharine Bailey
Is that data use led from the top? What's the school's approach to accountability? All those factors are going to be mediating factors when identifying where that impact is.
00:06:12 Katharine Bailey
Within the school, then, we've chosen an assessment. We've got some data. Who owns that data? Is it owned and kept quite close by senior leadership or is it made available to everybody?
00:06:22 Katharine Bailey
Does everybody have access to all of it?
00:06:25 Katharine Bailey
Is time made available for talking about those reports and for engaging with colleagues around questions that are coming out of those reports?
00:06:33 Katharine Bailey
Again, those factors are going to be mediating the pupil outcomes that result from those decisions and that reporting. And finally, there's something in there about your own personal understanding and skills regarding the use of data. There's quite a broad spectrum of statistical and data literacy within schools.
00:06:54 Katharine Bailey
And you will get, as you would imagine, some maths and science teachers who are quite comfortable with data, and you'll get some people who are a lot less comfortable with data.
00:07:03 Katharine Bailey
There are quite a lot of skills you need to build up to be able to tackle that properly: knowledge around statistics, around mathematics, around the area that you're assessing. And there's also a set of dispositions and beliefs. So, you will get people who are over-reliant on data, to the extent that they put aside their own personal
00:07:24 Katharine Bailey
judgements, and you'll get the opposite end of the spectrum, where someone only really trusts their own judgement and is very sceptical about data. You need to have a healthy balance.
00:07:34 Katharine Bailey
OK, so there are lots of mediating factors which make it very complicated to understand the direct impact that assessment data has, but there is some evidence coming out. Quite an old study now, from 2011, was a randomised controlled trial conducted in a district in the States with 59 schools, where they did some extensive
00:07:54 Katharine Bailey
training on the use of assessment data and how that might impact on pupils' learning in maths and literacy.
00:08:02 Katharine Bailey
They found an improvement in maths and a smaller improvement in reading, but these were small effect sizes.
00:08:09 Katharine Bailey
These would be in the low range, perhaps 0.09 or 0.10.
00:08:15 Katharine Bailey
The obvious issue with this study is that what they're actually evaluating is more about the training and not so much about the report. The assessment data worked, but how is it possible to separate out the use of that specific data from the level of training that the teachers got? So, there's still a mediating factor in there.
00:08:35 Katharine Bailey
A more recent study was conducted in the US using PISA questionnaire data. I don't know if any of you have completed the PISA school questionnaire, but there is a section in there about data use: what you use data for, how you use that data, who owns that data within the school, and such like.
00:08:55 Katharine Bailey
So that asks questions around whether you use data for monitoring progress, for judging effectiveness, for identifying areas for improvement, and such like.
00:09:04 Katharine Bailey
And they found that those practices were positively linked to pupil outcomes, but only when the uses the teachers were talking about were reported publicly. So, when they knew the data was going to be shared more widely,
00:09:24 Katharine Bailey
as in outside the school, then you saw the impact on pupil learning improve.
00:09:28 Katharine Bailey
When the use was purely internal, for instructional purposes, there was no improvement. That's very interesting, and there's quite a lot to explore in that. So, what does that say about accountability? Does it say accountability is a good thing? The accountability meant that the pupils had greater learning.
00:09:50 Katharine Bailey
Those are really the only two studies I've found that hold a reasonable amount of water. A lot of the others are either low-powered, as in they don't have very many children in them, or there are issues around causality.
00:10:03 Katharine Bailey
So, any of you who know us and have worked particularly in primary might know that we were part of one of the government's now-abandoned policies around assessing children in the early years: the reception baseline policy, in 2015/2016.
00:10:24 Katharine Bailey
So that's now gone by the wayside. However, we did gather quite a lot of data, and we were able to do some research on it and start to answer some more of these questions.
00:10:34 Katharine Bailey
So, for those of you who aren't in the early years, this was a policy which meant that all children would be assessed on entry to school.
00:10:43 Katharine Bailey
in the first six weeks, and that data would be used as a baseline for the measurement of progress up to the end of Year 6, seven years later. And that would be the measure by which schools were to be held accountable.
00:10:57 Katharine Bailey
I'm not going to go into details about our assessment; that's not why I'm here. But just so that you know, it was a computer-delivered assessment of around 20 minutes, covering early literacy and early mathematics skills, and we put all the data together to give a single measure, which was the baseline for the children.
00:11:15 Katharine Bailey
We tiered that, so that at the very basic level you did that one baseline assessment and you got a standardised score per child. We offered two further packages. One of them meant that you could repeat the assessment at the end of the year and you got some more enhanced reporting.
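For readers unfamiliar with the term: a standardised score rescales a raw score against a reference distribution, conventionally to a mean of 100 and a standard deviation of 15. A minimal sketch of the idea (the function and the figures are illustrative, not CEM's actual scaling):

```python
def standardise(raw, cohort_mean, cohort_sd, scale_mean=100, scale_sd=15):
    # z-score against the cohort, then rescale to the reporting scale
    z = (raw - cohort_mean) / cohort_sd
    return scale_mean + scale_sd * z

# A raw score one standard deviation above the cohort mean
print(standardise(30, cohort_mean=25, cohort_sd=5))  # → 115.0
```

The appeal of such a scale is that "100" always means "at the national average", whatever the raw test looked like.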
00:11:31 Katharine Bailey
With the final option, again you could assess at the end of the year, but you got even more enhanced reporting. That's basically the summary of it. It looked like this: the green arrow down the side here says "increased content and interactivity of the reporting". This is the most basic option, the baseline option, so the further down the packages you go,
00:11:52 Katharine Bailey
the more data is provided and the more able you are to interact with that data. It's presented through a computer, so you could actually do things with it; you could manipulate that data.
00:12:03 Katharine Bailey
So, with the first package we only gathered data at the start of the year, so we had no outcome measure. If you think about what the speakers before were saying, and what Stuart and Rob were saying: we're looking to measure the impact of the intervention, so we want to measure at the start and, ideally, measure at the end, so we can see what the impact of the intervention is. We weren't able to do anything
00:12:24 Katharine Bailey
further with these schools, because we didn't have any follow-up measures. However, we were able to compare our medium package with our top package: those pupils were assessed at the start of reception and again at the end of reception, and we compared the amounts of progress made between those time points across the two packages.
00:12:44 Katharine Bailey
I'm just going to flick through these. In the basic package, all teachers got per child was a standardised score. That's it.
00:12:53 Katharine Bailey
With the second package, the middle package, you really can't see this at all, so I'm just going to briefly describe it. For literacy and for maths there's an arrow
00:13:04 Katharine Bailey
going up, and the text to the left-hand side, rather than giving a score, describes what the child knows at that entry point into school and again at the end. So it says, for example, "can read and understand text, choosing appropriate words to complete simple sentences". The idea is that the blue line indicates where the child is at the start of the year
00:13:25 Katharine Bailey
compared to a national average, and where they were at the end of the year compared to the national average: they've made that much progress.
00:13:32 Katharine Bailey
There it is for literacy, there for maths, and they got it at the class level as well, so they could compare children across those different assessment points.
00:13:41 Katharine Bailey
With the top package you got all of that, but you also got this interactive dashboard, which allowed you to filter and sort pupils based on their EAL status, their looked-after status, and a number of other factors. I won't go into detail about it, but this was where you could actually do some filtering and sorting with the data.
00:14:04 Katharine Bailey
So, the research question we were trying to answer was: is the amount of pupil progress related to the content and interactivity of the reporting option you choose?
00:14:15 Katharine Bailey
We had just over 1,000 English primary schools involved in this study, and 37,000 children.
00:14:24 Katharine Bailey
We looked at various factors at school and pupil level. At school level we had some independent schools and some state schools, so we were able to control for that, and we looked at socioeconomic
00:14:35 Katharine Bailey
status and the reporting options they chose. The pupil information was start and end literacy and mathematics scores, and gender.
00:14:45 Katharine Bailey
We built multilevel models to look at this, which essentially means we control for different things to see what level of impact each of those things has on the outcome.
00:14:57 Katharine Bailey
And it means we can rule out the importance of whether you're an independent school or a state school, because we've built that into our model. So we controlled, as you would describe it, for pupil sex, the standardised score, whether the school was in a high or low socioeconomic area, and whether it was a state school or an independent school.
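The "controlling for" idea can be illustrated with a simplified sketch: include the factors you want to rule out as predictors alongside the one you care about. This uses plain least squares on simulated data rather than a true multilevel model, and every variable name and effect size below is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Simulated pupil-level data (all names and effects are made up):
baseline = rng.normal(100, 15, n)      # standardised start-of-year score
female = rng.integers(0, 2, n)         # pupil sex
independent = rng.integers(0, 2, n)    # 1 = independent school, 0 = state
enhanced = rng.integers(0, 2, n)       # 1 = chose the enhanced reporting package

# Simulated end-of-year score with a true "package effect" of 2.0 points
end = 0.8 * baseline + 1.0 * independent + 2.0 * enhanced + rng.normal(0, 10, n)

# "Controlling for" the other factors means including them as predictors,
# so the coefficient on `enhanced` estimates its effect net of the rest.
X = np.column_stack([np.ones(n), baseline, female, independent, enhanced])
coefs, *_ = np.linalg.lstsq(X, end, rcond=None)
print(f"estimated package effect: {coefs[-1]:.2f}")  # near the true 2.0
```

A real multilevel model additionally lets school-level effects vary (pupils nested within schools), which is what makes it appropriate for this kind of clustered data.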
00:15:19 Katharine Bailey
And the results were these. We've got four different models, and they varied between an effect size of 0.13 and 0.18. That was for literacy, and it was very similar for maths.
00:15:33 Katharine Bailey
What this tells us is that there is an impact of the report you choose: the report you choose is related to increased pupil outcomes. It's a small effect, so for you, with your class in front of you, that effect size may seem very small.
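For context, an effect size of this kind is a standardised mean difference: the gap between group means divided by the pooled standard deviation. A small illustration with made-up numbers:

```python
import math

def cohens_d(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    # Pooled standard deviation across the two groups
    pooled = math.sqrt(((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2)
                       / (n_a + n_b - 2))
    return (mean_a - mean_b) / pooled

# Hypothetical: one group averages 102 points of progress vs 100,
# with SD 15 in both groups of 500 pupils each.
d = cohens_d(102, 100, 15, 15, 500, 500)
print(round(d, 2))  # → 0.13
```

So on a scale with SD 15, an effect of 0.13 corresponds to a gap of only about two points per pupil, which is why it looks small at class level but matters when aggregated over many schools.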
00:15:52 Katharine Bailey
But this has policy implications, because to have this size of effect over a large number of schools
00:16:00 Katharine Bailey
is a significant improvement when you consider the idea of iterating on it. So, as Rob mentioned earlier, what we're looking to do is find something that works and tweak it and tweak it, looking for 0.1, 0.2, 0.3 increased gains, until we get something that's really robust. So, we shared this with the EEF, and we're going to be putting this in
00:16:20 Katharine Bailey
as a bid to evaluate it more fully. And the reason we need to do that is for the very reasons Stuart was talking about: because we don't understand the causal relationship, we only understand that there is a correlation.
00:16:35 Katharine Bailey
So, we found a significant association. However, there are cultural factors there: we at CEM are well known as being quite a scientific, measurement-oriented, research-driven unit.
00:16:48 Katharine Bailey
One of the most popular options was one which has been common practice in schools for over 15 years: the observational assessment of children in the early years. So there is something there about the choice. People might have chosen our data because they already have a healthy relationship with data and are already engaged with what data
00:17:08 Katharine Bailey
can tell them. So maybe that's come out in the outcomes.
00:17:12 Katharine Bailey
There are levels of engagement as well. What we don't know is how widely that data was shared. Who was talking about it? Did that have an impact? And then, was it that the reports were particularly well presented, or was it that the teachers looking at them were particularly well engaged with that data? There is so much that we don't know, and the only way to really address that
00:17:33 Katharine Bailey
is to do a proper randomised controlled trial, because that neutralises all of those other factors and gives us a real picture of whether those improvements are specifically down to our reports, or interactions with our reports.
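The point about randomisation can be sketched in a few lines: assigning schools to report types at random means the arms differ only by chance, so culture, leadership, and engagement are balanced in expectation. A toy illustration with hypothetical school IDs:

```python
import random

schools = [f"school_{i:02d}" for i in range(8)]  # hypothetical school IDs
rng = random.Random(42)                          # fixed seed for a reproducible draw
rng.shuffle(schools)

# Random assignment balances the mediating factors across arms in
# expectation, so outcome differences can be attributed to the report.
half = len(schools) // 2
treatment, control = schools[:half], schools[half:]
print(len(treatment), len(control))  # → 4 4
```

In practice an education RCT would randomise at the school level exactly like this (cluster randomisation), then compare pupil progress between the two arms.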