Systematic Reviews and Weather Forecasts – how purpose shapes the significance of systematic reviews for different education stakeholders


Philippa Cordingley – Chief Executive CUREE
Paul Crisp – Managing Director CUREE
Steve Higgins – Professor of Education, Durham University

We are planning a barbeque in our new house next weekend and we want to know if it’s going to rain. We look at the weather forecast and it tells us there’s a 25% chance of rain.

That’s no good to us – we want to know will it rain, yes or no?

Now we take a look at the weather history for our region and we see that in three years of the last five it’s rained that week. Then we notice that someone in the next street has an amateur weather station in his garden, so we track him down and ask him if it’s going to rain. He looks at his local information and says ‘no’.

This is the definite answer we wanted but can we rely on it?

The importance of purpose

Back in our education world, practitioners face the same issues when trying to make decisions based on research. Researchers (and meteorologists) are looking for the broad, generally applicable features and concepts. Practitioners (and barbeque planners) have a different purpose – they just want a definite answer to a specific problem (will it rain?). This difference in purpose is even greater when the research is in the form of systematic reviews and meta-analyses – and we are focusing on these because they have been getting official endorsement and some traction in the broader system.

Researchers may describe them as the ‘gold standard’ but – as is the nature of science – there are dissenting voices drawing attention to flaws in these methodologies and inviting you to dismiss them all as ill-founded and misleading. The art and science of systematic review is a relatively new ‘technology’ in education and lots of the critique is based on myth and misunderstanding.

Here we offer our take on what they are and why they matter in the context of purpose - and planning barbeques.

Map of evidence

For systematic reviews, the starting point is to round up all the discoverable empirical research on a specified topic and then scrutinise it through a rigorous and transparent critical review process.

Meta-analyses take the quantified results (e.g. effect sizes) from the most rigorous studies that relate to their question and use statistical techniques to combine or integrate them, often into a single average value. In both cases, the purpose is to establish an overview of what is currently known in a given field to inform the next wave of research.
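
As a rough illustration of that pooling step, the sketch below computes an inverse-variance weighted average of effect sizes. This is one common fixed-effect approach; it is an assumption here, not a description of how any particular review does it. The function name `pooled_effect` and all the numbers are hypothetical, chosen only to show the arithmetic.

```python
from math import sqrt


def pooled_effect(effects, variances):
    """Return the inverse-variance weighted mean effect and its standard error.

    Assumes a simple fixed-effect model: each study is weighted by
    1 / variance, so more precise studies count for more.
    """
    if not effects or len(effects) != len(variances):
        raise ValueError("need one variance per effect size")
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    mean = sum(w * d for w, d in zip(weights, effects)) / total_weight
    standard_error = sqrt(1.0 / total_weight)
    return mean, standard_error


# Hypothetical effect sizes and variances, not taken from any real study.
effects = [0.10, 0.40, 0.70]
variances = [0.04, 0.04, 0.04]

mean, standard_error = pooled_effect(effects, variances)
print(f"pooled effect: {mean:.2f} (SE {standard_error:.2f})")
print(f"individual studies range from {min(effects):.2f} to {max(effects):.2f}")
```

The single pooled figure sits in the middle of studies that differ widely from one another, which is why the average is a map of the territory rather than a prediction for any one classroom.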

Lots of attention is given to rigour and generalisability because the goal is a generic one: to pave the way for further critique, not to put a full stop to knowledge development. The outcome is a kind of high-level map of the evidence – good enough to find the island but not accurate enough to lead you to the buried treasure.

Selecting relevant and high-quality research

There is a strong emphasis on rigour in conducting these reviews, so there are elaborate procedures for ensuring that the search for research, and the selection of the relevant and high-quality studies from it, is comprehensive, systematic and transparent.

Every stage is documented so that someone else can replicate or challenge the review’s findings. This is important because you won’t find enough (say 20) high quality research studies all directly answering your research question so you have to include some which are partially or indirectly relevant.

Neither quality nor relevance is a binary issue so judgements have to be made about whether to include a study and what conclusions can be drawn from it.

For instance, most studies about CPD say very little about impact on learners (because they are investigating impact on teachers); you are more likely to find relevant but indirect data in a study about the impact on pupils of an intervention on, say, maths that also researches the CPD supporting the intervention.

A good fit with policy

Beyond the research world, systematic reviews are often commissioned by governments because they are interested in broad system impacts and, sometimes, because they want to promote evidence-informed practice or policy making.

Examples of reviews with this kind of purpose and impact are the Best Evidence Syntheses in New Zealand, and much of the research underpinning the EEF Teaching and Learning Toolkit here in the UK.

These are often ‘reviews of reviews’ and, for policy actors, the broad nature of the findings closely matches the policy issues they are trying to address such as “what level of impact do different reading interventions have on primary age children?”.

The questions and aims are generalised so the more general findings are helpful. That’s how policy works.

Much the same is true in the interpretation of effect sizes where a meta-analysis is looking at a lot of studies of interventions which are broadly related to each other and it reports the average effect. The fact that, within this average, there’s a lot of variation, doesn’t matter to policy makers.  

Making informed professional judgements

Teachers, however, are more like us and the regional weather forecast. It’s a bit of help knowing that there’s a 1 in 4 chance of it raining on our party but nothing like enough help for us to make a decision.

Teachers want to use evidence to inform their practice. To do that they have to combine evidence from their pupils and their practice, i.e. their real world, with evidence from research to make informed professional judgements about what is likely to work for their pupils in their context.

A systematic review helps them focus in on the kind of thing likely to work and, possibly more valuable, the kind of thing which probably won’t. But all the detail of the intervention which would enable you to figure out if it would work for you – and to actually do it in your classroom - is probably missing from the systematic review.

Here, individual studies and high-quality, small-scale case studies (almost as rare as hens’ teeth) are likely to provide the texture and detail you need - even though they would be discounted as not significant in a meta-analysis. When your high-quality local data relates tightly to your purpose, generalisability matters less than relevance, but only once you also have the high-level map – you need both.

‘Nothing works everywhere’

School and MAT leaders are in a difficult, in-between place.

For them, the purpose of evidence is to inform their thinking, planning and practice expressed usually in school policies and systems. They too have to combine evidence from the real world but this time it has to encompass teachers’ capacities and starting points as well as evidence from their pupils and school performance. But because they are making a policy which is generalised there is more need to ensure they are working with generalisable evidence.

There is an important difference between the weather forecast and systematic reviews. The forecast is a specific attempt to predict the future by extrapolating from evidence in the past and present. Systematic reviews only tell you what worked (or didn’t) and, as Dylan Wiliam famously remarked, “everything works somewhere; nothing works everywhere”.

Even the most rigorous research won’t guarantee that something will work for you.

What about our barbeque? Well, the synoptic forecast said that there was only a 25% risk of rain, so our local guy’s prediction was supported by other more systematic evidence.

Good enough to go ahead – but warm up the oven too just in case!

More Information

This blog was first published on the CUREE website, and is part three of a series of articles on meta-analysis published to coincide with a panel discussion at the researchED national conference.

Part one: Serious critiques of meta-analysis and effect size: researchED 2018

Part two: What should we do about meta-analysis and effect size?