Evaluation for the times we are in — from ‘What Works’ to more complexity and systems aware practices.

Kerry McCarthy
Jul 27, 2023


In 2022 I was commissioned by Joseph Rowntree Foundation to write a paper introducing the different ways we conceive of ‘impact’. The paper included a detailed look at the ‘what works’ tradition, dominant in the UK, and some of the ways evaluation is evolving to better meet the needs of complex, systems focused work, like JRF’s Emerging Futures. This article is a shortened version of that paper.

My perspective comes from twenty years of working as an evaluation practitioner, with a growing sense of unease over whether traditional approaches are really meeting the needs of today’s more complex, systems focused work. This unease has led me to reflect on my own practice, and explore how others are thinking about and practicing evaluation in new and different ways.

“We need to understand our impact”

People often start a conversation with me by saying something like “we need to understand our impact”. If I don’t respond straight away the pause is usually filled with an elaboration of what that means to them.

There are no rights or wrongs in these elaborations. ‘Understand our impact’ often involves a wide range of motivations and questions, in different combinations, for different aspects of their work. Most common are:

Influencing: ‘Are others interested in what we are doing?’; ‘Are they trying to learn from us, or do something similar?’; ‘Are we being asked for our opinion?’; ‘Are we shaping how issues are thought and talked about?’

Proving: ‘Has it worked how we thought it would?’; ‘What was unexpected?’

Improving: ‘What have we learnt from doing this?’; ‘How could we do this better?’; ‘What is our next step?’

How work is experienced: ‘Is it addressing people’s concerns, their needs as they experience them?’, ‘Do people feel good or bad about being involved with this work?’.

Change: ‘Has the change we are aiming for happened?’; ‘By how much?’; ‘How long will the change last?’; ‘How much of the change are we responsible for?’

Accountability: ‘How has this money been used?’, ‘Is it in line with our organisation’s mission, with the objectives that we signed off on?’, ‘Is it in line with the values and principles that guide us?’

Future planning: ‘Should we do more or less of this?’; ‘Should we do a lot more of this, and persuade others to do it too?’; ‘Is it a good use of resources?’.

Evaluation and learning activity can be about all of these things, and also about: supporting innovation and adaptation (developmental evaluation); involving stakeholders to transfer power and / or support social action (participatory action research, peer research); evaluation as a tool for and of equity (Equitable Evaluation Framework); focusing on the intended uses by intended users in a given context (utilization focused evaluation) and so on. For more, see Michael Quinn Patton’s video on the extent of the evaluation landscape.

What Works

In my experience, when people expand on ‘we need to understand our impact’, rarely do they mean they want to specifically attribute a set of outcomes to their work by employing evaluation designs that include a counterfactual (commonly a comparator group).

But it is this, quite specific, definition of impact as causal attribution, established through RCTs and quasi experimental methods, that has traditionally been dominant in the UK as the ideal standard of robust or rigorous evidence of ‘what works’.

Traditional approaches to impact evaluation, concerned with assessing how the work being evaluated affects a set of predetermined outcomes, are rooted in Newtonian thinking: the idea that cause and effect can be isolated, separate from the wider contexts in which policies or programs are operating. Evaluation is often described as having ‘grown up in the projects’. For an individual project it is more possible to develop a theory of change, showing causal pathways, as a basis for testing predetermined, clearly defined goals. All of which can be suitable for a stable or standardised intervention, only likely to change through tweaks to an existing design, operating in a less complex environment with fewer stakeholders, and where the purpose of the evaluation is to demonstrate impact.

We know lots of work does not fit this description; increasingly we recognise how discrete projects don’t operate in isolation from each other and the multiple systems they are part of. Challenges like poverty are interrelated with others, and are systemic, adaptive, and emergent in nature. Work to address them takes place in contexts that are constantly changing, and which are influenced by events, policies and other programmes of work, in ways that cannot be predicted. Work can cut across multiple systems, involve many different activities and interventions, and involve large numbers of stakeholders (organisations and individuals), who will bring their own perspectives on how things should work in practice and what counts as evidence of having an impact.

There is an important role for RCTs and quasi experimental methods, and the ‘what works’ movement has been influential in positive ways. It has encouraged focused thinking on outcomes and impact, and the critical thinking that goes along with this. Through ‘what works’ evaluation we can learn about very specific mechanisms of change, how they can be tweaked to be more or less effective. Some What Works Centres provide easy access to validated and reliable measures, which support organisations with more consistent approaches to measuring impact, along with helpful overviews of existing evidence and where there are gaps.

This is not about dismissing impact evaluation; it has a role. But by expanding our view of what is possible in how we approach evaluation we can make more intentional choices that best fit the context and purposes of the work.

Applying a ‘what works’ model to learning about your work signals an intention to ‘test & prove’ a pretty well defined intervention, or perhaps isolated aspects of an intervention. For exploratory or complex work, an approach that will support ‘improving and adapting’ is likely to be more appropriate.

If the approach to evaluation and learning is not sufficiently sensitive to context and forces an over simplification of conclusions, we risk undermining the effectiveness of the work itself. We can miss less straightforward, shifting, emergent causal chains; variations in what is happening in different settings (for example different kinds of self-organisation and adaption); the influence of wider context; interaction between local systems and so on.

Stakeholders’ experiences being distorted or ignored is a particular risk for initiatives where synthesis and sensemaking of multiple perspectives is important to overall learning and direction setting. As Martha Bicket and colleagues note in their article on bringing complexity thinking into UK Government evaluation guidance:

“Choice of evaluation approach and methods is often driven by being concerned about ‘getting the right answer’ but, when working with complex adaptive systems, the greater concern should be ‘how not to get an answer that’s very wrong’. In a complex setting, there are a number of reasons why an evaluation can result in wrong conclusions being drawn or generate findings that key stakeholders find difficult to accept. The most obvious cause is choosing an evaluation approach that fails to reflect the complexity involved, leading to overly simplistic, or misleading, conclusions being drawn.”

Reliance on statements about ‘what works’ risks missing or stifling potential, and minimising the need for ongoing improvement. It can support ‘silver bullet’ thinking, which is unhelpful and unrealistic for addressing complex challenges. More than ever, the nature of new, complex, systems focused work requires evaluation to be a reflective and questioning endeavour, one that supports the learning capacity and adaptation of the work as it is happening.

So what should we be doing to support complex, systems shifting work?

The good news is there is lots of thinking and practice underway to support complexity and systems aware approaches to evaluation and learning.

Since 2016 a coalition of UK research councils, government departments and agencies, the Centre for the Evaluation of Complexity Across the Nexus (CECAN), has been pioneering and promoting new approaches to policy evaluation, to make it fit for a complex world. In 2020, the UK Government’s official guidance on policy evaluation was updated to include a supplement on complexity.

As early as 2014, Hallie Preskill and Srik Gopal were setting out how to approach evaluating complexity, along with examples of specific methods and case studies where they have been applied. Michael Quinn Patton’s work on principles focused evaluation offers an approach whereby effectiveness principles guide choices and decision making as paths forward evolve in real time.

You can see the influence of a principles approach in, for example, the Rockefeller Philanthropy Advisors Shifting Systems Initiative evaluation, exploring the extent to which the philanthropy sector has embraced the concept of systems change:

“Learning-led evaluations support strategists and implementers who work in complex environments, helping them assess the patterns and dynamics at play in the systems in which they work. Unlike evaluation approaches that are built on linear thinking and seek determinative answers, this evaluation intends to provide insights that will help SSI grapple with the strategic choices that take place in complex environments, where there is not necessarily a unique, correct, or linear way forward.”

UNDP’s Strategic Innovation Unit has been hosting an online M&E Sandbox. These participatory, practice focused sessions recognise the need to rethink monitoring and evaluation to be more coherent with the complex nature of the challenges facing the world today, along with collating an overview of some of the methods and resources out there.

Evaluation and learning to support complex and systems shifting work does not have the same tried and tested lineage as more traditional approaches to evaluation. There is much less infrastructure in place to share practice, and a lack of the guides and toolkits available for ‘what works’. We need to adapt commissioning and contract management to allow us to work in new ways. Experienced practitioners like me have few direct experiences to share, of our own or from our peers; we too are learning as we go.

Perhaps there is advantage in not having a well-trodden path to follow, to encourage definitions of impact and design of approaches that are most useful to the context. And to encourage new configurations of insight, methods and skills from different disciplines. Combining, for example, data and analytic capabilities, participatory and equity approaches, systems and complexity science, the arts, indigenous insights and so on.

What is clear, however, is that new approaches to evaluation are about more than trying out some new methods in what is essentially the same framework. They also involve re-thinking some fundamental, underpinning concepts.

Rigour starts with how we think about what we are doing, not the methods we use.

Kelly Fitzsimmons of Project Evident reflects on how rigid hierarchies of evidence, where some approaches are held to be more rigorous or valuable than others, support decision making and resource allocation in a way that biases towards the status quo and limits the kind of evidence we look at, to the detriment of equity, inclusivity, learning, and practical usefulness.

Project Evident calls for an emphasis on supporting practitioners working in new ways to interrogate the thinking behind their approach by including evidence reviews and needs assessments that meaningfully incorporate input from the people who will be participating; for building actionable evidence that is focused on community context and impact over academic relevance, with a focus on testing and learning to support innovation in response to emerging needs, “think R&D, not compliance”.

“Importantly, these proposed revisions do not discard rigor, but rather attempt to reclaim rigor with practical relevance and credibility. Rigor and credibility are important in all evidence building, whether the evidence is used to inform public spending or to support continuous improvement. But for too long, we have falsely equated rigor with randomized control trials (RCTs) alone, when in reality, rigor applies throughout the tiers, from early-stage evidence gathering to large-scale evaluation design and implementation. We believe that practitioners should strive to build evidence that is seen as credible not just in the eyes of researchers, but also in the eyes of those who are most proximate to the challenges being addressed and often the ones providing the data.”

Others have proposed new definitions of rigour. Lynn & Preskill include quality of thinking; credible and legitimate claims; cultural context and responsiveness; and the quality and value of the learning process. And the American Statistical Association call for “moving to a world beyond ‘p < 0.05’”, to accept uncertainty, be thoughtful, be open, be modest (using the acronym ATOM).

Purpose and accountability

I usually bring a utilization focused approach to most of my work, asking “who are the intended users of this evaluation activity, and how will they use it?”, and designing back from this starting point. There are interesting perspectives about how to expand our vision of the uses for evaluation, to better support complex, systems focused work.

Emily Gates talks about expanding the role for evaluation from a determination of value of some bounded initiative, to continuously developing value within systems change efforts. This means drawing on data, deliberative processes and different perspectives to understand: what is business as usual, what is emerging practice, how it makes a difference, and what other factors affect the emergence of this practice. Evaluation can help guide where to go next, focusing on processes for generating systems changes and the conditions to sustain them.

This expansion of the classic definition of evaluation, from determining value / evaluative judgment (i.e., how well did we do) to developing value, a continuing, co-constructed process of evaluative learning and deliberating (i.e., what should we do next), can be found in the open access New Directions for Evaluation Special Issue: Systems & Complexity Informed Evaluation: Insights from Practice, along with many useful discussions and examples from practice.

This shift in the role of evaluation, from judging the work to supporting the work, changes the nature of accountability: from results-based management and reporting outcomes, to being accountable for why the work is going in the direction it is. Data is in service of the learning needed to do the work, so yesterday’s learning informs tomorrow’s practice. As Ruth Richardson and Michael Quinn Patton describe, the role is “to infuse team discussions with evaluative questions, thinking and data to facilitate systematic data based reflection and decision making in the developmental process”.

This brings us back to the questions I like to start with: “who are the intended users of this evaluation activity, and how will they use it?”.

For those in positions of power, holding work to account through reporting and governance structures, there are explicit choices to be faced.

What trade-offs are you making in how evaluation and learning resources are being deployed?

If the work itself is exploratory, feels risky, is full of unknowns, is evaluation and learning activity expected to provide you with reassurance that the work will be ‘what works’?

Or is this resource (money and effort) being deployed to help guide where the work goes next, when the destination, and sometimes even the next step, is not clear?