
Communicating the ‘stories’ of policing evaluations through theories of change: a case study using New Zealand Police’s Tactical Response Model

Published on Mar 25, 2024

Testing police practice is the essence of evidence-based policing, and great strides have been made in past decades assessing myriad police activities through impact evaluation. But policing initiatives often suffer from a lack of clarity about how the changes in police activities will bring about the changes in desired outcomes. Such obscurity hinders communication, both internally within the police agency, and externally to other police agencies through the reporting of the evaluation. Police would therefore benefit from a tool that helps to communicate the intention and mechanics of policing initiatives. A ‘theory of change’ approach could fulfil this purpose yet may not be intuitive to evidence-based policing practitioners. In this paper we present a case study from the evaluation of a large-scale police initiative that employed a theory of change to make sense of the impact evaluation findings. Through the case study we aim to demystify the process for other evaluators so that police can draw on one of their great strengths, storytelling, to better communicate evidence-based practice.

Keywords: police reform, causal process, mechanisms, translational criminology


The first wave of the evidence-based policing movement promoted the habitual testing of police practices through impact evaluations and randomised controlled trials (RCTs) (Knutsson & Tompson, 2017). This wave generated foundational and, in some cases, substantial evidence about police practices that appear successful according to their aims (e.g., body worn cameras: see, among others, Ariel et al., 2015). Importantly, it also found that some initiatives can produce unwelcome impacts on reoffending (e.g., Scared Straight, see Petrosino et al., 2013).

Traditionally, impact evaluations were viewed as ‘black box’ operations because analysis focused on the inputs (the resources put into an initiative) and outcomes (changes that occur), with little regard for what happened in-between (Pawson and Tilley, 1994, although see Green et al., 2010 for a counterargument). They did not capture the ways in which the initiative unfolded. Therefore, although impact evaluations can inform us whether something ‘works’ they generally cannot discern why something works (Cowen & Cartwright, 2019). Moreover, impact evaluations and evidence syntheses thereof are typically silent on how to get that something to work (Fagan, 2017; Tompson et al., 2021). These features of impact evaluations limit their practical value to practitioners and policymakers in various ways.

A lack of clarity on why a new (or existing) police practice ought to be effective in changing a particular outcome hinders understanding about the initiative and can lead to failure to identify and capture data along that causal pathway between initiative activities and outcomes (Belur et al., 2023; Weisburd et al., 2015). Further, this lack of causal clarity can impede communication—firstly to the leaders and frontline staff in the police agency required to enact the change in practice, and secondly to other police agencies who could benefit from understanding the aims and mechanics of an initiative.

Internal communication is important because without sufficient awareness of why they should be doing things a certain way, frontline officers may not fully grasp the importance of following a protocol required by the initiative. A lack of clear communication from the leaders—or evaluators—to the ‘doers’ of an initiative can undermine completion of the tasks needed to enact a causal process that flows through to the desired outcome(s). Or, otherwise put, not being able to explain why frontline officers should be doing something different to business-as-usual risks implementation failure.

External communication is important because it provides an evidence base from which others can learn and build on. Police leaders are encouraged to judge whether evidence from elsewhere can be extrapolated to their policing issue and context (Sherman, 2013). Hence, police leaders are likely to be thinking more broadly than ‘what works’, rather ‘is this likely to work in my jurisdiction, with my resources’. They want to generalise. Generalisability—also known as external validity—is achieved when the mechanism of an initiative is shown to replicate across different settings, times, and/or populations. Mechanisms are a process by which something takes place, a causal relationship between action and effect. However, in the absence of a mature evidence base, it can be challenging to predict how a mechanism will play out in a given context.

To help understand how an initiative works—if indeed it does—a different form of evaluation is required: process evaluation. Process evaluations capture the inner workings of an initiative: the who did what, when, and the myriad barriers to the implementation working smoothly. Process evaluations can sometimes surface the ‘why’—the causal processes or mechanisms through which initiatives work. Whilst other evidence-based domains (such as medicine) may have well-developed evidence bases from which knowledge can be extracted about mechanisms before undertaking an RCT, the same cannot be said for policing (Cowen and Cartwright, 2019). To use a colloquialism, we are often compelled to ‘build the plane while flying it’ when designing and evaluating policing initiatives.

Even if there is strong impact evaluation evidence suggesting that a police initiative ‘works’ in one set of circumstances, we cannot simply generalise from this evidence to conclude the initiative will be successful elsewhere, because policing initiatives are embedded in social systems that are perpetually changing. For instance, such initiatives often involve multiple interrelated actors. Human behaviour, influenced as it is by perceptions, emotions, and motivations, is not as predictable as one would want when replicating an initiative. Implementers can change the context in which an initiative occurs, sparking a feedback loop whereby the context changes the initiative (Pawson and Tilley, 1997). For example, in a policing initiative to support victims of a particular crime type, victims might not behave as assumed and the initiative is tweaked in response.

Further, policing initiatives occur within adaptive social systems. They typically seek to change components of that system, whether that be a localised or small-scale system or large-scale—even organisation-level—system. Such systems involve the interplay of many individuals and institutions, and micro and macro social processes (Pawson and Tilley, 1997, ch. 3). Consequently, a hallmark of policing initiatives is complexity. Such complexity can be difficult to conceptualise and verbalise, meaning that both the implementation and evaluation of policing initiatives can suffer from a lack of clarity. Without a central ‘narrative’ to make sense of the complexity, the evaluation can become a collection of miscellaneous facts, hindering understanding and underserving evidence-based policing. We therefore need tools that enable sense-making and narrative-building in evaluating policing initiatives.

In this article we argue and illustrate that theories of change can be a useful tool for communicating the ‘stories’ of evaluations. Theories, generally speaking, are systematic explanations of relationships between concepts, and are fundamental to research as they provide a means of testing predictions about the world. Theories of change are precisely what they imply: they are explanations for why we might see changes in outcomes—the cause and effect or causal process. Although theories of change underpin all policing initiatives, they are usually implicit, rather than clearly articulated. Our contention is that theories of change can help to arrange knowledge about a policing initiative so that it can be expressed in our primal human preference for stories (Olson, 2015).

Police are avid storytellers, particularly in the informal corners of their professional worlds (known as canteen culture, see Reiner, 2010). Thus, a tool that can help police to tell ‘stories’ about the initiatives they run would facilitate communication about innovations in practice, both internally and externally. Further, a way of capturing and explaining the supposed causal process through which initiatives exert their effects would be a helpful translational tool for both evaluation and communication purposes.

A theory of change approach

Originating from settings marked by ‘uncertainty and emergence’ (Van Tulder and Keen, 2018), theories of change (ToCs) have evolved since their original incarnation by Weiss (1977) into a systematic method that can provide clarity amid complexity. At its core, a ToC explains, step-by-step, how an initiative is intended to work (Belur et al., 2023). For example, consider an initiative to reduce crime through increased street lighting. A ToC would logically chart the causal process from the initial action of installing lights, through intermediate steps (e.g., people noticing the lighting changes, increased visibility at night deterring offenders, increased feelings of safety encouraging greater foot traffic), to the outcome of reductions in crime (Welsh and Farrington, 2008). The ToC helps to organise knowledge about multiple causal pathways that might be activated by the initiative and to identify various outcomes that might be realised (e.g., reductions in crime, reductions in fear of crime, improved public-police relations). ToCs can be intricate and precise or can be applied at a mid-range of theorising, as recommended by Cowen and Cartwright (2019), to produce ‘street-level theories of change’ which explain generalities, rather than specificities, in behaviour. Hence, the example above might fall into the ‘street-level’ category for it seeks to explain how street lighting in general might reduce crime. If applied to a very specific location, a theory of change would likely need more precision (i.e., with regards to who uses that location, for what, and how they might respond to the lighting changes). Regardless of the level of abstraction employed, ToCs serve to reduce the complexity of initiatives in the social world down into more digestible sequences of events.

A ToC’s ability to unravel the mechanisms at play within an initiative can also identify important data collection needs and opportunities. By articulating mechanisms, ToCs highlight measurement points (interim outcomes), enabling a nuanced evaluation of the intervention’s effectiveness (Tompson et al., 2020). Returning to the street lighting example, if people were unaware of the lighting changes in their area (as was found in Perkins et al., 2022 for reductions in lighting), then the subsequent causal chain involving behaviour change is implausible. Measuring interim outcomes along the theory of change enables the discernment of whether the initiative is operating as intended. In the lighting example, it would not be the street lighting itself that had failed to cause reductions in crime, but the absence of an additional component (such as the lights needing to be brighter or the lighting changes widely publicised) necessary to activate the mechanism causing behaviour change. Hence, ToCs can enable important insights into explanations of success or failure of initiatives.
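The logic of checking interim outcomes along a causal chain can be sketched in a few lines of Python. This is an illustrative sketch only: the `Step` class, the `first_break` and `unmeasured` helpers, and the observed values attached to each step are hypothetical and are not drawn from any evaluation. It simply encodes the street lighting chain described above and reports where the chain first fails and which steps were never measured.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    """One link in a theory of change (hypothetical structure)."""
    name: str
    # True = interim outcome observed, False = measured but not observed,
    # None = no data were collected for this step.
    observed: Optional[bool] = None


def first_break(chain: List[Step]) -> Optional[str]:
    """Return the first step that was measured and found not to hold."""
    for step in chain:
        if step.observed is False:
            return step.name
    return None


def unmeasured(chain: List[Step]) -> List[str]:
    """Return the steps for which no interim data were collected."""
    return [step.name for step in chain if step.observed is None]


# Illustrative values only: lights went in, but residents did not notice.
lighting_chain = [
    Step("lights installed", True),
    Step("residents notice the lighting change", False),
    Step("visibility at night deters offenders", None),
    Step("crime falls", False),
]

print("Chain first breaks at:", first_break(lighting_chain))
print("Steps with no data:", unmeasured(lighting_chain))
```

Under these assumed values the break is located at the awareness step rather than at the final outcome, which mirrors the argument that the lighting itself need not be the point of failure.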

Theories of change align seamlessly with the realist evaluation approach, which seeks to understand what works for whom and in what circumstances (epitomised in the EMMIE [Effect, Mechanism, Moderators, Implementation, Economics] framework by Johnson et al., 2015). The EMMIE structure calls for evaluations to explicitly consider mechanisms and context (moderators), with these elements also increasingly required in translational criminology products (e.g., Campbell Collaboration systematic reviews). However, most empirical research on crime prevention and policing has underexplored mechanisms and moderators (Tompson et al., 2021). This deficiency in the literature limits our understanding of the contexts in which initiatives are effective and, perhaps more importantly, the contexts in which mechanisms can backfire to produce unwelcome outcomes. ToCs can bridge this gap, providing a lens through which the generalisability and portability of initiatives to different contexts, times, and populations can be assessed. Ultimately this lens assists in maximising the positive outcomes of policing initiatives, such as the Tactical Response Model, the initiative we use in this paper as a case study to illustrate the benefits of a ToC approach.

Case study: The Tactical Response Model

The police role is unique in society; the core function of the police exposes officers to risk by way of the unpredictability of the behaviour of the people they encounter (Crank, 2004). The Tactical Response Model (TRM) was designed to be a safety system to make New Zealand Police (NZP) staff, and the communities they serve, safer and feel safer. The catalyst for the TRM was the 2020 fatal shooting of an on-duty constable by a member of the public, a rare event in the context of policing in New Zealand. However, perceptions of the risk environment had heightened in the years preceding this event, in part reflecting an increase in firearms in circulation among the offending population (New Zealand Police, 2022a). These views were amplified by frequent media coverage about officer safety, causing an increase in anxiety for both officers and their families (Seals, 2022). Although the risk of fatal or serious injuries to officers is much lower in New Zealand than in other countries, both the rate of assault on police officers per 10,000 events attended and the rate of firearms victimisation in the community (that police had to respond to) had increased from 2018 to 2022.

The senior leadership of NZP were mindful of the consequences of a lack of safety on morale and job satisfaction and the organisation’s duty of care to frontline staff. Moreover, they were cognizant that intense negative emotions can undermine rational decision-making (Smith & Ellsworth, 1985, cited by Loewenstein & Lerner, 2003) and officer performance in high-risk situations (Jenkins et al., 2021). In addition, heightened emotional arousal to future anticipated events means that people find cold comfort in probability-based estimates (i.e., of their risk). A reimagining of human and other resources devoted to tactical incidents was therefore required to address feelings of a lack of safety.

The overarching aim of the TRM was to strengthen NZP’s capability to better understand, prevent and/or respond to high-risk and critical incidents. Its desired outcomes were increases in safety and feelings of safety for frontline staff and communities. As a large-scale system, the TRM involved several interdependent components. These included a) tactical training for frontline staff; b) increased specialist capability to respond to medium- and high-risk incidents; and c) new risk-based deployment processes, which included tactical intelligence, 24-hour coverage of command centres and double-crewing after 9pm. The TRM was trialled in four of the 12 police districts from November 2021, with the evaluation period being January-June 2022. Due to heterogeneity in geography, public demands on police, leadership structure and centralised processes, implementation was differentially tailored to each district.

Although all three ‘pillars’ of the TRM could notionally contribute to all the desired outcomes, the evaluation report focused on the most plausible causal processes, given the data trends analysed and the limited evaluation period for processes to play out. For this case study, we focus on the training component: the Frontline Skills Enhancement in District (FSED) course delivered to frontline officers working in Public Safety Teams (officers primarily responsible for responding to calls for police service) and Road Policing.

The FSED training was intended to improve trainees’ understanding of how to manage the cognitive load associated with high-risk situations. In turn, better cognitive load management was believed to improve risk assessment, decision-making and communication. The training was evidence-based insofar as it enabled trainees to practise stress management skills and tactical skills under conditions that approximated the stresses experienced in the operational environment (Robson & Manacapilli, 2014). Research suggests that such scenario-based practice is one of the most effective police training approaches in high-risk interactions with the public (Preddy, 2018). The four-day scenario-based training focused on appropriate tactical responses and de-escalation techniques in specific situations. Importantly, the trainers were current Armed Offenders Squad (AOS) officers, lending crucial operational credibility to the training content. Another important training feature was that frontline staff were collectively trained in the teams (sections) they worked in.

TRM evaluation method

The TRM evaluation included both process and impact evaluation and was undertaken by an in-house research team at NZP’s Evidence Based Policing Centre, with support from two university research teams. The high-profile nature of the TRM, and attendant government funding, meant that the evaluation was wide-ranging and employed myriad data sources. A mixed-methods evaluation approach enabled the triangulation of data to minimise the limitations of any single data source. The methods included:

  • Quantitative analysis of perceptions of safety and wellbeing collected through pre-and post-TRM waves of a national survey of frontline staff (the ‘frontline safety survey’).

  • Quantitative analysis of administrative data (including about specialist capability team deployments, police use of force, citizen complaints, assaults on police and firearms offences).

  • Quantitative analysis of survey data from trainees (training survey).

  • Thematic analysis of interviews and focus groups with staff affected by the implementation of the TRM, at months 2-3 and 5-6 of the trial, focusing on perceptions and impacts of the trial period.

  • Qualitative synthesis of observational data collected at training events.

Theory of change method

Transcription and analysis of interview and focus group audio recordings was undertaken by university researchers, with ethical approval granted by the University of Waikato (ref: FS2022-04). Survey, interview, and focus group participation was voluntary and informed consent was obtained from all participants. Interview and focus group recordings and transcripts were transferred via a secure filesharing platform and deleted from the platform once the project concluded. All university researchers signed confidentiality agreements prohibiting sharing individual participant identities with others. Information relating to participants’ identities was redacted during transcription to provide further safeguards for the participants.

Integrating the multitude of findings from the many data sources and methods into a coherent narrative was demanding, coupled with the challenge of communicating the complexity of the system change generated by the TRM. The ToC started life as working theories generated by the in-house research team to make sense of this complexity when identifying outcome and intermediate impact measures, and when interpreting results. Subsequently, the lead author abductively generated themes from the interview and focus group transcripts using her experiential knowledge of policing (Thompson, 2022). This experiential knowledge was gained from five years working in a policing agency, followed by over 15 years of being a ‘critical friend’ to police agencies in a research capacity. This knowledge related to police culture, police training and policing to support crime prevention; importantly, it did not extend to experience of tactical policing.

The analytic approach taken by the lead author was steered by the objective of surfacing theories of change about how the TRM had exerted its effects on feelings and experiences of safety. Sense-making sessions were held between the university team and police research team, whereby visualisations of the causal process were presented, to verify assertions and clarify causal processes. Observations that emerged from quantitative data analysis and other aspects of the fieldwork (i.e., were not captured in the transcripts) were triangulated with postulated chains of events.

Evaluation results

The findings of the TRM evaluation were written up in an evaluation report (New Zealand Police, 2022a), accompanied by a technical appendix with supplementary methodological detail (New Zealand Police, 2022b). Because the report was for a lay audience, the findings were not couched in ‘theory of change’ terminology. Instead, they were framed as ‘pathways to safety’, to connote the causal process from TRM activities (e.g., training) to interim impacts (e.g., improved decision-making), through to system-wide outcomes (e.g., improved safety), in an accessible way for a diverse readership. The choice of the word pathways was deliberate; it conveyed that there might exist multiple routes (mechanisms) to the desired outcomes and signalled that these may be overlapping and intersecting.

Here we present two versions of ToCs for the FSED training. The first ToC (Figure 1) is the simplified representation of the main causal pathways from the training to increased safety and feelings of safety. This version reflects how the findings were written up in the TRM evaluation report and is a distillation of Figure 2 below. For an external audience, wishing to understand the initiative in general terms, this level of abstraction was deemed appropriate. The temporal sequencing of the steps in the process, revealed by the ToC, lends itself to talking about the effects of the initiative as a story (i.e., what happened ‘next’).

Figure 1 – simple theory of change plotting the causal process for the FSED training, through interim impacts (dark grey boxes), to realise the system-wide outcomes (black boxes)

The second ToC (Figure 2) more precisely specifies how the causal chain might operate to influence the overall intended outcomes of the TRM to tell the initiative’s ‘story’. It aligns with what Cowen and Cartwright (2019) refer to as a ‘street-level’ ToC because, while not accounting for all eventualities, it explains in detail the steps from mechanism to outcomes and exposes the necessary preconditions that must be in place for each step to produce the next. This version would be relevant to police responsible for ensuring the causal mechanisms were firing as the initiative was rolled out more widely or replicated. We now elucidate the causal process, illustrated in Figure 2, and how it emerged from the voices of trainees to form a narrative about how the initiative worked.

An overarching theme that emerged from officers—both trainees and non-trainees—was that there was a general expectation that training would improve competencies. Expectancy theory, which comes from organisational psychology, hypothesises that training motivation and outcome expectancy are important intervening processes for realising training outcomes (Scaduto et al., 2008). In communicating that they expected the training to be effective, officers were exhibiting the expectancy and instrumental aspects of training motivation. These likely primed trainees to engage effortfully with the FSED training.

The training box in Figure 2 articulates some of the facilitators that were required to ensure training outcomes were maximised. These included an easy and accessible booking system, credible trainers, scenarios grounded in real-life situations requiring discretionary police decision-making, training together as a ‘section’ (team) and annual refresher training. Most of these facilitators were in place when the TRM was trialled, albeit some aspects needed to be fine-tuned after launch.

Figure 2 – complex theory of change for the FSED training, emerging from the qualitative data

The scenario-based element of the training was viewed as vital for multiple reasons. The first was, simply, that officers enjoyed engaging in what they saw as ‘proper police work’ (i.e., tactical competencies that are part of police officers’ self-image). Working through these scenarios boosted morale and, irrespective of any skills uplift, was reported to increase feelings of wellbeing, which contributed to police feeling safer. Secondly, the realism of the scenarios, combined with the fact that the trainers were current tactical operational staff, meant that trainees saw the training as credibly contributing to their self-efficacy (see Schwoerer et al., 2005 for a general exposition of this term):

“So, it seemed to be based off, like they've really said, based off their experience like trainer's experience. So, you made it feel like you could actually use, and adapt what you were learning.” Participant FG3

“Although role playing scenarios etc are very much out of my comfort zone, I recognise that running through the high-risk vehicle stop scenarios opened my mind up to how I can train my mind to think about TENR [Threat Exposure Necessity Response], safety and how to put some of my tactical training to work.” Participant N8

These reasons combined suggest that, for many trainees at least, the prospect of ‘training transfer’ (i.e., that the knowledge and skills learned in the training could be transferred to the operational environment they worked within) was tangible. The opportunity to put the training into practice assisted with consolidating skills development and further cemented appropriate assessment of tactical options.

Quantitative analysis of the training reaction surveys lent credence to this ToC pathway. Trainees overwhelmingly agreed that the training had positively influenced their understanding and competence in tactical safety skills (all relevant questions received positive responses from over 93% of participants – see Tables 8.1 and 8.2 in New Zealand Police, 2022a). Qualitative analysis of the free text field answers in the survey complemented this finding, with officers’ answers exhibiting perceptions that the FSED training would result in better decision-making in the future. Generally, frontline safety survey respondents who had received FSED training were significantly more likely to report positive feelings of confidence than officers who had not received the training. Further, officers who had received FSED training were significantly more likely to hold perceptions that safe outcomes would be achieved in high-risk scenarios that had been covered in the training, in the community, compared to officers who had not, indicating that the skills uplift embedded in the FSED training could translate into safer outcomes for both officers and the community.

Another pathway to safety was through enhanced teamworking. This pathway played out in several ways. For example, training as a team strengthened the bonds and cohesion within the team, as expressed by one participant,

“… [it] feels a lot safer and yeah, like we, we work more as a cohesively as a team, whereas yeah, we used to probably work more as individuals.” Participant FG18

Further, through doing the training together, a team could understand how to work optimally together:

“Everybody has different abilities and strengths and weaknesses. You need to understand those [to] work effectively as a team.” Participant Q9

Lastly, the training enhanced officers’ trust in the shared competencies across the team:

“I feel safer working with people all the time who have the same level of training that I do.” Participant B3

The FSED training covered how stress influences cognition and promoted a ‘slow thinking’ approach that enabled more conscious risk assessment to be undertaken by officers. This pathway is indicated in Figure 2, with the mechanism being that slowing down decision-making can result in the contemplation of different tactical options. In situations involving a high degree of discretionary decision-making, this slowing down was seen as a distinctive advantage:

“I felt like the FSED course was actually like, 'hey guys, we've done all those tick boxes. We're actually here to learn what's gonna happen if a, you know, the shit hits the fan, like how do we respond to this and, and learn about yourself?” Participant FG7

Participants also appreciated the chance to practise tactical skills that they might not have cause to use often, which helped them to habitualise the skills and develop better quality instinctive ‘quick’ decision-making:

“at least with this training, you sort of can actually like apply it regularly enough to keep on top of it. …. having this kind of stuff's good cause it sort of keeps, keeps you fresh and you know, you ready, if something like that does happen.” Participant FG2

As well as covering appropriate tactical responses and de-escalation techniques, the FSED training included firearms training. As NZP officers are not routinely armed but have access to a secured firearm in police vehicles, firearm carrying is not the norm. Before the TRM, officers obtained an annual certification in firearms but did not have opportunities to use the firearms in a frequent or realistic way outside of that. This lack of regular opportunities to assess their own competency with firearms may have contributed to feelings of (un)safety. Trainees reported that the firearms training helped them do their jobs better, which we infer might translate into increased safety for the community. As one officer explained,

“…with more training we're getting, the more confident and competent people are getting with their weapons. Just cause we're getting more and more familiar with the more training we get, and the more of that we do get the better we will be.” Participant FG28

Higher quality decision-making should, ostensibly, result in fewer officers assaulted, fewer incidents where use of force is applied and fewer citizen complaints about use of force. Analysis of administrative data comparing the trial districts’ trends with estimates of expected trends without the TRM largely supported these outcomes. Although there was no effect on total assaults on police in the evaluation period, analysis showed the proportion of assaults and the proportion of use of force events that resulted in injury to police reduced (with 90% probability). Further, the rate of use of force events reduced by around 17% (95% credible interval -5% to -28%). Further analysis revealed that this reduction was limited to reactive calls for service, lending weight to it being caused through the frontline FSED training pathways, because reactive calls for service are predominantly attended by frontline staff rather than the TRM’s specialist capability teams. Lastly, the trial districts received on average 29% fewer complaints (95% credible interval: -60% to 3%) about the use of force by police than expected without the TRM.
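For readers less familiar with how a percentage change and credible interval of this kind are derived from a comparison of observed and expected trends, the following Python sketch shows the general arithmetic. It is not the model used in the TRM evaluation, which is not described here. The function name `relative_effect_summary`, the observed count of 830, and the simulated draws of the expected count are all invented for illustration; they are assumptions, not evaluation data.

```python
import random


def relative_effect_summary(observed, expected_draws, level=0.95):
    """Summarise the relative change of an observed count against
    posterior draws of the expected (no-initiative) count."""
    # Relative effect for each draw: -0.17 means 17% fewer than expected.
    effects = sorted(observed / e - 1.0 for e in expected_draws)
    n = len(effects)

    def quantile(p):
        # Simple nearest-rank quantile on the sorted draws.
        return effects[min(n - 1, max(0, int(round(p * (n - 1)))))]

    tail = (1.0 - level) / 2.0
    return {
        "median": quantile(0.5),
        "lower": quantile(tail),
        "upper": quantile(1.0 - tail),
        # Share of draws in which the observed count is below expectation.
        "prob_reduction": sum(e < 0 for e in effects) / n,
    }


# Hypothetical inputs: simulated uncertainty about the expected count.
rng = random.Random(0)
expected_draws = [rng.gauss(1000, 60) for _ in range(5000)]
summary = relative_effect_summary(830, expected_draws)

for key, value in summary.items():
    print(f"{key}: {value:.3f}")
```

The `prob_reduction` value corresponds to statements of the form "reduced (with 90% probability)": it is the share of plausible counterfactuals under which the observed figure represents a reduction.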

In addition to causal pathways with positive outcomes, the discussion in one focus group turned to potential backfire effects. Experienced officers conjectured that the training could prime officers to expect a higher level of risk in the situations to which they were deployed:

“…you've gotta be very careful cause in some of the occasions it's made people feel more at risk and their perception that is more firearms out there and there's more dangerous jobs. I think it's had an unintended consequence of, of that perception that they are in more danger rather than safer in many cases.” Participant FG14

Hence, training using ‘worst case’ scenarios can produce and reproduce perceptions of threats to safety (Branch, 2021), and can serve to reinforce the pervasive self-image of policing as dangerous (Reiner, 2010, p. 119). Such perceptions need to be carefully managed as they can feed into cognitive biases that encourage the excessive use of force (Staller et al., 2022).

The use of force results described above suggest that any backfire effect on feelings of safety did not flow into negative behavioural outcomes that would make the community less safe. FSED-trained respondents to the frontline survey felt less safe in their duties in general than did other respondents. However, they were more likely than other respondents to feel safe during the specific scenarios they had been trained in, and respondents to the post-training survey overwhelmingly reported that their feelings of safety had improved. Moreover, both surveys provided strong quantitative evidence of improvements in trainees’ confidence and (self-assessed) competence, indicating that even if they felt less safe, they felt better equipped to manage unsafe situations, with this effect playing out in the use of force results (New Zealand Police, 2022a).

That said, in their free text answers, a small proportion of training survey respondents reported not feeling confident about applying the training in real-life situations. However, a lack of confidence is not necessarily a negative outcome: the relationship between confidence and competence is not linear. In some ways a lack of confidence increases one’s self-awareness and appraisal of one’s abilities. This may be preferable to over-confidence, particularly in high-risk situations. An over-confidence effect, whereby officers are too quick to use force, was not borne out in either the qualitative or quantitative (use of force) data.

Overall, the data analysed after the TRM had been trialled for six months—at which point not all frontline officers had completed all training days—showed that the behavioural benefits attributable to the training were largely confined to use of force scenarios, with less use of force an indicator of community safety. It is possible that the training saturation that was intended post-evaluation will contribute to the police being safer as well. The generally positive effects on officers’ feelings of safety are supported by the myriad causal processes, surfaced in the data and represented in the ToCs (Figures 1 and 2), that collectively comprise the narrative or ‘story’ of the initiative.


In this paper we illustrate how employing a theory of change (ToC) approach in an evaluation of a complex police reform initiative enabled the construction of an accessible narrative about the findings. The case study outlined the causal pathways underpinning enhanced frontline officer training, one component of the larger Tactical Response Model (TRM) initiative developed by New Zealand Police (NZP) to increase officers’ and communities’ safety and feelings of safety. The resulting ToCs depict multiple overlapping ‘pathways to safety’ through which training frontline staff in cognitive and tactical skills might achieve the desired long-term safety outcomes. This inherent complexity, typical in policing initiatives that are implemented in the messy social world, is difficult to convey without a ‘story arc’. We contend that ToCs provide a fruitful way of capitalising on the (often covert) practicality of theory in evidence-based policing.

Using a ToC approach in evaluations of policing initiatives offers significant benefits throughout the research process. A ToC systematically outlines an expected chain of events, from initial actions to desired outcomes, enabling researchers to gain a comprehensive understanding of the initiative’s underlying logic and objectives. It acts as a conceptual roadmap, crystallising the aims and purpose of a policing initiative to enable the design of rigorous evaluations grounded in a precise comprehension of what the initiative seeks to achieve, and how. Indeed, this is how the in-house police research team started planning their evaluation and the data they sought to collect.

Further, a ToC can serve as a powerful communication tool, facilitating engagement with practitioners. In the case study example here, the initial ToC helped to clarify, for the many NZP researchers working on the evaluation, what was expected to happen when the trial began. The more formal ToC derived from the evaluation findings then helped to shape the way those findings were communicated to internal and external audiences. Hence, a ToC can serve as a unifying framework, fostering a shared understanding among stakeholders, and promoting collaboration (Belur et al., 2023).

As happened in the case study illustrated herein, a ToC can also attune researchers to less satisfactory effects of an initiative and prompt the assessment of whether they are seen in the data. A thorough ToC not only outlines desired outcomes but also serves as a tool to reveal unintended consequences. It can enable researchers to anticipate and identify potential backfire effects, offering valuable insights to implementers for modifying components that do not appear to be working, rather than changing the entire initiative. Thus, it can save precious resources.

As seen in the case study, collecting implementation data, such as dosage levels (i.e., training saturation), helped to protect against disappointment with a lack of notable improvements in the officer safety outcome. The fact that other ‘impacts’ in the case study showed promise lent credence to the TRM being a viable initiative that was simply being measured prematurely. Hence, a ToC approach can be invaluable when the follow-up period of an evaluation is insufficient to evidence long-term outcomes (Belur et al., 2023). By recognising and specifying interim impacts, a ToC helps practitioners to avoid snap judgments and ‘throwing the baby out with the bathwater’ (Cowen and Cartwright, 2019). Instead, the nuanced understanding of causal chains allows practitioners to evidence initiatives with positive traction that may require additional time to fully realise their long-term goals.

Although the case study illustrates many of the ToC benefits described above, it naturally has some limitations. For example, the ToC was an ancillary project to the TRM evaluation, so the interview and focus group schedules did not explicitly ask participants about their working theories of the TRM. The data might have been richer and illustrated more nuanced pathways through the causal process had such questions been posed. Additionally, it is always possible in qualitative analysis that the data might be interpreted differently by other researchers. However, the sense-making sessions did not reveal any glaring omissions or misinterpretations among the multiple researchers working on the evaluation. Lastly, we illustrate only one way of creating a ToC: a bottom-up, data-derived one. We encourage other researchers to experiment with alternative ToC generation methods (e.g., triangulating literature with perceptions of implementers or recipients of initiatives) and to explore ‘what works’ in employing ToCs in an EBP setting.


To date, EBP has given primacy to evidence from randomised controlled trials (RCTs; Tompson and Knutsson, 2017). RCTs are good for internal validity but fall short on external validity (generalisability). They may be sufficiently informative when the initiative is being implemented in stable conditions, for example when the implementation obstacles have been identified and worked through. But they are ill-suited to pilot or complex initiatives that have a plethora of actors in a dynamic environment. If we wish to expand our understanding about what might work in different circumstances (be they policing jurisdictions, populations, or time points) we argue we need to focus on how initiatives work. Knowledge about mechanisms then promotes extrapolation, or generalising, which is at the heart of decision-making in evidence-based policing.

We argue, and have hopefully demonstrated herein, that the most profound practical value bestowed by ToC approaches is their communicative ability. That is, ToCs can organise and synthesise disparate indicators of impact (data trends) into a sequential ‘narrative form’, allowing practitioners to convey the complex causal pathways through which initiatives are expected to produce desired outcomes. In short, the ToC approach equips policing practitioners with a comprehensive and flexible tool that accommodates the dynamic nature of interventions and their evolving pathways towards success (or lack thereof).

We hope that this article will begin to socialise evidence-based policing enthusiasts to the idea that theory can be inherently practical in service of developing the evidence base. Theory can be deftly turned into ‘stories’ by the police, thereby expanding Sherman’s (2013) ‘Triple-T’ approach to evidence-based policing (targeting, testing and tracking) with an additional ‘T’: telling (Cowan and Williams, 2021). Such communication enhances opportunities for broadening and contextualising the evidence base required for modern-day policing.


Ariel, B., Farrar, W.A. & Sutherland, A. (2015). The Effect of Police Body-Worn Cameras on Use of Force and Citizens’ Complaints Against the Police: A Randomized Controlled Trial. Journal of Quantitative Criminology, 31, 509–535.

Belur, J., Tompson, L., & Jerath, K. (2023). A theory of change driven approach to evaluating a multi-agency stalking intervention programme. CrimRxiv.

Branch, M. (2021). ‘The nature of the beast:’ the precariousness of police work. Policing and Society, 31(8), 982-996.

Cowan, D. and Williams, S. (2021). Editor’s Foreword. Police Science: Australia and New Zealand Journal of Evidence Based Policing. Available at:

Cowen, N., & Cartwright, N. (2019). Street-level theories of change: Adapting the medical model of evidence-based practice for policing. In Critical reflections on evidence-based policing (pp. 52-71). Routledge.

Crank, J. (2004). Understanding police culture (2nd ed.). New York: Routledge.

Fagan, A. A. (2017). Illuminating the black box of implementation in crime prevention. Criminology & Public Policy, 16, 451-455.

Green, D. P., Ha, S. E., & Bullock, J. G. (2010). Enough already about “black box” experiments: Studying mediation is more difficult than most scholars suppose. The Annals of the American Academy of Political and Social Science, 628(1), 200-208.

Jenkins, B., Semple, T., & Bennell, C. (2021). An evidence-based approach to critical incident scenario development. Policing: An International Journal of Police Strategies and Management, 44, 437–454.

Johnson, S. D., Tilley, N., & Bowers, K. J. (2015). Introducing EMMIE: An evidence rating scale to encourage mixed-method crime prevention synthesis reviews. Journal of Experimental Criminology, 11, 459-473.

Knutsson, J., & Tompson, L. (Eds.). (2017). Advances in evidence-based policing. Abingdon: Routledge.

New Zealand Police (2022a). Tactical Response Model Evaluation Report. Available at: (accessed 23 March 2024).

New Zealand Police (2022b). Tactical Response Model: Evaluation Report Technical Appendices. Available at: (accessed 23 March 2024).

Olson, R. (2015). Houston, we have a narrative: Why science needs story. University of Chicago Press.

Pawson, R., & Tilley, N. (1994). What works in evaluation research? British Journal of Criminology, 34, 291-306.

Pawson, R., & Tilley, N. (1997). Realistic evaluation. Sage.

Perkins, C., Steinbach, R., Tompson, L., Green, J., Johnson, S., Grundy, C., ... & Edwards, P. (2015). Public views and private concerns: a rapid appraisal of the impact of reduced street lighting at night on well-being in England and Wales. In What is the effect of reduced street lighting on crime and road traffic injuries at night? A mixed-methods study. NIHR Journals Library.

Petrosino, A., Turpin‐Petrosino, C., Hollis‐Peel, M. E., & Lavenberg, J. G. (2013). 'Scared Straight' and other juvenile awareness programs for preventing juvenile delinquency. Cochrane database of systematic reviews, (4).

Preddy, J. E. (2018). Building a cognitive readiness construct for violent police-public encounters [Doctoral dissertation, Old Dominion University, Norfolk, Virginia, USA].

Reiner, R. (2010). The politics of the police. Oxford University Press, Oxford.

Robson, S., & Manacapilli, T. (2014). Enhancing performance under stress: Stress inoculation training for battlefield airmen. Prepared for the United States Air Force. Santa Monica, CA: Rand Corporation. (accessed 23 March 2024).

Scaduto, A., Lindsay, D., & Chiaburu, D. S. (2008). Leader influences on training effectiveness: motivation and outcome expectation processes. International Journal of Training and Development, 12(3), 158-170.

Schwoerer, C. E., May, D. R., Hollensbe, E. C., & Mencl, J. (2005). General and specific self‐efficacy in the context of a training intervention to enhance performance expectancy. Human Resource Development Quarterly, 16(1), 111-129.

Seals, C. (2022). Tactical response model interview and focus group results: Thematic analysis – case study 2 and crosscase analysis. Commissioned report, Evidence Based Policing Centre, New Zealand Police.

Sherman, L. W. (2013). The rise of evidence-based policing: Targeting, testing, and tracking. Crime and justice, 42(1), 377-451.

Smith, C. A., & Ellsworth, P. C. (1985). Patterns of cognitive appraisal in emotion. Journal of personality and social psychology, 48(4), 813.

Staller, M. S., Zaiser, B., & Koerner, S. (2022). The problem of entanglement: Biases and fallacies in police conflict management. International Journal of Police Science & Management, 24(2), 113-123.

Thompson, J. (2022). A Guide to Abductive Thematic Analysis. The Qualitative Report, 27(5), 1410-1421.

Tompson, G., & Knutsson, J. (2017). A realistic agenda for evidence based policing. Advances in Evidence-Based Policing. Abingdon, UK: Routledge, 214-224.

Tompson, L., Belur, J., & Giorgiou, N. (2020). Evidencing the impact of Neighbourhood Watch.

Tompson, L., Belur, J., Thornton, A., Bowers, K. J., Johnson, S. D., Sidebottom, A., Tilley, N. & Laycock, G. (2021). How strong is the evidence-base for crime reduction professionals? Justice Evaluation Journal, 4(1), 68-97.

Van Tulder, R., & Keen, N. (2018). Capturing collaborative challenges: Designing complexity-sensitive theories of change for cross-sector partnerships. Journal of Business Ethics, 150(2), 315-332.

Weisburd, D., Hinkle, J. C., Braga, A. A., & Wooditch, A. (2015). Understanding the mechanisms underlying broken windows policing: The need for evaluation evidence. Journal of research in crime and delinquency, 52(4), 589-608.

Welsh, B. C., & Farrington, D. P. (2008). Effects of improved street lighting on crime. Campbell Systematic Reviews, 4, 1–51.


The authors would like to thank research assistants from the University of Waikato for the transcription of qualitative data and staff from New Zealand Police, including but not limited to the Evidence Based Policing Centre team, Frontline Safety Improvement Programme staff and all the police staff who gave their time to share their thoughts on the initiative. We are also grateful to Anna Sutton, who provided advice on organisational psychology principles regarding training.

Disclosure statement

The authors report there are no competing interests to declare.
