# Forecasting crime trends to support police strategic decision making

Published on Jun 28, 2023

*Police leaders often need to make decisions, for example about resource allocations, that could usefully be informed by accurate forecasts of the frequency of crime over the medium-term future (weeks, months or years into the future). While several forecasting methods exist that are designed to support such strategic decision making in business, they are not widely used in policing. This study compared the accuracy of ten different forecasting methods to identify whether they could usefully support police decision making. The accuracy of each forecasting method was tested in the context of three realistic police decision-making scenarios using recorded-crime data from 12 large cities in the United States. Several forecasting methods produced accurate forecasts, with a method that combined the forecasts produced by multiple forecasting methods (ETS, linear model, seasonal naïve and STL) being the most accurate. Reasonable forecasts of monthly crime frequency can be produced with 12 months of prior data, although forecasts typically become more accurate when more prior data are available. Quantitative forecasting of crime can produce reliable information that can be used for police decision making as part of a framework of decision-making tools, including predictive policing and horizon scanning.*

**Keywords:** crime analysis, forecasting, policing, police leadership, decision making

This is the version of this article submitted to a journal for publication. It has not yet been peer reviewed. It may vary from the final published version. This version date: 27 June 2023.

For the purpose of Open Access, the author has applied a Creative Commons Attribution (CC-BY) public copyright licence to any author accepted manuscript (AAM) version arising from this submission.

Correspondence concerning this article should be addressed to Matthew P J Ashby, 35 Tavistock Square, London WC1H 9EZ, United Kingdom. E-mail: [email protected]

Police chiefs, managers and crime analysts constantly face the problem of how best to respond to demands for police services given limited resources. Evidence-based policing provides guidance on how to manage this problem by testing which police practices are likely to achieve their objectives, targeting resources when and where they are most needed, and tracking that operations are being carried out as planned (Sherman, 2013).

This paper seeks to develop the ability of police to target resources efficiently by forecasting the likely frequency of crime in police administrative units for the purposes of strategic analysis and planning. Gorr and Harries (2003) expected police to become routine users of crime forecasts (in the context of CompStat), but a decade later Ismail and Ramli (2013) noted that this had not occurred. In 2015, the police inspectorate in England and Wales warned that police forces often did not understand likely future changes in demand for police services (Her Majesty’s Inspectorate of Constabulary, 2016). When police have attempted to forecast future crime rates, this has often involved very basic methods such as simply extrapolating the linear trend of previous crime counts, even where doing so leads to nonsensical results – see Metropolitan Police Service (2022) for examples.

Based on discussions with crime analysts and the lack of recent literature on the topic, it appears that strategic forecasting in policing remains uncommon. The current paper is timely for two reasons: new forecasting techniques have been developed over the past decade that have the potential to help police understand likely future crime patterns, and recent computational advances mean these forecasts can now be quickly and routinely produced by non-specialist crime analysts as part of their day-to-day work.

Before considering existing evidence on crime forecasting, it is necessary to distinguish forecasting from related issues. The term forecasting, as used in this study, describes analysis of likely future crime trends at meso (e.g. police district) or macro (e.g. city or county) spatial scales over medium or long time scales (e.g. weeks and months). This is distinct from predictive policing, which typically seeks to ascertain in which micro spatial areas (e.g. police beats, street segments or grid cells) the greatest concentrations of crime will occur over much shorter time scales (e.g. hours or police shifts). Predictive policing may be a useful tool for deciding where officers should focus their patrols in the short term (although evidence is mixed: see Ratcliffe et al., 2020), but provides little information relevant to questions of how many officers should be assigned to each police district or which types of crime are most likely to increase over the next year. It is for longer-term questions like these that forecasting crime trends may be useful.

Several studies have attempted to forecast long-term (i.e. year-on-year and greater) trends in crime, particularly using economic and demographic statistics to forecast changes (e.g. Blumstein & Larson, 1969; L. E. Cohen & Land, 1987; Deadman, 2003; Dhiri, Brand, Harries, & Price, 1999; Fox & Piquero, 2003; Schneider, 2002; Shoesmith, 2012). However, such models have often required substantial time for development: Shepherd, See, Kongmuang, and Clarke (2004) pointed out that it took 10 years to develop the models reported by Dhiri et al. (1999), a time scale clearly inappropriate for use in policing. These attempts have also required extensive expertise in choosing and refining variables and their processing within each model (see Harries, 2003 for a detailed description). The forecasts made by such studies have often shown low accuracy, with Ormerod and Smith (2001) concluding that there is insufficient information in annual crime counts (as opposed to more-granular crime data) to make forecasting for future years possible.

Forecasts over shorter time periods (e.g. forecasts of monthly crime counts) appear to have been more successful. Gorr, Olligschlaeger, and Thompson (2003) forecast monthly counts of crimes using data from Pittsburgh and variations on an exponential smoothing method (see below) and concluded that “practically any model-based forecasting approach is vastly more accurate than current police practices”. J. Cohen and Gorr (2005) tested a similar model against a neural network using data from Pittsburgh and Rochester, NY, and found that the best model varied depending on the analytical task at hand. However, both J. Cohen and Gorr (2005) and Gorr et al. (2003) attempted to forecast monthly crime counts only a single month in advance, limiting the value of the forecasts for longer-term decision making. Flaxman (2014) found that a Gaussian process model outperformed an ARIMA model (terms which will be explained below) for forecasting weekly counts of crime in Chicago. Ismail and Ramli (2013) used a decomposition model to forecast violent and property crime in Malaysia, but did not compare their model to any other, making it difficult to assess its accuracy.

Existing studies have typically compared a single proposed forecasting method against one or two existing methods. As such, there has been little comparison of promising methods against each other. Studies have also generally assessed the value of forecasts for a single policing task. Almost all existing studies have used data from one or two cities. As J. Cohen and Gorr (2005) highlighted, it is likely that the conditions under which forecasts can be made will differ between cities, for example because crime varies more-strongly across the seasons in some cities than in others. It is therefore difficult to make recommendations about which forecasting methods police should use (if any) in any particular circumstances.

The objective of the research reported here was to investigate the potential usefulness of different forecasting methods within crime analysis and policing. A large number of forecasting methods have been proposed by researchers, but many are likely to be inappropriate for use in policing because they require access to expensive software, or require specialist knowledge to fine-tune inputs or interpret results. Police budgets are inevitably limited, meaning analysts often only have access to software if it is ubiquitous (such as Microsoft Office) or free – proprietary software is likely to only be available if it is useful for several different analytical tasks. Analysts must also be proficient in many different tasks, since they are often the only staff in a police agency who are expert in handling quantitative data. This means they must produce results quickly and reliably so they can move on to the next task. It is therefore unlikely they will have the time to develop extensive expertise in, for example, manually selecting tuning parameters for forecasting methods.

This study therefore focused on what might be termed semi-automatic forecasting, using methods that can reliably produce results without the need for extensive model development. The need for automated forecasts is common in business applications (Hyndman, Koehler, Snyder, & Grose, 2002), but until now does not appear to have been considered in the literature on forecasting crime. The resulting forecasts may be less accurate than those that could be produced by a team of statisticians with a large budget and distant deadline. However, it is important that applied research reflects the constraints commonly faced by practitioners.

Within this context, four research questions were investigated:

Can semi-automatic methods be used to forecast the future frequency of crime more accurately than simply assuming crime will continue to occur at the present frequency?

If so, which methods should police use to forecast crime?

How far back must historical data be available to provide sufficient data on which to base crime forecasts?

How far into the future can these forecasts reasonably extend before they become unreasonably inaccurate?

To evaluate these questions in the context of everyday policing, available forecasting methods were compared in three realistic but hypothetical policing scenarios.

In *Scenario 1*, a police chief asks a crime analyst to estimate how many crimes are likely to take place within their jurisdiction each month over the next three years. The chief may wish to use this forecast to understand how crime trends might influence crime-related demand for police services, to lobby for greater or different resources, or to establish a baseline against which the effectiveness of organisational changes can be understood.

In *Scenario 2*, the department in charge of planning overtime asks the analyst to estimate how many crimes the agency is likely to deal with on each of the following 90 days. They may wish to use this information to give officers notice of when they may be required to work overtime, especially in agencies where overtime is paid at a lower rate if officers are given greater notice. Depending on the relevant rules, it may also be possible to move officers between shifts to cover expected peaks in crime without incurring overtime payments.

In *Scenario 3*, the commander responsible for detectives asks the analyst to forecast the number of aggravated assaults expected to occur in a police district each month for the next year. The commander may want to use this to move detectives between units responsible for investigating different crime types, for example if the frequency of one crime type was forecast to decrease while that of another was expected to increase.

These scenarios are illustrative and the details of each will not be identical to practice in different police agencies, but this should not detract from their general applicability in evaluating alternative forecasting models.

All three scenarios relate to the forecasting of crime, even though responding to and investigating crime forms only a part (even a minority) of the challenges faced by police. Nevertheless, crime is a substantial source of demand for policing and police are the primary agency responsible for dealing with crime and many of its immediate consequences. The forecasting methods tested here could well be used for other types of police demand, for example emergency and non-emergency calls for service, but for reasons of space non-crime forecasts are not investigated here.

In each of these scenarios there are potential benefits to be gained from accurate forecasts, but also negative consequences of inaccurate forecasting. For example, in Scenario 1, a police chief’s credibility with politicians or the media may suffer if they lobby for more resources based on an expected increase in crime only to see crime decrease. If the forecasts produced in Scenario 2 are wrong, there could be insufficient officers on duty on days where crime was forecast to be low but was actually high, while in the opposite circumstance resources would be wasted by having more officers than were needed. It is these negative consequences, as well as the potential benefits, that make accurate forecasting important.

Ten forecasting methods were compared in the present study. Methods were selected only if a non-specialist crime analyst could apply them using free or commonly used software. This excluded some models, such as the Gaussian process method outlined by Flaxman (2014), which are only available in specialist proprietary software or require extensive mathematical knowledge.

All forecasting methods estimate future crime frequency based on patterns extracted from previous crime counts. These patterns typically include a long-term *trend* (crime might be generally increasing over time), one or more *seasonal* patterns (crime might be more frequent in summer) and some short-term irregular (or *residual*) variation. What varies between methods is how these patterns are identified, which patterns are used in the forecast, and how the different patterns are combined. Some forecast methods also use additional variables that are believed to be correlated with changes in crime. These *exogenous* variables might include, for example, the number of public holidays in a month or the occurrence of major sporting events.

The ten forecasting methods used in this study were as follows. These descriptions are purposefully brief, with citations provided for readers to obtain more details.

*Naïve*. Forecasts the next value will be equal to the most-recent known value. For example, that the number of crimes next month will equal the count this month (in modelling terms, a lag variable). Does not incorporate trend, seasonal variation or exogenous predictors.

*Seasonal naïve*. Referred to by J. Cohen and Gorr (2005) as the CompStat method. Forecasts the next value will be equal to the most-recent value at the same seasonal period. For example, that the number of crimes next March will be equal to the count this March. Does not incorporate trend or exogenous predictors.

*Decomposition* (STL). Separates data into multiple components (long-term trend, seasonal fluctuations and remainder variation), analyses each separately and combines results to produce an overall forecast using seasonal decomposition by LOESS (Cleveland, Cleveland, McRae, & Terpenning, 1990). Does not incorporate a lagged count or exogenous predictors.

*Exponential smoothing* (ETS). Trend, seasonal and remainder variation are modelled as a weighted average of previous values, with recent values weighted more highly (Hyndman et al., 2002). Weights were selected automatically following Hyndman and Khandakar (2007). Does not incorporate a lagged count or exogenous predictors.

*Linear model*. Similar to the ordinary least-squares model used elsewhere in statistics, but with variables to represent trend and seasonal variation. Incorporates exogenous predictors but not a lagged count. Note that this method is not simply a linear extrapolation of the previous trend in crime (as is possible in Excel, for example), which would not incorporate seasonal variation or exogenous predictors.

*Auto-regressive integrated moving average* (ARIMA). Forecasts using previous values, possibly smoothed using a moving average of those values (Box & Jenkins, 1970). The number of previous values to include in the model and in calculating the moving average were selected automatically following Hyndman and Khandakar (2007). Incorporates lag, trend, seasonal variation and exogenous predictors.

*Neural network*. Machine learning algorithm that uses combinations of lagged crime counts and values of exogenous variables that in the past have been associated with different frequencies of crime to predict future values (Hyndman & Athanasopoulos, 2019). Incorporates lag, trend, seasonal variation and exogenous predictors.

*Prophet*. Developed by Facebook to forecast different time series in its business, based on general additive regression models (Taylor & Letham, 2017). Incorporates trend, seasonal variation and exogenous predictors, but not a lagged count.

*Forecasting with additive switching of seasonality, trend and exogenous regressors* (FASSTER). Developed to handle time series following multiple patterns at different points (e.g. weekdays and weekends). While this switching ability requires manual selection and so was not used in the present study, FASSTER models were included because they can flexibly handle other common time series (O’Hara-Wild, 2019). Incorporates lag, trend, seasonal variation and exogenous predictors.

*Combined* (also known as the ensemble method). Forecasts for each period are the arithmetic mean of the separate forecasts produced by the ETS, linear model, seasonal naïve and STL methods. Incorporates lag, trend, seasonal variation and exogenous predictors.
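The simplest of these methods can be illustrated in a few lines of base R. The sketch below is not the code used in the study: the crime counts are simulated, all object names are hypothetical, and for brevity the averaged forecast uses only two of the four components of the combined method (seasonal naïve and linear model).

```r
# Simulated monthly crime counts for three years (hypothetical data)
set.seed(1)
month_index <- 1:36
month_of_year <- factor(rep(1:12, times = 3), levels = 1:12)
counts <- round(
  1000 - 2 * month_index + 50 * sin(2 * pi * month_index / 12) + rnorm(36, sd = 10)
)

h <- 12  # forecast horizon in months

# Naive: every future month equals the most recent observed month
naive_fc <- rep(counts[length(counts)], h)

# Seasonal naive: each future month equals the same month one year earlier
snaive_fc <- counts[(length(counts) - 11):length(counts)]

# Linear model with a trend term and month-of-year dummy variables
fit <- lm(counts ~ month_index + month_of_year)
new_data <- data.frame(
  month_index = 37:48,
  month_of_year = factor(1:12, levels = 1:12)
)
lm_fc <- unname(predict(fit, newdata = new_data))

# Averaged forecast: arithmetic mean of the component forecasts for each month
combined_fc <- (snaive_fc + lm_fc) / 2
```

Averaging in this way means that an error made by one component method in a given month is diluted by the others, which is the intuition behind the combined method.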

For a more-detailed discussion of available time-series models, see Hyndman and Athanasopoulos (2019).

Throughout this paper, forecasting methods will be compared to the naïve method. This method simply assumes that crime will continue to occur at the same rate at which it occurred in the most recent period for which data are available. For example, if 43 crimes had occurred this month, the naïve method would forecast that there would be 43 crimes next month and every month into the future. Such a forecast will obviously be inaccurate on many occasions, since it ignores both any trends in crime and any seasonal variations. Nevertheless, the naïve method is used as the standard benchmark against which to compare forecasting methods (Hyndman & Athanasopoulos, 2019). The naïve method can also be used as a benchmark because naïve forecasts are (by forecasting that crime will continue as it is now) likely to lead to decisions to keep the *status quo* for funding, squad allocations, shift patterns, etc. These are likely to be the same decisions made by agencies that simply do not forecast crime at all. As such, the difference in forecast accuracy between the naïve method and any other forecasting method can be seen as representing the knowledge gained by using that other method relative to not making any forecast at all.

The same exogenous variables were used for all the models capable of handling them. Variables were selected based on whether or not they would be easily available to crime analysts in one of the scenarios outlined above, and whether there were reasons to think they would influence the forecasts. This excluded variables that change so slowly or are measured so infrequently (such as residential population) that they would be largely or completely constant across forecast time periods. Also excluded were variables that cannot be predicted accurately, such as daily rainfall more than a few days in advance.

For Scenarios 1 and 3, which required monthly forecasts, two exogenous variables were included: the number of weekdays and the number of federal public holidays in each month. These variables can easily be calculated from reference almanacs or date-manipulation functions within common software packages. They were included because the frequency of crime commonly varies between weekdays and weekends, and on public holidays.
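As an illustration of how the first of these variables might be derived with date-manipulation functions, the following base-R sketch counts the Monday-to-Friday days in each month. The function name `count_weekdays` is hypothetical, and counting federal public holidays would additionally require a list of holiday dates, which is not shown here.

```r
# Count Monday-to-Friday days in each calendar month (base R only)
count_weekdays <- function(month_starts) {
  vapply(seq_along(month_starts), function(i) {
    first_day <- month_starts[i]
    # First day of the following month; subtracting one day gives the month end
    next_month <- seq(first_day, by = "month", length.out = 2)[2]
    days <- seq(first_day, next_month - 1, by = "day")
    # "%u" gives the ISO day of the week, from "1" (Monday) to "7" (Sunday)
    sum(as.integer(format(days, "%u")) <= 5)
  }, integer(1))
}

# First day of each month in 2013
month_starts <- seq(as.Date("2013-01-01"), by = "month", length.out = 12)
weekdays_per_month <- count_weekdays(month_starts)
```

The resulting vector can be joined to the monthly crime counts as an additional column before a model is fitted.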

Scenario 2 produced daily forecasts so more exogenous variables were expected to influence the forecast counts. Whether or not a date was a federal public holiday, Black Friday or Halloween were all included as separate variables. Also included were whether or not a date was the first or last day of a month, since some police agencies record crimes for which the exact offence date is unknown as occurring on those dates. Separate boolean variables indicated whether each date corresponded to a home game in a major professional sports league (Major League Baseball, Major League Soccer, National Basketball Association, National Football League or National Hockey League), a game in the NCAA Division I Football Bowl Subdivision of college football, or a NASCAR or Formula 1 auto racing event. These variables were included because major events may be associated with variations in crime (Piquero, Piquero, & Riddell, 2019).
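Some of these daily indicators can be computed directly from the calendar. The base-R sketch below is one possible way to do so, using a hypothetical function name and column names; the sports and public-holiday variables depend on external schedules and are not covered.

```r
# Build a few daily indicator variables for a vector of dates (base R only)
daily_indicators <- function(dates) {
  day <- as.integer(format(dates, "%d"))
  month <- as.integer(format(dates, "%m"))
  year <- as.integer(format(dates, "%Y"))

  # Thanksgiving is the fourth Thursday in November; Black Friday follows it
  nov_first <- as.Date(sprintf("%d-11-01", year))
  nov_first_dow <- as.integer(format(nov_first, "%u"))  # 1 = Monday, 4 = Thursday
  first_thursday <- nov_first + (4 - nov_first_dow) %% 7
  black_friday <- first_thursday + 21 + 1

  data.frame(
    date = dates,
    first_of_month = day == 1,
    # A date is the last of its month if the next day is the first of a month
    last_of_month = format(dates + 1, "%d") == "01",
    halloween = month == 10 & day == 31,
    black_friday = dates == black_friday
  )
}

example_dates <- as.Date(c("2013-10-31", "2013-11-01", "2013-11-29", "2013-11-30"))
indicators <- daily_indicators(example_dates)
```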

These are relatively simple models – it is likely that other exogenous variables could have been used to forecast crime in each scenario. However, each additional variable requires the analyst to take more time collecting and processing data, especially for data which must be collected manually or updated frequently. Analysts should therefore balance the potential predictive value of including more exogenous variables against the additional time and effort required. Similarly, these models could potentially be improved by transforming the data using a Box-Cox or log transformation, but this too would require manual calculation.

Almost all existing studies on crime forecasting have used data from a single city, limiting the generalisability of the results. The present study instead used data from 12 of the largest 50 cities in the United States: Austin, Chicago, Detroit, Kansas City, Los Angeles, Louisville, Memphis, New York, San Francisco, Seattle, St Louis and Tucson. Crime data for these cities were obtained from the Crime Open Database for the calendar years 2010 to 2019. This database contains harmonised police-recorded crime data for large US cities, derived from the open data released by individual cities (see Ashby, 2019 for details of the data).

These cities represent different regions of the United States, different climates and different urban configurations. Figure 1 shows monthly crime counts along with three summary statistics showing the differences in temporal patterns of crime in each city. Mean monthly counts varied from 3,500 in Louisville to 39,900 in New York. The extent to which there was a trend in crime frequency over time also varied. This can be seen using the trend strength (TS) statistic included in Figure 1, for which values closer to 1 indicate a stronger long-term trend (Wang, Smith, & Hyndman, 2006). The downward trend in crime in Detroit is much greater than that in Tucson, while Los Angeles experienced an opposite trend. Similarly, the seasonal strength (SS) statistic – which has the same interpretation – shows that crime in Chicago is much more seasonally concentrated than crime in San Francisco. The different temporal patterns across cities mean that each forecasting method can be evaluated in several different contexts, providing greater confidence in the generalisability of the results.

*Figure 1: Trends in monthly crime counts in the study cities.*

Many software programs can produce forecasts from time-series data, although not all can use every method investigated here. The present study used the R *fable* forecasting framework (O’Hara-Wild, Hyndman, & Wang, 2020). Forecasting was done in R because it is free (an important consideration since police budgets are always stretched), useful for many other crime-analysis tasks, already used by some (although by no means most) analysts, and supported by extensive free online training and guidance for analysts wishing to develop their skills. Forecasting in R does require some coding, but forecasts can typically be produced with only a few lines of code. For example, the following code is all that is required to produce the plot of monthly crime forecasts shown in Figure 2 from a text file containing previous crime counts:

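```
library(fable)
library(tsibble)
library(tidyverse)

# Read monthly counts and convert them to a time-series tibble
crime_data <- read_csv("monthly_crime_counts.csv") |>
  mutate(month = yearmonth(month)) |>
  as_tsibble(index = month)

# Fit an ARIMA model with trend and seasonal terms, then forecast three years ahead
crime_model <- model(crime_data, ARIMA(crime_count ~ trend() + season()))
crime_forecast <- forecast(crime_model, h = "3 years")

# Plot the forecast together with the observed counts
autoplot(crime_forecast, crime_data)
```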

A more-detailed tutorial on how to produce forecasts of crime using the fable framework in R is included in the online supplementary material accompanying this article.

*Figure 2: Example ARIMA forecast of monthly crime counts.*

Several R packages are available for forecasting time-series data. The *fable* framework was chosen because it can produce forecasts with the parameters selected automatically. This is done by automatically producing multiple forecasts based on different parameters and choosing the parameters that produce the smallest forecast error, i.e. the smallest difference between the actual values in the time series and the values predicted by the forecast (Hyndman & Khandakar, 2007). This semi-automatic forecasting means analysts are not required to develop expertise in selecting parameters or spend time manually identifying the optimal parameter values.
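The general idea of selecting parameters by minimising forecast error can be sketched with a deliberately simplified example. The code below applies simple exponential smoothing to hypothetical counts and chooses the smoothing weight from a grid of candidates by the smallest sum of squared one-step-ahead errors. The function name `ses_sse` and the data are assumptions made for illustration, and the automatic procedures in *fable* are more elaborate than this grid search.

```r
# Sum of squared one-step-ahead errors for simple exponential smoothing
ses_sse <- function(alpha, y) {
  level <- y[1]
  sse <- 0
  for (t in 2:length(y)) {
    error <- y[t] - level           # the forecast for period t is the current level
    sse <- sse + error^2
    level <- level + alpha * error  # move the level towards the new observation
  }
  sse
}

# Hypothetical monthly crime counts
y <- c(410, 425, 430, 418, 440, 452, 447, 460, 455, 470, 468, 481)

# Try a grid of candidate smoothing weights and keep the one with smallest error
alphas <- seq(0.05, 0.95, by = 0.05)
errors <- vapply(alphas, ses_sse, numeric(1), y = y)
best_alpha <- alphas[which.min(errors)]
```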

This study used an out-of-sample rolling-window design to test the forecasts produced by each method. The key test of the value of a forecast is how accurately the forecast compares to counts of crime that were *not* used in developing the forecasts, since testing only against existing data tends to under-estimate forecast errors (Tashman, 2000). Out-of-sample testing requires a portion of the data to be held back (i.e. not used to develop the forecasts) so it can be used to test the forecasts. In testing all three scenarios outlined above, three years of data were used to develop each forecast, since it is likely that most police agencies will have access to at least three years of previous crime data. To test Scenario 1 (which produced monthly forecasts for 36 months ahead), three years of data were held back. To test Scenario 2 (which produced daily forecasts for 90 days ahead), 90 days of data were held back and to test Scenario 3 (which made monthly forecasts for 12 months ahead), one year of data were held back.

A variation on the rolling-window design outlined by Armstrong and Grohman (1972) was used to generate forecasts for each city using each method. As Figure 3 illustrates, separate forecasts were produced using three years of training data beginning in each successive month. So (for Scenario 1) the first forecast was derived from three years of data beginning in January 2010, the second forecast used data beginning in February 2010 and so on. Each forecast was then tested against the data beginning with the first period that was *not* included in the data used to generate the forecast. This meant the accuracy of the forecasts produced in each scenario could be tested many times with different data, producing more generalisable conclusions.

Each scenario was tested 48 times in each city:

For Scenario 1, forecasts were produced for 36-month periods beginning with each month from January 2013 to December 2016, making 20,736 individual forecasts in total ($36\ \text{months of forecasts} \times 12\ \text{cities} \times 48\ \text{repetitions}$) for each forecasting method.

For Scenario 2, forecasts were produced for 90-day periods beginning on one randomly chosen day of each month from January 2013 to December 2016, making 51,840 individual forecasts in total ($90\ \text{days of forecasts} \times 12\ \text{cities} \times 48\ \text{repetitions}$) for each method.

For Scenario 3, forecasts were produced for 12-month periods beginning with each month from January 2013 to December 2016, making 6,912 individual forecasts in total ($12\ \text{months of forecasts} \times 12\ \text{cities} \times 48\ \text{repetitions}$) for each method. Since Scenario 3 attempted to forecast crime at the police-district level, one district was chosen at random in each city and forecasts generated for that district. The only exception to this was Kansas City, for which district boundaries were not available. To compensate, two districts were selected at random from New York City (the city with the largest population).
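The rolling-window design for Scenario 1 can be sketched in base R as follows. This is an illustrative reconstruction from the description above rather than the study's own code, and the object names are hypothetical.

```r
# All months covered by the data: January 2010 to December 2019
all_months <- seq(as.Date("2010-01-01"), as.Date("2019-12-01"), by = "month")

train_length <- 36  # three years of training data
horizon <- 36       # Scenario 1 forecasts 36 months ahead
repetitions <- 48

windows <- lapply(seq_len(repetitions), function(i) {
  train_idx <- i:(i + train_length - 1)
  # Test data start with the first month not used for training
  test_idx <- (i + train_length):(i + train_length + horizon - 1)
  list(train = all_months[train_idx], test = all_months[test_idx])
})

# Number of individual forecasts per method across 12 cities
n_forecasts <- horizon * 12 * repetitions
```

Under these assumptions the first test window begins in January 2013 and the last in December 2016, and the final test window ends in November 2019, so every window fits within the ten years of available data.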

*Figure 3: Data used to produce and test each forecast using a rolling-window design (only the first 12 repetitions shown).*

The forecasts produced by each model were evaluated based on the weighted absolute percentage error (WAPE) statistic. This measure of forecast accuracy can be used to compare forecasts produced using datasets of different size, which makes it suitable for comparing forecast errors across different cities of different sizes. The WAPE statistic is also not vulnerable to some of the issues that affect other measures of forecast error such as the mean absolute percentage error (Hewamalage, Ackermann, & Bergmeir, 2022). WAPE is the sum of the absolute differences between the actual number of crimes and the forecast number of crimes for each period (day, month, etc.), divided by the sum of the actual total number of crimes. Since WAPE is a measure of error, smaller values indicate a more-accurate forecast. One limitation of using WAPE as a measure of forecast accuracy is that it treats under- and over-estimates of crime as being equivalent. This may not always be the case in every analytical context (Berk, 2011), but whether under- or over-estimates of future crime are more concerning will be highly context dependent.
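Writing $y_t$ for the actual count and $\hat{y}_t$ for the forecast count in period $t$ (notation introduced here for illustration), the definition above corresponds to $\text{WAPE} = \sum_t |y_t - \hat{y}_t| / \sum_t y_t$. A minimal base-R sketch, using a hypothetical function name and made-up counts, is:

```r
# Weighted absolute percentage error: total absolute error over total actual count
wape <- function(actual, forecast) {
  sum(abs(actual - forecast)) / sum(actual)
}

# Made-up counts for three periods
actual <- c(100, 120, 80)
forecast <- c(110, 100, 90)

# Absolute errors are 10, 20 and 10, so WAPE is 40 / 300
small_city <- wape(actual, forecast)

# Multiplying all counts by ten leaves WAPE unchanged
large_city <- wape(actual * 10, forecast * 10)
```

Because the statistic is a ratio of totals, it does not depend on the overall scale of the counts, which is why it can be compared across cities of different sizes.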

Figure 4 shows an example of the forecasts (dotted lines) produced for crime in Austin, Texas, for 36 months from January 2013, compared to the actual number of crimes that occurred (solid lines). With the exception of the naïve method (which assumes that crime will continue to occur at the same frequency as in the month the forecasts were made), all the forecasts reflect some of the temporal patterns in the data, such as the seasonal variation in crime throughout the year. However, not all of the forecasts were equally accurate: in this example, the time-series linear model produced forecasts that were much more accurate than, for example, the seasonal naïve model. This was because there was a downward trend in crime in Austin in this period that the seasonal naïve method (which simply forecasts that crime in a month this year will occur at the same frequency as in the same month last year) could not detect.

*Figure 4: Example of forecasts produced for Austin, TX, for 36 months from January 2013, compared to the actual number of crimes that occurred.*

*Figure 5: Differences between monthly forecasts and actual crime counts for each model using data for Austin for 48 separate periods of 36 months.*

While Figure 4 shows examples of forecasts for one city for one point in time, this is not sufficient to evaluate the general accuracy of the different forecasting methods, since the performance of methods may vary depending on the patterns in the data. Figure 5 shows the absolute errors (equivalent to the height of the shaded area in Figure 4) for each of the 17,280 individual monthly forecasts of crime in Austin made across the ten methods for the 48 separate time periods included in the rolling-window study design. In contrast to the order of methods in Figure 4, Figure 5 shows that *on average* the most-accurate methods for forecasting crime in Austin in Scenario 1 were the combined, linear-model and ARIMA methods. In particular, the FASSTER method that was the second most-accurate method in Figure 4 was on average among the least accurate methods in Figure 5, with some forecasting errors being very large.

*Figure 6: Weighted absolute percentage error (WAPE) for forecasting method in each city for Scenario 1.*

To fully understand how accurate a forecasting method is likely to be in future, it is necessary to test how accurate it has been not only for many different periods but also in several different cities. Figure 6 shows the weighted absolute percentage error (WAPE) of each forecasting method for Scenario 1 in each city. The forecasting method with the lowest WAPE in each city is highlighted with a black border and bold text, while methods with a WAPE within one percentage point of the method with the lowest WAPE are highlighted with a striped background and italic text.
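The WAPE calculation and the "within one percentage point of the best method" comparison used in Figure 6 can be sketched as follows. This assumes the usual definition of WAPE (the sum of absolute errors divided by the sum of actual values, expressed as a percentage); the function names, method names and numbers are made up for illustration and are not results from the study.

```python
import numpy as np

def wape(actual, forecast):
    """Weighted absolute percentage error, as a percentage."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return 100.0 * np.abs(actual - forecast).sum() / np.abs(actual).sum()

def near_best(wape_by_method, tolerance=1.0):
    """Flag methods whose WAPE is within `tolerance` percentage points of the best."""
    best = min(wape_by_method.values())
    return {name: (value - best) <= tolerance
            for name, value in wape_by_method.items()}

# Hypothetical actual counts and forecasts for three periods in one city.
actual = [100, 120, 80]
forecasts = {
    "method_a": [110, 115, 85],
    "method_b": [90, 130, 70],
    "method_c": [105, 112, 88],
}

scores = {name: wape(actual, values) for name, values in forecasts.items()}
flags = near_best(scores)
best_method = min(scores, key=scores.get)
```

Because WAPE divides total error by total crime, a city with many crimes and a city with few can be compared on the same percentage scale.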

Multi-level gamma regression models were fit to the absolute forecast errors for each of the three scenarios. Each of these models had two levels, with fixed effects for each forecasting method and random intercepts for each city. These models were used to determine whether the forecast errors produced by each method were significantly different from the errors produced by the naïve method applied to the same data. This makes it possible to identify whether each method is likely to be more accurate than simply assuming crime is likely to continue to occur at the same rate as now. Since the large number of forecasting repetitions mean that even small differences in accuracy between methods are likely to be statistically significant, and because statistical significance was of secondary importance compared to the absolute differences in accuracy between forecasting methods, full results for the regression model for each scenario are shown in the Appendix.

Figure 6 shows that for Scenario 1 the combined method had the lowest overall WAPE across all cities, and that it had the lowest city-level WAPE in four out of 12 cities (no other method was the best method in more than two cities). The WAPE of the combined method was within one percentage point of the lowest city-level WAPE in nine cities and within two percentage points of the lowest city-level WAPE in all 12 cities. The combined method was also the only method that had WAPE values of below 10% in all the cities in the study. Conversely, the prophet and FASSTER methods had WAPE values across almost all cities that were notably worse than the best method in each case – the overall WAPE for the FASSTER method (20.3%) was almost three times as high as the overall WAPE for the combined method (6.6%). The FASSTER and prophet methods were also the only two methods to have significantly higher errors than the naïve method applied to the same data.

*Figure 7: Weighted absolute percentage error (WAPE) for forecasting method in each city for Scenario 2.*

Figure 7 shows the WAPE of each forecasting method for Scenario 2 in each city. A comparison with Figure 6 shows that the relative performance of the different forecasting methods varied substantially across the two scenarios. For Scenario 2, the prophet method had the lowest WAPE overall and for eight individual cities, whereas in Scenario 1 the prophet method had amongst the largest errors. For Scenario 2, the combined method had a WAPE within one percentage point of the lowest city-level WAPE for nine of the 12 cities and within two percentage points for all cities. All the methods produced forecasts with significantly lower errors than the naïve method except for FASSTER, which (as in Scenario 1) produced forecasts with significantly higher errors than the naïve method.

*Figure 8: Weighted absolute percentage error (WAPE) for forecasting method in each city for Scenario 3.*

Figure 8 shows the WAPE of each forecasting method for Scenario 3 in each city. Of note is that the forecast errors for Scenario 3 are substantially larger than those for Scenario 1 (which also used monthly data), likely as a result of the much lower crime counts being forecast in Scenario 3. For example, in Scenario 1 the combined method had an overall WAPE of 6.6%, while in Scenario 3 the same method had an overall WAPE of 20.4%. Also of note is that none of the methods were able to effectively forecast the frequency of aggravated assaults in the selected police district in Louisville – even the best method produced forecasts with a WAPE of 62.2%, which is clearly too large for the resulting forecasts to be considered useful.

As for Scenario 1, the combined method had the lowest overall WAPE, although the ETS method had a marginally lower WAPE in five of the 12 cities. The combined method had a WAPE within one percentage point of the best model in eight of 12 cities and within two percentage points of the best model in all but one city. As in Scenario 1, the prophet and FASSTER methods produced forecasts with errors that were significantly larger than for the naïve method – all other methods produced forecasts with significantly lower errors than the naïve method.

Looking at the results across all three scenarios, several conclusions can be drawn. First, it is clearly the case that it is possible to accurately forecast the frequency of crime in different real-world scenarios. Figure 4 illustrates that different forecasting methods can identify the complex patterns in crime data and use them to produce forecasts; Figure 6 shows that the typical errors produced by a 12-month forecast made using the combined method can be as low as 3.8%.

Secondly, not all of the forecasting methods tested here appear to be equally useful for crime data. This is important because it demonstrates the potential pitfalls of someone who is analysing crime data choosing a method because (for example) they have found it useful for other types of data or because it is the method *du jour*. While the FASSTER model had the highest forecasting errors in all three scenarios, perhaps more problematic are methods such as prophet, which performed well in Scenario 2 but poorly in Scenarios 1 and 3. Some analysts may be in a position to devote time to testing different methods for a specific forecasting purpose, in which case they will be able to identify the specific method that is most appropriate. More often, crime analysts will need to create forecasts quickly, and so will benefit from being able to use a method that is known to work for forecasting crime data generally, even if this is at the cost of using a model that is slightly less accurate. Fortunately, the differences in accuracy between several of the methods were relatively small: for example, while the prophet method was most accurate for Scenario 2, the WAPE value for that method was only 0.6 percentage points lower than for the combined method (which had the lowest errors in Scenarios 1 and 3).

Finally, it is notable that the ordering of the accuracy of the forecasting methods was generally fairly consistent across cities. This provides some confidence in applying the ranking of different methods (and in particular the preference for the combined method) produced using data from these 12 cities to other cities not included in the present study (at least in situations where analysts do not have the time to test the accuracy of particular methods using their own data).

The combined method presented in the results above uses forecasts that are the arithmetic mean of the forecasts produced by the ETS, linear-model, seasonal-naïve and STL methods. However, a combined forecast can be produced from any combination of the forecasts produced by other methods. The specific combination used here was chosen by comparing the accuracy of forecasts produced by every possible combination of between two and five of the other methods used in this study. The 371 possible combinations of methods were compared by calculating the mean of the WAPE of all the forecasts produced by each combination for each of the three scenarios. The combination of methods that produced the lowest overall WAPE was the combination of the ETS, neural-network, seasonal-naïve, STL and linear-model methods. However, seven other combinations of forecasts produced an overall WAPE that was within 0.1% of the lowest overall WAPE, meaning the forecasts produced by each of those combinations were almost identical overall. Since neural networks are much slower to run than any of the other forecasting methods, the combined method used here was the combination with the lowest WAPE that did not include the neural-network method.
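The selection procedure described above (average the forecasts of every combination of two to five methods, then rank the combinations by WAPE) can be sketched as below. This is an illustrative reconstruction rather than the study's code: the function names are assumptions, the forecasts are invented numbers for four hypothetical methods, and WAPE is assumed to be the sum of absolute errors divided by the sum of actual values.

```python
from itertools import combinations
import numpy as np

def wape(actual, forecast):
    """Weighted absolute percentage error, as a percentage."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return 100.0 * np.abs(actual - forecast).sum() / np.abs(actual).sum()

def rank_combinations(actual, forecasts, min_size=2, max_size=5):
    """Score the mean forecast of every combination of min_size..max_size methods."""
    names = sorted(forecasts)
    scored = []
    for size in range(min_size, min(max_size, len(names)) + 1):
        for combo in combinations(names, size):
            # The combined forecast is the arithmetic mean for each period.
            combined = np.mean([forecasts[name] for name in combo], axis=0)
            scored.append((wape(actual, combined), combo))
    return sorted(scored)  # lowest WAPE first

# Invented data: two methods over-forecast and two under-forecast.
actual = [100, 110, 90, 120]
forecasts = {
    "ets": [104, 114, 94, 124],
    "linear": [98, 108, 88, 118],
    "snaive": [110, 120, 100, 130],
    "stl": [96, 106, 86, 116],
}

ranked = rank_combinations(actual, forecasts)
best_wape, best_combo = ranked[0]
```

In this toy example the errors of two methods cancel exactly when averaged, which illustrates (in an exaggerated way) why a combined forecast can be more accurate than any of its components.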

The results presented above make it clear that it is possible to produce accurate statistical forecasts of crime frequencies. These results were based on 36 months of training data for each forecast for Scenarios 1 and 3, and 90 days of training data for each forecast for Scenario 2. This leaves open the question of how many periods (i.e. days or months, in these scenarios) of training data are needed to produce accurate forecasts.

To answer this question, forecasts were made using each method for between 12 and 60 months of training data for Scenarios 1 and 3, and between seven and 365 days of training data for Scenario 2 (the FASSTER and prophet methods were excluded from this procedure because the results above show that they were worse than a naïve model for one or more scenarios). WAPE values were then calculated separately for each number of periods of training data (i.e. WAPE was calculated separately for all forecasts based on one period of training data, for two periods of training data, etc.).

*Figure 9: Differences in weighted absolute percentage error (WAPE) for forecasts based on different periods of training data.*

Figure 9 shows the WAPE values for each forecasting method for each scenario for each number of periods of training data. The dashed line on each panel in this figure shows the WAPE value for a naïve model produced using the same number of periods of training data. The horizontal axis on the panels of this figure showing forecasts for Scenarios 1 and 3 begins with 12 months of data so that the forecasting methods were trained on data that included at least one observation from each of the 12 months of the year. Similarly, the horizontal axis on the panel showing forecasts for Scenario 2 begins at 14 days so that the forecasts could be based on data including every day of the week.

Across all three scenarios, Figure 9 shows that most forecasting methods are more accurate than the naïve method even when only a few periods of training data are available, although the ARIMA and linear-model methods require more training data than the other methods. This demonstrates the value of making recommendations on minimum standards for the number of periods of training data that are needed to produce accurate forecasts.

For Scenario 1, all of the forecasting methods had lower WAPE values than a naïve model once 30 months of training data were used, while the combined, ETS, neural-network, seasonal-naïve and STL methods were better than a naïve model even with only 13 months of training data. Once more than 36 months of training data were used, the WAPE values for most forecasting methods stabilised, suggesting that there is little benefit to using more than three years of training data (although there does not appear to be any harm from including more months of data, either). The results for Scenario 3 were similar.

For Scenario 2, all the forecasting methods had lower WAPE values than a naïve method once more than 56 days of data were used, while the combined, ETS, neural-network, seasonal-naïve and STL models were better than the naïve method with only 21 days of data. The WAPE values for most methods stabilised once about 90 days of data were used, although some models continued to improve in accuracy the more data were provided.

The final research question in this paper asked how far into the future it is possible to forecast the frequency of crime. Axiomatically, the accuracy of forecasts will go down as the forecasts extend further into the future, for two reasons. First, it is inevitable that any forecasting method will not completely capture all of the patterns underlying the data. As forecasts extend into the future, the cumulative effect of these small inaccuracies will be compounded. Second, the further into the future a forecast extends, the more likely it becomes that there will be some event that changes the processes that drive the patterns in the data.

To understand how quickly forecast error increases for each forecasting method, forecasts were generated for each city using the same rolling-origin design as described above, but forecasting for 1 to 60 periods into the future in each case. For example, for Scenario 1 forecasts were generated for one month into the future, two months, etc., up to 60 months (five years) into the future. Once this had been done for each city for each of the 48 sets of test data available for each scenario, the WAPE was calculated to find the average error for all the forecasts of one month into the future, two months, etc.
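A rolling-origin evaluation that reports WAPE separately for each forecast horizon can be sketched as follows. This is a simplified illustration, not the study's implementation: it uses a seasonal naïve forecaster as a stand-in for the methods tested, a synthetic series, and assumed function and parameter names.

```python
import numpy as np

def seasonal_naive(history, horizon, season_length=12):
    """Stand-in forecaster: repeat the most recent full season."""
    last_season = np.asarray(history[-season_length:], dtype=float)
    repeats = int(np.ceil(horizon / season_length))
    return np.tile(last_season, repeats)[:horizon]

def wape_by_horizon(series, forecaster, train_length, horizon, n_origins):
    """WAPE for each step ahead, pooled over successive forecast origins."""
    series = np.asarray(series, dtype=float)
    abs_error = np.zeros(horizon)
    abs_actual = np.zeros(horizon)
    for k in range(n_origins):
        # Slide the training window forward one period per origin.
        train = series[k:k + train_length]
        test = series[k + train_length:k + train_length + horizon]
        if len(test) < horizon:
            raise ValueError("series too short for the requested design")
        forecast = forecaster(train, horizon)
        abs_error += np.abs(test - forecast)
        abs_actual += np.abs(test)
    return 100.0 * abs_error / abs_actual

# Synthetic monthly series (assumed): gentle downward trend plus seasonality.
t = np.arange(72)
series = 1000 - 2 * t + 100 * np.sin(2 * np.pi * t / 12)

errors = wape_by_horizon(series, seasonal_naive,
                         train_length=36, horizon=24, n_origins=12)
```

Here `errors[0]` is the pooled WAPE for forecasts one month ahead and `errors[23]` the WAPE for forecasts 24 months ahead, so plotting `errors` against the horizon gives a curve of the kind described for Figure 10.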

*Figure 10: Weighted absolute percentage error (WAPE) for forecasts made for different numbers of periods into the future.*

Figure 10 shows the WAPE values for forecasts over each number of forecasting periods for each forecasting method, compared to the WAPE for the naïve method over the same number of forecasting periods. To make the trends clearer and because the naïve method suffers from substantial seasonal variation in its accuracy, the naïve WAPE is shown as a linear trend. Figure 10 shows that (as expected) forecast error increases as forecasts are made further into the future, with that error increasing in a broadly linear fashion once seasonal variation is taken into account. All the methods shown in the figure produced lower errors than the naïve method for monthly forecasts made out to at least two years from the date the forecast is made. The combined and neural-network methods continued to produce forecasts that were on average more accurate than naïve forecasts for at least five years from the date the forecast was made.

For Scenario 2, Figure 10 shows that all methods produced more accurate forecasts on average than the naïve method, with many methods continuing to produce more accurate forecasts up to 60 days into the future. For Scenario 3 the picture was different (as in previous sections) because of the smaller number of crimes present in the data. For Scenario 3, most methods were more accurate than the naïve method for only 18 months into the future, and in some cases (e.g. the ARIMA method) the forecasts were only marginally more accurate.

The results in Figure 10 demonstrate that forecasting error increases over time as expected, and that the combined method remains more accurate on average than the naïve method for the number of forecasting periods specified in each of the scenarios.

Taking the results presented above as a whole, it is possible to answer the four research questions as follows:

Yes, semi-automatic methods can be used to forecast the future frequency of crime more accurately than simply assuming crime will continue to occur at the present frequency. Figures 6, 7 and 8 show that several forecasting methods produced lower errors than assuming that crime will continue at the present frequency (i.e., a naïve forecast). This finding was also consistent across the cities studied.

The combined method (which takes the mean of forecasts for each period produced by the ETS, linear-model, seasonal-naïve and STL methods) should generally be preferred, since across all three scenarios it produced either the lowest forecasting errors of all the methods or had a WAPE value within one percentage point of that for the best model.

Several methods (including the combined method recommended here) can produce accurate forecasts with as few as 13 periods of training data, although forecast accuracy increases if more periods of training data are included (Figure 9).

Forecasts can be produced for many periods into the future, but will in general become less accurate in a linear fashion the further into the future we attempt to forecast (Figure 10).

The results presented here demonstrate that data-driven forecasting methods are more accurate than a naïve forecast across multiple realistic scenarios. Put another way, these forecasting methods have the ability to provide police leaders with more information on which to base a variety of decisions, relative to the often-default assumption that crime will continue to occur broadly as it has in the past.

Whether the benefits of forecasting are valuable is a judgement that must be made by police leaders (and the analysts supporting them) applying these techniques to their own specific needs. The differences between forecasting methods in Scenario 1 (which produced monthly forecasts for three years) were quite small: the combined method produced an average error (WAPE) of 6.6%, compared to an average error of 10.0% for the naïve method (although the combined method was substantially better than the worst method – FASSTER had a WAPE of 20.3%). Whether a difference in accuracy of this magnitude will be valuable will depend on circumstances. However, since producing forecasts using the combined method is a fairly simple analytical task, it will probably make sense to use the most-accurate available forecasting method. The online supplementary material accompanying this article includes a tutorial for crime analysts to use to produce forecasts using the combined method used in this study.

The present study used data from 12 of the largest 50 cities in the United States, with a median population of 683,000 (range 293,000 to 8,468,000). Since many police leaders are responsible for smaller areas, it would be useful to extend the present research to assess the validity of forecasting methods for places with crime counts lower than those seen in the police districts used for Scenario 3.

For reasons of space, this study has not considered the value of probabilistic forecasting, that is, forecasting the probability of the future frequency of crime being above or below a particular value. Probabilistic forecasting may be useful in a police setting to answer questions such as “under a business-as-usual scenario, how likely is it that the frequency of a particular type of crime will increase by more than 10% over the next year?” Answering such questions would help, for example, to understand whether spikes in crime were within the range that might be expected based on historical trends, or whether it was likely that something had changed in the factors underlying crime trends. Future research should assess the usefulness of probabilistic forecasts for realistic police forecasting scenarios such as those used in this study.
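To make the idea of a probabilistic forecast concrete, the sketch below estimates the probability that next year's total exceeds last year's by more than 10%. It is one simple possibility and was not evaluated in this study: it assumes a seasonal naïve point forecast, resamples that method's past errors as if they were independent, and uses a synthetic series and hypothetical function name.

```python
import numpy as np

def prob_annual_increase(history, threshold=0.10, season_length=12,
                         n_sims=10_000, seed=0):
    """Estimate P(next year's total > (1 + threshold) * last year's total)."""
    history = np.asarray(history, dtype=float)
    if len(history) < 2 * season_length:
        raise ValueError("need at least two full seasons of history")
    rng = np.random.default_rng(seed)
    # Past errors of the seasonal naive method (this year minus last year).
    residuals = history[season_length:] - history[:-season_length]
    last_year = history[-season_length:]
    # Each simulated year is last year's months plus resampled past errors.
    sampled = rng.choice(residuals, size=(n_sims, season_length), replace=True)
    simulated_totals = (last_year + sampled).sum(axis=1)
    return float(np.mean(simulated_totals > (1 + threshold) * last_year.sum()))

# Synthetic monthly counts (assumed): seasonal pattern plus random noise.
rng = np.random.default_rng(1)
t = np.arange(48)
counts = 500 + 50 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 40, size=48)

p_increase = prob_annual_increase(counts)
```

A probability close to zero would suggest that a 10% rise is outside what historical variation alone would produce, which is the kind of judgement described above.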

The results presented here must, by the nature of trying to predict the future, come with some caveats. Most importantly, all of the forecasting methods used in this study assume the causes of crime will be the same in the future as they are now: what J. Cohen and Gorr (2005, p 9) referred to as “business as usual” forecasts. That assumption is likely to become less reasonable the further into the future forecasts are made.

A good forecasting method can take into account changes in crime frequency caused by long-term trends, seasonal variations and the recurrence of events such as public holidays, major public events and so on. What none of these methods can do is warn a police executive that in a few months’ time a global pandemic will cause drastic changes in the frequency of crime (Ashby, 2020b) and the nature of calls for service (Ashby, 2020a). Identifying future changes in the processes that underlie the frequency of crime requires a separate process of *horizon scanning* (Ekblom, 2022). Large agencies, or government ministries responsible for criminal-justice policy, may have the resources to support horizon scanning, which often requires knowledge of multiple disparate domains. For example, depending on the circumstances of a particular agency, scanning for potential changes in the processes driving crime over the next 10 years may require knowledge of the likely effects of climate change, developments in technology and various types of social change. The history of crime shows that some once-common crime types such as safe-breaking have become obsolete (Walsh, 1994) while new types of crime such as cyber-enabled fraud have bloomed. For smaller agencies, accurate horizon scanning is likely to be very challenging.

Crime forecasting (as described in this paper) sits on a continuum of techniques that can be useful in supporting police decision making over different time-scales. At the shortest time-scales (hours and days into the future), predictive policing techniques may be valuable in assisting police decision making. Decisions made at these short time-scales are likely to primarily be made by front-line supervisors or local commanders who have spans of control short enough to have fine-grained control over officer activity. Over medium time-scales (weeks and months into the future, up to about 5 years), forecasting using the combined method has the potential to be useful. Decisions over medium time-scales are typically those made by police executives, such as decisions about future budgets or allocation of resources. Over longer time-scales (5 years or more into the future), horizon scanning is likely to be more useful. These longer-term decisions are more likely to be those made by politicians and policy-makers in government, such as whether to invest in new technologies to deal with potential new types of crime.

Police leaders can combine quantitative forecasts such as those discussed here with other types of information in order to inform their decisions. For example, an agency may believe that an event such as an annual fair is likely to have more crime this year because intelligence suggests two feuding gangs intend to attend the event. Police agencies will also be able to draw on a pool of expert knowledge among their officers and staff. Numeric forecasts can be used as a ‘sense check’ on the assumptions that expert officers may have made in preparing for an event or deciding on where to focus their resources over an upcoming period of time. Whenever numeric forecasts are used, it should be remembered that they are a guide to a potential future that is driven by patterns in crime data from the past. A crime forecast is not a guarantee that the future will turn out in a particular way – as in many areas of policing, such guarantees do not exist.

To use quantitative forecasts to support strategic decision making in policing, crime analysts can take the following steps.

If the analyst has the time and resources, consider testing multiple forecasting methods on data from their own agency in the context of the actual scenario that they want to forecast. For example, if an analyst wants to forecast weekly counts of robbery in the future, they could produce forecasts using different methods for a period in the recent past, then compare the forecasts produced by each method to the actual number of crimes that occurred.

If the analyst does not have the time or resources to do this, consider using the combined method (i.e. the mean of forecasts produced by the ETS, linear-model, seasonal-naïve and STL methods), since this has been shown in this study to produce reasonable forecasts across a range of scenarios in different cities.

Use at least three years of training data for annual forecasts and at least three months of data for daily forecasts.

Update forecasts whenever possible, since the further into the future a forecast is estimated for, the less accurate it will be. For example, if forecasts for the next 12 months are used at a monthly meeting, it is preferable to re-estimate the forecasts before each meeting so that they take account of the latest available data.

Use quantitative forecasts in context with other information, such as intelligence information about particular crime problems or existing knowledge about specific influences on crime on particular days.

Make it clear to decision makers that quantitative forecasts are estimates of what is likely to happen under a business-as-usual scenario – it is always possible that changes to the drivers of crime will render future forecasts less accurate than they would otherwise have been.

This study used data from the Crime Open Database (Ashby, 2019) and is deposited at the Open Science Framework at https://doi.org/10.17605/OSF.IO/ZYAQN

Separate multi-level gamma regression models were run for each of the three scenarios to identify differences between the individual absolute forecast errors of different methods, taking into account the structure of the data in which observations were clustered in cities. Each model had fixed effects for each forecasting method and random intercepts for each city. Following the notation used in Chapter 12 of Gelman and Hill (2007), each model can be written as:

$\begin{matrix}
\text{error}_{i} & \sim N\left( \mu,\sigma^{2} \right) \\
\mu & \sim \alpha_{j[i]} + \beta_{1}\left( \text{method} \right) \\
\alpha_{j} & \sim N\left( \mu_{\alpha_{j}},\sigma_{\alpha_{j}}^{2} \right),\ \text{for city}\ j = 1,\ldots,J \\
\end{matrix}$

Gamma regression with a log link (estimated using maximum likelihood) was used because the absolute forecast errors were positive, continuous and positively skewed. Since a log link was used and some of the errors were exactly zero, a fixed value of 0.001 was added to each absolute forecast error before modelling.
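As a reading aid, and assuming (the text does not state this explicitly) that the estimates reported in the tables below are exponentiated coefficients with the naïve method as the reference category, the log link implies that each method's estimate is a ratio of expected absolute errors. Writing $m[i]$ for the method used to produce forecast $i$ (notation introduced here for illustration):

$\begin{matrix}
\log E\left( \text{error}_{i} \right) & = \alpha_{j[i]} + \beta_{m[i]} \\
\frac{E\left( \text{error} \mid \text{method}\ m \right)}{E\left( \text{error} \mid \text{naïve} \right)} & = \exp\left( \beta_{m} \right) \\
\end{matrix}$

Under that reading, an estimate of 0.69 for the combined method in Scenario 1 would correspond to expected absolute errors about 31% lower than those of the naïve method in the same city, while an estimate of 2.19 for FASSTER would correspond to expected errors more than twice as large.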

*Figure 11: Fixed-effects estimates of a multi-level gamma regression model of the forecast errors produced by different forecasting methods for Scenario 1 in different cities.*

For Scenario 1, overall model explanatory power (conditional

| | estimate | 95% CI |
|---|---|---|
| intercept | 915.70 | 676.96–1,238.62 |
| | | |
| Austin | 0.72 | 0.70–0.75 |
| Chicago | 3.15 | 3.04–3.26 |
| Detroit | 0.98 | 0.95–1.02 |
| Kansas City | 1.20 | 1.16–1.24 |
| Los Angeles | 2.13 | 2.05–2.21 |
| Louisville | 0.46 | 0.44–0.47 |
| Memphis | 1.20 | 1.16–1.24 |
| New York | 2.94 | 2.84–3.05 |
| San Francisco | 0.73 | 0.71–0.76 |
| Seattle | 0.62 | 0.60–0.64 |
| St Louis | 0.47 | 0.46–0.49 |
| Tucson | 0.50 | 0.49–0.52 |
| | | |
| ARIMA | 0.90 | 0.88–0.92 |
| ETS | 0.79 | 0.78–0.81 |
| FASSTER | 2.19 | 2.16–2.23 |
| STL | 0.78 | 0.76–0.79 |
| VAR | 1.69 | 1.66–1.72 |
| combined | 0.69 | 0.67–0.70 |
| linear model | 0.87 | 0.85–0.89 |
| neural network | 0.75 | 0.74–0.76 |
| prophet | 1.26 | 1.24–1.28 |
| seasonal naïve | 0.76 | 0.74–0.77 |
| theta | 0.92 | 0.90–0.94 |

*Figure 12: Fixed-effects estimates of a multi-level gamma regression model of the forecast errors produced by different forecasting methods for Scenario 2 in different cities.*

For Scenario 2, conditional

| | estimate | 95% CI |
|---|---|---|
| intercept | 43.17 | 32.84–56.75 |
| | | |
| Austin | 0.74 | 0.74–0.75 |
| Chicago | 2.29 | 2.27–2.31 |
| Detroit | 0.97 | 0.96–0.98 |
| Kansas City | 1.53 | 1.51–1.54 |
| Los Angeles | 1.63 | 1.62–1.64 |
| Louisville | 0.54 | 0.53–0.54 |
| Memphis | 0.83 | 0.83–0.84 |
| New York | 3.27 | 3.25–3.30 |
| San Francisco | 0.85 | 0.84–0.86 |
| Seattle | 0.58 | 0.57–0.58 |
| St Louis | 0.57 | 0.57–0.58 |
| Tucson | 0.60 | 0.59–0.60 |
| | | |
| ARIMA | 0.67 | 0.66–0.67 |
| ETS | 0.71 | 0.71–0.72 |
| FASSTER | 5.17 | 5.11–5.23 |
| STL | 0.71 | 0.70–0.72 |
| VAR | 0.67 | 0.66–0.67 |
| combined | 0.66 | 0.65–0.67 |
| linear model | 0.69 | 0.68–0.70 |
| neural network | 0.85 | 0.84–0.86 |
| prophet | 0.63 | 0.62–0.63 |
| seasonal naïve | 0.86 | 0.85–0.87 |
| theta | 0.69 | 0.68–0.69 |

*Figure 13: Fixed-effects estimates of a multi-level gamma regression model of the forecast errors produced by different forecasting methods for Scenario 3 in different cities.*

For Scenario 3, conditional

| Scenario 3 | estimate | 95% CI |
|---|---|---|
| intercept | 8.73 | 6.16–12.39 |
| | | |
| Austin, Sector Baker | 1.68 | 1.65–1.71 |
| Chicago, 19th District | 0.93 | 0.92–0.95 |
| Detroit, 8th Precinct | 2.24 | 2.20–2.28 |
| Los Angeles, Devonshire Division | 1.38 | 1.35–1.40 |
| Louisville, Fifth Division | 0.22 | 0.22–0.22 |
| Memphis, Tillman Station | 1.28 | 1.26–1.30 |
| New York City, 34th Precinct | 0.73 | 0.72–0.74 |
| New York City, 67th Precinct | 1.11 | 1.09–1.13 |
| San Francisco, Northern Station | 0.66 | 0.65–0.67 |
| Seattle, Southwest Precinct | 0.56 | 0.55–0.57 |
| St Louis, District 6 | 2.35 | 2.30–2.39 |
| Tucson, Division South | 1.04 | 1.02–1.06 |
| | | |
| ARIMA | 0.97 | 0.94–1.00 |
| ETS | 0.83 | 0.80–0.85 |
| FASSTER | 1.52 | 1.47–1.56 |
| STL | 0.81 | 0.78–0.83 |
| combined | 0.79 | 0.77–0.82 |
| linear model | 0.89 | 0.86–0.92 |
| neural network | 0.90 | 0.87–0.93 |
| prophet | 1.14 | 1.11–1.17 |
| seasonal naïve | 0.91 | 0.88–0.94 |
| theta | 0.95 | 0.92–0.98 |

Armstrong, J. S., & Grohman, M. C. (1972). A comparative study of methods for long-range market forecasting. *Management Science*, *19*(2), 211–221.

Ashby, M. P. J. (2019). Studying crime and place with the Crime Open Database. *Research Data Journal for the Humanities and Social Sciences*, *4*(1), 65–80. https://doi.org/10.1163/24523666-00401007

Ashby, M. P. J. (2020a). Changes in police calls for service during the early months of the 2020 coronavirus pandemic. *Policing: A Journal of Policy and Practice*, *14*(4), 1054–1072. https://doi.org/10.1093/police/paaa037

Ashby, M. P. J. (2020b). Initial evidence on the relationship between the coronavirus pandemic and crime in the United States. *Crime Science*. https://doi.org/10.1186/s40163-020-00117-6

Berk, R. (2011). Asymmetric loss functions for forecasting in criminal justice settings. *Journal of Quantitative Criminology*, *27*(1), 107–123. https://doi.org/10.1007/s10940-010-9098-2

Blumstein, A., & Larson, R. (1969). Models of a total criminal justice system. *Operations Research*, *17*(2), 199–232. Retrieved from https://www.jstor.org/stable/168830

Box, G. E. P., & Jenkins, G. M. (1970). *Time series analysis: Forecasting and control*. San Francisco: Holden-Day.

Cleveland, R. B., Cleveland, W. S., McRae, J. E., & Terpenning, I. (1990). STL: A seasonal-trend decomposition procedure based on Loess. *Journal of Official Statistics*, *6*(1), 3–73.

Cohen, J., & Gorr, W. L. (2005). *Development of crime forecasting and mapping systems for use by police*. 150. https://doi.org/10.3886/ICPSR04545.v1

Cohen, L. E., & Land, K. C. (1987). Age structure and crime: Symmetry versus asymmetry and the projection of crime rates through the 1990s. *American Sociological Review*, *52*(2), 170–183. Retrieved from https://www.jstor.org/stable/2095446

Deadman, D. (2003). Forecasting residential burglary. *International Journal of Forecasting*, *19*(4), 567–578. https://doi.org/10.1016/S0169-2070(03)00091-8

Dhiri, S., Brand, S., Harries, R., & Price, R. (1999). *Modelling and predicting property crime trends in England and Wales* (Home Office Research Study 198). London: Home Office. Retrieved from http://library.college.police.uk/docs/hors/hors198.pdf

Ekblom, P. (2022). Facing the future: The role of horizon-scanning in helping security keep up with the changes to come. In M. Gill (Ed.), *The handbook of security*. Cham: Palgrave Macmillan. https://doi.org/10.1007/978-3-030-91735-7_38

Flaxman, S. R. (2014). *A general approach to prediction and forecasting crime rates with Gaussian processes*. Heinz College working paper. Retrieved from https://www.ml.cmu.edu/research/dap-papers/dap_flaxman.pdf

Fox, J. A., & Piquero, A. R. (2003). Deadly demographics: Population characteristics and forecasting homicide trends. *Crime and Delinquency*, *49*(3), 339–359. https://doi.org/10.1177/0011128703253760

Gelman, A., & Hill, J. (2007). *Data analysis using regression and multilevel/hierarchical models*. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511790942

Gorr, W., & Harries, R. (2003). Introduction to crime forecasting. *International Journal of Forecasting*, *19*(4), 551–555. https://doi.org/10.1016/S0169-2070(03)00089-X

Gorr, W., Olligschlaeger, A., & Thompson, Y. (2003). Short-term forecasting of crime. *International Journal of Forecasting*, *19*(4), 579–594. https://doi.org/10.1016/S0169-2070(03)00092-X

Harries, R. (2003). Modelling and predicting recorded property crime trends in England and Wales - a retrospective. *International Journal of Forecasting*, *19*(4), 557–566. https://doi.org/10.1016/S0169-2070(03)00090-6

Her Majesty’s Inspectorate of Constabulary. (2016). *State of policing 2015*. London: HMIC. Retrieved from https://www.justiceinspectorates.gov.uk/hmicfrs/wp-content/uploads/state-of-policing-2015.pdf

Hewamalage, H., Ackermann, K., & Bergmeir, C. (2022). *Forecast evaluation for data scientists: Common pitfalls and best practices*. Preprint. https://doi.org/10.48550/arXiv.2203.10716

Hyndman, R. J., & Athanasopoulos, G. (2019). *Forecasting: Principles and practice* (3rd ed.). Melbourne: OTexts. Retrieved from https://otexts.com/fpp3/

Hyndman, R. J., & Khandakar, Y. (2007). *Automatic time series forecasting: The forecast package for R*. Melbourne: Monash University. Retrieved from https://www.monash.edu/business/ebs/research/publications/ebs/wp06-07.pdf

Hyndman, R. J., Koehler, A. B., Snyder, R. D., & Grose, S. (2002). A state space framework for automatic forecasting using exponential smoothing methods. *International Journal of Forecasting*, *18*(3), 439–454. https://doi.org/10.1016/S0169-2070(01)00110-8

Ismail, S., & Ramli, N. (2013). Short-term crime forecasting in Kedah. *Procedia - Social and Behavioral Sciences*, *91*, 654–660. https://doi.org/10.1016/j.sbspro.2013.08.466

Metropolitan Police Service. (2022). *Force management statement 2022*. London: Metropolitan Police Service. Retrieved from https://www.met.police.uk/SysSiteAssets/media/downloads/force-content/met/about-us/force-management-statement-2022.pdf

O’Hara-Wild, M. (2019). *Forecasting with additive switching of seasonality, trend and exogenous regressors*. Retrieved from https://github.com/tidyverts/fasster

O’Hara-Wild, M., Hyndman, R. J., & Wang, E. (2020). *Fable: Forecasting models for tidy time series*. Retrieved from https://fable.tidyverts.org

Ormerod, P., & Smith, L. (2001). *Assessing the predictability of social and economic time-series data: The example of crime in the UK* (p. 16). London: Volterra Consulting. Retrieved from http://arxiv.org/abs/cond-mat/0102371

Piquero, A. R., Piquero, N. L., & Riddell, J. R. (2019). Do (sex) crimes increase during the United States Formula 1 Grand Prix? *Journal of Experimental Criminology*. https://doi.org/10.1007/s11292-019-09398-7

Ratcliffe, J. H., Taylor, R. B., Askey, A. P., Thomas, K., Grasso, J., Bethel, K. J., … Koehnlein, J. (2020). The Philadelphia predictive policing experiment. *Journal of Experimental Criminology*. https://doi.org/10.1007/s11292-019-09400-2

Schneider, S. (2002). *Predicting crime: A review of the research* (pp. 1–37). Ottawa: Department of Justice.

Shepherd, P., See, L., Kongmuang, C., & Clarke, G. (2004). *An analysis of crime and disorder in Leeds 2000/01 to 2003/04*. Leeds: University of Leeds.

Sherman, L. W. (2013). The rise of evidence-based policing: Targeting, testing, and tracking. *Crime and Justice*, *42*(1), 377–451. https://doi.org/10.1086/670819

Shoesmith, G. L. (2012). Space–time autoregressive models and forecasting national, regional and state crime rates. *International Journal of Forecasting*, *29*(1), 191–201. https://doi.org/10.1016/j.ijforecast.2012.08.002

Tashman, L. J. (2000). Out-of-sample tests of forecasting accuracy: An analysis and review. *International Journal of Forecasting*, *16*(4), 437–450. https://doi.org/10.1016/S0169-2070(00)00065-0

Taylor, S. J., & Letham, B. (2017). Forecasting at scale. *PeerJ Preprints*, *5*, 1–25. https://doi.org/10.7287/peerj.preprints.3190v2

Walsh, D. P. (1994). The obsolescence of crime forms. In *Crime prevention studies* (Vol. 2, pp. 149–163). Monsey, NY: Criminal Justice Press.

Wang, X., Smith, K., & Hyndman, R. J. (2006). Characteristic-based clustering for time series data. *Data Mining and Knowledge Discovery*, *13*(3), 335–364. https://doi.org/10.1007/s10618-005-0039-x