It is widely known that police recorded crime data provides only a partial picture of the true extent of crime, with surveys identifying a large number of ‘hidden’ victims. Often referred to as the dark figure of crime, this gap between police records and the ‘true’ level of crime has been attributed to a range of influences, including an unwillingness among some victims to report their experiences to the police, coupled with selectivity in police recording practices and errors in the translation of police records into official statistics. Studies have also demonstrated that failing to account for these hidden crimes can have severe implications for the validity of models of crime data. Comparing police records against victim surveys offers a potential framework for generating corrected model estimates, but to date we know little about the nature of under-reporting and under-recording at the local area level, with victim surveys generally only suitable for regional or broad police force level comparisons. In this chapter we explore a novel solution to this problem, using a synthetic population dataset to examine the extent to which police recording practices vary systematically across England and Wales. Designed to match the UK population on basic demographics as measured by the Census, and with each resident given a victimisation profile derived from the Crime Survey for England and Wales, this synthetic population enables an examination of the extent of crime under-counting at a range of spatial scales.
All code and data used in this chapter are available from a GitHub repository (https://github.com/davidbuilgil/synthetic-crime).
This work is supported by the Secondary Data Analysis Initiative of the Economic and Social Research Council (Grant Ref: ES/T015667/1). We are grateful for the support received from Anja Le Blanc of the University of Manchester's Research IT, who adapted our original code to run the simulations included in this study.
Accurately measuring the number of crimes that occur is remarkably difficult. Whilst police recorded crime data seemingly give us a fully audited record of all criminal incidents that the police deal with on a daily basis, research has repeatedly shown that this is, at best, an incomplete picture of what actually happens. Studies have identified deficiencies and inconsistencies in recording practices across police forces (Burrows et al., 2000; Her Majesty's Inspectorate of Constabulary, 2014) and over time (Boivin and Cordeau, 2011), as well as high-profile incidents of selective counting (Eterno et al., 2016) and more legitimate opportunities for police discretion (Klinger and Bridges, 1997). But even if police recording practices could be considered error free, the resulting data can only ever provide a partial picture of the full extent of crime, since many incidents do not reach the attention of the police at all (Tarling and Morris, 2010). This can be because they are not actively reported by the victim (Xie and Baumer, 2019), or because individuals are unaware that they have been victimised in the first place. It may also reflect the nature of the incident, with some crimes appearing ‘victimless’ (or at least the victims existing some way downstream of the event itself), and online crimes being generally less visible to the police. In this chapter we present a novel computational approach to quantifying crime. This approach builds on the framework proposed by Buil-Gil et al. (2022), using survey data to estimate relative crime risks and mapping these onto a synthetic population that matches the UK Census. We believe our synthetic data can provide researchers with a novel way to better understand the gaps in recorded crime data, and we encourage researchers to take this data as a starting point for the development of further innovations in synthetic crime data.
The limitations of police recorded crime data are well known (Biderman and Reiss, 1967; Skogan, 1977), and there now exists a formidable evidence base from victimisation surveys on the nature of the problem (Lohr, 2019; Xie and Baumer, 2019). Individual victimisation surveys provide valuable evidence about the extent of under-reporting, suggesting the true extent of crime to be substantially larger than official police figures indicate. The likelihood of crime reporting also varies between crime types and population groups (Hart and Rennison, 2003). Surveys give us important detail about why some incidents do not reach the attention of the police, including victims judging them too trivial to report, being reluctant to report for fear of repercussions, or having strained relations with the police (Goudriaan et al., 2006; Tarling and Morris, 2010). Similarly, business victimisation surveys have unearthed low reporting rates for several crime types, including theft, fraud and cybercrime (Kemp et al., 2021; Taylor, 2002).
However, whilst our understanding of the true extent of crime has undoubtedly been enhanced by the incorporation of survey data, this has largely been restricted to summaries of the total amounts of crime experienced each year at the population level (Lynch and Addington, 2007). More fine-grained assessments of the local extent of crime, and more specifically, its patterns of under-reporting and under-recording, have been more elusive. Some promising advances have been made, including the application of new statistical methods to better assess the measurement properties of crime data (e.g., multitrait-multimethod models; Cernat et al., 2022), the estimation of the ‘dark figure’ of crime in small geographies (e.g., small area estimation; Buil-Gil et al., 2021) and the development of methods to quantify the impact of measurement error on regression models (e.g., simulation of measurement error mechanisms; Pina-Sánchez et al., 2022a).
In this chapter we outline another way forward with the creation and use of synthetically generated crime data. In recent years it has become comparatively easy to generate population-scale synthetic datasets mirroring the true population on a broad range of attributes. This has resulted in a growing interest in the use of synthetic data in a range of fields (El Emam et al., 2020; Elliot et al., 2020). Synthetic data can be defined as information generated via computer simulations or algorithms rather than collected from real-world events. This data is thus artificial by nature, and yet it can be extremely helpful for a wide array of research endeavours, since – if generated properly – it will mimic key population distributions in the real world. While a given individual observation from such synthetic data will offer no meaningful information, we can still learn much about the relationships between the characteristics defining the individuals, groups or areas captured in these datasets. Many researchers have previously advocated the use of data simulations for crime research (Brantingham and Brantingham, 2004; Groff and Mazerolle, 2008; Hipp and Williams, 2021; Liu and Eck, 2008; Townsley and Birks, 2008).
Our synthetic data is designed to mimic the real population in terms of key socio-demographic characteristics recorded in the 2011 UK Census at the micro geographical scale, including indicators for whether each individual (or household) has been victimised in the last year and whether each crime is known to the police. This allows us to straightforwardly compute crime-risk profiles at any geographical scale simply by summing the total number of synthetically generated crimes within the defined spatial boundaries. Victimisation indicators for each synthetic individual (and household) are derived from an empirical model of crime survey data collected during the same time window as the 2011 Census, using the same set of socio-demographic characteristics used to define the synthetic population. We also derive reporting flags for each victim, allowing us to match our synthetic population to police recorded crime data.
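For instance, once the unit-level synthetic file is in hand, counts at any geography reduce to a grouped sum. The following is a minimal R sketch; the column names (`oa_code`, `msoa_code`, `n_crimes`) are hypothetical rather than those used in our released data.

```r
# Illustrative only: aggregate synthetic unit-level records to any geography.
synth <- data.frame(
  oa_code   = c("E00000001", "E00000001", "E00000002"),
  msoa_code = c("E02000001", "E02000001", "E02000001"),
  n_crimes  = c(0, 2, 1)
)

# Crime counts at any spatial scale are grouped sums of the unit-level counts
oa_counts   <- aggregate(n_crimes ~ oa_code,   data = synth, FUN = sum)
msoa_counts <- aggregate(n_crimes ~ msoa_code, data = synth, FUN = sum)
```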
In the next section we provide a step-by-step description of the method used to generate the synthetic crime data. This is followed by a series of checks of its validity and reliability, which suggest it outperforms existing crime estimates at a range of spatial scales. Finally, we apply this new research tool to explore patterns of crime under-counting – including differences across crime types, and correlations with area characteristics of key criminological interest – providing an initial demonstration of its potential utility for researchers and criminal justice practitioners.
Our goal is to generate a synthetic population of the UK possessing the necessary individual attributes to sufficiently differentiate those likely to be victimised (based on a range of key predictors of crime victimisation) from those unlikely to be victims of crime. We then differentiate synthetic crimes known and unknown to the police. To achieve this, first we generate a synthetic population of UK residents and households that mimics the true UK population based on a set of relevant socio-demographic variables obtained from the Census. Second, we use Crime Survey for England and Wales (CSEW) data to estimate appropriate models of crime victimisation (distinguishing violence, property and damage) and crime being known to the police using the same set of socio-demographic predictors. Third, we use these models to assign victimisation and reporting propensities to each synthetic population unit. Below we elaborate on each of these steps, including details of the empirical models and estimation framework.[1] Full code to generate the synthetic population data is available on Github (https://github.com/davidbuilgil/synthetic-crime).
UK Census 2011 data is currently available in univariate summary tables down to the level of Output Areas (OAs, each containing approximately 100 households and 250 residents), and covers a broad range of socio-demographic attributes of the UK population. From the full list of key tables (https://www.nomisweb.co.uk/census/2011), we select those socio-demographic attributes that can be directly mapped to the CSEW. At the individual level, for each OA, this covers the percentage of residents who are unemployed (including students), with at least level-4 education, white, male, married, and born in the UK, as well as the mean age. At the household level, this covers the percentage of households that are single occupancy, without a vehicle, white, practising a religion, socially rented, and terraced, as well as the percentage with an unemployed household head and with a household head aged over 65.
We generate a synthetic population of UK residents and (separately) households, with OA population counts equal to their real-world counts and summary distributions for each socio-demographic variable constrained to match the known values from the Census summary tables. However, simply matching to OA summary tables on each socio-demographic attribute ignores the dependencies between attributes within each OA. For example, we might reasonably expect that an OA containing a higher proportion of high-income residents will also have a higher proportion of residents with higher levels of education. Unit-level Census data is, however, not publicly available, and the published multivariate tables are typically restricted to bivariate or trivariate cross-tabulations at broader scales. Instead, we approximate the multivariate distribution of the selected socio-demographic variables using the equivalent multivariate truncated normal and binary variance-covariance matrix estimated from the CSEW, constraining the synthetic population data to also match this multivariate distribution within each OA. Hence, we generate the synthetic population following a multivariate truncated normal and binary distribution (Demirtas et al., 2014), based on parameters recorded in the Census in each OA and a variance-covariance matrix estimated in the CSEW. Although not perfect, we believe this represents a novel and parsimonious solution to the lack of available Census microdata, while also retaining close correspondence to our victimisation models.
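A minimal sketch of this generation step for a single OA and three attributes is shown below. It is illustrative only: the marginal proportions, age parameters and correlation matrix are placeholders, and binary attributes are obtained by thresholding correlated normals, which is a simplification of the joint binary-normal generator of Demirtas et al. (2014) used in our full code.

```r
library(MASS)
set.seed(2011)

n_pop <- 250                                   # approximate residents per OA
oa_margins <- c(male = 0.49, married = 0.47)   # placeholder Census proportions
age_mean <- 38; age_sd <- 18                   # placeholder age parameters

# Correlation matrix between the latent traits, estimated from the CSEW
# (placeholder values)
rho <- matrix(c( 1.00, 0.05, -0.10,
                 0.05, 1.00,  0.40,
                -0.10, 0.40,  1.00), nrow = 3)

# Draw correlated latent normals and map them to observed attributes
z <- mvrnorm(n_pop, mu = rep(0, 3), Sigma = rho)
colnames(z) <- c("male", "married", "age")

synth_oa <- data.frame(
  # thresholding reproduces the OA marginal proportions in expectation
  male    = as.integer(z[, "male"]    < qnorm(oa_margins["male"])),
  married = as.integer(z[, "married"] < qnorm(oa_margins["married"])),
  # age rescaled to the OA mean/sd and bounded to a plausible range
  # (a crude stand-in for the truncated normal used in the full code)
  age     = pmin(pmax(age_mean + age_sd * z[, "age"], 16), 100)
)
```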
Having generated our synthetic populations of individuals and households, we next assign each unit within each OA victimisation and reporting propensities. To generate appropriate victimisation propensities, we again turn to the CSEW. We estimate violent crime victimisation (covering: violence or sexual assault, wounding due to violence, and robbery) using negative binomial models and the same set of individual socio-demographic variables used to generate the synthetic population from the Census data. Conditional on the estimated victimisation status, we then estimate a logistic model predicting whether the police came to know about each victimisation, which we use as an indicator of whether the crime was likely to feature in police recorded crime statistics (Table 1).
Here we can see that men and younger people are more likely to be victims of violence, along with those born in the UK and unmarried. No statistically significant differences in reporting propensities were observed, although we do observe moderately lower reporting amongst men and the unemployed, but higher reporting amongst those who were married and born in the UK.
Table 1. Individual-level negative binomial model of number of violent crime incidents experienced and logistic regression model of violent crime known to police (CSEW)
| | Violent crime victimisation | Violent crime known to police |
| (Intercept) | -1.29*** | -0.41 |
| Age | -0.04*** | 0.00 |
| Male | 0.36*** | -0.14 |
| White | -0.01 | -0.00 |
| Unemployed | 0.03 | -0.09 |
| Higher education | 0.04 | -0.05 |
| Born in UK | 0.30* | 0.20 |
| Married | -0.69*** | 0.22 |
| Sample size | 34,761 | 1,184 |
| AIC | 10490 | 1631.9 |
| Nagelkerke's pseudo-R² | 0.07 | 0.01 |
*** p < 0.001; ** p < 0.01; * p < 0.05
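In code, the two-stage estimation might look as follows. This is a sketch only: `csew` is a simulated stand-in with hypothetical variable names, not the actual CSEW extract or its variable labels.

```r
library(MASS)
set.seed(1)

# Simulated stand-in for an individual-level CSEW extract
n <- 5000
csew <- data.frame(
  age        = round(runif(n, 16, 90)),
  male       = rbinom(n, 1, 0.49),
  white      = rbinom(n, 1, 0.86),
  unemployed = rbinom(n, 1, 0.08),
  higher_ed  = rbinom(n, 1, 0.27),
  born_uk    = rbinom(n, 1, 0.87),
  married    = rbinom(n, 1, 0.47)
)
csew$n_violent    <- rnbinom(n, size = 0.3,
                             mu = exp(-2 + 0.4 * csew$male - 0.01 * csew$age))
csew$known_police <- ifelse(csew$n_violent > 0, rbinom(n, 1, 0.4), NA)

# Stage 1: negative binomial model of violent victimisation counts
m_victim <- glm.nb(n_violent ~ age + male + white + unemployed +
                     higher_ed + born_uk + married, data = csew)

# Stage 2: logistic model of an incident becoming known to the police,
# fitted only on respondents who reported at least one incident
m_known <- glm(known_police ~ age + male + white + unemployed +
                 higher_ed + born_uk + married,
               family = binomial, data = subset(csew, n_violent > 0))

theta_hat <- m_victim$theta   # dispersion parameter reused in the simulation step
```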
We use the same two-stage modelling approach to estimate criminal damage (covering: arson and other criminal damage, and criminal damage to a vehicle) and property crime (covering: theft from a vehicle, theft of a vehicle, bike theft, and burglary), this time using the set of household variables rather than the individual-level variables. The results are reported in Table 2, where we see that older, white, single-person households and households without a car are less likely to be victimised. By contrast, households in social rented accommodation, non-religious households and those in terraced properties are more likely to suffer crimes. Reporting is more likely when the household does not own a car, whilst social renters are more likely to report criminal damage but less likely to report property offences.
Table 2. Household-level negative binomial models of number of property and criminal damage incidents experienced, and logistic regression models of property crime and damage known to police (CSEW)
| | Property crime victimisation | Property crime known to police | Damage victimisation | Damage known to police |
| (Intercept) | -2.15*** | -0.11 | -2.27*** | -0.84*** |
| Aged 65+ (HRP) | -1.36*** | 0.20 | -0.76*** | 0.06 |
| Terraced | 0.25*** | -0.05 | 0.48*** | 0.13 |
| White (HRP) | -0.25*** | 0.17 | -0.17* | 0.04 |
| One-person household | -0.19*** | 0.15 | -0.11 | -0.14 |
| No income | 0.08 | 0.02 | 0.05 | 0.20* |
| No car | -0.14* | 0.20* | -0.91*** | 0.43** |
| Social renter | 0.36*** | -0.18* | 0.43*** | 0.31** |
| No religion (HRP) | 0.13** | -0.02 | 0.13* | 0.05 |
| Sample size | 45,361 | 3,472 | 45,361 | 3,001 |
| AIC | 25244 | 4807.2 | 24225 | 3795.6 |
| Nagelkerke's pseudo-R² | 0.04 | 0.01 | 0.03 | 0.02 |
*** p < 0.001; ** p < 0.01; * p < 0.05. HRP refers to the Household Reference Person.
Having estimated crime victimisation and ‘crime-known-to-police’ propensities, we use the estimated regression coefficients and dispersion parameters from these models to generate the number of crimes suffered by individuals and households in our synthetic population following a negative binomial model. Specifically, we assign each individual in our synthetic population a victimisation propensity based on the combination of attributes they possess, with those propensities used to generate the number of victimisations. This results in two synthetic population datasets. The first covers all 56 million individuals living in the 227,759 OAs in England and Wales, distinguishing whether or not each individual was a victim of violence in 2011/12 and whether or not each crime was known to the police. The second covers the 23 million households, identifying those that suffered criminal damage or a property crime. Importantly, whilst the household file contains information on two separate crime types, these should be considered independent realisations of household victimisation, with no attempt made to incorporate any serial dependency between damage and property crimes in the data generation process. As a result, the simulation does not account for poly-victimisation (i.e., the propensity of households suffering damage to also suffer property crime). The method does, however, account for repeat victimisation within each crime type: each synthetic unit can suffer multiple offences of the same type.
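Sketched in code, and continuing from the fitted models in the previous snippet, the assignment step amounts to the following. The synthetic residents below are a hypothetical stand-in; in practice they come from the joint truncated normal/binary generator described earlier.

```r
set.seed(2)

# Hypothetical synthetic residents for one OA
n_pop <- 250
synth <- data.frame(
  age        = round(pmin(pmax(rnorm(n_pop, 38, 18), 16), 100)),
  male       = rbinom(n_pop, 1, 0.49),
  white      = rbinom(n_pop, 1, 0.86),
  unemployed = rbinom(n_pop, 1, 0.08),
  higher_ed  = rbinom(n_pop, 1, 0.27),
  born_uk    = rbinom(n_pop, 1, 0.87),
  married    = rbinom(n_pop, 1, 0.47)
)

# Expected number of violent incidents per synthetic resident
mu_hat <- predict(m_victim, newdata = synth, type = "response")

# Draw incident counts from a negative binomial with the CSEW-estimated
# dispersion, so repeat victimisation within the crime type is possible
synth$n_violent <- rnbinom(n_pop, size = m_victim$theta, mu = mu_hat)

# Probability of each incident becoming known to the police
p_known <- predict(m_known, newdata = synth, type = "response")

# Flag each incident independently as known or unknown to the police
synth$n_known_police <- rbinom(n_pop, size = synth$n_violent, prob = p_known)
```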
To be of value to researchers and crime analysts, it is important that the synthetic crime population is able to generate sensible approximations of the ‘true’ extent of crime. We assess this in four ways. First, we compare the demographic characteristics of our synthetic population against population data recorded in the Census. Second, we consider the extent to which our synthetic data recovers the spatial distribution of police recorded crime measured at Police Force Area (PFA), Community Safety Partnership (CSP) and Middle layer Super Output Area (MSOA) levels.[2] Here, we restrict our focus to those incidents that our synthetic residents indicate were known to the police. Third, we select a sample of residents from our synthetic population to match the 2011/12 CSEW sample, and assess how closely the estimates of crime victimisation align. Fourth, we select repeated stratified random samples (with geographic areas as strata) from our synthetic population data and calculate crime rates for each sample to assess the internal reliability (i.e., the consistency of results between multiple samples taken from the same population) of our synthetic crime data.
Comparing our synthetic population (aggregated to OAs) against Census data reveals a very close correspondence (Figure 1). Correlations for all measures are over 0.95, with the exception of the percentage of men within each OA (0.78) and the percentage of households that are non-religious (0.92).
To assess the face-validity of our synthetic crime data, we compare how well our synthetic population is able to replicate patterns observed in police recorded crime data. We compute counts of crime in each PFA, CSP and MSOA using the subset of synthetic residents (and households) that we identify as having reported their victimisation to the police. We observe a strong linear relationship between our synthetic estimates and police recorded crime data at PFA level, with correlations of 0.96, 0.98 and 0.91 for violent crime, property crime and damage offences respectively (Figure 2). Our synthetic data also performs well when considering CSP estimates, with correlations of 0.83, 0.88 and 0.82. In both cases, there is also some evidence of systematic under-counting in police recorded crime data, at least when violent crime is considered, with the raw counts from the synthetic data larger than the corresponding police recorded crime figures in 86% of PFAs and 88% of CSPs. This is in line with expectations (Pina-Sánchez et al., 2022a), with the difference likely reflecting police decisions to ‘no-crime’ some incidents as well as clerical errors and inconsistencies in recording practice (Her Majesty's Inspectorate of Constabulary, 2014).
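This check reduces to merging area-level sums of the synthetic crimes flagged as known to the police with the corresponding police recorded counts and correlating them. The sketch below uses hypothetical inputs and values purely for illustration.

```r
# Hypothetical area-level inputs: synthetic crimes known to police and
# police recorded crime, both summed to Police Force Area level
synth_pfa <- data.frame(
  pfa   = c("Force A", "Force B", "Force C", "Force D"),
  synth = c(21000, 8000, 9500, 14200)
)
police_pfa <- data.frame(
  pfa    = c("Force A", "Force B", "Force C", "Force D"),
  police = c(19000, 7400, 9100, 14900)
)

comparison <- merge(synth_pfa, police_pfa, by = "pfa")

# Pearson correlation between synthetic and police recorded counts
cor(comparison$synth, comparison$police)

# Share of areas where the synthetic count exceeds the police count,
# an indication of apparent under-counting in police records
mean(comparison$synth > comparison$police)
```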
Perhaps unsurprisingly, there is considerably more noise when we shift the focus to MSOAs. These contain a minimum of 2,000 households and around 7,800 individuals, so are much smaller than CSPs, and we should therefore expect a greater degree of variation between them. Here we observe correlations of 0.4, 0.6 and 0.3 for violent crime, property crime and damage, respectively. The low correlation for damage, in particular, suggests that our synthetic data may replicate the true data generation process less successfully when dealing with incidents of criminal damage. A lower degree of correspondence with police data is, however, also anticipated in this instance, with these incidents typically covering a broad range of different offence types for which there is wide scope for police discretion.[3] For property crime and violent crime, our synthetic data still appears to be reasonably successful.
Comparisons with the CSEW are undertaken at the PFA and CSP levels only, since the CSEW sample size at the MSOA level is too small to be reliable. Selecting a sample from our synthetic population using a sampling design that matches the original data collection plan (a stratified sample of areas followed by sampling of households/individuals; TNS BMRB, 2012), the synthetic sample shows close correspondence to the true CSEW estimates. The synthetic household data performs particularly well, with correlations of 0.92 and 0.75 for property crime and damage. The correlation for violent crime is notably weaker at 0.52. However, this still indicates a reasonable correspondence between our synthetic data and the actual CSEW sample, with the lower correlations, compared to those against police recorded crime data, in part reflecting the presence of sampling error. Correlations at CSP level are noticeably worse, reflecting the comparatively small number of observations available at CSP level in the original sample (average n = 146.6).
To assess the internal reliability of our synthetic crime data (i.e., to examine the consistency of crime rates across multiple samples drawn from the same synthetic population), we select repeated stratified random samples of individuals and households (with geographic areas as strata) and calculate their respective crime rates. We select samples of 10% of the population size in each area. We then use these empirically derived sampling distributions to compute 95% confidence intervals for our crime rate estimates for each area at the PFA, CSP and MSOA levels. Specifically, we follow a bootstrap approach with 100 iterations, selecting 100 repeated stratified random samples and calculating 100 crime rates for each area and crime type combination. Results are visualised in Figure 4, where the black line shows the observed crime rate in our synthetic population, and the shaded grey area is the 95% confidence interval computed via bootstrapping. While crime rates computed at the PFA and CSP levels show a high degree of internal reliability, confidence intervals are substantially wider for MSOAs. In other words, the internal reliability of our synthetic data is better at the PFA and CSP levels than at lower spatial scales.
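For a single area and crime type, the reliability check boils down to the following sketch, with 100 iterations and a 10% sampling fraction as described above; the synthetic counts in `area_crimes` are simulated placeholders.

```r
set.seed(3)

# Placeholder synthetic crime counts for the residents of one area
area_crimes <- rnbinom(8000, size = 0.3, mu = 0.25)

n_iter    <- 100    # bootstrap iterations
frac      <- 0.10   # 10% sample of the area's population
boot_rate <- numeric(n_iter)

for (i in seq_len(n_iter)) {
  # Draw a 10% sample of residents from this area (one stratum)
  idx <- sample(length(area_crimes), size = round(frac * length(area_crimes)))
  # Crime rate per 1,000 sampled residents
  boot_rate[i] <- 1000 * sum(area_crimes[idx]) / length(idx)
}

# Observed rate in the full synthetic area and its bootstrap 95% interval
observed_rate <- 1000 * sum(area_crimes) / length(area_crimes)
ci_95 <- quantile(boot_rate, probs = c(0.025, 0.975))
```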
Synthetic crime data can be used to address a variety of research questions in crime research. Here we apply this framework to gain new insights into the data generation process that produces police recorded crime rates, and the measurement error mechanisms that make recorded rates diverge from the true extent of crime. Patterns of crime under-counting can be explored by considering variables reproduced exclusively in our synthetic data (‘all’ synthetic crimes against synthetic crimes known to police), or by comparing the synthetic data against real-world police recorded crime data. Here, we opt to compare all synthetic crimes against synthetic crimes known to police. Given the results of the empirical evaluation presented in the previous section, we feel confident undertaking these comparisons at the PFA and CSP levels for all three crime types. Understanding the prevalence and form of crime under-counting is crucial: first, to make sense of the true extent of crime; and second, to anticipate the biasing effects that such measurement error could exert on analyses exploring the relationship of crime with other criminological variables of interest (Pina-Sánchez et al., 2022b).
Because the synthetic crime and synthetic crimes-known-to-police estimates can be generated at any spatial scale, it is straightforward to explore whether measurement error depends on spatial resolution. Figure 5 plots the distribution of measurement error at PFA and CSP geographies for all crimes, as well as separately for damage, property crime and violence. Measurement error here is defined as the recording rate (synthetic crimes known to police divided by all synthetic crimes) for each area. We see generally lower recording rates, on average, for damage offences than for property crime and violence. In addition, we see greater variability at the smaller spatial scale (CSP), as well as more variability for violence than for household offences. The former corroborates Buil-Gil et al. (2022), who, following a similar approach, noted that recording rates are likely to vary even more at lower spatial scales, such as OAs and Lower layer Super Output Areas (LSOAs).
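The recording rate is simply the ratio of the two synthetic counts within each area, computed separately by crime type. A sketch with a hypothetical area-level summary:

```r
# Hypothetical area-level summary of the synthetic data
synth_area <- data.frame(
  area         = c("CSP 1", "CSP 2", "CSP 3", "CSP 4"),
  crime_type   = c("damage", "damage", "violence", "violence"),
  all_crimes   = c(1200, 950, 900, 1100),
  known_police = c(480, 420, 430, 560)
)

# Recording rate = synthetic crimes known to police / all synthetic crimes
synth_area$recording_rate <- synth_area$known_police / synth_area$all_crimes

# Distribution of recording rates across areas, by crime type
tapply(synth_area$recording_rate, synth_area$crime_type, summary)
```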
We can also examine what might help explain variations in recording rates across areas. For example, Table 3 reports bivariate correlations between our measures of recording rates and area-level aggregates of worry about crime, perceptions of anti-social behaviour (ASB), perceived police effectiveness, and collective efficacy. These four measures are taken as area-level means of latent scores computed from confirmatory factor analyses of CSEW items. Correlations are reported for PFA and CSP geographies.
Recording rates for damage and property crime tend to be higher in areas with higher levels of collective efficacy, perhaps reflecting the greater tendency of residents of these areas to take crime seriously and intervene (Sampson et al., 1997). Consistent with broken windows theory (Wilson and Kelling, 1982), recording rates for criminal damage tend to be lower in areas characterised by more anti-social behaviour, where such incidents may be a common feature of the environment that is rarely dealt with immediately. Crime recording rates are also lower in areas where people are more worried about crime.
Table 3. Pearson’s correlation coefficient of measurement error (recording rate) by crime type and different criminological constructs
| | | Worry about crime | Perceived ASB | Perceived police effectiveness | Collective efficacy |
| Damage | CSP | -0.28*** | -0.25*** | 0.08 | 0.30*** |
| | PFA | -0.21 | -0.01 | 0.02 | 0.13 |
| Property crime | CSP | -0.13* | -0.09 | -0.00 | 0.22*** |
| | PFA | -0.04 | 0.12 | -0.04 | 0.08 |
| Violence | CSP | -0.13* | -0.09 | -0.01 | -0.05 |
| | PFA | -0.29 | -0.14 | 0.07 | -0.06 |
| All | CSP | -0.24*** | -0.18** | 0.02 | 0.13* |
| | PFA | -0.31* | -0.06 | 0.04 | 0.07 |
*** p < 0.001; ** p < 0.01; * p < 0.05
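The correlations in Table 3 can be computed directly from the area-level recording rates and construct scores; `cor.test()` gives the Pearson coefficient together with the p-value behind each significance star. A sketch with simulated placeholders:

```r
set.seed(4)

# Placeholder CSP-level data: recording rate for damage and the area mean
# of the CSEW-derived collective efficacy score
csp <- data.frame(
  recording_rate_damage = runif(300, 0.2, 0.6),
  collective_efficacy   = rnorm(300)
)

# Pearson correlation with significance test, as reported in Table 3
cor.test(csp$recording_rate_damage, csp$collective_efficacy, method = "pearson")
```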
Despite some recent innovations in the study of measurement error in police recorded crime data, we still know little about how this error varies across spatial scales or with area characteristics. Such knowledge is important because it allows us to undertake more effective measurement error adjustments. Most of the adjustment strategies currently adopted in this field have instead relied on strong simplifying assumptions, such as police under-counting being independent of any other area characteristics considered in the substantive model of interest, or being uniform across geographic scales.
More generally, existing criminological work has also struggled to make use of survey estimates of crime beyond examinations of national trends. Here, we present a novel way forward based on a synthetic population of 56 million residents of England and Wales (and 23 million households), matched to the 2011 UK Census on key attributes also shared by the CSEW. By predicting victimisation (and reporting) propensities for each synthetic resident, we are able to calculate crime counts and recording rates at any spatial scale. Whilst not perfect, our approach successfully recreates crime profiles at PFA and CSP levels, as well as providing reasonable approximations for some crimes at MSOA level. Using this data, we have shown the importance of spatial scale when using police recorded data, with non-systematic measurement error growing as the spatial scale becomes finer; and we have also demonstrated that these errors are not independent of key area features commonly considered in crime research.
With the data and computing landscape continuing to evolve, we believe that this type of computational approach has the potential to complement existing strategies that aim to provide a more critical lens on the measurement of crime and other social phenomena. Therefore, we encourage researchers to take our data and code as a starting point for the development of innovations in synthetic population data.
Biderman, A.D., & Reiss, A.J. (1967). On Exploring the ‘Dark Figure’ of Crime. The ANNALS of the American Academy of Political and Social Science, 374(1), 1–15.
Boivin, R., & Cordeau, G. (2011). Measuring the Impact of Police Discretion on Official Crime Statistics: A Research Note. Police Quarterly, 14(2), 186–203.
Brantingham, P.L., & Brantingham, P.J. (2004). Computer Simulation as a Tool for Environmental Criminologists. Security Journal, 17, 21-30.
Buil-Gil, D., Medina, J., & Shlomo, N. (2021). Measuring the Dark Figure of Crime in Geographic Areas. Small Area Estimation from the Crime Survey for England and Wales. The British Journal of Criminology, 61(2), 364-388.
Buil-Gil, D., Moretti, A., & Langton, S.H. (2022). The Accuracy of Crime Statistics: Assessing the Impact of Police Data Bias on Crime Mapping. Journal of Experimental Criminology, 18, 515-541.
Burrows, J., Tarling, R., Mackie, A., Lewis, R., & Taylor, G. (2000). Review of Police Forces’ Crime Recording Practices. London: Home Office.
Cernat, A., Buil-Gil, D., Brunton-Smith, I., Pina-Sánchez, J., & Murrià-Sangenís, M. (2022). Estimating Crime in Place: Moving Beyond Residence Location. Crime & Delinquency, 68(11), 2061–2091.
Demirtas, H., Amatya, A., & Doganay, B. (2014). BinNor: An R Package for Concurrent Generation of Binary and Normal Data. Communications in Statistics - Simulation and Computation, 43(3), 569-579.
El Emam, K., Mosquera, L., & Hoptroff, R. (2020). Practical Synthetic Data Generation: Balancing Privacy and the Broad Availability of Data. O’Reilly.
Elliot, M., Mackey, E., & O’Hara, K. (2020). The Anonymisation Decision-Making Framework 2nd Edition: European Practitioners’ Guide. Manchester: UKAN.
Eterno, J. A., Verma, A., & Silverman, E.B. (2016). Police Manipulations of Crime Reporting: Insiders’ Revelations. Justice Quarterly, 33(5), 811-835.
Goudriaan, H., Wittebrood, K., & Nieuwbeerta, P. (2006). Neighbourhood Characteristics and Reporting Crime: Effects of Social Cohesion, Confidence in Police Effectiveness and Socio-Economic Disadvantage. The British Journal of Criminology, 46, 719–742.
Groff, E.R., & Mazerolle, L. (2008). Simulated Experiments and Their Potential Role in Criminology and Criminal Justice. Journal of Experimental Criminology, 4, 187.
Hart, T., & Rennison, C. (2003). Reporting Crime to the Police, 1992–2000. Special Report, Bureau of Justice Statistics.
Her Majesty's Inspectorate of Constabulary (2014). Crime-Recording: Making the Victim Count—The Final Report of an Inspection of Crime Data Integrity in Police Forces in England and Wales. Justice Inspectorates. Retrieved from: https://www.justiceinspectorates.gov.uk/hmicfrs/wp-content/uploads/crime-recording-making-the-victim-count.pdf
Hipp, J.R., & Williams, S.A. (2021). Accounting for Meso- or Micro-Level Effects When Estimating Models Using City-Level Crime Data: Introducing a Novel Imputation Technique. Journal of Quantitative Criminology, 37, 915-951.
Kemp, S., Buil-Gil, D., Miró-Llinares, F., & Lord, N. (2021). When Do Businesses Report Cybercrime? Findings from a UK Study. Criminology & Criminal Justice, 0(0).
Klinger, D.A., & Bridges, G.S. (1997). Measurement Error in Calls-for-Services as an Indicator of Crime. Criminology, 35(4), 705–726.
Liu, L., & Eck, J. (Eds.) (2008). Artificial Crime Analysis Systems. Using Computer Simulations and Geographic Information Systems. Hershey: IGI Global.
Lohr, S.L. (2019). Measuring Crime: Behind the Statistics. Boca Raton: CRC.
Lynch, J.P., & Addington, L.A. (Eds.) (2007). Understanding Crime Statistics: Revisiting the Divergence of the NCVS and UCR. New York: Cambridge University Press.
Pina-Sánchez, J., Buil-Gil, D., Brunton-Smith, I., & Cernat, A. (2022a). The Impact of Measurement Error in Regression Models Using Police Recorded Crime Rates. Journal of Quantitative Criminology.
Pina-Sánchez, J., Brunton-Smith, I., Buil-Gil, D., & Cernat, A. (2022b). rcme: A Sensitivity Analysis Tool to Explore the Impact of Measurement Error in Police Recorded Crime Rates. SocArXiv.
Sampson, R.J., Raudenbush, S.W., & Earls, F. (1997). Neighborhoods and Violent Crime: A Multilevel Study of Collective Efficacy. Science, 277(5328), 918-924.
Skogan, W.G. (1977). Dimensions of the Dark Figure of Unreported Crime. Crime & Delinquency, 23(1), 41–50.
Tarling, R., & Morris, K. (2010). Reporting Crime to the Police. The British Journal of Criminology, 50(3), 474–490.
Taylor, N. (2002). Under-Reporting Of Crime Against Small Businesses: Attitudes Toward Police And Reporting Practices. Policing and Society, 13(1), 79-89.
TNS BMRB. (2012). The 2011/12 Crime Survey for England and Wales. Technical Report Volume One. Retrieved from: http://doc.ukdataservice.ac.uk/doc/7252/mrdoc/pdf/7252_csew_2011-2012_technicalreport.pdf
Townsley, M., & Birks, D.J. (2008). Building Better Crime Simulations: Systematic Replication and the Introduction of Incremental Complexity. Journal of Experimental Criminology, 4, 309-333.
Wilson, J.Q., & Kelling, G.L. (1982). Broken Windows: The Police and Neighborhood Safety. Atlantic Monthly, 249(3), 29-38.
Xie, M., & Baumer, E.P. (2019). Crime Victims’ Decisions to Call the Police: Past Research and New Directions. Annual Review of Criminology, 2, 217-240.
[1] This framework was, in part, originally developed by Buil-Gil et al. (2022), but it is expanded here in several significant ways: first, we consider the multivariate relationships between demographic variables in our synthetic population generation; second, we generate synthetic data for all areas in England and Wales, instead of just one city; and third, we generate synthetic data not only for individuals but also households.
[2] As far as possible, our police recorded crime data is restricted to those offence categories covering domestic properties and individual victims. Police recorded crime data was accessed directly from www.data.police.uk.
[3] The greater variation at MSOA level may also be accounted for by differences in environmental features and opportunity structures which have not been considered here. Moreover, we might expect differences between where victims live and where they are victimised to become increasingly apparent at MSOA level, with the CSEW data focusing on victim residence and police recorded crime on offence location (Cernat et al., 2022).