Police administrative data systems are an increasingly accessible and varied source of crime data. This chapter outlines the many human and technological intricacies of working with police crime data as a researcher–which are often learnt as tricks of the trade over many years. To reveal this ‘hidden curriculum’, we draw from our experiences of working with and within police organizations in England and New Zealand researching crime and policing. We describe the potential for police data to be used in crime research, highlighting opportunities presented by previously under-utilized sources. We then cover the pitfalls that researchers new to this area should be forewarned about when acquiring, cleaning, analyzing and interpreting police crime data. These arise from the design of data recording systems, (lack of) data completeness and accuracy, and the many pains—technical and bureaucratic—involved in accessing and understanding different data sources. Throughout we weave advice to circumvent the trials and tribulations that come with obtaining and using police crime data in research.
Police organizations are a valuable source of crime data. With increasing digitization of records tracking all aspects of police work, police hold an increasing variety of data that can yield insight into crime and the people who commit it or suffer its harm. But, as with any administrative data not collected for research purposes, police data present many ‘traps for new players’ into which researchers unfamiliar with those data-generating processes can fall. Learning how to acquire, analyze and interpret police data without falling into these traps is therefore important for any researcher wanting to leverage the potential these data offer.
Some police data traps are common to police organizations in general; some reflect the idiosyncrasies of the systems or processes of a given organization. Knowledge of these traps–and the means to avoid them–can therefore develop over years of working in or with a given police organization, and the knowledge often sits in the heads of a few go-to subject matter experts (SMEs). This ‘hidden curriculum’ of police data–analogous to the ‘hidden curriculum’ in academia (Bergenhenegouwen, 1987)–is the topic of this chapter. We first highlight the range of data collected by contemporary police organizations that has potential to answer research questions about crime, criminality and victimization. We then discuss methods and tips for acquiring, checking and cleaning, and analyzing and interpreting police crime data. We focus on strategies for identifying and mitigating the ‘traps’ we’ve encountered in our careers as crime analysts and researchers and point the reader to data- and method-specific resources for more detail.
Sources of police crime data are many and varied; knowing about this variation may help guide researchers’ expectations and questions when acquiring police data. The ‘usual suspects’ for data are well known. These include calls for service (e.g., Laufs et al., 2021; Ratcliffe, 2021), crime records (e.g., Birks et al., 2020; Chainey et al., 2019; Haberman et al., 2021), offender proceedings data such as detections, arrests, apprehensions and charges (e.g., Payne, 2007; Piatkowska & Camacho, 2022; Strom & Planty, 2022) and traffic or pedestrian stops (e.g., Pierson et al., 2020; Tiratelli et al., 2018). Police typically record the time and place the event occurred, a code capturing the general nature of the event (e.g., crime type), some free text details, and the identities (if known) of the people involved.
However, the details captured can vary between systems and jurisdictions. For example, calls for service records in Computer Aided Dispatch (CAD) databases include the name and phone number of the caller, but not necessarily details of other parties involved in the event. This practice reflects the purpose of CAD systems: to enable rapid capture of information crucial to guiding the initial response to the call. More structured and detailed records are usually entered in a case management database post-response to capture the investigation phase. However, not all calls for service flow through into case records, and not all case records have a corresponding call for service (when reported through other channels). Different legislative and policy contexts governing police activities affect what activities are recorded. For example, United Kingdom (UK) legislation mandates recording ‘stop and search’ so databases exist to capture these (College of Policing, 2020). In New Zealand (NZ) and the UK, offenders are often not arrested and taken into custody (e.g., issued with a warning, or charged and summoned to court) so arrests are poorer indicators of cases being ‘cleared’ or solved, or of recidivism, than in countries such as the US where arrests are more common (compare for example, Baughman, 2020; Pearson et al., 2018). Other jurisdictional variations include the extent to which suspect descriptions and modus operandi are recorded in structured fields that facilitate analysis (Davies & Woodhams, 2019; Fox et al., 2020).
Other emerging or traditionally under-utilized police data sources can expose where, when, why, how, by whom and to whom crime is committed. Intelligence records1 can be useful for analyzing criminal networks (Bichler, 2019). Databases or spreadsheets tracking high risk/priority individuals (e.g., prolific offenders, gang members) or locations (e.g., drug dealing or manufacturing premises) are also illuminative (Breetzke et al., 2022; Gerell, 2018; Herold & Eck, 2020; Morgan et al., 2020). Other examples include: offender activity location data for studying offenders’ crime location choices (Curtis-Ham et al., 2021, 2022), forensic records (DNA) for comparing solved and unsolved cases (Lammers, 2014), and CCTV and body-worn camera (BWC) footage for understanding the situational dynamics of criminal–and police–behavior (Chillar et al., 2021; Lindegaard et al., 2022).2
When scoping research projects that might use police data, therefore, it pays to understand what might (or might not) be recorded by local police in a given study area. Such information can often be found online3, in method sections of papers or reports using their data, or by talking to SMEs in the organization.
Acquiring police data can sometimes be the hardest hurdle to traverse. One way to acquire police data is through research partnerships with police or being commissioned by them to conduct research. Data provision under these circumstances is often governed by a legal contract for services. Working in partnership enables access to practitioners and data SMEs but comes with challenges: partnerships take time to develop, and there may be constraints imposed on the research topic and design. Academic discussion of and guidance for navigating partnerships is plentiful (Crawford, 2020; Huey & Mitchell, 2018; Tompson et al., 2017).
Outside formal partnerships, police crime data might be accessible through open data sources, research data applications, or OIA/FOIA requests. Open data sources should always be your first port of call. Many organizations publish at least some crime data on their websites,4 or report it to a central government agency which publishes it.5 Several repositories of open crime data from the US, pre-cleaned for research use, exist (Ashby, 2019; Kaplan, 2022a, 2022b). Internationally, the United Nations Office for Drugs and Crime6 and Eurostat7 (European Union) compile data covering multiple countries.
Published data will often be aggregated, but anonymized (disaggregated) data is increasingly available. To maintain people’s anonymity, in aggregate data small numbers are often adjusted (e.g., rounded up or down) or ‘suppressed’ (not reported), and the spatial coordinates are often similarly adjusted at record level (e.g., rounded to the nearest 100 meters or ‘snapped’ to a street segment; see Tompson et al., 2015). Typically, the more jurisdictions covered by the dataset, the more aggregated and less detailed the data will be.
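To make the adjustment and suppression practices described above concrete, the sketch below shows one way such disclosure control might be applied before publication. The threshold (suppress counts under 3) and rounding base (nearest 5) are hypothetical examples; real agencies set their own rules.

```python
# Illustrative small-number adjustment for publishing aggregate counts.
# The suppression threshold (3) and rounding base (5) are hypothetical;
# actual disclosure-control rules vary by agency.
import pandas as pd

counts = pd.DataFrame({
    "area": ["A", "B", "C", "D"],
    "burglaries": [27, 2, 14, 0],
})

def adjust(n, suppress_below=3, round_to=5):
    if 0 < n < suppress_below:
        return None  # suppressed: too few cases to publish safely
    return int(round(n / round_to) * round_to)

counts["published"] = counts["burglaries"].apply(adjust)
print(counts)
```

A practical implication for researchers: summing such published figures across areas will not reproduce the true total, so treat open aggregate data as approximate.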
Alternatively, the police organization may have a specific process for requesting research data.8 If this information isn’t available on their website, it’s worth contacting their statistics or research team to ask if there’s a research governance process. If there is, you will likely have to complete an application much like an academic research ethics committee application.9 In our experience completing and reviewing such applications it pays to:
Be as detailed as possible about what you are seeking and why.
Identify the benefits to the organization of the research. Who are the key stakeholders? Do they support the research? Does the research align with the organization’s research agenda10 or strategic priorities?
Acknowledge the costs to the organization. These include any time staff will spend advising, responding to surveys, being interviewed, or extracting and cleaning data.
Identify risks to privacy and data security and how you will mitigate them (there may be specific rules about how data can be stored outside the police organization)11.
Address ethics considerations, including informed consent and confidentiality. For example, if the data includes personal or personally identifiable information12 how will confidentiality be preserved?
Another option for obtaining data is via OIA or FOIA (Official Information Act/Freedom of Information Act) request–or your country/state equivalent. This method is geared to journalists or members of the public requiring information about specific cases or high-level statistics, rather than to researchers requesting reams of data. It can therefore be hit and miss, especially given there is no opportunity for dialogue with the data providers. See Güss et al. (2020) and Clifton-Sprig et al. (2020) for guidance on this option.
Regardless of how you acquire police data, the chances of getting the right data to answer the research questions are higher the more you know about the data from the outset. Get as much information as possible about the data from open sources, then ask for internal data documentation (e.g., data dictionaries, data quality reports), and talk to a range of SMEs–analysts, researchers or data scientists who are used to wrangling the data at the ‘back end’. It can take years to truly understand any given organization’s data–not just what’s available but common data entry errors, appropriate counting rules, and changes in codes and recording practices over time. So discussing with SMEs the questions you want the data to answer will help identify what’s feasible and how to articulate the data specifications clearly.
Any data you acquire will inevitably require some munging to render it analysis-ready. Police data are (hu)man-made (Tompson & Ashby, 2023), which means they are fallible and/or subject to the structures set up by the organization for operational purposes. Data recording infrastructure in police organizations can be complex, bespoke, and subject to retrofitting as crime data recording evolves. At times data storage systems change, resulting in odd trends and bad media takes. Consequently, systems designed for operational purposes, rather than research, can end up resembling a patchy Frankenstein’s monster that you somehow must make sense of. Data recording is often done at the end of a shift or in moments of heightened stress, and even at other times can be perceived as burdensome (Huey et al., 2022). Hence, police motivation to record timely, accurate and precise data is variable (Terpstra & Kort, 2017). Every organization has its own idiosyncrasies but here we list some common issues to check for–and what to do about them.
An important first check is the completeness of the data fields. Some police data fields will be mandatory, others are discretionary. Mandatory fields may be structured, with drop-down menus to select from, but these can also be free-text fields (e.g., name). Even within these mandatory fields data can be incomplete by taking the form of ‘unknown’ or blank (unselected) answers. Incomplete data are most problematic when they are systematically missing. For example, if race is missing from events that involve people of color, this can introduce serious levels of sampling bias into the data. Assessing whether incomplete data are random or patterned is prudent, as this will inform whether you discard those cases from analysis or impute data (see Chapter 4 of Tabachnick & Fidell, 2007 for guidance).
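The completeness checks above can be sketched in a few lines of pandas: compute the missing rate per field, then compare missingness across subgroups to spot patterning. The field names and values are hypothetical.

```python
# Sketch: quantify field completeness and check whether missingness is
# patterned rather than random. Field names are hypothetical examples.
import pandas as pd

crimes = pd.DataFrame({
    "crime_type": ["burglary", "assault", "assault",
                   "burglary", "assault", "burglary"],
    "suspect_ethnicity": [None, "group_a", None,
                          "group_b", None, "group_a"],
})

# Overall completeness per field
missing_rate = crimes.isna().mean()

# Missingness by subgroup: a large gap between groups suggests
# systematic (non-random) missingness
missing_by_type = (crimes["suspect_ethnicity"].isna()
                   .groupby(crimes["crime_type"]).mean())
print(missing_rate)
print(missing_by_type)
```

If missingness differs markedly across groups, formal tests and imputation diagnostics (per Tabachnick & Fidell, 2007) are warranted before deciding whether to discard or impute.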
(In)accuracies to check for can take several forms. The range of scenarios police crime data cover cannot be fully covered by standardized forms, so free text fields, with more room for inconsistency and error, are common. Such unstructured data can be difficult to aggregate or cross-reference across datasets, necessitating imposing structure (i.e., coding) before analyzing. Ideally this would follow research principles with a coding scheme and inter-rater reliability tests. Automating coding is sometimes possible, to save time (Birks et al., 2020; Kuang et al., 2017).
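As a minimal illustration of imposing structure on free text, the sketch below codes hypothetical modus operandi notes with simple keyword rules. A real project would develop the coding scheme iteratively and validate it with inter-rater reliability checks, or use the automated approaches cited above.

```python
# Sketch: impose simple structure on free-text notes with keyword rules.
# The notes, keywords and labels are hypothetical examples.
import pandas as pd

notes = pd.Series([
    "Offender forced rear window to gain entry",
    "Entry via unlocked front door",
    "Smashed window at rear of property",
])

rules = {
    "window_entry": r"window",
    "door_entry": r"door",
}

# One boolean column per coded category
coded = pd.DataFrame({label: notes.str.contains(pattern, case=False)
                      for label, pattern in rules.items()})
print(coded)
```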
For structured fields, accuracy checks should look for anomalies: anything that looks odd–for instance, outliers and data incongruous with the variable name–should be investigated and removed if unresolvable. For example, with unique identifiers, checking the length of the string of characters or digits can expose data entry errors. Similarly, birth and event dates can reveal improbably young or old offenders, or events yet to occur. Structured fields can alert you to duplicate records, which are common in many data sets.
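The anomaly checks just described, malformed identifiers, improbable dates and duplicate records, can be automated as simple filters. The 10-character ID format and field names below are hypothetical.

```python
# Sketch of basic anomaly checks on structured fields. The 10-character
# case_id format and all field names are hypothetical examples.
import pandas as pd

records = pd.DataFrame({
    "case_id": ["C000000001", "C0002", "C000000003", "C000000001"],
    "date_of_birth": pd.to_datetime(
        ["1990-05-01", "2030-01-01", "1875-03-15", "1990-05-01"]),
    "event_date": pd.to_datetime(
        ["2022-01-10", "2022-02-03", "2022-03-20", "2022-01-10"]),
})

# IDs of the wrong length often indicate data entry errors
bad_ids = records[records["case_id"].str.len() != 10]

# Births after the event, or improbably old offenders
future_births = records[records["date_of_birth"] > records["event_date"]]
age_years = (records["event_date"] - records["date_of_birth"]).dt.days / 365.25
improbable_age = records[age_years > 110]

# Fully duplicated rows
duplicates = records[records.duplicated()]
```

Flagged records should be investigated (ideally with the data provider) before being corrected or removed.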
Accuracy is particularly important in crime research that uses the date/time (e.g., the timestamp) and/or location data fields. There are usually multiple date and time fields captured in crime data. These can represent when the offense was reported, the earliest and latest time the offense could have occurred and key points in the investigation (such as disposal date). Requesting fields appropriate to the research question is crucial. Crime research typically uses the earliest and latest time the offense could have occurred, but these can be imprecise due to limitations in victim recall. Examining the time span between these dates can highlight records where data entry errors are probable. The time span itself can vary: long time spans can undermine analyses that are done at the day or sub-day unit of analysis because neither the earliest nor latest times best indicate when the event occurred. See Ashby & Bowers (2013), Boldt & Borg (2016) and Ratcliffe (2000) for strategies for managing this temporal imprecision.
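A first-pass screen for this temporal imprecision is to compute the span between the earliest and latest possible offense times and flag records too imprecise for the intended unit of analysis. The field names and the 24-hour threshold below are hypothetical; the aoristic methods cited above handle the flagged records more rigorously.

```python
# Sketch: compute offense time spans and flag records too imprecise for
# sub-daily analysis. Field names and the 24-hour cutoff are hypothetical.
import pandas as pd

crimes = pd.DataFrame({
    "earliest": pd.to_datetime(
        ["2022-06-01 22:00", "2022-06-03 09:00", "2022-05-20 08:00"]),
    "latest": pd.to_datetime(
        ["2022-06-02 07:30", "2022-06-03 09:30", "2022-05-27 18:00"]),
})

crimes["span_hours"] = (
    (crimes["latest"] - crimes["earliest"]).dt.total_seconds() / 3600)

# Negative spans indicate probable data entry errors; very long spans
# undermine hour-of-day or day-of-week analyses
crimes["too_imprecise"] = crimes["span_hours"] > 24
print(crimes)
```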
Locational accuracy presents another challenge. Locations will have often been geocoded using a gazetteer: a database of the coordinates of known locations. Or you may need to manually geocode the data to convert addresses to geographic coordinates to enable spatial analysis (Chainey & Ratcliffe, 2005). Either way, it pays to check the geocoding rate––because if many records have no coordinates then the data may be unreliable (Andresen et al., 2020; Briz-Redón et al., 2019). Further, addresses/coordinates can represent imprecise locations and should be treated with caution in analyses. For example, oftentimes the public’s awareness of the crime’s location is fuzzy (e.g., “I heard screaming coming from outside my apartment”) and/or the entered location only represents one location involved in the crime (e.g., a brawl that starts outside a bar, but continues down the street). Sometimes crimes are recorded to default locations, such as police stations, or ‘default points’ such as a parking lot near a beach when locations are ‘non-addressables’ (Tompson et al., 2015). Other times the location is simply unknown. If research requires precise location data then it is worth checking for commonly used ‘defaults’ in the data and/or taking a dip sample of records to check how often the recorded address–if there is one–matches the geocoded location on a map.
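Both checks suggested above, the geocoding rate and commonly used 'default' points, are quick to compute. The sketch below uses hypothetical projected coordinates and field names: coordinates shared by an unusually large number of records are candidates for stations or default points worth inspecting on a map.

```python
# Sketch: check the geocoding rate and look for suspiciously common
# coordinates. Coordinate values and field names are hypothetical.
import pandas as pd

crimes = pd.DataFrame({
    "x": [1752340.0, 1752340.0, 1753891.0, None, 1752340.0],
    "y": [5920012.0, 5920012.0, 5921455.0, None, 5920012.0],
})

# Proportion of records with coordinates at all
geocode_rate = crimes["x"].notna().mean()

# Coordinates shared by many records may be defaults (e.g., a station)
common_points = (crimes.dropna()
                 .groupby(["x", "y"]).size()
                 .sort_values(ascending=False))
print(f"Geocoded: {geocode_rate:.0%}")
print(common_points.head())
```

Low geocoding rates (see Andresen et al., 2020; Briz-Redón et al., 2019 for thresholds) or heavy concentration at one point both warrant investigation before spatial analysis.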
Rigorous research demands that data checking and cleaning is documented for transparency purposes. Keeping a record of what variables are checked, what is discovered, and how issues are resolved is not just necessary for quality research, but saves considerable self-chastisement later. Researchers working in areas with underdeveloped data sometimes report on data completeness in research publications (Cockbain & Bowers, 2019). Walter and Drawve (2018) provide additional tips on cleaning US police crime data while Eterno and colleagues (2022) cover other countries' data recording practices. This cleaning stage should also involve a two-way dialogue with the data providers: alerting them to errors and anomalies can assist them to refine their data processing policies.
The fact that police do not typically record crime data for research purposes also impacts how researchers should analyze it and interpret the results. Analytic methods are so diverse we cannot cover specifics; instead, we highlight a few general issues to consider. First, ‘counting rules’: what does one record/row in the data represent? Is there one record/row per crime, or one per victim per crime? Is there one record/row for each separate crime that occurred during an event (e.g., robberies often also involve assaults; a theft of a car from a driveway can be both a burglary and a vehicle theft), or one for only the most serious crime within the event13? If you have aggregate data, check whether its counting rule matches your research question. If there’s a mismatch, consider how it may play out in your results (and thus what to caveat in your limitations section!). If you have disaggregated data, think carefully about the best counting rule for your question.
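The difference counting rules make can be shown with a toy dataset. Below, each row is one victim per offense per event (field names hypothetical); the same records yield three different counts depending on the rule applied.

```python
# Sketch: the same incident data counted under three different rules.
# Field names and the seriousness scale are hypothetical examples.
import pandas as pd

rows = pd.DataFrame({
    "event_id": ["E1", "E1", "E1", "E2"],
    "offense": ["robbery", "robbery", "assault", "burglary"],
    "victim_id": ["V1", "V2", "V1", "V3"],
    "seriousness": [3, 3, 2, 1],  # higher = more serious
})

# Rule 1: one count per victim per offense
victim_count = len(rows)

# Rule 2: one count per distinct offense per event
offense_count = len(rows.drop_duplicates(["event_id", "offense"]))

# Rule 3: 'most serious offense' rule - one count per event
most_serious = (rows.sort_values("seriousness", ascending=False)
                .drop_duplicates("event_id"))
event_count = len(most_serious)
```

Here one event (E1) contributes three, two or one count(s) depending on the rule, so mismatched rules between datasets, or between data and research question, can silently distort results.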
Another consideration is the unit of analysis. This can be an issue in spatial and temporal analysis, whereby analysis at higher order units of analysis (e.g., neighborhoods) can mask important variation that occurs at smaller scales (e.g., city blocks; Andresen, 2013). It also applies to crime types: lumping distinct behavioral phenomena together under broad crime categories can mask countervailing trends that ‘cancel out’, generating misleading results (Andresen & Linning, 2012). When crime classifications do not reliably group similar behaviors together, some manual coding may be necessary. For example, stalking can be classified variously in the UK as harassment, malicious communications or domestic violence, and even robbery and burglary can take distinct forms that may warrant separate analysis (Fox & Farrington, 2012; Haberman et al., 2021).
When interpreting results, it’s particularly important to consider how bias in what’s reported to and recorded by police can affect the results. For example, police are more likely to have data about certain places or people, potentially producing inflated associations between variables (if the data production processes are confounded with your variables of interest), or results that do not generalize to underrepresented places or populations. For instance, different jurisdictions (and even police beats within jurisdictions) may differ in their interpretation of how to classify crime (e.g., hate crime) or discretionary police activities (e.g., stops). Any resulting data patterns may be artifacts of police practices more than the phenomenon being studied. Hence, care must be taken not to over-extrapolate to all crime (Buil-Gil et al., 2021) or places. Furthermore, the data represent only a snapshot of police recorded crime. For example, recently extracted data can undercount crimes only reported after a delay, and initial crime classifications may not reflect the final classification after investigation (Simpson & Orosco, 2021).
At the interpretation stage, we have found it helpful to again engage operational and data SMEs who may be able to explain odd patterns. For example, big step changes in a time series are more likely to reflect changes in recording (e.g., new crime codes, reporting mechanisms or priorities) than changes in criminal behavior (which usually appear as gradual change, or temporary fluctuations). SMEs can help to decipher such results, and to tease out what they mean for practice–an important element of any discussion section.
Given the increased potential for police data to be used in crime research, there is also an increased need for researchers to be aware of their traps, and be equipped to navigate them. This chapter provided tips and tricks for acquiring, cleaning, analyzing and interpreting these data that we wish we had known earlier in our careers. Future researchers forewarned and forearmed about these issues can ensure their research is faithful to the underlying processes that generate those data, thus increasing the practical relevance and quality of crime research.
Andresen, M. A., Malleson, N., Steenbeek, W., Townsley, M., & Vandeviver, C. (2020). Minimum geocoding match rates: An international study of the impact of data and areal unit sizes. International Journal of Geographical Information Science, 34(7), 1306–1322. https://doi.org/10.1080/13658816.2020.1725015
Breetzke, G. D., Curtis-Ham, S., Gilbert, J., & Tibby, C. (2022). Gang membership and gang crime in New Zealand: A national study identifying spatial risk factors. Criminal Justice and Behavior, 49(8), 1154–1172. https://doi.org/10.1177/00938548211034200
Briz-Redón, Á., Martinez-Ruiz, F., & Montes, F. (2019). Re-estimating a minimum acceptable geocoding hit rate for conducting a spatial analysis. International Journal of Geographical Information Science, 34(7), 1283–1305. https://doi.org/10.1080/13658816.2019.1703994
Chainey, S. P., Pezzuchi, G., Guerrero Rojas, N. O., Hernandez Ramirez, J. L., Monteiro, J., & Rosas Valdez, E. (2019). Crime concentration at micro-places in Latin America. Crime Science, 8(1), 5. https://doi.org/10.1186/s40163-019-0100-5
Chillar, V., Piza, E., & Sytsma, V. (2021). Conducting a systematic social observation of body-camera footage: Methodological and practical insights. Journal of Qualitative Criminal Justice & Criminology. https://doi.org/10.21428/88de04a1.6642b3cd
Cockbain, E., & Bowers, K. (2019). Human trafficking for sex, labour and domestic servitude: How do key trafficking types compare and what are their predictors? Crime, Law and Social Change, 72(1), 9–34. https://doi.org/10.1007/s10611-019-09836-7
Crawford, A. (2020). Effecting change in policing through police/academic partnerships: The challenges of (and for) co-production. In N. Fielding (Ed.), Critical reflections on evidence-based policing. Routledge. https://www.taylorfrancis.com/chapters/edit/10.4324/9780429488153-10/effecting-change-policing-police-academic-partnerships-adam-crawford
Curtis-Ham, S., Bernasco, W., Medvedev, O. N., & Polaschek, D. L. L. (2021). A national examination of the spatial extent and similarity of offenders’ activity spaces using police data. ISPRS International Journal of Geo-Information, 10(2), Article 2. https://doi.org/10.3390/ijgi10020047
Curtis-Ham, S., Bernasco, W., Medvedev, O. N., & Polaschek, D. L. L. (2022). Relationships between offenders’ crime locations and different prior activity locations as recorded in police data. Journal of Police and Criminal Psychology. https://doi.org/10.1007/s11896-022-09540-8
Fox, B. H., & Farrington, D. P. (2012). Creating burglary profiles using latent class analysis: A new approach to offender profiling. Criminal Justice and Behavior, 39(12), 1582–1611. https://doi.org/10.1177/0093854812457921
Haberman, C. P., Clutter, J. E., & Lee, H. (2021). A robbery is a robbery is a robbery? Exploring crime specificity in official police incident data. Police Practice & Research. https://doi.org/10.1080/15614263.2021.2009345
Lammers, M. (2014). Are arrested and non-arrested serial offenders different? A test of spatial offending patterns using DNA found at crime scenes. Journal of Research in Crime and Delinquency, 51(2), 143–167. https://doi.org/10.1177/0022427813504097
Laufs, J., Bowers, K., Birks, D., & Johnson, S. D. (2021). Understanding the concept of ‘demand’ in policing: A scoping review and resulting implications for demand management. Policing and Society, 31(8), 895–918. https://doi.org/10.1080/10439463.2020.1791862
Lindegaard, M. R., Liebst, L. S., Philpot, R., Levine, M., & Bernasco, W. (2022). Does danger level affect bystander intervention in real-life conflicts? Evidence from CCTV footage. Social Psychological and Personality Science, 13(4), 795–802. https://doi.org/10.1177/19485506211042683
Morgan, A., Dowling, C., & Voce, I. (2020). Australian outlaw motorcycle gang involvement in violent and organised crime (No. 586; Trends & Issues in Crime and Criminal Justice). Australian Institute of Criminology. https://www.aic.gov.au/publications/tandi/tandi637
Pearson, G., Rowe, M., & Turner, L. (2018). Policy, practicalities, and PACE s. 24: The subsuming of the necessity criteria in arrest decision making by frontline police officers. Journal of Law and Society, 45(2), 282–308. https://doi.org/10.1111/jols.12087
Piatkowska, S. J., & Camacho, J. (2022). Foreign-born arrestees and recidivism: A multilevel analysis of arrest data from a Florida county Sheriff’s office. Crime, Law and Social Change, 77(5), 479–501. https://doi.org/10.1007/s10611-021-10005-y
Pierson, E., Simoiu, C., Overgoor, J., Corbett-Davies, S., Jenson, D., Shoemaker, A., Ramachandran, V., Barghouty, P., Phillips, C., Shroff, R., & Goel, S. (2020). A large-scale analysis of racial disparities in police stops across the United States. Nature Human Behaviour, 4(7), Article 7. https://doi.org/10.1038/s41562-020-0858-1
Ratcliffe, J. H. (2000). Aoristic analysis: The spatial interpretation of unspecific temporal events. International Journal of Geographical Information Science, 14(7), 669–679. https://doi.org/10.1080/136588100424963
Simpson, R., & Orosco, C. (2021). Re-assessing measurement error in police calls for service: Classifications of events by dispatchers and officers. PLOS ONE, 16(12), e0260365. https://doi.org/10.1371/journal.pone.0260365
Strom, K. J., & Planty, M. (2022). Using National Incident-Based Reporting System data to assess agency differences in clearance rates: A recommendation for law enforcement. Police Practice and Research, 23(4), 444–457. https://doi.org/10.1080/15614263.2021.2022480
Tiratelli, M., Quinton, P., & Bradford, B. (2018). Does stop and search deter crime? Evidence from ten years of London-wide data. The British Journal of Criminology, 58(5), 1212–1231. https://doi.org/10.1093/bjc/azx085
Tompson, L., Johnson, S., Ashby, M. P. J., Perkins, C., & Edwards, P. (2015). UK open source crime data: Accuracy and possibilities for research. Cartography and Geographic Information Science, 42(2), 97–111. https://doi.org/10.1080/15230406.2014.972456
Examples of open-source international crime data

- Includes police and crime data for a number of African countries, including South Africa and Nigeria.
- Includes recorded crime data: offenders, victims and court data.
- Dashboards available in Power BI.
- A multitude of data tables covering various crime topics.
- Official crime statistics for European Union members (Austria, Belgium, Bulgaria, Croatia, Cyprus, Czechia, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Ireland, Italy, Latvia, Lithuania, Luxembourg, Malta, Netherlands, Poland, Portugal, Romania, Slovakia, Slovenia, Spain, Sweden) and European Free Trade Association members (Iceland, Liechtenstein, Norway, Switzerland).
- Reports in English; aggregated data are downloadable. Enter ‘crime’ into the search bar (or click through the ‘browse datasets’ icons and select ‘law and security’) to reveal crime-related data.
- Reports in English; raw data can be found in Japanese by clicking through on the NPA website or going straight to e-Stat.
- Statistics for Mexico, including crime statistics. Raw data are in Spanish.
- Datasets including victim, offender and proceedings data.
- Some public safety data available here.
- SAPS is the South African Police Service; Crime Hub was launched by the Institute for Security Studies (ISS) and uses SAPS data to create maps, tools and analysis.
- The first link covers England, Wales and Northern Ireland, providing data on crime and anti-social behaviour; the second link covers Scottish crime data.
- United Nations Office on Drugs and Crime: provides global statistics on crime, criminal justice, and drug use and trafficking.
- Provides a number of crime and policing data sets available to download in various formats.
- United States (Archive of Criminal Justice Data): contains a number of statistical data series.