
Diversifying crime datasets in introductory statistical courses in criminology

Buil-Gil, D., Bui, L., Trajtenberg, N., Diviak, T., Kim, E., & Solymosi, R. (2024). Diversifying crime datasets in introductory statistical courses in criminology. Journal of Criminal Justice Education, pp. 1–27. https://doi.org/10.1080/10511253.2024.2334706

Published on Feb 16, 2024

Abstract

Contemporary criminology issues are increasingly global, cross-cultural, and multilingual. Moreover, students from different cultural and national backgrounds will need to apply data analytics in their respective contexts. Crime data used in statistical courses should reflect this diversity, and in turn enhance the equality and inclusivity of the teaching curriculum. Supported by evidence-based pedagogic principles and evaluations, researchers have identified strategies to enhance the teaching and learning of quantitative skills. Promoting students’ understanding of quantitative methods and their application in criminology requires that teaching materials reflect real-world problems and the diversity of today’s student population. To facilitate this aim, the article first describes over forty open and accessible crime data sources across political, cultural, and linguistic borders in the Global South. Moreover, to support educators in their implementation and use of these datasets, the article presents three case studies of exemplar pedagogic activities using available data sources in an undergraduate Criminology program in the UK. Exemplar activities include (1) time series analysis of homicide in Asia; (2) bivariate analysis of trust in police and victimization in Algeria; and (3) mapping kidnappings in Mexico. We end by discussing the pedagogical and research implications of diversifying datasets and some future challenges.

Introduction

Teaching quantitative research methods to social science students presents a unique blend of challenges and rewards. In particular, numerous surveys have indicated that social science students, at both undergraduate and graduate levels, are often reluctant to learn quantitative research methods (Adeney and Carey, 2011; Buckley et al., 2015; Williams and Sutton, 2011). Moreover, a considerable number of students experience heightened anxiety when confronted with statistics and data analysis modules (Chamberlain et al., 2015; Williams et al., 2008). Statistics anxiety, though varying across individuals and population groups (Liu et al., 2011; Ralston et al., 2016), is notably more prevalent in the social sciences and humanities than in other academic fields (Koh and Zawi, 2014).[1] Nevertheless, research consistently shows that social science students equipped with quantitative skills tend to obtain better grades in their degrees (Eick et al., 2021) and stand a better chance of securing well-paid jobs with better working conditions than those without such skills (Rosemberg et al., 2022). These skills also play a pivotal role in shaping students' social, civic, and professional lives in today's data-driven world (Andersen and Harsell, 2005). The intersection of these two facts – the lack of data analysis skills among social science students and the pivotal role of such skills in their career and personal development – underscores the need for educators to devise evidence-based pedagogies that enhance the learning of quantitative research methods in higher education and, in turn, address students' anxiety and reluctance to learn.

A critical component in teaching data analytics for social science disciplines, particularly in criminology, lies in ensuring that students comprehend how the acquired methods and skills can be applied to better understand and address real-world issues central to the field of study. Importantly, these issues often transcend national borders, especially in the increasingly global and diverse landscape of today's criminology (Bennett, 2004; Ouassini and Ouassini, 2020; Zhang and Liu, 2023). Freda Adler, in her 1995 Presidential Address to the American Society of Criminology, emphasized the necessity for criminology education to adopt a global perspective, urging students to broaden their horizons beyond local contexts:

“Think and teach globally. Whether your students are headed for a career in academia or in the field, they must be able to expand their horizons beyond the local level in order to confront issues that are now clearly universal.” (Adler, 1996: 7)

This imperative is not confined to modules covering substantive theoretical and practical debates; it extends to introductory and advanced statistical units as well. Statistical modules should leverage diverse criminological datasets from various contexts, reflecting the international and cultural diversity of both the subject matter and student cohorts. Such diversity also fosters inclusivity, as students can see themselves reflected in the curriculum.

This article addresses the need to diversify and internationalize datasets used in introductory statistical modules within criminology programs. We argue that international criminological data recorded across Global South countries can be effectively employed to teach both basic and advanced statistical analysis modules that are guided by learn-by-doing and hands-on pedagogic principles. Beyond relying solely on survey and police-recorded crime data from North America, Europe, and Oceania, educators are encouraged to incorporate data recorded in the Global South.

Moreover, further dissemination and use of crime datasets recorded in Global South countries may have extensive benefits for criminological research beyond teaching and learning: despite the high concentration of violence in the Global South, especially in certain subregions of Africa, Latin America and the Caribbean (UNODC, 2023), most research has been conducted in high-income societies, particularly in North America and Western Europe (Aas, 2012; Eisner, 2023; Murray et al., 2018). Diversifying crime datasets would also aid in evaluating the empirical validity and generalizability of many criminological hypotheses and theories.

The following sections describe our approach to teaching statistical reasoning in computer lab environments, using real-world data recorded across countries. We explore a wide array of criminological data sources available in the Global South and provide pedagogic examples demonstrating how these datasets can be used in practical exercises covering descriptive univariate statistics, inferential statistics, and geographic crime data analysis in R Software (R Core Team, 2023).

Toward Computer Lab-Based Teaching and Learning

A growing group of scholars has contributed to enhancing the teaching and learning of quantitative skills in criminology (Ashby, 2023; Groff and Haberman, 2023; Kaplan, 2022; Medina Ariza and Solymosi, 2023; Woodich et al., 2022). Supported by evidence-based pedagogic principles and empirical evaluations in higher education contexts, researchers have identified a series of promising strategies and practices. Increasingly, pedagogic approaches to teaching data analysis courses in the social sciences, and particularly in criminology degrees, are influenced by theories of learn-by-doing and hands-on education. American philosopher John Dewey (1916) argued that all acquired knowledge is in some regard 'experiential' and relates to individuals' lived experiences. Following Vygotsky's (1930) sociocultural theory of learning, full competence is understood as a combination of activity (i.e., 'doing') and social interactions embedded in social and cultural environments. Similarly, behaviorist models of education understand learning as a process that involves active participation and action in the world (Skinner, 1971). The student is not a passive recipient of educational processes but becomes actively involved in their own learning. In this regard, chalk-and-board teaching, and even PowerPoint-assisted lectures, have been shown to decrease students' motivation to learn and to lower their understanding of, and engagement with, statistical courses (Meguid and Collins, 2017). Instead, evidence suggests that hands-on, computer lab-based teaching that makes use of real-world data encourages conceptual and theoretical understanding of statistical principles as well as their application to real-world problems (Harlow, 2013).

Over the years, we have applied the proposition 'data skills are learned through practice' to our teaching at both undergraduate and graduate levels in a British university. Students are provided with real-world criminological data and access to open-source statistical software in computer lab environments to learn descriptive and inferential statistical analyses while exploring these data to provide insight into important open research questions. For example, students learn how to apply a variety of data visualization approaches by accessing real-world crime records published by the FBI and plotting changes in crime trends (e.g., https://cde.ucr.cjis.gov/LATEST/webapp/#/pages/home). Students not only learn the theoretical and conceptual principles of data visualization and how to apply them in practice using real-world data and open software, but also learn how different crimes have evolved over time and how law enforcement and public administrations have responded to them. In the end-of-term assessment, students are asked to use existing data sources and the data analysis skills they have learned to analyze a given criminological topic and write a report with policy recommendations. This learning process can be further facilitated by freely accessible, web-based interactive sites that provide all the information students need to learn the theory and practice of data analytics (e.g., https://maczokni.github.io/MSCD_labs/ and https://books.lesscrime.info/learncrimemapping/). Nonetheless, for these computer lab-based, hands-on sessions to achieve their aim – developing students' understanding and skills in quantitative methods and their reasoning – they need to follow four principles guided by evidence-based pedagogical practices.

First, for complex and sometimes abstract statistical principles (e.g., the Central Limit Theorem, statistical significance and p-values, moving averages) to be fully comprehended, it is often not enough for students to complete one isolated activity at one point in time; the same principles need to be repeated several times and applied to a variety of settings, sometimes throughout the whole module – data skills are learned through repetition (Signoretta et al., 2014).

Second, the learning environment is key for students to engage effectively with the intended learning outcomes – data skills are learned in environments that allow hands-on activities (MacInnes, 2014; MacInnes et al., 2016). This refers to classroom settings (e.g., computer clusters with workstations for students and screens for educators to demonstrate how to execute certain functions), but also to the technical specifications of devices (e.g., software and computing power to process data) and the teaching support available (e.g., enough teaching staff in the room to solve technical issues and provide feedback to students).

Third, social science students appear to be less engaged with quantitative methods courses taught by non-social scientists who know little about their substantive interests (Payne, 2014) – data skills are better learned when taught by educators who specialize in the discipline.

And fourth, statistical and data analysis principles covered in social science modules ought, where possible, to be embedded in real-world problems relevant to the discipline – data skills are learned by seeing their application in the world (Stockemer, 2019). For example, criminology students will likely find it easier to engage with t-tests when comparing victimization between population groups than when comparing financial profits between business sectors, whereas this is unlikely to hold for students of business administration. In this regard, it is also important to acknowledge that students from different cultural and national backgrounds are often interested in seeing how data analytics applies in their respective contexts. Where possible, students should therefore be provided with real-world datasets of crimes recorded in a variety of countries, including the Global South, so that they can apply the techniques learned to understand criminological phenomena in each place.

While following these principles is not always possible (e.g., due to organizational constraints, staff resources, or data availability), we have co-developed a syllabus of criminological data analysis modules that follow them: learning by repeated practice, computer lab-based teaching, a focus on real-world problems, and teaching by specialists. These modules place quantitative skills learning at the core of the curriculum and make use of data recorded across cultures and nations, in turn representing the global nature of contemporary debates in criminology and the diversity of student cohorts.
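The first principle above names moving averages among the abstract statistical ideas that benefit from repeated practice. As a minimal base-R sketch, using a short series of hypothetical annual counts invented purely for illustration (not drawn from any source discussed in this article), a centered three-period moving average replaces each value with the mean of itself and its two neighbors:

```r
# Hypothetical annual counts, invented for illustration only
counts <- c(10, 12, 14, 16, 18)

# Centered three-period moving average: each smoothed value is the mean of
# the observation and its two immediate neighbors. stats:: is written
# explicitly because dplyr::filter() masks stats::filter() when dplyr is loaded.
ma3 <- stats::filter(counts, filter = rep(1 / 3, 3), sides = 2)

# The first and last values are NA because they lack a neighbor on one side
print(as.numeric(ma3))
```

Re-running the same exercise with a wider window (for example, `rep(1 / 5, 5)`) on a real crime series is one way of revisiting the principle across several sessions.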

Availability of (Open Access) Criminological Data Sources in the Global South

One of the main challenges faced by educators willing to use diversified datasets in introductory statistical courses is finding suitable criminological data sources to enable hands-on pedagogic activities. Such sources of data should be sufficiently versatile and granular, as well as easy to access, so as to facilitate various activities aimed at learning essential methodological skills. These skills include univariate analysis (e.g., measures of central tendency and dispersion for variables with different levels of measurement, bar charts), bivariate analysis (e.g., cross-tabulations, row and column percentages, scatter plots, bivariate correlation, t-tests), construction of indexes and scales (e.g., composite measures, reliability tests, factor analysis), regression analysis (e.g., linear and logistic regression), time series analysis (e.g., trend analysis, seasonal decomposition, autocorrelation and partial autocorrelation) and geographic data analysis (e.g., spatial autocorrelation, hotspot analysis, Moran's I). While appropriate data sources exist in the Global North, including the National Crime Victimization Survey, the FBI's Uniform Crime Reporting data, and the Crime Survey for England and Wales, fewer sources located in the Global South are widely known. However, there are many sources of criminological data recorded in the Global South that can be easily adapted to teaching materials used in statistical courses. We consider the 'Global South' to comprise those nations defined as 'developing economies' by the United Nations Conference on Trade and Development (UNCTAD) (2022; see Figure 1).

Figure 1. Global North and Global South countries as defined by UNCTAD (2022)

In Table 1, we present an extensive list of criminological data sources available in the Global South that meet the following criteria: (1) available in open access via an online platform without the need to register or with free registration for students and educators, regardless of their country of residence or institutional affiliation; (2) record data in at least one Global South country; (3) record data on at least two variables of interest in criminology (e.g., offending, victimization, fear of crime, crime incidents, attitudes and perceptions towards the criminal justice system); and (4) enable at least one form of statistical analysis covered in introductory statistical modules in criminology.

To select data sources, we reviewed previous articles on the availability and use of sources of criminological data (Ashby, 2023; Buil-Gil et al., 2024; Curtis-Ham et al., 2024; Nivette, 2021), consulted the UNODC (n.d.) atlas on crime victimization surveys, searched academic databases for criminological articles published in the Global South, and consulted colleagues in Global South countries, specifically in Chile, China, Colombia, Ecuador, India, Mexico, and Uruguay. While Table 1 aims to be as comprehensive as possible, some relevant sources of criminological data may have been missed. For instance, we had to exclude some interesting data sources from countries such as Cambodia, Papua New Guinea, Thailand, Uganda, and Venezuela because they were not openly available online. Finally, all URLs included in the table were last visited in January 2023, and some links may have changed since then.

In Table 1, we included the most relevant characteristics of the selected datasets to ensure their value for educators planning their courses, in particular whether they record information about perpetration and victimization of various types of crime, crimes and violence committed by significant others (e.g., peers, parents), relevant correlates of crime, reporting of crime to authorities, fear of crime, and attitudes towards or perceptions of different criminal justice system actors. We considered not only how users can access the data sources online, but also the range of years and the geographical coverage of the available datasets, as well as the types of variables of interest they include. We also detailed the types of analyses made possible by the different sources of data, distinguishing between univariate, bivariate, regression, temporal, and geographic analysis. Some of these data sources also enable other forms of data analysis, such as the construction of indexes and scales (e.g., Chilean National Urban Citizen Security Survey, Latinobarometer). Table 1 presents a comprehensive list of over forty data sources that can be used in introductory and advanced statistical courses in criminology, including self-report delinquency studies, victimization surveys, general social and health surveys, police recorded crime data, data recorded by other criminal justice system agencies, and crowdsourced data.

Table 1. Open Access Criminological Data Sources in the Global South

| Name | Temporal coverage | Geographic coverage | Key variables | Types of analyses it allows | Web access |
|---|---|---|---|---|---|
| Self-report delinquency surveys | | | | | |
| International Self-Report Delinquency Study (ISRD) | ISRD 1: 1992; ISRD 2: 2005-2007; ISRD 3: 2012-2019; ISRD 4: 2020-2023 (ISRD 3 and 4 will be available soon) | 49 countries (across all continents) | Offending, deviance, victimization, delinquent peers, victimization correlates, perceptions about CJS | Univariate analysis, bivariate analysis, regression analysis | https://isrdstudy.org/ |
| Victimization surveys | | | | | |
| International Crime Victims Survey (ICVS) | ICVS-1: 1989; ICVS-2: 1992; ICVS-3: 1996; ICVS-4: 2000; ICVS-5: 2004/2005 | 78 countries (across all continents) | Victimization, crime reporting, victimization correlates, fear of crime, punitiveness, perceptions about CJS | Univariate analysis, bivariate analysis, regression analysis | https://www2.unil.ch/icvs/ |
| National Urban Citizen Security Survey (ENUSC) | 2003-Present | Chile | Victimization, crime reporting, fear of crime, perceptions about CJS | Univariate analysis, bivariate analysis, regression analysis, temporal analysis, geographic analysis | https://cead.spd.gov.cl/estudios-y-encuestas/ |
| Gender Violence in the Family and other Spaces Survey (ENVIF) | 2008 (not available), 2012, 2017, 2022 | Chile | Victimization, perpetrator characteristics, crime reporting, contact with CJS | Univariate analysis, bivariate analysis, regression analysis, temporal analysis | https://cead.spd.gov.cl/estudios-y-encuestas/ |
| National Survey of Violence in the School (ENVAE) | ENVAE-I: 2005; ENVAE-II: 2007; ENVAE-III: 2009; ENVAE-IV: 2014 | Chile | Victimization, offending, antisocial behavior | Univariate analysis, bivariate analysis, regression analysis, temporal analysis | https://cead.spd.gov.cl/estudios-y-encuestas/ |
| National Survey of Victimization and Perception of Public Safety (ENVIPE) | 2011-Present | Mexico | Victimization, crime reporting, fear of crime, perceptions about CJS | Univariate analysis, bivariate analysis, regression analysis, temporal analysis | https://en.www.inegi.org.mx/programas/envipe/ |
| National Victimization Survey (ENV) | 2017 | Argentina | Victimization, crime reporting, fear of crime, perceptions about CJS | Univariate analysis, bivariate analysis, regression analysis | https://www.indec.gob.ar/indec/web/Institucional-Indec-BasesDeDatos-5 |
| National Household Survey (ENAHO) | 2018 | Costa Rica | Victimization, crime reporting | Univariate analysis, bivariate analysis, regression analysis | http://sistemas.inec.cr/pad5/index.php/catalog/203 |
| National Survey of Perceptions of Public Safety and Victimization (ENPEVI) | 2018 | Guatemala | Victimization, crime reporting, fear of crime, perceptions about CJS | Univariate analysis, bivariate analysis, regression analysis | https://mingob.gob.gt/la-encuesta-nacional-de-percepcion-de-seguridad-publica-y-victimizacion-2018-enpevi-2018/ |
| National Household Survey for Multiple Purposes (ENHOGAR) | 2006-Present | Dominican Republic | Victimization, crime reporting, fear of crime, perceptions about CJS | Univariate analysis, bivariate analysis, regression analysis, temporal analysis | https://www.one.gob.do/datos-y-estadisticas/ |
| Survey of Coexistence and Public Safety (ECSC) | 2011-2021 | Colombia | Victimization, crime reporting, fear of crime, perceptions about CJS | Univariate analysis, bivariate analysis, regression analysis, temporal analysis | https://microdatos.dane.gov.co/index.php/catalog/SEGDEF-Microdatos |
| Integrated Household Panel Survey (IHPS) | 2010, 2013, 2016, 2019 | Malawi | Victimization | Univariate analysis, bivariate analysis, regression analysis, temporal analysis | https://microdata.worldbank.org/index.php/catalog/3819/ |
| General social and health surveys | | | | | |
| World Values Survey | Wave 1: 1981-1984; Wave 2: 1990-1994; Wave 3: 1995-1998; Wave 4: 1999-2004; Wave 5: 2005-2009; Wave 6: 2010-2014; Wave 7: 2017-2022 | 64 countries (across all continents) | Victimization, victimization correlates, fear of crime, perceptions about CJS | Univariate analysis, bivariate analysis, regression analysis, temporal analysis | https://www.worldvaluessurvey.org/WVSContents.jsp |
| Afrobarometer | Round 1: 1999-2001; Round 2: 2002-2003; Round 3: 2005-2006; Round 4: 2008-2009; Round 5: 2011-2013; Round 6: 2014-2015; Round 7: 2016-2018; Round 8: 2019-2021 | 50 countries in Africa | Victimization, victimization correlates, fear of crime, perceptions about CJS | Univariate analysis, bivariate analysis, regression analysis, temporal analysis | https://www.afrobarometer.org/data/data-sets/ |
| Asian Barometer | Wave 1: 2001-2003; Wave 2: 2005-2008; Wave 3: 2010-2012; Wave 4: 2014-2016; Wave 5: 2018-2022; Wave 6: 2023-Present | 18 countries in Asia | Victimization, perceptions about CJS | Univariate analysis, bivariate analysis, construction of indexes/scales | https://www.asianbarometer.org/datar?page=d10 |
| Americas Barometer (LAPOP) | 2004-Present | 26 countries in the Americas | Victimization, victimization correlates, insecurity, perceptions about CJS | Univariate analysis, bivariate analysis, regression analysis, temporal analysis | https://www.vanderbilt.edu/lapop/raw-data.php |
| Latinobarometer | 2020-Present | 18 countries in Latin America | Victimization, victimization correlates, fear of crime, perceptions about CJS | Univariate analysis, bivariate analysis, regression analysis, temporal analysis | https://www.latinobarometro.org/latContents.jsp |
| Multiple Indicator Cluster Surveys (MICS) | MICS1: 1993-1998; MICS2: 1999-2003; MICS3: 2005-2010; MICS4: 2009-2013; MICS5: 2012-2017; MICS6: 2017-2023; MICS7: 2023-2024 | 118 countries (across the world) | Victimization, victimization correlates, attitudes toward violence | Univariate analysis, bivariate analysis, regression analysis, temporal analysis | https://mics.unicef.org/surveys |
| Demographic and Health Surveys | Phase 1: 1984-1989; Phase 2: 1988-1993; Phase 3: 1992-1997; Phase 4: 1997-2003; Phase 5: 2003-2008; Phase 6: 2008-2013; Phase 7: 2013-2018; Phase 8: 2018-2023 | 90 countries (across the world) | Victimization, victimization correlates, attitudes toward violence | Univariate analysis, bivariate analysis, regression analysis | https://dhsprogram.com/data/available-datasets.cfm |
| Global School-based Health Survey (GSHS) | GSHS1: 2003-2008; GSHS2: 2009-2012; GSHS3: 2013-2017; GSHS4: 2018-2020; GSHS5: 2021 | 99 countries (across the world) | Victimization, offending, victimization and perpetration correlates | Univariate analysis, bivariate analysis, regression analysis, temporal analysis | https://extranet.who.int/ncdsmicrodata/index.php/catalog/gshs/?page=1&ps=15&repo=GSHS |
| Police recorded crime data | | | | | |
| World Bank Homicide Data | 1990-Present | 266 countries (across all continents) | Crime incidents | Univariate analysis, temporal analysis, geographic analysis | https://data.worldbank.org/indicator/VC.IHR.PSRC.P5 |
| dataUNODC: Violent and sexual crime | 2010-Present | 157 countries (across all continents) | Crime incidents | Univariate analysis, temporal analysis, geographic analysis | https://dataunodc.un.org/dp-crime-violent-offences |
| dataUNODC: Corruption and economic crime | 2010-Present | 157 countries (across all continents) | Crime incidents | Univariate analysis, temporal analysis, geographic analysis | https://dataunodc.un.org/dp-crime-corruption-offences |
| dataUNODC: Wildlife trafficking | 2005-Present | 129 countries (across all continents) | Crime incidents | Univariate analysis, temporal analysis, geographic analysis | https://dataunodc.un.org/dp-wildlife-seizures |
| dataUNODC: Drug trafficking and cultivation | 2010-Present | 130 countries (across all continents) | Drug seizures, drug prices, drug purity | Univariate analysis, temporal analysis, geographic analysis | https://dataunodc.un.org/dp-drug-seizures |
| crimemappingdata: Carjackings in Mexico City | 2019 | Mexico City (Mexico) | Crime incidents | Univariate analysis, temporal analysis, geographic analysis | https://pkgs.lesscrime.info/crimemappingdata/ |
| crimemappingdata: Violent crime counts in Malaysia | 2006-2017 | Malaysia | Crime incidents | Univariate analysis, temporal analysis, geographic analysis | https://pkgs.lesscrime.info/crimemappingdata/ |
| crimemappingdata: Medellin homicides | 2010-2019 | Medellin (Colombia) | Crime incidents | Univariate analysis, temporal analysis, geographic analysis | https://pkgs.lesscrime.info/crimemappingdata/ |
| Crime in India 2019 | 2019 | India | Crime incidents, persons arrested, persons prosecuted | Univariate analysis, temporal analysis, geographic analysis | https://data.gov.in/catalog/crime-india-2019 |
| National Public Security Data | 2015-Present | Brazil | Crime incidents | Univariate analysis, temporal analysis, geographic analysis | https://www.gov.br/mj/pt-br/assuntos/sua-seguranca/seguranca-publica/estatistica/dados-nacionais-1/dados-nacionais |
| Reports of domestic violence and associated crimes | 2017-Present | Uruguay | Crime incidents | Univariate analysis, temporal analysis, geographic analysis | https://catalogodatos.gub.uy/dataset/violencia-domestica-y-asociados |
| Crime Statistics in Detail | 2008-2019 | Hong Kong | Crime incidents | Univariate analysis, temporal analysis | https://data.gov.hk/en-data/dataset/hk-hkpf-stat-crm-stat-detail |
| National Census of State Government, Public Safety and Penitentiary System | 2011-Present | Mexico | Crime incidents, police interventions, registered offenders | Univariate analysis, temporal analysis, geographic analysis | http://en.www.inegi.org.mx/programas/cngspspe/2020/#open_data; https://www.gob.mx/sesnsp/acciones-y-programas/datos-abiertos-de-incidencia-delictiva?state=published |
| Public Safety Data | 2022 | Singapore | Crime incidents | Univariate analysis | https://www.singstat.gov.sg/find-data/search-by-theme/society/public-safety/ |
| ISS Crime Hub | 2013-Present | South Africa | Crime incidents | Univariate analysis, bivariate analysis, temporal analysis, geographic analysis | https://crimehub.org/topics/crime-statistics; https://www.saps.gov.za/services/crimestats.php |
| Mitchell Centre for Social Network Analysis Covert Networks | - | Criminal networks in places including Bali, India, Indonesia, and the Philippines | Criminal networks | Social network analysis | http://www.casos.cs.cmu.edu/tools/datasets/external/index.php |
| Data recorded by other criminal justice system agencies | | | | | |
| dataUNODC: Prisons and prisoners | 2010-Present | 227 countries (across all continents) | Persons in prison, prison facilities, mortality in prison | Univariate analysis, temporal analysis, geographic analysis | https://dataunodc.un.org/dp-prisons-persons-held |
| dataUNODC: Access and functioning of justice | 2010-Present | 159 countries (across all continents) | Persons convicted, persons arrested, persons prosecuted, criminal justice personnel | Univariate analysis, temporal analysis, geographic analysis | https://dataunodc.un.org/dp-cjs-persons-convicted |
| World Prison Brief | 2000-Present | 180 countries (across all continents) | Persons in prison, pretrial inmates, occupancy levels | Univariate analysis, bivariate analysis, temporal analysis | https://www.prisonstudies.org/world-prison-brief-data |
| Number of persons in custody by correctional institution | 2019-Present | Hong Kong | Persons in prison | Univariate analysis, bivariate analysis, temporal analysis | https://data.gov.hk/en-data/dataset/hk-csd-csdpsidata-csdnuminstitutions |
| National Census of State Government, Public Safety and Penitentiary System | 2011-Present | Mexico | Persons in prison | Univariate analysis, temporal analysis, geographic analysis | http://en.www.inegi.org.mx/programas/cngspspe/2020/#open_data |
| Social media and crowdsourced data | | | | | |
| Place Pulse | 2013-2019 | 56 cities (across all continents) | Fear of crime | Univariate analysis, bivariate analysis, geographic analysis | https://figshare.com/articles/dataset/Place_Pulse/11859993 |

Pedagogical Examples

This section presents three exemplar pedagogic activities that make use of the criminological data sources listed in Table 1. The purpose is to demonstrate how Global South criminological data can be easily adapted for undergraduate and graduate teaching. These activities are designed to help students apply a variety of data analysis techniques using R Software (R Core Team, 2023), including descriptive, inferential, temporal, and geographic analysis. Because the activities focus on application, it is advisable to introduce them, or similar ones, after students have received a theoretical overview of these analytical techniques in class, whether through a lecture, reading materials, or a combination of both. The activities below thus serve the pedagogic function of teaching students how to apply key data analysis techniques once they already know what these methods are and why they are important. We also note that the activities below are summarized versions of longer computer lab notes, and hence they are presented here as brief examples of the types of activities that can be used in class. All data and analysis code used in these activities are available on GitHub: https://github.com/davidbuilgil/crim-data-south.

First, we use homicide data made available by the World Bank for East and Southeast Asian countries to illustrate how to undertake descriptive and temporal analysis of numeric variables. Second, we use data from the Afrobarometer to analyze the relationship between victimization and trust in the police in Algeria by means of inferential statistics. Third, we use data on kidnappings recorded by the police in Mexico to explore the geographic distribution of these offenses. Together, these examples show how criminological data recorded in Asia, Africa, and Latin America can be employed in activities aimed at learning descriptive, inferential, and geographic analysis.
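The second activity relates two categorical variables, victimization and trust in the police. The text does not state which inferential test the lab uses; a chi-square test of independence on a cross-tabulation is one common choice for this kind of question, and it is assumed here. The minimal base-R sketch below uses a hypothetical 2 × 2 table whose counts are invented for illustration: they are not Afrobarometer estimates, and no Afrobarometer variable names are implied.

```r
# Hypothetical 2 x 2 cross-tabulation (counts invented for illustration,
# not Afrobarometer estimates): victimization status by trust in the police
trust_tab <- matrix(
  c(30, 300, 70, 400), nrow = 2,   # matrix() fills column by column
  dimnames = list(victimized   = c("Yes", "No"),
                  trust_police = c("High", "Low"))
)

# Row percentages: share of each victimization group reporting high/low trust
row_pct <- prop.table(trust_tab, margin = 1) * 100
print(round(row_pct, 1))

# Chi-square test of independence. Yates' continuity correction is switched
# off so the statistic equals the textbook sum((observed - expected)^2 / expected)
chi_res <- chisq.test(trust_tab, correct = FALSE)
print(chi_res)

# Expected counts under independence: row total * column total / n
print(chi_res$expected)
```

With real microdata, the cross-tabulation would normally be built with `table()` from the two recoded survey variables rather than typed in by hand.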

Case Study 1: Descriptive and Time Series Analyses of Homicide in East and Southeast Asia

This activity demonstrates how to undertake univariate and temporal analysis to explore the volume and temporal changes of homicide in East and Southeast Asia. Homicide is considered the most reliable crime indicator because definitions are similar cross-nationally and information is widely collected and registered (Oberwittler, 2019).

The rise of developed economies in Asia from the late 1970s onwards defied the conventional notion that rapid industrialization inevitably leads to increases in crime. Japan was the first to develop into an economic superpower, and it was followed by the four high-growth economies of the ‘Asian tigers’: Hong Kong, Singapore, South Korea, and Taiwan (Bui and Farrington, 2019). The combination of advanced economic development and comparatively low crime was attributed to ‘Asian Exceptionalism’, the idea that Confucian values, peculiar to East Asia and parts of Southeast Asia, explained this success (Sheptycki, 2008). Yet this account does not explain the Asian countries that share a Confucian influence but still have developing economies, and it cannot be assessed without reference to the rest of the world: exceptionalism is a ‘fundamentally comparative concept’, because only comparisons can test its assumptions (Karstedt, 2012). Informed by this context, we ask to what extent homicide over time differs between economically developed and developing Asian countries, and whether these countries are exceptional relative to the world as a whole. We first compare homicide rates between developed and developing East and Southeast Asian countries and then visualize patterns between 2000 and 2020. We use homicide data recorded and published in the World Bank Data repository (https://data.worldbank.org/indicator/VC.IHR.PSRC.P5?most_recent_year_desc=true&locations=).

We begin by loading the required packages in R:

library(here) # to identify the path to the data
library(dplyr) # for data wrangling
library(tidyr) # for data wrangling
library(stringr) # to work with strings
library(ggplot2) # for visualizations

We open this dataset and select the following East and Southeast Asian countries that have been influenced by Confucianism (Oldstone-Moore, 2023): the high-growth developed economies identified in the literature (Hong Kong, Japan, Singapore, and South Korea) and developing economies ((People’s Republic of) China, Indonesia, Macau, Malaysia, Mongolia, North Korea, and Vietnam)[2]; a total of 11 countries plus the world average. The function ‘filter()’ from the ‘dplyr’ package (Wickham et al., 2019) selects a subset of observations from a dataset. Here, we keep only the rows whose country name appears in the vector ‘countries_interest’, which we define first.

# Read csv file with data
data_homicides <- read.csv(here("data/API_VC.IHR.PSRC.P5_DS2_en_csv_v2_5996865.csv"),
  skip = 3) # Skip first three rows (no data)

# Make list of East and Southeast (SE) Asian countries
countries_interest <- c("China", "Hong Kong SAR, China", "Japan",
  "Singapore", "Korea, Rep.", "Macao SAR, China",
  "Indonesia", "Mongolia", "Malaysia",
  "Korea, Dem. People's Rep.", "Viet Nam", "World")

# Select the listed East and SE Asian countries in dataset
data_homicides <- data_homicides %>%  
  filter(Country.Name %in% countries_interest)

We are particularly interested in countries with complete data between 2000 and 2020. Using the ‘select()’ function of the ‘dplyr’ package, we keep the columns of interest; we then tidy up the column names, filter out countries with incomplete data using the ‘complete.cases()’ function, and use the ‘gather()’ function from the ‘tidyr’ package to reshape the year columns into rows, with the values stored in a new variable called ‘Homicide rate’.

data_homicides <- data_homicides %>%  
  # Select columns of interest  
  select(Country.Name, X2000:X2020) %>%  
  # Remove X from column names  
  rename_with(~str_replace(., "^X", ""), starts_with("X")) %>%   
  # Filter out incomplete cases  
  filter(complete.cases(.)) %>%  
  # Gather columns to rows  
  gather(Year, "Homicide rate", -Country.Name)

Consequently, six of the 11 countries (China, Indonesia, Mongolia, North Korea, Singapore, and Vietnam) are excluded from the dataset.
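The ‘gather()’ function used above still works, but ‘tidyr’ has superseded it with ‘pivot_longer()’. The sketch below shows the equivalent reshaping step on a small hypothetical data frame that mimics the wide World Bank layout; the country names ‘A’, ‘B’, and ‘C’ and their values are invented for illustration. It also drops the incomplete country and strips the ‘X’ prefix from the year columns.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data in the same wide layout as the World Bank file:
# one row per country, one column per year (prefixed with "X" by read.csv)
wide <- data.frame(
  Country.Name = c("A", "B", "C"),
  X2000 = c(1.0, 2.0, NA),
  X2001 = c(1.5, 2.5, 3.0),
  stringsAsFactors = FALSE
)

long <- wide %>%
  # Drop countries with a missing value in any year (here, country "C")
  filter(complete.cases(.)) %>%
  # Equivalent of gather(Year, "Homicide rate", -Country.Name),
  # also stripping the "X" prefix from the year names
  pivot_longer(-Country.Name,
               names_to = "Year",
               values_to = "Homicide rate",
               names_prefix = "X")

print(long)
```

The result has one row per country and year. With the real data, the ‘pivot_longer()’ call would replace the ‘gather()’ step of the pipeline above.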

We are interested in the average homicide rate in each of the remaining countries and in how much it varies over the years. We group the data by country (‘group_by()’ function) and compute the mean and variance of the homicide rate for each country.

data_homicides %>%  
  group_by(Country.Name) %>%  
  summarize(mean = mean(`Homicide rate`),
            variance = var(`Homicide rate`))

## # A tibble: 6 × 3
##   Country.Name          mean variance
##   <chr>                <dbl>    <dbl>
## 1 Hong Kong SAR, China 0.524   0.0534
## 2 Japan                0.394   0.0135
## 3 Korea, Rep.          0.793   0.0187
## 4 Macao SAR, China     0.782   0.436 
## 5 Malaysia             1.84    0.259 
## 6 World                6.17    0.160

Although Confucianism has influenced all of these countries, Japan has the lowest average homicide rate (0.39), followed by Hong Kong (0.52). Both are below the two developing economies in the sample, Macao (0.78) and Malaysia (1.84), whereas South Korea (0.79) is roughly on par with Macao. Malaysia has the highest rate of the five countries. All of these averages, however, are remarkably lower than the global average (6.17).
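The variance column reports the sample variance returned by R’s ‘var()’ function, which divides the sum of squared deviations from the mean by n - 1 rather than by n. A quick check with three hypothetical annual rates for a single country (invented values, not taken from the World Bank data):

```r
# Hypothetical annual homicide rates for one country (illustrative values)
rates <- c(0.5, 0.7, 0.9)
n <- length(rates)

# Sample variance by hand: squared deviations from the mean, divided by n - 1
manual_var <- sum((rates - mean(rates))^2) / (n - 1)

manual_var  # 0.04
var(rates)  # 0.04, the same value
```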

Finally, we display the annual homicide rates in a line graph. The ‘ggplot2’ package (Wickham, 2016) provides a flexible system for creating graphs. We create a line graph (‘geom_line()’ function) with ‘Year’ on the x-axis and ‘Homicide rate’ on the y-axis, grouping and coloring the lines by country. We add a title using the ‘ggtitle()’ function and apply a classic theme (‘theme_classic()’).

ggplot(data_homicides, aes(x = Year, y = `Homicide rate`,
                           group = Country.Name)) +  
  # Visualize each country with a different color  
  geom_line(aes(color = Country.Name)) +  
  # Add title  
  ggtitle("Intentional homicide rate (per 100,000 people)") +  
  # Classic graph theme  
  theme_classic()

Figure 2. Time series of homicide rates in East and Southeast Asian countries and the world average

Figure 2 shows that Malaysia had the highest homicide rate of the five countries throughout the two decades, although its rate declined over time. Also noteworthy is the large gap between the homicide rates of these East and Southeast Asian countries and the world average.
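Whether a series such as Malaysia’s is declining can also be checked numerically rather than only by eye. One simple option, offered here as a sketch rather than as part of the original activity, is to fit a linear trend with ‘lm()’ and inspect the sign of the slope. The five annual values below are hypothetical; with the real data, one would first filter ‘data_homicides’ to a single country. Because ‘gather()’ stores ‘Year’ as text, it must be converted to a number before modelling.

```r
# Hypothetical annual rates for one country (NOT the World Bank figures)
series <- data.frame(
  Year = c("2016", "2017", "2018", "2019", "2020"),
  rate = c(2.2, 2.0, 1.9, 1.7, 1.6),
  stringsAsFactors = FALSE
)

# Year is stored as text after reshaping, so convert it before modelling
series$Year <- as.numeric(series$Year)

# Linear trend: the slope is the average change in the rate per year
trend <- lm(rate ~ Year, data = series)
coef(trend)[["Year"]]  # -0.15: the rate falls by about 0.15 per year
```

A negative slope indicates a declining trend. A formal time series model would also need to account for autocorrelation between consecutive years.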

Case Study 2: Testing the Relationship Between Victimization and Trust in the Police Using the Afrobarometer

This activity shows how to undertake bivariate and inferential statistical analysis. We analyze the relationship between having been victimized and trust in the police, using survey data from Algeria as an African example.

There is a long tradition in criminology of research on public attitudes toward criminal justice and legal institutions (Bradford and Jackson, 2010; Sherman, 2002; Tyler, 2006). One of the key determinants of trust in criminal justice institutions is prior experience with crime. Previous studies have shown that citizens who have been victims of crime are more likely to hold negative perceptions of the police and to report less confidence in them (Berthelot et al., 2018; Callahan and Rosenberg, 2011). Moreover, some research has shown that people with less trust in the police are less likely to report violent crimes (Kääriäinen and Sirén, 2011). Yet most of this research has been conducted in North America and Europe. The high levels of violence observed in some countries of the Global South, particularly in Latin America and Africa (UNODC, 2023), and the often under-resourced and weak criminal justice institutions in these regions (Bergman, 2018), underscore the importance of exploring the relationship between victimization and trust in the police in the Global South. Recent research in African countries such as South Africa has shown that trust in the police is higher among citizens who have not been victimized (Olutola and Bello, 2016). Alda et al. (2017) found that the gap in trust in the police associated with victimization is more characteristic of developing countries than of developed nations.

In this context, we ask whether there is a relationship between victimization and trust in the police in Algeria. This exercise covers creating new variables by collapsing categories, building crosstables, measuring the strength of association, and testing independence.

We begin by loading the required packages in R:

library(here) # to identify the path to the data
library(tidyverse) # for data wrangling
library(haven) # for importing data from SPSS files
library(ggplot2) # for visualizations
library(gmodels) # for crosstables
library(lsr) # to calculate Cramer's V

To explore the relationship between victimization and trust in the police in Algeria, we use the Algerian part of the Afrobarometer Round 6 dataset (Hammami et al., 2015). Using the ‘haven’ package in R (Wickham and Miller, 2020), we load the SPSS dataset and inspect the two variables of interest: having been a victim of assault (named ‘Q11B’ in the original data) and trust in the police (‘Q52H’). The ‘table()’ function returns the frequency of each response, and ‘prop.table()’ applied to that table returns the corresponding proportions.

# Read sav file with survey data
alg <- read_sav(here("data/alg_r6_data.sav"))

# Inspect variable Q11B, as an example: counts and proportions per response code
table(alg$Q11B)
prop.table(table(alg$Q11B))
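‘prop.table()’ should be applied to the output of ‘table()’. Applied directly to the raw response codes, it simply divides each code by the sum of all codes, which is not meaningful for categorical data. A small sketch with hypothetical responses coded like ‘Q11B’:

```r
# Hypothetical responses coded like Q11B (0 = no, 1 = once, 9 = NA)
x <- c(0, 0, 0, 1, 9)

counts <- table(x)           # frequency of each code
props <- prop.table(counts)  # share of respondents per code (sums to 1)
props                        # 0: 0.6, 1: 0.2, 9: 0.2

# Misleading: divides each raw code by sum(x) = 10
prop.table(x)                # 0, 0, 0, 0.1, 0.9
```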

Both variables take five possible values: for trust in the police, 0 = not at all, 1 = a little, 2 = somewhat, 3 = a lot, and 9 = NA; for victimization, 0 = no, 1 = once, 2 = twice, 3 = three or more times, and 9 = NA. Almost 80% of respondents reported not having been assaulted in the past year, whereas 14% reported having been assaulted once; the remaining options do not even add up to 1%. The variable measuring trust in the police is fairly evenly distributed across the categories that express at least some level of trust, each of which accounts for between 25% and 30% of respondents, while only 13% of respondents declared no trust in the police at all. For both variables, the value 9 (i.e., NA) marks responses that we want to exclude from the analysis. To do this, we use the ‘tidyverse’ suite of R packages (Wickham et al., 2019), and we simultaneously keep only the two variables we will be working with, chaining the steps with the pipe operator ‘%>%’.

# Removing 9s (NAs) from variables
algS <- alg %>%   
  # Select only the columns to be analyzed  
  select(Q11B, Q52H) %>%   
  # Filter out ‘no answer’ responses (9 = NA)  
  filter(Q11B != 9, Q52H != 9)

# Convert labelled SPSS values to factors using their value labels
algS <- algS %>%     
  mutate(Q11Bf = as_factor(Q11B),         
         Q52Hf = as_factor(Q52H))

Since very few respondents suffered more than one assault, we create a binary measure of victimization that distinguishes victims from non-victims by collapsing the three ‘Yes’ categories into one.

# Collapsing the categories using ‘mutate’ function
algS <- algS %>%  
  mutate(Q11BfR = recode_factor(Q11Bf,
                                "No" = "No",
                                "Yes, once" = "Yes",
                                "Yes, twice" = "Yes",
                                "Yes, three or more times" = "Yes"))

A suitable way to visualize the relationship between two categorical variables is a stacked or clustered bar graph. Figure 3 shows a stacked bar graph created with the ‘ggplot2’ package (Wickham, 2016), in which each bar is rescaled to 100% (‘position = "fill"’) so that the distribution of trust can be compared across groups. It suggests that the share of respondents who trust the police ‘a lot’ is markedly smaller among those who had been victimized than among those who had not.

# Stacked bar chart
ggplot(algS, aes(x  = Q11BfR, fill = Q52Hf)) +  
  geom_bar(position = "fill") +  
  labs(x = "Victimization",       
       fill = 'Trust in police') +  
  theme_minimal()

Figure 3. Stacked bar graph of assault victimization and trust in policing

Visualizations and numeric descriptions of variables can be informative, but they are not sufficient to test whether two variables are associated. This is usually done by building a contingency table (also known as a crosstable) and applying a test of independence such as Pearson’s chi-squared test. The ‘gmodels’ package (Warnes et al., 2019) allows users to create informative and visually clear crosstables that include this test.

# Crosstable with column percentages, adjusted standardized residuals, and Chi-squared test
CrossTable(algS$Q52Hf, algS$Q11BfR, format = "SPSS",
  chisq = TRUE, prop.r = FALSE, prop.c = TRUE, prop.chisq = FALSE,
  prop.t = FALSE, asresid = TRUE)

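Table 2. Crosstab and Chi-squared of assault victimization and trust in policing

| Trust in police | Victim: No | Victim: Yes | Row total |
|---|---|---|---|
| Not at all | 136 (13.66%; 0.016) | 20 (13.61%; -0.016) | 156 |
| Just a little | 277 (27.81%; -1.722) | 51 (34.69%; 1.722) | 328 |
| Somewhat | 276 (27.71%; -3.256) | 60 (40.82%; 3.256) | 336 |
| A lot | 307 (30.82%; 5.012) | 16 (10.88%; -5.012) | 323 |
| Column total | 996 | 147 | 1,143 |

Pearson’s Chi-squared = 27.62, d.f. = 3, p-value < 0.001. Each cell shows the count, followed in parentheses by the column percentage and the adjusted standardized residual.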

The crosstabulation (Table 2) displays the categories of trust in the police in rows and assault victimization in columns. Each cell contains the number of observations, its column percentage, and its adjusted standardized residual. Below the table, we report Pearson’s chi-squared test of independence. Its p-value (< 0.001) indicates that we can reject the null hypothesis of independence, so we conclude that trust in the police and assault victimization are associated. The adjusted standardized residuals show where this association comes from: values beyond ±1.96 indicate cells that deviate from independence at the 5% level. Victims are over-represented among respondents who trust the police ‘somewhat’ (3.26) and strongly under-represented among those who trust the police ‘a lot’ (-5.01).
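As a cross-check that does not require ‘gmodels’, the sketch below re-enters the counts from Table 2 by hand and runs base R’s ‘chisq.test()’. Its ‘stdres’ component contains the adjusted standardized residuals, which should match those reported in the table; no continuity correction is applied to tables larger than 2 × 2.

```r
# Counts from Table 2: trust in police (rows) by assault victimization (columns)
counts <- matrix(
  c(136, 20,
    277, 51,
    276, 60,
    307, 16),
  ncol = 2, byrow = TRUE,
  dimnames = list(Trust  = c("Not at all", "Just a little", "Somewhat", "A lot"),
                  Victim = c("No", "Yes"))
)

res <- chisq.test(counts)

res$statistic         # X-squared, about 27.62
res$parameter         # df = (4 - 1) * (2 - 1) = 3
res$p.value           # well below 0.001
round(res$stdres, 3)  # adjusted standardized residuals, as in Table 2
```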

Finally, we use the ‘lsr’ package (Navarro, 2015) to calculate Cramér’s V, one of the most frequently used coefficients of association for categorical variables. A coefficient of association complements the statistical test with a measure of effect size, ranging from 0 (no association) to 1 (perfect association). The resulting value of 0.16 indicates a statistically significant but relatively weak association between trust in the police and assault victimization.

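# Build the contingency table of trust in police (rows) by victimization (columns)
tab <- table(algS$Q52Hf, algS$Q11BfR)

# Keep only the four trust categories (rows 2 to 5) and the two victimization
# categories (columns 1 and 2); the other rows and columns are assumed to be
# empty levels inherited from additional SPSS value labels
tabR <- tab[2:5, 1:2]

# Coefficient of association
cramersV(tabR)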

## [1] 0.155449
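Cramér’s V can also be computed directly from the chi-squared statistic as V = sqrt(X2 / (n * (k - 1))), where X2 is Pearson’s chi-squared, n is the sample size, and k is the smaller of the number of rows and columns. The sketch below applies this formula, using base R only, to the counts reported in Table 2.

```r
# Counts from Table 2 (rows: trust in police; columns: victim No / Yes)
counts <- matrix(c(136, 20, 277, 51, 276, 60, 307, 16),
                 ncol = 2, byrow = TRUE)

x2 <- unname(chisq.test(counts)$statistic)  # Pearson's chi-squared
n  <- sum(counts)                           # 1,143 respondents
k  <- min(dim(counts))                      # smaller table dimension (2)

cramers_v <- sqrt(x2 / (n * (k - 1)))
round(cramers_v, 4)  # about 0.1554, matching the cramersV() output above
```

With only two columns, k - 1 = 1, so V here is simply the square root of X2 divided by n.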

Case Study 3: Mapping Kidnappings in Mexico

In this spatial analysis activity, we explore the spatial distribution of kidnapping in Mexico. Kidnapping, defined as the unlawful abduction or captivity of individuals against their will, carries profound security implications. In Mexico, kidnapping is closely linked to organized crime, particularly drug cartels, and poses a challenge to national security (Jones, 2013; Massa and Fondevila, 2021; Ochoa, 2012). Kidnapping in Mexico is largely associated with the demand for illicit substances in the United States (Ochoa, 2019). Criminal groups also exploit individuals, including children, for various criminal purposes (Bureau of Democracy, Human Rights, and Labor, 2022). Understanding the geographic distribution of these crimes may help develop better-informed crime prevention practices and policies that target the areas where these offenses are most concentrated, particularly in countries with limited public resources for crime prevention.

We use police-recorded crime data released by the National Public Security System (SNSP) through the official portal of the Mexican Government: https://www.gob.mx/sesnsp/acciones-y-programas/datos-abiertos-de-incidencia-delictiva?state=published.

We begin by loading the required packages in R:

library(here) # to identify the path to the data
library(readr) # to read in CSV data
library(dplyr) # for data wrangling
library(ggplot2) # for data visualization
library(sf) # for spatial data manipulation and visualization
library(viridis) # for color schemes

The data are saved as a CSV (comma-separated values) file, and we use the ‘read_csv()’ function from the ‘readr’ package (Wickham et al., 2024) to import them into the ‘data_Mexico’ object, specifying the ‘Latin1’ encoding so that Spanish characters such as accented letters and ‘Ñ’ are read correctly.

# Read csv file with crime data
data_Mexico <- read_csv(here("data/IDM_nov2023.csv"), 
  locale = locale(encoding = "Latin1"))

We examine the dataset and identify three variables of interest: (a) ‘TIPO’ (in English ‘TYPE’: type of crime); (b) ‘AÑO’ (in English ‘YEAR’: year in which the crime was recorded); and (c) ‘ENTIDAD’ (in English ‘ENTITY’: state of Mexico). The data contain 18 different crime types across Mexico’s 32 federal entities from 2011 to 2017. To prepare the data for analysis, we first subset the dataset to the records for the year 2017 and the crime type ‘SECUESTRO’ (i.e., kidnapping).

data_Mexico <- data_Mexico %>%  
  filter(TIPO == "SECUESTRO") %>% # filter SECUESTRO (kidnapping)  
  filter(AÑO == 2017) # filter 2017

We would like to create a table with two columns: the state name (ENTIDAD) and the total number of kidnappings in each state. The data record kidnappings monthly, with a separate column for each month of the year. We therefore first calculate the annual total for each row using the ‘rowSums()’ function, selecting the monthly columns (here, columns 8 to 19) with the ‘across()’ function. It is important to set the ‘na.rm’ argument to TRUE: missing values (NAs) are then skipped, which is equivalent to treating them as 0 (i.e., assuming that no kidnappings were recorded in those months). Otherwise, any row with a missing month would have a missing annual total.

data_Mexico <- data_Mexico %>%
  # Create a new variable with the annual sum of the monthly columns (8 to 19)
  mutate(sum_secuestro = rowSums(across(8:19),
                                 na.rm = TRUE)) # skip NAs (equivalent to 0)
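The effect of ‘na.rm’ can be seen on a tiny hypothetical table with one row per record and one column per month. The month column names ‘ENERO’ and ‘FEBRERO’ and all values are invented for this sketch, which uses base R indexing instead of ‘across()’; the final line mimics the ‘group_by()’ and ‘summarise()’ step that follows.

```r
# Hypothetical toy data: one row per record, one column per month
# (the month column names are invented for this example)
toy <- data.frame(
  ENTIDAD = c("A", "A", "B"),
  ENERO   = c(1, NA, 2),
  FEBRERO = c(0, 3, NA),
  stringsAsFactors = FALSE
)
month_cols <- c("ENERO", "FEBRERO")

# Without na.rm, any missing month makes the row total NA: 1, NA, NA
rowSums(toy[month_cols])

# With na.rm = TRUE, missing months are skipped (same as adding 0): 1, 3, 2
toy$total <- rowSums(toy[month_cols], na.rm = TRUE)
toy$total

# Annual totals per state, as in the group_by()/summarise() step: A = 4, B = 2
state_totals <- tapply(toy$total, toy$ENTIDAD, sum)
state_totals
```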

We then compute the total number of kidnappings in each state (ENTIDAD) in 2017. Using the ‘group_by()’ and ‘summarise()’ functions, we create a summary table with one row per state, saved as a new object called ‘data_Mexico_states’.

# Calculate number of crimes in each state
data_Mexico_states <- data_Mexico %>%  
  group_by(ENTIDAD) %>% # group by state
  summarise(secuestro = sum(sum_secuestro)) # sum all kidnappings

We observe that three states (the State of Mexico, Tamaulipas, and Veracruz) stand out, with remarkably more kidnappings than the rest of the country. However, these states may record more kidnappings simply because they have larger populations, since more residents can lead to more crime incidents, including kidnappings. To assess whether residents of these states are at greater risk of kidnapping, we therefore calculate kidnapping rates per 100,000 residents. We obtained census data from the National Institute of Statistics and Geography (INEGI): https://www.inegi.org.mx/app/tabulados/default.html?nc=mdemo02. Because these population figures come from the 2010 census, the resulting 2017 rates should be read as approximate.

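To link the population data with the kidnapping data, we join the two tables using a common identifier, the state name, which is called ‘ENTIDAD’ in the crime data and ‘STATE’ in INEGI’s population statistics. We use the ‘left_join()’ function, which keeps every state in the crime data and adds its population; the state names must be spelled identically in both tables for the match to succeed.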

# Read csv file with population data
population <- read_csv(here("data/Population2010.csv"))

# Merge with crime data and calculate crime rates
data_Mexico_states <- data_Mexico_states %>%  
  left_join(population, by = c("ENTIDAD" = "STATE")) %>%
  mutate(secuestro_rate = secuestro / Population2010 * 100000)
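The join-and-rate step can be reproduced on a small scale using the counts and 2010 populations of the three states reported in the output that follows. The two toy tables mimic the inputs, including the differently named key columns ‘ENTIDAD’ and ‘STATE’; the rate is the count divided by the population, multiplied by 100,000.

```r
library(dplyr)

# Counts and 2010 populations for three states, as reported in the output below
kidnap <- data.frame(ENTIDAD = c("TABASCO", "TAMAULIPAS", "ZACATECAS"),
                     secuestro = c(77, 140, 67),
                     stringsAsFactors = FALSE)
population <- data.frame(STATE = c("TABASCO", "TAMAULIPAS", "ZACATECAS"),
                         Population2010 = c(2238603, 3268554, 1490668),
                         stringsAsFactors = FALSE)

# Join on differently named keys, then compute the rate per 100,000 residents
rates <- kidnap %>%
  left_join(population, by = c("ENTIDAD" = "STATE")) %>%
  mutate(secuestro_rate = secuestro / Population2010 * 100000)

round(rates$secuestro_rate, 2)  # 3.44 4.28 4.49
```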

We examine the top three states with the highest kidnapping rate per 100,000.

top_n(data_Mexico_states, 3, secuestro_rate)

## # A tibble: 3 × 4
##   ENTIDAD    secuestro Population2010 secuestro_rate
##   <chr>          <dbl>          <dbl>          <dbl>
## 1 TABASCO           77        2238603           3.44
## 2 TAMAULIPAS       140        3268554           4.28
## 3 ZACATECAS         67        1490668           4.49

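The states with the highest kidnapping rates are Zacatecas (4.49 per 100,000 residents) and Tamaulipas (4.28), followed by Tabasco (3.44). Tamaulipas stands out on both measures, with a high count and a high rate of kidnappings, whereas the State of Mexico and Veracruz, despite their high counts, are not among the three states with the highest rates, which is consistent with their large populations. This highlights the value of exploring problem-solving interventions to reduce kidnappings in Tamaulipas.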
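Note that ‘top_n()’ keeps the selected rows in their original order (here, alphabetical) rather than sorting them by rate. The newer ‘slice_max()’ function in ‘dplyr’ returns the same rows sorted from highest to lowest. A sketch on a hypothetical table with invented states and rates:

```r
library(dplyr)

# Hypothetical states and rates (invented for illustration)
rates <- data.frame(state = c("A", "B", "C", "D"),
                    rate  = c(1.0, 3.0, 4.0, 2.0),
                    stringsAsFactors = FALSE)

top_n(rates, 2, rate)$state          # "B" "C": original row order kept
slice_max(rates, rate, n = 2)$state  # "C" "B": sorted by rate, descending
```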

To further our understanding, we examine the spatial distribution of kidnapping rates and inspect potential spatial patterns among states using a thematic map. The spatial data, in GeoJSON format, were obtained from the following source: https://github.com/strotgen/mexico-leaflet/.

To link geographic information with kidnapping rates, we begin by importing geospatial data representing the boundaries of Mexico’s states. We then standardize the state names so that they match those in the crime data, and merge the two datasets to create a new dataset for spatial analysis.

# Read geojson of Mexico states
states_geojson <- st_read(here("data/states.geojson"))

# Merge crime rates with geojson file
states_geojson <- states_geojson %>%  
  mutate(state_name = toupper(state_name),#capital letters for consistency
         state_name = recode(state_name, #rename states for consistency
                             'DISTRITO FEDERAL' = 'CIUDAD DE MEXICO',
                             'MÉXICO' = 'MEXICO',
                             'MICHOACÁN DE OCAMPO' = 'MICHOACAN',
                             'QUERÉTARO' = 'QUERETARO',
                             'SAN LUIS POTOSÍ' = 'SAN LUIS POTOSI',
                             'VERACRUZ DE IGNACIO DE LA LLAVE'='VERACRUZ',
                             'NUEVO LEÓN' = 'NUEVO LEON',
                             'COAHUILA DE ZARAGOZA' = 'COAHUILA',
                             'YUCATÁN' = 'YUCATAN')) %>%   
  left_join(data_Mexico_states, by = c("state_name" = "ENTIDAD"))
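Because ‘left_join()’ silently fills unmatched rows with NA, it is worth checking that every state name in the boundary file found a match before mapping. One way to do this, sketched below with hypothetical toy tables, is dplyr’s ‘anti_join()’, which returns the rows of the first table that have no match in the second. In the real workflow, one would run it on the recoded boundary data and ‘data_Mexico_states’ before joining; an empty result means that all names matched.

```r
library(dplyr)

# Hypothetical toy tables: the boundary file still uses the old name for
# Mexico City, while the crime data use the new one
geo <- data.frame(state_name = c("DISTRITO FEDERAL", "TAMAULIPAS"),
                  stringsAsFactors = FALSE)
crime <- data.frame(ENTIDAD = c("CIUDAD DE MEXICO", "TAMAULIPAS"),
                    stringsAsFactors = FALSE)

# Rows of geo with no match in crime: these would get NA rates after left_join()
unmatched <- anti_join(geo, crime, by = c("state_name" = "ENTIDAD"))
unmatched$state_name  # "DISTRITO FEDERAL"
```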

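We generate a choropleth map of kidnapping rates across Mexican states, in which each state is shaded according to its rate, using the ‘ggplot2’ package (Wickham, 2016), and customize the fill colors with the ‘magma’ palette from the ‘viridis’ package.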

ggplot(data = states_geojson) +  
  ggtitle("Rate of kidnappings per 100,000 residents in Mexico states") +  
  geom_sf(aes(fill = secuestro_rate)) +  
  scale_fill_viridis(option = "magma") +  
  theme_void()

Figure 4. Rate of kidnappings recorded by the police across states in Mexico

The map in Figure 4 shows the rate of kidnappings per 100,000 residents across Mexico’s 32 federal entities. Yellow and orange shades indicate higher kidnapping rates, while darker shades represent lower rates. These variations can inform targeted interventions by highlighting areas where additional law enforcement resources may be needed to address kidnappings. More generally, this type of spatial analysis can support strategic decisions about where to focus crime prevention efforts.

Discussion and Conclusions

This article has addressed the imperative to integrate international and diversified datasets into statistical modules within criminology programs. It is important today not to rely exclusively on crime data sourced from North America and Europe; therefore, we encourage educators to integrate data from the Global South. Criminological datasets recorded in Global South countries offer valuable opportunities for teaching basic and advanced statistical analysis. Diversifying crime data used in introductory and advanced statistical courses may have short- and long-term benefits for learning outcomes, namely (a) diversified crime datasets better reflect the nature of contemporary criminological debates within and beyond academia, thus enhancing students’ understanding of global criminology beyond national borders; (b) learning to handle different sources of data, with different formats and structures, enhances skills to make sense of, and work with, multiple sources of information; and (c) students from different cultural and national backgrounds are keen to see the application of data analytics in their respective contexts.

Beyond the immediate pedagogical benefits of diversifying crime datasets, the wider adoption and dissemination of these datasets from Global South countries hold far-reaching implications for criminological research. Despite the prevalence of violence in the Global South, particularly in certain regions of Africa and Latin America (UNODC, 2023), and the remarkable, largely unexplained differences in crime prevalence across nearby Asian countries, the bulk of criminological research has been concentrated in North America and Western Europe (Aas, 2012; Eisner, 2023). By diversifying crime datasets, future scholars will be better equipped to evaluate the empirical validity and generalizability of various criminological hypotheses and theories.

In this article we have presented over forty open sources of criminological data recorded in the Global South. These data originate from self-report delinquency studies, victimization surveys, general social and health surveys, police-recorded crime data, data recorded by other criminal justice agencies, and crowdsourced data across many countries in Asia, Africa, and Latin America and the Caribbean. These sources can be used to teach a variety of analytical techniques commonly covered in criminology programs, including univariate and bivariate analysis, regression analysis, time series analysis, and geographic data analysis. Furthermore, we have illustrated with three case studies how data recorded in Asia, Africa, and Latin America can be employed in statistical courses in criminology programs. While we used R software in our examples, these datasets can equally be used to learn other statistical and data analysis software such as Microsoft Excel, Python, SPSS, SAS, ArcMap, QGIS, or Stata. R’s main advantages are that it is freely available, offers a wide range of packages for most types of analysis, and gives students the opportunity to become familiar with the basics of writing and understanding computer code.

Beyond quantitative data, secondary sources of qualitative criminological data are also increasingly available in both Global North and Global South countries, and can be used in qualitative data analysis courses. For instance, Baird (2019) made available to all researchers affiliated with a UK university a set of interview transcripts with locals and experts on crime and violence in Port-of-Spain, Trinidad and Tobago (see https://reshare.ukdataservice.ac.uk/853648/); the SHERLOC Case Law Database compiles criminal court records from around the world (see https://sherloc.unodc.org/cld/en/v3/sherloc/cldb/index.html); the website ChainAbuse offers detailed crowdsourced accounts of cyber-enabled fraud (https://www.chainabuse.com/reports); and the Unified Court Records (UCR) database provides access to the public court records of the International Criminal Tribunal for Rwanda (https://ucr.irmct.org/). These are only some examples of open qualitative datasets that can be used in criminology courses on qualitative data analysis.

In conclusion, this article underscores the need to incorporate international and diversified datasets into introductory statistical modules within criminology programs. By advocating for the integration of criminological data from Global South countries into data analysis modules, educators can enrich the learning experiences of students and foster a deeper understanding of global criminology. Diversifying crime datasets not only better reflects contemporary criminological issues but also enhances students’ analytical skills in handling diverse sources of information. Moreover, by broadening the scope of criminological research beyond North America and Europe, the wider adoption of crime datasets from the Global South holds potential for future evaluations of hypotheses and theories in criminology. The plethora of open sources of criminological data highlighted in this article, ranging from self-report delinquency studies to qualitative interviews and court records, offers educators a wealth of resources to enhance both quantitative and qualitative data analysis courses. As we move forward, embracing the richness of data from diverse cultural and national contexts will not only advance pedagogy within criminology programs but also contribute to more robust and inclusive criminological research agendas on a global scale.

References

Aas, K. F. (2012). ‘The Earth is One but the World is not’: Criminological Theory and its Geopolitical Divisions. Theoretical Criminology, 16(1), 5-20.

Adeney, K., & Carey, S. (2011). How to Teach the Reluctant and Terrified to Love Statistics: The Importance of Context in Teaching Quantitative Methods in the Social Sciences. In G. Payne & M. Williams (Eds.), Teaching Quantitative Methods: Getting the Basics Right (pp. 85-98). London: SAGE.

Adler, F. (1996). Our American Society of Criminology, the World, and the State of the Art: The American Society of Criminology 1995 Presidential Address. Criminology, 34(1), 1-10.

Alda, E., Bennett, R. R., & Morabito, M. S. (2017). Confidence in the Police and the Fear of Crime in the Developing World. Policing: An International Journal, 40(2), 366–379. https://doi.org/10.1108/PIJPSM-03-2016-0045

Andersen, K., & Harsell, D. M. (2005). Assessing the Impact of a Quantitative Skills Course for Undergraduates. Journal of Political Science Education, 1(1), 17-27. https://doi.org/10.1080/15512160490921824

Ashby, M. (2023). crimemappingdata: Data for Learning Crime Mapping. R package version 0.3.0.

Baird, A. (2019). Breaking bad: Interviews with locals and experts on crime, violence and gender in Port of Spain, Trinidad 2017-2018. [Data Collection]. Colchester, Essex: UK Data Service. https://doi.org/10.5255/UKDA-SN-853648

Bennett, R. R. (2004). Comparative Criminology and Criminal Justice Research: The State of our Knowledge. Justice Quarterly, 21, 1-22. https://doi.org/10.1080/07418820400095721

Bergman, M. (2018). More money, more crime: Prosperity and rising crime in Latin America. Oxford University Press.

Berthelot, E. R., McNeal, B. A., & Baldwin, J. M. (2018). Relationships between agency-specific contact, victimization type, and trust and confidence in the police and courts. American Journal of Criminal Justice, 43, 768-791. https://doi.org/10.1007/s12103-018-9434-x

Bradford, B., & Jackson, J. (2010). Trust and Confidence in the Police: A Conceptual Review. National Policing Improvement Agency Wiki. Retrieved from: https://ssrn.com/abstract=1684508

Buckley, J., Brown, M., Thomson, S., Olsen, W., & Carter, J. (2015). Embedding Quantitative Skills into the Social Science Curriculum: Case Studies from Manchester. International Journal of Social Research Methodology, 18(5), 495-510. https://doi.org/10.1080/13645579.2015.1062624

Bui, L., & Farrington, D. P. (2019). Crime in Japan: A psychological perspective. Palgrave Macmillan.

Buil-Gil, D., Trajtenberg, N., & Aebi, M. F. (2024). Measuring Cybercrime and Cyberdeviance in Surveys. In Routledge Handbook of Online Deviance. Routledge.

Bureau of Democracy, Human Rights, and Labor (2022). 2022 Country Reports on Human Rights Practices: Mexico. U.S. Department of State Reports. https://www.state.gov/reports/2022-country-reports-on-human-rights-practices/mexico/ (Last accessed 01/01/2024)

Chamberlain, M. J., Hillier, J., & Signoretta, P. (2015). Counting Better? An Examination of the Impact of Quantitative Method Teaching on Statistical Anxiety and Confidence. Active Learning in Higher Education, 16(1), 51-66. https://doi.org/10.1177/1469787414558983

Curtis-Ham, S., Tompson, L., & Czarnomski (2024). Forewarned is Forearmed: The Hidden Curriculum of Working with Police Crime Data. In L. Huey and D. Buil-Gil (Eds.), The Crime Data Handbook (pp. 9-22). Bristol: Bristol University Press.

Dewey, J. (1916). Democracy and Education. Teddington: Echo Library.

Eick, G. M., Larsen, E. G., Geiger, B. B., & Sundberg, T. (2021). Beyond the Numbers: The Impact of Quantitative Teaching on Overall Student Performance. Journal of Political Science Education, 17(sup1), 693-702. https://doi.org/10.1080/15512169.2021.1897603

Eisner, M. (2023). Towards a global comparative criminology. In A. Liebling, S. Maruna & L. McAra (Eds.), The Oxford Handbook of Criminology (pp. 75-98). Oxford: Oxford University Press.

Groff, E. R., & Haberman, C. P. (2023). Understanding Crime and Place: A Methods Handbook. Philadelphia: Temple University Press.

Hammami, R., Hussein, A., & Mezlini, I. (2015). Afrobarometer Round 6: The Quality of Democracy and Governance in Algeria, 2015. Inter-university Consortium for Political and Social Research [distributor], 2017-10-30. https://doi.org/10.3886/ICPSR36644.v1

Harlow, L. L. (2013). Teaching Quantitative Psychology. In T. D. Little (Ed.), The Oxford Handbook of Quantitative Methods (pp. 105-117). New York: Oxford University Press.

Jones, N. (2013). The unintended consequences of kingpin strategies: kidnap rates and the Arellano-Félix Organization. Trends in Organized Crime, 16, 156–176. https://doi.org/10.1007/s12117-012-9185-x

Kääriäinen, J., & Sirén, R. (2011). Trust in the police, generalized trust and reporting crime. European Journal of Criminology, 8(1), 65-81. https://doi.org/10.1177/1477370810376562

Kaplan, J. (2022). Crime by the Numbers: A Criminologist’s Guide to R. Boca Raton: CRC Press.

Karstedt, S. (2012). Comparing Justice and Crime across Cultures. In D. Gadd, S. Karstedt, & S. F. Messner (Eds.), The SAGE Handbook of Criminological Research Methods (pp. 373–390). Sage.

Koh, D., & Zawi, M. K. (2014). Statistics Anxiety among Postgraduate Students. International Education Studies, 7(13), 166-174.

Liu, S., Onwuegbuzie, A. J., & Meng, L. (2011). Examination of the Score Reliability and Validity of the Statistics Anxiety Rating Scale in a Chinese Population: Comparisons of Statistics Anxiety Between Chinese College Students and their Western Counterparts. Journal of Educational Enquiry, 11(1), 29-42.

MacInnes, J. (2014). Teaching Quantitative Methods. Enhancing Learning in the Social Sciences, 6(2), 1-5. https://doi.org/10.11120/elss.2014.00038

MacInnes, J., Breeze, M., de Haro, M., Kandlik, M., & Karels, M. (2016). Measuring Up: International Case Studies on the Teaching of Quantitative Methods in the Social Sciences. London: The British Academy.

Massa, R., & Fondevila, G. (2021). Criminal displacement in Mexico City’s metropolitan area: The case of kidnapping. International Journal of Law, Crime and Justice, 67, 100479. https://doi.org/10.1016/j.ijlcj.2021.100479

Medina Ariza, J., & Solymosi, R. (2023). Crime Mapping and Spatial Data Analysis using R. Boca Raton: CRC Press.

Meguid, E. A., & Collins, M. (2017). Students’ Perceptions of Lecturing Approaches: Traditional Versus Interactive Teaching. Advances in Medical Education and Practice, 8, 229-241. https://doi.org/10.2147/AMEP.S131851

Murray, J., Shenderovich, Y., Gardner, F., Mikton, C., Derzon, J. H., Liu, J., & Eisner, M. (2018). Risk Factors for Antisocial Behavior in Low- and Middle-Income Countries: A Systematic Review of Longitudinal Studies. Crime and Justice, 47(1), 255-364.

Nivette, A. E. (2021). Exploring the Availability and Potential of International Data for Criminological Study. International Criminology, 1, 70–77. https://doi.org/10.1007/s43576-021-00009-y

Oberwittler, D. (2019). Lethal Violence: A Global View on Homicide. In Oxford Research Encyclopedia of Criminology and Criminal Justice (pp. 1–58). Oxford University Press. https://doi.org/10.1093/acrefore/9780190264079.013.402

Ochoa, R. (2012). Not just the rich: New tendencies in kidnapping in Mexico City. Global Crime, 13(1), 1-21. https://doi.org/10.1080/17440572.2011.632499

Ochoa, R. (2019). Intimate crimes: Kidnapping, gangs and trust in Mexico. Oxford: Oxford University Press.

Oldstone-Moore, J. (2023). The Oxford Handbook of Confucianism. Oxford: Oxford University Press. https://doi.org/10.1093/oxfordhb/9780190906184.001.0001

Olutola, A. A., & Bello, P. O. (2016). An exploration of the factors associated with public trust in the South African Police Service. International Journal of Economics and Finance Studies, 8(2), 219-236.

Ouassini, N., & Ouassini, A. (2020). Criminology in the Arab World: Misconceptions, Nuances and Future Prospects. The British Journal of Criminology, 60(3), 519–536. https://doi.org/10.1093/bjc/azz067

Payne, G. (2014). Surveys, Statisticians and Sociology: A History of (a Lack of) Quantitative Methods. Enhancing Learning in the Social Sciences, 6(2), 74-89. https://doi.org/10.11120/elss.2014.00028

R Core Team (2023). R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing.

Ralston, K., MacInnes, J., Crow, G., & Gayle, V. J. (2016). We Need to Talk about Statistical Anxiety. A Review of the Evidence around Statistical Anxiety in the Context of Quantitative Methods Pedagogy. NCRM Working Paper.

Rosemberg, C., Allison, R., De Scalzi, M., Krčál, M., Bryan, B., Farla, K., Dobson, C., Cimatti, R., Wain, M., & Jávorka, Z. (2022). Evaluation of the Q-Step Programme: Final Report. London: Nuffield Foundation.

Sheptycki, J. (2008). Transnationalisation, orientalism and crime. Asian Journal of Criminology, 3(1), 13–35. https://doi.org/10.1007/s11417-008-9049-0

Sherman, L. W. (2002). Trust and confidence in criminal justice. National Institute of Justice Journal, 248, 22-31.

Signoretta, P., Chamberlain, J. M., & Hillier, J. (2014). ‘A Picture Is Worth 10,000 Words’: A Module to Test the ‘Visualization Hypothesis’ in Quantitative Methods Teaching. Enhancing Learning in the Social Sciences, 6(2), 90-104. https://doi.org/10.11120/elss.2014.00029

Skinner, B. F. (1971). Beyond Freedom & Dignity. Indianapolis: Hackett Publishing.

Stockemer, D. (2019). Quantitative Methods for the Social Sciences: A Practical Introduction with Examples in SPSS and Stata. Cham: Springer.

The British Academy (2015). Count Us In: Quantitative Skills for a New Generation. London: The British Academy.

Tyler, T. R. (2006). Psychological perspectives on legitimacy and legitimation. Annual Review of Psychology, 57, 375–400. https://doi.org/10.1146/annurev.psych.57.102904.190038

UNCTAD (2022). Handbook of Statistics 2022. New York: United Nations Publications.

UNODC (n.d.). Atlas on Crime Victimization Surveys. Center for Data and Evaluation UNODC. https://www.cdeunodc.inegi.org.mx/index.php/atlas-on-cvs/ (Last accessed 01/01/2024)

UNODC (2023). Global Study on Homicide 2023. Understanding Homicide. Vienna: UNODC.

Vygotsky, L. (1930). Mind in Society. Cambridge: Harvard University Press.

Warnes, G. R., Bolker, B., Lumley, T., & Johnson, R. C. (2019). gmodels: Various R Programming Tools for Model Fitting. R package version 2.18.1.1. https://CRAN.R-project.org/package=gmodels

Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.

Wickham, H., & Miller, E. (2020). haven: Import and Export 'SPSS', 'Stata' and 'SAS' Files. R package version 2.5.4. https://CRAN.R-project.org/package=haven

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Muller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., Takahashi, K., Vaughan, D., Wilke, C., Woo, K., & Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686

Wickham, H., Hester, J., François, R., Bryan, J., Bearrows, S., Posit Software, PBC, Jylänki, J., & Jørgensen, M. (2024). readr: Read Rectangular Text Data. R package version 2.1.5. https://cran.r-project.org/web/packages/readr/index.html

Williams, M., Payne, G., Hodgkinson, L., & Poade, D. (2008). Does British Sociology Count?: Sociology Students’ Attitudes toward Quantitative Methods. Sociology, 42(5), 1003-1021. https://doi.org/10.1177/0038038508094576

Williams, M., Sloan, L., & Brookfield, C. (2019). The quantitative crisis in UK sociology. In J. Evans, S. Ruane & H. Southall (Eds.), Data in Society: Challenging Statistics in an Age of Globalisation (pp. 337-348). Bristol: Bristol University Press.

Williams, M., & Sutton, C. (2011). Challenges and Opportunities for Developing Teaching in Quantitative Methods. In G. Payne & M. Williams (Eds.), Teaching Quantitative Methods: Getting the Basics Right (pp. 66-84). London: SAGE.

Wooditch, A., Johnson, N. J., Solymosi, R., Medina Ariza, J., & Langton, S. (2022). A Beginner’s Guide to Statistics for Criminology and Criminal Justice Using R. Cham: Springer.

Zhang, J., & Liu, J. (2023). Asian Criminology: Its Contribution in Linking Global North and South. International Annals of Criminology, 61(3-4), 223-242. https://doi.org/10.1017/cri.2023.22


[1] As highlighted in the two seminal British Academy reports ‘Count Us In’ (The British Academy, 2015) and ‘Measuring Up’ (MacInnes et al., 2016), students’ statistics anxiety is a greater problem in the UK than in other countries that place quantitative skills at the core of their degree curricula. In the UK, one of the most significant initiatives aimed at improving quantitative analytical skills among social science students is the Q-Step Programme, a £19.5 million project funded by the Nuffield Foundation, the Economic and Social Research Council, and the Higher Education Funding Council for England, involving 18 universities in the United Kingdom (Williams et al., 2019).

[2] Data from Taiwan were unavailable in this dataset.
