
Open Science for Theoretical Work: A Document Analysis Case Study of Social Disorganization Theory

Published on Aug 13, 2024

Abstract

Growing concerns about the reproducibility of scientific findings have led to efforts to enhance the transparency and rigor of criminological research. Improving the accuracy and precision of theory development and adopting open science practices, such as open data, are widely considered effective means to achieve these ends. However, current open science practices primarily target quantitative methodologies and tend to ignore qualitative methods, like document analysis, due to perceived insurmountable obstacles. Since document analysis is often crucial for constructing accurate and precise theories, the inability to integrate open science practices with qualitative research methods undermines the effectiveness of these solutions to the replication crisis. In this case study, we use MAXQDA qualitative software to simulate a method of open science applicable to qualitative document analysis in a purposive sample of n=15 empirical social disorganization research articles. Preliminary findings indicate that using qualitative software packages is an effective way to increase the transparency and rigor of data coding for theory development, thus integrating the benefits of open science and open data with theory construction. We discuss the utility, limitations, and implications of using this approach.

INTRODUCTION

The replication crisis, characterized by the troubling inability of researchers to reproduce results from published scientific studies, has already profoundly affected the field of psychology and is now spreading to other disciplines, such as criminology (see, e.g., Chin et al. 2023; Niemeyer et al. 2022; Pridemore, Makel, and Plucker 2018). The diffusion of the crisis has presented significant challenges, such as eroding public trust in science and diminishing the role of scientists as reliable sources for evidence-based policy and practice (Korbmacher et al. 2023; see also Hendriks, Kienhues, and Bromme 2020). Researchers such as Niemeyer et al. (2022) and Pridemore, Makel, and Plucker (2018) have begun investigating the replication crisis in criminology. Early findings from this research align with studies in psychology, emphasizing the importance of strong theoretical frameworks, hypotheses, rigorous methodology, and robust analyses. Furthermore, these studies highlight the need to minimize researcher bias and questionable research practices (QRPs) that can lead to false positive results (Korbmacher et al. 2023; see also Hendriks, Kienhues, and Bromme 2020).

To address the replication crisis, psychology has embraced the open science movement, including open data and study preregistration (Finkel, Eastwick, and Reis 2017; Haeffel 2022). Although open science practices have generally improved reproducibility in quantitative research (Sotola and Credé 2022), there are significant concerns about applying these practices to qualitative research. These concerns include issues related to obtaining participant consent, ensuring anonymity and confidentiality, the context-specific nature of qualitative data, and the distinctive role that qualitative researchers play in data generation (Bucerius and Copes 2024; Chauvette, Schick-Makaroff, and Molzahn 2019; Elman and Kapiszewski 2014; Mannheimer et al. 2019; Steltenpohl et al. 2023). As a result, the qualitative research community has pushed back against the implementation or adaptation of open science practices (Huma and Joyce 2023; Bucerius and Copes 2024). However, some have proposed adaptations to the existing quantitative-centered open science practices to bolster the robustness of qualitative findings using specialized data repositories or tiered access to data (Elman and Kapiszewski 2014; Jones and Alexander 2018; Mannheimer et al. 2019; Steltenpohl et al. 2023).

Despite the valid concerns qualitative researchers have raised, they may not apply equally to all types of qualitative data. Some qualitative methodologies, such as the use of document analysis when constructing theory, could strongly benefit from adopting open science practices without experiencing the ethical and practical concerns surrounding other, more sensitive types of qualitative data. Since theory is crucial in addressing the replication crisis, adopting open science practices in this domain would be particularly significant. Using a qualitative software package such as MAXQDA allows for transparency in coding and in extracting coded segments from texts when constructing theory, which is analogous to making a partial quantitative data set available for examination. Given the context of the replication crisis in the social sciences and the key role of robust theory in the derivation chain (Ducate et al. 2024), it is crucial for pure theoretical work to engage with open science practices.

In this paper, we first describe the aims of criminological research and the limits imposed by the mode of science, as well as the contribution of theory and methods to the larger replication crisis. Next, we examine the adoption of open science practices and their limitations in addressing the replication crisis and situate the replication crisis in criminology. The current study argues that theoretical research is a type of qualitative research (i.e., document analysis) and presents a case study applying open science practices and data sharing to this subset of qualitative research. We examine the benefits of adopting an open science approach and carefully consider uniquely qualitative concerns with data sharing and their implications in the case study. We conclude with a discussion of the broader implications of adopting open science methodologies for theoretical data and acknowledge that these recommendations are limited to specific non-sensitive subsets of qualitative data.

BACKGROUND

THE REPLICATION CRISIS AND THE CONTRIBUTION OF THEORY AND METHODS

In recent years, several well-known problems inherent to the hypothetico-deductive method and null hypothesis significance testing (HDNHST) have been implicated as major contributors to the so-called replication crisis (Borsboom et al. 2021; Eronen and Romeijn 2020). Here, the replication crisis refers to the widespread problem in scientific research where an alarming number of studies, particularly in psychology (Open Science Collaboration 2015; Simmons, Nelson, and Simonsohn 2011; Vazire, Schiavone, and Bottesini 2022) and biomedical science (Kozlov 2022; Rodgers and Collings 2021), cannot be replicated or reproduced by other researchers, casting doubt on their reliability. Consequently, researchers across the sciences—albeit largely outside of criminology—have worked hard to identify and address how HDNHST contributes to the replication crisis (see, e.g., Finkel, Eastwick, and Reis 2017; Hoekstra and Vazire 2021; Scheel 2022; Scheel et al. 2021).

Most efforts to understand the relationship between HDNHST and the replication crisis have focused on the role of questionable methodological factors, such as p-hacking (Chin et al. 2023; Guest and Martin 2021; Protzko et al. 2023; Wooditch et al. 2020), hypothesizing after the results are known (i.e., HARKing) (Chin et al. 2023; Guest and Martin 2021; Van Bavel et al. 2024), insufficient sample size and power (Barnes et al. 2020; Brunner and Schimmack 2020; Claesen et al. 2022; Protzko et al. 2023; Van Bavel et al. 2024), selective reporting (i.e., the file drawer problem) (Adler, Röseler, and Schöniger 2023; Pridemore, Makel, and Plucker 2018), lack of proper training in the methods employed (Scheel et al. 2021), and fraud (Chin et al. 2023; Fanelli 2009). However, it is now understood that weak, poorly developed theory also contributes significantly to non-reproducibility in several ways (Borsboom et al. 2021; Bringmann, Elmer, and Eronen 2022; Eronen and Bringmann 2021; Oberauer and Lewandowsky 2019). First, theory is critical to formulating the null and alternative hypotheses, ensuring they are meaningful, relevant, and based on existing knowledge or assumptions. Weak theory undermines HDNHST by generating equally weak hypotheses (Combs 2010; Mearsheimer and Walt 2013; Oberauer and Lewandowsky 2019). For example, a weak alternative hypothesis might merely indicate “a relationship,” whereas a stronger theory will specify the strength and direction of the hypothesized relationship.

Second, theory provides insight into the nature of the studied variables, their expected relationships, and the complex ways they might interact. This understanding is crucial for designing experiments or studies that can effectively test the hypotheses. Weak theory undermines HDNHST through poorly defined phenomena, difficulties with construct validity, and challenges with determining causality (Eronen and Bringmann 2021; Mearsheimer and Walt 2013; Oberauer and Lewandowsky 2019). Third, a strong theoretical basis guides the choice of appropriate statistical methods and tests. Different data types and research questions require different statistical approaches, and theory helps choose the most suitable ones (Borrego, Douglas, and Amelink 2009; Gardenier and Resnik 2002; Jackson and Kuha 2016; Ritter 2022). Finally, NHST's effectiveness is influenced by the strength of the relationship under study and the sample size—i.e., the likelihood of rejecting the null hypothesis increases with stronger relationships and larger sample sizes. This relationship is important because it underscores the need for a sound theoretical basis to determine the expected strength of the relationship and the appropriate sample size (Brick et al. 2016; W. Velicer et al. 2013; W. F. Velicer et al. 2008). By failing to specify the predicted value of either, a weak theory allows researchers to justify a possibly unnecessarily large sample size that may make arbitrary effect sizes significant (Combs 2010; Guo, Straub, and Zhang 2014; Smith and Little 2018).
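To make this dependence concrete, consider how the sample size required by a conventional two-sample test shrinks as the theoretically expected effect size grows. The brief sketch below is illustrative only; the effect sizes are arbitrary benchmarks rather than values derived from any criminological theory, and the calculation simply uses the statsmodels power module.

```python
# Illustrative only: required sample size per group for a two-sample t-test
# at alpha = .05 and 80% power, across hypothetical standardized effect sizes.
from statsmodels.stats.power import TTestIndPower

solver = TTestIndPower()
for effect_size in (0.2, 0.5, 0.8):  # small, medium, large (Cohen's d), purely illustrative
    n = solver.solve_power(effect_size=effect_size, alpha=0.05, power=0.80)
    print(f"d = {effect_size}: approximately {n:.0f} cases per group")
```

A theory that predicts only "a relationship" leaves the expected effect size, and therefore the appropriate sample size, unspecified; a theory that commits to an expected strength constrains both.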

ADDRESSING THE REPLICATION CRISIS THROUGH OPEN SCIENCE PRACTICES

Open science research practices are often presented as solutions to the underlying causes of the reproducibility crisis. These practices include a broad range of recommendations—and, increasingly, prescriptions (Bahlai et al. 2019)—such as registered reports and preregistration; open data, protocols, materials, software, and code; open access publications and preprints; and open evaluation and transparent peer review (Center for Open Science n.d.). Advocates ranging from individual scientists (Allen and Mehler 2019; McKiernan et al. 2016) to major research institutes and universities (Christian and Wetterberg 2023; KU Leuven 2023), professional associations (Committee on Toward an Open Science Enterprise et al. 2018), and government agencies (NASA n.d.; National Institutes of Health n.d.; UNESCO 2021) argue that open science research practices improve scientific rigor by reducing publication bias, promoting accountability, minimizing the likelihood of data manipulation and selective reporting, and verifying findings to identify errors that might otherwise go unnoticed. Recently, a pilot program has begun paying specialists ~$1,100 USD to evaluate highly cited social and behavioral science papers for errors, with a bonus of up to ~$2,800 USD for any errors they identify (Elson 2024). Should the adoption of these recommendations, prescriptions, benefits, and incentives continue to spread, it is likely that open science practices will become the norm for how scientific research is performed.

THE LIMITS OF OPEN SCIENCE PRACTICES

Adopting open science practices has significantly improved the reproducibility of published quantitative research findings (Flaxman et al. 2020). However, within the qualitative community, there are legitimate questions regarding the functionality of open science practices and a unique set of concerns in addition to those shared with quantitative data (Bucerius and Copes 2024; Huma and Joyce 2023; Khan, Hirsch, and Zeltzer-Zubida 2024). Within the field of criminology, Bucerius and Copes (2024) wrote a strong response piece to the new open data requirements in Criminology (Sweeten et al. 2024), noting that qualitative open data presents specific challenges, such as restricting knowledge, inequities, ethical issues, and damage to the careers of early career scholars in particular, concerns that are shared by the larger qualitative community. Infrastructure challenges are not unique to qualitative data but are perhaps more complex for this type of data. Wilkinson et al. (2016) generated the “FAIR” guidelines for open data, suggesting that shared data should be findable, accessible, interoperable, and reusable. Metadata is a significant portion of the qualitative open data concerns, as it needs to be inclusive enough to contextualize the data in order to meet the reusable and interoperable standards set by Wilkinson et al. (2016), which is a time-consuming process (Jones and Alexander 2018).

In terms of data, qualitative data components may be extensive and varied. For example, data may include field notes, jottings, transcripts, reflections, etc., that capture both the analysis and the process (Chauvette, Schick-Makaroff, and Molzahn 2019; Steltenpohl et al. 2023). Additionally, the data are viewed as co-created by the researcher and the research participants and are not comprehensively captured in transcripts (Bucerius and Copes 2024; Chauvette, Schick-Makaroff, and Molzahn 2019; Mannheimer et al. 2019). Context is key to interoperability and reusability, but data are often difficult to deidentify while preserving the exact cultural situations surrounding a study and the connections within the data (Bucerius and Copes 2024; Chauvette, Schick-Makaroff, and Molzahn 2019; Khan, Hirsch, and Zeltzer-Zubida 2024; Mannheimer et al. 2019). Additionally, if data are accessible, it may impact the openness and willingness of research participants to provide quality responses, particularly when discussing sensitive information (Bucerius and Copes 2024; Khan, Hirsch, and Zeltzer-Zubida 2024).

Qualitative data sharing encounters unique ethical concerns, including issues surrounding participant consent and participant confidentiality/anonymity (Chauvette, Schick-Makaroff, and Molzahn 2019; Jones and Alexander 2018; Mannheimer et al. 2019; Steltenpohl et al. 2023). Research participants are asked for informed consent prior to participating in research. This consent, however, is limited to the original study and does not address secondary data analysis (Chauvette, Schick-Makaroff, and Molzahn 2019; Jones and Alexander 2018). Institutional Review Boards (IRBs) may not consider the potential risks or harms to participants during secondary data analysis, meaning that without participant consent, there is no mechanism to protect research participants (Jones and Alexander 2018). Depending on the sensitivity of the data, IRBs may not approve data sharing for some projects (Bucerius and Copes 2024). Maintaining participant confidentiality or anonymity is another key ethical concern with qualitative data sharing. This concern is challenging to balance with the need to preserve key contextual information in the data. Specifically, data must be carefully de-identified to protect participant confidentiality or anonymity, after which the data may be too redacted to be useful (Chauvette, Schick-Makaroff, and Molzahn 2019). Given the typically small sample size and richly detailed data on participant experiences, participants may still be identified even after omitting some information (Chauvette, Schick-Makaroff, and Molzahn 2019). Bucerius and Copes (2024) also note that the challenges of secondary data apply to open qualitative data. Specifically, it is possible that, if a complete qualitative data set is available, other research groups will publish on the findings before the team that collected the data. This risk is perhaps most acute for early career qualitative scholars who have collected data, given that established scholars have the resources and capacity to quickly write and publish on available secondary data.

Consequently, open science practices must be carefully adapted for use in qualitative research. Additionally, the core goals of qualitative research differ in terms of generalizability. Given the frequent use of small, non-representative samples, specific findings cannot be generalized to a larger population or even other populations. This negates the purpose of conceptual replication unless it is adapted to consider the generalizability of broad patterns (Khan, Hirsch, and Zeltzer-Zubida 2024).

ADDRESSING THE LIMITS OF OPEN SCIENCE PRACTICES

As the replication crisis continues to grow, there is a collective effort to develop open science practices that address these limitations. Some examples are standardization and new techniques to incorporate open science practices into qualitative research. Standardization is complicated by the differing methodologies available for replication studies. Whether the design is unconstrained, constrained, or a direct replication will affect the power and ultimate success of the study (Hedges and Schauer 2021). Once the study's design has been decided, differing populations, settings, and unforeseen confounds can create replication failure (Wong, Anglin, and Steiner 2022). While open science preregistration minimizes the effects of HARKing (hypothesizing after results are known), ensuring adequate statistical power is still necessary. Wong, Anglin, and Steiner (2022) propose using the causal replication framework (CRF) to standardize assumptions and estimates within replications. This method constrains the amount of variation introduced into a direct replication to ensure quality replications that are standardized across methods (Wong, Anglin, and Steiner 2022). Part of this may include using the same measurement instruments to standardize measurement across studies. This still requires validity checks to ensure that the same constructs are measured across separate samples or populations.

Qualitative analysis techniques pose additional challenges to replicability and, thus, have largely been absent from open science discussion and implementation (Huma and Joyce 2023). When gathering participant data, participants consent only to participate in the original study, so allowing others access for replication or secondary data analysis may be problematic (Chauvette, Schick-Makaroff, and Molzahn 2019; Huma and Joyce 2023; Jones and Alexander 2018; Kern and Mustasilta 2023). Despite the challenges in obtaining consent and maintaining confidentiality, Enriquez (2024) found that participants in a carefully structured qualitative interview were excited for their interview transcripts to be openly available and viewed their open data as a societal benefit and a protection against out-of-context quotes. Preregistration for qualitative research does exist but does not accommodate all types of qualitative projects (Huma and Joyce 2023). For example, Haven et al. (2020) developed new preregistration formats for qualitative studies using a Delphi study design, but they note the additional forms are not inclusive of all types of qualitative research. Further, other techniques are often not described in open science, requiring researchers to develop their own strategies for replication, transparency, and reflexivity (Huma and Joyce 2023). Interaction analysis (IA) is one example of an adaptive strategy, as it includes detailed transcript extracts so readers (and reviewers) can verify claims made by the authors (Huma and Joyce 2023). Secondary qualitative data analysis (SQDA) reanalyzes data collected by others in previous studies. SQDA methods often require data recoding, as different researchers will code data differently, and still require understanding the context of the original study, which demands transparency from the original data (Kern and Mustasilta 2023). Overall, despite work on adapting open science methods for qualitative research, a substantial amount of work remains, as these problems are only beginning to be addressed.

THE REPLICATION CRISIS IN CRIMINOLOGY

Although criminologists have been slow to evaluate the credibility of their field’s published research findings, recent publications are beginning to force the issue. For example, a series of whistleblower complaints, stand-alone studies, and special issues have identified several instances of research fraud (e.g., Pickett 2020; Schumm et al. 2023), failed replications (e.g., Christ et al. 2018; McNeeley and Warner 2015), and questionable research practices known to generate false positives (Barnes et al. 2020; Chin et al. 2023; Pridemore, Makel, and Plucker 2018; Sweeten 2020). Akin to the theory crisis in psychology (Eronen and Bringmann 2021; Scheel et al. 2021), Ducate et al. (2024) and Proctor et al. (2024) assert that criminology also exhibits the warning signs of weak theory that drive failures in replication. According to Proctor and Niemeyer's (2020) critique, Akers' (2009) social learning theory of crime is deeply flawed in its understanding of how declarative memory is acquired, an inaccuracy they conclude casts serious doubt on the validity of the theory's research findings. Inspired by the above findings, Niemeyer et al. (2022) conducted a series of Bayesian simulations to estimate the false positive rate of published criminological research due to the interaction of questionable research practices and poor theory. Based on reasonable parameter estimates, their simulations suggest that most criminological findings are likely unreliable. In arguably the most pragmatic example, Stevenson (2023) found that most interventions evaluated in criminology with randomized controlled trials (RCTs) have few, if any, lasting impacts. In concluding her study, Stevenson (2023) explicitly notes that RCTs face serious constraints in measuring the impact of interventions on complex human behavior, and their limitations need to be recognized.

Many criminologists and non-criminologists are now working to institutionalize open science practices within the field. For example, in 2024, the incoming editorial team of Criminology revealed a three-step plan to encourage data and code sharing for the articles they publish that use quantitative and qualitative data (Sweeten et al. 2024). The announcement also clarifies that although the journal's data and code-sharing policy is currently voluntary, it will eventually become mandatory once any initial challenges have been overcome (ibid., p. 10). Additionally, CrimRxiv, an “online repository and hub,” was founded in 2020 to advance open access to criminological research worldwide (University of Manchester 2023). The platform has since “freely shared over 2,000 publications with nearly 230,000 views from more than 112,000 readers in 209 countries” (University of Manchester 2023). Given the current and future opportunities and incentives for open-science criminological research, there is an urgent need for new practices that can address not only the general limitations of open science but also the specific limitations more common to the field of criminology and to the theoretical work needed to address the replication crisis.

A CASE STUDY OF OPEN DATA FOR DOCUMENT ANALYSIS

The challenge of adapting open science practices to meet the needs of qualitative research is arguably worth the effort given the benefits.1 First, governments, funding agencies, and publication pipelines are requesting open science practices and open data (Chauvette, Schick-Makaroff, and Molzahn 2019; Jones and Alexander 2018). Mechanisms that enable ethical qualitative data sharing open new funding streams and publication outlets with open data requirements to qualitative researchers and improve the accessibility of qualitative research. The descriptive benefits of qualitative open data stem from the richness of the data itself and its ability to deepen understanding of participants beyond the capabilities of quantitative data (Jones and Alexander 2018). The detail within the data makes it possible to answer new and detailed research questions that quantitative data cannot address. The material benefits of open data for qualitative research are twofold: open data reduces start-up costs (e.g., time and funding) for qualitative researchers, who can perform secondary data analysis, and reduces the research burden on individuals and communities frequently engaged as research participants (Jones and Alexander 2018). Reducing the start-up costs associated with engaging in rigorous qualitative research lowers a barrier to entry for early career academics and graduate students, who can analyze secondary qualitative data. Additionally, reducing the research burden on frequently studied communities and individuals helps alleviate research fatigue.

The secondary analysis of qualitative data opens up the possibility of answering new questions with an existing data set or validating the original study's findings (Heaton 2008). Repositories, including the Qualitative Data Repository (QDR), the Inter-University Consortium for Political and Social Research (ICPSR), and Qualidata in the United Kingdom, are answering the increased demand for available qualitative data for secondary analysis (Heaton 2008; Jones and Alexander 2018; Mannheimer et al. 2019).

Transparency, or the ability to assess claims and findings in research, has become crucial against the backdrop of a replication crisis in the social sciences and is a key benefit of open science (T. Haven et al. 2022; Steltenpohl et al. 2023). Transparency through open data allows reviewers, editors, and readers to validate and replicate researcher statements and findings and increases confidence in the robustness of stated findings (Elman and Kapiszewski 2014; T. Haven et al. 2022; Jones and Alexander 2018). Elman and Kapiszewski (2014) argue that open data in qualitative research will showcase the robust, reflexive process already built into qualitative research and thereby increase appreciation of the importance and value of qualitative work. Open data and transparency also address the criticism that qualitative research is often opaque in its reporting of data and process, by clearly documenting that appropriate research procedures were followed and explaining how subsets of data are used to support claims made in the analysis (Elman and Kapiszewski 2014).

Given the questionable state of theoretical robustness, addressing the replication crisis in criminology requires effective open science practices that are compatible with the qualitative methods employed in theory construction. Toward this end, we present an open science research approach for pure theoretical work and other types of document analysis to promote transparency and provide an “open data set” similar to quantitative research. Document analysis is a growing type of qualitative analysis that examines data collected from a variety of text-based documents (Bowen 2009; Wood, Sebar, and Vecchio 2020). Document analysis yields a qualitative data set of text extractions and quotes that are thematically coded and organized during the analysis process (Bowen 2009; Wood, Sebar, and Vecchio 2020).

This methodology can identify weak theories in published criminological articles, evaluate their potential contributions to generating non-reproducible findings, and publicly disclose the raw data, analysis code, and other research materials employed. Using MAXQDA or other qualitative software packages (QSPs) for document analysis is similar to the currently accepted use of QSPs to extract information when writing literature reviews (Bandara et al. 2015; Gizzi and Harm 2021; O’Neill, Booth, and Lamb 2018; Onwuegbuzie, Leech, and Collins 2012). We provide a case study of theoretical evaluation using a purposive sample of social disorganization articles. Through this methodology, readers of future theoretical assessments can achieve the same methodological understanding and engage in the same immediate peer replication and false-positive assessment as with other forms of validity assessment.

DATA

For the case study demonstrating the utility of MAXQDA for tagging and coding social disorganization theory articles, a sample of 15 empirical articles was selected using purposive sampling (see Appendix A for the full article list). Morgan (2022) notes that purposive sampling is typically the starting point for qualitative thematic analysis of documents. These articles were selected due to their inclusion on the University of Cincinnati comprehensive exam reading list for crime prevention in the community and environmental criminology. The original sample contained 39 items; 24 items were dropped because they were books or were not empirical, as the aim was to examine the empirical use of theoretical constructs. The final sample of articles was N=15.

METHODOLOGY   

We followed the steps of document analysis described by Onwuegbuzie, Leech, and Collins (2012). Once the articles were uploaded into MAXQDA 22 (VERBI Software 2023), the research team began constructing the codebook. The codebook was created using a mixed (inductive and deductive) approach and included major and minor headings for all items typically present in empirical articles, including variable names, operationalization, descriptive statistics, data source(s), analysis, model fit statistics, coefficients, etc. See Figure 1 for an excerpt from the codebook.

Figure 1. Excerpt from MAXQDA Codebook

The variable name and operationalization codes were preloaded with the three key concepts in social disorganization theory: poverty, racial and ethnic heterogeneity, and residential stability. The first author generated known operationalizations of these key concepts (e.g., for poverty, SES, female-headed households, percent below the poverty line, etc.). Once the initial codebook was completed, the research team coded a few test articles. Missing operationalizations were added to the codebook as needed for completeness. When this process was complete, the full sample of social disorganization theory articles was coded for variable names and operationalizations of the key social disorganization concepts using MAXQDA. After the articles were coded, code retrievals were generated. Figure 2 displays a portion of the code retrieval. The codes are retrieved across the 15 articles, allowing for easy comparison.
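For readers unfamiliar with how such coded segments look outside the MAXQDA interface, the sketch below illustrates the basic data structure and a simple code retrieval. It is a hypothetical representation only; the field names and the retrieval function are our own assumptions and do not mirror MAXQDA's internal data model or export format.

```python
# Illustrative sketch of a coded-segment record and a simple code retrieval.
# Field names and sample values are hypothetical, not a MAXQDA export schema.
from dataclasses import dataclass

@dataclass
class CodedSegment:
    article_id: str    # hypothetical identifier for one of the sampled articles
    code_path: str     # hierarchical code, e.g., "Poverty/Concentrated disadvantage"
    segment_text: str  # the extracted sentence describing the operationalization

segments = [
    CodedSegment("ArticleA", "Poverty/SES", "Neighborhood SES was measured as ..."),
    CodedSegment("ArticleB", "Poverty/Concentrated disadvantage", "Disadvantage was an index of ..."),
]

def retrieve(segments, code_prefix):
    """Return all coded segments filed under a given heading, analogous to
    retrieving a code across the full sample of articles."""
    return [s for s in segments if s.code_path.startswith(code_prefix)]

for s in retrieve(segments, "Poverty"):
    print(s.article_id, "|", s.code_path, "|", s.segment_text)
```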

DATA TRANSPARENCY 

As discussed, the data components in qualitative data sets can be complex. In this case study, the data comprise the code system and the extracted coded text (i.e., the sentence that describes the operationalization of each key theoretical construct measured and the variable name). While Figures 1 and 2 display a partial codebook and data extraction, the full data set is available in Appendices B and C.

 

Figure 2. Excerpt from MAXQDA Code Retrieval for Social Disorganization Data Sources 

Increasingly, funding agencies and journals are requiring open data (Morgan 2022). Theoretical data, whether pure theoretical work or theoretical coding of existing textual data as in this case study's document analysis, is a subset of qualitative data that can adhere to open science and open data policies with minimal concerns. In the theoretical coding of existing published textual data, the concerns about participant consent, anonymity or confidentiality, and the contextual nature of the data do not arise in the same ways as with other types of qualitative data. The concerns are closer to those of other types of textual analysis, including copyright issues2. However, if the coded data segments are extracted and only the coded text is shared, copyright issues are avoided. A full data set could be made available through a qualitative data repository based on institutional access to the included documents.

Qualitative data sharing for theoretical work such as this case study is not subject to the same level of concern (e.g., infrastructure, data, and ethical concerns) as other, more sensitive types of qualitative data but generates the same benefits (e.g., transparency, descriptive, and material benefits). The FAIR guidelines for open data (Wilkinson et al. 2016) suggest that shared data should be findable, accessible, interoperable, and reusable, standards that are significantly easier to meet with theoretical coding than with sensitive data from a limited number of research participants. Assuming the theoretical textual data are published, extensive deidentification is not necessary as long as the publisher's copyright is protected (e.g., by making extracted and coded text available rather than reproducing each document in full). Working with existing repositories and libraries allows researchers to share qualitative data ethically while addressing these concerns (Mannheimer et al. 2019). Both the Qualitative Data Repository (QDR) and the Inter-University Consortium for Political and Social Research (ICPSR) have mechanisms restricting access to data, with QDR able to curate based on copyright access (Jones and Alexander 2018).

Data concerns surrounding qualitative data include the components of the data (e.g., field notes, transcripts, reflections, etc.), the cooperative process of generating the data, and the requirement for contextual interpretation (Chauvette, Schick-Makaroff, and Molzahn 2019; Mannheimer et al. 2019; Steltenpohl et al. 2023). Purely theoretical data does not experience these issues in the same way. Components of the data will likely be limited to the theoretical texts, code systems, and notes generated during analysis. With qualitative software programs, all of these data can be housed in a single database, making it easy to extract memos, codes, and coded text into a single, contextual, copyright-friendly data set for sharing. Since a researcher engaged in theoretical analysis works with texts rather than individuals, cooperation between participant and researcher in generating the data is also limited. Finally, contextual information can be preserved in data sets using document, text, and code system memos that are part of the shared data file. Published texts should have little need to be deidentified to protect participant privacy, given that they are already publicly available.
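As one illustration of what such a copyright-friendly extract might look like, the sketch below writes only the extracted coded text, code assignments, and memos to a flat file so that no article is reproduced in full. The file layout and field names are our assumptions rather than a description of any particular software's export function.

```python
# Minimal sketch: write only the extracted coded text plus context to a CSV,
# so the shared file never reproduces a copyrighted article in full.
import csv

rows = [
    # (article_id, code_path, extracted_text, memo) -- hypothetical records
    ("ArticleA", "Poverty/SES", "SES was a composite of income and education.", "coder note: composite index"),
    ("ArticleB", "Residential Mobility/5-year resident rate", "Stability was the percent residing 5+ years.", ""),
]

with open("coded_segments.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["article_id", "code_path", "extracted_text", "memo"])
    writer.writerows(rows)
```

A file of this kind, together with the codebook and memos, could then be deposited in a repository such as QDR or ICPSR under whatever access conditions the copyright situation requires.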

Open notebook science is one mechanism for document analysis data sharing at the extreme end of open science. In this approach, the “notebook” is available online both during and after the project and reflects real-time updates and progress (Bradley, Owens, and Williams 2008; Clinio and Albagli 2017). This process facilitates collaboration and cumulative science more rapidly than the traditional publication process while reducing redundant research efforts (Schapira, The Open Lab Notebook Consortium, and Harding 2019). Open science notebooks also include detailed research protocols and record researcher decisions, making these projects amenable to replication efforts (Schapira, The Open Lab Notebook Consortium, and Harding 2019). The file drawer problem, or the frequent non-publication of null findings, is also addressed through open notebooks, which include real-time reporting of all analyses regardless of statistical significance (Schapira, The Open Lab Notebook Consortium, and Harding 2019). Open notebooks allow engagement with collaborators or other researchers through comment features, which can be used to offer suggestions or constructive criticism (Schapira, The Open Lab Notebook Consortium, and Harding 2019). The real-time interactive comment feature is helpful to researchers, as mistakes or oversights might otherwise be overlooked until much later in the project. One challenge of adopting open science notebooks is the potential for another group to publish on the data or findings before the original research team (Clinio and Albagli 2017; Schapira, The Open Lab Notebook Consortium, and Harding 2019). Partially open or pseudo-open notebooks could include time-delayed access to address this concern.

Ethical concerns in theoretical research are also limited since it involves textual analysis of existing documents rather than human-subjects research. Accordingly, concerns regarding obtaining consent for secondary data analysis are moot. Concerns for participant confidentiality or anonymity are also not applicable to this type of qualitative research.

PRELIMINARY FINDINGS AND ANALYSIS 

Preliminary findings suggest that within the sample of social disorganization articles coded, there is a lack of standard operationalization of the key concepts. Table 1 illustrates how poverty, ethnic and racial heterogeneity, and residential mobility were operationalized in the sample.   

As demonstrated by the frequency counts greater than 15, some articles included multiple measures of a key conceptual variable. For example, poverty was measured through both socioeconomic status and percent below the poverty line, including both individual- and neighborhood-level measures. Even within this small sample of articles, the wide range of operationalizations of key conceptual variables may pose a problem for replication within this literature.

Table 1. Social Disorganization Variable Operationalization Frequency 

Variable / Operationalization           Frequency (%)
Poverty                                 21
     SES                                (28.57%)
     % below poverty line               (23.81%)
     Female-headed households           (4.76%)
     Concentrated disadvantage          (42.86%)
     Family disruption                  (4.76%)
Racial and Ethnic Heterogeneity         11
     Racial demographics                (27.27%)
     Ethnic heterogeneity               (72.73%)
Residential Mobility
     Residential stability              (50.00%)
     Home rental rate                   (12.50%)
     5-year resident rate               (37.50%)

*Note: N = 15 articles; articles listed in Appendix A.
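The counts and percentages in Table 1 are simple tallies over the coded segments. The sketch below shows that arithmetic, assuming a hypothetical list of (article, code) assignments rather than an actual export from the case study.

```python
# Illustrative tally: count operationalizations under each key concept and
# express each as a share of that concept's total codes (as in Table 1).
from collections import Counter

# Hypothetical (article, "Concept/Operationalization") code assignments.
codes = [
    ("A1", "Poverty/SES"), ("A1", "Poverty/% below poverty line"),
    ("A2", "Poverty/Concentrated disadvantage"), ("A3", "Poverty/SES"),
]

by_concept = Counter(code.split("/")[0] for _, code in codes)
by_operationalization = Counter(code for _, code in codes)

for code, n in by_operationalization.items():
    concept = code.split("/")[0]
    print(f"{code}: {n} ({n / by_concept[concept]:.2%} of {concept} codes)")
```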

Another preliminary finding involves the quality of the scanned PDF documents. Some of the earlier articles (published prior to the 1990s) did not use a standard table cell format, which meant that MAXQDA could not accurately identify small segments of coded text. For this reason, it was impossible to code and accurately retrieve table items, including descriptive and inferential statistics, and small segments of text using MAXQDA in these articles. Future research performing document analysis on older theoretical texts should be aware of this limitation and may wish to investigate OCR software to improve the quality of the scans prior to coding.

STRENGTH OF DATA TRANSPARENCY 

Assuming that theoretical data is open access (i.e., code assignments and text extractions), a reviewer, editor, or reader can assess the validity of assessments or arguments made by the authors (Elman and Kapiszewski 2014; Jones and Alexander 2018). In the case study, the ability to compare the findings reported in Table 1 directly against the full coded data in Appendix C guards against QRPs and data falsification. Based on the original text, readers can decide whether the claims regarding problematic levels of diversity in operationalization made in the findings section are valid or overstated. In short, it increases the reader's confidence in the robustness of the reported findings. It also protects against overstatement of findings or implications within the text and makes it easier to differentiate between what is clearly stated in the original text and the author’s interpretation of such items. With access to the full code system in Appendix B, it is easy for readers to recognize that the authors interpreted family disruption as an operationalization of poverty, regardless of whether this was explicit in the original study text. In this way, it provides a method of triangulating the author’s arguments against the original text and determining the validity and reproducibility of the statement.

High fidelity to original theoretical statements is expected in disciplines with a strongly cumulative science. In criminology and other social sciences facing replication crises, however, a game of theoretical “telephone” is often observed, with reinterpretations reflecting ever more of the author’s own reading of the theory rather than strict fidelity to the original theoretical statements (even if those statements are somewhat ambiguous) (Francis T. Cullen and Kornhauser 2015; Dooley and Goodison 2020). Currently, without adopting open science practices in theoretical work, the only way to distinguish between interpretation and fidelity to original statements is to trust in the author’s forthrightness or deep knowledge of the original theoretical texts. As the field of criminology becomes increasingly empirically driven, it is increasingly important to have clarity in the theoretical work that informs empirical models. Distinguishing between restatements of theory and interpretations of theory will reduce lumping errors in empirical models and increase model targeting. In turn, this will strengthen the derivation chain and increase the likelihood of reproducible findings.

Beyond the case study presented here, a well-known example of the influence of theoretical interpretation and framing is Ruth Kornhauser’s (1978) Social Sources of Delinquency. This well-cited book reframed Shaw and McKay’s social disorganization theory and other contributions of the Chicago School into a control theory tradition without the cultural deviance portions (Bursik 2015; F. T. Cullen and Wilcox 2015; R. L. Matsueda 2015). Specifically, Matsueda (1988, 2015) notes that Kornhauser identifies Thrasher as a pure control theorist, ignoring his work detailing the cultural transmission of gang values from older gang members to younger boys, and rejects Sutherland’s theory of differential association based on misinterpretations of the texts. Today, Kornhauser’s book remains one of the most cited works in the social disorganization tradition, even though her interpretation departs seriously from the original theoretical statements of the Chicago School (F. T. Cullen and Wilcox 2015). Without the differentiation between Kornhauser’s view of social disorganization and that of Shaw and McKay, it is unclear that Sampson’s incorporation of collective efficacy (Sampson, Raudenbush, and Earls 1997) and legal cynicism (Sampson and Bartusch 1998) is a return to the inclusion of culture in this tradition rather than a departure from a control tradition (Bursik 2015; R. L. Matsueda 2015). Additionally, to the growing ranks of empiricists focused outside of theory, a citation of Kornhauser may not be differentiated from a citation of foundational social disorganization texts such as Shaw and McKay, Burgess, and others. The danger is that this lack of clarity leads to misinformed or mis-specified empirical models that fail to support the theory. This lack of support may then be attributed to the original theory rather than the reinterpretation, which risks the premature dismissal of theories.

Open data and other open science practices for pure theoretical work address these concerns as the interpretation is clearly tied to the original text through code systems, text extractions, and memos. The reader can analyze the same original text and decide if the author’s claims or interpretations are supported and reasonably stated (not overstated or understated).

DISCUSSION 

As the social sciences engage with the replication crisis and work to address its causes, it is becoming clear that weak or underdeveloped theory is one key contributor (Ducate et al. 2024; Haeffel 2022; Niemeyer et al. 2022). Theory informs the rest of the derivation chain (i.e., hypothesis generation, methods/measurement, statistics/analysis, and interpretation) (Ducate et al. 2024; Meehl 1967). Strong, well-developed theory constrains the rest of the derivation chain. If theory is weak or underdeveloped, the constraints are reduced and researcher degrees of freedom increase at each step of the derivation chain (Combs 2010; Mearsheimer and Walt 2013; Oberauer and Lewandowsky 2019). For example, in the case study, social disorganization theory describes the concepts examined (i.e., poverty, racial and ethnic heterogeneity, and residential stability) as part of the process of social disorganization but does not constrain how each should be measured. As a result of the increased researcher degrees of freedom, the studies in the sample operationalize each key concept in a variety of ways3 (e.g., poverty is operationalized as socioeconomic status, percent below the poverty line, female-headed households, concentrated disadvantage, and family disruption). Empirically, in replications or a meta-analysis assessing the effect size of poverty, the variety of operationalizations is likely to affect findings. For example, in Pratt and Cullen's (2005) meta-analysis of macro-level predictors of crime, the effect size of poverty depended on its operationalization, with family disruption having a substantial effect size, socioeconomic status a moderate effect size, and unemployment without age restrictions a weak effect size. Each of these individual findings further affects the interpretation of support for the relationship between poverty and crime in the social disorganization tradition.

Strengthening the derivation chain will reduce these negative impacts and generate findings with a higher replication rate. This can be accomplished in two ways: formalizing theory into a robust version that constrains the derivation chain, and focusing efforts on open science practices and data sharing to constrain QRPs and make the decisions taken within researcher degrees of freedom transparent. The preliminary findings from the social disorganization case study suggest that, overall, MAXQDA can be very effective in coding text for theoretical document analysis. Additionally, using MAXQDA to compile the full data set (e.g., code system, memos, and coded text) generates an easily sharable and comprehensive data file (see Appendices B and C) that meets the FAIR guidelines for open data and could be uploaded as a qualitative version of an open science notebook (Bradley, Owens, and Williams 2008; Wilkinson et al. 2016). While the literature identifies many concerns regarding qualitative data sharing, they do not apply to theoretical document analysis, meaning that this is a subset of qualitative research that can readily participate in open science practices and open data with few adaptations required. Open science practices and data sharing with other subsets of qualitative data still require substantial consideration to adequately address concerns of infrastructure, data, and ethics (Huma and Joyce 2023; Khan, Hirsch, and Zeltzer-Zubida 2024).

While open science practices and data sharing can substantially contribute to better science and a stronger derivation chain, they cannot solve the problem of weak theory (Ducate et al. 2024; Scheel et al. 2021). Theoretical work to formalize theoretical statements, define phenomena of interest, clarify the relationships between concepts, and unpack the mechanisms and explanations underlying empirical relationships is crucial. As many social science disciplines, including psychology and criminology, move toward differentiated specialties (i.e., theory, methods, and statistics), fewer graduate students and early career academics are choosing theoretical pathways. Instead, most disciplines leverage theory only to justify including certain variables in statistical models. This makes the transparency and rigor of open science practices even more important for those who engage in theoretical work to validate findings.

LIMITATIONS AND FUTURE RESEARCH 

As we examine the benefits of open science practices and the challenges associated with sharing qualitative data, we acknowledge that this case study and its implications are limited. Specifically, only theoretical work using thematic document analysis is examined. The suggestions cannot be generalized to other types of qualitative research. Many types of qualitative data are sensitive, and serious consideration is needed to approach and adapt open science practices so that they meet the FAIR guidelines and remain ethical. Additionally, the case study explores the use of MAXQDA to engage in open science and data sharing. This is only one qualitative software program of many (e.g., NVivo, Atlas.ti, and others) and is not meant to imply a preference for this program over others. Beyond the sharing of qualitative data, there are other mechanisms for achieving the goals of open science (i.e., transparency and rigor), including process analysis (Elman and Kapiszewski 2014; Steltenpohl et al. 2023) and interaction analysis (Huma and Joyce 2023). This exploratory case study is not intended to be comprehensive but to encourage those engaged in theoretical document analysis to consider the importance of their work in addressing the replication crisis and how they might engage in open science practices such as open data.

Regarding the preliminary findings of the case study, the primary limitations are the small sample of articles and the purposive sampling methodology. Given these constraints, it is unclear whether the apparent variability in operationalization is representative of the larger empirical social disorganization literature or of the theoretical literature of criminology as a whole. However, as a case study of a tool for transparency and a novel way of coding theoretical data, it points to future research areas. Specifically, the findings surrounding flexibility in the operationalization of key social disorganization concepts in this sample suggest that future research should examine both the theoretical flexibility within the theory and its implications for the replication of findings.

The replication literature is quite clear on the repercussions of a weak derivation chain for the replicability of findings. Meehl (1967, 1978) and Scheel et al. (2021) identify strong theory as the first component of a strong derivation chain. Within theory, concepts must be clearly defined and have implications for standard measurement (Borsboom et al. 2021; Eronen and Bringmann 2021; Smaldino 2019). Additionally, these concepts should be tethered to the reality of the phenomenon they are modeling (Haig 2023). Future research should examine whether the flexibility in concept operationalization present in this sample is limited to this sample or is a broader issue for empirical tests of social disorganization theory. If it is a broader issue, key findings from social disorganization theory should be assessed for replicability and false positives. It is important to be clear regarding the robustness and veracity of foundational theoretical findings. If diagnostic research discovers widespread theoretical flexibility and consequent challenges to the replication of findings, additional theoretical work will be needed to address an impending “theory crisis” like that seen in adjacent disciplines (e.g., psychology). Given the importance of this work to the derivation chain and its connections to the replication crisis, work in this area should engage in open science practices, such as the data-sharing technique in the case study, to improve transparency and rigor.

IMPLICATIONS FOR REPLICATION AND SECONDARY DATA ANALYSIS  

Beyond the benefits of using MAXQDA or other qualitative software packages to code empirical articles for transparency, open data packages allow for replication efforts and secondary data analysis. When open data files are used for replication in MAXQDA, the software includes a function to check inter-rater reliability. This function can quickly assess whether both research teams assigned the same codes to the same text extracts (i.e., whether the findings successfully replicated). It should be noted that carefully documenting the coding process in the methods section or in an open science notebook remains crucial for transparency (Bandara et al. 2015).
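
MAXQDA performs this intercoder comparison within the software itself; as a rough external illustration of the underlying logic, the sketch below compares two hypothetical coded-segment exports (assumed to be CSV files with document, segment_id, and code columns) and reports simple percentage agreement. It is not MAXQDA's intercoder agreement procedure, only a minimal sketch under those assumptions.

```python
import csv

def load_codings(path: str) -> set[tuple[str, str, str]]:
    """Read a coded-segments export into (document, segment_id, code) triples.
    Column names are hypothetical and would need to match the actual export."""
    with open(path, newline="", encoding="utf-8") as f:
        return {(row["document"], row["segment_id"], row["code"]) for row in csv.DictReader(f)}

def percent_agreement(original: set, replication: set) -> float:
    """Share of all coded segments on which the two teams assigned the same code."""
    union = original | replication
    return len(original & replication) / len(union) if union else 1.0

# Usage (hypothetical file names):
# original = load_codings("team_a_coded_segments.csv")
# replication = load_codings("team_b_coded_segments.csv")
# print(f"Agreement: {percent_agreement(original, replication):.0%}")
```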

In addition to replication, open data are available for secondary analysis to answer further questions from the original research group or new questions from other researchers (Heaton 2008; Kern and Mustasilta 2023). The literature identifies concerns surrounding rigor and ethics in secondary analysis of qualitative data (Irwin and Winterton 2011; Ruggiano and Perry 2019). Qualitative data are highly contextual, and if there is a long period of time between data collection and analysis, or if the researcher did not engage in the primary data collection, key contextual components of the data may be lost, resulting in a lack of rigor in secondary analysis (Irwin and Winterton 2011; Ruggiano and Perry 2019). With document analysis of theoretical information, the loss of contextual information is less of a concern because the data are not constructed through interviews or fieldwork. Additionally, ethical concerns are less applicable because the data are not generated through human-subjects research.
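
As a simple illustration of what secondary analysis of such a shared file might look like, the sketch below tallies how often each code was applied across a hypothetical coded-segments export. The file name and column name are assumptions for illustration, not part of the case study data.

```python
import csv
from collections import Counter

def code_frequencies(path: str) -> Counter:
    """Count how often each code was applied across a shared coded-segments export.
    The 'code' column name is hypothetical."""
    with open(path, newline="", encoding="utf-8") as f:
        return Counter(row["code"] for row in csv.DictReader(f))

# Usage (hypothetical file name):
# freqs = code_frequencies("coded_segments.csv")
# for code, n in freqs.most_common(10):
#     print(f"{code}: {n}")
```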

CONCLUSION

The widespread replication crisis in the sciences has necessitated a closer look at theory, methods, statistical analysis, and the derivation chain that connects them. Open science practices, including open data, are a common means of addressing concerns that findings are not reproducible, and they address the methods and statistics portions of the derivation chain. Well-developed theory is a crucial component of the derivation chain that constrains methods and statistics to produce higher-quality and more reproducible science. Given the significance of theory and its ability to address the replication crisis, theoretical work such as thematic document analysis should engage in open science practices, including open data. These data do not carry the same data-sharing concerns as other subsets of qualitative data but retain the high-impact benefits. We demonstrated one mechanism for open data using a case study of thematic document analysis of published studies of social disorganization theory. With funders trending towards increased open data requirements, the harsh reality of the replication crisis, and the need for transparency and rigor, we encourage researchers engaged in theoretical work and document analysis to consider how their work might adopt open science and open data practices.

AUTHORS’ NOTES

This work was supported by the Air Force Office of Scientific Research, funding award FA9550-23-1-0453.

Disclaimer: The views expressed in this publication are those of the authors and do not reflect the official policy or position of the United States Air Force, Department of Defense, or the United States Government.

This work contains no conflicts of interest for any contributing author.

REFERENCES

Adler SJ, Röseler L and Schöniger MK (2023) A toolbox to evaluate the trustworthiness of published findings. Journal of Business Research 167: 114189.

Akers RL (2009) Social Learning and Social Structure: A General Theory of Crime and Deviance. New Brunswick, N.J: Transaction Publishers.

Allen C and Mehler DMA (2019) Open science challenges, benefits and tips in early career and beyond. PLoS Biology 17(5): e3000246.

Bahlai CA, Bartlett LJ, Burgio KR, et al. (2019) Open science isn’t always open to all scientists. American Scientist 107(2). Research Triangle Park, United States: Sigma XI-The Scientific Research Society: 78–82.

Bandara W, Furtmueller E, Gorbacheva E, et al. (2015) Achieving rigor in literature reviews: insights from qualitative data analysis and tool support. Epub ahead of print 2015.

Barnes JC, TenEyck MF, Pratt TC, et al. (2020) How powerful is the evidence in criminology? On whether we should fear a coming crisis of confidence. Justice Quarterly 37(3). Routledge: 383–409.

Borrego M, Douglas EP and Amelink CT (2009) Quantitative, qualitative, and mixed research methods in engineering education. Journal of Engineering Education 98(1). Washington, United Kingdom: Blackwell Publishing Ltd.: 53–66.

Borsboom D, van der Maas HLJ, Dalege J, et al. (2021) Theory construction methodology: a practical framework for building theories in psychology. Perspectives on Psychological Science 16(4). SAGE Publications Inc: 756–766.

Bowen GA (2009) Document Analysis as a Qualitative Research Method. Qualitative Research Journal 9(2): 27–40.

Bradley J-C, Owens K and Williams A (2008) Chemistry Crowdsourcing and Open Notebook Science. Nature Precedings. Epub ahead of print 10 January 2008. DOI: 10.1038/npre.2008.1505.1.

Brick LAD, Velicer WF, Redding CA, et al. (2016) Extending theory-based quantitative predictions to new health behaviors. International Journal of Behavioral Medicine 23(2): 123–134.

Bringmann LF, Elmer T and Eronen MI (2022) Back to Basics: The Importance of Conceptual Clarification in Psychological Science. Current Directions in Psychological Science 31(4). SAGE Publications Inc: 340–346.

Brunner J and Schimmack U (2020) Estimating Population Mean Power Under Conditions of Heterogeneity and Selection for Significance. Meta-Psychology 4.

Bucerius S and Copes H (2024) Transparency trade-off: the risks of Criminology’s new data sharing policy. The Criminologist 50(2): 6–9.

Bursik RJ (2015) Social sources of delinquency and the second coming of Shaw and McKay. In: Cullen FT, Wilcox P, Sampson RJ, et al. (eds) Challenging Criminological Theory: The Legacy of Ruth Rosner Kornhauser. Advances in criminological theory. Routledge, pp. 105–115.

Center for Open Science (n.d.) What is open science? Available at: https://www.cos.io/open-science (accessed 3 July 2024).

Chauvette A, Schick-Makaroff K and Molzahn AE (2019) Open Data in Qualitative Research. International Journal of Qualitative Methods 18: 160940691882386.

Chin JM, Pickett JT, Vazire S, et al. (2023) Questionable Research Practices and Open Science in Quantitative Criminology. Journal of Quantitative Criminology 39(1): 21–51.

Christ CC, Schwartz JA, Stoltenberg SF, et al. (2018) The Effect of MAOA and Stress Sensitivity on Crime and Delinquency: A Replication Study. Journal of Contemporary Criminal Justice 34(3): 336–353.

Christian M and Wetterberg A (2023) Open Science: Collaboration for Equitable Research | RTI. Available at: https://www.rti.org/insights/open-science-collaboration (accessed 3 July 2024).

Claesen A, Lakens D, Vanpaemel W, et al. (2022) Severity and crises in science: are we getting it right when we’re right and wrong when we’re wrong. Available at: https://doi.org/10.31234/osf.io/ekhc8 (accessed 12 July 2023).

Clinio A and Albagli S (2017) Open notebook science as an emerging epistemic culture within the Open Science movement. Revue française des sciences de l’information et de la communication (11). Epub ahead of print 1 August 2017. DOI: 10.4000/rfsic.3186.

Combs JG (2010) From the editors: big samples and small effects: let’s not trade relevance and rigor for power. The Academy of Management Journal 53(1). Academy of Management: 9–13.

Committee on Toward an Open Science Enterprise, Board on Research Data and Information, Policy and Global Affairs, et al. (2018) Open Science by Design: Realizing a Vision for 21st Century Research. Washington, D.C.: National Academies Press. Available at: https://www.nap.edu/catalog/25116 (accessed 3 July 2024).

Cullen FT and Kornhauser RR (eds) (2015) Challenging Criminological Theory: The Legacy of Ruth Rosner Kornhauser. Advances in criminological theory volume 19. New Brunswick: Transaction Publ.

Cullen FT and Wilcox P (2015) The legacy of Ruth Rosner Kornhauser. In: Cullen FT, Wilcox P, Sampson RJ, et al. (eds) Challenging Criminological Theory: The Legacy of Ruth Rosner Kornhauser. Advances in criminological theory. Routledge, pp. 1–22.

Dooley BD and Goodison SE (2020) Falsification by atrophy: the Kuhnian process of rejecting theory in us criminology. The British Journal of Criminology 60(1): 24–44.

Ducate CS, Bostrom SR, Proctor KR, et al. (2024) The Theory Crisis in Criminology: Causes, Consequences, and Solutions. CrimRxiv.

Elman C and Kapiszewski D (2014) Data Access and Research Transparency in the Qualitative Tradition. PS: Political Science & Politics 47(01): 43–47.

Elson M (2024) Pay researchers to spot errors in published papers. Nature 629(8013): 730–730.

Enriquez D (2024) Publishing publicly available interview data: an empirical example of the experience of publishing interview data. Frontiers in Sociology 9: 1157514.

Eronen MI and Bringmann LF (2021) The theory crisis in psychology: how to move forward. Perspectives on Psychological Science 16(4). SAGE Publications Inc: 779–788.

Eronen MI and Romeijn J-W (2020) Philosophy of science and the formalization of psychological theory. Theory & Psychology 30(6). SAGE Publications Ltd: 786–799.

Fanelli D (2009) How Many Scientists Fabricate and Falsify Research? A Systematic Review and Meta-Analysis of Survey Data. PLoS ONE Tregenza T (ed.) 4(5): e5738.

Finkel EJ, Eastwick PW and Reis HT (2017) Replicability and other features of a high-quality science: Toward a balanced and empirical approach. Journal of Personality and Social Psychology 113(2). US: American Psychological Association: 244–253.

Flaxman AD, Hazard R, Riley I, et al. (2020) Born to fail: flaws in replication design produce intended results. BMC Medicine 18(1): 73.

Gardenier J and Resnik D (2002) The misuse of statistics: concepts, tools, and a research agenda. Accountability in Research 9(2). Taylor & Francis: 65–74.

Gizzi MC and Harm A (2021) Using MAXQDA from literature review to analyzing coded data: Following a systematic process in student research. In: The Practice of Qualitative Data Analysis. MAXQDA Press, pp. 71–88.

Guest O and Martin AE (2021) How Computational Modeling Can Force Theory Building in Psychological Science. Perspectives on Psychological Science 16(4). SAGE Publications Inc: 789–802.

Guo W, Straub D and Zhang P (2014) A sea change in statistics: A reconsideration of what is important in the age of big data. Journal of Management Analytics 1(4). Taylor & Francis: 241–248.

Haeffel GJ (2022) Psychology needs to get tired of winning. Royal Society Open Science 9(6). Royal Society: 220099.

Haig BD (2023) Repositioning construct validity theory: from nomological networks to pragmatic theories and their evaluation by explanatory means. Perspectives on Psychological Science. SAGE Publications Inc: 17456916231195852.

Haven T, Gopalakrishna G, Tijdink J, et al. (2022) Promoting trust in research and researchers: How open science and research integrity are intertwined. BMC Research Notes 15(1): 302.

Haven TL, Errington TM, Gleditsch KS, et al. (2020) Preregistering Qualitative Research: A Delphi Study. International Journal of Qualitative Methods 19: 160940692097641.

Heaton J (2008) Secondary Analysis of Qualitative Data: An Overview. Historical Social Research 33(3): 33–45.

Hedges LV and Schauer JM (2021) The design of replication studies. Journal of the Royal Statistical Society Series A: Statistics in Society 184(3): 868–886.

Hendriks F, Kienhues D and Bromme R (2020) Replication crisis = trust crisis? The effect of successful vs failed replications on laypeople’s trust in researchers and research. Public Understanding of Science 29(3): 270–288.

Hoekstra R and Vazire S (2021) Aspiring to greater intellectual humility in science. Nature Human Behaviour 5(12). 12. Nature Publishing Group: 1602–1607.

Huma B and Joyce JB (2023) ‘One size doesn’t fit all’: Lessons from interaction analysis on tailoring open science practices to qualitative research. British Journal of Social Psychology 62(4): 1590–1604.

Irwin S and Winterton M (2011) Debates in qualitative secondary analysis: Critical reflections. University of Leeds. Epub ahead of print 2011. DOI: 10.5518/200/04.

Jackson J and Kuha J (2016) How theory guides measurement. In: The Handbook of Measurement Issues in Criminology and Criminal Justice. John Wiley & Sons, Ltd, pp. 377–415. Available at: https://onlinelibrary.wiley.com/doi/abs/10.1002/9781118868799.ch17 (accessed 6 May 2024).

Jones K and Alexander SM (2018) Qualitative data sharing and re-use for socio-environmental systems research: A synthesis of opportunities, challenges, resources and approaches. Digital Repository at the University of Maryland. Epub ahead of print 2018. DOI: 10.13016/M2WH2DG59.

Kern FG and Mustasilta K (2023) Beyond replication: Secondary qualitative data analysis in political science. Comparative Political Studies 56(8): 1224–1256.

Khan S, Hirsch JS and Zeltzer-Zubida O (2024) A dataset without a code book: Ethnography and open science. Frontiers in Sociology 9: 1308029.

Korbmacher M, Azevedo F, Pennington CR, et al. (2023) The replication crisis has led to positive structural, procedural, and community changes. Communications Psychology 1(1): 3.

Kornhauser RR (1978) Social Sources of Delinquency: An Appraisal of Analytic Models. Chicago: University of Chicago Press.

Kozlov M (2022) NIH issues a seismic mandate: share data publicly. Nature 602(7898): 558–559.

KU Leuven (2023) The benefits of Open Science. Available at: https://www.kuleuven.be/open-science/what-is-open-science/the-benefits-of-open-science (accessed 4 June 2024).

Mannheimer S, Pienta A, Kirilova D, et al. (2019) Qualitative Data Sharing: Data Repositories and Academic Libraries as Key Partners in Addressing Challenges. American Behavioral Scientist. Epub ahead of print 2019.

Matsueda RL (1988) The Current State of Differential Association Theory. Crime & Delinquency 34(3): 277–306.

Matsueda RL (2015) Social structure, culture, and crime: Assessing Kornhauser’s challenge to criminology. In: Cullen FT, Wilcox P, Sampson RJ, et al. (eds) Challenging Criminological Theory: The Legacy of Ruth Rosner Kornhauser. Advances in criminological theory 19. Routledge, pp. 117–144.

McKiernan EC, Bourne PE, Brown CT, et al. (2016) How open science helps researchers succeed. eLife Rodgers P (ed.) 5. eLife Sciences Publications, Ltd: e16800.

McNeeley S and Warner JJ (2015) Replication in criminology: A necessary practice. European Journal of Criminology 12(5): 581–597.

Mearsheimer JJ and Walt SM (2013) Leaving theory behind: Why simplistic hypothesis testing is bad for International Relations. European Journal of International Relations 19(3). SAGE Publications Ltd: 427–457.

Meehl PE (1967) Theory-testing in psychology and physics: a methodological paradox. Philosophy of Science 34(2). Cambridge University Press: 103–115.

Meehl PE (1978) Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology 46(4). US: American Psychological Association: 806–834.

Morgan H (2022) Conducting a qualitative document analysis. The Qualitative Report. Epub ahead of print 2022. DOI: 10.46743/2160-3715/2022.5044.

NASA (n.d.) Open Science at NASA - NASA Science. Available at: https://science.nasa.gov/open-science/ (accessed 4 June 2024).

National Institutes of Health (n.d.) Final NIH Policy for Data Management and Sharing. Available at: https://grants.nih.gov/grants/guide/notice-files/NOT-OD-21-013.html (accessed 3 July 2024).

Niemeyer RE, Proctor KR, Schwartz JA, et al. (2022) Are most published criminological research findings wrong? Taking stock of criminological research using a Bayesian simulation approach. International Journal of Offender Therapy and Comparative Criminology. SAGE Publications Inc: 0306624X221132997.

Oberauer K and Lewandowsky S (2019) Addressing the theory crisis in psychology. Psychonomic Bulletin & Review 26(5): 1596–1618.

O’Neill M, Booth S and Lamb J (2018) Using NVivo™ for literature reviews: The eight step pedagogy (N7+1). The Qualitative Report. Epub ahead of print 6 March 2018. DOI: 10.46743/2160-3715/2018.3030.

Onwuegbuzie A, Leech N and Collins K (2012) Qualitative analysis techniques for the review of the literature. The Qualitative Report. Epub ahead of print 2012. DOI: 10.46743/2160-3715/2012.1754.

Open Science Collaboration (2015) Estimating the reproducibility of psychological science. Science 349(6251): aac4716.

Pickett JT (2020) The Stewart Retractions: A Quantitative and Qualitative Analysis. Econ Journal Watch 17(1): 152–190.

Pratt TC and Cullen FT (2005) Assessing Macro-Level Predictors and Theories of Crime: A Meta-Analysis. Crime and Justice 32: 373–450.

Pridemore WA, Makel MC and Plucker JA (2018) Replication in criminology and the social sciences. Annual Review of Criminology 1(1): 19–38.

Proctor KR and Niemeyer RE (2020) Retrofitting social learning theory with contemporary understandings of learning and memory derived from cognitive psychology and neuroscience. Journal of Criminal Justice 66: 101655.

Proctor KR, Bostrom SR, Ducate CS, et al. (2024) Flexibility in Variable Operationalization in Social Disorganization Theory: A Pilot Study. CrimRxiv. Epub ahead of print 26 March 2024. DOI: 10.21428/cb6ab371.93c97ca6.

Protzko J, Krosnick J, Nelson L, et al. (2023) High replicability of newly discovered social-behavioural findings is achievable. Nature Human Behaviour. Nature Publishing Group: 1–9.

Ritter EH (2022) Using theory to choose an empirical research strategy. In: Handbook of Research Methods in International Relations. Edward Elgar Publishing, pp. 233–256. Available at: https://www.elgaronline.com/edcollchap/book/9781839101014/book-part-9781839101014-24.xml (accessed 6 May 2024).

Rodgers P and Collings A (2021) What have we learned? eLife 10. eLife Sciences Publications, Ltd: e75830.

Ruggiano N and Perry TE (2019) Conducting secondary analysis of qualitative data: Should we, can we, and how? Qualitative Social Work 18(1): 81–97.

Sampson RJ and Bartusch DJ (1998) Legal Cynicism and (Subcultural?) Tolerance of Deviance: The Neighborhood Context of Racial Differences. Law & Society Review 32(4): 777–804.

Sampson RJ, Raudenbush SW and Earls F (1997) Neighborhoods and Violent Crime: A Multilevel Study of Collective Efficacy. Science 277(5328): 918–924.

Schapira M, The Open Lab Notebook Consortium and Harding RJ (2019) Open laboratory notebooks: good for science, good for society, good for scientists. F1000Research 8: 87.

Scheel AM (2022) Why most psychological research findings are not even wrong. Infant and Child Development 31(1): e2295.

Scheel AM, Tiokhin L, Isager PM, et al. (2021) Why hypothesis testers should spend less time testing hypotheses. Perspectives on Psychological Science 16(4). SAGE Publications Inc: 744–755.

Schumm WR, Crawford DW, Lockett L, et al. (2023) Research anomalies in criminology: How serious? How extensive over time? And who was responsible? Accountability in Research: 1–37.

Simmons JP, Nelson LD and Simonsohn U (2011) False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant. Psychological Science 22(11). SAGE Publications Inc: 1359–1366.

Smaldino PE (2019) Better methods can’t make up for mediocre theory. Nature 575(7783). Nature Publishing Group: 9–10.

Smith PL and Little DR (2018) Small is beautiful: In defense of the small-N design. Psychonomic Bulletin & Review 25(6): 2083–2101.

Sotola LK and Credé M (2022) On the predicted replicability of two decades of experimental research on system justification: A Z‐curve analysis. European Journal of Social Psychology 52(5–6): 895–909.

Steltenpohl CN, Lustick H, Meyer MS, et al. (2023) Rethinking Transparency and Rigor from a Qualitative Open Science Perspective. Journal of Trial and Error 4(1): 47–59.

Stevenson MT (2023) Cause, effect, and the structure of the social world. 4445710, SSRN Scholarly Paper. Rochester, NY. Available at: https://papers.ssrn.com/abstract=4445710 (accessed 6 May 2024).

Sweeten G (2020) Standard Errors in Quantitative Criminology: Taking Stock and Looking Forward. Journal of Quantitative Criminology 36(2): 263–272.

Sweeten G, Topalli V, Loughran T, et al. (2024) Data Transparency at Criminology. The Criminologist 5(1): 9–11.

UNESCO (2021) UNESCO Recommendation on Open Science. UNESCO. Available at: https://unesdoc.unesco.org/ark:/48223/pf0000379949 (accessed 3 July 2024).

University of Manchester (2023) The University of Manchester becomes the new home of CrimRxiv - The global open access hub for Criminology. Available at: https://www.manchester.ac.uk/discover/news/the-university-of-manchester-becomes-the-new-home-of-crimrxiv---the-global-open-access-hub-for-criminology/ (accessed 3 July 2024).

Van Bavel JJ, Rathje S, Vlasceanu M, et al. (2024) Updating the identity-based model of belief: From false belief to the spread of misinformation. Current Opinion in Psychology 56: 101787.

Vazire S, Schiavone SR and Bottesini JG (2022) Credibility Beyond Replicability: Improving the Four Validities in Psychological Science. Current Directions in Psychological Science 31(2). SAGE Publications Inc: 162–168.

Velicer W, Brick L, Fava J, et al. (2013) Testing 40 predictions from the transtheoretical model again, with confidence. Multivariate Behavioral Research 48: 220–240.

Velicer WF, Cumming G, Fava JL, et al. (2008) Theory testing using quantitative predictions of effect size. Applied psychology = Psychologie appliquee 57(4): 589–608.

VERBI Software (2023) MAXQDA 24. Berlin, Germany: VERBI Software. Available at: maxqda.com.

Wilkinson MD, Dumontier M, Aalbersberg IjJ, et al. (2016) The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 3(1): 160018.

Wong VC, Anglin K and Steiner PM (2022) Design-based approaches to causal replication studies. Prevention Science 23(5): 723–738.

Wood L, Sebar B and Vecchio N (2020) Application of Rigour and Credibility in Qualitative Document Analysis: Lessons Learnt from a Case Study. The Qualitative Report. Epub ahead of print 18 February 2020. DOI: 10.46743/2160-3715/2020.4240.

Wooditch A, Fisher R, Wu X, et al. (2020) p-value Problems? An Examination of Evidential Value in Criminology. Journal of Quantitative Criminology 36(2): 305–328.
