This is a review of Jacques and Wheeler’s (2024) preprint, “A plea for open access to qualitative criminology: With a Python script for anonymizing data and illustrative analysis of error rates.”
My review will depart from a traditional review in a few ways. First, because it is not anonymous, I will say something about who I am and where I stand. Second, it will be less a review of the paper per se and more a review of its underlying idea.
It is important to note that I am one of the signatories of Bucerius and Copes’s (2024) argument in The Criminologist. I signed because many points in their argument warrant consideration before we adopt a blanket policy requiring that all qualitative data be publicly available.
This is not to say I am opposed to open access publication of data. My sentiment is quite the opposite. I think it is imperative for qualitative scholars to publish their data. Like Jacques and Wheeler, I believe open access is important “for the sake of science, impact, and social justice.”
Beyond the reasons they list, it is important for the scientific community and the public to see qualitative data because the criminologists collecting and analyzing these data are people, and sometimes people get squirrely. Given the pressures to publish in our field, sometimes criminologists will fake data, adulterate them, willingly turn a blind eye to data with questionable validity, or unwittingly assume data are valid when they are not.
The last of these recently happened to me. I won’t waste your time with the details here, but in short: I was invited to be a coauthor (4th or 5th) on a paper. When I performed a second analysis of the data, I found they were not as presented in the paper and, even worse, many of the “interviews” were duplicates.
I removed myself from the project. But someone with fewer scruples, or someone being crushed under the pressure to publish or find a new job, might not have. Who knows how often this happens? If all data were publicly available, it would prevent situations like this one from occurring and allow for a remedy after such data have been used.
Back to Jacques and Wheeler’s paper. Here’s what I love about it. Rather than being adversarial, the paper proposes a solution. It is easy to point out problems (ask any anonymous reviewer who is having a bad day) but not as easy to find possible solutions.
Does this paper address all the potential problems with publicly available data? No. But it does not seek to. It does offer a novel way to make some of those problems easier to address, particularly with large datasets. It is an important first step toward solving those problems.
Will I use the Python script? Sure, if I have a grad student or a colleague smarter than me who knows how to run Python. Should other people start using this or other technological advances to help them de-identify their data? Absolutely. When applying for IRB approval and obtaining consent from subjects, including such a method in the research design may help allay the fears of both groups.
What do we do about the remaining concerns with making data publicly available? I am not sure, particularly as the exponential growth of AI makes it a certainty that, within a couple of years, the patterns of speech you willingly give away to Microsoft, Google, and Meta will be usable by AI to identify you or, at the very least, to narrow down the list of people you might be. Perhaps counter-AI will be developed that removes or alters patterns of speech in transcripts so that speakers cannot be identified in this way. But this will then open up new problems, such as ruining the ability of linguists to analyze the data.
We, or I, could go on and on about the spiral of problem, solution, new problem, new solution. Rather than do this, we should focus on a single problem, as Jacques and Wheeler have done, offer a solution for that problem, and then move on to the next.
References
Bucerius, S., & Copes, H. (2024). Transparency and trade-off: The risks of Criminology’s new data sharing policy. The Criminologist, 50(2), 6-9. https://asc41.org/wp-content/uploads/ASC-Criminologist-2024-03.pdf
Jacques, S., & Wheeler, A. (2024). A plea for open access to qualitative criminology: With a Python script for anonymizing data and illustrative analysis of error rates. CrimRxiv. https://doi.org/10.21428/cb6ab371.15d7c59e