From text to ties: Extraction of corruption network data from deferred prosecution agreements


Published on Jan 01, 2023
Description

Deferred prosecution agreements (DPAs) are a legal tool for resolving corruption cases without trial. Each DPA is accompanied by a Statement of Facts that provides a detailed, publicly available textual record of the case, including summarized evidence of who was involved, what acts they committed, and with whom. These statements can be translated into networks amenable to social network analysis, which allows the structure and dynamics of each case to be analyzed. In this study, we show how to extract from five Statements of Facts which actors were involved in a given case, the relations and interactions among these actors (e.g., communication or payments), and their relevant individual attributes (gender, affiliation, and sector). Two independent coders code the extracted information manually, and we subsequently assess inter-coder reliability. To assess the coding reliability of nodes and attributes, we use a matching coefficient; to assess the coding reliability of ties, we construct a network from each coder's coding and calculate the graph correlation between the two resulting networks. The coding of nodes and ties in the five extracted networks proves highly reliable, with only slightly lower reliability for the largest network. The coding of attributes is also highly reliable, although it is prone to missing data on actors' gender. We conclude by discussing the flexibility of our data collection framework and how it can be extended to include network dynamics and nonhuman actors (such as companies) in the network representation.
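The two reliability measures named in the abstract can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that each coder's node or attribute coding is a list aligned item by item, and that both coders' tie codings are turned into adjacency matrices over the same ordered list of actors. It takes "graph correlation" to mean the Pearson correlation of the off-diagonal cells of the two adjacency matrices, which is one common definition. The function names (`matching_coefficient`, `adjacency`, `graph_correlation`) and the example actors, attribute values, and ties are all hypothetical.

```python
import math


def matching_coefficient(codes_a, codes_b):
    """Simple matching coefficient: share of aligned items coded identically.

    Assumption: codes_a[i] and codes_b[i] refer to the same item
    (e.g. the same actor's gender or sector) for every i.
    """
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("codings must be non-empty and aligned item by item")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


def adjacency(nodes, ties, directed=True):
    """Build a 0/1 adjacency matrix over a fixed node order from (sender, receiver) ties."""
    index = {n: i for i, n in enumerate(nodes)}
    m = [[0] * len(nodes) for _ in nodes]
    for sender, receiver in ties:
        m[index[sender]][index[receiver]] = 1
        if not directed:
            m[index[receiver]][index[sender]] = 1
    return m


def graph_correlation(m1, m2, directed=True):
    """Pearson correlation of the off-diagonal dyads of two same-sized adjacency matrices.

    Directed graphs use every off-diagonal cell; undirected graphs use only the
    upper triangle so that each dyad is counted once.
    """
    n = len(m1)
    cells = [(i, j) for i in range(n) for j in range(n)
             if i != j and (directed or i < j)]
    x = [m1[i][j] for i, j in cells]
    y = [m2[i][j] for i, j in cells]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined: a network has constant tie values")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical attribute coding of four actors by two coders.
    coder1_gender = ["female", "male", "male", "male"]
    coder2_gender = ["female", "male", "female", "male"]
    print(matching_coefficient(coder1_gender, coder2_gender))  # 0.75

    # Hypothetical tie coding (e.g. payments) over a shared actor list.
    actors = ["A", "B", "C", "D"]
    coder1_ties = [("A", "B"), ("B", "C"), ("C", "D")]
    coder2_ties = [("A", "B"), ("B", "C"), ("B", "D")]
    r_dir = graph_correlation(adjacency(actors, coder1_ties),
                              adjacency(actors, coder2_ties))
    print(round(r_dir, 4))  # 0.5556
    r_und = graph_correlation(adjacency(actors, coder1_ties, directed=False),
                              adjacency(actors, coder2_ties, directed=False),
                              directed=False)
    print(round(r_und, 4))  # 0.3333
```

In this hypothetical example the coders agree on two of three directed ties, which gives a graph correlation of 5/9 over the 12 directed dyads; treating ties as undirected compares only the 6 unordered dyads and gives 1/3. Building both matrices over one shared actor list is what makes the cell-by-cell comparison meaningful; how actors identified by only one coder enter that list is a separate design decision that this sketch leaves open.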

