
Entanglement: Cybercrime Connections of a Public Forum Population

Published on Jul 20, 2022

Acknowledgements

The authors would like to thank the Stratosphere Laboratory team and the anonymous reviewers for their reviews and suggestions, along with Anna Shirokova for the help with Russian translations. The authors would also like to thank Veronica Valeros for her help in the data gathering process, Avast Software for partial funding of this research, and Flare Systems for their access to the forum’s data.

Abstract

Many activities related to cybercrime operations do not require much secrecy, such as developing websites or translating texts. This research provides indications that many users of a popular public internet marketing forum have connections to cybercrime. It does so by investigating the involvement in cybercrime of a population of users interested in internet marketing, both at a micro and macro scale. The research starts with a case study of three users confirmed to be involved in cybercrime and their use of the public forum. It provides a first glimpse that some business with cybercrime connections is being conducted in the clear. The study then pans out to investigate the forum population’s ties with cybercrime by finding crossover users, that is, users from the public forum who also comment on cybercrime forums. The cybercrime forums on which they discuss are analyzed and the crossover users’ strength of participation is reported. Also, to assess if they represent a sub-group of the forum population, their posting behavior on the public forum is compared with that of non-crossover users. This blend of analyses shows that (i) a minimum of 7.2% of the public forum population are crossover users that have ties with cybercrime forums; (ii) their participation in cybercrime forums is limited; and (iii) their posting behavior is relatively indistinguishable from that of non-crossover users. This is the first study to formally quantify how users of an internet marketing public forum, a space for informal exchanges, have ties to cybercrime activities. We conclude that crossover users are a substantial part of the population in the public forum, and even though they have thus far been overlooked, their aggregate effect in the ecosystem must be considered. This study opens new research questions on cybercrime participation that should consider online spaces beyond their cybercrime branding.

Key words: cybercrime, public forum, case study, informal market

Introduction

There is a large spectrum of information technology (IT) tasks surrounding cybercrime that appear legal, like developing websites or redirecting traffic. For such IT tasks, “the criminal character does not have to be clearly visible to the person concerned, or it can be denied afterward” [1][p.6]. Consequently, the neutrality of IT [1, 2] allows individuals to conduct parts of their cybercrime operations in plain sight.

This study uncovers and explores the involvement in cybercrime of a population of users interested in internet marketing. It starts with a case study of three actors known to be involved in cybercrime through helping the spread of a banking Trojan botnet. Using a machine learning technique and a content analysis, we assessed the interactions of these three actors as well as their relationships with other users in a public forum. This public forum gathers individuals discussing internet marketing and informally exchanging products and services related to their business.

The focus of the research then pans out to investigate the forum population’s connections to cybercrime by finding crossover users. Crossover users are individuals from the public forum who also have commented on cybercrime forums. Three assessments are conducted. First, the population of crossover users is estimated through username matching. Second, we explore the types of cybercrime forums on which crossover users discuss, and their level of participation on these forums. Third, through a series of statistical tests, we evaluate whether the commenting patterns of crossover users, on the public forum, differ from those of non-crossover users. From this series of analyses, we conclude that:

  • The actors in the case study use the public forum to develop their internet marketing business, a business which has verified connections to cybercrime.

  • There is a minimum of 7.2% of crossover users in the public forum population.

  • Cybercrime forums on which crossover users discuss are diverse, from hacking to money laundering and blackhat SEO.

  • The participation of crossover users in cybercrime forums is limited.

  • When considering their posting behaviors, crossover users are relatively indistinguishable from other users in the public forum.

This research is the first to explore how users of a public forum on internet marketing have ties to cybercrime. It opens new research avenues on cybercrime participation, avenues that should consider online forums beyond their cybercrime branding, especially given the neutrality of IT tasks [1, 2].

Moreover, the public forum hosts a market where products and services related to internet marketing are exchanged informally. This market recalls traditional informal markets where the product or service is not necessarily illegal; it is rather the means by which it is produced or distributed that is illegal [3, 4, 5]. Informal markets are known to be attractive settings for criminal groups to operate in due to their lack of regulations [3, 6]. This known attractiveness coupled with the neutrality of IT and the findings of this study point towards the need to further investigate how informal online settings can be leveraged for cybercrime operations.

The paper is organized as follows. Section 2 presents the literature review. The data, methods, and results are then presented in Section 3 for the case study (micro scale) and in Section 4 for the public forum population (macro scale). Section 5 and Section 6 present the discussion and the study limits, respectively. A short conclusion is provided in Section 7.

Literature Review

To frame the results of this study, Section 2.1 presents previous work on the cybercrime industry and the IT tasks surrounding cybercrime operations. Then an overview of informal markets and their online counterparts is provided in Section 2.2.

Beyond the Cybercrime Underground

Understanding the organization of the cybercrime industry has been a topic of interest in computer security and criminology in the past decades [7, 8, 9, 10, 11, 12, 13]. A key feature of the industry is specialization: one can specialize in a specific task, such as monetizing credit cards, and outsource the remaining tasks to other actors in the industry [14, 13, 11, 12, 15, 8]. Such specialization reduces the costs of cybercrime through increased productivity and profitability [16].

Two recent studies [8, 17] investigated specifically “as-a-service” advertisements in underground forums. One [8] focused on eight anonymous online markets (also known as darknet markets [18] or cryptomarkets [19]) over six years and the other [17] on the well-known underground forum named “HackForums” over 11 years. In both cases, despite clear evidence of specialization in the industry [14, 13, 11, 12, 15, 8], the two studies showed that the number of specialized “as-a-service” listings advertised in underground forums was limited. From these findings, the authors of [8] hypothesized that outsourcing critical parts of the cybercrime value chain may be difficult. On the other hand, such “as-a-service” offerings may be limited because a great number of tasks related to cybercrime operations require IT expertise, but not necessarily secrecy.

Several studies reported criminal groups actively seeking such expertise [20, 21, 22, 23, 2]. For example, when studying networks involved in banking theft, [20, 21, 22, 23] reported core members of criminal groups recruiting individuals to develop websites (programmers) or translate texts. Some of these tasks were not criminal, but their use was.

Bijlenga and Kleemans (2018) [2] also found that individuals and organizations with IT expertise were actively leveraged by criminal groups. They studied five Dutch criminal investigations where expertise in the IT sector was sought by individuals involved in criminal activities. In three of the five cases, the basis of the collaborations was a legal business relationship. The authors mentioned that such a relationship was possible because the criminal nature of the tasks was not always obvious; the good or service provided was legal, while its use was not.

When discussing criminal groups seeking IT expertise, [1] stated that business collaborations can be established without the contractor or seller knowing that the product or service provided will be used for criminal purposes. The authors argued that, due to the neutrality of IT, “the criminal character does not have to be clearly visible to the person concerned or it can be denied afterward” [1][p.6]. This neutrality creates a blurry frontier between legal and criminal IT tasks, allowing individuals to recruit beyond underground settings. In addition, beyond these settings, there exist informal markets that represent interesting spaces to find business partners.

The Middle-Ground: Informal Markets

Informality is a broad and multifaceted concept tackled by scholars from various disciplines, including economics, sociology, and criminology. In general, informal markets are associated with the reverse side of the official economy: the unregulated or unregistered economic activities [24]. In such markets, the product or the service exchanged is not necessarily illegal; it is rather the means by which it is produced and distributed that is illegal [5, 3]. For example, developing a website for commercial purposes and not declaring the profit associated with it represents an informal economic activity. On the other hand, using the developed website for a cybercrime operation that steals banking credentials represents a criminal activity.

Portes and Haller (2010) [4] define three aims of informal economies for market participants: survival, dependent exploitation (such as decreased labor costs), and growth. Growth includes capital accumulation, solidarity, and flexibility [p.405-6]. Thus, considering the latter, informal economies are not solely destructive; they also provide jobs to otherwise unemployed individuals, lower costs for products and services, and foster innovation [3, 4]. Informal economies are also highly dependent on social ties to develop trust among market participants [4, 25]. Indeed, their inception and development are reliant on the social structures behind them as well as their geographical position, such as access to trade routes or labor [25]. Their economic activities, although informal, are also often considered socially acceptable by their social group. In general, individuals engage in informal markets due to autonomy, social networks, ease of entry, flexibility, and freedom [3].

Informal markets are also seen as attractive settings for criminal groups due to their lack of regulations [26, 25, 27]. For example, informal financial markets represent attractive channels for money laundering [27]. Also, a study of 30 informal entrepreneurs in the UK [3] illustrated that informal market participants are ready to embark on criminal business opportunities when the prospects for profits are high and the likelihood of being caught is low. On the other hand, Sabet’s (2015) study on informal and criminal sectors in Mexico illustrated that although, in theory, informal settings offer opportunities for criminal groups, in practice, those who operate in informal settings tend to avoid being involved in criminal activities when possible [26]. All in all, the line between informal and criminal markets is often difficult to draw in practice, as these markets merge and are interrelated in various ways [27, 25, 26].

Furthermore, nowadays the online setting may change how criminal and informal markets are intertwined in specific cases. This is especially true given that the internet is becoming a robust channel for economic transactions [28] and informal online economies are thriving [29, 30]. Moreover, the potential trust problems and uncertainties that informal market participants usually face in traditional settings, leading them to develop strong social structures, have been partially neutralized with the rise of informal institutions [30, 31]. Informal institutions are platforms providing mechanisms for neutralizing trust issues among market participants through various reputation systems, such as providing feedback.

Examples of informal online institutions are freelancer platforms [32, 33, 34]. Such digital labor platforms allow hiring independent contractors based on their skills and knowledge [32]. Payments are usually negotiated individually, and the jobs contracted via freelancer platforms are often conducted remotely and related to internet marketing, such as SEO optimization, website development, marketing design, and content or legal writing [32][p.14]. Freelancer platforms thus host informal online labor markets.

Currently, these informal online markets, just like their offline counterparts, create an environment conducive to criminal activities. Two of these platforms have already been associated with cybercrime activities. Farooqi et al. (2017) [35] tagged the platform SEOClerk as a “blackhat marketplace” and [36] considered the platform Freelancer as a hub for criminal activities. The latter assumption was based on the results of [37], a study that investigated the on-demand platform Freelancer and concluded that 66% of the jobs posted were legitimate, meaning that about a third (33%) were likely related to illegal activities, including thwarting security mechanisms or sending spam.

This study uncovers and explores the involvement in cybercrime of a population of users interested in internet marketing, both at a micro and macro scale. What gathers this population is a public forum (less formal or structured than freelancer platforms) on which users discuss internet marketing and informally exchange products and services related to their business. The case study is presented first (micro), followed by the analysis of crossover users (macro).

Case Study

The study starts with a case study of three actors involved in cybercrime and their use of the public forum. We position them with respect to other forum users by using machine learning, and conduct a content analysis on their forum interactions. These analyses allow us to better understand the role of the public forum for these actors, as explained below. In this section, the context of the case study is first presented, including an introduction to the public forum and how we gathered its publicly available data. The data and methods are then presented, followed by the results of the case study.

Context

The case study builds on previous research [38, 39, 40] that investigated the private conversations of individuals known to be involved in cybercrime activities. These private conversations came from a leaked chat log that was found on VirusTotal [41] by security researcher Veronica Valeros. Although there were dozens of individuals/actors discussing in this private chat log, the most active three actors sent over 80% of the messages. It is these three actors who are the main protagonists of this case study. They are named Actor 1, Actor 2, and Actor 3 below1.

These three actors developed websites advertised as libraries for “cracked” or “modded” Android applications (APKs). Modded APKs are modified versions of originals either providing better functionalities or unlocking paid features. Between 2017 and 2018, when the private conversations took place, the APKs available on their website were malicious and related to the Geost botnet. The Geost botnet was an Android banking Trojan botnet that infected nearly 800,000 Russian phones and had access to millions of Euros [40, 39]. Visitors to the websites of the three actors thought they were downloading modded or cracked APKs while they were actually downloading banking Trojans. The three actors were paid for every malicious APK successfully installed through their websites. They acted as affiliates in what appeared to be a black market pay-per-install (PPI) program related to the Geost botnet [40, 39].

The previous research [38] showed the difficulties that these actors faced daily. They were amateurs trying to monetize their websites through any means necessary, such as participating in various monetization programs. The three actors also discussed on a public forum, the focal point of this study. We associated the actors in the private conversations with their public forum usernames because (i) they used the same or variants of their usernames; and (ii) they posted in the private conversations links to their public interventions, such as: “ordered texts [link to comment on the public platform]”.

Introducing the Public Forum

The public forum is searchengines.guru, a Russian- and English-speaking forum dedicated to internet marketing. Internet marketing is an all-inclusive term that refers to economic activities focused on marketing products or services online. The forum was created in early 2000 and, as of 2021, reported over 400,000 registered members and 14,000,000 comments. It advertises itself as a “website allowing users to discuss issues related to creating and promoting websites on the internet [...]. The forum brings together experts in all areas of online advertising and allows you to receive both free knowledge and find mutually beneficial contacts and partners”. Topics of discussion are divided into categories which range from search engine result optimization to monetizing sites or hiring web masters, as presented in Table 1 below. Although the forum is not an official matchmaker for demand and supply of products and services related to internet marketing, many users leverage it as an advertisement space. Hence, many users conduct business deals through the forum. In this study, the public forum is conceptualized as a space where informal exchanges of products and services related to internet marketing take place.

Data Access

For this study, we used academic access to the database of Flare Systems [42], a Montreal-based company that maintains a cyber threat intelligence platform. The company provided this access out of interest in our research, on the sole condition that we acknowledge them as the source of the data. Since we did not gather the data on the public forum ourselves, we conducted an additional check to confirm that it was representative. To do so, we selected 50 random actors and compared the number of comments found in the database with the number of comments found on the forum from 2012 to 2020. The database contained, on average, 93% (std=0.13) of the total number of comments published per actor on the forum, illustrating sufficiently accurate coverage for our research. Finally, this access allowed us not only to quickly gather high-quality data on the public forum, but also to identify crossover users (public forum users who also speak on cybercrime forums), as explained below.
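The coverage check described above can be sketched as follows. This is a minimal illustration, not the authors' script: the function name `coverage_per_actor`, the dictionary inputs, the fixed seed, and the toy counts are all assumptions, and the use of the sample standard deviation is a guess, since the text does not say which estimator was used.

```python
import random
import statistics


def coverage_per_actor(db_counts, forum_counts, sample_size=50, seed=0):
    """Return database/forum comment-count ratios for a random sample of actors.

    db_counts and forum_counts map a username to a number of comments
    (hypothetical input format, assumed for this sketch).
    """
    rng = random.Random(seed)
    # Only actors with at least one comment on the forum can be compared.
    actors = sorted(a for a in db_counts if forum_counts.get(a, 0) > 0)
    sample = rng.sample(actors, min(sample_size, len(actors)))
    return {a: db_counts[a] / forum_counts[a] for a in sample}


# Toy counts, invented for illustration only (not data from the study).
db_counts = {"user_a": 90, "user_b": 40, "user_c": 10}
forum_counts = {"user_a": 100, "user_b": 40, "user_c": 20}

ratios = coverage_per_actor(db_counts, forum_counts, sample_size=3)
mean_cov = statistics.mean(ratios.values())
std_cov = statistics.stdev(ratios.values())  # sample std: an assumption
print(f"mean coverage = {mean_cov:.2f}, std = {std_cov:.2f}")
```

With the real data, `sample_size=50` would mirror the 50 randomly selected actors mentioned in the text.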

Creating a Forum Population Map

To gain a comprehensive perspective on the public forum population, we positioned the three actors in the public forum relative to others, based on their posting behavior in each of the forum’s categories. This was made possible by generating a forum map (i.e., a visual aid). The forum map is a two-dimensional representation of the forum population in which each individual is graphically placed in relation to others. The dataset created for this analysis is presented below.

Dataset of the Forum Population

Since the public forum has been active for nearly 20 years, we selected the timeframe of the private discussions, 2017 and 2018, to be the study period for the forum map dataset. Selecting these two years allowed us to stay as close as possible to the context of the private discussions when mapping the forum population in relation with the three actors.

More precisely, we extracted all comments posted on the public forum between 2017 and 2018. For each comment, the extracted features were: the comment’s identification number, the text, the timestamp, the name of the actor who wrote it, the title of the thread, and the thread’s identification number. The final dataset included 685,815 comments, 34,706 threads, and 23,348 users2.

To map users based on their posting behavior in each category of the public forum, we had to extract the thread’s category. Consequently, we crawled the public forum over several days (so as to ensure that the website’s server would not experience disruption from our research activity) using the thread identification number. A total of nine categories and 80 subcategories were found.

Each category has a specific set of rules enforced by the public forum administrators. Table 1 shows the nine categories and a sample of their subcategories, along with the percentage of comments posted in each category in 2017 and 2018 combined. Categories ranged from Search Engine to Monetizing Websites to Hiring Webmasters. As shown in Table 1, the category About Monetizing Sites was the most popular one, representing 20% of the comments, followed by Not About Work with 17% and Site Building with 16%.

Table 1. Summary of categories and subcategories in the public forum, with the percentage of total comments on them between 2017 and 2018.

| Category | Subcategories | % of comments |
|---|---|---|
| About Monetizing Sites | Partnership Programs, General Questions about Making Money on Sites, YouTube Monetization | 20% |
| Not About Work | Meetings and Gatherings, Smoking Room, About the Site and Forum | 17% |
| Site Building | Domain Names, Hosting and Servers for Websites, Web Analytics, Copywriting | 16% |
| Communication of Professionals | Cryptocurrencies, Ecommerce, Social Media Marketing | 14% |
| Practical Optimization Issues | Popular SEO and SEO Newbie Questions, Doorways and Cloaking, General Optimization Issues | 13% |
| Search Engine | Yandex, Site Directories, Google | 10% |
| Exchange and Sales | Buying and Selling Sites, Digital Goods, Programs and Scripts | 5% |
| Work and Services for Webmasters | Copywriting Translations, Social Media Marketing Services, Optimization Promotion and Audit | 3% |
| About Purchased Traffic for Websites | Teaser and Banner Advertising, Contextual Advertising, Yandex Direct, Google Ads | 2% |

Descriptive Statistics

Table 2 presents the descriptive statistics of the forum population dataset. In this dataset, users commented, on average, 30 times (std = 151) on 11 threads (std = 50) and two categories (std = 2). The median was four comments, so at least half of the users commented four times or fewer, illustrating that user participation was unequal, with most users exhibiting a low rate of participation. This distribution of comments reflects the participation inequality rule found in online communities and highlighted by several scholars [43, 44, 45, 46, 47, 48].

Top Poster Dataset. Using the entire dataset to map the forum population was impractical because the many users who rarely posted blurred the visual representation. When investigating the distribution of comments per user, we noticed a slight break at 10 comments, with about 70% of users posting fewer than ten comments and 30% posting ten or more. Consequently, we used this break to reduce the noise induced by a mass of sporadic users and created a subset of the dataset with Top Posters: those who posted at least 10 times in 2017 and 2018. Descriptive statistics of the Top Poster dataset are presented in Table 2 as well. In this dataset, users commented, on average, 92 times (std = 267) on 34 threads (std = 88) and four categories (std = 2). The median was 27 comments. We used this Top Poster dataset to create the public forum map presented below.
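As a rough illustration of the Top Poster rule, the sketch below counts comments per user and keeps those with at least 10 comments. The helper names and the toy comment log are hypothetical; only the threshold of 10 comes from the text.

```python
import statistics
from collections import Counter


def comments_per_user(authors):
    """Count comments per user from an iterable with one author name per comment."""
    return Counter(authors)


def top_posters(counts, min_comments=10):
    """Keep users who posted at least `min_comments` times."""
    return {user: n for user, n in counts.items() if n >= min_comments}


# Toy comment log, invented for illustration: one entry per comment.
authors = ["alice"] * 12 + ["bob"] * 3 + ["carol"] * 10 + ["dave"]

counts = comments_per_user(authors)
top = top_posters(counts)

print("median comments per user:", statistics.median(counts.values()))
print("top posters:", sorted(top))
```

The same per-user counts feed both the descriptive statistics (mean, standard deviation, median) and the subset selection, so the two steps can share one pass over the comments.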

Table 2. Descriptive statistics for the forum population dataset and the Top Poster dataset

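All Users (N = 23,348)

| | Min | Max | Mean (std) | Median |
|---|---|---|---|---|
| N. Comments | 1 | 6,603 | 30 (151) | 4 |
| N. Threads | 1 | 2,013 | 11 (50) | 2 |
| N. Categories | 1 | 9 | 2 (2) | 1 |

Top 30% Users (N = 6,924)

| | Min | Max | Mean (std) | Median |
|---|---|---|---|---|
| N. Comments | 10 | 6,603 | 92 (267) | 27 |
| N. Threads | 1 | 2,013 | 34 (88) | 12 |
| N. Categories | 1 | 9 | 4 (2) | 3 |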

Uniform Manifold Approximation and Projection for Dimension Reduction (UMAP)

To project the data points of users, based on their posting behavior in each of the forum’s categories, into a comprehensible representation, we used the Uniform Manifold Approximation and Projection for Dimension Reduction (UMAP) [49]. UMAP is a mathematically robust and efficient method to project high dimensional data into lower dimensions while preserving the underlying structure both at the local and global scales. Due to its ability to embed complex structural relationships into much more comprehensible representations, UMAP has recently been used to map highly complex phenomena like cellular biology [50, 51] and multi-scale population structure [52].

The data of users were projected as points from R^9 (one dimension per forum category) to the visual plane R^2. Each dimension in R^9 represents a feature that corresponds to the number of comments made by a user in one of the nine categories over the span of the two years. Note that the two coordinates in the R^2 projection do not have a semantic meaning. However, the distance between points in R^2 respects the underlying manifold structure in the source R^9 space as much as possible.
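To make the nine-dimensional representation concrete, the sketch below builds one count vector per user from (username, category) pairs. The category names are taken from Table 1, but their order, the input format, and the function name `build_feature_vectors` are assumptions made for illustration.

```python
from collections import Counter, defaultdict

# The nine forum categories listed in Table 1 (order chosen arbitrarily here).
CATEGORIES = [
    "About Monetizing Sites",
    "Not About Work",
    "Site Building",
    "Communication of Professionals",
    "Practical Optimization Issues",
    "Search Engine",
    "Exchange and Sales",
    "Work and Services for Webmasters",
    "About Purchased Traffic for Websites",
]


def build_feature_vectors(comments):
    """Map each user to a 9-dimensional vector of comment counts per category.

    `comments` is an iterable of (username, category) pairs, one per comment.
    """
    per_user = defaultdict(Counter)
    for user, category in comments:
        per_user[user][category] += 1
    # A Counter returns 0 for categories in which the user never commented.
    return {u: [c[cat] for cat in CATEGORIES] for u, c in per_user.items()}


# Toy comments, invented for illustration only.
comments = [
    ("alice", "Site Building"),
    ("alice", "Site Building"),
    ("alice", "Search Engine"),
    ("bob", "Exchange and Sales"),
]
vectors = build_feature_vectors(comments)
print(vectors["alice"])
```

Each resulting list is one point in the nine-dimensional space that is later projected onto the plane.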

UMAP has two important parameters: the number of neighbors considered for each point in the original space and a distance measure. Both are tuned during an exploratory phase, and the distance measure is justified according to the nature of the data and the inquiry. In this study, the final parameters used for the UMAP transformation, chosen because they resulted in the clearest segmentation of users, were n_neighbors = 100 along with the Euclidean distance. Euclidean distance generalizes to n dimensions the natural notion of distance between two points in a plane: the length of the straight line joining them.
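The sketch below illustrates the two choices named above: the Euclidean distance between two count vectors, and a UMAP projection with a neighbor count of 100. It assumes the third-party `umap-learn` and `numpy` packages for the projection step, which are imported lazily so the distance helper runs on its own. The `random_state` value and the function names are assumptions, not details reported in the study.

```python
import math


def euclidean(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def project_users(vectors, n_neighbors=100, random_state=42):
    """Project per-user count vectors to 2-D with UMAP.

    Requires the third-party packages numpy and umap-learn.
    random_state is fixed only to make the sketch repeatable (an assumption).
    """
    import numpy as np
    import umap

    X = np.asarray(vectors, dtype=float)
    reducer = umap.UMAP(
        n_neighbors=n_neighbors,
        n_components=2,
        metric="euclidean",
        random_state=random_state,
    )
    return reducer.fit_transform(X)  # array of shape (n_users, 2)


# Two toy users: one with 3 comments in the first category only,
# one with 4 comments in the second category only.
d = euclidean([3, 0, 0, 0, 0, 0, 0, 0, 0], [0, 4, 0, 0, 0, 0, 0, 0, 0])
print(d)
```

A neighbor count of 100 only makes sense when the input has many more than 100 rows, as is the case for the Top Poster dataset.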

We added the three actors to the forum map, providing an insightful overview of their position in the forum population (as presented in Section 3.4). Additionally, for the case study, we conducted a content analysis on the three actors’ public comments, giving us a qualitative understanding of their use of the forum.

Analyzing Actors’ Comments

To gain a deeper understanding of the actors’ use of the public forum, we conducted a content analysis of their publicly posted comments. To do so, we gathered all available information about them from the public forum, as explained below.

Dataset of Actors’ Comments

We extracted all comments3 for each of the three actors, including the comment’s timestamp, identification number, content, related actor’s identification number, title of the thread, and thread identification number. Table 3 presents the number of comments and threads found on the public forum for all three actors, as well as their first and last year of activity. To have a global overview of actors’ use of the forum, we decided to analyze all their comments, regardless of the time they were posted. This allowed us to assess whether the actors ever mentioned their cybercrime participation in the public forum.

Table 3. Three Actors’ Participation in the Public Forum

| | Actor 1 | Actor 2 | Actor 3 |
|---|---|---|---|
| N. Comments | 1,385 | 172 | 471 |
| N. Threads | 759 | 69 | 331 |
| Activity Period | 2009-2020 | 2012-2019 | 2010-2019 |

Content Analysis

We analyzed the comments using a content analysis with the research question: how do the actors interact in the public forum? Content analysis aims at systematically uncovering the use of certain words, themes, and semantics. More precisely, we developed an initial coding scheme with a small subset of 215 comments. Then, one coder followed that scheme to classify the three actors’ comments, which summed up to 1,990 comments from 2009 to 2020. In situations of uncertainty, the coder consulted the rest of the team, who evaluated the comments and agreed on a theme.

All comments were written in Russian and translated using the Googletrans library [53]. To contextualize them, and because the automated translation was sometimes inadequate, the coder often searched the comments on the public forum and went through the related thread using web browser translation (that showed better translation results). In cases in which the coder had a hard time understanding the meaning of the interventions, such as the use of slang words, the coder consulted a native Russian-speaking researcher to understand the meaning of the comment. This Russian-speaking researcher had translated the private conversations of the three main actors in [38, 39, 40]. She was thus knowledgeable on the topics discussed by these three actors and their use of slang words. Overall, the coder could, despite translation limitations and thanks to the simple themes created as well as the help of the Russian-speaking researcher, determine with a high degree of certainty how the actors used the forum.

In terms of procedure, the meaning of each theme evolved throughout the analysis as the coder included more comments in each of them. Finally, the coder also created memos on each actor to complement the themes. At the end of the analysis, four large themes encompassed how the actors interacted on the public forum: as buyers, as sellers, as forum participants, or as tool users. Each of these is described below.

As a Buyer. The theme “as a buyer” included comments aimed at purchasing or commenting upon the quality of a service or a product, including providing positive, neutral, or negative feedback, asking for a price, or asking to contact a seller for further information.

As a Seller. The theme “as a seller” included comments aimed at offering a service or a product in the public forum, thanking a buyer when receiving feedback, or providing customer service.

As a Tool User. The theme “as a tool user” included comments aimed at giving an opinion about a tool, helping others to solve similar issues, or sharing experience with a tool (such as website promotion tools or traffic monitoring tools).

As a Forum Participant. The theme “as a forum participant” included comments that were more related to participating in the public forum in general, such as asking general questions or giving general advice (not specific to a tool) as well as sharing information on various topics.

Ethical Considerations

The case study has been approved by the Simon Fraser University ethics department (study number 2020s0121) and the University of Montreal (study number CERSC-2021-131-D) under minimal risk, which required asking for a waiver of consent in line with Article 5.5A of the Canadian Tri-Council Policy Statement on research ethics. To ensure participants’ confidentiality and privacy, we do not use the real pseudonyms of the actors.

There are ethical issues regarding the research that need to be acknowledged [54]. In terms of potential harms, studying the cybercrime connections of a public population can lead to wrongly labeling individuals as cybercriminals. It can also result in profiling and marginalizing forum users, while also shifting law enforcement focus onto them. We try to avoid creating these harms by taking a nuanced approach when interpreting the results. In return, the research leads to better understanding the context that may lead a mass of users to have connections with cybercrime. It shows that part of a public internet marketing forum population has ties with cybercrime forums. It also discusses how the neutrality of IT may explain why specific cybercrime tasks (such as developing websites) can take place in plain sight, and highlights the limited participation of crossover users in cybercrime forums.

Finally, the interpretation of the results may lead to policy opportunities that can prevent cybercrime participation.

Results of the Case Study

The results of the case study are presented below, including the forum map and the three actors’ positions in it as well as the findings from the content analysis.

The Map of the Forum Population

The resulting two-dimensional representation of the public forum users is shown in Figure 1 and is called the map in this work. In the map, each point represents a user, and points that are close together are users with similar posting profiles. The (x,y) coordinates are not directly interpretable; what matters is the relative position of the points. Figure 1 shows a set of arm-shaped groups that stretch from the center to the outside. The shapes of the groups are very informative, as groups that take the shape of long and narrow arms represent users that comment mostly in one category, with the most active users at the outer ends of the arms.

Figure 1: The Map of the Forum with the 2017-2018 Top Poster Dataset

We investigated each of the arms in the map to find the dominating public forum category on that arm. These groups define more or less tight communities, some with very active users, commenting several thousand times over the span of two years, as in the Site Building category. On the other hand, the arms that are wider or closer together can be seen as groups that blend more or less with other ones, as in the Not About Work and Communication of Professionals categories. Also, some categories like Work and Services for Webmasters and Exchange and Sales are close to each other, and do not contain users with extreme posting behavior.

Positioning the Three Actors

We added the three actors to the map in Figure 1. Their location is quite informative: they are closer to the center than to any arm’s end. The shapes of the groups which Actors 1 and 2 are close to, Work and Services for Webmasters and Exchange and Sales, are also interesting. These are much shorter arms and are closer together, indicating that fewer users in these categories were very active contributors to their groups, hinting at a more opportunistic behavior. These categories are also explicitly related to business. Actor 3, on the other hand, does not seem to belong to any group of interest on this platform.

Typical Users

Figure 2 shows the distribution of themes found for each actor based on his comments on the public forum. These themes are discussed below for each actor, along with additional contextual (and valuable) information that was available in the coder’s memos.

Actor 1 Figure 2 shows that Actor 1 interacted as a buyer 62% of the time, buying a variety of web products, including images, logos, texts, code reviews, security reviews, programs, scripts, and traffic. We noticed an overlap between the actor’s comments in the private discussions and his comments in the public forum. For example, in the private discussions, Actor 1 discussed the need to review one of his4 websites for search engine optimization; this website is known to have hosted more than 17 malicious Geost APKs. In the public forum, within the same timeframe, he asked for an external review of one of his websites (a transaction that was successfully completed according to the public conversations). Also, Actor 1 interacted as a forum participant 19% of the time, sporadically helping others on website building matters. The actor also interacted as a seller 10% of the time, offering writing services and ready-to-use APK portals (known as turnkey websites) as well as generic templates and website layouts. Lastly, the actor interacted as a tool user only 9% of the time.

Actor 2 The second actor was less active on the public forum, posting a total of 169 comments. As shown in Figure 2, the actor interacted 37% of the time as a forum participant, commenting on topics related to programming and website traffic or recommending websites in general, and 26% of the time as a tool user. He also interacted 21% of the time as a seller, selling ready-to-use APK portals as well as generic APKs, images, videos, or texts. Lastly, about 16% of the time, Actor 2 acted as a buyer, purchasing, for example, systems to monitor sites or texts to fill websites.

Actor 3 Actor 3 commented 457 times on the public forum and 76% of his comments were as a forum participant, helping others or asking for help on internet marketing topics. Only 15% of the time did Actor 3 interact as a seller, offering, for example, scripts and parsers. He also interacted in the public forum 9% of the time as a buyer, purchasing tutorials for social media marketing or ready-to-use websites. None of his comments was as a tool user.

Figure 2: Distribution of Themes per Actor

In sum, all three actors interacted as sellers or as buyers (Actor 3 to a lesser extent) and many of the topics they discussed in the public forum were the same topics discussed in the private conversations. Consequently, the three actors actively used the public forum as a source of information as well as to find products or services related to their business, which, by 2017 and 2018, had ties to cybercrime activities. According to the forum map, Actors 1 and 2 were positioned in more opportunistic groups with users who commented less, on average. Actor 3 was positioned near the center of the map, in no specific group. All in all, there is nothing special about them; they were typical users of the forum. We noticed, as well, that none of the comments studied hinted that these actors were involved in cybercrime activities.

Given the results of the case study, whether other users in the public forum had connections to cybercrime became a topic of interest. This allowed us to move from an in-depth micro understanding of three actors to a macro assessment of the scale of the problem. In the next section, we analyze crossover users, or public forum users who also participate in cybercrime forums.

Deep Dive on Crossover Users

The forum population overlap with cybercrime spaces is assessed by identifying crossover users: individuals from the public forum who also discussed on cybercrime forums. We evaluate the scale of the problem through a series of analyses. Crossover users are first identified and the cybercrime forums on which they discuss are analyzed. Then, their posting behavior on the public forum is compared with that of non-crossover users. The idea is that, if crossover users form a subgroup in the public forum by displaying specific posting behaviors, they should be further investigated as a subgroup of the forum population. The data and methods used for these analyses are first presented below, followed by the results.

Strategy to Identify Crossover Users

To identify crossover users, we searched the database to determine whether usernames found in the public forum also appeared in cybercrime forums over a similar timeframe. This cross-correlation method is based on the idea that users are likely to choose the same username in different forums, a phenomenon that was observed in previous studies [55, 56, 57, 58, 59, 60].

More precisely, several studies show that individuals can efficiently be identified across online platforms through a simple username matching method [55, 56, 57, 58, 59, 60]. On the other hand, there is a considerable strain of research aimed at developing more sophisticated methods to link individuals across online platforms based, for example, on the features of a user profile [61, 62] or his/her/their writing style (a technique known as stylometry) [63, 64]. These studies often use a sample that represents the ground-truth and test various statistical models to assess their level of accuracy at adequately identifying users across platforms. However, in our study, the ground-truth is unknown, and, consequently, the accuracy of a more sophisticated approach cannot be verified5. For these reasons, we decided to keep the method simple by cross-correlating usernames across platforms. This allowed us to find a lower bound estimate on the number of crossover users. The estimate is lower bound because the method inevitably yields false positives (those we flagged as crossover users when they are not), but, given users’ tendency to choose the same usernames [55, 56, 57, 58, 59, 60], it likely yields more false negatives (those we missed with the method) than such false positives. We also strengthened this inequality assumption by using strict username filters instead of fuzzy ones (such as considering “RoniTheJungleMaster” and “RoniTheJungleMasteR” as the same user).

Most importantly, the limits of the method do not hinder the main result of this study: that there exists an overlap between the public forum population and cybercrime populations. This finding holds even when considering variations of the cross-correlation method through different username and timeframe filters, as presented below.

Filtering to Find Crossover Users

For this analysis, we used the same dataset of all users who posted in 2017 and 2018 in the public forum (generated for the UMAP analysis). This dataset includes a total of 685,815 comments, 34,706 threads, and 23,348 users. From this dataset, we created a list of usernames and developed two types of filters: username filters and timeframe filters. Also, only public forum users who commented on forums that had a clear cybercrime branding were identified as crossover users, as explained below.

Username Filters. We filtered the list of public forum usernames to keep only those with at least five characters, removing short usernames such as “Nick”, “Max”, or “bot”. The general idea was to minimize the chances of cross-correlating generic usernames (due to their popularity or lack of sophistication) and maximize the chances of keeping one-of-a-kind usernames. We developed additional, more conservative filters, but the results did not change the narrative of the paper. For the sake of concision, only the results of the most liberal filter are presented below, gathering as many users as possible who could be crossover users6.

Timeframe Filters. We searched the database for the filtered usernames (those with at least five characters) to find whether some of them commented in other forums. However, since Flare Systems has visibility on forums that have been active since early 2000, we had to filter out the comments based on the year they were posted. Since the timeframe of the public forum dataset presented above is 2017 and 2018, we heuristically decided to keep only comments posted from 2015 to 2020, effectively extending the two-year time range by 100% (two years) before and after the timeframe of the dataset. This aimed at minimizing the chances of cross-correlating usernames that might belong to different individuals because of the time difference between the comments posted. We also developed more conservative timeframe filters, but the results, again, did not change the narrative of the paper. For the sake of concision, they are thus not presented below7.

Selecting Forums with a Cybercrime Branding. If a user with a username of at least five characters commented on a forum (other than the public forum) during the timeframe mentioned above, we extracted the identification number for the comment, its timestamp, and the name of the forum on which it was posted. This resulted in identifying 42 forums where potential crossover users commented. We manually verified whether these 42 forums had a cybercrime branding. Four of the 42 forums were branded as forums for cybersecurity discussions and were therefore removed. The remaining 38 forums were openly related to cybercrime activities.

All in all, the crossover user dataset included all comments posted between 2015 and 2020 in one of the 38 verified cybercrime forums by public forum users who had usernames of at least five characters. We added a binary variable identifying crossover users to the forum population dataset.
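The three filters described above (username length, timeframe, and verified cybercrime branding) can be illustrated with a minimal sketch. This is not the code used in the study: the `Comment` record, the function name, and the toy usernames and forum names are all hypothetical, and the real analysis ran against the Flare Systems database rather than in-memory lists. The sketch only shows how strict (exact, case-sensitive) username matching combines with the 2015 to 2020 window and the list of verified forums.

```python
from dataclasses import dataclass

MIN_USERNAME_LENGTH = 5          # usernames shorter than this are dropped
FIRST_YEAR, LAST_YEAR = 2015, 2020  # timeframe filter (inclusive)


@dataclass(frozen=True)
class Comment:
    """Hypothetical record for a comment posted outside the public forum."""
    username: str
    forum: str
    year: int


def find_crossover_users(public_usernames, other_comments, cybercrime_forums):
    """Return public forum usernames that match exactly (case-sensitive)
    a commenter on a verified cybercrime forum within the timeframe."""
    candidates = {u for u in public_usernames if len(u) >= MIN_USERNAME_LENGTH}
    crossover = set()
    for comment in other_comments:
        if (comment.username in candidates
                and FIRST_YEAR <= comment.year <= LAST_YEAR
                and comment.forum in cybercrime_forums):
            crossover.add(comment.username)
    return crossover


# Toy data: every name below is made up for illustration.
public_usernames = {"Max", "RoniTheJungleMaster", "Sarik9", "webdev_pro"}
other_comments = [
    Comment("RoniTheJungleMaster", "forum_a", 2016),     # kept
    Comment("RoniTheJungleMasteR", "forum_a", 2017),     # differs by case: no strict match
    Comment("Max", "forum_a", 2018),                     # username too short
    Comment("Sarik9", "forum_a", 2012),                  # outside the 2015-2020 window
    Comment("webdev_pro", "security_forum", 2019),       # forum not cybercrime-branded
]
cybercrime_forums = {"forum_a"}

crossover = find_crossover_users(public_usernames, other_comments, cybercrime_forums)
# Binary variable added to the forum population dataset.
is_crossover = {user: user in crossover for user in public_usernames}
```

Because matching is strict, each filter can only remove candidates, which is consistent with reading the resulting count as a conservative estimate.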

Investigating Cybercrime Forums

We also visited the cybercrime forums to codify their main branding, such as cracking or money laundering. The code was determined based on the website’s official description and/or the main topics discussed on the front page. Sometimes, these forums were down or required registration. In such cases, to find their general branding, we extracted information about the forum in security reports and blogs or in the database. The main branding identified sometimes overlapped (e.g., hacking and blackhat SEO forums host similar discussion topics), and, in such cases, the most obvious one was kept. Also, since the branding of some forums was not well-defined, we created a catchall code named: discussions, sales and questions on various content related to cybercrime. The idea was to provide a general picture of the types of cybercrime forums on which crossover users discussed.

We also coded whether the cybercrime forums were hosted on the clearnet, meaning that they could be visited via a modern web browser, such as Google Chrome, or hosted on The Onion Router (Tor), that is, hosted on the darknet. Tor is an anonymous communication protocol developed by a network of volunteers that allows users to browse the internet anonymously [65]. The anonymous protocol also hosts websites, known as onion services, that offer anonymity to both website owners and visitors. These onion services are often associated with the darknet [66, 67, 68], a loosely defined concept that encompasses networks that are not accessible via modern web browsers and offer anonymity to their users, such as I2P, Freenet, Tor, and ZeroNet [69]. Content is more likely to be related to criminal activities when hosted on these technologies due to the anonymity provided [68].

Distinguishing Crossover Users in the Public Forum

Finally, to compare crossover users with non-crossover users on the public forum, we developed a series of posting behavior indicators. We then compared crossover users with non-crossover users based on these indicators through a series of non-parametric tests. We computed the analysis twice, once on the Top Poster dataset, to remove the potential effect of a mass of non-participating users, and once on the entire forum population dataset.

Posting Behavior Indicators

We developed three indicators with 13 sub-indicators to measure posting behaviors. They are presented below.

Activity Rate The first indicator quantifies the extent to which a user was active on the public forum. It includes two sub-indicators: (1) N. posts, which is the sum of all comments made by each user in 2017 and 2018, and (2) N. days active, which is the number of days each user was active over the two-year period (meaning the number of days on which the user posted at least once).

Diversification The second indicator quantifies the extent to which a user was diversified in the public forum. It includes two sub-indicators: (1) N. cat: the number of categories in which the user commented (categories are listed in Table 1), and (2) N. sub-cat: the number of subcategories in which the user commented (a sample of subcategories is listed in Table 1).

Topics Discussed The third indicator measures the extent to which a user was active in a specific category of the public forum. This indicator thus includes nine sub-indicators, one per category. Each sub-indicator includes the total number of comments posted by a user in one of the nine categories.
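As a rough illustration of the three indicators, the sketch below derives all of the sub-indicators for a single user from a list of comments. The input format (a tuple of posting day, category, and subcategory), the function name, and the subcategory labels are assumptions made for the example; only the category names come from the forum.

```python
from collections import Counter
from datetime import date


def posting_indicators(comments):
    """Compute posting behavior sub-indicators for one user.

    `comments` is a list of (day, category, subcategory) tuples,
    a hypothetical layout chosen for this sketch.
    """
    per_category = Counter(category for _, category, _ in comments)
    return {
        # Activity rate
        "n_posts": len(comments),
        "n_days_active": len({day for day, _, _ in comments}),
        # Diversification
        "n_cat": len(per_category),
        "n_sub_cat": len({(category, sub) for _, category, sub in comments}),
        # Topics discussed: one count per category the user posted in
        "posts_per_category": dict(per_category),
    }


# Toy user: two comments on the same day, one on another day.
example_comments = [
    (date(2017, 3, 1), "Site Building", "Hosting"),        # subcategory names are made up
    (date(2017, 3, 1), "Site Building", "Domains"),
    (date(2018, 6, 9), "Exchange and Sales", "Links"),
]
indicators = posting_indicators(example_comments)
```

Categories in which the user never posted are simply absent from `posts_per_category`; a full implementation would fill them with zero so that each user has nine topic sub-indicators.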

Descriptive statistics, for each sub-indicator, are shown in Table 4 for the Top Poster dataset. The descriptive statistics for the entire forum population dataset are also presented in Table 1 of the Appendix.

Table 4: Descriptive Statistics of Behavior Indicators for Top Posters

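N=6,924

| Sub-indicator | Min | Max | Mean (std) | Med |
|---|---|---|---|---|
| Activity Rate | | | | |
| N. posts | 10 | 6,603 | 92 (267) | 27 |
| N. days active | 1 | 708 | 41 (67) | 17 |
| Diversification | | | | |
| N. cat | 1 | 9 | 4 (2) | 3 |
| N. sub-cat | 1 | 71 | 7 (9) | 4 |
| Topics Discussed | | | | |
| Search Engines | 0 | 1,109 | 9 (38) | 0 |
| Monetizing Sites | 0 | 3,010 | 18 (75) | 1 |
| Practical Opt. | 0 | 2,965 | 12 (56) | 1 |
| Comm. of Prof. | 0 | 2,363 | 13 (75) | 0 |
| Site Building | 0 | 2,873 | 14 (81) | 1 |
| Exch. and Sales | 0 | 689 | 4 (17) | 0 |
| Purch. Traffic | 0 | 880 | 2 (18) | 0 |
| Work Webmaster | 0 | 296 | 3 (11) | 0 |
| Not About Work | 0 | 3,532 | 16 (129) | 0 |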

Mann-Whitney U Tests

We computed Mann-Whitney U tests to assess if the distributions of sub-indicators for the crossover users differed from those of non-crossover users. This test was favored over more common parametric tests because none of the sub-indicators followed a normal distribution. A Mann-Whitney U test compares two groups by ranking their pooled values from low to high and then comparing the average rank of the two groups. The test assumes that the observations in the two groups are independent, that the two distributions have a similar shape, and that the data are ordinal or continuous. The sub-indicators all respected these assumptions.

For each sub-indicator, the null hypothesis (H0) was that there is no difference between crossover users and non-crossover users. The alternative hypothesis (Ha) was that there is a difference between crossover users and non-crossover users. The significance level of the tests was set to 0.05, meaning that there is a 5% risk of concluding that a difference exists when there is no difference.

Below, we report each group’s (crossover or non-crossover) mean, standard deviation, and median, along with the Mann-Whitney U statistic (U) and the p-value. Also, to measure the effect size, we used the common language effect size introduced by [70]. It represents the proportion of favorable pairs that support one direction. In this study, the effect size for each sub-indicator represents the proportion of favorable pairs for the group that scored higher for that sub-indicator (which can be inferred from the mean and/or median).
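The test and the effect size can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the study's code: the function names are hypothetical, the two-sided p-value uses the normal approximation with a tie correction and no continuity correction, and tied pairs are counted as half a favorable pair when computing the common language effect size. A statistical package may apply different defaults and return slightly different p-values.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values from low to high (1-based); ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return both U statistics, a two-sided p-value, and the effect size."""
    n_a, n_b = len(group_a), len(group_b)
    pooled = list(group_a) + list(group_b)
    ranks = average_ranks(pooled)
    # U for group A: pairs where A's value is higher, ties counted as 0.5.
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    # Normal approximation with tie correction (assumption of this sketch).
    n = n_a + n_b
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    variance = n_a * n_b / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        p_value = 1.0
    else:
        z = (u_a - n_a * n_b / 2) / math.sqrt(variance)
        p_value = math.erfc(abs(z) / math.sqrt(2))
    # Proportion of favorable pairs for the group that scored higher.
    effect_size = max(u_a, u_b) / (n_a * n_b)
    return {"u_a": u_a, "u_b": u_b, "p_value": p_value, "effect_size": effect_size}


# Toy values, not data from the study.
result = mann_whitney_u([3, 5, 8, 13], [1, 2, 3, 4, 6])
reject_h0 = result["p_value"] < 0.05
```

An effect size of 0.50 means that a randomly drawn member of one group is as likely to score higher as lower than a randomly drawn member of the other group, which is why values close to 0.50 indicate a negligible practical difference even when the p-value is below 0.05.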

Results of Crossover Users Analyses

The results of the macro assessment on crossover users are presented below.

A Minimum of 7% of Crossover Users

A total of 21,726 users had a username of at least five characters. Out of them, 1,557 posted in one of the 38 cybercrime forums between 2015 and 2020, representing 7.2% of the public forum population. Of the Top Poster dataset, a total of 6,433 individuals had a username of at least five characters. Out of them, 510 were crossover users: they posted at least once in one of the 38 cybercrime forums between 2015 and 2020. These crossover users represent 7.9% of the Top Poster dataset.
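The two shares reported above follow directly from the counts. The short check below uses only figures stated in this section; the helper name is hypothetical.

```python
def share_percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Entire forum population: users with usernames of at least five characters.
population_share = share_percent(1557, 21726)
# Top Poster dataset: same username filter.
top_poster_share = share_percent(510, 6433)
```

Note that both denominators are the filtered user counts (usernames of at least five characters), not the full 23,348 users in the dataset.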

Diversified Cybercrime Forums and Limited Involvement

We then investigated the branding behind cybercrime forums. Of the 38 forums, seven focused on hacking, seven on cracking (cracked software) or leaked information (e.g., lists of usernames and passwords), and six on carding (credit card fraud), while three were cryptomarkets (marketplaces hosted on Tor), one involved money laundering discussions, one was specialized in sharing black hat SEO techniques, and thirteen gathered discussions, sales, and questions on various content related to cybercrime (with no clear specific branding). Also, of these 38 cybercrime forums, 17 were hosted on the clearnet, meaning that they could be visited via a web browser such as Chrome. The remaining 21 were hosted on The Onion Router (Tor) network, known as the darknet.

Figure 3 illustrates the 38 forums in terms of (i) the forum’s accessibility via the clearnet or the darknet, (ii) the forum’s main branding, (iii) the total number of crossover users who commented on it, and (iv) the total number of comments. These data cover all crossover users identified (and not only the Top Posters). The figure is a tree map where the size of the boxes represents the number of crossover users in the sample who interacted in the cybercrime forum (specified under the name of the cybercrime forum). The color scale represents the number of comments on each cybercrime forum. Specific information on all 38 forums can also be found in Table 2 of the Appendix.

Figure 3: Main Branding of Cybercrime Forums as well as Number of Comments and Number of Crossover Users on them

As shown in Figure 3, in terms of number of crossover users, the most popular forums are Nulled to (cracking and leaks), Dark Money (money laundering), Best Hack Forum (hacking), Exploit In (hacking), Black Hat World (blackhat SEO), and Club2crd (carding). These cybercrime forums are also the ones with the greatest number of comments, although in a different order. Overall, crossover users commented on a variety of cybercrime forums, from cracking (and leaks) to hacking or money laundering. Given that crossover users participated in a public forum about internet marketing, it is interesting to note that some of the cybercrime forums found focused on similar (yet illicit) topics, such as blackhat SEO.

In terms of posting patterns, crossover users favored cybercrime forums hosted on the clearnet over those hosted on the darknet. That is, 61% of crossover users commented only on cybercrime forums hosted on the clearnet.

Also, their participation in cybercrime forums was limited. Crossover users posted, on average, 21 comments on cybercrime forums (std=75), with a minimum of one and a maximum of 1,383. More importantly, 50% of crossover users posted three comments or fewer and 75% posted ten comments or fewer.

A Relatively Indistinguishable Group

With crossover users identified, we used the map generated in the case study section to visualize where these users were positioned in the public forum. As shown in Figure 4, which considers only the Top Poster dataset, crossover users are positioned all over the public forum. In other words, these users do not cluster in specific groups of the public forum as identified by the UMAP algorithm.

This finding is further supported by the formal statistical analysis below.

Figure 4: The Forum Map with Crossover Users (blue dots)

Table 5 shows the results of the Mann-Whitney U tests for the Top Poster dataset. The results for the entire public forum population dataset are available in Table 3 of the Appendix.

Table 5 shows that four out of the 13 sub-indicators (31%) display statistical significance and suggest that there is a difference between crossover users and non-crossover users. However, analysis of the descriptive statistics for these four sub-indicators shows that the absolute differences reported are minimal. These minimal differences are also shown in the small effect sizes, which oscillate around 50%. The same tests computed on the whole population (available in Table 3 of the Appendix) display eight significant relationships out of 13 sub-indicators (62%). However, the minimal differences in the descriptive statistics and the small effect sizes both prevent us from concluding that there exist significant differences that differentiate crossover users from non-crossover users.

This absence of noticeable differences between the two groups is quite informative. Indeed, the fact that most indicators are non-significant in the Top Poster dataset and the fact that, when there is statistical significance, the effect size is small, suggest that either the two populations are practically indistinguishable given these characteristic variables, or that they are strongly overlapping, hence hinting at a larger crossover user group in the public forum population.

Table 5: Mann-Whitney U Test Results for the Top Poster Dataset

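Crossover Users: N=510. Non-Crossover Users: N=6,414.

| Statistics | Crossover Mean | Crossover Std | Crossover Med | Non-Crossover Mean | Non-Crossover Std | Non-Crossover Med | Mann-Whitney U | p-value | Effect Size |
|---|---|---|---|---|---|---|---|---|---|
| Activity Rate | | | | | | | | | |
| N. posts | 99.79 | 289.63 | 29 | 91.10 | 264.88 | 27 | 1,559,642 | 0.04 | 0.52 |
| N. days active | 44.68 | 68.89 | 19 | 40.36 | 67.16 | 17 | 1,526,452 | 0.01 | 0.53 |
| Diversification | | | | | | | | | |
| N. cat. | 3.69 | 2.33 | 3 | 3.53 | 2.25 | 3 | 1,577,524 | 0.09 | 0.52 |
| N. sub-cat | 7.70 | 9.23 | 5 | 7.19 | 8.47 | 4 | 1,598,917 | 0.20 | 0.51 |
| Topics Discussed | | | | | | | | | |
| Search Engine | 8.64 | 30.23 | 0 | 9.29 | 36.61 | 0 | 1,619,719 | 0.34 | 0.50 |
| Monetizing sites | 21.73 | 139.65 | 1 | 17.80 | 67.07 | 1 | 1,629,106 | 0.44 | 0.50 |
| Practical opt. | 10.81 | 32.21 | 1 | 11.73 | 57.74 | 1 | 1,595,856 | 0.17 | 0.51 |
| Comm. of prof. | 19.06 | 86.82 | 0 | 12.47 | 74.21 | 0 | 1,515,778 | 0.00 | 0.54 |
| Site building | 13.28 | 45.15 | 1 | 14.52 | 83.63 | 1 | 1,579,436 | 0.08 | 0.52 |
| Exch. and sale | 4.18 | 13.72 | 0 | 4.21 | 17.37 | 0 | 1,587,799 | 0.10 | 0.51 |
| Purch. traffic | 2.54 | 18.54 | 0 | 1.92 | 17.90 | 0 | 1,626,877 | 0.39 | 0.50 |
| Work Webmasters | 2.45 | 8.36 | 0 | 2.72 | 10.86 | 0 | 1,615,643 | 0.28 | 0.51 |
| Not about work | 17.11 | 102.36 | 0 | 16.43 | 130.75 | 0 | 1,561,045 | 0.02 | 0.52 |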

Discussion

This research explores how users of a public forum on internet marketing have ties to cybercrime, both at a micro and macro scale. In short, the three actors in the case study actively used the public forum as a source of information as well as to find products or services related to their business which, by 2017 and 2018, had ties to cybercrime activities. Also, according to the forum map, Actors 1 and 2 were positioned in more opportunistic groups with users who commented less, on average, while Actor 3 was positioned near the center of the map, in no specific group. Overall, there was nothing special about them; they were typical users of the forum. Additionally, none of them ever mentioned the cybercrime activities they were involved in. The findings of this case study are not unique. We extended the analysis to all forum participants and found a lower-bound estimate of 7.2% of crossover users. Also, many cybercrime forums on which crossover users comment (e.g., blackhat SEO or hacking) are centered on IT topics similar to those of the public forum, although more explicitly illicit. Such a finding suggests that crossover users may have used these cybercrime forums to develop their business. Further studies should look more deeply into this interplay: how the activities of crossover users in public forums (not branded as cybercrime) relate to activities in cybercrime forums.

The text below centers on three key discussion points that emerge from the findings of this study: 1) the public forum as an informal space auspicious for cybercrime activities, 2) cybercrime tasks potentially hiding in plain sight, and 3) the limited involvement of crossover users in cybercrime forums.

An Informal Market Auspicious for Cybercrime. Given that the general description of the public forum mentions “find mutually beneficial contacts and partners” and given that the three case study actors used the forum to exchange products and services related to internet marketing, the public forum hosts a market. Haller and Portes’ (2010) theory on informal markets provides an interesting framework to understand the economic activities taking place on this market. In informal markets, the product or service is not necessarily illegal; it is rather the means by which it is produced or distributed that is illegal [3, 4, 5]. Although internet marketing is a legitimate and legal economic activity, the forum’s description and how business is conducted on it (such as ordering products or services through discussion threads) point toward the public forum market being informal.

Freelancer platforms are known as unregulated spaces where informal work is thriving [32, 33, 34]. Also, there have been reports of cybercrime activities taking place on these platforms [35, 37, 36]. Hence, by interpreting the market hosted on the public forum as informal, this study corroborates that there exists an overlap between informal online and criminal online markets, just as in traditional settings [26, 25, 27]. However, the public forum studied is less formal or official than freelancer platforms: it is not structured around creating matches for labour demand and supply with vendor and worker profiles and official job listings. Potentially, the public forum’s unofficial structure, which creates more flexibility, may increase the size of the informal/cybercrime overlap within the forum population. Further research should look at whether (and how) structural differences in informal online markets change the prevalence of cybercrime activities on them.

Moreover, in traditional informal markets, trust issues among market participants are usually solved through strong social ties [4, 25]. In online settings, however, informal institutions (like the public forum) are known to partially control for trust issues by, for example, banning forum users or allowing buyers to leave feedback in threads [30, 31]. Given that the three actors used the public forum opportunistically, there might be notable differences between online and offline informal markets, in terms of social structures, that warrant further investigation.

Cybercrime Specialization Tasks Hiding in Plain Sight. Furthermore, the three case study actors never mentioned the cybercrime activities they were involved in and they were not part of the crossover user sample. The neutrality of the IT tasks they performed allowed them to conceal the maliciousness of their activities in the forum [1, 2]. On top of this, the results show that the posting behavior of crossover users was relatively indistinguishable from that of non-crossover users in the public forum. These results suggest that the neutrality of IT tasks [1, 2] allowed crossover users to behave the same as non-crossover users in informal settings, just like the actors in the case study. These results might also suggest that the crossover user sample is a minimum and that there exists a greater number of crossover users who were not identified through the cross-correlation method presented above. In such a case, the overlap would be larger than estimated (hence the lower bound mention). In both cases, the results point toward a need to investigate further these informal spaces that may represent a hotbed for IT tasks surrounding cybercrime operations.

This is especially important given that, although specialization is known to characterize the cybercrime industry [13, 12, 14], recent studies have found that “as-a-service” advertising in underground forums is limited [8, 17]. A significant proportion of public forum users may be part of the cybercrime specialization trend observed in previous studies [13, 12, 14]. However, the neutrality of the work they achieve (e.g., building websites, managing servers, translating texts) could leave a large proportion of them out of cybercrime forums. This could explain why “as-a-service” offerings are limited in cybercrime forums [17]: other settings —less targeted by researchers and law enforcement officers— offer many of these services. The neutrality of the tasks may moreover facilitate the recruitment process, allowing a form of “hiding in plain sight” [1, 2]. These findings reinforce the need to study cybercrime participation beyond forums that advertise cybercrime in their official branding.

Dipping a Toe: Limited Involvement of Crossover Users in Cybercrime Forums. However, 75% of crossover users posted fewer than 10 comments in any cybercrime forum, suggesting limited participation. They also favored cybercrime forums hosted on the clearnet over those hosted on the darknet. The darknet is mainly linked to the anonymous Tor network, which has a reputation for fostering criminal activities [68, 71]. This suggests that crossover users may limit their cybercrime participation, at least in forums that clearly embody criminal branding. This is in line with Sabet (2015) [26], who argued that informal workers from traditional markets preferred to avoid criminal ties when possible. To better understand the reality of informal workers in online settings, further research should assess what crossover users do on cybercrime forums and their degree of involvement in them.

Moreover, researching the opportunity landscape of these individuals on a broader scale could also further our understanding of cybercrime participation. Haller and Portes’ (2010) theory on informal markets opens a wide array of research regarding why and how individuals in informal online markets may decide to get involved in illicit activities. According to these authors, informal markets provide economic opportunities to individuals in need, and lower costs for products and services [3, 4]. Such aims align with how the three main actors positioned themselves in the public forum: mainly opportunistically and looking for business opportunities. Such a search for economic opportunities may have led them to participate in cybercrime activities, as, according to [3], informal market participants are ready to embark on criminal business opportunities when the prospects for profits are high and the likelihood of getting caught is low.

Finally, research on informal markets suggests that these markets exist for reasons of survival and dependent exploitation (decreased labor costs), but also for growth, including capital accumulation, solidarity, and flexibility. Informal markets are also known to foster innovation [3, 4]. Hence, there is a need to consider the positive effects of such informal markets, and the business landscape they offer, when seeking to understand how and when such workers end up contributing to cybercrime. Furthermore, crossover users’ limited participation suggests that they are not committed cybercrime actors. Hence, changing their opportunity landscape and raising awareness of the harms caused by cybercrime are two alternative approaches to preventing cybercrime participation in this specific population.

Limits and Future Studies

Several limits to the findings of this study need to be mentioned. A first limit lies in the cultural origin of the public forum, which mainly includes Russian-speaking individuals. Analyzing whether users from other informal online settings have connections to cybercrime could provide a more global understanding of the social phenomenon. Also, crossover users were identified by linking usernames across forums, based on various username and timeframe filters, as explained in Section 4.1. This method limited our estimate of the number of crossover users to a lower bound. Further research could build on this study and present alternative estimates. For example, one could decide to link slightly similar usernames, such as Sarik9 and Sarik10. This would assume that similar usernames belong to the same individual and would yield a higher lower-bound estimate. The cross-correlation also depended on which cybercrime forums were monitored by Flare Systems. To expand the visibility of potential cybercrime forums, partnerships with other organizations could be developed in future studies. Finally, future studies could improve the posting behavior indicators, using machine learning techniques to analyze comments beyond their categories or subcategories.
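The suggestion of linking slightly similar usernames can be illustrated with a minimal sketch. It is not the method used in this study, which relied on the username and timeframe filters described in Section 4.1. The normalization rule (lowercasing and dropping trailing digits) and the function names `normalize` and `link_similar` are hypothetical choices made for illustration only.

```python
import re
from collections import defaultdict


def normalize(username: str) -> str:
    """Hypothetical rule: lowercase the name and drop trailing digits."""
    return re.sub(r"\d+$", "", username.strip().lower())


def link_similar(public_users, cybercrime_users):
    """Map each public forum username to the cybercrime forum usernames
    that share its normalized stem. Names that normalize to an empty
    string (digits only) are skipped to avoid spurious links."""
    by_stem = defaultdict(set)
    for name in cybercrime_users:
        stem = normalize(name)
        if stem:
            by_stem[stem].add(name)

    links = {}
    for name in public_users:
        stem = normalize(name)
        matches = by_stem.get(stem, set())
        if stem and matches:
            links[name] = sorted(matches)
    return links


if __name__ == "__main__":
    # Sarik9 and Sarik10 are linked because both reduce to the stem "sarik".
    print(link_similar(["Sarik9", "webdev"], ["Sarik10", "other"]))
```

Such a rule trades precision for recall: it would link more accounts than exact matching, at the cost of assuming that similar usernames belong to the same individual.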

Conclusion

This is the first study to formally quantify how users of an internet marketing public forum, a space for informal exchanges, have ties to cybercrime activities. We conclude that crossover users are a substantial part of the population in the public forum and, even though they have been overlooked, their aggregate effect in the ecosystem must be considered. This study opens new research questions on cybercrime participation that should consider online spaces beyond their cybercrime branding. We hope that these findings can be used as a stepping stone for future studies uncovering the territories of informal online markets and their potential ties with cybercrime, especially given the neutrality of IT [1, 2] and the low prevalence of cybercrime specialization advertisements in cybercrime forums [8, 17].


References

  1. Leukfeldt, ER, Kruisbergen, EW, Kleemans, ER et al. Organized financial cybercrime: Criminal cooperation, logistic bottlenecks, and money flows. In: Holt, T. and Bossler, A. (ed). Palgrave Handbook of International Cybercrime and Cyberdeviance, Switzerland: Palgrave Macmillan, 2020, 961-980. 10.1007/978-3-319-90307-165-1.

  2. Bijlenga, N, and Kleemans, ER. Criminals seeking ict-expertise: an exploratory study of dutch cases. Eur. J. Crim. Policy Res 2018, 24(3):253–268. 10.1007/s10610-017-9356-z.

  3. Ojo, S, Nwankwo, S. and Ayantunji, G. Ethnic entrepreneurship: the myths of informal and illegal enterprises in the UK. Entrepreneurship Reg. Dev 2013, 25(7-8):587–611. 10.1080/08985626.2013.814717.

  4. Haller, W, and Portes, A. The informal economy. In: Smelser, N.J. and Swedberg, S. (ed). Handbook of Economic Sociology, New York: Russell Sage Foundation, 2010, 403–425.

  5. Castells, M, and Portes, A. World underneath: the origins, dynamics and effects of the informal economy. In: Portes, A., Castells, M. and Benton, L.A. (ed). The Informal Economy: Studies in Advanced and Less Developed Countries. Baltimore: Johns Hopkins University Press, 1989, 11–37.

  6. McElwee, G, Smith, R, and Somerville, P. Theorising illegal rural enterprise: is everyone at it? Int. J. Rural Crime 2011. 1:40-62.

  7. Anderson, R, Barton, C, Bolme, R, et al. Measuring the changing cost of cybercrime. In: Workshop on the Economics of Information Security, 1-32, Boston, MA, USA.

  8. Van Wegberg, R, Tajalizadehkhoob, S, Soska, K et al. Plug and prey? measuring the commoditization of cybercrime via online anonymous markets. In: 27th USENIX Security Symposium 2018, 1009–1026, Baltimore, MD, USA.

  9. Afroz, S, Garg, V, McCoy, D. et al. Honor among thieves: A common’s analysis of cybercrime economies. In 2013 APWG eCrime Researchers Summit, 2013, 1–11, San Francisco, CA, USA.

  10. Collier, B, Clayton, R, Hutchings, A et al. Cybercrime is (often) boring: maintaining the infrastructure of cybercrime economies. In: Workshop on the Economics of Information Security 2020, 1-25, Brussels, Belgium. 10.17863/CAM.53769

  11. Manky, D. Cybercrime as a service: a very modern business. Comput. Fraud. Secur. 2013, 6:9–13. 10.1016/S1361-3723(13)70053-8

  12. Huang, K, Siegel, M, and Madnick, S. Systematically understanding the cyber attack business: A survey. ACM Comput. Surv. 2018, 51(4):1–36. 10.1145/3199674

  13. Lusthaus, J. Industry of anonymity: Inside the business of cybercrime. Cambridge: Harvard University Press, 2018.

  14. Thomas, K, Huang, DY, Wang, D. et al. Framing dependencies introduced by underground commoditization. In: Workshop on the Economics of Information Security, 2015, 1-24. Delft, The Netherlands.

  15. Hutchings, A, and Holt, TJ. A crime script analysis of the online stolen data market. Br. J. Criminol. 2015, 55(3):596–614. 10.1093/bjc/azu106.

  16. Moore, T, Clayton, R, and Anderson, R. The economics of online crime. J Econ Perspect. 2009, 23(3):3–20.

  17. Akyazi, U, van Eeten, M, and Gañán, C.H. Measuring cybercrime as a service (CaaS) offerings in a cybercrime forum. In: Workshop on the Economics of Information Security 2021, 1-15. Virtual.

  18. Broséus J, Rhumorbarbe, D, Morelato, M. et al. A geographical analysis of trafficking on a popular darknet market. Forensic Sci. Int. 2017, 277:88–102. 10.1016/j.forsciint.2017.05.021

  19. Martin, J. Drugs on the dark net: How cryptomarkets are transforming the global trade in illicit drugs. New York: Springer, 2014.

  20. Leukfeldt, ER, Kleemans, ER, and Stol, WP. Cybercriminal networks, social ties and online forums: Social ties versus digital ties within phishing and malware networks. Br. J. Criminol. 2017, 57(3):704–722. 10.1093/bjc/azw009

  21. Leukfeldt, ER, Kleemans, ER, and Stol, WP. Origin, growth and criminal capabilities of cybercriminal networks. an international empirical analysis. Crime, Law Soc. Chang. 2017, 67(1):39–53. 10.1007/s10611-016-9663-1

  22. Leukfeldt, ER, Kleemans, ER, and Stol, WP. A typology of cybercriminal networks: from low-tech all-rounders to high-tech specialists. Crime, Law Soc. Chang. 2017, 67(1):21–37. 10.1007/s10611-016-9662-2.

  23. Leukfeldt, ER, Kleemans, ER, and Stol, WP. The use of online crime markets by cybercriminal networks: A view from within. Am Behav Sci 2017, 61(11):1387–1402. 10.1177/0002764217734267.

  24. Ponsaers, P, Shapland, J. and Williams, C.C. Does the informal economy link to organised crime? Int. J. Soc. Econ. 2008. 35(9): 644-650. 10.1108/03068290810896262.

  25. Shapland, J. The informal economy: Threat and opportunity in the city. 2004, 1-29. https://pure.mpg.de/rest/items/item_3014458/component/file_3014459/content (4 April 2022, last accessed)

  26. Sabet, DM. Informality, illegality, and criminality in mexico’s border communities. J. Borderl. Stud. 2015, 30(4):505–517. 10.1080/08865655.2015.1101704

  27. Walle, GV. A matrix approach to informal markets: towards a dynamic conceptualisation. Int. J. Soc. Econ. 2008, 35(9): 651-665. 10.1108/03068290810896271

  28. Cambini, C, Meccheri, N, Silvestri, V. et al. Competition, efficiency and market structure in online digital markets. An overview and policy implications. European rev. ind. econ. policy, 2011, 2:1–27. https://hal.archives-ouvertes.fr/hal-03468956/ (12 April 2022, last accessed).

  29. Rangaswamy, N. A note on informal economy and ICT. Electron. J. Inf. Syst. Dev. Ctries. 2019, 85(3):1-5. 10.1002/isd2.12083

  30. Dobson, S, Sukumar, A, and Tipi, L. Dark matters: the institutional entrepreneurship of illicit and illegal cyberspace. In Mcelwee, G, and Smith, R. (ed) Exploring Criminal and Illegal Enterprise: New Perspectives on Research, Policy & Practice (Vol. 5), Emerald Group Publishing Limited. 2015, 179-201.

  31. Kshetri, N. The global cybercrime industry: economic, institutional and strategic perspectives. New York: Springer, 2010.

  32. Schmidt, FA. Digital labour markets in the platform economy. Mapping the Political Challenges of Crowd Work and Gig Work. 2017, Friedrich-Ebert-Stiftung. https://library.fes.de/pdf-files/wiso/13164.pdf (April 2, last accessed).

  33. Drahokoupil, J, and Piasna, A. Work in the platform economy: Beyond lower transaction costs. Inter Econ. 2017, 52(6):335–340.

  34. Drahokoupil, J, and Fabo, B. The platform economy and the disruption of the employment relationship. ETUI Research Paper-Policy Brief, 2016. 5: 1-6. https://www.etui.org/sites/default/files/Platform%20economy%20Drahokoupil%20Fabo%20Policy%20Brief%20PB%202016.05.pdf (April 7 2022, last accessed)

  35. Farooqi, S, Jourjon, G, Ikram, M et al. Characterizing key stakeholders in an online black-hat marketplace. In: APWG Symposium on Electronic Crime Research (eCrime), 2017, 17–27. Scottsdale, AZ, USA. 10.1109/ECRIME.2017.7945050

  36. Garg, V, Camp, LJ, Kanich, C. Analysis of Ecrime in Crowd-Sourced Labor Markets: Mechanical Turk vs. Freelancer. In: Böhme, R. (eds) The Economics of Information Security and Privacy. Berlin: Springer, 2013. 10.1007/978-3-642-39498-0_13

  37. Motoyama, M, McCoy, D, Levchenko, K et al. Dirty jobs: The role of freelance labor in web service abuse. In Proceedings of the 20th USENIX conference on Security, 2011, 14–14. Berkeley, CA, USA.

  38. Paquet-Clouston, M. The Role of Informal Workers in Online Economic Crime. PhD thesis. Simon Fraser University Arts & Social Sciences School of Criminology, 2021.

  39. Shirokova A, Garcia, S, Erquiaga, MJ et al. Geost botnet. operational security failures of a new android banking threat. In: IEEE European Symposium on Security and Privacy Workshops, 2019, 406–409, Stockholm, Sweden. 10.1109/EuroSPW.2019.00051

  40. Garcia, S, Erquiaga MJ, and Shirokova, A. Geost Botnet. the Story of the Discovery of a New Android Banking Trojan From an Opsec Error. In: Virus Bulletin, 2019, 1–21. London, UK.

  41. Virus total. https://www.virustotal.com/ (April 13, last accessed)

  42. Flare systems. https://flare.systems/. (2 April 2022, last accessed)

  43. Haklay, M. Why is participation inequality important? In: Capineri, C, Haklay, M, Huang, H, et al. (eds.) European Handbook of Crowdsourced Geographic Information, 2016, 35–44, London: Ubiquity Press.

  44. Paquet-Clouston M, Décary-Hétu, D, and Morselli, C. Assessing market competition and vendors’ size and scope on alphabay. Int. J. Drug Policy 2018, 54:87–98. 10.1016/j.drugpo.2018.01.003.

  45. Sun N, Rau, PPL, and Ma, L. Understanding lurkers in online communities: A literature review. Comput. Hum. Behav. 2014, 38:110–117. 10.1016/j.chb.2014.05.022.

  46. Mooney, P, and Corcoran, P. Who are the contributors to openstreetmap and what do they do? In Proceedings of the GIS Research UK 20th Annual Conference, 2012, 355–360, West Yorkshire, UK, https://www.geos.ed.ac.uk/~gisteac/proceedingsonline/GISRUK2012/Papers/presentation-87.pdf (23 April 2022, last accessed)

  47. Lund, K, Coulton, P. and Wilson, A. Participation inequality in mobile location games. In Proceedings of the 8th International Conference on Advances in Computer Entertainment Technology, 2011, 1–8. Lisbon, Portugal. 10.1145/2071423.2071457

  48. Van Mierlo, T. The 1% rule in four digital health social networks: an observational study. J. Med. Internet Res. 2014, 16(2). 10.2196/jmir.2966

  49. McInnes, L, Healy, J, and Melville, J. UMAP: Uniform manifold approximation and projection for dimension reduction. 2018. https://arxiv.org/abs/1802.03426 (March 20 2022, last accessed)

  50. Cao, J, Spielmann, M, Qiu, X. et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature, 2019, 566:496–502. 10.1038/s41586-019-0969-x

  51. Packer, SJ, Zhu, Q, Huynh, C. et al. A lineage-resolved molecular atlas of c. elegans embryogenesis at single-cell resolution. Science, 2019, 365 (6459): 1-8. 10.1126/science.aax1971

  52. Diaz-Papkovich, A, Anderson-Trocmé, L, Ben-Eghan, C. et al. UMAP reveals cryptic population structure and phenotype heterogeneity in large genomic cohorts. PLoS Genet. 2019, 15(11): e1008432. 10.1371/journal.pgen.1008432

  53. Google translate API for Python, https://pypi.org/project/googletrans/ (2 September 2021, last accessed).

  54. Thomas, RD, Pastrana, S, Hutchings, A. et al. Ethical issues in research using datasets of illicit origin. In Proceedings of the 2017 Internet Measurement Conference, 2017, 445–462. London, UK. 10.1145/3131365.3131389.

  55. Perito, D, Castelluccia, C, Kaafar, MA, Manils, P. How Unique and Traceable Are Usernames?. In: Fischer-Hübner, S., Hopper, N. (eds) Privacy Enhancing Technologies (Vol. 6794). Lecture Notes in Computer Science, Berlin: Springer, 2011. 10.1007/978-3-642-22263-4_1

  56. Zafarani, R, and Liu, H. Connecting corresponding identities across communities. In Proceedings of the International AAAI Conference on Web and Social Media, 2009, 354–357, San Jose, CA, USA.

  57. Iofciu, T, Fankhauser, P, Abel, F et al. Identifying users across social tagging systems. In Proceedings of the International AAAI Conference on Web and Social Media, 2011, 522–525, Barcelona, Spain. https://ojs.aaai.org/index.php/ICWSM/article/view/14153 (14 April 2022, last accessed)

  58. Tan, S, Guan, Z, Cai, D et al. Mapping users across networks by manifold alignment on hypergraph. In Proceedings of the AAAI Conference on Artificial Intelligence, 2014, 1-7, Quebec, QC, Canada. 10.1609/aaai.v28i1.8720

  59. Sinnott, R, and Wang, Z. Linking user accounts across social media platforms. In 8th International Conference on Big Data Computing, Applications and Technologies, 2021, 18–27, Leicester, UK. 10.1145/3492324.3494157

  60. Wang, Y, Liu, T, Tan, Q et al. Identifying users across different sites using usernames. Procedia Comput. Sci. 2016, 80:376–385. 10.1016/j.procs.2016.05.336

  61. Wang, M, Tan, Q, Wang, X et al. De-anonymizing social networks user via profile similarity. In Proceedings of the IEEE Third International Conference on Data Science in Cyberspace, 2018, 889–895, Guangzhou, China. 10.1109/DSC.2018.00142

  62. Goga, O, Loiseau, P, Sommer, R et al. On the reliability of profile matching across large online social networks. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015, 1799–1808, Sydney, Australia. 10.1145/2783258.2788601

  63. Vosoughi, S, Zhou, H, Roy, D. Digital Stylometry: Linking Profiles Across Social Networks. In: Liu, TY., Scollon, C., Zhu, W. (eds) Social Informatics (Vol. 9471). Lecture Notes in Computer Science. Switzerland: Springer, 2015. 10.1007/978-3-319-27433-1_12

  64. Ho, TN and Keong, KNg. Application of stylometry to darkweb forum user identification. In Proceedings of the International Conference on Information and Communications Security, 2016, 173–183. Singapore, Singapore. 10.1007/978-3-319-50011-9_14

  65. The Onion Router, https://www.torproject.org/about/history/ (21 April 2022, last accessed).

  66. Fidalgo, E, Alegre, E, Fernández-Robles, L. et al. Classifying suspicious content in tor darknet through semantic attention keypoint filtering. Digit Investig, 2019, 30:12–22. 10.1016/j.diin.2019.05.004.

  67. Broadhurst, R, Ball, M, and Jiang, C. Availability of COVID-19 related products on Tor darknet markets. Statistical Bulletin no. 24. Institute of Criminology. 10.52922/sb04534

  68. Owen, G, and Savage, N. The tor darknet. Global commission on internet governance, Paper series no. 20, 2015. https://www.cigionline.org/sites/default/files/no20_0.pdf (1 November 2021, last accessed).

  69. Hu, Y, Zou, F, Li, L et al. Traffic classification of user behaviors in tor, i2p, zeronet, freenet. In the Proceedings of the IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications, 2020, 418–424, Guangzhou, China. 10.1109/TrustCom50675.2020.00064.

  70. McGraw, KO, and Wong, SP. A common language effect size statistic. Psychol. Bull. 1992, 111(2):361-365. 10.1037/0033-2909.111.2.361

  71. Faizan, M, and Khan, RA. Exploring and analyzing the dark web: A new alchemy. First Monday, 2019, 24(5). 10.5210/fm.v24i5.9473

APPENDIX TABLES

Appendix Table 1. Descriptive Statistics of the Public Forum Population Dataset

| N=23,348 | Min | Max | Mean (std) | Med |
|---|---|---|---|---|
| N. posts | 1 | 6,603 | 29 (151) | 4 |
| N. days active | 1 | 708 | 14 (41) | 3 |
| Diversification | | | | |
| N. cat | 1 | 9 | 2 (2) | 1 |
| N. sub-cat | 1 | 71 | 3 (5) | 1 |
| Topics Discussed | | | | |
| Search Engines | 0 | 1,109 | 3 (21) | 0 |
| Monetizing Sites | 0 | 3,010 | 6 (42) | 0 |
| Practical Opt. | 0 | 2,965 | 4 (31) | 0 |
| Comm. of Prof. | 0 | 2,363 | 4 (41) | 0 |
| Site Building | 0 | 2,873 | 5 (45) | 0 |
| Exch. and Sales | 0 | 689 | 2 (10) | 0 |
| Purchased Traffic | 0 | 880 | 1 (10) | 0 |
| Work Webmasters | 0 | 296 | 1 (6) | 0 |
| Not About Work | 0 | 3,532 | 5 (71) | 0 |


Appendix Table 2. List of the 38 Cybercrime Forums Found

| Forum Name | Hosted | Category | | |
|---|---|---|---|---|
| Nulled to | Clearnet | Cracking and Leaks | 415 | 3205 |
| Dark Money | Darknet | Money Laundering | 287 | 4872 |
| Best Hack Forum | Clearnet | Hacking | 232 | 6311 |
| Exploit in | Clearnet | Hacking | 147 | 5845 |
| Black Hat World | Clearnet | Black hat SEO | 141 | 2319 |
| Club2crd | Clearnet | Carding | 113 | 2407 |
| RaidForums | Clearnet | Cracking and Leaks | 93 | 488 |
| Cracked | Clearnet | Cracking and Leaks | 86 | 1212 |
| Cracking pro | Clearnet | Cracking and Leaks | 84 | 482 |
| Dread | Darknet | Discussions, Sales, Questions | 83 | 1207 |
| Hidden answer | Darknet | Discussions, Sales, Questions | 45 | 187 |
| Xss is | Darknet | Hacking | 43 | 538 |
| Prtship | Clearnet | Carding | 42 | 383 |
| Cracking King | Clearnet | Cracking and Leaks | 39 | 73 |
| Torum forum | Darknet | Discussions, Sales, Questions | 36 | 202 |
| Trollodrome2 | Darknet | Discussions, Sales, Questions | 34 | 90 |
| French deep web forum | Darknet | Discussions, Sales, Questions | 30 | 138 |
| Rutor | Darknet | Cryptomarket | 29 | 93 |
| Sinister | Clearnet | Cracking and Leaks | 28 | 738 |
| DNMAvengers | Darknet | Cryptomarket | 26 | 280 |
| Dream forum | Darknet | Discussions, Sales, Questions | 24 | 94 |
| The Hub | Darknet | Discussions, Sales, Questions | 15 | 78 |
| SatForum | Darknet | Carding | 12 | 1480 |
| International Carding Alliance | Clearnet | Carding | 9 | 15 |
| Deutschland | Clearnet | Discussions, Sales, Questions | 7 | 66 |
| Verified Carders | Darknet | Carding | 7 | 32 |
| Envoy Forum | Darknet | Discussions, Sales, Questions | 7 | 13 |
| Onion Land | Darknet | Discussions, Sales, Questions | 7 | 44 |
| Wall Street forum | Darknet | Cryptomarket | 5 | 27 |
| Dark Anti French System (DFAS) | Darknet | Discussions, Sales, Questions | 5 | 104 |
| Criminality French Market | Darknet | Discussions, Sales, Questions | 3 | 12 |
| Xaker26 | Clearnet | Hacking | 2 | 46 |
| CardVilla | Clearnet | Carding | 2 | 2 |
| Sinfulsite | Clearnet | Cracking and Leaks | 1 | 2 |
| Hermes | Darknet | Discussions, Sales, Questions | 1 | 1 |
| Main Helium | Darknet | Hacking | 1 | 5 |
| CryptBB | Darknet | Hacking | 1 | 5 |
| GreySec | Clearnet | Hacking | 1 | 1 |

Appendix Table 3. Mann-Whitney U Tests for the Public Forum Population Dataset
