
Using Computational Systematic Social Observation to Identify Environmental Correlates of Fear of Crime

Published on Sep 02, 2024

Abstract

Understanding whether people perceive an environment as safe or unsafe is of great importance. Perceptions are influenced by the built environment, and through better understanding, design interventions can be made to improve feelings of safety. There is a rich body of research on this topic, yet it requires considerable manual effort. In this work, we present an approach named Computational Systematic Social Observation (CSSO) to automate the collection and analysis process. The approach uses Google Street View and the Google Vision API to extract characteristics (herein referred to as features) of the built environment, which are then used to automate the process of understanding whether people will feel fear or safety. In testing this approach, we extracted ~1.3M images for the 100 locations and identified 297 features of the built environment. A measure of dependency demonstrated that some features are more strongly associated with areas where people express a feeling of safety or fear. Further, through empirical testing, we observed that these features can be used for classification. The results, which were compared against human coders, demonstrate the potential of the technique. The presented methodology and experimental research provide a foundation for systematic computational observation to identify environmental correlates of fear of crime.

Introduction

Whether people perceive a specific place as safe or not has implications at micro, meso and macro levels. Perceived insecurity affects individual mental and physical health [1][2] and leads to decreased social interaction [3][4][5]. At the community level, perceived insecurity can affect the real estate market [6] and lead to social stigmatisation of areas [5]. People are less willing to walk or cycle through places thought unsafe, reducing exercise, which leads to negative public health outcomes [3], and producing higher levels of emissions through choosing non-sustainable travel modes [7][8][9]. Due to its impact at individual, community, and systemic levels, perceived insecurity is important to address. Many studies explore the effect of neighbourhood design on perceived unsafety (e.g., [10][11]). By understanding what features of the environment affect the perception of an area as unsafe, we can inform planning and design initiatives that aim to reduce these negative outcomes.

To understand what elements of the environment are associated with perceived unsafety (or any other social phenomena), researchers must systematically take stock of these features. One approach is to carry out systematic social observation (SSO) of the areas of interest. [12] advocated for a systematic approach to social observation, namely the direct observation of social settings for research purposes. In this case, a systematic approach meant that observation and recording follow explicit rules and allow replication [13]. SSO, therefore, consists of observations conducted by teams of trained assessors walking (or driving) through the study area, recording or rating characteristics according to pre-established criteria, to catalogue environmental features relevant to the objective of the study [13]. SSO has been widely adopted in the social sciences, including criminology, where researchers have used it to study environmental factors associated with topics such as robbery [14], burglary [15], and police de-escalation tactics [16].

A frequent criticism of SSO concerns bias in these observations: bias can be introduced by inconsistencies between different observers and can lead to unreliable conclusions [17]. Another issue with SSO is scalability: due to the generally expensive process, the teams of researchers carrying out SSOs usually cover only a subset of an area of interest. Although this subset can be large, for example, [18] assessed 20% of all occupied face blocks in 66 Baltimore neighbourhoods while [13] collected measures of disorder across block faces within 200 census tracts, these remain samples rather than a complete inventory of the entire area of interest. How this sample is chosen can introduce further bias into the SSO process.

One approach to cutting the costs associated with SSO and to scaling it is to apply SSO to images of the areas of interest rather than to carry out these observations through site visits. For example, many researchers have used Google Street View (GSV), a web-based collection of street-level imagery, to virtually “visit” areas and code the features of interest based on these images [19][20][21][22]. Using image-based SSO rather than site visits has been one way to address scalability. However, this approach still requires the time of human raters, retaining the issues of bias and some resource intensity: while site visits are eliminated, human coders must still spend time and effort coding the images, so samples must still be selected, and scaling up still costs human and, therefore, financial resources.

One possible solution is to attempt to automate the entire SSO process by collecting a representative set of images of the areas of interest and then using a computer vision algorithm to extract the present features. This results in a comprehensive and replicable catalogue of features in a particular area, which can be scaled without (much) cost. Some attempts to implement this have been recently published, for example, in relation to extracting characteristics of the built environment associated with crime [23][24], or in looking at specific elements of an environment in relation to fear of crime, such as the effect of green space [25], or the possibility of guardianship through the presence of windows [26]. All of these papers demonstrate specific examples of applying such a method to a specific question. In this paper, we take a step back and detail and evaluate a systematic process to automate SSO, which we term Computational Systematic Social Observation (CSSO). CSSO automates both the data collection process, building a database of images for each area of interest, and the feature extraction process, building a database of features present in these images. By doing so, we open up the steps of this process to a broad range of applications in criminology, and wider social sciences, urban studies, and policy research.

To demonstrate this approach, we apply CSSO to consider the question: what features are associated with areas labelled safe and unsafe? Using a data set of 49 safe and 51 unsafe areas drawn on a map by residents of Nagykanizsa, Hungary, we applied computational systematic social observation to 1) collect images of the areas and 2) extract the features of these images to a database. In doing so, we first establish a technique for generating a sufficient set of coordinates within each polygon from which to extract GSV images, querying the Google API at each coordinate to acquire four street-level images at north, east, south, and west orientations. Once a set of GSV images has been extracted to represent the area within the polygons of interest, we apply Google's pre-trained Vision AI computer vision algorithm to detect objects within the GSV images. We then compare the features between the safe and unsafe areas, facilitating an exploratory, data-driven approach to understanding which elements of the built environment affect perceived safety in a fully replicable and scalable manner, at a scale which provides comprehensive coverage of the areas of interest.
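As a concrete illustration of the image-acquisition step, the sketch below builds the four Street View request URLs (north, east, south, and west headings) for a single sampled coordinate. This is a minimal sketch, not the paper's actual pipeline: the endpoint and parameter names follow Google's Street View Static API, the example coordinate is an arbitrary point in Nagykanizsa, and `API_KEY` is a placeholder for a real key.

```python
from urllib.parse import urlencode

# Street View Static API endpoint (returns one image per request).
GSV_ENDPOINT = "https://maps.googleapis.com/maps/api/streetview"

def gsv_urls(lat, lon, api_key="API_KEY", size="640x640"):
    """Return one request URL per compass heading (N, E, S, W)."""
    urls = {}
    for name, heading in [("north", 0), ("east", 90),
                          ("south", 180), ("west", 270)]:
        params = urlencode({
            "location": f"{lat},{lon}",  # sampled coordinate inside a polygon
            "heading": heading,          # camera orientation in degrees
            "size": size,
            "key": api_key,
        })
        urls[name] = f"{GSV_ENDPOINT}?{params}"
    return urls

# Illustrative coordinate only (a point near Nagykanizsa, Hungary).
urls = gsv_urls(46.4590, 16.9897)
```

In the full pipeline, each of these URLs would be fetched and the resulting image stored against its polygon before being passed to the feature-extraction step.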

Background

The role of the built environment in preventing or facilitating crime opportunities, as well as generating feelings of safety or fear, is increasingly better understood. Crime prevention through environmental design (CPTED) highlights the role of urban design in enhancing community safety [27]. “The proper design and effective use of the built environment can lead to a reduction in the fear and incidence of crime, and to an improvement in the quality of life” [28]. It therefore follows that to reduce instances of fear of crime, we must explore which environmental features are associated with perceived (un)safety, to inform these urban regeneration initiatives.

Fear of crime and the environment

Here, we approach fear of crime as an experience which poses a barrier to people living healthy lives, enjoying sustainable travel, and benefiting from these equally. This experiential fear [30] must be understood as a context-specific experience, encountered as people go about their daily routine activities [29]; it is situation-specific and can be reduced with environmental design measures. We can consider it an emotional response, one in which risk perception (informed by cues in the situation) plays a key role [31]. Inferring relevant information from the immediate environment is one of the hallmarks of effective human adaptation [32][33], and we can understand this as influential on context-specific experiences of fear.

Of course, the characteristics of the person “doing” perception also play a role; some variance in perceived safety of certain areas is due to individual characteristics [31]. However, there is an even larger portion of variance which can be explained by the characteristics of the environment itself [33]. In this paper, we focus on these environmental factors and, to do so, we must “place” these experiences of fear of crime, i.e. locate them in a specific area. As such, this experiential fear is the focus of our article, aligning ourselves with a definition of crime fear as a context-specific experience. In this way, various environmental cues lead people to perceive some spaces as more prone to crime than others [34].

Generally speaking, the environmental factors associated with fear of crime in previous research can be grouped into three categories: visibility, physical signs of disorder (environmental incivilities), and social characteristics (e.g., groups of people hanging around, drinking, and rough sleeping). The latter two comprise signal disorders/signal crimes, sometimes called environmental antisocial behaviour or incivilities [35][36], and social characteristics of areas, which are less about environmental design and more about indicators of activities such as street drinking, panhandling, or groups of people loitering [37]. These physical and social signs of disorder, such as graffiti, litter, vandalism, and other signs of neglect of the environment, are seen as drivers of fear [5]. These elements are seen as “signals” [36] which communicate to people a lack of commitment to social norms [36][38]. While important, these are not necessarily within the urban planner's power to address in the way that visibility is. Therefore, we will focus on visibility, which is summarised by the concepts of prospect, concealment (also called refuge), and entrapment [39][33][40][5].

Prospect refers to the ability of individuals to see the openness of their immediate environment [39], capturing the extent to which an environment offers a good overview to an observer. From an urban design perspective, prospect can be conceptualised as a clear line of sight, including being able to see easily from one side of the street to the other [28]. Better prospect in an area is associated with judgements of environmental safety [33]. Darkness (linked to absent or poor lighting) reduces prospect, increasing feelings of fear [40][41][5].

In addition to low prospect, concealment suggests the existence of places in which a potential offender could be hiding [42][43][38]. These may include places that are not visible because they are isolated, or sight lines obstructed by vegetation, landscaping, or poorly designed buildings, which are perceived to increase the risk of attack and therefore fear [5]. Better lighting reduces the places available for would-be offenders to conceal themselves [5]; in this sense, better-lit areas allow for the natural surveillance of an environment. Another feature associated with concealment is dense vegetation. Previous studies have found that dense vegetation is related to increased perceived insecurity [39], as it can signal hiding places for potential offenders and poor maintenance of spaces, representing social disorder and threat. In contrast, well-tended vegetation, such as mown grass, high-canopy trees, flowers, and low bushes, increases feelings of safety because it communicates social order [38].

Finally, entrapment refers to “the extent to which people judge an environment to possess characteristics that impede escape from dangerous situations” [33]. Returning to lighting, dimly lit places reduce an individual’s field of vision, signalling that the individual may face difficulty escaping when faced with a potential offender. This idea of entrapment emerges as areas with blocked escape are highly associated with fear of crime [38]. Such obstructions to visibility also create the feeling of being ‘trapped’; by contrast, a sense of ‘openness’ in the environment is reassuring [5]. To address entrapment, urban designers can consider the connectivity of a place, in terms of the present infrastructure (e.g., footpaths, laneways, or streets) [28].

These characteristics of the built environment, prospect (perceived overview of a scene), concealment (perceived environmental affordance of hiding places), and entrapment (perceived escape possibilities), are associated with fear of crime in specific places [33]. How, then, can these be measured? What specific features of the built environment might be present or absent to create these perceptions? To answer these questions, it is important to obtain a catalogue of the environmental features present in safe and unsafe places. Traditionally, Systematic Social Observation (SSO) has been the preferred method for conducting environmental audits to gather data suitable for quantitative analysis.

Systematic social observation

An approach to understanding what it is about the built environment that makes people feel safe or unsafe is to carry out environmental audits of areas considered safe or unsafe. This is by no means the only approach; there are also experimental studies which involve the pre-hoc selection of environmental stimuli designed to elicit responses of (un)safety. In such studies, the participant may be exposed to the stimuli by walking a pre-determined route which contains features of interest [39][43][44]. Another group of experimental studies exposes participants to relevant stimuli using images or a pre-recorded video containing features traditionally associated with fear of crime, and then analyses the relationship between the levels of fear that people indicate and the score of the image on various fear of crime indicators (e.g., [38]).

However, observational studies allow us to explore context-specific experiences in the places where people’s routine activities take them daily. Observational studies of fear of crime and place involve collecting data on perceived safety in a particular area and then correlating the measures of perceived safety with features of the environments observed retrospectively in these areas. Such studies use direct observation to catalogue the environmental features present in these areas. Direct observation is fundamental to the advancement of science [45]. [12] advocated systematic social observation (SSO) as a key measurement strategy for a wide variety of social science phenomena [45]. SSO refers to systematically cataloguing all features of interest following explicit rules that allow replication [12]. It is also important that the means of observation, whether a person or a technology, be independent of that observed [45], and there should usually be multiple observers to allow interrater reliability exercises that validate the SSO data. SSO has been widely adopted in criminological studies, including the evaluation of therapeutic communities for drug offenders [46].

While SSO has been widely adopted and beneficial in the social sciences, it is not without limitations. Two key issues associated with SSO are observer bias (seen in non-agreement between observers [17] and non-random selection of environments to survey [33]) and scalability [47]. Bias arises from the human observers in SSO projects for various reasons, including human subjectivity (resulting in non-agreement between observers), observer error, and even “cheating” by observers to reduce their workload or record in a desirable location [48]. [17] explore the issue of bias in SSO coders in detail and recommend extensive training of observers as a remedy.

Another limitation lies in the decisions that researchers must make due to resource constraints, specifically about units of analysis and the sampling frame of areas of interest [48]. Since human observers cannot cover everything within realistic time and cost constraints, they must select a sample that is not truly random, as it is based on on-site decisions [33]. Because human resources are limited, we may therefore see bias in the areas observed with SSO.

A related issue is that of scalability. SSO research studies usually require some form of sampling of the desired areas of interest to make it feasible for observers to cover these areas. For example, [45] combine Chicago’s 865 census tracts into 343 neighbourhood clusters, and then sample only 80 of these for their SSO study. Still, carrying out SSO on these neighbourhoods was a large undertaking: “Between June and September 1995, observers trained by the National Opinion Research Center (NORC) drove a sport utility vehicle at a rate of five miles per hour down every street within the 80 sample NCs. The composition of the vehicle included a driver, a videographer, and two observers.” [45] p.13. In this process, “NORC collected data on 14 variables in the 23,816 observer logs with an emphasis on land use, traffic, the physical condition of buildings, and evidence of physical disorder.” [45] p.13. Even after collection, this data was not fully utilised: “By contrast, because of the expense of first viewing and then coding the videotapes, a random subsample of all face-blocks was selected for coding. Specifically, in those NCs consisting of 150 or fewer face-blocks, all face-blocks were coded.” [45] p.13. “From the videotapes, 126 variables were coded, including detailed information on physical conditions, housing characteristics, businesses, and social interactions occurring on each face-block.” [45] p.14. Clearly, implementing in-person field audits can be expensive if observations are needed over large or geographically dispersed areas or at multiple points in time. A reliable and more efficient method for observational audits could facilitate extendibility (i.e., expanded geographic and temporal scope) and lead to a more standardised assessment that strengthens the ability to compare results across different regions and studies [47][49].

Computational social science approach

The above section detailed the shortcomings of field audits such as SSO. These direct observations require a visit to each area by trained observers, which can be an expensive and time-consuming method, especially for a large-scale or geographically distributed study [49]. However, the advent of new, freely available remote sensing technologies provided by Google and, more recently, Microsoft offers new possibilities for geospatial data collection, including much greater coverage of many parts of the world [49][50].

Technological advances in research methodologies enable academic researchers and policymakers to scale up investigations into the key issues facing individuals, communities, and societies. In the space of environmental audits, web-based geospatial services are increasingly being used by researchers to perform ‘virtual’ audits of environmental characteristics [49]. These platforms, including Google Street View (GSV), Google Earth, Bing Maps, and others, have been used across domains to identify relevant features of built environments related to a range of outcomes [51]. For example, [20] manually collected a set of Google Street View images of street addresses where burglaries had taken place and conducted SSO through GSV to identify commonalities between burgled areas. SSO has also been applied to video recordings, both CCTV and citizen recordings, for example of police interactions [52].

However, while the collection of images representative of environments that would otherwise have to be visited on foot is sped up by technologies such as GSV, the requirement for individual raters to go through each image and catalogue the features they find remains. Researcher time is still required, so neither the scalability of this approach nor the issue of bias is resolved. To address these, we can consider using a computer vision algorithm to catalogue the features in the images instead.

Previous research studying the relationship between crime (including fear of crime) and the built environment has used supervised machine learning approaches to learn relationships in data classified by participants. However, in these previous works, the environment, often represented as a street-level image, is collapsed into a small set of features. For example, [41] focus on the colour properties of the image and [23] extract only eight key environment attributes. Although these works demonstrate the potential of street-level images for understanding fear of crime, there is a substantial opportunity to progress understanding by considering a larger and more diverse set of environmental features. In a similar study, [53] combined an automated image collection approach with a supervised machine learning algorithm trained through crowd-sourcing on Amazon Mechanical Turk to identify visible conditions of urban environments at a large scale. In all these cases, supervised machine learning is used for image feature recognition and classification, meaning that the algorithms are trained on features predetermined to be of interest to the research. What we do is different: we take a fully exploratory, data-driven, bottom-up approach with no prior hypothesis and no crowd-sourced training. There is an absence of research exploring the use of generalised pre-trained algorithms (e.g., Google Vision AI) with the capability to identify a significantly larger set of features.

In one recent paper, the authors used an approach similar to the one adopted here, where Google Street View (GSV) images are acquired for an area and processed to extract features of the environment [24]. The authors focus on the area of Santa Ana in the United States, where they have information on different types of crime. They use DeepLabv3+, a type of Convolutional Neural Network, as their machine learning engine. Their approach captured GSV images every 20 metres in different orientations (north, south, east, and west), before using machine learning to recognise the occurrence of 11 predefined environmental features and understand their relationship with the different crime categories. Other recent work has followed a similar method but with different sets of environmental features [23]. The work presented in this paper follows a substantially different methodology, where the aim is to explore whether any features of the environment influence whether an area is categorised as safe or unsafe. To undertake this, it is necessary to collect more images, as we do not have a precise point-level location of interest, and to extract a larger feature set. In this way, what we employ is computational systematic social observation at the area level, which could be applied to whole neighbourhoods (or other units of interest).

The current study

In this paper, we demonstrate the use of automatic image extraction within a polygon of interest from GSV, combined with machine learning to identify features of the built environment in these images, by identifying what features are associated with perceptions of safety. By combining the automated collection of images representative of areas relevant to people’s experiences of fear of crime with the automated extraction of specific features present in these areas, we can scale and automate SSO. This approach we term computational systematic social observation. We believe this allows the study of environmental features associated with safe/unsafe places at unprecedented scales. Compared to human coders extracting features from images, Google Vision is much more time efficient. For example, [54] found that Google Vision (GV) took 5 minutes to codify 1,818 images, while a human coder needed 35 hours to complete the same task, making Google Vision 14,880% cheaper. This means that in the same period, many more images can be processed, solving the problem of selecting non-representative samples of areas [33] by auditing the entire area. The issue of coder bias, or non-agreement between coders [17], may be addressed in this way as well. While [54] found that only 52.4–65.0% of the images were similarly codified between human raters and the algorithm, they found that ultimately “even if the human coder generated more diverse and concrete tags, similar conclusions can be extracted.” Replicability is further increased by applying a uniform algorithm that is consistent across all areas, rather than hiring multiple observers to cover different parts of the study area.

Therefore, this paper demonstrates such a computational approach, paired with an automated way to collect images from our desired study areas. We present a large-scale, data-driven approach to extracting and evaluating environmental features related to the perceived (in)security of places. Our approach is based on a purely visual audit of environmental features, rather than making use of additional crowdsourced or big-data sources that could be used to catalogue neighbourhood features or characteristics (e.g., [55][56]).

Data and methods

Our research is motivated by two recent works. First, by [23], who identify the correlation between attributes of the built environment and crime. Their technique used Google Street View (GSV) images and publicly available police data from a town in northern England. Although the research was exploratory and had to overcome the limitation of handling location-approximate crime data, the technique to acquire images and extract features of the built environment is relevant to this study. The second key work is that by [57], who captured 3,955 polygons representing safe or unsafe areas drawn by 910 respondents in Hungary between January 2016 and March 2019. Note that the authors referred to unsafe areas as those where the participant feels fear, and in this research, we use the classification of unsafe throughout. The data acquired in the study by [57] is used in this investigation. This section provides more information on the specifics of the data used in this research, as well as the technical process undertaken to extract street-level images and use the Google Vision API to code features in these images. The following list presents a high-level summary of the activities undertaken in this research and presented in this manuscript.

  • A sample of 100 polygons where people rated areas as safe or unsafe is taken from data collected and published in research undertaken by [57].

  • A technique is developed to calculate longitude/latitude coordinates within each polygon with a distance of 20m between each;

  • Next, Google Street View is queried using each coordinate facing north, east, south and west to extract images;

  • The images then pass through the Google Vision API to extract objects based on their pre-trained algorithm;

  • Features are grouped for each entire area, followed by accumulating them for each safety category (safe/unsafe);

  • Finally, a dependency measure is used to understand how strongly correlated an object is to whether the area is safe/unsafe.
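The final two steps above, aggregating features per safety category and measuring dependency, can be sketched as follows. The paper does not specify its dependency measure at this point, so this illustration uses a simple chi-square statistic over a 2×2 contingency table as a stand-in, and the feature labels and counts are invented purely for demonstration.

```python
from collections import Counter

def chi_square_2x2(a, b, c, d):
    """Chi-square for the 2x2 table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / den if den else 0.0

# Hypothetical detections: (feature_label, area_category) pairs, as if they
# had been returned by the object-detection step and grouped per area.
detections = ([("Tree", "safe")] * 40 + [("Tree", "unsafe")] * 10
              + [("Graffiti", "safe")] * 5 + [("Graffiti", "unsafe")] * 45)

counts = Counter(detections)
dependency = {}
for feature in {"Tree", "Graffiti"}:
    a = counts[(feature, "safe")]    # this feature detected in safe areas
    b = counts[(feature, "unsafe")]  # this feature detected in unsafe areas
    # detections of all other features, per category
    c = sum(v for (f, cat), v in counts.items() if f != feature and cat == "safe")
    d = sum(v for (f, cat), v in counts.items() if f != feature and cat == "unsafe")
    dependency[feature] = chi_square_2x2(a, b, c, d)
```

A larger statistic indicates that a feature's presence is more strongly associated with one of the two safety categories; ranking features by this value mirrors the exploratory comparison described in the list above.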

As one of the objectives of this paper is to demonstrate this method, each step is detailed in the following subsections.

Study area and sample

For this study, 100 polygons were used that originate from a study conducted in Nagykanizsa, Hungary [57]. As part of a larger study, participants were asked to draw digital sketch maps using a web application, marking areas where they felt safe with green polygons and areas where they felt unsafe with red polygons. Data collection was carried out online, and respondents accessed the platform via social media without having to register. The study focused on nine Hungarian cities, chosen based on achieving a minimum of 50 respondents per city. We focus on Nagykanizsa, specifically a sample of 100 polygons from this area, drawn by 86 unique individuals (31 females and 55 males). Since these were drawn by local residents and labelled safe or unsafe by them, we take their perceptions as a “ground truth” in the sense that they have indicated that they feel this way, and this is a subjective perception. The polygons are almost evenly split between areas classified as safe (n = 49) and unsafe (n = 51). Polygon number 69 in the data set was excluded because it was too large to constitute a meaningful response [57].

Dependent variable: Safe and Unsafe Places

A digital sketch map tool was presented to participants to draw boundaries for areas where they feel safe or unsafe in their home city [58][57]. Digital sketch maps and mental mapping as scientific tools for detecting citizens' perceptions of their environment have a long tradition [59][60][61][62]. One advantage of this technique is that it gets around the known issue of people’s experiences and self-defined neighbourhoods not necessarily coinciding with existing administrative boundaries. When collecting self-reported experiences of fear of crime, as with crime, scholars must take seriously the level of aggregation and spatial scale [63]. Such data can be collected at the point level, to pinpoint the exact location of the experience of fear of crime, and then study the environmental characteristics of these specific places [34]; however, this precision is better suited to data collected in real time rather than retrospective reporting [29]. For studies concerned with neighbourhood-level fear and environmental correlates, defining a neighbourhood is not so easy. Most fear of crime studies examining the role of local context use almost exclusively administrative neighbourhoods with fixed boundaries [64]. The use of administrative boundaries is motivated pragmatically, due to data availability and, in the case of SSO, to provide a neat boundary for observers to cover, and lacks a solid theoretical justification. Administratively defined areas do not necessarily align with how inhabitants experience their unsafety [64]; instead, it is better to focus on person–context perspectives, that is, to understand how people define their neighbourhood contexts [65]. By contrast, allowing participants to select areas of their city on a map which they find safe or unsafe addresses both the issue of choosing the appropriate unit of analysis for fear of crime and the issues around the Modifiable Areal Unit Problem [65].
These areas can then be used to better understand people’s lived experiences of safe and unsafe areas, creating boundaries that reflect these experiences. For more detail on this method, including the sample, see [57].

Computational Systematic Social Observation

Once the boundaries of safe and unsafe places were collected, we applied CSSO to extract the environmental features present in each polygon. In this section we detail the steps required.

Coordinate generation

To extract Google Street View (GSV) images, it is necessary to have longitude and latitude coordinates for each location where an image is required. The data set contains the coordinates of each polygon’s corners, but no coordinates within the polygon, which are needed to retrieve GSV images. We therefore devised the following technique to systematically generate longitude and latitude coordinates within each polygon. In summary, the technique takes a pair of coordinates (start and end) and generates new coordinates at a fixed distance from the previous one, moving in a straight line towards the end coordinate. The technique is then repeated with all coordinate pairs as they are generated, ensuring that no duplicate coordinates are generated at the same location. Although this technique is exhaustive, it ensures a systematic approach.

The fixed distance d in metres is converted into a step in decimal degrees using

step = \frac{d}{111111}

where 111,111 is the approximate number of metres in one degree of latitude.

Figure 1: Original coordinate locations. Figure 2: All newly generated coordinate locations. Figure 3: Magnified view of the lower section.

The details of the approach are presented in Algorithm [algo:coordinategeneration]. The algorithm takes as input two equally sized sequences of real numbers, the latitudes LA and the longitudes LO, where the same index in each gives a corresponding latitude/longitude pair. The algorithm considers each latitude/longitude pair in combination with every other unique pair. The straight-line difference is calculated and divided into equal 20-metre segments, new latitude/longitude pairs are generated at 20-metre increments, and the process is repeated until there are no more pairs to consider. At this point, the space is saturated and a complete set of coordinates has been generated. An example can be seen in the three images provided in Figure [fig:area0]. Figure 1 shows the five coordinates provided in the original data set; note that only four are visible, as the first and fifth are the same. Figure 2 illustrates all the points generated within the polygon, and since there are too many to visualise, a magnified excerpt is shown in Figure 3. Noticeably, the technique does not generate the coordinates in a perfect grid formation. This is because the polygon is not perfectly rectangular, so 20-metre increments from different starting coordinates generate locations offset from their immediate neighbours, which, despite appearing to be on the same line, shift slightly in both latitude and longitude. As demonstrated in Table 5, the number of coordinates generated ranges from as few as 18 for area 23 to 233,254 for area 68.
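As a rough illustration, one pass of the generation step can be sketched in Python. This is a minimal sketch under our own naming (generate_points is not from the original study), using the step = d/111111 conversion and treating coordinate differences as planar, which is an approximation:

```python
import math

def generate_points(lat_pairs, lon_pairs, d=20.0):
    """Generate coordinates every d metres along the straight line
    between every unique pair of input corner coordinates."""
    step = d / 111111.0  # degrees covered by d metres (approximation)
    corners = list(zip(lat_pairs, lon_pairs))
    points = set(corners)  # a set prevents duplicate coordinates
    for i in range(len(corners)):
        for j in range(i + 1, len(corners)):
            (lat1, lon1), (lat2, lon2) = corners[i], corners[j]
            dist = math.hypot(lat2 - lat1, lon2 - lon1)  # in degrees
            if dist == 0:
                continue  # identical corners (e.g. closed polygon ring)
            n = int(dist / step)
            for k in range(1, n + 1):
                # fraction of the way along the line for the k-th step
                t = k * step / dist
                points.add((round(lat1 + t * (lat2 - lat1), 6),
                            round(lon1 + t * (lon2 - lon1), 6)))
    return sorted(points)
```

In the full technique described above, newly generated pairs would themselves be fed back into the pairing process until the polygon is saturated; the sketch shows a single pass over the original corners only.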

Image extraction

Following coordinate generation, the next stage of this research is to take each coordinate pair in each polygon and extract a street view image. We use Google Street View (GSV), as it is one of the most up-to-date street view services and can be programmatically controlled through the Google API. We query the API with each coordinate and acquire four street-level images in north, east, south, and west orientations. If the coordinates do not match a valid street location, no images are retrieved. This happens on many occasions, for example where the coordinates fall within a building or in an area of open space. In the interest of saving space, each image was captured at a resolution of 600×800 pixels; this was a necessity given the total number of images and the amount of storage space required. In total, for the 100 areas, we acquired 1,295,298 images that occupy a total of 723 gigabytes.
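A sketch of this querying step is shown below using the public Street View Static API endpoint. The helper names, placeholder API key, and error handling are our own, and in practice the Static API’s metadata endpoint can also be queried first to check whether a point falls on a valid street:

```python
GSV_ENDPOINT = "https://maps.googleapis.com/maps/api/streetview"

def build_streetview_requests(lat, lon, api_key="YOUR_API_KEY",
                              size="600x800", headings=(0, 90, 180, 270)):
    """Return one (url, params) request per compass heading."""
    return [
        (GSV_ENDPOINT, {
            "location": f"{lat},{lon}",
            "size": size,          # 600x800 pixels, as used in this study
            "heading": heading,    # 0 = N, 90 = E, 180 = S, 270 = W
            "key": api_key,
        })
        for heading in headings
    ]

def fetch_images(lat, lon, api_key, timeout=10):
    """Download up to four images for one coordinate; off-street
    coordinates yield no usable imagery."""
    import requests  # third-party library, assumed available
    images = []
    for url, params in build_streetview_requests(lat, lon, api_key):
        resp = requests.get(url, params=params, timeout=timeout)
        if resp.status_code == 200:
            images.append(resp.content)
    return images
```

The 10-second timeout mirrors the behaviour described below: if no image is returned within it, the coordinate is treated as not being on a street.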

Example of street-level images acquired at a location in polygon 1

An example is demonstrated in Figure 4, where a location is taken from a valid generated street view location for area 1. As can be seen in the figure, four images have been extracted for that one location in North, East, South, and West orientations. This specific example is on Zrinyi Miklós street which is visible in Figure 1.

As mentioned above, polygon number 69 was removed due to its size. For this polygon, 674,730 coordinates were generated, around three times more than for the next largest (polygon 68). Given that the data collection phase for polygon 68 took 40,491 minutes, slightly longer than 28 days, collecting polygon 69 would have taken around 78 days. Extraction takes this long because each image requires a few seconds to acquire and a timeout duration of 10 seconds is applied at each location: if an image is not returned within the timeout, it can be established that those coordinates are not on a street.

Table 5 also provides information on how many images were acquired for each polygon. At best, the number is four times the number of coordinates; however, in all instances it is lower, as it depends on how many of the coordinates fall on valid street view locations. If a coordinate does not fall on a valid location, no images are acquired in any of the four directions. The number of images acquired per polygon varies from as few as 72 for polygon 18, occupying 42 megabytes, to 51,875 images for area 68, occupying almost 29 gigabytes.

Computer vision

In this research, a pre-trained computer vision technique is used to detect objects within the GSV images. Here, objects are things that are visible in a GSV image and can be automatically recognised: for example, buildings, cars, and trees. In previous and related work, object detection techniques were specifically trained to recognise a subset of environmental features (eight in total) [23]; because we use a pre-trained algorithm, we are able to extract all objects recognisable to the algorithm. We use the Google Vision AI API, as it is one of the more popular off-the-shelf computer vision services for extracting features from images and is therefore widely tested in other domains [66][67][54]. Each object is identified with a percentage confidence reflecting how certain the algorithm is that it has identified that object. We extract the top 10 objects from each image, provided their confidence scores are above 70%. Figure 5 illustrates an example whereby vehicles, buildings and windows are identified within the image; note that the majority have a confidence above 70%.
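The extraction rule can be sketched as follows. The client calls follow the google-cloud-vision Python library; filter_labels is our own illustrative helper implementing the top-10, above-70% rule:

```python
def filter_labels(annotations, top_n=10, min_score=0.70):
    """Keep the top_n highest-scoring labels whose confidence exceeds
    min_score. `annotations` is a list of (description, score) pairs."""
    kept = [(desc, score) for desc, score in annotations if score > min_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_n]

def detect_objects(image_path):
    """Query the Vision API for label annotations of one street view image."""
    from google.cloud import vision  # assumed installed and authenticated
    client = vision.ImageAnnotatorClient()
    with open(image_path, "rb") as f:
        image = vision.Image(content=f.read())
    response = client.label_detection(image=image)
    return filter_labels([(label.description, label.score)
                          for label in response.label_annotations])
```

Each returned pair corresponds to one feature (e.g. “Building”, “Tree”) with its confidence, which is then counted per polygon in the analysis below.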

Demonstration of objects discovered using the Google Cloud Vision API on a street view image

Analytical approach

Validity checks

In order to report on the feasibility of CSSO for environmental audits, we consider a series of validity and reliability checks. We start by addressing face validity. For this, we compare features between safe and unsafe polygons to identify which are more associated with either type. For each identified object, we use a dependency measure to understand how strongly correlated an object is with whether the area is perceived as safe or unsafe. Specifically, we use a χ² statistic to measure the independence between terms and categories, as in text categorisation [68]. The challenge of determining independence and dependence between terms and categories in information retrieval systems shares many characteristics with measuring the relationship between safe or unsafe places and features of the built environment. The χ² statistical measure has many successful applications in data mining and knowledge extraction tasks, particularly those in information security [69][70]. In this research, we use a two-way contingency table of feature f and safety category (safe or unsafe) c, where A is the number of times feature f and safety category c co-occur, B is the number of times f occurs without c, C is the number of times c occurs without f, D is the number of times neither f nor c occurs, and N is the number of areas.

\chi^2(f, c) = \frac{N(AD - CB)^2}{(A + B)(A + C)(B + D)(C + D)}

The χ² scores for each feature and its residing location’s safety category allow us to compute the difference in χ²(f, c) scores between the two safety categories using the following equation:

diff(f) = \vert \chi^2(f, safe) - \chi^2(f, unsafe) \vert

Finally, the average value diff_avg is calculated over all diff(f) scores. We rank the features to identify those with the maximum diff(f) values.
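A minimal sketch of these two measures, using our own function names over the A, B, C, D contingency counts defined above:

```python
def chi2(A, B, C, D):
    """chi^2(f, c) for a 2x2 contingency table of feature f and category c."""
    N = A + B + C + D
    denom = (A + B) * (A + C) * (B + D) * (C + D)
    return 0.0 if denom == 0 else N * (A * D - C * B) ** 2 / denom

def diff_scores(tables):
    """tables maps feature -> {'safe': (A, B, C, D), 'unsafe': (A, B, C, D)};
    returns feature -> |chi^2(f, safe) - chi^2(f, unsafe)|."""
    return {f: abs(chi2(*t["safe"]) - chi2(*t["unsafe"]))
            for f, t in tables.items()}
```

Features are then ranked by their diff score, and those above the mean diff_avg are retained, as reported in the Results.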

As demonstrated in other classification tasks using χ² for feature selection, the top features for each class can be used to categorise the data [71]. To investigate how well these features can be used to categorise an area as ‘safe’ or ‘fear’, the frequency of each of the 297 features is calculated for each polygon, before counting how often the top features presented in Table 1 appear in the top 25 of the polygon’s full feature list. To determine which category the polygon best aligns with, the average occurrence of the present ‘safe’ or ‘fear’ features in the top 25 is calculated. This provides a measure of how strongly the polygon matches each category.

We then compare the CSSO results with two exercises that employ human coders. First, a traditional SSO site visit to a sample of 6 polygons, 3 marked as safe and 3 marked as unsafe by participants. Second, we presented the list of features to seven experts in the area of place-based fear of crime, who rated the features as positively or negatively associated with fear of crime based on their understanding of concepts of the built environment and perception.

Finally, we go on to test known-group validity. Known-group validity is a form of construct validity where hypotheses are pre-specified and then tested to reflect whether a tool is able to differentiate where differences are expected a priori. Where a statistical difference is found, it supports the validity of the tool and where the differences are not significant, either the tool/item is flawed, the hypothesis flawed, or the power inadequate [72]. In fear of crime studies, gender has been established as a known factor. Therefore, we might expect different results between different genders.

Results

Preprocessing

In total, 319 unique features are identified in images acquired from areas of both classification types (safe, unsafe). After manually inspecting the features, it became immediately apparent that some had been identified because parts of the images capture user-interface components of Google Maps, producing labels such as ‘Computer’, ‘Text’, and ‘Software’. Removing these leaves a total of 297 unique features for further analysis.

In terms of the percentage distribution of the identified features, the same 10 features account for more than 55% of the total number of features identified in each area. The general observation from these distributions is that, in most instances, there is little identifiable difference between the occurrence of the top 10 features and whether the areas have been classified as safe or unsafe. There are some small differences in the interquartile range for features such as ‘nature’ and ‘plant’; however, they have a similar median value for both classification types. For this reason, it is necessary to consider a dependency measure between each feature and classification type.

What features are associated with safe/unsafe areas?

To identify which features occur more in safe or unsafe areas, we consider the results of our χ2\chi^2 statistic to measure the independence between terms and categories.

| Feature | Total Safe | Total Unsafe | χ²(f, safe) | χ²(f, unsafe) | diff_f | Best Category |
|---|---|---|---|---|---|---|
| Sky | 329316 | 663921 | 0.003556 | 0.063941 | 0.060385 | fear |
| Building | 190619 | 307644 | 0.039673 | 0.092245 | 0.052573 | fear |
| Tree | 248408 | 582299 | 0.051851 | 0.000004 | 0.051848 | safe |
| Cloud | 234848 | 531321 | 0.030527 | 0.000750 | 0.029776 | safe |
| Woody plant | 22873 | 49032 | 0.003444 | 0.022438 | 0.018994 | fear |
| Plant | 292710 | 612018 | 0.009741 | 0.028357 | 0.018616 | fear |
| Land lot | 123114 | 204545 | 0.020347 | 0.038815 | 0.018468 | fear |
| Urban design | 129000 | 224153 | 0.012 | 0.029 | 0.017 | fear |
| Asphalt | 255567 | 527432 | 0.004829 | 0.020101 | 0.015272 | fear |
| Road surface | 158443 | 353442 | 0.013977 | 0.000141 | 0.013837 | safe |
| Infrastructure | 151189 | 334879 | 0.012 | 0.000 | 0.011 | safe |
| Vehicle | 94498 | 155784 | 0.017474 | 0.028266 | 0.010791 | fear |
| Property | 102420 | 177081 | 0.010632 | 0.021190 | 0.010559 | fear |
| Mode of transport | 68449 | 176913 | 0.027528 | 0.017581 | 0.009948 | safe |
| Automotive tire | 13263 | 18655 | 0.021751 | 0.013721 | 0.008030 | safe |
| Thoroughfare | 38234 | 124116 | 0.057188 | 0.049301 | 0.007887 | safe |
| Nature | 90079 | 208084 | 0.011628 | 0.003866 | 0.007761 | safe |
| Biome | 122659 | 270326 | 0.007948 | 0.000351 | 0.007597 | safe |
| Tire | 72192 | 115201 | 0.018477 | 0.025297 | 0.006820 | fear |
| House | 66208 | 102726 | 0.021341 | 0.027506 | 0.006165 | fear |
| Car | 104944 | 198715 | 0.001 | 0.007 | 0.006 | fear |
| Light | 34509 | 63012 | 0.006359 | 0.001134 | 0.005225 | safe |
| Wheel | 62358 | 99538 | 0.016045 | 0.021075 | 0.005030 | fear |
| Ecoregion | 60022 | 97746 | 0.012932 | 0.017291 | 0.004358 | fear |
| Window | 120657 | 242811 | 0.000139 | 0.002618 | 0.002479 | fear |

Table 1: Top 25 features where diff_f > diff_avg

Table 1 provides the top 25 features, identified where diff_f > diff_avg. In this experiment, diff_avg is 0.001629 and 25 features have a larger diff_f. The table also indicates the safety category with which each feature has the stronger dependency. A total of 10 of the features are more strongly dependent on the safe category, while 15 are more strongly dependent on unsafe.

The feature names are those output by the Google Vision API, and most are self-explanatory. However, there is clearly strong overlap between some of the features. For example, ‘Asphalt’ and ‘Road surface’ are similar, as asphalt is often used as a road surface. What is surprising is that ‘Asphalt’ is more strongly associated with fear, whereas ‘Road surface’ is more strongly associated with safe; that said, other materials are used for road surfaces, just as asphalt is used for other purposes. There are also overlaps between ‘Tree’, ‘Nature’, and ‘Ecoregion’, which are all related to nature. A similar situation arises here, where ‘Tree’ and ‘Nature’ are both categorised as safe, whereas ‘Ecoregion’ is associated with fear.

To demonstrate that these associations can be used for classification, a simple logic-based classification approach is tested. This is performed by using the features most strongly associated with safe and fear (the 10 and 15 shown in Table 1) and determining, for each area, which set of features occurs most frequently. For example, taking the top feature for safe and fear (‘Tree’ and ‘Sky’, respectively), whichever feature occurs more often in a region determines whether the area is categorised as ‘safe’ or ‘fear’. In the analysis, the classification is performed for each of the 100 areas. Table 2 presents precision, recall, and F-Measure results when performing classification using different feature sets, constructed by taking the top n features, starting at n = 1 and incrementing until all features in the top 25 are used, up to a maximum of 10 safe and 15 fear features. The imbalance in the number of features between the two categories is handled by using the average feature occurrence to determine which category the area best matches. The results presented in the table are interesting and demonstrate that the best performance is achieved using only two features: ‘Tree’ for safe polygons and ‘Sky’ for unsafe polygons. The F-Measure decreases as the feature sets grow, demonstrating that only the features with a strong dependency measure have predictive power. Using only the two features produces good precision (i.e. minimising false positives) and recall (i.e. minimising false negatives). Recall improves incrementally until the feature set reaches 6, but to the detriment of precision. The F-Measure (the harmonic mean of precision and recall) is used to demonstrate the overall capability, as we are interested in achieving both good precision and recall, and Table 2 shows that it is highest for only the first two rows. It is interesting to see the capability rapidly diminishing with an increased number of features, which demonstrates that the classification problem can be reduced to a single binary decision based on whether ‘Tree’ or ‘Sky’ is more common.
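This decision rule, together with the evaluation metrics, can be sketched as follows. This is a simplified illustration that compares average raw feature frequencies per polygon; the function names are ours, not from the original study:

```python
def classify_area(counts, safe_set=("Tree",), fear_set=("Sky",)):
    """Label a polygon 'safe' or 'fear' by the average frequency of the
    features in each set. `counts` maps feature name -> occurrences."""
    safe_avg = sum(counts.get(f, 0) for f in safe_set) / len(safe_set)
    fear_avg = sum(counts.get(f, 0) for f in fear_set) / len(fear_set)
    return "safe" if safe_avg >= fear_avg else "fear"

def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Averaging within each set is what allows the safe and fear sets to contain different numbers of features without biasing the decision.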

| Safe Feature Set | Fear Feature Set | Precision | Recall | F-Measure |
|---|---|---|---|---|
| Tree | Sky | 97.8 | 89.8 | 93.6 |
| Tree, Cloud | Sky, Building | 95.5 | 85.7 | 90.3 |
| Tree, Cloud, Road surface | Sky, Building, Woody plant | 56.8 | 93.9 | 70.8 |
| Tree, Cloud, Road surface, Infrastructure | Sky, Building, Woody plant, Plant | 57.3 | 95.9 | 71.8 |
| Tree, Cloud, Road surface, Infrastructure, Mode of transport | Sky, Building, Woody plant, Plant, Land lot | 55.4 | 93.9 | 69.7 |
| Tree, Cloud, Road surface, Infrastructure, Mode of transport, Automotive tire | Sky, Building, Woody plant, Plant, Land lot, Urban design | 85.7 | 73.5 | 79.1 |
| Tree, Cloud, Road surface, Infrastructure, Mode of transport, Automotive tire, Thoroughfare | Sky, Building, Woody plant, Plant, Land lot, Urban design, Asphalt | 88.2 | 61.2 | 72.3 |
| Tree, Cloud, Road surface, Infrastructure, Mode of transport, Automotive tire, Thoroughfare, Nature | Sky, Building, Woody plant, Plant, Land lot, Urban design, Asphalt, Vehicle | 86.2 | 51.0 | 64.1 |
| Tree, Cloud, Road surface, Infrastructure, Mode of transport, Automotive tire, Thoroughfare, Nature, Biome | Sky, Building, Woody plant, Plant, Land lot, Urban design, Asphalt, Vehicle, Property | 77.4 | 49.0 | 60.0 |
| Tree, Cloud, Road surface, Infrastructure, Mode of transport, Automotive tire, Thoroughfare, Nature, Biome, Light | Sky, Building, Woody plant, Plant, Land lot, Urban design, Asphalt, Vehicle, Property, Tire | 64.5 | 40.8 | 50.0 |
| Tree, Cloud, Road surface, Infrastructure, Mode of transport, Automotive tire, Thoroughfare, Nature, Biome, Light | Sky, Building, Woody plant, Plant, Land lot, Urban design, Asphalt, Vehicle, Property, Tire, House | 55.2 | 32.7 | 41.0 |
| Tree, Cloud, Road surface, Infrastructure, Mode of transport, Automotive tire, Thoroughfare, Nature, Biome, Light | Sky, Building, Woody plant, Plant, Land lot, Urban design, Asphalt, Vehicle, Property, Tire, House, Car | 48.5 | 32.7 | 39.0 |
| Tree, Cloud, Road surface, Infrastructure, Mode of transport, Automotive tire, Thoroughfare, Nature, Biome, Light | Sky, Building, Woody plant, Plant, Land lot, Urban design, Asphalt, Vehicle, Property, Tire, House, Car, Wheel | 44.4 | 32.7 | 37.6 |
| Tree, Cloud, Road surface, Infrastructure, Mode of transport, Automotive tire, Thoroughfare, Nature, Biome, Light | Sky, Building, Woody plant, Plant, Land lot, Urban design, Asphalt, Vehicle, Property, Tire, House, Car, Wheel, Ecoregion | 52.2 | 49.0 | 50.5 |
| Tree, Cloud, Road surface, Infrastructure, Mode of transport, Automotive tire, Thoroughfare, Nature, Biome, Light | Sky, Building, Woody plant, Plant, Land lot, Urban design, Asphalt, Vehicle, Property, Tire, House, Car, Wheel, Ecoregion, Window | 58.5 | 49.0 | 53.3 |

Table 2: Results of classification based on top 25 features

Comparing with human observers and coders

To compare the CSSO process with traditional SSO, we performed two exercises. The first was a site visit and SSO carried out for 6 polygons, 3 safe and 3 unsafe. One coder was given training based on existing knowledge (summarised earlier in the paper) of environmental features associated with fear of crime. They were provided a coding form consisting of a spreadsheet with a dropdown menu from which they could select features from the list of all features identified by the computational SSO process. They were asked to catalogue every item they encountered on the site visit using this tool. In addition, they were asked to give an unsafety score for each feature: +1 if it is thought to increase fear, 0 if it is thought to be neutral, and -1 if it is thought to decrease fear (increase safety). They were also provided free-text fields to record additional observations.

Overall, there were far fewer features coded for each polygon by the human coder. The average number of features recorded across the 6 sites was 14.5 (sd = 2.7), the minimum 10 and the maximum 18. Looking at the scoring of these features reveals the importance of context in understanding these features. The same features were often coded as positive, negative or neutral by the coder. For example, "bench" was rated as positive (associated with more fear) in the case of polygon 28 (an area indicated as unsafe by the original participants marking it) with the note: "Benches are in very bad condition" but rated as negative (associated with less fear) in polygon 39 (an area indicated as safe by the original participants marking it) where they are noted to be: "Painted benches in good condition along the main road".

Summing the feature scores for each polygon confirms this: the unsafe polygons summed to a score of 0, while the safe polygons summed to -25. However, all polygons contained both safe and unsafe features. The range of feature scores in each polygon is shown in Figure 6.

Range of scores of the features coded in each polygon

Finally, there were additional features which the human coder wanted to note, despite not finding a relevant matching feature in the dropdown menu, selecting the option “NA” and using the free-text box instead. These all related to possibly more transient elements of the environment, such as litter, rubbish, and levels of crowding. Additionally, the free-text comments showed that in some cases where a relevant category was selected, it was still caveated with free text. For example, the feature “passenger” was selected to describe the presence of "many homeless and minority people". Overall, while this is not within the remit of urban design and planning, it was impossible for the human assessor to exclude these relevant environmental features when assessing what might be relevant to perceived safety or unsafety in an area.

Secondly, we wanted to understand how the results from the CSSO compare with extant knowledge from previous research about environmental features and fear of crime. To answer this, we took the rating of the features by seven independent coders and compared it with the rating of the algorithm (described above). First, we can look at some aggregate measures. To do this, we created a summative score from the coders’ ratings. For example, if a feature was rated as negatively associated with fear of crime, it was given a score of -1; if all coders rated the feature in this way, its total score would be -7. We compare this with the diff_f scores in the above analysis, weighted as positive when in favour of unsafe areas (positively associated with fear of crime) and negative when in favour of safe areas (negatively associated with fear of crime). A Pearson correlation suggests a statistically significant positive association between the human coders and the results of our CSSO (t = 2.185, df = 305, p-value = 0.02965), but the effect size is quite small (95 percent confidence interval: 0.0124–0.2329). Perhaps more meaningful is to visualise where the coding diverges and aligns. Figure 7 illustrates features on which the human coders disagree with the results from the CSSO. These tend to be features of the built environment associated with urban areas (Building, Window, Property, House). While the human coders interpreted these as “safe” (they may be aligned with guardianship), in our observational data these features occurred more in polygons rated as unsafe. Conversely, thoroughfare and road surface were rated as positively associated with fear by the experts, while these features appeared more in safe polygons.

Features which were rated as safe or unsafe by human coders but the opposite by our CSSO results

Figure 8 highlights the points of agreement between the coders and our results. We see that coders identified features associated with nature and light as negatively associated with fear of crime, that is, more present in areas rated as safe. On the other hand, signs of hostile urban environments (Fence, Hazard, Wire Fence) and lack of light (Shade) were positively associated with fear of crime (found more in the unsafe polygons).

Features which were rated as safe or unsafe both by human coders and by our CSSO results

We can also consider some inter-rater reliability measures between the raters and the CSSO results. Computing an extended percentage agreement treating all the human coders as one (and therefore still relying on the summative score to assign each feature to the ‘safe’ or ‘unsafe’ category) shows 37.8% agreement. An unweighted Cohen’s κ for two raters shows statistically significant agreement (p-value = 0.0279); however, once again the coefficient is rather small (κ = 0.0647). κ is simply the proportion of agreement after chance agreement is removed from consideration; when the agreement obtained equals the chance agreement, κ = 0. Our value of 0.0647, while positive, is not much greater than chance (perfect agreement gives κ = 1) [73].
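For concreteness, the chance-corrected definition of unweighted Cohen’s κ used here can be sketched as follows (our own minimal implementation for two raters over ‘safe’/‘unsafe’ labels):

```python
def cohens_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two equal-length lists of labels."""
    n = len(labels_a)
    categories = set(labels_a) | set(labels_b)
    # Observed proportion of agreement.
    p_obs = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each rater's marginal proportions,
    # summed over categories.
    p_chance = sum((labels_a.count(c) / n) * (labels_b.count(c) / n)
                   for c in categories)
    return (p_obs - p_chance) / (1 - p_chance)  # undefined if p_chance == 1
```

With identical label lists the function returns 1, and with observed agreement equal to chance it returns 0, matching the interpretation above.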

However, we lose data by combining our raters into one category. Instead, we can consider inter-rater reliability methods for multiple raters. First, we consider only our human raters. Fleiss’s κ measures the degree of agreement between three or more raters assigning categorical ratings to a set of items; κ = 0 indicates no agreement at all among the raters and κ = 1 indicates perfect inter-rater agreement. Taking only our 7 human raters, we see “Fair” agreement between raters (κ = 0.238, z = 24.6, p-value < 0.001), and adding our CSSO results as a coder, this drops (κ = 0.208, z = 24.5, p-value < 0.001) but remains within the “Fair” category. Evidently, agreement among our human raters is itself modest, and the CSSO results do not bring it down much. In fact, repeating Fleiss’s κ while excluding one rater at a time shows that excluding rater number 4 yields better results than excluding the algorithm (κ = 0.246, z = 24.6, p-value < 0.001).

Overall, the inter-rater reliability measures indicate low consistency between the raters. However, there is some positive association between the CSSO results and the expert reviews, and inter-rater reliability is as much an issue among the human coders as between them and the algorithm. Of course, this is not the same as IRR in a traditional SSO exercise, because the specific items observed by the algorithm were not seen by the human coders. Rather, this tells us to what extent the results are consistent with previous work, as assessed by the coders.

Known group validity

Another approach to testing validity is to consider known-group validity. Specifically, due to previous work on the perception of safety, we might expect there to be a difference between the experiences of men and women a priori. Therefore, we consider whether we find anything different between what is associated with safe and unsafe places drawn by male versus female participants.

In the previous section, it was established that a dependency measure can identify the features that most strongly define an area type, and that these can be used for classification. In this section, the same process is repeated using the following four categories: (1) Safe Polygon / Male, (2) Safe Polygon / Female, (3) Fear Polygon / Male, and (4) Fear Polygon / Female, where the gender label denotes whether a female or male participant drew the polygon. The objective of this analysis is to establish whether we can go further than the previous analysis and identify key features based on gender and polygon type.

| Safe Polygon / Male | χ² | Safe Polygon / Female | χ² | Fear Polygon / Male | χ² | Fear Polygon / Female | χ² |
|---|---|---|---|---|---|---|---|
| Road surface | 0.062 | Car | 0.058 | Urban design | 0.172 | Infrastructure | 0.137 |
| Tire | 0.069 | Natural landscape | 0.019 | Wheel | 0.114 | Biome | 0.087 |
| Property | 0.041 | Vegetation | 0.007 | Vehicle | 0.148 | Fixture | 0.082 |
| Mode of transport | 0.065 | Road | 0.005 | Window | 0.112 | Moon | 0.117 |
| Motor vehicle | 0.033 | Cloud | 0.102 | Natural environment | 0.080 | Land lot | 0.034 |
| Nature | 0.082 | Door | 0.048 | Grass | 0.025 | Thoroughfare | 0.060 |
| Light | 0.046 | Line | 0.023 | Automotive mirror | 0.028 | Horizon | 0.067 |
| Facade | 0.033 | Table | 0.038 | Condominium | 0.032 | Lighting | 0.022 |
| Real estate | 0.021 | City | 0.075 | Woody plant | 0.018 | House | 0.019 |
| Automotive parking light | 0.013 | Land vehicle | 0.019 | Architecture | 0.012 | Shade | 0.019 |
| Street light | 0.012 | Automotive lighting | 0.019 | Ecoregion | 0.011 | Shrub | 0.016 |
| Vehicle registration plate | 0.008 | Automotive design | 0.016 | Residential area | 0.008 | Slope | 0.014 |
| Automotive tail & brake light | 0.005 | Awning | 0.018 | Terrestrial plant | 0.005 | World | 0.014 |
| Product | 0.005 | Cottage | 0.009 | Automotive tire | 0.008 | Interior design | 0.007 |
| Pole | 0.007 | Forest | 0.007 | Home door | 0.017 | | |

Table 3: Top features for each polygon type and gender combination where diff_f > diff_avg

Table 3 provides the χ² scores for the features identified where diff_f > diff_avg, following the same process as described in Section 5.2. As is evident, there are a different number of features for each gender and polygon combination, and the χ² scores indicate a very weak dependency. In addition, this grouping of features appears contradictory and unexpected. For example, ‘Road surface’ is the strongest feature for Safe/Male and yet ‘Urban design’ is the top feature for Fear/Male. Classification is then performed based on these features (the same process as in Table 2, Section 5.2) to demonstrate which features result in the best classification accuracy.

The results demonstrate that using only the first feature for each combination yields the best classification capability. However, the results are generally poor, at around 60% average F-Measure when using the feature set containing one feature per polygon type. These features are ‘Road surface’ for Safe/Male, ‘Car’ for Safe/Female, ‘Urban design’ for Fear/Male, and ‘Infrastructure’ for Fear/Female. These relationships are not as expected, and the low dependency scores demonstrate that these features do not have a strong dependency and are therefore not suitable as key classification features. It is interesting that precision varies significantly while recall remains fairly consistent: few false negatives are made, whereas a large number of false positives lowers precision.

| Feature Set | Polygon Type | Precision | Recall | F-Measure |
| --- | --- | --- | --- | --- |
| Road surface | Safe/Male | 39.2 | 99.1 | 56.19 |
| Car | Safe/Female | 46.7 | 98.3 | 63.30 |
| Urban design | Fear/Male | 42.9 | 99.3 | 59.88 |
| Infrastructure | Fear/Female | 87.5 | 98.8 | 92.82 |
| Road surface, Tire | Safe/Male | 25.7 | 97.6 | 40.70 |
| Car, Natural landscape | Safe/Female | 16.7 | 88.9 | 28.07 |
| Urban design, Wheel | Fear/Male | 42.9 | 99.1 | 59.84 |
| Infrastructure, Biome | Fear/Female | 94.3 | 98.4 | 96.30 |
| Road surface, Tire, Property | Safe/Male | 40.0 | 98.8 | 56.94 |
| Car, Natural landscape, Vegetation | Safe/Female | 5.9 | 90.5 | 11.05 |
| Urban design, Wheel, Vehicle | Fear/Male | 57.5 | 98.5 | 72.61 |
| Infrastructure, Biome, Fixture | Fear/Female | 87.0 | 97.7 | 92.03 |
| Road surface, Tire, Property, Mode of transport | Safe/Male | 47.1 | 98.0 | 63.59 |
| Car, Natural landscape, Vegetation, Road | Safe/Female | 4.8 | 92.0 | 9.06 |
| Urban design, Wheel, Vehicle, Window | Fear/Male | 62.5 | 98.4 | 76.46 |
| Infrastructure, Biome, Fixture, Moon | Fear/Female | 72.0 | 98.4 | 83.17 |

Results from performing classification using an increasing number of the top features for each gender and polygon combination. Only results up to a set size of 4 are presented, beyond which the F-Measure scores continue to deteriorate.
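As a sanity check on these results, the F-Measure column is the harmonic mean of precision and recall, which can be recomputed directly:

```python
def f_measure(precision, recall):
    """F-Measure as the harmonic mean of precision and recall
    (values given in percent)."""
    return 2 * precision * recall / (precision + recall)

# Recomputed from the tabulated precision/recall; the small
# discrepancies against the table (56.19, 92.82) reflect rounding
# of the reported precision and recall values.
print(round(f_measure(39.2, 99.1), 2))  # 56.18
print(round(f_measure(87.5, 98.8), 2))  # 92.81
```

The harmonic mean explains the pattern discussed above: because it is dominated by the smaller of the two values, the consistently high recall cannot compensate for low precision.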

Overall, however, these results do not suggest that male and female participants differ in which features are associated with their safe/unsafe places. Whether this is due to a failure of the algorithm to detect such differences, or to there being no gender differences in the perception of the safety of places, we cannot answer here.

Discussion

In this paper, we introduce the process of Computational Systematic Social Observation as an approach to carrying out replicable environmental audits of large and bespoke areas, in order to facilitate an understanding of the relationship between environmental features and people’s perceptions of safety. Specifically, we explore whether we can gain meaningful insight without calibrating the models through training by either researchers or crowdsourcing, instead employing off-the-shelf solutions to achieve a fully data-driven, bottom-up approach with no a priori hypothesis, and examining how that holds up against expected results based on the extant literature.

Therefore, the contribution of the paper is two-fold. First, we present an exploration of people’s self-defined safe/unsafe areas in relation to the environmental features which characterise these places. We find that some features are more strongly associated with safe or unsafe places, and further demonstrate that these associations can be used for classification using a simple logic-based approach. The results can be interpreted through the theoretical framework of place-based fear of crime studies, which considers prospect, refuge, and entrapment as the main drivers of perceived (un)safety. The features associated with safety (Light, Thoroughfare, Mode of transport, Infrastructure, Road surface) all suggest prospect: good visibility, the ability to see through a place, and to perceive through-passage (especially Thoroughfare!). However, nature words are associated with both safe (Tree, Nature, Biome) and unsafe places (Wood plant, Plant, Ecoregion). This raises the important point of context. The approach cannot tell us what the quality of the features is, simply that they are present. This also became evident from the comparison with the traditional SSO site visits: the same feature (e.g., a bench) was coded as positively or negatively associated with fear of crime depending on its condition, appearance, or context. This suggests that human interpretation is required for the results to be truly useful and meaningful, at least in this case.

Second, we evaluated our novel approach through validity and reliability exercises. We found that there is generally a positive association between the coding of features as safe or unsafe achieved by associating the identified and extracted features with the safe/unsafe areas, and the coding of the same features by experts in built environment perception. Although the inter-rater reliability scores were generally low between the individual raters and the algorithm, they were also low between the raters themselves. This reinforces the point made by [17] that rigorous training and supervision are required for SSO, making it an expensive and labour-intensive process. This is something which the automation process can address, allowing us to draw inferences from data on much larger scales. Thereby, CSSO can scale up SSO to provide a faster (and cheaper) way to observe larger areas and identify environmental features associated with areas perceived as safe or unsafe. Perhaps approaches which combine this method with crowdsourced calibration from human coders (e.g. [41]) are one solution. One particular advantage here was how the data collection steps allowed us to collect systematic observations for entire areas of interest, the definitions of which can be flexible. This method could easily be used for street segments or statistical or other neighbourhood boundaries, but it can also be used for more flexible, person-specific definitions of the neighbourhood, such as egohoods: individualised context measures based on the residential location of a person [74][64]. Therefore, CSSO allows for measuring environmental correlates of various outcomes (health, crime, fear of crime, etc.) at any unit of analysis, making it applicable to more robust, person-centric measures of neighbourhood.
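To illustrate the egohood idea, the toy sketch below keeps only the observations that fall within a fixed radius of a person’s home; the circular buffer, coordinates, and units are simplifying assumptions for illustration, not the operationalisation used in [74][64]:

```python
import math

def egohood(home, observation_points, radius):
    """A toy circular egohood: keep the observation points that lie
    within `radius` of a person's home location. Planar coordinates
    and units are hypothetical; real egohoods use geographic
    distance from the residential location."""
    hx, hy = home
    return [p for p in observation_points
            if math.hypot(p[0] - hx, p[1] - hy) <= radius]

# Points at distance 1 and 5 fall inside a radius-5 egohood;
# the point at distance 6 does not.
print(egohood((0, 0), [(1, 0), (3, 4), (6, 0)], 5))
```

Because CSSO extracts features for arbitrary coordinates, the same feature pipeline can then be run over whichever points the chosen spatial unit selects.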

Future work and limitations

This paper presents a first step towards wider implementation of Computational Systematic Social Observation for applications in environmental research considering features associated with people feeling safe and unsafe in public spaces. To further establish this method, future work should consider applications in different contexts. We are certain there are other applications in environmental, criminological, and urban studies that could employ CSSO.

Another point is that we used a pre-trained algorithm provided by Google Vision, which recognises certain elements of the built environment but not others, which may be more relevant to perceived safety. Although an off-the-shelf tool has benefits in terms of cost, and in its having been evaluated for various issues (e.g. [66]), making it a robust choice, it may be unable to pick up on the specific interests of the research question at hand. In our case, the algorithm results are (for the most part) unable to distinguish between different types of urban environment. For example, while certain features were coded as positively associated with fear of crime by both human coders and our algorithm (e.g. wired fences, hazards), more urban elements of the built environment (e.g. building, window, house) were uniformly coded as associated with fear of crime by the algorithm, while our human coders interpreted these with more nuance.
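Working with such off-the-shelf label output typically means reducing raw per-image detections to feature-presence counts per polygon. A minimal sketch of this aggregation step, where the (description, score) pairs are a simplified stand-in for the actual Google Vision response objects:

```python
def feature_presence(label_responses, min_score=0.7):
    """Aggregate per-image label detections into a per-polygon
    count of images containing each label. `label_responses` maps
    a polygon id to a list of per-image label lists, each label a
    (description, score) pair -- a simplified stand-in for the
    structure of Google Vision label-detection output."""
    counts = {}
    for polygon, images in label_responses.items():
        counts[polygon] = {}
        for labels in images:
            # Count each label at most once per image.
            seen = {desc for desc, score in labels if score >= min_score}
            for desc in seen:
                counts[polygon][desc] = counts[polygon].get(desc, 0) + 1
    return counts

sample = {
    "safe_area_1": [[("Tree", 0.91), ("Car", 0.55)], [("Tree", 0.84)]],
    "unsafe_area_1": [[("Car", 0.95)]],
}
print(feature_presence(sample))
```

The confidence threshold (`min_score` here) is one place where a bespoke, domain-trained model could behave quite differently from the pre-trained one.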

This echoes previous work using Google Vision: [67] finds that Google is better at recognising some objects than others. For example, looking at images from news articles, an image of a body being removed from a crime scene is tagged with the terms vehicle, car, profession, and labourer, while the ambulance, the body, and the police tape are all overlooked or unidentifiable.

A custom algorithm, trained to distinguish between different types of building, or to recognise signs of guardianship as well as features of prospect, concealment, and entrapment, could help better distinguish between different areas. We can consider training a bespoke computer vision algorithm as analogous to the training recommended for SSO observers by [17]. This training would only need to be carried out once and could be applied to studies across different domains where CSSO can be applied to identify environmental features of interest in any spatial unit of analysis.

We mentioned the flexibility of the approach with regard to spatial units of analysis, and we could expand upon this to consider spatiotemporal paths. For example, in their examination of burglaries using a Google Street View walk-through, [20] suggest one could “...mimic the journey-to-crime route taken using a virtual GSV walk-through” (p.298). CSSO could extract features for multiple such journeys, building sequences of features associated with the journey to crime, the journey to victimisation, or journeys which lead to fearful experiences, which may cause harm in themselves or prevent future journeys on foot or by other active travel modes. This could offer a complement to the data collection and analysis of studies such as [75], linking real-life experiences with computer-assisted observational work.

While we did not find strong differences between the features extracted from safe and unsafe areas rated by men and by women, in light of more recent research this result might not be so unexpected. [33] find that the effect of biological sex was no longer significant after adding individual characteristics to their model, suggesting that the effect of biological sex on the assessment of environmental safety may be qualified by an indirect effect of these individual characteristics. Comparing only between genders may therefore miss nuance; future work could focus on psychological differences between individuals.

This early work, although promising, has also the following significant limitations:

  • Only physical environmental features, not social ones, are considered. However, from a design, urban planning, and situational prevention point of view, the environment is important, and environmental characteristics matter in certain contexts (e.g. fear of crime).

  • The work only considered a small sample of the available polygons, due to space and time restrictions. A small subset kept the study achievable while still demonstrating whether there is a benefit in the presented approach.

  • The most appropriate step size between images is still not known, and this has significant implications for the size of the extracted dataset.

  • The approach is limited to areas and times where GSV photos have been captured [20]. Although coverage is ever-expanding, the opportunity to carry out CSSO in more remote areas may be limited.
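The step-size limitation above can be made concrete with a toy interpolation along a street segment (hypothetical planar coordinates; not the sampling code used in Section 4.3.1):

```python
import math

def sample_points(start, end, step):
    """Interpolate capture coordinates along a straight street
    segment at a fixed step size. A smaller step requests more GSV
    images; a larger step risks skipping environmental features.
    (The endpoint is dropped when the segment length is not a
    multiple of the step.)"""
    (x0, y0), (x1, y1) = start, end
    length = math.hypot(x1 - x0, y1 - y0)
    n = int(length // step)
    return [(x0 + (x1 - x0) * i * step / length,
             y0 + (y1 - y0) * i * step / length)
            for i in range(n + 1)]

# Halving the step size roughly doubles the number of images.
print(len(sample_points((0.0, 0.0), (10.0, 0.0), 2.0)))  # 6
print(len(sample_points((0.0, 0.0), (10.0, 0.0), 1.0)))  # 11
```

Since the number of images (and hence API cost and processing time, as in Table 5) scales inversely with the step size, choosing it is a trade-off between coverage and resources.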

It is also the case that “The key disadvantage of observational methods in neighbourhood research, of course, is that they cannot capture the theoretical constructs that require resident perspectives. [...] Nevertheless, when used in conjunction with survey-based methods, direct observation can provide an independent source of data that can strengthen inferences about neighbourhood social organisation and its consequences.” [45] (p.11). Our outcome variable is the subjective experience of our participants, asked to draw boundaries around areas they perceive as safe or unsafe. There are extensive discussions of measurement issues related to self-reported fear of crime [76][77][78][79][80], and future work could consider different ways to operationalise our outcome measure, and whether that might alter any conclusions drawn.

Conclusion

This paper introduces Computational Systematic Social Observation (CSSO) as a new approach to automating the collection and identification of environmental characteristics in geographies of interest. Using Google Street View and the Google Vision API, CSSO provides a scalable and replicable approach to analysing built environments, addressing the limitations of traditional methods that rely on manual observation. This approach does not require crowdsourcing, training, or other resources beyond computational power. To test whether it can still be effective for studying crime and place, we explored its ability to identify environmental features associated with safe or unsafe places. The findings reveal the potential of CSSO to identify specific environmental characteristics associated with safe and unsafe places. Specifically, we find that the ability to see a path through the area (linked to the ideas of prospect, refuge, and entrapment) is associated with safe places. However, we also see a lack of contextual information inherent in this approach: while this reduces bias, it also reduces the detail and validity of the findings.

In general, our experimental use of CSSO offers a template for a practical and data-driven approach to studying urban safety through an environmental lens. It has shown good promise as a mechanism for identifying and classifying areas according to whether people feel safe or fearful, based on features of the built environment extracted through computer vision. As highlighted in Section 6.1, this area is fertile and there is great potential for the future. This work has successfully developed and tested a methodology at an appropriate scale to gain confidence in the approach. This innovation contributes to researchers’ and planners’ toolkits for data-driven analysis.

Declarations

Availability of data and materials

All experimental datasets, scripts and software are available from the corresponding author upon request.

Competing interests

The authors declare that they have no competing interests.

Funding

No outside funding was used to support this work.

Authors’ contributions

All authors read and approved the final manuscript.

Appendix

Polygon Information

Table 5 presents the information for each of the 100 polygons, including the number of coordinates generated using the approach presented in Section 4.3.1, the number of images extracted from Google Street View, the combined size of all images, and the processing time in minutes that the technique took to execute.

| Area | Number of Coordinates | Number of Images | Size (MB) | Processing time (min) | Area | Number of Coordinates | Number of Images | Size (MB) | Processing time (min) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 708 | 2751 | 1597 | 149 | 50 | 3338 | 12032 | 7533 | 668 |
| 1 | 741 | 2473 | 1448 | 135 | 51 | 243 | 925 | 563 | 50 |
| 2 | 265 | 725 | 476 | 42 | 52 | 585 | 2278 | 1080 | 115 |
| 3 | 16204 | 41587 | 23763 | 4114 | 53 | 28433 | 80747 | 43595 | 6326 |
| 4 | 850 | 3288 | 1872 | 173 | 54 | 35 | 122 | 71 | 11 |
| 5 | 2139 | 3963 | 2609 | 549 | 55 | 295 | 1129 | 709 | 62 |
| 6 | 2324 | 6710 | 4478 | 557 | 56 | 2392 | 3794 | 2605 | 593 |
| 7 | 2389 | 9138 | 5595 | 464 | 57 | 1356 | 2826 | 1630 | 376 |
| 8 | 14964 | 47344 | 24990 | 3412 | 58 | 118 | 454 | 265 | 23 |
| 9 | 13013 | 38618 | 22697 | 3094 | 59 | 754 | 1605 | 1025 | 92 |
| 10 | 84 | 330 | 193 | 18 | 60 | 2532 | 9904 | 5705 | 516 |
| 11 | 4122 | 7645 | 5124 | 1012 | 61 | 268 | 148 | 76 | 88 |
| 12 | 142 | 175 | 86 | 44 | 62 | 205 | 793 | 465 | 41 |
| 13 | 2728 | 5060 | 3406 | 666 | 63 | 230 | 893 | 525 | 46 |
| 14 | 6235 | 16723 | 9472 | 1549 | 64 | 3413 | 4989 | 3407 | 868 |
| 15 | 29704 | 103247 | 59971 | 6380 | 65 | 85961 | 1875 | 1084 | 182 |
| 16 | 106 | 409 | 231 | 22 | 66 | 401 | 969 | 567 | 106 |
| 17 | 2318 | 4990 | 3141 | 637 | 67 | 125 | 474 | 254 | 29 |
| 18 | 16210 | 52248 | 26652 | 3446 | 68 | 233254 | 51875 | 28724 | 40491 |
| 19 | 1327 | 3918 | 2297 | 264 | 70 | 32057 | 31460 | 16661 | 10415 |
| 20 | 69 | 276 | 180 | 14 | 71 | 153 | 571 | 330 | 35 |
| 21 | 188 | 709 | 424 | 40 | 72 | 86401 | 75133 | 42504 | 11464 |
| 22 | 48 | 112 | 69 | 13 | 73 | 124 | 463 | 283 | 27 |
| 23 | 18 | 72 | 42 | 4 | 74 | 513 | 1337 | 760 | 130 |
| 24 | 61 | 239 | 139 | 13 | 75 | 6645 | 14810 | 8140 | 1807 |
| 25 | 5077 | 9559 | 6270 | 1236 | 76 | 593 | 2300 | 1224 | 116 |
| 26 | 1528 | 1699 | 978 | 136 | 77 | 849 | 2750 | 1563 | 189 |
| 27 | 2589 | 5178 | 3030 | 726 | 78 | 73 | 286 | 169 | 14 |
| 28 | 329 | 1291 | 750 | 63 | 79 | 151 | 561 | 286 | 31 |
| 29 | 1563 | 5584 | 3321 | 326 | 80 | 401 | 1558 | 733 | 83 |
| 30 | 86 | 335 | 195 | 18 | 81 | 612 | 483 | 242 | 132 |
| 31 | 190 | 723 | 378 | 40 | 82 | 76575 | 73593 | 40244 | 11464 |
| 32 | 431 | 1033 | 595 | 114 | 83 | 9272 | 3308 | 1981 | 3254 |
| 33 | 2389 | 2703 | 1848 | 637 | 84 | 18809 | 69930 | 41107 | 3851 |
| 34 | 6062 | 23252 | 13567 | 1204 | 85 | 40833 | 55637 | 32809 | 11464 |
| 35 | 5454 | 18350 | 10917 | 1067 | 86 | 8705 | 30075 | 17609 | 1735 |
| 36 | 277 | 1084 | 667 | 54 | 87 | 1161 | 4330 | 2609 | 243 |
| 37 | 605 | 2373 | 1126 | 116 | 88 | 1481 | 5142 | 3249 | 301 |
| 38 | 52 | 197 | 124 | 10 | 89 | 1481 | 2950 | 1706 | 417 |
| 39 | 384 | 1514 | 791 | 73 | 90 | 7233 | 17756 | 8403 | 1739 |
| 40 | 149 | 580 | 351 | 31 | 91 | 16260 | 54754 | 32459 | 3551 |
| 41 | 420 | 503 | 300 | 134 | 92 | 547 | 1989 | 1108 | 112 |
| 42 | 3972 | 15290 | 9048 | 804 | 93 | 792 | 1286 | 881 | 154 |
| 43 | 8368 | 28742 | 16281 | 1558 | 94 | 327 | 1143 | 682 | 68 |
| 44 | 4321 | 16086 | 9881 | 872 | 95 | 3141 | 10573 | 6531 | 578 |
| 45 | 17869 | 41225 | 23799 | 4772 | 96 | 5303 | 20303 | 11598 | 1050 |
| 46 | 3191 | 343 | 208 | 1150 | 97 | 12198 | 41702 | 23789 | 2617 |
| 47 | 647 | 395 | 200 | 126 | 98 | 28 | 107 | 70 | 6 |
| 48 | 813 | 2085 | 1251 | 115 | 99 | 6048 | 19369 | 11005 | 1377 |
| 49 | 12899 | 37804 | 21833 | 3078 | 100 | 350 | 1129 | 692 | 83 |

Information for each polygon: number of coordinates, number of images, combined size in MB, and processing time in minutes
