Using Computational Systematic Social Observation to Identify Environmental Correlates of Fear of Crime
by Reka Solymosi, Simon Parkinson, Andrea Pődör, Saad Khan, and Muhammad Khan
Published on Sep 02, 2024
Abstract
There is great importance in understanding whether people perceive an environment as safe or unsafe. Perceptions are influenced by the built environment, and through better understanding, design interventions can be made to improve feelings of safety. There is a rich body of research on this topic, yet it requires a lot of manual effort. In this work, we present an approach named Computational Systematic Social Observation (CSSO) to automate the collection and analysis process. The approach uses Google Street View and the Google Vision API to extract characteristics (herein referred to as features) of the built environment, which are used to automate the process of understanding whether people will feel fear or safety. In testing this approach, we extracted ~1.3M images for the 100 locations and identified 297 features of the built environment. A measure of dependency demonstrated that some features are more strongly associated with areas where people express a feeling of safety or fear. Further, through empirical testing, we observe that these features can be used for classification. The results demonstrate the potential of the technique and were compared with human coders. The presented methodology and experimental research provide a foundation for systematic computational observation to identify environmental correlates of fear of crime.
Introduction
Whether people perceive a specific place as safe or not has implications at micro, meso, and macro levels. Perceived insecurity affects individual mental and physical health [1][2] and leads to decreased social interaction [3][4][5]. At the community level, perceived insecurity can affect the real estate market [6] and lead to social stigmatisation of areas [5]. People are less willing to walk or cycle through places thought unsafe, reducing exercise, which leads to negative public health outcomes [3], and producing higher levels of emissions through the choice of non-sustainable travel modes [7][8][9]. Due to its impact at individual, community, and systemic levels, perceived insecurity is important to address. Many studies consider the effect of neighbourhood design on perceived unsafety (e.g., [10][11]). By understanding what features of the environment affect the perception of an area as unsafe, we can inform planning and design initiatives that aim to reduce these negative outcomes.
To understand what elements of the environment are associated with perceived unsafety (or any other social phenomena), researchers must systematically take stock of these features. One approach is to carry out systematic social observation (SSO) of the areas of interest. [12] advocated for a systematic approach to social observation, namely the direct observation of social settings for research purposes. In this case, a systematic approach meant that observation and recording follow explicit rules and allow replication [13]. SSO, therefore, consists of using observations conducted by teams of trained assessors walking (or driving) through the study area, recording or rating characteristics according to pre-established criteria, to catalogue environmental features relevant to the objective of the study [13]. SSO has been widely adopted in social sciences, including criminology, where researchers have used SSO to study environmental factors associated with various topics such as robbery [14], burglary [15], and police deescalation tactics [16].
A frequent criticism of SSO concerns bias in these observations: bias can be introduced by inconsistencies between different observers and can lead to unreliable conclusions [17]. Another issue with SSO is scalability: due to the generally expensive process, the teams of researchers carrying out SSOs usually cover only a subset of an area of interest. Although this subset can be large, for example, [18] assessed 20% of all occupied face blocks in 66 Baltimore neighbourhoods while [13] collected measures of disorder across block faces within 200 census tracts, these remain samples rather than a complete inventory of the entire area of interest. How this sample is chosen can introduce further bias into the SSO process.
One approach to cutting the costs associated with SSO and enabling scaling is to apply SSO to images of the areas of interest rather than carrying out observations through site visits. For example, many researchers have used Google Street View (GSV), a web-based collection of street-level imagery, to virtually “visit” areas and code the features of interest based on these images [19][20][21][22]. Using image-based SSO rather than site visits has been one approach to addressing scalability. However, this approach still requires the time of human raters, retaining the issues of bias and some resource intensity: while the site visits are eliminated, the human coders must still spend time and effort coding the images, so a sample must still be selected, and scaling up costs more human and, therefore, financial resources.
One possible solution is to attempt to automate the entire SSO process by collecting a representative set of images of the areas of interest and then using a computer vision algorithm to extract the present features. This results in a comprehensive and replicable catalogue of features in a particular area, which can be scaled without (much) cost. Some attempts to implement this have been recently published, for example, in relation to extracting characteristics of the built environment associated with crime [23][24], or in looking at specific elements of an environment in relation to fear of crime, such as the effect of green space [25], or the possibility of guardianship through the presence of windows [26]. All of these papers demonstrate specific examples of applying such a method to a specific question. In this paper, we take a step back and detail and evaluate a systematic process to automate SSO, which we term Computational Systematic Social Observation (CSSO). CSSO automates both the data collection process, building a database of images for each area of interest, and the feature extraction process, building a database of features present in these images. By doing so, we open up the steps of this process to a broad range of applications in criminology, and wider social sciences, urban studies, and policy research.
To demonstrate this approach, we apply CSSO to the question: what features are associated with areas labelled safe and unsafe? Using a data set of 49 safe and 51 unsafe areas drawn on a map by residents of Nagykanizsa, Hungary, we applied computational systematic social observation to 1) collect images of the areas and 2) extract the features of these images to a database. In doing so, we first establish a technique for generating a sufficient set of coordinates within each polygon from which to extract Google Street View (GSV) images for analysis. We then query the Google API at each coordinate, acquiring four street-level images at north, east, south, and west orientations. Once a set of GSV images has been extracted to represent the area within each polygon of interest, we apply the pre-trained Google Vision AI computer vision algorithm to detect objects within the GSV images. We then compare the features between the safe and unsafe areas, facilitating an exploratory, data-driven approach to understanding elements of the built environment which impact perceived safety in a fully replicable and scalable manner, and on a large scale which provides comprehensive coverage of the areas of interest.
Background
The role of the built environment in preventing or facilitating crime opportunities, as well as generating feelings of safety or fear, is increasingly better understood. Crime prevention through environmental design (CPTED) highlights the role of urban design in enhancing community safety [27]. “The proper design and effective use of the built environment can lead to a reduction in the fear and incidence of crime, and to an improvement in the quality of life” [28]. It therefore follows that to reduce instances of fear of crime, we must explore which environmental features are associated with perceived (un)safety, to inform these urban regeneration initiatives.
Fear of crime and the environment
Here, we approach fear of crime as an experience which poses a barrier to people living healthy lives, enjoying sustainable travel, and benefiting from these equally. This experiential fear [30] must be understood as a context-specific experience, encountered as people go about their daily routine activities [29]; being situation specific, it can be reduced with environmental design measures. We can consider it an emotional response in which risk perception (informed by cues in the situation) plays a key role [31]. Inferring relevant information from the immediate environment is one of the hallmarks of effective human adaptation [32][33], and we can understand this as influential on context-specific experiences of fear.
Of course, the characteristics of the person “doing” the perceiving also play a role; some variance in the perceived safety of certain areas is due to individual characteristics [31]. However, an even larger portion of the variance can be explained by the characteristics of the environment itself [33]. In this paper, we focus on these environmental factors and, to do so, we must “place” these experiences of fear of crime, i.e., locate them in a specific area. As such, this experiential fear is the focus of our article, aligning us with a definition of fear of crime as a context-specific experience. In this way, various environmental cues lead people to perceive some spaces as more prone to crime than others [34].
Generally speaking, the environmental factors associated with fear of crime in previous research can be grouped into three categories: visibility, physical signs of disorder (environmental incivilities), and social characteristics (e.g., groups of people hanging around, drinking, and rough sleeping). The latter two comprise signal disorders or signal crimes, sometimes called environmental antisocial behaviour or incivilities [35][36], and the social characteristics of areas, which are less about environmental design and more about indicators of activities such as street drinking, panhandling, or groups of people loitering [37]. These physical and social signs of disorder, such as graffiti, litter, vandalism, and other signs of neglect of the environment, are seen as drivers of fear [5]. These elements act as “signals” [36] which communicate to people a lack of commitment to social norms [36][38]. While important, these are not necessarily within the realm of what the urban planner can address, at least not to the extent that visibility is. Therefore, we will focus on visibility, which is summarised by the concepts of prospect, concealment (also called refuge), and entrapment [39][33][40][5].
Prospect refers to the ability of individuals to see the openness of their immediate environment [39], capturing the extent to which an environment offers a good overview to an observer. From an urban design perspective, prospect can be conceptualised as a clear line of sight, including being able to easily see from one side of the street to the other [28]. Better prospect in an area is associated with favourable environmental safety judgements [33]. Darkness (linked to absent or poor lighting) reduces prospect, increasing feelings of fear [40][41][5].
While low prospect concerns overview, concealment suggests the existence of places in which a potential offender could be hiding [42][43][38]. These may include places that are not visible because they are isolated, or sight lines obstructed by vegetation, landscaping, or poorly designed buildings, which are perceived to increase the risk of attack and therefore fear [5]. Better lighting reduces places for would-be offenders to conceal themselves [5]; in this sense, better-lit areas allow for natural surveillance of an environment. Another feature associated with concealment is dense vegetation. Previous studies have found that dense vegetation is related to increased perceived insecurity [39], as it can signal hiding places for potential offenders and poor maintenance of spaces, representing social disorder and threat. In contrast, well-cared-for vegetation, such as tended grassy areas, high-canopy trees, flowers, and low bushes, increases feelings of safety because it communicates social order [38].
Finally, entrapment refers to “the extent to which people judge an environment to possess characteristics that impede escape from dangerous situations” [33]. Returning to lighting, dimly lit places reduce an individual’s field of vision, signalling that the individual may face difficulty escaping when confronted by a potential offender. This idea of entrapment emerges because areas with blocked escape routes are highly associated with fear of crime [38]. Such obstructions to visibility also create the feeling of being ‘trapped’; by contrast, a sense of ‘openness’ in the environment is reassuring [5]. To address entrapment, urban designers can consider the connectivity of a place in terms of the present infrastructure (e.g., footpaths, laneways, or streets) [28].
These characteristics of the built environment, namely prospect (perceived overview of a scene), concealment (perceived environmental affordance of hiding places), and entrapment (perceived escape possibilities), are associated with fear of crime in specific places [33]. So how can we measure these? What specific features of the built environment might be present or absent to create these perceptions? To answer these questions, it is important to build a catalogue of the environmental features present in safe and unsafe places. Traditionally, Systematic Social Observation (SSO) has been the preferred method for conducting environmental audits to gather such data for quantitative analysis.
Systematic social observation
An approach to understanding what it is about the built environment that makes people feel safe or unsafe is to carry out environmental audits of areas considered safe or unsafe. This is by no means the only approach; there are also experimental studies which involve the pre-hoc selection of environmental stimuli designed to elicit responses of (un)safety. In such studies, the participant may be exposed to the stimuli by walking through a pre-determined route which contains features of interest [39][43][44]. Another group of experimental studies exposes participants to relevant stimuli using images or pre-recorded video containing features traditionally associated with fear of crime, and then analyses the relationship between the levels of fear that people indicate and the score of the image on various fear of crime indicators (e.g., [38]).
However, observational studies allow us to explore context-specific experiences in the places through which people’s routine activities take them daily. Observational studies of fear of crime and place involve collecting data on perceived safety in a particular area and then correlating the measures of perceived safety with features of the environments observed retrospectively in these areas. Such studies use direct observation to catalogue the environmental features present. Direct observation is fundamental to the advancement of science [45]. [12] advocated systematic social observation (SSO) as a key measurement strategy for a wide variety of social science phenomena [45]. SSO refers to systematically recording all features of interest following explicit rules that allow replication [12]. It is also important that the means of observation, whether a person or a technology, be independent of that observed [45], and there should usually be multiple observers to allow interrater reliability exercises that validate the SSO data. SSO has been widely adopted in criminological studies, including the evaluation of therapeutic communities for drug offenders [46].
While SSO has been widely adopted and beneficial in social sciences, it is not without limitations. Two key issues associated with Systematic Social Observation (SSO) are observer bias (non-agreement between observers [17] and non-random selection of environments to survey [33]) and scalability [47]. Bias arises from the human observers in SSO projects for various reasons, such as subjectivity (resulting in non-agreement between observers), observer error, and even “cheating” by observers to reduce their workload or to record in a desirable location [48]. [17] explore the issue of bias in SSO coders in detail and recommend extensive training of observers as a remedy.
Another limitation lies in the decisions that researchers must make due to resource constraints, specifically about units of analysis and the sampling frame of areas of interest [48]. Since human observers cannot cover everything within realistic time and cost constraints, they must select a sample that is not truly random, as it is based on on-site decisions [33]. Because human resources are limited, we may therefore see bias in the areas observed with SSO.
A related issue is that of scalability. SSO studies usually require some form of sampling of the areas of interest to make coverage feasible for observers. For example, [45] combine Chicago’s 865 census tracts into 343 neighbourhood clusters (NCs) and then sample only 80 of these for their SSO study. Even so, carrying out SSO on these neighbourhoods was a large undertaking: “Between June and September 1995, observers trained by the National Opinion Research Center (NORC) drove a sport utility vehicle at a rate of five miles per hour down every street within the 80 sample NCs. The composition of the vehicle included a driver, a videographer, and two observers” ([45], p. 13). In this process, “NORC collected data on 14 variables in the 23,816 observer logs with an emphasis on land use, traffic, the physical condition of buildings, and evidence of physical disorder” ([45], p. 13). Even after collection, this data was not fully utilised: “By contrast, because of the expense of first viewing and then coding the videotapes, a random subsample of all face-blocks was selected for coding. Specifically, in those NCs consisting of 150 or fewer face-blocks, all face-blocks were coded” ([45], p. 13), and “[f]rom the videotapes, 126 variables were coded, including detailed information on physical conditions, housing characteristics, businesses, and social interactions occurring on each face-block” ([45], p. 14). Clearly, implementing in-person field audits can be expensive when observations are needed over large or geographically dispersed areas or at multiple points in time. A reliable and more efficient method for observational audits could facilitate extendibility (i.e., expanded geographic and temporal scope) and lead to a more standardised assessment that strengthens the ability to compare results across different regions and studies [47][49].
Computational social science approach
The above section detailed the shortcomings of field audits such as SSO. These direct observations require a visit to each area by trained observers, which can be an expensive and time-consuming method, especially for a large-scale or geographically distributed study [49]. However, the advent of new, freely available remote sensing technologies provided by Google and, more recently, Microsoft offers new possibilities for geospatial data collection, including much greater coverage of many parts of the world [49][50].
Technological advances in research methodologies enable academic researchers and policymakers to scale up investigations into the key issues facing individuals, communities, and societies. In the space of environmental audits, web-based geospatial services are increasingly being used by researchers to perform ‘virtual’ audits of environmental characteristics [49]. These platforms, including Google Street View (GSV), Google Earth, Bing Maps, and others, have been used in a range of domains to identify relevant features of built environments related to a range of outcomes [51]. For example, [20] manually collected a set of Google Street View images of street addresses where burglaries had taken place and conducted Systematic Social Observation (SSO) through GSV to identify commonalities between burgled areas. SSO has also been applied to video recordings, both CCTV and citizen recordings, for example of police interactions [52].
However, while technologies such as GSV speed up the collection of images representing environments that would otherwise have to be visited on foot, they do not remove the requirement for individual raters to go through each image and catalogue the features they find. The researcher’s time remains an issue, and therefore so do the scalability of this approach and the issues of bias. To address these, we can instead use a computer vision algorithm to catalogue the features in the images.
Previous research studying the relationship between crime (including fear of crime) and the built environment has used supervised machine learning approaches to learn relationships within data classified by participants. However, in previous works the environment, often displayed as a street-level image, is collapsed into a small set of features. For example, [41] focus on the colour properties of the image, and [23] extract only eight key environment attributes. Although these works demonstrate the potential of street-level images for understanding fear of crime, there is a substantial opportunity to progress understanding by considering a larger and more diverse set of environmental features. In a similar study, [53] combined an automated image collection approach with a supervised machine learning algorithm, trained via crowd-sourcing through Amazon Mechanical Turk, to identify visible conditions of urban environments at a large scale. In all these cases, supervised machine learning is used for image feature recognition and classification, meaning that the algorithms are trained on features predetermined to be of interest to the researchers. What we do is different: we take a fully exploratory, data-driven, bottom-up approach, with no prior hypothesis and no crowd-sourced training. There is an absence of research exploring the use of generalised pre-trained algorithms (e.g., Google Vision AI) capable of identifying a significantly larger set of features.
In one recent paper, the authors used an approach similar to that adopted here, where Google Street View (GSV) images are acquired for an area and processed to extract features of the environment [24]. The authors focus on Santa Ana in the United States, where they have information on different types of crime. They use DeepLabv3+, a type of Convolutional Neural Network, as their machine learning engine. Their approach captured GSV images every 20 metres in different orientations (north, south, east, and west), before using machine learning to recognise the occurrence of 11 predefined environmental features and relate them to the different crime categories. Other recent work has followed a similar method but with different sets of environmental features [23]. The work presented in this paper follows a substantially different methodology, where the aim is to explore whether any features of the environment influence whether an area is categorised as safe or unsafe. To undertake this, it is necessary to collect more images, as we do not have a precise point-level location of interest, and to extract a larger feature set. In this way, we employ computational systematic social observation at the area level, which could be applied to whole neighbourhoods (or other units of interest).
The current study
In this paper, we demonstrate the use of automatic image extraction within a polygon of interest from GSV, combined with machine learning to identify features of the built environment in these images, by identifying what features are associated with perceptions of safety. By combining the automated collection of images representative of areas relevant to people’s experiences of fear of crime with the automated extraction of specific features present in these areas, we can scale and automate SSO. This approach we term computational systematic social observation. We believe this allows the study of environmental features associated with safe/unsafe places at unprecedented scales. Compared to human coders extracting features from images, Google Vision is much more time efficient. For example, [54] found that Google Vision (GV) took 5 minutes to codify 1,818 images, while the human coder needed 35 hours to complete the same task, making Google Vision 14,880% cheaper. This means that in the same period, many more images can be processed, solving the problem of selecting non-representative samples of areas [33] by auditing the entire area. The issue of coder bias or non-agreement between coders [17] may be addressed in this way as well. While [54] found that only 52.4–65.0% of the images were similarly codified between human raters and the algorithm, they concluded that “even if the human coder generated more diverse and concrete tags, similar conclusions can be extracted.” Replicability is further increased by applying a uniform algorithm that is consistent across all areas, rather than hiring multiple observers to cover different parts of the study area.
Therefore, this paper demonstrates such a computational approach, paired with an automated way to collect images from our desired study areas. We present a large-scale, data-driven approach to extracting and evaluating environmental features related to the perceived (in)security of places. Our approach is based on a purely visual audit of environmental features, rather than making use of additional crowdsourced or big data sources that could be used to catalogue neighbourhood features or characteristics (e.g., [55][56]).
Data and methods
Our research is motivated by two recent works. First, by [23], who identify correlations between attributes of the built environment and crime. Their technique used Google Street View (GSV) images and publicly available police data from a town in northern England. Although the research was exploratory and had to overcome the limitation of handling location-approximate crime data, the technique to acquire images and extract features of the built environment is relevant to this study. The second key work is that by [57], who captured 3,955 polygons representing safe or unsafe areas drawn by 910 respondents in Hungary between January 2016 and March 2019. Note that the authors referred to unsafe areas as those where the participant feels fear; in this research, we use the classification of unsafe throughout. The data acquired in the study by [57] is used in this investigation. This section provides more information on the specifics of the data used in this research, as well as the technical process undertaken to extract street-level images and use the Google Vision API to code features in these images. The following list presents a high-level summary of the activities undertaken in this research and presented in this manuscript.
1. A sample of 100 polygons where people rated areas as safe or unsafe is taken from data collected and published in research undertaken by [57];
2. A technique is developed to calculate longitude/latitude coordinates within each polygon with a distance of 20 m between each;
3. Next, Google Street View is queried at each coordinate facing north, east, south, and west to extract images;
4. The images are then passed through the Google Vision API to extract objects using its pre-trained algorithm;
5. Features are grouped for each entire area and then accumulated for each safety category (safe/unsafe);
6. Finally, a dependency measure is used to understand how strongly associated an object is with whether the area is safe/unsafe.
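The Street View querying step above (four images per coordinate, one per cardinal orientation) can be sketched as follows. The Street View Static API endpoint and its location, heading, size, and key parameters are real, but the helper name, the image size, and the API key shown are placeholders rather than the study's exact settings:

```python
from urllib.parse import urlencode

GSV_ENDPOINT = "https://maps.googleapis.com/maps/api/streetview"
HEADINGS = {"north": 0, "east": 90, "south": 180, "west": 270}

def build_gsv_url(lat, lon, heading_deg, api_key, size="640x640"):
    """Compose a Street View Static API request for one orientation."""
    params = {"location": f"{lat},{lon}", "heading": heading_deg,
              "size": size, "key": api_key}
    return f"{GSV_ENDPOINT}?{urlencode(params)}"

# Four requests per generated coordinate, one per cardinal orientation
urls = [build_gsv_url(46.4590, 16.9897, h, "YOUR_KEY")
        for h in HEADINGS.values()]
```

Each URL returns a single street-level image, so every coordinate generated within a polygon contributes four images to the area's image set.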
As one of the objectives of this paper is to demonstrate this method, each step is detailed in the following subsections.
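The final step, measuring the dependency between a feature's presence and the safe/unsafe label, can be previewed with a small sketch. Cramér's V is used here purely as an illustrative measure computed from a contingency table; the counts are invented, and this is not necessarily the exact measure used in the study:

```python
import math

def cramers_v(table):
    """Cramér's V for an r x c contingency table of counts
    (0 = independent, 1 = perfectly associated)."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    # chi-square statistic from observed vs. expected cell counts
    chi2 = sum(
        (table[i][j] - row_tot[i] * col_tot[j] / n) ** 2
        / (row_tot[i] * col_tot[j] / n)
        for i in range(len(table)) for j in range(len(col_tot))
    )
    k = min(len(table), len(col_tot)) - 1
    return math.sqrt(chi2 / (n * k))

# Hypothetical counts: images containing a given feature (e.g., "graffiti")
#             present  absent
table = [[40, 960],    # safe areas
         [120, 880]]   # unsafe areas
v = cramers_v(table)
```

Computed per feature over the aggregated image sets, such a measure ranks features by how strongly their presence separates safe from unsafe areas.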
Study area and sample
For this study, 100 polygons were used that originate from a study conducted in Nagykanizsa, Hungary [57]. As part of a larger study, participants were asked to draw digital sketch maps using a web application, marking areas where they felt safe with green polygons and areas where they felt unsafe with red polygons. Data collection was carried out online, and respondents accessed the platform via social media without having to register. The study focused on nine Hungarian cities, chosen based on achieving a minimum of 50 respondents per city. We focus on Nagykanizsa, specifically a sample of 100 polygons from this area, drawn by 86 unique individuals (31 female, 55 male). Since these were drawn by local residents and labelled safe or unsafe by them, we take their perceptions as a “ground truth” in that they have indicated that they feel this way and this is a subjective perception. The polygons represent a good split between areas classified as safe (n = 49) and unsafe (n = 51). Polygon number 69 in the data set was excluded because it was too large to constitute a meaningful response [57].
Dependent variable: Safe and Unsafe Places
A digital sketch map tool was presented to participants to draw boundaries for areas where they feel safe or unsafe in their home city [58][57]. Digital sketch maps and mental mapping as scientific tools for detecting citizens’ perceptions of their environment have a long tradition [59][60][61][62]. One advantage of this technique is that it gets around the known issue of people’s experiences and self-defined neighbourhoods not necessarily coinciding with existing administrative boundaries. When collecting self-reported experiences of fear of crime, as with crime, scholars must take seriously the level of aggregation and spatial scale [63]. Such data can be collected at the point level to pinpoint the exact location of the experience of fear of crime, and the environmental characteristics of these specific places can then be studied [34]; however, this precision is better suited to data collected in real time rather than to retrospective reporting [29]. For studies concerned with neighbourhood-level fear and environmental correlates, defining a neighbourhood is not so easy. Most fear of crime studies examining the role of local context use almost exclusively administrative neighbourhoods with fixed boundaries [64]. The use of administrative boundaries is motivated pragmatically, by data availability and, in the case of SSO, by the need for a neat boundary for observers to cover, and it lacks a solid theoretical justification. Administratively defined areas do not necessarily align with how inhabitants experience their unsafety [64]; instead, it is better to take a person–context perspective, that is, to understand how people define their own neighbourhood contexts [65]. Allowing participants to select areas of their city on a map which they find safe or unsafe addresses both the issue of choosing the appropriate unit of analysis for fear of crime and the issues around the Modifiable Areal Unit Problem [65].
These areas can then be used to better understand people’s lived experiences of safe and unsafe areas, creating boundaries that reflect these experiences. For more detail on this method including sample see [57].
Computational Systematic Social Observation
Once the boundaries of safe and unsafe places were collected, we applied CSSO to extract the environmental features present in each polygon. In this section we detail the steps required.
Coordinate generation
To extract Google Street View (GSV) images, it is necessary to have longitude and latitude coordinate values for each location where an image is required. The data set contains the coordinates (corners) of each polygon, but does not contain the coordinate information within the polygon that is necessary for us to retrieve a GSV image. Therefore, we devised the following technique to systematically create longitude and latitude coordinate values within the polygon. In summary, the technique takes a pair of coordinates (start and end) and generates new coordinates at a fixed distance from the previous one, in a straight line towards the end coordinate. The technique is then repeated with all coordinate pairs as they are generated, ensuring that no duplicate coordinates are created at the same location or within the fixed distance. Although this technique is exhaustive, it ensures a systematic approach.
The details of the approach are presented in Algorithm [algo:coordinategeneration]. The algorithm takes as input two sets of real numbers representing latitudes LA ⊆ ℝ and longitudes LO ⊆ ℝ. Both sets are of equal size, and the same index location in each gives a corresponding latitude/longitude pair. The algorithm works by considering each latitude/longitude pair in comparison with each other unique pair. The straight-line difference is calculated and divided into equal 20-metre segments. New latitude and longitude pairs are then generated in equal increments of 20 metres, and the process is repeated until there are no more pairs to consider. At this point, the space becomes saturated and a complete set of coordinates has been generated. An example can be seen in the three images provided in Figure [fig:area0]. Figure 2 shows the five coordinates provided in the original data set; note that only four are visible, as the first and fifth are the same. Figure 2 also illustrates all the points generated within the polygon, but since there are too many to visualise, a magnified excerpt is shown in Figure 3. Noticeably, the technique does not generate the coordinates in a perfect grid formation. This is because the polygon is not perfectly rectangular, and 20-metre increments from different starting coordinates therefore generate locations that differ from their immediate neighbours, which, despite appearing to be on the same line, move slightly in both latitude and longitude. As demonstrated in Table 5, the number of coordinates generated ranges from as few as 18 for area 23 to 233,254 for area 68.
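A minimal sketch of this generation step is shown below. This is our illustrative Python, not the authors' implementation: the function names are ours, and it assumes the common flat-earth approximation of roughly 111,111 metres per degree of latitude (with a cosine correction for longitude).

```python
import math
from itertools import combinations

METERS_PER_DEGREE = 111_111  # rough metres per degree of latitude (assumption)

def intermediate_points(start, end, step_m=20.0):
    """Generate (lat, lon) points every `step_m` metres on the straight
    line from `start` towards `end`, using a flat-earth approximation."""
    lat1, lon1 = start
    lat2, lon2 = end
    # Convert the degree offsets to approximate metres.
    dy = (lat2 - lat1) * METERS_PER_DEGREE
    dx = (lon2 - lon1) * METERS_PER_DEGREE * math.cos(math.radians(lat1))
    dist = math.hypot(dx, dy)
    n_steps = int(dist / step_m + 1e-9)  # how many 20 m increments fit
    points = []
    for i in range(1, n_steps + 1):
        frac = (i * step_m) / dist
        points.append((lat1 + (lat2 - lat1) * frac, lon1 + (lon2 - lon1) * frac))
    return points

def saturate(corners, step_m=20.0):
    """Repeatedly pair up all known coordinates and fill in 20 m
    increments between them until no new (de-duplicated) points appear."""
    seen = {(round(la, 6), round(lo, 6)) for la, lo in corners}
    changed = True
    while changed:
        changed = False
        for a, b in combinations(sorted(seen), 2):
            for p in intermediate_points(a, b, step_m):
                key = (round(p[0], 6), round(p[1], 6))  # ~0.1 m de-duplication
                if key not in seen:
                    seen.add(key)
                    changed = True
    return seen
```

Rounding to six decimal places (about 0.1 m) keeps the point set finite and guarantees termination, mirroring the de-duplication described above.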
Image extraction
Following coordinate generation, the next stage of this research is to take each individual coordinate pair for each polygon and extract a street view image. We use Google Street View (GSV), as it is one of the most up-to-date street view services and can be controlled programmatically through the Google API. We query the API at each coordinate and acquire four street-level images in north, east, south, and west orientations. If a coordinate does not match a valid street location, no images are retrieved; this happens on many occasions, for example where the coordinates fall within a building or in an area of open space. In the interest of saving space, each image was captured at a resolution of 600×800 pixels; this was a necessity given the total number of images and the amount of storage space required. In total, for the 100 areas, we acquired 1,295,298 images occupying 723 gigabytes.
An example is demonstrated in Figure 4, where a location is taken from a valid generated street view location for area 1. As can be seen in the figure, four images have been extracted for that one location in North, East, South, and West orientations. This specific example is on Zrinyi Miklós street which is visible in Figure 1.
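The querying step can be sketched as follows. This is illustrative Python, not the authors' code: the endpoint and parameter names are those of the public Street View Static API, but the API key and exact request settings here are placeholder assumptions.

```python
from urllib.parse import urlencode

BASE = "https://maps.googleapis.com/maps/api/streetview"
HEADINGS = {"north": 0, "east": 90, "south": 180, "west": 270}

def street_view_urls(lat, lon, api_key, size="600x800"):
    """Build the four Street View Static API request URLs (N/E/S/W)
    for one coordinate pair. Returns {orientation: url}."""
    urls = {}
    for name, heading in HEADINGS.items():
        params = urlencode({
            "location": f"{lat},{lon}",
            "heading": heading,
            "size": size,
            "key": api_key,
        })
        urls[name] = f"{BASE}?{params}"
    return urls

def metadata_url(lat, lon, api_key):
    """URL for the metadata endpoint, which can be used to check cheaply
    whether imagery exists at a location before requesting images."""
    return f"{BASE}/metadata?{urlencode({'location': f'{lat},{lon}', 'key': api_key})}"
```

In practice each URL is fetched with a timeout; a location whose request returns nothing within the timeout is treated as not being on a street, as described below.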
As mentioned above, polygon number 69 was removed due to its size. It contains 674,730 generated coordinates, around three times more than the next largest (polygon 68). Given that the data collection phase for polygon 68 took 40,491 minutes, slightly longer than 28 days, polygon 69 would have taken around 78 days. Extraction is slow because each image requires a few seconds to acquire and a timeout duration of 10 seconds is applied at each location: if no image is returned within the timeout, it is established that those coordinates are not on a street.
Table 5 also provides information on how many images were acquired for each polygon. At best, the number is four times the number of coordinates; however, in all instances it is lower, as it depends on how many of the coordinates fall on valid street view locations. If a coordinate does not fall on a valid location, the four directional images at that location are not acquired. The number of images acquired per polygon varies from as few as 72 for polygon 18, occupying 42 megabytes, to 51,875 for area 68, occupying almost 29 gigabytes.
Computer vision
In this research, a pre-trained computer vision technique is used to detect objects within the GSV images. In terms of this research, objects are things that are visible in the GSV image and can be automatically recognised, for example buildings, cars, and trees. In previous and related work, object detection techniques were specifically trained to recognise a subset of environmental features (eight in total) [23]; however, as we are using a pre-trained algorithm in this research, we are able to extract all objects recognisable to the algorithm. We use the Google Vision AI API, as it is one of the more popular off-the-shelf computer vision services for extracting features from images and is therefore widely tested in other domains [66][67][54]. Each object is identified with a percentage confidence indicating how certain the algorithm is that the object is present. In this research, we extract the top 10 objects from each image, provided their confidence scores are above 70%. Figure 5 illustrates an example in which vehicles, buildings, and windows are identified within the image; note that the majority have a confidence above 70%.
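The filtering logic can be sketched as below. This is illustrative Python: in practice the (label, score) pairs come from the Vision API's label detection response, which is not called here.

```python
def top_labels(annotations, k=10, min_score=0.70):
    """Keep the k highest-confidence labels whose score is at least
    `min_score`. `annotations` is a list of (description, score) pairs,
    as returned, in essence, by a label-detection response."""
    kept = [a for a in annotations if a[1] >= min_score]
    kept.sort(key=lambda a: a[1], reverse=True)
    return kept[:k]

# Example with invented scores for one image:
labels = [("Building", 0.95), ("Car", 0.91), ("Sky", 0.99), ("Bicycle", 0.55)]
# 'Bicycle' falls below the 70% threshold and is dropped.
kept = top_labels(labels)
```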
Analytical approach
Validity checks
In order to report on the feasibility of CSSO for environmental audits, we consider a series of validity and reliability checks. We start by addressing face validity. For this, we compare features between safe and unsafe polygons to identify which are more associated with either type. For each identified object, we use a dependency measure to understand how strongly correlated an object is with whether the area is perceived as safe or unsafe. Specifically, we use a χ² statistic to measure the independence between terms and categories, as in text categorisation [68]. The challenge of determining independence and dependence between terms and categories in information retrieval systems shares many characteristics with measuring the relationship between safe or unsafe places and features of the built environment. The χ² statistical measure has many successful applications in data mining and knowledge extraction tasks, particularly in information security [69][70]. In this research, we use a two-way contingency table of feature f and safety category (safe or unsafe) c, where A is the number of times feature f and safety category c co-occur, B is the number of times f occurs without c, C is the number of times c occurs without f, D is the number of times neither f nor c occurs, and N is the number of areas.
χ²(f, c) = N(AD − CB)² / [(A + B)(A + C)(B + D)(C + D)]
The χ² scores for each feature and its relationship with its residing location’s safety category allow us to compute the difference between the scores for the two safety categories using the following equation:
diff(f) = |χ²(f, safe) − χ²(f, unsafe)|
Finally, the average value diff_avg is calculated over all diff(f) scores. We rank the features to identify those with the maximum diff(f) values.
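In code, the dependency measure and the ranking score reduce to a few lines (a sketch; the contingency counts A–D are assumed to have been tallied beforehand from the polygon data):

```python
def chi2(a, b, c, d):
    """χ² dependency between a feature f and a safety category.

    a: f and the category co-occur; b: f occurs without the category;
    c: the category occurs without f; d: neither occurs; n = a+b+c+d.
    """
    n = a + b + c + d
    denom = (a + b) * (a + c) * (b + d) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0

def diff_f(chi2_safe, chi2_unsafe):
    """diff(f) = |χ²(f, safe) − χ²(f, unsafe)|, used to rank features."""
    return abs(chi2_safe - chi2_unsafe)
```

Note that when AD = CB the feature and category are independent and the score is zero; larger scores indicate stronger dependency on one category.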
As demonstrated in other classification tasks using χ² for feature selection, it is useful to categorise the data sets using the top features for each class [71]. To investigate how well these features can be used to categorise an area as ‘safe’ or ‘fear’, the frequency of each of the 297 features is calculated for each polygon, before counting how often the top features presented in Table 1 appear in the top 25 of the full feature list. To determine which category the polygon best aligns with, the average occurrence of the present ‘safe’ or ‘fear’ features in the top 25 is calculated. This provides a measure of how strongly the polygon matches each category.
We then compare the CSSO results with two exercises that employ human coders. First, a traditional SSO site visit to a sample of 6 polygons, 3 marked as safe and 3 marked as unsafe by participants. Second, we presented the list of features to seven experts in place-based fear of crime, who rated each feature as positively or negatively associated with fear of crime based on their understanding of the built environment and perception.
Finally, we go on to test known-group validity. Known-group validity is a form of construct validity where hypotheses are pre-specified and then tested to reflect whether a tool is able to differentiate where differences are expected a priori. Where a statistical difference is found, it supports the validity of the tool and where the differences are not significant, either the tool/item is flawed, the hypothesis flawed, or the power inadequate [72]. In fear of crime studies, gender has been established as a known factor. Therefore, we might expect different results between different genders.
Results
Preprocessing
In total, 319 unique features were identified in images acquired from areas of both classification types (safe, unsafe). On manual inspection, it became immediately apparent that some features, such as ‘Computer’, ‘Text’, and ‘Software’, had been identified from parts of the image showing user interface components of Google Maps. Removing these leaves a total of 297 unique features for further analysis.
In terms of the percentage distribution of the identified features, the same 10 features account for more than 55% of the total number of features identified in each area. The general observation is that, in most instances, there is little identifiable difference between the occurrence of the top 10 features and whether the areas were classified as safe or unsafe. There are some small differences in the interquartile range for features such as ‘nature’ and ‘plant’; however, they have a similar median value for both classification types. For this reason, it is necessary to use a dependency measure between each feature and classification type.
What features are associated with safe/unsafe areas?
To identify which features occur more in safe or unsafe areas, we consider the results of our χ2 statistic to measure the independence between terms and categories.
| Feature | Total Safe | Total Unsafe | χ²(f, safe) | χ²(f, unsafe) | diff(f) | Best Category |
|---|---|---|---|---|---|---|
| Sky | 329316 | 663921 | 0.003556 | 0.063941 | 0.060385 | fear |
| Building | 190619 | 307644 | 0.039673 | 0.092245 | 0.052573 | fear |
| Tree | 248408 | 582299 | 0.051851 | 0.000004 | 0.051848 | safe |
| Cloud | 234848 | 531321 | 0.030527 | 0.000750 | 0.029776 | safe |
| Woody plant | 22873 | 49032 | 0.003444 | 0.022438 | 0.018994 | fear |
| Plant | 292710 | 612018 | 0.009741 | 0.028357 | 0.018616 | fear |
| Land lot | 123114 | 204545 | 0.020347 | 0.038815 | 0.018468 | fear |
| Urban design | 129000 | 224153 | 0.012 | 0.029 | 0.017 | fear |
| Asphalt | 255567 | 527432 | 0.004829 | 0.020101 | 0.015272 | fear |
| Road surface | 158443 | 353442 | 0.013977 | 0.000141 | 0.013837 | safe |
| Infrastructure | 151189 | 334879 | 0.012 | 0.000 | 0.011 | safe |
| Vehicle | 94498 | 155784 | 0.017474 | 0.028266 | 0.010791 | fear |
| Property | 102420 | 177081 | 0.010632 | 0.021190 | 0.010559 | fear |
| Mode of transport | 68449 | 176913 | 0.027528 | 0.017581 | 0.009948 | safe |
| Automotive tire | 13263 | 18655 | 0.021751 | 0.013721 | 0.008030 | safe |
| Thoroughfare | 38234 | 124116 | 0.057188 | 0.049301 | 0.007887 | safe |
| Nature | 90079 | 208084 | 0.011628 | 0.003866 | 0.007761 | safe |
| Biome | 122659 | 270326 | 0.007948 | 0.000351 | 0.007597 | safe |
| Tire | 72192 | 115201 | 0.018477 | 0.025297 | 0.006820 | fear |
| House | 66208 | 102726 | 0.021341 | 0.027506 | 0.006165 | fear |
| Car | 104944 | 198715 | 0.001 | 0.007 | 0.006 | fear |
| Light | 34509 | 63012 | 0.006359 | 0.001134 | 0.005225 | safe |
| Wheel | 62358 | 99538 | 0.016045 | 0.021075 | 0.005030 | fear |
| Ecoregion | 60022 | 97746 | 0.012932 | 0.017291 | 0.004358 | fear |
| Window | 120657 | 242811 | 0.000139 | 0.002618 | 0.002479 | fear |

Table 1: Top 25 features where diff(f) > diff_avg
Table 1 provides the top 25 features, identified where diff(f) > diff_avg. In this experiment, diff_avg is 0.001629 and 25 features have a diff(f) that is larger. The table also indicates for which safety category each feature has the stronger dependency measure: 10 of the features are more strongly dependent on the safe category, while 15 are more strongly dependent on unsafe.
The feature names are those output by the Google Vision API and most are self-explanatory. However, it is very clear that there is a strong overlap between some of the features. For example, ‘Asphalt’ and ‘Road surface’ are clearly similar as asphalt is often used as a road surface. What is surprising here is that ‘Asphalt’ is more strongly associated with fear, whereas ‘Road surface’ is more strongly associated with safe; however, other materials are used for road surfaces, just as asphalt may be used for other purposes. There are also overlaps between ‘Tree’, ‘Nature’, and ‘Ecoregion’ that are clearly all related to nature. However, a similar situation is presented here where ‘Tree’ and ‘Nature’ are both categorised as safe, whereas ‘Ecoregion’ is associated with fear.
To demonstrate that these associations can be used for classification, a simple logic-based classification approach is tested. This is performed by taking the features most strongly associated with safe and fear (the 10 and 15 shown in Table 1, respectively) and determining which of the features occur most frequently in an area. For example, taking the top feature for safe and for fear (‘Tree’ and ‘Sky’, respectively), whichever occurs more often in a region determines whether the area is categorised as ‘safe’ or ‘fear’. The classification is performed for each of the 100 areas. Table 2 presents precision, recall, and F-measure results when performing classification using different feature sets, constructed by taking the top n features, starting at n = 1 and incrementing until all features in the top 25 are used (a maximum of 10 safe and 15 fear features). The imbalance in the number of features between the two categories is handled by using the average feature occurrence to determine which category an area matches best. The results are interesting and demonstrate that the best performance is achieved using only two features: ‘Tree’ for safe polygons and ‘Sky’ for unsafe polygons. The F-measure decreases as the feature sets grow, demonstrating that only the features with a strong dependency measure have predictive power. Using only the two features produces good precision (i.e., minimising false positives) and recall (i.e., minimising false negatives). Recall improves incrementally until the feature set reaches 6, but to the detriment of precision. The F-measure (the harmonic mean of precision and recall) is used to demonstrate overall capability, as we are interested in achieving both good precision and recall; Table 2 shows that it is highest for the first two rows only.
It is interesting to see the capabilities rapidly diminishing with an increased number of features, which demonstrates that the classification problem can be reduced to a single binary decision based on whether ‘Tree’ or ‘Sky’ are more common.
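The decision rule just described can be sketched as follows. This is illustrative Python: the two-feature sets are taken from Table 1, while the count values in the usage example are invented for demonstration.

```python
from collections import Counter

# Top dependency-ranked features from Table 1 (illustrative subsets).
SAFE_FEATURES = ["Tree", "Cloud"]
FEAR_FEATURES = ["Sky", "Building"]

def classify(feature_counts, safe=SAFE_FEATURES, fear=FEAR_FEATURES, top_n=25):
    """Label a polygon 'safe' or 'fear' by comparing the average count of
    each category's key features among the polygon's top_n most frequent
    features. Averaging handles the unequal sizes of the two sets."""
    top = dict(Counter(feature_counts).most_common(top_n))
    safe_hits = [top[f] for f in safe if f in top]
    fear_hits = [top[f] for f in fear if f in top]
    safe_avg = sum(safe_hits) / len(safe_hits) if safe_hits else 0.0
    fear_avg = sum(fear_hits) / len(fear_hits) if fear_hits else 0.0
    return "safe" if safe_avg >= fear_avg else "fear"

# Invented per-polygon feature counts:
label = classify({"Tree": 500, "Cloud": 450, "Sky": 300, "Building": 200})
```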
| Safe Feature Set | Fear Feature Set | Precision | Recall | F-Measure |
|---|---|---|---|---|
| Tree | Sky | 97.8 | 89.8 | 93.6 |
| Tree, Cloud | Sky, Building | 95.5 | 85.7 | 90.3 |
| Tree, Cloud, Road surface | Sky, Building, Woody plant | 56.8 | 93.9 | 70.8 |
| Tree, Cloud, Road surface, Infrastructure | Sky, Building, Woody plant, Plant | 57.3 | 95.9 | 71.8 |
| Tree, Cloud, Road surface, Infrastructure, Mode of transport | Sky, Building, Woody plant, Plant, Land lot | 55.4 | 93.9 | 69.7 |
| Tree, Cloud, Road surface, Infrastructure, Mode of transport, Automotive tire | Sky, Building, Woody plant, Plant, Land lot, Urban design | | | |

Table 2: Results of classification based on top 25 features
Comparing with human observers and coders
To compare the CSSO process with traditional SSO, we performed two exercises. The first was a site visit and SSO carried out for 6 polygons, 3 safe and 3 unsafe. One coder was given training based on the existing knowledge (summarised earlier in the paper) of environmental features associated with fear of crime. They were provided a coding form consisting of a spreadsheet with a dropdown menu from which they could select features from the list of all features identified by the computational SSO process. They were asked to catalogue every item they encountered on the site visit using this tool. In addition, they were asked to give each feature an unsafety score: +1 if it is thought to increase fear, 0 if neutral, and -1 if it is thought to decrease fear (increase safety). They were also provided free-text fields to record additional observations.
Overall, far fewer features were coded for each polygon by the human coder. The average number of features recorded across the 6 sites was 14.5 (sd = 2.7), with a minimum of 10 and a maximum of 18. The scoring of these features reveals the importance of context in understanding them: the same feature was often coded as positive, negative, or neutral depending on its condition. For example, "bench" was rated as positive (associated with more fear) in polygon 28 (an area marked as unsafe by the original participants) with the note "Benches are in very bad condition", but rated as negative (associated with less fear) in polygon 39 (an area marked as safe by the original participants), where they were noted to be "Painted benches in good condition along the main road".
Summing the feature scores for each polygon confirms this, showing that the unsafe polygons received a total score of 0, while the safe polygons received -25. However, all polygons contained both safe and unsafe features. The range of feature scores in each polygon is shown in Figure 6.
Finally, there were additional features the human coder wanted to note despite not finding a relevant matching feature in the dropdown menu, selecting the option “NA” and using the free-text box instead. These all related to possibly more transient elements of the environment, such as litter, rubbish, and levels of crowding. Additionally, the free-text comments showed that even where a relevant category was selected, it was still caveated with free text. For example, the feature “passenger” was selected to describe the presence of "many homeless and minority people". While this is not the remit of urban design and planning, it was impossible for the human assessor to exclude these relevant environmental features when assessing what might be relevant to perceived safety or unsafety in an area.
Secondly, we wanted to understand how the results from the CSSO compare with the extant knowledge from previous research about environmental features and fear of crime. To answer this, we took the rating of the features by seven independent coders and compared it with the rating of the algorithm (described above). First, we consider some aggregate measures. To do this, we created a summative score from the ratings of the coders: for example, if a coder rated a feature as negatively associated with fear of crime, it was given a score of -1, so if all coders rated the feature this way, the total score would be -7. We compare this with the diff(f) scores from the above analysis, weighted as positive when in favour of unsafe areas (positively associated with fear of crime) and negative when associated with safe areas (negatively associated with fear of crime). A Pearson correlation suggests a statistically significant positive association between the human coders and the results of our CSSO (t = 2.185, df = 305, p-value = 0.02965), but the effect size is quite small (95 percent confidence interval: 0.0124–0.2329). Perhaps more meaningful is to visualise where the coding diverges and aligns. Figure 7 illustrates features on which the human coders disagree with the results from the CSSO. These seem to be features of the built environment associated with urban areas (Building, Window, Property, House). While in the human coders’ interpretation these were labelled “safe” (they may be aligned with guardianship), in our observational data these features occurred more in polygons rated as unsafe. On the other hand, thoroughfare and road surface were rated as positively associated with fear by the experts, while these features appeared more in safe polygons.
Figure 8 highlights the points of agreement between the coders and our results. We see that coders identified features associated with nature and light as negatively associated with fear of crime, that is, more present in areas rated as safe. On the other hand, signs of hostile urban environments (Fence, Hazard, Wire Fence) and lack of light (Shade) were positively associated with fear of crime (found more in the unsafe polygons).
We can also consider some inter-rater reliability measures between the raters and the CSSO results. Computing an extended percentage agreement treating all the human coders as one (and therefore still relying on the summative score to assign each feature to the ‘safe’ or ‘unsafe’ category) shows a 37.8% agreement. An unweighted Cohen’s κ for two raters shows statistically significant agreement (p-value = 0.0279); however, once again the coefficient is rather small (κ=0.0647). κ is simply the proportion of agreement after the chance agreement is removed from consideration. When the agreement obtained equals the chance agreement, κ=0. Our value of 0.0647 while positive, is not much greater than chance (perfect agreement means κ=1) [73].
However, we lose data by combining our raters into one category. Instead, we can consider inter-rater reliability methods for multiple raters. First, we consider only our human raters. Fleiss’s κ measures the degree of agreement between three or more raters who assign categorical ratings to a set of items; the coefficient ranges from 0 (no agreement at all) to 1 (perfect inter-rater agreement). Taking only our 7 human raters, we see “Fair” agreement between raters (κ = 0.238, z = 24.6, p-value < 0.001); adding our CSSO as a coder, this drops slightly (κ = 0.208, z = 24.5, p-value < 0.001) but remains within the “Fair” category. Evidently, agreement among our human raters is itself limited, and the CSSO results do not reduce it by much. In fact, repeating Fleiss’s κ while excluding one rater at a time shows that excluding rater number 4 yields better results than excluding the algorithm (κ = 0.246, z = 24.6, p-value < 0.001).
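Fleiss's κ itself is straightforward to compute from a table of category counts per item (a sketch; the ratings matrix in the usage example is a toy illustration, not our coders' data):

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for an items x categories count matrix, where
    ratings[i][j] is how many raters put item i in category j and every
    item is rated by the same number of raters."""
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    n_total = n_items * n_raters
    # Mean per-item agreement P-bar.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ) / n_items
    # Chance agreement P-bar-e from the category marginals.
    p_e = sum(
        (sum(row[j] for row in ratings) / n_total) ** 2
        for j in range(len(ratings[0]))
    )
    return (p_bar - p_e) / (1 - p_e) if p_e != 1 else 1.0

# Toy example: 3 items, 3 raters, 2 categories, perfect agreement.
kappa = fleiss_kappa([[3, 0], [0, 3], [3, 0]])
```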
Overall, the interrater reliability measures indicate low consistency between the raters. However, there is some positive association between CSSO results and the expert reviews, and the inter-rater reliability is an issue within the human coders as much as between them and the algorithm. Of course, this is not the same as IRR in a traditional SSO exercise because the specific items observed by the algorithm were not seen by the human coders. Rather, this tells us to what extent the results are consistent with previous work in the assessment of the coders.
Known group validity
Another approach to testing validity is to consider known-group validity. Specifically, due to previous work on the perception of safety, we might expect there to be a difference between the experiences of men and women a priori. Therefore, we consider whether we find anything different between what is associated with safe and unsafe places drawn by male versus female participants.
The previous section established that a dependency measure can identify the features that most strongly define an area type, and that these can be used for classification. In this section, the same process is repeated using four categories: (1) Safe Polygon / Male, (2) Safe Polygon / Female, (3) Fear Polygon / Male, and (4) Fear Polygon / Female, where the gender denotes whether a male or female participant drew the polygon. The objective of this analysis is to establish whether we can go further than the previous analysis and identify key features based on gender and polygon type.
| Safe Polygon / Male | χ² | Safe Polygon / Female | χ² | Fear Polygon / Male | χ² | Fear Polygon / Female | χ² |
|---|---|---|---|---|---|---|---|
| Road surface | 0.062 | Car | 0.058 | Urban design | 0.172 | Infrastructure | 0.137 |
| Tire | 0.069 | Natural landscape | 0.019 | Wheel | 0.114 | Biome | 0.087 |
| Property | 0.041 | Vegetation | 0.007 | Vehicle | 0.148 | Fixture | 0.082 |
| Mode of transport | 0.065 | Road | 0.005 | Window | 0.112 | Moon | 0.117 |
| Motor vehicle | 0.033 | Cloud | 0.102 | Natural environment | 0.080 | Land lot | 0.034 |
| Nature | 0.082 | Door | 0.048 | Grass | 0.025 | Thoroughfare | 0.060 |
| Light | 0.046 | Line | 0.023 | Automotive mirror | 0.028 | Horizon | 0.067 |
| Facade | 0.033 | Table | 0.038 | Condominium | 0.032 | Lighting | 0.022 |
| Real estate | 0.021 | City | 0.075 | Woody plant | 0.018 | House | 0.019 |
| Automotive parking light | 0.013 | Land vehicle | 0.019 | Architecture | 0.012 | Shade | 0.019 |
| Street light | 0.012 | Automotive lighting | 0.019 | Ecoregion | 0.011 | Shrub | 0.016 |
| Vehicle registration plate | 0.008 | Automotive design | 0.016 | Residential area | 0.008 | Slope | 0.014 |
| Automotive tail & brake light | 0.005 | Awning | 0.018 | Terrestrial plant | 0.005 | World | 0.014 |
| Product | 0.005 | Cottage | 0.009 | Automotive tire | 0.008 | Interior design | 0.007 |
| Pole | 0.007 | Forest | 0.007 | Home door | 0.017 | | |

Table 3: Top features for each polygon type and gender combination where diff(f) > diff_avg
Table 3 provides the χ² scores for the features identified where diff(f) > diff_avg, following the same process as described in Section 5.2. As is evident, there is a different number of features for each gender and polygon combination, and the χ² scores indicate very weak dependency. In addition, this grouping of features appears contradictory and unexpected: for example, ‘Road surface’ is the strongest feature for Safe/Male, yet ‘Urban design’ is the top feature for Fear/Male. Classification is then performed based on these features (the same process as in Table 2, Section 5.2) to determine which features yield the best classification accuracy.
The results demonstrate that using only the first feature for each combination yields the best classification capability. However, the results are generally poor, at around 60% average F-measure when using the feature set containing one feature per polygon type: ‘Road surface’ for Safe/Male, ‘Car’ for Safe/Female, ‘Urban design’ for Fear/Male, and ‘Infrastructure’ for Fear/Female. These relationships are not as expected, and the low dependency scores demonstrate that these features are not suitable as key classification features. Interestingly, precision varies significantly while recall remains fairly consistent, indicating that few false negatives are made, whereas a large number of false positives depress precision.
| Feature Set | Polygon Type | Precision | Recall | F-Measure |
|---|---|---|---|---|
| Road surface | Safe/Male | 39.2 | 99.1 | 56.19 |
| Car | Safe/Female | 46.7 | 98.3 | 63.30 |
| Urban design | Fear/Male | 42.9 | 99.3 | 59.88 |
| Infrastructure | Fear/Female | 87.5 | 98.8 | 92.82 |
| Road surface, Tire | Safe/Male | 25.7 | 97.6 | 40.70 |
| Car, Natural landscape | Safe/Female | 16.7 | 88.9 | 28.07 |
| Urban design, Wheel | Fear/Male | 42.9 | 99.1 | 59.84 |
| Infrastructure, Biome | Fear/Female | 94.3 | 98.4 | 96.30 |
| Road surface, Tire, Property | Safe/Male | 40.0 | 98.8 | 56.94 |
| Car, Natural landscape, Vegetation | Safe/Female | 5.9 | 90.5 | 11.05 |
| Urban design, Wheel, Vehicle | Fear/Male | 57.5 | 98.5 | 72.61 |
| Infrastructure, Biome, Fixture | Fear/Female | 87.0 | 97.7 | 92.03 |
| Road surface, Tire, Property, Mode of transport | Safe/Male | 47.1 | 98.0 | 63.59 |
| Car, Natural landscape, Vegetation, Road | Safe/Female | 4.8 | 92.0 | 9.06 |
| Urban design, Wheel, Vehicle, Window | Fear/Male | 62.5 | 98.4 | 76.46 |
| Infrastructure, Biome, Fixture, Moon | Fear/Female | 72.0 | 98.4 | 83.17 |

Table 4: Results from performing classification using an increasing number of the top features for each gender and polygon combination. Only results up to a set size of 4 are presented, beyond which the F-measure scores continue to deteriorate.
Overall, however, these results do not suggest that male and female participants differ in which features are associated with their safe and unsafe places. Whether this reflects a failure of the algorithm to detect such differences, or a genuine absence of gender differences in the perception of safety of places, we cannot answer here.
Discussion
In this paper, we introduce the process of Computational Systematic Social Observation, as an approach to carrying out replicable environmental audits of large and bespoke areas, in order to facilitate an understanding of the relationship between environmental features and people’s perceptions of safety. Specifically, we explore whether we can gain meaningful insight without calibrating the models with training from either researchers or through crowdsourcing, rather seeing if it is possible to employ off-the-shelf solutions to achieve a fully data-driven, bottom-up approach with no a priori hypothesis, and how that holds up against expected results based on the extant literature.
Therefore, the contribution of the paper is two-fold. First, we present an exploration of people’s self-defined safe/unsafe areas in relation to the environmental features which characterise these places. We find that some features are more strongly associated with safe or unsafe places, and further demonstrate that these associations can be used for classification using a simple logic-based approach. The results can be interpreted through the theoretical framework of place-based fear of crime studies, which considers prospect, refuge, and entrapment as the main drivers of perceived (un)safety. The features associated with safety (Light, Thoroughfare, Mode of transport, Infrastructure, Road surface) all suggest prospect: good visibility, the ability to see through a place, and perceived through-passage (especially thoroughfare). However, nature words are associated with both safe (Tree, Nature, Biome) and unsafe places (Woody plant, Plant, Ecoregion). This raises the important point of context: the approach cannot tell us about the quality of the features, only that they are present. This also became evident from the comparison with the traditional SSO site visits, where the same feature (e.g., a bench) was coded as positively or negatively associated with fear of crime depending on its condition, appearance, or context. This suggests that human interpretation is required for the results to be truly useful and meaningful, at least in this case.
Second, we evaluated our novel approach through validity and reliability exercises. We found that, in general, there is a positive association between the coding of features as safe or unsafe achieved by associating the identified and extracted features with the safe/unsafe areas, and the coding of the same features by experts in built environment perception. Although the inter-rater reliability scores were generally low between the individual raters and the algorithm, they were also low between the raters themselves. This reinforces the point made by [17] that strong training and supervision are required for SSO, making it an expensive and labour-intensive process. This is something the automation process can address, allowing us to draw inferences from data on much larger scales. Thereby, CSSO can scale up SSO to provide a faster (and cheaper) way to observe larger areas and identify environmental features associated with areas perceived as safe or unsafe. Perhaps approaches which combine this method with crowdsourced calibration from human coders (e.g. [41]) are one solution. One particular advantage here was how the data collection steps allowed us to collect systematic observations for entire areas of interest, the definitions of which can be flexible. This method could easily be used for street segments or for statistical or other neighbourhood boundaries, but it can also be used for more flexible, person-specific definitions of the neighbourhood, such as egohoods: individualised context measures based on the residential location of a person [74][64]. Therefore, CSSO allows for measuring environmental correlates of various outcomes (health, crime, fear of crime, etc.) at any unit of analysis, including more robust, person-centric measures of neighbourhood such as egohoods.
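Inter-rater reliability between any two coders (human or algorithm) can be checked with Cohen’s kappa, which corrects raw agreement for agreement expected by chance. A minimal stdlib sketch, using hypothetical safe/unsafe codes rather than the study’s actual ratings:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters over the same items,
    corrected for chance agreement."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal category frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical safe (S) / unsafe (U) codes for five features,
# from the algorithm and from one human coder.
algorithm = ["S", "S", "U", "U", "S"]
human     = ["S", "U", "U", "U", "S"]
print(round(cohens_kappa(algorithm, human), 2))  # 0.62
```

Values near 0 indicate little agreement beyond chance, which is the pattern reported here both between raters and between each rater and the algorithm.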
Future work and limitations
This paper presents a first step towards wider implementation of Computational Systematic Social Observation for applications in environmental research considering features associated with people feeling safe and unsafe in public spaces. To further establish this method, future work should consider applications in different contexts. We are certain there are other applications in environmental, criminological, and urban studies that could employ CSSO.
Another point is that here we used a pre-trained algorithm provided by Google Vision, which recognises certain elements of the built environment but not others, which may be more relevant to perceived safety. Although using an off-the-shelf solution has benefits, both in terms of cost and in its having been evaluated for various issues (e.g. [66]), making it a robust choice, it may be unable to pick up on the specific interests of the research question at hand. In our case, the algorithm results are (for the most part) unable to distinguish between different types of urban environments. For instance, while certain features were coded as positively associated with fear of crime by both human coders and our algorithm (e.g. wired fences, hazards), more urban elements of the built environment (e.g. building, window, house) were uniformly coded as associated with fear of crime by the algorithm, while our human coders interpreted these with more nuance.
This was also found in previous work using Google Vision: [67] finds that Google is better at recognising some objects than others. For example, looking at images from news articles, an image of a body being removed from a crime scene is tagged with the terms vehicle, car, profession, and labourer, while the ambulance, the body, and the police tape are all overlooked or not identified.
A custom algorithm, trained to distinguish between different types of building, or to recognise signs of guardianship as well as features of prospect, concealment, and entrapment, could help better distinguish between different areas. Training a bespoke computer vision algorithm can be thought of as analogous to the training recommended for SSO observers by [17]. This training would only need to be carried out once, and could be applied to studies across different domains where CSSO is used to identify environmental features of interest in any spatial unit of analysis.
We mentioned the flexibility of the approach to various spatial units of analysis, and we could expand upon this to consider spatiotemporal paths. For example, in their examination of burglaries using a Google Street View walk-through, [20] suggest researchers “...mimic the journey-to-crime route taken using a virtual GSV walk-through” (p.298). CSSO could extract features for multiple such journeys, building sequences of features associated with the journey to crime, the journey to victimisation, or journeys which lead to fearful experiences, which may cause harm in themselves or prevent future journeys on foot or by other active travel modes. This could offer a useful complement to the data collection and analysis of studies such as [75], linking real-life experiences with computer-assisted observational work.
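One way such a virtual walk-through could be automated is to interpolate sample points along the route at a fixed step before querying the imagery and vision APIs. The sketch below is an illustrative assumption, not the paper’s implementation: the function name, waypoints, and flat-earth approximation are all hypothetical.

```python
import math

def sample_route(waypoints, step_m, metres_per_degree=111_320):
    """Interpolate (lat, lon) sample points every `step_m` metres along a
    polyline of waypoints. Uses a flat-earth approximation, which is
    adequate for short urban routes."""
    points = [waypoints[0]]
    residual = step_m  # distance remaining until the next sample
    for (lat1, lon1), (lat2, lon2) in zip(waypoints, waypoints[1:]):
        seg = math.hypot(lat2 - lat1, lon2 - lon1) * metres_per_degree
        if seg == 0:
            continue  # skip duplicate waypoints
        d = residual
        while d <= seg:
            t = d / seg
            points.append((lat1 + t * (lat2 - lat1), lon1 + t * (lon2 - lon1)))
            d += step_m
        residual = d - seg  # carry leftover distance into the next segment
    return points

# A short hypothetical route sampled every 20 metres; each resulting
# point would become one Street View request in the pipeline.
route = sample_route([(53.4808, -2.2426), (53.4832, -2.2410)], step_m=20)
print(len(route))
```

The sequence of label sets extracted at these points is then the feature “journey” described above.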
While we did not find strong differences in the features extracted from safe and unsafe areas rated by men and women, in light of more recent research this result might not be so unexpected. [33] found that the effect of biological sex was no longer significant after adding individual characteristics to their model, suggesting that the effect of biological sex on the assessment of environmental safety may be qualified by an indirect effect of these individual characteristics. If we compare only between genders, we may miss this nuance; future work could focus on psychological differences instead.
This early work, although promising, also has the following significant limitations:
Only physical environmental features, not social ones, are considered. However, from a design, urban planning, and situational prevention point of view, the environment is important, and environmental characteristics matter in certain contexts (e.g., fear of crime).
The work only considered a small sample of the available polygons, due to space and time restrictions. A small subset kept the study achievable while still demonstrating whether there is a benefit in the presented approach.
The most appropriate step size between images is still not known, and this has significant implications for the size of the resulting dataset.
The approach is limited to areas and times where GSV photos have been captured [20]. Although coverage is ever-expanding, the opportunity to carry out CSSO in more remote areas may be limited.
It is also the case that “The key disadvantage of observational methods in neighbourhood research, of course, is that they cannot capture the theoretical constructs that require resident perspectives. ... Nevertheless, when used in conjunction with survey-based methods, direct observation can provide an independent source of data that can strengthen inferences about neighbourhood social organisation and its consequences.” [45] (p.11). Our outcome variable is the subjective experience of our participants, who were asked to draw boundaries around areas they perceive as safe or unsafe. There are extensive discussions on measurement issues related to self-reported fear of crime [76][77][78][79][80], and future work could consider different ways to operationalise our outcome measure, and whether that might alter any conclusions drawn.
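The step-size limitation above can be made concrete with a back-of-envelope calculation; all numbers here are illustrative, since the paper does not fix a road length or heading count.

```python
# Back-of-envelope: how the step size between images drives dataset size.
# Road length and heading count are illustrative, not the paper's values.
def images_for_area(road_length_km, step_m, headings=4):
    """Images captured when taking `headings` views every `step_m` metres
    along `road_length_km` of road network."""
    points = int(road_length_km * 1000 / step_m)
    return points * headings

# Halving the step size roughly doubles collection and processing cost.
print(images_for_area(50, 20))  # 10000
print(images_for_area(50, 10))  # 20000
```

This linear trade-off between coverage density and data volume is visible in the appendix table, where image counts and processing times scale with polygon size.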
Conclusion
This paper introduces Computational Systematic Social Observation (CSSO) as a new approach to automating the collection and identification of environmental characteristics in geographies of interest. Using Google Street View and the Google Vision API, CSSO provides a scalable and replicable approach to analysing built environments, addressing the limitations of traditional methods that rely on manual observation. This approach does not require crowdsourcing, training, or other resources beyond computational power. To test whether it can still be effective for studying crime and place, we explored its ability to identify environmental features associated with safe or unsafe places. The findings reveal the potential of CSSO to identify specific environmental characteristics associated with safe/unsafe places. Specifically, we find that the ability to see a path through the area (linked to the ideas of prospect, refuge, and entrapment) is associated with safe places. However, we also see a lack of contextual information inherent in this approach, which reduces bias but also reduces detail and, with it, the validity of the findings.
In general, our experimental use of CSSO offers a template for a practical and data-driven approach to studying urban safety through an environmental lens. It has shown good promise as a mechanism for identifying and classifying areas according to whether people feel safe or fearful, based on features of the built environment extracted through computer vision. As highlighted in Section 6.1, this is a fertile area with great potential for future work. This work has successfully developed and tested a methodology at an appropriate scale to gain confidence in the approach. This innovation contributes to researchers’ and planners’ toolkits for data-driven analysis.
Declarations
Availability of data and materials
All experimental datasets, scripts and software are available from the corresponding author upon request.
Competing interests
The authors declare that they have no competing interests.
Funding
No outside funding was used to support this work.
Authors’ contributions
All authors read and approved the final manuscript.
Appendix
Polygon Information
Table 5 presents the information for each of the 100 polygons, including the number of coordinates generated using the approach presented in Section 4.3.1, the number of images extracted from Google Street View, the combined size of all images in MB, and the processing time in minutes that the technique took to execute.
| Area | Coordinates | Images | Size (MB) | Time (min) | Area | Coordinates | Images | Size (MB) | Time (min) |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 708 | 2751 | 1597 | 149 | 50 | 3338 | 12032 | 7533 | 668 |
| 1 | 741 | 2473 | 1448 | 135 | 51 | 243 | 925 | 563 | 50 |
| 2 | 265 | 725 | 476 | 42 | 52 | 585 | 2278 | 1080 | 115 |
| 3 | 16204 | 41587 | 23763 | 4114 | 53 | 28433 | 80747 | 43595 | 6326 |
| 4 | 850 | 3288 | 1872 | 173 | 54 | 35 | 122 | 71 | 11 |
| 5 | 2139 | 3963 | 2609 | 549 | 55 | 295 | 1129 | 709 | 62 |
| 6 | 2324 | 6710 | 4478 | 557 | 56 | 2392 | 3794 | 2605 | 593 |
| 7 | 2389 | 9138 | 5595 | 464 | 57 | 1356 | 2826 | 1630 | 376 |
| 8 | 14964 | 47344 | 24990 | 3412 | 58 | 118 | 454 | 265 | 23 |
| 9 | 13013 | 38618 | 22697 | 3094 | 59 | 754 | 1605 | 1025 | 92 |
| 10 | 84 | 330 | 193 | 18 | 60 | 2532 | 9904 | 5705 | 516 |
| 11 | 4122 | 7645 | 5124 | 1012 | 61 | 268 | 148 | 76 | 88 |
| 12 | 142 | 175 | 86 | 44 | 62 | 205 | 793 | 465 | 41 |
| 13 | 2728 | 5060 | 3406 | 666 | 63 | 230 | 893 | 525 | 46 |
| 14 | 6235 | 16723 | 9472 | 1549 | 64 | 3413 | 4989 | 3407 | 868 |
| 15 | 29704 | 103247 | 59971 | 6380 | 65 | 85961 | 1875 | 1084 | 182 |
| 16 | 106 | 409 | 231 | 22 | 66 | 401 | 969 | 567 | 106 |
| 17 | 2318 | 4990 | 3141 | 637 | 67 | 125 | 474 | 254 | 29 |
| 18 | 16210 | 52248 | 26652 | 3446 | 68 | 233254 | 51875 | 28724 | 40491 |
| 19 | 1327 | 3918 | 2297 | 264 | 70 | 32057 | 31460 | 16661 | 10415 |
| 20 | 69 | 276 | 180 | 14 | 71 | 153 | 571 | 330 | 35 |
| 21 | 188 | 709 | 424 | 40 | 72 | 86401 | 75133 | 42504 | 11464 |
| 22 | 48 | 112 | 69 | 13 | 73 | 124 | 463 | 283 | 27 |
| 23 | 18 | 72 | 42 | 4 | 74 | 513 | 1337 | 760 | 130 |
| 24 | 61 | 239 | 139 | 13 | 75 | 6645 | 14810 | 8140 | 1807 |
| 25 | 5077 | 9559 | 6270 | 1236 | 76 | 593 | 2300 | 1224 | 116 |
| 26 | 1528 | 1699 | 978 | 136 | 77 | 849 | 2750 | 1563 | 189 |
| 27 | 2589 | 5178 | 3030 | 726 | 78 | 73 | 286 | 169 | 14 |
| 28 | 329 | 1291 | 750 | 63 | 79 | 151 | 561 | 286 | 31 |
| 29 | 1563 | 5584 | 3321 | 326 | 80 | 401 | 1558 | 733 | 83 |
| 30 | 86 | 335 | 195 | 18 | 81 | 612 | 483 | 242 | 132 |
| 31 | 190 | 723 | 378 | 40 | 82 | 76575 | 73593 | 40244 | 11464 |
| 32 | 431 | 1033 | 595 | 114 | 83 | 9272 | 3308 | 1981 | 3254 |
| 33 | 2389 | 2703 | 1848 | 637 | 84 | 18809 | 69930 | 41107 | 3851 |
| 34 | 6062 | 23252 | 13567 | 1204 | 85 | 40833 | 55637 | 32809 | 11464 |
| 35 | 5454 | 18350 | 10917 | 1067 | 86 | 8705 | 30075 | 17609 | 1735 |
| 36 | 277 | 1084 | 667 | 54 | 87 | 1161 | 4330 | 2609 | 243 |
| 37 | 605 | 2373 | 1126 | 116 | 88 | 1481 | 5142 | 3249 | 301 |
| 38 | 52 | 197 | 124 | 10 | 89 | 1481 | 2950 | 1706 | 417 |
| 39 | 384 | 1514 | 791 | 73 | 90 | 7233 | 17756 | 8403 | 1739 |
| 40 | 149 | 580 | 351 | 31 | 91 | 16260 | 54754 | 32459 | 3551 |
| 41 | 420 | 503 | 300 | 134 | 92 | 547 | 1989 | 1108 | 112 |
| 42 | 3972 | 15290 | 9048 | 804 | 93 | 792 | 1286 | 881 | 154 |
| 43 | 8368 | 28742 | 16281 | 1558 | 94 | 327 | 1143 | 682 | 68 |
| 44 | 4321 | 16086 | 9881 | 872 | 95 | 3141 | 10573 | 6531 | 578 |
| 45 | 17869 | 41225 | 23799 | 4772 | 96 | 5303 | 20303 | 11598 | 1050 |
| 46 | 3191 | 343 | 208 | 1150 | 97 | 12198 | 41702 | 23789 | 2617 |
| 47 | 647 | 395 | 200 | 126 | 98 | 28 | 107 | 70 | 6 |
| 48 | 813 | 2085 | 1251 | 115 | 99 | 6048 | 19369 | 11005 | 1377 |
| 49 | 12899 | 37804 | 21833 | 3078 | 100 | 350 | 1129 | 692 | 83 |
Information for each polygon in terms of number of coordinates and images