This study is a replication of:
Meng, Yunliang. 2021. Crime rates and contextual characteristics: A case study in Connecticut, USA. Human Geographies 15, (2) (11): 209-228, https://www.proquest.com/scholarly-journals/crime-rates-contextual-characteristics-case-study/docview/2638089143/se-2 (accessed April 6, 2025).

- Key words: Connecticut, crime, inequality, contextual characteristics
- Subject: Social and Behavioral Sciences: Geography: Human Geography
- Date created: 04/06/2024
- Date modified: 2025-05-20
- Spatial Coverage: Connecticut, USA
- Spatial Resolution: County Subdivisions
- Spatial Reference System: EPSG: 2234
- Temporal Coverage: 2013 - 2017
- Temporal Resolution: 1 year

This is a replication of a study on crime and contextual characteristics in Connecticut. The original study uses geographically weighted regression to test how crime rates at the county subdivision level vary based on several socio-demographic characteristics.
The original study is observational. It uses socio-demographic indicators from the Census Bureau’s American Community Survey 5-year estimates and crime data from the Uniform Crime Report disseminated by the Federal Bureau of Investigation.
We will attempt to use the same methods and data sources as the original authors to see whether our results vary from theirs and whether any methods are missing from their description.
There are two data sources for this study: demographic data from the American Community Survey and crime rate statistics from the Uniform Crime Report gathered by the FBI.

- Title: CT Census Subdivision Socio-demographic Data
- Abstract: BCT Census County Subdivision Socio-demographic Data
- Spatial Coverage: Connecticut
- Spatial Resolution: County Subdivision
- Spatial Representation Type: vector
- Spatial Reference System: EPSG: 2234
- Temporal Coverage: 2013-2017
- Temporal Resolution: 1 year
- Lineage: collected using the census API and tidycensus package in R
- Distribution: Publicly available
- Constraints: Public data
- Data Quality: trustworthy

## Reading layer `county_subdivision' from data source
## `/Users/dermotmcmillan/Desktop/GitHub/RPr-CT-crime/data/raw/public/county_subdivision.gpkg'
## using driver `GPKG'
## Simple feature collection with 173 features and 98 fields (with 4 geometries empty)
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -74 ymin: 41 xmax: -72 ymax: 42
## Geodetic CRS: NAD83
| Label | Alias | Definition | Type | Accuracy | Domain | Missing Data Value(s) | Missing Data Frequency |
|---|---|---|---|---|---|---|---|
| total_population | B01003_001 | Total population (Estimate) | … | … | … | … | … |
| age_20m | B01001_008 | Population of Males aged 20 | … | … | … | … | … |
| age_21m | B01001_009 | Population of Males aged 21 | … | … | … | … | … |
| age_22_24m | B01001_010 | Population of Males aged 22-24 | … | … | … | … | … |
| age_25_29m | B01001_011 | Population of Males aged 25-29 | … | … | … | … | … |
| age_30_34m | B01001_012 | Population of Males aged 30-34 | … | … | … | … | … |
| age_20f | B01001_032 | Population of Females aged 20 | … | … | … | … | … |
| age_21f | B01001_033 | Population of Females aged 21 | … | … | … | … | … |
| age_22_24f | B01001_034 | Population of Females aged 22-24 | … | … | … | … | … |
| age_25_29f | B01001_035 | Population of Females aged 25-29 | … | … | … | … | … |
| age_30_34f | B01001_036 | Population of Females aged 30-34 | … | … | … | … | … |
| education_total | B15003_001 | Total population | … | … | … | … | … |
| education_assoc | B15003_021 | Highest degree or the highest level of school completed = Associate's Degree | … | … | … | … | … |
| education_ba | B15003_022 | Highest degree or the highest level of school completed = Bachelor's Degree | … | … | … | … | … |
| education_ma | B15003_023 | Highest degree or the highest level of school completed = Master's Degree | … | … | … | … | … |
| education_pro | B15003_024 | Highest degree or the highest level of school completed = Professional School Degree | … | … | … | … | … |
| education_phd | B15003_025 | Highest degree or the highest level of school completed = Doctorate Degree | … | … | … | … | … |
| median_income | B19013_001 | Median Household Income | … | … | … | … | … |
| poverty_total_pop | B17001_001 | Total Population | … | … | … | … | … |
| poverty_below | B17001_002 | Income below the poverty level in last 12 months | … | … | … | … | … |
| unemployment_total | B23025_001 | Total Population | … | … | … | … | … |
| unemployment_total_in_labor | B23025_002 | Population in Labor Force | … | … | … | … | … |
| unemployment_unemployed | B23025_005 | Unemployed population considered to be in labor force | … | … | … | … | … |
| housing_total | B25003_001 | Occupied Housing Units | … | … | … | … | … |
| housing_renter | B25003_003 | Renter occupied Housing Units | … | … | … | … | … |
| housing_units_total | B25024_001 | Housing Units | … | … | … | … | … |
| housing_units_2 | B25024_004 | Housing Units w/ 2 units | … | … | … | … | … |
| housing_units_3_4 | B25024_005 | Housing Units w/ 3 or 4 units | … | … | … | … | … |
| housing_units_5_9 | B25024_006 | Housing Units w/ 5 to 9 units | … | … | … | … | … |
| housing_units_10_19 | B25024_007 | Housing Units w/ 10 to 19 units | … | … | … | … | … |
| housing_units_20_49 | B25024_008 | Housing Units w/ 20-49 units | … | … | … | … | … |
| housing_units_50 | B25024_009 | Housing Units w/ 50 or more units | … | … | … | … | … |
| moved_total | B07001_001 | Population 1 year or more in the US | … | … | … | … | … |
| moved_within_12_months | B07001_017 | Population that has moved homes in the past 12 months | … | … | … | … | … |
| households_total | B11003_001 | Family Type by Presence and Age of Own Children Under 18 Years | … | … | … | … | … |
| lone_parent_families_m | B11003_010 | Male Householder, no wife present | … | … | … | … | … |
| lone_parent_families_f | B11003_016 | Female Householder, no husband present | … | … | … | … | … |
| hispanic | B03002_012 | Hispanic | … | … | … | … | … |
| race_white | B03002_003 | Not Hispanic or Latino, White alone | … | … | … | … | … |
| race_black | B03002_004 | Not Hispanic or Latino, Black or African American alone | … | … | … | … | … |
| race_asian | B03002_006 | Not Hispanic or Latino, Asian alone | … | … | … | … | … |
| race_native | B03002_005 | Not Hispanic or Latino, American Indian and Alaska Native Alone | … | … | … | … | … |
| race_pacific | B03002_007 | Not Hispanic or Latino, Native Hawaiian and Other Pacific Islander Alone | … | … | … | … | … |
| race_other | B03002_008 | Not Hispanic or Latino, Some Other Race Alone | … | … | … | … | … |
| race_two_or_more | B03002_009 | Not Hispanic or Latino, Two or more races | … | … | … | … | … |

- Title: CT
- Abstract: BCT Census town level Crime Data
- Spatial Coverage: Connecticut
- Spatial Resolution: town
- Spatial Representation Type: non-spatial
- Temporal Coverage: 2013-2017
- Temporal Resolution: 1 year
- Lineage: gathered on 04/06/2024 from http://data.ctdata.org/dataset/ucr-crime-index
- Distribution: Publicly available
- Constraints: Public data
- Data Quality: good, reported from local law enforcement agencies

The threat most relevant to this problem is the Modifiable Areal Unit Problem, since crime rates will have different social and spatial patterns at different scales. There are also potential sources of error related to endogeneity and spatial autocorrelation, both of which are moderately accounted for in the original study. Additionally, the results do not have predictive power because the GWR is too regionally specific and overfit. Instead, these results can be interpreted as exploratory, requiring more rigorous research to contextualize and verify any findings. Bias is also inherent to crime data, since crime is socially constructed and criminality is at least partially defined around race and class in America. Over-policing and over-reporting in low-income areas and Black and brown neighborhoods introduce bias into the measurement of crime itself.
There are several methodological choices that the original authors did not specify, and which we will have to infer by comparing results and summary statistics. Specifically, we need to choose a spatial weights matrix for the GWR. We will start with the default ArcGIS spatial weights matrix (since they used the ArcGIS tool for their analysis) and go from there. If we cannot determine which one they used, we will choose our own and compare results. There are also some transformation choices with the census data that we will have to infer by comparing our data to the summary statistics provided (e.g., which denominator to use for percentages).
Data transformations for Crime and Census data are provided in the following workflow:
## `summarise()` has grouped output by 'Town'. You can override using the
## `.groups` argument.
### Crime Data
| Statistic | Min | Median | Max | IQR | SD |
|---|---|---|---|---|---|
| Total Violent Crime | 0 | 53 | 951 | 59 | 140 |
| Total Property Crime | 134 | 783 | 3911 | 1204 | 815 |
Unplanned Deviation: It is clear from the differing minimum values that the author treated some empty or 0 values as nulls. Since we have no way of discerning which values these were, we moved forward by treating all empty values as 0.
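To make the zero-fill choice concrete, here is a minimal Python sketch (not the R workflow used for this study). The town, its counts, its population, and the decision to average counts over the years before converting to a rate per 100,000 are all hypothetical assumptions for illustration; the source does not state how the yearly counts were combined.

```python
from typing import Optional, Sequence


def total_with_missing_as_zero(counts: Sequence[Optional[float]]) -> float:
    """Sum yearly counts, treating missing (None) entries as 0."""
    return sum(0.0 if c is None else float(c) for c in counts)


def rate_per_100k(count: float, population: float) -> float:
    """Convert a count into a rate per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count / population * 100_000


# Hypothetical town: one of five yearly violent-crime counts is missing.
violent_counts = [12, None, 9, 15, 14]
total = total_with_missing_as_zero(violent_counts)

# Assumption: use the mean annual count as the numerator of the rate.
mean_annual = total / len(violent_counts)
rate = rate_per_100k(mean_annual, population=25_000)
print(total, rate)
```

Treating the missing year as a null instead would divide by four years rather than five and raise the rate, which is one way the minimum values could diverge between two analyses of the same data.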
### Census
| Statistic | Min | Median | Max | IQR | SD |
|---|---|---|---|---|---|
| age | 5.92 | 14.61 | 41.40 | 5.86 | 5.36 |
| poverty_rate | 0.27 | 5.13 | 30.49 | 4.54 | 5.12 |
| education | 14.74 | 51.04 | 77.92 | 20.11 | 13.80 |
| median_income | 33841.00 | 85296.00 | 219868.00 | 28534.00 | 28102.93 |
| unemployment_rate | 1.21 | 5.58 | 16.02 | 2.61 | 2.27 |
| rent_rate | 2.26 | 18.98 | 76.20 | 15.58 | 13.67 |
| multi_unit_rate | 0.00 | 17.62 | 94.23 | 21.61 | 18.49 |
| res_mobility | 0.96 | 6.96 | 23.18 | 4.73 | 3.45 |
| pop_density | 11.38 | 169.49 | 3260.72 | 358.20 | 495.35 |
| shannon_eq | 0.07 | 0.24 | 0.65 | 0.19 | 0.15 |
Unplanned Deviation: Variables were calculated using each table's respective total population. We cross-checked our values against the original study's summary table (Table 2) and were able to match most of them. Population density, housing type (multi_unit_rate), and residential mobility calculations yielded slightly different results. For population density, this is likely because of minor differences in calculating area. For housing type and residential mobility, we were unable to explain the differences. Discrepancies may be because the original authors cleaned the data and did not report it.
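The sensitivity of these percentages to the choice of denominator can be sketched as follows. This is an illustrative Python example rather than our R workflow; the field names follow the data dictionary above, but the counts are invented, and which denominator the original authors used remains unknown.

```python
def percent(part: float, total: float) -> float:
    """Return part as a percentage of total, or NaN if total is not positive."""
    if total <= 0:
        return float("nan")
    return part / total * 100


# Hypothetical county subdivision (all counts are made up).
row = {
    "poverty_below": 150,
    "poverty_total_pop": 3000,
    "unemployment_unemployed": 50,
    "unemployment_total_in_labor": 1000,
    "unemployment_total": 1600,
}

# Each variable is divided by the total from its own ACS table.
poverty_rate = percent(row["poverty_below"], row["poverty_total_pop"])

# Two plausible denominators for unemployment give different rates.
unemp_labor_force = percent(
    row["unemployment_unemployed"], row["unemployment_total_in_labor"]
)
unemp_all = percent(row["unemployment_unemployed"], row["unemployment_total"])
print(poverty_rate, unemp_labor_force, unemp_all)
```

Comparing each candidate against the published summary statistics is how a denominator can be inferred when it is not reported.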
The Shannon index we calculated produced very different summary statistics from those in the original study. Initially, we thought this was a calculation error, but we re-ran the analysis several ways (a hand-built method and a ChatGPT-generated workflow) and got the same results. To verify further, we compared the spatial distribution of the Shannon index to maps of other diversity measures in CT, and they were almost identical. This, along with some concerning deviations in our analysis, leads us to believe that the original authors calculated the metric incorrectly.
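For reference, one common definition of the Shannon equitability index is E = H / ln(S), where H = -Σ p_i ln(p_i) over the group proportions p_i and S is the number of groups. The Python sketch below implements that definition under two assumptions of ours: S counts every category supplied (including empty ones), and the example counts are hypothetical. It is not a transcription of either our R code or the original authors' method.

```python
import math
from typing import Sequence


def shannon_equitability(counts: Sequence[float]) -> float:
    """Shannon entropy of the group shares divided by ln(number of groups)."""
    total = sum(counts)
    n_groups = len(counts)
    if total <= 0 or n_groups < 2:
        return float("nan")
    # Empty groups contribute nothing to the entropy (limit of p*ln(p) is 0).
    shares = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log(p) for p in shares)
    return entropy / math.log(n_groups)


# Hypothetical population counts by race/ethnicity group.
even = shannon_equitability([10, 10, 10, 10])   # perfectly even mix
single = shannon_equitability([100, 0, 0, 0])   # one group only
mixed = shannon_equitability([700, 150, 100, 50])
print(even, single, mixed)
```

The index is bounded between 0 and 1 by construction, which gives a quick sanity check on any reported summary statistics.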
### Property Crime
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 529.27 | 64.16 | 8.2 | <0.0001 |
| pop_density | 0.77 | 0.13 | 5.9 | <0.0001 |
| multi_unit_rate | 15.31 | 3.49 | 4.4 | <0.0001 |
### Violent Crime
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | -71.24 | 21.56 | -3.3 | 0.0012 |
| pop_density | 0.19 | 0.02 | 11.3 | <0.0001 |
| education | 1.26 | 0.42 | 3.0 | 0.0029 |
| poverty_rate | 8.25 | 1.51 | 5.5 | <0.0001 |
| shannon_eq | -73.11 | 53.25 | -1.4 | 0.1716 |
The ordinary least squares (OLS) regression with
Total Property Crime (really the crime rate per 100,000) as
the response variable gave results surprisingly similar to the original
OLS model in the study. The beta estimates for both predictors were
slightly different, but this makes sense given that all three
variables in the model had minor discrepancies. We only used the predictor
variables selected by the original authors. To expand on this section (and
explore the tree of forking paths), it may make sense to run a variable
selection process on our data to see whether we would have chosen different
predictors.
The OLS coefficients for Total Violent Crime were all
similar to the original study except for the Shannon equitability index
(diversity), which was not statistically significant in our model (p = 0.17).
In this section we only visualized the Local Moran’s I values; for the sake of time, we did not calculate a Global Moran’s I.
Planned Deviation: We did not know which spatial weights matrix the original author used to calculate Local Moran’s I scores, so we went with what seems to be the default in ArcGIS: a fixed distance band based on the maximum of the nearest-neighbor distances.
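The fixed distance band described above can be sketched in a few lines of Python. This is an illustration of the general idea only: the centroid coordinates are hypothetical, distances are planar Euclidean, and we make no claim that it reproduces ArcGIS's internal behavior.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def fixed_distance_neighbors(
    points: Sequence[Point],
) -> Tuple[float, Dict[int, List[int]]]:
    """Use the largest nearest-neighbor distance as a common distance band."""
    n = len(points)
    dist = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    # Nearest-neighbor distance for every unit (excluding itself).
    nearest = [min(dist[i][j] for j in range(n) if j != i) for i in range(n)]
    threshold = max(nearest)
    neighbors = {
        i: [j for j in range(n) if j != i and dist[i][j] <= threshold]
        for i in range(n)
    }
    return threshold, neighbors


# Hypothetical projected centroids; the last one is relatively isolated.
centroids = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (4.0, 0.0)]
threshold, neighbors = fixed_distance_neighbors(centroids)
print(threshold, neighbors)
```

Choosing the maximum nearest-neighbor distance guarantees that every unit has at least one neighbor, at the cost of giving densely packed units many more neighbors than isolated ones.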
Unplanned Deviation: It was difficult to determine the exact classification scheme used in ArcGIS for the cluster analysis, as the GUI offers multiple options and limited transparency. After researching the default settings and discussing them with ChatGPT, we concluded that areas with statistically significant Local Moran’s I results were classified based on whether their own crime rate and the spatial lag (the average crime rate of neighboring areas) were above or below the global mean. This combination allowed us to assign clusters such as High-High, Low-Low, High-Low, and Low-High.
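The classification rule we settled on can be expressed as a short Python sketch. The rates, neighbor lists, and p-values below are hypothetical, and the 0.05 significance threshold is our assumption for illustration rather than a documented ArcGIS setting.

```python
from statistics import mean
from typing import Dict, List, Sequence


def spatial_lag(values: Sequence[float], neighbors: Dict[int, List[int]], i: int) -> float:
    """Average value among the neighbors of unit i."""
    return mean(values[j] for j in neighbors[i])


def classify_cluster(
    value: float, lag: float, global_mean: float, p_value: float, alpha: float = 0.05
) -> str:
    """Label a unit by comparing its value and spatial lag to the global mean."""
    if p_value >= alpha:
        return "Not significant"
    own = "High" if value > global_mean else "Low"
    nbr = "High" if lag > global_mean else "Low"
    return f"{own}-{nbr}"


# Hypothetical crime rates, neighbor lists, and Local Moran's I p-values.
rates = [900.0, 850.0, 100.0, 120.0]
neighbors = {0: [1], 1: [0], 2: [3], 3: [2]}
p_values = [0.01, 0.02, 0.03, 0.40]

global_mean = mean(rates)
labels = [
    classify_cluster(rates[i], spatial_lag(rates, neighbors, i), global_mean, p_values[i])
    for i in range(len(rates))
]
print(labels)
```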
## GWR
## Adaptive q: 0.38 CV score: 53374871
## Adaptive q: 0.62 CV score: 46640010
## Adaptive q: 0.76 CV score: 47949831
## Adaptive q: 0.65 CV score: 46898852
## Adaptive q: 0.53 CV score: 46310686
## Adaptive q: 0.54 CV score: 46175445
## Adaptive q: 0.57 CV score: 46236944
## Adaptive q: 0.55 CV score: 46178513
## Adaptive q: 0.54 CV score: 46118022
## Adaptive q: 0.54 CV score: 46134633
## Adaptive q: 0.54 CV score: 46108956
## Adaptive q: 0.54 CV score: 46091356
## Adaptive q: 0.54 CV score: 46120000
## Adaptive q: 0.54 CV score: 46098094
## Adaptive q: 0.54 CV score: 46101446
## Adaptive q: 0.54 CV score: 46093134
## Adaptive q: 0.54 CV score: 46094716
## Adaptive q: 0.54 CV score: 46091893
## Adaptive q: 0.54 CV score: 46092195
## Adaptive q: 0.54 CV score: 46091356
## Adaptive q: 0.38 CV score: 998007
## Adaptive q: 0.62 CV score: 1036577
## Adaptive q: 0.24 CV score: 1126640
## Adaptive q: 0.47 CV score: 1033412
## Adaptive q: 0.33 CV score: 1020719
## Adaptive q: 0.39 CV score: 998600
## Adaptive q: 0.38 CV score: 997949
## Adaptive q: 0.36 CV score: 1004380
## Adaptive q: 0.37 CV score: 1001825
## Adaptive q: 0.38 CV score: 998121
## Adaptive q: 0.38 CV score: 997871
## Adaptive q: 0.38 CV score: 997839
## Adaptive q: 0.38 CV score: 997840
## Adaptive q: 0.38 CV score: 997838
## Adaptive q: 0.38 CV score: 997838
## Adaptive q: 0.38 CV score: 997838
## Adaptive q: 0.38 CV score: 997838
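We used the same method as the original author to choose an adaptive bandwidth based on AIC minimization (note that the search log above reports a CV score for each candidate adaptive proportion q). We did not have to specify a spatial weights matrix for the GWR, as we had expected to.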
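As a rough illustration of what an adaptive bandwidth means, the Python sketch below assumes that an adaptive proportion q sets the bandwidth at each regression point to the distance of its ceil(q × n)-th nearest observation, and that weights follow a Gaussian kernel. Both are simplifying assumptions of ours; the exact rule and kernel used by any particular GWR implementation may differ, and the distances are hypothetical.

```python
import math
from typing import List, Sequence


def adaptive_bandwidth(distances: Sequence[float], q: float) -> float:
    """Distance to the ceil(q * n)-th nearest observation (assumed rule)."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    k = max(1, math.ceil(q * len(distances)))
    return sorted(distances)[k - 1]


def gaussian_weights(distances: Sequence[float], bandwidth: float) -> List[float]:
    """Gaussian kernel weights that decay with distance from the regression point."""
    return [math.exp(-0.5 * (d / bandwidth) ** 2) for d in distances]


# Hypothetical distances from one regression point to ten observations.
dists = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
bw = adaptive_bandwidth(dists, q=0.54)
weights = gaussian_weights(dists, bw)
print(bw, [round(w, 3) for w in weights])
```

Because the bandwidth is defined by neighbor rank rather than a fixed distance, it widens in sparse areas and narrows in dense ones, which is why no separate spatial weights matrix has to be supplied.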
### Summary
| Statistic | Min | Max | under_1.96 | between_1.96_2.58 | above_2.58 |
|---|---|---|---|---|---|
| pop_density.1 | 0.44 | 1.53 | 0.59 | 0.83 | 0.15 |
| multi_unit_rate.1 | 2.25 | 25.94 | 0.34 | 0.56 | 0.11 |
| pop_density.2 | 0.04 | 0.46 | 0.03 | 0.88 | 0.09 |
| education.1 | -3.22 | 2.85 | 0.41 | 0.50 | 0.09 |
| poverty_rate.1 | -2.29 | 24.21 | 0.59 | 0.33 | 0.08 |
| shannon_eq.1 | -250.19 | 143.80 | 0.91 | 0.01 | 0.08 |
Coefficient ranges for each of the models (Property/Violent Crime) varied significantly between the original study and our reproduction. This difference can be explained by the discrepancies in the underlying data and by the fact that GWR overfits (it explains too much of the error).