Abstract

This study is a reproduction of Charles W. Sterling III et al’s study on “Connections Between Present-Day Water Access and Historical Redlining”.

Sterling III, Charles W., et al. “Connections between present-day water access and historical redlining.” Environmental Justice (2023). DOI:[10.1089/env.2022.0115](https://doi.org/10.1089/env.2022.0115)

This study uses ACS Census data and historical HOLC records of neighborhoods to examine the correlation of historical redlining and current-day access to water in cities. The study uses a binary logistic regression to identify relationships of different demographic and HOLC varaibles to access to water and sewage. The study finds that historically worse HOLC scores were correlated with less access to water in cities across all regions of the United States.

Original study spatio-temporal metadata

  • Spatial Coverage: A collection of inner-city cores with historical HOLC grades that are stored in the University of Richmond Mapping Inequality database.
  • Spatial Resolution: HOLC polygons
  • Spatial Reference System: NAD83
  • Temporal Coverage: 2016-2020
  • Temporal Resolution: Observations collected yearly (ACS)

Study design

This study is a reproduction of Charles W. Sterling III et al’s study on “Connections Between Present-Day Water Access and Historical Redlining”, a study which set out to explore the relationship between prevalence of the complete plumbing ACS variable in census blocks and historical neighborhood HOLC grades. In the study, Sterling hypothesizes that there will be a link between HOLC grade and complete plumbing. This hypothesis is based on the HOLC’s well documented history of assigning neighborhoods of color the lowest grades and previous findings of a nationwide link between race and complete plumbing prevalance as found in Deitz and Meehan 2019. Sterling’s work uses similar method to this study, using a logistic regression to test the link between the neighborhood grades and plumbing.

Other methods included in this study include the use of Areal Interpolation to supposedly assign ACS variable characteristics to the HOLC graded neighborhoods. This is somewhat of a strange strategy; often used to fill in missing or anamolous values in gridded data, areal interpolation doesn’t make a lot of sense for something like assigning data to an irregularly shaped polygon. Furthermore, areal interpolation assigns a value that is a average of several values of the areal units on either side of it. This suggests that the phenomena to be studied in the filled in area is a continuation of patterns that are present proximal to it. This is almost the opposite of the point of HOLC grades, grades assigned to neighborhoods based on their characteristics that set them apart. There’s a somewhat simple explanation to why Sterling chose to use this approach; The study that Sterling cites as the published usage of this technique for re-aggregating HOLC data - Fricker and Allen 2022 - used area-weighted-reaggregation to re-assign casualties from tornado paths to the neighborhoods they passed through, but incorrectly used the term ‘areal interpolation’ to describe their method. This means that likely Sterling also just used AWR. This assumption is supported by an interrogation of the python package used for the reaggregation, which considers AWR “the simplest form of areal interpolation”, a somewhat misleading designation given that the two refer to different techniques.

Materials and procedure

Computational environment

Data and variables

This study used only two data sources; American Community Survey (ACS) demographic data and maps of neighborhood’s historical mortgage HOLC grades, published and maintained on the University of Richmond’s Mapping Inequality Database.

American Community Survey (ACS)

  • Title: American Community Survey (ACS)
  • Abstract: The ACS is a nationwide survey that collects and produces information on social, economic, housing, and demographic characteristics about our nation’s population every year
  • Spatial Coverage: United States of America
  • Spatial Resolution: ACS data is aggregated by Census Block.
  • Spatial Representation Type: vector, MULTIPOLYGON
  • Spatial Reference System: NAD83
  • Temporal Coverage: ACS data from 2016-2020.
  • Temporal Resolution: Data is collected annually.
  • Lineage: Downloaded and used as-is.
  • Distribution: Data is publically avalible from the US Census (https://www.census.gov/programs-surveys/acs/data.html)[https://www.census.gov/programs-surveys/acs/data.html]
  • Constraints: Publicly available data.
  • Data Quality: N/A

See acs_values.csv at /data/raw/public/acs_values.csv

HOLC Grades

Title: HOLC grades from the Mapping Inequality Database (MID) - Abstract: HOLC grades are historical neighborhood delineations created by the Home Owners’ Loan Corporation in the 1930s to assess mortgage lending risk. These maps were used to guide investment and disinvestment in urban areas and are a foundational dataset for understanding redlining and its long-term impacts. - Spatial Coverage: Select major U.S. cities - Spatial Resolution: Neighborhood-scale boundaries (city block to district level, variable by city) - Spatial Representation Type: vector, MULTIPOLYGON - Spatial Reference System: NAD83 - Temporal Coverage: 1935–1940 (approximate dates of HOLC map creation) - Temporal Resolution: Single mapping effort. - Lineage: Downloaded and used as-is. - Distribution: Data is avalible on the Mapping Inequality website (https://dsl.richmond.edu/panorama/redlining/data)[https://dsl.richmond.edu/panorama/redlining/data] - Constraints: Publicly available data under a CC-by-NC 2.5 license - Data Quality: Accurate to the degree that the original digitization of the 1930’s HOLC maps was accurate.

Label Alias Definition Type Accuracy Domain Missing Data Value(s) Missing Data Frequency
city City Name City in which the HOLC designated area resides character N/A Names of US cities none none
state State Abbreviation USPS state designator character N/A The 50 states N/A none
category Desirability Assessment of investment desirability based on HOLC designation character subjective/historical a range of desigations none none
grade Grade Historical HOLC grade character subjective/historical A - D none none
label Shape Label HOLC grade and arbitrary number (A1, C3) to divide same-grade areas from the same city character N/A A-D, number of same-grade zones in the same city none none
residential Is it residential? ” ” binary (TRUE/FALSE) N/A TRUE/FALSE none none
commercial Is it commercial? ” ” binary (TRUE/FALSE) N/A TRUE/FALSE none none
industrial Is it industrial? ” ” binary (TRUE/FALSE) N/A TRUE/FALSE none none

Prior observations

There are no prior observations related with this data.

Bias and threats to validity

Two main source of bias in this study come from the HOLC data itself, while another comes from the ACS data. Firstly, the spatial extent of the HOLC data only covers the centers of major cities across the united states, limiting the analysis to center-cities. Secondly, ‘Areal Interpolation’ was used to determine census data values for each HOLC neighborhood, a method that doesn’t seem logical or advisable give the study context. Some data in the study suggests that this actually was a more logical area weighted reaggregation. Additionally, it’s unclear how this areal interpolation was done. Lastly, there could be a difference in results if an areal unit other than census block groups were used to perform this analysis.

Data transformations

First, remap ACS values from block groups to HOLC polygons through areal interpolation, likely actually area-weighted reaggregation.

Using census statistics from the ACS dataset, resulting calculated assigned variables are: Black Population %, White Population %, Indigenous American %, Asian Population %, Hispanic or Latino Population %, % Below Poverty Line, % Housing Units Owned, % Housing Units Rented, % Mobile Homes, % Homes before 1980, % Homes after 1980, % With Complete Plumbing, % With Incomplete Plumbing, and % Foreign Born. This data is separated by region as well as collected nationally. Using regional and national averages for plumbing rates, a binary variable is calculated on whether plumbing is below or above the average. This bianary variable will be creaed with a case/when function, using a national average of 2.61668% incomplete plumbing to determine if HOLC neighborhoods fall above or below. Neighborhoods that fall above will be assigned a 1, others will be assigned a 0.

This data can then be verified with the Supplementary Data Tables.

Then, use the Dunn Test to determine whether incomplete plumbing % in Grade A is significantly different than the other Grades and compared with supplementary table 1.

Using a Stepwise function with the \(StepAIC\) package and the HOLC Grade A as a reference group, we can gather a correlation table to remove the highest correlation values. In this case, the highest correlation value will be removed, and compared with supplementary table 2.

Analysis

Sterling III’s original study used a binary logistic regression model to compare categorical hold grades and a census block group’s binary desigantion of whether plumbing rates were above or below average. Results were gathered as the Average Marginal Effects (AME) of different census-derived varaibles represented as binary values.

To test potential unaccounted for geographic correlation, we will re-run using a geographic weighted regression (GWR) to assess the degree to which the proximity of similar HOLC grades effect the statistical correlation found by the original authors.

Results

Data is presented as 3 figures, 2 tables, and 7 supplementary tables:

Figures:

Tables:

Supplementary Tables:

National Data Tables - AME Table for the Northeast region using the average 1.447623 as the threshold for the binary variable - AME Table for the South region using the average 3.414391 as the threshold for the binary variable - AME Table for the Midwest region using the average 3.756471 as the threshold for the binary variable - AME Table for the West region using the average 0.7836609 as the threshold for the binary variable

Discussion

Sterling hypothesizes that areas of present-day incomplete plumbing within U.S. cities (i.e., communities with a proportion of homes lacking complete plumbing above the national average) are significantly associated with HOLC neighborhood designations. The study finds evidence for this hypothesis through the correlation of lower HOLC grades and incomplete plumbing.

A successful reproduction would confirm this hypothesis with similar derived values to those found in the original study. Because the data transformations that are a part of this workflow are relatively simple, I would expect that there is very little margin for error in the difference of the final product from that of Sterling. The one potential complication is Sterling’s unsubstantiated use of either AWR or AI, which will need to be investigated. Because we’re not sure what method he used, our results could be slightly different depending on what package was chosen.

Comparing the results of the bianary logistic regression and the geographic weighted regression will be interesting, assessing the degree to which geographic correlation plays a role in the significance found as part of the results of this study.

Integrity Statement

The authors of this preregistration state that they completed this preregistration to the best of their knowledge and that no other preregistration exists pertaining to the same hypotheses and research.

Acknowledgements

This report is based upon the template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences, DOI:[10.17605/OSF.IO/W29MQ](https://doi.org/10.17605/OSF.IO/W29MQ)

References

Sterling III, Charles W., et al. “Connections between present-day water access and historical redlining.” Environmental Justice (2023). DOI:[10.1089/env.2022.0115](https://doi.org/10.1089/env.2022.0115)

Shiloh Deitz and Katie Meehan. “Plumbing Poverty: Mapping Hot Spots of Racial and Geographic Inequality in U.S. Household Water Insecurity.” Annals of the American Association of Geographers 109 (2019): 1092–1109.

Tyler Fricker and Douglas L. Allen. “A Place-Based Analysis of Tornado Activity and Casualties in Shreveport, Louisiana.” Natural Hazards 113 (2022): 1853–1874.

Leeper, Thomas J. 2024. Margins: Marginal Effects for Model Objects. https://github.com/bbolker/margins.
Müller, Kirill. 2020. Here: A Simpler Way to Find Your Files. https://here.r-lib.org/.
R Core Team. 2024. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Walker, Kyle, and Matt Herman. 2025. Tidycensus: Load US Census Boundary and Attribute Data as Tidyverse and Sf-Ready Data Frames. https://walker-data.com/tidycensus/.
Wickham, Hadley. 2023. Tidyverse: Easily Install and Load the Tidyverse. https://tidyverse.tidyverse.org.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.