How Differential Privacy Will Affect Estimates of Air Pollution Exposure and Disparities in the United States

Census data is crucial to understanding energy and environmental justice outcomes such as poor air quality, which disproportionately impacts people of color in the U.S. With the advent of sophisticated personal datasets and analysis, the Census Bureau is considering adding top-down noise (differential privacy) to, and post-processing, 2020 census data to reduce the risk of identifying individual respondents. Using 2010 demonstration census and pollution data, I find that, compared to the original census, the differentially private (DP) census significantly changes ambient pollution exposure in areas with sparse populations. White Americans have the lowest variability, followed by Latino, Asian, and Black Americans. DP underestimates pollution disparities for SO2 and PM2.5, while it overestimates pollution disparities for PM10.

block-group level data, while increasing the level of aggregation to larger spatial resolution (state or county level) underestimates disparities compared to census tract or block group level aggregation (Clark et al. 2022; Paolella et al. 2018). Noise and adjustments in census data can significantly alter these estimates.
In this piece, I examine how introducing differential privacy in census data affects:

1. Air pollution exposure of different racial and ethnic groups in the United States.
2. Exposure disparities when aggregated to county and census tract levels.

Methods
I use population data at the census block group (CBG) level from the original 2010 Census and from the latest experimental run of the differential privacy algorithm applied to the original 2010 Census (Vintage 2022-08-25), both from IPUMS NHGIS (Manson et al. 2022). Americans who identify as non-Hispanic Black only, non-Hispanic White only, non-Hispanic Asian only, and non-Hispanic Native American and American Indian only are referred to as Blacks, Whites, Asians, and Native Americans in this work. Latinos include all Americans who identify as Latino or Hispanic. Americans who identify as mixed race are not included in this analysis. I use CBG-level ambient pollution estimates of four air pollutants (PM2.5, PM10, NO2, SO2) for the year 2010 from the Center for Air, Climate and Energy Solutions ("The Center for Air, Climate, and Energy Solutions" n.d.), as described in published work (Kim et al. 2020).
The exposure to pollutant $i$ of race and ethnicity group $j$, aggregated to the census tract or county level, is given as:

$$E_{i,j} = \frac{\sum_{k} C_{i,k}\, P_{j,k}}{\sum_{k} P_{j,k}}$$

where $C_{i,k}$ denotes the ambient pollution estimate of pollutant $i$ in census block group $k$, and $P_{j,k}$ denotes the total population or the population of race/ethnicity $j$ in census block group $k$; the sums run over all census block groups in a census tract or county. Figures 1 and 2 plot the percentage difference in pollutant exposure experienced by the total population and by each racial and ethnic group in the differentially private census compared to the original census, aggregated at the county and census tract levels, respectively. Census tracts or counties with any population count of 0 in either the original or the differentially private census are removed.
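The population-weighted exposure and the percentage difference described in the Methods can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the study: the function names, the dictionary keys (`tract`, `pm25`, `black_orig`, `black_dp`), and the toy numbers are all hypothetical, and the zero-population rule is applied here per group, which is one possible reading of the exclusion criterion.

```python
from collections import defaultdict


def weighted_exposure(rows, unit_key, pop_key, conc_key):
    """Population-weighted mean concentration for each spatial unit.

    rows: iterable of dicts, one per census block group (CBG).
    """
    numerator = defaultdict(float)
    denominator = defaultdict(float)
    for row in rows:
        numerator[row[unit_key]] += row[conc_key] * row[pop_key]
        denominator[row[unit_key]] += row[pop_key]
    # Units whose group population is zero are dropped (assumed reading
    # of the exclusion rule in the text).
    return {u: numerator[u] / denominator[u] for u in denominator if denominator[u] > 0}


def percent_difference(original, private):
    """Percent change of DP exposure relative to the original census,
    for units that survive the zero-population filter in both datasets."""
    return {
        u: 100.0 * (private[u] - original[u]) / original[u]
        for u in original
        if u in private and original[u] != 0
    }


# Hypothetical toy data: two block groups in a single tract.
cbgs = [
    {"tract": "T1", "pm25": 10.0, "black_orig": 30, "black_dp": 20},
    {"tract": "T1", "pm25": 14.0, "black_orig": 10, "black_dp": 20},
]

e_orig = weighted_exposure(cbgs, "tract", "black_orig", "pm25")
e_dp = weighted_exposure(cbgs, "tract", "black_dp", "pm25")
pct = percent_difference(e_orig, e_dp)
print(e_orig, e_dp, pct)
```

In this toy tract the concentrations are unchanged, yet shifting ten people from the cleaner to the more polluted block group raises the group's estimated exposure from 11 to 12, a change of roughly 9%.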
To understand the impact of differentially private census products on pollution disparities, I estimate the risk gap at the county and census tract levels. The risk gap is defined as the difference between the pollution exposure of the most burdened group, i.e., the maximum exposure across racial and ethnic groups as calculated above, and the total population average exposure:

$$\mathrm{RG}_{i} = \max_{j}\left(E_{i,j}\right) - E_{i,\mathrm{total}}$$

where $E_{i,j}$ is the pollution exposure to pollutant $i$ of race and ethnicity group $j$ in a census tract or county, and $E_{i,\mathrm{total}}$ is the pollution exposure to pollutant $i$ of the entire population in that census tract or county. Figure 3 plots the ratio of the risk gap calculated using the DP census to the risk gap calculated using the original census against the population average pollution exposure, at the census tract and county levels. A ratio above (below) 1 denotes that the DP census shows a larger (smaller) risk gap than the original census.
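As a minimal sketch of the risk gap and its DP-to-original ratio for a single pollutant in a single county or tract, the definition above might be coded as follows. The function names and exposure values are hypothetical and chosen only to make the arithmetic easy to follow; returning `None` when the original risk gap is zero is an assumption, since the text does not say how such units are handled.

```python
def risk_gap(group_exposure, total_exposure):
    """Exposure of the most burdened group minus the population-average exposure."""
    return max(group_exposure.values()) - total_exposure


def risk_gap_ratio(dp_groups, dp_total, orig_groups, orig_total):
    """Ratio of the DP risk gap to the original risk gap for one spatial unit."""
    orig_gap = risk_gap(orig_groups, orig_total)
    if orig_gap == 0:
        return None  # ratio undefined (assumed handling)
    return risk_gap(dp_groups, dp_total) / orig_gap


# Hypothetical exposures for one pollutant in one county.
orig = {"Black": 12.0, "White": 9.5, "Latino": 11.0, "Asian": 10.5}
dp = {"Black": 11.5, "White": 9.6, "Latino": 11.0, "Asian": 10.4}

ratio = risk_gap_ratio(dp, 10.0, orig, 10.0)
print(ratio)
```

Here the original risk gap is 2.0 and the DP risk gap is 1.5, so the ratio is 0.75: in this made-up county the DP census would understate the disparity.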

Findings
Differential privacy in census data significantly changes the ambient pollution exposure in small spatial units with sparse populations of people of color (Figures 1 and 2). Census tracts show higher variation than counties. White Americans have the lowest variance in exposure, followed by Latino, Asian, and Black Americans. This is due, in part, to the post-processing procedure, which gives priority to the accuracy of counts for the largest racial group in an area. The changes in pollution exposure also depend on the pollutant. For example, in counties with sparse populations of Asian and Black Americans, the NO2 exposure changes can be as high as ±50%. Exposure differences vanish for larger population counts.
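A toy simulation can illustrate why noise of a fixed size matters most where a group's population is sparse. The sketch below is not the Census Bureau's algorithm, whose noise mechanism and post-processing are far more involved. It simply assumes Laplace noise with a hypothetical scale of 5 added to each block group count, clamps negative counts to zero as a crude stand-in for post-processing, and uses two made-up block groups with concentrations of 8 and 16.

```python
import random


def laplace(rng, scale):
    # The difference of two exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def mean_abs_pct_change(pop_per_cbg, conc=(8.0, 16.0), scale=5.0, trials=2000, seed=0):
    """Average absolute percent change in exposure caused by noisy counts."""
    rng = random.Random(seed)
    true_exposure = sum(conc) / len(conc)  # equal populations give a simple mean
    total, kept = 0.0, 0
    for _ in range(trials):
        # Noisy, non-negative population count for each block group.
        noisy = [max(0.0, pop_per_cbg + laplace(rng, scale)) for _ in conc]
        if sum(noisy) == 0:
            continue  # skip units left with zero population
        exposure = sum(c * p for c, p in zip(conc, noisy)) / sum(noisy)
        total += abs(exposure - true_exposure) / true_exposure * 100.0
        kept += 1
    return total / kept


sparse = mean_abs_pct_change(pop_per_cbg=10)
dense = mean_abs_pct_change(pop_per_cbg=10000)
print(sparse, dense)
```

Under these assumptions the same noise produces a far larger percentage change in exposure for the group with 10 people per block group than for the group with 10,000, which is the qualitative pattern seen in Figures 1 and 2.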
Figure 3 displays the ratio of the risk gap calculated with the DP census to that calculated with the original census against ambient pollutant levels, for both county and census tract aggregation. The differentially private census underestimates the disparity for SO2 (ratio less than 1) in both county and census tract aggregations, and the ratio decreases with higher levels of ambient SO2. DP overestimates the risk gap associated with PM10 for both counties and census tracts compared to the original census (ratio greater than 1), with the ratio increasing as the ambient concentration of PM10 increases. The trends in the risk gap ratio at the county level for NO2 and PM2.5 are not significant, but DP significantly underestimates the disparity for PM2.5 at the census tract level, particularly in more polluted census tracts.

Figure 1. Percentage change in air pollution exposure (PM10, PM2.5, NO2, and SO2) of the total population and different racial and ethnic groups in the differentially private census compared to the original census for the year 2010, aggregated at the county level. The x-axis of each plot shows the logarithm (base 10) of the population count of the specific racial and ethnic group or of the total population.

Figure 2. Percentage change in air pollution exposure (PM10, PM2.5, NO2, and SO2) of the total population and different racial and ethnic groups in the differentially private census compared to the original census for the year 2010, aggregated at the census tract level. The x-axis of each plot shows the logarithm (base 10) of the population count of the specific racial and ethnic group or of the total population.

Figure 3. Ratio of the risk gap for each pollutant in the DP census compared to the original census, at the county (left) and census tract (right) levels. The x-axis shows the population average concentration of the pollutant in a county or census tract. The risk gap is defined as the difference between the pollution exposure of the most burdened race and ethnicity and the population average exposure. A ratio less than 1 indicates that DP underestimates the pollution disparity compared to the original census, and vice versa.