Modeling Residential Vehicle Miles Traveled at Block Group Level in California

Jennifer Ziebarth; Matthew Goyne; Mackenzie Watten; Siddartha Rayaprolu; Rey (Reyhane) Hosseinzade

doi:10.32866/001c.162714

1. Questions

Reducing driving, along with the greenhouse gas emissions, congestion, and safety risks that accompany it, has become a central objective of transportation and land use policy (EPA n.d.-b). Yet practitioners often lack tools to translate built-environment conditions into clear, quantitative, and communicable expectations for vehicle miles traveled (VMT). A substantial body of research has examined the relationship between built-environment characteristics and VMT using a range of analytical approaches. The literature consistently identifies density, design, diversity, distance to transit, and destination accessibility as influential predictors of trip frequency, trip length, and VMT (Cervero and Kockelman 1997; Ewing and Cervero 2010; Ewing et al. 2011; Ewing and Cervero 2017; Ewing et al. 2016; Ihlanfeldt 2020; DeWeese and El-Geneidy 2020; Yin et al. 2025). However, most prior work has been conducted at the household or regional level, often relying on household travel survey data with limited geographic coverage. Few studies have applied statewide, block-group-scale modeling using passively collected big data in a context like California, where VMT serves as a statutory environmental metric under Senate Bill 743. Recent work at the metropolitan scale has begun to identify nonlinear thresholds in this relationship. Hamidi et al. (2026) identify a compactness score of approximately 120 as the point at which walking and transit commuting increase meaningfully across U.S. metropolitan statistical areas (MSAs), but these MSA-level findings leave open which specific neighborhoods within a metropolitan area are at or above such thresholds. Block-group-scale analysis of the kind presented here is well-positioned to characterize how built-environment conditions vary within and across metropolitan areas and to quantify their associations with residential VMT in a form planners can use directly. While identifying nonlinear thresholds of the kind Hamidi et al. (2026) describe remains an important direction for future research, the present analysis prioritizes a log-linear specification that is interpretable across California’s diverse planning contexts and directly tied to the built-environment dimensions subject to policy intervention.

2. Methods

The study uses a statewide, cross-sectional dataset at the census block group level to estimate the relationships between the explanatory variables and residential VMT. The dependent variable is average home-based VMT per resident (VMT generated by all trips with an origin or destination at the home), derived from StreetLight Data’s VMT Index (Fehr & Peers n.d.). The explanatory variables were assembled primarily from the U.S. Environmental Protection Agency’s Smart Location Database (EPA-SLD n.d.-a). The analysis covers roughly 25,000 of the 29,000 block groups in California. Explanatory variables include population density, employment density, intersection density, transit frequency, the ratio of jobs accessible by transit to jobs accessible by auto, and the share of housing cost-burdened households.

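We estimated a multivariable linear regression model with log-transformed residential VMT per resident as the dependent variable. Predictors enter the model in different forms depending on their nature and distribution. The four density and frequency measures (population density, employment density, intersection density, and transit frequency) are log-transformed, so their coefficients can be interpreted as elasticities: the approximate percent change in VMT per resident associated with a 1% change in the predictor. Two additional predictors enter in their natural units: the ratio of job accessibility by transit to job accessibility by auto, and the percentage of housing cost-burdened households. Their coefficients are semi-elasticities, so a one-unit change is associated with an approximately 100×β percent change in VMT per resident. Table 1 summarizes all variables, their sources, and their forms in the model.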
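In equation form, the specification can be written as below. The symbols are introduced here for illustration and are not the authors' notation: x_{k,i} denotes the four log-transformed density and frequency measures for block group i, r_i the transit-to-auto job accessibility ratio, and h_i the percentage of cost-burdened households (the same semi-elasticity formula applies to h with γ₂). The second and third lines translate coefficients into percent changes in VMT per resident. The paper does not state how block groups with zero transit frequency were handled in the log transform; an offset such as ln(1 + x) would be one common choice, but that is an assumption.

```latex
\begin{aligned}
\ln(\mathrm{VMT}_i) &= \beta_0 + \sum_{k=1}^{4} \beta_k \ln(x_{k,i}) + \gamma_1 r_i + \gamma_2 h_i + \varepsilon_i \\
\%\Delta\mathrm{VMT}\,\big|_{x_k \to (1 + p/100)\,x_k} &= 100\left[\left(1 + \tfrac{p}{100}\right)^{\beta_k} - 1\right] \approx \beta_k\, p \\
\%\Delta\mathrm{VMT}\,\big|_{r \to r + \Delta} &= 100\left[e^{\gamma_1 \Delta} - 1\right] \approx 100\,\gamma_1 \Delta
\end{aligned}
```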

For consistency with international reporting conventions, we also estimated a specification with the dependent variable expressed in vehicle kilometers traveled (VKT) per resident. Intersection density was likewise converted from intersections per square mile to intersections per square kilometer; all other variables remained the same. These results are reported in the Appendix.
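Because both the dependent variable and intersection density enter the model in logs, converting miles to kilometers only adds a constant to each logged variable, so the slope coefficients of a VKT specification should match those of the VMT specification exactly and only the intercept should change. The sketch below checks this numerically on synthetic data. The data, variable names, and "true" coefficients are illustrative assumptions rather than the study's, and plain NumPy least squares stands in for whatever regression software the authors used.

```python
import numpy as np

MI_TO_KM = 1.609344            # kilometers per mile (exact)
SQMI_TO_SQKM = MI_TO_KM ** 2   # square kilometers per square mile


def ols(y, X):
    """Ordinary least squares; returns [intercept, slope_1, slope_2, ...]."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# Synthetic, illustrative data only (not the study's data)
rng = np.random.default_rng(0)
n = 500
int_density_sqmi = rng.uniform(20, 400, n)   # intersections per square mile
burden_pct = rng.uniform(10, 60, n)          # % cost-burdened households
log_vmt = 3.0 - 0.05 * np.log(int_density_sqmi) + 0.01 * burden_pct + rng.normal(0, 0.2, n)
vmt = np.exp(log_vmt)

# Specification in miles and per square mile
b_mi = ols(np.log(vmt), np.column_stack([np.log(int_density_sqmi), burden_pct]))

# The same observations expressed in kilometers and per square kilometer
vkt = vmt * MI_TO_KM
int_density_sqkm = int_density_sqmi / SQMI_TO_SQKM
b_km = ols(np.log(vkt), np.column_stack([np.log(int_density_sqkm), burden_pct]))

# Slopes are unchanged; the intercept absorbs both unit conversions
expected_intercept = b_mi[0] + np.log(MI_TO_KM) + b_mi[1] * np.log(SQMI_TO_SQKM)
print(np.allclose(b_mi[1:], b_km[1:]))
print(np.isclose(b_km[0], expected_intercept))
```

The same reasoning implies that the coefficients on the natural-unit predictors (the accessibility ratio and housing burden) are also unaffected by the unit change.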

To account for systematic regional heterogeneity across California’s diverse metropolitan contexts (for example, differences among the Los Angeles, San Francisco Bay, Sacramento, and San Diego regions), we estimated separate models for the four major metropolitan planning organization (MPO) regions, for the remaining MPOs, and for non-MPO areas.
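Estimating separate models per geography amounts to running the same regression on each regional subset, so every coefficient, including the intercept, can differ by region. A minimal NumPy sketch follows, assuming a single log-density predictor and synthetic data for two hypothetical regions whose true slopes differ; the helper names fit_ols and fit_by_region and the region labels are illustrative only.

```python
import numpy as np


def fit_ols(y, X):
    """OLS coefficients [intercept, slopes...] via least squares."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def fit_by_region(y, X, region):
    """Fit the same specification separately within each region label."""
    return {r: fit_ols(y[region == r], X[region == r]) for r in np.unique(region)}


# Synthetic illustration: two hypothetical regions with different true slopes
rng = np.random.default_rng(42)
n = 2000
region = rng.choice(np.array(["region_a", "region_b"]), size=n)
log_pop_density = rng.uniform(0.0, 4.0, n)
true_slope = np.where(region == "region_a", -0.10, -0.03)
log_vmt = 3.0 + true_slope * log_pop_density + rng.normal(0.0, 0.1, n)

coefs = fit_by_region(log_vmt, log_pop_density.reshape(-1, 1), region)
for name, (intercept, slope) in coefs.items():
    print(f"{name}: intercept={intercept:.3f}, slope={slope:.3f}")
```

A pooled model with region-specific intercepts would let only the level of VMT vary by region; fully separate models also let the slopes vary, at the cost of fewer observations behind each estimate.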

Table 1. Variables used in the model to estimate the relationship between VMT and the built environment

Variable	Definition	Source	Form in model
Dependent Variable
Residential VMT per resident	Average home-based VMT (generated by all trips with an origin or destination at the home) per resident	StreetLight Data via VMT Index Tool	Log
Explanatory: Built Environment (Density)
Population Density	Gross population density in people per acre on unprotected land at the census block group level	EPA Smart Location Database v3.0	Log
Employment Density	Gross employment density in jobs per acre on unprotected land at the census block group level	EPA Smart Location Database v3.0	Log
Intersection Density	Number of street intersections per square mile	EPA Smart Location Database v3.0	Log
Explanatory: Built Environment (Accessibility and Transit)
Transit Frequency	Number of transit service runs per hour aggregated at the block group level	EPA Smart Location Database v3.0	Log
Ratio of Job Accessibility by Transit vs Auto	The number of jobs accessible within 45 minutes by transit divided by the number of jobs accessible within 45 minutes by auto	EPA Smart Location Database v3.0	Ratio
Explanatory: Selected Household Characteristics
Housing Cost Burden	Percent of households with 35% or more of income spent on housing, weighted by owner and renter totals (aggregated at census block group)	ACS 2020, tract	Percent

To assess the stability of inference in the final specification, we checked for multicollinearity by calculating variance inflation factors (VIFs) for all predictors. All VIFs were below 3, indicating no problematic multicollinearity among the regressors.
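For reference, the VIF for predictor j is 1 / (1 - R²_j), where R²_j comes from regressing predictor j on all the other predictors. The sketch below computes this directly with NumPy on synthetic data in which one column is nearly a copy of another; it is illustrative and does not use the study's data. statsmodels also provides variance_inflation_factor in statsmodels.stats.outliers_influence, which expects the design matrix to include a constant column.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column: 1 / (1 - R^2_j), where R^2_j
    is from regressing column j on the remaining columns plus an intercept."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs


# Synthetic check: column c is nearly a copy of column a; b is independent
rng = np.random.default_rng(1)
a = rng.normal(size=1000)
b = rng.normal(size=1000)
c = a + rng.normal(scale=0.1, size=1000)
v = vif(np.column_stack([a, b, c]))
print(np.round(v, 1))
```

Here the independent column b has a VIF close to 1, while a and c have very large VIFs because each is almost fully explained by the other.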

3. Findings

To examine whether the relationships between the built environment and residential VMT hold consistently across California’s diverse metropolitan contexts, we estimated separate models for six regional geographies: MTC (San Francisco Bay Area), SACOG (Sacramento region), SCAG (Southern California), SANDAG (San Diego), other MPOs, and non-MPO areas. Table 2 presents the estimated coefficients for each region with significance indicators. Across the regional models, several built-environment characteristics are consistently associated with lower residential VMT. All three density measures (population, employment, and intersection density) have negative, statistically significant coefficients in all six geographies, suggesting that more concentrated activity and better-connected street networks support shorter auto trips and more non-auto travel. The transit-related variables are also negative in every region but are not uniformly significant: the transit-to-auto job accessibility ratio is not significant in SACOG, and transit frequency is not significant in MTC or SANDAG. This pattern suggests that where transit service is available and competitive with auto travel, residents tend to drive less.

Table 2. Regional Coefficient Comparison by MPO/Planning Region

Variable	MTC (Bay Area) R²: 0.34	SACOG (Sacramento) R²: 0.38	SCAG (Southern California) R²: 0.44	SANDAG (San Diego) R²: 0.41	Other MPOs R²: 0.40	Other (non-MPO) R²: 0.37
Log of population density	-0.11***	-0.03**	-0.13***	-0.10***	-0.07***	-0.06**
Log of employment density	-0.03***	-0.02*	-0.04***	-0.01**	-0.06***	-0.09***
Jobs accessible: transit vs. auto	-0.03***	-0.02	-0.01***	-0.06***	-0.03***	-0.05**
Log of intersection density	-0.02***	-0.08***	-0.02***	-0.04***	-0.04***	-0.06***
Log of transit frequency	-0.01	-0.02**	-0.05***	-0.01	-0.02***	-0.05**
% of Housing Cost-Burdened Households	0.01***	0.01***	0.00*	0.00	0.00**	-0.01

Significance: *** p<0.001, ** p<0.01, * p<0.05, . p<0.1. MTC = Metropolitan Transportation Commission; SACOG = Sacramento Area Council of Governments; SCAG = Southern California Association of Governments; SANDAG = San Diego Association of Governments. Models estimated separately for each region using the same specification: log(VMT) ~ log(Pop density) + log(Emp density) + Jobs(transit/auto) + log(Transit frequency) + log(Intersection density) + Housing burden (%).
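The coefficients in Table 2 can be combined to screen a scenario. Log-transformed predictors contribute β·ln(1 + p/100) for a p% change, natural-unit predictors contribute β·Δ, and the summed change in log VMT is converted back to a percent change. The sketch below uses the rounded MTC coefficients from Table 2. The scenario itself (the specific changes in density, transit frequency, and accessibility) and the dictionary keys are hypothetical, the calculation ignores statistical significance (the MTC transit frequency term is not significant and is included only for illustration), and the result should be read as an association under the model rather than a causal forecast.

```python
import math

# Rounded MTC (Bay Area) coefficients transcribed from Table 2
MTC = {
    "log_pop_density": -0.11,
    "log_emp_density": -0.03,
    "jobs_transit_vs_auto": -0.03,
    "log_intersection_density": -0.02,
    "log_transit_frequency": -0.01,
    "housing_burden_pct": 0.01,
}


def scenario_pct_change(coefs, pct_changes=None, unit_changes=None):
    """Percent change in VMT per resident implied by a log-linear model.

    pct_changes: {name: p} for log-transformed predictors (p = 20 means +20%).
    unit_changes: {name: delta} for predictors entered in natural units
                  (ratio units, or percentage points for housing burden).
    """
    delta_log_vmt = 0.0
    for name, p in (pct_changes or {}).items():
        delta_log_vmt += coefs[name] * math.log1p(p / 100.0)
    for name, delta in (unit_changes or {}).items():
        delta_log_vmt += coefs[name] * delta
    return 100.0 * math.expm1(delta_log_vmt)


# Hypothetical scenario for a Bay Area block group
change = scenario_pct_change(
    MTC,
    pct_changes={
        "log_pop_density": 20,
        "log_emp_density": 10,
        "log_intersection_density": 5,
        "log_transit_frequency": 50,
    },
    unit_changes={"jobs_transit_vs_auto": 0.05},
)
print(f"Predicted change in VMT per resident: {change:.1f}%")
```

For this hypothetical scenario the combined association works out to roughly a 3% reduction in VMT per resident, most of it from the population density term.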

This study demonstrates the value of a transparent, fine-grained modeling approach that planners can both explain and use. The model is designed to support practical decisions: screening strategies, comparing scenarios, and grounding conversations about VMT reduction in a consistent empirical framework.

Several limitations and interpretive cautions should guide how the findings are used. First, because the analysis is cross-sectional, the estimated relationships should not be treated as causal effects of single interventions; they may reflect unobserved local factors correlated with both the built environment and travel behavior (Ewing et al. 2016). Second, the model intentionally excludes two categories of predictors that fall outside the policy-lever framework. Vehicle ownership is excluded, consistent with evidence that it is itself shaped by built-environment conditions and should not be treated as exogenous to the variables included as predictors (Yin et al. 2025). Socioeconomic and demographic variables such as household income, household size, age distribution, and education are similarly excluded: while these characteristics correlate with travel behavior, they are not directly subject to planning intervention in the way that density, street connectivity, or transit investment are. Because the model is intended as a policy tool, variable selection was guided by the aim of isolating the built-environment dimensions that planners and policymakers can act on. This choice introduces potential omitted variable bias into the built-environment coefficient estimates, and we recommend that applications of this model treat the coefficients as associations under realistic neighborhood conditions rather than as isolated causal effects of individual variables.

Third, we addressed regional heterogeneity by estimating separate models for each geography, which allows coefficients to vary freely across regions; however, unmeasured within-region variation and spatial autocorrelation remain present in the residuals. Finally, the analysis is limited to California. The state’s VMT reduction mandates make it a policy-relevant context, but the coefficients may not transfer directly to states with substantially different land use patterns, transit systems, or travel cultures.
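Spatial autocorrelation in regression residuals is commonly summarized with global Moran's I. The paper does not say which diagnostic was used; the sketch below is a plain NumPy version of the standard formula, applied to a hypothetical chain of 100 areas in which each area neighbors only the one before and after it. Real block groups would need a contiguity- or distance-based weights matrix built from their geometries. Positive values indicate that neighboring residuals tend to be similar; negative values indicate that they tend to differ.

```python
import numpy as np


def morans_i(values, weights):
    """Global Moran's I: (n / S0) * (z' W z) / (z' z), where z are deviations
    from the mean and S0 is the sum of all spatial weights (zero diagonal)."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()
    s0 = w.sum()
    return (len(x) / s0) * (z @ w @ z) / (z @ z)


# Hypothetical layout: 100 areas in a line, each adjacent to its immediate neighbors
n = 100
w_chain = np.eye(n, k=1) + np.eye(n, k=-1)

smooth_resid = np.linspace(-1.0, 1.0, n)                          # neighbors similar
alternating_resid = np.where(np.arange(n) % 2 == 0, 1.0, -1.0)    # neighbors opposite

print(round(morans_i(smooth_resid, w_chain), 3))
print(round(morans_i(alternating_resid, w_chain), 3))
```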

The findings carry practical implications for planners and policymakers working to reduce VMT through land use and transportation investment. Population density, employment density, intersection density, transit frequency, and job accessibility by transit all carry negative coefficients in every regional model, although the transit measures are not statistically significant in every region. This consistency suggests that coordinated strategies targeting multiple built-environment dimensions simultaneously are likely to be more effective than single-variable interventions. Compact, mixed-use development paired with frequent and competitive transit service appears to be the most reliable combination for reducing residential driving across California’s diverse contexts. The regional variation in coefficient magnitudes further implies that investment priorities should be calibrated to local conditions.

The positive association between housing cost burden and VMT in most MPO regions suggests that cost-burdened households in those regions are disproportionately located in less accessible, higher-driving neighborhoods, pointing to a compounding disadvantage in which affordability pressures and auto dependence reinforce one another. This underscores the importance of co-locating affordable housing with transit-rich, walkable environments. However, the association is only marginal in the San Diego region (p<0.1) and is negative and marginal in non-MPO areas, suggesting that the relationship between housing affordability and driving is context-dependent and warrants further investigation.

Acknowledgments

The authors would like to acknowledge Jerry Walters for his foundational contributions to this work and for his encouragement to consider the policy implications of each variable included in the model.
