1. QUESTIONS
Active travel (AT), i.e., walking and cycling, offers individual and community-wide benefits. For example, it fosters physical activity and social interaction and reduces car use (Faulkner et al. 2009; McDonald et al. 2016). While the determinants of active travel are complex, the distance from origin to destination is a significant barrier, especially for younger populations, who are more sensitive to distance because of their shorter strides (Nelson et al. 2008). Despite the importance of distance, there are multiple ways to model it, and little consensus on which method is most accurate.
Many choice models employ distance as a continuous independent variable, assuming a linear relationship between distance and the utility of active travel (Mitra, Buliung, and Roorda 2010), which may oversimplify complex behavioural patterns. Because logit models pass the utility through a logistic function, a linear treatment of distance does not imply a linear decrease in the probability of choosing an active mode. Categorical treatments divide distance into specific bands, such as 0-1 km and 1-2 km, to capture distinct travel behaviour patterns (Mitra and Buliung 2015). While this approach recognises that shorter distances may encourage active travel, sharp category boundaries can lead to misleading interpretations, particularly at the breakpoints, and are insensitive to the impact of distance within the same category. Quadratic approaches, on the other hand, offer flexibility by capturing non-linear effects, but their higher-order terms can complicate the interpretation of results (Subhojit 2021).
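The point about linear utility and non-linear probability can be illustrated with a minimal sketch. The intercept and distance coefficient below are hypothetical values chosen for illustration, not estimates from this study, and `active_probability` is an illustrative helper name.

```python
import math

def active_probability(distance_km, intercept=1.0, beta_distance=-1.2):
    """Binary logit probability of active travel with a linear-in-distance utility.

    The default coefficients are hypothetical, not estimated values.
    """
    utility = intercept + beta_distance * distance_km
    return 1.0 / (1.0 + math.exp(-utility))

# Equal 1 km steps in distance produce unequal drops in probability.
probs = [active_probability(d) for d in (0.0, 1.0, 2.0, 3.0)]
drops = [probs[i] - probs[i + 1] for i in range(len(probs) - 1)]

for d, p in zip((0, 1, 2, 3), probs):
    print(f"{d} km: P(active) = {p:.3f}")
print("drop per extra km:", [round(x, 3) for x in drops])
```

Under these assumed coefficients, the utility falls by the same amount for every additional kilometre, yet the probability drop shrinks as distance grows.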
Finally, there are piecewise approaches, which divide the distance variable into distinct intervals, allowing each interval to have a separate coefficient while maintaining continuity at the breakpoints (Train 2009). By doing so, a piecewise treatment of distance can capture critical behavioural shifts, such as a sharp decline in walking or cycling beyond a certain distance, and may therefore represent distance effects more accurately.
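A piecewise linear treatment can be sketched as follows. This is an illustration under stated assumptions rather than the specification estimated in this study: the breakpoints (1, 3, and 5 km) and the coefficients are hypothetical, as are the helper names `piecewise_basis` and `piecewise_utility`.

```python
def piecewise_basis(distance_km, breakpoints=(1.0, 3.0, 5.0)):
    """Return the length of the trip that falls inside each distance interval.

    The breakpoints are assumed for illustration only.
    """
    edges = (0.0, *breakpoints, float("inf"))
    return [
        max(0.0, min(distance_km, upper) - lower)
        for lower, upper in zip(edges[:-1], edges[1:])
    ]

def piecewise_utility(distance_km, betas, breakpoints=(1.0, 3.0, 5.0)):
    """Distance part of the utility: one slope per interval, continuous at breakpoints."""
    segments = piecewise_basis(distance_km, breakpoints)
    return sum(beta * segment for beta, segment in zip(betas, segments))

# Hypothetical slopes: a steep penalty at short distances that flattens later.
example_betas = (-2.0, -1.0, -0.5, -0.1)
print(piecewise_basis(4.0))                   # km spent in each interval
print(piecewise_utility(4.0, example_betas))  # summed contribution to utility
```

Because each interval contributes only the distance travelled within it, the utility has no jump at a breakpoint; only its slope changes.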
The key question addressed in this paper is whether piecewise treatments address the limitations of linear, quadratic, and categorical approaches in explaining active mode choice. Using a database of 7555 students’ journeys to school from Sydney, Australia, the study compares the different treatments of distance in terms of model fit, predicted active mode probabilities across distance ranges, and distance elasticity estimates.
2. METHODS
The study utilises data from the 2015 School Physical Activity and Nutrition Survey (SPANS), which includes detailed attributes such as age, gender, socioeconomic status (SES), school type, and travel mode choice (as detailed in Hardy et al. 2017). Built-environment measures, including population density (ABS 2016b), land-use mix (ABS 2016a), intersection density (Geofabrik 2018), cycling infrastructure (Open data hub 2023), and traffic calming (Geofabrik 2018), were also appended, based on the school location. Table 1 describes the variables used.
Five binary choice models were developed to assess how distance affects the likelihood of choosing between active versus non-active travel modes. The models vary only in their treatment of distance: no distance variable, linear, quadratic, categorical, and piecewise specifications. Control variables included demographic and built-environment factors.
Model estimation was performed using the Biogeme package (Bierlaire 2009), with performance assessed through McFadden’s R2 and the Akaike Information Criterion (AIC). Elasticity values, representing the percentage change in active mode choice probability for a one per cent increase in distance, were computed and summarised across demographic segments to highlight the implications for different population groups.
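The fit measures and the elasticity can be written out as a short sketch. It uses the standard definitions (McFadden's R2 as one minus the ratio of the model log-likelihood to a null log-likelihood, AIC as twice the number of parameters minus twice the log-likelihood) and the usual binary logit point elasticity. The function names are hypothetical, and the choice of null model is an assumption; this is not necessarily how the values reported here were computed in Biogeme.

```python
def mcfadden_r2(ll_model, ll_null):
    """McFadden's pseudo R-squared from model and null log-likelihoods."""
    return 1.0 - ll_model / ll_null

def aic(ll_model, n_params):
    """Akaike Information Criterion: lower values indicate a better trade-off."""
    return 2 * n_params - 2 * ll_model

def distance_elasticity(distance_km, utility_slope, probability):
    """Point elasticity of the active-mode probability with respect to distance.

    utility_slope is the derivative of utility with respect to distance at
    distance_km (the coefficient itself in a linear specification).
    """
    return utility_slope * distance_km * (1.0 - probability)

# Hypothetical inputs, for illustration only.
print(mcfadden_r2(ll_model=-50.0, ll_null=-100.0))
print(aic(ll_model=-50.0, n_params=3))
print(distance_elasticity(distance_km=2.0, utility_slope=-1.0, probability=0.25))
```

For non-linear specifications the slope varies with distance, which is one reason elasticities can differ markedly between models even when overall fit is similar.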
3. FINDINGS
The piecewise model demonstrates the best overall fit among the five models, as indicated by the highest McFadden’s R2 value (66%) and the lowest AIC score (3128.2), as shown in Table 2. While the linear, quadratic, and categorical models also perform well, their fit metrics are slightly weaker than those of the piecewise model. In the piecewise model, the negative coefficients for each distance range emphasise the sharp decline in active travel likelihood as distance increases (e.g., -2.73 within the first cap). This suggests that the negative relationship between active travel and distance changes substantially in magnitude at certain thresholds. Based on this evidence, the linear specification not only reduces the model’s predictive power but also overlooks the non-linear behaviour. Thus, the linear specification is not recommended for use in this context.
Plots of walking probability versus distance show a discontinuity at the breakpoints of the categorical model, which is counterintuitive (Figure 1). The other models display a more logical trend, albeit with different gradients. The quadratic model, however, shows an unexpected upturn at longer distances, suggesting an increase in walking probability beyond 18 km, which is implausible. This anomaly arises from the squared distance term, which corrects for non-linearity at shorter distances but produces unrealistic results at larger distances. These probability plots suggest that categorical variables and quadratic forms should be avoided, especially when travel distances, including those of non-walking modes, can be large, as is often the case in a city like Sydney.
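The upturn in the quadratic model follows from the shape of a parabola: with a negative linear coefficient and a positive squared coefficient, the utility reaches its minimum at minus the linear coefficient divided by twice the squared coefficient, and rises beyond that distance. The sketch below uses hypothetical coefficients, chosen only so that the turning point falls at 18 km; they are not the estimated values.

```python
def quadratic_turning_point(beta_linear, beta_squared):
    """Distance at which a quadratic distance utility stops decreasing."""
    if beta_squared == 0:
        raise ValueError("no turning point without a squared term")
    return -beta_linear / (2.0 * beta_squared)

def quadratic_utility(distance_km, beta_linear, beta_squared):
    """Distance part of the utility in a quadratic specification."""
    return beta_linear * distance_km + beta_squared * distance_km ** 2

# Hypothetical coefficients chosen for illustration.
beta_linear, beta_squared = -1.8, 0.05
turning_point = quadratic_turning_point(beta_linear, beta_squared)
print(f"utility starts rising beyond {turning_point:.1f} km")
```

Checking the turning point against the range of observed distances is a quick way to see whether a quadratic specification will produce implausible predictions within the data.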
Table 3 compares the mean estimated elasticities across three dimensions: location (Inner Regional, Major Cities, Outer Regional, Remote), SES (low: <800, mid: between 800 and 1000, high: >1000), and distance bands (0-1 km, 1-3 km, 3-5 km, >5 km). The average distance elasticity is consistently about -2% across the quadratic, categorical, and piecewise models, whereas the linear model yields -4.6%. This discrepancy is more pronounced for certain subgroups; for example, for low-SES students in Inner Regional areas travelling 3-5 km, the linear, quadratic, categorical, and piecewise models yield elasticities of -5%, -4%, 0%, and -2%, respectively. Such large differences are concerning, particularly when elasticity values are used for policy assessments, including equity-based interventions targeting specific demographic groups. Additionally, piecewise elasticities tend to be more stable, generally ranging from -1% to -3%, whereas the categorical model exhibits a broader range, from -0.17% to -36%. Quadratic elasticities occasionally take positive values, which is counterintuitive but consistent with the earlier observations from the probability versus distance plots.
The influence of the built environment on active transport is complex. While these findings are limited to a specific built and cultural context, they do underscore the importance of model specification in understanding and promoting active travel. Future research is required to test these findings in other contexts and for other modes. Nevertheless, the results indicate that piecewise models offer a more accurate reflection of distance effects, supporting their use in research and in the translation of findings into policy.
ACKNOWLEDGMENTS
This study used the SPANS dataset, funded by the NSW Ministry of Health. We also acknowledge the SPANS data collection team and are particularly thankful to Dr Louise Hardy for her assistance with this work. Finally, we acknowledge the students, parents, schools and staff who participated.
This research is supported by an Australian Government Research Training Program (RTP) Scholarship and the Australian Research Council (DE190100211).