Findings
ISSN 2652-8800
Transport Findings
October 17, 2025 AEST

Uncertainty in Cell-Phone Generated Bike and Pedestrian Volumes

Lily Heidger, Trisalyn Nelson, Ph.D., Dan Willet
Keywords: big data, validation, active transportation, GPS data, cell-phone data
CC BY-SA 4.0 • https://doi.org/10.32866/001c.145234
Heidger, Lily, Trisalyn Nelson, and Dan Willet. 2025. “Uncertainty in Cell-Phone Generated Bike and Pedestrian Volumes.” Findings, October. https://doi.org/10.32866/001c.145234.

Abstract

Big data from mobile phones are increasingly used in transport research and planning, offering unprecedented spatial and temporal detail. However, data accuracy remains unclear. This study evaluates Replica, a dataset modeled from mobile phone GPS data, by comparing modeled volumes for motor vehicles, bicycles, and pedestrians against field counts in Santa Barbara, California. Car volumes were modeled with high accuracy (R² = 0.92), while bicycle (R² = 0.23) and pedestrian (R² = 0.05) estimates showed substantial uncertainty. When using transport data generated from mobile phone GPS, additional caution is needed for non-motorized modes.

1. Questions

Big mobility data, often generated from cell phones, are becoming increasingly popular in transportation research. Cell phone-generated mobility data provide unprecedented spatial and temporal resolution and extent at a fraction of the cost of traditional data collection. These datasets are typically generated as synthetic mobility records derived from multiple sources, trained on mobile location data, and validated against ground-truth observations for multi-state regions (Replica 2025). However, further validation is necessary before the data are used for applied science or local decision-making. Our study asks: how well do modeled mobile phone tracking data, specifically Replica data, represent actual bike, pedestrian, and car volumes in Santa Barbara, California?

2. Methods

According to Replica, their transportation data are modeled from a variety of original data sources, such as mobile location data, demographic data, built environment data, economic activity data, and ground truth data from local municipalities. The scale of these models is “megaregions,” which encompass multiple states and 10 to 50 million residents, depending on the region (Massey, n.d.). Transportation data are modeled as individual trips, with each trip specifying the street network links traversed by a synthetic user. Trips are generated to represent an average weekday or weekend within a given season, such as a typical Thursday in the spring of 2021. Replica data do not account for recreational trips, in which the start and end points of the trip suggest biking or walking for exercise (Replica 2025).

We accessed modeled transportation volumes from ReplicaHQ.com and evaluated them against validation data provided by the City of Santa Barbara. Video counter volume data, collected at multiple intersections and analyzed manually, were cleaned and matched to Replica’s network links. For each time frame, Replica trips passing through intersections with video count data were totaled, and observations were aligned by day of the week, season, and year for comparability. In total, 33 sites across the city were matched, representing a range of car, bicycle, and pedestrian volumes.
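The matching and totaling step described above can be illustrated with a minimal sketch. It is not the authors' workflow: the record layout (`links`, `mode`, `day_type`, `season`, `year`), the `site_links` mapping, and the rule that a trip counts once per site are all assumptions made for illustration.

```python
from collections import Counter

def total_trips_by_site(trips, site_links):
    """Count modeled trips per (site, mode, day_type, season, year).

    trips: iterable of dicts with hypothetical keys
           'links', 'mode', 'day_type', 'season', 'year'.
    site_links: dict mapping a count-site id to the set of network
                link ids matched to that intersection (hypothetical).
    """
    # Invert the mapping so each link id points at its count site.
    link_to_site = {
        link: site for site, links in site_links.items() for link in links
    }
    totals = Counter()
    for trip in trips:
        # Assumption: a trip is counted once per site, even if it
        # traverses several links matched to the same intersection.
        sites_hit = {link_to_site[l] for l in trip["links"] if l in link_to_site}
        for site in sites_hit:
            key = (site, trip["mode"], trip["day_type"], trip["season"], trip["year"])
            totals[key] += 1
    return totals

# Hypothetical example data, not taken from the study.
example_trips = [
    {"links": ["a", "b"], "mode": "bike", "day_type": "weekday", "season": "spring", "year": 2021},
    {"links": ["b", "c"], "mode": "bike", "day_type": "weekday", "season": "spring", "year": 2021},
    {"links": ["x"], "mode": "car", "day_type": "weekend", "season": "fall", "year": 2021},
]
example_sites = {"S1": {"a", "b"}, "S2": {"c"}}
example_totals = total_trips_by_site(example_trips, example_sites)
```

Keying the totals by day type, season, and year keeps each modeled total comparable only with video counts from the matching time frame.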

To compare Replica counts to actual counts, we ran a linear regression for each of the modes of transportation. We estimated the R² value for each analysis and compared the results among transportation types.
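As a rough illustration of this comparison, the sketch below fits an ordinary least-squares line and computes R² in plain Python. The function name and the example numbers are hypothetical, and the choice of observed counts as the predictor is an assumption; for a simple one-predictor regression, R² is the same whichever variable is treated as the predictor.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("x and y must each vary across sites")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return slope, intercept, 1 - ss_res / ss_tot

# Hypothetical per-site volumes, not data from the study.
observed = [120, 340, 560, 800, 1500]
modeled = [90, 300, 610, 760, 1650]
slope, intercept, r_squared = linear_fit(observed, modeled)
```

Running one such fit per mode (car, bike, pedestrian) gives the three R² values that the comparison relies on.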

3. Findings

Modeled car volumes were accurate, with an R² value of 0.94. Figure 1 shows the results of the linear regression analysis. As shown in the plot, a majority of the car count sites were underestimated by Replica’s model, particularly those with lower volumes. Sites with the highest volumes were more likely to be overestimated by the model.

Figure 1. Linear regression of Replica car volumes and observed video count volumes.

Bike volumes had a much lower coefficient of determination than vehicles, with an R² of 0.23. Figure 2 depicts a much larger confidence interval, illustrating a higher chance of error. For bikes, Replica’s model over- and underestimates volumes at similar rates, with slightly more than half of the sites being underestimated. The variability appears random: the model over- and underpredicts regardless of site volume. The two sites with the largest residuals are overestimated by nearly an order of magnitude. While we do not show the results here, we spent significant time trying to identify patterns that could explain the differences in prediction, but the error always appeared random.

Figure 2. Linear regression of Replica bike volumes and observed video count bike volumes.

Among the three transportation modes, pedestrian volumes were modeled with the least accuracy, yielding an R² of 0.05 and wide confidence intervals (Figure 3). As with bicycle volumes, approximately half of the sites were overestimated and half underestimated, with no clear trend in model performance across volume levels. However, the magnitude of error was substantially greater, with the most severely overestimated site exhibiting observed volumes nearly eight times lower than the modeled estimate.

Figure 3. Linear regression of Replica pedestrian volumes and observed video count pedestrian volumes.
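The over- and underestimation summaries reported for bikes and pedestrians can be computed along the following lines. This is a sketch under stated assumptions: the function name and the example volumes are hypothetical, and the ratio of modeled to observed volume is used here as one plausible reading of "times lower."

```python
def summarize_errors(observed, modeled):
    """Share of sites over- and underestimated, plus the largest
    modeled-to-observed ratio among sites with nonzero observed counts."""
    pairs = list(zip(observed, modeled))
    n = len(pairs)
    over = sum(1 for o, m in pairs if m > o)
    under = sum(1 for o, m in pairs if m < o)
    # Ratios are undefined where nothing was observed, so skip those sites.
    ratios = [m / o for o, m in pairs if o > 0]
    return {
        "share_over": over / n,
        "share_under": under / n,
        "max_over_ratio": max(ratios),
    }

# Hypothetical volumes for four sites, not data from the study.
summary = summarize_errors([100, 50, 10, 40], [80, 60, 80, 20])
# Here two sites are overestimated, two are underestimated, and the
# worst overestimate is 80 modeled trips against 10 observed (ratio 8.0).
```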

These findings highlight the critical need for improved data on bicycling and walking. Modeled car volumes likely achieve higher accuracy because a greater proportion of the cell phone data are generated through vehicle travel (Massey, n.d.). Bicycling and walking are more difficult to isolate based on space-time patterns (Lee and Sener 2020). Additionally, Replica does not model multi-modal transit trips, meaning that trips in which individuals both take the bus and walk to reach their destination are assigned to a single mode (Replica 2025). This may lead to an underrepresentation of biking and walking trips near transit stops. Investment in bicycle and pedestrian count programs could improve training data and facilitate more accurate model development. In the meantime, alternative data sources will be necessary for applications such as exposure estimation. This issue is likely not specific to Replica and probably extends to other mobile location datasets, as the task of identifying different modes is challenging (Lee and Sener 2017, 2020). While mobile location data with high spatial and temporal resolution are appealing, relying on them without sufficient validation may mislead decision-making, particularly in small and mid-sized cities such as Santa Barbara.

Submitted: September 10, 2025 AEST

Accepted: October 07, 2025 AEST

References

Lee, K., and I. N. Sener. 2017. “Emerging Data Mining for Pedestrian and Bicyclist Monitoring: A Literature Review Report.” Safety through Disruption (Safe-D) National University Transportation Center (UTC) Program.
Lee, K., and I. N. Sener. 2020. “Emerging Data for Pedestrian and Bicycle Monitoring: Sources and Applications.” Transportation Research Interdisciplinary Perspectives 4:100095. https://doi.org/10.1016/j.trip.2020.100095.
Massey, Lauren. n.d. “Data Quality Overview and Technical Approach.” Replica Help. Accessed August 18, 2025. http://help.replicahq.com/en/articles/6214243-data-quality-overview-and-technical-approach.
Replica. 2025. “Active Transportation Trips.” https://documentation.replicahq.com/docs/active-transportation.
