In late 2005, the Tri-County Metropolitan Transportation District of Oregon (TriMet) partnered with Google to develop a data standard for incorporating transit route and schedule information into third-party navigation applications (McHugh 2013). The following year, this data standard was published as the Google Transit Feed Specification (GTFS); it retained the acronym when it was subsequently renamed the General Transit Feed Specification. GTFS has transformed the way travelers plan trips, and its success has relied on the willingness of transit operators to adopt it.
What might explain a transit agency’s decision to adopt the GTFS standard earlier than its peers? Frick, Kumar, and Post (2020) found that small transit agencies (those with reduced reporting requirements from the Federal Transit Administration) and rural transit agencies were less likely to have published GTFS feeds. They also found that independent public transit authorities were more likely than other organization types (such as transit agencies organized within local government units) to publish real-time vehicle locations in the more recent GTFS-realtime data format. This may suggest that independent authorities are more open to early technology adoption.
If transit agency characteristics are associated with earlier technology adoption, findings by Iseki et al. (2007) on early adoption of smart card fare collection may be informative. They found that early adopters tended to be those with greater funding availability and those with established relationships with other transit agencies. Rogers’ (2003) review of four decades of innovation diffusion literature highlights several characteristics of organizations that correlate with being early adopters of new technologies, including larger size, social interconnectedness, and organizational slack (i.e., additional resources beyond what is required to deliver a firm’s core product).
Based on the above background, we developed a set of hypotheses about how agency characteristics relate to the likelihood that a transit agency was an early adopter of the GTFS data standard. These are listed in Table 1.
We identified 471 transit agencies from the National Transit Database that carried passengers on scheduled service in 2007 (shortly after the GTFS standard was published in September 2006). For each agency, we identified the earliest GTFS feed available from any of three online archives: GTFS Data Exchange (Czebotar 2016), OpenMobilityData (Mobility Data IO 2021), and Transitland (Interline Technologies 2022). We used the earliest date in the earliest publicly available feed as each transit agency’s date of GTFS adoption.
We structured the dataset so that each observation represents a specific agency in a specific year. We compiled a set of variables (listed with their sources in Table 1) for each agency for each year from 2007 to the earlier of (1) 2020 or (2) the year in which the agency adopted the GTFS data standard. Each agency only appears in the dataset for years in which it could have adopted the standard, and not in years when it had already adopted it.
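The agency-year structure described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function name, the tuple layout, and the example agencies are hypothetical, and it assumes that the adoption year itself is retained as the final observation for an adopting agency.

```python
def build_agency_years(adoption_year_by_agency, first_year=2007, last_year=2020):
    """Return one (agency, year, adopted) row per year an agency could adopt.

    adoption_year_by_agency maps an agency name to its adoption year,
    or to None if the agency had not adopted by the end of the study.
    """
    rows = []
    for agency, adoption_year in adoption_year_by_agency.items():
        # Agencies that never adopted stay in the dataset through last_year.
        if adoption_year is None:
            final_year = last_year
        else:
            final_year = min(adoption_year, last_year)
        for year in range(first_year, final_year + 1):
            # The outcome is 1 only in the year of adoption (assumption).
            rows.append((agency, year, int(year == adoption_year)))
    return rows


# Hypothetical agencies, not drawn from the study data.
example = {"Agency A": 2009, "Agency B": None, "Agency C": 2007}
panel = build_agency_years(example)
for row in panel[:4]:
    print(row)
```

In this sketch, an agency that adopted in 2009 contributes three rows (2007 through 2009), while an agency that never adopted contributes fourteen rows (2007 through 2020).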
The resulting dataset included 3,514 observations for 471 agencies. Table 2 summarizes the variables of interest for 2007 (the year for which all 471 agencies are included in the analysis).
Figure 1 illustrates the correlations among all pairs of continuous independent variables. GTFS market penetration is almost perfectly correlated with the number of years since the publication of the standard. Service area population is highly correlated with the number of transit agencies in a region (r = 0.75) and with the share of a region’s total transit service an agency represents (r = -0.72). The number of transit agencies in a region is also highly correlated with the share of a region’s total transit service an agency represents (r = -0.59).
We estimated three logistic regression models predicting the likelihood that an agency adopted the GTFS data standard in a given year, with standard errors clustered by agency. We estimated a null model with no predictors as a basis for comparison with the model fit statistics of the other two models, a full model including all variables listed in Table 1, and a final model that reduces multicollinearity by excluding population, number of transit agencies within a region, annual vehicle revenue distance, and years since GTFS publication.
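To illustrate the role of the null model, the sketch below computes the intercept and log-likelihood of an intercept-only logistic regression directly from the share of agency-years in which adoption occurred. The outcome vector is hypothetical, and the use of AIC as the comparison statistic is an assumption made for illustration; the fit statistics actually reported are those in Table 3. Clustering by agency changes the standard errors but not these point estimates.

```python
import math


def null_model(outcomes):
    """Intercept and log-likelihood of an intercept-only logistic model.

    outcomes is a list of 0/1 values containing at least one 0 and one 1.
    """
    n = len(outcomes)
    events = sum(outcomes)
    p = events / n  # maximum-likelihood estimate of P(adoption in a year)
    intercept = math.log(p / (1 - p))  # log-odds of adoption
    log_likelihood = events * math.log(p) + (n - events) * math.log(1 - p)
    return intercept, log_likelihood


def aic(log_likelihood, n_parameters):
    """Akaike information criterion; lower values indicate better fit."""
    return 2 * n_parameters - 2 * log_likelihood


# Hypothetical agency-year outcomes (1 = adopted in that year).
outcomes = [0, 0, 0, 1, 0, 0, 0, 0, 0, 1]
intercept, ll_null = null_model(outcomes)
print(round(intercept, 3), round(ll_null, 3), round(aic(ll_null, 1), 3))
```

A model with predictors would be judged against this baseline: it is an improvement only to the extent that its log-likelihood rises enough to offset the added parameters.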
Figure 2 shows the increase in the market penetration of GTFS in the United States from January 2005 to June 2022. GTFS’s market penetration was zero until TriMet began using a version of GTFS in 2005. Five additional agencies piloted the standard and adopted it simultaneously with its publication in September 2006 (McHugh 2013). These initial six agencies are not included in our regression analysis, which only includes agencies that had not yet adopted GTFS when it was initially published publicly. After its publication in September 2006, the first agency to adopt GTFS was the San Francisco Bay Area Rapid Transit District (BART) in January 2007. By June 2022, just over 75 percent of transit agencies with scheduled transit service had adopted GTFS.
Table 3 shows the results of all three logistic regression models. All continuous variables are mean-centered and scaled to have a standard deviation of one. Coefficient estimates represent the predicted change in the log-odds of adoption associated with a one-standard-deviation difference in the predictor variable.
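The interpretation of standardized coefficients can be made concrete with a short sketch. The ridership values, the coefficient of 0.5, and the 10 percent baseline probability below are hypothetical and are not taken from Table 3; the sketch also assumes the sample standard deviation was used for scaling.

```python
import math
import statistics


def standardize(values):
    """Mean-center values and scale them to a standard deviation of one."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumption)
    return [(v - mean) / sd for v in values]


def shifted_probability(baseline_probability, coefficient, sd_change=1.0):
    """Probability after a predictor moves by sd_change standard deviations."""
    log_odds = math.log(baseline_probability / (1 - baseline_probability))
    log_odds += coefficient * sd_change  # coefficients add on the log-odds scale
    return 1 / (1 + math.exp(-log_odds))


# Hypothetical predictor values and coefficient.
ridership = [2.0, 4.0, 6.0, 8.0]
z = standardize(ridership)
odds_ratio = math.exp(0.5)  # multiplicative change in odds per standard deviation
p = shifted_probability(0.10, 0.5)
print([round(v, 3) for v in z], round(odds_ratio, 3), round(p, 3))
```

Because the coefficient acts on the log-odds, the same one-standard-deviation difference produces a different change in probability depending on the baseline probability.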
The final model fits the data approximately as well as the full model, but is more interpretable with the removal of highly correlated variables. The final model predicts that an agency will have adopted GTFS sooner if it serves more passenger trips and provides a smaller share of the total vehicle revenue distance in its service area. Figure 3 illustrates how the predicted probability of GTFS adoption in a given year would vary by annual ridership and an agency’s share of the total vehicle revenue distance in its urbanized area, based on the final model.
This research was funded by the Laboratory for Design Technology at the Harvard Graduate School of Design. The authors would like to thank Mengyao Li for her assistance in assembling data from the National Transit Database.
The earliest GTFS feed for the City of Fairfax CUE bus listed a service date range from 2000 to 2099. Rather than list 2000 as the date of GTFS adoption, we used the date the feed was uploaded to the GTFS Data Exchange.