## 1. Questions

Major airports devote significant resources to mitigating congestion (Harris, Nourinejad, and Roorda 2017). Their interventions lie on a spectrum of time-scales: bespoke responses to individual incidents (e.g., dispatching operations personnel) at one end and long-term strategic initiatives (e.g., building new infrastructure) at the other (Ugirumurera et al. 2021). Between the two lies a range of tactical decisions that may rely on both proactive and reactive actions (e.g., constructing a permanent digital signboard that displays messages in response to real-time changes). We focus on this category of interventions, which currently tend to be based largely on heuristics and domain intuition. Quantifying the effect of such interventions in a numerically robust, reproducible, and interpretable manner is a crucial step towards automated decision-making that can optimize for the objectives that airports care about.

We wish to measure the effectiveness of variable message signage at diverting traffic between arrivals and departures using historical vehicle speed and flow (volume per unit time) and sign data obtained from Seattle-Tacoma International Airport (SEA). Measuring this effectiveness will help implement automated traffic diversion control that optimizes for congestion, sojourn, and/or emissions. A time delay exists between interventions, i.e., traffic diversion messages, and measurable responses, i.e., vehicle flow entering the roadways. Thus, while we can distinguish between congested and uncongested regimes, naively analyzing the data does not indicate whether the system responds. Since we lack counterfactual data, we need to make prior assumptions about 1) the causal relationship between the timing of the sign and the response rate and 2) the distribution of the effectiveness of the signage at diverting drivers.

## 2. Methods

The difficulty of our setting stems largely from the available data: vehicle flow and median speed over 4 months, binned to 15-minute intervals, and time-stamped records of messages that request traffic to divert. *We observe no per-vehicle information whatsoever*. We neither know the intended and eventual destinations of incoming vehicles nor can we estimate them. We also cannot run a controlled experiment and must work only with historical observational data.

We frame our problem as estimating the effect of a multi-timestep intervention on a time-varying system. We first quantify the statistical significance of the treatment impact observed on three traffic-related metrics during the intervention period using a two-tailed independent t-test. The first two are the vehicle flow and median speed. We also use a third metric that combines speed and flow as a single measure of congestion. This metric, the so-called *critical ratio*, is the ratio of median speed to the critical speed, which is the speed threshold (given the current flow) below which the system is likely congested (Kerner 2009). The further the critical ratio falls below 1, the more congested the system (details omitted for brevity). We posit a null hypothesis of no change in the mean of these metrics between the periods before and during the intervention, i.e., the time bins in which the diversionary message remains active. A t-statistic outside the 95% confidence region (p-value less than 0.05) is evidence of a causal relationship but does not alone yield a function that maps the current system state to an expected number of drivers who will divert if diversion is signaled. Therefore, we additionally assume a plausible causal model where the treatment is the diversion and the outcome is a function of the difference in incoming flow between Departures and Arrivals. Finally, we use Bayesian regression to estimate the average treatment effect, both overall and when controlling for hour-of-day.
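As a minimal sketch of the significance test described above, the following compares a metric before and during an intervention. The speeds are simulated purely for illustration and are not SEA data. The p-value uses a large-sample normal approximation so that the sketch needs only the standard library; a library routine such as `scipy.stats.ttest_ind` would give the exact value from the t distribution.

```python
import math
import random
from statistics import NormalDist, mean, variance


def pooled_t_statistic(before, during):
    """Student (pooled-variance) t-statistic for two independent samples."""
    n1, n2 = len(before), len(during)
    pooled_var = ((n1 - 1) * variance(before) + (n2 - 1) * variance(during)) / (n1 + n2 - 2)
    t = (mean(during) - mean(before)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2  # statistic and degrees of freedom


def two_tailed_p_value(t):
    """Two-tailed p-value, treating t as standard normal (large-sample assumption)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


# Simulated median speeds per 15-minute bin (arbitrary units, illustration only).
rng = random.Random(0)
speed_before = [rng.gauss(13.0, 2.0) for _ in range(200)]
speed_during = [rng.gauss(18.0, 2.0) for _ in range(200)]

t_stat, dof = pooled_t_statistic(speed_before, speed_during)
p_value = two_tailed_p_value(t_stat)
reject_null = p_value < 0.05  # reject "no change in mean" at the 5% level
```

The same two functions apply unchanged to flow or to the critical ratio, since the test only needs the per-bin values in each period.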

### 2.1. Estimating Treatment Effects

The following is a Bayesian Linear Regression model for estimating time-varying treatment effects:

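$$
\text{Outcome}_t \sim \text{Normal}\left(\alpha + \beta \cdot \text{Outcome}_{t-1,\ldots} + \gamma \cdot \text{Treatment}_{t,t-1,\ldots},\ \sigma\right), \qquad \alpha, \beta, \gamma, \sigma \sim \text{Priors}, \tag{1}
$$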

where priors encode domain knowledge (McElreath 2020). We have two kinds of treatments, for diverting from Departures to Arrivals (denoted TD) and for the opposite (denoted TA). At most one treatment can be active at a given time. We also rule out terms beyond time $t-1$: flow and speed are binned to 15-minute intervals, so any driver making a choice in the current time-step would have seen the message no earlier than the previous time-step.
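Because the message records are time-stamped while flow and speed are binned, the two have to be aligned before any regression. The sketch below assumes each treatment regressor is the fraction of a 15-minute bin during which the message was active. The function name `active_fraction` and the example record are hypothetical.

```python
from datetime import datetime, timedelta

BIN = timedelta(minutes=15)


def active_fraction(bin_start, intervals):
    """Fraction of the bin starting at bin_start covered by message intervals.

    intervals: list of (start, end) datetimes, assumed not to overlap each other.
    """
    bin_end = bin_start + BIN
    covered = timedelta(0)
    for start, end in intervals:
        overlap = min(end, bin_end) - max(start, bin_start)
        if overlap > timedelta(0):
            covered += overlap
    return covered / BIN  # timedelta / timedelta gives a float


# Hypothetical record: one message active from 05:10 to 05:40.
messages = [(datetime(2023, 1, 1, 5, 10), datetime(2023, 1, 1, 5, 40))]
bins = [datetime(2023, 1, 1, 5, 0) + i * BIN for i in range(4)]
fractions = [active_fraction(b, messages) for b in bins]
```

Here the four bins starting at 05:00 receive fractions of one third, one, two thirds, and zero, so a partially covered bin contributes proportionally rather than as a full treatment.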

Treatments attempt to redirect traffic between roadways; thus, our outcome should depend on the difference in flow between Departures and Arrivals, denoted $\Delta q$. Using $\Delta q$ itself as the outcome, however, *might erroneously conclude that treating Departures sends more traffic towards it*, which is highly implausible.

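Instead, we posit that our outcome variable should be the rate of change of the flow difference $\Delta q$, denoted $\Delta q'$. Our regression model is then:

$$
\Delta q'_t \sim \text{Normal}\left(\alpha + \beta \cdot \Delta q'_{t-1} + \gamma_D \cdot \text{TD}_{t,t-1} + \gamma_A \cdot \text{TA}_{t,t-1},\ \sigma\right). \tag{2}
$$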
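To make the outcome variable concrete, the sketch below builds it from binned flows. It assumes the flow difference is Departures minus Arrivals and that the rate of change is the first difference between consecutive 15-minute bins. The flow values are hypothetical.

```python
def first_difference(series):
    """Change between consecutive bins: x[t] - x[t-1]."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Hypothetical flows (vehicles per 15-minute bin).
q_departures = [100, 120, 150, 140, 130]
q_arrivals = [90, 95, 100, 110, 115]

# Flow difference per bin, then its rate of change.
delta_q = [d - a for d, a in zip(q_departures, q_arrivals)]
delta_q_rate = first_difference(delta_q)

# (outcome at t, outcome at t-1) pairs for the autoregressive term.
outcome_pairs = list(zip(delta_q_rate[1:], delta_q_rate[:-1]))
```

Each differencing step shortens the series by one bin, so any treatment regressor has to be trimmed to the same bins before fitting.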

In this model, the posterior distributions of $\gamma_D$ and $\gamma_A$ encode the average estimated effect of the respective treatments, in the current and previous time bin, on the rate of change of $\Delta q$. If the treatments work as intended, we expect $\gamma_D < 0$ and $\gamma_A > 0$ (diverting from Departures to Arrivals would reduce the rate of change of the difference between Departures and Arrivals flow, and vice versa) and $\beta \approx 0$ (treatments should explain most of the variation when active, not the outcome at the previous time-step). If so, then $|\gamma_D|$ and $|\gamma_A|$ are our estimates of the average number of vehicles that respond to the treatment, i.e., that divert from one facility to the other when the corresponding message is active. Since we do not have counterfactual information from controlled experiments, these regression co-efficients are our best available estimate of the treatment effects.

Our hypothesis about the first-order rate of change is based on intuition, not a physical scientific model. Therefore, we run another regression with the second-order rate of change $\Delta q''$ to determine if treatments affect second or higher orders. For this regression, we only care about the significance of treatments, i.e., their relative rather than absolute co-efficients. Thus, we standardize the outcome such that $\Delta q''$ has mean 0 and standard deviation 1 (Gelman et al. 1995). All experiments use the PyMC Python library (Patil, Huard, and Fonnesbeck 2010).

For the outcome variables, we have data for every 15-minute interval for a little over 4 months. For the treatments, we have precise time-stamped records of when each message was displayed. Both kinds of treatments vary widely in duration and time of deployment. Most instances last between minutes and hours, but a handful of both kinds last for longer. All Departures treatments are deployed starting between 5 am and 2 pm, and most Arrivals treatments between 8 pm and midnight. Since we know when each treatment starts and ends, there are some 15-minute intervals where treatments are only partially active; we control for this by encoding TA/TD as the corresponding fraction of 15 minutes, i.e., as a fraction rather than a binary indicator.

## 3. Findings

### 3.1. Statistical Significance Test

Table 1 summarizes the results of the t-test. The observed effect appears to be consistent across the variables of interest. The changes in median speeds in response to TA and TD are statistically significant, with a low p-value, indicating sufficient evidence to reject the null hypothesis. The departure flow remains unchanged and the arrival flow appears to decrease when the treatment is in effect, which implies some measurable adherence to TD and TA. Furthermore, the confidence intervals (CI) of the difference in means for the critical ratio metric for departures and arrivals demonstrate that overall congestion improves significantly in departures and marginally in arrivals. This analysis also uncovered some missed opportunities, as noted in Figure 2. Despite the positive effect of the treatment on the overall system state, the intervention comes too late, i.e., after the system is already congested. A more proactive, predictive control strategy could decrease the time spent in a congested state, leading to lower overall travel time and many other environmental and logistical benefits.

### 3.2. Quantifying Response Rates

Bayesian Linear Regression yields a posterior distribution for each parameter. We focus on the autoregressive and treatment co-efficients, i.e., $\beta$, $\gamma_D$, and $\gamma_A$. Figure 3 illustrates posterior distributions for each of these for the first-order (solid) and standardized second-order (dashed) outcomes; we plot the distribution of $-\gamma_D$ to easily interpret it as the positive number of vehicles diverted. The first-order auto-regressive co-efficient is close to 0, while the treatments are not, which suggests that the treatments explain most of the variation. The magnitude of the second-order auto-regressive co-efficient is much higher than either treatment co-efficient, which suggests that the treatment has no significant effect on higher-order rates-of-change.
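To illustrate how a posterior over the autoregressive and treatment co-efficients arises, the sketch below fits the regression on simulated data. It is not the PyMC model used for the results: it swaps in the closed-form Gaussian posterior that holds when the noise scale is assumed known and every co-efficient has an independent zero-mean Normal prior. The prior width, the simulated co-efficients, and the use of only the current-bin treatment fraction are all assumptions made for illustration.

```python
import numpy as np


def gaussian_posterior(X, y, prior_sd=10.0, noise_sd=5.0):
    """Posterior mean and covariance of the co-efficients.

    Assumes co-efficients ~ Normal(0, prior_sd) and a known noise_sd.
    """
    precision = np.eye(X.shape[1]) / prior_sd**2 + X.T @ X / noise_sd**2
    cov = np.linalg.inv(precision)
    mean = cov @ (X.T @ y) / noise_sd**2
    return mean, cov


rng = np.random.default_rng(0)
n = 2000

# Simulated treatment fractions: one block of each kind, never both at once.
td = np.zeros(n)
ta = np.zeros(n)
td[200:400] = 1.0
ta[1200:1400] = 1.0

# Simulated outcome with arbitrarily chosen "true" co-efficients.
alpha, beta, gamma_d, gamma_a, sigma = 0.0, 0.1, -10.0, 5.0, 5.0
y = np.zeros(n)
for t in range(1, n):
    y[t] = alpha + beta * y[t - 1] + gamma_d * td[t] + gamma_a * ta[t] + rng.normal(0.0, sigma)

# Design matrix columns: intercept, lagged outcome, TD, TA.
X = np.column_stack([np.ones(n - 1), y[:-1], td[1:], ta[1:]])
post_mean, post_cov = gaussian_posterior(X, y[1:])
post_sd = np.sqrt(np.diag(post_cov))
```

With enough treated bins, `post_mean` recovers the simulated co-efficients closely and `post_sd` gives the width of each posterior, which is the quantity a plot of posterior distributions conveys. Negating the Departures co-efficient, as done for Figure 3, makes it readable as a positive number of diverted vehicles.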

Figure 3. Posterior distributions of $\beta$, $\gamma_D$, and $\gamma_A$ from Equation 2. Our two regressions used as outcomes the first-order rate-of-change of flow and the standardized second-order rate-of-change of flow, respectively.

*The first-order co-efficients from Figure 3 suggest that on average, roughly 12 cars per 15 minutes respond to Departures treatments and 4 cars per 15 minutes respond to Arrivals treatments*. Since traffic varies by hour-of-day, we expect the response to vary as well. Thus, we reran the regressions while controlling for hour-of-day. Table 2 reports the average hourly response to each treatment as a percentage of average hourly traffic, for three selected hours-of-day (roughly corresponding to AM/PM peaks). We observe considerable variation by hour, as well as a clearer picture of how the response to one treatment is higher than the other. Note that response rates can change depending on several factors, such as the location of the VMS board; future work will further investigate the factors that could affect the response rates.

### Acknowledgements

Pacific Northwest National Laboratory is operated by Battelle Memorial Institute for the U.S. Department of Energy under Contract No. DE-AC05-76RL01830. This work was supported by the U.S. Department of Energy Vehicle Technologies Office. We also thank the landside operations team at Seattle-Tacoma International Airport for sharing data and insights.