Network Origin-Destination Estimation using Percolation

David Levinson

doi:10.32866/001c.147387

1. Questions

Most trip distribution models allocate flows across all origin–destination pairs using a global rule (e.g., gravity/entropy (Erlander and Stewart 1990; Wilson 1970) or discrete choice (Domencich and McFadden 1975; Ben-Akiva and Lerman 1985)), or via sequential acceptance without explicit destination capacity (Intervening Opportunities (Stouffer 1940), or Radiation (Simini et al. 2012)), or using hiring dynamics (Tilahun and Levinson 2013). These approaches usually treat the network implicitly. A broad comparison with gravity/entropy, discrete choice, intervening opportunities, radiation, and ABODE appears in Table 1.

Table 1.Comparison of trip distribution approaches

Feature	Gravity/ Entropy	Utility-Based	Intervening Opportunities	Radiation	ABODE (2013)	NODE
Network representation	Costs in impedance	Costs/ attributes in utility	Often network-free or distance rank	Often network-free, opportunity rank	Travel time in utility, network implicit	Shortest-path costs, cost-ordered search
Destination Capacity constraints	Optional via shadow prices/ doubly-constrained	Optional via shadow prices	Implicit via cumulative opportunities	Implicit via intervening population/ jobs	Explicit vacancies filled by matches	Explicitly enforced
Functional form	Global impedance parameter	Estimated utility function	Cumulative acceptance	Deterministic by opportunities/ totals	Agent rules with utility/value	None; optional acceptance
Decision process	All OD pairs simultaneous	All OD pairs simultaneous	Sequential by distance/ opportunities	Sequential by opportunities	Sequential search, offers, acceptance; iterative	Sequential by increasing cost
Data requirements	Totals, impedance	Totals, impedance, attributes	Totals, ranked opportunities	Totals, opportunities by distance	Workers, jobs, wages/skills, network costs	Totals, network costs
Behavioural richness	Low	High	Medium (opportunity-based)	Low to medium (structural)	High	Medium (search, competition)
Computation	Low	Moderate	Low to moderate	Low	High	Low to moderate

In contrast, we examine a cost-ordered, percolation-like expansion that depletes destination capacity as it is reached. With shortest-path costs $c_{od}$ , the global benchmark is the Hitchcock–Koopmans transport problem (a min-cost flow on the OD bipartite graph) (Hitchcock 1941; Koopmans 1949; Dantzig 1951; Burkard, Klinz, and Rudolf 1996).

Our main question is:

Can we construct an origin–destination matrix that respects destination opportunity constraints using only shortest-path costs and a fixed, reproducible order, without estimating behaviour?

Under what conditions is this approach like or unlike existing approaches? For instance, under a one-parameter acceptance rule with scaling factor, do expected flows reduce to a singly-constrained exponential-gravity form when no destination reaches its capacity? Similarly, under what cost/topology conditions does the greedy, cost-ordered allocation coincide with, or diverge from, a min-cost-flow solution?

2. Methods

The Network Origin–Destination Estimation (NODE) method allocates workers $W_o$ from origins $o\in O$ to jobs $J_d$ at destinations $d\in D$ by sweeping feasible origin–destination pairs in non-decreasing (weakly increasing) shortest-path cost $c_{od}$ , assigning until either the origin’s residual production or the destination’s residual capacity is exhausted.

Objects and data types. Let $W\in\mathbb{Z}_{\ge 0}^{|O|}$ , $J\in\mathbb{Z}_{\ge 0}^{|D|}$ , $X\in\mathbb{Z}_{\ge 0}^{|O|\times |D|}$ , and $C\in(\mathbb{R}_{\ge 0}\cup\{\infty\})^{|O|\times |D|}$ , where $c_{od}=\infty$ if $d$ is unreachable from $o$ on network $G$ . Residuals are $w_o$ and $j_d$ . Ties in $c_{od}$ are broken by a fixed lexicographic order on $(c_{od},o,d)$ for reproducibility. Destination capacity is explicit and enforced at assignment time; link capacities are not modelled here and are handled in a subsequent route-choice/assignment step coupled to NODE.

NODE works as a percolation-like sweep over the network: as lower-cost states ‘open,’ new $(o,d)$ pairs become admissible; assignments proceed in cost order until destination capacities ‘close’ further flow. Admissible means $c_{od}<\infty$ (reachable over $G$ ).

NODE assumes:

A set of origins $O$ with productions (resident workers) $W_o$ .
A set of destinations $D$ with attractions (jobs) $J_d$ .
A transport network $G = \{V,E\}$ that yields generalised costs $c_{od}$ between $o$ and $d$ .

The method applies a best-first expansion over the network. Any cost-ordered search algorithm (e.g., Dijkstra) can be used; what matters is the order in which destination candidates are discovered by increasing $c_{od}$ . Ties are broken lexicographically for reproducibility. Equal-cost ties can be shared or randomised if desired. A deterministic choice can bias split patterns when ties are common, such as grid networks. NODE constructs a single OD matrix from zonal totals and shortest-path costs. This is shown in Algorithm 1.

Algorithm 1.Deterministic NODE (event sweep with vector update)

It is not inherently a dynamic propagation model like the cell transmission model (CTM) (Daganzo 1994, 1995) or cellular automata (Nagel and Schreckenberg 1992), which evolve link states in time, though it could be used in such an evolutionary way. It can also be used in feedback with an agent-based route choice (ARC) model by supplying capacity- and topology-aware OD seeds (Zhu and Levinson 2018).

Acceptance and hazard. Let $F(c)=\Pr(\text{accept by cost }\le c)$ be the CDF and $f(c)=F'(c)$ the PDF. With hazard $\lambda(c)\ge 0$ , the standard relation is

$\small{f(c)=\lambda(c)\exp\!\Big(-\int_{0}^{c}\lambda(s)\,ds\Big), \quad F(c)=1-\exp\!\Big(-\int_{0}^{c}\lambda(s)\,ds\Big).}\tag{1}$

With constant $\lambda$ , $f(c)=\lambda e^{-\lambda c}$ , and if no destination reaches its capacity and acceptance draws are i.i.d. across $d$ , the expected NODE flows reduce to the origin-constrained exponential gravity form:

$\mathbb{E}[X_{od}] \;=\; W_o\frac{J_d e^{-\lambda c_{od}}}{\sum_{d'\in D}J_{d'}e^{-\lambda c_{od'}}}.\tag{2}$

Here $\lambda$ has units of $1/\text{time}$ if $c$ is time; in a discrete-time realisation with step $\Delta t$ , the per-step acceptance is $\sigma=1-e^{-\lambda\Delta t}$ .

Relation to social networks. The acceptance hazard can incorporate social pull. Prior work shows that social ties systematically shape activity locations and distances, strong ties meet closer or in-home, weak ties meet farther, while contacts mediate information about homes and jobs, influencing where people live and work (Tilahun and Levinson 2011, 2013). In NODE, we can encode this by adjusting the impedance or acceptance rate to account for the strength or share of an origin’s contacts near the destination.

CTM realisation (summary; full details in SI). NODE can be implemented over a time-expanded network. We partition space into cells (states); $P_k$ is a row-stochastic movement kernel of size $|\text{cells}|\times|\text{cells}|$ giving the probability that mass moves between cells per $\Delta t$ at step $k$ . The per-step acceptance $\sigma_{d,k}\in[0,1]$ applies to arrivals at destination cell $d$ (we allow $\sigma$ to depend on $o$ in the SI). Algorithm 2 in the SI defines the state updates explicitly and shows how destination capacity is enforced before splitting across origins.

Incremental and evolutionary application. When a prior trip table exists and jobs or workers change, one can run NODE on residual changes (deltas), or apply a damped rolling reallocation; this yields incremental, not equilibrium, updates, consistent with evolutionary planning ideas (Levinson 1995), aligned with short-run rather than long-run changes.

NODE can also be implemented in an evolutionary frame across scenarios or time steps, in this case consider $t=0,1,\dots$ as costs and capacities change, $(c^{(t)},W^{(t)},J^{(t)})$ . Compare allocations via $\Delta X^{(t)}=X^{(t)}-X^{(t-1)}$ , and record the costs at which destinations fill (frontier ‘closure costs’). The frontier closure cost of destination $d$ is the maximum cost at which any assignment to $d$ occurs in the sweep. This reveals who gains or loses under proposed links, prices, or shocks, and where capacity competition binds.

Position in the literature. NODE differs from gravity/entropy and discrete choice by allocating in cost order with explicit destination capacity and no global utility or balancing multipliers; it differs from intervening-opportunities and radiation by using network shortest-path costs rather than rank by distance or opportunities, and by enforcing capacity at assignment. It coincides in expectation with origin-constrained exponential gravity under the non-binding, single-hazard case above, but it generally diverges when capacities bind, topology reorders the reachable set, or hazards vary across places or groups.

3. Findings

Worked example. Consider two origins with resident workers $W_{1}=4$ and $W_{2}=5$ , and three destinations with job capacities $J_{1}=4$ , $J_{2}=3$ , $J_{3}=2$ . Link costs determine $c_{od}$ on a small network (Figure 2).

Figure 1.Frontier closure by destination. Each bar shows the cost interval during which a destination remains open; light ticks mark assignments; the heavy tick marks the closure cost (last assignment to that destination).

Figure 2.Two-Origin, three-destination toy network. See Table 2 for realised allocations. Edge labels are generalised costs

$c_{od}$ .

The first sweep fills $D1$ from $O2$ , which reaches it earlier in cost order; $O2$ then sends one worker to $D2$ . Unable to access $D1$ , $O1$ sends two workers to $D2$ and its remaining two to $D3$ . The realised allocation (Table 2) matches production and attraction totals, and shows diversion caused by competition. For stepwise clarity, residuals are updated as $w_{o,k}$ , $j_{d,k}$ , and assignments as $X_{od,k}$ .

Table 2.Toy network results. Deterministic sweep and one stochastic realisation. Event order (by

$c_{od}$ ):

$(O2,D1)$ (4),

$(O1,D1)$ (5),

$(O2,D2)$ (6),

$(O1,D2)$ (7),

$(O2,D3)$ (8),

$(O1,D3)$ (9).

	Deterministic $X$			Stochastic (one draw)			Row total
	$D1$	$D2$	$D3$	$D1$	$D2$	$D3$	$W_o$
$O1$	0	2	2	0	3	1	4
$O2$	4	1	0	4	0	1	5
Column total $J_d$	4	3	2	4	3	2	9

Numbers are units (persons). Row sums equal $W_o$ , and column sums equal $J_d$ . The stochastic panel is a single Monte Carlo draw with i.i.d. acceptance; totals are preserved, but the split can differ from the deterministic sweep.

Stochastic draw. Table 2 also reports a single Monte Carlo realisation with i.i.d. acceptance. Row and column totals match $W$ and $J$ by construction, but the split across destinations differs from the deterministic sweep because some earlier candidates are declined.

Equivalence to gravity. With a single $\lambda$ in $F(c)$ and negligible destination capacity constraints, the expected NODE flows converge to those from a singly-constrained gravity or entropy-maximising model with the same impedance curve. This holds with a single hazard $\lambda$ , i.i.d. acceptance independent across destinations, connected OD pairs, and capacities that are never reached. In this limit, NODE is simply a different solution process that preserves the same trip length distribution.

Relation to min-cost flow (summary; SI has details). The deterministic sweep is not a general solver for the Hitchcock–Koopmans transport problem. However, when the OD cost matrix satisfies the Monge property ( $c_{i+1,j+1}+c_{i,j} \le c_{i+1,j}+c_{i,j+1}$ ), a greedy order consistent with non-decreasing $c_{od}$ yields an optimal transport plan; see SI (Proposition A.1) and (Burkard, Klinz, and Rudolf 1996). Outside these conditions, NODE can diverge from the min-cost solution, which we illustrate with a counterexample in the SI.

Departures from gravity. Systematic differences arise from:

Capacity competition: Destinations reached early can fill, diverting later arrivals to further alternatives.
Topology effects: Network structure and asymmetries reorder which destinations are encountered first; cost ties (destinations reached at equal cost) and branching can produce distinctive spatial patterns.
Heterogeneity and stochasticity: Varying $\lambda$ across groups or places, or random acceptance, breaks proportional balancing while preserving production and attraction totals.

When no destination reaches its capacity and costs produce a unique non-tied order, the deterministic sweep matches any allocation consistent with that order, though it need not be globally min-cost when later ties or alternative paths exist.

Implications. NODE is a transparent, network-based allocation method that exactly preserves zonal totals when totals match and pairs are connected, embeds topology and capacity directly in the search process, and can reproduce plausible trip length distributions without explicit global calibration. It is well-suited to:

Seeding OD matrices when only productions, attractions, and a network are known.
Capacity-limited facility allocation problems.
Benchmarking or coupling with route choice schemes that update costs between batches.

Even without behavioural detail, NODE is useful when capacity, topology, or sparse data make conventional gravity or utility-based models less appropriate.

Network Origin-Destination Estimation using Percolation

Abstract

1. Questions

2. Methods

3. Findings

References