Input: The graph \(G\), the parameter \(K\), a starting solution \(s\), and the agent-taxi probability \(p_{rl}\).
Output: A solution for the current time step.
1: while time is available do
2:       On the first iteration only: fix the set of agent taxis \(T\) using \(p_{rl}\);
3:       Initialize the backbone graph \(BG\) by removing all arcs from \(KG\);
4:       Add the solution \(s\) to \(BG\);
5:       while \(BG\) has fewer than \(E_{\max} \times \left( 1 - p_{rl} \right)\) arcs do
6:             For each taxi \(t\) not in \(T\), generate a random walk on \(KG\) from \(t\);
7:             Add the arcs of these walks to \(BG\);
8:       end while
9:       All agents choose their actions (paths) using Q-learning to complete the \(E_{\max}\) arcs in \(BG\);
10:     Solve MIPmaxflow on \(BG\);
11:     Update the solution \(s\);
12: end while
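For concreteness, the listing below gives a minimal Python sketch of one call to this routine. It is an illustration under stated assumptions, not the authors' implementation: \(KG\) is taken to be the restricted graph built from \(G\) and \(K\), graphs are represented as adjacency dictionaries mapping each node to its successor set, the solution \(s\) is a set of arcs \((u, v)\), each taxi is identified with its current node, the random-walk length is an arbitrary choice, and the hooks q_learning_paths and solve_mip_maxflow stand in for the Q-learning policy and the max-flow MIP, which are specified elsewhere.

\begin{verbatim}
import random
import time

def backbone_step(KG, taxis, s, p_rl, E_max, time_budget,
                  q_learning_paths, solve_mip_maxflow):
    """One time step of the backbone heuristic (illustrative sketch).

    KG: adjacency dict {node: iterable of successor nodes}.
    taxis: set of nodes (each taxi identified with its current node).
    s: set of arcs (u, v).
    q_learning_paths, solve_mip_maxflow: assumed hooks for the
    Q-learning policy and the max-flow MIP (defined elsewhere).
    """
    deadline = time.monotonic() + time_budget
    T = None  # agent taxis; fixed once, on the first iteration
    while time.monotonic() < deadline:
        if T is None:
            # Each taxi becomes an agent taxi with probability p_rl.
            T = {t for t in taxis if random.random() < p_rl}
        # BG starts as KG stripped of all arcs, then receives the arcs of s.
        BG = {v: set() for v in KG}
        for u, v in s:
            BG[u].add(v)
        # Grow BG with random walks from the non-agent taxis until it
        # holds the (1 - p_rl) share of the E_max arcs.
        while num_arcs(BG) < E_max * (1 - p_rl):
            before = num_arcs(BG)
            for t in taxis - T:
                for u, v in random_walk(KG, t):
                    BG[u].add(v)
            if num_arcs(BG) == before:
                break  # no new arcs reachable; avoid spinning forever
        # Agent taxis pick paths via Q-learning until BG holds E_max arcs.
        for u, v in q_learning_paths(BG, T, E_max - num_arcs(BG)):
            BG[u].add(v)
        # Solve the max-flow MIP restricted to BG; keep its arcs as s.
        s = solve_mip_maxflow(BG)
    return s

def num_arcs(BG):
    return sum(len(succ) for succ in BG.values())

def random_walk(KG, start, length=5):
    # Uniform random walk of `length` arcs on KG (walk length assumed).
    arcs, u = [], start
    for _ in range(length):
        succ = list(KG.get(u, ()))
        if not succ:
            break
        v = random.choice(succ)
        arcs.append((u, v))
        u = v
    return arcs
\end{verbatim}

The guard that exits the inner loop when no new arcs appear is a safety addition for the sketch; the pseudocode above does not include it.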