Input: The graph \(G\), the parameter \(K\), a starting solution \(s\), and the agent-taxi probability \(p_{rl}\).
Output: A solution for the current time step.
1: while time is available do
2:       On the first iteration only: fix the set of agent taxis \(T\) using \(p_{rl}\);
3:       Initialize the backbone graph \(BG\) by removing all arcs from \(KG\);
4:       Add the solution \(s\) to \(BG\);
5:       while \(BG\) has fewer than \(E_{\max} \times \left( 1 - p_{rl} \right)\) arcs do
6:             For each taxi \(t\) not in \(T\), generate a random walk on \(KG\) from \(t\);
7:             Add the arcs of these walks to \(BG\);
8:       end while
9:       All agents choose their actions (paths) using Q-learning to complete the \(E_{\max}\) arcs in \(BG\);
10:     Solve MIPmaxflow on \(BG\);
11:     Update the solution \(s\);
12: end while
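For concreteness, the listing below gives a minimal Python sketch of one call to this routine. It is an illustration under stated assumptions, not the authors' implementation: \(KG\) is taken to be the restricted graph built from \(G\) and \(K\), graphs are represented as adjacency dictionaries mapping each node to its successor set, the solution \(s\) is a set of arcs \((u, v)\), each taxi is identified with its current node, the random-walk length is an arbitrary choice, and the hooks q_learning_paths and solve_mip_maxflow stand in for the Q-learning policy and the max-flow MIP, which are specified elsewhere.

\begin{verbatim}
import random
import time

def backbone_step(KG, taxis, s, p_rl, E_max, time_budget,
                  q_learning_paths, solve_mip_maxflow):
    """One time step of the backbone heuristic (illustrative sketch).

    KG: adjacency dict {node: iterable of successor nodes}.
    taxis: set of nodes (each taxi identified with its current node).
    s: set of arcs (u, v).
    q_learning_paths, solve_mip_maxflow: assumed hooks for the
    Q-learning policy and the max-flow MIP (defined elsewhere).
    """
    deadline = time.monotonic() + time_budget
    T = None  # agent taxis; fixed once, on the first iteration
    while time.monotonic() < deadline:
        if T is None:
            # Each taxi becomes an agent taxi with probability p_rl.
            T = {t for t in taxis if random.random() < p_rl}
        # BG starts as KG stripped of all arcs, then receives the arcs of s.
        BG = {v: set() for v in KG}
        for u, v in s:
            BG[u].add(v)
        # Grow BG with random walks from the non-agent taxis until it
        # holds the (1 - p_rl) share of the E_max arcs.
        while num_arcs(BG) < E_max * (1 - p_rl):
            before = num_arcs(BG)
            for t in taxis - T:
                for u, v in random_walk(KG, t):
                    BG[u].add(v)
            if num_arcs(BG) == before:
                break  # no new arcs reachable; avoid spinning forever
        # Agent taxis pick paths via Q-learning until BG holds E_max arcs.
        for u, v in q_learning_paths(BG, T, E_max - num_arcs(BG)):
            BG[u].add(v)
        # Solve the max-flow MIP restricted to BG; keep its arcs as s.
        s = solve_mip_maxflow(BG)
    return s

def num_arcs(BG):
    return sum(len(succ) for succ in BG.values())

def random_walk(KG, start, length=5):
    # Uniform random walk of `length` arcs on KG (walk length assumed).
    arcs, u = [], start
    for _ in range(length):
        succ = list(KG.get(u, ()))
        if not succ:
            break
        v = random.choice(succ)
        arcs.append((u, v))
        u = v
    return arcs
\end{verbatim}

The guard that exits the inner loop when no new arcs appear is a safety addition for the sketch; the pseudocode above does not include it.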