**Input:** The graph \(G\), the parameter \(K\), a starting solution \(s\), and the probability of agent taxis \(p_{rl}\). |

**Output:** A solution for the current time step |

1: **while** time is available **do** |

2: **First iteration:** Fix the agent taxis \(T\) using \(p_{rl}\); |

3: Initialize the backbone graph \(BG\) by removing all arcs \(KG;\) |

4: Add solution s to \(BG;\) |

5: **while** \(BG\) has less than \(E_{\max} \times \left( 1 - p_{rl} \right)\) arcs **do** |

6: For each taxi \(t\) not in \(T\), generate a random walk on \(KG\) from \(t\); |

7: Add those arcs to \(BG;\) |

8: **end while** |

9: All agents choose their actions (paths) using Q-learning to complete \(E_{\max}\) arcs in \(BG;\) |

10: Solve MIPmaxflow on \(BG;\) |

11: Update the solution \(s;\) |

\(12:\) **end while** |