Scheduling a project with interconnected dependencies across multiple processors is a well-known NP-hard problem. YOU CANNOT DIVIDE AND CONQUER TO SOLVE A COMPLEX SCHEDULING PROBLEM IN POLYNOMIAL TIME (unless P = NP). Simple schedules involving 1 processor and a couple of tasks are trivial to solve (one coding agent running in a single thread; I'm sure we all know that this is pretty much solved by now). But asking an LLM to implement a complex project plan is equivalent to asking it to solve an NP-hard problem.

The number of possible schedules you have with 1000 tasks, 10 processors and 1 day is P(172800, 1000), a number with more than 5,000 digits. While you might get lucky (basically 0% of the time) and find a solution, it's far more probable that the project ends up as a million lines of disappointment. There are far more efficient ways to generate schedules than relying on LLM hallucinations (and I'm surprised very few people are talking about this). Constraint solvers like CP-SAT have been around for years and help companies schedule everything from flight crews to Kubernetes cluster resources, so why not use them to orchestrate agents?
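To get a feel for the size of that number, here is a quick sketch that estimates how many decimal digits P(n, k) = n! / (n - k)! has. It takes the figure of 172800 slots from the paragraph above as given; the function name `log10_permutations` is just a label chosen for this example.

```python
import math


def log10_permutations(n: int, k: int) -> float:
    """Return log10 of P(n, k) = n * (n - 1) * ... * (n - k + 1)."""
    # Summing logarithms avoids building the huge integer itself.
    # (Recent Python versions also cap int-to-str conversion at 4300
    # digits by default, so printing math.perm(n, k) would fail here.)
    return sum(math.log10(n - i) for i in range(k))


slots, tasks = 172_800, 1_000  # figures taken from the text above
digits = math.floor(log10_permutations(slots, tasks)) + 1
print(f"P({slots}, {tasks}) has about {digits} decimal digits")
```

For comparison, the number of atoms in the observable universe is usually estimated at around 80 digits, so brute-force enumeration is not an option.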

I wonder whether solvers like CP-SAT can create plans for coding agents that lead to qualitative differences compared to a vanilla LLM plan. I'll run the experiment by taking the Cursor team's experience, outlined in their blog post titled "Scaling long-running autonomous coding" from back in the stone ages of Jan. 2026, and comparing it with a scheduling procedure that I built using CP-SAT (you can use proprietary solvers like Gurobi or CPLEX as well, but CP-SAT is open source, so...). Since I don't have $1 million to spend on trillions of tokens to create a web browser, I'll settle for creating a CRM system for a moving company, which should include a lot of complexity that most vibe-coded apps won't be able to manage without human oversight.

The setup

Prerequisites

Vanilla LLM

We have a planner agent that decides which tasks to do next (it picks the first tasks based on the PRD). The planner can spawn worker agents that implement the currently active tasks. When the worker agents are done, a judge determines whether each task has been implemented or whether it needs to be refined or redone completely. The planner then decides whether features need to be added or removed, and which tasks to do next. This process continues until the requirements have been fulfilled.
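The loop above can be sketched roughly like this. It is only a control-flow illustration, not the actual harness: `run_vanilla_loop`, `plan_next`, `work`, `judge`, and the task names are hypothetical stand-ins for real LLM calls, and the workers run one after another here instead of concurrently.

```python
def run_vanilla_loop(plan_next, work, judge, max_rounds=50):
    """Planner -> workers -> judge, repeated until the planner returns no tasks."""
    done = []      # accepted task names, in order of acceptance
    attempts = {}  # task name -> number of worker runs so far
    for _ in range(max_rounds):
        batch = plan_next(done)  # planner picks the next tasks to work on
        if not batch:
            return done, attempts  # nothing left: requirements fulfilled
        for task in batch:
            attempts[task] = attempts.get(task, 0) + 1
            artifact = work(task)  # a worker implements the task
            if judge(task, artifact):
                done.append(task)
            # Rejected tasks stay open; the planner may re-issue them.
    raise RuntimeError("planner did not converge")


# Deterministic stubs standing in for LLM calls (assumptions for the demo).
BACKLOG = ["schema", "auth", "quotes"]
_rejected_once = set()


def plan_next(done):
    remaining = [t for t in BACKLOG if t not in done]
    return remaining[:2]  # at most two tasks per round


def work(task):
    return f"code for {task}"


def judge(task, artifact):
    # Reject the first attempt at "quotes" to exercise the redo path.
    if task == "quotes" and task not in _rejected_once:
        _rejected_once.add(task)
        return False
    return True


done, attempts = run_vanilla_loop(plan_next, work, judge)
print(done, attempts)
```

Note that nothing in this loop looks ahead: the order of work is whatever the planner happens to pick each round.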

Time elapsed   Model             Tokens in    Tokens out   Cost
16:28          claude-opus-4-6   8 241 813    132 432      USD 11.62
19:34          claude-opus-4-6   11 236 394   166 475      USD 11.69

Scheduling method

From the PRD, we ask a planner agent to generate a DAG of tasks. This DAG is used to create a schedule for the project using CP-SAT. At each time step in the schedule, the planner checks which tasks should be in progress. If a task has not yet been started and no agent is working on it, the planner spawns an agent. When a task is done, a judge checks whether it has been implemented correctly. If it has not, the worker gets 3 retries to finish the task; after that, the code is used as is. This process continues until the entire schedule has been completed.

Time elapsed   Model             Tokens in    Tokens out   Cost
29:44          claude-opus-4-6   17 381 567   132 502      USD 13.89