May 16, 2026 · Engineering

How SwiftCNS turns chat into a multi-agent evaluation system

Saad Naji

4 min read

When we started building SwiftCNS, chat looked like the obvious interface, and it still does. People describe uncertain ideas in messy language, and conversation gives them a natural place to start. The harder problem is evaluation: knowing which assumption moved forward, which experiment was chosen, and what evidence changed the decision. That requires durable state the project can carry, not just a thread of good answers. So chat could not be the unit of work; it had to sit above orchestration agents, interactive cards, and an evidence record that outlive any single thread.

The layout we build toward keeps the chat surface on the left, routes evaluation work through specialized agents in the middle, and writes durable project state on the right. The sketch below is that shape at a glance: conversation in, evaluation state out.

Diagram showing chat on the left, orchestration agents in the middle, and durable project state on the right, with the caption "conversation in, evaluation state out."

Evaluation is the bottleneck

AI made output cheap; it did not make evaluation easy. A chat-only assistant can explain an idea, critique it, and suggest what to try next, but unless the conversation becomes durable state, the team is left with only a transcript when the thread ends. SwiftCNS is built for that gap: agents, cards, and evidence that survive beyond the thread. The SwiftCNS thesis explains why evaluation is the strategic bottleneck.

Chat is the interface

The first design choice was to stop treating the answer as the artifact. In SwiftCNS, conversation is where the user expresses uncertainty; progress shows up when a piece of the learning loop becomes something the project retains, whether that is an assumption, a hypothesis, an experiment card, a learning, an insight, or decision context. By state we mean exactly that kind of durable change in the project: an assumption selected, a hypothesis created, an experiment approved, evidence attached, or decision context updated. The chat surface stays where people think out loud; the system tracks what actually advanced.
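One way to picture the difference between a reply and a state change is a minimal Python sketch, assuming project state is an append-only log of typed changes. `StateKind`, `StateChange`, `ProjectState`, and `handle_chat_message` are hypothetical names chosen for illustration, not SwiftCNS code. A chat reply leaves the log untouched; only a committed change is recorded.

```python
from dataclasses import dataclass, field
from enum import Enum


class StateKind(Enum):
    # The five kinds of durable change described above.
    ASSUMPTION_SELECTED = "assumption_selected"
    HYPOTHESIS_CREATED = "hypothesis_created"
    EXPERIMENT_APPROVED = "experiment_approved"
    EVIDENCE_ATTACHED = "evidence_attached"
    DECISION_CONTEXT_UPDATED = "decision_context_updated"


@dataclass(frozen=True)
class StateChange:
    kind: StateKind
    payload: str  # e.g. the text of the selected assumption


@dataclass
class ProjectState:
    # Hypothetical durable record that outlives any single chat thread.
    log: list[StateChange] = field(default_factory=list)

    def record(self, change: StateChange) -> None:
        self.log.append(change)

    def what_advanced(self) -> list[str]:
        return [change.kind.value for change in self.log]


def handle_chat_message(state: ProjectState, message: str) -> str:
    # Thinking out loud: the reply is useful, but nothing is written to state.
    return f"Let's explore: {message}"


state = ProjectState()
reply = handle_chat_message(state, "Would busy families pay for prepared dinners?")
state.record(StateChange(StateKind.ASSUMPTION_SELECTED,
                         "Families will pay to save time on weeknights"))
print(state.what_advanced())  # ['assumption_selected']
```

An append-only log is one reasonable choice for this sketch because it keeps the order in which the evaluation advanced, which is what a team reviews after the thread ends.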

The card is where conversation becomes state

The card is where the agent stops writing and state gets written. When the system turns agent output into an interactive card, the user can select an assumption, answer a clarifying question, choose a hypothesis, or approve an experiment path, and each action gives the next agent a structured signal to continue from. An assumption card is not a prettier message; it is a state transition. The next agent does not infer intent from prose. It reads what the user committed to in the card.

Diagram showing the anatomy of an interactive agent card as a state transition.

The same pattern applies across stages: assumption cards, experiment cards, and the structured surfaces that follow.
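The commit step on a card can be sketched in the same spirit. This Python example assumes a card is a fixed set of options and that committing one yields a small structured record; `Card`, `Transition`, `commit`, and `next_agent_input` are hypothetical names, and the option text is illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    # Hypothetical interactive card: an agent proposes options, the user commits to one.
    card_id: str
    kind: str  # e.g. "assumption" or "experiment"
    options: tuple[str, ...]


@dataclass(frozen=True)
class Transition:
    # The structured signal the next agent reads instead of parsing prose.
    card_id: str
    kind: str
    selected: str


def commit(card: Card, selected_index: int) -> Transition:
    """Turn a user action on a card into a state transition."""
    if not 0 <= selected_index < len(card.options):
        raise ValueError(f"card {card.card_id} has no option {selected_index}")
    return Transition(card.card_id, card.kind, card.options[selected_index])


def next_agent_input(transition: Transition) -> dict[str, str]:
    # The next agent receives the committed choice as fields, not a transcript.
    return {"stage_input": transition.kind, "value": transition.selected}


card = Card(
    card_id="a-1",
    kind="assumption",
    options=(
        "Busy families cook at home on weeknights",
        "Busy families will pay for time saved",
    ),
)
transition = commit(card, 1)
print(next_agent_input(transition))
# {'stage_input': 'assumption', 'value': 'Busy families will pay for time saved'}
```

Validating the selection against the card's own options is what lets the next agent trust the signal without re-reading the conversation.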

Agents own stages of evaluation

Once cards can write state, evaluation work still has to be split across specialized agents because the learning loop is not one job. An orchestration layer routes each turn by evaluation stage so the right specialist runs next. Assumption Discovery surfaces the beliefs an idea depends on. Experiment Design takes a selected assumption and turns it into hypotheses and experiment options. Other agents extend the same pattern across planning, data tracking, learning synthesis, and insight generation. Each agent owns a stage and produces a durable artifact the next one can use, so the chat surface stays natural while the system underneath keeps moving.
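Routing by stage can be sketched in Python, assuming each stage maps to exactly one agent function and a fixed next stage. `Stage`, `Artifact`, `route_turn`, and the agent bodies are hypothetical placeholders: real agents would call models rather than format strings, and the remaining stages (planning, data tracking, learning synthesis, insight generation) would be added to the same two tables.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Stage(Enum):
    ASSUMPTION_DISCOVERY = "assumption_discovery"
    EXPERIMENT_DESIGN = "experiment_design"
    DONE = "done"


@dataclass(frozen=True)
class Artifact:
    # A durable output the next agent can build on.
    stage: Stage
    content: tuple[str, ...]


def assumption_discovery(idea: str) -> Artifact:
    # Placeholder: surface beliefs the idea depends on.
    return Artifact(Stage.ASSUMPTION_DISCOVERY,
                    (f"People want: {idea}", f"People will pay for: {idea}"))


def experiment_design(selected_assumption: str) -> Artifact:
    # Placeholder: turn one selected assumption into a hypothesis and an experiment option.
    return Artifact(Stage.EXPERIMENT_DESIGN,
                    (f"Hypothesis: {selected_assumption}",
                     f"Experiment option for: {selected_assumption}"))


# Which specialist owns each stage, and which stage follows it.
AGENTS: dict[Stage, Callable[[str], Artifact]] = {
    Stage.ASSUMPTION_DISCOVERY: assumption_discovery,
    Stage.EXPERIMENT_DESIGN: experiment_design,
}
NEXT_STAGE: dict[Stage, Stage] = {
    Stage.ASSUMPTION_DISCOVERY: Stage.EXPERIMENT_DESIGN,
    Stage.EXPERIMENT_DESIGN: Stage.DONE,
}


def route_turn(stage: Stage, user_input: str) -> tuple[Artifact, Stage]:
    """Run the agent that owns the current stage; return its artifact and the next stage."""
    if stage not in AGENTS:
        raise ValueError(f"no agent owns stage {stage.value}")
    return AGENTS[stage](user_input), NEXT_STAGE[stage]


stage = Stage.ASSUMPTION_DISCOVERY
discovered, stage = route_turn(stage, "prepared weeknight dinners")
# In the product the user would pick an assumption on a card; here we take the second one.
designed, stage = route_turn(stage, discovered.content[1])
print(designed.content[0])  # Hypothesis: People will pay for: prepared weeknight dinners
print(stage)  # Stage.DONE
```

Keeping the agent table and the stage order as plain data is a design choice of this sketch: adding a stage means adding one entry to each table rather than changing the router.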

A simplified pass through Assumption Discovery and Experiment Design:

What changes in practice

Imagine you have an uncertain idea: busy families might pay for prepared dinners that save time on weeknights. A normal assistant can discuss customers, pricing, and delivery at length. SwiftCNS asks what state the idea should move into next. Assumption Discovery proposes candidate beliefs; when you select the riskiest one, that choice is written to project state, and Experiment Design proposes experiment cards with success criteria, cost, run time, and data reliability. Choosing a path does not mean the chat continued. It means the evaluation moved forward.
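The meal-prep walkthrough maps onto a small state object. This sketch assumes the four experiment-card attributes named above (plus a title) are stored as plain strings and that an experiment can only be approved after an assumption is selected. `ExperimentCard`, `Project`, and every field value are hypothetical placeholders, not SwiftCNS data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class ExperimentCard:
    # Attributes named in the post; the values used below are hypothetical.
    title: str
    success_criteria: str
    cost: str
    run_time: str
    data_reliability: str


@dataclass
class Project:
    idea: str
    selected_assumption: Optional[str] = None
    approved_experiment: Optional[ExperimentCard] = None
    history: List[str] = field(default_factory=list)

    def select_assumption(self, assumption: str) -> None:
        # The user's pick of the riskiest belief is written to project state.
        self.selected_assumption = assumption
        self.history.append("assumption_selected")

    def approve_experiment(self, card: ExperimentCard) -> None:
        # An experiment path only makes sense against a selected assumption.
        if self.selected_assumption is None:
            raise RuntimeError("select an assumption before approving an experiment")
        self.approved_experiment = card
        self.history.append("experiment_approved")


project = Project(idea="Busy families might pay for prepared weeknight dinners")
project.select_assumption("Families will pay to save time on weeknights")
option = ExperimentCard(
    title="Experiment option 1",
    success_criteria="set by the team before the run",
    cost="low",
    run_time="one week",
    data_reliability="medium",
)
project.approve_experiment(option)
print(project.history)  # ['assumption_selected', 'experiment_approved']
```

The ordering check mirrors the point of the walkthrough: approving a path is a recorded step that depends on an earlier committed choice, not another turn of conversation.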

The same trace, step by step:

Horizontal trace from meal-prep idea through assumption selection, experiment design, evidence, and decision context.

That is the difference between an answer and an evaluation system.

SwiftCNS is built for that pattern, so uncertain ideas move toward evidence-backed decisions deliberately, at a pace and in a direction the team chooses. Conversation stays the surface; specialized agents and interactive cards do the evaluation work; project memory records what changed. For the full argument that idea evaluation is the bottleneck, read the SwiftCNS thesis. To put an uncertain idea through an agent-powered learning loop, early access is open now.