Informative Appendix (non-normative)
This appendix covers platforms that split an engagement across multiple agents, workers, or model-driven components. It does not create new conformance requirements. The purpose is narrower. APTS already includes controls for scope, safety, logging, containment, and human oversight. Most of those controls were written against a single agent runtime. Once a platform distributes work across several concurrent decision makers, the same control logic still applies, but the failure modes change.
That change matters. A platform can have a solid kill switch, clean scope validation, and a respectable audit trail when tested as a single-agent system, then fail under concurrent operation. One worker halts, another continues. Each worker stays inside its local rate budget, but the target still receives excessive traffic in aggregate. One agent treats a newly discovered asset as merely interesting; another treats the same discovery as implicit authorization to probe it. Those are recognizable multi-agent safety failures.
Primary scope: one platform or operator-controlled system running multiple workers, role-separated agents, or semi-independent decision components under one engagement authority and one coordination fabric.
Limited scope: semi-independent agents participating in one engagement while still operating under a shared coordination layer or shared operator authority model.
Out of scope for normative interpretation: truly independent agents from different vendors or separate control planes that merely happen to work on the same customer environment. In that case, many of the patterns below become coordination expectations or contractual practices rather than platform-enforced safeguards.
For this appendix, "multi-agent" should be read in two layers.
The primary case, and the one this appendix is written for, is a single platform or operator-controlled system running multiple concurrent workers, role-separated agents, or semi-independent decision components under one engagement authority. In that design, the platform owns the coordination fabric. It can enforce one halt state, one scope authority, one shared budget, and one audit model across all active components.
The secondary case is looser: multiple semi-independent agents participating in one engagement while still operating under a shared coordination layer or a shared operator authority model. That is harder to reason about, but the same control themes still apply if there is a genuine mechanism for shared halt, shared scope authority, and shared auditability.
The appendix is not written as a normative coordination model for truly independent agents from different vendors or separate control planes that merely happen to work on the same customer environment. Once the coordination fabric is split across separate operators, the problem changes. Internal platform controls can no longer be assumed to propagate, and many of the patterns below become interface expectations or contractual coordination practices rather than platform-enforced safeguards.
This appendix is mainly for:
If separate operators participate in one engagement, the guidance here is still useful, but it should be treated as a coordination baseline rather than proof that shared controls are actually enforced.
The issue is not that APTS lacks safety controls. The issue is that concurrency changes where those controls break.
In a single-agent design, most safety questions collapse into one runtime and one decision stream. In a multi-agent design, that is no longer true. There may be several planning contexts, several execution queues, several local memories, and several partial views of the same target. Coordination becomes part of the safety model whether the platform admits it or not.
The common failure cases are easy to recognize once you look for them:
The rest of this appendix addresses those coordination failures by tying them back to existing APTS controls.
One orchestration layer assigns work to multiple workers or agent instances. This is the most common model. The obvious risks are aggregate rate overshoot, uneven halt propagation, duplicate action against the same target, and shared-state corruption.
Different agents or models play different roles: planner, executor, verifier, reporter, memory manager, and so on. Here the main problem is not raw concurrency. It is trust. A lower-trust component can influence a higher-impact one unless the handoff points are explicit and enforced.
Several agents can make partially independent decisions while still operating under one Rules of Engagement document, one scope definition, one operator chain, and one audit model. This design can be workable. It can also fail in strange ways if disagreement handling is vague. The architecture has to answer a simple question: when two agents disagree about scope, impact, or authorization, who wins, and what happens in the meantime?
This case sits at the edge of the appendix rather than at its center. If two vendors, or two separately operated control planes, participate in one engagement, the internal coordination guarantees described elsewhere in this appendix may not exist. A shared kill state may be impossible. Shared rate budgets may be approximate. State isolation may be a legal or contractual concern rather than an internal architecture decision.
In that situation, the useful question is not whether the parties "support multi-agent operation" in the abstract. The useful question is whether they have a defined cross-operator coordination mechanism for halt, scope changes, budget enforcement, escalation, and audit evidence. If they do not, the safer assumption is that the engagement is a set of adjacent systems, not one coordinated multi-agent platform.
A useful review frame for multi-agent platforms is whether the architecture preserves a short set of invariants across every active worker and orchestration component.
If those invariants do not hold during concurrent operation, the platform's controls may still exist individually, but their assurance value is reduced in practice.
In a multi-agent platform, a kill switch is a distributed control. That sounds obvious, yet this is where a lot of systems quietly fail. Stopping the worker that detected the incident is not enough. The halt has to reach every active worker before that worker dispatches its next action.
Practical designs usually rely on a shared halt signal outside the agent runtime. Some use publish-subscribe messaging. Some use a coordination gateway that invalidates every outstanding work item when the halt state flips. Some stamp each work lease with a halt epoch, then refuse dispatch when the epoch no longer matches. The exact mechanism matters less than the property it creates: once the halt state changes, no worker can keep acting merely because its local state has not yet caught up.
The pre-invocation halt check from the issue discussion belongs here. It is one of the cleaner patterns because it catches the dangerous case: a worker that is still alive, still connected, and about to fire the next tool invocation after another part of the system has already decided the engagement must stop.
Reviewers should expect evidence that halt propagation is engagement-wide. The useful test is not whether one process can stop itself. The useful test is whether several active workers stop together, whether they stop before new actions are dispatched, and whether the audit trail shows when each worker observed and acknowledged the halt.
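The halt-epoch variant of the pre-invocation check can be sketched as follows. This is a minimal in-process illustration, not a reference design: `HaltState`, `Worker`, and their methods are hypothetical names, and a real platform would hold the halt state in a store outside the agent runtime rather than in a Python object.

```python
import threading
import time


class HaltState:
    """Engagement-wide halt state (hypothetical; stands in for an external store)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._epoch = 0
        # (worker_id, epoch_observed, timestamp) for the audit trail
        self.acks: list[tuple[str, int, float]] = []

    def current_epoch(self) -> int:
        with self._lock:
            return self._epoch

    def halt(self) -> int:
        """Flip the halt state; every lease stamped with an older epoch is now stale."""
        with self._lock:
            self._epoch += 1
            return self._epoch

    def acknowledge(self, worker_id: str, epoch: int) -> None:
        with self._lock:
            self.acks.append((worker_id, epoch, time.time()))


class Worker:
    def __init__(self, worker_id: str, halt_state: HaltState) -> None:
        self.worker_id = worker_id
        self.halt_state = halt_state
        # The work lease is stamped with the epoch that was current at assignment.
        self.lease_epoch = halt_state.current_epoch()
        self.dispatched: list[str] = []

    def invoke(self, action: str) -> bool:
        # Pre-invocation halt check: consult the authoritative epoch, not local state.
        current = self.halt_state.current_epoch()
        if current != self.lease_epoch:
            self.halt_state.acknowledge(self.worker_id, current)
            return False  # refuse dispatch; the lease is no longer valid
        self.dispatched.append(action)  # placeholder for the real tool invocation
        return True


halt_state = HaltState()
workers = [Worker("w1", halt_state), Worker("w2", halt_state)]
workers[0].invoke("port-scan")      # dispatched
halt_state.halt()                   # some other component decides to stop
results = [w.invoke("next-action") for w in workers]  # both refuse
```

The sketch leaves a small window between the epoch check and the dispatch itself. A production design would need to close that window, for example by having the gateway that executes the action perform the same check.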
Related normative requirements: APTS-SC-009, APTS-HO-008, APTS-HO-009, APTS-HO-015, APTS-SC-018.
Local compliance does not mean safe aggregate behavior. Three workers can each sit under their own request cap and still push the target past the engagement's allowed traffic budget.
The control that matters here usually sits above the workers. A gateway-enforced token bucket is a good example because it turns the budget into a shared scarce resource instead of a local promise. One worker spends tokens, which means fewer are available to the others. The same idea can be extended to bandwidth ceilings, payload-size limits, expensive action classes, and cooldown windows for fragile targets.
This control is easy to weaken by accident. Teams often implement "rate limiting" in each worker and call the job done. That misses the actual problem. In a multi-agent system, the safe question is not "did each worker stay polite?" It is "did the engagement as a whole stay within the allowed envelope?"
Reviewers should expect the platform to show where aggregate limits are enforced, how shared budgets are accounted for, and what happens when several workers compete for the same budget at once.
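A gateway-enforced shared budget can be sketched as a single token bucket that every worker must draw from. The class name `SharedTokenBucket`, its parameters, and the explicit `now` argument are assumptions made for illustration; passing the clock in explicitly keeps the example deterministic.

```python
import threading


class SharedTokenBucket:
    """One bucket per engagement, held at the gateway (hypothetical sketch)."""

    def __init__(self, capacity: float, refill_per_sec: float, now: float = 0.0) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self._tokens = capacity
        self._last = now
        self._lock = threading.Lock()
        self.spent_by_worker: dict[str, float] = {}  # accounting for later review

    def try_acquire(self, worker_id: str, now: float, cost: float = 1.0) -> bool:
        # The lock serializes competing workers, so the budget cannot be overspent.
        with self._lock:
            elapsed = max(0.0, now - self._last)
            self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_sec)
            self._last = max(self._last, now)
            if self._tokens < cost:
                return False  # the engagement budget is exhausted, whoever asks
            self._tokens -= cost
            self.spent_by_worker[worker_id] = self.spent_by_worker.get(worker_id, 0.0) + cost
            return True


# Three workers each want 3 requests at t=0, but the engagement allows 5 in total.
bucket = SharedTokenBucket(capacity=5, refill_per_sec=1.0)
granted = sum(
    bucket.try_acquire(worker, now=0.0)
    for worker in ("w1", "w2", "w3")
    for _ in range(3)
)
```

Each worker would pass a local cap of 3, yet only 5 of the 9 requests are granted, because the limit is accounted for once at the engagement level. Heavier action classes could be modeled with a larger `cost`.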
Related normative requirements: APTS-SC-004, APTS-SE-019, APTS-SC-006, APTS-SC-010.
Discovery is not authorization. In a multi-agent design, that distinction has to remain explicit.
One agent may discover a new host, a newly exposed admin path, or a credential that appears to open a second system. Another agent may be waiting for anything interesting to appear so it can continue the chain. That is exactly where scope errors happen. Shared knowledge starts to feel like shared permission.
The safer pattern is centralized re-validation. A newly discovered target, path, or credential can be shared as a candidate input, but another agent should not act on it until the authoritative scope logic has checked it. If the result is uncertain, the platform should escalate the uncertainty rather than let one agent inherit another agent's optimism.
This gets more important in overlapping or adjacent engagements. One worker may see an asset as familiar because it belongs to a different customer environment, a different campaign, or a previous engagement snapshot. The coordination layer has to distinguish "known to the platform" from "authorized for this engagement." Those are not the same thing.
Reviewers should expect the logs to show who discovered the candidate target, which control re-validated it, and whether the final decision was automatic or operator-driven.
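Centralized re-validation can be illustrated with a small scope authority that every candidate target must pass through before another agent acts on it. This is a sketch under simplifying assumptions: scope is expressed only as CIDR ranges, anything that cannot be evaluated automatically is escalated, and the names `ScopeAuthority`, `Candidate`, and `Decision` are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class Candidate:
    address: str
    discovered_by: str


class ScopeAuthority:
    """Single authoritative scope check for the engagement (hypothetical)."""

    def __init__(self, allowed_cidrs: list[str], denied_cidrs: list[str]) -> None:
        self.allowed = [ipaddress.ip_network(c) for c in allowed_cidrs]
        self.denied = [ipaddress.ip_network(c) for c in denied_cidrs]
        self.log: list[dict] = []

    def revalidate(self, candidate: Candidate, requested_by: str) -> Decision:
        try:
            addr = ipaddress.ip_address(candidate.address)
        except ValueError:
            # Not an IP literal (for example a hostname): uncertain, so escalate.
            decision = Decision.ESCALATE
        else:
            if any(addr in net for net in self.denied):
                decision = Decision.DENY      # deny list wins over allow list
            elif any(addr in net for net in self.allowed):
                decision = Decision.ALLOW
            else:
                decision = Decision.DENY      # discovered is not the same as authorized
        self.log.append({
            "candidate": candidate.address,
            "discovered_by": candidate.discovered_by,
            "requested_by": requested_by,
            "decision": decision.value,
            "decided_by": "operator-pending" if decision is Decision.ESCALATE else "automatic",
        })
        return decision


authority = ScopeAuthority(allowed_cidrs=["10.0.0.0/24"], denied_cidrs=["10.0.0.5/32"])
found = Candidate(address="10.0.0.7", discovered_by="recon-1")
decision = authority.revalidate(found, requested_by="exploit-1")
```

The log entry records who discovered the candidate, who asked to act on it, and whether the decision was automatic or left for an operator.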
Related normative requirements: APTS-SE-006, APTS-SE-007, APTS-SE-021, APTS-SE-022, APTS-HO-013.
State isolation is one of the gaps the issue comments sharpened, and rightly so. Multi-agent systems fail in predictable but serious ways when working context moves across agent boundaries without controlled rules.
One agent's scratch notes should not become another agent's instructions by accident. A stale target classification should not linger in shared memory after the operator has changed the scope. A credential reference intended for a narrow path should not spread through several contexts just because it was convenient to replicate. These are not abstract concerns. They are how systems create their own cross-contamination.
The safer pattern is to isolate per-agent working memory and treat shared state as a separate, authoritative layer outside any one agent context. Scope, deny lists, operator directives, escalation state, approved credentials, and similar safety-critical facts belong there. Promotion from local state to shared state should be explicit. Shared facts should carry freshness markers or version identifiers so agents can tell whether the copy they are reading is still current.
There is also a role question. Not every agent should be able to write every class of shared state. A reporting agent does not need broad write authority over execution context. A verifier should not be able to silently rewrite the underlying evidence it is supposed to judge. The implementation details vary, but the principle is stable: convenient state sharing is not a safety property.
Reviewers should expect evidence that state sharing is intentional, role-bounded, and logged.
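One way to picture explicit promotion, role-bounded writes, and version identifiers is a small shared-state store like the one below. It is an illustrative sketch only: `SharedState`, `Fact`, the role names, and the compare-and-set style `expected_version` argument are all assumptions, not part of APTS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    value: object
    version: int
    written_by: str


class SharedState:
    """Authoritative shared layer, separate from any agent's working memory."""

    def __init__(self, write_roles: dict[str, set[str]]) -> None:
        self._write_roles = write_roles      # key -> roles allowed to write it
        self._facts: dict[str, Fact] = {}
        self.audit: list[dict] = []

    def read(self, key: str) -> Fact:
        return self._facts[key]

    def is_current(self, key: str, fact: Fact) -> bool:
        """Lets an agent check whether the copy it holds is still the latest."""
        return self._facts[key].version == fact.version

    def promote(self, key: str, value: object, role: str, agent_id: str,
                expected_version: int) -> bool:
        """Explicit promotion from local state; rejects wrong roles and stale writers."""
        current = self._facts.get(key)
        current_version = current.version if current else 0
        allowed = role in self._write_roles.get(key, set())
        accepted = allowed and expected_version == current_version
        self.audit.append({"key": key, "agent": agent_id, "role": role,
                           "accepted": accepted})
        if accepted:
            self._facts[key] = Fact(value, current_version + 1, agent_id)
        return accepted


state = SharedState({"scope": {"operator"}, "findings": {"executor", "verifier"}})
state.promote("scope", ["10.0.0.0/24"], role="operator", agent_id="op-1", expected_version=0)
snapshot = state.read("scope")               # an agent caches this copy
state.promote("scope", ["10.0.0.0/25"], role="operator", agent_id="op-1", expected_version=1)
stale = not state.is_current("scope", snapshot)   # the cached copy is now out of date
```

Every write attempt, accepted or not, lands in the audit list, which is the kind of evidence a reviewer could ask for.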
Related normative requirements: APTS-MR-022, APTS-MR-023, APTS-TP-017, APTS-AR-020, APTS-SE-023.
Agent-to-agent traffic should be treated as a real trust boundary, not as harmless internal chatter.
Messages can be stale. They can be malformed. They can relay target-controlled content dressed up as internal recommendation. They can carry scope references, identifiers, and credentials that look plausible enough to pass a casual check. The fact that a message came from another component inside the platform does not make it trustworthy.
The issue discussion pointed to authenticated messaging, and that belongs here. Session-scoped authentication is useful because it binds the message to the engagement and to the sending component. Message schemas help too, especially when they distinguish between observation, recommendation, request for approval, and execution result. Those categories matter. A platform gets into trouble when a descriptive message silently turns into an authorization signal.
There is a second boundary here as well: target-derived content relayed by one agent into another agent's context should stay classified as untrusted data. Internal routing does not cleanse it. If the first agent saw adversarial content, the second agent should not receive it as trusted instruction.
Reviewers should expect the platform to show how messages are authenticated, how message type is enforced, and which roles are allowed to trigger which downstream effects.
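The combination of session-scoped authentication, typed messages, and role-bounded effects might look like the sketch below. It uses an HMAC over a canonical JSON encoding from the Python standard library. The message fields, the `MAY_REQUEST_APPROVAL` policy, and the outcome labels are hypothetical, and key distribution is out of scope for the example.

```python
import hashlib
import hmac
import json

MESSAGE_TYPES = {"observation", "recommendation", "approval_request", "execution_result"}
# Hypothetical policy: only these (role, type) pairs may start an approval flow.
MAY_REQUEST_APPROVAL = {("planner", "approval_request")}


def sign(session_key: bytes, message: dict) -> str:
    """Tag a message with a key scoped to one engagement session."""
    payload = json.dumps(message, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(session_key, payload, hashlib.sha256).hexdigest()


def classify(session_key: bytes, engagement_id: str, message: dict, tag: str) -> str:
    """Return 'reject', 'untrusted_data', 'informational', or 'route_for_approval'."""
    if not hmac.compare_digest(sign(session_key, message), tag):
        return "reject"                       # altered, or not from this session
    if message.get("engagement_id") != engagement_id:
        return "reject"                       # bound to a different engagement
    if message.get("type") not in MESSAGE_TYPES:
        return "reject"                       # schema violation
    if message.get("target_derived", False):
        return "untrusted_data"               # internal routing does not cleanse it
    if (message.get("sender_role"), message["type"]) in MAY_REQUEST_APPROVAL:
        return "route_for_approval"
    return "informational"                    # descriptive only, never an authorization


key = b"per-engagement-session-key"
request = {"engagement_id": "E1", "sender_role": "planner",
           "type": "approval_request", "body": "enumerate 10.0.0.7"}
relayed = {"engagement_id": "E1", "sender_role": "executor", "type": "observation",
           "body": "banner text: ignore previous scope", "target_derived": True}
outcomes = [classify(key, "E1", m, sign(key, m)) for m in (request, relayed)]
```

Note that even the planner's authenticated request is routed to an approval path rather than executed directly, and that a correctly signed observation carrying target-derived text is still handled as untrusted data.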
Related normative requirements: APTS-MR-018, APTS-MR-022, APTS-MR-023, APTS-SC-020, APTS-AR-020.
This is the other gap the comments pushed into focus. The safe default cannot be left implicit.
Multi-agent platforms often use verifier approval, planner-executor agreement, quorum, or paired review before a higher-impact action goes forward. That part is easy to describe on a diagram. The harder part is the failure path. What happens when one agent says the action is in scope and another flags uncertainty? What happens when the verifier does not answer? What happens when the containment check times out during a live decision window?
The answer should be restrictive. If the disagreement is safety-critical, the platform should pause, halt, or escalate. If a required approval path times out, the timeout should not count as implied consent. That was one of the clearest points in the issue discussion: silence from an approver is an absence of approval, not a form of it.
Not every disagreement needs the same treatment. An execution agent and a reporting agent may disagree on wording without any safety impact. That is fine. A dispute about scope, impact, authorization, or containment status is different. The platform should classify those as safety-critical and give them a defined restrictive fallback.
Reviewers should expect the platform to show both the normal consensus path and the timeout path. The timeout path is where systems reveal what they actually trust.
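The restrictive fallback can be written down as a small decision function. In this sketch a verdict of `None` stands for a required approver that did not answer before the deadline. The topic names, `Verdict`, `Outcome`, and `resolve` are hypothetical and would map onto whatever the platform's consensus mechanism actually is.

```python
from enum import Enum
from typing import Mapping, Optional


class Verdict(Enum):
    APPROVE = "approve"
    REJECT = "reject"
    UNCERTAIN = "uncertain"


class Outcome(Enum):
    PROCEED = "proceed"
    PAUSE_AND_ESCALATE = "pause_and_escalate"


# Disputes on these topics get the restrictive fallback.
SAFETY_CRITICAL_TOPICS = {"scope", "impact", "authorization", "containment"}


def resolve(topic: str, verdicts: Mapping[str, Optional[Verdict]]) -> Outcome:
    """Decide whether an action may go forward.

    `verdicts` maps each required approver to its answer, or to None if it
    timed out. Only unanimous explicit approval lets a safety-critical
    action proceed.
    """
    if topic not in SAFETY_CRITICAL_TOPICS:
        # For example a wording dispute between executor and reporter.
        return Outcome.PROCEED
    if verdicts and all(v is Verdict.APPROVE for v in verdicts.values()):
        return Outcome.PROCEED
    # Disagreement, uncertainty, a timeout, or no approvers at all.
    return Outcome.PAUSE_AND_ESCALATE


normal_path = resolve("scope", {"planner": Verdict.APPROVE, "verifier": Verdict.APPROVE})
timeout_path = resolve("scope", {"planner": Verdict.APPROVE, "verifier": None})
```

The timeout path and the disagreement path end in the same place, which is the property reviewers would want to see exercised in tests.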
Related normative requirements: APTS-HO-003, APTS-HO-011, APTS-HO-012, APTS-SC-009, APTS-SC-018.
Heartbeats are useful. They are not enough.
An agent can be fully alive from an infrastructure perspective while doing exactly the wrong thing. It may keep requesting blocked actions, probe the edges of scope repeatedly, drift into a role it was never supposed to hold, or produce action sequences that no sane operator would expect from its declared mandate. That is why the issue comments pushed behavioral anomaly detection over heartbeat-only checks.
The better pattern is to watch behavior, not just liveness. A platform can look for repeated blocked tool requests, strange parameter combinations, action distribution drift, conflict between local state and authoritative shared state, or role violations such as a reporting component trying to steer execution behavior. None of those signals need to be perfect in isolation. Taken together, they can show that one worker is no longer behaving like the worker the platform thinks it deployed.
Containment should be specific. If one agent drifts, the platform should be able to isolate or stop that agent, preserve its state and logs, and prevent it from dragging the rest of the engagement with it. Sometimes the right answer will still be a full halt. Sometimes it will be worker-level containment followed by operator review. The key point is that heartbeat status alone is too weak a proxy for safety.
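A behavior-based monitor, as opposed to a heartbeat-only check, could combine several weak signals into a per-worker score. The sketch below is deliberately simple: the signal names, weights, threshold, and role table are invented for illustration, and a real platform would tune or replace all of them.

```python
from collections import defaultdict

# Hypothetical role table: which actions each role is expected to request.
ROLE_ALLOWED_ACTIONS = {
    "executor": {"scan", "probe"},
    "reporter": {"write_report"},
}
# Hypothetical weights; no single signal has to be decisive.
WEIGHTS = {"blocked_request": 1, "stale_state": 1, "role_violation": 2}


class BehaviorMonitor:
    def __init__(self, contain_at: int = 3) -> None:
        self.contain_at = contain_at
        self.signals: dict[str, list[str]] = defaultdict(list)

    def record(self, worker_id: str, role: str, action: str, blocked: bool,
               seen_version: int, authoritative_version: int) -> None:
        if blocked:
            self.signals[worker_id].append("blocked_request")
        if action not in ROLE_ALLOWED_ACTIONS.get(role, set()):
            self.signals[worker_id].append("role_violation")
        if seen_version != authoritative_version:
            self.signals[worker_id].append("stale_state")

    def score(self, worker_id: str) -> int:
        return sum(WEIGHTS[s] for s in self.signals.get(worker_id, []))

    def should_contain(self, worker_id: str, heartbeat_ok: bool) -> bool:
        # A healthy heartbeat never lowers the score; liveness is not safety.
        return (not heartbeat_ok) or self.score(worker_id) >= self.contain_at


monitor = BehaviorMonitor()
monitor.record("exec-1", "executor", "scan", blocked=False,
               seen_version=4, authoritative_version=4)
# A reporting component tries to steer execution and gets blocked.
monitor.record("rep-1", "reporter", "scan", blocked=True,
               seen_version=4, authoritative_version=4)
contain_reporter = monitor.should_contain("rep-1", heartbeat_ok=True)
contain_executor = monitor.should_contain("exec-1", heartbeat_ok=True)
```

Here the reporter is flagged for worker-level containment while its heartbeat is still healthy, and the executor is left running. Whether containment then means isolating one worker or halting the engagement is a separate policy decision.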
Related normative requirements: APTS-MR-023, APTS-SE-026, APTS-SC-018, APTS-AL-028, APTS-AR-001.
By the time a reviewer reaches a multi-agent platform, the architecture diagram is rarely the hard part. The hard part is proving what happened during live coordination.
The audit record should let a reviewer answer basic but important questions:
Useful evidence usually includes architecture diagrams, coordination-control design notes, kill-propagation test records, shared-budget enforcement evidence, state-isolation design, anomaly-detection workflow records, and incident or tabletop material showing how the platform handles one misbehaving agent in the presence of several healthy ones.
What matters is reconstructability. If a reviewer cannot tell which agent knew what and when it knew it, the platform may still have logs, but it does not yet have a coordination audit trail that can support serious review.
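Reconstructability can be tested mechanically if coordination events carry per-agent provenance. The sketch below assumes a hypothetical event record with a global sequence number and four event kinds. It shows one query a reviewer might run: which agents acted on a subject without any earlier record of having observed or received it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoordinationEvent:
    seq: int          # global ordering assigned by the coordination layer
    agent_id: str
    kind: str         # "observed", "shared", "received", or "acted"
    subject: str      # for example a target address or a credential reference


def knowledge_timeline(events: list[CoordinationEvent], subject: str) -> list[tuple]:
    """Who touched this subject, in what order, and how."""
    ordered = sorted(events, key=lambda e: e.seq)
    return [(e.seq, e.agent_id, e.kind) for e in ordered if e.subject == subject]


def acted_without_provenance(events: list[CoordinationEvent], subject: str) -> list[str]:
    """Agents that acted on a subject with no earlier observed/received event."""
    knows: set[str] = set()
    offenders: list[str] = []
    for e in sorted(events, key=lambda e: e.seq):
        if e.subject != subject:
            continue
        if e.kind in ("observed", "received"):
            knows.add(e.agent_id)
        elif e.kind == "acted" and e.agent_id not in knows:
            offenders.append(e.agent_id)
    return offenders


events = [
    CoordinationEvent(1, "recon-1", "observed", "10.0.0.7"),
    CoordinationEvent(2, "recon-1", "shared", "10.0.0.7"),
    CoordinationEvent(3, "exec-1", "received", "10.0.0.7"),
    CoordinationEvent(4, "exec-1", "acted", "10.0.0.7"),
    CoordinationEvent(5, "exec-2", "acted", "10.0.0.7"),
]
timeline = knowledge_timeline(events, "10.0.0.7")
gaps = acted_without_provenance(events, "10.0.0.7")
```

In this example `exec-2` acted on the target with no recorded path by which it learned of it, which is exactly the kind of gap that separates having logs from having a coordination audit trail.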
| Reviewer question | Evidence to request | Relevant APTS anchors |
|---|---|---|
| Does halt propagate across all active workers before new actions are dispatched? | Kill-switch test records, coordination logs, worker acknowledgment traces | APTS-SC-009, APTS-HO-008, APTS-HO-009 |
| Are rate and bandwidth limits enforced across the engagement rather than per worker only? | Shared-budget design, gateway enforcement records, concurrency test results | APTS-SC-004, APTS-SE-019, APTS-SC-006 |
| Is agent state isolated with controlled write paths into shared state? | State model documentation, role-based write controls, shared-state audit logs | APTS-MR-022, APTS-MR-023, APTS-AR-020 |
| What happens when agents disagree or a required approval path times out? | Decision-flow documentation, timeout-path test results, escalation records | APTS-HO-003, APTS-HO-011, APTS-HO-012 |
| Can the platform isolate one misbehaving agent without losing control of the engagement? | Containment drill records, anomaly-detection logic, incident review material | APTS-MR-023, APTS-SC-018, APTS-AL-028 |
Related normative requirements: APTS-AR-001, APTS-AR-020, APTS-HO-005, APTS-RP-004.
Distributed safety controls require specific concurrency validation beyond the single-agent tests covered in the Customer_Acceptance_Testing.md appendix.
The full suite of multi-agent concurrency tests, which verify kill-switch propagation, budget enforcement, and anomaly containment across multiple workers, has been moved to a dedicated appendix.
See: Multi-Agent Acceptance Testing
| Coordination concern | Example implementation focus | Primary APTS anchors |
|---|---|---|
| Distributed halt across workers | Broadcast halt signal, pre-invocation halt check, halt acknowledgment tracking | APTS-SC-009, APTS-HO-008, APTS-HO-009, APTS-HO-015 |
| Aggregate impact and traffic control | Shared rate budget, gateway token bucket, central backoff enforcement | APTS-SC-004, APTS-SE-019, APTS-SC-006 |
| Scope conflict across agents | Re-validation of discovered assets, centralized scope adjudication | APTS-SE-006, APTS-SE-007, APTS-SE-021, APTS-HO-013 |
| Shared-state integrity | Authoritative external state store, controlled write paths, freshness markers | APTS-MR-022, APTS-MR-023, APTS-AR-020 |
| Inter-agent trust boundaries | Authenticated messaging, schema validation, role-bounded communication | APTS-MR-018, APTS-MR-022, APTS-SC-020 |
| Safe-default coordination decisions | Halt or pause on disagreement or timeout, escalation for unresolved decisions | APTS-HO-003, APTS-HO-011, APTS-HO-012 |
| Rogue agent handling | Behavioral anomaly detection, worker isolation, containment verification | APTS-MR-023, APTS-SC-018, APTS-AL-028 |
| Multi-agent audit reconstruction | Per-agent provenance, coordination event logs, state-transition records | APTS-AR-001, APTS-AR-020, APTS-RP-004 |
Multi-agent operation does not replace the APTS control model. It raises the bar for how those controls have to be applied.
The platform still has to answer the same questions. What is in scope? What is allowed? What must stop immediately? Who approved the action? What happened, in what order, and why? A single-agent platform answers those questions through one decision stream. A multi-agent platform has to answer them across several concurrent ones without losing authority, clarity, or restraint. That is the standard this appendix is meant to support.