
AGENTS.md - Multi-Agent Research Orchestration Report

Date: April 15, 2026
Research ID: RCH-tech-jgwill-claws-infrastructure--2604150901--a989a67a-bd77-4cf4-b093-a24cda73d48f
PDE ID: 325a8ade-e716-45e6-8e5f-a4866b1bdd18
Orchestration Model: claude-opus-4.6 (all agents)
For: Guillaume Descoteaux-Isabelle (jgwill)


Overview

What Was Researched

Infrastructure planning and budgeting for local AI inference and training on a Mac Mini, alongside existing OpenAI Codex and GitHub Copilot subscriptions. The research covers the OpenClaw plugin ecosystem, Apple Silicon hardware specifications, QMD knowledge-base model fine-tuning, and cloud provider integration - all in the context of an Indigenous-AI Collaborative Platform (IAIP).

Why

Guillaume needs to make a purchasing decision (Mac Mini configuration) and an architectural decision (which plugins, providers, and training pipelines to adopt) for a system that will serve as a local AI node running alongside cloud subscriptions. The system must support Indigenous knowledge sovereignty - a requirement that introduces data sovereignty constraints absent from typical infrastructure planning.

For Whom

Guillaume Descoteaux-Isabelle, developer of the IAIP platform, who runs OpenClaw, Hermes Agent, QMD, and related tools for Indigenous-AI research and development.


Orchestration Architecture

┌──────────────────────────────────────────────────────────────┐
│                  Phase 0: PDE Decomposition                  │
│               mcp-pde → 4 MECE research tracks               │
│    9 secondary intents · 4 ambiguities · Four Directions     │
└───────────────────────┬──────────────────────────────────────┘
                        │
                        ▼
┌──────────────────────────────────────────────────────────────┐
│             Cycle 0: Initial Research (5 agents)             │
│                                                              │
│  ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐             │
│  │Agent A  │ │Agent B  │ │Agent C  │ │Agent D  │             │
│  │Plugins  │ │Inference│ │Training │ │QMD      │             │
│  │362 lines│ │492 lines│ │593 lines│ │634 lines│             │
│  │~415s    │ │~365s    │ │~506s    │ │~410s    │             │
│  └─────────┘ └────┬────┘ └─────────┘ └─────────┘             │
│                   │ completes                                │
│              ┌────▼────┐                                     │
│              │Agent E  │  (hit 4-agent concurrency limit)    │
│              │Copilot  │                                     │
│              │467 lines│                                     │
│              │~393s    │                                     │
│              └─────────┘                                     │
│                                              Total: 2,548 L  │
└───────────────────────┬──────────────────────────────────────┘
                        │
                        ▼
┌──────────────────────────────────────────────────────────────┐
│                  Cycle 1: Review (2 agents)                  │
│                                                              │
│  ┌──────────────────────┐  ┌──────────────────────┐          │
│  │ Track 1 Reviewer     │  │ Track 2 Reviewer     │          │
│  │ Agents A + B + E     │  │ Agents C + D         │          │
│  │ 335 lines · ~352s    │  │ 327 lines · ~387s    │          │
│  │                      │  │                      │          │
│  │ 5 critical issues    │  │ 2 BLOCKING issues    │          │
│  │ 15 revision items    │  │ 12 revision items    │          │
│  │ 16 verified facts    │  │ 16 verified facts    │          │
│  └──────────────────────┘  └──────────────────────┘          │
│                                               Total: 662 L   │
└───────────────────────┬──────────────────────────────────────┘
                        │
                        ▼
┌──────────────────────────────────────────────────────────────┐
│                 Cycle 2: Revision (3 agents)                 │
│                                                              │
│  ┌───────────────┐ ┌───────────────┐ ┌───────────────┐       │
│  │ Reviser 1     │ │ Reviser 2     │ │ Reviser 3     │       │
│  │ RESULT-01 +   │ │ RESULT-02     │ │ RESULT-03     │       │
│  │ RESULT-04     │ │ Inference     │ │ Training      │       │
│  │ 934 lines     │ │ 546 lines     │ │ 851 lines     │       │
│  │ ~448s         │ │ ~333s         │ │ ~364s         │       │
│  └───────────────┘ └───────────────┘ └───────────────┘       │
│                                             Total: 2,331 L   │
└───────────────────────┬──────────────────────────────────────┘
                        │
                        ▼
┌──────────────────────────────────────────────────────────────┐
│                        Final Assembly                        │
│     4 RESULT files copied to root · INDEX.md · AGENTS.md     │
└──────────────────────────────────────────────────────────────┘

Agent Relationship Map

Agent A (Plugins) ──────┐
Agent B (Inference) ────┼──→ Track 1 Reviewer ──┬──→ Reviser 1 → RESULT-01, RESULT-04
Agent E (Copilot) ──────┘                       └──→ Reviser 2 → RESULT-02

Agent C (Training) ─────┐
Agent D (QMD Models) ───┴──→ Track 2 Reviewer ─────→ Reviser 3 → RESULT-03

Phase 0: PDE Decomposition

Tool: mcp-pde (Prompt Decomposition Engine)
PDE ID: 325a8ade-e716-45e6-8e5f-a4866b1bdd18
PDE File: .pde/2604150910--325a8ade-e716-45e6-8e5f-a4866b1bdd18/
Confidence: 95%

How the Inquiry Was Decomposed

The original multi-part inquiry was processed through the PDE to extract structured intents, map dependencies, and identify ambiguities. The PDE uses a Four Directions framework (East/South/West/North) to organize the decomposition:

| Direction | Role | Intents Identified |
|---|---|---|
| 🌅 East - Vision | What the user wants | Understand local AI + cloud hybrid goal; identify Indigenous knowledge context for persona training |
| 🔥 South - Analysis | Research tracks | OpenClaw plugins; Mac Mini M4 specs; community fine-tuning practices; QMD HuggingFace models |
| 🌊 West - Validation | Verify assumptions | Plugin compatibility across Claw variants; 40GB model feasibility in unified memory; Mac Mini vs Studio |
| ❄️ North - Action | Deliverables | Tiered specs with pricing; plugin recommendations; training roadmap |

MECE Research Tracks

The PDE identified 4 mutually exclusive, collectively exhaustive research tracks, which became the 5-agent Cycle 0 plan:

  1. OpenClaw Plugin Ecosystem → Agent A
  2. Mac Mini Hardware for Inference → Agent B
  3. Apple Silicon Training & Fine-Tuning → Agent C + Agent D (split: general training + QMD-specific)
  4. Cloud Provider Capabilities → Agent E

Identified Ambiguities (4)

  1. Which specific "Claw variants" beyond OpenClaw and Hermes?
  2. What is the budget ceiling for hardware?
  3. What is the intended training frequency (weekly? monthly?)?
  4. What level of Indigenous knowledge sensitivity applies?

Secondary Intents (9)

Dependency-ordered intents spanning evaluate, spec, research, investigate, clarify, verify, and budget actions. The PDE mapped 6 dependency edges between these intents to ensure proper sequencing.

Expected Outputs

The PDE pre-declared 5 artifact filenames:

  • RESULT-01-openclaw-plugins-local-ai.md
  • RESULT-02-mac-mini-inference-scenarios.md
  • RESULT-03-mac-mini-training-scenarios.md
  • RESULT-04-copilot-google-plugin.md
  • INDEX.md

Cycle 0: Initial Research

Agents: 5 × claude-opus-4.6 sub-agents, launched in parallel
Total Output: 2,548 lines of raw research
Duration: ~506s (wall clock, limited by slowest agent)
Concurrency Note: Platform hit a 4-agent concurrency limit; Agent E launched after Agent B completed (~365s)

Agent A - OpenClaw Plugin Ecosystem

File: 00-initial-research/agent-a-plugins.md (362 lines, ~415s)
Focus: Which OpenClaw plugins serve local AI inference and training workflows

Key Findings:

  • OpenClaw is a general-purpose AI automation agent (~250K GitHub stars), NOT a coding agent
  • @openclaw/ollama-provider is the essential local AI plugin - it auto-discovers models via /api/tags (see the sketch after this list)
  • Hermes Agent (Nous Research, Feb 2026) is a separate Python-based framework; plugins are NOT cross-compatible
  • Both OpenClaw and Hermes can share the same Ollama instance concurrently
  • Identified ACPX Runtime as OpenClaw's internal plugin execution engine
  • Anthropic blocked OAuth tokens on April 4, 2026
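
To make that discovery mechanism concrete, here is a minimal sketch of the call the provider plugin relies on, assuming a default Ollama install listening on localhost:11434 (the endpoint and response shape are Ollama's public API, not taken from the plugin's source):

```python
# Minimal sketch: list the models a local Ollama daemon exposes via the
# same /api/tags endpoint the ollama-provider plugin discovers against.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/tags"

with urllib.request.urlopen(OLLAMA_URL, timeout=5) as resp:
    payload = json.load(resp)

for model in payload.get("models", []):
    size_gb = model.get("size", 0) / 1e9
    print(f"{model['name']:<40} {size_gb:5.1f} GB")
```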

Issues Found in Review: Incorrectly characterized the Copilot provider as an "unofficial/community integration" (it is bundled first-class). Missing GPT-5 family models, Claude Opus, and security warnings about malicious plugins.

Agent B - Mac Mini M4 Inference Specifications

File: 00-initial-research/agent-b-mac-inference.md (492 lines, ~365s)
Focus: Hardware specs, pricing, and inference benchmarks for two scenarios

Key Findings:

  • Scenario A (Minimal): Mac Mini M4, 24GB, 512GB - $999. Runs 7B–8B models at 28–35 tok/s
  • Scenario B (Evolved): Mac Mini M4 Pro, 64GB, 1TB - $2,199. Runs up to 70B Q4 at 6–8 tok/s
  • RAM is soldered and cannot be upgraded - a critical decision point
  • Unified Memory Architecture eliminates CPU↔GPU transfer bottleneck
  • Memory bandwidth (not compute) is the LLM inference bottleneck (a back-of-envelope check follows this list)
  • Mac Studio M4 Max (128GB, $3,699) is the upgrade path
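
Since decode speed is dominated by streaming the weights from memory, a rough ceiling is bandwidth divided by model size. A back-of-envelope sketch (bandwidth figures are Apple's published specs; model sizes are approximate Q4 footprints):

```python
# Rough decode ceiling: generating one token reads roughly the whole model,
# so tok/s <= memory bandwidth / model size. Real throughput lands below this.
CHIPS = {"M4": 120, "M4 Pro": 273}                     # GB/s, Apple specs
MODELS = {"7B Q4": 4.0, "8B Q4": 4.7, "70B Q4": 40.0}  # GB, approximate

for chip, bw in CHIPS.items():
    for name, size_gb in MODELS.items():
        print(f"{chip:<7} {name:<7} ceiling ≈ {bw / size_gb:5.1f} tok/s")
```

The M4 Pro figure for 70B Q4 (273 / 40 ≈ 6.8 tok/s) lands squarely in the reported 6–8 tok/s range, which is why more compute alone would not help.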

Issues Found in Review: 14-core M4 Pro pricing understated. MLX backend status overstated ("default" vs "preview"). Benchmark inconsistencies between Metal and MLX numbers. Misleading RTX 4070 Super comparison. Missing the M5 chip timeline, CAD pricing, and DeepSeek-R1 benchmarks.

Agent C - Apple Silicon Training Capabilities

File: 00-initial-research/agent-c-mac-training.md (593 lines, ~506s)
Focus: Training frameworks, feasibility, and hardware requirements

Key Findings:

  • MLX-LM (v0.31.1) is the primary framework for LoRA/QLoRA on Apple Silicon (see the invocation sketch after this list)
  • LoRA fine-tuning of 7B models: 10–30 min per adapter on M4 Pro
  • Five persona adapters trainable in ~2.5–3 hours
  • Weekend self-training schedule proposed with Friday night automation
  • Mac Mini M4 Pro 48GB is the recommended training configuration
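
For orientation, a minimal sketch of one persona-adapter run using the corrected CLI entry point. The model name, data path, and hyperparameters are illustrative placeholders; confirm the flags with python -m mlx_lm.lora --help on your installed version:

```python
# Minimal sketch: one LoRA adapter training run via mlx-lm's CLI.
# Model, data path, and hyperparameters are illustrative placeholders.
import subprocess

subprocess.run(
    [
        "python", "-m", "mlx_lm.lora",
        "--model", "mlx-community/Mistral-7B-Instruct-v0.3-4bit",  # any MLX-format model
        "--train",
        "--data", "./data/persona-east",          # expects train.jsonl / valid.jsonl
        "--batch-size", "2",
        "--iters", "600",
        "--adapter-path", "./adapters/persona-east",
    ],
    check=True,
)
```

At the 10–30 minutes per adapter the findings cite, looping this over five persona datasets is what yields the ~2.5–3 hour total.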

Issues Found in Review: mlx-tune version inflated (v0.4.21 → actual v0.4.19). EmbeddingGemma architecture misidentified (claimed Gemma 3 decoder, actually encoder-only). Mac Mini pricing incorrect. LoRA CLI example outdated. Missing QMD pipeline CUDA dependency warning.

Agent D - QMD HuggingFace Models Analysis

File: 00-initial-research/agent-d-qmd-models.md (634 lines, ~410s)
Focus: QMD's three GGUF models, fine-tuning potential, and deployment

Key Findings:

  • QMD uses exactly 3 GGUF models: embeddinggemma-300M, Qwen3-Reranker-0.6B, qmd-query-expansion-1.7B
  • All three are swappable via environment variables (QMD_EMBED_MODEL, QMD_RERANK_MODEL, QMD_GENERATE_MODEL; see the sketch after this list)
  • A complete fine-tuning pipeline exists in the finetune/ directory for the query expansion model
  • Training data: ~2,290 examples; LoRA config: rank 16, alpha 32
  • Cloud training via HuggingFace Jobs: ~$1.50/run on A10G
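
A minimal sketch of that swap mechanism, using the three variable names the agents verified against QMD's source; the model URIs and the qmd CLI invocation below are hypothetical placeholders:

```python
# Minimal sketch: point QMD at alternate GGUF models through its environment
# variables before launching it. The variable names are those verified
# against QMD's source; the URIs and the qmd CLI call are hypothetical.
import os
import subprocess

env = os.environ.copy()
env["QMD_EMBED_MODEL"] = "hf:your-org/embeddinggemma-300m-tuned-gguf"     # hypothetical
env["QMD_RERANK_MODEL"] = "hf:your-org/qwen3-reranker-0.6b-gguf"          # hypothetical
env["QMD_GENERATE_MODEL"] = "hf:your-org/qmd-query-expansion-1.7b-tuned"  # hypothetical

subprocess.run(["qmd", "search", "local inference budget"], env=env, check=True)
```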

Issues Found in Review: Same EmbeddingGemma architecture error as Agent C. Training data format not fully explained. CUDA dependency in pipeline not flagged. GGUF conversion for encoder-only models unverified - identified as BLOCKING risk.

Agent E - Copilot + Google Plugin Capabilities

File: 00-initial-research/agent-e-copilot-google.md (467 lines, ~393s)
Focus: What the Copilot subscription enables in OpenClaw; Google plugin capabilities

Key Findings:

  • The GitHub Copilot provider gives $0-marginal-cost access to multiple model families
  • The Google plugin provides 5 capability contracts: LLM, image gen, video gen, music gen, web search
  • The Perplexity plugin is web-search only, not an LLM provider
  • Multi-provider routing supports a primary + fallback chain configuration (see the sketch after this list)
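
To illustrate the routing idea only - OpenClaw's real configuration schema is not reproduced in this report, so every key, provider ID, and model name below is a hypothetical stand-in:

```python
# Illustrative primary + fallback chain. NOT OpenClaw's actual config
# schema; the keys, provider IDs, and model names are hypothetical.
ROUTING = {
    "primary": {"provider": "copilot", "model": "gpt-4.1"},
    "fallbacks": [
        {"provider": "ollama", "model": "llama3.1:8b"},  # local, $0 marginal cost
        {"provider": "google", "model": "gemini-pro"},   # hypothetical ID
    ],
}

def pick_provider(available: set[str]) -> dict:
    """Return the first configured provider that is currently reachable."""
    for candidate in [ROUTING["primary"], *ROUTING["fallbacks"]]:
        if candidate["provider"] in available:
            return candidate
    raise RuntimeError("no provider in the chain is available")

print(pick_provider({"ollama", "google"}))  # -> falls back to local Ollama
```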

Issues Found in Review: Used the deprecated gpt-4o in all config examples. Missing the entire GPT-5 family and the Claude Opus/Haiku models. Oversimplified the "$0 for all models" claim (premium quotas exist). Star count inconsistency.


Cycle 1: Review

Agents: 2 × claude-opus-4.6 review agents, launched in parallel
Total Output: 662 lines of review analysis
Duration: ~387s (wall clock)

Track 1 Reviewer - Plugins, Inference, Copilot (Agents A, B, E)

File: 01-review/review-track1-plugins-inference-copilot.md (335 lines, ~352s)
Overall Grade: B+

What This Reviewer Did:

  • Cross-referenced all three documents for contradictions
  • Verified 16 factual claims against web sources and official documentation
  • Identified 5 critical issues and 15 revision items
  • Provided corrected data tables (pricing, model catalog, MLX performance)

Critical Issues Found:

  1. Copilot provider contradiction - Agent A called it "unofficial/community," Agent E showed it bundled in the monorepo. Resolution: Agent E correct.
  2. GPT-4o deprecated - all config examples used a deprecated model. Must use GPT-4.1.
  3. M4 Pro pricing understated - 14-core 64GB/1TB listed at $2,399, but the actual Apple price may be $2,499–$3,499.
  4. MLX backend status overstated - preview, not default; may require OLLAMA_MLX=1.
  5. Missing GPT-5 model family - GPT-5 mini, 5.2, 5.3-Codex, and 5.4 are all available but absent from all documents.

Cross-Document Contradictions Identified: 6 (Copilot provider status, star counts, MLX benchmark inconsistency, Hermes characterization, OpenClaw description, missing cross-references).

Track 2 Reviewer - Training, QMD Models (Agents C, D)

File: 01-review/review-track2-training-qmd.md (327 lines, ~387s)
Overall Grade: B+

What This Reviewer Did:

  • Verified Agent D's QMD source code claims against actual commit cfd640e
  • Confirmed all 3 model URIs, env vars, and SDK API match source code
  • Verified fine-tuning pipeline existence (20+ files in finetune/)
  • Identified 2 BLOCKING issues and 12 revision items

BLOCKING Issues:

  1. GGUF conversion unverified - converting a fine-tuned encoder-only embedding model back to GGUF is experimental. If conversion fails, the entire embedding fine-tuning proposal collapses. Mitigation: test with dummy data first; fall back to Qwen3-Embedding-0.6B.
  2. CUDA dependency in QMD pipeline - pyproject.toml depends on nvidia-ml-py and targets A10G GPUs. It won't run on a Mac without modification.

Cross-Document Contradictions: 5 (embedding framework disagreement, sentence-transformers compatibility, CUDA vs Mac, priority ordering, training data format).

Data Sovereignty Gap: Both documents were "significantly deficient" on Indigenous data sovereignty. The reviewer added detailed OCAP principles, knowledge classification, consent requirements, and cloud training risks.

Key Corrections Identified (Combined)

  1. Copilot provider is bundled first-class, not unofficial/community
  2. GPT-4o is deprecated - replace with GPT-4.1 in all examples
  3. Add GPT-5 family (5-mini, 5.2, 5.2-codex, 5.3-codex, 5.4, 5.4-mini) to model catalog
  4. Add Claude Opus 4.5/4.6 and Claude Haiku 4.5 to model catalog
  5. EmbeddingGemma-300M is encoder-only (BERT-like), NOT a Gemma 3 decoder
  6. mlx-tune version is v0.4.19, not v0.4.21
  7. GGUF conversion for fine-tuned encoder models is unverified - a BLOCKING risk
  8. QMD finetune pipeline is CUDA-only - needs modification for Mac
  9. Mac Mini M4 Pro pricing corrections with verified Apple Store numbers
  10. MLX backend is preview; may need explicit OLLAMA_MLX=1 enablement
  11. Premium request quotas exist - "$0 for all models" is oversimplified
  12. M5 chip timeline should be included (expected WWDC June 2026)
  13. Canadian pricing needed (1 USD ≈ 1.38 CAD)
  14. OpenClaw security crisis - 12–20% malicious community plugins
  15. Data sovereignty protocols - OCAP principles, knowledge classification, local-only training

Cycle 2: Revision

Agents: 3 × claude-opus-4.6 revision agents, launched in parallel
Total Output: 2,331 lines, 17,285 words across 4 final documents
Duration: ~448s (wall clock)

Reviser 1 - Plugins + Copilot/Google (RESULT-01 + RESULT-04)

Output Files:

  • RESULT-01-openclaw-plugins-local-ai.md (419 lines, 2,826 words)
  • RESULT-04-copilot-google-plugin.md (515 lines, 3,369 words)

Duration: ~448s

Corrections Incorporated:

  • Copilot provider corrected from "unofficial" to "bundled first-class extension"
  • All gpt-4o references replaced with gpt-4.1
  • Complete GPT-5 family, Claude Opus/Haiku, Gemini models added to catalog (17+ models)
  • Premium request quota system documented (300/mo Pro, 1,500/mo Pro+)
  • Security warning added: 12–20% malicious community plugins
  • Anthropic OAuth block scope corrected (all consumer tiers, not just Pro/Max)
  • Perplexity pricing detailed ($20/mo subscription + token costs)
  • Total cost analysis added combining hardware + subscriptions + API costs
  • Hermes claw migrate tool referenced for cross-platform migration
  • Star count standardized to ~250K across documents

Reviser 2 - Mac Mini Inference Scenarios (RESULT-02)

Output File: RESULT-02-mac-mini-inference-scenarios.md (546 lines, 5,106 words)
Duration: ~333s

Corrections Incorporated:

  • Complete pricing table with verified Apple MSRP (USD) and estimated CAD
  • 14-core M4 Pro pricing corrected to the $2,399–$2,499 range, with notes
  • 12-core M4 Pro at $2,199 recommended as better value (same memory bandwidth)
  • MLX backend status corrected: "preview, may need OLLAMA_MLX=1"
  • MLX speedup range corrected: "21–29% for 8B models" (not the misleading 87%)
  • Benchmark tables clearly labeled Metal vs MLX
  • Misleading RTX 4070 Super comparison removed/corrected
  • DeepSeek-R1 32B benchmarks added (11–14 tok/s on M4 Pro 64GB)
  • Ollama OLLAMA_NUM_PARALLEL concurrency discussion added (see the sketch after this list)
  • M5 chip timeline section added (expected WWDC June 2026)
  • Canadian pricing added throughout (1 USD ≈ 1.38 CAD)
  • Power consumption corrected to a 50–80W range (not just 50W)
  • GPT-4o deprecation warning added
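
A minimal sketch of starting the local daemon with those two settings. OLLAMA_NUM_PARALLEL is a documented Ollama variable; OLLAMA_MLX=1 is the preview flag the review cites, so treat it as unverified:

```python
# Minimal sketch: launch the Ollama daemon with concurrency enabled and the
# MLX preview backend requested. OLLAMA_NUM_PARALLEL is a documented Ollama
# variable; OLLAMA_MLX=1 is the preview flag cited in the review (unverified).
import os
import subprocess

env = os.environ.copy()
env["OLLAMA_NUM_PARALLEL"] = "2"   # serve two requests at once (OpenClaw + Hermes)
env["OLLAMA_MLX"] = "1"            # opt into the MLX backend preview, per the review

subprocess.run(["ollama", "serve"], env=env, check=True)
```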

Reviser 3 - Mac Mini Training Scenarios (RESULT-03)

Output File: RESULT-03-mac-mini-training-scenarios.md (851 lines, 5,984 words)
Duration: ~364s

BLOCKING Issues Resolved:

  1. GGUF conversion risk: explicitly documented as the "hardest unsolved step." Added a mitigation strategy (test with dummy data first, fall back to Qwen3-Embedding-0.6B) and marked the path as experimental throughout. A smoke-test sketch follows this list.
  2. CUDA dependency: documented three forward paths (HuggingFace Jobs cloud, MLX-LM local rewrite, PyTorch MPS modification) with the data sovereignty implications of each.
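
A minimal sketch of that dummy-data smoke test, run from a llama.cpp checkout (the converter script name is llama.cpp's current one, but whether it accepts a fine-tuned encoder-only EmbeddingGemma is exactly the open question - expect this to fail loudly, which is the point):

```python
# Minimal smoke test for the BLOCKING risk: can a (dummy) fine-tuned
# encoder-only model round-trip to GGUF at all? Paths are placeholders;
# convert_hf_to_gguf.py is llama.cpp's converter, which may reject this
# architecture - failing here cheaply is the goal of the test.
import subprocess
from pathlib import Path

HF_DIR = Path("./dummy-finetuned-embeddinggemma")  # tiny run on throwaway data
GGUF_OUT = Path("./dummy-embed.gguf")

result = subprocess.run(
    ["python", "convert_hf_to_gguf.py", str(HF_DIR),
     "--outfile", str(GGUF_OUT), "--outtype", "q8_0"],
    capture_output=True, text=True,
)
if result.returncode != 0 or not GGUF_OUT.exists():
    print("Conversion failed - fall back to Qwen3-Embedding-0.6B:")
    print(result.stderr[-2000:])
else:
    print(f"Converted OK: {GGUF_OUT} ({GGUF_OUT.stat().st_size / 1e6:.1f} MB)")
```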

Other Corrections:

  • EmbeddingGemma architecture corrected to "encoder-only transformer"
  • mlx-tune version corrected to v0.4.19, with the capability matrix caveated as "claimed"
  • LoRA CLI corrected from python lora.py to python -m mlx_lm.lora
  • MLX vs MLX-LM distinction clarified (separate packages)
  • Mac Mini pricing corrected (48GB/1TB: $2,299, not $1,800)
  • Training automation script enhanced with error handling, logging, rollback, and validation gates (a condensed sketch follows this list)
  • launchd recommended over cron for macOS scheduling
  • Data sovereignty section elevated to a foundational requirement
  • OCAP principles applied to training data and model weights
  • Knowledge classification system added (Public/Community/Sacred/Private)
  • Evaluation metrics added (MRR, Precision@5, A/B testing)
  • Multilingual considerations added (Indigenous languages, French, English)
  • "What's Proven vs Experimental" honesty table added

Final Deliverables

| # | File | Description | Lines | Words |
|---|---|---|---|---|
| 1 | RESULT-01-openclaw-plugins-local-ai.md | Plugin ecosystem: Ollama, Copilot, Google, Perplexity, Hermes compatibility, multi-provider routing, security | 419 | 2,826 |
| 2 | RESULT-02-mac-mini-inference-scenarios.md | Hardware for inference: Scenario A ($999) vs B ($2,199), benchmarks, pricing, thermal, hybrid strategy | 546 | 5,106 |
| 3 | RESULT-03-mac-mini-training-scenarios.md | Hardware for training: MLX-LM, QMD fine-tuning, LoRA personas, data sovereignty, automation pipeline | 851 | 5,984 |
| 4 | RESULT-04-copilot-google-plugin.md | Cloud providers: Copilot model catalog, premium quotas, Google capabilities, Perplexity, cost analysis | 515 | 3,369 |
| - | INDEX.md | Master index with quick-answer section | 67 | - |
| - | AGENTS.md | This orchestration report | - | - |

Lessons Learned

What Worked Well

  1. PDE decomposition prevented scope drift. By pre-declaring 4 MECE tracks and 5 expected artifacts, the research stayed focused. No agent wandered into territory covered by another.

  2. The 3-cycle pipeline (Research β†’ Review β†’ Revision) caught significant errors. Without Cycle 1 review, the final documents would have contained deprecated model names (GPT-4o), incorrect pricing ($1,800 vs $2,299), a wrong architecture description (EmbeddingGemma), and an unacknowledged BLOCKING risk (GGUF conversion). The review cycle added ~12 minutes of wall time but prevented material misinformation.

  3. Cross-document verification was invaluable. The reviewers found 11 cross-document contradictions that no single-agent approach would have caught (e.g., Copilot provider described as "unofficial" in one doc and "bundled" in another; two different star counts; two different tok/s figures for the same hardware).

  4. Parallel agent execution was efficient. 5 agents running simultaneously in Cycle 0 produced 2,548 lines of research in ~8.5 minutes (wall time). Sequential execution would have taken ~35 minutes.

  5. Source code verification by Track 2 reviewer confirmed all QMD model URIs, env vars, and SDK API exactly matched the source. This turned speculative claims into verified facts.

  6. Data sovereignty gap identification demonstrated that domain-specific review catches what general-purpose research misses. Both initial agents were "significantly deficient" on Indigenous data sovereignty - a foundational requirement the reviewer correctly elevated.

What Could Be Improved

  1. 4-agent concurrency limit caused serialization. Agent E had to wait for Agent B to complete, adding ~6 minutes of unnecessary delay. A 5-agent limit would have eliminated it.

  2. Initial research agents shared a misconception. Both Agent C and Agent D independently produced the same incorrect EmbeddingGemma architecture description ("Gemma 3 decoder" instead of encoder-only). This suggests they used the same flawed source. A pre-research fact-check step or greater source diversity could prevent such correlated errors.

  3. Review agents should produce structured correction manifests. The reviewers produced prose documents with inline corrections. A structured format (JSON/YAML) with file, line, current_text, and corrected_text fields would make the revision cycle more mechanical and less error-prone (see the sketch after this list).

  4. No evaluation of final output quality. The pipeline lacks a Cycle 3 validation step that would verify the revisions actually incorporated all corrections. A quick diff-based check could confirm this.

  5. Training time estimates remain extrapolated. Despite thorough review, M4 Pro training benchmarks are still scaled from M2/M3 data. Future research should include actual M4 Pro benchmark runs.

  6. Reviser workload was imbalanced. Reviser 1 produced 2 documents (934 lines) while Reviser 3 produced 1 document (851 lines). Reviser 1 took longer (~448s vs ~364s). Better load balancing would equalize durations.
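
A minimal sketch of what one manifest could look like, using the four fields named in item 3; the schema and the sample entries are a proposal, not an existing format:

```python
# Proposal sketch: a structured correction manifest a review agent could
# emit instead of prose, so revisers apply fixes mechanically and a later
# validation pass can diff against it. Entries are illustrative.
import json
from dataclasses import asdict, dataclass

@dataclass
class Correction:
    file: str
    line: int
    current_text: str
    corrected_text: str
    severity: str = "revision"  # "revision" | "critical" | "blocking"

manifest = [
    Correction("agent-e-copilot-google.md", 112, "gpt-4o", "gpt-4.1", "critical"),
    Correction("agent-c-mac-training.md", 88, "v0.4.21", "v0.4.19"),
]
print(json.dumps([asdict(c) for c in manifest], indent=2))
```

This also addresses item 4: a Cycle 3 validator could walk the manifest and check that each corrected_text actually appears in the final documents.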

Agent Interaction Patterns Observed

| Pattern | Description | Example |
|---|---|---|
| Correlated error | Multiple agents produce the same incorrect claim from a shared bad source | EmbeddingGemma architecture in Agents C and D |
| Contradictory characterization | Agents describe the same entity differently based on different sources | Copilot provider: "unofficial" (A) vs "bundled" (E) |
| Complementary depth | Agents covering adjacent topics reinforce each other's findings | Agent D's QMD source code verification validated Agent C's training feasibility claims |
| Gap revelation | Cross-referencing reveals missing connections | Agent B's RAM calculations didn't account for Agent A's LanceDB memory plugin RAM requirements |
| Scope creep resistance | PDE pre-declaration kept agents in lanes | No agent attempted to cover another agent's research track |

Statistics

| Metric | Value |
|---|---|
| Total agents spawned | 10 |
| Agent model | claude-opus-4.6 (all agents) |
| Cycle 0 agents | 5 (research) |
| Cycle 1 agents | 2 (review) |
| Cycle 2 agents | 3 (revision) |
| Total lines produced | 5,541 (2,548 + 662 + 2,331) |
| Final deliverable lines | 2,331 |
| Final deliverable words | 17,285 |
| Total documents created | 13 (5 research + 2 review + 4 revision + INDEX + AGENTS) |
| Critical issues found | 7 (5 in Track 1, 2 BLOCKING in Track 2) |
| Revision items | 27 (15 in Track 1, 12 in Track 2) |
| Verified facts | 32 (16 + 16) |
| Cross-document contradictions | 11 (6 in Track 1, 5 in Track 2) |
| Estimated wall-clock time | ~25 minutes total |
| Cycle 0 duration (wall) | ~506s (~8.4 min) |
| Cycle 1 duration (wall) | ~387s (~6.5 min) |
| Cycle 2 duration (wall) | ~448s (~7.5 min) |
| PDE decomposition | 4 MECE tracks, 9 secondary intents, 4 ambiguities |
| Concurrency limit hit | Yes (4-agent max; Agent E delayed) |

File Tree

RCH-tech-jgwill-claws-infrastructure--2604150901--a989a67a-bd77-4cf4-b093-a24cda73d48f/
│
├── AGENTS.md                          ← This document (orchestration report)
├── INDEX.md                           ← Master index with quick answers
│
├── RESULT-01-openclaw-plugins-local-ai.md       ← Final: Plugin ecosystem guide
├── RESULT-02-mac-mini-inference-scenarios.md    ← Final: Hardware for inference
├── RESULT-03-mac-mini-training-scenarios.md     ← Final: Hardware for training
├── RESULT-04-copilot-google-plugin.md           ← Final: Cloud provider capabilities
│
├── 00-initial-research/               ← Cycle 0: Raw research (5 agents)
│   ├── agent-a-plugins.md             │  362 lines - OpenClaw plugin ecosystem
│   ├── agent-b-mac-inference.md       │  492 lines - Mac Mini M4 inference specs
│   ├── agent-c-mac-training.md        │  593 lines - Apple Silicon training
│   ├── agent-d-qmd-models.md          │  634 lines - QMD models & fine-tuning
│   └── agent-e-copilot-google.md      │  467 lines - Copilot + Google plugin
│
├── 01-review/                         ← Cycle 1: Cross-reference review (2 agents)
│   ├── review-track1-plugins-inference-copilot.md  │  335 lines - Agents A+B+E
│   └── review-track2-training-qmd.md                │  327 lines - Agents C+D
│
├── 02-revision/                       ← Cycle 2: Corrected final documents (3 agents)
│   ├── RESULT-01-openclaw-plugins-local-ai.md      │  419 lines
│   ├── RESULT-02-mac-mini-inference-scenarios.md   │  546 lines
│   ├── RESULT-03-mac-mini-training-scenarios.md    │  851 lines
│   └── RESULT-04-copilot-google-plugin.md          │  515 lines
│
└── .pde/                              ← PDE decomposition artifacts
    └── 2604150910--325a8ade-e716-45e6-8e5f-a4866b1bdd18/
        ├── pde-325a8ade-e716-45e6-8e5f-a4866b1bdd18.json  ← Structured decomposition
        └── pde-325a8ade-e716-45e6-8e5f-a4866b1bdd18.md    ← Markdown export

How to Replicate This Orchestration

Prerequisites

  • Access to claude-opus-4.6 (or equivalent) via GitHub Copilot CLI with sub-agent spawning
  • mcp-pde MCP server configured for prompt decomposition
  • Web search tools for agent verification steps

Pipeline Steps

  1. Decompose the inquiry using pde_decompose → pde_parse_response to produce structured MECE tracks
  2. Spawn N research agents in parallel, one per track, each writing to 00-initial-research/agent-{letter}-{topic}.md
  3. Spawn M review agents in parallel, each cross-referencing a subset of research docs and writing to 01-review/review-track{n}.md
  4. Spawn K revision agents in parallel, each producing final RESULT-{nn}-{topic}.md files in 02-revision/
  5. Assemble - copy RESULT files to root, create INDEX.md, create AGENTS.md (a pseudocode sketch of the whole pipeline follows this list)
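
The control flow, reduced to pseudocode - spawn_agent stands in for whatever sub-agent API the platform exposes, and the semaphore mirrors the 4-agent cap this run hit:

```python
# Pseudocode sketch of the 3-cycle pipeline; spawn_agent() is a stand-in
# for a platform-specific sub-agent call. Only the control flow is real.
import asyncio

MAX_CONCURRENT = 4  # the platform cap this orchestration ran into

async def spawn_agent(prompt: str, out_path: str) -> str:
    raise NotImplementedError("platform-specific sub-agent call")

async def run_cycle(tasks: list) -> list:
    sem = asyncio.Semaphore(MAX_CONCURRENT)

    async def guarded(prompt: str, out_path: str) -> str:
        async with sem:
            return await spawn_agent(prompt, out_path)

    return await asyncio.gather(*(guarded(p, o) for p, o in tasks))

async def pipeline(research, reviews, revisions):
    await run_cycle(research)    # Cycle 0: N research agents
    await run_cycle(reviews)     # Cycle 1: M reviewers read Cycle 0 output
    await run_cycle(revisions)   # Cycle 2: K revisers apply corrections
    # Final assembly (copy RESULTs, INDEX.md, AGENTS.md) happens after.
```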

Key Design Decisions

  • Why 3 cycles, not 2? The review cycle catches errors that agents cannot self-detect (cross-document contradictions, correlated misconceptions, scope gaps). Skipping it would have left 7 critical issues in the final output.
  • Why opus, not sonnet/haiku? Research requiring web search, source verification, and long-form synthesis benefits from higher-capability models. The cost difference (~$0.15/agent vs ~$0.03) is negligible for a 10-agent orchestration.
  • Why MECE decomposition? Non-overlapping tracks prevent duplicate work and contradictory framings of the same topic. The PDE enforces this structure.

Generated April 15, 2026. This document describes the orchestration of 10 claude-opus-4.6 sub-agents across 3 cycles producing 4 final research deliverables for IAIP infrastructure planning.