
AGENTS.md - Multi-Agent Research Orchestration Report

Date: April 15, 2026
Research ID: RCH-tech-jgwill-claws-infrastructure--2604150901--a989a67a-bd77-4cf4-b093-a24cda73d48f
PDE ID: 325a8ade-e716-45e6-8e5f-a4866b1bdd18
Orchestration Model: claude-opus-4.6 (all agents)
For: Guillaume Descoteaux-Isabelle (jgwill)


Overview

What Was Researched

Infrastructure planning and budgeting for local AI inference and training on a Mac Mini, alongside existing OpenAI Codex and GitHub Copilot subscriptions. The research covers the OpenClaw plugin ecosystem, Apple Silicon hardware specifications, QMD knowledge-base model fine-tuning, and cloud provider integration - all in the context of an Indigenous-AI Collaborative Platform (IAIP).

Why

Guillaume needs to make a purchasing decision (Mac Mini configuration) and an architectural decision (which plugins, providers, and training pipelines to adopt) for a system that will serve as a local AI node running alongside cloud subscriptions. The system must support Indigenous knowledge sovereignty - a requirement that introduces data sovereignty constraints absent from typical infrastructure planning.

For Whom

Guillaume Descoteaux-Isabelle, developer of the IAIP platform, who runs OpenClaw, Hermes Agent, QMD, and related tools for Indigenous-AI research and development.


Orchestration Architecture

┌──────────────────────────────────────────────────────────────┐
│                  Phase 0: PDE Decomposition                  │
│               mcp-pde → 4 MECE research tracks               │
│    9 secondary intents · 4 ambiguities · Four Directions     │
└───────────────────────┬──────────────────────────────────────┘
                        │
                        ▼
┌──────────────────────────────────────────────────────────────┐
│             Cycle 0: Initial Research (5 agents)             │
│                                                              │
│  ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐             │
│  │Agent A  │ │Agent B  │ │Agent C  │ │Agent D  │             │
│  │Plugins  │ │Inference│ │Training │ │QMD      │             │
│  │362 lines│ │492 lines│ │593 lines│ │634 lines│             │
│  │~415s    │ │~365s    │ │~506s    │ │~410s    │             │
│  └─────────┘ └────┬────┘ └─────────┘ └─────────┘             │
│                   │ completes                                │
│              ┌────▼────┐                                     │
│              │Agent E  │  (hit 4-agent concurrency limit)    │
│              │Copilot  │                                     │
│              │467 lines│                                     │
│              │~393s    │                                     │
│              └─────────┘                                     │
│                                              Total: 2,548 L  │
└───────────────────────┬──────────────────────────────────────┘
                        │
                        ▼
┌──────────────────────────────────────────────────────────────┐
│                  Cycle 1: Review (2 agents)                  │
│                                                              │
│  ┌──────────────────────┐  ┌──────────────────────┐          │
│  │ Track 1 Reviewer     │  │ Track 2 Reviewer     │          │
│  │ Agents A + B + E     │  │ Agents C + D         │          │
│  │ 335 lines · ~352s    │  │ 327 lines · ~387s    │          │
│  │                      │  │                      │          │
│  │ 5 critical issues    │  │ 2 BLOCKING issues    │          │
│  │ 15 revision items    │  │ 12 revision items    │          │
│  │ 16 verified facts    │  │ 16 verified facts    │          │
│  └──────────────────────┘  └──────────────────────┘          │
│                                               Total: 662 L   │
└───────────────────────┬──────────────────────────────────────┘
                        │
                        ▼
┌──────────────────────────────────────────────────────────────┐
│                 Cycle 2: Revision (3 agents)                 │
│                                                              │
│  ┌───────────────┐ ┌───────────────┐ ┌───────────────┐       │
│  │ Reviser 1     │ │ Reviser 2     │ │ Reviser 3     │       │
│  │ RESULT-01 +   │ │ RESULT-02     │ │ RESULT-03     │       │
│  │ RESULT-04     │ │ Inference     │ │ Training      │       │
│  │ 934 lines     │ │ 546 lines     │ │ 851 lines     │       │
│  │ ~448s         │ │ ~333s         │ │ ~364s         │       │
│  └───────────────┘ └───────────────┘ └───────────────┘       │
│                                             Total: 2,331 L   │
└───────────────────────┬──────────────────────────────────────┘
                        │
                        ▼
┌──────────────────────────────────────────────────────────────┐
│                        Final Assembly                        │
│     4 RESULT files copied to root · INDEX.md · AGENTS.md     │
└──────────────────────────────────────────────────────────────┘

Agent Relationship Map

Agent A (Plugins) ──────┐
Agent B (Inference) ────┼──→ Track 1 Reviewer ──┬──→ Reviser 1 → RESULT-01, RESULT-04
Agent E (Copilot) ──────┘                       └──→ Reviser 2 → RESULT-02

Agent C (Training) ─────┐
Agent D (QMD Models) ───┴──→ Track 2 Reviewer ─────→ Reviser 3 → RESULT-03

Phase 0: PDE Decomposition

Tool: mcp-pde (Prompt Decomposition Engine)
PDE ID: 325a8ade-e716-45e6-8e5f-a4866b1bdd18
PDE File: .pde/2604150910--325a8ade-e716-45e6-8e5f-a4866b1bdd18/
Confidence: 95%

How the Inquiry Was Decomposed

The original multi-part inquiry was processed through the PDE to extract structured intents, map dependencies, and identify ambiguities. The PDE uses a Four Directions framework (East/South/West/North) to organize the decomposition:

| Direction | Role | Intents Identified |
|---|---|---|
| 🌅 East - Vision | What the user wants | Understand local AI + cloud hybrid goal; identify Indigenous knowledge context for persona training |
| 🔥 South - Analysis | Research tracks | OpenClaw plugins; Mac Mini M4 specs; community fine-tuning practices; QMD HuggingFace models |
| 🌊 West - Validation | Verify assumptions | Plugin compatibility across Claw variants; 40GB model feasibility in unified memory; Mac Mini vs Studio |
| ❄️ North - Action | Deliverables | Tiered specs with pricing; plugin recommendations; training roadmap |

MECE Research Tracks

The PDE identified 4 mutually exclusive, collectively exhaustive research tracks, which became the 5-agent Cycle 0 plan:

  1. OpenClaw Plugin Ecosystem → Agent A
  2. Mac Mini Hardware for Inference → Agent B
  3. Apple Silicon Training & Fine-Tuning → Agent C + Agent D (split: general training + QMD-specific)
  4. Cloud Provider Capabilities → Agent E

Identified Ambiguities (4)

  1. Which specific "Claw variants" beyond OpenClaw and Hermes?
  2. What is the budget ceiling for hardware?
  3. What is the intended training frequency (weekly? monthly?)?
  4. What level of Indigenous knowledge sensitivity applies?

Secondary Intents (9)

Dependency-ordered intents spanning evaluate, spec, research, investigate, clarify, verify, and budget actions. The PDE mapped 6 dependency edges between these intents to ensure proper sequencing.

Expected Outputs

The PDE pre-declared 5 artifact filenames:

  • RESULT-01-openclaw-plugins-local-ai.md
  • RESULT-02-mac-mini-inference-scenarios.md
  • RESULT-03-mac-mini-training-scenarios.md
  • RESULT-04-copilot-google-plugin.md
  • INDEX.md

Cycle 0: Initial Research

Agents: 5 × claude-opus-4.6 sub-agents, launched in parallel
Total Output: 2,548 lines of raw research
Duration: ~506s (wall clock, limited by slowest agent)
Concurrency Note: Platform hit a 4-agent concurrency limit; Agent E launched after Agent B completed (~365s)

Agent A - OpenClaw Plugin Ecosystem

File: 00-initial-research/agent-a-plugins.md (362 lines, ~415s)
Focus: Which OpenClaw plugins serve local AI inference and training workflows

Key Findings:

  • OpenClaw is a general-purpose AI automation agent (~250K GitHub stars), NOT a coding agent
  • @openclaw/ollama-provider is the essential local AI plugin - it auto-discovers models via /api/tags (see the sketch after this list)
  • Hermes Agent (Nous Research, Feb 2026) is a separate Python-based framework; plugins are NOT cross-compatible
  • Both OpenClaw and Hermes can share the same Ollama instance concurrently
  • Identified ACPX Runtime as OpenClaw's internal plugin execution engine
  • Anthropic blocked OAuth tokens on April 4, 2026
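
To make that discovery mechanism concrete, here is a minimal sketch of the call the provider plugin relies on, assuming a default Ollama install listening on localhost:11434 (the endpoint and response shape are Ollama's public API, not taken from the plugin's source):

```python
# Minimal sketch: list the models a local Ollama daemon exposes via the
# same /api/tags endpoint the ollama-provider plugin discovers against.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/tags"

with urllib.request.urlopen(OLLAMA_URL, timeout=5) as resp:
    payload = json.load(resp)

for model in payload.get("models", []):
    size_gb = model.get("size", 0) / 1e9
    print(f"{model['name']:<40} {size_gb:5.1f} GB")
```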

Issues Found in Review: Incorrectly characterized the Copilot provider as an "unofficial/community integration" (it is bundled first-class). Missing GPT-5 family models, Claude Opus, and security warnings about malicious plugins.

Agent B - Mac Mini M4 Inference Specifications

File: 00-initial-research/agent-b-mac-inference.md (492 lines, ~365s)
Focus: Hardware specs, pricing, and inference benchmarks for two scenarios

Key Findings:

  • Scenario A (Minimal): Mac Mini M4, 24GB, 512GB - $999. Runs 7B–8B models at 28–35 tok/s
  • Scenario B (Evolved): Mac Mini M4 Pro, 64GB, 1TB - $2,199. Runs up to 70B Q4 at 6–8 tok/s
  • RAM is soldered and cannot be upgraded - a critical decision point
  • Unified Memory Architecture eliminates CPU↔GPU transfer bottleneck
  • Memory bandwidth (not compute) is the LLM inference bottleneck (a back-of-envelope check follows this list)
  • Mac Studio M4 Max (128GB, $3,699) is the upgrade path
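
Since decode speed is dominated by streaming the weights from memory, a rough ceiling is bandwidth divided by model size. A back-of-envelope sketch (bandwidth figures are Apple's published specs; model sizes are approximate Q4 footprints):

```python
# Rough decode ceiling: generating one token reads roughly the whole model,
# so tok/s <= memory bandwidth / model size. Real throughput lands below this.
CHIPS = {"M4": 120, "M4 Pro": 273}                     # GB/s, Apple specs
MODELS = {"7B Q4": 4.0, "8B Q4": 4.7, "70B Q4": 40.0}  # GB, approximate

for chip, bw in CHIPS.items():
    for name, size_gb in MODELS.items():
        print(f"{chip:<7} {name:<7} ceiling ≈ {bw / size_gb:5.1f} tok/s")
```

The M4 Pro figure for 70B Q4 (273 / 40 ≈ 6.8 tok/s) lands squarely in the reported 6–8 tok/s range, which is why more compute alone would not help.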

Issues Found in Review: 14-core M4 Pro pricing understated. MLX backend status overstated ("default" vs "preview"). Benchmark inconsistencies between Metal and MLX numbers. Misleading RTX 4070 Super comparison. Missing the M5 chip timeline, CAD pricing, and DeepSeek-R1 benchmarks.

Agent C - Apple Silicon Training Capabilities

File: 00-initial-research/agent-c-mac-training.md (593 lines, ~506s)
Focus: Training frameworks, feasibility, and hardware requirements

Key Findings:

  • MLX-LM (v0.31.1) is the primary framework for LoRA/QLoRA on Apple Silicon (see the invocation sketch after this list)
  • LoRA fine-tuning of 7B models: 10–30 min per adapter on M4 Pro
  • Five persona adapters trainable in ~2.5–3 hours
  • Weekend self-training schedule proposed with Friday night automation
  • Mac Mini M4 Pro 48GB is the recommended training configuration
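
For orientation, a minimal sketch of one persona-adapter run using the corrected CLI entry point. The model name, data path, and hyperparameters are illustrative placeholders; confirm the flags with python -m mlx_lm.lora --help on your installed version:

```python
# Minimal sketch: one LoRA adapter training run via mlx-lm's CLI.
# Model, data path, and hyperparameters are illustrative placeholders.
import subprocess

subprocess.run(
    [
        "python", "-m", "mlx_lm.lora",
        "--model", "mlx-community/Mistral-7B-Instruct-v0.3-4bit",  # any MLX-format model
        "--train",
        "--data", "./data/persona-east",          # expects train.jsonl / valid.jsonl
        "--batch-size", "2",
        "--iters", "600",
        "--adapter-path", "./adapters/persona-east",
    ],
    check=True,
)
```

At the 10–30 minutes per adapter the findings cite, looping this over five persona datasets is what yields the ~2.5–3 hour total.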

Issues Found in Review: mlx-tune version inflated (v0.4.21 → actual v0.4.19). EmbeddingGemma architecture misidentified (claimed Gemma 3 decoder, actually encoder-only). Mac Mini pricing incorrect. LoRA CLI example outdated. Missing QMD pipeline CUDA dependency warning.

Agent D - QMD HuggingFace Models Analysis

File: 00-initial-research/agent-d-qmd-models.md (634 lines, ~410s)
Focus: QMD's three GGUF models, fine-tuning potential, and deployment

Key Findings:

  • QMD uses exactly 3 GGUF models: embeddinggemma-300M, Qwen3-Reranker-0.6B, qmd-query-expansion-1.7B
  • All three are swappable via environment variables (QMD_EMBED_MODEL, QMD_RERANK_MODEL, QMD_GENERATE_MODEL; see the sketch after this list)
  • A complete fine-tuning pipeline exists in the finetune/ directory for the query expansion model
  • Training data: ~2,290 examples; LoRA config: rank 16, alpha 32
  • Cloud training via HuggingFace Jobs: ~$1.50/run on A10G
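
A minimal sketch of that swap mechanism, using the three variable names the agents verified against QMD's source; the model URIs and the qmd CLI invocation below are hypothetical placeholders:

```python
# Minimal sketch: point QMD at alternate GGUF models through its environment
# variables before launching it. The variable names are those verified
# against QMD's source; the URIs and the qmd CLI call are hypothetical.
import os
import subprocess

env = os.environ.copy()
env["QMD_EMBED_MODEL"] = "hf:your-org/embeddinggemma-300m-tuned-gguf"     # hypothetical
env["QMD_RERANK_MODEL"] = "hf:your-org/qwen3-reranker-0.6b-gguf"          # hypothetical
env["QMD_GENERATE_MODEL"] = "hf:your-org/qmd-query-expansion-1.7b-tuned"  # hypothetical

subprocess.run(["qmd", "search", "local inference budget"], env=env, check=True)
```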

Issues Found in Review: Same EmbeddingGemma architecture error as Agent C. Training data format not fully explained. CUDA dependency in pipeline not flagged. GGUF conversion for encoder-only models unverified - identified as BLOCKING risk.

Agent E - Copilot + Google Plugin Capabilities

File: 00-initial-research/agent-e-copilot-google.md (467 lines, ~393s)
Focus: What the Copilot subscription enables in OpenClaw; Google plugin capabilities

Key Findings:

  • The GitHub Copilot provider gives $0-marginal-cost access to multiple model families
  • The Google plugin provides 5 capability contracts: LLM, image gen, video gen, music gen, web search
  • The Perplexity plugin is web-search only, not an LLM provider
  • Multi-provider routing supports a primary + fallback chain configuration (see the sketch after this list)
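
To illustrate the routing idea only - OpenClaw's real configuration schema is not reproduced in this report, so every key, provider ID, and model name below is a hypothetical stand-in:

```python
# Illustrative primary + fallback chain. NOT OpenClaw's actual config
# schema; the keys, provider IDs, and model names are hypothetical.
ROUTING = {
    "primary": {"provider": "copilot", "model": "gpt-4.1"},
    "fallbacks": [
        {"provider": "ollama", "model": "llama3.1:8b"},  # local, $0 marginal cost
        {"provider": "google", "model": "gemini-pro"},   # hypothetical ID
    ],
}

def pick_provider(available: set[str]) -> dict:
    """Return the first configured provider that is currently reachable."""
    for candidate in [ROUTING["primary"], *ROUTING["fallbacks"]]:
        if candidate["provider"] in available:
            return candidate
    raise RuntimeError("no provider in the chain is available")

print(pick_provider({"ollama", "google"}))  # -> falls back to local Ollama
```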

Issues Found in Review: Used the deprecated gpt-4o in all config examples. Missing the entire GPT-5 family and the Claude Opus/Haiku models. Oversimplified the "$0 for all models" claim (premium quotas exist). Star count inconsistency.


Cycle 1: Review

Agents: 2 × claude-opus-4.6 review agents, launched in parallel
Total Output: 662 lines of review analysis
Duration: ~387s (wall clock)

Track 1 Reviewer - Plugins, Inference, Copilot (Agents A, B, E)

File: 01-review/review-track1-plugins-inference-copilot.md (335 lines, ~352s)
Overall Grade: B+

What This Reviewer Did:

  • Cross-referenced all three documents for contradictions
  • Verified 16 factual claims against web sources and official documentation
  • Identified 5 critical issues and 15 revision items
  • Provided corrected data tables (pricing, model catalog, MLX performance)

Critical Issues Found:

  1. Copilot provider contradiction - Agent A called it "unofficial/community," Agent E showed it bundled in the monorepo. Resolution: Agent E correct.
  2. GPT-4o deprecated - all config examples used a deprecated model. Must use GPT-4.1.
  3. M4 Pro pricing understated - 14-core 64GB/1TB listed at $2,399, but the actual Apple price may be $2,499–$3,499.
  4. MLX backend status overstated - preview, not default; may require OLLAMA_MLX=1.
  5. Missing GPT-5 model family - GPT-5 mini, 5.2, 5.3-Codex, and 5.4 are all available but absent from all documents.

Cross-Document Contradictions Identified: 6 (Copilot provider status, star counts, MLX benchmark inconsistency, Hermes characterization, OpenClaw description, missing cross-references).

Track 2 Reviewer - Training, QMD Models (Agents C, D)

File: 01-review/review-track2-training-qmd.md (327 lines, ~387s)
Overall Grade: B+

What This Reviewer Did:

  • Verified Agent D's QMD source code claims against actual commit cfd640e
  • Confirmed all 3 model URIs, env vars, and SDK API match source code
  • Verified fine-tuning pipeline existence (20+ files in finetune/)
  • Identified 2 BLOCKING issues and 12 revision items

BLOCKING Issues:

  1. GGUF conversion unverified - converting a fine-tuned encoder-only embedding model back to GGUF is experimental. If conversion fails, the entire embedding fine-tuning proposal collapses. Mitigation: test with dummy data first; fall back to Qwen3-Embedding-0.6B.
  2. CUDA dependency in QMD pipeline - pyproject.toml depends on nvidia-ml-py and targets A10G GPUs. It won't run on a Mac without modification.

Cross-Document Contradictions: 5 (embedding framework disagreement, sentence-transformers compatibility, CUDA vs Mac, priority ordering, training data format).

Data Sovereignty Gap: Both documents were "significantly deficient" on Indigenous data sovereignty. The reviewer added detailed OCAP principles, knowledge classification, consent requirements, and cloud training risks.

Key Corrections Identified (Combined)

  1. Copilot provider is bundled first-class, not unofficial/community
  2. GPT-4o is deprecated - replace with GPT-4.1 in all examples
  3. Add GPT-5 family (5-mini, 5.2, 5.2-codex, 5.3-codex, 5.4, 5.4-mini) to model catalog
  4. Add Claude Opus 4.5/4.6 and Claude Haiku 4.5 to model catalog
  5. EmbeddingGemma-300M is encoder-only (BERT-like), NOT a Gemma 3 decoder
  6. mlx-tune version is v0.4.19, not v0.4.21
  7. GGUF conversion for fine-tuned encoder models is unverified - a BLOCKING risk
  8. QMD finetune pipeline is CUDA-only - needs modification for Mac
  9. Mac Mini M4 Pro pricing corrections with verified Apple Store numbers
  10. MLX backend is preview; may need explicit OLLAMA_MLX=1 enablement
  11. Premium request quotas exist - "$0 for all models" is oversimplified
  12. M5 chip timeline should be included (expected WWDC June 2026)
  13. Canadian pricing needed (1 USD ≈ 1.38 CAD)
  14. OpenClaw security crisis - 12–20% malicious community plugins
  15. Data sovereignty protocols - OCAP principles, knowledge classification, local-only training

Cycle 2: Revision

Agents: 3 × claude-opus-4.6 revision agents, launched in parallel
Total Output: 2,331 lines, 17,285 words across 4 final documents
Duration: ~448s (wall clock)

Reviser 1 - Plugins + Copilot/Google (RESULT-01 + RESULT-04)

Output Files:

  • RESULT-01-openclaw-plugins-local-ai.md (419 lines, 2,826 words)
  • RESULT-04-copilot-google-plugin.md (515 lines, 3,369 words)

Duration: ~448s

Corrections Incorporated:

  • Copilot provider corrected from "unofficial" to "bundled first-class extension"
  • All gpt-4o references replaced with gpt-4.1
  • Complete GPT-5 family, Claude Opus/Haiku, Gemini models added to catalog (17+ models)
  • Premium request quota system documented (300/mo Pro, 1,500/mo Pro+)
  • Security warning added: 12–20% malicious community plugins
  • Anthropic OAuth block scope corrected (all consumer tiers, not just Pro/Max)
  • Perplexity pricing detailed ($20/mo subscription + token costs)
  • Total cost analysis added combining hardware + subscriptions + API costs
  • Hermes claw migrate tool referenced for cross-platform migration
  • Star count standardized to ~250K across documents

Reviser 2 - Mac Mini Inference Scenarios (RESULT-02)

Output File: RESULT-02-mac-mini-inference-scenarios.md (546 lines, 5,106 words)
Duration: ~333s

Corrections Incorporated:

  • Complete pricing table with verified Apple MSRP (USD) and estimated CAD
  • 14-core M4 Pro pricing corrected to the $2,399–$2,499 range, with notes
  • 12-core M4 Pro at $2,199 recommended as better value (same memory bandwidth)
  • MLX backend status corrected: "preview, may need OLLAMA_MLX=1"
  • MLX speedup range corrected: "21–29% for 8B models" (not the misleading 87%)
  • Benchmark tables clearly labeled Metal vs MLX
  • Misleading RTX 4070 Super comparison removed/corrected
  • DeepSeek-R1 32B benchmarks added (11–14 tok/s on M4 Pro 64GB)
  • Ollama OLLAMA_NUM_PARALLEL concurrency discussion added (see the sketch after this list)
  • M5 chip timeline section added (expected WWDC June 2026)
  • Canadian pricing added throughout (1 USD ≈ 1.38 CAD)
  • Power consumption corrected to a 50–80W range (not just 50W)
  • GPT-4o deprecation warning added
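
A minimal sketch of starting the local daemon with those two settings. OLLAMA_NUM_PARALLEL is a documented Ollama variable; OLLAMA_MLX=1 is the preview flag the review cites, so treat it as unverified:

```python
# Minimal sketch: launch the Ollama daemon with concurrency enabled and the
# MLX preview backend requested. OLLAMA_NUM_PARALLEL is a documented Ollama
# variable; OLLAMA_MLX=1 is the preview flag cited in the review (unverified).
import os
import subprocess

env = os.environ.copy()
env["OLLAMA_NUM_PARALLEL"] = "2"   # serve two requests at once (OpenClaw + Hermes)
env["OLLAMA_MLX"] = "1"            # opt into the MLX backend preview, per the review

subprocess.run(["ollama", "serve"], env=env, check=True)
```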

Reviser 3 - Mac Mini Training Scenarios (RESULT-03)

Output File: RESULT-03-mac-mini-training-scenarios.md (851 lines, 5,984 words)
Duration: ~364s

BLOCKING Issues Resolved:

  1. GGUF conversion risk: explicitly documented as the "hardest unsolved step." Added a mitigation strategy (test with dummy data first, fall back to Qwen3-Embedding-0.6B) and marked the path as experimental throughout. A smoke-test sketch follows this list.
  2. CUDA dependency: documented three forward paths (HuggingFace Jobs cloud, MLX-LM local rewrite, PyTorch MPS modification) with the data sovereignty implications of each.
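
A minimal sketch of that dummy-data smoke test, run from a llama.cpp checkout (the converter script name is llama.cpp's current one, but whether it accepts a fine-tuned encoder-only EmbeddingGemma is exactly the open question - expect this to fail loudly, which is the point):

```python
# Minimal smoke test for the BLOCKING risk: can a (dummy) fine-tuned
# encoder-only model round-trip to GGUF at all? Paths are placeholders;
# convert_hf_to_gguf.py is llama.cpp's converter, which may reject this
# architecture - failing here cheaply is the goal of the test.
import subprocess
from pathlib import Path

HF_DIR = Path("./dummy-finetuned-embeddinggemma")  # tiny run on throwaway data
GGUF_OUT = Path("./dummy-embed.gguf")

result = subprocess.run(
    ["python", "convert_hf_to_gguf.py", str(HF_DIR),
     "--outfile", str(GGUF_OUT), "--outtype", "q8_0"],
    capture_output=True, text=True,
)
if result.returncode != 0 or not GGUF_OUT.exists():
    print("Conversion failed - fall back to Qwen3-Embedding-0.6B:")
    print(result.stderr[-2000:])
else:
    print(f"Converted OK: {GGUF_OUT} ({GGUF_OUT.stat().st_size / 1e6:.1f} MB)")
```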

Other Corrections:

  • EmbeddingGemma architecture corrected to "encoder-only transformer"
  • mlx-tune version corrected to v0.4.19, with the capability matrix caveated as "claimed"
  • LoRA CLI corrected from python lora.py to python -m mlx_lm.lora
  • MLX vs MLX-LM distinction clarified (separate packages)
  • Mac Mini pricing corrected (48GB/1TB: $2,299, not $1,800)
  • Training automation script enhanced with error handling, logging, rollback, and validation gates (a condensed sketch follows this list)
  • launchd recommended over cron for macOS scheduling
  • Data sovereignty section elevated to a foundational requirement
  • OCAP principles applied to training data and model weights
  • Knowledge classification system added (Public/Community/Sacred/Private)
  • Evaluation metrics added (MRR, Precision@5, A/B testing)
  • Multilingual considerations added (Indigenous languages, French, English)
  • "What's Proven vs Experimental" honesty table added

Final Deliverables

| # | File | Description | Lines | Words |
|---|---|---|---|---|
| 1 | RESULT-01-openclaw-plugins-local-ai.md | Plugin ecosystem: Ollama, Copilot, Google, Perplexity, Hermes compatibility, multi-provider routing, security | 419 | 2,826 |
| 2 | RESULT-02-mac-mini-inference-scenarios.md | Hardware for inference: Scenario A ($999) vs B ($2,199), benchmarks, pricing, thermal, hybrid strategy | 546 | 5,106 |
| 3 | RESULT-03-mac-mini-training-scenarios.md | Hardware for training: MLX-LM, QMD fine-tuning, LoRA personas, data sovereignty, automation pipeline | 851 | 5,984 |
| 4 | RESULT-04-copilot-google-plugin.md | Cloud providers: Copilot model catalog, premium quotas, Google capabilities, Perplexity, cost analysis | 515 | 3,369 |
| - | INDEX.md | Master index with quick-answer section | 67 | - |
| - | AGENTS.md | This orchestration report | - | - |

Lessons Learned

What Worked Well

  1. PDE decomposition prevented scope drift. By pre-declaring 4 MECE tracks and 5 expected artifacts, the research stayed focused. No agent wandered into territory covered by another.

  2. The 3-cycle pipeline (Research β†’ Review β†’ Revision) caught significant errors. Without Cycle 1 review, the final documents would have contained deprecated model names (GPT-4o), incorrect pricing ($1,800 vs $2,299), a wrong architecture description (EmbeddingGemma), and an unacknowledged BLOCKING risk (GGUF conversion). The review cycle added ~12 minutes of wall time but prevented material misinformation.

  3. Cross-document verification was invaluable. The reviewers found 11 cross-document contradictions that no single-agent approach would have caught (e.g., Copilot provider described as "unofficial" in one doc and "bundled" in another; two different star counts; two different tok/s figures for the same hardware).

  4. Parallel agent execution was efficient. 5 agents running simultaneously in Cycle 0 produced 2,548 lines of research in ~8.5 minutes (wall time). Sequential execution would have taken ~35 minutes.

  5. Source code verification by Track 2 reviewer confirmed all QMD model URIs, env vars, and SDK API exactly matched the source. This turned speculative claims into verified facts.

  6. Data sovereignty gap identification demonstrated that domain-specific review catches what general-purpose research misses. Both initial agents were "significantly deficient" on Indigenous data sovereignty - a foundational requirement the reviewer correctly elevated.

What Could Be Improved

  1. 4-agent concurrency limit caused serialization. Agent E had to wait for Agent B to complete, adding ~6 minutes of unnecessary delay. A 5-agent limit would have eliminated it.

  2. Initial research agents shared a misconception. Both Agent C and Agent D independently produced the same incorrect EmbeddingGemma architecture description ("Gemma 3 decoder" instead of encoder-only). This suggests they used the same flawed source. A pre-research fact-check step or greater source diversity could prevent such correlated errors.

  3. Review agents should produce structured correction manifests. The reviewers produced prose documents with inline corrections. A structured format (JSON/YAML) with file, line, current_text, and corrected_text fields would make the revision cycle more mechanical and less error-prone (see the sketch after this list).

  4. No evaluation of final output quality. The pipeline lacks a Cycle 3 validation step that would verify the revisions actually incorporated all corrections. A quick diff-based check could confirm this.

  5. Training time estimates remain extrapolated. Despite thorough review, M4 Pro training benchmarks are still scaled from M2/M3 data. Future research should include actual M4 Pro benchmark runs.

  6. Reviser workload was imbalanced. Reviser 1 produced 2 documents (934 lines) while Reviser 3 produced 1 document (851 lines). Reviser 1 took longer (~448s vs ~364s). Better load balancing would equalize durations.
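
A minimal sketch of what one manifest could look like, using the four fields named in item 3; the schema and the sample entries are a proposal, not an existing format:

```python
# Proposal sketch: a structured correction manifest a review agent could
# emit instead of prose, so revisers apply fixes mechanically and a later
# validation pass can diff against it. Entries are illustrative.
import json
from dataclasses import asdict, dataclass

@dataclass
class Correction:
    file: str
    line: int
    current_text: str
    corrected_text: str
    severity: str = "revision"  # "revision" | "critical" | "blocking"

manifest = [
    Correction("agent-e-copilot-google.md", 112, "gpt-4o", "gpt-4.1", "critical"),
    Correction("agent-c-mac-training.md", 88, "v0.4.21", "v0.4.19"),
]
print(json.dumps([asdict(c) for c in manifest], indent=2))
```

This also addresses item 4: a Cycle 3 validator could walk the manifest and check that each corrected_text actually appears in the final documents.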

Agent Interaction Patterns Observed

| Pattern | Description | Example |
|---|---|---|
| Correlated error | Multiple agents produce the same incorrect claim from a shared bad source | EmbeddingGemma architecture in Agents C and D |
| Contradictory characterization | Agents describe the same entity differently based on different sources | Copilot provider: "unofficial" (A) vs "bundled" (E) |
| Complementary depth | Agents covering adjacent topics reinforce each other's findings | Agent D's QMD source code verification validated Agent C's training feasibility claims |
| Gap revelation | Cross-referencing reveals missing connections | Agent B's RAM calculations didn't account for Agent A's LanceDB memory plugin RAM requirements |
| Scope creep resistance | PDE pre-declaration kept agents in lanes | No agent attempted to cover another agent's research track |

Statistics

| Metric | Value |
|---|---|
| Total agents spawned | 10 |
| Agent model | claude-opus-4.6 (all agents) |
| Cycle 0 agents | 5 (research) |
| Cycle 1 agents | 2 (review) |
| Cycle 2 agents | 3 (revision) |
| Total lines produced | 5,541 (2,548 + 662 + 2,331) |
| Final deliverable lines | 2,331 |
| Final deliverable words | 17,285 |
| Total documents created | 13 (5 research + 2 review + 4 revision + INDEX + AGENTS) |
| Critical issues found | 7 (5 in Track 1, 2 BLOCKING in Track 2) |
| Revision items | 27 (15 in Track 1, 12 in Track 2) |
| Verified facts | 32 (16 + 16) |
| Cross-document contradictions | 11 (6 in Track 1, 5 in Track 2) |
| Estimated wall-clock time | ~25 minutes total |
| Cycle 0 duration (wall) | ~506s (~8.4 min) |
| Cycle 1 duration (wall) | ~387s (~6.5 min) |
| Cycle 2 duration (wall) | ~448s (~7.5 min) |
| PDE decomposition | 4 MECE tracks, 9 secondary intents, 4 ambiguities |
| Concurrency limit hit | Yes (4-agent max; Agent E delayed) |

File Tree

RCH-tech-jgwill-claws-infrastructure--2604150901--a989a67a-bd77-4cf4-b093-a24cda73d48f/
│
├── AGENTS.md                          ← This document (orchestration report)
├── INDEX.md                           ← Master index with quick answers
│
├── RESULT-01-openclaw-plugins-local-ai.md       ← Final: Plugin ecosystem guide
├── RESULT-02-mac-mini-inference-scenarios.md    ← Final: Hardware for inference
├── RESULT-03-mac-mini-training-scenarios.md     ← Final: Hardware for training
├── RESULT-04-copilot-google-plugin.md           ← Final: Cloud provider capabilities
│
├── 00-initial-research/               ← Cycle 0: Raw research (5 agents)
│   ├── agent-a-plugins.md             │  362 lines - OpenClaw plugin ecosystem
│   ├── agent-b-mac-inference.md       │  492 lines - Mac Mini M4 inference specs
│   ├── agent-c-mac-training.md        │  593 lines - Apple Silicon training
│   ├── agent-d-qmd-models.md          │  634 lines - QMD models & fine-tuning
│   └── agent-e-copilot-google.md      │  467 lines - Copilot + Google plugin
│
├── 01-review/                         ← Cycle 1: Cross-reference review (2 agents)
│   ├── review-track1-plugins-inference-copilot.md  │  335 lines - Agents A+B+E
│   └── review-track2-training-qmd.md                │  327 lines - Agents C+D
│
├── 02-revision/                       ← Cycle 2: Corrected final documents (3 agents)
│   ├── RESULT-01-openclaw-plugins-local-ai.md      │  419 lines
│   ├── RESULT-02-mac-mini-inference-scenarios.md   │  546 lines
│   ├── RESULT-03-mac-mini-training-scenarios.md    │  851 lines
│   └── RESULT-04-copilot-google-plugin.md          │  515 lines
│
└── .pde/                              ← PDE decomposition artifacts
    └── 2604150910--325a8ade-e716-45e6-8e5f-a4866b1bdd18/
        ├── pde-325a8ade-e716-45e6-8e5f-a4866b1bdd18.json  ← Structured decomposition
        └── pde-325a8ade-e716-45e6-8e5f-a4866b1bdd18.md    ← Markdown export

How to Replicate This Orchestration

Prerequisites

  • Access to claude-opus-4.6 (or equivalent) via GitHub Copilot CLI with sub-agent spawning
  • mcp-pde MCP server configured for prompt decomposition
  • Web search tools for agent verification steps

Pipeline Steps

  1. Decompose the inquiry using pde_decompose → pde_parse_response to produce structured MECE tracks
  2. Spawn N research agents in parallel, one per track, each writing to 00-initial-research/agent-{letter}-{topic}.md
  3. Spawn M review agents in parallel, each cross-referencing a subset of research docs and writing to 01-review/review-track{n}.md
  4. Spawn K revision agents in parallel, each producing final RESULT-{nn}-{topic}.md files in 02-revision/
  5. Assemble - copy RESULT files to root, create INDEX.md, create AGENTS.md (a pseudocode sketch of the whole pipeline follows this list)
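
The control flow, reduced to pseudocode - spawn_agent stands in for whatever sub-agent API the platform exposes, and the semaphore mirrors the 4-agent cap this run hit:

```python
# Pseudocode sketch of the 3-cycle pipeline; spawn_agent() is a stand-in
# for a platform-specific sub-agent call. Only the control flow is real.
import asyncio

MAX_CONCURRENT = 4  # the platform cap this orchestration ran into

async def spawn_agent(prompt: str, out_path: str) -> str:
    raise NotImplementedError("platform-specific sub-agent call")

async def run_cycle(tasks: list) -> list:
    sem = asyncio.Semaphore(MAX_CONCURRENT)

    async def guarded(prompt: str, out_path: str) -> str:
        async with sem:
            return await spawn_agent(prompt, out_path)

    return await asyncio.gather(*(guarded(p, o) for p, o in tasks))

async def pipeline(research, reviews, revisions):
    await run_cycle(research)    # Cycle 0: N research agents
    await run_cycle(reviews)     # Cycle 1: M reviewers read Cycle 0 output
    await run_cycle(revisions)   # Cycle 2: K revisers apply corrections
    # Final assembly (copy RESULTs, INDEX.md, AGENTS.md) happens after.
```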

Key Design Decisions

  • Why 3 cycles, not 2? The review cycle catches errors that agents cannot self-detect (cross-document contradictions, correlated misconceptions, scope gaps). Skipping it would have left 7 critical issues in the final output.
  • Why opus, not sonnet/haiku? Research requiring web search, source verification, and long-form synthesis benefits from higher-capability models. The cost difference (~$0.15/agent vs ~$0.03) is negligible for a 10-agent orchestration.
  • Why MECE decomposition? Non-overlapping tracks prevent duplicate work and contradictory framings of the same topic. The PDE enforces this structure.

Generated April 15, 2026. This document describes the orchestration of 10 claude-opus-4.6 sub-agents across 3 cycles producing 4 final research deliverables for IAIP infrastructure planning.