
Track 1 Review: Plugins, Inference Hardware, and Copilot/Google Analysis

IAIP Research

Reviewer: Senior Technical Reviewer
Review Date: April 15, 2026
Documents Reviewed:

  1. agent-a-plugins.md — OpenClaw & Hermes Plugin Ecosystem Analysis
  2. agent-b-mac-inference.md — Mac Mini M4 Hardware for Local AI Inference
  3. agent-e-copilot-google.md — Copilot Provider, Google Plugin, Perplexity Plugin

Review Summary

Overall Quality: B+ (Good but with material issues that must be corrected before use)

The three documents together form a solid foundation for Guillaume's purchasing decision, but contain several factual errors, internal contradictions, and gaps that would mislead a buyer. The most critical issues are:

  1. A direct contradiction between Documents 1 and 3 about whether the GitHub Copilot provider is "unofficial/community" or "bundled/first-class" — these cannot both be true and the recommendation changes based on which is correct.
  2. GPT-4o is listed as a current Copilot model in Document 3, but GPT-4o is being deprecated in GitHub Copilot, with GPT-4.1 as its replacement. This makes the primary model in all config examples stale.
  3. Mac Mini M4 Pro 14-core/20-core pricing is significantly understated in Document 2 — the document says $2,399 for 64GB/1TB but Apple's actual price is approximately $3,499.
  4. MLX backend status is overstated — Document 2 says it's "default since March 2026" but sources indicate it's still in preview, not default.
  5. Missing the GPT-5 model family — As of April 2026, GPT-5 mini, GPT-5.2, GPT-5.3-Codex, and GPT-5.4 are all available through GitHub Copilot. None of the documents mention these.

Despite these issues, the documents demonstrate strong structural organization, well-sourced evidence tables, and genuinely useful configuration examples. The hardware analysis in Document 2 is particularly thorough. With corrections applied, these would serve as excellent guides.


Document 1: Plugins Analysis — Review

Accuracy Issues

  1. CRITICAL — Copilot provider characterization is wrong. Document 1 calls @openclaw/github-copilot-provider an "unofficial/community integration" with the caveat that "GitHub may change API access at any time" and "Authentication and feature support may be incomplete compared to official VS Code extension." However, Document 3 (which cites actual source code paths in the OpenClaw monorepo like extensions/github-copilot/) presents it as a bundled, first-class extension enabled by default. Web verification confirms it is bundled in the monorepo. Document 1's characterization is incorrect and must be corrected.

  2. ACPX acronym is unverified. The document itself flags "Active Claw Plugin eXecution" as low-confidence, sourced from "one web search result." The acronym expansion should be removed or clearly marked as unconfirmed rather than presented as a section header definition.

  3. Star count discrepancy. Document 1 says "250K+ GitHub stars by March 2026." Document 3's sources section says "357K+ stars." Web verification shows the widely-cited official figure is ~250,829, with some aggregators showing up to 350K. The documents should use a consistent, verifiable number.

  4. Anthropic OAuth block date. Document 1 says "April 4, 2026" — verified correct via multiple authoritative sources (TechCrunch, dev.to, aitoolsrecap.com). However, the document should note that this block affects ALL consumer subscription tiers (Free, Pro, Max, Team), not just "Pro/Max" as stated.

  5. Ollama auto-discovery endpoint. Document states /api/tags for model discovery — this is correct for Ollama's API.
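The /api/tags endpoint confirmed above is straightforward to exercise. A minimal sketch, assuming a local Ollama server on its default port 11434 (the endpoint and the `{"models": [{"name": ...}]}` response shape match Ollama's documented API):

```python
# Query a local Ollama server's /api/tags endpoint to list installed models.
import json
import urllib.request

def model_names(tags_response: dict) -> list[str]:
    """Extract model names from an /api/tags response payload."""
    return [m["name"] for m in tags_response.get("models", [])]

def discover_models(base_url: str = "http://localhost:11434") -> list[str]:
    # Requires a running Ollama instance; raises URLError otherwise.
    with urllib.request.urlopen(f"{base_url}/api/tags") as resp:
        return model_names(json.load(resp))
```

Calling `discover_models()` against a running instance returns names like `"llama3.1:8b"`, which is the same discovery mechanism OpenClaw's auto-detection would rely on.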

Completeness Gaps

  1. No mention of the Anthropic-specific Copilot provider distinction. Via GitHub Copilot's API, Claude models ARE accessible (Claude Sonnet 4.5, 4.6) at $0 marginal cost. This is separate from Anthropic's own OAuth block. The document should clarify that Guillaume can still use Claude via Copilot subscription — the Anthropic block only affects direct Anthropic auth tokens.

  2. Missing newer models. No mention of GPT-5 family (GPT-5 mini, GPT-5.2, GPT-5.3-Codex, GPT-5.4), Claude Opus 4.5/4.6, or Claude Haiku 4.5 — all available through Copilot as of April 2026.

  3. No security warning about OpenClaw's plugin ecosystem. A source found during verification (particula.tech) reports that "20% of OpenClaw skills were found [malicious]" — a significant security concern that Guillaume should know about when installing community plugins.

  4. Hermes Agent plugin system is more capable than described. The document says Hermes uses "Python modules + SKILL.md specs" but verification reveals a full plugin system with plugin.yaml manifests, lifecycle hooks (pre_tool_call, post_tool_call), pip/directory distribution, and 40+ built-in tools. The comparison table undersells Hermes.

  5. Missing hermes claw migrate tool. Document 3 mentions this migration tool that imports OpenClaw configs/skills into Hermes. Document 1's compatibility section should reference this since it directly enables cross-platform workflow migration.

Corrections Needed

| Item | Current Text | Correction |
| --- | --- | --- |
| Copilot provider status | "unofficial/community integration" | "Bundled first-class extension in OpenClaw monorepo" |
| Copilot provider caveat | "GitHub may change API access at any time" | Remove or soften — this is a maintained, official integration |
| ACPX section header | "ACPX Runtime ('Active Claw Plugin eXecution')" | "ACPX Runtime (acronym expansion unconfirmed)" |
| Anthropic OAuth block scope | "Claude Pro/Max OAuth tokens" | "All consumer subscription OAuth tokens (Free/Pro/Max/Team)" |
| Star count | "250K+" | Use "~250K+" consistently across all docs; note some aggregators show higher |

Additional Findings from Verification

  • OpenClaw security crisis (2026): ~20% of community-published skills/plugins were found to contain malicious code. This makes plugin vetting critical for Guillaume's setup. Source: particula.tech
  • Hermes Agent released February 2026 (not just "a separate project" — it's recent), with 200+ LLM model support, 40+ built-in tools, and full OAuth 2.1 security
  • OpenClaw moved to community foundation after Steinberger joined OpenAI — confirmed. OpenAI provides financial support to the foundation.

Document 2: Mac Mini Inference — Review

Accuracy Issues

  1. CRITICAL — 14-core M4 Pro pricing is significantly understated. The document lists:

    • 14-core/20-core, 64GB, 1TB at $2,399

    Web verification from Apple's own store and B&H Photo shows this configuration at $3,499 (Apple) / $2,399–$2,499 (B&H for some variants). The discrepancy may be due to the CTO upgrade pricing being applied differently than expected. The document's claim of "+$200 to the 12-core base" for the 14-core upgrade appears to be incorrect — the actual premium is substantially higher for higher-RAM configurations. This must be verified against Apple.com/shop directly and corrected.

  2. MLX backend status is overstated. Document says "MLX backend (default since March 2026)." Web verification shows Ollama 0.19's MLX support was launched in preview in March 2026 — some sources say it's "now default," others say "preview." The document should note the preview status and that the user may need to explicitly enable it via OLLAMA_MLX=1 environment variable on some installations.

  3. MLX speed improvement claim of "87% faster" is cherry-picked. The 87% figure applies only to Qwen3 0.6B Q4 — a tiny model. For the 8B models Guillaume would actually run, the improvement is 21–29%. The headline claim should use a range like "20–90% faster depending on model size" rather than leading with the extreme case.

  4. Benchmark tok/s numbers are internally inconsistent. The "Scenario A" section says M4 24GB runs Llama 3.1 8B at "28–35 tok/s" (line 204). The "With MLX Backend" subsection immediately below says the same model runs at "45–60 tok/s" (line 217). The Scenario A main table should clarify whether these are Metal or MLX numbers — currently the reader sees two contradictory figures for the same hardware/model combination.

  5. RTX 4070 Super comparison is misleading. The NVIDIA comparison table says RTX 4070 Super (12GB) gets "~12 tok/s" on Llama 70B Q4. But the footnote says "requires offloading layers to CPU RAM, severely degrading performance." A 70B Q4 model (38GB) cannot fit in 12GB VRAM — the actual performance with heavy CPU offloading would be closer to 2–4 tok/s, not 12. This comparison is misleading and should be corrected or removed.

  6. Mac Studio M4 Max pricing. Document says "$2,499–$2,999." This needs verification — Apple's Mac Studio M4 Max configurations vary widely. The 128GB version specifically should be priced.

  7. Temperature "118°C" claim. The document reports GPU temperatures reaching 118°C under max load. This is physically extreme (above the boiling point of water) and likely a thermal junction reading, not a general surface or die temperature. The document itself flags this as "extreme case from one test" but it's still presented in the main table without sufficient context.
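The RTX 4070 Super problem in item 5 is easy to see with back-of-envelope arithmetic. A sketch under stated assumptions (~4.5 bits per weight as a rough approximation of Q4_K_M's mixed quantization, weights only, ignoring KV cache):

```python
# Estimate the weights-only memory footprint of a Q4-quantized model and
# check whether it fits in a given VRAM budget. The 4.5 bits/weight figure
# is an assumption approximating Q4_K_M overhead, not a measured value.
def q4_footprint_gb(params_billions: float, bits_per_weight: float = 4.5) -> float:
    return params_billions * bits_per_weight / 8  # GB, weights only

def fits(params_billions: float, vram_gb: float) -> bool:
    return q4_footprint_gb(params_billions) <= vram_gb

# A 70B Q4 model needs roughly 39 GB for weights alone, so it cannot fit
# in a 12 GB card without heavy CPU offloading:
# fits(70, 12) -> False; fits(70, 64) -> True
```

This is why the "~12 tok/s" figure for a 12GB card on 70B Q4 cannot reflect GPU-resident inference: most layers would live in CPU RAM.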

Completeness Gaps

  1. No M5 chip mention in the decision matrix. The Ollama MLX search revealed that M5 chips exist (M5 Max benchmarks are cited). If Apple's M5 Mac Mini is expected in late 2026, this is a critical planning factor — Guillaume may want to wait 6 months for dramatically better performance. One source explicitly says the M4 Mac Mini is "approaching the end of its cycle."

  2. No Canadian pricing. Guillaume Descoteaux-Isabelle has a French-Canadian name — the document should include CAD pricing or at minimum note the USD-to-CAD conversion for major configurations.

  3. Missing DeepSeek-R1 model benchmarks. DeepSeek-R1 is a popular model for local inference in 2026. One verification source shows DeepSeek-R1 32B at 11–14 tok/s on M4 Pro 64GB. This is a notable model not included in the benchmark tables.

  4. No discussion of Ollama model concurrency. Ollama supports running multiple models simultaneously with OLLAMA_NUM_PARALLEL — this affects RAM planning significantly. For a multi-agent OpenClaw setup, this is critical.

  5. Missing TurboQuant explanation. TurboQuant is mentioned as enabling "70B at 32K context on 64GB" but the technique is not explained. What is it? Is it a quantization method, a KV cache optimization, or an Ollama feature? When was it released? The reader needs context.

  6. Power consumption comparison is incomplete. The table says Mac Mini M4 Pro uses ~50W. This is the idle/light load figure. Under sustained AI inference, the Mac Mini M4 Pro draws 65–80W. The comparison with NVIDIA rigs is still favorable but the numbers should be accurate.

Corrections Needed

| Item | Current Text | Correction |
| --- | --- | --- |
| 14-core M4 Pro pricing (64GB/1TB) | "$2,399" | Verify — may be $2,399–$3,499 depending on retailer vs Apple direct |
| MLX backend status | "default since March 2026" | "Released in preview March 2026; may need explicit enablement" |
| MLX speed headline | "up to 87% faster" | "20–90% faster depending on model size (87% for tiny models, ~25% for 8B)" |
| RTX 4070 Super 70B tok/s | "~12" | Remove or correct — severe CPU offloading makes this misleading |
| Mac Mini power consumption | "~50W" | "50–80W (idle to sustained inference)" |
| Benchmark tables | Two different tok/s for same config | Clearly label Metal vs MLX numbers in all tables |

Additional Findings from Verification

  • Ollama MLX backend requires 32GB minimum RAM according to one source. If true, this means the 24GB M4 base config may not benefit from MLX, only from the old Metal backend. This is a significant finding that changes Scenario A's performance claims.
  • M5 Max benchmarks already exist showing 4× faster TTFT than M4 and 19–27% faster decode. An M5 Mac Mini refresh is expected late 2026.
  • NVFP4 quantization is now supported in Ollama 0.19, providing an alternative to GGUF/Q4_K_M format with potentially better inference consistency across platforms.

Document 3: Copilot + Google Plugin — Review

Accuracy Issues

  1. CRITICAL — GPT-4o is deprecated in GitHub Copilot. The document lists gpt-4o as a current model in the Copilot defaults and uses it as the primary model in ALL configuration examples (e.g., "primary": "github-copilot/gpt-4o"). GitHub announced GPT-4o deprecation with GPT-4.1 as the replacement. As of April 2026, GPT-4o is being phased out and usage may count against premium request quotas. All configuration examples must be updated to use gpt-4.1 instead.

  2. Missing GPT-5 family models. The Copilot model catalog in the document only lists GPT-4o, GPT-4.1, GPT-4.1-mini, GPT-4.1-nano, o1, o1-mini, o3-mini, and Claude Sonnet 4.5/4.6. Verification shows GPT-5 mini, GPT-5.2, GPT-5.3-Codex, and GPT-5.4 are now available through Copilot. This is a major omission — these are the newest, most capable models.

  3. Missing Claude Opus models. The document only lists Claude Sonnet 4.5 and 4.6. Verification shows Claude Opus 4.5 and Opus 4.6 are also available in Copilot for advanced reasoning tasks. Claude Haiku 4.5 is also available for lightweight tasks.

  4. Missing Gemini models via Copilot. Verification shows Gemini 2.5 Pro and Gemini 3 Flash are available through GitHub Copilot. The document doesn't mention that some Google models can be accessed via Copilot (not just via the separate Google plugin).

  5. GitHub star count inconsistency. Document 3's sources section says "357K+ stars" for the OpenClaw repo. Document 1 says "250K+". The verified figure is ~250K (official/widely-cited).

  6. Copilot usage limits not discussed. The document states all Copilot model costs are "$0." While technically true for the subscription, GitHub Copilot has premium request quotas for certain models. Some models (like o1, Claude Opus) may count against limited monthly allowances depending on the plan tier. The "$0 marginal cost" framing is oversimplified.

  7. "Gemini 3.1 Pro" and "Gemini 3.1 Flash" naming. These model names need verification — they appear to be speculative future model names. As of April 2026, the latest verified Gemini models through Google's API are Gemini 2.5 Pro and possibly Gemini 3.x. The exact versioning ("3.1") should be confirmed.

Completeness Gaps

  1. No mention of Copilot's rate limits and throttling. GitHub Copilot enforces rate limits, smaller context windows, and additional filtering compared to direct API access. This is a real constraint for agentic usage through OpenClaw.

  2. No comparison of Copilot-via-OpenClaw vs direct API access. Document 3 should explain the tradeoff: Copilot gives "free" access to multiple models, but with smaller context windows and less control than paying for direct API keys. For agentic workloads that need large context, direct API may be worth the cost.

  3. No mention of xAI Grok models. Verification shows Grok Code Fast 1 is available through Copilot. This is another cloud option that complements Ollama.

  4. Google plugin Veo 3.1 / Lyria 3 pricing not discussed. The document lists video and music generation capabilities but doesn't mention their pricing. Google AI Studio has different rate limits and pricing tiers for these capabilities vs. basic LLM chat.

  5. Missing Perplexity pricing tiers. The document says Perplexity requires "a Perplexity account/plan" but doesn't specify costs. How much does Guillaume need to budget for Perplexity API access? Is there a free tier?

  6. No total cost of ownership calculation. Given Guillaume already has Copilot and Codex subscriptions, the document should calculate the incremental cost of adding Google and Perplexity API access alongside the hardware purchase.
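The missing TCO calculation in item 6 need not be elaborate. A hedged sketch with placeholder figures (every number below is an assumption to be replaced with verified prices from Documents 2 and 3 and Guillaume's local electricity rate):

```python
# First-year total cost of ownership: hardware outlay, twelve months of
# subscriptions and API budget, and electricity for sustained inference.
# All defaults are illustrative placeholders, not quoted prices.
def tco_first_year(hardware_usd: float,
                   monthly_subscriptions_usd: float,
                   monthly_api_usd: float,
                   avg_watts: float,
                   usd_per_kwh: float = 0.15,
                   hours_per_day: float = 8.0) -> float:
    electricity = avg_watts / 1000 * hours_per_day * 365 * usd_per_kwh
    return hardware_usd + 12 * (monthly_subscriptions_usd + monthly_api_usd) + electricity

# Illustrative only: a $2,199 Mac Mini, $10/mo Copilot, a $20/mo API budget,
# and 65 W sustained for 8 h/day at $0.15/kWh add roughly $28/yr in power.
```

Even with generous assumptions, electricity is a rounding error next to hardware and subscriptions, which is itself a useful finding for the TCO section.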

Corrections Needed

| Item | Current Text | Correction |
| --- | --- | --- |
| Primary model in all examples | github-copilot/gpt-4o | github-copilot/gpt-4.1 (or gpt-5.2) |
| Copilot model catalog | Lists 9 models | Add GPT-5 family (4 models), Claude Opus (2 models), Claude Haiku, Gemini models |
| Copilot cost claims | "All model costs are $0" | "Included in subscription; some models consume premium request quotas" |
| GitHub star count | "357K+" | "~250K+" (consistent with Document 1 and verified sources) |

Additional Findings from Verification

  • Model selection for Claude and Codex agents on github.com was announced April 14, 2026 (one day before this review). Users can now manually choose between the Claude and Codex agents on github.com — directly relevant to Guillaume's workflow.
  • Copilot paid users have unlimited GPT-4.1 access — GPT-4.1 does not count against premium request quotas, making it the optimal default model for OpenClaw primary provider.

Cross-Document Contradictions

1. Copilot Provider Status (CRITICAL)

  • Document 1: "This is an unofficial/community integration. GitHub may change API access at any time. Authentication and feature support may be incomplete."
  • Document 3: "The github-copilot provider is a bundled OpenClaw extension (enabled by default)." Cites source code path extensions/github-copilot/ in the monorepo.
  • Resolution: Document 3 is correct. The Copilot provider is maintained in the official OpenClaw monorepo. Document 1 must be corrected.

2. GitHub Star Count

  • Document 1: "250K+ GitHub stars by March 2026"
  • Document 3 sources: "357K+ stars"
  • Resolution: The officially-cited milestone is ~250K. The 357K figure may come from a different aggregator or include forks. Use ~250K consistently.

3. MLX Backend Impact vs Benchmark Tables

  • Document 2, Scenario A main table: Llama 3.1 8B at "28–35 tok/s" (M4 24GB)
  • Document 2, MLX subsection: Same model at "45–60 tok/s" (M4, with MLX)
  • Resolution: The main benchmark tables appear to use pre-MLX (Metal backend) numbers. Post-MLX numbers appear in a separate subsection. This is confusing — all tables should clearly indicate which backend the benchmarks assume, and since MLX is the current backend, it should be the primary reference.

4. Hermes Agent Characterization

  • Document 1: Describes Hermes's plugin format as "Python modules + SKILL.md specs" and plugin registry as "agentskills.io"
  • Document 3: Says Hermes "skills are Python-based and auto-generated by the agent from its own task execution patterns"
  • Resolution: Both are partially correct but incomplete. Hermes has a formal plugin system (plugin.yaml + Python modules + lifecycle hooks) AND an auto-learning skill system. The documents conflate plugins and skills.

5. OpenClaw Description

  • Document 1, Finding #1: "OpenClaw is a general-purpose AI automation agent (NOT a coding agent)"
  • Document 3: Describes it implicitly as an agent framework/gateway/orchestrator
  • Resolution: Both framings are defensible, but "general-purpose AI automation agent" is more accurate than "coding agent." No conflict, just emphasis difference.

6. Missing Cross-References

  • Document 2's hardware recommendations don't reference which OpenClaw plugins benefit from more RAM (e.g., LanceDB vector memory from Document 1 adds RAM requirements).
  • Document 1's vLLM/SGLang provider plugins aren't cross-referenced with Document 2's performance expectations (what hardware do vLLM/SGLang need vs Ollama?).
  • Document 3's multi-provider routing configuration doesn't account for Document 2's thermal throttling (if you're falling back to Ollama during sustained load, throttling matters).

Critical Gaps for Revision

The following items MUST be addressed in the revision pass:

  1. Correct the Copilot provider characterization in Document 1 — change from "unofficial/community" to "bundled first-class extension" with evidence from the monorepo source code.

  2. Replace all gpt-4o references in Document 3 config examples with gpt-4.1 (or gpt-5.2 for newest capability). GPT-4o is deprecated.

  3. Add GPT-5 model family, Claude Opus, and Claude Haiku to Document 3's Copilot model catalog.

  4. Verify and correct the 14-core M4 Pro pricing in Document 2. The $2,399 figure for 64GB/1TB may be understated by $500–$1,100 depending on configuration.

  5. Clarify MLX backend status in Document 2 — preview vs default, and whether the 32GB minimum RAM requirement is real (this affects Scenario A recommendations).

  6. Label ALL benchmark tables in Document 2 with backend used (Metal vs MLX). Remove or correct the RTX 4070 Super 70B comparison.

  7. Add Copilot premium request quota information to Document 3. The "$0 for all models" claim is misleading without noting that some models consume limited monthly premium requests.

  8. Add M5 chip timeline to Document 2's decision matrix. If M5 Mac Mini is expected late 2026, Guillaume may choose to wait — this is a material purchasing factor.

  9. Add Canadian pricing or currency conversion note to Document 2, given Guillaume's likely Canadian residence.

  10. Add OpenClaw plugin security warning to Document 1. The reported 20% malicious skill rate is a critical safety concern.

  11. Add total cost of ownership (TCO) section that combines hardware cost (Document 2) + subscription costs (Copilot, Codex) + API costs (Google, Perplexity) + electricity estimates.

  12. Cross-reference memory requirements between Document 1 (LanceDB memory plugin) and Document 2 (available RAM calculations). Running Ollama + LanceDB + OpenClaw + dev tools changes the RAM math.

  13. Explain TurboQuant in Document 2 — it's mentioned as enabling 70B at 32K context on 64GB but never defined or sourced.

  14. Add DeepSeek-R1 benchmarks to Document 2 — this is a popular 2026 model that's notably absent from all benchmark tables.

  15. Add Ollama OLLAMA_NUM_PARALLEL concurrency discussion to Document 2 for multi-agent scenarios.
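Revision item 9 (Canadian pricing) can be handled mechanically in the interim. A minimal sketch, assuming a placeholder exchange rate of 1.38 USD→CAD; note that Apple's Canadian store prices are set independently of the spot rate, so quoted CAD prices should replace these conversions wherever available:

```python
# Annotate a USD price with an approximate CAD equivalent.
# The 1.38 rate is a placeholder assumption, not a quoted Apple CA price.
USD_TO_CAD = 1.38

def with_cad(price_usd: float, rate: float = USD_TO_CAD) -> str:
    return f"${price_usd:,.0f} USD (~${price_usd * rate:,.0f} CAD)"

# with_cad(1399) -> "$1,399 USD (~$1,931 CAD)"
```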


Verified Facts

The following claims from the documents were verified against current (April 2026) sources:

| Claim | Document | Status | Source |
| --- | --- | --- | --- |
| OpenClaw ~250K GitHub stars | 1 | ✅ Verified | openclaws.io, getpanto.ai |
| Peter Steinberger joined OpenAI Feb 2026 | 1 | ✅ Verified | TechCrunch, techstartups.com |
| OpenClaw transitioned to community foundation | 1 | ✅ Verified | theclawstreetjournal.com |
| Anthropic blocked OAuth April 4, 2026 | 1 | ✅ Verified | dev.to, natural20.com, kersai.com |
| Mac Mini M4 Pro starts at $1,399 (24GB/512GB) | 2 | ✅ Verified | macprices.net, appleinsider.com |
| Mac Mini has no M4 Max option | 2 | ✅ Verified | apple.com, macrumors.com |
| M4 Pro memory bandwidth: 273 GB/s | 2 | ✅ Verified | apple.com specs |
| M4 Max memory bandwidth: 546 GB/s | 2 | ✅ Verified | apple.com specs |
| Ollama 0.19 MLX backend launched March 2026 | 2 | ✅ Verified | ollama.com/blog/mlx |
| Thermal throttling at 8–10 min sustained load | 2 | ✅ Verified | vpsmac.com, macrumors forums |
| 30–45% performance drop from throttling | 2 | ✅ Verified | blog.shiptasks.com, marc0.dev |
| Copilot provider is bundled in OpenClaw | 3 | ✅ Verified | github.com/openclaw/openclaw |
| Claude Sonnet 4.6 available via Copilot | 3 | ✅ Verified | github.blog changelog |
| Hermes Agent by Nous Research (Feb 2026) | 1,3 | ✅ Verified | hermes-agent.nousresearch.com |
| Hermes uses agentskills.io standard | 1 | ✅ Verified | hermes-agent.org, hermesatlas.com |
| GPT-4o deprecated in Copilot (replaced by GPT-4.1) | N/A | ✅ New finding | github.com community discussions |

Updated/Corrected Data

Mac Mini M4 Pro Pricing (Corrected, April 2026)

| Configuration | Document 2 Price | Verified Price | Δ |
| --- | --- | --- | --- |
| M4 Pro 12c, 24GB, 512GB | $1,399 | $1,399 | ✅ Correct |
| M4 Pro 12c, 48GB, 1TB | $1,999 | $1,999 | ✅ Correct |
| M4 Pro 12c, 64GB, 1TB | $2,199 | $2,199 | ✅ Correct |
| M4 Pro 14c, 24GB, 1TB | $1,799 | Needs verification | ⚠️ |
| M4 Pro 14c, 64GB, 1TB | $2,399 | $2,499–$3,499 | ❌ Understated |

GitHub Copilot Model Catalog (Updated, April 2026)

| Model | Status | Notes |
| --- | --- | --- |
| gpt-4o | DEPRECATED | Replaced by GPT-4.1; may still work but consumes premium quotas |
| gpt-4.1 | ✅ Current default | Unlimited for paid Copilot users |
| gpt-4.1-mini | ✅ Available | Lighter variant |
| gpt-4.1-nano | ✅ Available | Lightest variant |
| gpt-5-mini | NEW (missing from docs) | Fast, cheap |
| gpt-5.2 | NEW (missing from docs) | Standard |
| gpt-5.3-codex | NEW (missing from docs) | Optimized for code |
| gpt-5.4 | NEW (missing from docs) | Latest flagship |
| claude-sonnet-4.5 | ✅ Available | Via Anthropic Messages transport |
| claude-sonnet-4.6 | ✅ Available | GA since Feb 17, 2026 |
| claude-opus-4.5 | NEW (missing from docs) | Advanced reasoning |
| claude-opus-4.6 | NEW (missing from docs) | Premium reasoning |
| claude-haiku-4.5 | NEW (missing from docs) | Fast, lightweight |
| o1 | ✅ Available | Reasoning model |
| o1-mini | ✅ Available | Reasoning model (smaller) |
| o3-mini | ✅ Available | Reasoning model |
| gemini-2.5-pro | NEW (missing from docs) | Via Copilot |
| gemini-3-flash | NEW (missing from docs) | Via Copilot |

Corrected Config Example for Document 3

```jsonc
{
  agents: {
    defaults: {
      // LLM: Copilot primary (GPT-4.1, NOT deprecated GPT-4o)
      model: {
        primary: "github-copilot/gpt-4.1",
        fallbacks: ["github-copilot/gpt-5.2", "ollama/gemma4"]
      },
      memorySearch: {
        provider: "github-copilot"
      }
    }
  }
}
```
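What this primary-plus-fallbacks config expresses can be sketched as ordinary selection logic. This is an illustrative sketch only, not OpenClaw's actual routing implementation:

```python
# Try the primary model, then each fallback in order; return the first
# provider/model the availability check accepts.
from typing import Callable

def pick_model(primary: str,
               fallbacks: list[str],
               is_available: Callable[[str], bool]) -> str:
    for candidate in [primary, *fallbacks]:
        if is_available(candidate):
            return candidate
    raise RuntimeError("no configured model is available")

# If Copilot is rate-limited or offline, routing falls through to local Ollama:
# pick_model("github-copilot/gpt-4.1",
#            ["github-copilot/gpt-5.2", "ollama/gemma4"],
#            lambda m: m.startswith("ollama/")) -> "ollama/gemma4"
```

The ordering matters for the review's cross-document point: if the fallback is local Ollama, Document 2's thermal throttling figures apply precisely when fallback traffic is heaviest.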

MLX Backend Performance (Corrected Context)

| Model Size | MLX Improvement vs Metal | Context |
| --- | --- | --- |
| <1B (tiny) | 39–87% | Largest gains on tiny models |
| 8B (standard) | 21–29% | Typical models Guillaume would run |
| 35B+ (large) | ~93% decode improvement | On M5 Max hardware (not M4) |

Note: The 87% headline figure from Document 2 applies only to Qwen3 0.6B. For the 7B–8B models that form the core of Scenario A, expect ~25% improvement.


Review completed April 15, 2026. All verification searches performed same day.