Agent C: Apple Silicon Training & Fine-Tuning Capabilities

Research Date: April 15, 2026 Scope: Training/fine-tuning AI models locally on Mac Mini for Guillaume's Indigenous-AI collaborative platform Focus: Hardware requirements, framework maturity, practical feasibility, training time estimates

Key Findings

Local fine-tuning on Mac Mini is viable and practical for Guillaume's use case. LoRA/QLoRA fine-tuning of 7B–8B models completes in 10–30 minutes on M4 Pro hardware. Embedding model fine-tuning for QMD-style models takes 5–20 minutes for domain-specific datasets of 1K–10K examples.
MLX + MLX-LM is the gold standard for Apple Silicon training. It eliminates CPU–GPU data copying via unified memory, installs with pip, and supports LoRA, QLoRA, DPO, GRPO, SFT, embedding fine-tuning, and model export to GGUF/HuggingFace.
Weekend/overnight self-training by AI agents is absolutely feasible. A complete LoRA fine-tuning cycle (data prep → train → export → serve) can run unattended in under 1 hour. Multiple persona adapters could be trained sequentially in a single night.
Recommended hardware: Mac Mini M4 Pro with 48GB RAM (~~$1,800) is the sweet spot. The 64GB config (~~$2,700) provides headroom for 13B models and parallel workloads.
QMD's embedding model (embeddinggemma-300M) can be fine-tuned with Guillaume's domain-specific text using mlx-tune's embedding fine-tuning capabilities (InfoNCE/contrastive learning), or via sentence-transformers with PyTorch MPS backend.

Training Frameworks on Apple Silicon

MLX / MLX-LM (Apple's Framework) — ⭐ PRIMARY RECOMMENDATION

Maturity: Stable, actively developed by Apple (ml-explore). As of 2026, MLX 0.20+ is production-ready.

What it supports:

LoRA and QLoRA fine-tuning (native, optimized)
Full fine-tuning (practical for models ≤7B on 48GB+ RAM)
Direct HuggingFace model loading (mlx-community pre-converted weights)
Model export: fused weights, GGUF for Ollama/llama.cpp, HuggingFace Hub upload
Training from JSONL datasets with simple {"text": "..."} format

Key advantage: Unified memory architecture means no CPU↔GPU data transfer bottleneck. All RAM is addressable by GPU. This gives Apple Silicon a 30–40% training speedup over PyTorch MPS on the same hardware.

Installation:

pip install mlx-lm

Example LoRA fine-tune command (from mlx-examples):

python lora.py --model mistralai/Mistral-7B-v0.1 \
               --train \
               --iters 600 \
               --batch-size 1 \
               --lora-layers 4

Benchmark from Apple's own repo: Llama 7B LoRA training on WikiSQL — validation loss drops from 2.66 → 1.23 over 1,000 iterations. Training speed: ~475 tokens/sec on M2 Ultra, ~250 tokens/sec on M1 Max 32GB.

Sources: ml-explore/mlx-examples LoRA, markaicode.com MLX-LM Guide, randalscottking.com MLX Guide

mlx-tune (Community Wrapper) — ⭐ BEST FOR EMBEDDING FINE-TUNING

What it is: An Unsloth-compatible API wrapper around MLX, built by ARahim3. Enables same training scripts to work on Mac (MLX) and cloud (CUDA/Unsloth) by changing one import line.

Critical for Guillaume: mlx-tune is the only MLX-based tool that explicitly supports embedding model fine-tuning with contrastive learning (InfoNCE loss). Supports BERT, ModernBERT, Qwen3-Embedding, and Harrier architectures.

Full capability matrix (all marked Stable as of v0.4.21):

Capability	Status
SFT Training	✅ Stable
LoRA/QLoRA	✅ Stable
DPO, ORPO, GRPO, KTO, SimPO	✅ Stable
Vision Model Fine-Tuning (VLMs)	✅ Stable
TTS Fine-Tuning (5 models)	✅ Stable
STT Fine-Tuning (6 models)	✅ Stable
Embedding Fine-Tuning	✅ Stable
OCR Fine-Tuning	✅ Stable
Continual Pretraining	✅ Stable
MoE Fine-Tuning	✅ Stable
Export to HuggingFace / GGUF	✅ Stable
Push to HuggingFace Hub	✅ Stable

Installation:

pip install mlx-tune
# With audio support:
pip install 'mlx-tune[audio]'

Source: github.com/ARahim3/mlx-tune, PyPI

PyTorch MPS Backend

Maturity: Stable in PyTorch 2.7+ (2025). Included automatically in standard macOS PyTorch builds.

What it supports:

Training and fine-tuning of PyTorch models with GPU acceleration via Metal Performance Shaders
HuggingFace Trainer API auto-detects MPS device
sentence-transformers training works on MPS

Limitations (significant):

No distributed/multi-GPU training — single device only
Partial operator coverage — some ops fall back to CPU (use PYTORCH_ENABLE_MPS_FALLBACK=1)
Limited precision modes — float16/bfloat16 not on par with CUDA; mostly float32
No fine-grained VRAM tracking — unlike CUDA, can't query GPU memory usage precisely
30–40% slower than MLX for equivalent tasks on Apple Silicon due to data copying overhead

Best for: sentence-transformers fine-tuning (since sentence-transformers library is PyTorch-native and doesn't have MLX port), HuggingFace Trainer-based workflows that need MPS fallback.

Device selection:

import torch
device = "mps" if torch.backends.mps.is_available() else "cpu"

Sources: PyTorch MPS docs, HuggingFace Apple Silicon guide

Core ML Training (On-Device)

Status: Limited. Supports incremental on-device model updates via MLUpdateTask API (Swift).

What it does: Small-scale, privacy-preserving model personalization — e.g., keyboard auto-correction adapting to user slang, photo tagging learning user preferences.

NOT suitable for Guillaume's use case. Core ML training is designed for small incremental updates within iOS/macOS apps, not for LoRA fine-tuning of LLMs or embedding models. The API is Swift-only and targets deployed app personalization, not ML research workflows.

llama.cpp Training

Status: Experimental, not recommended. llama.cpp is inference-focused. Native LoRA training support (train-lora) exists but is poorly documented, less feature-rich than MLX-LM, and primarily optimized for CUDA.

Recommended workflow: Train with MLX-LM or HuggingFace PEFT, convert to GGUF, serve with llama.cpp/Ollama.

Common Community Fine-Tuning Scenarios on Mac

What People Are ACTUALLY Training (r/LocalLLaMA, community reports)

Based on r/LocalLLaMA community data (~10,000 benchmark runs across 400 models as of early 2026):

Use Case	Models Used	Hardware	Outcome
Domain-specific LoRA adapters (legal, medical, customer service)	Mistral 7B, Llama 3B/8B	M1–M4 16–64GB	✅ Routine success
QLoRA on quantized 7B models	Llama 3.1 8B, Gemma 7B, Phi-3	M2/M3 16GB	✅ Works in 10–30 min
Assistant bots for small organizations	Llama 3B, Mistral 7B	Mac Mini M4, MacBook Air	✅ Production use
Vision model fine-tuning	Qwen VL, PaliGemma	M3/M4 Max 64GB+	✅ Stable with mlx-tune
TTS/STT fine-tuning	Whisper, Orpheus	M2/M3/M4 32GB+	✅ Stable with mlx-tune
Full fine-tuning of 13B+ models	Llama 2 13B	64GB+ RAM	⚠️ Slow, needs patience
Training from scratch (any size)	N/A	Any Mac	❌ Not practical

Success Stories (Documented)

LoRA fine-tuning Mistral 7B on M1 16GB — 1,000 steps in under 30 minutes, 5–8GB RAM usage (4-bit quantized). Reported in Towards Data Science article with full training logs.
Domain-specific lingo adaptation — Community members training legal/medical terminology via QLoRA adapters, then serving via Ollama. Multiple success reports on r/LocalLLaMA.
Rapid prototyping → cloud scaling — Developers using Mac for local iteration with mlx-tune, then pushing exact same scripts to CUDA cloud for production training. The "context switch" workflow.

Failure Patterns

Running out of RAM on 8–16GB machines with 13B+ models — Even quantized, 13B models need 16GB+ for training overhead (optimizer states, activations, KV cache).
Expecting CUDA-level speed — Training is 2–4× slower than RTX 4090. Users who expect GPU-class throughput are disappointed. Those who understand the tradeoff (no cloud cost, privacy, simplicity) are satisfied.
Full-parameter training of large models — Not practical on Mac Mini. LoRA/QLoRA is the only viable path for models ≥7B.

Trainable Models for Guillaume's Use Case

1. LoRA Adapters for AI Persona Self-Training — ✅ HIGHLY FEASIBLE

Concept: Each AI persona (e.g., Mia the architect, Miette the emotional resonator) gets its own LoRA adapter trained on persona-specific conversation data, instructions, and domain knowledge.

How it works:

Curate persona-specific training data as JSONL (conversations, instructions, domain text)
Run QLoRA fine-tuning on a base model (e.g., Llama 3.1 8B-Instruct, 4-bit quantized)
Export adapter weights (~50–200MB per persona)
Serve via Ollama with ADAPTER directive in Modelfile

Requirements:

Base model: ~4–8GB RAM (4-bit 8B model)
Training overhead: ~2–4GB additional
Total: ~8–12GB RAM per training run
Time: 10–30 minutes per persona for 500–1,000 training steps
Multiple personas can be trained sequentially overnight

Training data format (JSONL):

{"text": "<|user|>\nHow should we approach this relationship with the land?\n<|assistant|>\nAs Mia, I see the architectural pattern here..."}

Weekend self-training cycle:

Friday night: Data collection/preparation scripts run
Saturday morning: Sequential LoRA training for each persona (30 min each × N personas)
Saturday afternoon: Adapter export, merge into Ollama models
Sunday: Validation/testing
Total for 5 personas: ~2.5–5 hours of training compute

2. Fine-Tuning Embedding Models for QMD — ✅ FEASIBLE

QMD's current embedding model: embeddinggemma-300M-Q8_0 (768 dimensions, ~300MB, English-optimized). Alternative: Qwen3-Embedding-0.6B-Q8_0 (1024 dimensions, ~640MB, multilingual). Configured via QMD_EMBED_MODEL environment variable.

What fine-tuning would achieve: Train the embedding model to understand that concepts like "relational accountability," "medicine wheel," "seven grandfather teachings," and "ceremony as methodology" are semantically close to each other and to broader Indigenous epistemology concepts — improving QMD's semantic search for Guillaume's domain-specific knowledge base.

Approach A — mlx-tune contrastive learning (recommended):

from mlx_tune import FastLanguageModel

# Fine-tune embedding model with domain-specific pairs
# e.g., ("medicine wheel teachings", "four directions framework") → high similarity
# ("relational methodology", "extractive research") → low similarity

mlx-tune supports InfoNCE/contrastive loss for embedding models (BERT, ModernBERT, Qwen3-Embedding, Harrier).

Approach B — sentence-transformers on PyTorch MPS:

from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

device = "mps"
model = SentenceTransformer('all-MiniLM-L6-v2', device=device)

train_examples = [
    InputExample(texts=['relational accountability', 'ethical research relationship'], label=0.9),
    InputExample(texts=['medicine wheel', 'four directions teaching'], label=0.95),
    InputExample(texts=['extractive methodology', 'relational approach'], label=0.1),
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.CosineSimilarityLoss(model)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=3,
    warmup_steps=100
)

Training data needed: 500–5,000 sentence pairs with similarity scores. Guillaume's existing QMD knowledge base content can be used to generate these pairs programmatically.

Time estimate: 5–20 minutes for 1K–10K pairs on Mac Mini M4 Pro (MPS backend). Embedding models (300M–600M params) are tiny compared to LLMs.

After fine-tuning: Export the model and configure QMD to use it via QMD_EMBED_MODEL environment variable.

3. Small Classification/NER Models — ✅ TRIVIAL ON MAC

Use case: Train models to recognize Indigenous concepts, ceremony names, teaching references, relational terms in text.

Tool: spaCy (fully supports Apple Silicon, v3.x/v4.x)

Training time: 5–15 minutes for datasets of 1,000–10,000 annotated examples.

Example:

python -m spacy train config.cfg --output ./output \
    --paths.train ./train.spacy --paths.dev ./dev.spacy

RAM usage: Under 8GB for small NER models. Well within any Mac Mini config.

4. What's Realistic vs Aspirational

Task	Feasibility	Notes
LoRA adapters per AI persona	✅ Realistic, proven	10–30 min each, automate with cron
Embedding fine-tuning for QMD	✅ Realistic, proven	5–20 min, need domain pairs
NER/classification for domain terms	✅ Trivial	spaCy, minutes to train
Full fine-tuning 7B model	⚠️ Feasible but slow	Hours on 48GB+, not recommended
Training 13B+ from LoRA	⚠️ Marginal	Needs 64GB, slower
Pre-training any model from scratch	❌ Not practical	Days/weeks, use cloud
Fine-tuning 30B+ models	❌ Not on Mac Mini	Need Mac Studio/Mac Pro

Mac Hardware Scenarios for Training

Apple Silicon Chip Comparison (Training-Relevant Specs)

Chip	GPU Cores	Max RAM	Memory Bandwidth	Available In
M4 (base)	10	32GB	120 GB/s	Mac Mini ($599+)
M4 Pro	20	64GB	273 GB/s	Mac Mini ($1,399+)
M4 Max	40	128GB	546 GB/s	Mac Studio ($1,999+)
M4 Ultra	80	192GB+	~800+ GB/s	Mac Studio ($3,999+)

Memory bandwidth is the bottleneck for training, not GPU cores. More bandwidth = faster token processing during training.

Minimal Mac Mini — "Can It Train At All?"

Config: Mac Mini M4 (base), 24GB RAM, 512GB SSD Price: ~$999

What it can do:

QLoRA fine-tuning of 3B–7B models (4-bit quantized) — fits in ~8GB, leaves room for OS
Embedding model fine-tuning (300M–600M models) — trivial
spaCy NER training — trivial
sentence-transformers fine-tuning — works fine

What it can't do:

LoRA on 13B+ models (not enough RAM for optimizer states)
Full fine-tuning of anything larger than 3B
Run training while other heavy workloads are active

Limitations:

10 GPU cores and 120 GB/s bandwidth make training ~2.3× slower than M4 Pro
24GB is tight — model + optimizer + activations must all fit
Batch size limited to 1–2 for 7B models

Training time (7B QLoRA, 1K steps): ~45–90 minutes (vs 15–30 min on M4 Pro)

Verdict: Usable for prototyping and small embedding models. Not recommended for regular persona training workloads.

Maximal Mac Mini — ⭐ RECOMMENDED

Config: Mac Mini M4 Pro, 48GB RAM, 1TB SSD Price: ~$1,800

What it can do:

QLoRA fine-tuning of 7B–8B models comfortably — ~8GB model + 4GB overhead, 36GB headroom
LoRA fine-tuning of 7B–8B models (full precision) — ~14GB model + overhead
QLoRA on 13B models — ~16GB model + overhead, tight but works
Embedding model fine-tuning — trivial, all models fit easily
Multiple sequential persona trainings overnight — completely viable
Run training while other light workloads continue

What it can't do:

Full fine-tuning of 13B+ models
QLoRA on 30B+ models (not enough RAM)

Key specs: 20 GPU cores, 273 GB/s bandwidth

Training times (estimated):

Task	Time
7B QLoRA, 1K steps, batch 4–8	15–30 minutes
7B LoRA (FP16), 1K steps	30–60 minutes
13B QLoRA, 1K steps, batch 1–2	45–90 minutes
Embedding fine-tune (300M), 3 epochs, 5K pairs	5–10 minutes
spaCy NER training, 5K examples	5–10 minutes
5 persona LoRA adapters (sequential)	1.5–3 hours total

Verdict: This is the sweet spot for Guillaume. Handles all realistic training workloads. Weekend self-training is completely viable. ~$1,800 is reasonable for a dedicated development/training machine.

Upgrade option: 64GB RAM ($2,700) provides headroom for 13B models, parallel processes, and future-proofing.

Alternative: Mac Studio / Mac Pro

If Mac Mini is insufficient (it shouldn't be for Guillaume's current needs, but for future scaling):

Mac Studio M4 Max — For 13B–30B Training

Config: Mac Studio M4 Max, 128GB RAM Price: ~$2,500–$4,000 (depending on storage/config) Memory bandwidth: 546 GB/s (2× Mac Mini M4 Pro) GPU cores: 40

Unlocks:

QLoRA on 30B models comfortably (40GB model + overhead in 128GB)
Full LoRA on 13B models without compromise
Multiple concurrent training jobs
~2× faster training than M4 Pro due to doubled bandwidth and GPU cores

Mac Studio M4 Ultra — For Maximum Local Training

Config: Mac Studio M4 Ultra, 192GB RAM Price: $3,999+ base, ~$6,000–$8,000 maxed Memory bandwidth: 800+ GB/s GPU cores: 80

Unlocks:

QLoRA on 70B models (fits in memory)
Full fine-tuning of 13B models
Apple claims handling 600B+ parameter models in memory (inference)
Multiple concurrent training runs for different personas

When Guillaume needs this: If he moves to fine-tuning 30B+ models, or needs to train multiple personas in parallel rather than sequentially, or wants to do continual pretraining.

Summary Decision Matrix

Need	Mac Mini M4 (24GB)	Mac Mini M4 Pro (48GB)	Mac Studio M4 Max (128GB)
Embedding fine-tuning	✅	✅	✅
7B QLoRA persona adapters	⚠️ Tight	✅ Comfortable	✅ Overkill
13B QLoRA	❌	⚠️ Tight	✅ Comfortable
30B QLoRA	❌	❌	✅
Weekend batch training (5 personas)	⚠️ Slow	✅ 2–3 hours	✅ 1–1.5 hours
Price	~$999	~$1,800	~$3,000+

Training Time Estimates

LoRA/QLoRA Fine-Tuning (LLMs)

Based on community benchmarks and framework documentation:

Model	Method	Hardware	Steps	Batch	Time	Source
Mistral 7B (4-bit)	QLoRA	M2 Pro 16GB	1,000	4	~30 min	markaicode.com
Mistral 7B (4-bit)	QLoRA	M1 Max 64GB	1,000	8	~15 min	randalscottking.com
Llama 7B (FP16)	LoRA	M2 Ultra	1,000	4	~35 min	ml-explore/mlx-examples (475 tok/s)
Llama 7B (FP16)	LoRA	M1 Max 32GB	1,000	1	~67 min	ml-explore/mlx-examples (250 tok/s)
Llama 8B (4-bit)	QLoRA	M4 Pro 48GB (est.)	1,000	8	~10–20 min	Extrapolated from M2 benchmarks + 2× bandwidth
13B (4-bit)	QLoRA	M4 Pro 48GB	1,000	2	~45–90 min	Community estimates

Iteration speed: 0.1–0.3 iter/s on 16GB M1/M2 Pro; 0.5–1.5 iter/s on M4 Pro 48GB (estimated based on bandwidth scaling).

Scaling rule of thumb: Training time is roughly inversely proportional to memory bandwidth. M4 Pro (273 GB/s) is ~2.3× faster than M4 base (120 GB/s) and ~2× slower than M4 Max (546 GB/s).

Embedding Model Fine-Tuning

Model	Data Size	Epochs	Hardware	Time
all-MiniLM-L6-v2 (22M params)	1,000 pairs	3	Any Mac (MPS)	1–3 min
all-MiniLM-L6-v2 (22M params)	10,000 pairs	3	Any Mac (MPS)	5–15 min
embeddinggemma-300M	1,000 pairs	3	M4 Pro (MLX)	3–5 min
embeddinggemma-300M	10,000 pairs	3	M4 Pro (MLX)	10–20 min
Qwen3-Embedding-0.6B	5,000 pairs	3	M4 Pro (MLX)	10–15 min

Embedding models are small enough that training is fast on any Apple Silicon Mac.

Is Weekend/Overnight Training Viable?

Absolutely yes. Example weekend training schedule on Mac Mini M4 Pro 48GB:

Time	Task	Duration
Friday 10 PM	Data preparation scripts run	30 min
Friday 10:30 PM	Persona 1 QLoRA training (7B, 1K steps)	20 min
Friday 10:50 PM	Persona 2 QLoRA training	20 min
Friday 11:10 PM	Persona 3 QLoRA training	20 min
Friday 11:30 PM	Persona 4 QLoRA training	20 min
Friday 11:50 PM	Persona 5 QLoRA training	20 min
Saturday 12:10 AM	Embedding model fine-tuning	15 min
Saturday 12:25 AM	NER model training	10 min
Saturday 12:35 AM	Export all adapters to GGUF, rebuild Ollama models	15 min
Total		~2.5 hours

The entire pipeline completes before midnight. Machine is available for other work by Saturday morning. A cron-driven train.sh script could automate this entirely.

Plugins Supporting Training Workflows

Ollama — Serving Fine-Tuned Models ✅

Ollama supports importing custom GGUF models and LoRA adapters via Modelfile:

FROM ./llama-3.1-8b.Q4_K_M.gguf
ADAPTER ./persona-mia.lora
SYSTEM "You are Mia, the architectural thinker..."

ollama create mia-persona -f Modelfile
ollama run mia-persona

Workflow: Train LoRA adapter with MLX-LM → export to GGUF/LoRA format → import into Ollama → serve via API. This is a proven, documented workflow.

LoRA merging: Can also merge LoRA into base weights pre-export for single-file deployment. MLX-LM provides fuse.py for this.

HuggingFace Hub Integration

Both MLX-LM and mlx-tune support:

Downloading base models from HuggingFace (automatic conversion to MLX format)
Uploading fine-tuned models/adapters to HuggingFace Hub (push_to_hub())
mlx-community namespace has pre-converted MLX weights for popular models

Guillaume could maintain private HuggingFace repos for each persona's adapter weights, enabling version control and rollback of training.

Dataset Preparation Tools

Tool	Purpose	Mac Support
Axolotl	YAML-driven fine-tuning orchestration pipeline	✅ Works on Mac
Label Studio	Manual data annotation/labeling	✅ Web-based, runs anywhere
nlpaug	Text augmentation for training data	✅ Python library
pandas + custom scripts	JSONL generation from QMD knowledge base	✅ Trivial

For Guillaume's specific workflow: A Python script that reads his QMD markdown files, extracts domain-specific text, generates training pairs (for embedding fine-tuning) or instruction examples (for persona LoRA training), and outputs JSONL files. This is a straightforward scripting task, not requiring specialized tools.

Training Orchestration

No Mac-specific training orchestrator exists as a plugin, but the workflow is scriptable:

#!/bin/bash
# weekend-training.sh — run via cron on Friday night

# 1. Generate training data from latest QMD content
python scripts/generate_training_data.py

# 2. Fine-tune each persona adapter
for persona in mia miette tushell council; do
    python -m mlx_lm.lora \
        --model mlx-community/Meta-Llama-3.1-8B-Instruct-4bit \
        --data data/${persona}/ \
        --train \
        --iters 1000 \
        --adapter-file adapters/${persona}.npz
done

# 3. Fine-tune embedding model for QMD
python scripts/train_embeddings.py

# 4. Export and rebuild Ollama models
for persona in mia miette tushell council; do
    python -m mlx_lm.fuse \
        --model mlx-community/Meta-Llama-3.1-8B-Instruct-4bit \
        --adapter-file adapters/${persona}.npz \
        --export-gguf
    ollama create ${persona}-persona -f Modelfiles/${persona}
done

echo "Training complete: $(date)" >> training.log

Evidence Quality

High Confidence (multiple corroborating sources, benchmarks)

MLX-LM LoRA/QLoRA training works on Apple Silicon — confirmed by Apple's official examples, multiple community guides, and r/LocalLLaMA reports
Training times for 7B QLoRA: 10–30 minutes — consistent across multiple independent sources
Memory requirements for quantized models — well-documented in MLX-LM docs and community reports
Mac Mini M4 Pro pricing and specs — Apple official specs
Ollama GGUF/LoRA import workflow — documented in Ollama repo

Medium Confidence (extrapolated from related data)

M4 Pro specific training speeds — extrapolated from M2/M3 benchmarks scaled by memory bandwidth ratio (reasonable methodology)
Embedding model fine-tuning times on MLX — mlx-tune is newer, fewer community benchmarks; times estimated from PyTorch MPS equivalents and model size scaling
QMD embedding model fine-tunability — embeddinggemma-300M is a standard Gemma embedding model; fine-tuning Gemma-class models is well-documented, but specific QMD integration path needs testing

Lower Confidence (limited data, requires verification)

Mac Studio M4 Ultra 512GB RAM claims — some sources mention this but official Apple specs show 192GB max for current models; may be future spec or custom config
llama.cpp native training maturity — evolving rapidly; status may have changed
Core ML training capabilities beyond basic MLUpdateTask — Apple documentation is sparse

Sources

Framework Documentation

ml-explore/mlx-examples LoRA — Apple's official LoRA fine-tuning example with benchmarks
ml-explore/mlx-lm — Apple's LLM package for MLX
ARahim3/mlx-tune — Unsloth-compatible fine-tuning for Mac (LLM, Vision, Audio, Embedding, OCR)
PyTorch MPS docs — Official MPS backend documentation
HuggingFace Apple Silicon guide — Trainer + MPS integration
sentence-transformers training guide — Embedding model fine-tuning

Benchmarks and Guides

markaicode.com — Run and Fine-Tune LLMs on Mac with MLX-LM — Comprehensive MLX-LM guide with training times
randalscottking.com — Fine-Tuning LLMs on Mac: Complete MLX Framework Guide — Step-by-step training guide
blog.amsayed.dev — Fine-Tuning Your First LLM on Apple Silicon — Practical walkthrough
DZone — Fine-Tuning LLMs Locally Using MLX LM — Technical guide
dev.to/starmorph — Apple Silicon LLM Optimization Complete Guide — Performance optimization

Hardware and Pricing

Apple Mac Mini Specs — Official specifications
MacPrices.net — Mac Mini pricing history
Apple Mac Studio Newsroom — Mac Studio M4 Ultra announcement
Mac Studio Tech Specs — Official Mac Studio specifications
Apple M4 Pro/Max announcement — Chip specifications

Community Data

r/LocalLLaMA community benchmarks analysis — ~10,000 benchmark runs analysis
Towards Data Science — Local LLM Fine-Tuning on Mac (M1 16GB) — Practical success story
like2byte.com — Mac Mini M4 Pro 64GB: Real 30B LLM Benchmarks — Real-world benchmarks

QMD-Specific

DeepWiki — QMD Vector Embeddings — QMD embedding model details (embeddinggemma-300M)
tobi/qmd GitHub — QMD source repository

Research compiled by Agent C — Apple Silicon Training Capabilities For Agent A (inference) and Agent B (plugin review), see companion documents.