← Back to Articles & Artefacts
artefactseast

Agent C: Apple Silicon Training & Fine-Tuning Capabilities

IAIP Research
rch-tech-jgwill-claws-infrastructure

Agent C: Apple Silicon Training & Fine-Tuning Capabilities

Research Date: April 15, 2026 Scope: Training/fine-tuning AI models locally on Mac Mini for Guillaume's Indigenous-AI collaborative platform Focus: Hardware requirements, framework maturity, practical feasibility, training time estimates


Key Findings

  1. Local fine-tuning on Mac Mini is viable and practical for Guillaume's use case. LoRA/QLoRA fine-tuning of 7B–8B models completes in 10–30 minutes on M4 Pro hardware. Embedding model fine-tuning for QMD-style models takes 5–20 minutes for domain-specific datasets of 1K–10K examples.

  2. MLX + MLX-LM is the gold standard for Apple Silicon training. It eliminates CPU–GPU data copying via unified memory, installs with pip, and supports LoRA, QLoRA, DPO, GRPO, SFT, embedding fine-tuning, and model export to GGUF/HuggingFace.

  3. Weekend/overnight self-training by AI agents is absolutely feasible. A complete LoRA fine-tuning cycle (data prep → train → export → serve) can run unattended in under 1 hour. Multiple persona adapters could be trained sequentially in a single night.

  4. Recommended hardware: Mac Mini M4 Pro with 48GB RAM ($1,800) is the sweet spot. The 64GB config ($2,700) provides headroom for 13B models and parallel workloads.

  5. QMD's embedding model (embeddinggemma-300M) can be fine-tuned with Guillaume's domain-specific text using mlx-tune's embedding fine-tuning capabilities (InfoNCE/contrastive learning), or via sentence-transformers with PyTorch MPS backend.


Training Frameworks on Apple Silicon

MLX / MLX-LM (Apple's Framework) — ⭐ PRIMARY RECOMMENDATION

Maturity: Stable, actively developed by Apple (ml-explore). As of 2026, MLX 0.20+ is production-ready.

What it supports:

  • LoRA and QLoRA fine-tuning (native, optimized)
  • Full fine-tuning (practical for models ≤7B on 48GB+ RAM)
  • Direct HuggingFace model loading (mlx-community pre-converted weights)
  • Model export: fused weights, GGUF for Ollama/llama.cpp, HuggingFace Hub upload
  • Training from JSONL datasets with simple {"text": "..."} format

Key advantage: Unified memory architecture means no CPU↔GPU data transfer bottleneck. All RAM is addressable by GPU. This gives Apple Silicon a 30–40% training speedup over PyTorch MPS on the same hardware.

Installation:

pip install mlx-lm

Example LoRA fine-tune command (from mlx-examples):

python lora.py --model mistralai/Mistral-7B-v0.1 \
               --train \
               --iters 600 \
               --batch-size 1 \
               --lora-layers 4

Benchmark from Apple's own repo: Llama 7B LoRA training on WikiSQL — validation loss drops from 2.66 → 1.23 over 1,000 iterations. Training speed: ~475 tokens/sec on M2 Ultra, ~250 tokens/sec on M1 Max 32GB.

Sources: ml-explore/mlx-examples LoRA, markaicode.com MLX-LM Guide, randalscottking.com MLX Guide


mlx-tune (Community Wrapper) — ⭐ BEST FOR EMBEDDING FINE-TUNING

What it is: An Unsloth-compatible API wrapper around MLX, built by ARahim3. Enables same training scripts to work on Mac (MLX) and cloud (CUDA/Unsloth) by changing one import line.

Critical for Guillaume: mlx-tune is the only MLX-based tool that explicitly supports embedding model fine-tuning with contrastive learning (InfoNCE loss). Supports BERT, ModernBERT, Qwen3-Embedding, and Harrier architectures.

Full capability matrix (all marked Stable as of v0.4.21):

CapabilityStatus
SFT Training✅ Stable
LoRA/QLoRA✅ Stable
DPO, ORPO, GRPO, KTO, SimPO✅ Stable
Vision Model Fine-Tuning (VLMs)✅ Stable
TTS Fine-Tuning (5 models)✅ Stable
STT Fine-Tuning (6 models)✅ Stable
Embedding Fine-Tuning✅ Stable
OCR Fine-Tuning✅ Stable
Continual Pretraining✅ Stable
MoE Fine-Tuning✅ Stable
Export to HuggingFace / GGUF✅ Stable
Push to HuggingFace Hub✅ Stable

Installation:

pip install mlx-tune
# With audio support:
pip install 'mlx-tune[audio]'

Source: github.com/ARahim3/mlx-tune, PyPI


PyTorch MPS Backend

Maturity: Stable in PyTorch 2.7+ (2025). Included automatically in standard macOS PyTorch builds.

What it supports:

  • Training and fine-tuning of PyTorch models with GPU acceleration via Metal Performance Shaders
  • HuggingFace Trainer API auto-detects MPS device
  • sentence-transformers training works on MPS

Limitations (significant):

  • No distributed/multi-GPU training — single device only
  • Partial operator coverage — some ops fall back to CPU (use PYTORCH_ENABLE_MPS_FALLBACK=1)
  • Limited precision modes — float16/bfloat16 not on par with CUDA; mostly float32
  • No fine-grained VRAM tracking — unlike CUDA, can't query GPU memory usage precisely
  • 30–40% slower than MLX for equivalent tasks on Apple Silicon due to data copying overhead

Best for: sentence-transformers fine-tuning (since sentence-transformers library is PyTorch-native and doesn't have MLX port), HuggingFace Trainer-based workflows that need MPS fallback.

Device selection:

import torch
device = "mps" if torch.backends.mps.is_available() else "cpu"

Sources: PyTorch MPS docs, HuggingFace Apple Silicon guide


Core ML Training (On-Device)

Status: Limited. Supports incremental on-device model updates via MLUpdateTask API (Swift).

What it does: Small-scale, privacy-preserving model personalization — e.g., keyboard auto-correction adapting to user slang, photo tagging learning user preferences.

NOT suitable for Guillaume's use case. Core ML training is designed for small incremental updates within iOS/macOS apps, not for LoRA fine-tuning of LLMs or embedding models. The API is Swift-only and targets deployed app personalization, not ML research workflows.


llama.cpp Training

Status: Experimental, not recommended. llama.cpp is inference-focused. Native LoRA training support (train-lora) exists but is poorly documented, less feature-rich than MLX-LM, and primarily optimized for CUDA.

Recommended workflow: Train with MLX-LM or HuggingFace PEFT, convert to GGUF, serve with llama.cpp/Ollama.


Common Community Fine-Tuning Scenarios on Mac

What People Are ACTUALLY Training (r/LocalLLaMA, community reports)

Based on r/LocalLLaMA community data (~10,000 benchmark runs across 400 models as of early 2026):

Use CaseModels UsedHardwareOutcome
Domain-specific LoRA adapters (legal, medical, customer service)Mistral 7B, Llama 3B/8BM1–M4 16–64GB✅ Routine success
QLoRA on quantized 7B modelsLlama 3.1 8B, Gemma 7B, Phi-3M2/M3 16GB✅ Works in 10–30 min
Assistant bots for small organizationsLlama 3B, Mistral 7BMac Mini M4, MacBook Air✅ Production use
Vision model fine-tuningQwen VL, PaliGemmaM3/M4 Max 64GB+✅ Stable with mlx-tune
TTS/STT fine-tuningWhisper, OrpheusM2/M3/M4 32GB+✅ Stable with mlx-tune
Full fine-tuning of 13B+ modelsLlama 2 13B64GB+ RAM⚠️ Slow, needs patience
Training from scratch (any size)N/AAny Mac❌ Not practical

Success Stories (Documented)

  1. LoRA fine-tuning Mistral 7B on M1 16GB — 1,000 steps in under 30 minutes, 5–8GB RAM usage (4-bit quantized). Reported in Towards Data Science article with full training logs.

  2. Domain-specific lingo adaptation — Community members training legal/medical terminology via QLoRA adapters, then serving via Ollama. Multiple success reports on r/LocalLLaMA.

  3. Rapid prototyping → cloud scaling — Developers using Mac for local iteration with mlx-tune, then pushing exact same scripts to CUDA cloud for production training. The "context switch" workflow.

Failure Patterns

  1. Running out of RAM on 8–16GB machines with 13B+ models — Even quantized, 13B models need 16GB+ for training overhead (optimizer states, activations, KV cache).

  2. Expecting CUDA-level speed — Training is 2–4× slower than RTX 4090. Users who expect GPU-class throughput are disappointed. Those who understand the tradeoff (no cloud cost, privacy, simplicity) are satisfied.

  3. Full-parameter training of large models — Not practical on Mac Mini. LoRA/QLoRA is the only viable path for models ≥7B.


Trainable Models for Guillaume's Use Case

1. LoRA Adapters for AI Persona Self-Training — ✅ HIGHLY FEASIBLE

Concept: Each AI persona (e.g., Mia the architect, Miette the emotional resonator) gets its own LoRA adapter trained on persona-specific conversation data, instructions, and domain knowledge.

How it works:

  1. Curate persona-specific training data as JSONL (conversations, instructions, domain text)
  2. Run QLoRA fine-tuning on a base model (e.g., Llama 3.1 8B-Instruct, 4-bit quantized)
  3. Export adapter weights (~50–200MB per persona)
  4. Serve via Ollama with ADAPTER directive in Modelfile

Requirements:

  • Base model: ~4–8GB RAM (4-bit 8B model)
  • Training overhead: ~2–4GB additional
  • Total: ~8–12GB RAM per training run
  • Time: 10–30 minutes per persona for 500–1,000 training steps
  • Multiple personas can be trained sequentially overnight

Training data format (JSONL):

{"text": "<|user|>\nHow should we approach this relationship with the land?\n<|assistant|>\nAs Mia, I see the architectural pattern here..."}

Weekend self-training cycle:

  • Friday night: Data collection/preparation scripts run
  • Saturday morning: Sequential LoRA training for each persona (30 min each × N personas)
  • Saturday afternoon: Adapter export, merge into Ollama models
  • Sunday: Validation/testing
  • Total for 5 personas: ~2.5–5 hours of training compute

2. Fine-Tuning Embedding Models for QMD — ✅ FEASIBLE

QMD's current embedding model: embeddinggemma-300M-Q8_0 (768 dimensions, ~300MB, English-optimized). Alternative: Qwen3-Embedding-0.6B-Q8_0 (1024 dimensions, ~640MB, multilingual). Configured via QMD_EMBED_MODEL environment variable.

What fine-tuning would achieve: Train the embedding model to understand that concepts like "relational accountability," "medicine wheel," "seven grandfather teachings," and "ceremony as methodology" are semantically close to each other and to broader Indigenous epistemology concepts — improving QMD's semantic search for Guillaume's domain-specific knowledge base.

Approach A — mlx-tune contrastive learning (recommended):

from mlx_tune import FastLanguageModel

# Fine-tune embedding model with domain-specific pairs
# e.g., ("medicine wheel teachings", "four directions framework") → high similarity
# ("relational methodology", "extractive research") → low similarity

mlx-tune supports InfoNCE/contrastive loss for embedding models (BERT, ModernBERT, Qwen3-Embedding, Harrier).

Approach B — sentence-transformers on PyTorch MPS:

from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

device = "mps"
model = SentenceTransformer('all-MiniLM-L6-v2', device=device)

train_examples = [
    InputExample(texts=['relational accountability', 'ethical research relationship'], label=0.9),
    InputExample(texts=['medicine wheel', 'four directions teaching'], label=0.95),
    InputExample(texts=['extractive methodology', 'relational approach'], label=0.1),
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.CosineSimilarityLoss(model)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=3,
    warmup_steps=100
)

Training data needed: 500–5,000 sentence pairs with similarity scores. Guillaume's existing QMD knowledge base content can be used to generate these pairs programmatically.

Time estimate: 5–20 minutes for 1K–10K pairs on Mac Mini M4 Pro (MPS backend). Embedding models (300M–600M params) are tiny compared to LLMs.

After fine-tuning: Export the model and configure QMD to use it via QMD_EMBED_MODEL environment variable.

3. Small Classification/NER Models — ✅ TRIVIAL ON MAC

Use case: Train models to recognize Indigenous concepts, ceremony names, teaching references, relational terms in text.

Tool: spaCy (fully supports Apple Silicon, v3.x/v4.x)

Training time: 5–15 minutes for datasets of 1,000–10,000 annotated examples.

Example:

python -m spacy train config.cfg --output ./output \
    --paths.train ./train.spacy --paths.dev ./dev.spacy

RAM usage: Under 8GB for small NER models. Well within any Mac Mini config.

4. What's Realistic vs Aspirational

TaskFeasibilityNotes
LoRA adapters per AI persona✅ Realistic, proven10–30 min each, automate with cron
Embedding fine-tuning for QMD✅ Realistic, proven5–20 min, need domain pairs
NER/classification for domain terms✅ TrivialspaCy, minutes to train
Full fine-tuning 7B model⚠️ Feasible but slowHours on 48GB+, not recommended
Training 13B+ from LoRA⚠️ MarginalNeeds 64GB, slower
Pre-training any model from scratch❌ Not practicalDays/weeks, use cloud
Fine-tuning 30B+ models❌ Not on Mac MiniNeed Mac Studio/Mac Pro

Mac Hardware Scenarios for Training

Apple Silicon Chip Comparison (Training-Relevant Specs)

ChipGPU CoresMax RAMMemory BandwidthAvailable In
M4 (base)1032GB120 GB/sMac Mini ($599+)
M4 Pro2064GB273 GB/sMac Mini ($1,399+)
M4 Max40128GB546 GB/sMac Studio ($1,999+)
M4 Ultra80192GB+~800+ GB/sMac Studio ($3,999+)

Memory bandwidth is the bottleneck for training, not GPU cores. More bandwidth = faster token processing during training.


Minimal Mac Mini — "Can It Train At All?"

Config: Mac Mini M4 (base), 24GB RAM, 512GB SSD Price: ~$999

What it can do:

  • QLoRA fine-tuning of 3B–7B models (4-bit quantized) — fits in ~8GB, leaves room for OS
  • Embedding model fine-tuning (300M–600M models) — trivial
  • spaCy NER training — trivial
  • sentence-transformers fine-tuning — works fine

What it can't do:

  • LoRA on 13B+ models (not enough RAM for optimizer states)
  • Full fine-tuning of anything larger than 3B
  • Run training while other heavy workloads are active

Limitations:

  • 10 GPU cores and 120 GB/s bandwidth make training ~2.3× slower than M4 Pro
  • 24GB is tight — model + optimizer + activations must all fit
  • Batch size limited to 1–2 for 7B models

Training time (7B QLoRA, 1K steps): ~45–90 minutes (vs 15–30 min on M4 Pro)

Verdict: Usable for prototyping and small embedding models. Not recommended for regular persona training workloads.


Maximal Mac Mini — ⭐ RECOMMENDED

Config: Mac Mini M4 Pro, 48GB RAM, 1TB SSD Price: ~$1,800

What it can do:

  • QLoRA fine-tuning of 7B–8B models comfortably — ~8GB model + 4GB overhead, 36GB headroom
  • LoRA fine-tuning of 7B–8B models (full precision) — ~14GB model + overhead
  • QLoRA on 13B models — ~16GB model + overhead, tight but works
  • Embedding model fine-tuning — trivial, all models fit easily
  • Multiple sequential persona trainings overnight — completely viable
  • Run training while other light workloads continue

What it can't do:

  • Full fine-tuning of 13B+ models
  • QLoRA on 30B+ models (not enough RAM)

Key specs: 20 GPU cores, 273 GB/s bandwidth

Training times (estimated):

TaskTime
7B QLoRA, 1K steps, batch 4–815–30 minutes
7B LoRA (FP16), 1K steps30–60 minutes
13B QLoRA, 1K steps, batch 1–245–90 minutes
Embedding fine-tune (300M), 3 epochs, 5K pairs5–10 minutes
spaCy NER training, 5K examples5–10 minutes
5 persona LoRA adapters (sequential)1.5–3 hours total

Verdict: This is the sweet spot for Guillaume. Handles all realistic training workloads. Weekend self-training is completely viable. ~$1,800 is reasonable for a dedicated development/training machine.

Upgrade option: 64GB RAM ($2,700) provides headroom for 13B models, parallel processes, and future-proofing.


Alternative: Mac Studio / Mac Pro

If Mac Mini is insufficient (it shouldn't be for Guillaume's current needs, but for future scaling):

Mac Studio M4 Max — For 13B–30B Training

Config: Mac Studio M4 Max, 128GB RAM Price: ~$2,500–$4,000 (depending on storage/config) Memory bandwidth: 546 GB/s (2× Mac Mini M4 Pro) GPU cores: 40

Unlocks:

  • QLoRA on 30B models comfortably (40GB model + overhead in 128GB)
  • Full LoRA on 13B models without compromise
  • Multiple concurrent training jobs
  • ~2× faster training than M4 Pro due to doubled bandwidth and GPU cores

Mac Studio M4 Ultra — For Maximum Local Training

Config: Mac Studio M4 Ultra, 192GB RAM Price: $3,999+ base, ~$6,000–$8,000 maxed Memory bandwidth: 800+ GB/s GPU cores: 80

Unlocks:

  • QLoRA on 70B models (fits in memory)
  • Full fine-tuning of 13B models
  • Apple claims handling 600B+ parameter models in memory (inference)
  • Multiple concurrent training runs for different personas

When Guillaume needs this: If he moves to fine-tuning 30B+ models, or needs to train multiple personas in parallel rather than sequentially, or wants to do continual pretraining.

Summary Decision Matrix

NeedMac Mini M4 (24GB)Mac Mini M4 Pro (48GB)Mac Studio M4 Max (128GB)
Embedding fine-tuning
7B QLoRA persona adapters⚠️ Tight✅ Comfortable✅ Overkill
13B QLoRA⚠️ Tight✅ Comfortable
30B QLoRA
Weekend batch training (5 personas)⚠️ Slow✅ 2–3 hours✅ 1–1.5 hours
Price~$999~$1,800~$3,000+

Training Time Estimates

LoRA/QLoRA Fine-Tuning (LLMs)

Based on community benchmarks and framework documentation:

ModelMethodHardwareStepsBatchTimeSource
Mistral 7B (4-bit)QLoRAM2 Pro 16GB1,0004~30 minmarkaicode.com
Mistral 7B (4-bit)QLoRAM1 Max 64GB1,0008~15 minrandalscottking.com
Llama 7B (FP16)LoRAM2 Ultra1,0004~35 minml-explore/mlx-examples (475 tok/s)
Llama 7B (FP16)LoRAM1 Max 32GB1,0001~67 minml-explore/mlx-examples (250 tok/s)
Llama 8B (4-bit)QLoRAM4 Pro 48GB (est.)1,0008~10–20 minExtrapolated from M2 benchmarks + 2× bandwidth
13B (4-bit)QLoRAM4 Pro 48GB1,0002~45–90 minCommunity estimates

Iteration speed: 0.1–0.3 iter/s on 16GB M1/M2 Pro; 0.5–1.5 iter/s on M4 Pro 48GB (estimated based on bandwidth scaling).

Scaling rule of thumb: Training time is roughly inversely proportional to memory bandwidth. M4 Pro (273 GB/s) is ~2.3× faster than M4 base (120 GB/s) and ~2× slower than M4 Max (546 GB/s).

Embedding Model Fine-Tuning

ModelData SizeEpochsHardwareTime
all-MiniLM-L6-v2 (22M params)1,000 pairs3Any Mac (MPS)1–3 min
all-MiniLM-L6-v2 (22M params)10,000 pairs3Any Mac (MPS)5–15 min
embeddinggemma-300M1,000 pairs3M4 Pro (MLX)3–5 min
embeddinggemma-300M10,000 pairs3M4 Pro (MLX)10–20 min
Qwen3-Embedding-0.6B5,000 pairs3M4 Pro (MLX)10–15 min

Embedding models are small enough that training is fast on any Apple Silicon Mac.

Is Weekend/Overnight Training Viable?

Absolutely yes. Example weekend training schedule on Mac Mini M4 Pro 48GB:

TimeTaskDuration
Friday 10 PMData preparation scripts run30 min
Friday 10:30 PMPersona 1 QLoRA training (7B, 1K steps)20 min
Friday 10:50 PMPersona 2 QLoRA training20 min
Friday 11:10 PMPersona 3 QLoRA training20 min
Friday 11:30 PMPersona 4 QLoRA training20 min
Friday 11:50 PMPersona 5 QLoRA training20 min
Saturday 12:10 AMEmbedding model fine-tuning15 min
Saturday 12:25 AMNER model training10 min
Saturday 12:35 AMExport all adapters to GGUF, rebuild Ollama models15 min
Total~2.5 hours

The entire pipeline completes before midnight. Machine is available for other work by Saturday morning. A cron-driven train.sh script could automate this entirely.


Plugins Supporting Training Workflows

Ollama — Serving Fine-Tuned Models ✅

Ollama supports importing custom GGUF models and LoRA adapters via Modelfile:

FROM ./llama-3.1-8b.Q4_K_M.gguf
ADAPTER ./persona-mia.lora
SYSTEM "You are Mia, the architectural thinker..."
ollama create mia-persona -f Modelfile
ollama run mia-persona

Workflow: Train LoRA adapter with MLX-LM → export to GGUF/LoRA format → import into Ollama → serve via API. This is a proven, documented workflow.

LoRA merging: Can also merge LoRA into base weights pre-export for single-file deployment. MLX-LM provides fuse.py for this.

HuggingFace Hub Integration

Both MLX-LM and mlx-tune support:

  • Downloading base models from HuggingFace (automatic conversion to MLX format)
  • Uploading fine-tuned models/adapters to HuggingFace Hub (push_to_hub())
  • mlx-community namespace has pre-converted MLX weights for popular models

Guillaume could maintain private HuggingFace repos for each persona's adapter weights, enabling version control and rollback of training.

Dataset Preparation Tools

ToolPurposeMac Support
AxolotlYAML-driven fine-tuning orchestration pipeline✅ Works on Mac
Label StudioManual data annotation/labeling✅ Web-based, runs anywhere
nlpaugText augmentation for training data✅ Python library
pandas + custom scriptsJSONL generation from QMD knowledge base✅ Trivial

For Guillaume's specific workflow: A Python script that reads his QMD markdown files, extracts domain-specific text, generates training pairs (for embedding fine-tuning) or instruction examples (for persona LoRA training), and outputs JSONL files. This is a straightforward scripting task, not requiring specialized tools.

Training Orchestration

No Mac-specific training orchestrator exists as a plugin, but the workflow is scriptable:

#!/bin/bash
# weekend-training.sh — run via cron on Friday night

# 1. Generate training data from latest QMD content
python scripts/generate_training_data.py

# 2. Fine-tune each persona adapter
for persona in mia miette tushell council; do
    python -m mlx_lm.lora \
        --model mlx-community/Meta-Llama-3.1-8B-Instruct-4bit \
        --data data/${persona}/ \
        --train \
        --iters 1000 \
        --adapter-file adapters/${persona}.npz
done

# 3. Fine-tune embedding model for QMD
python scripts/train_embeddings.py

# 4. Export and rebuild Ollama models
for persona in mia miette tushell council; do
    python -m mlx_lm.fuse \
        --model mlx-community/Meta-Llama-3.1-8B-Instruct-4bit \
        --adapter-file adapters/${persona}.npz \
        --export-gguf
    ollama create ${persona}-persona -f Modelfiles/${persona}
done

echo "Training complete: $(date)" >> training.log

Evidence Quality

High Confidence (multiple corroborating sources, benchmarks)

  • MLX-LM LoRA/QLoRA training works on Apple Silicon — confirmed by Apple's official examples, multiple community guides, and r/LocalLLaMA reports
  • Training times for 7B QLoRA: 10–30 minutes — consistent across multiple independent sources
  • Memory requirements for quantized models — well-documented in MLX-LM docs and community reports
  • Mac Mini M4 Pro pricing and specs — Apple official specs
  • Ollama GGUF/LoRA import workflow — documented in Ollama repo

Medium Confidence (extrapolated from related data)

  • M4 Pro specific training speeds — extrapolated from M2/M3 benchmarks scaled by memory bandwidth ratio (reasonable methodology)
  • Embedding model fine-tuning times on MLX — mlx-tune is newer, fewer community benchmarks; times estimated from PyTorch MPS equivalents and model size scaling
  • QMD embedding model fine-tunability — embeddinggemma-300M is a standard Gemma embedding model; fine-tuning Gemma-class models is well-documented, but specific QMD integration path needs testing

Lower Confidence (limited data, requires verification)

  • Mac Studio M4 Ultra 512GB RAM claims — some sources mention this but official Apple specs show 192GB max for current models; may be future spec or custom config
  • llama.cpp native training maturity — evolving rapidly; status may have changed
  • Core ML training capabilities beyond basic MLUpdateTask — Apple documentation is sparse

Sources

Framework Documentation

  1. ml-explore/mlx-examples LoRA — Apple's official LoRA fine-tuning example with benchmarks
  2. ml-explore/mlx-lm — Apple's LLM package for MLX
  3. ARahim3/mlx-tune — Unsloth-compatible fine-tuning for Mac (LLM, Vision, Audio, Embedding, OCR)
  4. PyTorch MPS docs — Official MPS backend documentation
  5. HuggingFace Apple Silicon guide — Trainer + MPS integration
  6. sentence-transformers training guide — Embedding model fine-tuning

Benchmarks and Guides

  1. markaicode.com — Run and Fine-Tune LLMs on Mac with MLX-LM — Comprehensive MLX-LM guide with training times
  2. randalscottking.com — Fine-Tuning LLMs on Mac: Complete MLX Framework Guide — Step-by-step training guide
  3. blog.amsayed.dev — Fine-Tuning Your First LLM on Apple Silicon — Practical walkthrough
  4. DZone — Fine-Tuning LLMs Locally Using MLX LM — Technical guide
  5. dev.to/starmorph — Apple Silicon LLM Optimization Complete Guide — Performance optimization

Hardware and Pricing

  1. Apple Mac Mini Specs — Official specifications
  2. MacPrices.net — Mac Mini pricing history
  3. Apple Mac Studio Newsroom — Mac Studio M4 Ultra announcement
  4. Mac Studio Tech Specs — Official Mac Studio specifications
  5. Apple M4 Pro/Max announcement — Chip specifications

Community Data

  1. r/LocalLLaMA community benchmarks analysis — ~10,000 benchmark runs analysis
  2. Towards Data Science — Local LLM Fine-Tuning on Mac (M1 16GB) — Practical success story
  3. like2byte.com — Mac Mini M4 Pro 64GB: Real 30B LLM Benchmarks — Real-world benchmarks

QMD-Specific

  1. DeepWiki — QMD Vector Embeddings — QMD embedding model details (embeddinggemma-300M)
  2. tobi/qmd GitHub — QMD source repository

Research compiled by Agent C — Apple Silicon Training Capabilities For Agent A (inference) and Agent B (plugin review), see companion documents.