IAIP Research

Verbatim transcript (cleaned and clearly marked)[1]

[Unknown Speaker A]

Okay, so I'm getting up. On my computer, there are various very interesting pieces of work that I started. Where I want to go with this is to find a way that I could be considered as a participant in something like the implementation, because in the future these agents would probably be in the loop. Or they would have something that watches their sessions, or rather their messages, to see if they receive something. If they receive a message, it influences the continuity of their sessions, of their work.

Therefore, if I'm a participant and there can be messages left, or if I can leave messages, I think we might be conceiving of a routing agent or something like that. Basically, I'm speaking to my watch and recording an audio, and we will need somewhere to drop this audio so it can be transcribed and processed accordingly, in whatever way it is received. Maybe it is going to be through an iOS shortcut that would receive the recording.

This server has intelligence and is capable of decomposing what I spoke about. It probably has access to what is going on. Maybe in the first version it is not going to be capable of doing everything, but I'm going to speak it out here anyway.

Optimally, these sessions are in an awaiting mode for continuing and are triggered by a change, something decomposed and organized by the routing agent. That is what would be receiving my… I do not know what to call this, but right now I am doing some physical exercise and it makes me see that I really need this interface, or whatever we call that. I am open to having it well named by whoever will interpret this, to create some kind of handing off to the local agent.

I am going to implement something. I think that one of the constraints right now is being local on the computer. When I get up, I can launch a server pretty quickly. Inside of it, there are ways to input and give my content there for processing.

I would be interested in what skills I need to put into motion to do all of that. I know that in these systems there is a relationship with state machines, because yesterday I did some interesting reading and there is something about state machines and large language models with graphs. There is something called LangGraph and LangChain. I have these certifications in progress on the Udemy platform, which probably will not be recognized adequately, but that is not that important. There are libraries and code bases being constructed to help advance the work.

At the level of abstraction, we want a reflection layer that would prepare a handing off to work within our existing libraries, to upgrade them or create new ones. This service or server interface, whatever we call this, is something I am thinking about while getting up and not looking at the screen. I might look at the screen, but I might also not.

There could also be another way to interact with it. In iOS Shortcuts, it is possible to do a loop over a certain number of iterations. It is possible to present a menu that leads into some form of sub‑functions. If I use the dictation tools, I can have something transcribed and sent to an API right away.

There are three modes: when you press the button; when there is a short silence; or when there is a long silence that stops dictating and passes the input to the next node in the workflow graph. This is one of the main possibilities that I would want to support.

There is also something about handing off to myself: from being on the desktop to being in exercise. Right now I am doing an exercise with my shoulder; it is pretty good. It is very important to get up and move.

Given that this application could exist, we already have some code on the server that could probably be used as a platform to call or do something. We do not necessarily have to start from scratch.

My point here is that either I am starting fresh—meaning I get up and start recording, I start my iOS shortcut—or there are other possibilities. When I am getting up, there could be a process that gets triggered on the computer and that prepares something in storage. When I start my iOS shortcut, it would then “grab your handing‑off package.”

When the handoff package is retrieved, it is a sequential thing where I can give verbal feedback on each piece of content. But that increases the level of complexity, because it raises a question like: what am I going to input there? That is a good question.

We are creating a system, and often the formulations of the desired outcome are not to my liking. If I am told where we are and what outcome we are trying to achieve, and then I am asked anything, I will speak content that could be decomposed into anything relevant.

There could also be a mode on the computer where, every time content is received, my agents work on it, or it just sits there. When I am finished doing my workout, I sit down at the computer and press a button like “ingest,” and it injects the content into whatever session.

I think there would be a routing agent running just for this during the whole session while I am up. It would probably loop in an iteration and, each minute, call to see if there is something to ingest, with proper instructions to do its job.

Hopefully, any other agents that have sessions are registered and have the Model Context Protocol tools, and they know whether they have received a message or not. When I am sitting at the computer, there could be a visual representation of where the messages composed by the routing agent are going to be sent. With something really easy to use visually, I would be able to accept or reject items, or input more content with my voice.

I will go with that as a closing statement. I know I am going to send this to at least Perplexity for processing, with some instructions about the fact that I am creating something here.

We might need— I might need to give you the published package pages, so you can query them and know what they are doing. I think I will want to work on reusable components. In the end, we will have this server running, doing what we talked about and what is implied here.

In my output, I am going to want a verbatim of what I just spoke about here, so I can use it with my local agent, with some tools that you do not have access to. Then probably another section where you will produce anything that could represent the load of what we talked about to local agents.


Revised and extended specification for local agents

1. Overall purpose

Design and implement a local, desktop‑anchored agentic system that:

  • Lets me speak into a watch or phone while away from the keyboard.
  • Routes, structures, and stores that content.
  • Hands it off to existing or new local agent sessions that continue my work.
  • Supports back‑and‑forth “handoff to self” between desktop and movement/exercise contexts.

The system must run locally by default and integrate with existing code and libraries, including graph/state‑machine‑style orchestration (e.g., LangGraph, LangChain).
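The lifecycle described above can be sketched as a plain‑Python state machine. This is only an illustration of the graph/state‑machine idea; a real build might use LangGraph's `StateGraph` instead, and the state names and transition table here are assumptions, not part of the source.

```python
# Minimal sketch of the orchestration idea as an explicit state machine.
# States describe one handoff package moving through the system; the
# names and legal transitions are illustrative assumptions.

TRANSITIONS: dict = {
    "captured":       {"decomposed"},          # raw utterance stored
    "decomposed":     {"routed"},              # split into atomic units
    "routed":         {"pending_review"},      # destinations proposed
    "pending_review": {"ingested", "routed"},  # user may re-route
    "ingested":       set(),                   # delivered to sessions
}

def advance(state: str, target: str) -> str:
    """Move to `target` if the transition is legal, else raise."""
    if target not in TRANSITIONS.get(state, set()):
        raise ValueError(f"illegal transition {state} -> {target}")
    return target
```

Making transitions explicit like this is what lets the routing agent, handoff lifecycle, and ingestion appear as named nodes and edges later in section 4.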


2. Main actors

  • Human user (me)
    • Speaks ideas, status, and instructions while moving or exercising.
    • Reviews and curates routed messages when back at the desktop.
  • Mobile capture client
    • Implemented initially via iOS Shortcuts (and watch‑triggered recording).
    • Handles audio recording, dictation/transcription (either device‑side or via API), and POSTs the resulting text/audio to the local server endpoint.
    • Supports three dictation modes:
      • Button‑press to end input.
      • Short silence timeout.
      • Long silence timeout that ends dictation and passes the content to the next node in the workflow graph.
  • Local server
    • Runs on the desktop/laptop, launched quickly when I get up or log in.
    • Provides HTTP/IPC endpoints for:
      • Receiving new recordings or transcripts.
      • Polling and managing “handoff packages”.
      • Exposing a UI for reviewing routing decisions and session status.
  • Routing agent
    • A long‑running process or loop, scoped to the current “up and moving” session.
    • Periodically checks (e.g., once per minute) for new content in an inbox/storage.
    • Decomposes each recording into structured items (messages, tasks, notes, questions, etc.).
    • Decides which active or latent agent sessions should receive which items.
    • Creates and maintains a “handoff package” for the mobile side and for the later desktop review.
  • Work agents / sessions
    • Existing or new agent sessions that maintain context about specific projects, documents, or workflows.
    • Are MCP‑aware and can receive routed messages as part of their context.
    • Expose actions like “ingest message”, “update plan”, “create task”, “respond with questions”, etc.
  • Desktop review UI
    • Visual interface that shows:
      • Incoming messages from mobile capture.
      • Proposed routing decisions (message → target session).
      • Current state of each session.
    • Allows me to:
      • Accept or reject routings.
      • Edit or annotate messages.
      • Trigger ingestion into sessions.

3. Core workflows

3.1 Starting a movement/exercise session

  1. I get up from the computer.
  2. Optionally, a desktop trigger fires (login hook, hotkey, manual button) that:
    • Starts or resumes the local server (if not already running).
    • Generates or updates a “handoff package” containing:
      • Current active sessions.
      • Their brief status and goals.
      • Any queued items needing feedback.
  3. I start the iOS Shortcut / watch capture flow.
  4. The shortcut optionally fetches the current handoff package summary to present a minimal menu (e.g., “Project A, Project B, General Inbox”).
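The handoff package generated in step 2 could be modeled as a small dataclass. The field names below are assumptions; the source only specifies that the package holds the active sessions, their status and goals, and queued items needing feedback.

```python
# Sketch of the "handoff package" snapshot built when a movement
# session starts. Field and function names are illustrative.

import json
import time
from dataclasses import dataclass, field, asdict

@dataclass
class SessionSummary:
    id: str
    name: str
    status: str   # e.g. "active", "awaiting_input"
    goal: str

@dataclass
class HandoffPackage:
    created_at: float = field(default_factory=time.time)
    sessions: list = field(default_factory=list)
    queued_items: list = field(default_factory=list)

    def to_json(self) -> str:
        # Serializable form for the /handoff/current endpoint.
        return json.dumps(asdict(self), indent=2)

def build_handoff_package(registry: list, queue: list) -> HandoffPackage:
    """Snapshot current sessions and pending feedback items."""
    return HandoffPackage(sessions=list(registry), queued_items=list(queue))
```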

3.2 Capturing content while moving

  1. I speak into the watch/phone using the shortcut’s dictation step.
  2. The shortcut captures audio and/or text, applies its mode (button, short silence, long silence) to delimit the utterance.
  3. The shortcut POSTs the utterance to the local server’s /ingest/raw endpoint with metadata:
    • Timestamp.
    • Optional user‑chosen tag (project/inbox).
    • Dictation mode used.
  4. The server stores the raw item in an “Incoming” collection for the routing agent.
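A framework‑agnostic sketch of the `/ingest/raw` handler, under the assumption that storage is an in‑memory list standing in for a real collection; the record fields mirror the metadata listed above, and all names are illustrative.

```python
# Sketch of the /ingest/raw handler: the shortcut POSTs an utterance
# plus metadata, and the server appends it to an "Incoming" collection
# for the routing agent to pick up. In-memory store is a stand-in.

import itertools
import time
from typing import Optional

INCOMING: list = []           # stand-in for real storage
_ids = itertools.count(1)

def ingest_raw(text: str, tag: Optional[str] = None,
               dictation_mode: str = "button") -> dict:
    """Store one raw utterance and return the stored record."""
    record = {
        "id": f"raw-{next(_ids):04d}",
        "timestamp": time.time(),
        "text": text,
        "tag": tag,                       # optional project/inbox hint
        "dictation_mode": dictation_mode, # button / short / long silence
        "status": "incoming",             # awaiting the routing agent
    }
    INCOMING.append(record)
    return record
```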

3.3 Routing and decomposition

  1. The routing agent loop wakes periodically (e.g., once per minute).
  2. For each new incoming item:
    • Calls an LLM/tooling pipeline to:
      • Clean transcription.
      • Segment into atomic units (messages, tasks, reflections).
      • Extract intent, project cues, and priority.
    • Proposes a mapping from each unit to:
      • Specific agent session(s), or
      • A general “inbox” or “to‑triage” bucket.
  3. The routing agent updates the “handoff package” with:
    • List of units.
    • Proposed destinations.
    • Short summaries for UI display.
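One wake‑up of the routing loop in 3.3 can be sketched as a pure function. The decomposition and routing steps are injected as callables so an LLM pipeline can replace the stubs; the function and field names are assumptions.

```python
# Sketch of one wake-up of the routing-agent loop: take new incoming
# items, decompose each into units, and propose destinations. Falls
# back to a general "inbox" bucket when no destination is proposed.

from typing import Callable

def route_pending(incoming: list,
                  decompose: Callable[[str], list],
                  propose: Callable[[dict], list]) -> list:
    """Process items still marked 'incoming'; return routed units."""
    routed = []
    for item in incoming:
        if item.get("status") != "incoming":
            continue
        for unit in decompose(item["text"]):
            unit["source_raw_id"] = item["id"]
            unit["proposed_destinations"] = propose(unit) or ["inbox"]
            unit["status"] = "pending_review"
            routed.append(unit)
        item["status"] = "decomposed"
    return routed
```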

3.4 Optional mobile‑side feedback on handoff package

  1. The mobile shortcut can request the current handoff package (e.g., /handoff/current).
  2. The shortcut presents a simple loop/menu over units:
    • “Item 1: [summary]. What do you want to do?”
    • Choices: confirm routing, change destination, discard, add comment.
  3. My verbal responses are transcribed and sent back as small control messages, updating the routing decisions and annotations.

3.5 Desktop review and ingestion

  1. After finishing exercise, I return to the desktop.
  2. In the desktop UI I see:
    • A list of units from the last session.
    • Proposed routing.
    • Any comments I added while mobile.
  3. I can:
    • Accept routing in bulk or per unit.
    • Edit text units.
    • Merge or split units as needed.
  4. On “Ingest”, the system:
    • Sends each accepted unit to the corresponding agent session via a standard message API.
    • Logs the event in a session history.
  5. Agents update their internal plans, data structures, or artifacts accordingly.
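Step 4 of this workflow (“Ingest”) can be sketched as a dispatch function, under the assumption that each session exposes a single receive callable keyed by session id; the names are illustrative.

```python
# Sketch of the "Ingest" step: deliver each accepted unit to its
# destination sessions via a message API, and log each delivery.

from typing import Callable, Dict, List

def ingest_accepted(units: List[dict],
                    sessions: Dict[str, Callable[[dict], None]],
                    log: List[str]) -> int:
    """Deliver accepted units; return how many deliveries were made."""
    delivered = 0
    for unit in units:
        if unit.get("status") != "accepted":
            continue
        for dest in unit.get("proposed_destinations", []):
            if dest in sessions:
                sessions[dest](unit)      # message API call
                log.append(f"{unit['id']} -> {dest}")
                delivered += 1
        unit["status"] = "ingested"
    return delivered
```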

4. Key design constraints

  • Local‑first
    • Server and agents run locally.
    • Network access (for APIs, cloud models) is optional and configurable.
  • State‑machine / graph‑oriented orchestration
    • Workflows are represented as graphs or state machines (e.g., using LangGraph/LangChain frameworks or equivalents).
    • The routing agent, handoff package lifecycle, and ingestion are explicit nodes/transitions in these graphs.
  • MCP‑aware sessions
    • Sessions adopt a Model Context Protocol‑like interface:
      • receive_message(message)
      • get_context()
      • list_tools() / invoke_tool()
    • Sessions can signal whether they expect more messages or are dormant.
  • Reusable components
    • Routing logic, handoff package management, and desktop UI must be factored as reusable modules that can be attached to different projects and codebases.
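The MCP‑style interface in section 4 can be expressed as a structural type. The method names come from the spec; the `Protocol` formulation, the dormancy method name, and the toy conforming session are assumptions about one way a concrete implementation might look.

```python
# Sketch of the MCP-style session interface. Any object implementing
# these methods can be registered with the session manager.

from typing import Any, Protocol

class McpSession(Protocol):
    def receive_message(self, message: dict) -> None: ...
    def get_context(self) -> dict: ...
    def list_tools(self) -> list: ...
    def invoke_tool(self, name: str, **kwargs: Any) -> Any: ...
    def expects_more_messages(self) -> bool: ...  # dormancy signal

class NoteSession:
    """Tiny conforming example: collects messages into a context."""
    def __init__(self) -> None:
        self.messages: list = []
    def receive_message(self, message: dict) -> None:
        self.messages.append(message)
    def get_context(self) -> dict:
        return {"messages": self.messages}
    def list_tools(self) -> list:
        return ["append_note"]
    def invoke_tool(self, name: str, **kwargs: Any) -> Any:
        if name == "append_note":
            self.receive_message({"text": kwargs.get("text", "")})
    def expects_more_messages(self) -> bool:
        return True
```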

5. Suggested component breakdown

  • API layer
    • /ingest/raw – accept new raw content from mobile or other clients.
    • /handoff/current – fetch current handoff package summary.
    • /handoff/update – apply mobile or desktop feedback on units.
    • /sessions/list – list active/inactive sessions.
    • /sessions/ingest – send curated units into sessions.
  • Storage layer
    • Collections/tables for:
      • Raw utterances.
      • Decomposed units.
      • Handoff packages (per time window / movement session).
      • Session registry and metadata.
  • Routing engine
    • LLM‑powered classification + rule‑based overrides.
    • Functions: decompose_utterance, propose_routing, apply_feedback, finalize_units_for_ingestion.
  • Session manager
    • Registers sessions, with fields like id, name, project, status, mcp_capabilities.
    • Provides a uniform interface for message delivery.
  • Desktop UI
    • Views:
      • “Today’s mobile notes” (list of units).
      • “Routing overview” (matrix of units vs sessions).
      • “Session detail” (timeline of messages).

6. Example message/unit schema

```json
{
  "id": "unit-2026-02-22-001",
  "source_raw_id": "raw-2026-02-22-001",
  "timestamp": "2026-02-22T16:16:00Z",
  "text": "I want a routing agent that polls every minute and sends my notes to active sessions.",
  "type": "design_intent",
  "project_hint": "agent-orchestration",
  "priority": "normal",
  "proposed_destinations": ["session-design-docs", "session-impl-tasks"],
  "status": "pending_review",
  "annotations": [
    {
      "author": "user",
      "channel": "mobile",
      "text": "This belongs to the architecture document.",
      "timestamp": "2026-02-22T16:17:00Z"
    }
  ]
}
```

Agents can rely on this structure to ingest and act on content systematically.
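A minimal loader for this schema might check the required fields before an agent acts on a unit. The choice of which fields are required is an assumption read off the example above.

```python
# Sketch of loading one unit and failing loudly if a required field
# from the section-6 schema is missing. Required set is an assumption.

import json

REQUIRED = {"id", "source_raw_id", "timestamp", "text", "type",
            "proposed_destinations", "status"}

def parse_unit(raw: str) -> dict:
    """Parse one serialized unit and validate required fields."""
    unit = json.loads(raw)
    missing = REQUIRED - unit.keys()
    if missing:
        raise ValueError(f"unit missing fields: {sorted(missing)}")
    return unit
```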


7. Tasks for local agents

  • Implement the local server with the endpoints and storage model described above.
  • Implement the routing agent loop and its decomposition + routing logic.
  • Implement the session manager with MCP‑style interfaces.
  • Implement a minimal desktop UI for reviewing and ingesting routed units.
  • Provide example configurations for different projects (e.g., research synthesis, code architecture, narrative design) that reuse the same routing/handoff infrastructure.

Footnotes

  1. file-1.m4a