
Why Apple Messages Might Be the Best Front Door to Your AI System

Beginner to intermediate | Set up once, then iterate continuously | @_chrisbarnhart
Result

Connecting a Claude Code Agent to iMessage + Family Calendar PDF / basketball scouting / scheduled actions, all triggerable and reportable via text message | Personal AI infrastructure workflow

For

People who want to integrate their personal AI agent into messaging, home automation, and daily workflows.

How I built Hopper: a textable Claude Code agent that can run local tools, control my home, check schedules, manage workflows, and report back through @Apple iMessage.

The more AI can do, the worse the app model starts to feel.

That is the strange thing about agentic AI. As these systems become more capable, the question is no longer just whether the model can answer you. It is whether the system can act where your life and work already happen.

The industry is moving quickly in that direction. AI products are shifting from chatbots that respond to prompts into agents that can use tools, connect to apps, retrieve context, inspect files, run commands, generate artifacts, trigger workflows, and coordinate work across systems.

The center of gravity is moving from “What can the model answer?” to “What can the system do?” But even as AI systems become more agentic, most of them still make the same assumption: you should start inside their app.

That works when AI is mostly helping you think. It starts to break when AI is supposed to help you operate.

The work people care about usually does not begin inside an AI product. It lives in the individual context silos of everyday life: calendars, messages, files, reminders, local scripts, home automations, saved preferences, recurring workflows, and all the small operational systems that make life specific to you.

That context is what makes AI personal.

The harness is what makes it powerful.

A useful agent needs both. It needs to understand the shape of your life, and it needs access to the tools required to do something useful with that understanding.

That is why the front door matters.

The best AI interface is not just a better place to type. It is a way into your individual context layer and your action harness at the same time.

That is what made Apple Messages interesting to me. Not because it is the most advanced interface or has the most features. It is better than that in one very specific way.

I already use it daily.

Messages is on my phone, my watch, my laptop, and my desktop. It supports async communication, notifications, attachments, voice dictation, threads, and a mental model everyone already understands. I do not have to remember to open it. I do not have to learn it. I do not have to convince my family to adopt it.

It is just there.

So instead of building another AI destination, I started thinking about Apple Messages as a front door. Behind that front door is Hopper: a home AI operating system built around Claude Code running on my Mac mini, connected to my local tools, files, scripts, automations, memories, and family workflows.

The text thread is simple. The system behind it is powerful.

When I send a message, I am not just prompting a chatbot. I am invoking an AI runtime that can reason through the request, decide what needs to happen, route work through the right tools, generate artifacts, trigger workflows, and send the result back through the same conversation.

The message can be casual. The system it reaches does not have to be.

That is what makes the pattern feel different. Apple Messages becomes the universal input layer for a much deeper local system. Behind it, Hopper can call MauriceOS, inspect calendars, generate PDFs, run scripts, check logs, interact with home automations, manage scheduled actions, and even route specialized work to other AI coding agents like Codex or Gemini when that is the better tool for the job.

The more I used it, the more obvious the next layer became.

A system like this should not simply respond. It should learn from use. Every interaction creates signal: what worked, what missed, what needed correction, what became a repeatable pattern, and what should become a better default next time. Hopper can use that signal to continuously refine its memory structure, improve its workflows, preserve useful context, and make the experience feel more personal over time.

That is where the system starts to compound. It does not just answer the next request. It gets better at understanding how I work, what my family needs, which outputs are useful, and which patterns are worth turning into durable tools.

My personal harness: Hopper

I wanted a powerful AI operating system at home that actually does helpful things for my busy family of 5.

Hopper is a FastAPI service running on a Mac mini on my home LAN. It wraps Anthropic's claude-agent-sdk (Claude Code as a library, not the CLI) and exposes several ingress paths: a streaming web endpoint, an endpoint for the Apple Watch, and a multipart iMessage webhook, all of which funnel into the same agent runtime with the same project context and the same rolling per-client session map. The model is one component. The interesting engineering is everything around it: ingress, session state, tool routing, permissioning, a scheduling daemon, and a return path back through Apple Messages. The rest of this piece walks through that surface area, because for an engineering audience the harness is the product.

The Calendar Moment

The first time Hopper really clicked for my family was not a flashy demo. It was a simple request from my wife to print out our combined calendars for her notepad.

My wife likes writing things down. She is a note taker, and for some planning workflows, pen and paper still work better than screens. Our family calendar has the usual sprawl: kids' activities, school events, sports practices, games, appointments, travel timing, and all the small logistics that come with running an active household.

I did not want to open a calendar app, export events, design a layout, format a printable document, add space for notes, save it as a PDF, and then do the same thing all over again next month.

So I texted my house and asked for the thing I needed.

I sent Hopper a message asking it to review the family calendars and generate a beautiful, professional PDF with room for handwritten notes. Hopper pulled the calendar context, created the document, and returned the finished artifact through the same message thread.

That is a small example, but it captures the bigger shift.

The calendar request resolves to a parameterized Python script that pulls events from the MauriceOS calendar API and renders a landscape, one-page-per-month PDF with a fixed house theme: Monday-start weeks, weekend tint, holiday styling, all-day chips, a per-month highlights list, and a lined notes panel.
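The deterministic part of that layout can be sketched in a few lines. This is a minimal illustration, assuming Python's standard `calendar` module for the grid; the function name `month_grid` and the cell fields are hypothetical, and the real script's event fetching and PDF rendering are not shown.

```python
import calendar


def month_grid(year: int, month: int) -> list:
    """Build Monday-start week rows for one month page (hypothetical helper)."""
    cal = calendar.Calendar(firstweekday=calendar.MONDAY)
    rows = []
    for week in cal.monthdatescalendar(year, month):
        rows.append([
            {
                "date": day,
                "in_month": day.month == month,  # padding days belong to adjacent months
                "weekend": day.weekday() >= 5,   # Saturday=5 and Sunday=6 get the tint
            }
            for day in week
        ])
    return rows


# One page of the calendar: every row is a full Monday-to-Sunday week.
grid = month_grid(2025, 6)
```

Because the grid is plain data, the same rows can feed any renderer, which is what keeps the layout identical from month to month.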

Claude Code's job is not to lay out a PDF token by token; it is to recognize intent, select the right tool, invoke it with the right arguments, and hand back the artifact. The deterministic work lives in code; the model supplies the routing and the judgment.

The finished file goes back to Messages as a multipart upload to a Maurice "Message" shortcut: the server saves the upload and injects a URL parameter into the shortcut input, and the shortcut attaches the file to an iMessage. That separation (model for intent, code for deterministic output, a shortcut for delivery) is the pattern that repeats everywhere in the system.

The best part is that the same system can generate code on demand, so if a task needs a tool that does not exist yet, it will just create it.

The Basketball Moment

The second moment was basketball.

My 17-year-old and 14-year-old sons play AAU basketball, which means our weekends often depend on schedules, opponents, venues, traffic, team quality, and a lot of scattered information. If you have kids in competitive sports, you know how quickly a weekend can turn into an operations problem.

Who are we playing?

Where is the game?

How strong is the opponent?

What time do we need to leave?

Is this a tough matchup or a game we should win?

What should we expect by halftime?

I wanted Hopper to help with that too.

So I started texting my house before tournament weekends and asking it to research the opponents my boys were playing, use the available AAU data, build an Elo-style matchup view, and generate predictions by half.

This is where the front door pattern becomes powerful. I am not opening a scouting app. I am not running a script manually. I am not checking five sites and stitching together the answer. I am sending a message in natural language to a local AI harness that knows which tools exist, what context matters, and where to return the result.

The output is not just a generic answer. It is a family-specific artifact built around a real recurring workflow.

The pipeline behind one text. A basketball preview is a multi-stage data-gathering job, and it is worth being concrete because this is where the harness earns its keep. Hopper resolves each son's team to a stable team number, pulls the newest cached stats export from a local scouting service on a second Mac, parses the opponent out of the schedule entry, and resolves it through a division-scoped team search. When an opponent has no cached export, Hopper kicks off a scrape job, polls a status endpoint until the scrape completes, and only then assembles the matchup view. It builds an HTML card per game with deep links into each team's stats, flags the toughest matchup, and, for recurring weekend runs, archives a dated PDF into a known folder so there is a history to refer back to. Spelling-variant retries, silent fallback when an opponent link fails, and idempotent caching are all encoded in the workflow, not improvised by the model each time. The model orchestrates; the tools and conventions make the orchestration reliable and repeatable.
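Two of those conventions, polling a scrape job and retrying spelling variants, might look roughly like this. It is a sketch under stated assumptions: the status strings (`"complete"`, `"failed"`), the interval and timeout defaults, and both function names are hypothetical, and the callables stand in for the scouting service's real endpoints, which the text does not specify.

```python
import time
from typing import Callable, List, Optional


def wait_for_scrape(
    get_status: Callable[[], str],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the job reports 'complete'; return False on failure or timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_status()  # stand-in for a GET on the status endpoint
        if status == "complete":
            return True
        if status == "failed":
            return False
        sleep(interval_s)
    return False


def resolve_opponent(
    name: str,
    search: Callable[[str], Optional[str]],
    variants: Callable[[str], List[str]],
) -> Optional[str]:
    """Try the raw name, then spelling variants; None means fall back silently."""
    for candidate in [name, *variants(name)]:
        team_id = search(candidate)
        if team_id is not None:
            return team_id
    return None
```

Passing the status check and the search in as callables keeps the retry logic testable without a live scouting service.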

The value is not that Hopper can talk about basketball. Any model can talk about basketball. The value is that Hopper can participate in the actual workflow around our basketball weekends.

It can connect the schedule, the teams, the available data, the matchup logic, and the travel context, then generate the final report.

Hopper Is Not the Model

This is the most important architectural point.

Hopper is not the model.

Hopper is the harness around the model.

Claude Code provides the primary reasoning loop, but Hopper is the system that gives that reasoning loop a place to operate. It provides the interface, the routing, the project context, the local tools, the memories, the permissions, the action layer, and the return path.

That distinction matters because most useful work is not just a reasoning problem. It is an orchestration problem.

A model can produce a great answer in isolation. But the workflows I care about require more than an answer. They require the system to know where the data lives, which tools are available, what actions are allowed, how to generate the output, and how to get the result back to me.

That is why the harness matters.

The harness is what turns intelligence into leverage.

Without the harness, the model can describe the calendar PDF. With the harness, it can create it.

Without the harness, the model can explain how to scout an opponent. With the harness, it can become part of the scouting workflow.

Without the harness, the model can tell me how to automate a repetitive task. With the harness, it can inspect the current workflow, propose an improvement, and help turn that improvement into a script, a shortcut, a scheduled action, or a change to the system itself.

Grounding is a file, not a fine-tune. Hopper does not customize the model. It loads a single, hand-maintained grounding document as project context by pointing the agent's working directory at the project root.

That file is the operating manual: every API endpoint and its quirks, known device and list IDs, URL-encoding gotchas, destructive-operation guardrails, and house-specific conventions like the office ceiling lights being downstream of a fan controller. Layered on top is a file-based memory store, one fact per file with light frontmatter and a loaded index, that persists user preferences, feedback, and project state across sessions. The result is that the same stock model behaves like a system that has worked in my house for months, because the context window is doing the work a fine-tune otherwise would, at a fraction of the cost and with edit-a-file iteration speed.
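A file-based memory store of this shape is small enough to sketch. The example below assumes a `---`-delimited frontmatter with `name` and `type` keys and a `.md` extension; those details, the function names, and the sample fact are all hypothetical, since the text only says "one fact per file with light frontmatter and a loaded index".

```python
import tempfile
from pathlib import Path


def write_memory(store: Path, slug: str, kind: str, fact: str) -> Path:
    """Write one fact as its own file with a light frontmatter header."""
    store.mkdir(parents=True, exist_ok=True)
    path = store / f"{slug}.md"
    path.write_text(f"---\nname: {slug}\ntype: {kind}\n---\n{fact}\n", encoding="utf-8")
    return path


def load_index(store: Path) -> dict:
    """Map memory name -> type by reading only the frontmatter of each file."""
    index = {}
    for path in sorted(store.glob("*.md")):
        lines = path.read_text(encoding="utf-8").splitlines()
        if not lines or lines[0] != "---":
            continue  # skip files that have no frontmatter
        meta = {}
        for line in lines[1:]:
            if line == "---":
                break  # end of frontmatter; the fact body follows
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
        index[meta.get("name", path.stem)] = meta.get("type", "unknown")
    return index


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "memory"
    write_memory(store, "calendar-pdf-layout", "preference", "Landscape, one page per month.")
    index = load_index(store)
```

Keeping each fact in its own file means a correction is a one-file diff, which is what makes the edit-a-file iteration speed possible.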

That is why I think the next wave of AI systems will not be defined only by model quality.

They will be defined by the quality of the harness around the model.

What context does it have?

What tools can it use?

What permissions constrain it?

Where does it live?

How does the user invoke it?

How does it learn from repeated use?

How does it turn a useful interaction into a durable improvement?

Those are product questions, but they are also architecture questions.

How Hopper Works

The easiest way to understand Hopper is to separate the front door from the reasoning engine and the reasoning engine from the action layer.

Apple Messages is the front door.

When I text Hopper, I am sending a message into a local entry point that lives on my own infrastructure. Hopper receives the message, turns it into a Claude Code request, runs the agent against the local project context, and then sends the result back through Messages.

That means the text thread becomes the control surface, but it is not the system itself.

It is the front door.

An inbound text reaches Hopper through an iOS Shortcut on my phone that sends a multipart POST (text plus an optional attachment) to a webhook on the local server. The exchange is fire-and-forget by design: the server acknowledges immediately, runs the turn asynchronously, and texts the reply back out when it is done, so a long job never blocks the sender and never depends on a held HTTP connection.
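The acknowledge-now, reply-later shape can be shown with the standard library alone. This is a sketch, not Hopper's code: `run_turn`, `handle_webhook`, and `send_reply` are hypothetical stand-ins, and the real service does this inside a FastAPI route rather than a bare coroutine.

```python
import asyncio

_background: set = set()  # hold references so running tasks are not garbage-collected


async def run_turn(text: str) -> str:
    """Stand-in for the agent turn (hypothetical); the real one may take minutes."""
    await asyncio.sleep(0.01)
    return f"done: {text}"


async def handle_webhook(text: str, send_reply) -> dict:
    """Acknowledge at once; the turn runs in a background task and replies later."""
    async def worker() -> None:
        try:
            reply = await run_turn(text)
        except Exception as exc:  # report failures instead of going silent
            reply = f"error: {exc}"
        await send_reply(reply)

    task = asyncio.create_task(worker())
    _background.add(task)
    task.add_done_callback(_background.discard)
    return {"status": "accepted"}


async def demo() -> tuple:
    sent = []

    async def send_reply(msg: str) -> None:
        sent.append(msg)  # stand-in for texting the result back out

    ack = await handle_webhook("print the calendar", send_reply)
    sent_at_ack = list(sent)  # snapshot taken when the acknowledgement returns
    await asyncio.gather(*_background)
    return ack, sent_at_ack, sent


ack, sent_at_ack, sent = asyncio.run(demo())
```

The acknowledgement returns before any work has run, so the sender's HTTP request finishes in milliseconds regardless of how long the turn takes.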

Before the message is ever forwarded to the model, a thin keyword layer intercepts a handful of single-word control commands: status reports what the server is currently working on with elapsed time (and deliberately runs even while the turn lock is held, which is the whole point of it), cancel aborts the in-flight work but only inside a short armed window after a status, reset clears the rolling session and drops pending approvals, and Y or N answer an outstanding tool-approval prompt. Those are cheap, deterministic, and never cost a model call. Everything else becomes a prompt.
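A keyword layer like this is a plain function in front of the model. The sketch below assumes case-insensitive matching and a 60-second armed window for cancel; both are guesses for illustration, as are the function names.

```python
from typing import Optional, Tuple

CONTROL_WORDS = {"status", "cancel", "reset", "y", "n"}


def route_message(text: str) -> Tuple[str, str]:
    """Return ("control", word) for a single-word command, else ("prompt", text)."""
    word = text.strip().lower()
    if word in CONTROL_WORDS:
        return ("control", word)
    return ("prompt", text)  # everything else costs a model call


def cancel_allowed(last_status_at: Optional[float], now: float, window_s: float = 60.0) -> bool:
    """Cancel only works inside a short window after a status (window length assumed)."""
    return last_status_at is not None and (now - last_status_at) <= window_s
```

Because the match is on the whole message, a sentence that merely contains "status" still goes to the model as a normal prompt.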

Behind that front door is a local Hopper server. The server is the routing layer. It accepts requests from Apple Messages, the web app, the Watch app, and other local clients. It keeps track of active sessions, manages special commands like status, cancel, reset, and help, and decides whether a message should be handled directly or passed into a Claude Code turn.

One runtime, three ingress shapes. The web UI streams over Server-Sent Events so the browser sees tokens as they arrive. The Watch and phone app cannot reliably consume a streaming event channel over a self-signed TLS link, so they hit a sibling endpoint that runs the identical agent configuration but accumulates the full answer and returns it as one JSON blob.

All three share the same rolling session map keyed by client ID, so a conversation can move across surfaces and keep its context, or stay independent when the client uses its own persisted ID. The lesson for builders: pick one agent runtime and adapt the transport per client, rather than forking the brain per surface.
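"One runtime, adapted transport" can be illustrated with a single async generator consumed two ways. Everything named here is hypothetical: `agent_turn` is a stand-in for the real agent call, the session IDs are made up, and the framing is ordinary Server-Sent Events `data:` lines rather than Hopper's actual wire format.

```python
import asyncio
import json
from typing import AsyncIterator

SESSIONS: dict = {}  # client_id -> rolling session id, shared by every transport


async def agent_turn(client_id: str, prompt: str) -> AsyncIterator[str]:
    """Stand-in agent (hypothetical): yields text chunks for one turn."""
    SESSIONS.setdefault(client_id, f"session-{len(SESSIONS) + 1}")
    for chunk in ("Calendar ", "PDF ", "ready."):
        await asyncio.sleep(0)
        yield chunk


async def as_sse(client_id: str, prompt: str) -> AsyncIterator[str]:
    """Web transport: frame each chunk as a Server-Sent Events data line."""
    async for chunk in agent_turn(client_id, prompt):
        yield f"data: {json.dumps(chunk)}\n\n"


async def as_json(client_id: str, prompt: str) -> str:
    """Watch transport: accumulate the whole answer and return one JSON blob."""
    parts = [chunk async for chunk in agent_turn(client_id, prompt)]
    return json.dumps({"session": SESSIONS[client_id], "answer": "".join(parts)})


async def demo() -> tuple:
    frames = [frame async for frame in as_sse("phone-1", "calendar?")]
    blob = json.loads(await as_json("phone-1", "calendar?"))
    return frames, blob


frames, blob = asyncio.run(demo())
```

Both transports reuse the same session entry for `phone-1`, which is the property that lets a conversation move between surfaces.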

Claude Code is the primary reasoning engine. It reads the project context, understands the request, decides what tools it needs, and produces the next action. But the important part is that Claude is not operating in a blank chat window. It is running inside a local environment that gives it access to the systems I have explicitly wired in.

That includes MauriceOS, the personal operating layer for my house and family workflows. MauriceOS exposes local APIs for calendars, reminders, shortcuts, workflows, logs, HomeKit devices, GoTime travel plans, scheduled tasks, file delivery, and other tools. It is also an iOS app: https://apps.apple.com/us/app/maurice-os/id6758306736

Hopper sits on top of that layer. It gives Claude Code a way to reason across those capabilities and decide which tool should be used for a given request.

The action layer is just HTTP. MauriceOS is a self-describing REST surface on the LAN, and tool access is plain HTTP calls against it. The discipline that keeps it reliable is that Hopper re-fetches the API index at the start of a session before calling anything, rather than trusting a cached mental model of endpoints that drift over time. State that crosses bridges, like HomeKit device state behind a smart-home hub, is treated as untrustworthy until a cache refresh plus a reachability check confirms it, and stale state is labeled stale rather than asserted as fact. That "no claims without evidence fetched this turn" rule is written into the grounding doc and is, in practice, the single most important guardrail against a confident-but-wrong agent in a system that controls real devices.
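In Hopper the evidence rule lives in the grounding doc as an instruction, not in code, but the logic it asks for is easy to express. The sketch below is only an illustration of that logic; the `DeviceReading` fields and the wording of the stale label are assumptions.

```python
from dataclasses import dataclass


@dataclass
class DeviceReading:
    name: str
    value: str
    fetched_turn: int  # turn number in which the value was last fetched
    reachable: bool    # result of the reachability check


def describe(reading: DeviceReading, current_turn: int) -> str:
    """Assert state only when it was fetched this turn and the device answered."""
    fresh = reading.fetched_turn == current_turn and reading.reachable
    if fresh:
        return f"{reading.name} is {reading.value}."
    return f"{reading.name} was last seen {reading.value} (stale, not verified this turn)."
```

For example, `describe(DeviceReading("Front door lock", "locked", fetched_turn=3, reachable=True), current_turn=4)` produces the stale-labeled sentence rather than a flat claim that the door is locked.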

So when I text, "make me a printable family calendar," Hopper is not pretending to be helpful by describing how I could make one. It can inspect the calendar, run the right script, generate the PDF, and send the finished file back through Apple Messages.

When I text it about basketball opponents, it can move through a different workflow. It can look at the schedule, identify the teams, pull available team data, use local basketball scouting tools, generate a matchup view, and return something I can actually use before the weekend starts.

The same pattern applies across the system.

A request comes in through a simple interface. Claude decides what needs to happen. The local harness gives it tools. The tools perform real work. The result comes back through the same thread.

There is also a scheduling layer. Hopper can save future or recurring actions, wait until the right time, run a headless Claude Code session, and send me the result. That matters because some workflows should not depend on me opening an app at the exact right moment. If I want a morning brief, a weekend sports preview, a travel check, or a recurring system review, the system can run the work later and deliver it via text.

Scheduled actions persist as JSON server-side, so the schedule looks identical on every client. A 30-second async tick loop fires anything that is due; if the server was down at fire time, it catches up on next startup rather than silently dropping the run.

Each action carries a one-time or recurring schedule and an optional per-action timeout override (default five minutes, raised to fifteen-to-thirty for multi-step research or image generation, capped server-side). The interlock that makes this safe to run unattended: every new action enters a pending-approval state and cannot execute until I explicitly accept it; the model is never allowed to approve its own scheduled work.

When it does fire, it runs a fully headless Claude turn with real read, edit, write, and shell access, and the contract is that the prompt ends with a short message-ready summary that the server texts out automatically. Autonomy is real, but it is gated by human approval at creation and bounded by a timeout at execution.
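The scheduling rules above reduce to a few pure functions that a tick loop could call every 30 seconds. This is a sketch under assumptions: the field names (`state`, `next_run`, `every_s`, `timeout_s`), the state labels, and the 30-minute cap are hypothetical, since the text only says the timeout is "capped server-side".

```python
from datetime import datetime, timedelta

DEFAULT_TIMEOUT_S = 300  # five minutes, per the text
MAX_TIMEOUT_S = 1800     # assumed cap; the real server-side limit is not stated


def due_actions(actions: list, now: datetime) -> list:
    """Approved actions whose fire time has passed, including runs missed while down."""
    return [
        a for a in actions
        if a["state"] == "approved" and datetime.fromisoformat(a["next_run"]) <= now
    ]


def effective_timeout(action: dict) -> int:
    """Per-action override, clamped to the server-side cap."""
    return min(action.get("timeout_s", DEFAULT_TIMEOUT_S), MAX_TIMEOUT_S)


def advance(action: dict, now: datetime) -> dict:
    """One-time actions finish; recurring ones move to the next slot after now."""
    if action.get("every_s") is None:
        return {**action, "state": "done"}
    nxt = datetime.fromisoformat(action["next_run"])
    while nxt <= now:
        nxt += timedelta(seconds=action["every_s"])
    return {**action, "next_run": nxt.isoformat()}


# Times are ISO strings so the whole list round-trips through JSON unchanged.
now = datetime(2025, 6, 7, 8, 0, 0)
actions = [
    {"id": "brief", "state": "approved", "next_run": "2025-06-06T07:00:00", "every_s": 86400},
    {"id": "scout", "state": "pending_approval", "next_run": "2025-06-06T07:00:00", "timeout_s": 3600},
]
due = due_actions(actions, now)
```

The pending action is overdue but never appears in `due`, which is the approval interlock expressed as a filter; the overdue approved action does appear, which is the catch-up behavior.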

Hopper can also route specialized work to other coding agents when it makes sense. Claude Code is the main runtime, but Codex and Gemini are available as additional local CLI tools. That means Hopper can treat other AI agents less like competitors and more like callable capabilities inside the operating system.

Models as callable tools

Codex (with its built-in image generation) and Gemini are installed as ordinary CLIs and invoked like any other tool. When a request calls for image generation, Hopper shells out to Codex, writes the result to a file, and delivers it through the same Messages path as any other artifact. The harness is deliberately not loyal to one model; it picks the right capability for the job and folds the output back into the system. That is the difference between an agent and a router-of-agents.
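Treating another agent as a tool comes down to a bounded subprocess call. The sketch below uses the Python interpreter itself as a stand-in command so it runs anywhere; the helper name `run_cli_tool` is hypothetical, and the actual Codex or Gemini command lines are not given in the text and are deliberately not guessed here.

```python
import subprocess
import sys


def run_cli_tool(argv: list, timeout_s: float = 300.0) -> str:
    """Run an installed CLI as a tool; raise if it fails or exceeds the timeout."""
    result = subprocess.run(
        argv,
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,
        timeout=timeout_s,    # raises subprocess.TimeoutExpired when exceeded
        check=True,           # raises subprocess.CalledProcessError on non-zero exit
    )
    return result.stdout.strip()


# Stand-in command: prints a file name the way a generator tool might report its output.
output = run_cli_tool([sys.executable, "-c", "print('artifact.png')"])
```

From the harness's point of view the specialist agent is just another command with a timeout and an exit code.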

That is an important part of the concept.

A strong AI harness should not be religious about one model or one tool. It should know what capabilities exist, choose the right one for the job, and preserve the useful output back into the system.

The architecture is not complicated because each piece has a clear job.

Apple Messages is the front door.

The Hopper server is the router.

Claude Code is the reasoning engine.

MauriceOS is the action layer.

Local scripts and APIs are the tools.

Codex and Gemini are specialist agents when needed.

Scheduled actions are the autonomy layer.

Messages are the return path.

The result is not an AI app in the traditional sense.

It is a home AI operating system that I can invoke through Apple Messages.

Why Apple Messages Works So Well

Apple Messages works as a front door because it has almost none of the friction that kills new interfaces.

I do not need to remember to open Hopper. I do not need to switch contexts. I do not need to convince my brain that I am entering a separate AI workspace. I can send a message from wherever I already am.

That sounds small, but it changes the usage pattern.

Most apps require intent before usage. You have to decide to open them. You have to remember they exist. You have to navigate their interface. You have to translate your need into the app's structure.

A message thread does not work that way.

It is ambient. It is always nearby. It can be synchronous or asynchronous. It supports quick requests and longer-running jobs. It naturally handles back-and-forth clarification. It can receive files, images, links, and text. It can notify you when the result is ready.

For AI systems, that matters.

A lot of useful AI work is not something you want to babysit. You want to make the request, let the system work, and get the result back when it is done.

That is what message threads are already good at.

They are not just input boxes. They are durable communication channels.

Why the channel maps cleanly onto agent work. The hardest UX problem for a tool-using agent is that real work is variable-latency: some turns finish in a second, some take fifteen minutes of scraping and rendering. A message thread is the rare interface that handles that natively. It is inherently asynchronous, it already has a notification model, it carries attachments in both directions, and it has durable history that doubles as session state. The fire-and-forget webhook plus a texted-back result is not a workaround; it is the interaction model the channel was already built for. The status and cancel keywords exist precisely because a long-running async job needs an out-of-band way to ask "what are you doing?" and "stop," and a text thread gives you that for free, no second app required.

That durability is important. A message thread has history. It has context. It has a natural rhythm of request, response, clarification, update, and completion. It is already where people expect to communicate with other people and services.

For a personal AI system, that makes it a surprisingly strong interface.

The goal is not to cram every possible AI interaction into Messages. There are times when a dashboard is better. There are times when a web UI is better. There are times when voice is better. There are times when a full application surface is necessary.

But for a large class of personal AI workflows, Messages is a nearly perfect front door.

Ask for the thing.

Let the system work.

Get the result back.

That is the loop.

The System Can Improve Because the System Has Memory

The most interesting part of Hopper is not any single workflow.

It is the feedback loop.

Because Hopper runs against a local project context, its useful patterns can become durable. If I ask for the same kind of output repeatedly, that interaction can become a template. If I keep correcting the system, those corrections can become memory. If a workflow takes too many steps, Hopper can help turn it into a script. If a script becomes useful, it can become part of the harness.

That is where the system starts to feel less like software I use and more like infrastructure I grow.

The calendar PDF is not just a one-off artifact. It can become a reusable family calendar generator.

The basketball scouting workflow is not just a one-off report. It can become a repeatable weekend matchup system.

A morning brief can become a scheduled action.

A repeated shortcut can become a named workflow.

A formatting preference can become a durable default.

A frustration can become a backlog item.

The feedback loop is a write path, not a metaphor. What makes this concrete rather than aspirational is that Hopper has full edit and write access to its own project directory, including the grounding doc and the memory store. So "the system improves" decomposes into specific, auditable artifacts: a recurring need becomes a checked-in script with a documented trigger phrase; a correction becomes a one-fact memory file with a "why" and a "how to apply"; a repeated multi-step ask becomes a saved scheduled action; a layout preference becomes an edit to the script that owns that layout.

The agent improves the harness because it operates inside the harness with the same tools that built it.

The discipline that keeps that from becoming sprawl is also written down: check for an existing memory before writing a new one, delete memories that turn out wrong, and don't store what the code already records.

It is version-controllable, diff-able self-improvement, not a black-box model updating its own weights.

That is what I mean by recursive improvement.

Not an uncontrolled AI rewriting itself in the background. Not a science fiction agent modifying its own goals. Something much more practical.

The system can learn from use because the interactions happen close to the tools, files, memories, and workflows that define the system. Hopper can help improve the harness because it operates inside the harness.

Most software asks users to adapt to the product. A local AI harness can adapt the product around the user.

That does not remove the need for judgment, permissions, or review. In fact, it makes them more important. A system that can act needs constraints. It needs transparency. It needs a way to ask before destructive actions. It needs logs. It needs visible tool use. It needs human approval for the things that matter.

What "constraints" means in code. Hopper's answers are specific.

The rule that no user-facing factual claim goes out without evidence is what keeps an agent with real actuators honest.

Destructive operations are gated: a cascading delete returns an error with the child-row count until it is re-issued with explicit confirmation, and the rule is to surface that count to a human and get a yes before committing.

Print jobs, bulk calendar writes, and lock or thermostat changes all require confirmation first.

The watch and iMessage paths run on a restricted tool allow-list and auto-deny mutations they cannot get a human to approve.

Scheduled actions are approval-gated at creation and timeout-bounded at execution.

Every shortcut run and device state change is logged and queryable. None of that is exotic; it is the same defense-in-depth a backend engineer would expect around any system with write access to production.
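Two of those gates can be sketched as small functions. The tool names, the allow-list contents, and the response fields are hypothetical; the sketch only illustrates the confirm-before-commit and auto-deny behavior described above.

```python
# Restricted-channel allow-list (hypothetical names): tool -> does it mutate state?
ALLOW_LIST = {"calendar.read": False, "devices.read": False, "lights.set": True}


def authorize(tool: str, human_approved: bool = False) -> bool:
    """Watch/iMessage path: unknown tools are denied, mutations need a human yes."""
    if tool not in ALLOW_LIST:
        return False
    if ALLOW_LIST[tool] and not human_approved:
        return False  # auto-deny a mutation nobody approved
    return True


def cascading_delete(child_rows: int, confirm: bool = False) -> dict:
    """Refuse a cascading delete until it is re-issued with explicit confirmation."""
    if child_rows > 0 and not confirm:
        return {"ok": False, "error": "confirmation_required", "child_rows": child_rows}
    return {"ok": True, "deleted": 1 + child_rows}  # the parent row plus its children
```

The first call to `cascading_delete(12)` returns the child-row count for a human to read; only a second call with `confirm=True` commits.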

Within those boundaries, the feedback loop is the product. Every interaction can make the next interaction better.

Why This Is Different From a Smart Home

It would be easy to describe Hopper as a smart home project, but that undersells the idea. Smart home systems usually start with devices. Lights, locks, thermostats, cameras, speakers, sensors, buttons, scenes, and automations. That is useful, but it is also narrow.

The real opportunity is not controlling devices. It is coordinating workflows.

A house is not just a collection of devices. It is a living operating environment. It has schedules, people, routines, preferences, documents, meals, sports, school events, travel plans, reminders, chores, projects, and communication patterns.

Most smart home systems are good at actions like "turn on the lights."

They are less good at something like this:

Review the family calendar, identify the important events this weekend, create a printable PDF that my wife will actually like, leave space for notes, and send it back to me.

Or this:

Look at the boys' upcoming basketball games, research the opponents, pull the available data, create an Elo-style matchup view, flag the toughest game, and give me predictions by half.

Those are not device commands.

They are family operations.

Hopper does control HomeKit, but even there the interesting work is operational, not switch-flipping. Turning on the office ceiling lights requires first activating the upstream fan controller they are wired behind; the lights report success either way, so the system has to know the real-world dependency.

House-wide "bright" or "dim" modes fan out across four native HomeKit scenes in parallel. That is the tell: the value is not the device endpoints, which any platform exposes. It is the accumulated layer of knowledge about how this house actually works, the dependencies, the IDs, the conventions, the quirks, that turns a generic API into a system that does the right thing. That knowledge lives in the grounding doc and the memory store, and it is what makes the experience hyper-personalized.
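In Hopper this knowledge is prose in the grounding doc, but the two behaviors can be illustrated in code: resolve upstream dependencies before acting, and fan a house-wide mode out concurrently. The device IDs, scene names, and function names below are hypothetical, and `run_scene` stands in for the real HTTP call.

```python
import asyncio

# Hypothetical IDs: each device maps to the controller it is wired behind.
UPSTREAM = {"office_ceiling_lights": "office_fan_controller"}


def activation_order(device: str) -> list:
    """Return upstream controllers first, then the device (assumes no cycles)."""
    chain = [device]
    while chain[-1] in UPSTREAM:
        chain.append(UPSTREAM[chain[-1]])
    return list(reversed(chain))


async def run_scene(name: str) -> str:
    await asyncio.sleep(0)  # stand-in for the HTTP call that triggers one scene
    return name


async def house_mode(scenes: list) -> list:
    """Trigger every scene for a house-wide mode concurrently."""
    return await asyncio.gather(*(run_scene(s) for s in scenes))


order = activation_order("office_ceiling_lights")
triggered = asyncio.run(house_mode(["bright-up", "bright-main", "bright-down", "bright-out"]))
```

The dependency map is the part no generic platform ships with; it only exists because someone wrote down how this particular house is wired.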

That is why the AI operating system framing matters. Hopper is not just trying to make devices conversational. It is trying to make the operating layer of the house accessible.

Apple Messages becomes the front door into that operating layer.

Claude Code becomes the reasoning engine.

MauriceOS becomes the action layer.

The tools become capabilities.

The family workflows become the product.

Learnings along the way

The obvious lesson is that AI agents need tools.

The less obvious lesson is that agents need front doors.

The front door determines whether the system becomes part of daily life or remains another destination people forget to visit.

For some products, the right front door will be Slack. For others, it will be email, a browser extension, a command line, a calendar, a mobile widget, a voice interface, or an embedded button inside an existing workflow.

For my life, @Apple Messages was the obvious place to start.

That does not mean every AI system should be text-message-first. It means builders should think carefully about where the request should begin.

Where is the user when the need appears?

What interface do they already trust?

What channel already supports the rhythm of the work?

Does the user need an app, or do they need a way to invoke a system?

That last question is the one I keep coming back to.

A lot of AI products still feel like destinations. Useful destinations, but destinations. You go there, ask for help, and then leave.

The systems I want to build feel more like infrastructure.

They sit behind the surfaces I already use. They understand the tools around me. They can act within constraints. They can return artifacts, not just answers. They can improve as I use them.

I do not want another AI app.

I want to text my house and have it actually do things.
