I stopped writing code by hand. Here's what replaced it.

On AI-assisted development, why the mental model shift matters more than the tooling, and what happens when you point an agent at the iOS Simulator.

For most of my career, writing code was the job. Researching an approach, sketching a design, then sitting down and translating intention into Swift, line by line. That loop — think, type, compile, test, repeat — was so fundamental to how I worked that I didn’t really think of it as a method. It was just what programming looked like.

That changed at Dropbox. Not overnight, and not because someone handed down a mandate. It changed because I started using AI tools seriously, and somewhere in the middle of a long feature build on Dash, I looked up and realized I hadn’t typed a significant block of code in several days. I’d been directing, reviewing, correcting, and composing — but not transcribing. The ratio had completely inverted, and the output was better for it.

This is the post I wish I’d had at the start of that shift.

What “AI-assisted development” actually means day to day

The shorthand tends to conjure images of autocomplete on steroids. That’s real, and it’s useful, but it undersells the actual change. The more meaningful shift is at the task level, not the keystroke level.

Before: I’d get a task, break it into subtasks, implement each one, write tests, review the output.

After: I get a task, describe the intent and constraints to a model, iterate on the plan, review what gets produced, correct and refine, repeat until the output matches what I would have written — or, frequently, is better than what I would have written because the model explored an approach I wouldn’t have reached for first.

The skill isn’t typing faster. It’s learning how to describe what you want precisely enough that the agent can execute it without losing the signal. That takes practice. Early attempts produce confident, plausible, wrong code. You learn to be more specific about invariants, edge cases, architectural constraints. You learn that what you leave out of a prompt matters as much as what you include.

At Dropbox I used Claude heavily — for code authoring, for architecture review, for thinking through the data layer design of our AI features before a single line was written. The model became a collaborator I kept in context across long sessions, not a search box I queried and discarded.

The thing I built that surprised me

Somewhere in the middle of this workflow shift, I started wondering what it would look like to apply the same approach to testing — specifically, end-to-end testing on iOS, which has historically been painful to write and brittle to maintain.

So I built a Claude Code skill around it. The idea was simple in concept and genuinely weird in practice: instead of writing XCUITest scripts by hand, you describe the feature you want to test and how to navigate there in plain language. The agent takes that description, reasons about the steps required, writes a plan to a local file, and then executes that plan by interacting with the iOS Simulator directly — not through XCUITest, but through CoreSimulator’s private APIs.

CoreSimulator is the framework Apple uses internally to manage simulator instances. It exposes APIs for launching apps, querying running processes, and — crucially — sending synthesized input events directly to the simulator without going through the UI testing layer. This makes the interactions faster, more reliable, and decoupled from the accessibility tree that XCUITest depends on.
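
For a rough sense of what driving the simulator from a script involves, here is a minimal sketch. It does not use the private CoreSimulator APIs described above, since those are undocumented and I won't guess at their signatures. It uses the public xcrun simctl command line instead, which covers booting, launching, and screenshots. As far as I know, simctl has no subcommand for taps or swipes, which is the gap the private input path fills. The function names are my own, and the device identifier and bundle ID are placeholders.

```python
from typing import List


def boot_command(udid: str) -> List[str]:
    """Argument list that boots the simulator with the given device UDID."""
    return ["xcrun", "simctl", "boot", udid]


def launch_command(udid: str, bundle_id: str) -> List[str]:
    """Argument list that launches an installed app on that simulator."""
    return ["xcrun", "simctl", "launch", udid, bundle_id]


def screenshot_command(udid: str, output_path: str) -> List[str]:
    """Argument list that saves a screenshot of the simulator's display."""
    return ["xcrun", "simctl", "io", udid, "screenshot", output_path]


# On a Mac with Xcode installed, each list can be passed to subprocess.run:
#   import subprocess
#   subprocess.run(launch_command("<device-udid>", "com.example.app"), check=True)
```

Building the argument lists separately from running them keeps the sketch testable on machines without Xcode.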

The flow looks like this:

  1. You describe the task in natural language: what the feature does, the navigation path to reach it, what a successful outcome looks like.
  2. The agent writes a structured plan to disk — a step-by-step workflow file that specifies each interaction and what to verify after it.
  3. The agent begins executing the plan: launching the simulator, navigating to the right screen, performing the interactions.
  4. At each verification point, the agent self-discovers — rather than asserting against a hardcoded expected state, it inspects the current simulator state, reads what’s actually on screen, and determines whether what it finds matches the expected outcome.
  5. If a step fails, the agent reasons about why and either corrects its approach or surfaces the discrepancy for human review.
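
The steps above can be sketched as a small data structure and a loop. This is an illustration under my own assumptions, not the skill's actual format: the Step and Plan fields, the JSON layout, and the three callbacks (perform, observe, judge) are all hypothetical stand-ins for whatever the agent really does at each stage.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Callable, List


@dataclass
class Step:
    action: str  # what to do, e.g. "tap the Settings tab"
    verify: str  # what should be true afterwards, in plain language


@dataclass
class Plan:
    task: str
    steps: List[Step] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the plan so it can be written to disk before execution."""
        return json.dumps(asdict(self), indent=2)

    @staticmethod
    def from_json(text: str) -> "Plan":
        raw = json.loads(text)
        return Plan(task=raw["task"], steps=[Step(**s) for s in raw["steps"]])


@dataclass
class StepResult:
    step: Step
    passed: bool
    observation: str


def run_plan(
    plan: Plan,
    perform: Callable[[str], None],      # hypothetical: carries out one action
    observe: Callable[[], str],          # hypothetical: describes the current screen
    judge: Callable[[str, str], bool],   # hypothetical: does the screen match?
) -> List[StepResult]:
    """Execute steps in order, stopping at the first failed verification."""
    results: List[StepResult] = []
    for step in plan.steps:
        perform(step.action)
        observation = observe()
        passed = judge(step.verify, observation)
        results.append(StepResult(step, passed, observation))
        if not passed:
            break  # surface the discrepancy instead of continuing blindly
    return results
```

This version only covers the "surface the discrepancy" branch of step 5. The retry-with-a-corrected-approach branch depends on the agent's own reasoning and is left out.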

The result is something I’ve started calling on-the-fly end-to-end tests. They’re not persistent test suites — they’re single-run verification passes that the agent generates and executes in response to a described change. They’re particularly useful for the kind of “did this feature work after I refactored the underlying model?” question that’s annoying to answer manually and overkill to build a full UITest suite for.

The self-discovery piece is the part that keeps surprising me. Because the agent isn’t asserting against a fixture — it’s genuinely navigating and observing — it catches things a hardcoded test would miss. A layout shift. An unexpected empty state. A loading spinner that shouldn’t still be there. The agent doesn’t know to expect any of these; it just notices them because it’s looking at the actual simulator output, not a pre-written oracle.
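
A toy comparison makes the difference concrete. In this sketch the screen is reduced to a set of element labels, which is my simplification. The real agent reads the simulator output and applies judgment rather than doing set arithmetic. The names Review, hardcoded_assertion, and observe_then_review are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Review:
    missing: List[str]      # expected, but not found on screen
    unaccounted: List[str]  # found on screen, but nothing in the plan mentions it

    @property
    def needs_attention(self) -> bool:
        return bool(self.missing or self.unaccounted)


def hardcoded_assertion(expected: Set[str], observed: Set[str]) -> bool:
    """A fixture-style check: passes as long as everything expected is present."""
    return expected <= observed


def observe_then_review(expected: Set[str], observed: Set[str]) -> Review:
    """An observation-first check: also reports what nobody asked about."""
    return Review(
        missing=sorted(expected - observed),
        unaccounted=sorted(observed - expected),
    )


expected = {"Inbox title", "Message list"}
observed = {"Inbox title", "Message list", "Loading spinner"}

fixture_passes = hardcoded_assertion(expected, observed)
review = observe_then_review(expected, observed)
```

Here the fixture-style check passes while the review lists the lingering spinner as unaccounted for, which is the kind of finding described above.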

What this changes about being an iOS engineer

The workflow shift doesn’t make the craft go away. If anything, it exposes more of it. When you’re transcribing logic into code, a lot of your cognitive bandwidth goes toward syntax, API lookups, and the mechanical work of translation. When that layer is handled, you spend more time on the questions that actually determine whether software is good: Is this the right abstraction? Where are the failure modes? What happens in the slow path?

The engineers I’ve seen struggle with AI-assisted development are the ones who use it as a faster typist and then disengage. The output suffers because the judgment isn’t there to correct it. The engineers I’ve seen thrive are the ones who treat the model as a capable-but-junior collaborator who needs precise direction and thorough review.

I still read every line. I still understand every decision. But I’m not the one doing the mechanical work of translating that understanding into text, and that freed-up bandwidth goes somewhere useful.

The iOS simulator skill I built is probably not something I’ll open source as-is — it’s a bit rough and fairly Dropbox-specific in its assumptions. But the pattern it represents — agent writes plan, executes against live state, self-discovers to verify — feels like a direction that end-to-end testing on mobile is going to converge on regardless. I just got to try an early version of it.

That’s the thing about working at the frontier of a tooling shift: you get to build the things that don’t exist yet. Even the rough ones teach you something about what the right version would look like.


© Andrew Apperley 2026. All rights reserved.