Test harness (prototype)

This is the one part of DoesItARM that is built, not just specced: a zero-dependency Swift CLI (apps/doesitarm-harness) that takes a Mac app from “we have a copy” to “here’s the verdict, with evidence.” Everything below reflects what runs today; planned pieces are marked.

System architecture — current vs planned

flowchart TB
  subgraph HOST["macOS host · Apple M2 Max · macOS 26.3 — ACTIVE TODAY"]
    H["Harness CLI<br/>swift run harness"]
    subgraph DRV["Driver seams (swappable)"]
      UI["UIDriver<br/>ax ✓"]
      REC["EvidenceRecorder<br/>screencapturekit ✓ · screencapture · ffmpeg · none"]
      ISO["IsolationBackend<br/>host ✓"]
    end
    APP["App under test<br/>VLC · HandBrake · Audacity"]
    H --> DRV
    DRV --> APP
  end
  subgraph VM["macOS VM · tart / Virtualization.framework — PLANNED (stub, not active)"]
    VAPP["App in clean-room VM"]
  end
  HOST -. "--isolation tart (not wired up yet)" .-> VM
  classDef planned stroke-dasharray:6 4,opacity:0.5;
  class VM,VAPP planned;

Build status

Capability	Status
Static arch / Rosetta classification (`lipo`/`otool`)	✅ built
Bulk scan of all installed apps (no permissions)	✅ built — 113 apps classified
Acquire via Homebrew cask + locate	✅ built
Launch + window-ready + crash detection	✅ built
Drive UI via Accessibility (menus, buttons, files)	✅ built (needs Accessibility grant)
Audio Unit lane via `auval` (zero-UI)	✅ built
Record video + screenshots	✅ built — needs Screen Recording grant to capture
Local store (SQLite = D1 mirror, `r2/` = R2 mirror)	✅ built
Swappable recorder + isolation backends	✅ built (host + capture backends); tart/user = stubs
macOS VM isolation (`tart`)	◌ stub
Live D1/R2 sync	◌ dry-run only

Test run pipeline

One invocation of harness test <app> runs this, start to finish:

flowchart LR
  A["acquire<br/>brew cask / locate"] --> B["static arch<br/>lipo · otool"]
  B --> C["isolation.prepare<br/>host: clean state"]
  C --> D["launch<br/>NSWorkspace"]
  D --> E["record<br/>ScreenCaptureKit"]
  E --> F["drive profile<br/>Accessibility API"]
  F --> G["evidence<br/>video · shots · crash · logs"]
  G --> H["classify<br/>passed / native …"]
  H --> I[("store<br/>SQLite + r2/")]
  I -. "sync (later)" .-> J["D1 + R2"]

The driver pattern

Three seams are pluggable at runtime, so we can A/B the original tooling against alternatives — or run permission-free — with a flag. ✓ = working, (stub) = wired seam, not implemented.

flowchart TB
  E["AutomationEngine"]
  E --> U{{"UIDriver"}}
  E --> R{{"EvidenceRecorder"}}
  E --> S{{"IsolationBackend"}}
  U --> u1["ax ✓"]
  U -.-> u2["cgevent (stub)"]
  U -.-> u3["xcuitest (stub)"]
  R --> r1["screencapturekit ✓"]
  R --> r2["screencapture ✓"]
  R --> r3["ffmpeg ✓"]
  R --> r4["none ✓"]
  S --> s1["host ✓"]
  S -.-> s2["tart VM (stub)"]
  S -.-> s3["user account (stub)"]

Isolation options

Backend	GPU	Isolation	Status
`host` (default)	✅ real Metal	prefs/containers reset	active
`user` (dedicated account)	✅ real Metal	separate account state	stub
`tart` (macOS VM)	⚠️ paravirtual (capped)	full clean-room	stub
bare-metal worker (prod)	✅✅ full	one app per box	planned

Data model → D1 + R2

Results map 1:1 onto the data model. The local store is built so “connect Cloudflare later” is a flat upload: SQLite is D1, and r2/ file paths are R2 object keys.

flowchart LR
  subgraph M["Domain model"]
    T["Title"] --> SG["Signal"]
    TR["TestRun"] --> SG
    SG --> V["Verdict"]
  end
  subgraph L["Local store · .darm-data"]
    DB[("SQLite<br/>doesitarm.db")]
    FS["r2/ mirror<br/>recordings · logs · diagnostics"]
  end
  M --> DB
  TR -. "evidence keys" .-> FS
  DB -.->|"wrangler d1 execute"| D1[("Cloudflare D1")]
  FS -.->|"wrangler r2 object put"| CR[("Cloudflare R2")]

Component tree

mindmap
  root(("doesitarm-harness"))
    CLI
      setup
      arch
      scan
      test
      plugins
      sync
    Core["CompatHarnessCore"]
      Acquirer
      ArchInspect
      Launcher
      EvidenceRecorder
      IsolationBackend
      Accessibility
      Profiles["Base + VLC/HandBrake/Audacity"]
      Storage["Storage + Schema"]
      VerdictResolver
      AutomationEngine
    Store[".darm-data"]
      SQLite["SQLite (D1 mirror)"]
      r2["r2/ evidence (R2 mirror)"]

What works without permissions

macOS gates UI driving (Accessibility) and capture (Screen Recording) behind TCC grants that can’t be set from the CLI. The arch/scan/plugin lanes need neither, so we collect real data today regardless.

flowchart TB
  subgraph FREE["Works now — no permissions"]
    sc["scan · 113 apps classified"]
    pl["plugins · auval"]
    ar["arch · static Mach-O"]
  end
  subgraph AXG["Accessibility ✓ granted"]
    dr["drive UI · menus · buttons"]
  end
  subgraph SRG["Screen Recording ✗ NOT granted"]
    vid["video clips"]
    shot["screenshots"]
  end
  classDef ok fill:#0f3d2e,stroke:#5ee0a8,color:#dffaf0;
  classDef warn fill:#3a2f12,stroke:#e5c463,color:#fff7e0;
  class FREE,sc,pl,ar,AXG,dr ok;
  class SRG,vid,shot warn;

Running it

cd apps/doesitarm-harness
swift build
swift run harness scan                  # classify every installed app (no permissions)
swift run harness test vlc              # acquire, launch, drive, record, store
swift run harness test vlc --recorder ffmpeg --isolation host   # swap backends

The full options matrix and decisions live in the repo at apps/doesitarm-harness/docs/driver-architecture-and-tooling-options.md.