Executive summary
The archive you can't retire is the risk you can't close.
Regulated enterprises hold billions of customer documents — statements, explanations of benefits, insurance policies, tax forms — locked in legacy archives like Mobius and ASG-ViewDirect. These systems are end-of-life, expensive to license, and understood by a shrinking pool of specialists. Yet the documents inside are legally significant and must remain perfectly retrievable for years.
Migrating them sounds simple and almost never is. The dominant format, AFP, renders correctly only with its external resources; a naïve lift-and-shift silently corrupts pages. Sensitive data hides in the index fields. And when an auditor asks "can you prove every document arrived, intact and complete?", a manual migration has no defensible answer.
MigrateIQ is a migration control plane that solves all three. An AI copilot collapses months of manual field analysis into minutes of human-approved decisions. A deterministic, restartable engine guarantees fidelity — refusing to load anything it can't render losslessly. And every wave emits a signed, hash-chained evidence pack that proves, to an auditor's standard, that source equals target. The thesis is not a slide; it runs end-to-end today.
Discover
Inventory, format mix, sizing, wave plan and risk flags — before you touch the archive.
Execute
AI-drafted, human-approved mappings; a restartable engine that moves content losslessly.
Govern
Hash-chained audit and signed evidence packs that prove every wave to a regulator.
1 · The problem
Legacy content archives are a liability that compounds.
The challenge of high-volume regulated content migration is not volume alone — it is the combination of constraints that defeats generic tools:
- Fidelity is fragile. AFP (MO:DCA) documents render from external resources — form definitions, overlays, fonts. Migrate the data without the resources and the page becomes garbage. The failure is silent; you discover it when a customer or regulator does.
- Scale is unforgiving. Tens or hundreds of millions of documents mean a run will be interrupted. Without idempotent checkpointing, a crash means starting over — or worse, loading documents twice.
- Privacy is everywhere. PHI and PII hide in index metadata. A migration that copies an SSN into a log or an AI prompt has created a breach, not a record.
- Proof is mandatory. SOX, HIPAA, PCI-DSS and records-retention rules require demonstrable completeness and integrity. "We think it all came over" is not an audit position.
Each constraint alone is manageable. Together, they are why these archives sit untouched for a decade while the bill — financial, operational and regulatory — keeps growing.
2 · The thesis
AI collapses the analysis. A deterministic engine guarantees the result.
MigrateIQ is built on a deliberate division of labor between probabilistic and deterministic systems — using each for what it is good at, and never the reverse.
AI collapses the analysis. The slow, expensive part of any migration is understanding the source: what fields exist, what they mean, where they should land, and what's sensitive. A mapping copilot proposes every field with a confidence score, supporting evidence, and a rationale — turning months of specialist analysis into a review queue a human can approve in an afternoon.
A deterministic engine guarantees fidelity. The actual movement of content is never left to a model. A deterministic pipeline extracts, transforms, loads and reconciles with no randomness anywhere: IDs are content hashes, timestamps are injected rather than read from the system clock, and the same input always produces the same output and the same evidence signature.
The platform proves it. Every action is recorded in a tamper-evident ledger, and every wave ends in a signed evidence pack. The proof is a first-class output, not an afterthought.
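The determinism described above can be sketched in a few lines. This is a minimal illustration, assuming SHA-256 content hashes and a clock passed in as a parameter; the names `contentId`, `Clock` and `stageRecord` are hypothetical and are not taken from MigrateIQ's actual API.

```typescript
import { createHash } from "node:crypto";

// Hypothetical: an ID derived only from the document bytes, so the same
// content receives the same ID on every run.
function contentId(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// Hypothetical: the clock is injected; the engine never reads system time itself.
type Clock = () => string;

interface StageRecord {
  id: string;
  stage: string;
  at: string;
}

function stageRecord(bytes: Uint8Array, stage: string, clock: Clock): StageRecord {
  return { id: contentId(bytes), stage, at: clock() };
}

// Usage: with a fixed clock, two runs over the same input serialize identically.
const fixedClock: Clock = () => "2024-01-01T00:00:00Z";
const sampleBytes = Buffer.from("statement-0001");
const first = JSON.stringify(stageRecord(sampleBytes, "extract", fixedClock));
const second = JSON.stringify(stageRecord(sampleBytes, "extract", fixedClock));
console.log(first === second); // true
```

Because nothing in the record depends on wall-clock time or random values, any signature computed over it would also repeat from run to run.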
3 · The platform
Discover · Execute · Govern.
Three jobs, one control plane — designed so the same engine serves any source, any format, and any target.
Discover
A read-only assessment crawls the source and produces an inventory, format-mix breakdown, storage sizing, an automated wave plan, and risk flags — including detection of sensitive fields. You see the shape and the danger of the archive before a single document moves.
Execute
The AI copilot drafts a mapping spec; a human approves it at an explicit gate; then the restartable engine runs extract → transform → load → validate, emitting lineage and audit at every stage. An unapproved spec cannot start a run.
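One way to picture the approval gate is as a status check that the run entry point cannot bypass. The sketch below is illustrative only: `FieldProposal`, `MappingSpec`, `approve`, `startRun` and the sample field names are assumptions, not the platform's real types.

```typescript
// Hypothetical shape of one AI-drafted proposal per source field.
interface FieldProposal {
  sourceField: string;
  targetField: string | null; // null means no clean target was found
  confidence: number;         // 0..1
  evidence: string[];
  rationale: string;
}

interface MappingSpec {
  id: string;
  status: "draft" | "approved";
  approvedBy?: string;
  proposals: FieldProposal[];
}

class UnapprovedSpecError extends Error {
  constructor(specId: string) {
    super(`mapping spec ${specId} is not approved`);
    this.name = "UnapprovedSpecError";
  }
}

// A human reviewer, not the copilot, moves the spec to "approved".
function approve(spec: MappingSpec, reviewer: string): MappingSpec {
  return { ...spec, status: "approved", approvedBy: reviewer };
}

// The gate: a run refuses to start unless the spec carries an approval.
function startRun(spec: MappingSpec): { runId: string; specId: string } {
  if (spec.status !== "approved" || !spec.approvedBy) {
    throw new UnapprovedSpecError(spec.id);
  }
  return { runId: `run-${spec.id}`, specId: spec.id };
}

// Usage: the draft is refused, the approved copy starts.
const draft: MappingSpec = {
  id: "s1",
  status: "draft",
  proposals: [
    {
      sourceField: "ACCT_NO", // hypothetical field name
      targetField: "AccountNumber",
      confidence: 0.94,
      evidence: ["10-digit numeric in all sampled rows"],
      rationale: "Name and value shape match an account number",
    },
  ],
};

let refused = false;
try {
  startRun(draft);
} catch (err) {
  refused = err instanceof Error && err.name === "UnapprovedSpecError";
}
const run = startRun(approve(draft, "reviewer@example.com"));
console.log(refused, run.runId); // true run-s1
```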
Govern
Every action appends to a hash-chained ledger. Each wave reconciles counts, verifies checksums, runs a render-diff, and emits a signed evidence pack in HTML and JSON — auditor-ready by construction.
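The count and checksum parts of that reconciliation could look roughly like this. It is a sketch under stated assumptions: each side is represented as a manifest of document ID to SHA-256 checksum, the names `Manifest` and `reconcile` are hypothetical, and the render-diff step is not modeled here.

```typescript
import { createHash } from "node:crypto";

// Hypothetical manifest: document ID -> sha256 hex of its content.
type Manifest = Record<string, string>;

function sha256(data: string): string {
  return createHash("sha256").update(data).digest("hex");
}

interface Reconciliation {
  countsMatch: boolean;
  missing: string[];            // in source, absent from target
  unexpected: string[];         // in target, absent from source
  checksumMismatches: string[]; // present on both sides, content differs
  pass: boolean;
}

function has(m: Manifest, id: string): boolean {
  return Object.prototype.hasOwnProperty.call(m, id);
}

function reconcile(source: Manifest, target: Manifest): Reconciliation {
  const sourceIds = Object.keys(source);
  const targetIds = Object.keys(target);
  const missing = sourceIds.filter((id) => !has(target, id));
  const unexpected = targetIds.filter((id) => !has(source, id));
  const checksumMismatches = sourceIds.filter(
    (id) => has(target, id) && source[id] !== target[id],
  );
  const countsMatch = sourceIds.length === targetIds.length;
  const pass =
    countsMatch &&
    missing.length === 0 &&
    unexpected.length === 0 &&
    checksumMismatches.length === 0;
  return { countsMatch, missing, unexpected, checksumMismatches, pass };
}

// Usage: one clean target, one with a single altered document.
const sourceManifest: Manifest = { d1: sha256("page one"), d2: sha256("page two") };
const goodTarget: Manifest = { d1: sha256("page one"), d2: sha256("page two") };
const badTarget: Manifest = { d1: sha256("page one"), d2: sha256("page tw0") };
console.log(reconcile(sourceManifest, goodTarget).pass); // true
console.log(reconcile(sourceManifest, badTarget).pass);  // false
```

Matching counts alone would pass the second case, which is why the checksum comparison is a separate check rather than an optional extra.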
Canonical Content Model
Sources and targets never talk directly. Everything passes through a typed Canonical Content Model (the single source of truth), so adding a new source or target means writing a connector, not rewriting the engine.
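A rough sketch of what "everything passes through the canonical model" implies for connector design follows. The interface and field names here (`CanonicalDocument`, `SourceConnector`, `TargetConnector`, `migrate`) are assumptions made for illustration; the real model is likely richer.

```typescript
// Hypothetical canonical shape; field names are illustrative.
interface CanonicalDocument {
  id: string;                    // content hash
  format: "AFP" | "PDF";
  content: Uint8Array;
  index: Record<string, string>; // index metadata
  resources: string[];           // external resources the content depends on
}

interface SourceConnector {
  readonly connectorName: string;
  extract(): CanonicalDocument[];
}

interface TargetConnector {
  readonly connectorName: string;
  load(doc: CanonicalDocument): void;
}

// The engine sees only the canonical model, never a specific source or target.
function migrate(source: SourceConnector, target: TargetConnector): number {
  let loaded = 0;
  for (const doc of source.extract()) {
    target.load(doc);
    loaded++;
  }
  return loaded;
}

// In-memory stand-ins for a real source and target.
class InMemorySource implements SourceConnector {
  readonly connectorName = "in-memory-source";
  private readonly docs: CanonicalDocument[];
  constructor(docs: CanonicalDocument[]) {
    this.docs = docs;
  }
  extract(): CanonicalDocument[] {
    return this.docs;
  }
}

class InMemoryTarget implements TargetConnector {
  readonly connectorName = "in-memory-target";
  readonly loaded: CanonicalDocument[] = [];
  load(doc: CanonicalDocument): void {
    this.loaded.push(doc);
  }
}

const memorySource = new InMemorySource([
  { id: "h1", format: "AFP", content: new Uint8Array([1]), index: { Account: "001" }, resources: [] },
  { id: "h2", format: "AFP", content: new Uint8Array([2]), index: { Account: "002" }, resources: [] },
]);
const memoryTarget = new InMemoryTarget();
console.log(migrate(memorySource, memoryTarget)); // 2
```

Supporting another archive in this arrangement would mean adding one more class that implements `SourceConnector`; `migrate` itself would not change.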
4 · Inside the engine
Every document, on rails — from archive to evidence.
The pipeline is a sequence of discrete, checkpointed activities. Each stage is idempotent and records its outcome, so the run can be killed and resumed without loading any document twice.
DISCOVER → inventory · format mix · sizing · wave plan · risk
MAP (AI) → draft spec (confidence + evidence + rationale)
APPROVE → human gate — unapproved spec cannot run
EXTRACT → source connector (Mobius, ViewDirect, …)
TRANSFORM → AFP→PDF, fail-closed on missing resources
LOAD → target connector (CMOD: arsxml · index · arsload)
VALIDATE → counts · checksum · metadata · render-diff
EVIDENCE → signed pack (HTML + JSON) + audit excerpt
Restartable by design. The orchestrator is structured as checkpointed activities: a stand-in for Temporal workflows today, built so that Temporal can replace it in the next phase. Kill a run after a thousand documents and resume; the final counts are correct and nothing is double-loaded.
# ... process dies mid-wave ...
$ miq run resume --run-id r1
✓ resumed from checkpoint · documents loaded twice: 0
✓ source == staged == loaded == queryable
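The resume behavior shown above can be approximated with a checkpoint that is consulted before every load. This is a simplified sketch: `CheckpointStore` and `runWave` are hypothetical names, the store is in memory (a real one would have to be durable), and the crash is simulated with a thrown error.

```typescript
// Hypothetical checkpoint store: remembers which document IDs are already loaded.
class CheckpointStore {
  private readonly done = new Set<string>();
  isDone(id: string): boolean {
    return this.done.has(id);
  }
  markDone(id: string): void {
    this.done.add(id);
  }
}

interface WaveResult {
  loadedThisPass: number;
  skipped: number;
}

// Idempotent pass: anything already checkpointed is skipped, never reloaded.
// crashAfter simulates the process dying after that many loads.
function runWave(
  ids: string[],
  store: CheckpointStore,
  load: (id: string) => void,
  crashAfter?: number,
): WaveResult {
  let loadedThisPass = 0;
  let skipped = 0;
  for (const id of ids) {
    if (store.isDone(id)) {
      skipped++;
      continue;
    }
    if (crashAfter !== undefined && loadedThisPass >= crashAfter) {
      throw new Error("simulated crash");
    }
    load(id);
    store.markDone(id); // checkpoint only after the load succeeds
    loadedThisPass++;
  }
  return { loadedThisPass, skipped };
}

// Usage: die after two documents, then resume the same wave.
const waveIds = ["d1", "d2", "d3", "d4"];
const loadedIds: string[] = [];
const checkpoints = new CheckpointStore();
try {
  runWave(waveIds, checkpoints, (id) => { loadedIds.push(id); }, 2);
} catch (_err) {
  // process dies mid-wave
}
const resumed = runWave(waveIds, checkpoints, (id) => { loadedIds.push(id); });
console.log(loadedIds.length, resumed.skipped); // 4 2
```

In this sketch the load and the checkpoint write are two separate steps, so a crash between them could still repeat one load. A production engine would need the target load to be idempotent or the two steps to be atomic.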
5 · Fidelity & privacy
Lossless or it doesn't load. Private or it doesn't move.
Fail-closed fidelity. MigrateIQ models the defining property of AFP: appearance lives in external resources. The renderer fails closed with a MissingResourceError if any declared resource is absent, and the transform refuses to produce a rendition. A render-diff then proves the AFP→PDF transform is visually lossless, with a similarity of 1.0. A document that cannot be rendered correctly is never loaded — full stop.
renderAfp(doc) → MissingResourceError: overlay O1MAIL not found
transform: REFUSED (fail-closed)
// a complete document is proven lossless
render-diff similarity: 1.0 status: PASS
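The fail-closed behavior can be illustrated as follows. This is a sketch, not the platform's renderer: the two-argument `renderAfp` signature, the `transform` wrapper and the resource names other than `O1MAIL` are assumptions, and the rendition is a placeholder string rather than real PDF output. The render-diff step is not modeled.

```typescript
class MissingResourceError extends Error {
  constructor(resource: string) {
    super(`resource ${resource} not found`);
    this.name = "MissingResourceError";
  }
}

interface AfpDocument {
  id: string;
  declaredResources: string[]; // form definitions, overlays, fonts
}

// Hypothetical renderer: checks every declared resource before producing output.
function renderAfp(doc: AfpDocument, available: Set<string>): string {
  for (const resource of doc.declaredResources) {
    if (!available.has(resource)) {
      throw new MissingResourceError(resource);
    }
  }
  return `rendition-of-${doc.id}`; // placeholder for real rendered bytes
}

type TransformResult =
  | { status: "OK"; rendition: string }
  | { status: "REFUSED"; reason: string };

// Fail closed: a missing resource yields a refusal, never a partial rendition.
function transform(doc: AfpDocument, available: Set<string>): TransformResult {
  try {
    return { status: "OK", rendition: renderAfp(doc, available) };
  } catch (err) {
    if (err instanceof Error && err.name === "MissingResourceError") {
      return { status: "REFUSED", reason: err.message };
    }
    throw err; // unknown failures are not swallowed
  }
}

// Usage: F1STMT and C0FONT are made-up resource names; O1MAIL is absent.
const availableResources = new Set<string>(["F1STMT", "C0FONT"]);
const completeDoc: AfpDocument = { id: "doc-1", declaredResources: ["F1STMT", "C0FONT"] };
const brokenDoc: AfpDocument = { id: "doc-2", declaredResources: ["F1STMT", "O1MAIL"] };
console.log(transform(completeDoc, availableResources).status); // OK
console.log(transform(brokenDoc, availableResources).status);   // REFUSED
```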
PHI handled, not leaked. Sensitive fields are detected during discovery. A field like MemberSSN with no clean target becomes an explicit quarantine row — kept with a target: null so it is never silently dropped, redacted before it is ever sent to the AI, and never written to a log. Privacy is enforced by the pipeline's structure, not by reviewer discipline.
ai_input: "•••-••-••••" (redacted before send)
audit log value: never recorded
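A structural version of this handling might look like the sketch below. It assumes SSNs appear as 3-2-4 digit groups, which is far narrower than real sensitive-field detection would be, and the names `redact`, `toAiInput`, `quarantineRow` and `auditEntry` are hypothetical.

```typescript
interface FieldSample {
  field: string;
  value: string;
}

interface MappingRow {
  source: string;
  target: string | null; // null keeps the field visible instead of dropping it
  quarantined: boolean;
}

// Assumption: SSNs are written as 3-2-4 digit groups.
const SSN_PATTERN = /\b\d{3}-\d{2}-\d{4}\b/g;

function redact(value: string): string {
  return value.replace(SSN_PATTERN, "•••-••-••••");
}

// Everything sent to the copilot goes through this function first.
function toAiInput(sample: FieldSample): FieldSample {
  return { field: sample.field, value: redact(sample.value) };
}

// A sensitive field with no clean target becomes an explicit quarantine row.
function quarantineRow(source: string): MappingRow {
  return { source, target: null, quarantined: true };
}

// The audit entry type has no slot for a value, so a value cannot be logged.
function auditEntry(action: string, field: string): { action: string; field: string } {
  return { action, field };
}

// Usage
const sample: FieldSample = { field: "MemberSSN", value: "123-45-6789" };
console.log(toAiInput(sample).value);            // •••-••-••••
console.log(quarantineRow("MemberSSN").target);  // null
console.log(JSON.stringify(auditEntry("quarantine", sample.field)));
```

The point of the last function is that the restriction lives in the type: there is no field in which a reviewer or developer could accidentally place the raw value.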
6 · Evidence & audit
Auditor-ready proof — emitted, not assembled.
Compliance is a property of the platform, not a report you write afterward. Two mechanisms make every wave defensible:
- Hash-chained audit ledger. Every action — discover, map, approve, load, validate — appends to an append-only ledger where each entry includes the hash of the previous one. Tamper with any record and verification fails; the evidence pack itself refuses to go green if the chain is broken.
- Signed, reproducible evidence pack. Each wave emits an HTML + JSON pack: reconciled counts, verified checksums, render-diff results, and a signature. Because the engine is deterministic, the same input yields the same signature on every run — an independently verifiable claim.
These map directly to the controls regulators already ask about:
| Framework | What the evidence pack demonstrates |
|---|---|
| SOX | Completeness & integrity of financial records; segregation of duties via the approval gate |
| HIPAA | PHI minimization, quarantine, and redaction; access logging |
| PCI-DSS | Sensitive-field handling and tamper-evident audit trail |
| SEC 17a-4 | Non-rewriteable, time-sequenced records with verifiable retention |
| ISO 27001 / SOC 2 | Change control, immutability, and reproducible verification |
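
The hash-chained ledger described in this section can be illustrated with a short sketch. It assumes SHA-256 over a JSON encoding of each entry's fields; the names `LedgerEntry`, `append` and `verify` are hypothetical, and signing of the evidence pack is not shown.

```typescript
import { createHash } from "node:crypto";

interface LedgerEntry {
  seq: number;
  action: string;
  detail: string;
  prevHash: string; // hash of the previous entry, or GENESIS for the first
  hash: string;
}

const GENESIS = "0".repeat(64);

function entryHash(seq: number, action: string, detail: string, prevHash: string): string {
  return createHash("sha256")
    .update(JSON.stringify([seq, action, detail, prevHash]))
    .digest("hex");
}

// Append-only: returns a new ledger with one more entry chained to the last.
function append(ledger: LedgerEntry[], action: string, detail: string): LedgerEntry[] {
  const prevHash = ledger.length === 0 ? GENESIS : ledger[ledger.length - 1].hash;
  const seq = ledger.length;
  const entry: LedgerEntry = { seq, action, detail, prevHash, hash: entryHash(seq, action, detail, prevHash) };
  return [...ledger, entry];
}

// Recompute every hash and link; any edited, removed or reordered entry fails.
function verify(ledger: LedgerEntry[]): boolean {
  let prev = GENESIS;
  for (let i = 0; i < ledger.length; i++) {
    const e = ledger[i];
    if (e.seq !== i || e.prevHash !== prev) return false;
    if (e.hash !== entryHash(e.seq, e.action, e.detail, e.prevHash)) return false;
    prev = e.hash;
  }
  return true;
}

// Usage: alter one record and verification fails.
let ledger: LedgerEntry[] = [];
ledger = append(ledger, "discover", "inventory complete");
ledger = append(ledger, "approve", "spec s1 approved");
ledger = append(ledger, "load", "wave 1 loaded");
const tampered = ledger.map((e, i) => (i === 1 ? { ...e, detail: "spec s2 approved" } : e));
console.log(verify(ledger), verify(tampered)); // true false
```

An evidence pack built on such a ledger could include the result of `verify` as one of its pass conditions, which matches the requirement that the pack refuses to go green on a broken chain.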
7 · Architecture
Every load-bearing seam is real; scale drops in.
The Phase 1 vertical slice is a single, genuinely runnable program that proves the entire thesis end-to-end — with every architecturally load-bearing component real: the canonical model, connector SDK, lineage, hash-chained audit, approval gate, transform, render-diff and reconciliation. Infrastructure that only adds operational scale is the documented next phase, and the seams are deliberately designed so it drops in without changing the engines.
| Capability | Today (proven) | Next phase (drop-in seam) |
|---|---|---|
| Orchestration | Checkpointed activities | Temporal workflows |
| Stage events | Audit ledger emissions | Kafka topics |
| Stores | File-backed behind a narrow API | Postgres + object store |
| CMOD target | Simulator emitting real arsxml/index | Real CMOD via ODWEK |
| AI copilot | Swappable heuristic provider | Azure OpenAI |
| Connectors | TypeScript reference impl | Polyglot (Java/Spring), language-agnostic SDK |
A separate read-only console (the control room shown in the live demo) visualizes runs, the mapping studio, validation, lineage and the audit ledger over a thin backend-for-frontend (BFF) API.
8 · Add-on · on the roadmap
CMOD Report Portal — getting reports back out.
Migration gets your reports into CMOD. The CMOD Report Portal is the planned companion web app that gets them back out — a fast, modern search-and-retrieve experience over everything MigrateIQ has loaded, replacing the legacy green-screen client.
- Search & retrieve. Full-text and indexed search across migrated CMOD report groups, with instant PDF preview in any browser — no thick client.
- Entitlement-scoped access. Built on the same IDOR-proof access layer proven in the platform: a token scoped to one customer that requests any other customer's document is denied with a 403.
- Audited retrieval. Every search and every retrieval writes to the same hash-chained ledger — so access to migrated records is as provable as the migration itself.
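
The entitlement check in the list above could be sketched as an ownership comparison on every lookup. The types and the `retrieve` function here are illustrative assumptions, not the portal's actual access layer, and the audit write is omitted.

```typescript
interface AccessToken {
  customerId: string;
}

interface StoredDocument {
  id: string;
  ownerCustomerId: string;
}

interface AccessResult {
  httpStatus: 200 | 403 | 404;
  documentId?: string;
}

// Ownership is checked on every lookup, so guessing another customer's
// document ID yields a 403 rather than the document.
function retrieve(
  token: AccessToken,
  docId: string,
  store: Map<string, StoredDocument>,
): AccessResult {
  const doc = store.get(docId);
  if (doc === undefined) {
    return { httpStatus: 404 };
  }
  if (doc.ownerCustomerId !== token.customerId) {
    return { httpStatus: 403 };
  }
  return { httpStatus: 200, documentId: doc.id };
}

// Usage: cust-1 can read its own document but not cust-2's.
const documentStore = new Map<string, StoredDocument>();
documentStore.set("doc-a", { id: "doc-a", ownerCustomerId: "cust-1" });
documentStore.set("doc-b", { id: "doc-b", ownerCustomerId: "cust-2" });
const tokenForCust1: AccessToken = { customerId: "cust-1" };
console.log(retrieve(tokenForCust1, "doc-a", documentStore).httpStatus); // 200
console.log(retrieve(tokenForCust1, "doc-b", documentStore).httpStatus); // 403
```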
9 · Roadmap
From proof to platform.
- Phase 1 · now. End-to-end AFP migration with a green evidence pack, restartable load, AI mapping copilot, PHI handling, and an IDOR-proof secure access layer, all runnable today.
- Phase 2 · next. Operational scale: Temporal, Kafka, Postgres, real CMOD via ODWEK, Azure OpenAI copilot, and polyglot connectors, dropping into seams already in place.
- Add-on. CMOD Report Portal: search, preview and retrieve migrated reports, entitlement-scoped and fully audited.
- Beyond. Additional sources (FileNet, Documentum, OnBase) and targets, on the same canonical model and the same evidence guarantee.
Retire the legacy archive. Keep the proof.
The platform is runnable today. Walk through the executive deck, or open the live control-room console and see the green evidence pack for yourself.