A modern reincarnation of Catch Notes (3banana, 2010–2013), rebuilt for multimodal AI. Text, images, audio, video, forwarded chats — captured, understood, organized, and mirrored as plain markdown into a git repository you own.
Inside CaptureX BO, every persisted capture body is composed of exactly four top-level H2 sections, in canonical order.
1. ## Audit Trail: the steps and models that ran
2. ## Capture File: source, attachments, geo, lineage tags
3. ## Summary: one paragraph, written by the crafter
4. ## Note: the reusable content, structured with H3+

The schema is enforced at the BO layer (composePlazaCaptureNoteBody), defended at the mobile chat-render layer, and asserted by the auditor before persistence.
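The four-section rule can be illustrated with a small check. This is a minimal sketch, not the auditor or composePlazaCaptureNoteBody itself: the function name `check_capture_body` and the error wording are hypothetical, and the section order is simply taken as listed above.

```python
from __future__ import annotations

import re

# Section titles in the order the text lists them.
CANONICAL_SECTIONS = ["Audit Trail", "Capture File", "Summary", "Note"]

# Matches H2 headings only ("## Title"). "### ..." does not match because
# the character right after "##" must be a space or tab.
H2_PATTERN = re.compile(r"^##[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def check_capture_body(body: str) -> list[str]:
    """Return a list of schema problems; an empty list means the body passes.

    Hypothetical helper for illustration only.
    """
    found = H2_PATTERN.findall(body)
    problems = []
    if len(found) != len(CANONICAL_SECTIONS):
        problems.append(f"expected 4 H2 sections, found {len(found)}")
    if found != CANONICAL_SECTIONS:
        problems.append(f"expected order {CANONICAL_SECTIONS}, got {found}")
    return problems
```

Note that this sketch scans raw text, so a line starting with `## ` inside a fenced code block would be counted as a heading; a real validator would likely work on a parsed markdown tree.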
A realm is your personal partition: work, research, family, side-project, client A, client B. Each realm has its own Plaza skill bundle, its own default Inbox and Memory space, its own mirror folder, and its own data driver.
Realm · work
Realm · research
Realm · family
All three coexist; one click in the AppBar switches the active realm, refreshes the skill plaza, and re-scopes search.
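One way to picture a realm is as a small record plus a switcher that re-scopes search. The sketch below is an assumption-laden illustration: the field names, the `RealmSwitcher` class, and the capture dictionaries are all hypothetical and are not taken from the CaptureX codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Realm:
    # Field names are illustrative; they mirror the per-realm items listed above.
    name: str           # e.g. "work", "research", "family"
    skill_bundle: str   # Plaza skill bundle for this realm
    inbox_space: str    # default Inbox space
    memory_space: str   # default Memory space
    mirror_folder: str  # folder the realm mirrors into
    data_driver: str    # e.g. "local_git"


class RealmSwitcher:
    """Holds several realms and tracks which one is active (hypothetical)."""

    def __init__(self, realms):
        self._realms = {realm.name: realm for realm in realms}
        self.active = realms[0]

    def switch(self, name):
        # Raises KeyError for an unknown realm name.
        self.active = self._realms[name]
        return self.active

    def search(self, captures, term):
        # Search only sees captures that belong to the active realm.
        needle = term.lower()
        return [
            c for c in captures
            if c["realm"] == self.active.name and needle in c["text"].lower()
        ]
```

Switching the active realm changes what `search` returns without touching the captures of the other realms, which is the isolation the realm model describes.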
Set a realm to use the local_git data driver and CaptureX mirrors that realm into a single absolute folder on disk, as plain markdown with a deterministic layout.
The git mirror is the canonical store in v1 — read it with any text editor, version it like code, and back it up wherever you back up your code.
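"Deterministic layout" means the same capture always maps to the same file path. The actual CaptureX layout is not described here, so the sketch below invents one purely for illustration: the `year/month/<capture id>.md` scheme and the `mirror_path` function are hypothetical.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def mirror_path(root: str, created_at: datetime, capture_id: str) -> PurePosixPath:
    """Map a capture to a stable file path under an absolute mirror root.

    Hypothetical layout: <root>/<YYYY>/<MM>/<capture_id>.md
    """
    if not root.startswith("/"):
        raise ValueError("mirror root must be an absolute path")
    if created_at.tzinfo is None:
        raise ValueError("created_at must be timezone-aware")
    # Normalize to UTC so the path does not depend on the machine's timezone.
    utc = created_at.astimezone(timezone.utc)
    return PurePosixPath(root) / f"{utc:%Y}" / f"{utc:%m}" / f"{capture_id}.md"
```

Because the path depends only on the capture's own fields, re-running the mirror produces the same tree, which keeps git diffs limited to real content changes.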
CaptureX is not a generic note app with a bolt-on sync layer. The backend (CaptureX BO) is a NestJS service you run locally or on your own infra — mobile and desktop clients talk to it over HTTP only.
Pair it with your own LLM keys, and deployment comes down to one operator running python ops.py restart.
Capture clients (Mobile + Desktop apps, Flutter) → CaptureX BO (NestJS) → Per-realm git repository
Receipts, whiteboards, voice memos, screenshots, forwarded chats, PDFs — all funnel through the same orchestrator with the right vendor and the right prompt.
CamScanner-style filter strip (Original, Omnifix, Enhance, No-shadow, Eco, B&W, …), AI deblur / enhance text / remove background, default 1280px JPEG q65 preset.
Voice memos transcribed and summarized; cloud TTS through BO HTTP only — default Volcengine / Doubao, OpenAI-compat fallback. Read-aloud on every capture.
Drag, paste, or share PDFs and Office docs into the AgentSheet composer. BO extracts text + structure; the auditor flags low-quality extraction.
Forward a chat thread from any messenger and it lands as a capture with the original quotes preserved, the senders tagged, and the action items extracted.
Photo EXIF GPS + text auto-parse + map-picker (Google / Apple / Amap / inline OSM). Every capture remembers where it came from.
Vision LLM through a vendor-slot (Gemini default), TTS through Volcengine/Doubao with OpenAI-compat fallback, and keys via Service Profile or external secrets file — never client-to-vendor.
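The "default vendor with fallback" pattern mentioned for TTS can be sketched as an ordered list of vendors tried in turn. This is a generic illustration, not the BO implementation: `synthesize_with_fallback` and the vendor callables are hypothetical stand-ins, and no real vendor API is called.

```python
from typing import Callable, Sequence, Tuple

# A synthesizer takes text and returns audio bytes (assumed shape).
Synthesizer = Callable[[str], bytes]


def synthesize_with_fallback(
    text: str, vendors: Sequence[Tuple[str, Synthesizer]]
) -> Tuple[str, bytes]:
    """Try each (name, synthesizer) pair in order; return the first success."""
    errors = []
    for name, synth in vendors:
        try:
            return name, synth(text)
        except Exception as exc:  # any vendor failure moves on to the next slot
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all TTS vendors failed: " + "; ".join(errors))
```

In the architecture described above, a chain like this would run inside the backend, so vendor keys stay on the server and clients only ever see the BO HTTP endpoint.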
We design for the long arc: the GET /v1/captures endpoint supports cursor pagination with stub / full include modes, and the local_git driver imports in batches with a resumable checkpoint.
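Cursor pagination and a resumable checkpoint fit together naturally: persist the cursor after each batch and restart from it after a crash. The sketch below assumes a page shape of `{"items": [...], "next_cursor": ...}` and a JSON checkpoint file; both are assumptions, as are the names `import_all`, `fetch_page`, and `handle_batch`. In practice `fetch_page` would wrap a call to GET /v1/captures with the cursor and the chosen include mode, whose exact query parameters are not specified here.

```python
import json
from pathlib import Path


def import_all(fetch_page, handle_batch, checkpoint_file):
    """Drain a cursor-paginated listing in batches, resuming from a checkpoint.

    fetch_page(cursor) -> {"items": [...], "next_cursor": str or None}  (assumed)
    handle_batch(items) processes one page; if it raises, the checkpoint
    still points at the page that failed, so the next run retries it.
    Returns the number of items imported during this run.
    """
    path = Path(checkpoint_file)
    cursor = json.loads(path.read_text())["cursor"] if path.exists() else None
    imported = 0
    while True:
        page = fetch_page(cursor)
        handle_batch(page["items"])
        imported += len(page["items"])
        cursor = page["next_cursor"]
        if cursor is None:
            # Finished: remove the checkpoint so a later run starts fresh.
            path.unlink(missing_ok=True)
            return imported
        # Persist progress only after the batch was handled successfully.
        path.write_text(json.dumps({"cursor": cursor}))
```

A failed batch is retried on the next run, so `handle_batch` should be idempotent (for example, writing a capture file to the same path twice should be harmless).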