Trace lifecycle

This is the current operational path for a trace from local collection to reviewable SellTraces data.

1. Local discovery

The CLI looks for supported transcript locations and source kinds. Discovery does not upload transcript content. Current source adapters include Claude Code projects, Cursor SQLite state, Codex sessions, Gemini CLI, OpenCode, and the local trace index.

2. Scan and normalize

scanLocation() and scanChangedPath() parse source-specific files or SQLite rows into normalized trace objects through the shared ingest package. This is the first structural mutation: local source files become trace objects with source, source id, messages, timestamps, client/provider data, environment metadata, and optional fields such as title or summary.
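
As a rough illustration of that mutation, the sketch below turns a source-specific row into a normalized trace object. The `NormalizedTrace` field names, the `RawSourceRow` input shape, and `normalizeRow` are hypothetical and only mirror the fields listed above; the shared ingest package's real types may differ.

```typescript
// Hypothetical shapes; names are illustrative, not the ingest package's API.
interface TraceMessage {
  role: string;
  content: string;
  timestamp: string | null; // ISO 8601 in UTC, when the source records one
}

interface NormalizedTrace {
  source: string;   // e.g. "cursor"
  sourceId: string; // stable id within that source
  messages: TraceMessage[];
  startedAt: string | null;
  endedAt: string | null;
  client?: string;
  provider?: string;
  environment?: Record<string, string>;
  title?: string;   // optional fields stay absent when the source has none
  summary?: string;
}

// Stand-in for a source-specific record such as a SQLite row.
interface RawSourceRow {
  id: string;
  turns: { speaker: string; text: string; at?: string }[];
  title?: string;
}

function normalizeRow(source: string, row: RawSourceRow): NormalizedTrace {
  const messages: TraceMessage[] = row.turns.map((turn) => ({
    role: turn.speaker.toLowerCase(),
    content: turn.text,
    timestamp: turn.at ?? null,
  }));
  // ISO 8601 UTC strings in one format sort chronologically as plain strings.
  const stamps = messages
    .map((m) => m.timestamp)
    .filter((t): t is string => t !== null)
    .sort();
  const trace: NormalizedTrace = {
    source,
    sourceId: row.id,
    messages,
    startedAt: stamps[0] ?? null,
    endedAt: stamps[stamps.length - 1] ?? null,
  };
  if (row.title !== undefined) trace.title = row.title;
  return trace;
}
```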

3. Local upload plan

Every CLI upload path builds an upload plan before it sends any HTTP request:

    scan local source
      -> source policy
      -> blocked terms
      -> local PII preflight
      -> repeat-sync and duplicate filtering
      -> request chunking
      -> post accepted chunks

Local redaction is the first point at which an accepted trace can differ from the local transcript content. Long traces remain upload candidates: the CLI does not skip a trace because it exceeds the previous token-count, message-count, per-message content, or serialized byte thresholds. Request chunking isolates large accepted traces instead.
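
The plan stages can be sketched as a single pure function that runs before any network call. Everything here is an assumption made for illustration: `buildUploadPlan`, `PlanOptions`, the email-only redaction pattern, the choice to skip (rather than redact) traces that contain a blocked term, and the character-based chunk budget are not taken from the CLI.

```typescript
// Hypothetical planner; names, limits, and policies are illustrative only.
interface LocalTrace {
  source: string;
  sourceId: string;
  text: string;
}

interface UploadPlan {
  chunks: LocalTrace[][];
  skipped: { sourceId: string; reason: string }[];
}

interface PlanOptions {
  allowedSources: Set<string>;
  blockedTerms: string[];
  alreadySynced: Set<string>; // "source:sourceId" keys uploaded earlier
  maxChunkChars: number;
}

// Stand-in for the local PII preflight: redacts email-like strings only.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

function buildUploadPlan(traces: LocalTrace[], opts: PlanOptions): UploadPlan {
  const skipped: UploadPlan["skipped"] = [];
  const accepted: LocalTrace[] = [];
  const seen = new Set<string>();

  for (const trace of traces) {
    const key = `${trace.source}:${trace.sourceId}`;
    if (!opts.allowedSources.has(trace.source)) {
      skipped.push({ sourceId: trace.sourceId, reason: "source policy" });
      continue;
    }
    const lowered = trace.text.toLowerCase();
    if (opts.blockedTerms.some((term) => lowered.includes(term.toLowerCase()))) {
      skipped.push({ sourceId: trace.sourceId, reason: "blocked term" });
      continue;
    }
    // From this point the accepted trace may differ from the local transcript.
    const redacted = { ...trace, text: trace.text.replace(EMAIL_PATTERN, "[REDACTED_EMAIL]") };
    if (opts.alreadySynced.has(key) || seen.has(key)) {
      skipped.push({ sourceId: trace.sourceId, reason: "duplicate" });
      continue;
    }
    seen.add(key);
    accepted.push(redacted);
  }

  // Chunking: a trace at or over the budget is never skipped; it gets its own chunk.
  const chunks: LocalTrace[][] = [];
  let current: LocalTrace[] = [];
  let size = 0;
  const flush = () => {
    if (current.length > 0) chunks.push(current);
    current = [];
    size = 0;
  };
  for (const trace of accepted) {
    if (trace.text.length >= opts.maxChunkChars) {
      flush();
      chunks.push([trace]);
      continue;
    }
    if (size + trace.text.length > opts.maxChunkChars) flush();
    current.push(trace);
    size += trace.text.length;
  }
  flush();
  return { chunks, skipped };
}
```

Keeping the planner free of I/O means the full plan, including skip reasons, can be inspected or tested before anything leaves the machine.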

4. Web ingest request

POST /api/ingest authenticates the user, validates JSON or ZIP upload shape, recounts CLI JSON message tokens, writes the upload payload to blob storage, inserts an ingest_jobs row, and returns 202. Heavy parse, scrub, score, and write work happens outside the web request.
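
A minimal sketch of that accept-then-defer shape follows, using in-memory stand-ins for authentication, blob storage, and the ingest_jobs table. The `handleIngest` function, the `{ traces: [...] }` body shape, the token lookup, and the error status codes are assumptions for illustration; token recounting and real ZIP validation are left out.

```typescript
// Hypothetical handler; stores are in-memory stand-ins, not the real services.
interface IngestRequest {
  authToken: string | null;
  contentType: string;
  body: string;
}

interface IngestResponse {
  status: number;
  jobId?: string;
  error?: string;
}

interface IngestJobRow {
  id: string;
  userId: string;
  blobKey: string;
  status: "queued";
}

const usersByToken = new Map<string, string>([["token-123", "user-1"]]);
const uploadBlobs = new Map<string, string>();
const ingestJobs: IngestJobRow[] = [];

function handleIngest(req: IngestRequest): IngestResponse {
  // 1. Authenticate the user.
  const userId = req.authToken ? usersByToken.get(req.authToken) : undefined;
  if (userId === undefined) return { status: 401, error: "unauthenticated" };

  // 2. Validate the upload shape only; no parsing, scrubbing, or scoring here.
  if (req.contentType === "application/json") {
    let parsed: unknown;
    try {
      parsed = JSON.parse(req.body);
    } catch {
      return { status: 400, error: "invalid JSON" };
    }
    const traces =
      typeof parsed === "object" && parsed !== null
        ? (parsed as { traces?: unknown }).traces
        : undefined;
    if (!Array.isArray(traces)) return { status: 400, error: "expected a traces array" };
  } else if (req.contentType !== "application/zip") {
    return { status: 415, error: "unsupported upload type" };
  }

  // 3. Persist the raw payload, enqueue a job, and return immediately.
  const id = `job-${ingestJobs.length + 1}`;
  const blobKey = `uploads/${userId}/${id}`;
  uploadBlobs.set(blobKey, req.body);
  ingestJobs.push({ id, userId, blobKey, status: "queued" });
  return { status: 202, jobId: id };
}
```

Returning 202 signals that the upload was accepted for processing, not that it was processed; the outcome is only known once the worker finishes the job.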

5. Worker claim and parse

The ingest worker claims queued jobs with row locking, sets the status to parsing, loads the raw upload blob, parses ZIP exports or decodes JSON traces, and then marks the job ingesting. Job statuses move along one of two paths:

    queued -> parsing -> ingesting -> done
    queued -> parsing/ingesting -> failed
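
The two status paths can be encoded as a small transition table. This is a sketch, not the worker's code: `JobStatus`, `ALLOWED_TRANSITIONS`, and `transition` are hypothetical names, and treating done and failed as terminal (no retry back to queued) is an assumption.

```typescript
type JobStatus = "queued" | "parsing" | "ingesting" | "done" | "failed";

// Each status maps to the statuses it may move to next.
// Assumption: "done" and "failed" are terminal.
const ALLOWED_TRANSITIONS: Record<JobStatus, JobStatus[]> = {
  queued: ["parsing"],
  parsing: ["ingesting", "failed"],
  ingesting: ["done", "failed"],
  done: [],
  failed: [],
};

function transition(current: JobStatus, next: JobStatus): JobStatus {
  if (!ALLOWED_TRANSITIONS[current].includes(next)) {
    throw new Error(`illegal job transition: ${current} -> ${next}`);
  }
  return next;
}
```

Routing every status update through one guard like this makes it harder for a job to jump from queued straight to done without being parsed.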

6. Server pipeline

For each trace, the server pipeline:
  1. Normalizes the trace again.
  2. Computes an original_hash from the normalized role/content text, unless the upload already supplies one.
  3. Runs server-side scrub.
  4. Inserts a pii_rejections row and stops if scrub rejects the trace.
  5. Scores quality and uniqueness.
  6. Estimates value.
  7. Checks for existing duplicates by (user_id, source, source_id) or original_hash.
  8. Inserts a new traces row and message rows for new traces.
  9. Writes per-trace raw and derived blobs.

The derived blob is the scrubbed trace JSON used for downloads.
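
The ordering of those steps, in particular that scrub rejection stops the pipeline before any trace row is written, can be sketched as follows. All names are hypothetical, the maps stand in for database tables, and the hashing, scrubbing, and scoring functions are injected because their real implementations are not described here. Value estimation, message rows, and blob writes are omitted, and matching original_hash across all users is an assumption.

```typescript
// Hypothetical pipeline skeleton; stores are in-memory stand-ins for tables.
interface PipelineMessage {
  role: string;
  content: string;
}

interface PipelineTrace {
  userId: string;
  source: string;
  sourceId: string;
  messages: PipelineMessage[];
  originalHash?: string;
}

type ScrubResult =
  | { kind: "clean"; trace: PipelineTrace }
  | { kind: "rejected"; reason: string };

type PipelineOutcome =
  | { kind: "rejected"; reason: string }
  | { kind: "duplicate"; existingKey: string }
  | { kind: "inserted"; key: string; quality: number };

interface PipelineDeps {
  hash: (text: string) => string;
  scrub: (trace: PipelineTrace) => ScrubResult;
  scoreQuality: (trace: PipelineTrace) => number;
}

interface PipelineStore {
  tracesByKey: Map<string, PipelineTrace>; // keyed by user_id|source|source_id
  keyByHash: Map<string, string>;          // original_hash -> trace key
  piiRejections: { key: string; reason: string }[];
}

function runPipeline(
  input: PipelineTrace,
  deps: PipelineDeps,
  store: PipelineStore,
): PipelineOutcome {
  // Step 1: normalize again (stand-in: trim and lowercase roles).
  const trace: PipelineTrace = {
    ...input,
    messages: input.messages.map((m) => ({
      role: m.role.trim().toLowerCase(),
      content: m.content.trim(),
    })),
  };
  // Step 2: hash normalized role/content text unless a hash was supplied.
  const originalHash =
    trace.originalHash ??
    deps.hash(trace.messages.map((m) => `${m.role}:${m.content}`).join("\n"));
  const key = [trace.userId, trace.source, trace.sourceId].join("|");

  // Steps 3-4: scrub; a rejection is recorded and nothing else is written.
  const scrubbed = deps.scrub(trace);
  if (scrubbed.kind === "rejected") {
    store.piiRejections.push({ key, reason: scrubbed.reason });
    return { kind: "rejected", reason: scrubbed.reason };
  }

  // Step 5: score (step 6, value estimation, is omitted in this sketch).
  const quality = deps.scoreQuality(scrubbed.trace);

  // Step 7: duplicate check by composite key, then by content hash.
  const existingKey = store.tracesByKey.has(key) ? key : store.keyByHash.get(originalHash);
  if (existingKey !== undefined) return { kind: "duplicate", existingKey };

  // Step 8: insert the new trace (steps 8-9 message rows and blobs omitted).
  store.tracesByKey.set(key, scrubbed.trace);
  store.keyByHash.set(originalHash, key);
  return { kind: "inserted", key, quality };
}
```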

7. Duplicate handling

Duplicate uploads do not create new trace rows. The pipeline may refresh nullable metadata on the canonical trace, such as model, summary, environment, timing, task type, or quality metadata. It does not rewrite existing message rows or per-trace blobs during that metadata refresh.
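
One possible reading of that refresh is sketched below. It assumes, and this is only an assumption, that a refresh fills metadata fields that are currently null on the canonical trace and never overwrites populated ones; the real pipeline may apply a different rule. `CanonicalTrace`, `TraceMetadata`, and `refreshMetadata` are hypothetical names, and only three of the metadata fields are modeled.

```typescript
// Hypothetical metadata refresh; assumes fill-only-if-null semantics.
interface TraceMetadata {
  model: string | null;
  summary: string | null;
  taskType: string | null;
}

interface CanonicalTrace {
  id: string;
  messages: string[];
  metadata: TraceMetadata;
}

function refreshMetadata(
  canonical: CanonicalTrace,
  incoming: Partial<TraceMetadata>,
): CanonicalTrace {
  const metadata: TraceMetadata = { ...canonical.metadata };
  for (const field of Object.keys(metadata) as (keyof TraceMetadata)[]) {
    const value = incoming[field];
    // Assumption: only null fields are filled; existing values are kept.
    if (metadata[field] === null && value !== undefined && value !== null) {
      metadata[field] = value;
    }
  }
  // Message rows are passed through untouched, matching the rule above.
  return { ...canonical, metadata };
}
```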

8. Dashboard review

Dashboard include/exclude actions update traces.sell_excluded for unsold traces. Once a trace has a sale_traces row, include/exclude mutations are rejected because the trace has already been sold. Trace downloads stream the derived scrubbed blob and do not mutate the trace.
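
The sold-trace guard can be sketched as a check that runs before the flag is changed. `ReviewTrace`, `setSellExcluded`, and the `soldTraceIds` set (standing in for a lookup against sale_traces) are hypothetical names used for illustration.

```typescript
// Hypothetical guard; soldTraceIds stands in for a sale_traces lookup.
interface ReviewTrace {
  id: string;
  sellExcluded: boolean;
}

function setSellExcluded(
  trace: ReviewTrace,
  soldTraceIds: Set<string>,
  excluded: boolean,
): ReviewTrace {
  if (soldTraceIds.has(trace.id)) {
    throw new Error(`trace ${trace.id} has already been sold`);
  }
  // Return a new object so the caller decides when to persist the change.
  return { ...trace, sellExcluded: excluded };
}
```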

9. Sale and payout state

Sales attach traces through sale_traces. Payouts attach user payment state through payouts. Once sold, a trace remains part of that sale record even if dashboard filters or review state change later.