
Evidence Architecture

Six layers. Every quantity defensible.

The data model that makes 'click any number, see the source' possible. A technical deep-dive for engineering reviewers, security teams, and customers running compliance evaluations.

The six layers

From ingestion to export, evidence-preserving end-to-end

L1

Ingestion

Vector PDFs (born-vector engineering drawings) are extracted directly. Raster PDFs and scans go through the vision-model OCR pipeline. Both produce structured tokens with sheet + region + page-coordinate metadata captured at this layer.

Data model (simplified)

ingestion_artifact { id, project_id, sheet_id, ingestion_class: 'vector'|'raster', source_uri, raw_extraction_blob, processed_at }

Failure mode handled at this layer

OCR fails on heavily skewed scans → falls back to the lower-confidence raster pipeline and is flagged in the review queue rather than silently dropped. Vector parser fails on encrypted PDFs → surfaced to the user with a clear remediation path.
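The L1 routing decision above can be sketched as follows. This is a minimal illustration, not the shipped pipeline: the names (route_sheet, IngestionArtifact, needs_review) and the example sheet IDs are assumptions.

```python
from dataclasses import dataclass

@dataclass
class IngestionArtifact:
    sheet_id: str
    ingestion_class: str   # 'vector' | 'raster', as in the data model above
    needs_review: bool     # flagged for the review queue, never silently dropped

def route_sheet(has_vector_text: bool, ocr_succeeded: bool, sheet_id: str) -> IngestionArtifact:
    """Prefer direct vector extraction; fall back to raster OCR, flagging failures."""
    if has_vector_text:
        return IngestionArtifact(sheet_id, "vector", needs_review=False)
    if ocr_succeeded:
        return IngestionArtifact(sheet_id, "raster", needs_review=False)
    # OCR failed (e.g. heavily skewed scan): keep the lower-confidence raster
    # result but flag it for human review instead of dropping the sheet.
    return IngestionArtifact(sheet_id, "raster", needs_review=True)
```

The key design point is the last branch: an extraction failure degrades to a flagged artifact, so nothing leaves the pipeline without a record.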

L2

Recognizer

A per-trade-context symbol recognizer matches ingestion tokens against the org's symbol library. Each detection captures a confidence score, the recognizer model version, and the symbol-library entry used. An active-learning pipeline updates symbol entries on confirmed annotations.

Data model (simplified)

detection { id, ingestion_artifact_id, symbol_library_entry_id, confidence, recognizer_model_version, region_polygon, page_coords, created_at }

Failure mode handled at this layer

Recognizer mis-classifies a custom symbol → confidence stays below review-queue floor → estimator confirms or corrects → annotation feeds active-learning loop → next training round sharpens the recognizer.
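The review-queue floor in the chain above amounts to a simple triage rule. A hedged sketch: the threshold value and the function name are illustrative assumptions, not the product's actual configuration.

```python
REVIEW_QUEUE_FLOOR = 0.85  # hypothetical confidence floor; the real value is tunable

def triage(confidence: float) -> str:
    """Route a detection: auto-accept above the floor, else queue for an estimator."""
    return "auto_accept" if confidence >= REVIEW_QUEUE_FLOOR else "review_queue"
```

A mis-classified custom symbol typically scores low, so it lands in the review queue, where the estimator's correction becomes a training annotation.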

L3

Spec linking

Detected items link bi-directionally to spec-section paragraphs via the spec-link resolver, which weights three signals: CSI match 40%, keyword match 35%, equipment-tag match 25%. Each link carries a score + evidence path that survives serialization to the audit log.

Data model (simplified)

spec_link { id, detection_id, spec_section_id, paragraph_offset, score, evidence_signals[], created_at }
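The composite score stored on spec_link follows directly from the stated weights (CSI 40% / keyword 35% / equipment-tag 25%). A sketch of the combination step only; how each per-signal score is computed is elided, and the names are illustrative.

```python
# Weights from the resolver description above.
WEIGHTS = {"csi": 0.40, "keyword": 0.35, "equipment_tag": 0.25}

def spec_link_score(signals: dict[str, float]) -> float:
    """Combine per-signal scores (each in [0, 1]) into the composite link score."""
    return sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)
```

A missing signal contributes zero, so a link supported only by a CSI match caps out at 0.40, which keeps single-signal links below a plausible review threshold.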

Failure mode handled at this layer

Spec section absent or rewritten in addendum → spec-link goes stale → diff service flags broken back-references → estimator re-anchors during review.

L4

Evidence trail

Every takeoff line carries a complete evidence chain: detection → spec link → recognizer model version → ingestion artifact. Click any quantity in the UI and you walk the chain to the original sheet + region. Audit log records every chain traversal.

Data model (simplified)

takeoff_line { id, project_id, csi, qty, unit_cost, ext_cost, source_chain: [detection_ids, spec_link_ids, ...], chain_hash }
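One way the chain_hash field could be computed: hash a canonical serialization of the source chain so any edit to the chain is detectable. The canonicalization choice (sorted-key JSON) is an assumption for illustration.

```python
import hashlib
import json

def chain_hash(source_chain: dict) -> str:
    """Hash a canonical serialization of the evidence chain so edits are detectable."""
    canonical = json.dumps(source_chain, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()
```

Because the serialization is canonical, the same chain always produces the same digest, and any change to a detection or spec-link ID changes the hash.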

Failure mode handled at this layer

User asks 'where did this 212-receptacle count come from?' → click-through walks the entire chain in <500ms. No black-box outputs.

L5

Audit log

Append-only audit log captures every chain modification, evidence-trail traversal, recognizer training event, and human override. Hash-chain validation makes tampering detectable. Replicated to write-once object storage with 7-year retention.

Data model (simplified)

audit_event { id, event_type, actor_id, target_object_type, target_object_id, payload_hash, prev_event_hash, immutable_at, retained_until }
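The prev_event_hash field is what makes tampering detectable: each event commits to its predecessor, forming a hash chain that can be re-validated end to end. A minimal sketch; the field names follow the audit_event model above, and the genesis value and hashing scheme are illustrative assumptions.

```python
import hashlib

GENESIS = "0" * 64  # assumed sentinel for the first event's prev_event_hash

def event_hash(payload_hash: str, prev_event_hash: str) -> str:
    """Digest an event by committing to both its payload and its predecessor."""
    return hashlib.sha256((prev_event_hash + payload_hash).encode()).hexdigest()

def validate_chain(events: list[dict]) -> bool:
    """Walk the log in order; every event must reference its predecessor's hash."""
    prev = GENESIS
    for ev in events:
        if ev["prev_event_hash"] != prev:
            return False  # tampering, deletion, or reordering detected
        prev = event_hash(ev["payload_hash"], ev["prev_event_hash"])
    return True
```

Altering, deleting, or reordering any historical event breaks every subsequent link, so a full-chain walk answers the auditor's question with a yes/no integrity check.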

Failure mode handled at this layer

Federal audit asks 'who changed this quantity, when, why' → audit log returns the full event chain. Hash-chain validates no tampering. Object-lock prevents delete.

L6

Export + proposal

Customer-facing exports (proposal PDF, RFQ packets, audit-trail CSV) preserve the evidence chain in human-readable form. Each line in the proposal has a footnote referencing the sheet, region, and recognizer call that produced it.

Data model (simplified)

proposal_artifact { id, project_id, format: 'pdf'|'csv'|'docx', evidence_chain_embedded: bool, generated_at, sha256 }
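The sha256 field lets anyone re-verify an export after the fact. A sketch of that verification under the assumption that the stored digest covers the artifact's raw bytes; the function names are illustrative, not the product API.

```python
import hashlib

def artifact_digest(content: bytes) -> str:
    """Compute the digest recorded in proposal_artifact.sha256."""
    return hashlib.sha256(content).hexdigest()

def verify_export(content: bytes, recorded_sha256: str) -> bool:
    """A customer or auditor re-hashes the file and compares to the stored digest."""
    return artifact_digest(content) == recorded_sha256
```

This closes the loop from L6 back to L5: the audit log records which artifact was generated, and the digest proves the file in hand is that artifact, unmodified.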

Failure mode handled at this layer

Customer needs to defend a number in front of an owner → the proposal PDF itself contains the evidence references. Owner clicks the footnote, sees the sheet excerpt embedded in the audit-trail attachment.

Why this architecture

What it enables that conventional black-box AI doesn't

Defensible-bid documentation

Every quantity has a paper trail. Customers defend bids in front of owners with the click-through evidence; auditors get the same view. No 'just trust the AI' moments.

Active-learning that compounds

Because every detection carries its source, customer corrections feed back into the symbol library with full context. The recognizer becomes shop-specific within weeks rather than starting from a generic prior.

Audit-ready by construction

The audit log isn't a bolt-on; it's structurally part of every chain modification. Federal procurement audits and SOX-style compliance reviews pass because the data model was designed for them from the start.

Architecture review?

We'll do a 60-minute walkthrough.

For technical-evaluator audiences: senior engineers, security teams, federal-procurement officers. We'll walk the data model, the audit log, the active-learning loop, and any specific evidence-chain scenarios you want to validate. NDA optional.

Evidence Architecture — OmniTakeoff