Ingestion
Vector PDFs (born-vector engineering drawings) are extracted directly. Raster PDFs and scans go through the vision-model OCR pipeline. Both paths produce structured tokens, with sheet, region, and page-coordinate metadata captured at this layer.
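A minimal sketch of the two-path routing described above, assuming the ingestion class is already known for each sheet. The names Token, extract_vector, extract_raster, EXTRACTORS, and ingest are hypothetical, and both extractors are stubs standing in for the real direct-extraction and vision-model OCR steps:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Token:
    """One structured token with the metadata captured at this layer."""
    text: str
    sheet_id: str
    region: str                               # hypothetical region label
    bbox: Tuple[float, float, float, float]   # page coordinates (x0, y0, x1, y1)


# An extractor maps (source_uri, sheet_id) to a list of tokens.
Extractor = Callable[[str, str], List[Token]]


def extract_vector(source_uri: str, sheet_id: str) -> List[Token]:
    # Stub: stands in for direct extraction from a born-vector PDF.
    return [Token("vector-stub", sheet_id, "unknown", (0.0, 0.0, 0.0, 0.0))]


def extract_raster(source_uri: str, sheet_id: str) -> List[Token]:
    # Stub: stands in for the vision-model OCR pipeline.
    return [Token("raster-stub", sheet_id, "unknown", (0.0, 0.0, 0.0, 0.0))]


EXTRACTORS: Dict[str, Extractor] = {
    "vector": extract_vector,
    "raster": extract_raster,
}


def ingest(source_uri: str, sheet_id: str, ingestion_class: str) -> List[Token]:
    """Route a sheet to the extractor for its ingestion class."""
    extractor = EXTRACTORS.get(ingestion_class)
    if extractor is None:
        raise ValueError(f"unknown ingestion_class: {ingestion_class!r}")
    return extractor(source_uri, sheet_id)
```

Keeping both paths behind one Token shape means downstream layers do not need to know which pipeline produced a token.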
Data model (simplified)
ingestion_artifact { id, project_id, sheet_id, ingestion_class: 'vector'|'raster', source_uri, raw_extraction_blob, processed_at }
Failure mode handled at this layer
OCR fails on heavily skewed scans → the sheet falls back to the lower-confidence raster pipeline and is flagged in the review queue rather than silently dropped. Vector parser fails on encrypted PDFs → the failure is surfaced to the user with a clear remediation path.
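The two failure paths above might be sketched as follows. This is illustrative only: OcrFailure, EncryptedPdfError, RasterResult, ingest_raster, and ingest_vector are hypothetical names, the review queue is modeled as a plain list, and the remediation message is placeholder wording rather than the product's actual text.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


class OcrFailure(Exception):
    """Primary OCR could not process the scan (for example, heavy skew)."""


class EncryptedPdfError(Exception):
    """User-facing error; the message carries the remediation step."""


@dataclass
class RasterResult:
    tokens: List[str]
    low_confidence: bool   # True when the fallback pipeline produced the tokens


def ingest_raster(
    sheet_id: str,
    primary: Callable[[str], List[str]],
    fallback: Callable[[str], List[str]],
    review_queue: List[Tuple[str, str]],
) -> RasterResult:
    """Try primary OCR; on failure, flag the sheet for review and fall back."""
    try:
        return RasterResult(primary(sheet_id), low_confidence=False)
    except OcrFailure as exc:
        # Flag for human review so the sheet is never silently dropped.
        review_queue.append((sheet_id, str(exc)))
        return RasterResult(fallback(sheet_id), low_confidence=True)


def ingest_vector(
    sheet_id: str,
    parse: Callable[[str], List[str]],
    is_encrypted: Callable[[str], bool],
) -> List[str]:
    """Parse a vector PDF, surfacing encryption as a user-facing error."""
    if is_encrypted(sheet_id):
        # Placeholder remediation wording (assumption).
        raise EncryptedPdfError(
            f"Sheet {sheet_id} is encrypted. Upload an unencrypted copy and retry."
        )
    return parse(sheet_id)
```

Carrying low_confidence on the result lets later layers treat fallback output differently from primary output, while the queue entry records why the sheet needs review.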