Parser Service

Understand the isolated parser service that extracts text and page metadata from source files.

The parser service is a separate, authenticated service. It handles document parsing and the provider-specific LLM proxy paths that cannot run directly inside Convex.

It is stateless and protected by a shared secret.
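As an illustration of what a shared-secret check can look like, here is a minimal sketch. The header name `x-parser-secret` and the function names are assumptions made for this example, not the service's actual interface. It hashes both values so the buffers have equal length, then compares them with Node's `timingSafeEqual` to avoid leaking information through comparison timing.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hypothetical header name; the real service may use a different one.
const SECRET_HEADER = "x-parser-secret";

// Hash both values so the buffers always have the same length, which
// timingSafeEqual requires, then compare them in constant time.
function secretsMatch(provided: string, expected: string): boolean {
  const a = createHash("sha256").update(provided).digest();
  const b = createHash("sha256").update(expected).digest();
  return timingSafeEqual(a, b);
}

// Returns true only when the request carries the configured secret.
// An empty configured secret is treated as "not configured" and rejected.
function isAuthorized(
  headers: Record<string, string | undefined>,
  expected: string,
): boolean {
  const provided = headers[SECRET_HEADER];
  if (provided === undefined || expected.length === 0) return false;
  return secretsMatch(provided, expected);
}
```

Because the check depends only on the request and the configured secret, it fits a stateless service: no session or per-client state is needed.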

Responsibilities

Responsibility            Detail
Download source files     Only from configured allowed hosts.
Enforce bounds            Max file bytes, max pages, OCR settings, and parser concurrency.
Extract text              Parse PDF and supported document formats into text and page metadata.
OCR when enabled          Use selective OCR for scanned pages when configured.
Return structured errors  Preserve enough failure detail for retry and user messaging.
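The host allow-list, the size bounds, and the structured errors listed above can be sketched together. Everything named here (`ParserLimits`, `ParserError`, the error codes, and the exact-hostname matching rule) is a hypothetical shape chosen for illustration; the service's real configuration and error format may differ.

```typescript
// Hypothetical limits; the real service reads its own configuration.
interface ParserLimits {
  allowedHosts: string[];
  maxFileBytes: number;
  maxPages: number;
}

// Hypothetical structured error: a stable code for retry logic,
// plus a message that can be surfaced to the user.
interface ParserError {
  code: "INVALID_URL" | "HOST_NOT_ALLOWED" | "FILE_TOO_LARGE" | "TOO_MANY_PAGES";
  message: string;
  retryable: boolean;
}

// Reject any source URL whose hostname is not on the allow-list.
// Matching is by exact hostname (an assumption for this sketch).
function checkSourceUrl(rawUrl: string, limits: ParserLimits): ParserError | null {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return { code: "INVALID_URL", message: "Source URL could not be parsed.", retryable: false };
  }
  const host = url.hostname.toLowerCase();
  const allowed = limits.allowedHosts.some((h) => h.toLowerCase() === host);
  if (!allowed) {
    return { code: "HOST_NOT_ALLOWED", message: `Host ${host} is not allowed.`, retryable: false };
  }
  return null;
}

// Enforce the file-size and page-count bounds.
function checkBounds(fileBytes: number, pageCount: number, limits: ParserLimits): ParserError | null {
  if (fileBytes > limits.maxFileBytes) {
    return { code: "FILE_TOO_LARGE", message: "File exceeds the maximum size.", retryable: false };
  }
  if (pageCount > limits.maxPages) {
    return { code: "TOO_MANY_PAGES", message: "Document exceeds the maximum page count.", retryable: false };
  }
  return null;
}
```

Returning an error value with a machine-readable `code` and a `retryable` flag, instead of throwing a bare exception, is one way to preserve the failure detail that the queue and the user interface both need.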

Failure handling

Transient infrastructure failures can be retried through the processing queue. Invalid files, unsupported types, missing documents, and access errors should fail fast.
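The split between retryable and fail-fast failures can be sketched as a small decision function. The failure kinds mirror the categories named above, but the identifiers, the attempt limit of 3, and the exponential backoff schedule are assumptions for this example; the actual processing queue may apply a different policy.

```typescript
// Failure categories taken from the text; the identifiers are hypothetical.
type FailureKind =
  | "TRANSIENT_INFRASTRUCTURE"
  | "INVALID_FILE"
  | "UNSUPPORTED_TYPE"
  | "MISSING_DOCUMENT"
  | "ACCESS_ERROR";

interface RetryDecision {
  action: "retry" | "fail";
  delayMs?: number; // set only when action is "retry"
}

// Only transient infrastructure failures are worth retrying.
const RETRYABLE = new Set<FailureKind>(["TRANSIENT_INFRASTRUCTURE"]);

// attempt is 1-based: the number of the attempt that just failed.
// maxAttempts and the 1s base delay are illustrative defaults.
function decide(kind: FailureKind, attempt: number, maxAttempts = 3): RetryDecision {
  if (!RETRYABLE.has(kind) || attempt >= maxAttempts) {
    return { action: "fail" };
  }
  // Exponential backoff: 1s after attempt 1, 2s after attempt 2, and so on.
  return { action: "retry", delayMs: 1000 * 2 ** (attempt - 1) };
}
```

Failing fast on invalid files, unsupported types, missing documents, and access errors avoids spending queue capacity on work that would fail the same way on every attempt.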