Cell v0.2.5
API Reference

Extraction API

Complete reference for the document extraction functions.

Pipeline functions

classifyDocumentType(pdf, options?)

Classify a document as a policy or quote.

| Parameter | Type | Description |
| --- | --- | --- |
| pdf | string | Base64-encoded PDF |
| options | ClassifyOptions | Optional configuration |

Returns: Promise<{ documentType: "policy" | "quote"; confidence: number; signals: string[] }>

const { documentType, confidence, signals } = await classifyDocumentType(pdfBase64);
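The classification result can be used to choose which extraction pipeline to run. The following is a minimal sketch of that routing, not part of the library: the `Pipeline` interface, the `classifyAndExtract` helper, and the `minConfidence` threshold of 0.8 are all assumptions for illustration (this reference does not state the range of `confidence`; the sketch assumes 0 to 1). The pipeline functions are passed in as parameters so the block runs on its own.

```typescript
// Only the return shape of classifyDocumentType is taken from this reference;
// every other name here is hypothetical.
type DocumentType = "policy" | "quote";

interface Classification {
  documentType: DocumentType;
  confidence: number;
  signals: string[];
}

interface ExtractionResult {
  rawText: string;
  extracted: unknown;
}

// The three pipeline functions are injected so the sketch runs without the library.
interface Pipeline {
  classify: (pdf: string) => Promise<Classification>;
  extractPolicy: (pdf: string) => Promise<ExtractionResult>;
  extractQuote: (pdf: string) => Promise<ExtractionResult>;
}

// minConfidence is an assumed threshold, not a library default.
async function classifyAndExtract(
  pipeline: Pipeline,
  pdfBase64: string,
  minConfidence = 0.8,
): Promise<ExtractionResult & { documentType: DocumentType }> {
  const { documentType, confidence, signals } = await pipeline.classify(pdfBase64);
  if (confidence < minConfidence) {
    // Surface the signals so a human can review the ambiguous document.
    throw new Error(
      `Low-confidence classification (${confidence}): ${signals.join(", ")}`,
    );
  }
  const result =
    documentType === "policy"
      ? await pipeline.extractPolicy(pdfBase64)
      : await pipeline.extractQuote(pdfBase64);
  return { ...result, documentType };
}
```

In real use, `classify`, `extractPolicy`, and `extractQuote` would presumably be bound to classifyDocumentType, extractFromPdf, and extractQuoteFromPdf.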

extractFromPdf(pdf, options?)

Full policy extraction pipeline (passes 1-3).

| Parameter | Type | Description |
| --- | --- | --- |
| pdf | string | Base64-encoded PDF |
| options | ExtractOptions | Optional configuration |

Returns: Promise<{ rawText: string; extracted: any }>

const { rawText, extracted } = await extractFromPdf(pdfBase64, {
  log: async (msg) => console.log(msg),
  onMetadata: async (raw) => await saveMetadata(raw),
  models: customModels,
});

extractQuoteFromPdf(pdf, options?)

Full quote extraction pipeline (passes 1-2).

Parameters: Same as extractFromPdf.

Returns: Promise<{ rawText: string; extracted: any }>


extractSectionsOnly(pdf, metadataRaw, options?)

Retry section extraction using saved metadata from a prior pass 1.

| Parameter | Type | Description |
| --- | --- | --- |
| pdf | string | Base64-encoded PDF |
| metadataRaw | string | JSON string from a previous pass 1 |
| options | ExtractSectionsOptions | Optional configuration |

Returns: Promise<{ rawText: string; extracted: any }>
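One way to use this function is as a fallback when a full extraction fails after pass 1 has already produced metadata. The sketch below assumes that `onMetadata` is invoked as soon as pass 1 completes, which this reference implies but does not state. The `RetryDeps` interface and the `extractWithSectionRetry` helper are hypothetical, and the library functions are injected so the block is self-contained.

```typescript
interface ExtractionResult {
  rawText: string;
  extracted: unknown;
}

// Hypothetical dependency bundle; signatures mirror the ones documented here.
interface RetryDeps {
  extractFromPdf: (
    pdf: string,
    options: { onMetadata: (raw: string) => Promise<void> },
  ) => Promise<ExtractionResult>;
  extractSectionsOnly: (pdf: string, metadataRaw: string) => Promise<ExtractionResult>;
}

async function extractWithSectionRetry(
  deps: RetryDeps,
  pdfBase64: string,
): Promise<ExtractionResult> {
  // Holds the pass 1 output if the pipeline gets that far.
  const saved: { metadata?: string } = {};
  try {
    return await deps.extractFromPdf(pdfBase64, {
      onMetadata: async (raw) => {
        saved.metadata = raw;
      },
    });
  } catch (err) {
    // Without saved metadata there is nothing to resume from.
    if (saved.metadata === undefined) throw err;
    // Pass 1 succeeded, so retry only the section extraction.
    return deps.extractSectionsOnly(pdfBase64, saved.metadata);
  }
}
```

In production the metadata would more likely be persisted (for example in a database) than held in memory, so that a retry can happen in a later process.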


enrichSupplementaryFields(document, models?, log?)

Pass 3 enrichment. Parses raw text into structured supplementary fields. Non-fatal: on failure it returns the document unchanged instead of throwing.

| Parameter | Type | Description |
| --- | --- | --- |
| document | any | Document object with raw supplementary fields |
| models | ModelConfig | Optional model config |
| log | LogFn | Optional logger |

Returns: Promise<any>
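The non-fatal contract described above can be pictured as a try/catch wrapper that falls back to the input. This is only an illustration of that contract, not the library's implementation: the `enrichOrKeep` helper and its `enrich` callback are hypothetical names.

```typescript
type LogFn = (message: string) => Promise<void>;

// Runs an enrichment step; on any error, logs it and returns the input untouched.
async function enrichOrKeep<T>(
  document: T,
  enrich: (doc: T) => Promise<T>,
  log?: LogFn,
): Promise<T> {
  try {
    return await enrich(document);
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    await log?.(`Enrichment failed, keeping original document: ${reason}`);
    return document;
  }
}
```

A consequence of this design is that callers cannot tell from the return value alone whether enrichment ran, so passing a logger is useful when failures need to be visible.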

Mapping functions

applyExtracted(extracted)

Map raw policy extraction JSON to persistence-ready fields.

Returns: Object with carrier, policyNumber, coverages, effectiveDate, expirationDate, etc.


applyExtractedQuote(extracted)

Map raw quote extraction JSON to persistence-ready fields.

Returns: Object with quoteNumber, premiumBreakdown, subjectivities, proposedEffectiveDate, etc.

Merge functions

mergeChunkedSections(metadataResult, sectionChunks)

Merge section chunks from policy extraction. Concatenates the sections from all chunks and, for each supplementary field, keeps the last non-null value.
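To make the merge rule concrete, here is a minimal sketch of "combine sections, last non-null supplementary value wins". The `SectionChunk` shape and the field names in the usage note are assumptions, since this reference does not document the chunk structure, and the sketch ignores the `metadataResult` argument entirely.

```typescript
// Hypothetical chunk shape; the real structure is not documented here.
interface SectionChunk {
  sections: string[];
  supplementary: Record<string, string | null>;
}

function mergeChunksSketch(chunks: SectionChunk[]): SectionChunk {
  const sections: string[] = [];
  const supplementary: Record<string, string | null> = {};
  for (const chunk of chunks) {
    // Sections are concatenated in chunk order.
    sections.push(...chunk.sections);
    for (const [key, value] of Object.entries(chunk.supplementary)) {
      // A later null never overwrites an earlier non-null value.
      if (value !== null || !(key in supplementary)) {
        supplementary[key] = value;
      }
    }
  }
  return { sections, supplementary };
}
```

With two chunks whose supplementary fields are `{ claimsContact: "a@x", notes: null }` and `{ claimsContact: null, notes: "n" }`, the merged result is `{ claimsContact: "a@x", notes: "n" }`.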


mergeChunkedQuoteSections(metadataResult, sectionChunks)

Merge section chunks from quote extraction. Also accumulates subjectivities and underwriting conditions.

Utility functions

getPageChunks(totalPages, chunkSize?)

Calculate page ranges for chunked extraction.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| totalPages | number | | Total page count |
| chunkSize | number | 30 | Pages per chunk |

Returns: Array<[number, number]>
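The return type suggests a list of `[start, end]` page pairs. The sketch below shows one plausible way to compute them, assuming 1-indexed, inclusive ranges; this reference does not say whether the real function is 0- or 1-indexed, so treat `pageChunksSketch` as an illustration rather than the library's behavior.

```typescript
// Assumption: pages are 1-indexed and both ends of each range are inclusive.
function pageChunksSketch(totalPages: number, chunkSize = 30): Array<[number, number]> {
  if (!Number.isInteger(totalPages) || totalPages < 0) {
    throw new RangeError("totalPages must be a non-negative integer");
  }
  if (!Number.isInteger(chunkSize) || chunkSize < 1) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const chunks: Array<[number, number]> = [];
  for (let start = 1; start <= totalPages; start += chunkSize) {
    // The final chunk is clamped to the last page.
    chunks.push([start, Math.min(start + chunkSize - 1, totalPages)]);
  }
  return chunks;
}
```

Under these assumptions, `pageChunksSketch(75)` yields `[[1, 30], [31, 60], [61, 75]]`, and a document with zero pages yields an empty array.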


stripFences(text)

Remove markdown code fences from AI response text.
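Model responses often wrap JSON in a fenced block, which breaks `JSON.parse`. A minimal sketch of fence stripping follows; it assumes a single fence pair wrapping the whole response, and `stripFencesSketch` is a hypothetical stand-in, not the library's implementation.

```typescript
// Built at runtime so this example does not contain a literal fence.
const FENCE = "`".repeat(3);

function stripFencesSketch(text: string): string {
  const lines = text.trim().split("\n");
  // Drop an opening fence line, with or without a language tag.
  if (lines[0]?.startsWith(FENCE)) lines.shift();
  // Drop a closing fence line.
  if (lines[lines.length - 1]?.trim() === FENCE) lines.pop();
  return lines.join("\n").trim();
}

// Usage: parse a fenced JSON response.
const response = [FENCE + "json", '{"carrier": "Acme"}', FENCE].join("\n");
const parsed: { carrier: string } = JSON.parse(stripFencesSketch(response));
```

Text without fences passes through unchanged apart from trimming.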


sanitizeNulls<T>(obj)

Recursively convert null values to undefined.
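The recursion can be sketched as follows. This version assumes plain JSON-like data (objects, arrays, primitives) and would not preserve class instances such as `Date`; `sanitizeNullsSketch` is a hypothetical name and the library's actual handling of edge cases may differ.

```typescript
// Assumes JSON-like input: plain objects, arrays, and primitives only.
function sanitizeNullsSketch<T>(value: T): T {
  if (value === null) {
    return undefined as unknown as T;
  }
  if (Array.isArray(value)) {
    return value.map((item) => sanitizeNullsSketch(item)) as unknown as T;
  }
  if (typeof value === "object") {
    const record = value as unknown as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const [key, child] of Object.entries(record)) {
      // Keys are kept; only their null values become undefined.
      out[key] = sanitizeNullsSketch(child);
    }
    return out as unknown as T;
  }
  return value;
}
```

This is useful when extracted JSON feeds TypeScript code that models missing values as optional properties rather than `null`.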

Options types

ExtractOptions

interface ExtractOptions {
  log?: LogFn;
  onMetadata?: (raw: string) => Promise<void>;
  models?: ModelConfig;
  metadataProviderOptions?: ProviderOptions;
  fallbackProviderOptions?: ProviderOptions;
}

ClassifyOptions

interface ClassifyOptions {
  log?: LogFn;
  models?: ModelConfig;
}

ExtractSectionsOptions

interface ExtractSectionsOptions {
  log?: LogFn;
  promptBuilder?: PromptBuilder;
  models?: ModelConfig;
  fallbackProviderOptions?: ProviderOptions;
}

LogFn

type LogFn = (message: string) => Promise<void>;
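Because `LogFn` is asynchronous, a logger can write to a database or job record as easily as to the console. As a small sketch, the hypothetical `createBufferedLogger` helper below collects messages in memory, which can be handy in tests.

```typescript
type LogFn = (message: string) => Promise<void>;

// Hypothetical helper: returns a LogFn plus the array it appends to.
function createBufferedLogger(prefix: string): { log: LogFn; lines: string[] } {
  const lines: string[] = [];
  const log: LogFn = async (message) => {
    lines.push(`[${prefix}] ${message}`);
  };
  return { log, lines };
}
```

The returned `log` function can be passed as the `log` option of any of the options types above.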

Constants

| Constant | Value | Description |
| --- | --- | --- |
| SONNET_MODEL | "claude-sonnet-4-6" | Default Sonnet model ID |
| HAIKU_MODEL | "claude-haiku-4-5-20251001" | Default Haiku model ID |
