Cell v0.2.5

Model Configuration

Configure which AI models run each pipeline pass

Cell's extraction pipeline uses different models for different passes. You can use the defaults, a single model for everything, or assign models per role.

Default configuration

If you don't pass a models option, Cell uses Anthropic defaults via createDefaultModelConfig():

Role               Default Model       Purpose
classification     Claude Haiku 4.5    Fast document type detection
metadata           Claude Sonnet 4.6   Accurate metadata extraction
sections           Claude Haiku 4.5    Chunked section extraction
sectionsFallback   Claude Sonnet 4.6   Retry when sections truncate
enrichment         Claude Haiku 4.5    Supplementary field parsing
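With the defaults, no model setup is required. A minimal sketch (this assumes the options argument can be omitted entirely; if your version requires it, pass an empty object instead):

```typescript
import { extractFromPdf } from "@claritylabs-inc/cell";

// No models option: Cell falls back to createDefaultModelConfig()
const { extracted } = await extractFromPdf(pdfBase64);
```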

Default models require @ai-sdk/anthropic as a peer dependency. It's lazy-imported — consumers using other providers never need it installed.
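If you do rely on the defaults, install the peer dependency alongside Cell:

```shell
npm install @ai-sdk/anthropic
```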

Uniform model

Use the same model for every pass:

import { createAnthropic } from "@ai-sdk/anthropic";
import { extractFromPdf, createUniformModelConfig } from "@claritylabs-inc/cell";

const anthropic = createAnthropic();
const { extracted } = await extractFromPdf(pdfBase64, {
  models: createUniformModelConfig(anthropic("claude-sonnet-4-6")),
});

Fine-grained configuration

Assign a different model to each pipeline role:

import { createAnthropic } from "@ai-sdk/anthropic";
import { extractFromPdf, type ModelConfig } from "@claritylabs-inc/cell";

const anthropic = createAnthropic();
const models: ModelConfig = {
  classification: anthropic("claude-haiku-4-5-20251001"),
  metadata: anthropic("claude-sonnet-4-6"),
  sections: anthropic("claude-haiku-4-5-20251001"),
  sectionsFallback: anthropic("claude-sonnet-4-6"),
  enrichment: anthropic("claude-haiku-4-5-20251001"),
};

const { extracted } = await extractFromPdf(pdfBase64, { models });

Non-Anthropic providers

Cell works with any provider that implements the Vercel AI SDK LanguageModel interface:

import { createOpenAI } from "@ai-sdk/openai";
import { extractFromPdf, createUniformModelConfig } from "@claritylabs-inc/cell";

const openai = createOpenAI();
const { extracted } = await extractFromPdf(pdfBase64, {
  models: createUniformModelConfig(openai("gpt-4o")),
  metadataProviderOptions: {},  // disable Anthropic-specific thinking
});
import { createGoogleGenerativeAI } from "@ai-sdk/google";
import { extractFromPdf, createUniformModelConfig } from "@claritylabs-inc/cell";

const google = createGoogleGenerativeAI();
const { extracted } = await extractFromPdf(pdfBase64, {
  models: createUniformModelConfig(google("gemini-2.0-flash")),
  metadataProviderOptions: {},
  fallbackProviderOptions: {},
});

When using non-Anthropic providers, set metadataProviderOptions: {} and fallbackProviderOptions: {} to disable Anthropic-specific extended thinking, which is enabled by default.

Token limits

Cell sets per-role token limits based on the task, not the provider:

Role               Max Output Tokens
classification     512
metadata           4,096
sections           8,192
sectionsFallback   16,384
enrichment         4,096

These are exported as MODEL_TOKEN_LIMITS for reference but are managed internally by the pipeline.
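For instance, you can inspect the limits at runtime. This sketch assumes MODEL_TOKEN_LIMITS is a plain record keyed by role name, matching the table above:

```typescript
import { MODEL_TOKEN_LIMITS } from "@claritylabs-inc/cell";

// Log each role's output-token ceiling, e.g. for capacity planning
for (const [role, limit] of Object.entries(MODEL_TOKEN_LIMITS)) {
  console.log(`${role}: ${limit} max output tokens`);
}
```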

Provider options

The metadataProviderOptions and fallbackProviderOptions fields pass provider-specific configuration (like Anthropic extended thinking) through to the Vercel AI SDK:

const { extracted } = await extractFromPdf(pdfBase64, {
  metadataProviderOptions: {
    anthropic: { thinking: { type: "enabled", budgetTokens: 8192 } },
  },
});

By default, Anthropic thinking is enabled with a 4,096 token budget for metadata and fallback calls.
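fallbackProviderOptions works the same way for the sectionsFallback pass. For example, to raise the thinking budget on fallback retries (the budget value here is illustrative, not a recommendation):

```typescript
import { extractFromPdf } from "@claritylabs-inc/cell";

const { extracted } = await extractFromPdf(pdfBase64, {
  fallbackProviderOptions: {
    anthropic: { thinking: { type: "enabled", budgetTokens: 16384 } },
  },
});
```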
