Client-Side OCR API Reference

OCR Engines
Core Classes
Processing Methods
Configuration Options
Result Types
Utility Functions
React Components
Events and Callbacks
Error Handling

OCR Engines

`createRapidOCREngine(config: RapidOCRConfig): RapidOCREngine`

Creates a RapidOCR engine instance with multi-language support.

interface RapidOCRConfig {
  language: LanguageCode;        // Language code (e.g., 'en', 'ch', 'ja')
  modelVersion?: 'PP-OCRv4' | 'PP-OCRv5';  // Default: 'PP-OCRv4'
  modelType?: 'mobile' | 'server';          // Default: 'mobile'
  baseUrl?: string;              // Base URL for model files
  cacheModels?: boolean;         // Cache models in IndexedDB
  workerConfig?: WorkerConfig;   // Web Worker configuration
}

Example:

const ocr = createRapidOCREngine({
  language: 'en',
  modelVersion: 'PP-OCRv4',
  modelType: 'mobile',
  cacheModels: true
});

`createPPUEngine(config: PPUConfig): PPUEngine`

Creates a PPU PaddleOCR engine instance (English only).

interface PPUConfig {
  modelPath: string;        // Path to PPU model files
  cacheModels?: boolean;    // Cache models in IndexedDB
  workerConfig?: WorkerConfig;
}

`createOCREngine(config: OCRConfig): OCREngine`

Generic factory function that creates appropriate engine based on config.

Core Classes

`RapidOCREngine`

Main OCR engine class for RapidOCR models.

Methods

`initialize(): Promise<void>`

Initializes the engine and downloads models if needed.

await ocr.initialize();

`processImage(input: ImageInput, options?: ProcessOptions): Promise<OCRResult>`

Processes an image and returns OCR results.

const result = await ocr.processImage(file, {
  enableWordSegmentation: true,
  enableTextClassification: true
});

`processBatch(inputs: ImageInput[], options?: BatchOptions): Promise<OCRResult[]>`

Processes multiple images in parallel.

const results = await ocr.processBatch(images, {
  maxConcurrent: 3,
  onProgress: (current, total) => console.log(`${current}/${total}`)
});

`cleanup(): Promise<void>`

Cleans up resources and terminates workers.

await ocr.cleanup();

`areModelsCached(): Promise<boolean>`

Checks if models are cached locally.

`clearCache(): Promise<void>`

Clears cached models from IndexedDB.

`ModelDownloader`

Handles model downloading and caching.

Methods

`downloadModels(config: ModelConfig): Promise<ModelPaths>`

Downloads OCR models with progress tracking.

const downloader = new ModelDownloader({
  onProgress: (progress) => console.log(`${progress.percent}% complete`)
});

const paths = await downloader.downloadModels({
  language: 'en',
  modelVersion: 'PP-OCRv4',
  modelType: 'mobile'
});

`MetaONNXLoader`

Loads ONNX models with embedded metadata.

Static Methods

`loadModel(modelPath: string): Promise<MetaONNXModel>`

Loads an ONNX model with metadata support.

`getDictionary(model: MetaONNXModel, dictPath?: string): Promise<string[]>`

Extracts dictionary from model metadata or external file.

`hasEmbeddedDictionary(model: MetaONNXModel): boolean`

Checks if model has embedded dictionary.

Processing Methods

Image Input Types

type ImageInput = 
  | File                    // File from input element
  | Blob                    // Blob data
  | string                  // URL or base64 data URI
  | HTMLImageElement        // Image element
  | HTMLCanvasElement       // Canvas element
  | ImageData              // Raw image data
  | ArrayBuffer;           // Binary image data

Process Options

interface ProcessOptions {
  // Feature flags
  enableWordSegmentation?: boolean;      // Enable word-level segmentation
  enableTextClassification?: boolean;    // Enable rotation detection
  returnConfidence?: boolean;            // Return confidence scores
  
  // Preprocessing configuration
  preprocessConfig?: PreprocessConfig;
  
  // Postprocessing configuration  
  postprocessConfig?: PostprocessConfig;
  
  // Performance options
  useGPU?: boolean;                     // Use WebGL acceleration
  batchSize?: number;                   // Recognition batch size
}

Preprocessing Configuration

interface PreprocessConfig {
  // Detection preprocessing
  detectImageNetNorm?: boolean;         // Use ImageNet normalization
  maxSideLen?: number;                  // Max image side length (default: 960)
  detectImageMode?: 'scale' | 'pad';    // Resize mode
  
  // Recognition preprocessing
  recStandardNorm?: boolean;            // Standard normalization
  recImageShape?: [number, number, number]; // [C, H, W]
  
  // General preprocessing
  grayscale?: boolean;                  // Convert to grayscale
  denoise?: boolean;                    // Apply denoising
  deskew?: boolean;                     // Correct skew
  contrast?: number;                    // Contrast adjustment (0.5-2.0)
  brightness?: number;                  // Brightness adjustment (-100 to 100)
  sharpen?: boolean;                    // Apply sharpening
}

Postprocessing Configuration

interface PostprocessConfig {
  // DB postprocessing
  unclipRatio?: number;                 // Text region expansion (default: 2.0)
  boxThresh?: number;                   // Box confidence threshold (default: 0.7)
  minBoxSize?: number;                  // Minimum box size in pixels
  
  // Text postprocessing
  mergeRotatedBoxes?: boolean;          // Merge boxes for rotated text
  filterOverlapping?: boolean;          // Remove overlapping boxes
  sortMode?: 'top-to-bottom' | 'left-to-right';
}

Configuration Options

Language Codes

type LanguageCode = 
  | 'ch'    // Chinese (Simplified & Traditional)
  | 'en'    // English
  | 'ja'    // Japanese
  | 'ko'    // Korean
  | 'fr'    // French
  | 'de'    // German
  | 'es'    // Spanish
  | 'pt'    // Portuguese
  | 'it'    // Italian
  | 'ru'    // Russian
  | 'ar'    // Arabic
  | 'hi'    // Hindi
  | 'vi'    // Vietnamese
  | 'id'    // Indonesian
  | 'fa'    // Persian
  | 'ka'    // Kannada
  // ... 100+ more languages

Worker Configuration

interface WorkerConfig {
  numWorkers?: number;          // Number of workers (default: CPU cores)
  workerPath?: string;          // Custom worker script path
  terminateOnIdle?: boolean;    // Terminate idle workers
  idleTimeout?: number;         // Idle timeout in ms
  wasmPath?: string;           // Custom WASM file path
}

Result Types

OCRResult

Main result object returned by processing methods.

interface OCRResult {
  // Text content
  text: string;                        // Full extracted text
  lines: TextLine[];                   // Text lines with positions
  wordBoxes?: WordBox[];              // Word-level boxes (if enabled)
  
  // Metadata
  confidence: number;                  // Overall confidence (0-1)
  angle?: 0 | 180;                    // Detected rotation
  language?: string;                   // Detected language
  
  // Performance metrics
  processingTime: ProcessingTime;      // Time breakdown
  
  // Debug info
  debugInfo?: DebugInfo;              // Additional debug data
}

TextLine

Individual text line with position and confidence.

interface TextLine {
  text: string;                        // Line text content
  box: BoundingBox;                    // Position coordinates
  confidence: number;                  // Line confidence (0-1)
  words?: WordBox[];                   // Word boxes within line
}

BoundingBox

Quadrilateral bounding box for text regions.

interface BoundingBox {
  topLeft: Point;
  topRight: Point;
  bottomRight: Point;
  bottomLeft: Point;
}

interface Point {
  x: number;
  y: number;
}

WordBox

Word-level segmentation result.

interface WordBox {
  text: string;                        // Word text
  box: BoundingBox;                    // Word position
  confidence: number;                  // Word confidence
  isNumeric?: boolean;                 // Is pure number
  isAlpha?: boolean;                   // Is pure letters
  language?: string;                   // Detected language
}

ProcessingTime

Performance metrics breakdown.

interface ProcessingTime {
  total: number;                       // Total time in ms
  detection: number;                   // Detection stage time
  recognition: number;                 // Recognition stage time
  classification?: number;             // Classification time
  preprocessing?: number;              // Preprocessing time
  postprocessing?: number;            // Postprocessing time
  modelLoading?: number;              // Model loading time
}

Utility Functions

Image Processing

`preprocessImage(input: ImageInput, config: PreprocessConfig): Promise<ImageData>`

Applies preprocessing to an image.

const enhanced = await preprocessImage(image, {
  grayscale: true,
  denoise: true,
  contrast: 1.5
});

`resizeImage(input: ImageInput, options: ResizeOptions): Promise<Blob>`

Resizes an image while maintaining quality.

interface ResizeOptions {
  maxWidth?: number;
  maxHeight?: number;
  maintainAspectRatio?: boolean;
  format?: 'jpeg' | 'png' | 'webp';
  quality?: number;  // 0-1 for lossy formats
}

`detectTextOrientation(input: ImageInput): Promise<0 | 90 | 180 | 270>`

Detects text orientation in an image.

Text Processing

`mergeTextLines(lines: TextLine[], options?: MergeOptions): string`

Merges text lines into paragraphs.

interface MergeOptions {
  lineSpacing?: number;        // Threshold for paragraph breaks
  horizontalSpacing?: number;  // Threshold for word spacing
  preserveLayout?: boolean;    // Maintain spatial layout
}

`extractPatterns(text: string): PatternMatches`

Extracts common patterns from text.

interface PatternMatches {
  emails: string[];
  phones: string[];
  urls: string[];
  dates: string[];
  numbers: string[];
  currencies: string[];
}

Language Detection

`detectLanguage(input: ImageInput | string): Promise<LanguageCode>`

Detects the primary language of text.

const language = await detectLanguage(image);
// or
const language = await detectLanguage("Hello world");

`getSupportedLanguages(): LanguageInfo[]`

Returns list of supported languages.

interface LanguageInfo {
  code: LanguageCode;
  name: string;
  nativeName: string;
  direction: 'ltr' | 'rtl';
  supportedModels: ModelVersion[];
}

React Components

`<RapidOCRInterface>`

Full-featured OCR interface component.

interface RapidOCRInterfaceProps {
  // Configuration
  defaultLanguage?: LanguageCode;
  modelVersion?: ModelVersion;
  modelType?: ModelType;
  
  // Features
  enableWordSegmentation?: boolean;
  enableTextClassification?: boolean;
  enablePreprocessing?: boolean;
  showPerformanceMetrics?: boolean;
  
  // Callbacks
  onResult?: (result: OCRResult) => void;
  onError?: (error: Error) => void;
  onProgress?: (progress: Progress) => void;
  
  // UI customization
  theme?: 'light' | 'dark' | 'auto';
  locale?: string;
  customStyles?: React.CSSProperties;
}

`<OCRDropzone>`

Drag-and-drop file upload component.

interface OCRDropzoneProps {
  onFilesSelected: (files: File[]) => void;
  accept?: string[];              // MIME types
  maxFiles?: number;
  maxSize?: number;              // Bytes
  multiple?: boolean;
  disabled?: boolean;
  className?: string;
}

`<OCRResultViewer>`

Component for displaying OCR results.

interface OCRResultViewerProps {
  result: OCRResult;
  showConfidence?: boolean;
  showBoundingBoxes?: boolean;
  highlightMode?: 'word' | 'line' | 'none';
  onTextSelect?: (text: string) => void;
  editable?: boolean;
}

React Hooks

`useOCR(config?: OCRConfig): UseOCRResult`

React hook for OCR functionality.

interface UseOCRResult {
  // Methods
  processImage: (input: ImageInput) => Promise<OCRResult>;
  processBatch: (inputs: ImageInput[]) => Promise<OCRResult[]>;
  cancel: () => void;
  
  // State
  processing: boolean;
  progress: number;  // 0-1
  result: OCRResult | null;
  error: Error | null;
  
  // Engine management
  initialize: () => Promise<void>;
  cleanup: () => Promise<void>;
  initialized: boolean;
}

Events and Callbacks

Progress Events

interface Progress {
  stage: 'download' | 'initialize' | 'process';
  current: number;
  total: number;
  percent: number;
  message?: string;
}

// Usage
const ocr = createRapidOCREngine({
  language: 'en',
  onProgress: (progress) => {
    console.log(`${progress.stage}: ${progress.percent}%`);
  }
});

Model Download Progress

interface DownloadProgress {
  file: string;              // Current file
  bytesLoaded: number;
  bytesTotal: number;
  filesCompleted: number;
  filesTotal: number;
  percent: number;
}

Error Handling

Error Types

class OCRError extends Error {
  code: OCRErrorCode;
  details?: any;
}

enum OCRErrorCode {
  // Initialization errors
  MODEL_DOWNLOAD_FAILED = 'MODEL_DOWNLOAD_FAILED',
  MODEL_LOAD_FAILED = 'MODEL_LOAD_FAILED',
  WORKER_INIT_FAILED = 'WORKER_INIT_FAILED',
  
  // Processing errors
  INVALID_IMAGE = 'INVALID_IMAGE',
  PROCESSING_FAILED = 'PROCESSING_FAILED',
  TIMEOUT = 'TIMEOUT',
  
  // Resource errors
  OUT_OF_MEMORY = 'OUT_OF_MEMORY',
  WEBGL_NOT_SUPPORTED = 'WEBGL_NOT_SUPPORTED',
  WORKER_NOT_SUPPORTED = 'WORKER_NOT_SUPPORTED',
  
  // Configuration errors
  INVALID_LANGUAGE = 'INVALID_LANGUAGE',
  INVALID_MODEL = 'INVALID_MODEL',
  INVALID_CONFIG = 'INVALID_CONFIG'
}

Error Handling Examples

try {
  const result = await ocr.processImage(file);
} catch (error) {
  if (error instanceof OCRError) {
    switch (error.code) {
      case OCRErrorCode.INVALID_IMAGE:
        console.error('Invalid image format');
        break;
      case OCRErrorCode.OUT_OF_MEMORY:
        console.error('Image too large, try reducing size');
        break;
      default:
        console.error('OCR failed:', error.message);
    }
  }
}

Retry Logic

async function processWithRetry(
  image: ImageInput, 
  maxRetries = 3
): Promise<OCRResult> {
  let lastError;
  
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await ocr.processImage(image);
    } catch (error) {
      lastError = error;
      
      // Exponential backoff
      await new Promise(resolve => 
        setTimeout(resolve, Math.pow(2, i) * 1000)
      );
    }
  }
  
  throw lastError;
}

Advanced Topics

Custom Model Integration

// Register custom ONNX model
const customOCR = new RapidOCREngine({
  customModels: {
    detection: '/models/custom-det.onnx',
    recognition: '/models/custom-rec.onnx',
    dictionary: '/models/custom-dict.txt'
  }
});

WebAssembly Configuration

// Configure ONNX Runtime WASM settings
const ocr = createRapidOCREngine({
  language: 'en',
  wasmConfig: {
    numThreads: 4,
    simd: true,
    proxy: true,
    fetchOptions: {
      cache: 'force-cache'
    }
  }
});

Performance Monitoring

// Enable detailed performance metrics
const ocr = createRapidOCREngine({
  language: 'en',
  debug: true,
  collectMetrics: true
});

const result = await ocr.processImage(image);
console.log('Detailed metrics:', result.debugInfo.metrics);

FilesExpand file tree

API.md

Latest commit

History

API.md

File metadata and controls

Client-Side OCR API Reference

Table of Contents

OCR Engines

createRapidOCREngine(config: RapidOCRConfig): RapidOCREngine

createPPUEngine(config: PPUConfig): PPUEngine

createOCREngine(config: OCRConfig): OCREngine

Core Classes

RapidOCREngine

Methods

initialize(): Promise<void>

processImage(input: ImageInput, options?: ProcessOptions): Promise<OCRResult>

processBatch(inputs: ImageInput[], options?: BatchOptions): Promise<OCRResult[]>

cleanup(): Promise<void>

areModelsCached(): Promise<boolean>

clearCache(): Promise<void>

ModelDownloader

Methods

downloadModels(config: ModelConfig): Promise<ModelPaths>

MetaONNXLoader

Static Methods

loadModel(modelPath: string): Promise<MetaONNXModel>

getDictionary(model: MetaONNXModel, dictPath?: string): Promise<string[]>

hasEmbeddedDictionary(model: MetaONNXModel): boolean

Processing Methods

Image Input Types

Process Options

Preprocessing Configuration

Postprocessing Configuration

Configuration Options

Language Codes

Worker Configuration

Result Types

OCRResult

TextLine

BoundingBox

WordBox

ProcessingTime

Utility Functions

Image Processing

preprocessImage(input: ImageInput, config: PreprocessConfig): Promise<ImageData>

resizeImage(input: ImageInput, options: ResizeOptions): Promise<Blob>

detectTextOrientation(input: ImageInput): Promise<0 | 90 | 180 | 270>

Text Processing

mergeTextLines(lines: TextLine[], options?: MergeOptions): string

extractPatterns(text: string): PatternMatches

Language Detection

detectLanguage(input: ImageInput | string): Promise<LanguageCode>

getSupportedLanguages(): LanguageInfo[]

React Components

<RapidOCRInterface>

<OCRDropzone>

<OCRResultViewer>

React Hooks

useOCR(config?: OCRConfig): UseOCRResult

Events and Callbacks

Progress Events

Model Download Progress

Error Handling

Error Types

Error Handling Examples

Retry Logic

Advanced Topics

Custom Model Integration

WebAssembly Configuration

Performance Monitoring

`createRapidOCREngine(config: RapidOCRConfig): RapidOCREngine`

`createPPUEngine(config: PPUConfig): PPUEngine`

`createOCREngine(config: OCRConfig): OCREngine`

`RapidOCREngine`

`initialize(): Promise<void>`

`processImage(input: ImageInput, options?: ProcessOptions): Promise<OCRResult>`

`processBatch(inputs: ImageInput[], options?: BatchOptions): Promise<OCRResult[]>`

`cleanup(): Promise<void>`

`areModelsCached(): Promise<boolean>`

`clearCache(): Promise<void>`

`ModelDownloader`

`downloadModels(config: ModelConfig): Promise<ModelPaths>`

`MetaONNXLoader`

`loadModel(modelPath: string): Promise<MetaONNXModel>`

`getDictionary(model: MetaONNXModel, dictPath?: string): Promise<string[]>`

`hasEmbeddedDictionary(model: MetaONNXModel): boolean`

`preprocessImage(input: ImageInput, config: PreprocessConfig): Promise<ImageData>`

`resizeImage(input: ImageInput, options: ResizeOptions): Promise<Blob>`

`detectTextOrientation(input: ImageInput): Promise<0 | 90 | 180 | 270>`

`mergeTextLines(lines: TextLine[], options?: MergeOptions): string`

`extractPatterns(text: string): PatternMatches`

`detectLanguage(input: ImageInput | string): Promise<LanguageCode>`

`getSupportedLanguages(): LanguageInfo[]`

`<RapidOCRInterface>`

`<OCRDropzone>`

`<OCRResultViewer>`

`useOCR(config?: OCRConfig): UseOCRResult`