From 08379b7c0b5a98bd72c2ad57717edc870f08bd90 Mon Sep 17 00:00:00 2001 From: Norbert Klockiewicz Date: Mon, 24 Feb 2025 13:21:15 +0100 Subject: [PATCH 1/8] docs: added docs pages for ocr hooks and hookless api --- docs/docs/computer-vision/useOCR.md | 157 ++++++++++++++++++ docs/docs/computer-vision/useVerticalOCR.md | 167 ++++++++++++++++++++ docs/docs/hookless-api/OCRModule.md | 67 ++++++++ docs/docs/hookless-api/VerticalOCRModule.md | 75 +++++++++ 4 files changed, 466 insertions(+) create mode 100644 docs/docs/computer-vision/useOCR.md create mode 100644 docs/docs/computer-vision/useVerticalOCR.md create mode 100644 docs/docs/hookless-api/OCRModule.md create mode 100644 docs/docs/hookless-api/VerticalOCRModule.md diff --git a/docs/docs/computer-vision/useOCR.md b/docs/docs/computer-vision/useOCR.md new file mode 100644 index 0000000000..c2b3667402 --- /dev/null +++ b/docs/docs/computer-vision/useOCR.md @@ -0,0 +1,157 @@ +--- +title: useOCR +sidebar_position: 4 +--- + +Optical character recognition(OCR) is a computer vision technique that detects and recognizes text within the image. It's commonly used to convert different types of documents, such as scanned paper documents, PDF files, or images captured by a digital camera, into editable and searchable data. + +:::caution +It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion/react-native-executorch-ssdlite320-mobilenet-v3-large). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/69802ee1ca161d9df00def1dabe014d36341cfa9/src/constants/modelUrls.ts#L28) shipped with our library. +::: + +## Reference + +```jsx +import { useOCR, CRAFT_800, CRNN_RECOGNIZERS_EN } from 'react-native-executorch'; + +function App() { + const model = useOCR({ + detectorSource: CRAFT_800, + recognizerSources: CRNN_RECOGNIZERS_EN + }); + + ... + for (const ocrDetection of await model.forward("https://url-to-image.jpg")) { + console.log("Bounding box: ", ocrDetection.bbox); + console.log("Bounding label: ", ocrDetection.text); + console.log("Bounding score: ", ocrDetection.score); + } + ... +} +``` + +
+ +## Example + +```tsx +import { + useOCR, + CRAFT_800, + CRNN_RECOGNIZERS_EN, +} from 'react-native-executorch'; + +function App() { + const model = useOCR({ + detectorSource: CRAFT_800, + recognizerSources: CRNN_RECOGNIZERS_EN, + }); + + const runModel = async () => { + const ocrDetections = await model.forward('https://url-to-image.jpg'); + + for (const ocrDetection of ocrDetections) { + console.log('Bounding box: ', ocrDetection.bbox); + console.log('Bounding text: ', ocrDetection.text); + console.log('Bounding score: ', ocrDetection.score); + } + }; +} +``` + +## Supported models + +| Model | Type | +| ------------------------------------------------------ | ---------- | +| [CRAFT_800](https://github.com/clovaai/CRAFT-pytorch) | Detector | +| [CRNN_EN_512](https://www.jaided.ai/easyocr/modelhub/) | Recognizer | +| [CRNN_EN_256](https://www.jaided.ai/easyocr/modelhub/) | Recognizer | +| [CRNN_EN_128](https://www.jaided.ai/easyocr/modelhub/) | Recognizer | + +## Benchmarks + +### Model size + +| Model | XNNPACK [MB] | +| --------------------------------------------------- | ------------ | +| CRAFT_800 + CRNN_EN_512 + CRNN_EN_256 + CRNN_EN_128 | 13.9 | + +### Memory usage + +| Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] | +| --------------------------------------------------- | ---------------------- | ------------------ | +| CRAFT_800 + CRNN_EN_512 + CRNN_EN_256 + CRNN_EN_128 | 90 | 90 | + +### Inference time + +:::warning warning +Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. +::: + +| Model | iPhone 16 Pro (XNNPACK) [ms] | iPhone 13 Pro (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] | +| --------------------------------------------------- | ---------------------------- | ---------------------------- | -------------------------- | --------------------------------- | ------------------------- | +| CRAFT_800 + CRNN_EN_512 + CRNN_EN_256 + CRNN_EN_128 | 190 | 260 | 280 | 100 | 90 | diff --git a/docs/docs/computer-vision/useVerticalOCR.md b/docs/docs/computer-vision/useVerticalOCR.md new file mode 100644 index 0000000000..efb5e63771 --- /dev/null +++ b/docs/docs/computer-vision/useVerticalOCR.md @@ -0,0 +1,167 @@ +--- +title: useVerticalOCR +sidebar_position: 5 +--- + +:::danger Experimental +The `useVerticalOCR` hook is currently in an experimental phase. We appreciate feedback from users as we continue to refine and enhance its functionality. +::: + +Optical Character Recognition (OCR) is a computer vision technique used to detect and recognize text within images. It is commonly utilized to convert a variety of documents, such as scanned paper documents, PDF files, or images captured by a digital camera, into editable and searchable data. Traditionally, OCR technology has been optimized for recognizing horizontal text, and integrating support for vertical text recognition often requires significant additional effort from developers. To simplify this, we introduce `useVerticalOCR`, a tool designed to abstract the complexities of vertical text OCR, enabling seamless integration into your applications. + +:::caution +It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion/react-native-executorch-ssdlite320-mobilenet-v3-large). 
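For instance, to flatten the detections into a single string, you can drop low-confidence results and join the rest. This is only a sketch: the `0.4` threshold is an arbitrary illustrative value, and if the `OCRDetection` type is not exported by your version of the library, declare it locally as shown in the type definitions above.

```typescript
import { OCRDetection } from 'react-native-executorch';

// Discard low-confidence detections and merge the rest into one string.
// The 0.4 threshold is illustrative, not a library default - tune it for your data.
const extractText = (detections: OCRDetection[], minScore = 0.4): string =>
  detections
    .filter((detection) => detection.score >= minScore)
    .map((detection) => detection.text)
    .join(' ');
```
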
You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/69802ee1ca161d9df00def1dabe014d36341cfa9/src/constants/modelUrls.ts#L28) shipped with our library. +::: + +## Reference + +```jsx +import { useVerticalOCR, VERTICAL_DETECTORS, VERTICAL_CRNN_RECOGNIZERS_EN } from 'react-native-executorch'; + +function App() { + const model = useVerticalOCR({ + detectorSources: VERTICAL_DETECTORS, + recognizerSources: VERTICAL_CRNN_RECOGNIZERS_EN + }); + + ... + for (const ocrDetection of await model.forward("https://url-to-image.jpg")) { + console.log("Bounding box: ", ocrDetection.bbox); + console.log("Bounding label: ", ocrDetection.text); + console.log("Bounding score: ", ocrDetection.score); + } + ... +} +``` + +
+ +## Example + +```tsx +import { + useVerticalOCR, + VERTICAL_DETECTORS, + VERTICAL_CRNN_RECOGNIZERS_EN, +} from 'react-native-executorch'; + +function App() { + const model = useVerticalOCR({ + detectorSources: VERTICAL_DETECTORS, + recognizerSources: VERTICAL_CRNN_RECOGNIZERS_EN, + }); + + const runModel = async () => { + const ocrDetections = await model.forward('https://url-to-image.jpg'); + + for (const ocrDetection of ocrDetections) { + console.log('Bounding box: ', ocrDetection.bbox); + console.log('Bounding text: ', ocrDetection.text); + console.log('Bounding score: ', ocrDetection.score); + } + }; +} +``` + +## Supported models + +| Model | Type | +| -------------------------------------------------------- | ---------- | +| [CRAFT_1280](https://github.com/clovaai/CRAFT-pytorch) | Detector | +| [CRAFT_NARROW](https://github.com/clovaai/CRAFT-pytorch) | Detector | +| [CRNN_EN_512](https://www.jaided.ai/easyocr/modelhub/) | Recognizer | +| [CRNN_EN_64](https://www.jaided.ai/easyocr/modelhub/) | Recognizer | + +## Benchmarks + +### Model size + +| Model | XNNPACK [MB] | +| ---------------------------------------------------- | ------------ | +| CRAFT_1280 + CRAFT_NARROW + CRNN_EN_512 + CRNN_EN_64 | 13.9 | + +### Memory usage + +| Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] | +| ---------------------------------------------------- | ---------------------- | ------------------ | +| CRAFT_1280 + CRAFT_NARROW + CRNN_EN_512 + CRNN_EN_64 | 90 | 90 | + +### Inference time + +:::warning warning +Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. +::: + +| Model | iPhone 16 Pro (XNNPACK) [ms] | iPhone 13 Pro (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] | +| ---------------------------------------------------- | ---------------------------- | ---------------------------- | -------------------------- | --------------------------------- | ------------------------- | +| CRAFT_1280 + CRAFT_NARROW + CRNN_EN_512 + CRNN_EN_64 | 190 | 260 | 280 | 100 | 90 | diff --git a/docs/docs/hookless-api/OCRModule.md b/docs/docs/hookless-api/OCRModule.md new file mode 100644 index 0000000000..2507462e42 --- /dev/null +++ b/docs/docs/hookless-api/OCRModule.md @@ -0,0 +1,67 @@ +--- +title: OCRModule +sidebar_position: 6 +--- + +Hookless implementation of the [useOCR](../computer-vision/useOCR.mdx) hook. + +## Reference + +```typescript +import { + OCRModule, + CRAFT_800, + CRNN_RECOGNIZERS_EN, +} from 'react-native-executorch'; + +const imageUri = 'path/to/image.png'; + +// Loading the model +await OCRModule.load({ + detectorSource: CRAFT_800, + recognizerSources: CRNN_RECOGNIZERS_EN, +}); + +// Running the model +const ocrDetections = await OCRModule.forward(imageUri); +``` + +### Methods + +| Method | Type | Description | +| -------------------- | ------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------- | +| `load` | `(detectorSource: string, recognizerSources: RecognizerSources): Promise` | Loads the model, where `modelSource` is a string that specifies the location of the model binary. | +| `forward` | `(input: string): Promise` | Executes the model's forward pass, where `input` can be a fetchable resource or a Base64-encoded string. 
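Since each `bbox` is just four corner points, simple geometry is enough to post-process the results, for example ordering detected columns of vertical text by their horizontal position. Treat this as a sketch: left-to-right reading order is an assumption (reading order is script-dependent), and the types come from the definitions above.

```typescript
import { OCRDetection, Point } from 'react-native-executorch';

// Average the four corners to get the center of a bounding box.
const center = (bbox: Point[]): Point => ({
  x: bbox.reduce((sum, point) => sum + point.x, 0) / bbox.length,
  y: bbox.reduce((sum, point) => sum + point.y, 0) / bbox.length,
});

// Sort detections so the leftmost column of vertical text comes first.
const sortByColumn = (detections: OCRDetection[]): OCRDetection[] =>
  [...detections].sort((a, b) => center(a.bbox).x - center(b.bbox).x);
```
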
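
Because the module is a plain object rather than a hook, setup can happen anywhere in your code, for example subscribing to download progress before starting `load`. A minimal sketch, reusing the constants from the reference above:

```typescript
import {
  OCRModule,
  CRAFT_800,
  CRNN_RECOGNIZERS_EN,
} from 'react-native-executorch';

// Subscribe first so the earliest progress events are not missed.
OCRModule.onDownloadProgress((downloadProgress) => {
  console.log(`Downloading models: ${Math.round(downloadProgress * 100)}%`);
});

await OCRModule.load({
  detectorSource: CRAFT_800,
  recognizerSources: CRNN_RECOGNIZERS_EN,
});
```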
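
Both `load` and `forward` return promises, so failures can be handled with an ordinary `try`/`catch`. The snippet below is a minimal sketch and assumes that `forward` rejects on failure; the exact error shape is not specified by the API:

```typescript
import { VerticalOCRModule } from 'react-native-executorch';

try {
  const detections = await VerticalOCRModule.forward('path/to/image.png');
  console.log(`Recognized ${detections.length} text regions`);
} catch (error) {
  // Assumes forward rejects when the model is not loaded or inference fails.
  console.error('Vertical OCR failed:', error);
}
```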
+ **`independentCharacters`** – A boolean parameter that indicates whether the text in the image consists of a random sequence of characters. If set to true, the algorithm will scan each character individually instead of reading them as continuous text. ### Returns @@ -119,6 +123,8 @@ function App() { const model = useVerticalOCR({ detectorSources: VERTICAL_DETECTORS, recognizerSources: VERTICAL_CRNN_RECOGNIZERS_EN, + language: 'en', + independentCharacters: True, }); const runModel = async () => { From 44666ee10a96e95b928cfb7059be419f3ee6e66c Mon Sep 17 00:00:00 2001 From: Norbert Klockiewicz Date: Fri, 28 Feb 2025 14:12:28 +0100 Subject: [PATCH 3/8] docs: benchmarks for ocr and vertical ocr --- docs/docs/computer-vision/useOCR.md | 20 +++++++++------ docs/docs/computer-vision/useVerticalOCR.md | 25 ++++++++++++------- .../docs/hookless-api/ClassificationModule.md | 2 +- docs/docs/hookless-api/OCRModule.md | 2 +- .../hookless-api/ObjectDetectionModule.md | 2 +- docs/docs/hookless-api/StyleTransferModule.md | 2 +- docs/docs/hookless-api/VerticalOCRModule.md | 2 +- 7 files changed, 34 insertions(+), 21 deletions(-) diff --git a/docs/docs/computer-vision/useOCR.md b/docs/docs/computer-vision/useOCR.md index d26c0ffcb3..a35ade70cb 100644 --- a/docs/docs/computer-vision/useOCR.md +++ b/docs/docs/computer-vision/useOCR.md @@ -140,15 +140,18 @@ function App() { ### Model size -| Model | XNNPACK [MB] | -| --------------------------------------------------- | ------------ | -| CRAFT_800 + CRNN_EN_512 + CRNN_EN_256 + CRNN_EN_128 | 13.9 | +| Model | XNNPACK [MB] | +| ----------- | ------------ | +| CRAFT_800 | 83.1 | +| CRNN_EN_512 | 547 | +| CRNN_EN_256 | 277 | +| CRNN_EN_128 | 142 | ### Memory usage | Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] | | --------------------------------------------------- | ---------------------- | ------------------ | -| CRAFT_800 + CRNN_EN_512 + CRNN_EN_256 + CRNN_EN_128 | 90 | 90 | +| CRAFT_800 + CRNN_EN_512 + CRNN_EN_256 + CRNN_EN_128 | 2100 | 1782 | ### Inference time @@ -156,6 +159,9 @@ function App() { Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. 
::: -| Model | iPhone 16 Pro (XNNPACK) [ms] | iPhone 13 Pro (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] | -| --------------------------------------------------- | ---------------------------- | ---------------------------- | -------------------------- | --------------------------------- | ------------------------- | -| CRAFT_800 + CRNN_EN_512 + CRNN_EN_256 + CRNN_EN_128 | 190 | 260 | 280 | 100 | 90 | +| Model | iPhone 16 Pro (XNNPACK) [ms] | iPhone 14 Pro Max (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | Samsung Galaxy S21 (XNNPACK) [ms] | +| ----------- | ---------------------------- | -------------------------------- | -------------------------- | --------------------------------- | --------------------------------- | +| CRAFT_800 | 2099 | 2227 | ❌ | 2245 | 7108 | +| CRNN_EN_512 | 70 | 252 | ❌ | 54 | 151 | +| CRNN_EN_256 | 39 | 123 | ❌ | 24 | 78 | +| CRNN_EN_128 | 17 | 83 | ❌ | 14 | 39 | diff --git a/docs/docs/computer-vision/useVerticalOCR.md b/docs/docs/computer-vision/useVerticalOCR.md index 786730bd6f..013fc56726 100644 --- a/docs/docs/computer-vision/useVerticalOCR.md +++ b/docs/docs/computer-vision/useVerticalOCR.md @@ -152,15 +152,19 @@ function App() { ### Model size -| Model | XNNPACK [MB] | -| ---------------------------------------------------- | ------------ | -| CRAFT_1280 + CRAFT_NARROW + CRNN_EN_512 + CRNN_EN_64 | 13.9 | +| Model | XNNPACK [MB] | +| ----------- | ------------ | +| CRAFT_1280 | 83.1 | +| CRAFT_320 | 83.1 | +| CRNN_EN_512 | 277 | +| CRNN_EN_64 | 74.3 | ### Memory usage -| Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] | -| ---------------------------------------------------- | ---------------------- | ------------------ | -| CRAFT_1280 + CRAFT_NARROW + CRNN_EN_512 + CRNN_EN_64 | 90 | 90 | +| Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] | +| ------------------------------------ | ---------------------- | ------------------ | +| CRAFT_1280 + CRAFT_320 + CRNN_EN_512 | 2770 | 3720 | +| CRAFT_1280 + CRAFT_320 + CRNN_EN_64 | 1770 | 2740 | ### Inference time @@ -168,6 +172,9 @@ function App() { Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. 
::: -| Model | iPhone 16 Pro (XNNPACK) [ms] | iPhone 13 Pro (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] | -| ---------------------------------------------------- | ---------------------------- | ---------------------------- | -------------------------- | --------------------------------- | ------------------------- | -| CRAFT_1280 + CRAFT_NARROW + CRNN_EN_512 + CRNN_EN_64 | 190 | 260 | 280 | 100 | 90 | +| Model | iPhone 16 Pro (XNNPACK) [ms] | iPhone 14 Pro Max (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | Samsung Galaxy S21 (XNNPACK) [ms] | +| ----------- | ---------------------------- | -------------------------------- | -------------------------- | --------------------------------- | --------------------------------- | +| CRAFT_1280 | 5457 | 5833 | ❌ | 6296 | 14053 | +| CRAFT_320 | 1351 | 1460 | ❌ | 1485 | 3101 | +| CRNN_EN_512 | 39 | 123 | ❌ | 24 | 78 | +| CRNN_EN_64 | 10 | 33 | ❌ | 7 | 18 | diff --git a/docs/docs/hookless-api/ClassificationModule.md b/docs/docs/hookless-api/ClassificationModule.md index 732971db27..2e62cbd4ab 100644 --- a/docs/docs/hookless-api/ClassificationModule.md +++ b/docs/docs/hookless-api/ClassificationModule.md @@ -3,7 +3,7 @@ title: ClassificationModule sidebar_position: 1 --- -Hookless implementation of the [useClassification](../computer-vision/useClassification.mdx) hook. +Hookless implementation of the [useClassification](../computer-vision/useClassification.md) hook. ## Reference diff --git a/docs/docs/hookless-api/OCRModule.md b/docs/docs/hookless-api/OCRModule.md index 2507462e42..cd9f0c80f1 100644 --- a/docs/docs/hookless-api/OCRModule.md +++ b/docs/docs/hookless-api/OCRModule.md @@ -3,7 +3,7 @@ title: OCRModule sidebar_position: 6 --- -Hookless implementation of the [useOCR](../computer-vision/useOCR.mdx) hook. +Hookless implementation of the [useOCR](../computer-vision/useOCR.md) hook. ## Reference diff --git a/docs/docs/hookless-api/ObjectDetectionModule.md b/docs/docs/hookless-api/ObjectDetectionModule.md index 2cc3504ef4..6c730b7fe0 100644 --- a/docs/docs/hookless-api/ObjectDetectionModule.md +++ b/docs/docs/hookless-api/ObjectDetectionModule.md @@ -3,7 +3,7 @@ title: ObjectDetectionModule sidebar_position: 5 --- -Hookless implementation of the [useObjectDetection](../computer-vision/useObjectDetection.mdx) hook. +Hookless implementation of the [useObjectDetection](../computer-vision/useObjectDetection.md) hook. ## Reference diff --git a/docs/docs/hookless-api/StyleTransferModule.md b/docs/docs/hookless-api/StyleTransferModule.md index f084d8cad5..29c750bee3 100644 --- a/docs/docs/hookless-api/StyleTransferModule.md +++ b/docs/docs/hookless-api/StyleTransferModule.md @@ -3,7 +3,7 @@ title: StyleTransferModule sidebar_position: 4 --- -Hookless implementation of the [useStyleTransfer](../computer-vision/useStyleTransfer.mdx) hook. +Hookless implementation of the [useStyleTransfer](../computer-vision/useStyleTransfer.md) hook. ## Reference diff --git a/docs/docs/hookless-api/VerticalOCRModule.md b/docs/docs/hookless-api/VerticalOCRModule.md index 4ae5f5c4e1..8b27a436ee 100644 --- a/docs/docs/hookless-api/VerticalOCRModule.md +++ b/docs/docs/hookless-api/VerticalOCRModule.md @@ -3,7 +3,7 @@ title: VerticalOCRModule sidebar_position: 7 --- -Hookless implementation of the [useVerticalOCR](../computer-vision/useVerticalOCR.mdx) hook. +Hookless implementation of the [useVerticalOCR](../computer-vision/useVerticalOCR.md) hook. 
## Reference From 212726eb1d3d4e4f19763af0015e6ba519c2c05c Mon Sep 17 00:00:00 2001 From: Norbert Klockiewicz Date: Mon, 3 Mar 2025 11:03:40 +0100 Subject: [PATCH 4/8] docs: update docs to match implementation --- docs/docs/benchmarks/inference-time.md | 18 ++++++ docs/docs/benchmarks/memory-usage.md | 13 +++++ docs/docs/benchmarks/model-size.md | 18 ++++++ docs/docs/computer-vision/useOCR.md | 40 +++++++++---- docs/docs/computer-vision/useVerticalOCR.md | 62 ++++++++++++++------- docs/docs/hookless-api/OCRModule.md | 48 +++++++++++----- docs/docs/hookless-api/VerticalOCRModule.md | 59 +++++++++++++------- 7 files changed, 192 insertions(+), 66 deletions(-) diff --git a/docs/docs/benchmarks/inference-time.md b/docs/docs/benchmarks/inference-time.md index c1f91a3b7b..642e903890 100644 --- a/docs/docs/benchmarks/inference-time.md +++ b/docs/docs/benchmarks/inference-time.md @@ -28,6 +28,24 @@ Times presented in the tables are measured as consecutive runs of the model. Ini | STYLE_TRANSFER_UDNIE | 450 | 600 | 750 | 1650 | 1800 | | STYLE_TRANSFER_RAIN_PRINCESS | 450 | 600 | 750 | 1650 | 1800 | +## OCR + +| Model | iPhone 16 Pro (XNNPACK) [ms] | iPhone 14 Pro Max (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | Samsung Galaxy S21 (XNNPACK) [ms] | +| ----------- | ---------------------------- | -------------------------------- | -------------------------- | --------------------------------- | --------------------------------- | +| CRAFT_800 | 2099 | 2227 | ❌ | 2245 | 7108 | +| CRNN_EN_512 | 70 | 252 | ❌ | 54 | 151 | +| CRNN_EN_256 | 39 | 123 | ❌ | 24 | 78 | +| CRNN_EN_128 | 17 | 83 | ❌ | 14 | 39 | + +## Vertical OCR + +| Model | iPhone 16 Pro (XNNPACK) [ms] | iPhone 14 Pro Max (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | Samsung Galaxy S21 (XNNPACK) [ms] | +| ----------- | ---------------------------- | -------------------------------- | -------------------------- | --------------------------------- | --------------------------------- | +| CRAFT_1280 | 5457 | 5833 | ❌ | 6296 | 14053 | +| CRAFT_320 | 1351 | 1460 | ❌ | 1485 | 3101 | +| CRNN_EN_512 | 39 | 123 | ❌ | 24 | 78 | +| CRNN_EN_64 | 10 | 33 | ❌ | 7 | 18 | + ## LLMs | Model | iPhone 16 Pro (XNNPACK) [tokens/s] | iPhone 13 Pro (XNNPACK) [tokens/s] | iPhone SE 3 (XNNPACK) [tokens/s] | Samsung Galaxy S24 (XNNPACK) [tokens/s] | OnePlus 12 (XNNPACK) [tokens/s] | diff --git a/docs/docs/benchmarks/memory-usage.md b/docs/docs/benchmarks/memory-usage.md index 868a0884b6..2f535ad48b 100644 --- a/docs/docs/benchmarks/memory-usage.md +++ b/docs/docs/benchmarks/memory-usage.md @@ -24,6 +24,19 @@ sidebar_position: 2 | STYLE_TRANSFER_UDNIE | 950 | 350 | | STYLE_TRANSFER_RAIN_PRINCESS | 950 | 350 | +## OCR + +| Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] | +| --------------------------------------------------- | ---------------------- | ------------------ | +| CRAFT_800 + CRNN_EN_512 + CRNN_EN_256 + CRNN_EN_128 | 2100 | 1782 | + +## Vertical OCR + +| Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] | +| ------------------------------------ | ---------------------- | ------------------ | +| CRAFT_1280 + CRAFT_320 + CRNN_EN_512 | 2770 | 3720 | +| CRAFT_1280 + CRAFT_320 + CRNN_EN_64 | 1770 | 2740 | + ## LLMs | Model | Android (XNNPACK) [GB] | iOS (XNNPACK) [GB] | diff --git a/docs/docs/benchmarks/model-size.md b/docs/docs/benchmarks/model-size.md index a80f59d47f..59f1d9bda0 100644 --- a/docs/docs/benchmarks/model-size.md +++ b/docs/docs/benchmarks/model-size.md @@ -24,6 +24,24 @@ 
sidebar_position: 1 | STYLE_TRANSFER_UDNIE | 6.78 | 5.22 | | STYLE_TRANSFER_RAIN_PRINCESS | 6.78 | 5.22 | +## OCR + +| Model | XNNPACK [MB] | +| ----------- | ------------ | +| CRAFT_800 | 83.1 | +| CRNN_EN_512 | 547 | +| CRNN_EN_256 | 277 | +| CRNN_EN_128 | 142 | + +## Vertical OCR + +| Model | XNNPACK [MB] | +| ----------- | ------------ | +| CRAFT_1280 | 83.1 | +| CRAFT_320 | 83.1 | +| CRNN_EN_512 | 277 | +| CRNN_EN_64 | 74.3 | + ## LLMs | Model | XNNPACK [GB] | diff --git a/docs/docs/computer-vision/useOCR.md b/docs/docs/computer-vision/useOCR.md index a35ade70cb..73a5e43370 100644 --- a/docs/docs/computer-vision/useOCR.md +++ b/docs/docs/computer-vision/useOCR.md @@ -12,12 +12,22 @@ It is recommended to use models provided by us, which are available at our [Hugg ## Reference ```jsx -import { useOCR, CRAFT_800, CRNN_RECOGNIZERS_EN } from 'react-native-executorch'; +import { + useOCR, + CRAFT_800, + RECOGNIZER_EN_CRNN_512, + RECOGNIZER_EN_CRNN_256, + RECOGNIZER_EN_CRNN_128 +} from 'react-native-executorch'; function App() { const model = useOCR({ detectorSource: CRAFT_800, - recognizerSources: CRNN_RECOGNIZERS_EN, + recognizerSources: { + recognizerLarge: RECOGNIZER_EN_CRNN_512, + recognizerMedium: RECOGNIZER_EN_CRNN_256, + recognizerSmall: RECOGNIZER_EN_CRNN_128 + }, language: "en", }); @@ -35,6 +45,14 @@ function App() { Type definitions ```typescript +interface RecognizerSources { + recognizerLarge: string; + recognizerMedium: string; + recognizerSmall: string; +} + +type OCRLanguage = 'en'; + interface Point { x: number; y: number; @@ -45,12 +63,6 @@ interface OCRDetection { text: string; score: number; } - -interface RecognizerSources: { - recognizerLarge: string; - recognizerMedium: string; - recognizerSmall: string; -} ``` @@ -97,7 +109,7 @@ interface OCRDetection { ``` The `bbox` property contains information about the bounding box of detected text regions. It is represented as four points, which are corners of detected bounding box. -The `text` property contains the text recognized withinh detected text region. The `score` represents the confidence score of the recognized text. +The `text` property contains the text recognized within detected text region. The `score` represents the confidence score of the recognized text. ## Example @@ -105,13 +117,19 @@ The `text` property contains the text recognized withinh detected text region. 
T import { useOCR, CRAFT_800, - CRNN_RECOGNIZERS_EN, + RECOGNIZER_EN_CRNN_512, + RECOGNIZER_EN_CRNN_256, + RECOGNIZER_EN_CRNN_128, } from 'react-native-executorch'; function App() { const model = useOCR({ detectorSource: CRAFT_800, - recognizerSources: CRNN_RECOGNIZERS_EN, + recognizerSources: { + recognizerLarge: RECOGNIZER_EN_CRNN_512, + recognizerMedium: RECOGNIZER_EN_CRNN_256, + recognizerSmall: RECOGNIZER_EN_CRNN_128, + }, language: 'en', }); diff --git a/docs/docs/computer-vision/useVerticalOCR.md b/docs/docs/computer-vision/useVerticalOCR.md index 013fc56726..21ffedf3b5 100644 --- a/docs/docs/computer-vision/useVerticalOCR.md +++ b/docs/docs/computer-vision/useVerticalOCR.md @@ -16,14 +16,26 @@ It is recommended to use models provided by us, which are available at our [Hugg ## Reference ```jsx -import { useVerticalOCR, VERTICAL_DETECTORS, VERTICAL_CRNN_RECOGNIZERS_EN } from 'react-native-executorch'; +import { + DETECTOR_CRAFT_1280, + DETECTOR_CRAFT_320, + RECOGNIZER_EN_CRNN_512, + RECOGNIZER_EN_CRNN_64, + useVerticalOCR, +} from 'react-native-executorch'; function App() { const model = useVerticalOCR({ - detectorSources: VERTICAL_DETECTORS, - recognizerSources: VERTICAL_CRNN_RECOGNIZERS_EN, - language: "en", - independentCharacters: True, + detectorSources: { + detectorLarge: DETECTOR_CRAFT_1280, + detectorNarrow: DETECTOR_CRAFT_320, + }, + recognizerSources: { + recognizerLarge: RECOGNIZER_EN_CRNN_512, + recognizerSmall: RECOGNIZER_EN_CRNN_64, + }, + language: 'en', + independentCharacters: true, }); ... @@ -40,6 +52,18 @@ function App() { Type definitions ```typescript +interface DetectorSources { + detectorLarge: string; + detectorNarrow: string; +} + +interface RecognizerSources { + recognizerLarge: string; + recognizerSmall: string; +} + +type OCRLanguage = 'en'; + interface Point { x: number; y: number; @@ -50,16 +74,6 @@ interface OCRDetection { text: string; score: number; } - -interface DetectorSources: { - detectorLarge: string; - detectorNarrow: string; -} - -interface RecognizerSources: { - recognizerLarge: string; - recognizerSmall: string; -} ``` @@ -114,17 +128,25 @@ The `text` property contains the text recognized withinh detected text region. T ```tsx import { + DETECTOR_CRAFT_1280, + DETECTOR_CRAFT_320, + RECOGNIZER_EN_CRNN_512, + RECOGNIZER_EN_CRNN_64, useVerticalOCR, - VERTICAL_DETECTORS, - VERTICAL_CRNN_RECOGNIZERS_EN, } from 'react-native-executorch'; function App() { const model = useVerticalOCR({ - detectorSources: VERTICAL_DETECTORS, - recognizerSources: VERTICAL_CRNN_RECOGNIZERS_EN, + detectorSources: { + detectorLarge: DETECTOR_CRAFT_1280, + detectorNarrow: DETECTOR_CRAFT_320, + }, + recognizerSources: { + recognizerLarge: RECOGNIZER_EN_CRNN_512, + recognizerSmall: RECOGNIZER_EN_CRNN_64, + }, language: 'en', - independentCharacters: True, + independentCharacters: true, }); const runModel = async () => { diff --git a/docs/docs/hookless-api/OCRModule.md b/docs/docs/hookless-api/OCRModule.md index cd9f0c80f1..fac76df9cb 100644 --- a/docs/docs/hookless-api/OCRModule.md +++ b/docs/docs/hookless-api/OCRModule.md @@ -11,15 +11,21 @@ Hookless implementation of the [useOCR](../computer-vision/useOCR.md) hook. 
import { OCRModule, CRAFT_800, - CRNN_RECOGNIZERS_EN, + RECOGNIZER_EN_CRNN_512, + RECOGNIZER_EN_CRNN_256, + RECOGNIZER_EN_CRNN_128, } from 'react-native-executorch'; - const imageUri = 'path/to/image.png'; // Loading the model await OCRModule.load({ detectorSource: CRAFT_800, - recognizerSources: CRNN_RECOGNIZERS_EN, + recognizerSources: { + recognizerLarge: RECOGNIZER_EN_CRNN_512, + recognizerMedium: RECOGNIZER_EN_CRNN_256, + recognizerSmall: RECOGNIZER_EN_CRNN_128, + }, + language: 'en', }); // Running the model @@ -28,16 +34,24 @@ const ocrDetections = await OCRModule.forward(imageUri); ### Methods -| Method | Type | Description | -| -------------------- | ------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------- | -| `load` | `(detectorSource: string, recognizerSources: RecognizerSources): Promise` | Loads the model, where `modelSource` is a string that specifies the location of the model binary. | -| `forward` | `(input: string): Promise` | Executes the model's forward pass, where `input` can be a fetchable resource or a Base64-encoded string. | -| `onDownloadProgress` | `(callback: (downloadProgress: number) => void): any` | Subscribe to the download progress event. | +| Method | Type | Description | +| -------------------- | ------------------------------------------------------------------------------------------------------ | -------------------------------------------------------------------------------------------------------- | +| `load` | `(detectorSource: string, recognizerSources: RecognizerSources, language: OCRLanguage): Promise` | Loads the detector and recognizers, which sources are represented by `RecognizerSources`. | +| `forward` | `(input: string): Promise` | Executes the model's forward pass, where `input` can be a fetchable resource or a Base64-encoded string. | +| `onDownloadProgress` | `(callback: (downloadProgress: number) => void): any` | Subscribe to the download progress event. |
Type definitions ```typescript +interface RecognizerSources { + recognizerLarge: string; + recognizerMedium: string; + recognizerSmall: string; +} + +type OCRLanguage = 'en'; + interface Point { x: number; y: number; @@ -48,19 +62,23 @@ interface OCRDetection { text: string; score: number; } - -interface RecognizerSources: { - recognizerLarge: String; - recognizerMedium: String; - recognizerSmall: String; -} ```
## Loading the model -To load the model, use the `load` method. It accepts the `detectorSource` - a string that specifies the location of the detector binary and `recognizerSources` which is an object specifying locations of the recognizer binary files. For more information, take a look at [loading models](../fundamentals/loading-models.md) page. This method returns a promise, which can resolve to an error or void. +To load the model, use the `load` method. It accepts: + +- `detectorSource` - A string that specifies the location of the detector binary file. For more information, take a look at [loading models](../fundamentals/loading-models.md) section. +- `recognizerSources` - An object that specifies the locations of the recognizers binary files. For more information, take a look at [loading models](../fundamentals/loading-models.md) section. +- `language` - A parameter that specifies the language of the text to be recognized by the OCR. + +This method returns a promise, which can resolve to an error or void. + +## Listening for download progress + +To subscribe to the download progress event, you can use the `onDownloadProgress` method. It accepts a callback function that will be called whenever the download progress changes. ## Running the model diff --git a/docs/docs/hookless-api/VerticalOCRModule.md b/docs/docs/hookless-api/VerticalOCRModule.md index 8b27a436ee..030f2c74d3 100644 --- a/docs/docs/hookless-api/VerticalOCRModule.md +++ b/docs/docs/hookless-api/VerticalOCRModule.md @@ -9,17 +9,27 @@ Hookless implementation of the [useVerticalOCR](../computer-vision/useVerticalOC ```typescript import { - VerticalOCRModule, - VERTICAL_DETECTORS, - VERTICAL_CRNN_RECOGNIZERS_EN, + DETECTOR_CRAFT_1280, + DETECTOR_CRAFT_320, + RECOGNIZER_EN_CRNN_512, + RECOGNIZER_EN_CRNN_64, + useVerticalOCR, } from 'react-native-executorch'; const imageUri = 'path/to/image.png'; // Loading the model await VerticalOCRModule.load({ - detectorSources: VERTICAL_DETECTORS, - recognizerSources: VERTICAL_CRNN_RECOGNIZERS_EN, + detectorSources: { + detectorLarge: DETECTOR_CRAFT_1280, + detectorNarrow: DETECTOR_CRAFT_320, + }, + recognizerSources: { + recognizerLarge: RECOGNIZER_EN_CRNN_512, + recognizerSmall: RECOGNIZER_EN_CRNN_64, + }, + language: 'en', + independentCharacters: true, }); // Running the model @@ -28,16 +38,28 @@ const ocrDetections = await VerticalOCRModule.forward(imageUri); ### Methods -| Method | Type | Description | -| -------------------- | ------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------- | -| `load` | `(detectorSources: DetectorSources, recognizerSources: RecognizerSources, independentCharacters: boolean): Promise` | Loads the model, where `modelSource` is a string that specifies the location of the model binary. | -| `forward` | `(input: string): Promise` | Executes the model's forward pass, where `input` can be a fetchable resource or a Base64-encoded string. | -| `onDownloadProgress` | `(callback: (downloadProgress: number) => void): any` | Subscribe to the download progress event. 
Type definitions ```typescript +interface DetectorSources { + detectorLarge: string; + detectorNarrow: string; +} + +interface RecognizerSources { + recognizerLarge: string; + recognizerSmall: string; +} + +type OCRLanguage = 'en'; + interface Point { x: number; y: number; @@ -48,16 +70,6 @@ interface OCRDetection { text: string; score: number; } - -interface DetectorSources: { - detectorLarge: string; - detectorNarrow: string; -} - -interface RecognizerSources: { - recognizerLarge: string; - recognizerSmall: string; -} ```
@@ -69,6 +81,13 @@ To load the model, use the `load` method. It accepts: - `detectorSources` - An object that specifies the location of the detectors binary files. For more information, take a look at [loading models](../fundamentals/loading-models.md) section. - `recognizerSources` - An object that specifies the locations of the recognizers binary files. For more information, take a look at [loading models](../fundamentals/loading-models.md) section. - `independentCharacters` - A boolean parameter that indicates whether the text in the image consists of a random sequence of characters. If set to true, the algorithm will scan each character individually instead of reading them as continuous text. +- `language` - A parameter that specifies the language of the text to be recognized by the OCR. + +This method returns a promise, which can resolve to an error or void. + +## Listening for download progress + +To subscribe to the download progress event, you can use the `onDownloadProgress` method. It accepts a callback function that will be called whenever the download progress changes. ## Running the model From bfe0da0c613e2ce7f5e911d30d4cb84339e1889b Mon Sep 17 00:00:00 2001 From: Norbert Klockiewicz Date: Mon, 3 Mar 2025 14:54:22 +0100 Subject: [PATCH 5/8] fix: requested changes --- docs/docs/benchmarks/inference-time.md | 4 ++++ docs/docs/computer-vision/useOCR.md | 4 +++- docs/docs/computer-vision/useVerticalOCR.md | 4 +++- 3 files changed, 10 insertions(+), 2 deletions(-) diff --git a/docs/docs/benchmarks/inference-time.md b/docs/docs/benchmarks/inference-time.md index 642e903890..45c408a8e5 100644 --- a/docs/docs/benchmarks/inference-time.md +++ b/docs/docs/benchmarks/inference-time.md @@ -37,6 +37,8 @@ Times presented in the tables are measured as consecutive runs of the model. Ini | CRNN_EN_256 | 39 | 123 | ❌ | 24 | 78 | | CRNN_EN_128 | 17 | 83 | ❌ | 14 | 39 | +❌ - Insufficient RAM. + ## Vertical OCR | Model | iPhone 16 Pro (XNNPACK) [ms] | iPhone 14 Pro Max (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | Samsung Galaxy S21 (XNNPACK) [ms] | @@ -46,6 +48,8 @@ Times presented in the tables are measured as consecutive runs of the model. Ini | CRNN_EN_512 | 39 | 123 | ❌ | 24 | 78 | | CRNN_EN_64 | 10 | 33 | ❌ | 7 | 18 | +❌ - Insufficient RAM. + ## LLMs | Model | iPhone 16 Pro (XNNPACK) [tokens/s] | iPhone 13 Pro (XNNPACK) [tokens/s] | iPhone SE 3 (XNNPACK) [tokens/s] | Samsung Galaxy S24 (XNNPACK) [tokens/s] | OnePlus 12 (XNNPACK) [tokens/s] | diff --git a/docs/docs/computer-vision/useOCR.md b/docs/docs/computer-vision/useOCR.md index 73a5e43370..fe9d409b40 100644 --- a/docs/docs/computer-vision/useOCR.md +++ b/docs/docs/computer-vision/useOCR.md @@ -6,7 +6,7 @@ sidebar_position: 4 Optical character recognition(OCR) is a computer vision technique that detects and recognizes text within the image. It's commonly used to convert different types of documents, such as scanned paper documents, PDF files, or images captured by a digital camera, into editable and searchable data. :::caution -It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion/react-native-executorch-ssdlite320-mobilenet-v3-large). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/69802ee1ca161d9df00def1dabe014d36341cfa9/src/constants/modelUrls.ts#L28) shipped with our library. 
+It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/69802ee1ca161d9df00def1dabe014d36341cfa9/src/constants/modelUrls.ts#L28) shipped with our library. ::: ## Reference @@ -183,3 +183,5 @@ Times presented in the tables are measured as consecutive runs of the model. Ini | CRNN_EN_512 | 70 | 252 | ❌ | 54 | 151 | | CRNN_EN_256 | 39 | 123 | ❌ | 24 | 78 | | CRNN_EN_128 | 17 | 83 | ❌ | 14 | 39 | + +❌ - Insufficient RAM. diff --git a/docs/docs/computer-vision/useVerticalOCR.md b/docs/docs/computer-vision/useVerticalOCR.md index 21ffedf3b5..08d1710e30 100644 --- a/docs/docs/computer-vision/useVerticalOCR.md +++ b/docs/docs/computer-vision/useVerticalOCR.md @@ -10,7 +10,7 @@ The `useVerticalOCR` hook is currently in an experimental phase. We appreciate f Optical Character Recognition (OCR) is a computer vision technique used to detect and recognize text within images. It is commonly utilized to convert a variety of documents, such as scanned paper documents, PDF files, or images captured by a digital camera, into editable and searchable data. Traditionally, OCR technology has been optimized for recognizing horizontal text, and integrating support for vertical text recognition often requires significant additional effort from developers. To simplify this, we introduce `useVerticalOCR`, a tool designed to abstract the complexities of vertical text OCR, enabling seamless integration into your applications. :::caution -It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion/react-native-executorch-ssdlite320-mobilenet-v3-large). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/69802ee1ca161d9df00def1dabe014d36341cfa9/src/constants/modelUrls.ts#L28) shipped with our library. +It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/69802ee1ca161d9df00def1dabe014d36341cfa9/src/constants/modelUrls.ts#L28) shipped with our library. ::: ## Reference @@ -200,3 +200,5 @@ Times presented in the tables are measured as consecutive runs of the model. Ini | CRAFT_320 | 1351 | 1460 | ❌ | 1485 | 3101 | | CRNN_EN_512 | 39 | 123 | ❌ | 24 | 78 | | CRNN_EN_64 | 10 | 33 | ❌ | 7 | 18 | + +❌ - Insufficient RAM. From 0861c6f29b3150bfeaf8c841c72bd5bce6dcfb68 Mon Sep 17 00:00:00 2001 From: Norbert Klockiewicz Date: Tue, 4 Mar 2025 16:29:38 +0100 Subject: [PATCH 6/8] docs: change link to modelUrls to point to valid commit --- docs/docs/computer-vision/useOCR.md | 2 +- docs/docs/computer-vision/useVerticalOCR.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/docs/computer-vision/useOCR.md b/docs/docs/computer-vision/useOCR.md index fe9d409b40..4a43a4a04d 100644 --- a/docs/docs/computer-vision/useOCR.md +++ b/docs/docs/computer-vision/useOCR.md @@ -6,7 +6,7 @@ sidebar_position: 4 Optical character recognition(OCR) is a computer vision technique that detects and recognizes text within the image. It's commonly used to convert different types of documents, such as scanned paper documents, PDF files, or images captured by a digital camera, into editable and searchable data. 
:::caution -It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/69802ee1ca161d9df00def1dabe014d36341cfa9/src/constants/modelUrls.ts#L28) shipped with our library. +It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/765305abc289083787eb9613b899d6fcc0e24126/src/constants/modelUrls.ts#L51) shipped with our library. ::: ## Reference diff --git a/docs/docs/computer-vision/useVerticalOCR.md b/docs/docs/computer-vision/useVerticalOCR.md index 08d1710e30..5af6a6c3ed 100644 --- a/docs/docs/computer-vision/useVerticalOCR.md +++ b/docs/docs/computer-vision/useVerticalOCR.md @@ -10,7 +10,7 @@ The `useVerticalOCR` hook is currently in an experimental phase. We appreciate f Optical Character Recognition (OCR) is a computer vision technique used to detect and recognize text within images. It is commonly utilized to convert a variety of documents, such as scanned paper documents, PDF files, or images captured by a digital camera, into editable and searchable data. Traditionally, OCR technology has been optimized for recognizing horizontal text, and integrating support for vertical text recognition often requires significant additional effort from developers. To simplify this, we introduce `useVerticalOCR`, a tool designed to abstract the complexities of vertical text OCR, enabling seamless integration into your applications. :::caution -It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/69802ee1ca161d9df00def1dabe014d36341cfa9/src/constants/modelUrls.ts#L28) shipped with our library. +It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/765305abc289083787eb9613b899d6fcc0e24126/src/constants/modelUrls.ts#L51) shipped with our library. 
::: ## Reference From 293301329a252bf9f594bb2dc40019179fef81cb Mon Sep 17 00:00:00 2001 From: Norbert Klockiewicz Date: Wed, 5 Mar 2025 11:15:33 +0100 Subject: [PATCH 7/8] docs: add information about each model path in ocr docs, change headers in module api so example is one section --- docs/docs/computer-vision/useOCR.md | 12 ++++++---- docs/docs/computer-vision/useVerticalOCR.md | 18 ++++++++++----- docs/docs/hookless-api/OCRModule.md | 18 ++++++++++----- docs/docs/hookless-api/VerticalOCRModule.md | 25 ++++++++++++++------- docs/docs/module-api/executorch-bindings.md | 6 ++--- 5 files changed, 52 insertions(+), 27 deletions(-) diff --git a/docs/docs/computer-vision/useOCR.md b/docs/docs/computer-vision/useOCR.md index 4a43a4a04d..ff82022e1e 100644 --- a/docs/docs/computer-vision/useOCR.md +++ b/docs/docs/computer-vision/useOCR.md @@ -46,9 +46,9 @@ function App() { ```typescript interface RecognizerSources { - recognizerLarge: string; - recognizerMedium: string; - recognizerSmall: string; + recognizerLarge: string | number; + recognizerMedium: string | number; + recognizerSmall: string | number; } type OCRLanguage = 'en'; @@ -71,7 +71,11 @@ interface OCRDetection { **`detectorSource`** - A string that specifies the location of the detector binary. For more information, take a look at [loading models](../fundamentals/loading-models.md) section. -**`recognizerSources`** - An object that specifies locations of the recognizers binary files. For more information, take a look at [loading models](../fundamentals/loading-models.md) section. +**`recognizerSources`** - An object that specifies locations of the recognizers binary files. For more information, take a look at [loading models](../fundamentals/loading-models.md) section. Each recognizer is composed of three models tailored to process images of varying widths. + +- `recognizerLarge` - A string that specifies the location of the recognizer binary file which accepts input images with a width of 512 pixels. +- `recognizerMedium` - A string that specifies the location of the recognizer binary file which accepts input images with a width of 256 pixels. +- `recognizerSmall` - A string that specifies the location of the recognizer binary file which accepts input images with a width of 128 pixels. **`language`** - A parameter that specifies the language of the text to be recognized by the OCR. diff --git a/docs/docs/computer-vision/useVerticalOCR.md b/docs/docs/computer-vision/useVerticalOCR.md index 5af6a6c3ed..996fc0784e 100644 --- a/docs/docs/computer-vision/useVerticalOCR.md +++ b/docs/docs/computer-vision/useVerticalOCR.md @@ -53,13 +53,13 @@ function App() { ```typescript interface DetectorSources { - detectorLarge: string; - detectorNarrow: string; + detectorLarge: string | number; + detectorNarrow: string | number; } interface RecognizerSources { - recognizerLarge: string; - recognizerSmall: string; + recognizerLarge: string | number; + recognizerSmall: string | number; } type OCRLanguage = 'en'; @@ -80,9 +80,15 @@ interface OCRDetection { ### Arguments -**`detectorSources`** - An object that specifies the location of the detectors binary files. For more information, take a look at [loading models](../fundamentals/loading-models.md) section. +**`detectorSources`** - An object that specifies the location of the detectors binary files. For more information, take a look at [loading models](../fundamentals/loading-models.md) section. Each detector is composed of two models tailored to process images of varying widths. 

-**`recognizerSources`** - An object that specifies the locations of the recognizers binary files. For more information, take a look at [loading models](../fundamentals/loading-models.md) section.
+- `detectorLarge` - A string that specifies the location of the detector binary file which accepts input images with a width of 1280 pixels.
+- `detectorNarrow` - A string that specifies the location of the detector binary file which accepts input images with a width of 320 pixels.
+
+**`recognizerSources`** - An object that specifies the locations of the recognizers binary files. For more information, take a look at [loading models](../fundamentals/loading-models.md) section. Each recognizer is composed of two models tailored to process images of varying widths.
+
+- `recognizerLarge` - A string that specifies the location of the recognizer binary file which accepts input images with a width of 512 pixels.
+- `recognizerSmall` - A string that specifies the location of the recognizer binary file which accepts input images with a width of 64 pixels.

 **`language`** - A parameter that specifies the language of the text to be recognized by the OCR.

diff --git a/docs/docs/hookless-api/OCRModule.md b/docs/docs/hookless-api/OCRModule.md
index fac76df9cb..7d14085bfc 100644
--- a/docs/docs/hookless-api/OCRModule.md
+++ b/docs/docs/hookless-api/OCRModule.md
@@ -45,9 +45,9 @@ const ocrDetections = await OCRModule.forward(imageUri);

 ```typescript
 interface RecognizerSources {
-  recognizerLarge: string;
-  recognizerMedium: string;
-  recognizerSmall: string;
+  recognizerLarge: string | number;
+  recognizerMedium: string | number;
+  recognizerSmall: string | number;
 }

 type OCRLanguage = 'en';
@@ -70,9 +70,15 @@ interface OCRDetection {

 To load the model, use the `load` method. It accepts:

-- `detectorSource` - A string that specifies the location of the detector binary file. For more information, take a look at [loading models](../fundamentals/loading-models.md) section.
-- `recognizerSources` - An object that specifies the locations of the recognizers binary files. For more information, take a look at [loading models](../fundamentals/loading-models.md) section.
-- `language` - A parameter that specifies the language of the text to be recognized by the OCR.
+**`detectorSource`** - A string that specifies the location of the detector binary. For more information, take a look at [loading models](../fundamentals/loading-models.md) section.
+
+**`recognizerSources`** - An object that specifies locations of the recognizers binary files. For more information, take a look at [loading models](../fundamentals/loading-models.md) section. Each recognizer is composed of three models tailored to process images of varying widths.
+
+- `recognizerLarge` - A string that specifies the location of the recognizer binary file which accepts input images with a width of 512 pixels.
+- `recognizerMedium` - A string that specifies the location of the recognizer binary file which accepts input images with a width of 256 pixels.
+- `recognizerSmall` - A string that specifies the location of the recognizer binary file which accepts input images with a width of 128 pixels.
+
+**`language`** - A parameter that specifies the language of the text to be recognized by the OCR.

 This method returns a promise, which can resolve to an error or void.
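
Below is a minimal end-to-end sketch of the hookless OCR flow documented above. The model locations are placeholder URIs, and passing `detectorSource`, `recognizerSources`, and `language` to `load` as positional arguments in that order is an assumption based on the list above, not something the docs state explicitly:

```typescript
import { OCRModule } from 'react-native-executorch';

// Placeholder model locations - substitute the constants shipped with the
// library or your own remote URLs / local file URIs.
const detectorSource = 'https://example.com/detector_craft_800.pte';
const recognizerSources = {
  recognizerLarge: 'https://example.com/recognizer_en_crnn_512.pte',
  recognizerMedium: 'https://example.com/recognizer_en_crnn_256.pte',
  recognizerSmall: 'https://example.com/recognizer_en_crnn_128.pte',
};

async function detectText(imageUri: string) {
  // Load the detector and all three recognizers once, before any inference.
  await OCRModule.load(detectorSource, recognizerSources, 'en');

  // forward accepts a remote URL, a local file URI, or a base64-encoded image.
  const detections = await OCRModule.forward(imageUri);

  for (const detection of detections) {
    console.log('Text:', detection.text, 'Score:', detection.score);
  }
}
```
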
diff --git a/docs/docs/hookless-api/VerticalOCRModule.md b/docs/docs/hookless-api/VerticalOCRModule.md
index 030f2c74d3..67b08a6df1 100644
--- a/docs/docs/hookless-api/VerticalOCRModule.md
+++ b/docs/docs/hookless-api/VerticalOCRModule.md
@@ -49,13 +49,13 @@ const ocrDetections = await VerticalOCRModule.forward(imageUri);

 ```typescript
 interface DetectorSources {
-  detectorLarge: string;
-  detectorNarrow: string;
+  detectorLarge: string | number;
+  detectorNarrow: string | number;
 }

 interface RecognizerSources {
-  recognizerLarge: string;
-  recognizerSmall: string;
+  recognizerLarge: string | number;
+  recognizerSmall: string | number;
 }

 type OCRLanguage = 'en';
@@ -78,10 +78,19 @@ interface OCRDetection {

 To load the model, use the `load` method. It accepts:

-- `detectorSources` - An object that specifies the location of the detectors binary files. For more information, take a look at [loading models](../fundamentals/loading-models.md) section.
-- `recognizerSources` - An object that specifies the locations of the recognizers binary files. For more information, take a look at [loading models](../fundamentals/loading-models.md) section.
-- `independentCharacters` - A boolean parameter that indicates whether the text in the image consists of a random sequence of characters. If set to true, the algorithm will scan each character individually instead of reading them as continuous text.
-- `language` - A parameter that specifies the language of the text to be recognized by the OCR.
+**`detectorSources`** - An object that specifies the location of the detectors binary files. For more information, take a look at [loading models](../fundamentals/loading-models.md) section. Each detector is composed of two models tailored to process images of varying widths.
+
+- `detectorLarge` - A string that specifies the location of the detector binary file which accepts input images with a width of 1280 pixels.
+- `detectorNarrow` - A string that specifies the location of the detector binary file which accepts input images with a width of 320 pixels.
+
+**`recognizerSources`** - An object that specifies the locations of the recognizers binary files. For more information, take a look at [loading models](../fundamentals/loading-models.md) section. Each recognizer is composed of two models tailored to process images of varying widths.
+
+- `recognizerLarge` - A string that specifies the location of the recognizer binary file which accepts input images with a width of 512 pixels.
+- `recognizerSmall` - A string that specifies the location of the recognizer binary file which accepts input images with a width of 64 pixels.
+
+**`language`** - A parameter that specifies the language of the text to be recognized by the OCR.
+
+**`independentCharacters`** - A boolean parameter that indicates whether the text in the image consists of a random sequence of characters. If set to true, the algorithm will scan each character individually instead of reading them as continuous text.

 This method returns a promise, which can resolve to an error or void.
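
The vertical variant follows the same pattern. Again, the model locations are placeholders, and the argument order of `load` (both source objects first, then `language` and `independentCharacters`) is an assumption based on the list above:

```typescript
import { VerticalOCRModule } from 'react-native-executorch';

// Placeholder model locations - substitute the constants shipped with the
// library or your own remote URLs / local file URIs.
const detectorSources = {
  detectorLarge: 'https://example.com/detector_craft_1280.pte',
  detectorNarrow: 'https://example.com/detector_craft_320.pte',
};
const recognizerSources = {
  recognizerLarge: 'https://example.com/recognizer_en_crnn_512.pte',
  recognizerSmall: 'https://example.com/recognizer_en_crnn_64.pte',
};

async function detectVerticalText(imageUri: string) {
  // Load both detectors and both recognizers once, before any inference.
  // independentCharacters is true here, so characters are read one by one
  // instead of as continuous text.
  await VerticalOCRModule.load(detectorSources, recognizerSources, 'en', true);

  const detections = await VerticalOCRModule.forward(imageUri);

  for (const detection of detections) {
    console.log('Text:', detection.text, 'Score:', detection.score);
  }
}
```
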
diff --git a/docs/docs/module-api/executorch-bindings.md b/docs/docs/module-api/executorch-bindings.md
index 282beaf533..e2e48ab63f 100644
--- a/docs/docs/module-api/executorch-bindings.md
+++ b/docs/docs/module-api/executorch-bindings.md
@@ -61,7 +61,7 @@ To run model with ExecuTorch Bindings it's essential to specify the shape of the

 This example demonstrates the integration and usage of the ExecuTorch bindings with a [style transfer model](../computer-vision/useStyleTransfer.md). Specifically, we'll be using the `STYLE_TRANSFER_CANDY` model, which applies artistic style transfer to an input image.

-## Importing the Module and loading the model
+### Importing the Module and loading the model

 First, import the necessary functions from the `react-native-executorch` package and initialize the ExecuTorch module with the specified style transfer model.

@@ -77,7 +77,7 @@ const executorchModule = useExecutorchModule({
 });
 ```

-## Setting up input parameters
+### Setting up input parameters

 To prepare the input for the model, define the shape of the input tensor. This shape depends on the model's requirements. For the `STYLE_TRANSFER_CANDY` model, we need a tensor of shape `[1, 3, 640, 640]`, corresponding to a batch size of 1, 3 color channels (RGB), and dimensions of 640x640 pixels.

@@ -88,7 +88,7 @@ const shape = [1, 3, 640, 640];
 const input = new Float32Array(1 * 3 * 640 * 640); // fill this array with your image data
 ```

-## Performing inference
+### Performing inference

 ```typescript
 try {

From 86a0edc8f11ca3558ece16b2f82c48ff6f05e256 Mon Sep 17 00:00:00 2001
From: Norbert Klockiewicz
Date: Wed, 5 Mar 2025 13:00:56 +0100
Subject: [PATCH 8/8] docs: move loading models link to the bottom of detector/recognizer sources section

---
 docs/docs/computer-vision/useOCR.md         | 4 +++-
 docs/docs/computer-vision/useVerticalOCR.md | 8 ++++++--
 docs/docs/hookless-api/OCRModule.md         | 4 +++-
 docs/docs/hookless-api/VerticalOCRModule.md | 8 ++++++--
 4 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/docs/docs/computer-vision/useOCR.md b/docs/docs/computer-vision/useOCR.md
index ff82022e1e..e2431f49a8 100644
--- a/docs/docs/computer-vision/useOCR.md
+++ b/docs/docs/computer-vision/useOCR.md
@@ -71,12 +71,14 @@ interface OCRDetection {

 **`detectorSource`** - A string that specifies the location of the detector binary. For more information, take a look at [loading models](../fundamentals/loading-models.md) section.

-**`recognizerSources`** - An object that specifies locations of the recognizers binary files. For more information, take a look at [loading models](../fundamentals/loading-models.md) section. Each recognizer is composed of three models tailored to process images of varying widths.
+**`recognizerSources`** - An object that specifies locations of the recognizers binary files. Each recognizer is composed of three models tailored to process images of varying widths.

 - `recognizerLarge` - A string that specifies the location of the recognizer binary file which accepts input images with a width of 512 pixels.
 - `recognizerMedium` - A string that specifies the location of the recognizer binary file which accepts input images with a width of 256 pixels.
 - `recognizerSmall` - A string that specifies the location of the recognizer binary file which accepts input images with a width of 128 pixels.

+For more information, take a look at [loading models](../fundamentals/loading-models.md) section.
+
 **`language`** - A parameter that specifies the language of the text to be recognized by the OCR.

 ### Returns

diff --git a/docs/docs/computer-vision/useVerticalOCR.md b/docs/docs/computer-vision/useVerticalOCR.md
index 996fc0784e..8fb82d507c 100644
--- a/docs/docs/computer-vision/useVerticalOCR.md
+++ b/docs/docs/computer-vision/useVerticalOCR.md
@@ -80,16 +80,20 @@ interface OCRDetection {

 ### Arguments

-**`detectorSources`** - An object that specifies the location of the detectors binary files. For more information, take a look at [loading models](../fundamentals/loading-models.md) section. Each detector is composed of two models tailored to process images of varying widths.
+**`detectorSources`** - An object that specifies the location of the detectors binary files. Each detector is composed of two models tailored to process images of varying widths.

 - `detectorLarge` - A string that specifies the location of the detector binary file which accepts input images with a width of 1280 pixels.
 - `detectorNarrow` - A string that specifies the location of the detector binary file which accepts input images with a width of 320 pixels.

+For more information, take a look at [loading models](../fundamentals/loading-models.md) section.
+
-**`recognizerSources`** - An object that specifies the locations of the recognizers binary files. For more information, take a look at [loading models](../fundamentals/loading-models.md) section. Each recognizer is composed of two models tailored to process images of varying widths.
+**`recognizerSources`** - An object that specifies the locations of the recognizers binary files. Each recognizer is composed of two models tailored to process images of varying widths.

 - `recognizerLarge` - A string that specifies the location of the recognizer binary file which accepts input images with a width of 512 pixels.
 - `recognizerSmall` - A string that specifies the location of the recognizer binary file which accepts input images with a width of 64 pixels.

+For more information, take a look at [loading models](../fundamentals/loading-models.md) section.
+
 **`language`** - A parameter that specifies the language of the text to be recognized by the OCR.

 **`independentCharacters`** - A boolean parameter that indicates whether the text in the image consists of a random sequence of characters. If set to true, the algorithm will scan each character individually instead of reading them as continuous text.

diff --git a/docs/docs/hookless-api/OCRModule.md b/docs/docs/hookless-api/OCRModule.md
index 7d14085bfc..493371196f 100644
--- a/docs/docs/hookless-api/OCRModule.md
+++ b/docs/docs/hookless-api/OCRModule.md
@@ -72,12 +72,14 @@ To load the model, use the `load` method. It accepts:

 **`detectorSource`** - A string that specifies the location of the detector binary. For more information, take a look at [loading models](../fundamentals/loading-models.md) section.

-**`recognizerSources`** - An object that specifies locations of the recognizers binary files. For more information, take a look at [loading models](../fundamentals/loading-models.md) section. Each recognizer is composed of three models tailored to process images of varying widths.
+**`recognizerSources`** - An object that specifies locations of the recognizers binary files. Each recognizer is composed of three models tailored to process images of varying widths.

 - `recognizerLarge` - A string that specifies the location of the recognizer binary file which accepts input images with a width of 512 pixels.
 - `recognizerMedium` - A string that specifies the location of the recognizer binary file which accepts input images with a width of 256 pixels.
 - `recognizerSmall` - A string that specifies the location of the recognizer binary file which accepts input images with a width of 128 pixels.

+For more information, take a look at [loading models](../fundamentals/loading-models.md) section.
+
 **`language`** - A parameter that specifies the language of the text to be recognized by the OCR.

 This method returns a promise, which can resolve to an error or void.

diff --git a/docs/docs/hookless-api/VerticalOCRModule.md b/docs/docs/hookless-api/VerticalOCRModule.md
index 67b08a6df1..d876b82778 100644
--- a/docs/docs/hookless-api/VerticalOCRModule.md
+++ b/docs/docs/hookless-api/VerticalOCRModule.md
@@ -78,16 +78,20 @@ To load the model, use the `load` method. It accepts:

-**`detectorSources`** - An object that specifies the location of the detectors binary files. For more information, take a look at [loading models](../fundamentals/loading-models.md) section. Each detector is composed of two models tailored to process images of varying widths.
+**`detectorSources`** - An object that specifies the location of the detectors binary files. Each detector is composed of two models tailored to process images of varying widths.

 - `detectorLarge` - A string that specifies the location of the detector binary file which accepts input images with a width of 1280 pixels.
 - `detectorNarrow` - A string that specifies the location of the detector binary file which accepts input images with a width of 320 pixels.

+For more information, take a look at [loading models](../fundamentals/loading-models.md) section.
+
-**`recognizerSources`** - An object that specifies the locations of the recognizers binary files. For more information, take a look at [loading models](../fundamentals/loading-models.md) section. Each recognizer is composed of two models tailored to process images of varying widths.
+**`recognizerSources`** - An object that specifies the locations of the recognizers binary files. Each recognizer is composed of two models tailored to process images of varying widths.

 - `recognizerLarge` - A string that specifies the location of the recognizer binary file which accepts input images with a width of 512 pixels.
 - `recognizerSmall` - A string that specifies the location of the recognizer binary file which accepts input images with a width of 64 pixels.

+For more information, take a look at [loading models](../fundamentals/loading-models.md) section.
+
 **`language`** - A parameter that specifies the language of the text to be recognized by the OCR.

 **`independentCharacters`** - A boolean parameter that indicates whether the text in the image consists of a random sequence of characters. If set to true, the algorithm will scan each character individually instead of reading them as continuous text.
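
To close, a sketch of how the documented arguments fit together in the hook-based vertical OCR API. The config property names mirror the argument names described above, and the source objects are placeholders; treat the exact config shape as an assumption rather than a confirmed signature:

```typescript
import { useVerticalOCR } from 'react-native-executorch';

function App() {
  // Placeholder source objects - substitute the constants shipped with the
  // library or your own remote URLs / local file URIs.
  const model = useVerticalOCR({
    detectorSources: {
      detectorLarge: 'https://example.com/detector_craft_1280.pte',
      detectorNarrow: 'https://example.com/detector_craft_320.pte',
    },
    recognizerSources: {
      recognizerLarge: 'https://example.com/recognizer_en_crnn_512.pte',
      recognizerSmall: 'https://example.com/recognizer_en_crnn_64.pte',
    },
    language: 'en',
    independentCharacters: false,
  });

  // Once model.isReady is true, model.forward(imageUri) resolves to an array
  // of OCRDetection objects, as with the horizontal OCR hook.
  return null;
}
```
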