DocumentVision is a Node.js library for processing and understanding scanned documents.
- Image loading using jpgd, LodePNG and pixel buffers
- Image manipulation using Leptonica (Version 1.69)
- OCR using Tesseract (Version 3.02)
- OMR for barcodes using ZXing (Version 2.10 with PDF417 patches applied)
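The image loading listed above decodes PNG (LodePNG) and JPEG (jpgd) data, and the dv.Image constructor used in the quick-start example takes the format name as its first argument. When inputs are not always PNG, a minimal sketch like the following can pick that argument from the file's leading "magic" bytes. It assumes 'jpg' is the format name dv expects for JPEG data (only 'png' appears in this README), and sniffImageFormat is a hypothetical helper, not part of dv.

```javascript
// Hypothetical helper (not part of dv): guess the format name to pass to
// dv.Image from the first bytes of a file buffer.
// 'png' matches the quick-start example; 'jpg' is an ASSUMED format name
// for JPEG input, so check it against the dv API documentation.
function sniffImageFormat(buffer) {
  // Every PNG file starts with this fixed 8-byte signature.
  var PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
  if (buffer.length >= 8 &&
      PNG_SIGNATURE.every(function (b, i) { return buffer[i] === b; })) {
    return 'png';
  }
  // JPEG files start with the SOI marker (FF D8) followed by another marker byte (FF).
  if (buffer.length >= 3 &&
      buffer[0] === 0xff && buffer[1] === 0xd8 && buffer[2] === 0xff) {
    return 'jpg';
  }
  // Neither PNG nor JPEG: the format is unknown.
  return null;
}
```

A null result means the bytes are neither PNG nor JPEG, so reporting an error is safer than handing the buffer to dv.Image.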
[sudo] npm install [-g] dv
Once dv is installed, download the sample image textpage300.png. You can also use any other image of simple text scanned at 300dpi or higher (change the file name in the code accordingly). Now run the following code snippet to recognize the text in your image:
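// Load the dv bindings and Node's file system module.
var dv = require('dv');
var fs = require('fs');
// Decode the PNG file into a dv.Image.
var image = new dv.Image('png', fs.readFileSync('textpage300.png'));
// Create a Tesseract instance for English ('eng') that works on the image.
var tesseract = new dv.Tesseract('eng', image);
// Print the recognized text as plain text.
console.log(tesseract.findText('plain'));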
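The instructions above recommend an image of simple text at 300dpi or higher. For PNG input, a minimal sketch of checking that recommendation before running OCR is shown here. It assumes the file records its resolution in the optional pHYs chunk defined by the PNG specification; pngDpi is a hypothetical helper, not part of dv.

```javascript
// Hypothetical helper (not part of dv): read a PNG's physical resolution
// from its optional pHYs chunk and convert it to dots per inch.
// Returns null when there is no pHYs chunk or when it stores only an
// aspect ratio (unit specifier 0), because the DPI is then unknown.
function pngDpi(buffer) {
  var offset = 8; // skip the 8-byte PNG signature
  while (offset + 8 <= buffer.length) {
    var length = buffer.readUInt32BE(offset);                   // chunk data length
    var type = buffer.toString('ascii', offset + 4, offset + 8); // chunk type
    var data = offset + 8;                                       // start of chunk data
    if (type === 'pHYs' && length === 9 && data + 9 <= buffer.length) {
      var xPerUnit = buffer.readUInt32BE(data);
      var yPerUnit = buffer.readUInt32BE(data + 4);
      var unit = buffer[data + 8]; // 1 = metre, 0 = unit unknown
      if (unit !== 1) return null;
      // 1 inch = 0.0254 m, so pixels per metre * 0.0254 = pixels per inch.
      return { x: Math.round(xPerUnit * 0.0254), y: Math.round(yPerUnit * 0.0254) };
    }
    // pHYs must appear before the first IDAT chunk, so stop searching there.
    if (type === 'IDAT' || type === 'IEND') break;
    offset = data + length + 4; // skip the chunk data and its 4-byte CRC
  }
  return null;
}
```

For example, a scan could be rejected before OCR when pngDpi reports less than 300 in either direction. Many scans omit pHYs or record a wrong value, so a null result only means the resolution is unknown, not that the image is unusable.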
Licensed under the incredibly permissive MIT License. Copyright © 2012 Christoph Schulz.