# Tesseract.js

[![Build Status](https://travis-ci.org/naptha/tesseract.js.svg?branch=master)](https://travis-ci.org/naptha/tesseract.js) [![npm version](https://badge.fury.io/js/tesseract.js.svg)](https://badge.fury.io/js/tesseract.js) [![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-green.svg)](https://github.com/naptha/tesseract.js/graphs/commit-activity) [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) [![Code Style](https://badgen.net/badge/code%20style/airbnb/ff5a5f?icon=airbnb)](https://github.com/airbnb/javascript) [![Downloads Total](https://img.shields.io/npm/dt/tesseract.js.svg)](https://www.npmjs.com/package/tesseract.js) [![Downloads Month](https://img.shields.io/npm/dm/tesseract.js.svg)](https://www.npmjs.com/package/tesseract.js)

Version 2 is now available and under development in the master branch.
Check the support/1.x branch for version 1.


Tesseract.js is a JavaScript library that gets words in [almost any language](./docs/tesseract_lang_list.md) out of images. ([Demo](http://tesseract.projectnaptha.com/))

[![fancy demo gif](./docs/images/demo.gif)](http://tesseract.projectnaptha.com)

Tesseract.js wraps an [emscripten](https://github.com/kripken/emscripten) [port](https://github.com/naptha/tesseract.js-core) of the [Tesseract](https://github.com/tesseract-ocr/tesseract) [OCR](https://en.wikipedia.org/wiki/Optical_character_recognition) Engine.
It works in the browser using [webpack](https://webpack.js.org/) or plain script tags with a [CDN](#installation), and on the server with [Node.js](https://nodejs.org/en/).
After you [install it](#installation), using it is as simple as:

```javascript
import { TesseractWorker } from 'tesseract.js';

const worker = new TesseractWorker();

// myImage is the image you want to recognize (see the Image Format docs)
worker.recognize(myImage)
  .progress(progress => {
    console.log('progress', progress);
  }).then(result => {
    console.log('result', result);
  });
```

[Check out the docs](#documentation) for a full explanation of the API.

## Major changes in v2

- Upgrade to tesseract v4
- Support multiple languages at the same time, e.g. `eng+chi_tra` for English and Traditional Chinese
- Supported image formats: png, jpg, bmp, pbm

## Installation

Tesseract.js works with a `<script>` tag. After including the script, the `Tesseract` variable will be globally available.
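
When the library is loaded through a script tag rather than an `import`, the worker class has to be read from the global `Tesseract` variable. The following is a minimal sketch of that pattern. It assumes the global exposes the same `TesseractWorker` class as the module import shown earlier; the helper names `getWorkerClass` and `recognizeWithGlobal` are hypothetical and not part of the library.

```javascript
// Hypothetical helper: look up the worker class on the global `Tesseract`
// object. Assumes the script build exposes `TesseractWorker` as a property.
function getWorkerClass(root = globalThis) {
  const lib = root.Tesseract;
  if (!lib || typeof lib.TesseractWorker !== 'function') {
    throw new Error('Tesseract.js is not loaded; include the script tag first.');
  }
  return lib.TesseractWorker;
}

// Hypothetical helper: create a worker from the global and start recognition.
// Returns whatever `worker.recognize` returns, so callers can chain
// `.progress(...)` and `.then(...)` as in the import-based example.
function recognizeWithGlobal(image, root = globalThis) {
  const TesseractWorker = getWorkerClass(root);
  const worker = new TesseractWorker();
  return worker.recognize(image);
}
```

Failing early with a clear error when the global is missing tends to be easier to debug than a `TypeError` raised later, for example when the script tag is placed after the code that uses it.
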
### Node.js

**Tesseract.js currently requires Node.js v6.8.0 or higher**

```shell
# For v2
npm install tesseract.js@next
yarn add tesseract.js@next

# For v1
npm install tesseract.js
yarn add tesseract.js
```

## Documentation

* [Examples](./docs/examples.md)
* [Image Format](./docs/image-format.md)
* [API](./docs/api.md)
* [Local Installation](./docs/local-installation.md)
* [FAQ](./docs/faq.md)

## Contributing

### Development

To run a development copy of Tesseract.js, do the following:

```shell
# First we clone the repository
git clone https://github.com/naptha/tesseract.js.git
cd tesseract.js

# Then we install the dependencies
npm install

# And finally we start the development server
npm start
```

The development server will be available at http://localhost:3000/examples/browser/demo.html in your favorite browser.
It will automatically rebuild `tesseract.dev.js` and `worker.min.js` when you change files in the src folder.

### Building Static Files

To build the compiled static files, run:

```shell
npm run build
```

This will output the files into the `dist` directory.