Tesseract.js

![Lint & Test](https://github.com/naptha/tesseract.js/workflows/Node.js%20CI/badge.svg)
![CodeQL](https://github.com/naptha/tesseract.js/workflows/CodeQL/badge.svg)
[![Gitpod Ready-to-Code](https://img.shields.io/badge/Gitpod-ready--to--code-blue?logo=gitpod)](https://github.com/naptha/tesseract.js)
[![Financial Contributors on Open Collective](https://opencollective.com/tesseractjs/all/badge.svg?label=financial+contributors)](https://opencollective.com/tesseractjs)
[![npm version](https://badge.fury.io/js/tesseract.js.svg)](https://badge.fury.io/js/tesseract.js)
[![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-green.svg)](https://github.com/naptha/tesseract.js/graphs/commit-activity)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Code Style](https://badgen.net/badge/code%20style/airbnb/ff5a5f?icon=airbnb)](https://github.com/airbnb/javascript)
[![Downloads Total](https://img.shields.io/npm/dt/tesseract.js.svg)](https://www.npmjs.com/package/tesseract.js)
[![Downloads Month](https://img.shields.io/npm/dm/tesseract.js.svg)](https://www.npmjs.com/package/tesseract.js)

Tesseract.js is a JavaScript library that gets words in [almost any language](./docs/tesseract_lang_list.md) out of images. ([Demo](http://tesseract.projectnaptha.com/))

Image Recognition

[![fancy demo gif](./docs/images/demo.gif)](http://tesseract.projectnaptha.com)

Video Real-time Recognition: https://github.com/jeromewu/tesseract.js-video

Tesseract.js wraps an [emscripten](https://github.com/kripken/emscripten) [port](https://github.com/naptha/tesseract.js-core) of the [Tesseract](https://github.com/tesseract-ocr/tesseract) [OCR](https://en.wikipedia.org/wiki/Optical_character_recognition) Engine. It works in the browser using [webpack](https://webpack.js.org/) or plain script tags with a [CDN](#CDN) and on the server with [Node.js](https://nodejs.org/en/). After you [install it](#installation), using it is as simple as:

```javascript
import Tesseract from 'tesseract.js';

Tesseract.recognize(
  'https://tesseract.projectnaptha.com/img/eng_bw.png',
  'eng',
  { logger: m => console.log(m) }
).then(({ data: { text } }) => {
  console.log(text);
})
```

Or, in a more imperative style:

```javascript
import { createWorker } from 'tesseract.js';

(async () => {
  // createWorker is async in v4, so await it inside the async function
  const worker = await createWorker({
    logger: m => console.log(m)
  });
  await worker.loadLanguage('eng');
  await worker.initialize('eng');
  const { data: { text } } = await worker.recognize('https://tesseract.projectnaptha.com/img/eng_bw.png');
  console.log(text);
  await worker.terminate();
})();
```

[Check out the docs](#documentation) for a full explanation of the API.

## Major changes in v4
Version 4 includes many new features and bug fixes; see [this issue](https://github.com/naptha/tesseract.js/issues/662) for a full list. Several highlights are below.
- Added rotation preprocessing options (including auto-rotate) for significantly better accuracy
- Processed images (rotated, grayscale, binary) can now be retrieved
- Improved support for parallel processing (schedulers)
- Breaking changes:
  - `createWorker` is now async
  - `getPDF` function replaced by `pdf` recognize option

## Major changes in v3
- Significantly faster performance
  - Runtime reduction of 84% for Browser and 96% for Node.js when recognizing the [example images](./examples/data)
- Upgrade to Tesseract v5.1.0 (using emscripten 3.1.18)
- Added SIMD-enabled build for supported devices
- Added support:
  - Node.js version 18
- Removed support:
  - ASM.js version, any other old versions of Tesseract.js-core (<3.0.0)
  - Node.js versions 10 and 12

## Major changes in v2
- Upgrade to tesseract v4.1.1 (using emscripten 1.39.10 upstream)
- Support multiple languages at the same time, e.g. eng+chi\_tra for English and Traditional Chinese
- Supported image formats: png, jpg, bmp, pbm
- Support WebAssembly (falls back to ASM.js when the browser doesn't support it)
- Support TypeScript

Read a story about v2: Why I refactor tesseract.js v2?
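
The multi-language support listed under the v2 changes takes a `+`-separated language string such as `eng+chi_tra`. The following is a minimal sketch of how that string might be built and passed to a worker. `joinLanguages` and `recognizeWithLanguages` are hypothetical helper names, not part of the Tesseract.js API, and the `worker` argument is assumed to be a worker obtained from `createWorker`, used with the same `loadLanguage`, `initialize`, and `recognize` calls as the example near the top of this README.

```javascript
// Hypothetical helper: join language codes into the '+'-separated form,
// e.g. ['eng', 'chi_tra'] becomes 'eng+chi_tra'.
function joinLanguages(langs) {
  if (!Array.isArray(langs) || langs.length === 0) {
    throw new Error('At least one language code is required');
  }
  return langs.join('+');
}

// Hypothetical helper: `worker` is assumed to be a Tesseract.js worker
// (for example the result of `await createWorker()` in v4).
// It is passed in rather than created here, so the caller stays in
// control of creating and terminating it.
async function recognizeWithLanguages(worker, image, langs) {
  const langString = joinLanguages(langs);
  // Load and initialize all requested languages together.
  await worker.loadLanguage(langString);
  await worker.initialize(langString);
  const { data: { text } } = await worker.recognize(image);
  return text;
}
```

With a real worker, this could be called as `recognizeWithLanguages(worker, imageUrl, ['eng', 'chi_tra'])` inside an async function, followed by `await worker.terminate()` once recognition is finished.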
Check the support/1.x branch for version 1

## Installation
Tesseract.js works with a `<script>` tag in the browser and with npm or yarn on Node.js.

### CDN
After including the script, the `Tesseract` variable will be globally available.

### Node.js
**Tesseract.js v3 requires Node.js v14 or higher**

```shell
# For v3
npm install tesseract.js
yarn add tesseract.js

# For v2
npm install tesseract.js@2
yarn add tesseract.js@2
```

## Documentation
* [Examples](./docs/examples.md)
* [Image Format](./docs/image-format.md)
* [API](./docs/api.md)
* [Local Installation](./docs/local-installation.md)
* [FAQ](./docs/faq.md)

## Use tesseract.js the way you like!
- Offline Version: https://github.com/jeromewu/tesseract.js-offline
- Electron Version: https://github.com/jeromewu/tesseract.js-electron
- Custom Traineddata: https://github.com/jeromewu/tesseract.js-custom-traineddata
- Chrome Extension #1: https://github.com/jeromewu/tesseract.js-chrome-extension
- Chrome Extension #2: https://github.com/fxnoob/image-to-text
- Firefox Extension: https://github.com/gnonio/korporize
- With Vue: https://github.com/jeromewu/tesseract.js-vue-app
- With Angular: https://github.com/jeromewu/tesseract.js-angular-app
- With React: https://github.com/jeromewu/tesseract.js-react-app
- Typescript: https://github.com/jeromewu/tesseract.js-typescript
- Video Real-time Recognition: https://github.com/jeromewu/tesseract.js-video

## Contributing
### Development
To run a development copy of Tesseract.js, do the following:

```shell
# First we clone the repository
git clone https://github.com/naptha/tesseract.js.git
cd tesseract.js

# Then we install the dependencies
npm install

# And finally we start the development server
npm start
```

The development server will be available at http://localhost:3000/examples/browser/demo.html in your favorite browser. It will automatically rebuild `tesseract.dev.js` and `worker.dev.js` when you change files in the **src** folder.

### Online Setup with a single Click
You can use Gitpod (a free, online, VS Code-like IDE) for contributing.
With a single click it will launch a ready-to-code workspace with the build and start scripts already running, and within a few seconds it will spin up the dev server so that you can start contributing straight away.

[![Open in Gitpod](https://gitpod.io/button/open-in-gitpod.svg)](https://gitpod.io/#https://github.com/naptha/tesseract.js/blob/master/examples/browser/demo.html)

### Building Static Files
To build the compiled static files, run the following:

```shell
npm run build
```

This will output the files into the `dist` directory.

## Contributors
### Code Contributors
This project exists thanks to all the people who contribute. [[Contribute](CONTRIBUTING.md)].

### Financial Contributors
Become a financial contributor and help us sustain our community. [[Contribute](https://opencollective.com/tesseractjs/contribute)]

#### Individuals

#### Organizations
Support this project with your organization. Your logo will show up here with a link to your website. [[Contribute](https://opencollective.com/tesseractjs/contribute)]