# Tesseract.js Examples
You can also check the [examples](../examples) folder.
### basic
```javascript
import { createWorker } from 'tesseract.js';

const worker = createWorker();

(async () => {
  await worker.load();              // load the tesseract.js core scripts
  await worker.loadLanguage('eng'); // fetch the English trained data
  await worker.initialize('eng');   // initialize the Tesseract API for English
  const { data: { text } } = await worker.recognize('https://tesseract.projectnaptha.com/img/eng_bw.png');
  console.log(text);
  await worker.terminate();         // free the worker when done
})();
```
### with detailed progress
```javascript
import { createWorker } from 'tesseract.js';

const worker = createWorker({
  logger: m => console.log(m), // Add logger here
});

(async () => {
  await worker.load();
  await worker.loadLanguage('eng');
  await worker.initialize('eng');
  const { data: { text } } = await worker.recognize('https://tesseract.projectnaptha.com/img/eng_bw.png');
  console.log(text);
  await worker.terminate();
})();
```
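The raw message objects printed by the logger can be noisy. As a minimal sketch, assuming each message carries a `status` string and a `progress` number between 0 and 1 (an assumption about the message shape, not something this page confirms), a hypothetical `formatProgress` helper could turn them into one readable line each:

```javascript
// Hypothetical helper (not part of tesseract.js): formats a logger
// message as "status: NN%".
// Assumes `m` looks like { status: string, progress: number in [0, 1] }.
function formatProgress(m) {
  const status = typeof m.status === 'string' ? m.status : 'unknown';
  const progress = Number.isFinite(m.progress) ? m.progress : 0;
  // Clamp to [0, 1] so a malformed message cannot print e.g. 140%.
  const clamped = Math.min(1, Math.max(0, progress));
  return `${status}: ${Math.round(clamped * 100)}%`;
}

console.log(formatProgress({ status: 'recognizing text', progress: 0.5 }));
// prints "recognizing text: 50%"
```

It could then be wired in as `logger: m => console.log(formatProgress(m))`.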
### with multiple languages, separated by '+'
```javascript
import { createWorker } from 'tesseract.js';

const worker = createWorker();

(async () => {
  await worker.load();
  await worker.loadLanguage('eng+chi_tra');
  await worker.initialize('eng+chi_tra');
  const { data: { text } } = await worker.recognize('https://tesseract.projectnaptha.com/img/eng_bw.png');
  console.log(text);
  await worker.terminate();
})();
```
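When the language list comes from configuration or user input, the `+`-separated string can be built programmatically. The sketch below uses a hypothetical `joinLanguages` helper (not part of tesseract.js) that trims entries, drops blanks and duplicates, and joins the rest with `+`:

```javascript
// Hypothetical helper: builds a '+'-separated language string
// from a list of language codes, dropping blanks and duplicates.
function joinLanguages(codes) {
  const cleaned = codes.map((c) => c.trim()).filter((c) => c.length > 0);
  if (cleaned.length === 0) {
    throw new Error('At least one language code is required');
  }
  // A Set keeps first-seen order while removing repeats.
  return [...new Set(cleaned)].join('+');
}

console.log(joinLanguages(['eng', 'chi_tra'])); // prints "eng+chi_tra"
```

The same string would then be passed to both `loadLanguage` and `initialize`, as in the example above.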
### with whitelist char (^2.0.0-beta.1)
```javascript
import { createWorker } from 'tesseract.js';

const worker = createWorker();

(async () => {
  await worker.load();
  await worker.loadLanguage('eng');
  await worker.initialize('eng');
  await worker.setParameters({
    tessedit_char_whitelist: '0123456789',
  });
  const { data: { text } } = await worker.recognize('https://tesseract.projectnaptha.com/img/eng_bw.png');
  console.log(text);
  await worker.terminate();
})();
```
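Longer whitelists are tedious to type by hand. As a rough sketch, a hypothetical `buildWhitelist` helper (an illustration, not a tesseract.js function) can expand inclusive character ranges into the plain string that `tessedit_char_whitelist` takes in the example above:

```javascript
// Hypothetical helper: expands inclusive character ranges such as
// ['0', '9'] into a whitelist string like '0123456789'.
function buildWhitelist(ranges) {
  let chars = '';
  for (const [start, end] of ranges) {
    const from = start.charCodeAt(0);
    const to = end.charCodeAt(0);
    if (from > to) {
      throw new Error(`Invalid range: ${start}-${end}`);
    }
    for (let code = from; code <= to; code += 1) {
      chars += String.fromCharCode(code);
    }
  }
  return chars;
}

console.log(buildWhitelist([['0', '9'], ['A', 'F']])); // prints "0123456789ABCDEF"
```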
### with different pageseg mode (^2.0.0-beta.1)
See the Tesseract source for details on each page segmentation mode: https://github.com/tesseract-ocr/tesseract/blob/4.0.0/src/ccstruct/publictypes.h#L163
```javascript
import { createWorker, PSM } from 'tesseract.js';

const worker = createWorker();

(async () => {
  await worker.load();
  await worker.loadLanguage('eng');
  await worker.initialize('eng');
  await worker.setParameters({
    tessedit_pageseg_mode: PSM.SINGLE_BLOCK,
  });
  const { data: { text } } = await worker.recognize('https://tesseract.projectnaptha.com/img/eng_bw.png');
  console.log(text);
  await worker.terminate();
})();
```
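For quick reference, the modes in Tesseract's `PageSegMode` enum can be laid out as a lookup table. The table below is a local illustration that mirrors the numeric values of that enum. It is not the `PSM` export of tesseract.js, whose exact value types are not shown on this page, and `psmName` is a hypothetical helper for readable logging:

```javascript
// Local illustration only: mirrors Tesseract's PageSegMode enum values.
// In real code, prefer the PSM constants exported by tesseract.js.
const PAGE_SEG_MODES = {
  OSD_ONLY: 0,               // orientation and script detection only
  AUTO_OSD: 1,               // automatic segmentation with OSD
  AUTO_ONLY: 2,              // automatic segmentation, no OSD or OCR
  AUTO: 3,                   // fully automatic segmentation, no OSD
  SINGLE_COLUMN: 4,          // a single column of text of variable sizes
  SINGLE_BLOCK_VERT_TEXT: 5, // a single uniform block of vertical text
  SINGLE_BLOCK: 6,           // a single uniform block of text
  SINGLE_LINE: 7,            // a single text line
  SINGLE_WORD: 8,            // a single word
  CIRCLE_WORD: 9,            // a single word in a circle
  SINGLE_CHAR: 10,           // a single character
  SPARSE_TEXT: 11,           // as much text as possible, in no particular order
  SPARSE_TEXT_OSD: 12,       // sparse text with OSD
  RAW_LINE: 13,              // a single text line, bypassing Tesseract-specific hacks
};

// Hypothetical helper: maps a numeric (or numeric-string) mode to its name.
function psmName(value) {
  const entry = Object.entries(PAGE_SEG_MODES).find(([, v]) => v === Number(value));
  return entry ? entry[0] : undefined;
}

console.log(psmName(6)); // prints "SINGLE_BLOCK"
```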
### with pdf output (^2.0.0-beta.1)
Please check the **examples** folder for details.
Browser: [download-pdf.html](../examples/browser/download-pdf.html)
Node: [download-pdf.js](../examples/node/download-pdf.js)
### with only part of the image (^2.0.0-beta.1)
```javascript
import { createWorker } from 'tesseract.js';

const worker = createWorker();
const rectangles = [
  { left: 0, top: 0, width: 500, height: 250 },
];

(async () => {
  await worker.load();
  await worker.loadLanguage('eng');
  await worker.initialize('eng');
  const { data: { text } } = await worker.recognize('https://tesseract.projectnaptha.com/img/eng_bw.png', 'eng', { rectangles });
  console.log(text);
  await worker.terminate();
})();
```
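Rectangles in the `{ left, top, width, height }` shape used above can also be computed rather than hard-coded. As a minimal sketch, a hypothetical `splitIntoStrips` helper (not part of tesseract.js) divides an image of known pixel size into horizontal strips. The image dimensions are assumed to be known in advance:

```javascript
// Hypothetical helper: splits an image of the given pixel size into `count`
// horizontal strips, each shaped like { left, top, width, height }.
function splitIntoStrips(imageWidth, imageHeight, count) {
  if (!Number.isInteger(count) || count < 1 || count > imageHeight) {
    throw new Error('count must be an integer between 1 and imageHeight');
  }
  const rectangles = [];
  const base = Math.floor(imageHeight / count);
  for (let i = 0; i < count; i += 1) {
    const top = i * base;
    // The last strip absorbs any remainder so the strips cover the full height.
    const height = i === count - 1 ? imageHeight - top : base;
    rectangles.push({ left: 0, top, width: imageWidth, height });
  }
  return rectangles;
}

console.log(splitIntoStrips(500, 250, 2));
```

For a 500x250 image and two strips, this yields one rectangle starting at `top: 0` and one at `top: 125`, each 125 pixels high.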
### with multiple workers to speed up (^2.0.0-beta.1)
```javascript
import { createWorker, createScheduler } from 'tesseract.js';

const scheduler = createScheduler();
const worker1 = createWorker();
const worker2 = createWorker();

(async () => {
  await worker1.load();
  await worker2.load();
  await worker1.loadLanguage('eng');
  await worker2.loadLanguage('eng');
  await worker1.initialize('eng');
  await worker2.initialize('eng');
  scheduler.addWorker(worker1);
  scheduler.addWorker(worker2);
  /** Add 10 recognition jobs */
  // addJob returns a promise, so the callback needs no `await` of its own.
  const results = await Promise.all(Array(10).fill(0).map(() => (
    scheduler.addJob('recognize', 'https://tesseract.projectnaptha.com/img/eng_bw.png')
  )));
  console.log(results);
  await scheduler.terminate(); // It also terminates all workers.
})();
```
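Logging the full `results` array prints every result object in its entirety. As a small sketch, assuming each scheduler result has the same `{ data: { text } }` shape that `worker.recognize` returns in the earlier examples (an assumption, since this page does not show the scheduler's result format), a hypothetical `collectTexts` helper extracts just the recognized text:

```javascript
// Hypothetical helper: pulls the recognized text out of each job result.
// Assumes every result is shaped like { data: { text: string } }.
function collectTexts(results) {
  return results.map(({ data: { text } }) => text.trim());
}

// Stand-in results for illustration only; real ones would come from
// the Promise.all call over scheduler.addJob.
const sampleResults = [
  { data: { text: 'first page\n' } },
  { data: { text: 'second page\n' } },
];

console.log(collectTexts(sampleResults));
```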