Pure Javascript OCR for more than 100 Languages 📖🎉🖥
You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
 
 

2.8 KiB

Tesseract.js Parameters

In the 3rd argument of TesseractWorker.recognize(), you can pass a params object to customize the result of OCR, below are supported parameters in tesseract.js so far.

Example:

import Tesseract from 'tesseract.js';

const { TesseractWorker, OEM, PSM } = Tesseract;
const worker = new TesseractWorker();

worker
  .recognize(image, 'eng', {
    tessedit_ocr_engine_mode: OEM.LSTM_ONLY,
    tessedit_pageseg_mode: PSM.SINGLE_BLOCK,
  })
  .then(result => console.log(result.text));
name type default value description
tessedit_ocr_engine_mode enum OEM.LSTM_ONLY Check HERE for definition of each mode
tessedit_pageseg_mode enum PSM.SINGLE_BLOCK Check HERE for definition of each mode
tessedit_char_whitelist string '' setting white list characters makes the result only contains these characters, useful the content in image is limited
tessjs_create_pdf string '0' only 2 values, '0' or '1', when the value is '1', tesseract.js generates a pdf output
tessjs_create_hocr string '1' only 2 values, '0' or '1', when the value is '1', tesseract.js includes hocr in the result
tessjs_create_tsv string '1' only 2 values, '0' or '1', when the value is '1', tesseract.js includes tsv in the result
tessjs_create_box string '0' only 2 values, '0' or '1', when the value is '1', tesseract.js includes box in the result
tessjs_create_unlv string '0' only 2 values, '0' or '1', when the value is '1', tesseract.js includes unlv in the result
tessjs_create_osd string '0' only 2 values, '0' or '1', when the value is '1', tesseract.js includes osd in the result
tessjs_pdf_name string 'tesseract.js-ocr-result' the name of the generated pdf file
tessjs_pdf_title string 'Tesseract.js OCR Result' the title of the generated pdf file
tessjs_pdf_auto_download boolean true If the value is true, tesseract.js will automatic download/writeFile pdf file
tessjs_pdf_bin boolean false whether to include pdf binary array in the result object (result.files.pdf)
tessjs_image_rectangle_left number 0 The left of the sub-rectangle of the image.
tessjs_image_rectangle_top number 0 The top of the sub-rectangle of the image.
tessjs_image_rectangle_width number -1 The width of the sub-rectangle of the image, -1 means auto width detection
tessjs_image_rectangle_height number -1 The height of the sub-rectangle of the image, -1 means auto height detection