Update docs

5 years ago · b59d144af3
parent 7f485c4461
commit b59d144af3
6 changed files with 426 additions and 227 deletions
--- a/README.md
+++ b/README.md
@ -11,7 +11,7 @@
 [![Downloads Month](https://img.shields.io/npm/dm/tesseract.js.svg)](https://www.npmjs.com/package/tesseract.js)

 <h3 align="center">
-  Version 2 is now available and under development in the master branch<br>
+  Version 2 beta is now available and under development in the master branch<br>
  Check the <a href="https://github.com/naptha/tesseract.js/tree/support/1.x">support/1.x</a> branch for version 1
 </h3>

@ -26,25 +26,45 @@ It works in the browser using [webpack](https://webpack.js.org/) or plain script
 After you [install it](#installation), using it is as simple as:

 ```javascript
-import { TesseractWorker } from 'tesseract.js';
-const worker = new TesseractWorker();
-
-worker.recognize(myImage)
-  .progress(progress => {
-    console.log('progress', progress);
-  }).then(result => {
-    console.log('result', result);
-  });
+import Tesseract from 'tesseract.js';
+
+Tesseract.recognize(
+  'https://tesseract.projectnaptha.com/img/eng_bw.png',
+  'eng',
+  { logger: m => console.log(m) }
+).then(({ data: { text } }) => {
+  console.log(text);
+})
+```
+
+Or, more imperatively, with an explicit worker:
+
+```javascript
+import { createWorker } from 'tesseract.js';
+
+const worker = createWorker({
+  logger: m => console.log(m)
+});
+
+(async () => {
+  await worker.load();
+  await worker.loadLanguage('eng');
+  await worker.initialize('eng');
+  const { data: { text } } = await worker.recognize('https://tesseract.projectnaptha.com/img/eng_bw.png');
+  console.log(text);
+  await worker.terminate();
+})();
 ```

 [Check out the docs](#docs) for a full explanation of the API.


-## Major changes in v2
-- Upgrade to tesseract v4
+## Major changes in v2 beta
+- Upgrade to tesseract v4.1 (using emscripten 1.38.45)
 - Support multiple languages at the same time, e.g. eng+chi_tra for English and Traditional Chinese
 - Supported image formats: png, jpg, bmp, pbm
 - Support WebAssembly (falls back to ASM.js when the browser doesn't support it)
+- Support TypeScript


 ## Installation
@ -54,7 +74,7 @@ Tesseract.js works with a `<script>` tag via local copy or CDN, with webpack via
 ### CDN
 ```html
 <!-- v2 -->
-<script src='https://unpkg.com/tesseract.js@v2.0.0-alpha.16/dist/tesseract.min.js'></script>
+<script src='https://unpkg.com/tesseract.js@v2.0.0-beta.1/dist/tesseract.min.js'></script>

 <!-- v1 -->
 <script src='https://unpkg.com/tesseract.js@1.0.19/src/index.js'></script>
@ -103,7 +123,7 @@ npm start
 ```

 The development server will be available at http://localhost:3000/examples/browser/demo.html in your favorite browser.
-It will automatically rebuild `tesseract.dev.js` and `worker.min.js` when you change files in the src folder.
+It will automatically rebuild `tesseract.dev.js` and `worker.dev.js` when you change files in the **src** folder.

 You can also run the development server in Gitpod (a free online IDE and dev environment for GitHub that automates your dev setup) with a single click.

--- a/docs/api.md
+++ b/docs/api.md
@ -1,5 +1,249 @@
 # API

+- [createWorker()](#create-worker)
+  - [Worker.load](#worker-load)
+  - [Worker.loadLanguage](#worker-load-language)
+  - [Worker.initialize](#worker-initialize)
+  - [Worker.setParameters](#worker-set-parameters)
+  - [Worker.recognize](#worker-recognize)
+  - [Worker.detect](#worker-detect)
+  - [Worker.terminate](#worker-terminate)
+- [createScheduler()](#create-scheduler)
+  - [Scheduler.addWorker](#scheduler-add-worker)
+  - [Scheduler.addJob](#scheduler-add-job)
+  - [Scheduler.getQueueLen](#scheduler-get-queue-len)
+  - [Scheduler.getNumWorkers](#scheduler-get-num-workers)
+  - [Scheduler.terminate](#scheduler-terminate)
+- [setLogging()](#set-logging)
+- [recognize()](#recognize)
+- [detect()](#detect)
+- [PSM](#psm)
+- [OEM](#oem)
+
+---
+
+<a name="create-worker"></a>
+## createWorker(options): Worker
+
+`createWorker()` is a factory function that creates a Tesseract worker. A worker is a Web Worker in the browser and a child process in Node.js.
+
+**Arguments:**
+
+- `options` an object of custom options
+  - `corePath` path of the tesseract-core.js script
+  - `langPath` path for downloading traineddata; do not include `/` at the end of the path
+  - `workerPath` path for downloading the worker script
+  - `dataPath` path for saving traineddata in the WebAssembly file system; rarely needs to be changed
+  - `cachePath` path for the cached traineddata; mostly useful in Node, while in the browser it only changes the key in IndexedDB
+  - `cacheMethod` a string indicating how the cache is managed; one of the following options:
+    - write: read the cache and write back (default)
+    - readOnly: read the cache but do not write back
+    - refresh: do not read the cache, but write back
+    - none: neither read the cache nor write back
+  - `workerBlobURL` a boolean defining whether to use a Blob URL for the worker script, default: true
+  - `gzip` a boolean defining whether the remote traineddata is gzipped, default: true
+  - `logger` a function to log progress, e.g. `m => console.log(m)`
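The four `cacheMethod` values above reduce to two independent switches: whether the cache is read, and whether the downloaded traineddata is written back. A minimal sketch of that mapping, assuming the option descriptions are complete; `cachePolicy` is a hypothetical helper, not a tesseract.js export:

```javascript
// Hypothetical helper (not part of tesseract.js): maps a cacheMethod
// string to read/write switches, following the option descriptions.
function cachePolicy(cacheMethod = 'write') {
  const policies = {
    write: { read: true, write: true },     // default: read cache, write back
    readOnly: { read: true, write: false }, // read cache, never write back
    refresh: { read: false, write: true },  // skip cache, store a fresh copy
    none: { read: false, write: false },    // bypass the cache entirely
  };
  // hasOwnProperty avoids matching inherited keys such as 'toString'
  if (!Object.prototype.hasOwnProperty.call(policies, cacheMethod)) {
    throw new Error(`Unknown cacheMethod: ${cacheMethod}`);
  }
  return policies[cacheMethod];
}
```

For example, `cachePolicy('refresh')` returns `{ read: false, write: true }`.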
+
+
+**Examples:**
+
+```javascript
+const { createWorker } = Tesseract;
+const worker = createWorker({
+  langPath: '...',
+  logger: m => console.log(m),
+});
+```
+
+## Worker
+
+A Worker performs the OCR-related tasks. It takes a few steps to set up a Worker before it is fully functional. The full flow is:
+
+- load
+- loadLanguage
+- initialize
+- setParameters (optional)
+- recognize or detect
+- terminate
+
+Each function is async, so you need to use async/await or Promises. Each one resolves to an object:
+
+```json
+{
+  "jobId": "Job-1-123",
+  "data": { ... }
+}
+```
+
+`jobId` is generated by Tesseract.js, but you can pass your own when calling any of the functions above.
+
+<a name="worker-load"></a>
+### Worker.load(jobId): Promise
+
+Worker.load() loads the tesseract.js-core scripts (downloading them from remote if they are not present) and makes the Web Worker/child process ready for the next action.
+
+**Arguments:**
+
+- `jobId` Please see details above
+
+**Examples:**
+
+```javascript
+(async () => {
+  await worker.load();
+})();
+```
+
+<a name="worker-load-language"></a>
+### Worker.loadLanguage(langs, jobId): Promise
+
+Worker.loadLanguage() loads traineddata from the cache, or downloads it from remote, and puts it into the WebAssembly file system.
+
+**Arguments:**
+
+- `langs` a string indicating which languages' traineddata to download; multiple languages are concatenated with **+**, e.g. **eng+chi\_tra**
+- `jobId` Please see details above
+
+**Examples:**
+
+```javascript
+(async () => {
+  await worker.loadLanguage('eng+chi_tra');
+})();
+```
+
+<a name="worker-initialize"></a>
+### Worker.initialize(langs, oem, jobId): Promise
+
+Worker.initialize() initializes the Tesseract API and makes it ready for OCR tasks.
+
+**Arguments:**
+
+- `langs` a string indicating the languages loaded by the Tesseract API; it can be a subset of the language traineddata you loaded with Worker.loadLanguage.
+- `oem` an enum indicating the OCR Engine Mode to use (see [OEM](#oem))
+- `jobId` Please see details above
+
+**Examples:**
+
+```javascript
+(async () => {
+  /** You can load more languages in advance, but use only part of them in Worker.initialize() */
+  await worker.loadLanguage('eng+chi_tra');
+  await worker.initialize('eng');
+})();
+```
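Because `langs` is a `+`-separated string, the rule that `Worker.initialize()` may only use a subset of the languages passed to `Worker.loadLanguage()` can be checked up front. A minimal sketch; `parseLangs` and `isLoadedSubset` are hypothetical helpers, not tesseract.js APIs:

```javascript
// Hypothetical helpers (not part of tesseract.js) for '+'-separated
// language strings such as 'eng+chi_tra'.

// Split 'eng+chi_tra' into ['eng', 'chi_tra'], ignoring empty segments.
function parseLangs(langs) {
  return langs
    .split('+')
    .map((l) => l.trim())
    .filter((l) => l.length > 0);
}

// True when every language passed to initialize() was loaded beforehand.
function isLoadedSubset(initLangs, loadedLangs) {
  const loaded = new Set(parseLangs(loadedLangs));
  return parseLangs(initLangs).every((l) => loaded.has(l));
}
```

For example, `isLoadedSubset('eng', 'eng+chi_tra')` is `true`, while `isLoadedSubset('jpn', 'eng+chi_tra')` is `false`.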
+
+<a name="worker-set-parameters"></a>
+### Worker.setParameters(params, jobId): Promise
+
+Worker.setParameters() sets parameters for the Tesseract API (using SetVariable()). Parameters change the behavior of Tesseract, and some of them, like tessedit\_char\_whitelist, are very useful.
+
+**Arguments:**
+
+- `params` an object whose keys are parameter names and whose values are the parameter values
+- `jobId` Please see details above
+
+**Supported Parameters:**
+
+| name | type | default value | description |
+| ---- | ---- | ------------- | ----------- |
+| tessedit\_ocr\_engine\_mode | enum | OEM.LSTM\_ONLY | Check [HERE](https://github.com/tesseract-ocr/tesseract/blob/4.0.0/src/ccstruct/publictypes.h#L268) for definition of each mode | 
+| tessedit\_pageseg\_mode | enum | PSM.SINGLE\_BLOCK | Check [HERE](https://github.com/tesseract-ocr/tesseract/blob/4.0.0/src/ccstruct/publictypes.h#L163) for definition of each mode |
+| tessedit\_char\_whitelist | string | '' | restricts the result to the listed characters; useful when the content of the image is limited to a known character set |
+| tessjs\_create\_hocr | string | '1' | only 2 values, '0' or '1', when the value is '1', tesseract.js includes hocr in the result |
+| tessjs\_create\_tsv | string | '1' | only 2 values, '0' or '1', when the value is '1', tesseract.js includes tsv in the result |
+| tessjs\_create\_box | string | '0' | only 2 values, '0' or '1', when the value is '1', tesseract.js includes box in the result |
+| tessjs\_create\_unlv | string | '0' | only 2 values, '0' or '1', when the value is '1', tesseract.js includes unlv in the result |
+| tessjs\_create\_osd | string | '0' | only 2 values, '0' or '1', when the value is '1', tesseract.js includes osd in the result |
+
+**Examples:**
+
+```javascript
+(async () => {
+  await worker.setParameters({
+    tessedit_char_whitelist: '0123456789',
+  });
+})();
+```
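The `tessjs_create_*` parameters expect the strings `'0'` or `'1'`, not booleans. A small sketch that converts a plain options object into that shape before it is passed to `Worker.setParameters()`; `toTessjsParams` and its short input keys are hypothetical:

```javascript
// Hypothetical helper (not part of tesseract.js): turns
// { box: true, tsv: false } into the string-valued tessjs_create_*
// parameters listed in the table above.
function toTessjsParams(outputs) {
  const supported = ['hocr', 'tsv', 'box', 'unlv', 'osd'];
  const params = {};
  for (const [name, enabled] of Object.entries(outputs)) {
    if (!supported.includes(name)) {
      throw new Error(`Unsupported output: ${name}`);
    }
    // Tesseract.js only distinguishes '0' and '1' for these parameters.
    params[`tessjs_create_${name}`] = enabled ? '1' : '0';
  }
  return params;
}
```

Usage would look like `await worker.setParameters(toTessjsParams({ box: true, tsv: false }));`.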
+
+
+<a name="worker-recognize"></a>
+### Worker.recognize(image, options, jobId): Promise
+<a name="worker-detect"></a>
+### Worker.detect(image, jobId): Promise
+<a name="worker-terminate"></a>
+### Worker.terminate(jobId): Promise
+
+<a name="create-scheduler"></a>
+## createScheduler(): Scheduler
+
+<a name="scheduler-add-worker"></a>
+### Scheduler.addWorker(worker): string
+
+<a name="scheduler-add-job"></a>
+### Scheduler.addJob(action, ...payload): Promise
+
+<a name="scheduler-get-queue-len"></a>
+### Scheduler.getQueueLen(): number
+
+Scheduler.getQueueLen() returns the length of the job queue.
+
+<a name="scheduler-get-num-workers"></a>
+### Scheduler.getNumWorkers(): number
+
+Scheduler.getNumWorkers() returns the number of workers added to the scheduler.
+
+<a name="scheduler-terminate"></a>
+### Scheduler.terminate(): Promise
+
+Scheduler.terminate() terminates all added workers, which is useful for a quick cleanup.
+
+**Examples:**
+
+```javascript
+(async () => {
+  await scheduler.terminate();
+})();
+```
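The scheduler methods above fit together as a job queue feeding a pool of workers. The following is a simplified, hypothetical sketch (`MiniScheduler` is not part of tesseract.js, and the real implementation may differ). It assumes jobs are dispatched first-in, first-out to whichever worker is idle, and that `addJob(action, ...payload)` calls `worker[action](...payload)`, as the `addJob('recognize', ...)` call in the multiple-workers example suggests:

```javascript
// Hypothetical sketch, not the tesseract.js implementation: a FIFO job
// queue that hands each job to the next idle worker.
class MiniScheduler {
  constructor() {
    this.workers = new Map(); // worker id -> worker object
    this.idle = [];           // ids of workers with no running job
    this.queue = [];          // jobs waiting for an idle worker
    this.nextId = 0;
  }

  addWorker(worker) {
    const id = `Worker-${this.nextId++}`;
    this.workers.set(id, worker);
    this.idle.push(id);
    this.dispatch();
    return id;
  }

  // Queue a call to worker[action](...payload); resolves with its result.
  addJob(action, ...payload) {
    return new Promise((resolve, reject) => {
      this.queue.push({ action, payload, resolve, reject });
      this.dispatch();
    });
  }

  dispatch() {
    while (this.idle.length > 0 && this.queue.length > 0) {
      const id = this.idle.shift();
      const { action, payload, resolve, reject } = this.queue.shift();
      const worker = this.workers.get(id);
      Promise.resolve()
        .then(() => worker[action](...payload))
        .then(resolve, reject)
        .finally(() => {
          // Return the worker to the idle pool and pick up the next job.
          this.idle.push(id);
          this.dispatch();
        });
    }
  }

  getQueueLen() { return this.queue.length; }

  getNumWorkers() { return this.workers.size; }
}
```

With two workers and five jobs added in one go, `getNumWorkers()` returns 2 and `getQueueLen()` returns 3 until the first jobs finish, because two of the five jobs are already running.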
+
+<a name="set-logging"></a>
+## setLogging(logging: boolean)
+
+setLogging() sets the logging flag. Call `setLogging(true)` to see detailed information, which is useful for debugging.
+
+**Arguments:**
+
+- `logging` boolean to define whether to see detailed logs, default: false
+
+**Examples:**
+
+```javascript
+const { setLogging } = Tesseract;
+setLogging(true);
+```
+
+<a name="recognize"></a>
+## recognize(image, langs, options): Promise
+
+recognize() is a shortcut that performs a complete recognition task in a single call. It is not recommended for real applications, but it is useful when you want to save some time.
+
+See [Tesseract.js](../src/Tesseract.js)
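Conceptually, the one-shot `recognize()` bundles the whole Worker lifecycle into one call. A sketch of that flow, assuming it mirrors the Worker steps documented above; `quickRecognize` is a hypothetical name, and `createWorker` is passed in as a parameter so the sketch stays self-contained:

```javascript
// Hypothetical sketch of a one-shot recognize(): create a worker, run the
// documented setup steps, recognize, and always terminate the worker.
async function quickRecognize(createWorker, image, langs = 'eng', options = {}) {
  const worker = createWorker(options);
  try {
    await worker.load();
    await worker.loadLanguage(langs);
    await worker.initialize(langs);
    // `await` here so that `finally` runs only after recognition settles.
    return await worker.recognize(image);
  } finally {
    // Runs on success and on failure, so no worker is left behind.
    await worker.terminate();
  }
}
```

The `try`/`finally` keeps the worker from leaking when recognition fails. Paying the load and initialize cost on every call is presumably why the Worker flow is preferred for real applications.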
+
+<a name="detect"></a>
+## detect(image, options): Promise
+
+Same background as recognize(), but it runs detect instead.
+
+See [Tesseract.js](../src/Tesseract.js)
+
+<a name="psm"></a>
+## PSM
+
+See [PSM.js](../src/constants/PSM.js)
+
+<a name="oem"></a>
+## OEM
+
+See [OEM.js](../src/constants/OEM.js)
+
 ## TesseractWorker.recognize(image, lang, [, options]) -> [TesseractJob](#tesseractjob)
 Figures out what words are in `image`, where the words are in `image`, etc.
 > Note: `image` should be sufficiently high resolution.
--- a/docs/examples.md
+++ b/docs/examples.md
@ -12,217 +12,147 @@ Example repositories:
 ### basic

 ```javascript
-import Tesseract from 'tesseract.js';
-
-const { TesseractWorker } = Tesseract;
-const worker = new TesseractWorker();
-
-worker
-  .recognize('https://tesseract.projectnaptha.com/img/eng_bw.png')
-  .progress((p) => {
-    console.log('progress', p);
-  })
-  .then(({ text }) => {
-    console.log(text);
-    worker.terminate();
-  });
+import { createWorker } from 'tesseract.js';
+
+const worker = createWorker();
+
+(async () => {
+  await worker.load();
+  await worker.loadLanguage('eng');
+  await worker.initialize('eng');
+  const { data: { text } } = await worker.recognize('https://tesseract.projectnaptha.com/img/eng_bw.png');
+  console.log(text);
+  await worker.terminate();
+})();
 ```

 ### with detailed progress 

 ```javascript
-import Tesseract from 'tesseract.js';
-
-const { TesseractWorker } = Tesseract;
-const worker = new TesseractWorker();
-
-worker
-  .recognize('https://tesseract.projectnaptha.com/img/eng_bw.png')
-  .progress((p) => {
-    console.log('progress', p);
-  })
-  .then(({ text }) => {
-    console.log(text);
-    worker.terminate();
-  });
+import { createWorker } from 'tesseract.js';
+
+const worker = createWorker({
+  logger: m => console.log(m), // Add logger here
+});
+
+(async () => {
+  await worker.load();
+  await worker.loadLanguage('eng');
+  await worker.initialize('eng');
+  const { data: { text } } = await worker.recognize('https://tesseract.projectnaptha.com/img/eng_bw.png');
+  console.log(text);
+  await worker.terminate();
+})();
 ```

 ### with multiple languages, separated by '+'

 ```javascript
-import Tesseract from 'tesseract.js';
-
-const { TesseractWorker } = Tesseract;
-const worker = new TesseractWorker();
-
-worker
-  .recognize(
-    'https://tesseract.projectnaptha.com/img/eng_bw.png',
-    'eng+chi_tra'
-  )
-  .progress((p) => {
-    console.log('progress', p);
-  })
-  .then(({ text }) => {
-    console.log(text);
-    worker.terminate();
-  });
+import { createWorker } from 'tesseract.js';
+
+const worker = createWorker();
+
+(async () => {
+  await worker.load();
+  await worker.loadLanguage('eng+chi_tra');
+  await worker.initialize('eng+chi_tra');
+  const { data: { text } } = await worker.recognize('https://tesseract.projectnaptha.com/img/eng_bw.png');
+  console.log(text);
+  await worker.terminate();
+})();
 ```
+### with whitelist char (^2.0.0-beta.1)

-### with whitelist char (^2.0.0-alpha.5)
+```javascript
+import { createWorker } from 'tesseract.js';

-Sadly, whitelist chars is not supported in tesseract.js v4, so in tesseract.js we need to switch to tesseract v3 mode to make it work.
+const worker = createWorker();

-```javascript
-import Tesseract from 'tesseract.js';
-
-const { TesseractWorker, OEM } = Tesseract;
-const worker = new TesseractWorker();
-
-worker
-  .recognize(
-    'https://tesseract.projectnaptha.com/img/eng_bw.png',
-    'eng',
-    {
-      'tessedit_ocr_engine_mode': OEM.TESSERACT_ONLY,
-      'tessedit_char_whitelist': '0123456789-.',
-    }
-  )
-  .progress((p) => {
-    console.log('progress', p);
-  })
-  .then(({ text }) => {
-    console.log(text);
-    worker.terminate();
+(async () => {
+  await worker.load();
+  await worker.loadLanguage('eng');
+  await worker.initialize('eng');
+  await worker.setParameters({
+    tessedit_char_whitelist: '0123456789',
  });
+  const { data: { text } } = await worker.recognize('https://tesseract.projectnaptha.com/img/eng_bw.png');
+  console.log(text);
+  await worker.terminate();
+})();
 ```

-### with different pageseg mode (^2.0.0-alpha.5)
+### with different pageseg mode (^2.0.0-beta.1)

 Check here for more details of pageseg mode: https://github.com/tesseract-ocr/tesseract/blob/4.0.0/src/ccstruct/publictypes.h#L163

 ```javascript
-import Tesseract from 'tesseract.js';
-
-const { TesseractWorker, PSM } = Tesseract;
-const worker = new TesseractWorker();
-
-worker
-  .recognize(
-    'https://tesseract.projectnaptha.com/img/eng_bw.png',
-    'eng',
-    {
-      'tessedit_pageseg_mode': PSM.SINGLE_BLOCK,
-    }
-  )
-  .progress((p) => {
-    console.log('progress', p);
-  })
-  .then(({ text }) => {
-    console.log(text);
-    worker.terminate();
-  });
-```
-
-### with pdf output (^2.0.0-alpha.12)
+import { createWorker, PSM } from 'tesseract.js';

-In this example, pdf file will be downloaded in browser and write to file system in Node.js
+const worker = createWorker();

-```javascript
-import Tesseract from 'tesseract.js';
-
-const { TesseractWorker } = Tesseract;
-const worker = new TesseractWorker();
-
-worker
-  .recognize(
-    'https://tesseract.projectnaptha.com/img/eng_bw.png',
-    'eng',
-    {
-      'tessjs_create_pdf': '1',
-    }
-  )
-  .progress((p) => {
-    console.log('progress', p);
-  })
-  .then(({ text }) => {
-    console.log(text);
-    worker.terminate();
+(async () => {
+  await worker.load();
+  await worker.loadLanguage('eng');
+  await worker.initialize('eng');
+  await worker.setParameters({
+    tessedit_pageseg_mode: PSM.SINGLE_BLOCK,
  });
+  const { data: { text } } = await worker.recognize('https://tesseract.projectnaptha.com/img/eng_bw.png');
+  console.log(text);
+  await worker.terminate();
+})();
 ```

-If you want to handle pdf file by yourself
+### with pdf output (^2.0.0-beta.1)

-```javascript
-import Tesseract from 'tesseract.js';
-
-const { TesseractWorker } = Tesseract;
-const worker = new TesseractWorker();
-
-worker
-  .recognize(
-    'https://tesseract.projectnaptha.com/img/eng_bw.png',
-    'eng',
-    {
-      'tessjs_create_pdf': '1',
-      'tessjs_pdf_auto_download': false, // disable auto download
-      'tessjs_pdf_bin': true,            // add pdf file bin array in result
-    }
-  )
-  .progress((p) => {
-    console.log('progress', p);
-  })
-  .then(({ files: { pdf } }) => {
-    console.log(Object.values(pdf)); // As pdf is an array-like object, you need to do a little convertion first.
-    worker.terminate();
-  });
-```
+See the **examples** folder for details:

-### with preload language data
+- Browser: [download-pdf.html](../examples/browser/download-pdf.html)
+- Node: [download-pdf.js](../examples/node/download-pdf.js)

-```javascript
-const Tesseract = require('tesseract.js');
-
-const { TesseractWorker, utils: { loadLang } } = Tesseract;
-const worker = new TesseractWorker();
-
-loadLang({ langs: 'eng', langPath: worker.options.langPath })
-  .then(() => {
-    worker
-      .recognize('https://tesseract.projectnaptha.com/img/eng_bw.png')
-      .progress(p => console.log(p))
-      .then(({ text }) => {
-        console.log(text);
-        worker.terminate();
-      });
-  });
+### with only part of the image (^2.0.0-beta.1)

+```javascript
+import { createWorker } from 'tesseract.js';
+
+const worker = createWorker();
+const rectangles = [
+  { left: 0, top: 0, width: 500, height: 250 },
+];
+
+(async () => {
+  await worker.load();
+  await worker.loadLanguage('eng');
+  await worker.initialize('eng');
+  const { data: { text } } = await worker.recognize('https://tesseract.projectnaptha.com/img/eng_bw.png', { rectangles });
+  console.log(text);
+  await worker.terminate();
+})();
 ```
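When a tall image should be recognized in slices, the rectangles can be computed rather than hard-coded. A minimal sketch that splits an image of known size into horizontal strips in the `{ left, top, width, height }` shape used above; `toStrips` is a hypothetical helper, and the image dimensions are assumed to be known in advance:

```javascript
// Hypothetical helper (not part of tesseract.js): split a width x height
// image into `count` horizontal strips; the last strip absorbs any
// leftover rows when height is not evenly divisible.
function toStrips(width, height, count) {
  if (!Number.isInteger(count) || count < 1) {
    throw new Error('count must be a positive integer');
  }
  const stripHeight = Math.floor(height / count);
  const strips = [];
  for (let i = 0; i < count; i += 1) {
    const top = i * stripHeight;
    const isLast = i === count - 1;
    strips.push({
      left: 0,
      top,
      width,
      height: isLast ? height - top : stripHeight,
    });
  }
  return strips;
}
```

For example, `toStrips(500, 250, 2)` returns two 500 x 125 strips starting at `top: 0` and `top: 125`; the array has the same shape as the `rectangles` option shown above.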

-### with only part of the image (^2.0.0-alpha.12)
+### with multiple workers to speed up (^2.0.0-beta.1)

 ```javascript
-import Tesseract from 'tesseract.js';
-
-const { TesseractWorker } = Tesseract;
-const worker = new TesseractWorker();
-
-worker
-  .recognize(
-    'https://tesseract.projectnaptha.com/img/eng_bw.png',
-    'eng',
-    {
-      tessjs_image_rectangle_left: 0,
-      tessjs_image_rectangle_top: 0,
-      tessjs_image_rectangle_width: 500,
-      tessjs_image_rectangle_height: 250,
-    }
-  )
-  .progress((p) => {
-    console.log('progress', p);
-  })
-  .then(({ text }) => {
-    console.log(text);
-    worker.terminate();
-  });
+import { createWorker, createScheduler } from 'tesseract.js';
+
+const scheduler = createScheduler();
+const worker1 = createWorker();
+const worker2 = createWorker();
+
+(async () => {
+  await worker1.load();
+  await worker2.load();
+  await worker1.loadLanguage('eng');
+  await worker2.loadLanguage('eng');
+  await worker1.initialize('eng');
+  await worker2.initialize('eng');
+  scheduler.addWorker(worker1);
+  scheduler.addWorker(worker2);
+  /** Add 10 recognition jobs; addJob() already returns a Promise, so no await is needed inside map() */
+  const results = await Promise.all(Array(10).fill(0).map(() => (
+    scheduler.addJob('recognize', 'https://tesseract.projectnaptha.com/img/eng_bw.png')
+  )));
+  console.log(results);
+  await scheduler.terminate(); // It also terminates all workers.
+})();
 ```
--- a/docs/faq.md
+++ b/docs/faq.md
@ -3,9 +3,9 @@ FAQ

 ## How does tesseract.js download and keep \*.traineddata?

-When you execute recognize() function (ex: `recognize(image, 'eng')`), the language model to download is determined by the 2nd argument of recognize(). (`eng` in the example)
+The language model is downloaded by `worker.loadLanguage()` and you need to pass the langs to `worker.initialize()`.

-Tesseract.js will first check if \*.traineddata already exists. (browser: [IndexedDB](https://developer.mozilla.org/en-US/docs/Web/API/IndexedDB_API), Node.js: fs, in the folder you execute the command) If the \*.traineddata doesn't exist, it will fetch \*.traineddata.gz from [tessdata](https://github.com/naptha/tessdata), ungzip and store in IndexedDB or fs, you can delete it manually and it will download again for you.
+While downloading a language model, Tesseract.js first checks whether \*.traineddata already exists (browser: [IndexedDB](https://developer.mozilla.org/en-US/docs/Web/API/IndexedDB_API); Node.js: fs, in the folder where you run the command). If it doesn't exist, Tesseract.js fetches \*.traineddata.gz from [tessdata](https://github.com/naptha/tessdata), ungzips it, and stores it in IndexedDB or fs. You can delete it manually, and it will be downloaded again.
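The lookup order described above can be sketched as follows. This is a simplified, hypothetical model, not the tesseract.js source: `getTrainedData` and `fetchGz` are illustrative names, a `Map` stands in for IndexedDB or the file system, and Node's built-in `zlib` does the ungzipping:

```javascript
const zlib = require('zlib');

// Hypothetical sketch: use cached traineddata if present, otherwise
// download the .gz file, ungzip it, and store the result.
// `cache` is any Map-like store; `fetchGz` is an assumed async download
// function that resolves to a gzipped Buffer.
async function getTrainedData(lang, cache, fetchGz) {
  const key = `${lang}.traineddata`;
  if (cache.has(key)) {
    return cache.get(key); // cache hit: no download
  }
  const gz = await fetchGz(`${lang}.traineddata.gz`);
  const data = zlib.gunzipSync(gz);
  cache.set(key, data); // later calls are served from the cache
  return data;
}
```

Deleting the entry from the store (`cache.delete('eng.traineddata')`) makes the next call download it again, matching the manual-deletion behavior described above.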

 ## How can I train my own \*.traineddata?

@ -15,26 +15,28 @@ For tesseract.js v1, check [Training Tesseract 3.03–3.05](https://github.com/t

 ## How can I get HOCR, TSV, Box, UNLV, OSD?

-Starting from 2.0.0-alpha.10, you can get all these information in the final result.
+Starting from 2.0.0-beta.1, you can get all of this information in the final result.

 ```javascript
-import Tesseract from 'tesseract.js';
-
-const { TesseractWorker } = Tesseract;
-const worker = new TesseractWorker();
-
-worker
-  .recognize('https://tesseract.projectnaptha.com/img/eng_bw.png', 'eng', {
+import { createWorker } from 'tesseract.js';
+const worker = createWorker({
+  logger: m => console.log(m)
+});
+
+(async () => {
+  await worker.load();
+  await worker.loadLanguage('eng');
+  await worker.initialize('eng');
+  await worker.setParameters({
    tessedit_create_box: '1',
    tessedit_create_unlv: '1',
    tessedit_create_osd: '1',
-  })
-  .then((result) => {
-    console.log(result.text);
-    console.log(result.hocr);
-    console.log(result.tsv);
-    console.log(result.box);
-    console.log(result.unlv);
-    console.log(result.osd);
  });
+  const { data: { text, hocr, tsv, box, unlv, osd } } = await worker.recognize('https://tesseract.projectnaptha.com/img/eng_bw.png');
+  console.log(text);
+  console.log(hocr);
+  console.log(tsv);
+  console.log(box);
+  console.log(unlv);
+  console.log(osd);
+})();
 ```
--- a/docs/local-installation.md
+++ b/docs/local-installation.md
@ -9,10 +9,20 @@ Because of this we recommend loading `tesseract.js` from a CDN. But if you reall
 In Node.js environment, the only path you may want to customize is languages/langPath.

 ```javascript
-const worker = Tesseract.TesseractWorker({
-  workerPath: 'https://unpkg.com/tesseract.js@v2.0.0-alpha.13/dist/worker.min.js',
+Tesseract.recognize(image, langs, {
+  workerPath: 'https://unpkg.com/tesseract.js@v2.0.0-beta.1/dist/worker.min.js',
  langPath: 'https://tessdata.projectnaptha.com/4.0.0',
-  corePath: 'https://unpkg.com/tesseract.js-core@v2.0.0-beta.10/tesseract-core.wasm.js',
+  corePath: 'https://unpkg.com/tesseract.js-core@v2.0.0-beta.13/tesseract-core.wasm.js',
+})
+```
+
+Or
+
+```javascript
+const worker = createWorker({
+  workerPath: 'https://unpkg.com/tesseract.js@v2.0.0-beta.1/dist/worker.min.js',
+  langPath: 'https://tessdata.projectnaptha.com/4.0.0',
+  corePath: 'https://unpkg.com/tesseract.js-core@v2.0.0-beta.13/tesseract-core.wasm.js',
 });
 ```

@ -23,6 +33,6 @@ A string specifying the location of the [worker.js](./dist/worker.min.js) file.
 A string specifying the location of the tesseract language files, with default value 'https://tessdata.projectnaptha.com/4.0.0'. Language file URLs are calculated according to the formula `langPath + '/' + langCode + '.traineddata.gz'`.
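A minimal sketch of building a language file URL from `langPath`. It assumes, in line with the `createWorker()` note that `langPath` should not end with `/`, that a `/` is inserted before the language code, and that the `.gz` suffix depends on the `gzip` option; `trainedDataUrl` is a hypothetical helper, not part of tesseract.js:

```javascript
// Hypothetical helper (not part of tesseract.js): build the traineddata
// URL for one language code. Assumes a '/' separator between langPath and
// the code, and a '.gz' suffix only when the data is gzipped.
function trainedDataUrl(langPath, langCode, gzip = true) {
  const base = langPath.replace(/\/+$/, ''); // tolerate a trailing slash
  return `${base}/${langCode}.traineddata${gzip ? '.gz' : ''}`;
}
```

With the default `langPath`, `trainedDataUrl('https://tessdata.projectnaptha.com/4.0.0', 'eng')` yields `https://tessdata.projectnaptha.com/4.0.0/eng.traineddata.gz`.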

 ### corePath
-A string specifying the location of the [tesseract.js-core library](https://github.com/naptha/tesseract.js-core), with default value 'https://unpkg.com/tesseract.js-core@v2.0.0-beta.10/tesseract-core.wasm.js' (fallback to tesseract-core.asm.js when WebAssembly is not available).
+A string specifying the location of the [tesseract.js-core library](https://github.com/naptha/tesseract.js-core), with default value 'https://unpkg.com/tesseract.js-core@v2.0.0-beta.13/tesseract-core.wasm.js' (fallback to tesseract-core.asm.js when WebAssembly is not available).

-Another WASM option is 'https://unpkg.com/tesseract.js-core@v2.0.0-beta.10/tesseract-core.js' which is a script that loads 'https://unpkg.com/tesseract.js-core@v2.0.0-beta.10/tesseract-core.wasm'. But it fails to fetch at this moment.
+Another WASM option is 'https://unpkg.com/tesseract.js-core@v2.0.0-beta.13/tesseract-core.js', a script that loads 'https://unpkg.com/tesseract.js-core@v2.0.0-beta.13/tesseract-core.wasm'. However, it currently fails to fetch.
--- a/docs/tesseract_parameters.md
+++ b/docs/tesseract_parameters.md
@ -1,12 +1,14 @@
 Tesseract.js Parameters
 =======================

-In the 3rd argument of `TesseractWorker.recognize()`, you can pass a params object to customize the result of OCR, below are supported parameters in tesseract.js so far.
+After the worker is initialized, you can pass a params object to `worker.setParameters()` to customize the OCR result. Below are the parameters tesseract.js supports so far.

 Example:

 ```javascript
-import Tesseract from 'tesseract.js';
+import { createWorker, OEM, PSM } from 'tesseract.js';

+const worker = createWorker();
@ -24,17 +26,8 @@ worker
 | tessedit\_ocr\_engine\_mode | enum | OEM.LSTM\_ONLY | Check [HERE](https://github.com/tesseract-ocr/tesseract/blob/4.0.0/src/ccstruct/publictypes.h#L268) for definition of each mode | 
 | tessedit\_pageseg\_mode | enum | PSM.SINGLE\_BLOCK | Check [HERE](https://github.com/tesseract-ocr/tesseract/blob/4.0.0/src/ccstruct/publictypes.h#L163) for definition of each mode |
 | tessedit\_char\_whitelist | string | '' | setting white list characters makes the result only contains these characters, useful the content in image is limited |
-| tessjs\_create\_pdf | string | '0' | only 2 values, '0' or '1', when the value is '1', tesseract.js generates a pdf output |
 | tessjs\_create\_hocr | string | '1' | only 2 values, '0' or '1', when the value is '1', tesseract.js includes hocr in the result |
 | tessjs\_create\_tsv | string | '1' | only 2 values, '0' or '1', when the value is '1', tesseract.js includes tsv in the result |
 | tessjs\_create\_box | string | '0' | only 2 values, '0' or '1', when the value is '1', tesseract.js includes box in the result |
 | tessjs\_create\_unlv | string | '0' | only 2 values, '0' or '1', when the value is '1', tesseract.js includes unlv in the result |
 | tessjs\_create\_osd | string | '0' | only 2 values, '0' or '1', when the value is '1', tesseract.js includes osd in the result |
-| tessjs\_pdf\_name | string | 'tesseract.js-ocr-result' | the name of the generated pdf file |
-| tessjs\_pdf\_title | string | 'Tesseract.js OCR Result' | the title of the generated pdf file |
-| tessjs\_pdf\_auto\_download | boolean | true | If the value is true, tesseract.js will automatic download/writeFile pdf file |
-| tessjs\_pdf\_bin | boolean | false | whether to include pdf binary array in the result object (result.files.pdf) |
-| tessjs\_image\_rectangle\_left | number | 0 | The left of the sub-rectangle of the image. |
-| tessjs\_image\_rectangle\_top | number | 0 | The top of the sub-rectangle of the image. |
-| tessjs\_image\_rectangle\_width | number | -1 | The width of the sub-rectangle of the image, -1 means auto width detection |
-| tessjs\_image\_rectangle\_height | number | -1 | The height of the sub-rectangle of the image, -1 means auto height detection |