Image Text Extraction

Demo

  1. For file uploads, use the multer library.
  2. To use multer, first create an upload object: const upload = multer({ dest: 'files/' }). Uploaded files are saved in the files/ folder.
  3. Next, pass the upload object into your POST route as middleware: app.post('/files', upload.array('userFiles'), (req, res) => ...
  4. Inside the route handler, which runs after the middleware, the uploaded files are available as an array on the request object: req.files
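
The steps above can be put together roughly as follows. This is a minimal sketch, not a definitive implementation: registerUploadRoute is a hypothetical helper name, app is assumed to be an Express application, and upload is assumed to be the object created in step 2. The originalname and path fields are the ones multer sets on each file when it stores uploads on disk.

```javascript
// Hypothetical helper: wires steps 3 and 4 onto an existing Express app.
// `upload` is assumed to be the object from step 2: multer({ dest: 'files/' }).
function registerUploadRoute(app, upload) {
  // upload.array('userFiles') parses the multipart form field named
  // "userFiles" and saves each file to disk before the handler runs.
  app.post('/files', upload.array('userFiles'), (req, res) => {
    // req.files holds one entry per uploaded file.
    const files = (req.files || []).map((file) => ({
      originalname: file.originalname, // file name on the user's machine
      path: file.path // where multer saved the file
    }))
    res.json({ files })
  })
}
```

In an application, call the helper once with your Express app and upload object before calling app.listen. The field name 'userFiles' has to match the name of the file input in the upload form.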

Once the files upload successfully, pass each one to the Tesseract library for text recognition. Because recognition takes time for each image, do not wait for it to finish before responding. Instead, respond with status 202 (Accepted) and return a job URL that the user can visit to check the status of the job.
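
One way to structure this is sketched below, under stated assumptions: the job table is a plain in-memory Map, startJob and handleUpload are hypothetical names, and recognize stands for any function that takes a file path and returns a promise of the recognized text (for example, a thin wrapper around Tesseract). The shape of the response body is also only a suggestion.

```javascript
const { randomUUID } = require('crypto')

// In-memory job table: job id -> one status record per uploaded file.
// This is an assumption of the sketch; the records are lost on restart.
const jobs = new Map()

// Hypothetical helper: starts recognition for every file and returns a job id
// right away. `recognize` is assumed to take a file path and return a promise
// that resolves to the recognized text.
function startJob(files, recognize) {
  const jobId = randomUUID()
  const records = files.map((file) => ({ name: file.originalname, status: 'processing' }))
  jobs.set(jobId, records)
  files.forEach((file, i) => {
    Promise.resolve()
      .then(() => recognize(file.path))
      .then((text) => Object.assign(records[i], { status: 'done', text }))
      .catch((err) => Object.assign(records[i], { status: 'error', error: String(err) }))
  })
  return jobId
}

// Hypothetical handler body for POST /files, run after the multer middleware.
function handleUpload(req, res, recognize) {
  const jobId = startJob(req.files || [], recognize)
  // 202 Accepted: the upload was received, but recognition is still running.
  res.status(202).json({ url: `/api/job/${jobId}` })
}
```

The handler never awaits the recognition promises, so the response goes out immediately and each file's record is updated later, when its recognition settles.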

You also need to provide an API endpoint, /api/job/:jobid, that looks up the status of each file processed in that job.
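
A lookup endpoint along these lines might be sketched as follows. The names are hypothetical: registerJobRoute is an illustrative helper, app is assumed to be an Express application, and jobs is assumed to be a Map from job id to an array of per-file records. The record fields and the 404 response for unknown ids are choices made for this sketch, not requirements.

```javascript
// Hypothetical helper: adds the status endpoint to an existing Express app.
// `jobs` is assumed to be a Map from job id to an array of per-file records
// such as { name: 'a.png', status: 'done', text: '...' }.
function registerJobRoute(app, jobs) {
  app.get('/api/job/:jobid', (req, res) => {
    const records = jobs.get(req.params.jobid)
    if (!records) {
      // Unknown id: the job never existed or its record was discarded.
      return res.status(404).json({ error: 'Job not found' })
    }
    // A job is finished once no file is still being processed.
    const done = records.every((record) => record.status !== 'processing')
    res.json({ done, files: records })
  })
}
```

The user can poll this URL until done is true, then read the recognized text from each file record.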

Sample Tesseract Code

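const Tesseract = require('tesseract.js')
const imagePath = '...path to image file...' // placeholder: use the path of an uploaded file
Tesseract.recognize(imagePath, 'eng', { logger: console.log })
  .then(({ data: { text } }) => console.log(text)) // example callback: print the recognized text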
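
When a request contains several files, the same call can be wrapped in an async function that handles them one at a time. The sketch below is illustrative only: recognizeAll is a hypothetical name, Tesseract is assumed to be the module returned by require('tesseract.js'), and reading the text from data.text assumes a tesseract.js version whose recognize call accepts the options object used above.

```javascript
// Hypothetical helper: recognizes several images one after another.
// `Tesseract` is assumed to be the module returned by require('tesseract.js').
async function recognizeAll(Tesseract, imagePaths) {
  const results = []
  for (const imagePath of imagePaths) {
    // Awaiting inside the loop processes one image at a time.
    const { data } = await Tesseract.recognize(imagePath, 'eng', { logger: console.log })
    results.push({ path: imagePath, text: data.text })
  }
  return results
}
```

Processing images sequentially keeps memory use predictable, at the cost of a longer total time than running them in parallel.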

If your application crashes during text recognition, you may have to delete the generated file eng.traineddata.
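
The cleanup can also be done from Node. The sketch below assumes the file was generated in the directory the application was started from. That location is an assumption, so pass a different directory if yours differs. removeTrainedData is a hypothetical helper name.

```javascript
const fs = require('fs')
const path = require('path')

// Hypothetical helper: deletes eng.traineddata from `dir` if it exists.
// Returns true when a file was removed and false when there was none.
// The default directory (the current working directory) is an assumption.
function removeTrainedData(dir = process.cwd()) {
  const file = path.join(dir, 'eng.traineddata')
  if (!fs.existsSync(file)) return false
  fs.unlinkSync(file)
  return true
}
```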