How to convert an image (jpg) to text / word using shell commands?

September 22, 2015 - 16:20

Converting a jpg to text is done by OCR (optical character recognition). There are a lot of online OCR services available now, but these services limit the size of the input file we can feed them. So a good option is to do the conversion locally with shell commands, which is what I explain here.

Conversion of image to text

To convert the image to text, we use the Tesseract OCR engine, which must first be installed on the system.
To install it, run:

sudo apt-get install tesseract-ocr

Older versions of Tesseract only process files in .tif format (newer builds can usually read other formats such as jpg directly), so we convert our image files to tif first. For that we need an image processing application; here I use ImageMagick. To install it, run:

sudo apt-get install imagemagick
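Before moving on, it can help to confirm that both tools are actually on the PATH. The following is a minimal sketch, not part of either package: check_ocr_tools is a hypothetical helper name, and it simply asks the shell whether each command you list can be found.

```shell
# check_ocr_tools is a hypothetical helper written for this post.
# It prints every command that cannot be found and returns non-zero
# if at least one of them is missing.
check_ocr_tools() {
    missing=0
    for tool in "$@"; do
        # command -v succeeds only when the shell can resolve the name
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

Calling check_ocr_tools tesseract convert should print nothing when both installations succeeded.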

After the installation succeeds, convert the image file to tif format with the command:

convert 1.jpg foo.tif

Now you have the tif file in your directory. Convert this tif to a txt file with the command:

tesseract foo.tif foo
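Tesseract assumes English text unless told otherwise. If your image is in another language, the -l option selects it. Here is a minimal sketch, assuming the matching language data is installed (on Debian/Ubuntu, for example, the tesseract-ocr-deu package for German). ocr_lang is a hypothetical wrapper name used only for illustration.

```shell
# ocr_lang is a hypothetical wrapper, not a tesseract command.
# Usage: ocr_lang IMAGE [LANG]   (LANG defaults to eng)
ocr_lang() {
    image="$1"
    lang="${2:-eng}"
    # Strip the file extension to get the output base name;
    # tesseract adds .txt to it on its own.
    tesseract "$image" "${image%.*}" -l "$lang"
}
```

For example, ocr_lang foo.tif deu runs tesseract foo.tif foo -l deu.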

Tesseract adds the .txt extension itself, so you will get a foo.txt file in your folder.
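If you have many images, the two steps can be wrapped in a loop. This is only a sketch under a few assumptions: all the files end in .jpg, they sit in one directory, and the intermediate tif files can stay next to them. ocr_dir is a hypothetical function name, not something shipped with tesseract or ImageMagick.

```shell
# ocr_dir is a hypothetical helper: it runs the convert and tesseract
# steps from this post on every .jpg in the given directory.
# Usage: ocr_dir [DIRECTORY]   (defaults to the current directory)
ocr_dir() {
    dir="${1:-.}"
    for img in "$dir"/*.jpg; do
        # When no .jpg exists the pattern stays unexpanded; skip it
        [ -e "$img" ] || continue
        base="${img%.jpg}"
        # Step 1: convert the jpg to tif with ImageMagick
        convert "$img" "$base.tif" || return 1
        # Step 2: OCR the tif; the result lands in $base.txt
        tesseract "$base.tif" "$base" || return 1
    done
}
```

Running ocr_dir ~/scans would then leave one .txt file beside each .jpg in that directory, stopping at the first file that fails.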

Hope this helps. Please feel free to share your thoughts and questions about this here.
