Showing posts with label Tesseract. Show all posts

Friday, April 12, 2013

How to convert jpg to tiff for OCR with tesseract

1)
Install PIL
#pip install pil

2)
Install tesseract-ocr
#sudo apt-get install tesseract-ocr

3)
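Install pytesser: download the zip below and unpack it into a directory named pytesser next to your script (the import in step 5 assumes this layout)
http://code.google.com/p/pytesser/downloads/detail?name=pytesser_v0.0.1.zip&can=2&q=

4)
Convert your image to an uncompressed tif with ImageMagick's convert (sudo apt-get install imagemagick if it is missing)
#convert myimage.jpeg -auto-level -compress none myimage.tif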
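
If you have more than one image, the same convert call can be driven from Python. This is only a minimal sketch, assuming ImageMagick's convert is on your PATH; build_convert_command and jpeg_to_tif are made-up helper names, not part of any library.

```python
import os
import subprocess


def build_convert_command(jpeg_path):
    """Return the convert command from step 4 and the .tif path it writes."""
    base, _ext = os.path.splitext(jpeg_path)
    tif_path = base + '.tif'
    command = ['convert', jpeg_path, '-auto-level', '-compress', 'none', tif_path]
    return command, tif_path


def jpeg_to_tif(jpeg_path):
    """Run ImageMagick's convert (assumed to be installed) and return the .tif path."""
    command, tif_path = build_convert_command(jpeg_path)
    subprocess.check_call(command)  # raises CalledProcessError if convert fails
    return tif_path
```

Calling jpeg_to_tif('myimage.jpeg') should leave myimage.tif next to the original file, ready for step 5.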

 
5)
Python code to read data from myimage.tif

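from PIL import Image
from pytesser.pytesser import image_to_string, image_file_to_string

image_file = 'myimage.tif'

# Option 1: OCR an image that is already loaded with PIL
im = Image.open(image_file)
text = image_to_string(im)

# Option 2: let pytesser read the file itself
text = image_file_to_string(image_file)

# Option 3: same as option 2; with graceful_errors=True pytesser should fall
# back to opening the file through PIL if tesseract cannot read it directly
text = image_file_to_string(image_file, graceful_errors=True)

# Each option overwrites text, so only the last result is printed
print "=====output=======\n"
print text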
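
If pytesser is awkward to set up, the tesseract program installed in step 2 can also be called directly. Here is a rough sketch, assuming tesseract is on your PATH; build_tesseract_command and ocr_file are hypothetical helper names. The tesseract command takes an image and an output base name and writes the recognised text to that base name plus .txt.

```python
import os
import shutil
import subprocess
import tempfile


def build_tesseract_command(image_path, output_base):
    """tesseract writes its result to output_base + '.txt'."""
    return ['tesseract', image_path, output_base]


def ocr_file(image_path):
    """Run tesseract (assumed to be installed) and return the recognised text."""
    scratch_dir = tempfile.mkdtemp()
    try:
        output_base = os.path.join(scratch_dir, 'ocr_result')
        subprocess.check_call(build_tesseract_command(image_path, output_base))
        with open(output_base + '.txt') as result_file:
            return result_file.read()
    finally:
        # Remove the scratch directory whether or not tesseract succeeded
        shutil.rmtree(scratch_dir, ignore_errors=True)
```

With this in place, text = ocr_file('myimage.tif') plays the same role as the pytesser calls above.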