Friday, April 12, 2013

How to convert jpg to tiff for OCR with tesseract

1)
Install PIL (on PyPI the maintained fork is published as Pillow; there is no installable package named pil)
#pip install Pillow

2)
Install tesseract-ocr
#sudo apt-get install tesseract-ocr

3)
Install pytesser
http://code.google.com/p/pytesser/downloads/detail?name=pytesser_v0.0.1.zip&can=2&q=
4)
Convert your image to tif (the convert command is part of ImageMagick)
#convert myimage.jpeg -auto-level -compress none myimage.tif
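
The same conversion can also be scripted. The following is a minimal sketch for Python 3, assuming ImageMagick's convert is on the PATH; the helper names build_convert_command and jpg_to_tif are made up for this illustration and are not part of any library.

```python
import shutil
import subprocess


def build_convert_command(src, dst):
    """Return the ImageMagick command from step 4 as an argument list."""
    return ["convert", src, "-auto-level", "-compress", "none", dst]


def jpg_to_tif(src, dst):
    """Run the conversion and return the output path.

    Raises RuntimeError if the convert binary cannot be found, and
    subprocess.CalledProcessError if convert exits with an error.
    """
    if shutil.which("convert") is None:
        raise RuntimeError("ImageMagick 'convert' was not found on the PATH")
    subprocess.check_call(build_convert_command(src, dst))
    return dst
```

Calling jpg_to_tif('myimage.jpeg', 'myimage.tif') should then produce the same file as the command above.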

 
5)
Python code to read data from myimage.tif

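from PIL import Image
from pytesser.pytesser import *

image_file = 'myimage.tif'

# Option 1: OCR an image that is already loaded with PIL
im = Image.open(image_file)
text = image_to_string(im)

# Option 2: let pytesser hand the file to tesseract directly
text = image_file_to_string(image_file)

# Option 3: as option 2, but if tesseract cannot read the file,
# pytesser converts it to a compatible format and retries
text = image_file_to_string(image_file, graceful_errors=True)

print("=====output=======\n")
print(text)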
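
pytesser is a thin wrapper that runs the tesseract binary in a subprocess, so the same result can be had without it. The following is a minimal sketch for Python 3, assuming tesseract is installed and on the PATH; the names build_tesseract_command and ocr_image_file are hypothetical helpers for this example. It relies on the command-line form "tesseract imagename outputbase", which writes the recognized text to outputbase.txt.

```python
import os
import shutil
import subprocess
import tempfile


def build_tesseract_command(image_path, output_base):
    """tesseract writes its result to output_base + '.txt'."""
    return ["tesseract", image_path, output_base]


def ocr_image_file(image_path):
    """Return the text tesseract recognizes in image_path."""
    # A missing binary is a common cause of "No such file or directory"
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on the PATH")
    workdir = tempfile.mkdtemp()
    try:
        output_base = os.path.join(workdir, "out")
        subprocess.check_call(build_tesseract_command(image_path, output_base))
        with open(output_base + ".txt", encoding="utf-8") as handle:
            return handle.read()
    finally:
        # Remove the scratch directory whether or not OCR succeeded
        shutil.rmtree(workdir, ignore_errors=True)
```

With this in place, print(ocr_image_file('myimage.tif')) plays the role of the pytesser calls above.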

18 comments:

  1. Tested with Ubuntu 12.10 and it is working.

    http://www.imagemagick.org/discourse-server/viewtopic.php?f=1&t=20579

  2. Bypass Captcha using Python and Tesseract OCR engine
    http://www.debasish.in/2012/01/bypass-captcha-using-python-and.html

  3. http://bokobok.fr/bypassing-a-captcha-with-python/

    http://blog.c22.cc/2010/10/12/python-ocr-or-how-to-break-captchas/

    http://www.wausita.com/captcha/

  4. Heya… my very first comment on your site. I have been reading your blog for a while and thought I would pop in and drop a friendly note. It is great stuff indeed. I also wanted to ask: is there a way to subscribe to your site via email?

    Bypass captchas

  5. Thanks for sharing this code. It is very useful.

    deathbycaptcha

  6. Wow, this is very informative. Keep sharing such useful information.

    image decoding

  7. Thanks for sharing how to convert jpg to tiff for OCR with tesseract. It is such an informative blog!

    Tiff Converter

  8. Hi, I was unable to install tesseract-ocr from the terminal using the command sudo apt-get install tesseract-ocr. It showed an archives error. Please help.

  9. Can we install this on Windows?
    If so, please provide the command set for the same,
    or any site where it is mentioned.
    Thanks

  10. It's a great topic but, unfortunately, I've been receiving the error message: IOError: cannot write mode LA as BMP. Does anyone know how to fix it?

  11. Not working, I am getting a traceback error

    Error
    text = image_to_string(im)
    File "/Users/pc/Downloads/pytesser_v0/pytesser.py", line 31, in image_to_string
    call_tesseract(scratch_image_name, scratch_text_name_root)
    File "/Users/pc/Downloads/pytesser_v0/pytesser.py", line 21, in call_tesseract
    proc = subprocess.Popen(args)
    File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
    File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 1335, in _execute_child
    raise child_exception
    OSError: [Errno 2] No such file or directory

  12. File "/home/user/Downloads/pytesser_v0.0.1/errors.py", line 10, in check_for_errors
    inf = file(logfile)
    NameError: name 'file' is not defined

  13. This comment has been removed by the author.

  14. PIL does not install, please help me.
    This is the error: Collecting pil
    Could not find a version that satisfies the requirement pil (from versions: )
    No matching distribution found for pil
    (myenv) keshri:~/ocrtest$

  15. Thanks for sharing this superb article. I used it for my college assignment and it was useful for me. Great work. compress jpeg to 100kb

  16. I use 2captcha, a good captcha bypass service.
