1)
Install PIL (available through Pillow, its maintained fork; there is no installable package named pil)
#pip install Pillow
2)
Install tesseract-ocr
#sudo apt-get install tesseract-ocr
3)
Install pytesser
http://code.google.com/p/pytesser/downloads/detail?name=pytesser_v0.0.1.zip&can=2&q=
4)
Convert your image to tif
#convert myimage.jpeg -auto-level -compress none myimage.tif
5)
Python code to read data from myimage.tif
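# Python 2 code (note the print statements).
from PIL import Image
from pytesser.pytesser import *

image_file = 'myimage.tif'

# Option 1: OCR an image that is already opened with PIL.
im = Image.open(image_file)
text = image_to_string(im)

# Option 2: OCR straight from the file path.
text = image_file_to_string(image_file)

# Option 3: same as option 2, but with pytesser's graceful error handling enabled.
text = image_file_to_string(image_file, graceful_errors=True)

print "=====output=======\n"
print text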
Tested with Ubuntu 12.10 and it is working.
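The code above is Python 2. As a rough sketch only (not part of the original recipe), steps 4 and 5 could also be done from Python 3 by calling the command-line tools directly. The helper names build_convert_command, build_tesseract_command and ocr_image are hypothetical, and the sketch assumes ImageMagick's convert and tesseract are both on PATH.

```python
import os
import shutil
import subprocess
import tempfile


def build_convert_command(src, dst):
    """ImageMagick command from step 4: auto-level, then write an uncompressed TIFF."""
    return ["convert", src, "-auto-level", "-compress", "none", dst]


def build_tesseract_command(image_path, output_base):
    """tesseract writes the recognised text to <output_base>.txt."""
    return ["tesseract", image_path, output_base]


def ocr_image(src):
    """Convert src to TIFF, run tesseract on it, and return the recognised text."""
    # Fail early with a clear message if either tool is missing.
    for tool in ("convert", "tesseract"):
        if shutil.which(tool) is None:
            raise RuntimeError("%s not found on PATH" % tool)
    with tempfile.TemporaryDirectory() as tmp:
        tif_path = os.path.join(tmp, "page.tif")
        out_base = os.path.join(tmp, "page")
        subprocess.run(build_convert_command(src, tif_path), check=True)
        subprocess.run(build_tesseract_command(tif_path, out_base), check=True)
        with open(out_base + ".txt", encoding="utf-8") as fh:
            return fh.read()
```

Calling print(ocr_image("myimage.jpeg")) would then play the role of step 5. Checking for the tools first gives a readable error instead of a bare OSError when tesseract is not installed.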
http://www.imagemagick.org/discourse-server/viewtopic.php?f=1&t=20579
Bypass Captcha using Python and Tesseract OCR engine
http://www.debasish.in/2012/01/bypass-captcha-using-python-and.html
http://bokobok.fr/bypassing-a-captcha-with-python/
http://blog.c22.cc/2010/10/12/python-ocr-or-how-to-break-captchas/
http://www.wausita.com/captcha/
Heya, my very first comment on your site. I have been reading your blog for a while and thought I would pop in and drop a friendly note. It is great
stuff indeed. I also wanted to ask: is there a way to subscribe to your site via email?
Bypass captchas
Thanks for sharing this coding. It is very useful.
deathbycaptcha
Wow, this is very informative. Keep sharing such useful information.
image decoding
Thanks for sharing the information on how to convert JPG to TIFF for OCR with tesseract. It is such an informative blog!
Tiff Converter
Hi, I was unable to install tesseract-ocr via the terminal using the command sudo apt-get install tesseract-ocr. It was showing an archives error. Please help.
Can we install this on Windows?
If so, please provide the command set for the same,
or any site where it is mentioned.
Thanks.
Great solution, thanks.
Thanks, it worked :)
It's a great topic, but unfortunately I've been receiving the error message: IOError: cannot write mode LA as BMP. Does someone know how to fix it?
Not working, getting a traceback error.
Error
text = image_to_string(im)
File "/Users/pc/Downloads/pytesser_v0/pytesser.py", line 31, in image_to_string
call_tesseract(scratch_image_name, scratch_text_name_root)
File "/Users/pc/Downloads/pytesser_v0/pytesser.py", line 21, in call_tesseract
proc = subprocess.Popen(args)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 710, in __init__
errread, errwrite)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 1335, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
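An OSError: [Errno 2] raised from subprocess.Popen, as in the traceback above, usually means the program being launched could not be found; here that is most likely the tesseract executable. A small check that could be run first (a sketch for Python 3; the helper name find_tool is hypothetical):

```python
import shutil


def find_tool(name="tesseract"):
    """Return the full path of an executable on PATH, or None if it is missing."""
    return shutil.which(name)


path = find_tool()
if path is None:
    print("tesseract is not on PATH; install it or add its directory to PATH")
else:
    print("tesseract found at", path)
```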
File "/home/user/Downloads/pytesser_v0.0.1/errors.py", line 10, in check_for_errors
inf = file(logfile)
NameError: name 'file' is not defined
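NameError: name 'file' is not defined suggests the script was run under Python 3, where the Python 2 built-in file() no longer exists and open() is the replacement. A sketch of how that one line could be adapted; the function name read_log is hypothetical, and only the inf = file(logfile) line comes from the traceback:

```python
def read_log(logfile):
    """Python 3 equivalent of `inf = file(logfile)`: open the log and read its text."""
    with open(logfile) as inf:
        return inf.read()
```

Other parts of pytesser may need similar changes before it runs on Python 3, so running it under Python 2 is the simpler route.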
PIL does not install. Please help me.
This error:
Collecting pil
Could not find a version that satisfies the requirement pil (from versions: )
No matching distribution found for pil
(myenv) keshri:~/ocrtest$
I use 2captcha, a good captcha bypass service.