Thursday, May 03, 2007
posted on 5/3/2007 9:51:58 AM (Eastern Daylight Time, UTC-04:00)

Here's another helpful tip from the great folks at our OnSight Support Center. Say you need to search a bunch of documents a client sent you. You figure it will be a breeze because the client is sending PDFs, probably generated from the original electronic documents. But when you open them, they are not text-based PDFs at all; they are images! Now the unglamorous task of running OCR, spell check, and clean-up awaits you.

The new Standard and Pro versions of Acrobat 8 make that workflow a little less tedious. Acrobat has been able to OCR documents for some time (the feature was once called “Paper Capture” and now goes by the more straightforward, if less elegant-sounding, “OCR Text Recognition”), but version 8 boosts its ability to export to Word, text, XML, HTML, and image formats. Best of all, you can batch process selected files or a whole folder of documents using "Tools>Document Processing>Batch Processing." That just leaves the inevitable spell check and touch-up of the OCR results.

This is a valuable stopgap when clients send sets of PDFs that need to be full-text searchable but are not large enough to warrant sending out to an EDD or similar vendor for processing.
