Adobe PDF Format

You can use either the Find toolbar or the Search PDF window to locate a word, series of words, or partial word in the active Adobe PDF document. The Find toolbar provides a basic set of search options for finding text in the current PDF document only. The Search PDF window searches more PDF areas than the Find toolbar, provides more advanced options, and lets you search for text in one or more PDF documents, an index of PDF files, or PDF files on the Internet (see Searching Adobe PDF documents on the Internet).

By default, both the Find toolbar and the Search PDF window search the text, layers, form fields, and digital signatures in the PDF document; both features also let you include bookmarks and comments in the search. By default, the Search PDF window also searches object data and image EXIF (exchangeable image file format) metadata. It searches document properties and XMP metadata by default, but only when searching multiple PDF documents or a PDF index, and it searches indexed structure tags only when searching a PDF index. In addition, the Search PDF window lets you include attachments in the search.

Note: Adobe PDF documents can have multiple layers. If the search results include an occurrence on a hidden layer, selecting that occurrence displays an alert that asks if you want to make that layer visible.

When you get a document that has been scanned rather than exported from the software that created it, such as MS Word, it's just an image (i.e. a picture). Remember, to a computer, a picture of the letter "A" is not the same as the text character "A," so when you try to text-search an image, you get no hits because there's no text to search. Typical scanned litigation documents are in the TIFF (image) format. There are also many software and hardware packages that scan paper directly into PDF. For now, I'm not going to address using Acrobat or other tools as the scanning software. For our purposes today, let's just say, "you've got those image files that you want to convert into something you can search."
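If you're not sure whether a PDF you've been handed is searchable or just pictures of pages, one rough check is whether its page content draws any text at all. Here's a minimal sketch of that idea in Python (standard library only). To be clear, this is my own heuristic, not an Acrobat feature, and the function names are hypothetical. It assumes the content streams are either uncompressed or Flate-compressed, inflates them with `zlib`, and looks for a PDF text object (`BT` ... `ET`) containing a `Tj` or `TJ` text-showing operator. It does not handle encrypted files or other filters such as LZW or ASCII85, so read `False` as "probably image-only," not as proof.

```python
import re
import zlib
from collections.abc import Iterator

# A text object: BT ... (Tj or TJ) ... ET. This is a heuristic, not a PDF parser.
_TEXT_OBJECT = re.compile(rb"\bBT\b.*?\b(?:Tj|TJ)\b.*?\bET\b", re.S)
# Raw bytes between the "stream" and "endstream" keywords.
_STREAM = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.S)


def _inflate_or_raw(raw: bytes) -> bytes:
    """Inflate Flate (zlib) data; fall back to the raw bytes otherwise."""
    try:
        # decompressobj() tolerates a truncated tail (e.g. a trailing byte
        # the regex treated as an end-of-line) and returns what it inflated.
        return zlib.decompressobj().decompress(raw)
    except zlib.error:
        # Not zlib data (uncompressed, JPEG, another filter): use as-is.
        return raw


def _stream_bodies(pdf_bytes: bytes) -> Iterator[bytes]:
    """Yield every stream body in the file, inflated where possible."""
    for match in _STREAM.finditer(pdf_bytes):
        yield _inflate_or_raw(match.group(1))


def has_text_layer(pdf_bytes: bytes) -> bool:
    """Return True if any stream appears to draw text (BT ... Tj/TJ ... ET)."""
    return any(_TEXT_OBJECT.search(body) for body in _stream_bodies(pdf_bytes))
```

To check a real file, read it in binary mode, for example `has_text_layer(open("scan.pdf", "rb").read())`, where `scan.pdf` is just a placeholder name. Note that OCR tools commonly put the recognized text behind the page image as invisible text, so an OCR'd scan should come back `True` even though it still looks like a picture.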

The unique thing about PDF is that you can have an exact image of the document, plus the text, plus all kinds of metadata ALL IN ONE FILE. This is a wonderful thing, but I will expound on its wonderfulness later. With the "Paper Capture" tools in Acrobat, the software reads the picture and figures out what the text is; this is called optical character recognition (OCR). So while you still see the "image," the software can also read the underlying text. OCR is not perfect, and (just like your eyes) it works best on first-generation, laser-printed images. In the past decade, however, OCR technology has become surprisingly accurate.
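To make "the software reads the picture and figures out what the text is" a little more concrete, here's a toy sketch in Python. It is nowhere near what Acrobat's Paper Capture actually does (that engine isn't described here). It's a simple nearest-template matcher: the page is a grid of pixels, each character cell is compared with a stored bitmap of every known letter, and the letter with the fewest mismatched pixels wins. The tiny 3x5 glyph shapes, the fixed one-column gap, and all the names are invented for illustration.

```python
# Toy OCR: nearest-template matching on tiny 3x5 bitmaps.
# The glyph shapes are made up for illustration ('#' = ink, '.' = paper).
TEMPLATES: dict[str, list[str]] = {
    "A": [".#.", "#.#", "###", "#.#", "#.#"],
    "C": ["###", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}
GLYPH_WIDTH = 3  # every character cell is 3 pixels wide
GAP = 1          # one blank column separates neighboring characters


def split_glyphs(rows: list[str]) -> list[list[str]]:
    """Cut a one-line image into fixed-width character cells."""
    step = GLYPH_WIDTH + GAP
    count = (len(rows[0]) + GAP) // step
    return [[row[i * step : i * step + GLYPH_WIDTH] for row in rows] for i in range(count)]


def distance(a: list[str], b: list[str]) -> int:
    """Count the pixels that differ between two same-sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def recognize(rows: list[str]) -> str:
    """Turn each cell into the character whose template differs least."""
    text = []
    for glyph in split_glyphs(rows):
        best = min(TEMPLATES, key=lambda ch: distance(TEMPLATES[ch], glyph))
        text.append(best)
    return "".join(text)
```

A smudged pixel or two usually still lands closest to the right template, but heavy noise or an unfamiliar font pushes a character toward the wrong one. That's a small-scale version of why OCR prefers clean, first-generation originals.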

Scanning and OCR with Acrobat

Scan to PDF