concordia-memories.org
Recalling Concordia's Past
 

Searchable Image Files

 
roger.pape
Site Admin


Joined: 17 Mar 2009
Posts: 386
Location: Liverpool, NY

Posted: Thu Feb 25, 2010 8:22 pm    Post subject: Searchable Image Files

Most of us have come to rely on a search engine, such as Google, Yahoo, Bing or the like, to find information on the Web. Similarly, using the search capability of various computer programs is almost indispensable for finding something like a name or place in large data files. Essentially all word processing, spreadsheet, database, and other applications have some form of ‘find’ function. This avoids the need to read through page after page of data for the information one is looking for.

You may have noticed that, apart from simple text files, much of the data on this website is posted in Adobe Acrobat pdf format. The pdf format has become a widely accepted documentation standard. Unlike proprietary formats such as Microsoft Word .doc files, pdf files can be viewed on all types of computers, and in the various browsers, with a free viewer. In addition to the Acrobat Reader provided by Adobe, other free readers are available. I happen to like a freeware program called Foxit, available at http://www.foxitsoftware.com/pdf/reader/. It is much faster than the Adobe reader and has a few features not found in the Adobe program.

The primary advantage of the pdf format is that text and graphics can be combined in the same file. (Proprietary formats such as MS Word can do this too, but pdf files tend to be smaller because they lack the extra overhead found in a .doc file.) One must realize, however, that graphic images by themselves cannot be searched. When printed pages of text are converted to computer files with a scanner, the result is a series of images, not a string of text characters, so image-only pdf files cannot be searched by normal means. The solution is to run an optical character recognition (OCR) process that extracts the text from the images. Fortunately, the pdf format also supports an alternative known as “searchable images”. This format adds a hidden text layer behind each page image, aligned with the words in that image. The search function in a pdf reader can then locate character strings in the hidden text and highlight the matching words on the image.
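To make the idea of a hidden text layer a little more concrete, here is a minimal Python sketch. It assumes a simplified model in which OCR produces a list of words, each paired with a bounding box giving its position on the scanned image. Real pdf files store their text quite differently, and `Word`, `find_matches` and the coordinates below are hypothetical names and values for illustration only, not part of any pdf program or library.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One word recognized by OCR, plus where it sits on the scanned image."""
    text: str
    box: tuple[int, int, int, int]  # hypothetical (x0, y0, x1, y1) page coordinates


def find_matches(text_layer: list[Word], query: str) -> list[tuple[int, int, int, int]]:
    """Return the boxes a reader would highlight for a case-insensitive query."""
    needle = query.lower()
    # Search the hidden text, but report positions on the visible image.
    return [word.box for word in text_layer if needle in word.text.lower()]


# Hypothetical hidden text layer for one scanned line of text.
page = [
    Word("Recalling", (72, 100, 140, 114)),
    Word("Concordia's", (146, 100, 230, 114)),
    Word("Past", (236, 100, 266, 114)),
]

# An image-only page has no text layer, so nothing can be found.
image_only_page: list[Word] = []

print(find_matches(page, "concordia"))             # [(146, 100, 230, 114)]
print(find_matches(image_only_page, "concordia"))  # []
```

The search itself only ever looks at the text; the boxes are what let a reader draw the highlight over the right spot in the picture. If OCR misread a word, the search would simply miss it, even though the word is plainly visible in the image.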

Where feasible, I have tried to run scanned files through an OCR engine and post the results in this searchable image format. (This does not apply to some files on other websites to which I have provided links.) Automated OCR software is not perfect and may not recognize every character correctly, particularly when the print quality is poor. So when you search some of the files, you may not find every occurrence of the string you are looking for, but the results are surprisingly good.

If you are interested, try opening a searchable image file in the Foxit reader mentioned above. An icon in the middle of the toolbar at the top of the window (a page with eyeglasses) switches the display between the image and the hidden text layer.
All times are GMT - 6 Hours

 

