How to Digitally Preserve a Book

Post  Frater_NS on Wed Aug 13, 2008 6:52 am

Making an electronic version of an old paper or book for the first time can be daunting. New software and tools can be intimidating, but taken step by step it is not as difficult as it seems.

The bare minimum of equipment needed is a flatbed scanner (very reasonably priced), a computer, and the correct software to enable you to scan.

Most new scanners come with software that lets you scan documents and save them as a picture (JPG, TIFF) or in Adobe Acrobat PDF format. Many also come with more advanced software (e.g. PaperPort) which has all the functionality needed to create electronic documents. At the very minimum, a scanner will come with a "TWAIN" driver, which lets the scanner interface with other software. Check your scanner documentation or the vendor's web site for more information.

In my opinion, the most flexible and widely supported format for electronic documents is the Acrobat PDF, and this FAQ gives step-by-step instructions on how to scan a paper document and produce a PDF file.

Scanner Settings:
For most documents it's best to use these scanner settings:

  • Black and White
  • 300-400 Dots Per Inch (DPI) scan resolution.


I would only recommend using “Grey Scale” or “True Colour” for your document if you are forced to, as they create massive files. However, a few good reasons to use them would be:

  • Document has grainy or discoloured paper
  • Uneven surface
  • Water damage
  • Can’t lay the document completely flat
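To put a rough number on "massive files", here is a small sketch that estimates the raw, uncompressed size of one scanned page. The A4 page size and the bit depths (1 bit per pixel for black and white, 8 for grey scale, 24 for true colour) are assumptions for illustration; real scanner files are compressed and will come out smaller.

```python
# Rough, uncompressed size of one scanned page at different colour depths.
# The A4 page size and the bit depths below are illustrative assumptions.

A4_INCHES = (8.27, 11.69)  # width, height

BITS_PER_PIXEL = {
    "Black and White": 1,
    "Grey Scale": 8,
    "True Colour": 24,
}


def page_pixels(dpi, size_inches=A4_INCHES):
    """Return (width, height) in pixels for a page scanned at `dpi`."""
    width_in, height_in = size_inches
    return round(width_in * dpi), round(height_in * dpi)


def raw_page_bytes(dpi, bits_per_pixel, size_inches=A4_INCHES):
    """Uncompressed bytes for one page: pixel count * bits per pixel / 8."""
    width_px, height_px = page_pixels(dpi, size_inches)
    return width_px * height_px * bits_per_pixel // 8


if __name__ == "__main__":
    for mode, bits in BITS_PER_PIXEL.items():
        size_mb = raw_page_bytes(300, bits) / 1_000_000
        print(f"{mode:16s} at 300 DPI: about {size_mb:.1f} MB per page")
```

Under these assumptions a 300 DPI page is roughly 1.1 MB raw in black and white, 8.7 MB in grey scale and 26.1 MB in true colour, so the colour modes carry 8 and 24 times the raw data before compression.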


Positioning the book on the scanner:
1. Position the book straight and tightly up against the top edge of the scanner.

2. Hold the book down firmly or put a weight on top. Eliminate any 'air gap' as far as possible, as it will show up as a black line on the final scanned image.

3. Click scan on the PC and continue to the next page.

If the book is tightly bound and you are worried that pressing it flat will break the spine, one option is to drape the book over the edge of the scanner and press the page flat on the glass. Scan, then rotate the book around to scan the second page. It is a lot more time-consuming, but it does less damage to the book's spine.

If you are ultra-paranoid about breaking the spine of the book, you can select "True Colour" scanning (instead of black & white) and place the book very loosely on the scanner. The final file size will be huge, but the book will not need to be held down.

Scanning using ABBYY FineReader:

1. Use the big "SCAN" button to scan each page in turn. Don’t worry about formatting and tidying up the images; all of this can be done later. Or, if you have used another program for scanning, load the files into ABBYY using File->Open PDF or Image.


2. Page Menu->Edit Page Image


3. Use the Crop Tool to crop the page down to size. Do this for each page. Try to find a crop size that suits all the pages.
If you are brave, use the “Apply to all images” button.



4. Select the Split Tool. Move the splitter bar into the middle of the page and press "Split Image". Do this for all the pages.
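For anyone doing the split outside ABBYY, the same idea can be sketched as two crop rectangles, one on each side of the splitter position. This is only an illustration of the geometry, not how ABBYY does it internally; the function name and the default of splitting at the exact middle are my own assumptions.

```python
def split_boxes(width, height, split_x=None):
    """Return crop boxes for the left and right page of a two-page scan.

    Boxes are (left, top, right, bottom) in pixels. `split_x` plays the
    role of the splitter bar; it defaults to the middle of the image.
    """
    if split_x is None:
        split_x = width // 2
    if not 0 < split_x < width:
        raise ValueError("split_x must fall inside the image")
    left_page = (0, 0, split_x, height)
    right_page = (split_x, 0, width, height)
    return left_page, right_page


if __name__ == "__main__":
    # Hypothetical 3000 x 2000 pixel scan of an open book.
    print(split_boxes(3000, 2000))
```

The boxes use the (left, upper, right, lower) order that the Pillow imaging library expects in Image.crop, so if you have Pillow installed each box can be passed straight to it.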

5. Use the Deskew & Straighten Tool to straighten pages, and the Eraser Tool to remove any artifacts or blemishes from the page.


Close the "Edit Page Image" Window.

6. You now have two options:

  • Use ABBYY’s Optical Character Recognition (OCR) function to convert the scanned image into digital text, which can then be exported as a PDF (step 7).
  • Define each page as a “Picture” and then export as a PDF (step 8).

7. Using OCR (exporting as text):

Click the Analyze button to define the area of text and pictures:



Click the Read button to OCR the text:



OCR is not perfect and often introduces errors into the output text, so it is important to check each page manually.
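If you export the recognised text, a small script can point you at the words most worth checking by eye. The sketch below is a coarse heuristic of my own, not an ABBYY feature: it flags tokens that mix letters and digits or that have a stray symbol between two letters. It will over-flag things like "1st" or "e.g." and it cannot catch a wrong but plausible word, so it does not replace proofreading.

```python
import re

# A token looks suspicious if it mixes letters and digits ("b00k", "1ike")
# or if a symbol sits between two letters ("scan|ner").
MIXED = re.compile(r"(?=.*[A-Za-z])(?=.*\d)")
INNER_SYMBOL = re.compile(r"[A-Za-z][^A-Za-z0-9'\-][A-Za-z]")


def suspicious_tokens(text):
    """Return (line_number, token) pairs worth checking by eye."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for token in line.split():
            # Drop punctuation hugging the word before testing it.
            word = token.strip(".,;:!?()[]\"'")
            if MIXED.search(word) or INNER_SYMBOL.search(word):
                hits.append((line_no, word))
    return hits


if __name__ == "__main__":
    sample = "The b00k was scanned at 300 DPI.\nA flat bed scan|ner is needed."
    for line_no, word in suspicious_tokens(sample):
        print(f"line {line_no}: {word}")
```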

8. Not using OCR (exporting as a picture):

File Menu->Save As->PDF Document
In the "Save Pages" window select these options:

  • File Name
  • Save as type: PDF
  • Save Pages: All Pages
  • File Options: Create a Single file for all pages
  • Click the Options... button




In the "Options" Window select these options:

  • Select Tab "3. Save"
  • Select Tab PDF
  • Default Paper Size: whatever is appropriate for your document
  • Save Mode: Page Image Only
  • Untick "Use Mixed Raster Content" (otherwise you get really bad quality results).
  • Picture Settings: High (for Printing)
  • Security: No Security




Press OK to exit the "Options" Window.
In the "Save Pages" window press SAVE to generate a PDF file!


Finishing Touches to the PDF File:

Make sure you set the metadata tags in the PDF file to include the book's title and the author's details. This may seem trivial, but it is a great help if you have a huge library of PDF files. Right-click on the PDF file and select Properties.
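If you have a whole library to tag, doing it by hand gets tedious. Here is a minimal sketch of setting the title and author from a script, assuming the third-party pypdf package is installed (it is not part of the workflow above, and the function names are hypothetical). /Title and /Author are the standard PDF document information entries.

```python
def build_info(title, author):
    """Build the PDF document information entries for title and author."""
    return {"/Title": title, "/Author": author}


def write_with_metadata(src_path, dst_path, info):
    """Copy a PDF to `dst_path` and attach `info` to the copy.

    Assumes the third-party pypdf package is installed (pip install pypdf).
    """
    # Imported here so build_info() works even without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src_path)
    writer = PdfWriter()
    for page in reader.pages:
        writer.add_page(page)
    writer.add_metadata(info)
    with open(dst_path, "wb") as handle:
        writer.write(handle)


if __name__ == "__main__":
    # Placeholder values; substitute your own book's details and file names.
    info = build_info("Example Book Title", "Example Author")
    print(info)
```

Writing to a new file rather than over the original keeps the untouched scan as a backup.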




awesome

Post  rickyrick on Tue Dec 09, 2008 8:34 pm

thanks so much for this explanation. It helps a lot and ABBYY is the sickest program I have seen for text recognition. Sick.

Props!

Re: How to Digitally Preserve a Book

Post  Chakravanti on Tue Dec 23, 2008 11:07 pm

And for those of you who don't have $400 to shell out for the finest OCR and don't mind doing a little extra detail work (cheap hacks like me), there is always the GNU software spectrum...

http://www.gnu.org/software/ocrad/ocrad.html

and of course...

http://directory.fsf.org/search/?query=OCR

My computer is broken, so until I get another one I can't comment on how well the software works, but I will take a swing at it once I have another crunch machine running and report back.

Re: How to Digitally Preserve a Book

Post  neutralrobotboy on Wed Dec 24, 2008 12:49 am

it's not gnu and it's windows-only, but nevertheless i also found this program to be quite usable for ocr work:
http://softi.co.uk/freeocr.htm

Re: How to Digitally Preserve a Book

Post  Chakravanti on Wed Dec 24, 2008 12:05 pm

It may not be GNU, but it is Apache V2.0, which the FSF recognizes as Free Software. It is open source, includes free modification and distribution rights, and is GPL-3 compatible. IOW, great find! I will be putting this to work since I'm operating off a windoze platform at the moment and am about to try to preserve and submit a book to the digimob. I will report back on the program when I'm done.

Well, the Tesseract engine is Apache2 anyway, which, by proxy, means that freeocr.net must be freely distributable, although not necessarily Open Source.

It does in fact work in Linux! The Tesseract engine should work on 32- and 64-bit versions of Linux and is regularly tested on Ubuntu (albeit Edgy & Drake, although that might just be outdated posting).

http://code.google.com/p/tesseract-ocr/

For my fellow hacks out there.
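For anyone who wants to script it, here is a rough sketch of running the Tesseract command-line tool over a page image from Python. It assumes the tesseract program is installed and on your PATH; the file names and the helper functions are made up for illustration. The basic call is "tesseract <image> <output base> -l <language>", and Tesseract writes the text to <output base>.txt.

```python
import subprocess


def tesseract_command(image_path, output_base, lang="eng"):
    """Build the command line: tesseract <image> <output base> -l <lang>."""
    return ["tesseract", image_path, output_base, "-l", lang]


def ocr_page(image_path, output_base, lang="eng"):
    """Run Tesseract on one page image and return the text file it writes."""
    subprocess.run(tesseract_command(image_path, output_base, lang), check=True)
    return output_base + ".txt"


if __name__ == "__main__":
    # Hypothetical file name; prints the command without running it.
    print(" ".join(tesseract_command("page_001.tif", "page_001")))
```

Calling ocr_page in a loop over the scanned pages gives one text file per page, which still needs the same manual proofreading as any other OCR output.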

Re: How to Digitally Preserve a Book

Post  neutralrobotboy on Thu Dec 25, 2008 9:14 pm

ahh, very interesting. i looked into freeocr.net only briefly before trying it out, and i didn't even remember it being open-source or based on an open-source engine. interested to see how you find it when you scan your book.

thank you for this!

Post  Nicky Lubu on Thu Apr 29, 2010 11:32 am

i always had a problem with the text part