This process currently assumes that you do not have a really powerful OCR tool. ABBYY 8.0 and Microsoft Document Imaging have not proved powerful enough, but ABBYY 10 may allow a different procedure from the one below.

Digitization Workflow

Stage I: Photography

Materials: digital camera, camera stand, light sources, page weights (e.g. coins)

Steps:

  1. Photograph all even-numbered pages (weighting down the page margin if needed)
  2. Photograph all odd-numbered pages

Issues:

  1. Glossy pages, such as those in 1960s yearbooks, require different lighting to avoid glare; no ideal solution has been found yet.
  2. Page curvature: can this be corrected in Photoshop?

Stage II: Image processing & upload

Materials: digital photos, IrfanView, file compression program; optional: PDF to DJVU GUI

Steps:

  1. Transfer the photos to a folder on your computer
  2. Make a backup ZIP file of the folder (do this before any actual processing)
  3. Check for duplicate or missing photos
  4. In IrfanView, batch rename & rotate the images, first all evens & then all odds. When done, all pages should be in order & right-side up.
  5. Either as a batch or individually, fine-rotate & re-crop the images to show the page only (no random background crud)
  6. Re-sort the images into page order, select them, & print them to a single PDF
  7. Upload the PDF via http://archive.org/create, selecting "Public Domain Mark" & providing an informative description
  8. If the material is of potential Wikisource value:
    1. Convert the PDF to DJVU (or just wait for the archive.org conversion script to finish & then download the DJVU)
    2. Upload DJVU to http://commons.wikimedia.org/Special:Upload
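Stage II steps 2 and 3 can be partly scripted. The sketch below is a hypothetical Python helper, not part of the workflow as written: `make_backup` zips the untouched photo folder using the standard library's `shutil.make_archive`, and `check_counts` compares the number of even- and odd-page photos with what a document of `total_pages` pages should have, assuming pages are numbered from 1. A count check is coarse (a missing shot and a duplicate in the same batch cancel out), so it flags problems rather than proving there are none.

```python
import shutil
from pathlib import Path


def make_backup(photo_dir: Path, archive_base: Path) -> Path:
    """Zip the untouched photos before any processing (step 2).

    shutil.make_archive adds the ".zip" extension to archive_base and
    returns the path of the archive it wrote. Keep archive_base outside
    photo_dir so the archive does not end up inside the folder it zips.
    """
    return Path(shutil.make_archive(str(archive_base), "zip", root_dir=photo_dir))


def check_counts(even_photos: list[str], odd_photos: list[str], total_pages: int) -> list[str]:
    """Compare photo counts with what pages 1..total_pages require (step 3).

    Returns human-readable problems; an empty list means the counts match.
    """
    expected = {
        "even": total_pages // 2,        # pages 2, 4, 6, ...
        "odd": (total_pages + 1) // 2,   # pages 1, 3, 5, ...
    }
    actual = {"even": len(even_photos), "odd": len(odd_photos)}
    problems = []
    for side in ("even", "odd"):
        diff = actual[side] - expected[side]
        if diff < 0:
            problems.append(f"{side}: {-diff} photo(s) missing")
        elif diff > 0:
            problems.append(f"{side}: {diff} extra photo(s), possible duplicates")
    return problems


# Example: a 6-page pamphlet where one even page was skipped.
print(check_counts(["DSC_0001.JPG", "DSC_0002.JPG"],
                   ["DSC_0003.JPG", "DSC_0004.JPG", "DSC_0005.JPG"], 6))
# → ['even: 1 photo(s) missing']
```

The file names and page count here are made up for illustration; in practice the two lists would come from the even and odd batches in the photo folder.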
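Because Stage I photographs all even pages and then all odd pages, the shots have to be interleaved back into page order, which Stage II step 4 does in IrfanView. As an illustration only, the same mapping can be sketched in Python, assuming each list holds filenames in the order the shots were taken, that page 1 is an odd page, and using a hypothetical `page_001.jpg` naming scheme. It copies into a separate folder rather than renaming in place, so the originals stay untouched; rotation and cropping are still left to IrfanView.

```python
import shutil
from pathlib import Path


def page_order_plan(even_photos: list[str], odd_photos: list[str],
                    prefix: str = "page_") -> list[tuple[str, str]]:
    """Map capture-order filenames to page-order names.

    Assumes odd_photos holds pages 1, 3, 5, ... and even_photos holds
    pages 2, 4, 6, ..., each list in the order the shots were taken.
    """
    numbered = [(2 * i + 1, name) for i, name in enumerate(odd_photos)]
    numbered += [(2 * i + 2, name) for i, name in enumerate(even_photos)]
    numbered.sort()  # page numbers are unique, so this sorts purely by page
    return [(name, f"{prefix}{page:03d}{Path(name).suffix.lower()}")
            for page, name in numbered]


def apply_plan(photo_dir: Path, out_dir: Path, plan: list[tuple[str, str]]) -> None:
    """Copy (not move) each photo to its new name, leaving the originals alone."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for old_name, new_name in plan:
        shutil.copy2(photo_dir / old_name, out_dir / new_name)


# Evens were shot first, so the camera numbered them before the odds.
plan = page_order_plan(["DSC_0001.JPG", "DSC_0002.JPG"],
                       ["DSC_0003.JPG", "DSC_0004.JPG", "DSC_0005.JPG"])
for old_name, new_name in plan:
    print(old_name, "->", new_name)
```

With these sample names the plan maps DSC_0003.JPG to page_001.jpg, DSC_0001.JPG to page_002.jpg, and so on. The three-digit padding keeps an alphabetical sort in page order for documents of up to 999 pages, which also makes the later "re-sort" step trivial.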

Stage III: Text processing

Wikisource
  1. Following the instructions here, start a new page at en.wikisource.org/wiki/Index:[name of DJVU file]. (For example, if the file is "Oread_August_1881.djvu", you would create the page en.wikisource.org/wiki/Index:Oread_August_1881.djvu.)
  2. Fill in the resulting form, leaving any fields you are unsure about as they are, and save.
  3. There should now be a row of little red numbers at the bottom of the Index page.
    1. Click on one (preferably "1").
    2. You'll get a little edit box next to a page image.
    3. You can either:
      1. Type the text by hand (not recommended), or
      2. Upload the individual page image to http://onlineocr.net in "Guest" mode to extract high-quality OCR text and paste that text into the edit field, or
      3. OCR the full PDF/DJVU with a program on your computer (probably not worthwhile unless your program is really good), and paste the resulting text into the edit field.
    4. Go through the resulting text, fixing hyphenation, paragraph breaks, scannos (OCR misreadings) &c.
    5. Save.
    6. Go back to the Index: page and click on another page number.
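The Index: and Page: titles follow directly from the DJVU file name, as in the "Oread_August_1881.djvu" example in step 1 of the Wikisource list. A small sketch, assuming the English Wikisource base URL and the `Page:<file name>/<page number>` titles that Wikisource's Proofread Page extension uses for individual pages; `index_url` and `page_url` are hypothetical helpers, handy for jumping straight to a given page instead of clicking through the Index page.

```python
from urllib.parse import quote

# Assumed base URL, matching the en.wikisource.org links used above.
BASE_URL = "https://en.wikisource.org/wiki/"


def wiki_title(name: str) -> str:
    """MediaWiki page URLs use underscores where titles have spaces."""
    return name.strip().replace(" ", "_")


def index_url(djvu_name: str) -> str:
    """URL of the Index: page for an uploaded DJVU file."""
    # Keep ":" and "/" literal so the namespace prefix stays readable.
    return BASE_URL + quote(f"Index:{wiki_title(djvu_name)}", safe=":/")


def page_url(djvu_name: str, page: int) -> str:
    """URL of one transcribable page: Page:<file name>/<page number>."""
    return BASE_URL + quote(f"Page:{wiki_title(djvu_name)}/{page}", safe=":/")


print(index_url("Oread_August_1881.djvu"))
# → https://en.wikisource.org/wiki/Index:Oread_August_1881.djvu
print(page_url("Oread_August_1881.djvu", 1))
# → https://en.wikisource.org/wiki/Page:Oread_August_1881.djvu/1
```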
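Part of the proofreading step (fixing hyphenation and paragraph breaks) is mechanical and can be done before pasting OCR output into the edit box. The sketch below is a hypothetical `clean_ocr_text` function, not a Wikisource tool. It assumes blank lines separate paragraphs and every other line break only reflects the printed column width, then rejoins words hyphenated at a line end and unwraps the remaining lines. Scannos still need a human eye, and a genuine hyphenated compound that happens to break at a line end (such as "well-known") will be wrongly joined, so the result still has to be proofread.

```python
import re


def clean_ocr_text(raw: str) -> str:
    """Rejoin line-end hyphenation and unwrap lines within each paragraph.

    Assumes blank lines separate paragraphs and that every other line
    break only reflects the printed column width.
    """
    paragraphs = re.split(r"\n\s*\n", raw.strip())
    cleaned = []
    for para in paragraphs:
        # "exam-\nple" -> "example": a hyphen at line end followed by more letters.
        para = re.sub(r"(\w)-\n\s*(\w)", r"\1\2", para)
        # Any remaining single line break inside the paragraph becomes a space.
        para = re.sub(r"\s*\n\s*", " ", para)
        cleaned.append(para.strip())
    return "\n\n".join(cleaned)


sample = "The exam-\nple text\ncontinues here.\n\nSecond para-\ngraph."
print(clean_ocr_text(sample))
```

For the sample above this prints "The example text continues here." and "Second paragraph." separated by a blank line.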
ShimerSource
  1. (To be determined...)


This page is part of the Shimer College Wiki, an independent documentation project. Shimer College, the Great Books college of Chicago, is not responsible for its content.