ssdigit.nothingisreal.com
Socialist Standard digitization blog: September 2012
http://ssdigit.nothingisreal.com/2012_09_01_archive.html
Further experiences with PDFBeads. I had a chance to visually examine the output of PDFBeads, and so far it looks OK. I think I will keep the unpaper. One problem that has arisen, however, is properly specifying the physical dimensions of the page. Back when I started this blog I reported that for most of my scans, the horizontal DPI is not the same as the vertical DPI. Postprocess the output PDF to override the DPI or paper size settings. I'm not sure if there's any easy way of doing this. PDFBeads supp...
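To illustrate why unequal horizontal and vertical DPI matters for the physical page size, here is a minimal sketch in Python. It is not part of PDFBeads or of the blog's workflow; the function name `page_size_points` and the sample numbers are assumptions chosen for illustration. It only shows the arithmetic: each axis has to be divided by its own DPI before converting to PDF points (1/72 inch).

```python
def page_size_points(width_px, height_px, dpi_x, dpi_y):
    """Return the physical page size as (width, height) in PDF points.

    Hypothetical helper: each axis is scaled by its own resolution,
    so a scan with dpi_x != dpi_y still gets the right proportions.
    """
    if dpi_x <= 0 or dpi_y <= 0:
        raise ValueError("DPI values must be positive")
    width_in = width_px / dpi_x    # inches along the horizontal axis
    height_in = height_px / dpi_y  # inches along the vertical axis
    return width_in * 72.0, height_in * 72.0  # 72 points per inch


# Made-up example: a scan at 300 DPI horizontally and 600 DPI vertically.
width_pt, height_pt = page_size_points(2400, 6600, 300, 600)
print(width_pt, height_pt)
```

Using a single DPI value for both axes in this example would make the page come out twice as tall or half as wide as the paper really was.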
ssdigit.nothingisreal.com
Socialist Standard digitization blog: October 2011
http://ssdigit.nothingisreal.com/2011_10_01_archive.html
OCR on GNU/Linux: A survey. Today was spent checking out options for optical character recognition (OCR) on GNU/Linux. There are apparently the following basic engines for OCR: Cuneiform, GOCR, Ocrad, OCRopus, Tesseract. In June of last year Andreas Gohr did a short experiment where he compared the first five above-listed GNU/Linux OCR engines and found that ABBYY OCR had the highest accuracy, with 100% for proportionally spaced serif and sans-serif text; Tesseract was the best-performing Free Software.
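One simple way to put a number on OCR accuracy is to compare an engine's output against a hand-checked reference transcription, character by character. The sketch below is only an assumption about how such a score could be computed; it is not the method used in the experiment mentioned above, and the function name `char_accuracy` is hypothetical. It uses Python's standard `difflib` module.

```python
from difflib import SequenceMatcher


def char_accuracy(reference, ocr_output):
    """Fraction of reference characters that the OCR output reproduces.

    Hypothetical metric: characters are counted as correct when they
    fall inside a matching block found by difflib's SequenceMatcher.
    """
    if not reference:
        raise ValueError("reference text must not be empty")
    matcher = SequenceMatcher(None, reference, ocr_output, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(reference)


# Made-up example: one character out of four was misread.
print(char_accuracy("abcd", "abxd"))
```

A score of 1.0 under this metric means every reference character was matched in order; it says nothing about extra characters the engine may have inserted.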