From: Pieter Praet
Subject: Re: [OT] Scanning for archiving
Date: Sat, 05 Nov 2011 23:36:22 +0100
Message-ID: <87vcqy6vtl.fsf@praet.org>
To: Marcelo de Moraes Serpa, Org Mode

On Sat, 5 Nov 2011 14:03:24 -0600, Marcelo de Moraes Serpa wrote:
> Hi list,
>
> I just bought a scanner and started to scan important documents as a
> backup, and archiving them with meaningful metadata in orgmode files. Then
> a question came to mind - what dpi to use? I'm not really savvy when it
> comes to scanning or printing, and I want like a dpi that allows me to
> reprint the document at an acceptable quality later if necessary, but that
> also doesn't take that much space (600dpi pdfs take around 5MB).
>
> Any insights welcome,
>
> Thanks,
>
> Marcelo.

Using PDF for scanned documents results in *huge* files with a seriously
disappointing image quality. Consider storing your scans in DjVu format [1],
which was developed specifically for this purpose.

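To put the file sizes in this thread in perspective, here is a rough back-of-the-envelope sketch of how much raw pixel data a single scanned page holds before any compression. The A4 page size, the 8 bits per gray-scale pixel, and the helper name `raw_page_bytes` are assumptions made for illustration; none of them come from the messages above.

```python
# Rough uncompressed size of one scanned page.
# Assumptions (not from the thread): A4 paper, 8 bits per pixel for
# gray-scale, 24 bits per pixel for colour.

A4_INCHES = (8.27, 11.69)  # width, height in inches


def raw_page_bytes(dpi, bits_per_pixel=8, page_inches=A4_INCHES):
    """Return the uncompressed size in bytes of one page scanned at `dpi`."""
    width_px = round(page_inches[0] * dpi)
    height_px = round(page_inches[1] * dpi)
    return width_px * height_px * bits_per_pixel // 8


for dpi in (150, 300, 600):
    size_mib = raw_page_bytes(dpi) / 2**20
    print(f"{dpi:>3} dpi gray-scale: {size_mib:6.1f} MiB uncompressed")
```

Under these assumptions a 600 dpi gray-scale page is about 33 MiB of raw pixels, and doubling the dpi quadruples that figure. The size on disk is therefore decided almost entirely by the compression the container format applies, which is why the choice of format matters more than shaving the resolution.
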
I scan all docs @ 600dpi, predominantly gray-scale (only in colour when
it's *really* necessary) and store in DjVu format, all using gscan2pdf [2].

Even at that seemingly overkill resolution, single-page documents are
generally (if they aren't too "grainy") only a few 100 KiB in size.

gscan2pdf also supports a number of OCR utils, but the UI for this is
clumsy (aren't they all...), so you're better off using the CLI tools
directly. Tesseract is recommended.

I've used this approach to "convert" piles upon piles of old bank
statements to Ledger format, with very little effort.

NOTE: When attempting something like this, a fast scanner with a
*reliable* automatic document feeder will help prevent premature
hair loss ;)

Peace

-- 
Pieter

[1] http://djvu.org/resources/whatisdjvu.php
[2] http://gscan2pdf.sourceforge.net/
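
The "use the CLI tools directly" route for OCR could be scripted roughly as follows. This is only a sketch under stated assumptions: the one-PNG-per-page directory layout, the helper names `ocr_command` and `ocr_directory`, and the `eng` language code are hypothetical and not taken from the message. The only Tesseract usage relied on is the basic form `tesseract IMAGE OUTPUTBASE -l LANG`, which writes the recognised text to `OUTPUTBASE.txt`.

```python
import subprocess
from pathlib import Path


def ocr_command(image: Path, lang: str = "eng") -> list[str]:
    """Build the tesseract call for one page image (hypothetical helper)."""
    # tesseract appends ".txt" to the output base on its own,
    # so strip the image extension here.
    output_base = image.with_suffix("")
    return ["tesseract", str(image), str(output_base), "-l", lang]


def ocr_directory(scan_dir: Path, lang: str = "eng") -> list[Path]:
    """Run tesseract on every *.png in scan_dir and return the text files."""
    results = []
    for image in sorted(scan_dir.glob("*.png")):
        # check=True stops the batch on the first page that fails.
        subprocess.run(ocr_command(image, lang), check=True)
        results.append(image.with_suffix(".txt"))
    return results
```

Calling `ocr_directory(Path("statements"))` would leave one `.txt` file next to each page image, assuming Tesseract is installed and on the PATH. Turning that text into Ledger entries is a separate parsing step that depends entirely on the layout of the statements, so it is not attempted here.
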