From mboxrd@z Thu Jan 1 00:00:00 1970
From: Pieter Praet
Subject: Re: [OT] Scanning for archiving
Date: Sun, 06 Nov 2011 22:59:01 +0100
Message-ID: <87hb2ghpzu.fsf@praet.org>
References: <87vcqy6vtl.fsf@praet.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
List-Id: "General discussions about Org-mode."
To: Samuel Wales
Cc: Org Mode, Marcelo de Moraes Serpa

On Sat, 5 Nov 2011 16:35:11 -0700, Samuel Wales wrote:
> I used to find that 8-bit 75dpi was legible and small.
>

True. It all depends on why you're scanning them in the first place.

75dpi is fine when scanning with collaboration/quick-reference in mind,
but for archival/backup purposes (i.e. absolute peace of mind when your
whole collection of dead trees burns, drowns, or is simply disposed of)
or OCR, you'll want to go with 600dpi and beyond.

If using DjVu instead of PDF, the storage overhead will be negligible.

> What ADF scanners are out there for Linux that have high quality
> reliable ADF, [...]

I wish I knew...
If anyone on this list can think of a scanner whose ADF doesn't require
constant babysitting, I'm betting it won't have a consumer-grade price tag.

> [...] are fast, [...]

Pretty much all of them, these days.

> and work well with CLI tools?
>

As long as it's supported by SANE [1], rats are entirely optional.

> Is OCR at the point where it is feasible using CLI? [...]

Depends on how "fancy" the document layout is. For most documents worth
scanning (let alone OCR'ing), it always has been. Also see OCRopus [2].

> [...] Combining that
> with a new feature to have the Org agenda work with indexers (I
> participated in a discussion on that here a long while back) would be
> interesting.
>

If you don't intend to create a perfect ASCII copy of the document, and
your index is restricted to word occurrence/frequency, it'll do just fine.

> On 2011-11-05, Pieter Praet wrote:
> > NOTE: When attempting something like this, a fast scanner with a *reliable*
> > automatic document feeder will help prevent premature hair loss ;)
> > ...
> >
> > [1] http://djvu.org/resources/whatisdjvu.php
> > [2] http://gscan2pdf.sourceforge.net/


Peace

-- 
Pieter

[1] http://www.sane-project.org/sane-supported-devices.html
[2] http://code.google.com/p/ocropus/