From: Ramon Diaz-Uriarte
Subject: Re: Org Mode and PDF Notes!
Date: Fri, 13 Nov 2015 00:51:41 +0100
Message-ID: <874mgqg4j6.fsf@gmail.com>
References: <877floffyq.fsf@gmail.com> <87wptnqucw.fsf@gmail.com>
 <87k2pn70mb.fsf@fastmail.fm>
To: Matt Price
Cc: Org Mode

On Thu, 12-11-2015, at 23:52, Matt Price wrote:
> On Thu, Nov 12, 2015 at 9:28 AM, Matt Lundin wrote:
>
>> Ramon Diaz-Uriarte writes:
>>
>> > I'll do. In the meantime, I think this is a limitation coming from
>> > poppler. Other people have mentioned similar things (e.g.,
>> > http://coda.caseykuhlman.com/entries/2014/pdf-extract.html) and using
>> > other tools that depend on poppler (such as Leela:
>> > https://github.com/TrilbyWhite/Leela) also will not give us the text
>> > itself.
>>
>> I don't think this is a limitation of poppler so much as the way that
>> pdf annotations work. Typically, the subject/text field is not populated
>> by the text of the highlighted region. Rather, a highlight annotation
>> specifies bounds, color, style, etc. Basically what Repligo does (I
>> wouldn't recommend using it, as it is closed source and severely out of
>> date) is to grab the text *at the time of highlighting* and add it to
>> the notes field. I don't know of any other annotation tool that does the
>> same thing. Applications built on poppler could do it, though they
>> currently do not.
>>
>> For extracting the text of highlighted regions *after the fact*, I've
>> had good luck with this script that relies on the pdf-reader gem for
>> ruby:
>>
>> https://gist.github.com/danlucraft/5277732
>>
> This looks interesting. It searches for file "./markup_receiver", but
> doesn't provide that file, which does not appear to be a gem. Any hints?

I think I got it from
https://www.omniref.com/github/danlucraft/pyranine/HEAD/files/lib/pyranine/markup_receiver.rb

>
> With politza's help am getting close to being able to extract annotation
> text from within pdf-tools, but am not quite there yet.

Neat!

R.

>
>
>> Matt
>>

-- 
Ramon Diaz-Uriarte
Department of Biochemistry, Lab B-25
Facultad de Medicina
Universidad Autónoma de Madrid
Arzobispo Morcillo, 4
28029 Madrid
Spain

Phone: +34-91-497-2412

Email: rdiaz02@gmail.com
       ramon.diaz@iib.uam.es

http://ligarto.org/rdiaz
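
For what it's worth, the *after the fact* approach discussed above boils
down to matching the bounds stored in a highlight annotation against the
positions of the words on the page. The following is only a rough sketch of
that idea in plain Python, not the Ruby script linked above and not anything
pdf-tools or poppler provides. It assumes some other tool has already
produced per-word bounding boxes in page coordinates; the Word type and the
function names are made up for illustration. The one detail taken from the
PDF format itself is that a highlight's /QuadPoints entry is a flat array of
8 numbers per highlighted quadrilateral.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Word(NamedTuple):
    """A word plus its bounding box in page coordinates (hypothetical input)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


Box = Tuple[float, float, float, float]


def quad_boxes(quad_points: Sequence[float]) -> List[Box]:
    """Turn a flat /QuadPoints array (8 numbers per quad) into bounding boxes."""
    if len(quad_points) % 8 != 0:
        raise ValueError("QuadPoints length must be a multiple of 8")
    boxes = []
    for i in range(0, len(quad_points), 8):
        xs = quad_points[i:i + 8:2]      # x coordinates of the four corners
        ys = quad_points[i + 1:i + 8:2]  # y coordinates of the four corners
        boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes


def highlighted_text(words: Sequence[Word], quad_points: Sequence[float]) -> str:
    """Join the words whose centre lies inside any highlighted quad.

    Assumes `words` is already in reading order.
    """
    boxes = quad_boxes(quad_points)
    picked = []
    for w in words:
        cx = (w.x0 + w.x1) / 2
        cy = (w.y0 + w.y1) / 2
        if any(bx0 <= cx <= bx1 and by0 <= cy <= by1
               for bx0, by0, bx1, by1 in boxes):
            picked.append(w.text)
    return " ".join(picked)


# Made-up coordinates, just to show the call.
example_words = [
    Word("Typically,", 72, 700, 120, 712),
    Word("the", 124, 700, 140, 712),
    Word("subject", 144, 700, 180, 712),
    Word("field", 72, 686, 95, 698),
]
example_quad = [122, 712, 182, 712, 122, 700, 182, 700]
print(highlighted_text(example_words, example_quad))  # prints: the subject
```

Testing the word centre rather than requiring full containment is a
deliberate simplification: highlights drawn by hand rarely line up exactly
with word boxes, so a stricter test would tend to drop the first and last
word.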