From: Ramon Diaz-Uriarte
Subject: Re: Org Mode and PDF Notes!
Date: Fri, 13 Nov 2015 00:55:14 +0100
Message-ID: <8737wag4d9.fsf@gmail.com>
References: <877floffyq.fsf@gmail.com> <87wptnqucw.fsf@gmail.com> <87k2pn70mb.fsf@fastmail.fm>
In-reply-to: <87k2pn70mb.fsf@fastmail.fm>
To: Matt Lundin
Cc: Ramon Diaz-Uriarte, Org Mode

On Thu, 12-11-2015, at 15:28, Matt Lundin wrote:
> Ramon Diaz-Uriarte writes:
>
>>
>> so we get the location of the highlight (and its properties), but not the
>> textual contents. And this is the case whether I make the annotation with
>> EzPDF or Okular or, for that matter, with pdf-tools itself.
>>
>> So it seems RepliGO is actually giving you a lot more by default :-)
>>
>>>
>>> Politza and I are discussing this here:
>>> https://github.com/politza/pdf-tools/issues/137
>>>
>>> that might be a good place to continue the conversation.
>>>
>>
>> Will do.
>> In the meantime, I think this is a limitation coming from
>> poppler. Other people have mentioned similar things (e.g.,
>> http://coda.caseykuhlman.com/entries/2014/pdf-extract.html) and using other
>> tools that depend on poppler (such as Leela:
>> https://github.com/TrilbyWhite/Leela) also will not give us the text
>> itself.
>
> I don't think this is a limitation of poppler so much as the way that
> pdf annotations work. Typically, the subject/text field is not populated
> by the text of the highlighted region. Rather, a highlight annotation
> specifies bounds, color, style, etc. Basically what Repligo does (I
> wouldn't recommend using it, as it is closed source and severely out of
> date) is to grab the text *at the time of highlighting* and add it to
> the notes field. I don't know of any other annotation tool that does the
> same thing. Applications built on poppler could do it, though they
> currently do not.

I stand corrected. You are right; sorry for the sloppiness in the wording
and ideas.

>
> For extracting the text of highlighted regions *after the fact*, I've
> had good luck with this script that relies on the pdf-reader gem for
> ruby:
>
> https://gist.github.com/danlucraft/5277732

That is also what I use for extracting the text from the highlighted
regions.

R.

>
> Matt

-- 
Ramon Diaz-Uriarte
Department of Biochemistry, Lab B-25
Facultad de Medicina
Universidad Autónoma de Madrid
Arzobispo Morcillo, 4
28029 Madrid
Spain

Phone: +34-91-497-2412

Email: rdiaz02@gmail.com
       ramon.diaz@iib.uam.es

http://ligarto.org/rdiaz
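
As an illustration of the point discussed above (a highlight annotation stores only bounds, so the text has to be recovered "after the fact" by matching those bounds against the words on the page), here is a minimal Python sketch. The `Box` and `Word` types, the `text_under_highlight` function, and the coordinates are all hypothetical; a real script would get word positions and annotation bounds from a PDF library. This is not meant to describe how the linked Ruby script or any poppler-based tool actually does it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in page coordinates (x0 < x1, y0 < y1)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains_point(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


@dataclass(frozen=True)
class Word:
    """A word on the page together with its bounding box (hypothetical type)."""
    text: str
    box: Box


def text_under_highlight(words, highlight):
    """Join, in input order, the words whose centre lies inside the highlight."""
    picked = []
    for w in words:
        cx = (w.box.x0 + w.box.x1) / 2
        cy = (w.box.y0 + w.box.y1) / 2
        if highlight.contains_point(cx, cy):
            picked.append(w.text)
    return " ".join(picked)


# Made-up page: one line of three words; the highlight covers the first two.
page_words = [
    Word("Org", Box(10, 100, 30, 112)),
    Word("mode", Box(34, 100, 62, 112)),
    Word("rocks", Box(66, 100, 96, 112)),
]
highlight = Box(8, 98, 64, 114)

print(text_under_highlight(page_words, highlight))  # prints "Org mode"
```

Testing the word centre rather than requiring full containment is a design choice in this sketch: it tolerates highlights that clip the edge of a word, at the cost of occasionally picking up a word that is only half covered.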