From mboxrd@z Thu Jan 1 00:00:00 1970
From: Matt Lundin
Subject: Re: Org Mode and PDF Notes!
Date: Thu, 12 Nov 2015 08:28:44 -0600
Message-ID: <87k2pn70mb.fsf@fastmail.fm>
References: <877floffyq.fsf@gmail.com> <87wptnqucw.fsf@gmail.com>
In-Reply-To: <87wptnqucw.fsf@gmail.com> (Ramon Diaz-Uriarte's message of "Thu, 12 Nov 2015 13:23:43 +0100")
Mime-Version: 1.0
Content-Type: text/plain
List-Id: "General discussions about Org-mode."
To: Ramon Diaz-Uriarte
Cc: Org Mode

Ramon Diaz-Uriarte writes:

>
> so we get the location of the highlight (and its properties), but not the
> textual contents. And this is the case whether I make the annotation with
> EzPDF or Okular or, for that matter, with pdf-tools itself.
>
> So it seems RepliGO is actually giving you a lot more by default :-)
>
>>
>> Politza and I are discussing this here:
>> https://github.com/politza/pdf-tools/issues/137
>>
>> that might be a good place to continue the conversation.
>>
>
> I'll do. In the meantime, I think this is a limitation coming from
> poppler.
> Other people have mentioned similar things (e.g.,
> http://coda.caseykuhlman.com/entries/2014/pdf-extract.html) and using other
> tools that depend on poppler (such as Leela:
> https://github.com/TrilbyWhite/Leela) also will not give us the text
> itself.

I don't think this is a limitation of poppler so much as the way that
PDF annotations work. Typically, the subject/text field is not populated
with the text of the highlighted region. Rather, a highlight annotation
specifies bounds, color, style, etc.

Basically, what RepliGO does (I wouldn't recommend using it, as it is
closed source and severely out of date) is grab the text *at the time of
highlighting* and add it to the notes field. I don't know of any other
annotation tool that does the same thing. Applications built on poppler
could do it, though they currently do not.

For extracting the text of highlighted regions *after the fact*, I've
had good luck with this script, which relies on the pdf-reader gem for
Ruby:

https://gist.github.com/danlucraft/5277732

Matt
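
P.S. To make the "after the fact" idea concrete, here is a minimal sketch in Python of the general approach: since the annotation only stores bounds, you collect the words on the page along with their bounding boxes and keep the ones that fall inside the highlight. This is not the linked Ruby script, and it does not use poppler or pdf-reader. The `Word` class, the `highlighted_text` function, and the coordinates are all hypothetical; I'm assuming the word boxes come from whatever text extractor you use, already in reading order, and that each highlight has been reduced to axis-aligned rectangles (real highlights store quadrilaterals in /QuadPoints).

```python
from dataclasses import dataclass
from typing import List, Tuple

# (x0, y0, x1, y1) in page coordinates; an assumed simplification of the
# quadrilaterals a real highlight annotation stores.
Rect = Tuple[float, float, float, float]


@dataclass
class Word:
    """Hypothetical record for one extracted word and its bounding box."""
    text: str
    box: Rect


def center_inside(box: Rect, region: Rect) -> bool:
    """True if the center point of `box` lies within `region`."""
    cx = (box[0] + box[2]) / 2
    cy = (box[1] + box[3]) / 2
    x0, y0, x1, y1 = region
    return x0 <= cx <= x1 and y0 <= cy <= y1


def highlighted_text(words: List[Word], regions: List[Rect]) -> str:
    """Join the words whose centers fall inside any highlight rectangle.

    Assumes `words` is already in reading order.
    """
    picked = [
        w.text
        for w in words
        if any(center_inside(w.box, r) for r in regions)
    ]
    return " ".join(picked)


# Made-up page content and a made-up highlight covering two words.
words = [
    Word("Typically,", (72, 700, 120, 712)),
    Word("the", (124, 700, 140, 712)),
    Word("subject", (144, 700, 180, 712)),
    Word("field", (72, 686, 95, 698)),
]
regions = [(122, 698, 182, 714)]

print(highlighted_text(words, regions))
```

Testing the center of each word rather than requiring full containment is a deliberate choice here: highlight bounds rarely line up exactly with the extractor's word boxes, so a strict containment test would tend to drop words at the edges.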