emacs-orgmode@gnu.org archives
From: Matt Lundin <mdl@imapmail.org>
To: Ramon Diaz-Uriarte <rdiaz02@gmail.com>
Cc: Org Mode <emacs-orgmode@gnu.org>
Subject: Re: Org Mode and PDF Notes!
Date: Thu, 12 Nov 2015 08:28:44 -0600
Message-ID: <87k2pn70mb.fsf@fastmail.fm>
In-Reply-To: <87wptnqucw.fsf@gmail.com> (Ramon Diaz-Uriarte's message of "Thu, 12 Nov 2015 13:23:43 +0100")

Ramon Diaz-Uriarte <rdiaz02@gmail.com> writes:

>
> so we get the location of the highlight (and its properties), but not the
> textual contents. And this is the case whether I make the annotation with
> EzPDF or Okular or, for that matter, with pdf-tools itself.
>
> So it seems RepliGO is actually giving you a lot more by default :-)
>
>>
>> Politza and I are discussing this here:
>> https://github.com/politza/pdf-tools/issues/137
>>
>> that might be a good place to continue the conversation.
>>
>
> I'll do so. In the meantime, I think this is a limitation coming from
> poppler. Other people have mentioned similar things (e.g.,
> http://coda.caseykuhlman.com/entries/2014/pdf-extract.html), and other
> tools that depend on poppler (such as Leela:
> https://github.com/TrilbyWhite/Leela) also do not give us the text
> itself.

I don't think this is a limitation of poppler so much as of the way
PDF annotations work. Typically, the subject/text field is not populated
with the text of the highlighted region. Instead, a highlight annotation
specifies where the highlight sits on the page (its bounds), plus color,
style, etc. Basically, what RepliGO does (I wouldn't recommend using it,
as it is closed source and severely out of date) is grab the text *at the
time of highlighting* and add it to the notes field. I don't know of any
other annotation tool that does the same thing. Applications built on
poppler could do it, though they currently do not.
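
To make the distinction concrete, here is a minimal Python sketch of what a highlight annotation carries. It is modeled loosely on the annotation dictionary described in the PDF specification (`/Subtype /Highlight`, `/Rect`, `/QuadPoints`, `/C`, `/Contents`). It is a plain dict, not real PDF serialization. The `Word` type, the `make_highlight` helper and its `copy_text` flag are hypothetical. They only contrast a typical tool, which stores geometry only, with the RepliGO-style behavior of copying the selected text into the notes field when the highlight is made.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A word on the page with its bounding box in PDF user space (hypothetical type)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def make_highlight(words, color=(1.0, 1.0, 0.0), copy_text=False):
    """Build a dict shaped like a PDF highlight annotation for the selected words.

    By default only geometry and style are stored. With copy_text=True the
    selected text is also written to /Contents, i.e. captured at highlight time.
    """
    if not words:
        raise ValueError("a highlight needs at least one word")
    quads = []
    for w in words:
        # One quadrilateral (8 numbers) per word. The corner order used here is
        # an assumption; viewers are not consistent about it.
        quads += [w.x0, w.y1, w.x1, w.y1, w.x0, w.y0, w.x1, w.y0]
    annot = {
        "/Type": "/Annot",
        "/Subtype": "/Highlight",
        # Overall bounding rectangle of the highlight.
        "/Rect": [min(w.x0 for w in words), min(w.y0 for w in words),
                  max(w.x1 for w in words), max(w.y1 for w in words)],
        "/QuadPoints": quads,
        "/C": list(color),
    }
    if copy_text:
        annot["/Contents"] = " ".join(w.text for w in words)
    return annot
```

With `copy_text=False` the annotation holds no text at all. That matches what pdf-tools, Okular and EzPDF produce in the thread above. Anything reading the file later only learns *where* the highlight is, not *what* it covers.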

For extracting the text of highlighted regions *after the fact*, I've
had good luck with this script, which relies on the pdf-reader gem for
Ruby:

https://gist.github.com/danlucraft/5277732
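
The after-the-fact approach is essentially a geometric lookup. Take the highlight's quad points and the positioned text on the page, then keep the text that falls inside the quads. The Python sketch below shows only that idea. It is not a port of the gist above. It assumes some PDF library (pdf-reader in Ruby, or an equivalent) has already produced the page's words with their boxes and the annotation's `/QuadPoints` array. The `(text, x0, y0, x1, y1)` tuple layout and the word-center test are assumptions chosen for simplicity.

```python
def quad_boxes(quad_points):
    """Turn a flat /QuadPoints array (8 numbers per quad) into (x0, y0, x1, y1) boxes.

    Taking min/max over the four corners avoids depending on corner order.
    """
    boxes = []
    for i in range(0, len(quad_points) - 7, 8):
        xs = quad_points[i:i + 8:2]      # x1, x2, x3, x4
        ys = quad_points[i + 1:i + 8:2]  # y1, y2, y3, y4
        boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes


def highlighted_text(words, quad_points):
    """Return the words whose center lies inside any highlighted quad.

    `words` is an iterable of (text, x0, y0, x1, y1) tuples in PDF user space,
    already in reading order (hypothetical output of a text extractor).
    """
    boxes = quad_boxes(quad_points)
    picked = []
    for text, x0, y0, x1, y1 in words:
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        if any(bx0 <= cx <= bx1 and by0 <= cy <= by1
               for bx0, by0, bx1, by1 in boxes):
            picked.append(text)
    return " ".join(picked)
```

Because only word centers are tested, a highlight that covers part of a word either keeps or drops the whole word. A real tool might use an overlap threshold instead.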

Matt
