On Thu, Nov 12, 2015 at 9:28 AM, Matt Lundin <mdl@imapmail.org> wrote:
Ramon Diaz-Uriarte <rdiaz02@gmail.com> writes:
> I'll do. In the meantime, I think this is a limitation coming from
> poppler. Other people have mentioned similar things (e.g.,
> http://coda.caseykuhlman.com/entries/2014/pdf-extract.html) and using other
> tools that depend on poppler (such as Leela:
> https://github.com/TrilbyWhite/Leela) also will not give us the text
> itself.

I don't think this is a limitation of poppler so much as the way that
pdf annotations work. Typically, the subject/text field is not populated
by the text of the highlighted region. Rather, a highlight annotation
specifies bounds, color, style, etc. Basically what Repligo does (I
wouldn't recommend using it, as it is closed source and severely out of
date) is to grab the text *at the time of highlighting* and add it to
the notes field. I don't know of any other annotation tool that does the
same thing. Applications built on poppler could do it, though they
currently do not.

For extracting the text of highlighted regions *after the fact*, I've
had good luck with this script that relies on the pdf-reader gem for


This looks interesting. It looks for a file "./markup_receiver", but that file isn't provided and doesn't appear to be part of a gem. Any hints?

With politza's help I am getting close to being able to extract annotation text from within pdf-tools, but I am not quite there yet.
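
For what it's worth, the "after the fact" approach described above comes down to intersecting the highlight's bounds with the positions of the words on the page. Here is a rough sketch of that idea in plain Python. Everything in it is an assumption made for illustration: the Rect and Word types, the 0.5 overlap threshold, and the sample coordinates are hypothetical, and it presumes some PDF library has already supplied word boxes in reading order and highlight rectangles in the same coordinate system. It is not how pdf-tools, poppler, or the pdf-reader script actually do it.

```python
from typing import List, NamedTuple


class Rect(NamedTuple):
    """Axis-aligned box in page coordinates (hypothetical helper type)."""
    x0: float
    y0: float
    x1: float
    y1: float


class Word(NamedTuple):
    """A word plus its bounding box, as a PDF library might report it."""
    text: str
    box: Rect


def overlap_fraction(word: Rect, region: Rect) -> float:
    """Return the fraction of the word box's area that lies inside region."""
    width = min(word.x1, region.x1) - max(word.x0, region.x0)
    height = min(word.y1, region.y1) - max(word.y0, region.y0)
    if width <= 0 or height <= 0:
        return 0.0
    area = (word.x1 - word.x0) * (word.y1 - word.y0)
    return (width * height) / area if area > 0 else 0.0


def highlighted_text(words: List[Word], regions: List[Rect],
                     min_overlap: float = 0.5) -> str:
    """Join the words that are mostly covered by any highlight rectangle.

    Assumes words are already in reading order; min_overlap is an
    arbitrary threshold chosen for this sketch.
    """
    picked = [
        w.text
        for w in words
        if any(overlap_fraction(w.box, r) >= min_overlap for r in regions)
    ]
    return " ".join(picked)


# Made-up sample data: two lines of text, one highlight over two words.
words = [
    Word("Typically,", Rect(72, 700, 120, 712)),
    Word("the", Rect(124, 700, 140, 712)),
    Word("subject", Rect(144, 700, 180, 712)),
    Word("field", Rect(72, 686, 96, 698)),
]
highlight = [Rect(122, 699, 182, 713)]

print(highlighted_text(words, highlight))
```

The overlap threshold is there because highlight bounds rarely line up exactly with glyph boxes; requiring only partial coverage avoids dropping words at the edges, at the cost of occasionally picking up a neighbour.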