On Thu, Nov 12, 2015 at 7:23 AM, Ramon Diaz-Uriarte <rdiaz02@gmail.com>
wrote:

>
>
>
> On Wed, 11-11-2015, at 21:33, Matt Price <moptop99@gmail.com> wrote:
> >>
> >>
> >>   instead of the text. Bummer! I wonder if RepliGO gives you a lot more
> >>   than the rest, or if I am doing something silly.
> >>
> >> I think that there is no standard way of storing the highlight contents.
> > I chose Repligo over EZPDF because it gives you access to the text of the
> > highlights!
>
>
> I'll try to see if I can get repligo (I had it a few years ago)
>
> >
> > Okular, I think, stores your annotations in its own database, rather than
> > in the pdf. You can (I think!) attach the annotations to the pdf from
> > inside Okular.  At least, that's what I remember from when I was looking
> > around.
>
> Actually, Okular stores the annotations in the PDF itself if you do "Save
> As". (It still keeps an internal db, but I never use it anymore). It is
> easy to check by doing that and then opening the file with another reader
> in another machine (e.g., in an android).
>
>
My bad, thanks.

>
> >
> > Repligo stores the highlighted text in the "subject" field of the
> > annotation. It's possible that the content of the annotation is stored in
> > some other field, like "content".  Maybe you can try:
> >
> > M-: (pdf-annot-get-annots) and look at the output in the *Messages*
> > buffer.  Can you see any evidence of the text? Can you share what you
> > learned?
>
> Nope, no evidence of the text. I get things such as
>
> (((buffer . #<buffer Frank_2015_Commentary.pdf>) (page . 13) (edges
> 0.113553 0.31717 0.868657 0.361746) (type . highlight) (id . annot-13-0)
> (flags . 4) (color . "#ffff00") (contents . "") (modified 22081 45188)
> (label . "TF201") (subject . "Highlight") (opacity . 1.0) ...)
>
>
> so we get the location of the highlight (and its properties), but not the
> textual contents. And this is the case whether I make the annotation with
> EzPDF or Okular or, for that matter, with pdf-tools itself.
>
> So it seems RepliGO is actually giving you a lot more by default :-)
>
>
Try replacing

(text (assoc-default 'subject annot))

with

(text (pdf-info-gettext page (assoc-default 'edges annot)))


in the lambda function in pdf-annot-markups-as-org-text.  This will fail on
cropped PDFs if you have added highlights using the most recent pdf-tools,
which stores negative values in the 'edges field, but I've found it works
otherwise.  I'd love to hear if it works for you too. (I know you're
following the relevant bug report on the pdf-tools GitHub repo.)
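
In case it helps, here is a minimal sketch of the same idea as a standalone
helper, so you can try it on a single annotation before editing the lambda.
The names my/edges-usable-p and my/annot-text-from-edges are made up for this
example and are not part of pdf-tools. Skipping annotations with negative
edges is only my guess at a safe way to avoid the cropped-PDF failure, not a
real fix. It assumes the annotation alist looks like the one you pasted.

```elisp
;; Sketch only: the two `my/...' helpers are hypothetical names,
;; not functions provided by pdf-tools.
(require 'cl-lib)
(require 'pdf-info nil t) ; soft require; pdf-tools is needed to extract text

(defun my/edges-usable-p (edges)
  "Return non-nil if EDGES is a list of four non-negative numbers."
  (and (listp edges)
       (= (length edges) 4)
       (cl-every (lambda (e) (and (numberp e) (>= e 0))) edges)))

(defun my/annot-text-from-edges (annot)
  "Return the page text under ANNOT's bounding box, or nil.
ANNOT is an annotation alist with `page', `edges' and `buffer' entries.
Annotations with negative edges are skipped (an assumption, see above)."
  (let ((page (assoc-default 'page annot))
        (edges (assoc-default 'edges annot))
        (buffer (assoc-default 'buffer annot)))
    (when (and page
               (my/edges-usable-p edges)
               (buffer-live-p buffer))
      ;; Query the PDF from its own buffer, using the highlight's
      ;; edges as the region to extract.
      (with-current-buffer buffer
        (pdf-info-gettext page edges)))))
```

You could then map my/annot-text-from-edges over the list of annotations and
check whether you get the highlighted text back, with nil for any annotation
it had to skip.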


>
> >
> > Politza and I are discussing this here:
> > https://github.com/politza/pdf-tools/issues/137
> >
> > That might be a good place to continue the conversation.
> >
>
> I'll do. In the meantime, I think this is a limitation coming from
> poppler. Other people have mentioned similar things (e.g.,
> http://coda.caseykuhlman.com/entries/2014/pdf-extract.html) and using other
> tools that depend on poppler (such as Leela:
> https://github.com/TrilbyWhite/Leela) also will not give us the text
> itself.
>
>
>
> >>
> >> Until I found pdf-tools, I had planned to write a node wrapper for pdf.js
> > and grab the annotations that way.  But I don't really know how to do that,
> > so this turned out to be easier :-)
> >
> > Anyway, I've updated the post, and it's now possible to create links to
> > individual annotations, though you will have to use my updated version of
> > org-pdfview, until/unless Markus accepts my patch.
>
>
> I just updated packages, and things are working perfectly: I am jumping to
> the page and location.
>
>
>
> Thanks,
>
>
> R.
>
>
>
>
>