From: Matt Price
Subject: Re: Org Mode and PDF Notes!
Date: Thu, 12 Nov 2015 08:11:23 -0500
To: Ramon Diaz-Uriarte
Cc: Org Mode <emacs-orgmode@gnu.org>
List-Id: General discussions about Org-mode.
In-Reply-To: <87wptnqucw.fsf@gmail.com>
References: <877floffyq.fsf@gmail.com> <87wptnqucw.fsf@gmail.com>

On Thu, Nov 12, 2015 at 7:23 AM, Ramon Diaz-Uriarte <rdiaz02@gmail.com> wrote:

>
> On Wed, 11-11-2015, at 21:33, Matt Price <moptop99@gmail.com> wrote:
> >>
> >>   instead of the text. Bummer! I wonder if RepliGO gives you a lot more
> >>   than the rest, or if I am doing something silly.
> >>
> >> I think that there is no standard way of storing the highlight contents. I
> > chose Repligo over EZPDF because it gives you access to the text of the
> > highlights!
>
>
> I'll try to see if I can get repligo (I had it a few years ago)
>
> >
> > Okular, I think, stores your annotations in its own database, rather than
> > in the pdf.
> > You can (I think!) attach the annotations to the pdf from
> > inside Okular. At least, that's what I remember from when I was looking
> > around.
>
> Actually, Okular stores the annotations in the PDF itself if you do "Save
> As". (It still keeps an internal db, but I never use it anymore.) It is
> easy to check by doing that and then opening the file with another reader
> on another machine (e.g., on an Android device).
>
>
my bad, thx.

> >
> > Repligo stores the highlighted text in the "subject" field of the
> > annotation. It's possible that the content of the annotation is stored in
> > some other field, like "content". Maybe you can try:
> >
> > M-: (pdf-annot-get-annots) and look at the output in the *Messages*
> > buffer. Can you see any evidence of the text? Can you share what you
> > learned?
>
> Nope, no evidence of the text. I get things such as
>
> (((buffer . #<buffer Frank_2015_Commentary.pdf>) (page . 13) (edges
> 0.113553 0.31717 0.868657 0.361746) (type . highlight) (id . annot-13-0)
> (flags . 4) (color . "#ffff00") (contents . "") (modified 22081 45188)
> (label . "TF201") (subject . "Highlight") (opacity . 1.0) ...)
>
>
> so we get the location of the highlight (and its properties), but not the
> textual contents. And this is the case whether I make the annotation with
> EzPDF or Okular or, for that matter, with pdf-tools itself.
>
> So it seems RepliGO is actually giving you a lot more by default :-)
>
>
Try replacing

    (text (assoc-default 'subject annot))

with

    (text (pdf-info-gettext page (assoc-default 'edges annot)))

in the lambda function in pdf-annot-markups-as-org-text. This will fail on
cropped PDFs if you have added highlights using the most recent pdf-tools,
which stores negative values in the 'edges field, but I've found it works
otherwise. I'd love to hear if it works for you too. (I know you're following
the relevant bug report on the pdf-tools GitHub repo.)
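A rough sketch of the same idea as a standalone helper. This is an illustration, not the code inside pdf-annot-markups-as-org-text: the name my/annot-highlight-text is hypothetical, it assumes each annotation is an alist like the one printed above (with page, edges, buffer and subject keys), and it assumes pdf-tools' pdf-info-gettext accepts (PAGE EDGES &optional SELECTION-STYLE FILE-OR-BUFFER). It keeps a non-default subject (which is where RepliGO puts the text) and otherwise asks poppler for the text under the highlight's edges.

```elisp
;; Hypothetical helper, not part of pdf-tools or org-pdfview.
;; ANNOT is assumed to be an annotation alist such as
;; ((buffer . #<buffer foo.pdf>) (page . 13) (edges 0.11 0.31 0.86 0.36)
;;  (subject . "Highlight") ...)
(defun my/annot-highlight-text (annot)
  "Return the text covered by highlight ANNOT, or nil if none is found."
  (let ((page    (assoc-default 'page annot))
        (edges   (assoc-default 'edges annot))
        (buffer  (assoc-default 'buffer annot))
        (subject (assoc-default 'subject annot)))
    (cond
     ;; RepliGO stores the highlighted text in the subject field; other
     ;; readers just write "Highlight" there, so skip that default value.
     ((and (stringp subject)
           (not (string= subject ""))
           (not (string= subject "Highlight")))
      subject)
     ;; Otherwise extract the text inside the highlight's relative edges.
     ;; Assumption: edges lie in 0..1; the negative values that newer
     ;; pdf-tools writes on cropped PDFs are not handled here.
     ((and page edges)
      (pdf-info-gettext page edges nil buffer)))))
```

Checking the subject first means RepliGO highlights keep their stored text without a round-trip to poppler; everything else falls through to the edges-based extraction suggested above.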
> >
> > Politza and I are discussing this here:
> > https://github.com/politza/pdf-tools/issues/137
> >
> > that might be a good place to continue the conversation.
> >
>
> Will do. In the meantime, I think this is a limitation coming from
> poppler. Other people have mentioned similar things (e.g.,
> http://coda.caseykuhlman.com/entries/2014/pdf-extract.html), and other
> tools that depend on poppler (such as Leela:
> https://github.com/TrilbyWhite/Leela) also will not give us the text
> itself.
>
> >>
> >> Until I found pdf-tools, I had planned to write a node wrapper for pdf.js
> > and grab the annotations that way. But I don't really know how to do that,
> > so this turned out to be easier :-)
> >
> > Anyway, I've updated the post, and it's now possible to create links to
> > individual annotations, though you will have to use my updated version of
> > org-pdfview, until/unless Markus accepts my patch.
>
>
> I just updated packages, and things are working perfectly: I am jumping to
> the page and location.
>
> Thanks,
>
> R.