emacs-orgmode@gnu.org archives
* Capture from Firefox to Org-mode
@ 2024-07-09 13:30 Sébastien Gendre
  2024-07-09 15:07 ` Chris Keschnat
  2024-07-11 14:03 ` Ihor Radchenko
  0 siblings, 2 replies; 11+ messages in thread
From: Sébastien Gendre @ 2024-07-09 13:30 UTC (permalink / raw)
  To: emacs-orgmode@gnu.org

Hello everyone,

How can I capture data from a web page into Org-mode?



* What I need

I want to be able to capture different kinds of information:

- Article, to read later

- Book, with all its metadata (title, author, subject, etc)

- Magazine, to buy later (title, editor, price, etc)

- IRL event that I want to attend


For each case, a different kind of data needs to be extracted from the
web page and passed to Org-capture.



* Capture Protocol

I know there is an org-protocol for capture, but after reading the
manual (section 17.16.2 The capture protocol), there is something
I did not understand.

In the URL of the capture protocol, key-value pairs can be specified.
But are they a predefined set of keys, independent of the capture
template, or are they related to the capture template's special
placeholders (e.g. %a)?

Is it possible, with Org-protocol, to pass data to be stored in Org entry
properties, or an arbitrary string?



* Firefox extension for Org capture

I have found 2 extensions for capturing to Org-mode from Firefox:

- Org Capture (by Kosta):
  https://addons.mozilla.org/fr/firefox/addon/org-capture/

- org-protocol (by Wojciech Siewierski):
  https://addons.mozilla.org/fr/firefox/addon/org-protocol/

Which one do you suggest?





Best regards

-------
Gendre Sébastien



PS:

I have seen a typo in the manual "17.16.2 The capture protocol":
https://orgmode.org/manual/The-capture-protocol.html

At the end of the first sentence, a space is missing between "a" and
"capture". Instead of "a capture", we have "acapture".


* Re: Capture from Firefox to Org-mode
  2024-07-09 13:30 Capture from Firefox to Org-mode Sébastien Gendre
@ 2024-07-09 15:07 ` Chris Keschnat
  2024-07-09 16:45   ` Morgan Willcock
  2024-07-11 20:57   ` Sébastien Gendre
  2024-07-11 14:03 ` Ihor Radchenko
  1 sibling, 2 replies; 11+ messages in thread
From: Chris Keschnat @ 2024-07-09 15:07 UTC (permalink / raw)
  To: emacs-orgmode

Sébastien Gendre <seb@k-7.ch> writes:

Hi,
I'm not actively using this but I had it configured and just tested that
it works. It might not answer your questions directly, but I hope it helps.

> How can I capture data from a web page into Org-mode?
>
> * What I need
> I want to be able to capture different kinds of information:
> - Article, to read later
> - Book, with all its metadata (title, author, subject, etc)
> - Magazine, to buy later (title, editor, price, etc)
> - IRL event that I want to attend
>
> For each case, a different kind of data needs to be extracted from the
> web page and passed to Org-capture.

This should all be possible.

> * Capture Protocol
>
> I know there is an org-protocol for capture, but after reading the
> manual (section 17.16.2 The capture protocol), there is something
> I did not understand.
>
> In the URL of the capture protocol, key-value pairs can be specified.
> But are they a predefined set of keys, independent of the capture
> template, or are they related to the capture template's special
> placeholders (e.g. %a)?
>
> Is it possible, with Org-protocol, to pass data to be stored in Org entry
> properties, or an arbitrary string?

Capturing the selection works.

Sample capture templates:
#+begin_src emacs-lisp
  ("p" "Protocol" entry (file+headline ,(concat my-org-dir "inbox.org") "Inbox")
   "* %^{Title}\nSource: %u, %c\n #+BEGIN_QUOTE\n%i\n#+END_QUOTE\n\n\n%?")

  ("L" "Protocol Link" entry (file+headline ,(concat my-org-dir "inbox.org") "Inbox")
   "* %? [[%:link][%:description]] \nCaptured On: %U")
#+end_src
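
Note that the two entries above contain ,(concat ...) forms, so they only
work inside a backquoted list. As a minimal sketch of how one of them could
be installed, assuming a hypothetical value for my-org-dir and assuming that
Emacs runs as a server so that emacsclient can deliver org-protocol:// URLs:

#+begin_src emacs-lisp
  ;; Sketch only: the value of `my-org-dir' is an assumption.
  (require 'server)
  (require 'org-capture)
  (require 'org-protocol)   ; lets emacsclient handle org-protocol:// URLs

  ;; emacsclient needs a running server to talk to.
  (unless (server-running-p)
    (server-start))

  (defvar my-org-dir "~/org/"
    "Hypothetical directory holding inbox.org; note the trailing slash.")

  ;; The backquote makes ,(concat ...) evaluate when the list is built.
  (add-to-list 'org-capture-templates
               `("L" "Protocol Link" entry
                 (file+headline ,(concat my-org-dir "inbox.org") "Inbox")
                 "* %? [[%:link][%:description]] \nCaptured On: %U"))
#+end_src

With that in place, a call in the form documented in the manual should open
a capture buffer for the "L" template:

  emacsclient "org-protocol://capture?template=L&url=https%3A%2F%2Forgmode.org&title=Org%20mode"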

> * Firefox extension for Org capture
>
> I have found 2 extensions for capturing to Org-mode from Firefox:
>
> - Org Capture (by Kosta):
>   https://addons.mozilla.org/fr/firefox/addon/org-capture/
>
> - org-protocol (by Wojciech Siewierski):
>   https://addons.mozilla.org/fr/firefox/addon/org-protocol/
>
> Which one do you suggest?

I'm using this one instead.

Extension: https://github.com/Stebalien/org-capture-extension
Image of configured extension: https://0x0.st/XMHN.png

PS:
I had also used this in the past. If you're willing to have a local
python script listening on a port instead of using org-protocol,
it might be an alternative.

https://github.com/karlicoss/grasp


* Re: Capture from Firefox to Org-mode
  2024-07-09 15:07 ` Chris Keschnat
@ 2024-07-09 16:45   ` Morgan Willcock
  2024-07-11 14:27     ` Max Nikulin
  2024-07-11 20:59     ` Sébastien Gendre
  2024-07-11 20:57   ` Sébastien Gendre
  1 sibling, 2 replies; 11+ messages in thread
From: Morgan Willcock @ 2024-07-09 16:45 UTC (permalink / raw)
  To: emacs-orgmode

Chris Keschnat <chris@catsu.it> writes:

>> I have found 2 extensions for capturing to Org-mode from Firefox:
>>
>> - Org Capture (by Kosta):
>>   https://addons.mozilla.org/fr/firefox/addon/org-capture/
>>
>> - org-protocol (by Wojciech Siewierski):
>>   https://addons.mozilla.org/fr/firefox/addon/org-protocol/
>>
>> Which one do you suggest?
>
> I'm using this one instead.
>
> Extension: https://github.com/Stebalien/org-capture-extension
> Image of configured extension: https://0x0.st/XMHN.png
>
> PS:
> I had also used this in the past. If you're willing to have a local
> python script listening on a port instead of using org-protocol,
> it might be an alternative.
>
> https://github.com/karlicoss/grasp

Just to mention another option, I use this one:

https://addons.mozilla.org/en-US/firefox/addon/linkremark/

I took the easy option and just use the default "integration" which uses
the clipboard, but I think org-protocol is supported as well.

-- 
Morgan Willcock


* Re: Capture from Firefox to Org-mode
  2024-07-09 13:30 Capture from Firefox to Org-mode Sébastien Gendre
  2024-07-09 15:07 ` Chris Keschnat
@ 2024-07-11 14:03 ` Ihor Radchenko
  2024-07-11 20:55   ` Sébastien Gendre
  1 sibling, 1 reply; 11+ messages in thread
From: Ihor Radchenko @ 2024-07-11 14:03 UTC (permalink / raw)
  To: Sébastien Gendre; +Cc: emacs-orgmode@gnu.org

Sébastien Gendre <seb@k-7.ch> writes:

> How can I capture data from a web page into Org-mode?
>
> * What I need
>
> I want to be able to capture different kinds of information:
>
> - Article, to read later
>
> - Book, with all its metadata (title, author, subject, etc)
>
> - Magazine, to buy later (title, editor, price, etc)
>
> - IRL event that I want to attend
>
>
> For each case, a different kind of data needs to be extracted from the
> web page and passed to Org-capture.

I wrote https://github.com/yantar92/org-capture-ref to do exactly
this. 

> * Capture Protocol
>
> I know there is an org-protocol for capture, but after reading the
> manual (section 17.16.2 The capture protocol), there is something
> I did not understand.
>
> In the URL of the capture protocol, key-value pairs can be specified.
> But are they a predefined set of keys, independent of the capture
> template, or are they related to the capture template's special
> placeholders (e.g. %a)?
>
> Is it possible, with Org-protocol, to pass data to be stored in Org entry
> properties, or an arbitrary string?

Yes, but passing non-standard data is an undocumented feature. It is
stored in the :query property, which you may need to extract via a
%(plist-get (org-capture-get :query) :my-keyword) placeholder or
something similar.
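
As a minimal sketch of how this could look for the "magazine" case: the
template key "m", the file ~/org/to-buy.org, the price URL parameter and the
helper name my/protocol-query are all made up for illustration. Because the
feature is undocumented, the helper first tries the expression above and
then falls back to org-store-link-plist, where org-protocol's capture
handler appears to keep the same :query plist. Which of the two is populated
may depend on the Org version, so treat this as untested guidance.

#+begin_src emacs-lisp
  (require 'org-capture)
  (require 'org-protocol)

  (defun my/protocol-query (key)
    "Return the org-protocol query value for KEY as a string, or \"\"."
    ;; Hypothetical helper: look in the capture plist first, then in the
    ;; stored-link plist, since the exact location is not documented.
    (let ((query (or (org-capture-get :query)
                     (plist-get org-store-link-plist :query))))
      (format "%s" (or (plist-get query key) ""))))

  ;; Extra URL parameters are expected to arrive as keywords,
  ;; e.g. ...&price=4.90 would be looked up as :price.
  (add-to-list 'org-capture-templates
               '("m" "Magazine to buy" entry (file "~/org/to-buy.org")
                 "* %:description\n:PROPERTIES:\n:URL: %:link\n:PRICE: %(my/protocol-query :price)\n:END:\n%?"))
#+end_src

A browser extension or bookmarklet would then need to produce a URL such as:

  org-protocol://capture?template=m&url=https%3A%2F%2Fexample.org%2Fissue-42&title=Some%20Magazine&price=4.90

The helper returns an empty string when the key is missing, because a
%(...) placeholder is expected to evaluate to a string.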

> I have seen a typo in the manual "17.16.2 The capture protocol":
> https://orgmode.org/manual/The-capture-protocol.html
>
> At the end of the first sentence, a space is missing between "a" and
> "capture". Instead of "a capture", we have "acapture".

This has already been fixed in de775a36d, but thanks for the heads-up anyway.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>


* Re: Capture from Firefox to Org-mode
  2024-07-09 16:45   ` Morgan Willcock
@ 2024-07-11 14:27     ` Max Nikulin
  2024-07-11 21:04       ` Sébastien Gendre
  2024-07-11 20:59     ` Sébastien Gendre
  1 sibling, 1 reply; 11+ messages in thread
From: Max Nikulin @ 2024-07-11 14:27 UTC (permalink / raw)
  To: emacs-orgmode

On 09/07/2024 23:45, Morgan Willcock wrote:
> Just to mention another option, I use this one:
> 
> https://addons.mozilla.org/en-US/firefox/addon/linkremark/

In its current state it does not extract metadata specific to books or
events. Dublin Core and similar metadata are on the TODO list, but with
low priority. The extension may give priority to doi: links, which may
be a starting point for fetching an entry from some database.

If the "magazine to buy" page contains schema.org/Product metadata (JSON-LD
or microdata) then it may work.

I wonder what the expected capture result is for events, and what metadata
are available?

Currently the primary issue is compatibility with recently enforced
Chrome requirements. More asynchronous code is required to perform similar
tasks (a combination of async-await, callbacks and messages between
extension components).

I have not tried it, but the following package has extractors for some 
sites:
<https://github.com/yantar92/org-capture-ref>



* Re: Capture from Firefox to Org-mode
  2024-07-11 14:03 ` Ihor Radchenko
@ 2024-07-11 20:55   ` Sébastien Gendre
  2024-07-12 11:37     ` Ihor Radchenko
  0 siblings, 1 reply; 11+ messages in thread
From: Sébastien Gendre @ 2024-07-11 20:55 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: emacs-orgmode@gnu.org


Thank you for your replies.

I wrote more after the quotes below.

Ihor Radchenko <yantar92@posteo.net> writes:
> I wrote https://github.com/yantar92/org-capture-ref to do exactly
> this. 

Is it meant to be used with org-ref?

I don't plan to build a bibliography with BibTeX (at least for now). I
just plan to build a simple book reading tracker with Org-mode. But I
will keep a note of your suggestion; it could be very helpful in the
future.

Thank you very much.

Ihor Radchenko <yantar92@posteo.net> writes:
> Yes, but passing non-standard data is an undocumented feature. It is
> stored in the :query property, which you may need to extract via a
> %(plist-get (org-capture-get :query) :my-keyword) placeholder or
> something similar.

Thank you for the info. I will take a note about it.

* Re: Capture from Firefox to Org-mode
  2024-07-09 15:07 ` Chris Keschnat
  2024-07-09 16:45   ` Morgan Willcock
@ 2024-07-11 20:57   ` Sébastien Gendre
  1 sibling, 0 replies; 11+ messages in thread
From: Sébastien Gendre @ 2024-07-11 20:57 UTC (permalink / raw)
  To: Chris Keschnat; +Cc: emacs-orgmode

Thank you for the suggestions, I will take a look.

* Re: Capture from Firefox to Org-mode
  2024-07-09 16:45   ` Morgan Willcock
  2024-07-11 14:27     ` Max Nikulin
@ 2024-07-11 20:59     ` Sébastien Gendre
  1 sibling, 0 replies; 11+ messages in thread
From: Sébastien Gendre @ 2024-07-11 20:59 UTC (permalink / raw)
  To: Morgan Willcock; +Cc: emacs-orgmode



Morgan Willcock <morgan@ice9.digital> writes:
> Just to mention another option, I use this one:
>
> https://addons.mozilla.org/en-US/firefox/addon/linkremark/
>
> I took the easy option and just use the default "integration" which uses
> the clipboard, but I think org-protocol is supported as well.

A very nice suggestion. I will look at it.

Thank you very much.

* Re: Capture from Firefox to Org-mode
  2024-07-11 14:27     ` Max Nikulin
@ 2024-07-11 21:04       ` Sébastien Gendre
  0 siblings, 0 replies; 11+ messages in thread
From: Sébastien Gendre @ 2024-07-11 21:04 UTC (permalink / raw)
  To: Max Nikulin; +Cc: emacs-orgmode


Thank you for your reply.

I wrote more after the quotes below.


Max Nikulin <manikulin@gmail.com> writes:
> On 09/07/2024 23:45, Morgan Willcock wrote:
>> Just to mention another option, I use this one:
>> https://addons.mozilla.org/en-US/firefox/addon/linkremark/
>
> In its current state it does not extract metadata specific to books or
> events. Dublin Core and similar metadata are on the TODO list, but
> with low priority. The extension may give priority to doi: links,
> which may be a starting point for fetching an entry from some database.
>
> If the "magazine to buy" page contains schema.org/Product metadata (JSON-LD
> or microdata) then it may work.
>
> I wonder what the expected capture result is for events, and what metadata
> are available?

For an event, I was thinking of extracting event title, date, place and
organization of an event announced on a Mobilizon web site.


Max Nikulin <manikulin@gmail.com> writes:
> I have not tried it, but the following package has extractors for some
> sites:
> <https://github.com/yantar92/org-capture-ref>

This package was already suggested to me in another message, but it seems
like too much for what I need.

* Re: Capture from Firefox to Org-mode
  2024-07-11 20:55   ` Sébastien Gendre
@ 2024-07-12 11:37     ` Ihor Radchenko
  2024-07-12 16:42       ` Sébastien Gendre
  0 siblings, 1 reply; 11+ messages in thread
From: Ihor Radchenko @ 2024-07-12 11:37 UTC (permalink / raw)
  To: Sébastien Gendre; +Cc: emacs-orgmode@gnu.org

Sébastien Gendre <seb@k-7.ch> writes:

> Ihor Radchenko <yantar92@posteo.net> writes:
>> I wrote https://github.com/yantar92/org-capture-ref to do exactly
>> this. 
>
> Is it meant to be used with org-ref?

No. It has nothing to do with org-ref.

> I don't plan to build a bibliography with BibTeX (at least for now). I
> just plan to build a simple book reading tracker with Org-mode. But I
> will keep a note of your suggestion; it could be very helpful in the
> future.

org-capture-ref does not create BibTeX. It just stores metadata and
chooses a format for that metadata in Org headings that is compatible
with the ol-bibtex library.

I use org-capture-ref as a reading tracker myself.

Examples of books/articles captured by org-capture-ref:

**** DONE Cal Newport [Goodreads] Digital Minimalism: Choosing a Focused Life in a Noisy World :book:ATTACH:
CLOSED: [2021-02-27 Sat 22:33]
:PROPERTIES:
:ID: goodreads_digit_minim_choos_focus_life
:CREATED: [2020-07-28 Tue 10:18]
:Effort:   0:20
:HOWPUBLISHED: Goodreads
:URL:      https://www.goodreads.com/book/show/40672036-digital-minimalism
:NOTE:     Online; accessed 12 July 2024
:AUTHOR:   Cal Newport
:BTYPE:    book
:TITLE:    Digital Minimalism: Choosing a Focused Life in a Noisy World
:END:

***** DONE Blair [JAMA] (1989) Physical Fitness and All-Cause Mortality :article:ATTACH:
SCHEDULED: <2021-03-14 Sun>
:PROPERTIES:
:TITLE:    Physical Fitness and All-Cause Mortality
:BTYPE:    article
:ID:       3c7ab86d5821f6c622d20c6b83605ef5e4e02691
:AUTHOR:   Steven N. Blair
:JOURNAL:  JAMA
:VOLUME:   262
:NUMBER:   17
:PAGES:    2395
:YEAR:     1989
:DOI:      10.1001/jama.1989.03430170057028
:URL:      https://doi.org/10.1001/jama.1989.03430170057028
:CREATED:  [2021-03-13 Sat 20:50]
:HOWPUBLISHED: Jamanetwork
:NOTE:     Online; accessed 13 March 2021
:END:



-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>


* Re: Capture from Firefox to Org-mode
  2024-07-12 11:37     ` Ihor Radchenko
@ 2024-07-12 16:42       ` Sébastien Gendre
  0 siblings, 0 replies; 11+ messages in thread
From: Sébastien Gendre @ 2024-07-12 16:42 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: emacs-orgmode@gnu.org


Thank you for your clarification. I will take a look.
