emacs-orgmode@gnu.org archives
* Yet another browser extension for capturing notes - LinkRemark
@ 2020-12-25 12:44 Maxim Nikulin
  2020-12-25 14:19 ` Ihor Radchenko
  2020-12-25 14:26 ` Yet another browser extension for capturing notes - LinkRemark Russell Adams
  0 siblings, 2 replies; 20+ messages in thread
From: Maxim Nikulin @ 2020-12-25 12:44 UTC (permalink / raw)
  To: emacs-orgmode

I am experimenting with a browser add-on that is intended
to be a bridge between the browser and Org mode.
Within the family of Org mode capture helpers, it is among those
that add web page metadata to the note.
Source code repository: https://github.com/maxnikulin/linkremark

Examples

Link:

--->8---
Link: Karl Voit: UOMF: Managing web bookmarks with Org Mode
   :PROPERTIES:
   :DATE_ADDED: [2020-12-25 18:06]
   :END:

- Link URL :: [[https://karl-voit.at/2014/08/10/bookmarks-with-orgmode/]]
- Link text :: Karl Voit: UOMF: Managing web bookmarks with Org Mode

On the page

- URL :: [[https://alphapapa.github.io/org-almanac/]]
- title :: org-almanac
- author :: Adam Porter
- referrer :: [[https://www.google.com/]]
---8<---

Page:

--->8---
public voit
   :PROPERTIES:
   :DATE_ADDED: [2020-12-25 18:11]
   :URL_IMAGE: http://Karl-Voit.at/images/public-voit_T_logo_200x200.png
   :END:

- URL :: [[https://karl-voit.at/2014/08/10/bookmarks-with-orgmode/]]
- title :: public voit
- author :: Karl Voit
- published_time :: 2014-08-10T17:13+01:00
- referrer :: [[https://alphapapa.github.io/org-almanac/]]

#+begin_quote
In my notes.org file, I collect all kind of snippets, knowledge, ideas, 
how-tos, and such stuff.
#+end_quote
---8<---

It is not really ready for the wild web, though
I believe it is already possible to get a general impression
and even to use it for pages where specially crafted data
are rather unlikely. Due to the early development stage,
there is no stability promise yet.

The extension has not been published to the catalogues of browser
extensions. A signed version for Firefox can be found in the "releases"
section on GitHub:
https://github.com/maxnikulin/linkremark/releases/download/v0.1/linkremark-0.1-fx.xpi
For Chrome/Chromium it can be loaded as an unpacked
extension. Just clone the code and create a symlink
named =manifest.json= that points to =manifest-chrome.json=.

The =README.org= file contains a few more details,
so visit [[https://github.com/maxnikulin/linkremark]]
or just clone the repository.

The mailing list has been quite noisy for the last couple of months,
so please do not post lengthy proposals in response on how to
integrate this extension to everything.

The gift is crafted quite roughly, glue has not fully cured,
so do not be surprised if you are stuck trying to adapt it
for your habits.

Merry Christmas and Happy New Year!




* Re: Yet another browser extension for capturing notes - LinkRemark
  2020-12-25 12:44 Yet another browser extension for capturing notes - LinkRemark Maxim Nikulin
@ 2020-12-25 14:19 ` Ihor Radchenko
  2020-12-26 11:49   ` Maxim Nikulin
  2020-12-25 14:26 ` Yet another browser extension for capturing notes - LinkRemark Russell Adams
  1 sibling, 1 reply; 20+ messages in thread
From: Ihor Radchenko @ 2020-12-25 14:19 UTC (permalink / raw)
  To: Maxim Nikulin, emacs-orgmode

Maxim Nikulin <manikulin@gmail.com> writes:

> I am experimenting with a browser add-on that is intended
> to be a bridge between browser and Org mode.
> In the family of Org mode capture helpers it is among ones
> that adds web page metadata to the note.
> Source code repository: https://github.com/maxnikulin/linkremark

The author of org-capture-ref here.

Reading through the code, I can see that you are familiar with metadata
conventions. Do you know any good references on which og: metadata is
commonly used? I looked through the official OpenGraph specification,
but popular websites appear to ignore most of the conventions.

Also, org-capture-ref does not really force the user to put BibTeX into
the capture. Individual metadata fields are available using
org-capture-ref-get-bibtex-field (which extracts data from an internal
alist structure). It's just that I mostly had BibTeX in mind (with the
distant goal of supporting export to LaTeX) for my use cases.

Finally, would you be interested in joining efforts on metadata parsing?
(I hope this question does not qualify as "integrate this extension to
everything").

P.S. Some links I collected myself when working on org-capture-ref. They
might also be of interest to you:

- https://github.com/ageitgey/node-unfluff
- https://github.com/gabceb/node-metainspector
- https://github.com/wikimedia/html-metadata
- https://github.com/microlinkhq/metascraper
- https://github.com/hboisgibault/unicontent

Best,
Ihor





* Re: Yet another browser extension for capturing notes - LinkRemark
  2020-12-25 12:44 Yet another browser extension for capturing notes - LinkRemark Maxim Nikulin
  2020-12-25 14:19 ` Ihor Radchenko
@ 2020-12-25 14:26 ` Russell Adams
  2020-12-25 22:11   ` Samuel Wales
  1 sibling, 1 reply; 20+ messages in thread
From: Russell Adams @ 2020-12-25 14:26 UTC (permalink / raw)
  To: emacs-orgmode

On Fri, Dec 25, 2020 at 07:44:22PM +0700, Maxim Nikulin wrote:
> I am experimenting with a browser add-on that is intended
> to be a bridge between browser and Org mode.
> In the family of Org mode capture helpers it is among ones
> that adds web page metadata to the note.
> Source code repository: https://github.com/maxnikulin/linkremark

That's a really neat idea!

I hadn't previously considered having a Firefox plugin to capture
information. Now I must look!

------------------------------------------------------------------
Russell Adams                            RLAdams@AdamsInfoServ.com

PGP Key ID:     0x1160DCB3           http://www.adamsinfoserv.com/

Fingerprint:    1723 D8CA 4280 1EC9 557F  66E8 1154 E018 1160 DCB3



* Re: Yet another browser extension for capturing notes - LinkRemark
  2020-12-25 14:26 ` Yet another browser extension for capturing notes - LinkRemark Russell Adams
@ 2020-12-25 22:11   ` Samuel Wales
  2020-12-26  9:16     ` Maxim Nikulin
  0 siblings, 1 reply; 20+ messages in thread
From: Samuel Wales @ 2020-12-25 22:11 UTC (permalink / raw)
  To: emacs-orgmode

maxim, it is great to see new work in this area.  thanks for sharing.

russell, i use the org-capture extension for firefox, which is on the
firefox extensions site.  it is for if you want a different set of
data captured [it uses your org capture template].  it works well for
me.

[not a suggestion for maxim to integrate into everything; ignore
please.  i can imagine great things possible with such extensions. for
example, you could have sets of tabs, selected by right click in
firefox, to save to a bunch of org entries.  then you could load that
particular set of entries into firefox whenever you want.  and you
could keep notes on each page and move the entries wherever you want.
this would be useful for such things as "i am researching rice
cookers; these are my tabs, but i don't want them cluttering firefox
and i want them with my org notes and to make notes on them and will
re-load them into firefox when i want to revisit".]

[now if i can only debug the extra-blank-lines-in-capture problem.]


On 12/25/20, Russell Adams <RLAdams@adamsinfoserv.com> wrote:
> On Fri, Dec 25, 2020 at 07:44:22PM +0700, Maxim Nikulin wrote:
>> I am experimenting with a browser add-on that is intended
>> to be a bridge between browser and Org mode.
>> In the family of Org mode capture helpers it is among ones
>> that adds web page metadata to the note.
>> Source code repository: https://github.com/maxnikulin/linkremark
>
> That's a really neat idea!
>
> I hadn't previously considered having a Firefox plugin to capture
> information. Now I must look!
>
> ------------------------------------------------------------------
> Russell Adams                            RLAdams@AdamsInfoServ.com
>
> PGP Key ID:     0x1160DCB3           http://www.adamsinfoserv.com/
>
> Fingerprint:    1723 D8CA 4280 1EC9 557F  66E8 1154 E018 1160 DCB3
>
>


-- 
The Kafka Pandemic

Please learn what misopathy is.
https://thekafkapandemic.blogspot.com/2013/10/why-some-diseases-are-wronged.html



* Re: Yet another browser extension for capturing notes - LinkRemark
  2020-12-25 22:11   ` Samuel Wales
@ 2020-12-26  9:16     ` Maxim Nikulin
  2022-01-17  2:29       ` Samuel Wales
  0 siblings, 1 reply; 20+ messages in thread
From: Maxim Nikulin @ 2020-12-26  9:16 UTC (permalink / raw)
  To: emacs-orgmode

On 26/12/2020, Samuel Wales wrote:

> [... i can imagine great things possible with such extensions. for
> example, you could have sets of tabs, selected by right click in
> firefox, to save to a bunch of org entries.  then you could load that
> particular set of entries into firefox whenever you want.  and you
> could keep notes on each page and move the entries wherever you want.
> this would be useful for such things as "i am researching rice
> cookers; these are my tabs, but i don't want them cluttering firefox
> and i want them with my org notes and to make notes on them and will
> re-load them into firefox when i want to revisit".]

It should be possible, since some tab management extensions were used
at Mozilla to evaluate whether WebExtensions were mature enough and
whether support for XUL add-ons could be dropped. On the other hand, do
not expect such a feature soon. A kind of semi-blocker is the absence of
automatic tests to run before every release, and that will require a
lot of time.

In the meanwhile, have you looked at the following comment?
https://github.com/sprig/org-capture-extension/issues/12#issuecomment-323569334
alphapapa commented Aug 20, 2017

> You can do this with the "Copy all URLs" extension (ID:
> djdmadneanknadilpjiknlnanaolmbfk). Use this as the custom format (note
> the linebreak):
> 
> [[$url][$title]]

I am almost sure that a similar extension exists for Firefox as well.
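
The custom format quoted above ([[$url][$title]] followed by a
linebreak) can be illustrated with a minimal Python sketch. It is not
code from either extension; the function names and the way square
brackets in titles are handled are assumptions made for this example.

```python
def org_link(url: str, title: str) -> str:
    """Format one tab as an Org link, mirroring the [[$url][$title]] format."""
    # Square brackets in the description would end the link early, so they
    # are replaced with parentheses (a simplification, not Org's own escaping).
    safe_title = title.replace("[", "(").replace("]", ")")
    return f"[[{url}][{safe_title}]]"


def tabs_to_org(tabs) -> str:
    """Return one Org link per line for an iterable of (url, title) pairs."""
    return "".join(org_link(url, title) + "\n" for url, title in tabs)


# Hypothetical tab list used only for illustration.
tabs = [
    ("https://orgmode.org/", "Org mode for Emacs"),
    ("https://example.org/a", "Rice cookers [review]"),
]
print(tabs_to_org(tabs), end="")
```

The output can be pasted into an Org file as is; turning each line into
a list item or a heading is a matter of changing the prefix.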

Some points should be clarified, in my opinion:

- Do you expect metadata to be captured in addition to URLs and 
titles? Browsers can unload some tabs, making page content unavailable.
- Are you going to capture reviews of "rice cookers" that could be 
considered ordinary pages, or are you going to save items from online 
stores? I do not know the current state of affairs, but I have heard 
about some activity around special metadata that allows search engines 
to display products in a special way. Could you check, using the page 
source or inspect element tools, whether the head element of pages in 
your favorite stores contains the desired metadata?
- Should a tab group be captured as a single Org heading, or should it 
be a tree with a section per tab? I am not sure that capture will have 
no problem with a subtree. Certainly the Emacs interface for 
org-protocol + capture is not suitable for sending each tab as a 
separate link. Another option is to create nested lists; anyway, the Org 
formatter in my extension needs improvements. Are you expecting a 
subtree of headings or nested lists?

> [now if i can only debug the extra-blank-lines-in-capture problem.]

Fully agree that it is really annoying. It is among the high priority 
items in my TODO list.

Accidentally I pressed =C-x C-o= and discovered 
[[help:delete-blank-lines]]. innerText is not exactly the same as 
the selection range's toString, but the rules could work in a similar 
way. Table rows, floating and absolutely positioned elements require 
newlines. Such elements are often abused by designers.
https://html.spec.whatwg.org/multipage/dom.html#dom-innertext
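
To make the blank-line issue concrete, here is a rough Python sketch of
one possible post-processing step: collapsing every run of blank lines
in captured text into a single empty line. It is only an illustration
under my own assumptions; it is neither what the extension does nor an
exact equivalent of delete-blank-lines, which acts around point.

```python
def collapse_blank_lines(text: str) -> str:
    """Collapse each run of blank (whitespace-only) lines into one empty line."""
    out = []
    for line in text.splitlines():
        if line.strip():
            # Keep non-blank lines, dropping trailing whitespace.
            out.append(line.rstrip())
        elif out and out[-1] != "":
            # First blank line after some text: keep a single separator.
            out.append("")
    # Drop a trailing separator, if any.
    while out and out[-1] == "":
        out.pop()
    return "\n".join(out)


print(collapse_blank_lines("first paragraph\n\n\n   \nsecond paragraph\n\n"))
```

Leading blank lines are dropped as well, since a separator is only
emitted after some text has been seen.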




* Re: Yet another browser extension for capturing notes - LinkRemark
  2020-12-25 14:19 ` Ihor Radchenko
@ 2020-12-26 11:49   ` Maxim Nikulin
  2020-12-26 13:49     ` Ihor Radchenko
  0 siblings, 1 reply; 20+ messages in thread
From: Maxim Nikulin @ 2020-12-26 11:49 UTC (permalink / raw)
  To: emacs-orgmode

On 25/12/2020, Ihor Radchenko wrote:
> 
> Reading through the code, I can see that you are familiar with metadata
> conventions. Do you know good references about what og: metadata is
> commonly used? I looked through the official OpenGraph specification,
> but popular websites appear to ignore most of the conventions.

I just inspected pages on several sites using developer tools and added
code that handles the elements I noticed.

I have not tried to find any resources on metadata (OK, once I searched 
for LD+JSON; essentially the outcome was the link to schema.org that I 
had already seen in the data). Looking into page sources, I realized that 
almost nobody cares whether a site has metadata of appropriate quality. I 
think search engines are advanced enough to work without metadata and 
even to decrease page rank if something suspicious was added by SEO. The 
only force to add some formal data is "share" buttons. Maybe some guides 
for web developers from social networks or search engines could be more 
useful than formal references, but I have not had a closer look.

> Also, org-capture-ref does not really force the user to put BiBTeX into
> the capture. Individual metadata fields are available using
> org-capture-ref-get-bibtex-field (which extracts data from internal
> alist structure). It's just that I mostly had BiBTeX in mind (with
> distant goal of supporting export to LaTeX) for my use-cases.

I do not have a clear vision of how to use collected data for queries. 
Certainly I want to have a more human-friendly representation than BibTeX 
entries (maybe in addition to machine-parsable data) adjacent to my notes.

Personally, I would prefer to avoid HTTP queries from Emacs. Sometimes 
it is better to have the current DOM state, not the page source; that is 
why I decided to gather data inside the browser, despite security fences 
that are placed quite strangely in some cases.

From my point of view, you should be happy with any of the projects you 
mentioned below. Do all of them have problems that are critical for you?

Technically it should be possible to push e.g. raw 
document.head.innerHTML to any external metadata parser using native 
messaging (to deal with sites requiring authorization). However, it could 
cause an alarm during review before publication of the extension to the 
browser catalogues.

> Finally, would you be interested to join efforts on metadata parsing?

Could you please share a bit more detail on your ideas? There is some 
room for improvement, but I do not think that the quality of metadata for 
ordinary sites could be dramatically better. The case that is not 
handled at all is scientific publications; unfortunately, I currently 
have quite little interest in it. Definitely, results should be stored in 
some structured format such as BibTeX. I have seen huge <head> elements 
describing even all references. Certainly such lists are not for 
general-purpose notes (at least without an explicit request from the 
user); they should be handled by some bibliography software to display 
citation graphs in the local library. On the other hand, it is not a 
problem to feed such data to some tool using the native messaging 
protocol. I have no idea whether various publishers provide such data in 
a uniform way; I just hope that pressure from citation indices and 
bibliography management software has a positive influence on 
standardization.

I am not going to blow up the code with recipes for particular sites. 
However, I realize that some special cases should still be handled. I am 
not ready to adopt the user script model used by 
Greasemonkey/Violentmonkey/Tampermonkey. I believe it is better to 
create dedicated extension(s) that either add and overwrite existing 
meta elements or allow querying gathered data using the sendMessage 
WebExtensions interface. By the way, scripts for the above-mentioned 
extensions could be used as well. This should alleviate cases when some 
site with insane metadata is important for a particular user.

> P.S. Some links I collected myself when working on org-capture-ref. They
> might also be of interest for you:
> 
> - https://github.com/ageitgey/node-unfluff
> - https://github.com/gabceb/node-metainspector
> - https://github.com/wikimedia/html-metadata
> - https://github.com/microlinkhq/metascraper
> - https://github.com/hboisgibault/unicontent

Thank you for the links. I should have a closer look at those projects. 
E.g. I considered itemprop="author" elements but postponed 
implementation of such features. For some reason I did not even try to 
find existing projects for metadata extraction. Maybe I still hope that 
a quite simple implementation could handle most of the cases.




* Re: Yet another browser extension for capturing notes - LinkRemark
  2020-12-26 11:49   ` Maxim Nikulin
@ 2020-12-26 13:49     ` Ihor Radchenko
  2020-12-27 12:18       ` Maxim Nikulin
  2021-11-18 17:01       ` LinkRemark Firefox extension approved for addons.mozilla.org Max Nikulin
  0 siblings, 2 replies; 20+ messages in thread
From: Ihor Radchenko @ 2020-12-26 13:49 UTC (permalink / raw)
  To: Maxim Nikulin, emacs-orgmode

Maxim Nikulin <manikulin@gmail.com> writes:

> I just inspected pages on several sites using developer tools and added
> code that handles noticed elements.

I see. I basically did the same, except for some minimal support for
OpenGraph (though I stopped when I saw that even YouTube does not
follow the standard, except for the most basic fields).

> The only force to add some formal data is "share" buttons. Maybe some
> guides for web developers from social networks or search engines could
> be more useful than formal references, but I have not had a closer
> look.

This is also consistent with what I saw. <meta .. twitter:..> fields seem
to be very common.

>> Also, org-capture-ref does not really force the user to put BiBTeX into
>> the capture. Individual metadata fields are available using
>> org-capture-ref-get-bibtex-field (which extracts data from internal
>> alist structure). It's just that I mostly had BiBTeX in mind (with
>> distant goal of supporting export to LaTeX) for my use-cases.
>
> I do not have clear vision how to use collected data for queries. 
> Certainly I want to have more human-friendly representation than BibTeX 
> entries (maybe in addition to machine-parsable data) adjacent to my notes.

So far, I found author, website name, publication year, title, and
resource type useful. My standard capture template for links is:

* <Author> [<Website>] (<Year>) Title

Example:

* dash-docs-el [Github] Dash-Docs-El Helm-Dash: Browse Dash Docsets Inside Emacs

Such headlines can be easily searched later, especially when I also add
some #keywords manually.
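
As an illustration only, the headline template above could be rendered
by a small helper like the following Python sketch. org-capture-ref
itself is written in Elisp; the function name and the rule that missing
fields are simply omitted are assumptions of this example, based on the
sample headline, which has no year.

```python
def capture_headline(title, author=None, website=None, year=None):
    """Build '* <Author> [<Website>] (<Year>) Title', skipping missing fields."""
    parts = []
    if author:
        parts.append(author)
    if website:
        parts.append(f"[{website}]")
    if year:
        parts.append(f"({year})")
    parts.append(title)
    return "* " + " ".join(parts)


print(capture_headline(
    "Dash-Docs-El Helm-Dash: Browse Dash Docsets Inside Emacs",
    author="dash-docs-el",
    website="Github",
))
```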

> Personally, I would prefer to avoid http queries from Emacs. Sometimes 
> it is better to have current DOM state, not page source, that is why I 
> decided to gather data inside browser, despite security fences that are 
> placed quite strangely in some cases.

Completely agree here. That's why I directly reuse the current DOM state
from qutebrowser in my own setup. However, an extension for qutebrowser
was easy for me to write, as it can simply be a bash script. I know
nothing about Firefox/Chrome extensions and I do not know JavaScript.

On the other hand, having the ability to get HTML is still useful in my
case (Emacs package) when the capture is not done from a browser. For
example, I often capture links from elfeed; an HTTP query from Emacs is
useful then.

>  From my point of view, you should be happy with any of projects you 
> mentioned below. Are all of them have some problems critical for you?

They are all javascript, except one (unicontent), which can be easily
replaced with built-in Elisp libraries (dom.el).

>> Finally, would you be interested to join efforts on metadata parsing?
>
> Could you, please, share a bit more details on your ideas? 

> Technically it should be possible to push e.g. raw 
> document.head.innerHtml to any external metadata parser using native 
> messaging (to deal with sites requiring authorization). However it could 
> cause an alarm during review before publication of the extension to the 
> browser catalogues.

That's unfortunate. Pushing raw html/dom is what I had in mind when
talking about joining efforts.

Another idea would be providing a callback from elisp to browser (I am
not sure if it is possible). org-capture-ref has a mechanism to check if
the link was captured in the past. If the link is already captured, the
information about the link location and todo-state can be messaged back
to the browser.

Example message (only qutebrowser is supported now):

Bookmark not saved!
Already captured into org-capture-ref:TODO maxnikulin [Github] linkremark: LinkRemark - page or link notes with context

>There is some room for improvement, but I do not think that quality of
> metadata for ordinary sites could be dramatically better. The case
> that is not handled it all is scientific publications, unfortunately
> currently I have quite little interest in it. Definitely results
> should be stored in some structured format such as BibTeX. I have seen
> huge <head> elements describing even all references. Certainly such
> lists are not for general-purpose notes (at least without explicit
> request from the user), they should be handled by some bibliography
> software to display citation graphs in the local library. On the other
> hand it is not a problem to feed such data to some tool using native
> messaging protocol. I have no idea if various publisher provide such
> data in a uniform way, I just hope that pressure from citation indices
> and bibliography management software has positive influence on
> standardization.

I think https://github.com/microlinkhq/metascraper#core-rules can be
used for ideas. It has generic parsing apart from site-specific rules.

For the scientific publications, the key point is usually getting
DOI/ISBN. Then, most of the metadata can be obtained using standard API
of doi.org or various ISBN databases. In addition, reference data is
generally available in OpenCitations.net (they also have all kinds of
web APIs).
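
A minimal Python sketch of the DOI route, assuming the DOI is already
known: doi.org supports HTTP content negotiation, and to my knowledge
the major registration agencies (Crossref, DataCite) answer an Accept
header of application/x-bibtex with a BibTeX record. The function names
are made up for this example, and the request is only built here, not
sent, so the sketch runs without network access.

```python
import urllib.parse
import urllib.request


def doi_bibtex_request(doi: str) -> urllib.request.Request:
    """Build (but do not send) a request asking doi.org for a BibTeX record."""
    url = "https://doi.org/" + urllib.parse.quote(doi, safe="/")
    return urllib.request.Request(url, headers={"Accept": "application/x-bibtex"})


def fetch_bibtex(doi: str, timeout: float = 10.0) -> str:
    """Send the request; needs network access and a DOI the registry knows."""
    with urllib.request.urlopen(doi_bibtex_request(doi), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# 10.1000/182 is used here only as a syntactically valid DOI.
req = doi_bibtex_request("10.1000/182")
print(req.full_url, req.get_header("Accept"))
```

Calling fetch_bibtex() is the step that actually contacts the resolver;
whether a given DOI yields BibTeX depends on its registration agency.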

Also, do you pass any of the parsed metadata to org-protocol? If you do,
it would be trivial to get it into capture templates on Elisp (and
org-capture-ref) side.

> I am not going to blow up the code with recipes for particular sites. 
> However I realize that some special cases still should be handled. I am 
> not ready to adapt user script model used by 
> Greasemonkey/Violentmonkey/Tampermonkey. I believe, it is better to 
> create dedicated extension(s) that either adds and overwrites existing 
> meta elements or allows to query gathered data using sendMessage 
> webextensions interface. By the way, scripts for above mentioned 
> extensions could be used as well. It should alleviate cases when some 
> site with insane metadata is important for particular user.

I see. This is another point where I thought it could be worth
collaborating. The parser rules just need to be written once (probably
in some common format, like JSON) and then can be reused.

> For some reason I even did not tried to 
> find existing projects for metadata extraction. Maybe I still hope that 
> quite simple implementation could handle most of the cases.

Actually, simple parsing does a fairly good job on most websites. It's
just that it is not ideal. For example, I tweaked the title of captured
GitHub issues to include "issue#", which helps to distinguish such pages
from individual repo bookmarks. I believe that such adjustments should
be available to users, which is where the org-capture-ref code started
from.

Best,
Ihor




* Re: Yet another browser extension for capturing notes - LinkRemark
  2020-12-26 13:49     ` Ihor Radchenko
@ 2020-12-27 12:18       ` Maxim Nikulin
  2021-11-18 17:01       ` LinkRemark Firefox extension approved for addons.mozilla.org Max Nikulin
  1 sibling, 0 replies; 20+ messages in thread
From: Maxim Nikulin @ 2020-12-27 12:18 UTC (permalink / raw)
  To: emacs-orgmode

On 26/12/2020 20:49, Ihor Radchenko wrote:
 > Maxim Nikulin <manikulin@gmail.com> writes:

I have reordered some parts of the discussion.

> Also, do you pass any of the parsed metadata to org-protocol? If you
> do, it would be trivial to get it into capture templates on Elisp
> (and org-capture-ref) side.

I decided that a capture could be too complicated to fit into the simple 
query parameters of org-protocol, e.g. it could be a chain of frames. 
That is why I implemented just a simple option: title + body (the URL is 
available but it is contained in the body). I am considering generating 
a tree of headings in some cases.
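
For readers unfamiliar with org-protocol, the "simple query parameters"
mentioned above are the url, title and body (plus an optional template
key) of an org-protocol://capture URI. The following Python sketch only
shows how such a URI is assembled; it is not code from the extension,
and the function name is invented for the example.

```python
from urllib.parse import quote, urlencode


def org_protocol_capture_uri(url, title, body="", template=None):
    """Assemble an org-protocol://capture URI from url, title and body."""
    params = {}
    if template:
        # Key of an entry in org-capture-templates, if one should be forced.
        params["template"] = template
    params.update({"url": url, "title": title, "body": body})
    # quote_via=quote percent-encodes spaces as %20 instead of "+".
    return "org-protocol://capture?" + urlencode(params, quote_via=quote)


print(org_protocol_capture_uri("https://orgmode.org/", "Org mode", "a b"))
```

Everything structured, such as a chain of frames, has to be flattened
into the body string with this scheme, which is the limitation described
above.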

On the other hand, almost all captured data is available to a native 
messaging backend 
https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/Native_messaging 
A dumb example is included in the sources. It is in Python, but you could 
use any other language. It is just streaming JSON with the message size 
sent in binary form. I have added JSON-RPC to let the native messaging 
host report errors and to avoid ambiguity in attributing a response 
to a particular request. I do not think that setting up an org-protocol 
handler is harder than adding a manifest for a native messaging backend. 
Native messaging should even be a bit safer, since with org-protocol 
some weird message could be placed behind innocent link text.
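
The framing described above ("streaming JSON with message size sent in
binary form") is, per the WebExtensions documentation, a 32-bit length
in native byte order followed by UTF-8 encoded JSON. Below is a minimal
Python sketch of reading and writing such messages. It is not the
example shipped with the extension, and the JSON-RPC method name "hello"
is purely hypothetical.

```python
import io
import json
import struct


def write_message(stream, message) -> None:
    """Write one message: 4-byte native-order length, then UTF-8 JSON."""
    data = json.dumps(message).encode("utf-8")
    stream.write(struct.pack("=I", len(data)))
    stream.write(data)
    stream.flush()


def read_message(stream):
    """Read one message; return None when the stream has ended."""
    header = stream.read(4)
    if len(header) < 4:
        return None
    (size,) = struct.unpack("=I", header)
    return json.loads(stream.read(size).decode("utf-8"))


# Round trip through an in-memory buffer instead of real stdin/stdout.
buf = io.BytesIO()
write_message(buf, {"jsonrpc": "2.0", "id": 1, "method": "hello", "params": []})
buf.seek(0)
print(read_message(buf))
```

A real host would use sys.stdin.buffer and sys.stdout.buffer as the
streams; the "id" field is what lets a response be attributed to a
particular request.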

I think it should be no problem to call emacsclient from such an 
application. Isn't that enough for customization? Do you still need raw 
HTML? Currently I am trying to avoid customization inside the extension, 
since it is harder to keep a history of settings changes in git. 
Extensions are quite isolated from the host. Also, I do not think that 
something like mustache/handlebars templates would be warmly welcomed by 
Emacs users.

>> I do not have clear vision how to use collected data for queries.
>> Certainly I want to have more human-friendly representation than BibTeX
>> entries (maybe in addition to machine-parsable data) adjacent to my notes.
> 
> So far, I found author, website name, publication year, title, and
> resource type useful. My standard capture template for links is:
> 
> * <Author> [<Website>] (<Year>) Title

I see that my current choice to prefer og:title or twitter:title for the 
heading is far from being optimal; even the head/title text is usually 
better. However, I was writing about a somewhat more detailed two- or 
three-line representation. Often I prefer a kind of "card" 
representation to a table/columns view.
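
To make the choice between title sources concrete, here is a Python
sketch that collects the three candidates mentioned above (the title
element, og:title and twitter:title) and returns them in a configurable
order of preference. The extension does this in JavaScript on the live
DOM; the class and function names here, and the default order, are
assumptions of this example.

```python
from html.parser import HTMLParser


class TitleCollector(HTMLParser):
    """Collect <title> text plus og:title / twitter:title meta values."""

    WANTED = ("og:title", "twitter:title")

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title_parts = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            key = attr.get("property") or attr.get("name")
            content = attr.get("content")
            if key in self.WANTED and content:
                # Keep the first occurrence if a key is repeated.
                self.meta.setdefault(key, content.strip())

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


def title_candidates(html, order=("title", "og:title", "twitter:title")):
    """Return (source, text) pairs for the titles found, in preference order."""
    parser = TitleCollector()
    parser.feed(html)
    parser.close()
    found = dict(parser.meta)
    text = " ".join("".join(parser.title_parts).split())
    if text:
        found["title"] = text
    return [(key, found[key]) for key in order if key in found]


# Hypothetical page used only for illustration.
page = """<html><head>
<title>Example page - Example Site</title>
<meta property="og:title" content="Example page">
<meta name="twitter:title" content="Example page">
</head><body><p>text</p></body></html>"""
for key, value in title_candidates(page):
    print(f"{key}: {value}")
```

Keeping all candidates rather than only the preferred one leaves the
final choice to the capture template or to the user.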

Concerning queries, see below.

> Completely agree here. That's why I directly reuse the current DOM state
> from qutebrowser in my own setup. However, extension for qutebrowser was
> easy to write for me as it can be simply a bash script. I know nothing
> about Firefox/Chrome extensions and I do not know javascript.

It is too easy to underquote some variable reference in bash and get 
something unexpected executed. Almost any other scripting language is 
safer in this sense.

>>   From my point of view, you should be happy with any of projects you
>> mentioned below. Are all of them have some problems critical for you?
> 
> They are all javascript, except one (unicontent), which can be easily
> replaced with built-in Elisp libraries (dom.el).

I mean running them using a very thin wrapper that generates metadata in 
a form easily parsable in Emacs.

> Another idea would be providing a callback from elisp to browser (I am
> not sure if it is possible). org-capture-ref has a mechanism to check if
> the link was captured in the past. If the link is already captured, the
> information about the link location and todo-state can be messaged back
> to the browser.
> 
> Example message (only qutebrowser is supported now):
> 
> Bookmark not saved!
> Already captured into org-capture-ref:TODO maxnikulin [Github] linkremark: LinkRemark - page or link notes with context

Why should it be a callback from elisp? From my point of view, it is the 
extension that should initiate a query whether a particular URL has 
already been captured. I have realized that in my drafts I even have a 
native messaging backend that can filter matching URLs from a text file. 
It was intended to autocomplete URLs typed in the browser location bar, 
using a text file as a kind of bookmark storage, but it could be adapted 
for checks similar to yours.

Though it is better to get a link to the heading with the URL (e.g. 
CUSTOM_ID), so additional links or quotes could be added and linked to 
the "main" entry. I have not tested whether such a query using 
emacsclient is fast enough. I have seen a thread on the Language Server 
Protocol but have not checked whether that protocol supports such queries.

I especially like the idea of references to existing headings because it 
avoids cluttering context menus with options to capture a link 
without page metadata in addition to the existing ones.




* LinkRemark Firefox extension approved for addons.mozilla.org
  2020-12-26 13:49     ` Ihor Radchenko
  2020-12-27 12:18       ` Maxim Nikulin
@ 2021-11-18 17:01       ` Max Nikulin
  1 sibling, 0 replies; 20+ messages in thread
From: Max Nikulin @ 2021-11-18 17:01 UTC (permalink / raw)
  To: emacs-orgmode

A year ago I announced the LinkRemark browser extension, which saves 
metadata of web pages as notes in Org mode. A new version is available 
in the Firefox catalog (it is not published to the Chrome store; the 
only option there is still to load an unpacked extension):

https://addons.mozilla.org/firefox/addon/linkremark/

Capture is not ideal and notes require edits. A subset of schema.org 
microdata embedded into HTML markup is now extracted. I addressed some 
issues from comments on the first release. Example:

#+begin_src org
   ,* Link: Karl Voit: UOMF: Managing web bookmarks with Org Mode
   :PROPERTIES:
   :DATE_ADDED: [2021-09-28 Tue 12:15]
   :END:

   - Link URL :: [[https://karl-voit.at/2014/08/10/bookmarks-with-orgmode/]]
   - Link text :: Karl Voit: UOMF: Managing web bookmarks with Org Mode

   ,#+begin_quote
   author: Karl Voit
   published: [2014-08-10 Sun]
   ,#+end_quote

   On the page

   ,** Adam Porter — org-almanac
   :PROPERTIES:
   :DATE_ADDED: [2021-09-28 Tue 12:15]
   :LAST_MODIFIED: [2021-09-18 Tue 01:23]​ 09/18/2021 01:23:46
   :END:

   - URL :: [[https://alphapapa.github.io/org-almanac/]]
   - title :: org-almanac
   - author :: Adam Porter
   - referrer :: [[https://www.google.com/]]
#+end_src


On 26/12/2020 20:49, Ihor Radchenko wrote:
> 
> Another idea would be providing a callback from elisp to browser (I am
> not sure if it is possible). org-capture-ref has a mechanism to check if
> the link was captured in the past. If the link is already captured, the
> information about the link location and todo-state can be messaged back
> to the browser.

I looked into the org-capture-ref code and stole the idea of using an 
external tool to search in Org files. LinkRemark can now ask a native 
messaging helper application whether URLs are already known. Proof of 
concept: https://github.com/maxnikulin/burl
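
The idea of checking whether URLs are already known can be sketched in
a few lines of Python. This is not how burl is implemented; it is a
naive illustration that scans Org text line by line, remembers the most
recent heading, and reports where each URL occurs. All names and the
sample notes are hypothetical.

```python
import re

HEADING = re.compile(r"^\*+ (.*)$")


def find_known_urls(org_text, urls):
    """Map each URL found in org_text to a list of (line_number, heading) hits."""
    hits = {}
    heading = None
    for number, line in enumerate(org_text.splitlines(), start=1):
        match = HEADING.match(line)
        if match:
            heading = match.group(1)
        for url in urls:
            # Plain substring test: a URL that is a prefix of a longer
            # one on the line also matches, which a real tool should avoid.
            if url in line:
                hits.setdefault(url, []).append((number, heading))
    return hits


notes = """\
* TODO Rice cookers
- URL :: [[https://example.org/cookers]]
* Other
Plain text
"""
print(find_known_urls(notes, ["https://example.org/cookers",
                              "https://example.org/missing"]))
```

Returning the heading along with the line number is what would allow the
browser side to show where a link was captured, or to link new notes to
the existing entry.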

> For the scientific publications, the key point is usually getting
> DOI/ISBN.

At least apparent DOIs and links should be recognized now, though no 
additional actions are taken.

> Also, do you pass any of the parsed metadata to org-protocol? If you do,
> it would be trivial to get it into capture templates on Elisp (and
> org-capture-ref) side.

Actually, it was possible even a year ago to specify the "object" format 
instead of "org" and to get the extracted metadata in JSON format wrapped 
into an org-protocol URI. I cannot say that the structure of the data has 
stabilized and that I will not change it again.

> For example, I tweaked title of captured
> github issues to include "issue#", which helps to distinguish such pages
> from individual repo bookmarks.

In the particular case of GitHub it is better to fetch the raw data:

curl -H 'Accept: application/vnd.github.v3+json' \
  'https://api.github.com/repos/yantar92/org-capture-ref/issues/2'
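
A sketch of how the JSON returned by that request could be turned into
a heading that includes "issue#", as suggested in the quoted message.
It assumes the response has the `number`, `title` and `repository_url`
fields of the GitHub issues API; the heading layout and the function
name `issue_heading` are made up for illustration, and the sample title
in the usage note is a placeholder, not the real issue title.

```python
def issue_heading(issue):
    """Format 'owner/repo issue#N: title' from a GitHub issue JSON object."""
    # repository_url looks like https://api.github.com/repos/OWNER/REPO
    repo = issue["repository_url"].split("/repos/", 1)[1]
    return "%s issue#%d: %s" % (repo, issue["number"], issue["title"])
```

For example, a response with `"number": 2`, a placeholder title
`"Example title"` and the repository URL from the curl command above
would give "yantar92/org-capture-ref issue#2: Example title".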

On 26/12/2020 05:11, Samuel Wales wrote:
> for
> example, you could have sets of tabs, selected by right click in
> firefox, to save to a bunch of org entries.  then you could load that
> particular set of entries into firefox whenever you want.  and you
> could keep notes on each page and move the entries wherever you want.
> this would be useful for such things as "i am researching rice
> cookers; these are my tabs, but i don't want them cluttering firefox
> and i want them with my org notes and to make notes on them and will
> re-load them into firefox when i want to revisit"

I implemented capture of a highlighted tab group for Firefox. No 
ready-to-use solution is provided to restore it; the result is just a 
tree of Org headings.

> now if i can only debug the extra-blank-lines-in-capture problem.

I hope the hack that avoids excessive newlines in selected text will not 
become a source of problems. Clipboard managers might be trouble, 
though.




* Re: Yet another browser extension for capturing notes - LinkRemark
  2020-12-26  9:16     ` Maxim Nikulin
@ 2022-01-17  2:29       ` Samuel Wales
  2022-01-18  1:03         ` Samuel Wales
  2022-01-18 10:34         ` Max Nikulin
  0 siblings, 2 replies; 20+ messages in thread
From: Samuel Wales @ 2022-01-17  2:29 UTC (permalink / raw)
  To: Maxim Nikulin; +Cc: emacs-orgmode

more below.

On 12/26/20, Maxim Nikulin <manikulin@gmail.com> wrote:
> On 26/12/2020, Samuel Wales wrote:
>
>> [... i can imagine great things possible with such extensions. for
>> example, you could have sets of tabs, selected by right click in
>> firefox, to save to a bunch of org entries.  then you could load that
>> particular set of entries into firefox whenever you want.  and you
>> could keep notes on each page and move the entries wherever you want.
>> this would be useful for such things as "i am researching rice
>> cookers; these are my tabs, but i don't want them cluttering firefox
>> and i want them with my org notes and to make notes on them and will
>> re-load them into firefox when i want to revisit".]
>
> It should be possible since some tab management extension were used in
> mozilla to evaluate if webextensions are mature enough and if support of
> XUL add-ons could be dropped. On the other hand do not expect such
> feature soon. A kind of semi-blocker is absence of automatic tests to
> run before every release, and it will require a lot of time.

interesting.  i do note tab selection features in recent firefox-esr
and i was just assuming something like that.

>
> In the meanwhile, have you looked at the following comment?
> https://github.com/sprig/org-capture-extension/issues/12#issuecomment-323569334
> alphapapa commented Aug 20, 2017
>
>> You can do this with the "Copy all URLs" extension (ID:
>> djdmadneanknadilpjiknlnanaolmbfk). Use this as the custom format (note
>> the linebreak):
>>
>> [[$url][$title]]
>
> I am almost sure that similar extension should exist for Firefox as well.

i think this is for copying all tabs, not selected ones.  so a
workaround for my idea would be to have a fresh firefox window
dedicated to rice cookers and then save them all.  but it does not
save over existing canonical location for each url or similar.

which would be needed for my idea so as to not have duplicates etc.

also i think this extension does not exist any more in firefox.  i
used to use it for storing as org links.  but it was just to store
links in case firefox screwed up session restore.  which it usually
does.  for that purpose, i use one that does not save as orglinks.

>
> Some points should be clarified in my opinion
>
> - Do you expect that metadata should be captured in addition to URLs and
> titles? Browsers can unload some tabs making page content unavailable.

i wouldn't need this i think.  i'd want page title, just as in
ordinary org links, but in principle that can be assumed from the
existing org entry if exists, and if not exists and you are capturing,
the page is already loaded.  so i think not a metadata issue.

> - Are you going to capture reviews of "rice cookers" that could be
> considered as ordinary pages or you are going to save items from online
> stores? I do not current state of affairs but I have heard about some
> activity for special metadata that allows search engines to display
> products in a special way. Could you inspect head element of pages in
> your favorite stores contains desired metadata using page source or
> inspect element tools?

my web knowledge is too limited to understand your question, but i am
just hoping it would capture ordinary amazon links, review sites, and
so on.  and i never use js if i can avoid it so i'm expecting pretty
normal website stuff i think.  so i'm flexible.

[of course, amazon per se links might need cleaning or uniquification
of some type for finding the version in org maybe, or maybe for
improving privacy by removing amazon's data about you in the url, but
that might not even need any special amazon link knowledge.
[fanciness might look for the amazon id, if implementer willing or
something exists for that.]]

> - Should tab group be captured as single Org heading or it should be a
> tree with a section per tab? I am not sure that capture will have no
> problem with subtree. Certainly Emacs interface for org-protocol +
> capture are not suitable for sending each tab as a separate link.
> Another option is to create nested lists, anyway org formatter in my
> extension need improvements. Are you expecting headings subtree or
> nested lists?

the status quo is that there is nothing, so using lists would be a
huge improvement and work great.  but fanciness by using org sections
if poss [i assume this means header and metadata and content and maybe
descendents] could be more flexible.

>
>> [now if i can only debug the extra-blank-lines-in-capture problem.]
>
> Fully agree that it is really annoying. It is among high priority items
> in my TODO list.

we might be talking about different things.  i am referring to
something in org that adds blank lines when my particular org capture
templates are used.  i think it is outside all of the hooks that are
available for org capture so not fixable using those.

recent org might fix it dunno.  i am limited in computer use so i have
not tried to debug it further.  just delete the extra lines.

>
> Accidentally I pressed =C-x C-o= and discovered
> [[help:delete-blank-lines]] innerText is not exactly the same as
> selection range toString but the rules could work in a similar way.
> Table rows, floating and absolutely positioned elements require
> newlines. Such elements are often abused by designers.
> https://html.spec.whatwg.org/multipage/dom.html#dom-innertext

web stuff is above my knowledge and so i think maybe different things
we are talking about.

>
>
>

you would still keep notes on each thing and org metadata.

then you load all links in an org subtree or list, or all with a
:firefox: tag, into firefox.  one question is making sure there is a
canonical place for each topic.  [rice cookers, a research topic,
etc.]

metadata snags like you mention are best figured out by those who
understand them unlike myself and i'd be flexible.  i'd be pleased
with anything i think.  i don't need metadata most of the time, just
link and page title.  this is all just an idea for cogitation.

tldr you'd have a set of canonical tabs that is in org and sometimes
in firefox as you please.  you can keep org notes on the org links and
they won't be overwritten when you save from firefox.  you also won't
create duplicates when you do so.



* Re: Yet another browser extension for capturing notes - LinkRemark
  2022-01-17  2:29       ` Samuel Wales
@ 2022-01-18  1:03         ` Samuel Wales
  2022-01-18  5:43           ` Samuel Banya
  2022-01-18 10:34         ` Max Nikulin
  1 sibling, 1 reply; 20+ messages in thread
From: Samuel Wales @ 2022-01-18  1:03 UTC (permalink / raw)
  To: Maxim Nikulin; +Cc: emacs-orgmode

my amazon example was silly and confusing.  the point isn't shopping
for something; it's anything.  science papers, news outlets, nerd
blogs.



-- 
The Kafka Pandemic

A blog about science, health, human rights, and misopathy:
https://thekafkapandemic.blogspot.com



* Re: Yet another browser extension for capturing notes - LinkRemark
  2022-01-18  1:03         ` Samuel Wales
@ 2022-01-18  5:43           ` Samuel Banya
  2022-01-18 10:57             ` Max Nikulin
  0 siblings, 1 reply; 20+ messages in thread
From: Samuel Banya @ 2022-01-18  5:43 UTC (permalink / raw)
  To: Charles Berry

[-- Attachment #1: Type: text/plain, Size: 7548 bytes --]

Not sure if it helps, but you could also use the w3m browser's mentality of just keeping an HTML file that contains all of your bookmarks. I'm sure there's probably even a way to use 'eww' in the same fashion too.

Maybe even making your own personal wiki of a webring of sorts would help too.

I don't personally bookmark anything anymore but just store links on a webring on my site.

Hope this helps.

Sam

On Mon, Jan 17, 2022, at 8:03 PM, Samuel Wales wrote:
> my amazon example was silly and confusing.  the point isn't shopping
> for something; it's anything.  science papers, news outlets, nerd
> blogs.

[-- Attachment #2: Type: text/html, Size: 10861 bytes --]


* Re: Yet another browser extension for capturing notes - LinkRemark
  2022-01-17  2:29       ` Samuel Wales
  2022-01-18  1:03         ` Samuel Wales
@ 2022-01-18 10:34         ` Max Nikulin
  2022-01-19  3:28           ` Ihor Radchenko
  1 sibling, 1 reply; 20+ messages in thread
From: Max Nikulin @ 2022-01-18 10:34 UTC (permalink / raw)
  To: emacs-orgmode

Samuel,

since a significant part of your message is dedicated to capturing tab 
groups, I should ask whether you have tried the version of the LinkRemark 
add-on currently available from the browser extension catalogues:

- https://addons.mozilla.org/firefox/addon/linkremark/
- https://chrome.google.com/webstore/detail/mgmcoaemjnaehlliifkgljdnbpedihoe

Tab groups and selected (highlighted) tabs are supported in Chromium. 
Firefox has no built-in tab groups, but it is still possible to capture 
selected tabs.

Your feature requests:
- Clean up URLs. I have such an idea, but I have not started 
implementing it. Maybe URLs should be sent to another extension that 
excels at this task. If you have some comments on which add-ons are 
great and which work rather poorly, the suggestions may be helpful.
- Deduplicate URLs from tab groups. It requires some work to merge 
selected text, links, or nested frames from each tab. The complication 
is that some sites use internal navigation not reflected in the 
location, so the same URL may have completely different content. Some 
sites use their top pages as canonical URLs, so some measures against 
false positives are required. Currently the extension can check whether 
a URL is already present in Org files. It requires the 
https://github.com/maxnikulin/burl helper application, which is at the 
proof-of-concept stage.
- Restore set of tabs. It requires some elisp code to iterate over 
subtree and to pick first "Link URL" or "URL" from description lists.
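
The last item asks for elisp; as a language-neutral sketch of the same
logic, the following Python picks the first "Link URL" or "URL"
description-list item under each heading and could then open the
results in the default browser. It assumes headings start in the first
column and that the list items look like the capture examples in this
thread; `first_urls` and `open_all` are hypothetical names.

```python
import re
import webbrowser

HEADING_RE = re.compile(r"\*+ ")
# Description list item such as "- URL :: [[https://example.org/]]"
ITEM_RE = re.compile(r"^\s*- (?:Link URL|URL) :: \[\[([^\]\[]+)\]")


def first_urls(org_text):
    """Return the first 'Link URL' or 'URL' item found under each heading."""
    urls = []
    done = True  # ignore list items that appear before the first heading
    for line in org_text.splitlines():
        if HEADING_RE.match(line):
            done = False  # a new heading: look for its first URL item
            continue
        match = ITEM_RE.match(line)
        if match and not done:
            urls.append(match.group(1))
            done = True
    return urls


def open_all(urls):
    """Open every URL in a new tab of the default browser."""
    for url in urls:
        webbrowser.open_new_tab(url)
```

Items such as "referrer" are skipped deliberately, so only the captured
page or link target of each heading is restored.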

I am currently considering some interface changes, since sometimes I 
just want to check whether a URL is already in my notes. I would prefer 
to avoid adding more context menu items.

Additional details are inline.

On 17/01/2022 09:29, Samuel Wales wrote:
> On 12/26/20, Maxim Nikulin <manikulin@gmail.com> wrote:
>> On 26/12/2020, Samuel Wales wrote:
>>
>>> [... i can imagine great things possible with such extensions. for
>>> example, you could have sets of tabs, selected by right click in
>>> firefox, to save to a bunch of org entries.  then you could load that
>>> particular set of entries into firefox whenever you want.
> 
> interesting.  i do note tab selection features in recent firefox-esr
> and i was just assuming something like that.

There is no ready-to-use recipe for loading saved tabs, but saving 
should work to some extent.

>>> You can do this with the "Copy all URLs" extension (ID:
>>> djdmadneanknadilpjiknlnanaolmbfk). Use this as the custom format (note
>>> the linebreak):
>>
>> I am almost sure that similar extension should exist for Firefox as well.
> 
> i think this is for copying all tabs, not selected ones.
...
> also i think this extension does not exist any more in firefox.

I have not tried them:
- https://github.com/piroor/copy-selected-tabs-to-clipboard/
- https://github.com/yorkxin/copy-as-markdown

>> - Are you going to capture reviews of "rice cookers" that could be
>> considered as ordinary pages or you are going to save items from online
>> stores?
...
>> Could you inspect head element of pages in
>> your favorite stores contains desired metadata using page source or
>> inspect element tools?
> 
> my web knowledge is too limited to understand your question, but i am
> just hoping it would capture ordinary amazon links, review sites, and
> so on.

It seems that the quality of metadata in marketplaces like Amazon 
depends heavily on the particular seller. The extension attempts to 
treat some data specially if there are microdata or JSON-LD with the 
schema.org Product type. If I remember correctly, Amazon does not 
expose a canonical link explicitly.
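
To make the JSON-LD part concrete, here is a minimal sketch that
collects `<script type="application/ld+json">` blocks from a page and
keeps objects whose `@type` is exactly "Product". It uses only the
Python standard library and is not the extension's code; real pages may
also use `@graph` containers or a list-valued `@type`, which this
sketch ignores.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect parsed contents of application/ld+json script elements."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self._chunks = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self._chunks = []

    def handle_data(self, data):
        if self._inside:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._inside:
            self._inside = False
            try:
                self.items.append(json.loads("".join(self._chunks)))
            except json.JSONDecodeError:
                pass  # malformed metadata is common; skip it silently


def find_products(html):
    """Return JSON-LD objects of type Product found in the HTML text."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    products = []
    for item in collector.items:
        nodes = item if isinstance(item, list) else [item]
        for node in nodes:
            if isinstance(node, dict) and node.get("@type") == "Product":
                products.append(node)
    return products
```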

>>> [now if i can only debug the extra-blank-lines-in-capture problem.]
>>
>> Fully agree that it is really annoying. It is among high priority items
>> in my TODO list.
> 
> we might be talking about different things.  i am referring to
> something in org that adds blank lines when my particular org capture
> templates are used.

See info "(org) Template elements" 
https://orgmode.org/manual/Template-elements.html
:empty-lines, :empty-lines-after, :empty-lines-before
however I cannot say that I really understand their meaning. Actually, I 
do not mind having an empty line before the next heading once the refile 
is completed. My impression is that it depends on the number of empty 
lines at the end of the capture buffer. I usually add some comments to 
captured pages.

On 18/01/2022 08:03, Samuel Wales wrote:
 > my amazon example was silly and confusing.  the point isn't shopping
 > for something; it's anything.  science papers, news outlets, nerd
 > blogs.

Scientific papers require more work: it is necessary to make them 
available to org-cite somehow. Some nerds use quite peculiar blog 
engines and strange metadata settings, so shopping on some sites might 
work better than other cases.




* Re: Yet another browser extension for capturing notes - LinkRemark
  2022-01-18  5:43           ` Samuel Banya
@ 2022-01-18 10:57             ` Max Nikulin
  0 siblings, 0 replies; 20+ messages in thread
From: Max Nikulin @ 2022-01-18 10:57 UTC (permalink / raw)
  To: emacs-orgmode

On 18/01/2022 12:43, Samuel Banya wrote:
> Not sure if it helps, but you could also use the w3m browser's mentality 
> of just keeping an HTML file that contains all of your bookmarks. I'm 
> sure there's probably even a way to use 'eww' in the same fashion too.
> 
> Maybe even making your own personal wiki of a webring of sorts would 
> help too.
> 
> I don't personally bookmark anything anymore but just store links on a 
> webring on my site.

Actually, Samuel Wales added more details to his message posted a year 
ago. I started that thread to announce the LinkRemark browser extension 
https://github.com/maxnikulin/linkremark It was me who tried to revive 
the thread a month ago.

The idea is to store bookmarks in an Org file, and each should be more 
than just a URL and a page title. A rich "bookmark" should have more 
metadata and may have user comments.

In eww you can likely use org-store-link or org-capture directly. 
An example of a project that extracts metadata: 
https://github.com/yantar92/org-capture-ref

Isn't Org mode better than any wiki? At least in some aspects.




* Re: Yet another browser extension for capturing notes - LinkRemark
  2022-01-18 10:34         ` Max Nikulin
@ 2022-01-19  3:28           ` Ihor Radchenko
  2022-01-19  8:45             ` András Simonyi
  2022-01-20  0:23             ` Samuel Wales
  0 siblings, 2 replies; 20+ messages in thread
From: Ihor Radchenko @ 2022-01-19  3:28 UTC (permalink / raw)
  To: Max Nikulin; +Cc: emacs-orgmode

Max Nikulin <manikulin@gmail.com> writes:

> Scientific papers require more work, it is necessary to make them 
> available to org-cite somehow. Some nerds use quite peculiar blog 
> engines and strange setting of metadata. So shopping on some sites might 
> work better than other cases.

I have plans to implement something called oc-org.el. The plan is to
use ol-bibtex-compatible Org headings as a source of citations.

Best,
Ihor



* Re: Yet another browser extension for capturing notes - LinkRemark
  2022-01-19  3:28           ` Ihor Radchenko
@ 2022-01-19  8:45             ` András Simonyi
  2022-01-19 10:00               ` Ihor Radchenko
  2022-01-20  0:23             ` Samuel Wales
  1 sibling, 1 reply; 20+ messages in thread
From: András Simonyi @ 2022-01-19  8:45 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: Max Nikulin, emacs-orgmode

Dear All,

On Wed, 19 Jan 2022 at 04:24, Ihor Radchenko <yantar92@gmail.com> wrote:

> > Scientific papers require more work, it is necessary to make them
> > available to org-cite somehow. Some nerds use quite peculiar blog
> > engines and strange setting of metadata. So shopping on some sites might
> > work better than other cases.
>
> I have plans to implement something called oc-org.el The plan is
> using ol-bibtex-compatible Org headings as a source of citations.

Just wanted to note that the CSL-based export processor, oc-csl.el,
already supports this: you can add an Org file as a bibliography, cite
items described by ol-bibtex style headings and export the citations.
It'd be very nice indeed if other built-in processors supported the
format too (e.g., "basic"). As for external ones, the CSL-based
activation processor I wrote
(https://github.com/andras-simonyi/org-cite-csl-activate) also
supports it and there are plans to add support to Citar as well
(through parsebib); see the discussion at
https://github.com/bdarcus/citar/issues/397.

best wishes,
András

> Best,
> Ihor
>



* Re: Yet another browser extension for capturing notes - LinkRemark
  2022-01-19  8:45             ` András Simonyi
@ 2022-01-19 10:00               ` Ihor Radchenko
  2022-01-19 10:58                 ` András Simonyi
  0 siblings, 1 reply; 20+ messages in thread
From: Ihor Radchenko @ 2022-01-19 10:00 UTC (permalink / raw)
  To: András Simonyi; +Cc: Max Nikulin, emacs-orgmode

András Simonyi <andras.simonyi@gmail.com> writes:

> Just wanted to note that the CSL-based export processor, oc-csl.el,
> already supports this: you can add an Org file as a bibliography, cite
> items described by ol-bibtex style headings and export the citations.

Thanks for letting me know! oc-csl is tricky because it relies on an
external library, so it's hard to know what it can and cannot do.

As a side note, citeproc-el currently has poor performance on large Org
files. It is unusable for me.

> It'd be very nice indeed if other built-in processors supported the
> format too (e.g., "basic"). As for external ones, the CSL-based
> activation processor I wrote
> (https://github.com/andras-simonyi/org-cite-csl-activate) also
> supports it

Interesting. By the way, I recommend using the composition text property
instead of the display property for rendering. See prettify-symbols-mode.

Best,
Ihor



* Re: Yet another browser extension for capturing notes - LinkRemark
  2022-01-19 10:00               ` Ihor Radchenko
@ 2022-01-19 10:58                 ` András Simonyi
  2022-01-19 11:42                   ` Ihor Radchenko
  0 siblings, 1 reply; 20+ messages in thread
From: András Simonyi @ 2022-01-19 10:58 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: Max Nikulin, emacs-orgmode

Dear All,

On Wed, 19 Jan 2022 at 10:56, Ihor Radchenko <yantar92@gmail.com> wrote:

> As a side note, citeproc-el currently has poor performance on large org
> files. It is unusable for me.

Could you elaborate? In theory, oc-csl.el's performance should depend
only on the number of citations (as opposed to the size of the Org
document) and be in the same ballpark as pandoc's citeproc. It'd be
interesting to know the details since I plan to work on speeding up
citeproc-el's rendering, although you are the first one to actually
complain :-).

best wishes,
András



* Re: Yet another browser extension for capturing notes - LinkRemark
  2022-01-19 10:58                 ` András Simonyi
@ 2022-01-19 11:42                   ` Ihor Radchenko
  0 siblings, 0 replies; 20+ messages in thread
From: Ihor Radchenko @ 2022-01-19 11:42 UTC (permalink / raw)
  To: András Simonyi; +Cc: Max Nikulin, emacs-orgmode

András Simonyi <andras.simonyi@gmail.com> writes:

>> As a side note, citeproc-el currently has poor performance on large org
>> files. It is unusable for me.
>
> Could you elaborate? In theory, oc-csl.el's performance should depend
> only on the number of citations (as opposed to the size of the Org
> document) and be in the same ballpark as pandoc's citeproc. It'd be
> interesting to know the details since I plan to work on speeding up
> citeproc-el's rendering, although you are the first one to actually
> complain :-).

It is no mystery why I complain: my "bibliography" file is 15 MB.

The oc-csl.el performance depends on the size of the Org document during
the caching stage. Moreover, caching is repeated every time I change the
Org document, every time I open the file using oc-csl.el, and every time
I revert the file using oc-csl.el.

I think that the easiest solution for citeproc would be to stop calling
org-bibtex-headline on every single headline and instead use a regexp
search for the "BTYPE" property.
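
To make the idea concrete, here is a minimal sketch in Python (for
illustration only; a real fix would be Emacs Lisp inside citeproc-el or
oc-csl.el). It searches the text for BTYPE property lines and only then
walks back to the enclosing headline, so headings without BTYPE are
never parsed. The names find_bibtex_headings and heading_before are
hypothetical, and both regexps are simplified assumptions rather than
Org's actual syntax definitions.

```python
import re

# Simplified assumption: a headline is one or more stars, a space, then the title.
HEADLINE_RE = re.compile(r"\*+ +(.*)")
# A property line such as "   :BTYPE: article" (case-insensitive, one per line).
BTYPE_RE = re.compile(r"^[ \t]*:BTYPE:[ \t]+(\S+)[ \t]*$",
                      re.IGNORECASE | re.MULTILINE)


def heading_before(org_text, pos):
    """Walk backwards line by line from pos; return the nearest headline title."""
    end = org_text.rfind("\n", 0, pos)  # newline that ends the previous line
    while end != -1:
        start = org_text.rfind("\n", 0, end) + 1  # 0 when on the first line
        found = HEADLINE_RE.match(org_text, start, end)
        if found:
            return found.group(1).strip()
        end = start - 1  # step to the line above; -1 ends the loop
    return None


def find_bibtex_headings(org_text):
    """Return (headline title, entry type) pairs for headings that carry BTYPE."""
    results = []
    for hit in BTYPE_RE.finditer(org_text):
        title = heading_before(org_text, hit.start())
        if title is not None:
            results.append((title, hit.group(1)))
    return results
```

A plain regexp can also match a :BTYPE: line inside an example block,
so a real implementation would still need to verify each hit before
treating the heading as a bibliography entry.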

Best,
Ihor



* Re: Yet another browser extension for capturing notes - LinkRemark
  2022-01-19  3:28           ` Ihor Radchenko
  2022-01-19  8:45             ` András Simonyi
@ 2022-01-20  0:23             ` Samuel Wales
  1 sibling, 0 replies; 20+ messages in thread
From: Samuel Wales @ 2022-01-20  0:23 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: Max Nikulin, emacs-orgmode

just a quick fwiw before i try to reply to the longer message by max.
my own suggestion is modest for metadata, [even for science papers and
things with funny web construction].  just title like org-capture
extension.  no need to cite in my case.

my needs for saving and restoring, however, are more fancy.  something
like achieving a 1:1 mapping from firefox selected tabs, or a tree
style tabs extension tree, to their counterparts in org, even when
those counterparts have notes and such.  this might include marking
the org version as deleted/doneified merely by closing the tab in
firefox.  vice-versa would be straightforward.  so it's really a "get
organized and don't get confused by having both firefox and org" kinda
thing.



Thread overview: 20+ messages
2020-12-25 12:44 Yet another browser extension for capturing notes - LinkRemark Maxim Nikulin
2020-12-25 14:19 ` Ihor Radchenko
2020-12-26 11:49   ` Maxim Nikulin
2020-12-26 13:49     ` Ihor Radchenko
2020-12-27 12:18       ` Maxim Nikulin
2021-11-18 17:01       ` LinkRemark Firefox extension approved for addons.mozilla.org Max Nikulin
2020-12-25 14:26 ` Yet another browser extension for capturing notes - LinkRemark Russell Adams
2020-12-25 22:11   ` Samuel Wales
2020-12-26  9:16     ` Maxim Nikulin
2022-01-17  2:29       ` Samuel Wales
2022-01-18  1:03         ` Samuel Wales
2022-01-18  5:43           ` Samuel Banya
2022-01-18 10:57             ` Max Nikulin
2022-01-18 10:34         ` Max Nikulin
2022-01-19  3:28           ` Ihor Radchenko
2022-01-19  8:45             ` András Simonyi
2022-01-19 10:00               ` Ihor Radchenko
2022-01-19 10:58                 ` András Simonyi
2022-01-19 11:42                   ` Ihor Radchenko
2022-01-20  0:23             ` Samuel Wales
