emacs-orgmode@gnu.org archives
* stability of toc links
@ 2020-12-08 23:28 Samuel Wales
2020-12-08 23:30  Samuel Wales
 (3 more replies)
0 siblings, 4 replies; 50+ messages in thread
From: Samuel Wales @ 2020-12-08 23:28 UTC (permalink / raw)
To: emacs-orgmode

when you link to a section using toc, you get a link like

https://thekafkapandemic.blogspot.com/2020/02/crimes-against-humanity_3.html#org080f0ab

will these links break if somebody copies them and pastes them
elsewhere?  what if you add a section?

there doesn't seem to be a perfect solution, short of adding custom id
or id to everything, but perhaps a fuzzy hash of the header and
contents of the section could be used?  or a strict hash of the
header?  is anything like this being done?  just curious.

--
The Kafka Pandemic

https://thekafkapandemic.blogspot.com/2013/10/why-some-diseases-are-wronged.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: stability of toc links
2020-12-08 23:28 stability of toc links Samuel Wales
@ 2020-12-08 23:30  Samuel Wales
2020-12-09  1:39    Tom Gillespie
2020-12-09  2:48  TEC
 (2 subsequent siblings)
From: Samuel Wales @ 2020-12-08 23:30 UTC (permalink / raw)
To: emacs-orgmode

just thought of a better, still bad, solution.

you could use the header text itself.

then if the link breaks, the user can at least go to the document and
manually look for something similar.

On 12/8/20, Samuel Wales <samologist@gmail.com> wrote:
> when you link to a section using toc, you get a link like
>
>
> https://thekafkapandemic.blogspot.com/2020/02/crimes-against-humanity_3.html#org080f0ab
>
> will these links break if somebody copies them and pastes them
> elsewhere?  what if you add a section?
>
> there doesn't seem to be a perfect solution, short of adding custom id
> or id to everything, but perhaps a fuzzy hash of the header and
> contents of the section could be used?  or a strict hash of the
> header?  is anything like this being done?  just curious.
>
> --
> The Kafka Pandemic
>
> Please learn what misopathy is.
> https://thekafkapandemic.blogspot.com/2013/10/why-some-diseases-are-wronged.html
>

--
The Kafka Pandemic

https://thekafkapandemic.blogspot.com/2013/10/why-some-diseases-are-wronged.html

* Re: stability of toc links
2020-12-08 23:30  Samuel Wales
@ 2020-12-09  1:39    Tom Gillespie
2020-12-12 21:51      TRS-80
From: Tom Gillespie @ 2020-12-09  1:39 UTC (permalink / raw)
To: Samuel Wales; +Cc: emacs-orgmode

It sounds like you are looking for the CUSTOM_ID property. I am not
sure whether there is a way to generate ids matching headlines within org
itself, but there is
https://github.com/alphapapa/unpackaged.el#export-to-html-with-useful-anchors.
Best!
Tom

* Re: stability of toc links
2020-12-08 23:28 stability of toc links Samuel Wales
2020-12-08 23:30  Samuel Wales
@ 2020-12-09  2:48  TEC
2020-12-09  8:45    Diego Zamboni
(2 more replies)
2020-12-14 10:46  Dominique Dumont
2021-04-18 10:32  Nicolas Goaziou
3 siblings, 3 replies; 50+ messages in thread
From: TEC @ 2020-12-09  2:48 UTC (permalink / raw)
To: emacs-orgmode

Hi Sam, link stability is a concern I've had too. I currently have a fix
(or at the very least, an improvement) for this in my config where I
overwrite org-export-get-reference. (see:

I raised this on the list a while ago ---
https://orgmode.org/list/E1jxAjq-0004Dk-LH@lists.gnu.org/ but there
didn't seem to be much interest.

All the best,
Timothy

Samuel Wales <samologist@gmail.com> writes:

> when you link to a section using toc, you get a link like
>
>   https://thekafkapandemic.blogspot.com/2020/02/crimes-against-humanity_3.html#org080f0ab
>
> will these links break if somebody copies them and pastes them
> elsewhere?  what if you add a section?
>
> there doesn't seem to be a perfect solution, short of adding custom id
> or id to everything, but perhaps a fuzzy hash of the header and
> contents of the section could be used?  or a strict hash of the
> header?  is anything like this being done?  just curious.

* Re: stability of toc links
2020-12-09  2:48  TEC
@ 2020-12-09  8:45    Diego Zamboni
2020-12-09  9:15    Carsten Dominik
2021-04-18 21:02    Samuel Wales
2 siblings, 0 replies; 50+ messages in thread
From: Diego Zamboni @ 2020-12-09  8:45 UTC (permalink / raw)
To: TEC; +Cc: Org-mode

In case it's useful, I have put together (just last week) some config to
help in creating and using human-readable CUSTOM_IDs in conjunction with
counsel-org-link, but which could just as easily be used to apply the IDs
to every heading in the current document:

Now I have to look at TEC's ID-generation code, seems much nicer :)

--Diego

On Wed, Dec 9, 2020 at 3:54 AM TEC <tecosaur@gmail.com> wrote:

>
> Hi Sam, link stability is a concern I've had too. I currently have a fix
> (or at the very least, an improvement) for this in my config where I
> overwrite org-export-get-reference. (see:
> ).
>
> I raised this on the list a while ago ---
> https://orgmode.org/list/E1jxAjq-0004Dk-LH@lists.gnu.org/ but there
> didn't seem to be much interest.
>
> All the best,
> Timothy
>
> Samuel Wales <samologist@gmail.com> writes:
>
> > when you link to a section using toc, you get a link like
> >
> >
> https://thekafkapandemic.blogspot.com/2020/02/crimes-against-humanity_3.html#org080f0ab
> >
> > will these links break if somebody copies them and pastes them
> > elsewhere?  what if you add a section?
> >
> > there doesn't seem to be a perfect solution, short of adding custom id
> > or id to everything, but perhaps a fuzzy hash of the header and
> > contents of the section could be used?  or a strict hash of the
> > header?  is anything like this being done?  just curious.
>
>
>

* Re: stability of toc links
2020-12-09  2:48  TEC
2020-12-09  8:45    Diego Zamboni
@ 2020-12-09  9:15    Carsten Dominik
2020-12-09 21:25      Samuel Wales
2021-04-18 21:02    Samuel Wales
From: Carsten Dominik @ 2020-12-09  9:15 UTC (permalink / raw)
To: TEC; +Cc: org-mode list

I think we should merge this code into Org.

Kind regards

Carsten

On Wed, Dec 9, 2020 at 3:54 AM TEC <tecosaur@gmail.com> wrote:

>
> Hi Sam, link stability is a concern I've had too. I currently have a fix
> (or at the very least, an improvement) for this in my config where I
> overwrite org-export-get-reference. (see:
> ).
>
> I raised this on the list a while ago ---
> https://orgmode.org/list/E1jxAjq-0004Dk-LH@lists.gnu.org/ but there
> didn't seem to be much interest.
>
> All the best,
> Timothy
>
> Samuel Wales <samologist@gmail.com> writes:
>
> > when you link to a section using toc, you get a link like
> >
> >
> https://thekafkapandemic.blogspot.com/2020/02/crimes-against-humanity_3.html#org080f0ab
> >
> > will these links break if somebody copies them and pastes them
> > elsewhere?  what if you add a section?
> >
> > there doesn't seem to be a perfect solution, short of adding custom id
> > or id to everything, but perhaps a fuzzy hash of the header and
> > contents of the section could be used?  or a strict hash of the
> > header?  is anything like this being done?  just curious.
>
>
>

* Re: stability of toc links
2020-12-09  9:15    Carsten Dominik
@ 2020-12-09 21:25      Samuel Wales
2020-12-10  9:55        Carsten Dominik
From: Samuel Wales @ 2020-12-09 21:25 UTC (permalink / raw)
To: Carsten Dominik; +Cc: org-mode list, TEC

just so everybody is on the same page, i think carsten is talking
about tec's code that generates html id's that are then used in urls?

imo great idea.

On 12/9/20, Carsten Dominik <dominik@uva.nl> wrote:
> I think we should merge this code into Org.
>
> Kind regards
>
> Carsten
>
> On Wed, Dec 9, 2020 at 3:54 AM TEC <tecosaur@gmail.com> wrote:
>
>>
>> Hi Sam, link stability is a concern I've had too. I currently have a fix
>> (or at the very least, an improvement) for this in my config where I
>> overwrite org-export-get-reference. (see:
>> ).
>>
>> I raised this on the list a while ago ---
>> https://orgmode.org/list/E1jxAjq-0004Dk-LH@lists.gnu.org/ but there
>> didn't seem to be much interest.
>>
>> All the best,
>> Timothy
>>
>> Samuel Wales <samologist@gmail.com> writes:
>>
>> > when you link to a section using toc, you get a link like
>> >
>> >
>> https://thekafkapandemic.blogspot.com/2020/02/crimes-against-humanity_3.html#org080f0ab
>> >
>> > will these links break if somebody copies them and pastes them
>> > elsewhere?  what if you add a section?
>> >
>> > there doesn't seem to be a perfect solution, short of adding custom id
>> > or id to everything, but perhaps a fuzzy hash of the header and
>> > contents of the section could be used?  or a strict hash of the
>> > header?  is anything like this being done?  just curious.
>>
>>
>>
>

--
The Kafka Pandemic

https://thekafkapandemic.blogspot.com/2013/10/why-some-diseases-are-wronged.html

* Re: stability of toc links
2020-12-09 21:25      Samuel Wales
@ 2020-12-10  9:55        Carsten Dominik
2020-12-10 12:49          TEC
From: Carsten Dominik @ 2020-12-10  9:55 UTC (permalink / raw)
To: Samuel Wales; +Cc: org-mode list, TEC

On Wed, Dec 9, 2020 at 10:25 PM Samuel Wales <samologist@gmail.com> wrote:

> just so everybody is on the same page, i think carsten is talking
> about tec's code that generates html id's that are then used in urls?
>

Yes, I mean this code, or something like this, to aid the automatic
creation of links that are somewhat stable.  I have been missing this very
much.

Kind regards

Carsten

>
> imo great idea.
>
>
> On 12/9/20, Carsten Dominik <dominik@uva.nl> wrote:
> > I think we should merge this code into Org.
> >
> > Kind regards
> >
> > Carsten
> >
> > On Wed, Dec 9, 2020 at 3:54 AM TEC <tecosaur@gmail.com> wrote:
> >
> >>
> >> Hi Sam, link stability is a concern I've had too. I currently have a fix
> >> (or at the very least, an improvement) for this in my config where I
> >> overwrite org-export-get-reference. (see:
> >>
> >> ).
> >>
> >> I raised this on the list a while ago ---
> >> https://orgmode.org/list/E1jxAjq-0004Dk-LH@lists.gnu.org/ but there
> >> didn't seem to be much interest.
> >>
> >> All the best,
> >> Timothy
> >>
> >> Samuel Wales <samologist@gmail.com> writes:
> >>
> >> > when you link to a section using toc, you get a link like
> >> >
> >> >
> >>
> >> >
> >> > will these links break if somebody copies them and pastes them
> >> > elsewhere?  what if you add a section?
> >> >
> >> > there doesn't seem to be a perfect solution, short of adding custom id
> >> > or id to everything, but perhaps a fuzzy hash of the header and
> >> > contents of the section could be used?  or a strict hash of the
> >> > header?  is anything like this being done?  just curious.
> >>
> >>
> >>
> >
>
>
> --
> The Kafka Pandemic
>
> Please learn what misopathy is.
>
>

* Re: stability of toc links
2020-12-10  9:55        Carsten Dominik
@ 2020-12-10 12:49          TEC
2020-12-10 14:36            TEC
From: TEC @ 2020-12-10 12:49 UTC (permalink / raw)
To: Carsten Dominik; +Cc: org-mode list

Carsten Dominik <dominik@uva.nl> writes:

> Yes, I mean this code, or something like this, to aid the automatic
> creation of links that are somewhat stable.  I have been missing this very
> much.

Hi Carsten, glad to hear that there /does/ seem to be interest in this after all :)

A few things worth saying I think:
- I'm quite happy with the idea of my code being used verbatim, with any
modifications others think are a good idea (of course)
- I have FSF assignment, and the repo is MIT licensed already. In
case it needs saying, I'm quite happy to waive any annoying licence
terms (inclusion of copyright notice is the only thing that comes to
mind) for any code that may be used in Org.
- There are a few touch ups I'll do to my code shortly

All the best,

Timothy.

* Re: stability of toc links
2020-12-10 12:49          TEC
@ 2020-12-10 14:36            TEC
2020-12-11  7:51              Carsten Dominik
From: TEC @ 2020-12-10 14:36 UTC (permalink / raw)
To: Carsten Dominik; +Cc: org-mode list

> There are a few touch ups I'll do to my code shortly

I'm pleased to say that I've improved the readability and documentation
of my code (hopefully) in
https://github.com/tecosaur/emacs-config/commit/dc873d3

I hope this may be of some help,

Timothy

* Re: stability of toc links
2020-12-10 14:36            TEC
@ 2020-12-11  7:51              Carsten Dominik
2020-12-19  6:41                Carsten Dominik
From: Carsten Dominik @ 2020-12-11  7:51 UTC (permalink / raw)
To: TEC; +Cc: org-mode list

Dear all,

let me test this a bit, and then I am going to propose a patch.

Kind regards

Carsten

On Thu, Dec 10, 2020 at 3:38 PM TEC <tecosaur@gmail.com> wrote:

>
> > There are a few touch ups I'll do to my code shortly
>
> I'm pleased to say that I've improved the readability and documentation
> of my code (hopefully) in
>
>
> I hope this may be of some help,
>
> Timothy
>

* Re: stability of toc links
2020-12-09  1:39    Tom Gillespie
@ 2020-12-12 21:51      TRS-80
2020-12-12 22:47        TRS-80
From: TRS-80 @ 2020-12-12 21:51 UTC (permalink / raw)
To: emacs-orgmode

On 2020-12-08 20:39, Tom Gillespie wrote:
> It sounds like you are looking for the CUSTOM_ID property. I am not
> sure whether there is a way to generate ids matching headlines within org
> itself, but there is
> https://github.com/alphapapa/unpackaged.el#export-to-html-with-useful-anchors.
> Best!
> Tom

I had set out to shave this particular yak just yesterday I think it
was.  I know I came across alphapapa's solution and maybe TEC's too, but
they were more complex than I could seem to get my feeble brain around
at the time.

Also, I was going for more of a deterministic result, trying to end up
with something like a Markdown style link id.  This coming up in the
course of my larger mission towards better support for exporting
README.org to Markdown (and ultimately, nicely rendered HTML) files over
at Sourcehut[0].

Finally, this operates in a totally different way than replacing some
part of Org export function(s).  My approach was simply to dynamically
assign a CUSTOM_ID property to every heading in current buffer (that did
not have one already) which would be generated according to some
deterministic method.  With the idea to then go on after that and do
whatever regular Org export you want.

Right off the bat I will say this is a very, VERY immature
implementation (literally yesterday).  And I have only done the very
lightest of testing (however it does basically work).  Therefore this is
not for consideration for inclusion into Orgmode but rather just my own
workaround in the meantime.  At best I might hope to add something
useful to the ongoing discussion (or perhaps become enlightened as to why
this is the completely wrong approach).  ;)

I would like to point out the following problems which I have not (yet)
addressed in the following functions (#1 being most glaring probably) as
they are still too new:

1. The punctuation removal regexp needs to have many more characters
added (currently only containing {!.'}).  In fact, this strikes me as a
bit hacky, I am not even sure it's the best approach.

2. This function operates only on the current buffer.

3. Many things still need to be parameterized, in particular the TODO
state is hard coded to be included in the generated id and already I am
starting to think that's a bad idea (but it depends on context I
suppose, hence thinking to make it an option).

4. If I am trying to emulate Markdown (or any other spec) I really
should study and more properly and fully implement said spec.  I have
done /absolutely no such thing/ so far, only a (quite off the cuff)
"Markdown like" implementation.

5. Naming the function with a `my-ox-' prefix does not mean that it
should necessarily be included in the ox- package, but rather that I
associate it with exporting from Org in my own mind and personal
init files.

My plan (before stumbling across this thread ;) ) was to continue to use
and polish these functions (privately) and eventually publish them on my
(relatively new) sr.ht profile[1].  But since this came up, I guess I
will go ahead and put it out there for feedback here on the mailing
list.  I still plan to eventually publish somewhere more properly with
license, where patches can be accepted, etc...  However in the
meantime...

With the above disclaimers out of the way, I present the following
function (and another simple one it depends on) in the hope they are
useful to someone.

[0] https://sourcehut.org
[1] https://sr.ht/~trs-80/

#+begin_src emacs-lisp

(defun my-major-mode-insure (mode)
  "If we are not in MAJOR-MODE, exit with error."
  (unless (string= major-mode mode)
    (user-error "Buffer not in %s, exiting" mode)))

(defun my-ox-assign-custom-ids ()
  "Assign reliable CUSTOM_ID to each heading in current buffer.

CUSTOM_ID will only be assigned if one does not exist already.

The generated CUSTOM_ID roughly[0] follows (my very basic and
limited understanding of) the Markdown spec.  In other words, it
will be generated by taking the heading text plus TODO state (so
as not to break link) and:

1. Lower case it.
2. Remove all punctuation.[1]
3. Replace spaces with hyphens.

[0] Currently, likely VERY roughly...

[1] Currently this is a bit hacky `replace-regexp-in-string'
featuring only a few common punctuation characters (right now
only exclamation point, period, and apostrophe, i.e., single
quote).  Much more will need to be added here, in fact I am not
even sure this is the best approach."
  (interactive)
  (my-major-mode-insure 'org-mode)
  (org-map-entries
   ;; quoted form, evaluated with point on each heading
   '(org-set-property
     "CUSTOM_ID"
     ;; replace space with hyphen
     (replace-regexp-in-string
      " " "-"
      ;; remove punctuation
      (replace-regexp-in-string
       "\\\!\\|\\\.\\|'" ""
       (downcase
        (substring-no-properties
         ;; NOTE: this call is missing from the archived copy of the
         ;; message; it is reconstructed from the docstring (heading
         ;; text plus TODO state, without tags or priority).
         (org-get-heading t nil t t))))))
   nil
   'file
   ;; skip function
   (lambda ()
     (when (org-entry-get nil "CUSTOM_ID")
       (point)))))

#+end_src
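
For what it is worth, problem 1 above could probably be handled with
character classes instead of a hand-written list of punctuation.  The
following is only a minimal sketch under that assumption:
`my-ox-slugify' is a hypothetical helper name, not an Org function, and
it does not claim to follow any Markdown spec.  Unlike the function
above, it also drops underscores and collapses runs of whitespace.

#+begin_src emacs-lisp
(require 'subr-x) ; for `string-trim' on older Emacs versions

(defun my-ox-slugify (title)
  "Return a Markdown-like anchor for the string TITLE.
Hypothetical helper, shown for illustration only."
  (let* ((down (downcase title))
         ;; Drop everything that is not a letter, digit, whitespace
         ;; or hyphen.
         (no-punct (replace-regexp-in-string
                    "[^[:alnum:][:space:]-]" "" down))
         ;; Remove leading and trailing whitespace.
         (trimmed (string-trim no-punct)))
    ;; Collapse each run of whitespace into a single hyphen.
    (replace-regexp-in-string "[[:space:]]+" "-" trimmed)))
#+end_src

With this, (my-ox-slugify "TODO Don't panic!") should return
"todo-dont-panic".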

* Re: stability of toc links
2020-12-12 21:51      TRS-80
@ 2020-12-12 22:47        TRS-80
0 siblings, 0 replies; 50+ messages in thread
From: TRS-80 @ 2020-12-12 22:47 UTC (permalink / raw)
To: emacs-orgmode

On 2020-12-12 16:51, TRS-80 wrote:
>   "If we are not in MAJOR-MODE, exit with error."

I noticed a small typo:

-  "If we are not in MAJOR-MODE, exit with error."
+  "If we are not in major MODE, exit with error."

Cheers,
TRS-80

* Re: stability of toc links
2020-12-08 23:28 stability of toc links Samuel Wales
2020-12-08 23:30  Samuel Wales
2020-12-09  2:48  TEC
@ 2020-12-14 10:46  Dominique Dumont
2021-04-18 10:32  Nicolas Goaziou
3 siblings, 0 replies; 50+ messages in thread
From: Dominique Dumont @ 2020-12-14 10:46 UTC (permalink / raw)
To: emacs-orgmode

On Wednesday, 9 December 2020 00:28:46 CET Samuel Wales wrote:
> when you link to a section using toc, you get a link like
>
> https://thekafkapandemic.blogspot.com/2020/02/crimes-against-humanity_3.htm
> l#org080f0ab
>
> will these links break if somebody copies them and pastes them
> elsewhere?  what if you add a section?

I have a similar problem. I write documentation for a customer in org format.
I also have to generate Markdown files that are archived in a git repo (Unlike
Github, Azure DevOps doesn't support org files).

Currently, TOC and headers lines change every time the markdown files are
regenerated, which makes git diff much bigger, which also impacts code reviews.

So stabilizing the generated toc would be very welcome.

All the best

* Re: stability of toc links
2020-12-11  7:51              Carsten Dominik
@ 2020-12-19  6:41                Carsten Dominik
From: Carsten Dominik @ 2020-12-19  6:41 UTC (permalink / raw)
To: TEC; +Cc: org-mode list

Dear all,

I am sorry, I have trouble finding the time to work on this - so if someone
else wants to look further into this, that would be great.

Carsten

On Fri, Dec 11, 2020 at 8:51 AM Carsten Dominik <dominik@uva.nl> wrote:

> Dear all,
>
> let me test this a bit, and then I am going to propose a patch.
>
> Kind regards
>
> Carsten
>
> On Thu, Dec 10, 2020 at 3:38 PM TEC <tecosaur@gmail.com> wrote:
>
>>
>> > There are a few touch ups I'll do to my code shortly
>>
>> I'm pleased to say that I've improved the readability and documentation
>> of my code (hopefully) in
>>
>>
>> I hope this may be of some help,
>>
>> Timothy
>>
>

* Re: stability of toc links
2020-12-19  6:41                Carsten Dominik
0 siblings, 0 replies; 50+ messages in thread
To: Carsten Dominik, TEC; +Cc: org-mode list

Carsten Dominik <dominik@uva.nl> writes:

> Dear all,
>
> I am sorry, I have trouble finding the time to work on this - so if someone
> else wants to look further into this, that would be great.
>
> Carsten

FYI, there is also built-in generator for bibtex ids -
bibtex-generate-autokey from bibtex.el. It does a pretty decent job
generating unique citation keys. I guess one can also reuse it to

Best,
Ihor

* Re: stability of toc links
2020-12-08 23:28 stability of toc links Samuel Wales
 (2 preceding siblings ...)
2020-12-14 10:46  Dominique Dumont
@ 2021-04-18 10:32  Nicolas Goaziou
2021-04-20  0:58    Samuel Wales
From: Nicolas Goaziou @ 2021-04-18 10:32 UTC (permalink / raw)
To: Samuel Wales; +Cc: emacs-orgmode

Hello,

Samuel Wales <samologist@gmail.com> writes:

> when you link to a section using toc, you get a link like
>
>   https://thekafkapandemic.blogspot.com/2020/02/crimes-against-humanity_3.html#org080f0ab
>
> will these links break if somebody copies them and pastes them
> elsewhere?  what if you add a section?

These links should be stable in a publishing context (i.e., you use
`org-publish' to generate the document), even if you add a section.

Regards,
--
Nicolas Goaziou

* Re: stability of toc links
2020-12-09  2:48  TEC
2020-12-09  8:45    Diego Zamboni
2020-12-09  9:15    Carsten Dominik
@ 2021-04-18 21:02    Samuel Wales
2 siblings, 0 replies; 50+ messages in thread
From: Samuel Wales @ 2021-04-18 21:02 UTC (permalink / raw)
To: TEC; +Cc: emacs-orgmode

i think this is the right type of solution.

On 12/8/20, TEC <tecosaur@gmail.com> wrote:
>
> Hi Sam, link stability is a concern I've had too. I currently have a fix
> (or at the very least, an improvement) for this in my config where I
> overwrite org-export-get-reference. (see:
>
> I raised this on the list a while ago ---
> https://orgmode.org/list/E1jxAjq-0004Dk-LH@lists.gnu.org/ but there
> didn't seem to be much interest.
>
> All the best,
> Timothy
>
> Samuel Wales <samologist@gmail.com> writes:
>
>> when you link to a section using toc, you get a link like
>>
>>
>> https://thekafkapandemic.blogspot.com/2020/02/crimes-against-humanity_3.html#org080f0ab
>>
>> will these links break if somebody copies them and pastes them
>> elsewhere?  what if you add a section?
>>
>> there doesn't seem to be a perfect solution, short of adding custom id
>> or id to everything, but perhaps a fuzzy hash of the header and
>> contents of the section could be used?  or a strict hash of the
>> header?  is anything like this being done?  just curious.
>
>
>

--
The Kafka Pandemic

https://thekafkapandemic.blogspot.com/2013/10/why-some-diseases-are-wronged.html

* Re: stability of toc links
2021-04-18 10:32  Nicolas Goaziou
@ 2021-04-20  0:58    Samuel Wales
2021-04-20 10:34      Nicolas Goaziou
From: Samuel Wales @ 2021-04-20  0:58 UTC (permalink / raw)
To: Samuel Wales, emacs-orgmode

thank you.  i use export.  e.g. export a subtree to html and paste
into blogger.  i don't use publish.

for me [and some others on this thread iirc] tec's fix looks good,
fwiw.  it would not proliferate custom id or id, and would fix both.

On 4/18/21, Nicolas Goaziou <mail@nicolasgoaziou.fr> wrote:
> Hello,
>
> Samuel Wales <samologist@gmail.com> writes:
>
>> when you link to a section using toc, you get a link like
>>
>>
>> https://thekafkapandemic.blogspot.com/2020/02/crimes-against-humanity_3.html#org080f0ab
>>
>> will these links break if somebody copies them and pastes them
>> elsewhere?  what if you add a section?
>
> These links should be stable in a publishing context (i.e., you use
> `org-publish' to generate the document), even if you add a section.
>
> Regards,
> --
> Nicolas Goaziou
>

--
The Kafka Pandemic

https://thekafkapandemic.blogspot.com/2013/10/why-some-diseases-are-wronged.html

* Re: stability of toc links
2021-04-20  0:58    Samuel Wales
@ 2021-04-20 10:34      Nicolas Goaziou
2021-04-21  0:33        Samuel Wales
From: Nicolas Goaziou @ 2021-04-20 10:34 UTC (permalink / raw)
To: Samuel Wales; +Cc: emacs-orgmode

Hello,

Samuel Wales <samologist@gmail.com> writes:

> thank you.  i use export.  e.g. export a subtree to html and paste
> into blogger.  i don't use publish.
>
> for me [and some others on this thread iirc] tec's fix looks good,
> fwiw.  it would not proliferate custom id or id, and would fix both.

Note that I merely answered your questions in the initial post. I didn't
try to solve a problem you weren't formulating anyway.

I don't know what "tec's fix" is, but if you found a solution, that's
great!

Regards,
--
Nicolas Goaziou

* Re: stability of toc links
2021-04-20 10:34      Nicolas Goaziou
@ 2021-04-21  0:33        Samuel Wales
2021-04-21  8:32          Nicolas Goaziou
From: Samuel Wales @ 2021-04-21  0:33 UTC (permalink / raw)
To: Samuel Wales, emacs-orgmode

hi,

> I don't know what "tec's fix" is,

i was referring to:

Message ID	<87v9dbelky.fsf@gmail.com>

[and other posts in this thread related to it.]

the problem as i see it is link stability in generated output.

On 4/20/21, Nicolas Goaziou <mail@nicolasgoaziou.fr> wrote:
> Hello,
>
> Samuel Wales <samologist@gmail.com> writes:
>
>> thank you.  i use export.  e.g. export a subtree to html and paste
>> into blogger.  i don't use publish.
>>
>> for me [and some others on this thread iirc] tec's fix looks good,
>> fwiw.  it would not proliferate custom id or id, and would fix both.
>
> Note that I merely answered your questions in the initial post. I didn't
> try to solve a problem you weren't formulating anyway.
>
> I don't know what "tec's fix" is, but if you found a solution, that's
> great!
>
> Regards,
> --
> Nicolas Goaziou
>

--
The Kafka Pandemic

https://thekafkapandemic.blogspot.com/2013/10/why-some-diseases-are-wronged.html

* Re: stability of toc links
2021-04-21  0:33        Samuel Wales
@ 2021-04-21  8:32          Nicolas Goaziou
2021-04-21 13:32            Samuel Loury
From: Nicolas Goaziou @ 2021-04-21  8:32 UTC (permalink / raw)
To: Samuel Wales; +Cc: emacs-orgmode

Hello,

Samuel Wales <samologist@gmail.com> writes:

> i was referring to:
>
>   Message ID	<87v9dbelky.fsf@gmail.com>
>
>
> [and other posts in this thread related to it.]

Unfortunately, I won't have time to look at the whole thread anytime
soon. However, feel free to explain how "tec's fix" works, or point me
to the exact post where it is explained so I can understand it.

> the problem as i see it is link stability in generated output.

The links are stable if you publish a document. Note that you can even
"publish" a single document.
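
A minimal sketch of what that could look like, assuming a one-directory
project.  The project name "blog" and both directories are placeholders;
`org-publish-project-alist', `org-html-publish-to-html' and
`org-publish-current-file' are the usual ox-publish names.

#+begin_src emacs-lisp
(require 'ox-publish)

;; Hypothetical project: the name and the paths are placeholders.
(add-to-list 'org-publish-project-alist
             '("blog"
               :base-directory "~/blog/org/"
               :base-extension "org"
               :publishing-directory "~/blog/html/"
               :publishing-function org-html-publish-to-html))

;; Then, from a buffer visiting a file under ~/blog/org/:
;;   M-x org-publish-current-file
#+end_src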

For exported (i.e., one-off) documents, this is not so much of an issue (IMO)
since you have custom IDs. I assume this was mentioned in the thread.

Link stability is also tied to publishing process because it needs
a cache. And I lazily re-used the cache already implemented there.

Regards,
--
Nicolas Goaziou

* Re: stability of toc links
2021-04-21  8:32          Nicolas Goaziou
@ 2021-04-21 13:32            Samuel Loury
2021-04-21 16:24              Nicolas Goaziou
2021-04-21 23:20              Samuel Wales
0 siblings, 2 replies; 50+ messages in thread
From: Samuel Loury @ 2021-04-21 13:32 UTC (permalink / raw)
To: Nicolas Goaziou, Samuel Wales; +Cc: emacs-orgmode

Nicolas Goaziou <mail@nicolasgoaziou.fr> writes:

[...]

> Samuel Wales <samologist@gmail.com> writes:
>
>> i was referring to:
>>
>>   Message ID	<87v9dbelky.fsf@gmail.com>
>>
>>
>> [and other posts in this thread related to it.]
>
> Unfortunately, I won't have time to look at the whole thread anytime
> soon. However, feel free to explain how "tec's fix" works, or point me
> to the exact post where it is explained so I can understand it.
>
>> the problem as i see it is link stability in generated output.
>
> The links are stable if you publish a document. Note that you can even
> "publish" a single document.
>
> For exported (i.e., one off documents), this is not so an issue (IMO)
> since you have custom ID. I assume this was evoked in the thread.

I think you are both talking about the same thing but have different

Say you want to export an org file, then share the link to a friend to a

Imagine that, for some reason, you re-export the file: new ids are
generated and the links become broken.

Nicolas Goaziou, I know you understand this and propose to use CUSTOM_ID
to make those ids static. But IIUC, Samuel Wales does not know in
everywhere, just in case. This sounds like a lot of unnecessary work.

The solution of tec¹ appears to be a new generation of the html ids based
on the heading content rather than apparently randomly generated, making
the generated link become the same across new generations.
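
One way to picture ids "based on the heading content" is the strict hash
of the header mentioned at the start of the thread.  The sketch below is
only an illustration of that idea, not tec's code: `my-heading-anchor'
and the "h-" prefix are made-up names, and truncating SHA-1 to seven hex
digits is an arbitrary choice that makes collisions unlikely but not
impossible.

#+begin_src emacs-lisp
(defun my-heading-anchor (title)
  "Return a deterministic HTML id derived from the string TITLE.
Hypothetical helper, shown for illustration only."
  (concat "h-"
          (substring
           ;; Encode explicitly so the hash does not depend on the
           ;; coding system Emacs would otherwise pick for TITLE.
           (secure-hash 'sha1 (encode-coding-string title 'utf-8))
           0 7)))
#+end_src

The same title always gives the same id, and the id is plain ASCII even
for a non-ASCII title, but editing the title changes the id, so old links
would still break in that case.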

I hope it clarified the discussion.

My best,

--
Konubinix
GPG Key    : 7439106A
Fingerprint: 5993 BE7A DA65 E2D9 06CE  5C36 75D2 3CED 7439 106A

* Re: stability of toc links
2021-04-21 13:32            Samuel Loury
@ 2021-04-21 16:24              Nicolas Goaziou
2021-04-23 15:15                Maxim Nikulin
2021-04-25 17:01                Dominique Dumont
2021-04-21 23:20              Samuel Wales
1 sibling, 2 replies; 50+ messages in thread
From: Nicolas Goaziou @ 2021-04-21 16:24 UTC (permalink / raw)
To: Samuel Loury; +Cc: emacs-orgmode

Hello,

> The solution of tec¹ appears to be a new generation of the html ids based
> on the heading content rather than apparently randomly generated, making
> the generated link become the same across new generations.

AFAICT, the link you sent only contains code, with no explanation or
design.

In particular, I'm not sure I understand how a system can generate an
ID based on the heading content and still limit itself to alphanumeric
characters. For example, what IDs are generated with the following
document?

--8<---------------cut here---------------start------------->8---
* こんにちは
* ｺﾝﾆﾁﾊ
--8<---------------cut here---------------end--------------->8---

Also, does the ID stay stable if you start with the following document

--8<---------------cut here---------------start------------->8---
* A
:PROPERTIES:
:CUSTOM_ID: こんにちは
:END:
--8<---------------cut here---------------end--------------->8---

and then edit it to become:

--8<---------------cut here---------------start------------->8---
* B
:PROPERTIES:
:CUSTOM_ID: こんにちは
:END:
--8<---------------cut here---------------end--------------->8---

I hear about stability of links, which is an implementation detail. We
currently only cache, and freeze, IDs actually being referred to, but that
could be extended.

I'd like to make sure everyone understands the problems that the current
implementation is trying to solve before throwing it out of the window.

Regards,
--
Nicolas Goaziou

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: stability of toc links
2021-04-21 13:32            Samuel Loury
2021-04-21 16:24              Nicolas Goaziou
@ 2021-04-21 23:20              Samuel Wales
2021-04-21 23:30                Samuel Wales
From: Samuel Wales @ 2021-04-21 23:20 UTC (permalink / raw)
To: Samuel Loury; +Cc: emacs-orgmode, Nicolas Goaziou

more below.  [note the two samuels.]

On 4/21/21, Samuel Loury <konubinix@gmail.com> wrote:
> everywhere, just in case. This sounds like a lot of unnecessary work.
>
> The solution of tec¹ appears to be a new generation of the html ids based
> on the heading content rather than apparently randomly generated, making
> the generated link become the same across new generations.
>
> I hope it clarified the discussion.

it did improve it.  thank you.  the above is concise and clear.

one exported [i did not use org's publish facility] post is
https://thekafkapandemic.blogspot.com/2020/02/crimes-against-humanity_3.html
.  it has MANY sections.  i turn off toc for a whole section and then
put a toc in that section just to make the main toc less forbidding.
here is the html for just the top few entries of the main toc --
notice 3 links each with a hex code that changes.

<div id="table-of-contents">
<div id="text-table-of-contents">
<ul>
<li><a href="#org360b8b5">This post</a></li>
<li><a href="#orgdff6ff2">The name of the law</a></li>
<li><a href="#orgeb015f7">Basic facts</a></li>

suppose i add a section after This post.  Most links will now be
broken.  A could have sent any of them to B as raw hex links.

i just want the problem understood at the user level.  i get that
there are possible implementation issues.

i spent 16 years researching and writing the blog post.  i don't want
links to be broken or to have to kludge in a bunch of custom id or org
id properties drawers just in case somebody links.  even if drawers
are added to every linked section automatically, it's a lot of clutter
and slowness [org id searches are slow and drawers have performance
issues that are being worked on but not merged into maint yet].  that
is a lot of drawers just for links that might or might not be sent.

i am limited in computer use so i will probably not pursue this
further if there is no interest.

there is some interest.  e.g. carsten said he thought tec's code or
something like it should be merged into org qua org.

--
The Kafka Pandemic

https://thekafkapandemic.blogspot.com/2013/10/why-some-diseases-are-wronged.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: stability of toc links
2021-04-21 23:20              Samuel Wales
@ 2021-04-21 23:30                Samuel Wales
2021-04-29 21:40                  TRS-80
From: Samuel Wales @ 2021-04-21 23:30 UTC (permalink / raw)
To: Samuel Loury; +Cc: emacs-orgmode, Nicolas Goaziou

as it has been a long time my original post is

Message ID	<CAJcAo8tXVW39h18FUXLtwoDYNKQsFHqCp+YDiyWfLee29B1PRw@mail.gmail.com>

and the content is

===
when you link to a section using toc, you get a link like

https://thekafkapandemic.blogspot.com/2020/02/crimes-against-humanity_3.html#org080f0ab

will these links break if somebody copies them and pastes them
elsewhere?  what if you add a section?

there doesn't seem to be a perfect solution, short of adding custom id
or id to everything, but perhaps a fuzzy hash of the header and
contents of the section could be used?  or a strict hash of the
header?  is anything like this being done?  just curious.
===

--
The Kafka Pandemic

https://thekafkapandemic.blogspot.com/2013/10/why-some-diseases-are-wronged.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: stability of toc links
2021-04-21 16:24              Nicolas Goaziou
@ 2021-04-23 15:15                Maxim Nikulin
2021-04-23 20:46                  Samuel Wales
2021-04-24  3:05                  Timothy
2021-04-25 17:01                Dominique Dumont
1 sibling, 2 replies; 50+ messages in thread
From: Maxim Nikulin @ 2021-04-23 15:15 UTC (permalink / raw)
To: emacs-orgmode

On 21/04/2021 23:24, Nicolas Goaziou wrote:
>
> In particular, I'm not sure I understand how a system can generate an
> ID based on the heading content and still limit itself to alphanumeric
> characters. For example, what IDs are generated with the following
> document?

My impression is that such conversion is rather widespread in various
web CMSs and documentation generators. I strongly prefer human-readable
anchors (where I can guess the link content and tell whether I have read
it earlier)

https://werkzeug.palletsprojects.com/en/1.0.x/tutorial/#step-0-a-basic-wsgi-introduction

to codes like

https://orgmode.org/worg/org-hacks.html#org98f055b

I know Cyrillic is a trivial case compared to your example below;
however, it is a case where I can confirm that the result of
transliteration to ASCII is usually readable enough. It is usually
applied to an article title to generate a path component of the URL.

> --8<---------------cut here---------------start------------->8---
> * こんにちは
> * ｺﾝﾆﾁﾊ
> --8<---------------cut here---------------end--------------->8---

Sorry, I cannot judge whether the following conversion is accurate enough:

python3 -c 'import unidecode; print(unidecode.unidecode("こんにちは"))'
konnichiha

python3 -c 'import unidecode; print(unidecode.unidecode("ｺﾝﾆﾁﾊ"))'
konnitiha

Hex anchors could be a fallback if a smarter method cannot generate
something reasonable. Finally, exporters can emit compiler-like
warnings if a problem with anchor stability or ambiguity is detected. A
helper function may be suitable for fixing the ID before a heading is
edited.
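
The fallback idea above can be sketched as follows. This is a minimal
Python sketch, not tec's code or anything that exists in Org. It uses
the standard library's NFKD normalisation as a crude stand-in for a real
transliterator such as unidecode (so kana and Cyrillic yield no slug at
all here), falls back to a short hex digest, and reports duplicate
anchors as warnings. All function names are hypothetical.

```python
import hashlib
import re
import unicodedata


def ascii_slug(title: str) -> str:
    """Best-effort ASCII slug; returns "" when nothing readable survives."""
    # NFKD splits accented letters into base letter + combining mark;
    # encoding to ASCII with "ignore" then drops everything non-ASCII.
    decomposed = unicodedata.normalize("NFKD", title)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_only.lower()).strip("-")


def anchor_for(title: str) -> str:
    """Readable slug when possible, otherwise a hex anchor."""
    slug = ascii_slug(title)
    if slug:
        return slug
    # Fallback: short hex digest of the heading text.
    return "org" + hashlib.sha1(title.encode("utf-8")).hexdigest()[:7]


def anchors_with_warnings(titles):
    """Return (anchors, warnings); warn when two headings share an anchor."""
    seen = {}
    warnings = []
    anchors = []
    for title in titles:
        anchor = anchor_for(title)
        if anchor in seen:
            warnings.append(
                f"duplicate anchor {anchor!r}: {seen[anchor]!r} and {title!r}"
            )
        else:
            seen[anchor] = title
        anchors.append(anchor)
    return anchors, warnings
```

With a proper transliterator plugged into ascii_slug, the hex fallback
would be needed far less often; the warning list is where an exporter
could tell the user that an explicit ID is advisable.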

Actually, it was not obvious to me that IDs like org98f055b may be
stable. It is a hidden feature. I do not know if Samuel can adjust his
workflow to copy from "published" (to a local directory) files instead
of copying from the export buffer. I guess that, as a starting point, it
is necessary to pre-populate the cache with IDs from existing HTML
documents somehow.

Anyway, thank you for clarifying the role of publishing.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: stability of toc links
2021-04-23 15:15                Maxim Nikulin
@ 2021-04-23 20:46                  Samuel Wales
2021-04-23 20:48                    Samuel Wales
2021-04-24  3:05                  Timothy
From: Samuel Wales @ 2021-04-23 20:46 UTC (permalink / raw)
To: Maxim Nikulin; +Cc: emacs-orgmode

python is merely using a different romanization for the second script.
it might consider uppercase [same romanization] for the latter script
instead.  other than that, the overall approach [using export] is good
imo.

idk what transliterators exist in emacs.  i think the principle of
least surprise applies; many users will want to export not publish.

--
The Kafka Pandemic

https://thekafkapandemic.blogspot.com/2013/10/why-some-diseases-are-wronged.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: stability of toc links
2021-04-23 20:46                  Samuel Wales
@ 2021-04-23 20:48                    Samuel Wales
2021-04-23 20:51                      Samuel Wales
From: Samuel Wales @ 2021-04-23 20:48 UTC (permalink / raw)
To: Maxim Nikulin; +Cc: emacs-orgmode

i should point out that idk what is allowed in links.  if uppercase is
not, then script need not be indicated or can just use a prefix.

--
The Kafka Pandemic

https://thekafkapandemic.blogspot.com/2013/10/why-some-diseases-are-wronged.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: stability of toc links
2021-04-23 20:48                    Samuel Wales
@ 2021-04-23 20:51                      Samuel Wales
0 siblings, 0 replies; 50+ messages in thread
From: Samuel Wales @ 2021-04-23 20:51 UTC (permalink / raw)
To: Maxim Nikulin; +Cc: emacs-orgmode

[and also that i was merely looking at the examples and maxim's
analysis which i agree with, not tec's or others' code.]

--
The Kafka Pandemic

https://thekafkapandemic.blogspot.com/2013/10/why-some-diseases-are-wronged.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: stability of toc links
2021-04-23 15:15                Maxim Nikulin
2021-04-23 20:46                  Samuel Wales
@ 2021-04-24  3:05                  Timothy
1 sibling, 0 replies; 50+ messages in thread
From: Timothy @ 2021-04-24  3:05 UTC (permalink / raw)
To: Maxim Nikulin; +Cc: emacs-orgmode

Maxim Nikulin <manikulin@gmail.com> writes:

> python3 -c 'import unidecode; print(unidecode.unidecode("こんにちは"))'
> konnichiha
>
> python3 -c 'import unidecode; print(unidecode.unidecode("ｺﾝﾆﾁﾊ"))'
> konnitiha

It looks like this isn't built into Emacs, and a package would be
needed: https://github.com/sindikat/unidecode

--
Timothy

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: stability of toc links
2021-04-21 16:24              Nicolas Goaziou
2021-04-23 15:15                Maxim Nikulin
@ 2021-04-25 17:01                Dominique Dumont
2021-04-30  6:24                  Timothy
From: Dominique Dumont @ 2021-04-25 17:01 UTC (permalink / raw)
To: Samuel Loury, Samuel Wales, emacs-orgmode

On Wednesday, 21 April 2021 18:24:27 CEST Nicolas Goaziou wrote:
> In particular, I'm not sure I understand how a system can generate an
> ID based on the heading content and still limit itself to alphanumeric
> characters. For example, what IDs are generated with the following
> document?
>
> --8<---------------cut here---------------start------------->8---
> * こんにちは
> * ｺﾝﾆﾁﾊ
> --8<---------------cut here---------------end--------------->8---

For what it's worth, DNS servers face a similar problem: host names can
contain any Unicode character, but DNS servers support only ASCII. In the
DNS case, this is worked around using punycode (1).

Using the example above, a host named "こんにちは.example" is converted to
"xn--28j2a3ar1p.example".

Punycode is available in Emacs through libidn (2)

HTH

(1) https://dnsquery.org/punycode
(2) https://www.gnu.org/software/libidn/
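
As an illustration outside Emacs, Python ships a "punycode" codec in its
standard library, so the same conversion can be sketched in a few lines.
The helper names and the choice to always add the "xn--" prefix are
assumptions made for this sketch only; DNS adds the prefix only to
labels that contain non-ASCII characters.

```python
def punycode_anchor(title: str) -> str:
    """Encode a heading as an ASCII-only, reversible anchor."""
    # "punycode" is a codec provided by Python's standard library.
    return "xn--" + title.encode("punycode").decode("ascii")


def punycode_title(anchor: str) -> str:
    """Recover the original heading text from such an anchor."""
    return anchor[len("xn--"):].encode("ascii").decode("punycode")


hiragana = punycode_anchor("こんにちは")
katakana = punycode_anchor("ｺﾝﾆﾁﾊ")
```

The two headings get different anchors and both can be decoded back, but
ASCII characters such as spaces pass through unchanged, so a real anchor
generator would still have to deal with those separately.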

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: stability of toc links
2021-04-21 23:30                Samuel Wales
@ 2021-04-29 21:40                  TRS-80
2021-04-29 22:18                    Samuel Wales
From: TRS-80 @ 2021-04-29 21:40 UTC (permalink / raw)
To: emacs-orgmode

On 2021-04-21 19:30, Samuel Wales wrote:
> as it has been a long time my original post is
>
> Message
> ID	<CAJcAo8tXVW39h18FUXLtwoDYNKQsFHqCp+YDiyWfLee29B1PRw@mail.gmail.com>
>
> and the content is
>
> ===
> when you link to a section using toc, you get a link like
>
>
> https://thekafkapandemic.blogspot.com/2020/02/crimes-against-humanity_3.html#org080f0ab
>
> will these links break if somebody copies them and pastes them
> elsewhere?  what if you add a section?
>
> there doesn't seem to be a perfect solution, short of adding custom id
> or id to everything, but perhaps a fuzzy hash of the header and
> contents of the section could be used?  or a strict hash of the
> header?  is anything like this being done?  just curious.

My "quick and dirty" solution was to make a function to map over each
heading and assign a custom ID based on the text of the heading.  It
only assigned one if one did not exist already (so in case you had
done any manual adjustments they would not be overwritten).

I had posted it very early in the thread, but it did not seem to gain
much interest.  It was not posted as a "long term and proper solution
to be included into Orgmode" but rather a quick and easy workaround in
the meantime.
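
A rough sketch of that kind of workaround, in Python operating on the
raw Org text rather than the Emacs Lisp function mentioned above (which
is not reproduced here). It assumes the properties drawer, if any,
directly follows the heading line, treats the whole heading line as the
title (TODO keywords and tags included), and skips headings for which no
ASCII slug can be built. The names slug and add_custom_ids are
hypothetical.

```python
import re

# One or more stars, at least one space, then the heading text.
HEADING = re.compile(r"^\*+ +(.*\S)\s*$")


def slug(title: str) -> str:
    """Lowercase ASCII slug of a heading; may be empty."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def add_custom_ids(org_text: str) -> str:
    """Give every heading a CUSTOM_ID unless it already has one."""
    lines = org_text.split("\n")
    out = []
    i = 0
    while i < len(lines):
        line = lines[i]
        out.append(line)
        i += 1
        match = HEADING.match(line)
        if not match:
            continue
        new_id = slug(match.group(1))
        if not new_id:
            continue  # nothing usable, leave the heading untouched
        id_line = f":CUSTOM_ID: {new_id}"
        if i < len(lines) and lines[i].strip().upper() == ":PROPERTIES:":
            # Existing drawer: copy it up to (not including) :END:.
            drawer = []
            while i < len(lines) and lines[i].strip().upper() != ":END:":
                drawer.append(lines[i])
                i += 1
            has_id = any(
                entry.strip().upper().startswith(":CUSTOM_ID:")
                for entry in drawer
            )
            if not has_id:
                drawer.append(id_line)
            out.extend(drawer)
            # The :END: line is copied by the next loop iteration.
        else:
            out.extend([":PROPERTIES:", id_line, ":END:"])
    return "\n".join(out)
```

Running it a second time changes nothing, which matches the "only if one
did not exist already" behaviour described above.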

Cheers,
TRS-80

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: stability of toc links
2021-04-29 21:40                  TRS-80
@ 2021-04-29 22:18                    Samuel Wales
2021-04-30  1:48                      TRS-80
2021-04-30  5:13                      Tim Cross
0 siblings, 2 replies; 50+ messages in thread
From: Samuel Wales @ 2021-04-29 22:18 UTC (permalink / raw)
To: TRS-80; +Cc: emacs-orgmode

hi trs,

thank you.  i can imagine that could be useful for a lot of users, but
for me, as i said in my op, "short of adding custom id
or id to everything" --- i didn't want to add custom id.  i will try
to clarify why in case it is useful.

in addition to performance, and clutter, there is a semantic issue in
my case.  typically, if i see that there is a properties drawer, i
know that it is there because of an org id or a manual custom id or a
special purpose of my own.  if i know it, i don't need to open it.

however, adding custom id automatically for so many links means that
there is a new meaning for properties drawers [namely, for stable
linking done automatically].  i would have to open the drawer to
determine if i personally wanted something there.

and thus, the extra properties drawers would cause effort and
distraction in this semantic sense, where i would be opening them
because i would be thinking "did i really have a reason to add a
properties drawer here? i don't recall so... better check"

also, there is the issue that if i decide not to include something in
the toc, it will still have a properties drawer lying around.

in the op, i was not looking for a solution for one blog post, but
thought a general solution for all org users might be possible.

and this would likely be at the html level, probably by using e.g.
header text, fuzzy or strict hashes, or a combination.

when tec posted his html level code, it looked like the right type of
solution to the problem.  i have not tried it, however.

i hope that clarifies.  tec said he originally did not get much
interest.  then there was interest on this thread.  then nothing.

--
The Kafka Pandemic

https://thekafkapandemic.blogspot.com/2013/10/why-some-diseases-are-wronged.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: stability of toc links
2021-04-29 22:18                    Samuel Wales
@ 2021-04-30  1:48                      TRS-80
2021-04-30  5:13                      Tim Cross
1 sibling, 0 replies; 50+ messages in thread
From: TRS-80 @ 2021-04-30  1:48 UTC (permalink / raw)
To: emacs-orgmode

On 2021-04-29 18:18, Samuel Wales wrote:
> On 4/29/21, TRS-80 <lists.trs-80@isnotmyreal.name> wrote:

>> My "quick and dirty" solution was to make a function to map over each
>> heading and assign a custom ID based on the text of the heading.  It
>> only assigned one if one did not exist already (so in case you had
>> done any manual adjustments they would not be overwritten).
>>
>> I had posted it very early in the thread, but it did not seem to gain
>> much interest.  It was not posted as a "long term and proper solution
>> to be included into Orgmode" but rather a quick and easy workaround in
>> the meantime.

> thank you.  i can imagine that could be useful for a lot of users, but
> for me, as i said in my op, "short of adding custom id
> or id to everything" --- i didn't want to add custom id.  i will try
> to clarify why in case it is useful.

Well, FWIW, I did not want to assign custom IDs, either.  :)  Like you,
I didn't want to "pollute" the properties drawer, etc.

However after looking into the options, including discussing this
issue on the mailing list myself some time ago, I realized that this
was going to be a major change in Orgmode, that it had been discussed
before (multiple times), and therefore would require (perhaps
significant) politicking, etc...

And so then I assigned custom IDs to all my entries, and got on about

I still support your (and others') efforts toward a more proper and
long term solution, I just think they are going to take a while to
bear fruit is all.

Cheers,
TRS-80

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: stability of toc links
2021-04-29 22:18                    Samuel Wales
2021-04-30  1:48                      TRS-80
@ 2021-04-30  5:13                      Tim Cross
2021-04-30 10:02                        Samuel Loury
From: Tim Cross @ 2021-04-30  5:13 UTC (permalink / raw)
To: emacs-orgmode

Samuel Wales <samologist@gmail.com> writes:

> hi trs,
>
> thank you.  i can imagine that could be useful for a lot of users, but
> for me, as i said in my op, "short of adding custom id
> or id to everything" --- i didn't want to add custom id.  i will try
> to clarify why in case it is useful.
>
> in addition to performance, and clutter, there is a semantic issue in
> my case.  typically, if i see that there is a properties drawer, i
> know that it is there because of an org id or a manual custom id or a
> special purpose of my own.  if i know it, i don't need to open it.
>
> however, adding custom id automatically for so many links means that
> there is a new meaning for properties drawers [namely, for stable
> linking done automatically].  i would have to open the drawer to
> determine if i personally wanted something there.
>
> and thus, the extra properties drawers would cause effort and
> distraction in this semantic sense, where i would be opening them
> because i would be thinking "did i really have a reason to add a
> properties drawer here? i don't recall so... better check"
>
> also, there is the issue that if i decide not to include something in
> the toc, it will still have a properties drawer lying around.
>
>
> in the op, i was not looking for a solution for one blog post, but
> thought a general solution for all org users might be possible.
>
> and this would likely be at the html level, probably by using e.g.
> header text, fuzzy or strict hashes, or a combination.
>
> when tec posted his html level code, it looked like the right type of
> solution to the problem.  i have not tried it, however.
>
> i hope that clarifies.  tec said he originally did not get much
> interest.  then there was interest on this thread.  then nothing.
>

A question to help me understand this issue.

If I understand correctly, exporting to HTML does not guarantee
stability of TOC links. If you export as HTML, send someone a link from
the toc and then re-export the document, the link will possibly be
broken. Essentially, exporting to HTML has no guarantee of stability in

However, if you use publish instead of exporting to HTML, there is a
guarantee of stability in toc links. When publishing a second time, the
link will be consistent and still valid.

If you want stability in toc links, why not use publish instead of
export to html? Is there some difference between the two mechanisms
which prevents you from being able to use publish instead to get stable

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: stability of toc links
2021-04-25 17:01                Dominique Dumont
@ 2021-04-30  6:24                  Timothy
2021-04-30 12:20                    Maxim Nikulin
From: Timothy @ 2021-04-30  6:24 UTC (permalink / raw)
To: Dominique Dumont; +Cc: emacs-orgmode, Samuel Loury

Dominique Dumont <dominique.dumont@netc.fr> writes:

> For what it's worth, DNS servers face a similar problem: host names can
> contain any Unicode character, but DNS servers support only ASCII. In the
> DNS case, this is worked around using punycode (1).
>
> Using the example above, a host named "こんにちは.example" is converted to
> "xn--28j2a3ar1p.example".
>
> Punycode is available in Emacs through libidn (2)

Thanks! I'm now making use of it as of https://github.com/tecosaur/emacs-config/commit/1ccbadd

--
Timothy

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: stability of toc links
2021-04-30  5:13                      Tim Cross
@ 2021-04-30 10:02                        Samuel Loury
2021-04-30 11:12                          Nicolas Goaziou
From: Samuel Loury @ 2021-04-30 10:02 UTC (permalink / raw)
To: Tim Cross, emacs-orgmode

[-- Attachment #1: Type: text/plain, Size: 1360 bytes --]

Hi.

I understood differently.

Tim Cross <theophilusx@gmail.com> writes:

[...]

> A question to help me understand this issue.
>
> If I understand correctly, exporting to HTML does not guarantee
> stability of TOC links. If you export as HTML, send someone a link from
> the toc and then re-export the document, the link will possibly be
> broken. Essentially, exporting to HTML has no guarantee of stability in

I think this is right.

> However, if you use publish instead of exporting to HTML, there is a
> guarantee of stability in toc links. When publishing a second time, the
> link will be consistent and still valid.

The publish feature only means exporting several files at once. IIUC,
what was written was that when using the publish feature, the exported
html pages will be coherent and a link in one document pointing to
another document of the same publish call won't be broken.

But IIUC, publishing the whole stuff again will result in totally
different links. They will still be coherent, with no broken link from one
document of the whole to another. But a browser bookmark pointing to the
lot published the first time won't work with the same lot the second
time.

Did I understand correctly?
--
Konubinix
GPG Key    : 7439106A
Fingerprint: 5993 BE7A DA65 E2D9 06CE  5C36 75D2 3CED 7439 106A

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: stability of toc links
2021-04-30 10:02                        Samuel Loury
@ 2021-04-30 11:12                          Nicolas Goaziou
2021-04-30 21:12                            Tim Cross
2021-05-01  3:08                            Greg Minshall
0 siblings, 2 replies; 50+ messages in thread
From: Nicolas Goaziou @ 2021-04-30 11:12 UTC (permalink / raw)
To: Samuel Loury; +Cc: Tim Cross, emacs-orgmode

Hello,

Samuel Loury <konubinix@gmail.com> writes:

> The publish feature only means exporting several files at once.

You can publish a single file, too. It makes sense when a file is always
exported to the same location, possibly with the same configuration.

> IIUC, what was written was that when using the publish feature, the exported
> html pages will be coherent and a link in one document pointing to
> another document of the same publish call won't be broken.
>
> But IIUC, publishing the whole stuff again will result in totally
> different links. They will still be coherent, with no broken link from one
> document of the whole to another. But a browser bookmark pointing to the
> lot published the first time won't work with the same lot the second
> time.
>
> Did I understand correctly?

That's correct.

Org provides a mechanism, called `org-export-get-reference', for
creating internal references, which relies on randomness + cache. But it
explicitly removes internal references not actually used from there (see
`org-publish--store-crossrefs'). Keeping those references instead would
make all links stable, of course.
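
To make the "randomness + cache" behaviour concrete, here is a toy model
in Python. It is not Org's implementation: the class ReferenceCache, its
methods, and the use of the heading title as the cache key are all
assumptions for illustration. It only shows why an anchor survives a
later run when its entry is kept in the cache and is regenerated when
the entry was pruned.

```python
import json
import secrets


class ReferenceCache:
    """Toy model: random anchors, remembered between runs via a cache."""

    def __init__(self, stored=None):
        # Maps a key identifying a heading to its anchor.
        self.refs = dict(stored or {})

    def get_reference(self, key: str) -> str:
        """Return the cached anchor, or create a new random one."""
        if key not in self.refs:
            self.refs[key] = "org" + secrets.token_hex(4)[:7]
        return self.refs[key]

    def dump(self, used_keys=None) -> str:
        """Serialise the cache; with used_keys, keep only those entries."""
        if used_keys is None:
            kept = self.refs
        else:
            kept = {k: v for k, v in self.refs.items() if k in used_keys}
        return json.dumps(kept)


headings = ["This post", "The name of the law", "Basic facts"]

first = ReferenceCache()
first_anchors = {h: first.get_reference(h) for h in headings}

# Keeping every reference: a later run reproduces all anchors.
kept_all = ReferenceCache(json.loads(first.dump()))

# Keeping only references that are actually linked to (here just one):
# the other headings get fresh random anchors on the next run.
pruned = ReferenceCache(json.loads(first.dump(used_keys={"Basic facts"})))
```

In this model, the change hinted at above amounts to calling dump()
without used_keys.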

Regards,
--
Nicolas Goaziou

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: stability of toc links
2021-04-30  6:24                  Timothy
@ 2021-04-30 12:20                    Maxim Nikulin
0 siblings, 0 replies; 50+ messages in thread
From: Maxim Nikulin @ 2021-04-30 12:20 UTC (permalink / raw)
To: emacs-orgmode

On 30/04/2021 13:24, Timothy wrote:
> Dominique Dumont <dominique.dumont@netc.fr> writes:
>
>> For what it's worth, DNS servers face a similar problem: host names can
>> contain any Unicode character, but DNS servers support only ASCII. In the
>> DNS case, this is worked around using punycode (1).
>>
>> Using the example above, a host named "こんにちは.example" is converted to
>> "xn--28j2a3ar1p.example".
>>
>> Punycode is available in Emacs through libidn (2)
>
> Thanks! I'm now making use of it as of https://github.com/tecosaur/emacs-config/commit/1ccbadd

Personally, I do not see any point in using punycode. The result is

I was not sure that url-hexify-string is a reliable solution, but
surprisingly it is used by Wikipedia now
https://ru.wikipedia.org/wiki/Emacs#%D0%98%D1%81%D1%82%D0%BE%D1%80%D0%B8%D1%8F
Could you please provide some details concerning the problem you have noticed?

Earlier, Wikipedia used another approach:
https://ru.wikipedia.org/wiki/%D0%98%D0%B2%D0%B0%D0%BD_%D0%A4%D1%91%D0%B4%D0%BE%D1%80%D0%BE%D0%B2#.D0.91.D0.B8.D0.BE.D0.B3.D1.80.D0.B0.D1.84.D0.B8.D1.8F
(random URL found in browser history).
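
The two anchor styles in the URLs above can be reproduced with a short
sketch. It uses Python's urllib.parse.quote as a stand-in for
url-hexify-string (an assumption: the two are similar in spirit, not
guaranteed identical for every character), and the function names are
hypothetical.

```python
from urllib.parse import quote, unquote


def percent_anchor(title: str) -> str:
    """Percent-encode the UTF-8 bytes of a heading."""
    # safe="" so that reserved characters such as "/" are escaped too.
    return quote(title, safe="")


def dotted_anchor(title: str) -> str:
    """Older style: the same bytes with "." in place of "%"."""
    return percent_anchor(title).replace("%", ".")


current_style = percent_anchor("История")
older_style = dotted_anchor("Биография")
```

The percent form can be decoded again with unquote, while the dotted
form is ambiguous whenever the heading itself contains a literal dot.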

Both pure (percent) encode or encode followed by s/%/./g produce

I would like to see something like
https://support.mozilla.org/ru/kb/zaprosy-na-razresheniya-dlya-rasshirenij-firefox
I was never interested enough in this topic to check whether other
implementations, e.g. WordPress (or some plugin), use the same technique
as Python's unidecode or something else. Certainly, unidecode should not
be a hard dependency, but if installed it could be used instead of a less
intelligent default method.

On 24/04/2021 03:51, Samuel Wales wrote:
> [and also that i was merely looking at the examples and maxim's
> analysis which i agree with, not tec's or others' code.]

To be clear: my message was in support of Timothy's initiative.

On 24/04/2021 03:46, Samuel Wales wrote:
> i think the principle of
> least surprise applies; many users will want to export not publish.

Unsure, but the names might be considered confusing. "Publish" to local
files is a flavor of export that, according to Nicolas, stabilizes
link anchors (the feature you miss in "export"). Nobody forces you to
transfer exported files to your hosting using "publish".

Simple export is suitable for transient fragments or for tools like
https://github.com/jkitchin/ox-clip

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: stability of toc links
2021-04-30 11:12                          Nicolas Goaziou
@ 2021-04-30 21:12                            Tim Cross
2021-05-01 12:36                              Nicolas Goaziou
2021-05-01  3:08                            Greg Minshall
From: Tim Cross @ 2021-04-30 21:12 UTC (permalink / raw)
To: Nicolas Goaziou; +Cc: emacs-orgmode, Samuel Loury

Nicolas Goaziou <mail@nicolasgoaziou.fr> writes:

> Hello,
>
> Samuel Loury <konubinix@gmail.com> writes:
>
>> The publish feature only means exporting several files at once.
>
> You can publish a single file, too. It makes sense when a file is always
> exported to the same location, possibly with the same configuration.
>
>> IIUC, what was written was that when using the publish feature, the exported
>> html pages will be coherent and a link in one document pointing to
>> another document of the same publish call won't be broken.
>>
>> But IIUC, publishing the whole stuff again will result in totally
>> different links. They will still be coherent and no broken link from one
>> document of the whole to another. But a browser bookmark pointing the
>> published lot the first time won't work with the same lot the second
>> time.
>>
>> Did I understand correctly?
>
> That's correct.
>
> Org provides a mechanism, called `org-export-get-reference', for
> creating internal references, which relies on randomness + cache. But it
> explicitly removes internal references not actually used from there (see
> `org-publish--store-crossrefs'). Keeping those references instead would
> make all links stable, of course.
>

Given this is not the first time we have seen a similar discussion
regarding link stability for external references, perhaps it would be
good to summarise and put it on worg for reference?

First attempt - let me know if I've got it close!

- If you need stability in TOC links between generated versions, use
Org's publish facility rather than plain HTML export.

- Publish can be used to publish a single file.

- 'something' in the published output needs to reference the TOC links
to ensure consistency.

HTML export lacks the internal caching/tracking necessary to have
[stable TOC links across exports; adding this] capability would
significantly complicate the HTML export code base.
This is hard to justify when this export facility is also used for
things like HTML fragments and because internal link stability is only
required in a sub-set of use cases.

The org publish facility already includes the necessary internal
facilities to support internal link consistency across published
versions. You can use publish to publish a single file. Currently, the
internal links need to be referenced/used in order to ensure consistency
across published versions.

If stability of TOC links across versions is required, using publish is
the preferred mechanism. If we want to make it easier for the user
to create published pages with consistent internal TOC links, we would
be better off enhancing the publish mechanism rather than trying to add
such facilities to the HTML export function.

--
Tim Cross

* Re: stability of toc links
2021-04-30 11:12                          Nicolas Goaziou
2021-04-30 21:12                            Tim Cross
@ 2021-05-01  3:08                            Greg Minshall
1 sibling, 0 replies; 50+ messages in thread
From: Greg Minshall @ 2021-05-01  3:08 UTC (permalink / raw)
To: emacs-orgmode

> > The publish feature only means exporting several files at once.

> You can publish a single file, too. It makes sense when a file is always
> exported to the same location, possibly with the same configuration.

my model is that exporting is to publishing as, well, as org mode is to
latex.  the former is more user friendly; the latter gives me more
control (and more repeatability).

cheers, Greg

* Re: stability of toc links
2021-04-30 21:12                            Tim Cross
@ 2021-05-01 12:36                              Nicolas Goaziou
2021-05-01 12:48                                Timothy
From: Nicolas Goaziou @ 2021-05-01 12:36 UTC (permalink / raw)
To: Tim Cross; +Cc: emacs-orgmode, Samuel Loury

Hello,

Tim Cross <theophilusx@gmail.com> writes:

> Given this is not the first time we have seen a similar discussion
> regarding link stability for external references, perhaps it would be
> good to summarise and put it on worg for reference?
>
> First attempt - let me know if I've got it close!
>

[...]

> The org publish facility already includes the necessary internal
> facilities to support internal link consistency across published
> versions. You can use publish to publish a single file. Currently, the
> internal links need to be referenced/used in order to ensure consistency
> across published versions.

This is currently true.

> If stability of TOC links across versions is required, using publish is
> the preferred mechanism. If we would want to make it easier for the user
> to create published pages with consistent internal TOC links, we would
> be better off enhancing the publish mechanism rather than trying to add
> such facilities to the HTML export function.

I would nuance the latter.

In a nutshell, Org provides a reference generation mechanism, with the
function `org-export-get-reference', which has some interesting
properties. Major ones are:

- references consist of alphanumeric characters only, so they are /de
facto/ compatible with any target format;

- references are guaranteed to be unique in the document;

- cross-references between documents are stable.

It has obvious drawbacks, as noted already:

- references themselves are not informative;

- dangling links (i.e., not being cross-referenced) are not stable.

So, it was implemented as an adequate fall-back mechanism, one that
would not let down export back-end developers. It is clearly not
suitable for every need; I don't think such a system exists (but I would
love to be wrong).
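
The properties and drawbacks listed above can be seen in a minimal Python sketch of a "randomness + cache" reference generator. The class name, the cache layout, and the "org" plus seven hex digits format (modeled on the anchor `#org080f0ab` quoted at the start of this thread) are assumptions for illustration only; this is not the code of `org-export-get-reference'.

```python
import random


class ReferenceTable:
    """Hand out random, unique, alphanumeric references, reusing cached ones."""

    def __init__(self, cache=None, seed=None):
        # `cache` maps an element key to a reference issued by an earlier run.
        self.refs = dict(cache or {})
        self._rng = random.Random(seed)

    def get(self, key):
        if key in self.refs:              # cached: same reference as before
            return self.refs[key]
        used = set(self.refs.values())
        while True:
            ref = "org%07x" % self._rng.randrange(0x10000000)
            if ref not in used:           # retry on collision: uniqueness
                self.refs[key] = ref
                return ref


first_run = ReferenceTable()
intro = first_run.get("intro")

# A later run that inherits the cache reproduces the reference; a run
# without the cache would almost certainly produce a different one.
second_run = ReferenceTable(cache=first_run.refs)
print(second_run.get("intro") == intro)   # True
```

In this model a reference is stable only for keys that survive in the cache, which mirrors the "dangling links are not stable" drawback.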

It is important to understand that relying on `org-export-get-reference'
is optional when a better local solution exists. For example, the Texinfo
exporter relies on that function only as a last resort, and will
circumvent it whenever it can. See `org-texinfo--get-node'.

With the above in mind, I think there are two ways to "improve TOC links
stability":

1. Improve the system behind `org-export-get-reference', e.g., so it
behaves better during plain export. But please keep in mind that
current major properties must be preserved.

2. Use a hybrid system in the HTML exporter, much like Texinfo's.

Those two ways are orthogonal; either may solve the current issue
independently of the other. My gut feeling is that the second way is easier
to achieve, at least in the short term.

Regards,
--
Nicolas Goaziou

* Re: stability of toc links
2021-05-01 12:36                              Nicolas Goaziou
@ 2021-05-01 12:48                                Timothy
2021-05-01 13:13                                  Nicolas Goaziou
From: Timothy @ 2021-05-01 12:48 UTC (permalink / raw)
To: Nicolas Goaziou; +Cc: Tim Cross, emacs-orgmode, Samuel Loury

Nicolas Goaziou <mail@nicolasgoaziou.fr> writes:

> With the above in mind, I think there are two ways to "improve TOC links
> stability":
>
> 1. Improve the system behind `org-export-get-reference', e.g., so it
>    behaves better during plain export. But please keep in mind that
>    current major properties must be preserved.
>
> 2. Use an hybrid system in HTML exporter, much like Texinfo's.
>
> Those two ways are orthogonal; both may solve the current issue
> independently on the other. My gut feeling is the second way is easier
> to achieve, at least on the short term.

I'm not quite sure what you mean by a hybrid system, but I think
generating informative references based on header content is a nice
improvement that also makes links much more stable.

Forgive me if this is an overly simplistic quick take.

--
Timothy.

* Re: stability of toc links
2021-05-01 12:48                                Timothy
@ 2021-05-01 13:13                                  Nicolas Goaziou
2021-05-01 13:47                                    Timothy
From: Nicolas Goaziou @ 2021-05-01 13:13 UTC (permalink / raw)
To: Timothy; +Cc: Tim Cross, emacs-orgmode, Samuel Loury

Hello,

Timothy <tecosaur@gmail.com> writes:

> Nicolas Goaziou <mail@nicolasgoaziou.fr> writes:
>
>> With the above in mind, I think there are two ways to "improve TOC links
>> stability":
>>
>> 1. Improve the system behind `org-export-get-reference', e.g., so it
>>    behaves better during plain export. But please keep in mind that
>>    current major properties must be preserved.
>>
>> 2. Use an hybrid system in HTML exporter, much like Texinfo's.
>>
>> Those two ways are orthogonal; both may solve the current issue
>> independently on the other. My gut feeling is the second way is easier
>> to achieve, at least on the short term.
>
> I'm not quite sure what you mean by a hybrid system,

As I explained, it is a system where the back-end generates its own
references, and relies on `org-export-get-reference' as a robust
fall-back system. Have a look at the `org-texinfo--get-node' function to see
what I mean.
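
The shape of such a hybrid system can be sketched in Python as follows. The function name, its arguments, and the order of preference are hypothetical; this is not a transcription of `org-texinfo--get-node', and it ignores uniqueness for brevity.

```python
import re


def hybrid_reference(title, custom_id, fallback):
    """Back-end specific reference, with a generic fall-back.

    `fallback` is any callable returning a reference that is always safe,
    e.g. a random "orgXXXXXXX" generator.
    """
    if custom_id:                        # an explicit user choice wins
        return custom_id
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    if slug:                             # the title yields something usable
        return slug
    return fallback()                    # last resort: generic mechanism


print(hybrid_reference("Hello World", None, lambda: "org0000001"))
# hello-world
print(hybrid_reference("こんにちは", None, lambda: "org0000001"))
# org0000001
```

The back-end decides what a "good" local reference is, and the generic mechanism only has to cover what is left over.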

> but I think generating informative references based on header content
> is a nice improvement that also make links much more stable.

I don't think such a system would preserve the properties offered by
`org-export-get-reference', for example in non-Latin languages. Also,
header content is not stable enough: when you're linking to the custom
ID, you may be able to change the title and yet preserve the link.

However, your proposal may be an improvement over
`org-export-get-reference' in the context of "ox-html.el"; that I don't
know. That's the second way I wrote about.

Regards,
--
Nicolas Goaziou

* Re: stability of toc links
2021-05-01 13:13                                  Nicolas Goaziou
@ 2021-05-01 13:47                                    Timothy
2021-05-01 14:09                                      Nicolas Goaziou
From: Timothy @ 2021-05-01 13:47 UTC (permalink / raw)
To: Nicolas Goaziou; +Cc: Tim Cross, emacs-orgmode, Samuel Loury

Nicolas Goaziou <mail@nicolasgoaziou.fr> writes:

> I don't think such a system would preserve properties offered by
> `org-export-get-reference', for example in non-latin languages. Also,
> header content is not stable enough: when you're linking to the custom
> ID, you may be able to change the title and yet preserve the link.
>
> However, your proposal may be an improvement over
> `org-export-get-reference' in the context of "ox-html.el", that I don't
> know. That's the second way I wrote about.

My proposal basically replaces the random hex generating function in
`org-export-get-reference' with a call to a function
`org-reference-contraction' which takes a header and outputs a string
like "some-header-string" and then has digits appended should it be [...]
For certain recognised blocks (e.g. src and example blocks) it appends [...]

I consider this to be quite a nice improvement.
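
The general idea of a title-based reference with a numeric suffix can be sketched in Python. The helper names and the exact suffix format are assumptions; this is not the `org-reference-contraction' code, which is not shown in this thread.

```python
import re


def slugify(title):
    """Lower-case the title and join its ASCII alphanumeric runs with '-'."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def assign_references(titles):
    """Return one reference per title, appending -1, -2, ... until unused."""
    used = set()
    refs = []
    for title in titles:
        base = slugify(title) or "section"   # hypothetical default name
        ref, n = base, 0
        while ref in used:
            n += 1
            ref = f"{base}-{n}"
        used.add(ref)
        refs.append(ref)
    return refs


print(assign_references(["Some header string", "Setup", "Setup"]))
# ['some-header-string', 'setup', 'setup-1']
```

The suffix depends on how many earlier headings produced the same slug, so it is a function of document order.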

--
Timothy

* Re: stability of toc links
2021-05-01 13:47                                    Timothy
@ 2021-05-01 14:09                                      Nicolas Goaziou
2021-05-01 14:22                                        Timothy
From: Nicolas Goaziou @ 2021-05-01 14:09 UTC (permalink / raw)
To: Timothy; +Cc: Tim Cross, emacs-orgmode, Samuel Loury

Timothy <tecosaur@gmail.com> writes:

> Nicolas Goaziou <mail@nicolasgoaziou.fr> writes:
>
>> I don't think such a system would preserve properties offered by
>> `org-export-get-reference', for example in non-latin languages. Also,
>> header content is not stable enough: when you're linking to the custom
>> ID, you may be able to change the title and yet preserve the link.
>>
>> However, your proposal may be an improvement over
>> `org-export-get-reference' in the context of "ox-html.el", that I don't
>> know. That's the second way I wrote about.
>
> My proposal basically replaces the random hex generating function in
> `org-export-get-reference' with a call to a function
> `org-reference-contraction' which takes a header and outputs a string
> like "some-header-string" and then has digits appended should it be
> For certain recognised blocks (e.g. src and example blocks) it appends
>
> I consider this to be quite a nice improvement.

This is mostly what Texinfo exporter does, too. So it must be suitable
in some contexts, indeed.

However, I wouldn't consider it an improvement over the generic
`org-export-get-reference'. I pointed out some concerns I have about the
robustness of this system already. I don't think you answered any of
them. I fear we may be communicating past each other in this thread.

Regards,

* Re: stability of toc links
2021-05-01 14:09                                      Nicolas Goaziou
@ 2021-05-01 14:22                                        Timothy
2021-05-02 12:10                                          Nicolas Goaziou
From: Timothy @ 2021-05-01 14:22 UTC (permalink / raw)
To: Nicolas Goaziou; +Cc: Tim Cross, emacs-orgmode, Samuel Loury

Nicolas Goaziou <mail@nicolasgoaziou.fr> writes:

> I pointed out some concerns I have about the robustness of this system
> already. I don't think you answered to any of them. I fear we may be
> communicating past each other in this thread.

Sorry about that. I'll try to address the bits I've missed in these last
few emails.

Just in reference to the part of your prior email that I didn't address

> I don't think such a system would preserve properties offered by
> `org-export-get-reference', for example in non-latin languages.

Looking back on your recent email where you list properties:

> references consist of alphanumeric characters only, so they are /de
> facto/ compatible with any target format;

This uses characters from [a-z0-9-]

> references are guaranteed to be unique in the document;

The suffixed number I mentioned ensures this.

> cross-references between documents are stable.

I'm not quite sure what to make of this.

> Also, header content is not stable enough: when you're linking to the
> custom ID, you may be able to change the title and yet preserve the
> link.

Custom IDs still work, so I don't quite see the point here.

> I pointed out some concerns I have about the robustness of this system
> already.

If there are any other concerns than what I've covered above, I'm afraid
they have slipped past me.

--
Timothy

* Re: stability of toc links
2021-05-01 14:22                                        Timothy
@ 2021-05-02 12:10                                          Nicolas Goaziou
2021-05-02 20:16                                            Timothy
From: Nicolas Goaziou @ 2021-05-02 12:10 UTC (permalink / raw)
To: Timothy; +Cc: Tim Cross, emacs-orgmode, Samuel Loury

Hello,

Timothy <tecosaur@gmail.com> writes:

> Nicolas Goaziou <mail@nicolasgoaziou.fr> writes:
>
>> I pointed out some concerns I have about the robustness of this system
>> already. I don't think you answered to any of them. I fear we may be
>> communicating past each other in this thread.
>
> Sorry about that. I'll try to address the bits I've missed in these last
> few emails.

Please note that those short answers did not help me much. So I did my
homework and looked at your code. I didn't test it thoroughly, so I may
be missing something.

>> references consist of alphanumeric characters only, so they are /de
>> facto/ compatible with any target format;
>
> This uses characters from [a-z0-9-]

Indeed. I didn't know about punycode. It has very interesting
properties.

Now, here's the elephant in the room: "puny.el" was included in Emacs
26.1. Org cannot make use of it yet.

Also, the bootstring algorithm, and yours, are very much
English-centered, as `org-reference-contraction-stripped-words' can
attest. I insisted on non-Latin languages for a reason:

(org-reference-contraction "こんにちは") =>  "28j2a3ar1p-"

or, for a not so long title

(org-reference-contraction "こんにちは ｺﾝﾆﾁﾊ") => "v8ttbvbva7si998jvba0bzb0m-"

which is arguably worse than "org1234567".
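
The effect is easy to reproduce outside Emacs. The sketch below uses Python's built-in "punycode" codec rather than "puny.el", so delimiter details may differ from the output quoted above; the function name is hypothetical.

```python
def punycode_reference(title: str) -> str:
    """Encode a heading with the bootstring/punycode algorithm (RFC 3492)."""
    return title.encode("punycode").decode("ascii")


ref = punycode_reference("こんにちは")
print(ref)

# The result is pure ASCII and reversible, but carries no meaning for a reader.
print(ref.isascii())                                       # True
print(ref.encode("ascii").decode("punycode") == "こんにちは")  # True
```

For titles without any ASCII letters the encoded string is all a reader gets to see, so it is no more informative than a random reference.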

>> references are guaranteed to be unique in the document;
>
> The suffixed number I mentioned ensures this.

Unfortunately, because of them, you cannot guarantee stable links during
export, much like random references.

For example, if you first export

* Foo
bar

and if you later modify your document like this

* Foo
baz
* Foo
bar

[...]

As a side note, this is the reason why I introduced randomness in
references in the first place. We cannot reference the first headline as
[...] because we cannot assume their order is fixed.

More importantly, the above is not limited to headlines with the exact
same title. Since your algorithm truncates output, this will happen in
various, less obvious, situations.
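
Both situations can be demonstrated with a small Python sketch. It models a title-based scheme with numeric suffixes and an optional length limit; the function, the suffix format, and the limit of 10 characters are assumptions for illustration, not the behavior of the proposed patch.

```python
def references(headlines, limit=None):
    """Map each headline body to a title-based reference.

    `headlines` is a list of (title, body) pairs in document order.
    Repeated references get -1, -2, ... appended.
    """
    used, result = set(), {}
    for title, body in headlines:
        base = title.lower().replace(" ", "-")[:limit]
        ref, n = base, 0
        while ref in used:
            n += 1
            ref = f"{base}-{n}"
        used.add(ref)
        result[body] = ref
    return result


first = references([("Foo", "bar")])
second = references([("Foo", "baz"), ("Foo", "bar")])
print(first["bar"], second["bar"])
# foo foo-1

# Truncation makes different titles collide as well.
print(references([("Installing on Linux", "a"),
                  ("Installing on macOS", "b")], limit=10))
# {'a': 'installing', 'b': 'installing-1'}
```

After the second export, an old link to "foo" still resolves, but it now lands on the "baz" section rather than the one it was created for.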

>> cross-references between documents are stable.
>
> I'm not quite sure what to make of this.

Since you don't implement something new but re-use the existing caching
mechanism, I don't think this is an issue.

>> Also, header content is not stable enough: when you're linking to the
>> custom ID, you may be able to change the title and yet preserve the
>> link.
>
> Custom IDs still work, so I don't quite see the point here.

How can you be sure?

The point is that in some export back-ends, e.g., ASCII, you will only
provide a single reference for a headline, i.e., not one for the title
and another one for the custom ID. If your reference is based solely on
the title, the reference will break whenever you modify the title
without touching custom ID. I gave an example in an earlier post
already. This is a regression wrt the current system.

In a nutshell:

- there are very interesting points in your proposal;

- it is not applicable at the moment;

- it greatly improves references for English language, it is slightly
better for latin languages, and worse for non-latin ones;

- it does not guarantee link stability during export;

- it introduces a regression wrt custom ID.

Notwithstanding the problem of "puny.el", the regression makes it not
suitable as a drop-in replacement for random `org-export-get-reference'
yet. With more work, it can become an interesting evolution of
`org-export-get-reference', however. Since this regression does not
affect HTML export back-ends, it could be used there meanwhile.

Link stability is still an issue, even if the proposal gives a false
sense of security in that area. I don't think we can solve it without
creating a cache for export, where you store all previous references for
a given file. Even this is not sufficient, because you can export
buffers not attached to files.

Regards,
--
Nicolas Goaziou

* Re: stability of toc links
2021-05-02 12:10                                          Nicolas Goaziou
@ 2021-05-02 20:16                                            Timothy
0 siblings, 0 replies; 50+ messages in thread
From: Timothy @ 2021-05-02 20:16 UTC (permalink / raw)
To: Nicolas Goaziou; +Cc: Tim Cross, emacs-orgmode, Samuel Loury

Nicolas Goaziou <mail@nicolasgoaziou.fr> writes:

> Please note that those short answers did not help me much. So I did my
> homework and looked at your code. I didn't test it thoroughly, so I may
> be missing something.

It's a pity to hear that I wasn't able to suitably clarify things in my
reply. Thank you for being willing to investigate my implementation.

> Now, here's the elephant in the room: "puny.el" was included in Emacs
> 26.1. Org cannot make use of it yet.

Gah.

> Also, the bootstring algorithm, and yours, are very much
> English-centered, as can attest
> `org-reference-contraction-stripped-words'. I insisted on non-latin
> languages for a reason:
>
>        (org-reference-contraction "こんにちは") =>  "28j2a3ar1p-"
>
> or, for a not so long title
>
>   (org-reference-contraction "こんにちは ｺﾝﾆﾁﾊ") => "v8ttbvbva7si998jvba0bzb0m-"
>
> which is arguably worse than "org1234567".

Mmmm. This isn't great. I preferred the output of Unidecode (ASCII
transliteration) mentioned previously, but that doesn't look like it
could easily be used.

>>> references are guaranteed to be unique in the document;
>>
>> The suffixed number I mentioned ensures this.
>
> Unfortunately, because of them, you cannot guarantee stable links during
> export, much like random references.
>
> For example, if you first export
>
>   * Foo
>   bar
>
> and if you later modify your document like this
>
>   * Foo
>   baz
>   * Foo
>   bar
>
>
> As a side note, this is the reason why I introduced randomness in
> references in the first place. We cannot reference the first headline as
> [...] because we cannot assume their order is fixed.

From this I take it you'd rather a broken reference than an incorrect
one? I don't think there's any "good" solution here, just pick your
poison (and, no surprise, I prefer my way).

> More importantly, the above is not limited to headlines with the exact
> same title. Since your algorithm truncates output, this will happen in
> various, less obvious, situations.

While this is technically possible, I think it's worth noting that I
have never seen this in practice, and for reference I have documents
with hundreds of headings (250 in my config, for example).

>>> Also, header content is not stable enough: when you're linking to the
>>> custom ID, you may be able to change the title and yet preserve the
>>> link.
>>
>> Custom IDs still work, so I don't quite see the point here.
>
> How can you be sure?
>
> The point is that in some export back-ends, e.g., ASCII, you will only
> provide a single reference for a headline, i.e., not one for the title
> and another one for the custom ID. If your reference is based solely on
> the title, the reference will break whenever you modify the title
> without touching custom ID. I gave an example in an earlier post
> already. This is a regression wrt the current system.

I remain rather confused on this point. Say I have a document with the
following content:

:PROPERTIES:
:CUSTOM_ID: hey
:END:

In an HTML export I see:

[...] See <a href="#hey">1</a> or <a href="#hey">1</a></p>

In an ASCII export:

══════════════

See 1 or 1

In a LaTeX export:

\label{hey}
See \ref{hey} or \ref{hey}

etc.

I don't see how my code affects custom IDs.

> In a nutshell:
>
> - there are very interesting points in your proposal;

Glad you've found some things of interest.

> - it is not applicable at the moment;

I'm guessing this is solely due to punycode?

> - it greatly improves references for English language, it is slightly
>   better for latin languages, and worse for non-latin ones;
>
> - it does not guarantee link stability during export;

Indeed. However no approach that doesn't cache every heading with every
export does, and I find this /significantly/ improves stability.

> - it introduces a regression wrt custom ID.

See my confusion above.

> Link stability is still an issue, even if the proposal gives a false
> sense of security in that area. I don't think we can solve it without
> creating a cache for export, where you store all previous references for
> a given file. Even this is not sufficient, because you can export
> buffers not attached to files.

To me this is a case of "don't let the perfect be the enemy of the
good". Though I do see that a false sense of security may be
problematic, I consider the benefits to outweigh this.

I hope you've found this reply more useful than my last,

Timothy.

end of thread, other threads:[~2021-05-02 20:16 UTC | newest]

2020-12-08 23:28 stability of toc links Samuel Wales
2020-12-08 23:30  Samuel Wales
2020-12-09  1:39    Tom Gillespie
2020-12-12 21:51      TRS-80
2020-12-12 22:47        TRS-80
2020-12-09  2:48  TEC
2020-12-09  8:45    Diego Zamboni
2020-12-09  9:15    Carsten Dominik
2020-12-09 21:25      Samuel Wales
2020-12-10  9:55        Carsten Dominik
2020-12-10 12:49          TEC
2020-12-10 14:36            TEC
2020-12-11  7:51              Carsten Dominik
2020-12-19  6:41                Carsten Dominik
2020-12-19 11:22                  Ihor Radchenko
2021-04-18 21:02    Samuel Wales
2020-12-14 10:46  Dominique Dumont
2021-04-18 10:32  Nicolas Goaziou
2021-04-20  0:58    Samuel Wales
2021-04-20 10:34      Nicolas Goaziou
2021-04-21  0:33        Samuel Wales
2021-04-21  8:32          Nicolas Goaziou
2021-04-21 13:32            Samuel Loury
2021-04-21 16:24              Nicolas Goaziou
2021-04-23 15:15                Maxim Nikulin
2021-04-23 20:46                  Samuel Wales
2021-04-23 20:48                    Samuel Wales
2021-04-23 20:51                      Samuel Wales
2021-04-24  3:05                  Timothy
2021-04-25 17:01                Dominique Dumont
2021-04-30  6:24                  Timothy
2021-04-30 12:20                    Maxim Nikulin
2021-04-21 23:20              Samuel Wales
2021-04-21 23:30                Samuel Wales
2021-04-29 21:40                  TRS-80
2021-04-29 22:18                    Samuel Wales
2021-04-30  1:48                      TRS-80
2021-04-30  5:13                      Tim Cross
2021-04-30 10:02                        Samuel Loury
2021-04-30 11:12                          Nicolas Goaziou
2021-04-30 21:12                            Tim Cross
2021-05-01 12:36                              Nicolas Goaziou
2021-05-01 12:48                                Timothy
2021-05-01 13:13                                  Nicolas Goaziou
2021-05-01 13:47                                    Timothy
2021-05-01 14:09                                      Nicolas Goaziou
2021-05-01 14:22                                        Timothy
2021-05-02 12:10                                          Nicolas Goaziou
2021-05-02 20:16                                            Timothy
2021-05-01  3:08                            Greg Minshall


Code repositories for project(s) associated with this inbox:

https://git.savannah.gnu.org/cgit/emacs/org-mode.git
