From: Nicolas Goaziou <mail@nicolasgoaziou.fr>
To: Timothy <tecosaur@gmail.com>
Cc: Tim Cross <theophilusx@gmail.com>,
	emacs-orgmode@gnu.org, Samuel Loury <konubinix@gmail.com>
Subject: Re: stability of toc links
Date: Sun, 02 May 2021 14:10:12 +0200
Message-ID: <875z019w0r.fsf@nicolasgoaziou.fr>
In-Reply-To: <87a6pebkk8.fsf@gmail.com> (Timothy's message of "Sat, 01 May 2021 22:22:31 +0800")

Hello,

Timothy <tecosaur@gmail.com> writes:

> Nicolas Goaziou <mail@nicolasgoaziou.fr> writes:
>
>> I pointed out some concerns I have about the robustness of this system
>> already. I don't think you answered any of them. I fear we may be
>> communicating past each other in this thread.
>
> Sorry about that. I'll try to address the bits I've missed in these last
> few emails.

Please note that those short answers did not help me much. So I did my
homework and looked at your code. I didn't test it thoroughly, so I may
be missing something.

>> references consist of alphanumeric characters only, so they are /de
>> facto/ compatible with any target format;
>
> This uses characters from [a-z0-9-]

Indeed. I didn't know about punycode. It has very interesting
properties.

Now, here's the elephant in the room: "puny.el" was included in Emacs
26.1. Org cannot make use of it yet.

Also, the bootstring algorithm, and yours, are very much
English-centered, as `org-reference-contraction-stripped-words'
attests. I insisted on non-Latin languages for a reason:

       (org-reference-contraction "こんにちは") =>  "28j2a3ar1p-"

or, for a not so long title

  (org-reference-contraction "こんにちは コンニチハ") => "v8ttbvbva7si998jvba0bzb0m-"

which is arguably worse than "org1234567".
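
To make the comparison concrete, here is a minimal Python sketch using
the standard library's "punycode" codec. It is only a stand-in: it is
neither "puny.el" nor `org-reference-contraction', and the helper names
are hypothetical, so the exact strings may differ from the ones quoted
above. It should still show the general effect: an ASCII title stays
readable, whereas a title without any ASCII letters becomes an opaque
string drawn from [a-z0-9-].

```python
import re


def punycode_reference(title: str) -> str:
    """Hypothetical helper: slug TITLE, then encode it with Python's punycode codec."""
    slug = title.lower().replace(" ", "-")
    return slug.encode("punycode").decode("ascii")


def is_portable(reference: str) -> bool:
    """Return True when REFERENCE only uses characters from [a-z0-9-]."""
    return re.fullmatch(r"[a-z0-9-]+", reference) is not None


for title in ("Stability of TOC links", "こんにちは", "こんにちは コンニチハ"):
    ref = punycode_reference(title)
    print(f"{title!r} -> {ref!r} (portable: {is_portable(ref)})")
```

The encoding is reversible, so no information is lost, but a reader
cannot recover the title from the reference by eye.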

>> references are guaranteed to be unique in the document;
>
> The suffixed number I mentioned ensures this.

Unfortunately, because of those suffixes, you cannot guarantee stable
links across exports, any more than you can with random references.

For example, if you first export

  * Foo
  bar

and if you later modify your document like this

  * Foo
  baz
  * Foo
  bar

the new headline comes first and takes over the reference, so your link
will now point to the "baz" contents instead of "bar".
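
The "Foo" example can be reproduced with a small Python sketch. The
scheme below is an assumption made for illustration (slug the title,
then add a numeric suffix to the second and later occurrences); it is
not the code under discussion, and `assign_references' is a hypothetical
name. Any scheme that numbers duplicates in document order should behave
the same way.

```python
import re
from collections import Counter


def assign_references(headlines):
    """Map a reference to each headline body, in document order.

    Hypothetical scheme: slugify the title, then append a numeric
    suffix to the second and later occurrences of the same slug.
    """
    seen = Counter()
    references = {}
    for title, body in headlines:
        slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
        count = seen[slug]
        seen[slug] += 1
        reference = slug if count == 0 else f"{slug}-{count}"
        references[reference] = body
    return references


first_export = assign_references([("Foo", "bar")])
second_export = assign_references([("Foo", "baz"), ("Foo", "bar")])

print(first_export["foo"])   # contents reached by the link at first export
print(second_export["foo"])  # contents reached by the same link afterwards
```

The two printed values differ: the reference itself is unchanged, but
the contents it designates are not.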

As a side note, this is the reason why I introduced randomness in
references in the first place. We cannot reference the first headline as
"headline-1", the second one as "headline-2", i.e., in a monotonic way,
because we cannot assume their order is fixed.

More importantly, the above is not limited to headlines with the exact
same title. Since your algorithm truncates output, this will happen in
various, less obvious, situations.

>> cross-references between documents are stable.
>
> I'm not quite sure what to make of this.

Since you don't implement something new but re-use the existing caching
mechanism, I don't think this is an issue.

>> Also, header content is not stable enough: when you're linking to the
>> custom ID, you may be able to change the title and yet preserve the
>> link.
>
> Custom IDs still work, so I don't quite see the point here.

How can you be sure?

The point is that in some export back-ends, e.g., ASCII, you will only
provide a single reference for a headline, i.e., not one for the title
and another one for the custom ID. If your reference is based solely on
the title, the reference will break whenever you modify the title
without touching custom ID. I gave an example in an earlier post
already. This is a regression wrt the current system.
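
A rough Python sketch of that situation follows. Everything in it is
hypothetical (the `Headline' class, both helper functions, the sample
titles); it does not reproduce how any Org back-end computes references.
It only contrasts a reference derived from the title alone with one that
prefers the custom ID when a single reference per headline is available.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Headline:
    title: str
    custom_id: Optional[str] = None


def title_only_reference(headline: Headline) -> str:
    """Hypothetical title-based scheme: ignores the custom ID entirely."""
    return headline.title.lower().replace(" ", "-")


def custom_id_first_reference(headline: Headline) -> str:
    """Hypothetical alternative: the custom ID, when set, wins over the title."""
    return headline.custom_id or title_only_reference(headline)


before = Headline("Installation", custom_id="setup")
after = Headline("Getting started", custom_id="setup")  # title edited, ID kept

# Does the single reference survive the title change?
print(title_only_reference(before) == title_only_reference(after))
print(custom_id_first_reference(before) == custom_id_first_reference(after))
```

Only the second comparison holds, which is the property a custom ID is
meant to provide.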

In a nutshell:

- there are very interesting points in your proposal;

- it is not applicable at the moment;

- it greatly improves references for English, it is slightly better
  for Latin languages, and worse for non-Latin ones;

- it does not guarantee link stability during export;

- it introduces a regression wrt custom ID.

Notwithstanding the problem of "puny.el", the regression means it is not
yet suitable as a drop-in replacement for the random
`org-export-get-reference'. With more work, however, it can become an
interesting evolution of `org-export-get-reference'. Since this
regression does not affect HTML export back-ends, it could be used there
in the meantime.

Link stability is still an issue, even if the proposal gives a false
sense of security in that area. I don't think we can solve it without
creating a cache for export, where you store all previous references for
a given file. Even this is not sufficient, because you can export
buffers not attached to files.

Regards,
-- 
Nicolas Goaziou

