emacs-orgmode@gnu.org archives
 help / color / mirror / code / Atom feed
From: Max Nikulin <manikulin@gmail.com>
To: emacs-orgmode@gnu.org
Subject: org parser and priorities of inline elements
Date: Sat, 27 Nov 2021 19:16:08 +0700	[thread overview]
Message-ID: <snt7ia$ct9$1@ciao.gmane.io> (raw)
In-Reply-To: <87v90lzwkm.fsf@localhost>

On 21/11/2021 16:28, Ihor Radchenko wrote:
> 
> Also, is there any reason why we are not simply using punctuation
> character class instead of listing punctuation chars explicitly (and
> only for English)? What about "_你叫什么名字_?"

It seems punctuation character class is too broad. E.g.
     ¿ INVERTED QUESTION MARK
normally appears before words, while "?" is usually after them. I do not 
see anything special in
     (category-set-mnemonics (char-category-set ?¿))
that may help to discriminate such cases.

An example that confuses fontification but not parser:
     : false [[http://te.st/dir?b-=&a=-][verbatim]] fontification
It is a simplified example, original one:
Chris Hunt. Bug: Tildes in URL impact visible link text
Sun, 27 Dec 2020 11:44:07 -0500.
https://list.orgmode.org/CAH+Wm4-_XHUZKFTf=ZtbfnCPvQWkbEoeGs8EpYm+8SPmu8LHFg@mail.gmail.com/

Nicolas Goaziou. Thu, 18 Nov 2021 13:35:19 +0100.
https://list.orgmode.org/87y25l8wvs.fsf@nicolasgoaziou.fr
> Ihor Radchenko writes:
> 
>> My intuition says that the current parser behaviour is not correct. It
>> would make more sense to prioritise link over italics. However, it would
>> require a major change in the parser - instead of a single pass, the
>> parser may parse different types of objects sequentially. The emphasis
>> objects should come last avoiding the markers to have different parents.
> 
> I disagree. Priority should be given to the first object being started.
> This is, IMO, the only sane way to handle syntax.

Origin of such expectation is not only TeX that changes category of 
characters for argument of verbatim commands. In markdown links and code 
have higher priorities than emphasis as well:

     echo 'A _b `c_ d` e_ f' | pandoc -f markdown -t html -
     <p>A <em>b <code>c_ d</code> e</em> f</p>

Org:
     A _b =c_ d= e_ f
export result (it is more concise and easier to read than output of 
`org-element-parse-secondary-string'):
     <p>
     A <span class="underline">b =c</span> d= e_ f
     </p>

Link in markdown:

     echo 'A _b c <https://orgmode.org/index.htm_?k=v> d e_ f' \
          | pandoc -f markdown -t html -
     <p>A <em>b c <a href="https://orgmode.org/index.htm_?k=v" 
class="uri">https://orgmode.org/index.htm_?k=v</a> d e</em> f</p>

Org:
     <p>
     A <span class="underline">b /c &lt;<a 
href="https://orgmode.org/index.htm">https://orgmode.org/index.htm</a></span>?k=v&gt; 
d/ e_ f
     </p>

I can not estimate efforts necessary to implement priorities of objects 
(verbatim - link - emphasis) in org-elements parser since I have not 
looked into its code. Comparing the following snippets, I might naively 
expect some kind of backtracking:

- A /b *c +d e+ f*    g/ h
- A /b *c +d    f* e+    h

I admit that I can be wrong and "first wins" approach handles buffer of 
incomplete parsed entities in a different way.

P.S. In reStructured text simple nesting is not allowed, maybe it is 
possible to use replacements.



  parent reply	other threads:[~2021-11-27 12:16 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-15  0:53 c47b535bb origin/main org-element: Remove dependency on ‘org-emphasis-regexp-components’ Ihor Radchenko
2021-11-15  9:56 ` Nicolas Goaziou
2021-11-15 15:20   ` Ihor Radchenko
2021-11-15 16:25     ` Max Nikulin
2021-11-16  7:43       ` Ihor Radchenko
2021-11-16 21:56         ` Samuel Wales
2021-11-16 22:16           ` Samuel Wales
2021-11-17 16:44         ` Max Nikulin
2021-11-17 22:44           ` Samuel Wales
2021-11-18 12:25           ` Ihor Radchenko
2021-11-18 12:35             ` Nicolas Goaziou
2021-11-18 12:55               ` Ihor Radchenko
2021-11-19  8:18                 ` Nicolas Goaziou
2021-11-19 11:38                   ` [PATCH] " Ihor Radchenko
2021-11-19 12:37                     ` Nicolas Goaziou
2021-11-19 13:53                       ` Ihor Radchenko
2021-11-20 18:25                         ` Nicolas Goaziou
2021-11-21  9:28                           ` Ihor Radchenko
2021-11-22 18:44                             ` Nicolas Goaziou
2021-11-23 14:28                               ` Ihor Radchenko
2021-11-27 12:16                             ` Max Nikulin [this message]
2021-11-27 19:02                               ` org parser and priorities of inline elements Nicolas Goaziou
2023-07-17 11:51                               ` Org markup and non-ASCII punctuation (was: org parser and priorities of inline elements) Ihor Radchenko
2023-07-18  0:03                                 ` Tom Gillespie
2023-07-18  5:07                                   ` Ihor Radchenko
2023-07-18  5:40                                     ` Tom Gillespie
2023-07-18  9:45                                       ` Ihor Radchenko
2021-11-19 16:34             ` c47b535bb origin/main org-element: Remove dependency on ‘org-emphasis-regexp-components’ Max Nikulin
2021-11-20 12:02         ` Max Nikulin
2021-11-21 10:01           ` Ihor Radchenko
2021-11-21 16:36             ` Max Nikulin
2021-11-23 17:05             ` [PATCH] org.el: Warning for unsupported markers in `org-set-emphasis-alist' Max Nikulin
2022-11-04  6:53               ` Ihor Radchenko
2022-11-04 12:31                 ` Max Nikulin
2022-11-05  8:21                   ` Ihor Radchenko
2023-02-02 10:53                     ` [PATCH v5] " Ihor Radchenko
2023-02-06 15:11                       ` Max Nikulin
2023-02-06 16:49                       ` Max Nikulin
2023-02-07 10:47                         ` Should we obsolete org-emphasis-alist? (was: [PATCH v5] org.el: Warning for unsupported markers in `org-set-emphasis-alist') Ihor Radchenko
2023-02-07 12:22                           ` Timothy
2023-02-09 12:11                           ` Max Nikulin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.orgmode.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='snt7ia$ct9$1@ciao.gmane.io' \
    --to=manikulin@gmail.com \
    --cc=emacs-orgmode@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).