To: emacs-orgmode@gnu.org
From: Max Nikulin
Subject: org parser and priorities of inline elements
Date: Sat, 27 Nov 2021 19:16:08 +0700

On 21/11/2021 16:28, Ihor Radchenko wrote:
>
> Also, is there any reason why we are not simply using punctuation
> character class instead of listing punctuation chars explicitly (and
> only for English)? What about "_你叫什么名字_?"

It seems the punctuation character class is too broad. E.g.
¿ INVERTED QUESTION MARK
normally appears before words, while "?" usually comes after them. I do not
see anything special in
    (category-set-mnemonics (char-category-set ?¿))
that may help to discriminate such cases.

An example that confuses fontification but not the parser:

: false [[http://te.st/dir?b-=&a=-][verbatim]] fontification

It is a simplified example, the original one:
Chris Hunt. Bug: Tildes in URL impact visible link text
Sun, 27 Dec 2020 11:44:07 -0500.
https://list.orgmode.org/CAH+Wm4-_XHUZKFTf=ZtbfnCPvQWkbEoeGs8EpYm+8SPmu8LHFg@mail.gmail.com/

Nicolas Goaziou. Thu, 18 Nov 2021 13:35:19 +0100.
https://list.orgmode.org/87y25l8wvs.fsf@nicolasgoaziou.fr
> Ihor Radchenko writes:
>
>> My intuition says that the current parser behaviour is not correct. It
>> would make more sense to prioritise link over italics. However, it would
>> require a major change in the parser - instead of a single pass, the
>> parser may parse different types of objects sequentially. The emphasis
>> objects should come last avoiding the markers to have different parents.
>
> I disagree. Priority should be given to the first object being started.
> This is, IMO, the only sane way to handle syntax.

The origin of such an expectation is not only TeX, which changes the
category of characters for the argument of verbatim commands. In markdown,
links and code have higher priority than emphasis as well:

    echo 'A _b `c_ d` e_ f' | pandoc -f markdown -t html -

A b c_ d e f

Org: A _b =c_ d= e_ f
export result (it is more concise and easier to read than output of
`org-element-parse-secondary-string'):

A b =c d= e_ f

Link in markdown:

    echo 'A _b c d e_ f' \
      | pandoc -f markdown -t html -

A b c https://orgmode.org/index.htm_?k=v d e f

Org:

A b /c <https://orgmode.org/index.htm?k=v> d/ e_ f

I cannot estimate the effort necessary to implement priorities of objects
(verbatim - link - emphasis) in the org-element parser, since I have not
looked into its code. Comparing the following snippets, I might naively
expect some kind of backtracking:

- A /b *c +d e+ f* g/ h
- A /b *c +d f* e+ h

I admit that I can be wrong and that the "first wins" approach handles the
buffer of incompletely parsed entities in a different way.

P.S. In reStructuredText simple nesting is not allowed, maybe it is
possible to use replacements.
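To make the two strategies discussed above easier to compare, here is a toy sketch in Python. It is neither the org-element algorithm nor pandoc's: the names `first_wins`, `verbatim_first`, `VERBATIM` and `UNDERLINE` are made up for illustration, and the regexps ignore the real Org rules for characters allowed before, after and just inside markers. It only shows how "the first object started wins" and "verbatim has priority over emphasis" lead to different trees for the same input.

```python
import re

# Deliberately naive marker patterns (assumption: no pre/post/border rules).
VERBATIM = re.compile(r"=([^=\n]+)=")
UNDERLINE = re.compile(r"_([^_\n]+)_")


def first_wins(text):
    """Single left-to-right pass: the object that opens first claims its span."""
    out, pos = [], 0
    while pos < len(text):
        found = [m for m in (VERBATIM.search(text, pos),
                             UNDERLINE.search(text, pos)) if m]
        if not found:
            out.append(("text", text[pos:]))
            break
        m = min(found, key=lambda match: match.start())
        if m.start() > pos:
            out.append(("text", text[pos:m.start()]))
        if m.re is VERBATIM:
            out.append(("verbatim", m.group(1)))
        else:
            # Emphasis contents are parsed again with the same rule.
            out.append(("underline", first_wins(m.group(1))))
        pos = m.end()
    return out


def verbatim_first(text):
    """Two passes: verbatim spans are located first and hidden from emphasis."""
    spans = [(m.start(), m.end()) for m in VERBATIM.finditer(text)]
    chars = list(text)
    for start, end in spans:
        for i in range(start, end):
            chars[i] = "\0"  # mask markers and contents of verbatim
    masked = "".join(chars)

    def plain(start, end):
        """Split text[start:end] into text and verbatim nodes."""
        nodes, pos = [], start
        for s, e in spans:
            if s >= start and e <= end:
                if s > pos:
                    nodes.append(("text", text[pos:s]))
                nodes.append(("verbatim", text[s + 1:e - 1]))
                pos = e
        if pos < end:
            nodes.append(("text", text[pos:end]))
        return nodes

    out, pos = [], 0
    for m in UNDERLINE.finditer(masked):
        out.extend(plain(pos, m.start()))
        out.append(("underline", plain(m.start() + 1, m.end() - 1)))
        pos = m.end()
    out.extend(plain(pos, len(text)))
    return out


sample = "A _b =c_ d= e_ f"
print(first_wins(sample))
# [('text', 'A '), ('underline', [('text', 'b =c')]), ('text', ' d= e_ f')]
print(verbatim_first(sample))
# [('text', 'A '), ('underline', [('text', 'b '), ('verbatim', 'c_ d'), ('text', ' e')]), ('text', ' f')]
```

In this sketch the first function never looks back, while the second one needs an extra pass over the whole string before emphasis is considered. Whether a real implementation would need a separate pass or some form of backtracking is exactly the open question above.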