From: Tom Gillespie
Date: Mon, 17 Jul 2023 17:03:25 -0700
Subject: Re: Org markup and non-ASCII punctuation (was: org parser and priorities of inline elements)
To: Ihor Radchenko
Cc: Max Nikulin, emacs-orgmode@gnu.org, Timothy, Bastien
In-Reply-To: <874jm2kb7x.fsf@localhost>
References: <87o86mw86r.fsf@localhost> <87fsrxkahq.fsf@nicolasgoaziou.fr> <87fsrxa1j5.fsf@localhost> <878rxoa6lk.fsf@localhost> <87tug93b2a.fsf@localhost> <87y25l8wvs.fsf@nicolasgoaziou.fr> <87r1bd39ny.fsf@localhost> <8735nsv9qo.fsf@nicolasgoaziou.fr> <87mtm09xzf.fsf@localhost> <87zgq02ueq.fsf@nicolasgoaziou.fr> <87h7c89rqr.fsf@localhost> <874k86y997.fsf@nicolasgoaziou.fr> <87v90lzwkm.fsf@localhost> <874jm2kb7x.fsf@localhost>
List-Id: "General discussions about Org-mode."
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"

Hi Ihor,

Thank you for looping me in.

Best,
Tom

The way I have implemented this is by maintaining one explicit list of characters that are safe before markup (pre) and another of characters that are safe after it (post). It is not possible to use Unicode punctuation as a whole for this, because a variety of punctuation marks cannot appear in that position and still be considered markup; those include @, #, and %, to name just a few. Therefore, if we want to do this, we commit to extending and then maintaining the lists of valid pre and post markup delimiters as special cases. Note also that this could change current behavior, because things that previously tokenized as a series of words connected by e.g. underscores could become markup.

The alternative would be (as usual in these cases) for the user to add a zero-width space or something similar between the end of the markup marker and the symbol they want to follow the markup. This solution is (trivially) backward compatible and works for all characters, regardless of whether org-mode has blessed them as sanctioned marks.

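For illustration only, here is a minimal Python sketch of the explicit-list approach described above. The character sets, the marker string, and the function name find_markup are all hypothetical; they are not taken from Org or from any existing parser, and the matching rules are deliberately simplified.

```python
# Hypothetical allow-lists; illustrative only, not Org's authoritative sets.
PRE_CHARS = set(" \t-({'\"")           # characters allowed just before an opening marker
POST_CHARS = set(" \t-.,;:!?')}[\"\\")  # characters allowed just after a closing marker
MARKERS = "*/_=~+"


def find_markup(text, pre=PRE_CHARS, post=POST_CHARS):
    """Return (marker, contents) pairs whose delimiters sit next to allowed characters."""
    found = []
    i = 0
    while i < len(text):
        ch = text[i]
        # An opening marker needs an allowed PRE character (or start of text)
        # before it and a non-space character after it.
        if (ch in MARKERS
                and (i == 0 or text[i - 1] in pre)
                and i + 1 < len(text) and not text[i + 1].isspace()):
            j = text.find(ch, i + 2)
            while j != -1:
                # A closing marker needs a non-space character before it and an
                # allowed POST character (or end of text) after it.
                if (not text[j - 1].isspace()
                        and (j + 1 == len(text) or text[j + 1] in post)):
                    found.append((ch, text[i + 1:j]))
                    i = j  # resume scanning after the closing marker
                    break
                j = text.find(ch, j + 1)
        i += 1
    return found


if __name__ == "__main__":
    print(find_markup("see *bold* here"))   # [('*', 'bold')]
    print(find_markup("foo@*bar*"))         # [] because @ is not in the pre list
    print(find_markup("«_x_»"))             # [] with the default lists
    # Extending the lists changes how the same text tokenizes.
    print(find_markup("«_x_»", PRE_CHARS | {"«"}, POST_CHARS | {"»"}))  # [('_', 'x')]
```

The last two calls show the maintenance cost in miniature: each newly blessed character has to be added to a list by hand, and adding it changes how existing text is read.
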
My inclination would be not to make this change: there is a potentially unbounded number of future "left right neutral" marks whose list we would have to maintain, we would occasionally have to field requests from users to add more, and any such additions would not work with older versions of Org.
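
As a small illustration of the zero-width space alternative mentioned above, the following Python sketch shows what the user-side workaround amounts to at the character level. The helper names separate_markup and strip_zwsp are hypothetical, and whether a given parser accepts U+200B after a closing marker is an assumption that should be checked against that parser.

```python
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE


def separate_markup(marked, following):
    """Join marked-up text and the symbol that follows it with a zero-width space."""
    return marked + ZWSP + following


def strip_zwsp(text):
    """Hypothetical helper: drop zero-width spaces where they should not reach the output."""
    return text.replace(ZWSP, "")


if __name__ == "__main__":
    source = separate_markup("*bold*", "»")
    # One extra, invisible code point now sits between the marker and the symbol.
    print(len("*bold*»"), len(source))
    print(strip_zwsp(source) == "*bold*»")
```

The point of the sketch is that nothing in the parser's lists has to change: the separation lives in the document text itself, which is why it also works with older versions.
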