From: Ihor Radchenko
To: Max Nikulin
Cc: emacs-orgmode@gnu.org, Timothy, Tom Gillespie, Bastien
Subject: Org markup and non-ASCII punctuation (was: org parser and priorities of inline elements)
Date: Mon, 17 Jul 2023 11:51:14 +0000
Message-ID: <874jm2kb7x.fsf@localhost>

Max Nikulin writes:

> On 21/11/2021 16:28, Ihor Radchenko wrote:
>>
>> Also, is there any reason why we are not simply using punctuation
>> character class instead of listing punctuation chars explicitly (and
>> only for English)? What about "_你叫什么名字_？"
>
> It seems punctuation character class is too broad. E.g.
>     ¿ INVERTED QUESTION MARK
> normally appears before words, while "?" is usually after them. I do not
> see anything special in
>     (category-set-mnemonics (char-category-set ?¿))
> that may help to discriminate such cases.

The last resort is define-category, where we can manage exceptions.

But I think that even without distinguishing ?¿, we can improve the
situation for CJK users a lot.

We can probably split character categories into "left", "right", and
"neutral", with "(" as an example of "left", ")" as an example of
"right", and " " as an example of "neutral". We can start from the
information available in Unicode data and adjust it as necessary.

Then, emphasis will be defined as

    PRE MARKER ... MARKER POST

with
    PRE  = left + neutral category
    POST = right + neutral category

-- 
Ihor Radchenko // yantar92,
Org mode contributor
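
The left/right/neutral split and the PRE MARKER ... MARKER POST rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not how Org parses emphasis: it is written in Python rather than Emacs Lisp so that it can lean on the standard `unicodedata` module, the names `OVERRIDES`, `side`, and `find_emphasis` are hypothetical, and mapping the Unicode general categories Ps/Pi to "left" and Pe/Pf to "right" is one possible reading of "information we can extract from Unicode data". The override table stands in for the hand-managed exceptions, since ¿ and ？ both carry the same general category (Po).

```python
import unicodedata

# Hypothetical exception table: Unicode files these under "Po" (other
# punctuation), the same category as "?", so direction is assigned by hand.
OVERRIDES = {"\u00bf": "left", "\u00a1": "left"}  # INVERTED QUESTION/EXCLAMATION MARK


def side(ch):
    """Classify CH as 'left', 'right', 'neutral', or None (word-like)."""
    if ch in OVERRIDES:
        return OVERRIDES[ch]
    cat = unicodedata.category(ch)
    if cat in ("Ps", "Pi"):        # opening brackets, initial quotes
        return "left"
    if cat in ("Pe", "Pf"):        # closing brackets, final quotes
        return "right"
    if cat.startswith(("P", "Z")) or ch in "\t\n":
        return "neutral"           # other punctuation, spaces, line breaks
    return None                    # letters, digits, symbols, ...


def can_open(text, i):
    """PRE check: the marker at I follows a left/neutral char or starts the text."""
    return i == 0 or side(text[i - 1]) in ("left", "neutral")


def can_close(text, j):
    """POST check: the marker at J precedes a right/neutral char or ends the text."""
    return j == len(text) - 1 or side(text[j + 1]) in ("right", "neutral")


def find_emphasis(text, marker="_"):
    """Return the contents of every PRE MARKER ... MARKER POST span in TEXT."""
    spans = []
    i = 0
    while i < len(text):
        if text[i] == marker and can_open(text, i):
            # Require at least one character between the two markers.
            j = text.find(marker, i + 2)
            while j != -1 and not can_close(text, j):
                j = text.find(marker, j + 1)
            if j != -1:
                spans.append(text[i + 1:j])
                i = j + 1
                continue
        i += 1
    return spans


if __name__ == "__main__":
    print(find_emphasis("_你叫什么名字_？"))
    print(find_emphasis("snake_case_name"))
```

In this sketch the fullwidth question mark is classified as neutral, so it is accepted as POST and the CJK example from the thread is recognized, while underscores inside `snake_case_name` are rejected because their neighbours are letters. Because plain "Po" punctuation stays neutral, "?" would also be accepted as PRE; tightening that would need further entries in the exception table.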