From: Eric Abrahamsen
To: Maxim Nikulin
Cc: emacs-orgmode@gnu.org
Subject: Re: Bug: ODT export of Chinese text inserts spaces for line breaks
Date: Tue, 29 Jun 2021 11:19:23 -0700
Message-ID: <87sg10ttxw.fsf@ericabrahamsen.net>
In-Reply-To: (Maxim Nikulin's message of "Wed, 30 Jun 2021 00:01:00 +0700")
References: <17a55e0b01d.11be78c6c72761.7557666657037565597@zoho.com> <557d5f5d.2eed.17a56147ed0.Coremail.tumashu@163.com>

Maxim Nikulin writes:

> On 29/06/2021 10:47, James Harkins wrote:
>> So, it would make sense to add a rule to the exporter: if one of the
>> characters before or after a source-text line break is a Chinese,
>> Japanese or Korean character, do not add a space.
>
> On 29/06/2021 11:43, tumashu wrote:
>> You can try the below config :-)
>>     (let ((regexp "[[:multibyte:]]")
>>           (string text))
>>       (setq string
>>             (replace-regexp-in-string
>>              (format "\\(%s\\) *\n *\\(%s\\)" regexp regexp)
>>              "\\1\\2" string))
>
> Notice that [[:multibyte:]] means almost any non-ASCII script, e.g.
> Cyrillic:
>
> (let ((sample "abc абв def"))
>   (and (string-match "[[:multibyte:]]\+" sample)
>        (match-string 0 sample)))
> "абв"
>
> It seems, `org-fill-paragraph' M-q is smart enough to avoid a space
> before or after a CJK character, so it is possible to determine
> correct way to splice lines, despite e.g. "Script" Unicode property is
> not exposed to elisp:
> https://www.gnu.org/software/emacs/manual/html_node/elisp/Character-Properties.html
> (Anyway maintaining explicit list of scripts is not a straightforward
> approach.)

There are a few ways to approach this:

(aref char-script-table ?中) -> 'han
(string-match-p "\\cc" "中") -> 0
(aref (char-category-set ?中) ?|) -> t
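For illustration, here is a rough sketch of how the category check could drive the rule James proposed (drop the space when the character on either side of the line break is CJK). The names `my/cjk-breakable-p' and `my/join-lines-cjk-aware' are made up for this example and are not part of Org or Emacs. The sketch assumes TEXT is a single paragraph, and that category `|' is an acceptable stand-in for "CJK character"; an exporter would probably need something more careful.

```elisp
;; Hypothetical helper, not an existing Emacs or Org function.
(defun my/cjk-breakable-p (char)
  "Return non-nil if CHAR is a character in category `|'.
CHAR may be nil (e.g. at the start or end of the text)."
  (and char (aref (char-category-set char) ?|)))

;; Hypothetical helper, not an existing Emacs or Org function.
(defun my/join-lines-cjk-aware (text)
  "Join the lines of TEXT, assumed to be a single paragraph.
A line break, together with any spaces around it, is dropped when
the character before or after it is in category `|'; otherwise
it is replaced by a single space."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (while (re-search-forward " *\n *" nil t)
      ;; Look at the characters on both sides of the matched break.
      (let ((prev (char-before (match-beginning 0)))
            (next (char-after (match-end 0))))
        (replace-match
         (if (or (my/cjk-breakable-p prev)
                 (my/cjk-breakable-p next))
             ""
           " ")
         t t)))
    (buffer-string)))
```

Unlike the [[:multibyte:]] regexp, this should leave Cyrillic and other space-separated scripts alone: (my/join-lines-cjk-aware "абв\nгде") still gets its space, while (my/join-lines-cjk-aware "中文\n字") is spliced without one.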