From: Maxim Nikulin
To: emacs-orgmode@gnu.org
Subject: Re: Bug: ODT export of Chinese text inserts spaces for line breaks
Date: Wed, 30 Jun 2021 19:22:21 +0700
On 29/06/2021 10:47, James Harkins wrote:
> * Test
> 1本人不想亲自拿到学历学位证书、急于离校者,可书面委托他人代领学历学位证
> 书,29日起即可离校;2本人想亲自领取学历学位证书者,按学校规定的程序及有关
> 要求办理离校手续,领取相关证书后离校;
> Exporting to ODT produces the following (body text, omitting titles,
> headers and such).
>
> 1本人不想亲自拿到学历学位证书、急于离校者,可书面委托他人代领学历学位证 书,29日起即可离校;2本人想亲自领取学历学位证书者,按学校规定的程序及有关 要求办理离校手续,领取相关证书后离校;

Confirmed: newlines are copied into the ODT document as is, and LibreOffice renders them as spaces. I did not try HTML export, since I am not sure whether browsers should join the lines of a paragraph into a continuous string without spaces. Perhaps some attributes (e.g. "lang") are needed for proper rendering; however, "#+LANGUAGE: cn" does not help, even though LibreOffice recognizes the paragraph as Chinese.
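To make the expected behaviour concrete, here is a minimal Python sketch of the line splicing an exporter would need instead of copying newlines verbatim. It is not Org's code; the helper name `splice_lines` is hypothetical, and it assumes a simple heuristic: a newline between two characters whose Unicode East_Asian_Width is W (wide) or F (fullwidth) is dropped, and any other newline becomes a single space.

```python
import unicodedata


def is_wide(ch: str) -> bool:
    """True for East Asian Wide (W) or Fullwidth (F) characters, e.g. Han ideographs."""
    return unicodedata.east_asian_width(ch) in ("W", "F")


def splice_lines(paragraph: str) -> str:
    """Join the lines of one paragraph into a single line (hypothetical helper).

    A line break between two wide characters is dropped entirely;
    any other line break becomes a single space, as in Latin-script text.
    """
    result = ""
    for line in (raw.strip() for raw in paragraph.split("\n")):
        if not line:
            continue  # skip empty or whitespace-only lines
        if not result:
            result = line
        elif is_wide(result[-1]) and is_wide(line[0]):
            result += line  # wide characters on both sides: no separator
        else:
            result += " " + line  # ordinary word boundary
    return result


print(splice_lines("学历学位证\n书,29日起即可离校"))  # 学历学位证书,29日起即可离校
```

Requiring both neighbours to be wide is a deliberate simplification, so a CJK/Latin boundary such as "中文\nEnglish" keeps its space; a real implementation may choose differently there.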
On 30/06/2021 01:19, Eric Abrahamsen wrote:
> There are a few ways to approach this:
>
> (aref char-script-table ?中) -> 'han
> (string-match-p "\\cc" "中") -> 0
> (aref (char-category-set ?中) ?|) -> t

Thank you. I had not noticed all the features hidden behind \c. I believe (rx (category can-break)) is more readable, and I am a bit surprised that there are no descriptive aliases for character categories such as ?|. Just to add another example:

(category-set-mnemonics (char-category-set ?ф)) -> ".LYchjy"

and `describe-categories' explains what each letter means.

As for splicing lines, I found `fill-delete-newlines', which, in addition to the ?| category, uses `fill-nospace-between-words-table' to decide whether the space should be omitted when lines are joined. There are also some variables to tune this behavior.
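Outside Emacs there is no char-script-table or category table, but similar checks can be approximated. The Python sketch below is only a rough analog of the Elisp calls quoted above, not a port: `char_script` guesses the script from the Unicode character name using a partial, hypothetical mapping, and `han_match_p` stands in for "\\cc" with a hand-written range that, as an assumption, covers only the basic CJK Unified Ideographs block (U+4E00..U+9FFF).

```python
import re
import unicodedata
from typing import Optional

# Partial, hypothetical mapping from Unicode name prefixes to script symbols,
# loosely mirroring what (aref char-script-table CHAR) returns in Emacs.
_SCRIPT_PREFIXES = (
    ("CJK UNIFIED IDEOGRAPH", "han"),
    ("CYRILLIC", "cyrillic"),
    ("LATIN", "latin"),
)


def char_script(ch: str) -> str:
    """Guess the script of a single character from its Unicode name."""
    name = unicodedata.name(ch, "")  # "" for characters without a name
    for prefix, script in _SCRIPT_PREFIXES:
        if name.startswith(prefix):
            return script
    return "other"


# Rough stand-in for the \cc category in (string-match-p "\\cc" "中").
# Assumption: only the basic CJK Unified Ideographs block is covered.
_HAN_RE = re.compile("[\u4e00-\u9fff]")


def han_match_p(s: str) -> Optional[int]:
    """Index of the first matching character, or None (like string-match-p)."""
    m = _HAN_RE.search(s)
    return m.start() if m else None


print(char_script("中"))  # han
print(han_match_p("中"))  # 0
print(char_script("ф"))   # cyrillic
```

The two checks disagree at the edges: for U+3400 (a CJK Extension A ideograph) `char_script` returns "han" because its name starts with "CJK UNIFIED IDEOGRAPH", while the regex range misses it. This illustrates why a maintained table such as char-script-table tends to be more robust than hand-written ranges.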