emacs-orgmode@gnu.org archives
* [BUG] [ox-odt] ODT export Chinese/Han script inserts unexpected spaces in each consecutive line
@ 2023-01-12  3:49 Cantoraz Chou
  2023-01-13  8:20 ` Ihor Radchenko
  0 siblings, 1 reply; 5+ messages in thread
From: Cantoraz Chou @ 2023-01-12  3:49 UTC
  To: emacs-orgmode

[-- Attachment #1: Type: text/plain, Size: 1185 bytes --]

Emacs  : GNU Emacs 28.2 (build 1, x86_64-pc-linux-gnu, GTK+ Version
3.24.34, cairo version 1.17.6) of 2022-09-12
Package: Org mode version 9.6

This commit [[https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=3502ce2dbb29b70cdbb978d144322d48cb00f26d][ox-odt:
Fix newlines replaced by spaces in Han script]]
introduces a strange bug.

Exporting to ODT inserts an unexpected space into each line.

E.g., the simplest case (a paragraph composed of two consecutive lines)

#+begin_src org
那时,天下人的口音、言语都是一样。
那时,天下人的口音、言语都是一样。
#+end_src

produces the following text:

#+begin_example
那时,天下人的口音、言语都是一样。那 时,天下人的口音、言语都是一样。
#+end_example

Note the space in `那 时' in the second sentence.

The space seems to be inserted at random. With a more detailed example
(attached), I found that exactly one space is inserted into each
continuation line, roughly in the middle of the line.

If I manually revert the code of this commit to the previous
implementation, these unexpected spaces disappear.
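Judging by its title, the commit aims to keep a line break inside Han text from turning into a space when the lines of a paragraph are joined. A minimal Python sketch of that intended behaviour follows; it is a hypothetical illustration, not the actual ox-odt Emacs Lisp code, and `join_paragraph_lines` plus the use of Unicode East Asian Width as a stand-in for "Han script" are assumptions.

```python
import unicodedata


def is_wide(ch: str) -> bool:
    """True for characters Unicode classifies as Wide (W) or Fullwidth (F)."""
    return unicodedata.east_asian_width(ch) in ("W", "F")


def join_paragraph_lines(text: str) -> str:
    """Hypothetical sketch: join the lines of one paragraph.

    A newline between two wide (e.g. Han) characters is dropped entirely;
    any other newline becomes a single space, as for Latin text.
    """
    lines = text.split("\n")
    result = lines[0]
    for line in lines[1:]:
        if result and line and is_wide(result[-1]) and is_wide(line[0]):
            result += line          # Han text: no separator at the line break
        else:
            result += " " + line    # Latin text: line break becomes a space
    return result


if __name__ == "__main__":
    print(join_paragraph_lines("那时,天下人的口音、言语都是一样。\n那时,天下人的口音、言语都是一样。"))
    print(join_paragraph_lines("The quick brown fox jumps\nover the lazy dog."))
```

With the two-line example above, the expected result is the two sentences joined with no space anywhere, i.e. without the stray `那 时'.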

[-- Attachment #2: Screenshot_20230112_103905.png --]
[-- Type: image/png, Size: 201065 bytes --]

[-- Attachment #3: example.org --]
[-- Type: application/octet-stream, Size: 4212 bytes --]

#+options: author:nil
# #+odt_styles_file: Templates/note-style.odt

The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.

那时,天下人的口音、言语都是一样。
那时,天下人的口音、言语都是一样。

那时,天下人的口音、言语都是一样。
那时,天下人的口音、言语都是一样。
那时,天下人的口音、言语都是一样。

天地玄黄,宇宙洪荒。日月盈昃,辰宿列张。寒来暑往,秋收冬藏。闰余成岁,律召调阳。
云腾致雨,露结为霜。金生丽水,玉出昆冈。剑号巨阙,珠称夜光。果珍李柰,菜重芥姜。
海咸河淡,鳞潜羽翔。龙师火帝,鸟官人皇。始制文字,乃服衣裳。推位让国,有虞陶唐。
吊民伐罪,周发殷汤。坐朝问道,垂拱平章。爱育黎首,臣伏戎羌。遐迩一体,率宾归王。
鸣凤在竹,白驹食场。化被草木,赖及万方。

盖此身发,四大五常。恭惟鞠养,岂敢毁伤。女慕贞洁,男效才良。知过必改,得能莫忘。
罔谈彼短,靡恃己长。信使可覆,器欲难量。墨悲丝染,诗赞羔羊。

景行维贤,克念作圣。德建名立,形端表正。空谷传声,虚堂习听。祸因恶积,福缘善庆。
尺璧非宝,寸阴是竞。资父事君,曰严与敬。孝当竭力,忠则尽命。临深履薄,夙兴温凊。
似兰斯馨,如松之盛。川流不息,渊澄取映。容止若思,言辞安定。笃初诚美,慎终宜令。
荣业所基,籍甚无竟。学优登仕,摄职从政。存以甘棠,去而益咏。乐殊贵贱,礼别尊卑。
上和下睦,夫唱妇随。外受傅训,入奉母仪。诸姑伯叔,犹子比儿。孔怀兄弟,同气连枝。
交友投分,切磨箴规。仁慈隐恻,造次弗离。节义廉退,颠沛匪亏。性静情逸,心动神疲。
守真志满,逐物意移。坚持雅操,好爵自縻。

都邑华夏,东西二京。背邙面洛,浮渭据泾。宫殿盘郁,楼观飞惊。图写禽兽,画彩仙灵。
丙舍旁启,甲帐对楹。肆筵设席,鼓瑟吹笙。升阶纳陛,弁转疑星。右通广内,左达承明。
既集坟典,亦聚群英。杜稿钟隶,漆书壁经。府罗将相,路侠槐卿。户封八县,家给千兵。
高冠陪辇,驱毂振缨。世禄侈富,车驾肥轻。策功茂实,勒碑刻铭。磻溪伊尹,佐时阿衡。
奄宅曲阜,微旦孰营。桓公匡合,济弱扶倾。绮回汉惠,说感武丁。俊乂密勿,多士实宁。
晋楚更霸,赵魏困横。假途灭虢,践土会盟。何遵约法,韩弊烦刑。起翦颇牧,用军最精。
宣威沙漠,驰誉丹青。九州禹迹,百郡秦并。岳宗泰岱,禅主云亭。雁门紫塞,鸡田赤城。
昆池碣石,钜野洞庭。旷远绵邈,岩岫杳冥。

治本于农,务兹稼穑。俶载南亩,我艺黍稷。税熟贡新,劝赏黜陟。孟轲敦素,史鱼秉直。
庶几中庸,劳谦谨敕。聆音察理,鉴貌辨色。贻厥嘉猷,勉其祗植。省躬讥诫,宠增抗极。
殆辱近耻,林皋幸即。两疏见机,解组谁逼。索居闲处,沉默寂寥。求古寻论,散虑逍遥。
欣奏累遣,戚谢欢招。渠荷的历,园莽抽条。枇杷晚翠,梧桐蚤凋。陈根委翳,落叶飘摇。
游鹍独运,凌摩绛霄。

耽读玩市,寓目囊箱。易輶攸畏,属耳垣墙。具膳餐饭,适口充肠。饱饫烹宰,饥厌糟糠。
亲戚故旧,老少异粮。妾御绩纺,侍巾帷房。纨扇圆洁,银烛炜煌。昼眠夕寐,蓝笋象床。
弦歌酒宴,接杯举觞。矫手顿足,悦豫且康。嫡后嗣续,祭祀烝尝。稽颡再拜,悚惧恐惶。
笺牒简要,顾答审详。骸垢想浴,执热愿凉。驴骡犊特,骇跃超骧。诛斩贼盗,捕获叛亡。

布射僚丸,嵇琴阮啸。恬笔伦纸,钧巧任钓。释纷利俗,并皆佳妙。毛施淑姿,工颦妍笑。
年矢每催,曦晖朗曜。璇玑悬斡,晦魄环照。指薪修祜,永绥吉劭。矩步引领,俯仰廊庙。
束带矜庄,徘徊瞻眺。孤陋寡闻,愚蒙等诮。谓语助者,焉哉乎也。


* Re: [BUG] [ox-odt] ODT export Chinese/Han script inserts unexpected spaces in each consecutive line
  2023-01-12  3:49 [BUG] [ox-odt] ODT export Chinese/Han script inserts unexpected spaces in each consecutive line Cantoraz Chou
@ 2023-01-13  8:20 ` Ihor Radchenko
  2023-01-14 13:44   ` Cantoraz Chou
  0 siblings, 1 reply; 5+ messages in thread
From: Ihor Radchenko @ 2023-01-13  8:20 UTC
  To: Cantoraz Chou; +Cc: emacs-orgmode

Cantoraz Chou <cantoraz@gmail.com> writes:

> This commit [[https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=3502ce2dbb29b70cdbb978d144322d48cb00f26d][ox-odt:
> Fix newlines replaced by spaces in Han script]]
> introduces a strange bug.
>
> Exporting to ODT inserts an unexpected space into each line.
>
> E.g., the simplest case (a paragraph composed of two consecutive lines)
>
> #+begin_src org
> 那时,天下人的口音、言语都是一样。
> 那时,天下人的口音、言语都是一样。
> #+end_src
>
> produces the following text:
>
> #+begin_example
> 那时,天下人的口音、言语都是一样。那 时,天下人的口音、言语都是一样。
> #+end_example
>
> Note the space in `那 时' in the second sentence.

Thanks for reporting!
Fixed, on bugfix.
https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=cd967ce00

That was rather silly. I forgot that Chinese characters occupy two columns.
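The point here is that a Han character is one character but occupies two display columns, so code that mixes character offsets with column positions goes wrong on CJK text. A minimal Python sketch of the difference, assuming Unicode East Asian Width as the width rule (Emacs uses its own `char-width-table', which can differ, notably for ambiguous-width characters); `display_width` and `column_to_index` are hypothetical helpers:

```python
import unicodedata


def display_width(s: str) -> int:
    """Columns `s` typically occupies in a monospaced display:
    Wide (W) and Fullwidth (F) characters count as 2, everything else as 1."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in s)


def column_to_index(s: str, column: int) -> int:
    """Index of the character that starts at or covers 0-based display
    `column`; returns len(s) when the column lies past the end of `s`."""
    col = 0
    for i, ch in enumerate(s):
        w = display_width(ch)
        if column < col + w:
            return i
        col += w
    return len(s)


if __name__ == "__main__":
    s = "那时ABC"
    print(len(s), display_width(s))      # 5 characters, 7 columns
    # Column 4 is the letter "A", i.e. character index 2, not index 4.
    print(column_to_index(s, 4))
```

Using a column number directly as a character index (or vice versa) therefore lands at the wrong position as soon as wide characters appear in the line.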

P.S. As a Chinese user, do you have other gotchas Org might need to
address?

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>



* Re: [BUG] [ox-odt] ODT export Chinese/Han script inserts unexpected spaces in each consecutive line
  2023-01-13  8:20 ` Ihor Radchenko
@ 2023-01-14 13:44   ` Cantoraz Chou
  2023-01-14 13:54     ` Timestamp format during export (was: [BUG] [ox-odt] ODT export Chinese/Han script inserts unexpected spaces in each consecutive line) Ihor Radchenko
  2023-01-14 14:00     ` Pixel table alignment " Ihor Radchenko
  0 siblings, 2 replies; 5+ messages in thread
From: Cantoraz Chou @ 2023-01-14 13:44 UTC
  To: Ihor Radchenko; +Cc: emacs-orgmode

Ihor Radchenko <yantar92@posteo.net> wrote:

> Fixed, on bugfix.
> https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=cd967ce00

Thanks for your fix!

-----
[OT]

> P.S. As a Chinese user, do you have other gotchas Org might need to
> address?

In my experience, most weird problems are related to (1) language
conventions or (2) mixed use of ideographs and alphanumeric characters.

E.g., for (1):

The Org export keyword `DATE' uses a fixed built-in timestamp format
that is rarely used in Chinese. It is impossible to customize the format
only for export without also changing the format displayed in Org.

At present, I have to use a macro to format an Org timestamp into the
string used for the `DATE' keyword.

It would be best to provide an option/variable to set the timestamp
format used in exported documents.

E.g., for (2):

I have the `pangu-spacing' package installed, with
`pangu-spacing-real-insert-separtor' set to nil. I also use a
well-chosen monospaced font so that one CJK character aligns with two
alphanumeric characters.

Consider the following table:

#+begin_src org
| Alphanumeric | Ideograph | Symbol               | Mixed                    |
|--------------+-----------+----------------------+--------------------------|
| ABC αβγ 123  | 一二三四  | ,。、?!;:“”/‘’ | 1一B二c三α四             |
| ABC αβγ 123  | 一二三四  | ,。、?!;:“”/‘’ | A一/二“三”四            |
| ABC αβγ 123  | 一二三四  | ◉○🞛◇✿❀⚛✸     | ◉一○二🞛三◇四✿❀⚛✸ |
| ABC αβγ 123  | 一二三四  | ▶▷✶⬢⬡☀☼       | ▶▷✶⬢⬡☀☼           |
#+end_src

The right border of the `Mixed' column isn't aligned, because
`pangu-spacing' inserts a space between Chinese and English characters
using an overlay. The actual display looks like this:

#+begin_example
| Alphanumeric | Ideograph | Symbol               | Mixed                    |
|--------------+-----------+----------------------+--------------------------|
| ABC αβγ 123  | 一二三四  | ,。、?!;:“”/‘’ | 1 一 B 二 c 三α四             |
| ABC αβγ 123  | 一二三四  | ,。、?!;:“”/‘’ | A 一/二“三”四            |
| ABC αβγ 123  | 一二三四  | ◉○🞛◇✿❀⚛✸     | ◉一○二🞛三◇四✿❀⚛✸ |
| ABC αβγ 123  | 一二三四  | ▶▷✶⬢⬡☀☼       | ▶▷✶⬢⬡☀☼           |
#+end_example
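The extra width in the `Mixed' column comes from the spaces added at script boundaries. A simplified Python sketch of the pangu-spacing idea: put a space at every boundary between a CJK ideograph and an ASCII letter or digit. The character classes are an assumption (basic CJK Unified Ideographs only) and the real package's rules may differ; note that with `pangu-spacing-real-insert-separtor' nil the package only displays these spaces through an overlay, so the buffer text that Org's table code measures is unchanged.

```python
import re

# Assumed simplification: CJK Unified Ideographs vs. ASCII letters/digits.
_BOUNDARY = re.compile(
    r"(?<=[\u4e00-\u9fff])(?=[A-Za-z0-9])"   # ideograph followed by ASCII
    r"|(?<=[A-Za-z0-9])(?=[\u4e00-\u9fff])"  # ASCII followed by ideograph
)


def pangu_space(text: str) -> str:
    """Return `text` with one space at each ideograph/ASCII boundary."""
    return _BOUNDARY.sub(" ", text)


if __name__ == "__main__":
    for cell in ["1一B二c三α四", "A一/二“三”四"]:
        spaced = pangu_space(cell)
        # Each added space widens the displayed cell by one column.
        print(spaced, len(spaced) - len(cell))
```

For the first two `Mixed' cells this yields `1 一 B 二 c 三α四' (5 extra columns) and `A 一/二“三”四' (1 extra column), the same spacing as the displayed rows above; Greek `α' and punctuation get no space under these assumed classes.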

Also, if I move point vertically with the Up/Down keys in the `Symbol'
or `Mixed' column, point shifts left or right. This has nothing to do
with `pangu-spacing'; it is probably because some symbols have
different widths in the font.

I am usually not sure whether these problems count as bugs, or which
extensions are responsible for them. Fortunately, they are non-fatal
and generally tolerable.


* Timestamp format during export (was: [BUG] [ox-odt] ODT export Chinese/Han script inserts unexpected spaces in each consecutive line)
  2023-01-14 13:44   ` Cantoraz Chou
@ 2023-01-14 13:54     ` Ihor Radchenko
  2023-01-14 14:00     ` Pixel table alignment " Ihor Radchenko
  1 sibling, 0 replies; 5+ messages in thread
From: Ihor Radchenko @ 2023-01-14 13:54 UTC
  To: Cantoraz Chou; +Cc: emacs-orgmode

Cantoraz Chou <cantoraz@gmail.com> writes:

> In my experience, most weird problems are related to (1) language
> conventions or (2) mixed use of ideographs and alphanumeric characters.
>
> E.g., for (1):
>
> The Org export keyword `DATE' uses a fixed built-in timestamp format
> that is rarely used in Chinese. It is impossible to customize the format
> only for export without also changing the format displayed in Org.
>
> At present, I have to use a macro to format an Org timestamp into the
> string used for the `DATE' keyword.
>
> It would be best to provide an option/variable to set the timestamp
> format used in exported documents.

There is `org-time-stamp-custom-formats' variable, which is used when
exporting timestamps. Do you need a separate variable for export only
(that does not affect `org-display-custom-times')?
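For illustration only, the kind of export-time reformatting discussed here can be sketched in Python: parse an Org date timestamp and render it in a common Chinese date style. The helper name `org_date_to_chinese` and the target format `YYYY年M月D日 星期X` are assumptions; inside Org this would go through `org-time-stamp-custom-formats' or a macro, as mentioned above.

```python
import re
from datetime import date

# Active <2023-01-14 Sat> or inactive [2023-01-14 Sat 10:00] date stamps;
# any day name, time or repeater after the date is ignored in this sketch.
_ORG_DATE = re.compile(r"[<\[](\d{4})-(\d{2})-(\d{2})(?: [^>\]]*)?[>\]]")

# date.weekday(): Monday == 0 ... Sunday == 6.
_WEEKDAYS = "一二三四五六日"


def org_date_to_chinese(stamp: str) -> str:
    """Hypothetical helper: '<2023-01-14 Sat>' -> '2023年1月14日 星期六'."""
    m = _ORG_DATE.fullmatch(stamp.strip())
    if m is None:
        raise ValueError(f"not an Org date stamp: {stamp!r}")
    d = date(int(m.group(1)), int(m.group(2)), int(m.group(3)))
    return f"{d.year}年{d.month}月{d.day}日 星期{_WEEKDAYS[d.weekday()]}"


if __name__ == "__main__":
    print(org_date_to_chinese("<2023-01-14 Sat>"))
```

The weekday is recomputed from the date rather than translated from the stamp's English day name, so a stale day name in the source stamp does not leak into the output.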




* Pixel table alignment (was: [BUG] [ox-odt] ODT export Chinese/Han script inserts unexpected spaces in each consecutive line)
  2023-01-14 13:44   ` Cantoraz Chou
  2023-01-14 13:54     ` Timestamp format during export (was: [BUG] [ox-odt] ODT export Chinese/Han script inserts unexpected spaces in each consecutive line) Ihor Radchenko
@ 2023-01-14 14:00     ` Ihor Radchenko
  1 sibling, 0 replies; 5+ messages in thread
From: Ihor Radchenko @ 2023-01-14 14:00 UTC
  To: Cantoraz Chou; +Cc: emacs-orgmode

Cantoraz Chou <cantoraz@gmail.com> writes:

> I have the `pangu-spacing' package installed, with
> `pangu-spacing-real-insert-separtor' set to nil. I also use a
> well-chosen monospaced font so that one CJK character aligns with two
> alphanumeric characters.
>
> Consider the following table:
>
> #+begin_src org
> | Alphanumeric | Ideograph | Symbol               | Mixed                    |
> |--------------+-----------+----------------------+--------------------------|
> | ABC αβγ 123  | 一二三四  | ,。、?!;:“”/‘’ | 1一B二c三α四             |
> | ABC αβγ 123  | 一二三四  | ,。、?!;:“”/‘’ | A一/二“三”四            |
> | ABC αβγ 123  | 一二三四  | ◉○🞛◇✿❀⚛✸     | ◉一○二🞛三◇四✿❀⚛✸ |
> | ABC αβγ 123  | 一二三四  | ▶▷✶⬢⬡☀☼       | ▶▷✶⬢⬡☀☼           |
> #+end_src
>
> The right border of the `Mixed' column isn't aligned, because
> `pangu-spacing' inserts a space between Chinese and English characters
> using an overlay. The actual display looks like this:
>
> #+begin_example
> | Alphanumeric | Ideograph | Symbol               | Mixed                    |
> |--------------+-----------+----------------------+--------------------------|
> | ABC αβγ 123  | 一二三四  | ,。、?!;:“”/‘’ | 1 一 B 二 c 三α四             |
> | ABC αβγ 123  | 一二三四  | ,。、?!;:“”/‘’ | A 一/二“三”四            |
> | ABC αβγ 123  | 一二三四  | ◉○🞛◇✿❀⚛✸     | ◉一○二🞛三◇四✿❀⚛✸ |
> | ABC αβγ 123  | 一二三四  | ▶▷✶⬢⬡☀☼       | ▶▷✶⬢⬡☀☼           |
> #+end_example
>
> Also, if I move point vertically with the Up/Down keys in the `Symbol'
> or `Mixed' column, point shifts left or right. This has nothing to do
> with `pangu-spacing'; it is probably because some symbols have
> different widths in the font.
>
> I am usually not sure whether these problems count as bugs, or which
> extensions are responsible for them. Fortunately, they are non-fatal
> and generally tolerable.

This is not a bug, but rather a deficiency of Org's table alignment code.
The current code only takes into account the number of characters, not
their actual width in pixels.

The current approach is reasonably good for plain-text alignment, but we
can indeed do better with respect to how tables are displayed in Emacs.
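The gap between measuring cells by character count and by display columns can be sketched in Python. This pads each cell to a common width using Unicode East Asian Width, an assumed stand-in for how an editor counts columns; `display_width` and `align_row` are hypothetical helpers. Column-based padding still assumes every glyph is exactly one or two columns wide, so a font that draws some symbols at other widths (as in the `Symbol' column above) breaks it, which is what pixel-based alignment such as valign addresses.

```python
import unicodedata


def display_width(s: str) -> int:
    """Wide/Fullwidth characters count as 2 columns, all others as 1."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in s)


def align_row(cells, widths, measure):
    """Render one table row, padding cell i to widths[i] as judged by `measure`."""
    padded = [c + " " * (w - measure(c)) for c, w in zip(cells, widths)]
    return "| " + " | ".join(padded) + " |"


if __name__ == "__main__":
    rows = [["Alphanumeric", "Ideograph"], ["ABC 123", "一二三四"]]
    for measure in (len, display_width):
        widths = [max(measure(r[i]) for r in rows) for i in range(2)]
        print(f"-- measured with {measure.__name__}")
        for r in rows:
            print(align_row(r, widths, measure))
```

Measured with `len', the `一二三四' cell gets five padding spaces and overshoots the header by four columns; measured with `display_width' it gets one and the borders line up, provided the font really renders each ideograph at exactly two columns.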

There is an existing package that implements this feature:
https://github.com/casouri/valign

Nobody has proposed an equivalent patch for Org so far.




Thread overview: 5+ messages
2023-01-12  3:49 [BUG] [ox-odt] ODT export Chinese/Han script inserts unexpected spaces in each consecutive line Cantoraz Chou
2023-01-13  8:20 ` Ihor Radchenko
2023-01-14 13:44   ` Cantoraz Chou
2023-01-14 13:54     ` Timestamp format during export (was: [BUG] [ox-odt] ODT export Chinese/Han script inserts unexpected spaces in each consecutive line) Ihor Radchenko
2023-01-14 14:00     ` Pixel table alignment " Ihor Radchenko

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs/org-mode.git
