From: Eric Abrahamsen
Subject: Re: double-width characters in tables
Date: Wed, 13 Feb 2013 09:46:15 +0800
Message-ID: <87a9r93s0o.fsf@ericabrahamsen.net>
To: emacs-orgmode@gnu.org

Achim Gratz writes:

> Eric Abrahamsen writes:
>> Yes, org-string-width eventually calls string-width, so that behaves
>> "correctly" as far as it goes, but unfortunately that's not where the
>> value in the text properties comes from...
>>
>> 《蛙》
>> 123456
>>
>> Doesn't that line up for you? Those bracket characters come with their
>> own "whitespace", maybe this is clearer:
>>
>> 正能量
>> 123456
>>
>> One Chinese character should definitely take up two screen columns.
>
> That's a function of the font that is ultimately used and their
> run-length. Since you are unlikely to use a font that has _all_ those
> glyphs, eventually the ones that are missing in your specified font will
> be replaced with the glyph corresponding to the same codepoint in a
> different font if it exists or possibly a composition of multiple glyphs
> (if not you'll get a placeholder).

Thanks for all the responses here! Obviously the question is more
complicated than I thought. Chinese fonts are fixed-width by default, so
I assumed things would line up with a fixed-width Latin font, though
clearly I wasn't thinking hard enough about typefaces and their
different widths/sizes. It's obvious from the screenshots that
everyone's got a little something different going on.

Takaaki's Unicode WHITE CIRCLE is weird -- it's classified as CJK, but
Chinese characters are usually named CJK IDEOGRAPH BLAH, and this isn't.
It's displayed using a Chinese font on my machine, making it
double-width, but sure enough `string-width' returns 1. I actually have
the same problem with EM DASH here -- Inconsolata doesn't have that
glyph, so a Chinese font is substituted, and I get a double-width
character that reports its width as 1.

Anyhoo... I still think the original problem is valid: strings in table
fields should be measured with `string-width', not `length'. It might
not be perfect, but surely it would be better than the present
situation?

On second thought I don't think it's a problem with text properties.

  (add-text-properties 0 6 '() "正能量")

gives an Args out of range error, and it probably should, since all it
cares about is the number of characters in the string.

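For illustration, here is a minimal sketch of the gap between `length'
and `string-width' described above. It assumes a default Emacs setup
where CJK ideographs are assigned a width of two columns;
`my/field-width-demo' is a hypothetical helper name, not anything that
exists in Org.

```elisp
;; Sketch only: `my/field-width-demo' is a hypothetical helper, not part of Org.
(defun my/field-width-demo (field)
  "Return (LENGTH . WIDTH) for the string FIELD."
  (cons (length field)          ; number of characters
        (string-width field)))  ; number of columns Emacs assumes

;; Three characters, but six columns under the assumed default settings.
(my/field-width-demo "正能量")

;; Six characters and six columns.
(my/field-width-demo "123456")

;; Text-property positions are character indices, so an end position of 6
;; is out of range for a three-character string.
(condition-case err
    (add-text-properties 0 6 nil "正能量")
  (args-out-of-range (car err)))
```

Padding a field by `length' would leave the first string three columns
short of the second, which is the misalignment seen in the tables. Note
that this only reflects the width Emacs assumes, not what a substituted
font actually renders.
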
Solving the problem then might mean just using `org-string-width'
directly in the code, in the places where string width is currently
calculated from text property values. I'll poke at it, and see how badly
I break it.

Thanks again,
Eric