From mboxrd@z Thu Jan 1 00:00:00 1970 From: Yasushi SHOJI Subject: Re: [RFC] ox-ascii.el: fixing variable width character handling Date: Fri, 03 Jan 2014 16:55:13 +0900 Message-ID: <8738l5sdni.wl@dns1.atmark-techno.com> References: <87zjpcsfoq.wl@dns1.atmark-techno.com> <87vc00768z.fsf@gmail.com> Mime-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: text/plain; charset=ISO-2022-JP Return-path: Received: from eggs.gnu.org ([2001:4830:134:3::10]:56909) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1VyzbJ-0006NP-Uo for emacs-orgmode@gnu.org; Fri, 03 Jan 2014 02:55:30 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1VyzbC-00071K-65 for emacs-orgmode@gnu.org; Fri, 03 Jan 2014 02:55:25 -0500 Received: from p654782.hkidff01.ap.so-net.ne.jp ([121.101.71.130]:52869 helo=dns1.atmark-techno.com) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1VyzbB-00070s-EI for emacs-orgmode@gnu.org; Fri, 03 Jan 2014 02:55:18 -0500 In-Reply-To: <87vc00768z.fsf@gmail.com> List-Id: "General discussions about Org-mode." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-orgmode-bounces+geo-emacs-orgmode=m.gmane.org@gnu.org Sender: emacs-orgmode-bounces+geo-emacs-orgmode=m.gmane.org@gnu.org To: Nicolas Goaziou Cc: emacs-orgmode@gnu.org Hi Nicolas, Took me a while to get back to this. At Sun, 10 Nov 2013 14:09:48 +0100, Nicolas Goaziou wrote: > > > While I came up with the following experimental patch, I have a few > > questions: > > > > - What is the lisp idiom to handle type error? In the following > > patch, I've created a new wrapper function > > `org-ascii--string-width', it's a thin wrapper with if-then-else. > > I thought placing if-then-else clause in the original place clutter > > the code. > > You only use `org-ascii--string-width' in two places. You can write > > (string-width (or contents "")) > > instead, it will be less intrusive. Thanks! I've used this idiom in the attached patch. > BTW, what about `org-ascii--current-text-width'? This function also uses > `length' instead of `string-width'. I've checked all occurences of `length' in ox-ascii.el. It turns out that only two were used *not* for width calculation. > > - If wrapped string-width is good idea, would it be also a good idea > > to merge it with `org-string-width' in org.el? Because it is not > > ox-ascii specific. The current `org-string-width' does not handle > > nil, neither. > > That would be a good idea, but in a distinct patch. Will do when I have time. > > - The width of the underline for headlines should be checked with > > string-width. > > > > As defined in Unicode Standard Annex #41[1], character width is > > environment dependent. So, the calculation of character width at > > export time might not be sufficient enough. We can check to see > > the exported document contains cjk chars more than some thresholds > > or not, I haven't gone that far. > > I don't think there's much to do about it. Hopefully one can always > re-export the Org file in its own environment. > > > - Does anyone using ox-ascii.el depends on the clipped marker `=>' > > for fixed width columns? I thought `...' would be much readable. > > I have no opinion about it. Though, if we switch to "...", we also need > to provide the utf-8 equivalent when using this charset. Will do next time. > > - BTW, while looking at table handling, I noticed fixed column width > > doesn't work with the code at the current git HEAD. That's because > > width calculation is mixed with `length' and `string-width', and > > pass out-of-range arguments to `add-text-properties'. > > I guess your patch also fixed that. Ok here is new one. I've been using for a while. Hope it works for others. ----- >8 ----- >8 ----- >8 ----- >8 ----- Subject: [PATCH] ox-ascii: Convert `length' to `string-width' where appropriate * lisp/ox-ascii.el (org-ascii--current-text-width): Convert `length' to `string-width'. (org-ascii--build-title): Likewise. (org-ascii--build-toc): Likewise. (org-ascii--list-listings): Likewise. (org-ascii--list-tables): Likewise. (org-ascii-template--document-title): Likewise. (org-ascii-inner-template): Likewise. (org-ascii-format-inlinetask-default): Likewise. (org-ascii-format-inlinetask-default):Likewise. (org-ascii-item): Likewise. (org-ascii--table-cell-width): Likewise. (org-ascii-table-cell): Likewise. (org-ascii--current-text-width): Likewise. I've checked all occurrences of the function `length' in ox-ascii.el. It turns out that the most of them are calculating the width of given string. To support fullwidth characters, we better use `string-width' instead of `length'. Some characters in UCS are categorized as "East Asian Ambiguous"[1]. The return value of `string-width' with those characters depends on how Emacs is setup. We leave those ambiguous character handling to Emacs. Two usages of `length' in `ox-ascii.el' were left as-is, because those were used for: - bullet position calculation in `org-ascii-headline', and - cell position calculation in `org-ascii--table-cell-width'. [1]: http://www.unicode.org/reports/tr11/#Ambiguous --- lisp/ox-ascii.el | 73 ++++++++++++++++++++++++++++++++------------------------ 1 file changed, 42 insertions(+), 31 deletions(-) diff --git a/lisp/ox-ascii.el b/lisp/ox-ascii.el index b278ab6..5291a2d 100644 --- a/lisp/ox-ascii.el +++ b/lisp/ox-ascii.el @@ -527,15 +527,18 @@ INFO is a plist used as a communication channel." ;; the list and current item bullet's length. Also ;; remove checkbox length, and tag length (for ;; description lists) or bullet length. - (let ((struct (org-element-property :structure parent-item)) - (beg-item (org-element-property :begin parent-item))) + (let* ((struct (org-element-property :structure parent-item)) + (beg-item (org-element-property :begin parent-item)) + (tag (org-list-get-tag beg-item struct)) + (bullet (org-list-get-bullet beg-item struct))) (+ (- (org-list-get-ind beg-item struct) (org-list-get-ind (org-list-get-top-point struct) struct)) - (length (org-ascii--checkbox parent-item info)) - (length + (string-width (or (org-ascii--checkbox parent-item info) "")) + (string-width (or (org-list-get-tag beg-item struct) - (org-list-get-bullet beg-item struct))))))))))))) + (org-list-get-bullet beg-item struct) + "")))))))))))) (defun org-ascii--build-title (element info text-width &optional underline notags toc) @@ -591,7 +594,8 @@ possible. It doesn't apply to `inlinetask' elements." (when tags (format (format " %%%ds" - (max (- text-width (1+ (length first-part))) (length tags))) + (max (- text-width (1+ (string-width first-part))) + (string-width tags))) tags)) ;; Maybe underline text, if ELEMENT type is `headline' and an ;; underline character has been defined. @@ -602,7 +606,9 @@ possible. It doesn't apply to `inlinetask' elements." org-ascii-underline))))) (and under-char (concat "\n" - (make-string (length first-part) under-char)))))))) + (make-string (/ (string-width first-part) + (char-width under-char)) + under-char)))))))) (defun org-ascii--has-caption-p (element info) "Non-nil when ELEMENT has a caption affiliated keyword. @@ -649,7 +655,7 @@ which the table of contents generation has been initiated." (let ((title (org-ascii--translate "Table of Contents" info))) (concat title "\n" - (make-string (length title) + (make-string (string-width title) (if (eq (plist-get info :ascii-charset) 'utf-8) ?─ ?_)) "\n\n" (let ((text-width @@ -676,7 +682,7 @@ generation. INFO is a plist used as a communication channel." (let ((title (org-ascii--translate "List of Listings" info))) (concat title "\n" - (make-string (length title) + (make-string (string-width title) (if (eq (plist-get info :ascii-charset) 'utf-8) ?─ ?_)) "\n\n" (let ((text-width @@ -690,9 +696,10 @@ generation. INFO is a plist used as a communication channel." ;; Store initial text so its length can be computed. This is ;; used to properly align caption right to it in case of ;; filling (like contents of a description list item). - (let ((initial-text - (format (org-ascii--translate "Listing %d:" info) - (incf count)))) + (let* ((initial-text + (format (org-ascii--translate "Listing %d:" info) + (incf count))) + (width (string-width initial-text))) (concat initial-text " " (org-trim @@ -702,8 +709,8 @@ generation. INFO is a plist used as a communication channel." (let ((caption (or (org-export-get-caption src-block t) (org-export-get-caption src-block)))) (org-export-data caption info)) - (- text-width (length initial-text)) info) - (length initial-text)))))) + (- text-width width) info) + width))))) (org-export-collect-listings info) "\n"))))) (defun org-ascii--list-tables (keyword info) @@ -714,7 +721,7 @@ generation. INFO is a plist used as a communication channel." (let ((title (org-ascii--translate "List of Tables" info))) (concat title "\n" - (make-string (length title) + (make-string (string-width title) (if (eq (plist-get info :ascii-charset) 'utf-8) ?─ ?_)) "\n\n" (let ((text-width @@ -728,9 +735,10 @@ generation. INFO is a plist used as a communication channel." ;; Store initial text so its length can be computed. This is ;; used to properly align caption right to it in case of ;; filling (like contents of a description list item). - (let ((initial-text - (format (org-ascii--translate "Table %d:" info) - (incf count)))) + (let* ((initial-text + (format (org-ascii--translate "Table %d:" info) + (incf count))) + (width (string-width initial-text))) (concat initial-text " " (org-trim @@ -740,8 +748,8 @@ generation. INFO is a plist used as a communication channel." (let ((caption (or (org-export-get-caption table t) (org-export-get-caption table)))) (org-export-data caption info)) - (- text-width (length initial-text)) info) - (length initial-text)))))) + (- text-width width) info) + width))))) (org-export-collect-tables info) "\n"))))) (defun org-ascii--unique-links (element info) @@ -854,14 +862,14 @@ INFO is a plist used as a communication channel." ((and (org-string-nw-p date) (org-string-nw-p author)) (concat author - (make-string (- text-width (length date) (length author)) ? ) + (make-string (- text-width (string-width date) (string-width author)) ? ) date (when (org-string-nw-p email) (concat "\n" email)) "\n\n\n")) ((and (org-string-nw-p date) (org-string-nw-p email)) (concat email - (make-string (- text-width (length date) (length email)) ? ) + (make-string (- text-width (string-width date) (string-width email)) ? ) date "\n\n\n")) ((org-string-nw-p date) (concat @@ -877,11 +885,13 @@ INFO is a plist used as a communication channel." (let* ((utf8p (eq (plist-get info :ascii-charset) 'utf-8)) ;; Format TITLE. It may be filled if it is too wide, ;; that is wider than the two thirds of the total width. - (title-len (min (length title) (/ (* 2 text-width) 3))) + (title-len (min (string-width title) (/ (* 2 text-width) 3))) (formatted-title (org-ascii--fill-string title title-len info)) (line (make-string - (min (+ (max title-len (length author) (length email)) 2) + (min (+ (max title-len + (string-width (or author "")) + (string-width (or email ""))) 2) text-width) (if utf8p ?━ ?_)))) (org-ascii--justify-string (concat line "\n" @@ -920,7 +930,7 @@ holding export options." (concat title "\n" (make-string - (length title) + (string-width title) (if (eq (plist-get info :ascii-charset) 'utf-8) ?─ ?_)))) "\n\n" (let ((text-width (- org-ascii-text-width org-ascii-global-margin))) @@ -1197,7 +1207,7 @@ contextual information." ;;;; Inlinetask (defun org-ascii-format-inlinetask-default - (todo type priority name tags contents width inlinetask info) + (todo type priority name tags contents width inlinetask info) "Format an inline task element for ASCII export. See `org-ascii-format-inlinetask-function' for a description of the paramaters." @@ -1210,7 +1220,7 @@ of the paramaters." (unless utf8p (concat (make-string width ? ) "\n")) ;; Add title. Fill it if wider than inlinetask. (let ((title (org-ascii--build-title inlinetask info width))) - (if (<= (length title) width) title + (if (<= (string-width title) width) title (org-ascii--fill-string title width info))) "\n" ;; If CONTENTS is not empty, insert it along with @@ -1303,7 +1313,7 @@ contextual information." ;; Contents: Pay attention to indentation. Note: check-boxes are ;; already taken care of at the paragraph level so they don't ;; interfere with indentation. - (let ((contents (org-ascii--indent-string contents (length bullet)))) + (let ((contents (org-ascii--indent-string contents (string-width bullet)))) (if (eq (org-element-type (car (org-element-contents item))) 'paragraph) (org-trim contents) (concat "\n" contents)))))) @@ -1675,7 +1685,7 @@ are ignored." (org-element-map table 'table-row (lambda (row) (setq max-width - (max (length + (max (string-width (org-export-data (org-element-contents (elt (org-element-contents row) col)) @@ -1695,7 +1705,8 @@ a communication channel." ;; each cell in the column. (let ((width (org-ascii--table-cell-width table-cell info))) ;; When contents are too large, truncate them. - (unless (or org-ascii-table-widen-columns (<= (length contents) width)) + (unless (or org-ascii-table-widen-columns + (<= (string-width (or contents "")) width)) (setq contents (concat (substring contents 0 (- width 2)) "=>"))) ;; Align contents correctly within the cell. (let* ((indent-tabs-mode nil) @@ -1704,7 +1715,7 @@ a communication channel." (org-ascii--justify-string contents width (org-export-table-cell-alignment table-cell info))))) - (setq contents (concat data (make-string (- width (length data)) ? )))) + (setq contents (concat data (make-string (- width (string-width (or data ""))) ? )))) ;; Return cell. (concat (format " %s " contents) (when (memq 'right (org-export-table-cell-borders table-cell info)) -- 1.8.5.2 -- yashi