emacs-orgmode@gnu.org archives
 help / color / mirror / code / Atom feed
From: Yasushi SHOJI <yashi@atmark-techno.com>
To: Nicolas Goaziou <n.goaziou@gmail.com>
Cc: emacs-orgmode@gnu.org
Subject: Re: [RFC] ox-ascii.el: fixing variable width character handling
Date: Fri, 03 Jan 2014 16:55:13 +0900	[thread overview]
Message-ID: <8738l5sdni.wl@dns1.atmark-techno.com> (raw)
In-Reply-To: <87vc00768z.fsf@gmail.com>

Hi Nicolas,

Took me a while to get back to this.

At Sun, 10 Nov 2013 14:09:48 +0100,
Nicolas Goaziou wrote:
>
> > While I came up with the following experimental patch, I have a few
> > questions:
> >
> >  - What is the lisp idiom to handle type error?  In the following
> >    patch, I've created a new wrapper function
> >    `org-ascii--string-width', it's a thin wrapper with if-then-else.
> >    I thought placing if-then-else clause in the original place clutter
> >    the code.
>
> You only use `org-ascii--string-width' in two places. You can write
>
>   (string-width (or contents ""))
>
> instead, it will be less intrusive.

Thanks! I've used this idiom in the attached patch.

> BTW, what about `org-ascii--current-text-width'? This function also uses
> `length' instead of `string-width'.

I've checked all occurences of `length' in ox-ascii.el.  It turns out
that only two were used *not* for width calculation.

> >  - If wrapped string-width is good idea, would it be also a good idea
> >    to merge it with `org-string-width' in org.el?  Because it is not
> >    ox-ascii specific.  The current `org-string-width' does not handle
> >    nil, neither.
>
> That would be a good idea, but in a distinct patch.

Will do when I have time. 

> >  - The width of the underline for headlines should be checked with
> >    string-width.
> >
> >    As defined in Unicode Standard Annex #41[1], character width is
> >    environment dependent.  So, the calculation of character width at
> >    export time might not be sufficient enough.  We can check to see
> >    the exported document contains cjk chars more than some thresholds
> >    or not, I haven't gone that far.
>
> I don't think there's much to do about it. Hopefully one can always
> re-export the Org file in its own environment.
>
> >  - Does anyone using ox-ascii.el depends on the clipped marker `=>'
> >    for fixed width columns?  I thought `...' would be much readable.
>
> I have no opinion about it. Though, if we switch to "...", we also need
> to provide the utf-8 equivalent when using this charset.

Will do next time.

> >  - BTW, while looking at table handling, I noticed fixed column width
> >    doesn't work with the code at the current git HEAD.  That's because
> >    width calculation is mixed with `length' and `string-width', and
> >    pass out-of-range arguments to `add-text-properties'.
>
> I guess your patch also fixed that.

Ok here is new one.  I've been using for a while.  Hope it works for
others.

----- >8 ----- >8 ----- >8 ----- >8 -----

Subject: [PATCH] ox-ascii: Convert `length' to `string-width' where
 appropriate

* lisp/ox-ascii.el (org-ascii--current-text-width): Convert `length'
  to `string-width'.
  (org-ascii--build-title): Likewise.
  (org-ascii--build-toc): Likewise.
  (org-ascii--list-listings): Likewise.
  (org-ascii--list-tables): Likewise.
  (org-ascii-template--document-title): Likewise.
  (org-ascii-inner-template): Likewise.
  (org-ascii-format-inlinetask-default): Likewise.
  (org-ascii-format-inlinetask-default):Likewise.
  (org-ascii-item): Likewise.
  (org-ascii--table-cell-width): Likewise.
  (org-ascii-table-cell): Likewise.
  (org-ascii--current-text-width): Likewise.

I've checked all occurrences of the function `length' in ox-ascii.el.
It turns out that the most of them are calculating the width of given
string. To support fullwidth characters, we better use `string-width'
instead of `length'.

Some characters in UCS are categorized as "East Asian Ambiguous"[1].
The return value of `string-width' with those characters depends on
how Emacs is setup.  We leave those ambiguous character handling to
Emacs.

Two usages of `length' in `ox-ascii.el' were left as-is, because those
were used for:

 - bullet position calculation in `org-ascii-headline', and
 - cell position calculation in `org-ascii--table-cell-width'.

[1]: http://www.unicode.org/reports/tr11/#Ambiguous
---
 lisp/ox-ascii.el | 73 ++++++++++++++++++++++++++++++++------------------------
 1 file changed, 42 insertions(+), 31 deletions(-)

diff --git a/lisp/ox-ascii.el b/lisp/ox-ascii.el
index b278ab6..5291a2d 100644
--- a/lisp/ox-ascii.el
+++ b/lisp/ox-ascii.el
@@ -527,15 +527,18 @@ INFO is a plist used as a communication channel."
 		 ;; the list and current item bullet's length.  Also
 		 ;; remove checkbox length, and tag length (for
 		 ;; description lists) or bullet length.
-		 (let ((struct (org-element-property :structure parent-item))
-		       (beg-item (org-element-property :begin parent-item)))
+		 (let* ((struct (org-element-property :structure parent-item))
+			(beg-item (org-element-property :begin parent-item))
+			(tag (org-list-get-tag beg-item struct))
+			(bullet (org-list-get-bullet beg-item struct)))
 		   (+ (- (org-list-get-ind beg-item struct)
 			 (org-list-get-ind
 			  (org-list-get-top-point struct) struct))
-		      (length (org-ascii--checkbox parent-item info))
-		      (length
+		      (string-width (or (org-ascii--checkbox parent-item info) ""))
+		      (string-width
 		       (or (org-list-get-tag beg-item struct)
-			   (org-list-get-bullet beg-item struct)))))))))))))
+			   (org-list-get-bullet beg-item struct)
+			   ""))))))))))))
 
 (defun org-ascii--build-title
   (element info text-width &optional underline notags toc)
@@ -591,7 +594,8 @@ possible.  It doesn't apply to `inlinetask' elements."
      (when tags
        (format
 	(format " %%%ds"
-		(max (- text-width  (1+ (length first-part))) (length tags)))
+		(max (- text-width (1+ (string-width first-part)))
+		     (string-width tags)))
 	tags))
      ;; Maybe underline text, if ELEMENT type is `headline' and an
      ;; underline character has been defined.
@@ -602,7 +606,9 @@ possible.  It doesn't apply to `inlinetask' elements."
 			      org-ascii-underline)))))
 	 (and under-char
 	      (concat "\n"
-		      (make-string (length first-part) under-char))))))))
+		      (make-string (/ (string-width first-part)
+				      (char-width under-char))
+				   under-char))))))))
 
 (defun org-ascii--has-caption-p (element info)
   "Non-nil when ELEMENT has a caption affiliated keyword.
@@ -649,7 +655,7 @@ which the table of contents generation has been initiated."
   (let ((title (org-ascii--translate "Table of Contents" info)))
     (concat
      title "\n"
-     (make-string (length title)
+     (make-string (string-width title)
 		  (if (eq (plist-get info :ascii-charset) 'utf-8) ?─ ?_))
      "\n\n"
      (let ((text-width
@@ -676,7 +682,7 @@ generation.  INFO is a plist used as a communication channel."
   (let ((title (org-ascii--translate "List of Listings" info)))
     (concat
      title "\n"
-     (make-string (length title)
+     (make-string (string-width title)
 		  (if (eq (plist-get info :ascii-charset) 'utf-8) ?─ ?_))
      "\n\n"
      (let ((text-width
@@ -690,9 +696,10 @@ generation.  INFO is a plist used as a communication channel."
 	  ;; Store initial text so its length can be computed.  This is
 	  ;; used to properly align caption right to it in case of
 	  ;; filling (like contents of a description list item).
-	  (let ((initial-text
-		 (format (org-ascii--translate "Listing %d:" info)
-			 (incf count))))
+	  (let* ((initial-text
+		  (format (org-ascii--translate "Listing %d:" info)
+			  (incf count)))
+		 (width (string-width initial-text)))
 	    (concat
 	     initial-text " "
 	     (org-trim
@@ -702,8 +709,8 @@ generation.  INFO is a plist used as a communication channel."
 		(let ((caption (or (org-export-get-caption src-block t)
 				   (org-export-get-caption src-block))))
 		  (org-export-data caption info))
-		(- text-width (length initial-text)) info)
-	       (length initial-text))))))
+		(- text-width width) info)
+	       width)))))
 	(org-export-collect-listings info) "\n")))))
 
 (defun org-ascii--list-tables (keyword info)
@@ -714,7 +721,7 @@ generation.  INFO is a plist used as a communication channel."
   (let ((title (org-ascii--translate "List of Tables" info)))
     (concat
      title "\n"
-     (make-string (length title)
+     (make-string (string-width title)
 		  (if (eq (plist-get info :ascii-charset) 'utf-8) ?─ ?_))
      "\n\n"
      (let ((text-width
@@ -728,9 +735,10 @@ generation.  INFO is a plist used as a communication channel."
 	  ;; Store initial text so its length can be computed.  This is
 	  ;; used to properly align caption right to it in case of
 	  ;; filling (like contents of a description list item).
-	  (let ((initial-text
-		 (format (org-ascii--translate "Table %d:" info)
-			 (incf count))))
+	  (let* ((initial-text
+		  (format (org-ascii--translate "Table %d:" info)
+			  (incf count)))
+		 (width (string-width initial-text)))
 	    (concat
 	     initial-text " "
 	     (org-trim
@@ -740,8 +748,8 @@ generation.  INFO is a plist used as a communication channel."
 		(let ((caption (or (org-export-get-caption table t)
 				   (org-export-get-caption table))))
 		  (org-export-data caption info))
-		(- text-width (length initial-text)) info)
-	       (length initial-text))))))
+		(- text-width width) info)
+	       width)))))
 	(org-export-collect-tables info) "\n")))))
 
 (defun org-ascii--unique-links (element info)
@@ -854,14 +862,14 @@ INFO is a plist used as a communication channel."
 	 ((and (org-string-nw-p date) (org-string-nw-p author))
 	  (concat
 	   author
-	   (make-string (- text-width (length date) (length author)) ? )
+	   (make-string (- text-width (string-width date) (string-width author)) ? )
 	   date
 	   (when (org-string-nw-p email) (concat "\n" email))
 	   "\n\n\n"))
 	 ((and (org-string-nw-p date) (org-string-nw-p email))
 	  (concat
 	   email
-	   (make-string (- text-width (length date) (length email)) ? )
+	   (make-string (- text-width (string-width date) (string-width email)) ? )
 	   date "\n\n\n"))
 	 ((org-string-nw-p date)
 	  (concat
@@ -877,11 +885,13 @@ INFO is a plist used as a communication channel."
       (let* ((utf8p (eq (plist-get info :ascii-charset) 'utf-8))
 	     ;; Format TITLE.  It may be filled if it is too wide,
 	     ;; that is wider than the two thirds of the total width.
-	     (title-len (min (length title) (/ (* 2 text-width) 3)))
+	     (title-len (min (string-width title) (/ (* 2 text-width) 3)))
 	     (formatted-title (org-ascii--fill-string title title-len info))
 	     (line
 	      (make-string
-	       (min (+ (max title-len (length author) (length email)) 2)
+	       (min (+ (max title-len
+			    (string-width (or author ""))
+			    (string-width (or email ""))) 2)
 		    text-width) (if utf8p ?━ ?_))))
 	(org-ascii--justify-string
 	 (concat line "\n"
@@ -920,7 +930,7 @@ holding export options."
 	    (concat
 	     title "\n"
 	     (make-string
-	      (length title)
+	      (string-width title)
 	      (if (eq (plist-get info :ascii-charset) 'utf-8) ?─ ?_))))
 	  "\n\n"
 	  (let ((text-width (- org-ascii-text-width org-ascii-global-margin)))
@@ -1197,7 +1207,7 @@ contextual information."
 ;;;; Inlinetask
 
 (defun org-ascii-format-inlinetask-default
-    (todo type priority name tags contents width inlinetask info)
+  (todo type priority name tags contents width inlinetask info)
   "Format an inline task element for ASCII export.
 See `org-ascii-format-inlinetask-function' for a description
 of the paramaters."
@@ -1210,7 +1220,7 @@ of the paramaters."
       (unless utf8p (concat (make-string width ? ) "\n"))
       ;; Add title.  Fill it if wider than inlinetask.
       (let ((title (org-ascii--build-title inlinetask info width)))
-	(if (<= (length title) width) title
+	(if (<= (string-width title) width) title
 	  (org-ascii--fill-string title width info)))
       "\n"
       ;; If CONTENTS is not empty, insert it along with
@@ -1303,7 +1313,7 @@ contextual information."
      ;; Contents: Pay attention to indentation.  Note: check-boxes are
      ;; already taken care of at the paragraph level so they don't
      ;; interfere with indentation.
-     (let ((contents (org-ascii--indent-string contents (length bullet))))
+     (let ((contents (org-ascii--indent-string contents (string-width bullet))))
        (if (eq (org-element-type (car (org-element-contents item))) 'paragraph)
 	   (org-trim contents)
 	 (concat "\n" contents))))))
@@ -1675,7 +1685,7 @@ are ignored."
 	       (org-element-map table 'table-row
 		 (lambda (row)
 		   (setq max-width
-			 (max (length
+			 (max (string-width
 			       (org-export-data
 				(org-element-contents
 				 (elt (org-element-contents row) col))
@@ -1695,7 +1705,8 @@ a communication channel."
   ;; each cell in the column.
   (let ((width (org-ascii--table-cell-width table-cell info)))
     ;; When contents are too large, truncate them.
-    (unless (or org-ascii-table-widen-columns (<= (length contents) width))
+    (unless (or org-ascii-table-widen-columns
+		(<= (string-width (or contents "")) width))
       (setq contents (concat (substring contents 0 (- width 2)) "=>")))
     ;; Align contents correctly within the cell.
     (let* ((indent-tabs-mode nil)
@@ -1704,7 +1715,7 @@ a communication channel."
 	      (org-ascii--justify-string
 	       contents width
 	       (org-export-table-cell-alignment table-cell info)))))
-      (setq contents (concat data (make-string (- width (length data)) ? ))))
+      (setq contents (concat data (make-string (- width (string-width (or data ""))) ? ))))
     ;; Return cell.
     (concat (format " %s " contents)
 	    (when (memq 'right (org-export-table-cell-borders table-cell info))
-- 
1.8.5.2

-- 
          yashi

  reply	other threads:[~2014-01-03  7:55 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-11-10 10:40 [RFC] ox-ascii.el: fixing variable width character handling Yasushi SHOJI
2013-11-10 13:09 ` Nicolas Goaziou
2014-01-03  7:55   ` Yasushi SHOJI [this message]
2014-01-03  9:34     ` Nicolas Goaziou
2014-01-04 11:18       ` Yasushi SHOJI
2014-01-04 19:34         ` Nicolas Goaziou
2014-01-16 15:25         ` Nicolas Goaziou

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.orgmode.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8738l5sdni.wl@dns1.atmark-techno.com \
    --to=yashi@atmark-techno.com \
    --cc=emacs-orgmode@gnu.org \
    --cc=n.goaziou@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).