emacs-orgmode@gnu.org archives
 help / color / mirror / code / Atom feed
* [RFC] ox-ascii.el: fixing variable width character handling
@ 2013-11-10 10:40 Yasushi SHOJI
  2013-11-10 13:09 ` Nicolas Goaziou
  0 siblings, 1 reply; 7+ messages in thread
From: Yasushi SHOJI @ 2013-11-10 10:40 UTC (permalink / raw)
  To: emacs-orgmode

Hi,

I've been trying to fix ASCII export back-end for variable width
chars. It is basically replacing `length' with `string-width', but the
behavior of those two functions differ when you give nil as an
argument; `length' returns 0, `string-width' yields a type error:
"eval: Wrong type argument: stringp, nil"

While I came up with the following experimental patch, I have a few
questions:

 - What is the lisp idiom to handle type error?  In the following
   patch, I've created a new wrapper function
   `org-ascii--string-width', it's a thin wrapper with if-then-else.
   I thought placing if-then-else clause in the original place clutter
   the code.

 - If wrapped string-width is good idea, would it be also a good idea
   to merge it with `org-string-width' in org.el?  Because it is not
   ox-ascii specific.  The current `org-string-width' does not handle
   nil, neither.

 - The width of the underline for headlines should be checked with
   string-width.

   As defined in Unicode Standard Annex #41[1], character width is
   environment dependent.  So, the calculation of character width at
   export time might not be sufficient enough.  We can check to see
   the exported document contains cjk chars more than some thresholds
   or not, I haven't gone that far.

 - Does anyone using ox-ascii.el depends on the clipped marker `=>'
   for fixed width columns?  I thought `...' would be much readable.

 - BTW, while looking at table handling, I noticed fixed column width
   doesn't work with the code at the current git HEAD.  That's because
   width calculation is mixed with `length' and `string-width', and
   pass out-of-range arguments to `add-text-properties'.


[1]: http://www.unicode.org/reports/tr41/

diff --git a/lisp/ox-ascii.el b/lisp/ox-ascii.el
index 8e75007..35d58fc 100644
--- a/lisp/ox-ascii.el
+++ b/lisp/ox-ascii.el
@@ -630,7 +630,8 @@ possible.  It doesn't apply to `inlinetask' elements."
 			      org-ascii-underline)))))
 	 (and under-char
 	      (concat "\n"
-		      (make-string (length first-part) under-char))))))))
+		      (make-string (/ (string-width first-part) (char-width under-char))
+				   under-char))))))))
 
 (defun org-ascii--has-caption-p (element info)
   "Non-nil when ELEMENT has a caption affiliated keyword.
@@ -1704,7 +1705,7 @@ are ignored."
 	       (org-element-map table 'table-row
 		 (lambda (row)
 		   (setq max-width
-			 (max (length
+			 (max (string-width
 			       (org-export-data
 				(org-element-contents
 				 (elt (org-element-contents row) col))
@@ -1714,6 +1715,11 @@ are ignored."
 	       max-width))
 	 cache))))
 
+(defun org-ascii--string-width (str)
+  (if str
+      (string-width str)
+    0))
+
 (defun org-ascii-table-cell (table-cell contents info)
   "Transcode a TABLE-CELL object from Org to ASCII.
 CONTENTS is the cell contents.  INFO is a plist used as
@@ -1724,16 +1730,18 @@ a communication channel."
   ;; each cell in the column.
   (let ((width (org-ascii--table-cell-width table-cell info)))
     ;; When contents are too large, truncate them.
-    (unless (or org-ascii-table-widen-columns (<= (length contents) width))
-      (setq contents (concat (substring contents 0 (- width 2)) "=>")))
+    (unless (or org-ascii-table-widen-columns
+		(<= (org-ascii--string-width contents) width))
+      (setq contents (truncate-string-to-width contents width nil ?. t)))
     ;; Align contents correctly within the cell.
     (let* ((indent-tabs-mode nil)
 	   (data
 	    (when contents
 	      (org-ascii--justify-string
 	       contents width
-	       (org-export-table-cell-alignment table-cell info)))))
-      (setq contents (concat data (make-string (- width (length data)) ? ))))
+	       (org-export-table-cell-alignment table-cell info))))
+	   (trailing-space (make-string (- width (org-ascii--string-width data)) ? )))
+      (setq contents (concat data trailing-space)))
     ;; Return cell.
     (concat (format " %s " contents)
 	    (when (memq 'right (org-export-table-cell-borders table-cell info))

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC] ox-ascii.el: fixing variable width character handling
  2013-11-10 10:40 [RFC] ox-ascii.el: fixing variable width character handling Yasushi SHOJI
@ 2013-11-10 13:09 ` Nicolas Goaziou
  2014-01-03  7:55   ` Yasushi SHOJI
  0 siblings, 1 reply; 7+ messages in thread
From: Nicolas Goaziou @ 2013-11-10 13:09 UTC (permalink / raw)
  To: Yasushi SHOJI; +Cc: emacs-orgmode

Hello,

Yasushi SHOJI <yashi@atmark-techno.com> writes:

> I've been trying to fix ASCII export back-end for variable width
> chars. It is basically replacing `length' with `string-width', but the
> behavior of those two functions differ when you give nil as an
> argument; `length' returns 0, `string-width' yields a type error:
> "eval: Wrong type argument: stringp, nil"

Thank you for your patch.

> While I came up with the following experimental patch, I have a few
> questions:
>
>  - What is the lisp idiom to handle type error?  In the following
>    patch, I've created a new wrapper function
>    `org-ascii--string-width', it's a thin wrapper with if-then-else.
>    I thought placing if-then-else clause in the original place clutter
>    the code.

You only use `org-ascii--string-width' in two places. You can write

  (string-width (or contents ""))

instead, it will be less intrusive.

BTW, what about `org-ascii--current-text-width'? This function also uses
`length' instead of `string-width'.

>  - If wrapped string-width is good idea, would it be also a good idea
>    to merge it with `org-string-width' in org.el?  Because it is not
>    ox-ascii specific.  The current `org-string-width' does not handle
>    nil, neither.

That would be a good idea, but in a distinct patch.

>  - The width of the underline for headlines should be checked with
>    string-width.
>
>    As defined in Unicode Standard Annex #41[1], character width is
>    environment dependent.  So, the calculation of character width at
>    export time might not be sufficient enough.  We can check to see
>    the exported document contains cjk chars more than some thresholds
>    or not, I haven't gone that far.

I don't think there's much to do about it. Hopefully one can always
re-export the Org file in its own environment.

>  - Does anyone using ox-ascii.el depends on the clipped marker `=>'
>    for fixed width columns?  I thought `...' would be much readable.

I have no opinion about it. Though, if we switch to "...", we also need
to provide the utf-8 equivalent when using this charset.

>  - BTW, while looking at table handling, I noticed fixed column width
>    doesn't work with the code at the current git HEAD.  That's because
>    width calculation is mixed with `length' and `string-width', and
>    pass out-of-range arguments to `add-text-properties'.

I guess your patch also fixed that.


Regards,

-- 
Nicolas Goaziou

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC] ox-ascii.el: fixing variable width character handling
  2013-11-10 13:09 ` Nicolas Goaziou
@ 2014-01-03  7:55   ` Yasushi SHOJI
  2014-01-03  9:34     ` Nicolas Goaziou
  0 siblings, 1 reply; 7+ messages in thread
From: Yasushi SHOJI @ 2014-01-03  7:55 UTC (permalink / raw)
  To: Nicolas Goaziou; +Cc: emacs-orgmode

Hi Nicolas,

Took me a while to get back to this.

At Sun, 10 Nov 2013 14:09:48 +0100,
Nicolas Goaziou wrote:
>
> > While I came up with the following experimental patch, I have a few
> > questions:
> >
> >  - What is the lisp idiom to handle type error?  In the following
> >    patch, I've created a new wrapper function
> >    `org-ascii--string-width', it's a thin wrapper with if-then-else.
> >    I thought placing if-then-else clause in the original place clutter
> >    the code.
>
> You only use `org-ascii--string-width' in two places. You can write
>
>   (string-width (or contents ""))
>
> instead, it will be less intrusive.

Thanks! I've used this idiom in the attached patch.

> BTW, what about `org-ascii--current-text-width'? This function also uses
> `length' instead of `string-width'.

I've checked all occurences of `length' in ox-ascii.el.  It turns out
that only two were used *not* for width calculation.

> >  - If wrapped string-width is good idea, would it be also a good idea
> >    to merge it with `org-string-width' in org.el?  Because it is not
> >    ox-ascii specific.  The current `org-string-width' does not handle
> >    nil, neither.
>
> That would be a good idea, but in a distinct patch.

Will do when I have time. 

> >  - The width of the underline for headlines should be checked with
> >    string-width.
> >
> >    As defined in Unicode Standard Annex #41[1], character width is
> >    environment dependent.  So, the calculation of character width at
> >    export time might not be sufficient enough.  We can check to see
> >    the exported document contains cjk chars more than some thresholds
> >    or not, I haven't gone that far.
>
> I don't think there's much to do about it. Hopefully one can always
> re-export the Org file in its own environment.
>
> >  - Does anyone using ox-ascii.el depends on the clipped marker `=>'
> >    for fixed width columns?  I thought `...' would be much readable.
>
> I have no opinion about it. Though, if we switch to "...", we also need
> to provide the utf-8 equivalent when using this charset.

Will do next time.

> >  - BTW, while looking at table handling, I noticed fixed column width
> >    doesn't work with the code at the current git HEAD.  That's because
> >    width calculation is mixed with `length' and `string-width', and
> >    pass out-of-range arguments to `add-text-properties'.
>
> I guess your patch also fixed that.

Ok here is new one.  I've been using for a while.  Hope it works for
others.

----- >8 ----- >8 ----- >8 ----- >8 -----

Subject: [PATCH] ox-ascii: Convert `length' to `string-width' where
 appropriate

* lisp/ox-ascii.el (org-ascii--current-text-width): Convert `length'
  to `string-width'.
  (org-ascii--build-title): Likewise.
  (org-ascii--build-toc): Likewise.
  (org-ascii--list-listings): Likewise.
  (org-ascii--list-tables): Likewise.
  (org-ascii-template--document-title): Likewise.
  (org-ascii-inner-template): Likewise.
  (org-ascii-format-inlinetask-default): Likewise.
  (org-ascii-format-inlinetask-default):Likewise.
  (org-ascii-item): Likewise.
  (org-ascii--table-cell-width): Likewise.
  (org-ascii-table-cell): Likewise.
  (org-ascii--current-text-width): Likewise.

I've checked all occurrences of the function `length' in ox-ascii.el.
It turns out that the most of them are calculating the width of given
string. To support fullwidth characters, we better use `string-width'
instead of `length'.

Some characters in UCS are categorized as "East Asian Ambiguous"[1].
The return value of `string-width' with those characters depends on
how Emacs is setup.  We leave those ambiguous character handling to
Emacs.

Two usages of `length' in `ox-ascii.el' were left as-is, because those
were used for:

 - bullet position calculation in `org-ascii-headline', and
 - cell position calculation in `org-ascii--table-cell-width'.

[1]: http://www.unicode.org/reports/tr11/#Ambiguous
---
 lisp/ox-ascii.el | 73 ++++++++++++++++++++++++++++++++------------------------
 1 file changed, 42 insertions(+), 31 deletions(-)

diff --git a/lisp/ox-ascii.el b/lisp/ox-ascii.el
index b278ab6..5291a2d 100644
--- a/lisp/ox-ascii.el
+++ b/lisp/ox-ascii.el
@@ -527,15 +527,18 @@ INFO is a plist used as a communication channel."
 		 ;; the list and current item bullet's length.  Also
 		 ;; remove checkbox length, and tag length (for
 		 ;; description lists) or bullet length.
-		 (let ((struct (org-element-property :structure parent-item))
-		       (beg-item (org-element-property :begin parent-item)))
+		 (let* ((struct (org-element-property :structure parent-item))
+			(beg-item (org-element-property :begin parent-item))
+			(tag (org-list-get-tag beg-item struct))
+			(bullet (org-list-get-bullet beg-item struct)))
 		   (+ (- (org-list-get-ind beg-item struct)
 			 (org-list-get-ind
 			  (org-list-get-top-point struct) struct))
-		      (length (org-ascii--checkbox parent-item info))
-		      (length
+		      (string-width (or (org-ascii--checkbox parent-item info) ""))
+		      (string-width
 		       (or (org-list-get-tag beg-item struct)
-			   (org-list-get-bullet beg-item struct)))))))))))))
+			   (org-list-get-bullet beg-item struct)
+			   ""))))))))))))
 
 (defun org-ascii--build-title
   (element info text-width &optional underline notags toc)
@@ -591,7 +594,8 @@ possible.  It doesn't apply to `inlinetask' elements."
      (when tags
        (format
 	(format " %%%ds"
-		(max (- text-width  (1+ (length first-part))) (length tags)))
+		(max (- text-width (1+ (string-width first-part)))
+		     (string-width tags)))
 	tags))
      ;; Maybe underline text, if ELEMENT type is `headline' and an
      ;; underline character has been defined.
@@ -602,7 +606,9 @@ possible.  It doesn't apply to `inlinetask' elements."
 			      org-ascii-underline)))))
 	 (and under-char
 	      (concat "\n"
-		      (make-string (length first-part) under-char))))))))
+		      (make-string (/ (string-width first-part)
+				      (char-width under-char))
+				   under-char))))))))
 
 (defun org-ascii--has-caption-p (element info)
   "Non-nil when ELEMENT has a caption affiliated keyword.
@@ -649,7 +655,7 @@ which the table of contents generation has been initiated."
   (let ((title (org-ascii--translate "Table of Contents" info)))
     (concat
      title "\n"
-     (make-string (length title)
+     (make-string (string-width title)
 		  (if (eq (plist-get info :ascii-charset) 'utf-8) ?─ ?_))
      "\n\n"
      (let ((text-width
@@ -676,7 +682,7 @@ generation.  INFO is a plist used as a communication channel."
   (let ((title (org-ascii--translate "List of Listings" info)))
     (concat
      title "\n"
-     (make-string (length title)
+     (make-string (string-width title)
 		  (if (eq (plist-get info :ascii-charset) 'utf-8) ?─ ?_))
      "\n\n"
      (let ((text-width
@@ -690,9 +696,10 @@ generation.  INFO is a plist used as a communication channel."
 	  ;; Store initial text so its length can be computed.  This is
 	  ;; used to properly align caption right to it in case of
 	  ;; filling (like contents of a description list item).
-	  (let ((initial-text
-		 (format (org-ascii--translate "Listing %d:" info)
-			 (incf count))))
+	  (let* ((initial-text
+		  (format (org-ascii--translate "Listing %d:" info)
+			  (incf count)))
+		 (width (string-width initial-text)))
 	    (concat
 	     initial-text " "
 	     (org-trim
@@ -702,8 +709,8 @@ generation.  INFO is a plist used as a communication channel."
 		(let ((caption (or (org-export-get-caption src-block t)
 				   (org-export-get-caption src-block))))
 		  (org-export-data caption info))
-		(- text-width (length initial-text)) info)
-	       (length initial-text))))))
+		(- text-width width) info)
+	       width)))))
 	(org-export-collect-listings info) "\n")))))
 
 (defun org-ascii--list-tables (keyword info)
@@ -714,7 +721,7 @@ generation.  INFO is a plist used as a communication channel."
   (let ((title (org-ascii--translate "List of Tables" info)))
     (concat
      title "\n"
-     (make-string (length title)
+     (make-string (string-width title)
 		  (if (eq (plist-get info :ascii-charset) 'utf-8) ?─ ?_))
      "\n\n"
      (let ((text-width
@@ -728,9 +735,10 @@ generation.  INFO is a plist used as a communication channel."
 	  ;; Store initial text so its length can be computed.  This is
 	  ;; used to properly align caption right to it in case of
 	  ;; filling (like contents of a description list item).
-	  (let ((initial-text
-		 (format (org-ascii--translate "Table %d:" info)
-			 (incf count))))
+	  (let* ((initial-text
+		  (format (org-ascii--translate "Table %d:" info)
+			  (incf count)))
+		 (width (string-width initial-text)))
 	    (concat
 	     initial-text " "
 	     (org-trim
@@ -740,8 +748,8 @@ generation.  INFO is a plist used as a communication channel."
 		(let ((caption (or (org-export-get-caption table t)
 				   (org-export-get-caption table))))
 		  (org-export-data caption info))
-		(- text-width (length initial-text)) info)
-	       (length initial-text))))))
+		(- text-width width) info)
+	       width)))))
 	(org-export-collect-tables info) "\n")))))
 
 (defun org-ascii--unique-links (element info)
@@ -854,14 +862,14 @@ INFO is a plist used as a communication channel."
 	 ((and (org-string-nw-p date) (org-string-nw-p author))
 	  (concat
 	   author
-	   (make-string (- text-width (length date) (length author)) ? )
+	   (make-string (- text-width (string-width date) (string-width author)) ? )
 	   date
 	   (when (org-string-nw-p email) (concat "\n" email))
 	   "\n\n\n"))
 	 ((and (org-string-nw-p date) (org-string-nw-p email))
 	  (concat
 	   email
-	   (make-string (- text-width (length date) (length email)) ? )
+	   (make-string (- text-width (string-width date) (string-width email)) ? )
 	   date "\n\n\n"))
 	 ((org-string-nw-p date)
 	  (concat
@@ -877,11 +885,13 @@ INFO is a plist used as a communication channel."
       (let* ((utf8p (eq (plist-get info :ascii-charset) 'utf-8))
 	     ;; Format TITLE.  It may be filled if it is too wide,
 	     ;; that is wider than the two thirds of the total width.
-	     (title-len (min (length title) (/ (* 2 text-width) 3)))
+	     (title-len (min (string-width title) (/ (* 2 text-width) 3)))
 	     (formatted-title (org-ascii--fill-string title title-len info))
 	     (line
 	      (make-string
-	       (min (+ (max title-len (length author) (length email)) 2)
+	       (min (+ (max title-len
+			    (string-width (or author ""))
+			    (string-width (or email ""))) 2)
 		    text-width) (if utf8p ?━ ?_))))
 	(org-ascii--justify-string
 	 (concat line "\n"
@@ -920,7 +930,7 @@ holding export options."
 	    (concat
 	     title "\n"
 	     (make-string
-	      (length title)
+	      (string-width title)
 	      (if (eq (plist-get info :ascii-charset) 'utf-8) ?─ ?_))))
 	  "\n\n"
 	  (let ((text-width (- org-ascii-text-width org-ascii-global-margin)))
@@ -1197,7 +1207,7 @@ contextual information."
 ;;;; Inlinetask
 
 (defun org-ascii-format-inlinetask-default
-    (todo type priority name tags contents width inlinetask info)
+  (todo type priority name tags contents width inlinetask info)
   "Format an inline task element for ASCII export.
 See `org-ascii-format-inlinetask-function' for a description
 of the paramaters."
@@ -1210,7 +1220,7 @@ of the paramaters."
       (unless utf8p (concat (make-string width ? ) "\n"))
       ;; Add title.  Fill it if wider than inlinetask.
       (let ((title (org-ascii--build-title inlinetask info width)))
-	(if (<= (length title) width) title
+	(if (<= (string-width title) width) title
 	  (org-ascii--fill-string title width info)))
       "\n"
       ;; If CONTENTS is not empty, insert it along with
@@ -1303,7 +1313,7 @@ contextual information."
      ;; Contents: Pay attention to indentation.  Note: check-boxes are
      ;; already taken care of at the paragraph level so they don't
      ;; interfere with indentation.
-     (let ((contents (org-ascii--indent-string contents (length bullet))))
+     (let ((contents (org-ascii--indent-string contents (string-width bullet))))
        (if (eq (org-element-type (car (org-element-contents item))) 'paragraph)
 	   (org-trim contents)
 	 (concat "\n" contents))))))
@@ -1675,7 +1685,7 @@ are ignored."
 	       (org-element-map table 'table-row
 		 (lambda (row)
 		   (setq max-width
-			 (max (length
+			 (max (string-width
 			       (org-export-data
 				(org-element-contents
 				 (elt (org-element-contents row) col))
@@ -1695,7 +1705,8 @@ a communication channel."
   ;; each cell in the column.
   (let ((width (org-ascii--table-cell-width table-cell info)))
     ;; When contents are too large, truncate them.
-    (unless (or org-ascii-table-widen-columns (<= (length contents) width))
+    (unless (or org-ascii-table-widen-columns
+		(<= (string-width (or contents "")) width))
       (setq contents (concat (substring contents 0 (- width 2)) "=>")))
     ;; Align contents correctly within the cell.
     (let* ((indent-tabs-mode nil)
@@ -1704,7 +1715,7 @@ a communication channel."
 	      (org-ascii--justify-string
 	       contents width
 	       (org-export-table-cell-alignment table-cell info)))))
-      (setq contents (concat data (make-string (- width (length data)) ? ))))
+      (setq contents (concat data (make-string (- width (string-width (or data ""))) ? ))))
     ;; Return cell.
     (concat (format " %s " contents)
 	    (when (memq 'right (org-export-table-cell-borders table-cell info))
-- 
1.8.5.2

-- 
          yashi

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC] ox-ascii.el: fixing variable width character handling
  2014-01-03  7:55   ` Yasushi SHOJI
@ 2014-01-03  9:34     ` Nicolas Goaziou
  2014-01-04 11:18       ` Yasushi SHOJI
  0 siblings, 1 reply; 7+ messages in thread
From: Nicolas Goaziou @ 2014-01-03  9:34 UTC (permalink / raw)
  To: Yasushi SHOJI; +Cc: emacs-orgmode

Hello,

Yasushi SHOJI <yashi@atmark-techno.com> writes:

> Ok here is new one.  I've been using for a while.  Hope it works for
> others.

Thank you. It looks mostly good, but I cannot apply it on top of master
branch. Could you rebase and send an updated version?

> * lisp/ox-ascii.el (org-ascii--current-text-width): Convert `length'
>   to `string-width'.
>   (org-ascii--build-title): Likewise.
>   (org-ascii--build-toc): Likewise.
>   (org-ascii--list-listings): Likewise.
>   (org-ascii--list-tables): Likewise.
>   (org-ascii-template--document-title): Likewise.
>   (org-ascii-inner-template): Likewise.
>   (org-ascii-format-inlinetask-default): Likewise.
>   (org-ascii-format-inlinetask-default):Likewise.
>   (org-ascii-item): Likewise.
>   (org-ascii--table-cell-width): Likewise.
>   (org-ascii-table-cell): Likewise.
>   (org-ascii--current-text-width): Likewise.

The usual format is

  (org-ascii--build-title, org-ascii--build-toc, ...): Likewise.

IOW, you don't need a new line for each function.

> -		 (let ((struct (org-element-property :structure parent-item))
> -		       (beg-item (org-element-property :begin parent-item)))
> +		 (let* ((struct (org-element-property :structure parent-item))
> +			(beg-item (org-element-property :begin parent-item))
> +			(tag (org-list-get-tag beg-item struct))
> +			(bullet (org-list-get-bullet beg-item struct)))

Since you don't use these variables below, you can skip both tag and
bullet binding.

>  		   (+ (- (org-list-get-ind beg-item struct)
>  			 (org-list-get-ind
>  			  (org-list-get-top-point struct) struct))
> -		      (length (org-ascii--checkbox parent-item info))
> -		      (length
> +		      (string-width (or (org-ascii--checkbox parent-item info) ""))
> +		      (string-width
>  		       (or (org-list-get-tag beg-item struct)
> -			   (org-list-get-bullet beg-item struct)))))))))))))
> +			   (org-list-get-bullet beg-item struct)
> +			   ""))))))))))))


Regards,

-- 
Nicolas Goaziou

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC] ox-ascii.el: fixing variable width character handling
  2014-01-03  9:34     ` Nicolas Goaziou
@ 2014-01-04 11:18       ` Yasushi SHOJI
  2014-01-04 19:34         ` Nicolas Goaziou
  2014-01-16 15:25         ` Nicolas Goaziou
  0 siblings, 2 replies; 7+ messages in thread
From: Yasushi SHOJI @ 2014-01-04 11:18 UTC (permalink / raw)
  To: Nicolas Goaziou; +Cc: emacs-orgmode

Hi,

At Fri, 03 Jan 2014 10:34:08 +0100,
Nicolas Goaziou wrote:
> 
> Thank you. It looks mostly good, but I cannot apply it on top of master
> branch. Could you rebase and send an updated version?

Opps.  Rebased version attached.

> The usual format is
> 
>   (org-ascii--build-title, org-ascii--build-toc, ...): Likewise.
> 
> IOW, you don't need a new line for each function.

Thanks.  I've now checked Emacs' Changelog.  It seems like each line
have open and close parentheses.  Am I right?

> > -		 (let ((struct (org-element-property :structure parent-item))
> > -		       (beg-item (org-element-property :begin parent-item)))
> > +		 (let* ((struct (org-element-property :structure parent-item))
> > +			(beg-item (org-element-property :begin parent-item))
> > +			(tag (org-list-get-tag beg-item struct))
> > +			(bullet (org-list-get-bullet beg-item struct)))
> 
> Since you don't use these variables below, you can skip both tag and
> bullet binding.

My bad.  Deleted.  New patch attached.

----- >8 ----- >8 ----- >8 -----
Subject: [PATCH] ox-ascii: Convert `length' to `string-width' where
 appropriate

* lisp/ox-ascii.el (org-ascii--current-text-width): Convert `length'
  to `string-width'.
  (org-ascii--build-title, org-ascii--build-toc)
  (org-ascii--list-listings, org-ascii--list-tables)
  (org-ascii-template--document-title)
  (org-ascii-inner-template, org-ascii-format-inlinetask-default)
  (org-ascii-format-inlinetask-default, org-ascii-item
  (org-ascii--table-cell-width, org-ascii-table-cell)
  (org-ascii--current-text-width): Likewise.

I've checked all occurrences of the function `length' in ox-ascii.el.
It turns out that the most of them are calculating the width of given
string. To support fullwidth characters, we better use `string-width'
instead of `length'.

Some characters in UCS are categorized as "East Asian Ambiguous"[1].
The return value of `string-width' with those characters depends on
how Emacs is setup.  We leave those ambiguous character handling to
Emacs.

Two usages of `length' in `ox-ascii.el' were left as-is, because those
were used for:

 - bullet depth calculation in `org-ascii-headline', and
 - cell position calculation in `org-ascii--table-cell-width'.

[1]: http://www.unicode.org/reports/tr11/#Ambiguous
---
 lisp/ox-ascii.el | 71 +++++++++++++++++++++++++++++++-------------------------
 1 file changed, 40 insertions(+), 31 deletions(-)

diff --git a/lisp/ox-ascii.el b/lisp/ox-ascii.el
index b278ab6..4b20afd 100644
--- a/lisp/ox-ascii.el
+++ b/lisp/ox-ascii.el
@@ -527,15 +527,16 @@ INFO is a plist used as a communication channel."
 		 ;; the list and current item bullet's length.  Also
 		 ;; remove checkbox length, and tag length (for
 		 ;; description lists) or bullet length.
-		 (let ((struct (org-element-property :structure parent-item))
-		       (beg-item (org-element-property :begin parent-item)))
+		 (let* ((struct (org-element-property :structure parent-item))
+			(beg-item (org-element-property :begin parent-item)))
 		   (+ (- (org-list-get-ind beg-item struct)
 			 (org-list-get-ind
 			  (org-list-get-top-point struct) struct))
-		      (length (org-ascii--checkbox parent-item info))
-		      (length
+		      (string-width (or (org-ascii--checkbox parent-item info) ""))
+		      (string-width
 		       (or (org-list-get-tag beg-item struct)
-			   (org-list-get-bullet beg-item struct)))))))))))))
+			   (org-list-get-bullet beg-item struct)
+			   ""))))))))))))
 
 (defun org-ascii--build-title
   (element info text-width &optional underline notags toc)
@@ -591,7 +592,8 @@ possible.  It doesn't apply to `inlinetask' elements."
      (when tags
        (format
 	(format " %%%ds"
-		(max (- text-width  (1+ (length first-part))) (length tags)))
+		(max (- text-width (1+ (string-width first-part)))
+		     (string-width tags)))
 	tags))
      ;; Maybe underline text, if ELEMENT type is `headline' and an
      ;; underline character has been defined.
@@ -602,7 +604,9 @@ possible.  It doesn't apply to `inlinetask' elements."
 			      org-ascii-underline)))))
 	 (and under-char
 	      (concat "\n"
-		      (make-string (length first-part) under-char))))))))
+		      (make-string (/ (string-width first-part)
+				      (char-width under-char))
+				   under-char))))))))
 
 (defun org-ascii--has-caption-p (element info)
   "Non-nil when ELEMENT has a caption affiliated keyword.
@@ -649,7 +653,7 @@ which the table of contents generation has been initiated."
   (let ((title (org-ascii--translate "Table of Contents" info)))
     (concat
      title "\n"
-     (make-string (length title)
+     (make-string (string-width title)
 		  (if (eq (plist-get info :ascii-charset) 'utf-8) ?─ ?_))
      "\n\n"
      (let ((text-width
@@ -676,7 +680,7 @@ generation.  INFO is a plist used as a communication channel."
   (let ((title (org-ascii--translate "List of Listings" info)))
     (concat
      title "\n"
-     (make-string (length title)
+     (make-string (string-width title)
 		  (if (eq (plist-get info :ascii-charset) 'utf-8) ?─ ?_))
      "\n\n"
      (let ((text-width
@@ -690,9 +694,10 @@ generation.  INFO is a plist used as a communication channel."
 	  ;; Store initial text so its length can be computed.  This is
 	  ;; used to properly align caption right to it in case of
 	  ;; filling (like contents of a description list item).
-	  (let ((initial-text
-		 (format (org-ascii--translate "Listing %d:" info)
-			 (incf count))))
+	  (let* ((initial-text
+		  (format (org-ascii--translate "Listing %d:" info)
+			  (incf count)))
+		 (width (string-width initial-text)))
 	    (concat
 	     initial-text " "
 	     (org-trim
@@ -702,8 +707,8 @@ generation.  INFO is a plist used as a communication channel."
 		(let ((caption (or (org-export-get-caption src-block t)
 				   (org-export-get-caption src-block))))
 		  (org-export-data caption info))
-		(- text-width (length initial-text)) info)
-	       (length initial-text))))))
+		(- text-width width) info)
+	       width)))))
 	(org-export-collect-listings info) "\n")))))
 
 (defun org-ascii--list-tables (keyword info)
@@ -714,7 +719,7 @@ generation.  INFO is a plist used as a communication channel."
   (let ((title (org-ascii--translate "List of Tables" info)))
     (concat
      title "\n"
-     (make-string (length title)
+     (make-string (string-width title)
 		  (if (eq (plist-get info :ascii-charset) 'utf-8) ?─ ?_))
      "\n\n"
      (let ((text-width
@@ -728,9 +733,10 @@ generation.  INFO is a plist used as a communication channel."
 	  ;; Store initial text so its length can be computed.  This is
 	  ;; used to properly align caption right to it in case of
 	  ;; filling (like contents of a description list item).
-	  (let ((initial-text
-		 (format (org-ascii--translate "Table %d:" info)
-			 (incf count))))
+	  (let* ((initial-text
+		  (format (org-ascii--translate "Table %d:" info)
+			  (incf count)))
+		 (width (string-width initial-text)))
 	    (concat
 	     initial-text " "
 	     (org-trim
@@ -740,8 +746,8 @@ generation.  INFO is a plist used as a communication channel."
 		(let ((caption (or (org-export-get-caption table t)
 				   (org-export-get-caption table))))
 		  (org-export-data caption info))
-		(- text-width (length initial-text)) info)
-	       (length initial-text))))))
+		(- text-width width) info)
+	       width)))))
 	(org-export-collect-tables info) "\n")))))
 
 (defun org-ascii--unique-links (element info)
@@ -854,14 +860,14 @@ INFO is a plist used as a communication channel."
 	 ((and (org-string-nw-p date) (org-string-nw-p author))
 	  (concat
 	   author
-	   (make-string (- text-width (length date) (length author)) ? )
+	   (make-string (- text-width (string-width date) (string-width author)) ? )
 	   date
 	   (when (org-string-nw-p email) (concat "\n" email))
 	   "\n\n\n"))
 	 ((and (org-string-nw-p date) (org-string-nw-p email))
 	  (concat
 	   email
-	   (make-string (- text-width (length date) (length email)) ? )
+	   (make-string (- text-width (string-width date) (string-width email)) ? )
 	   date "\n\n\n"))
 	 ((org-string-nw-p date)
 	  (concat
@@ -877,11 +883,13 @@ INFO is a plist used as a communication channel."
       (let* ((utf8p (eq (plist-get info :ascii-charset) 'utf-8))
 	     ;; Format TITLE.  It may be filled if it is too wide,
 	     ;; that is wider than the two thirds of the total width.
-	     (title-len (min (length title) (/ (* 2 text-width) 3)))
+	     (title-len (min (string-width title) (/ (* 2 text-width) 3)))
 	     (formatted-title (org-ascii--fill-string title title-len info))
 	     (line
 	      (make-string
-	       (min (+ (max title-len (length author) (length email)) 2)
+	       (min (+ (max title-len
+			    (string-width (or author ""))
+			    (string-width (or email ""))) 2)
 		    text-width) (if utf8p ?━ ?_))))
 	(org-ascii--justify-string
 	 (concat line "\n"
@@ -920,7 +928,7 @@ holding export options."
 	    (concat
 	     title "\n"
 	     (make-string
-	      (length title)
+	      (string-width title)
 	      (if (eq (plist-get info :ascii-charset) 'utf-8) ?─ ?_))))
 	  "\n\n"
 	  (let ((text-width (- org-ascii-text-width org-ascii-global-margin)))
@@ -1197,7 +1205,7 @@ contextual information."
 ;;;; Inlinetask
 
 (defun org-ascii-format-inlinetask-default
-    (todo type priority name tags contents width inlinetask info)
+  (todo type priority name tags contents width inlinetask info)
   "Format an inline task element for ASCII export.
 See `org-ascii-format-inlinetask-function' for a description
 of the paramaters."
@@ -1210,7 +1218,7 @@ of the paramaters."
       (unless utf8p (concat (make-string width ? ) "\n"))
       ;; Add title.  Fill it if wider than inlinetask.
       (let ((title (org-ascii--build-title inlinetask info width)))
-	(if (<= (length title) width) title
+	(if (<= (string-width title) width) title
 	  (org-ascii--fill-string title width info)))
       "\n"
       ;; If CONTENTS is not empty, insert it along with
@@ -1303,7 +1311,7 @@ contextual information."
      ;; Contents: Pay attention to indentation.  Note: check-boxes are
      ;; already taken care of at the paragraph level so they don't
      ;; interfere with indentation.
-     (let ((contents (org-ascii--indent-string contents (length bullet))))
+     (let ((contents (org-ascii--indent-string contents (string-width bullet))))
        (if (eq (org-element-type (car (org-element-contents item))) 'paragraph)
 	   (org-trim contents)
 	 (concat "\n" contents))))))
@@ -1675,7 +1683,7 @@ are ignored."
 	       (org-element-map table 'table-row
 		 (lambda (row)
 		   (setq max-width
-			 (max (length
+			 (max (string-width
 			       (org-export-data
 				(org-element-contents
 				 (elt (org-element-contents row) col))
@@ -1695,7 +1703,8 @@ a communication channel."
   ;; each cell in the column.
   (let ((width (org-ascii--table-cell-width table-cell info)))
     ;; When contents are too large, truncate them.
-    (unless (or org-ascii-table-widen-columns (<= (length contents) width))
+    (unless (or org-ascii-table-widen-columns
+		(<= (string-width (or contents "")) width))
       (setq contents (concat (substring contents 0 (- width 2)) "=>")))
     ;; Align contents correctly within the cell.
     (let* ((indent-tabs-mode nil)
@@ -1704,7 +1713,7 @@ a communication channel."
 	      (org-ascii--justify-string
 	       contents width
 	       (org-export-table-cell-alignment table-cell info)))))
-      (setq contents (concat data (make-string (- width (length data)) ? ))))
+      (setq contents (concat data (make-string (- width (string-width (or data ""))) ? ))))
     ;; Return cell.
     (concat (format " %s " contents)
 	    (when (memq 'right (org-export-table-cell-borders table-cell info))
-- 
1.8.5.2

-- 
             yashi

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC] ox-ascii.el: fixing variable width character handling
  2014-01-04 11:18       ` Yasushi SHOJI
@ 2014-01-04 19:34         ` Nicolas Goaziou
  2014-01-16 15:25         ` Nicolas Goaziou
  1 sibling, 0 replies; 7+ messages in thread
From: Nicolas Goaziou @ 2014-01-04 19:34 UTC (permalink / raw)
  To: Yasushi SHOJI; +Cc: emacs-orgmode

Hello,

Yasushi SHOJI <yashi@atmark-techno.com> writes:

> Opps.  Rebased version attached.

Thank you. However, I'm still unable to apply it. Odd.

> Thanks.  I've now checked Emacs' Changelog.  It seems like each line
> have open and close parentheses.  Am I right?

That's correct, although I don't apply this rule.

> +		 (let* ((struct (org-element-property :structure parent-item))
> +			(beg-item (org-element-property :begin parent-item)))

You forgot to turn back `let*' into `let'.


Regards,

-- 
Nicolas Goaziou

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC] ox-ascii.el: fixing variable width character handling
  2014-01-04 11:18       ` Yasushi SHOJI
  2014-01-04 19:34         ` Nicolas Goaziou
@ 2014-01-16 15:25         ` Nicolas Goaziou
  1 sibling, 0 replies; 7+ messages in thread
From: Nicolas Goaziou @ 2014-01-16 15:25 UTC (permalink / raw)
  To: Yasushi SHOJI; +Cc: emacs-orgmode

Hello,

Yasushi SHOJI <yashi@atmark-techno.com> writes:

> Opps.  Rebased version attached.

Applied. Thank you for your work.


Regards,

-- 
Nicolas Goaziou

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2014-01-16 15:25 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-11-10 10:40 [RFC] ox-ascii.el: fixing variable width character handling Yasushi SHOJI
2013-11-10 13:09 ` Nicolas Goaziou
2014-01-03  7:55   ` Yasushi SHOJI
2014-01-03  9:34     ` Nicolas Goaziou
2014-01-04 11:18       ` Yasushi SHOJI
2014-01-04 19:34         ` Nicolas Goaziou
2014-01-16 15:25         ` Nicolas Goaziou

Code repositories for project(s) associated with this inbox:

	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).