emacs-orgmode@gnu.org archives
From: "Mark E. Shoulson" <mark@kli.org>
To: Nicolas Goaziou <n.goaziou@gmail.com>
Cc: emacs-orgmode@gnu.org
Subject: Smart Quotes Exporting (Was: Re: (no subject))
Date: Thu, 31 May 2012 19:26:36 -0400
Message-ID: <4FC7FE2C.6040702@kli.org>
In-Reply-To: <87r4u031ye.fsf@gmail.com>

Sorry for messing up the thread subject header; I think I misused 
gmane's posting.

On 05/31/2012 09:38 AM, Nicolas Goaziou wrote:
> Hello,
>
> Mark Shoulson <mark@kli.org> writes:
>
>> +(defvar org-e-html-quote-replacements
>> +  '(("fr" "« " " »" "‘" "’" "’")
>> +    ("en" "“" "”" "‘" "’" "’")
>> +    ("de" "„" "“" "‚" "‘" "’"))
> A docstring will be required for this variable. It should be
> a defcustom.

Oh, certainly; they're all a disaster.  I think I said that in the 
writeup at the top.  This is just proof of concept, nothing is in the 
right place, nothing is properly documented.  They have to be 
defcustoms, there needs to be a good :type in the defcustom as well as a 
proper docstring.  You'll get no argument from me about the lack (or 
inaccuracy) of docstrings and such.  I hadn't gotten that far yet.  I 
said the patch was only if you wanted to tinker with the development as 
this progresses.
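
As a rough sketch of where this could end up, the variable might look something like the following once it is a defcustom with a docstring and a :type. The group name `my-smart-quotes' and the exact :type spec are illustrative assumptions, not the final form; only the variable name and its value come from the patch.

```elisp
;; Hypothetical group so the sketch stands alone; a real patch would
;; presumably reuse an existing Org export group instead.
(defgroup my-smart-quotes nil
  "Illustrative group for smart-quote export settings."
  :group 'text)

(defcustom org-e-html-quote-replacements
  '(("fr" "« " " »" "‘" "’" "’")
    ("en" "“" "”" "‘" "’" "’")
    ("de" "„" "“" "‚" "‘" "’"))
  "Alist mapping a language code to its quote replacements.
Each value is a list of five strings, in this order: opening
double quote, closing double quote, opening single quote,
closing single quote, and mid-word apostrophe."
  :group 'my-smart-quotes
  ;; Assumed :type; it mirrors the five-slot layout described above.
  :type '(alist :key-type (string :tag "Language code")
                :value-type (list (string :tag "Open double")
                                  (string :tag "Close double")
                                  (string :tag "Open single")
                                  (string :tag "Close single")
                                  (string :tag "Apostrophe"))))
```

With a :type like this, the Customize interface can present each language as a labelled row of five string fields.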

>> +(defun org-e-latex--quotation-marks (text info)
>> +  (org-export-quotation-marks text info org-e-latex-quote-replacements))
>> +  ;; (mapc (lambda(l)
>> +  ;; 	  (let ((start 0))
>> +  ;; 	    (while (setq start (string-match (car l) text start))
>> +  ;; 	      (let ((new-quote (concat (match-string 1 text) (cdr l))))
>> +  ;; 		(setq text (replace-match new-quote  t t text))))))
>> +  ;; 	(cdr (or (assoc (plist-get info :language) org-e-latex-quotes)
>> +  ;; 		 ;; Falls back on English.
>> +  ;; 		 (assoc "en" org-e-latex-quotes))))
>> +  ;; text)
> Use directly `org-e-latex-quote-replacements' in code then.

Not sure I understand this comment.

>> +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>> +;; Probably a defcustom eventually.
>> +
>> +;; Each element of this consists of: car=language code, cdr=list of
>> +;; double-quote-open-regexp, double-quote-close-regexp,
>> +;; single-quote-open-regexp, single-quote-close-regexp, &optional
>> +;; single-apostrophe regexp?
>> +;; Just about all will be the same anyway, so mostly language DEFAULT.
>> +
>> +;; For testing purposes, poorly-designed at first.
>> +(defvar org-export-quotes-regexps
>> +  '((DEFAULT
>> +      "\\(?:\\s-\\|[[(]\\|^\\)\\(\"\\)\\w"
>> +      "\\(?:\\S-\\)\\(\"\\)\\s-"
>> +      "\\(?:\\s-\\|(\\|^\\)\\('\\)\\w"
>> +      "\\w\\('\\)\\(?:\\s-\\|\\s.\\|$\\)"
>> +      "\\w\\('\\)\\w")))
> I'm not sure this variable can be used for both the buffer and the
> export engine. Export back-ends will only see chunks of the paragraph.
>
> For example, in the following text,
>
>    He crossed the Rubicon and said: "/Alea jacta est./"
>
> Plain text translators will see three strings:
>
>    1. "He crossed the Rubicon and said: \""
>    2. "Alea jacta est."
>    3. "\""
>
> In case 1, you have an opening quote with nothing after it. In case 3,
> you have a closing quote with nothing before or after it. Plain regexps
> can't help here.
>
> The only solution I can think of is to do quote substitutions in
> paragraphs within the parse tree before they reach the translators (i.e.
> with `org-export-filter-parse-tree-functions').
>
> That's the only way to know if "\"" is an opening or a closing quote,
> for example. The current approach won't work.

Hm.  OK, this may indeed be (a) a problem and (b) an indication that I 
really don't understand the process as well as I thought I did...  Ah.  
So when the "plain" text is being exported, the exporter passes along 
the text in chunks as divided up by the formatting.  So string #2 is 
broken out from the others due to its being in italics.  That is indeed 
an issue.  Moreover, I never even properly considered the effects of 
formatting characters (as opposed to punctuation) right next to the 
quote-marks, even if this weren't a problem.
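
The failure can be checked directly. The sketch below runs the opening and closing double-quote regexps from the DEFAULT entry of the patch against the three chunks listed above; the names prefixed with `my-' are made up for this illustration.

```elisp
;; Opening- and closing-double-quote regexps, copied from the DEFAULT
;; entry of `org-export-quotes-regexps' in the patch.
(defconst my-open-double-re "\\(?:\\s-\\|[[(]\\|^\\)\\(\"\\)\\w")
(defconst my-close-double-re "\\(?:\\S-\\)\\(\"\\)\\s-")

(defun my-quote-regexp-hits (chunk)
  "Return (OPEN-HIT CLOSE-HIT): match positions in CHUNK, or nil."
  ;; Pin the syntax table so \\w and \\s- mean the same thing
  ;; regardless of the current buffer's mode.
  (with-syntax-table (standard-syntax-table)
    (list (string-match my-open-double-re chunk)
          (string-match my-close-double-re chunk))))

;; The three strings a plain-text translator would receive.
(defvar my-chunk-results
  (mapcar #'my-quote-regexp-hits
          '("He crossed the Rubicon and said: \""
            "Alea jacta est."
            "\"")))
```

Neither regexp matches any of the three chunks, because the word character or whitespace each one expects next to the quote lives in a neighbouring chunk (or does not exist at all). The same regexps do match when the quoted words sit in one unformatted string, as in `(my-quote-regexp-hits "said: \"Alea\" now")'.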

So... the filter-parse-tree-functions hook gets applied to the parse 
tree, so a back-end can add a function to that list which looks over 
the parse tree and watches for these border cases (and also the ones 
within ordinary strings).  It looks like it's going to be tough to 
work in any flexibility to define further per-language or per-backend 
cleverness to handle anything beyond the "canonical set" of open-double, 
close-double, open-single, close-single, and mid-word.
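
To make the idea concrete, here is a minimal sketch of deciding open versus close across chunk boundaries. It assumes a deliberately simplified model in which a paragraph has already been flattened to a list of strings; it does not use the real parse-tree API, handles only double quotes, and `my-smart-quotes-in-chunks' is a hypothetical name.

```elisp
(defun my-smart-quotes-in-chunks (chunks open close)
  "Return CHUNKS with each straight double quote replaced.
A quote becomes OPEN when it starts the paragraph or follows
whitespace or an opening bracket, and CLOSE otherwise.  The
preceding character is tracked across chunk boundaries."
  (let ((prev nil))             ; last character seen; nil at paragraph start
    (mapcar
     (lambda (chunk)
       (mapconcat
        (lambda (ch)
          (prog1
              (cond ((/= ch ?\") (string ch))
                    ((or (null prev)
                         (memq prev '(?\s ?\t ?\n ?\( ?\[)))
                     open)
                    (t close))
            ;; Remember this character for the next decision.
            (setq prev ch)))
        chunk ""))
     chunks)))

;; The Rubicon example, with LaTeX-style replacements.
(defvar my-rubicon-result
  (my-smart-quotes-in-chunks
   '("He crossed the Rubicon and said: \"" "Alea jacta est." "\"")
   "``" "''"))
```

Because the previous character carries over from one chunk to the next, the lone quote in the third chunk is seen as following the period of the second chunk and becomes a closing quote. A real filter would have to do the equivalent bookkeeping while walking the objects inside each paragraph.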

To be sure, anything we do will most assuredly fail even on some fairly 
reasonable input, in which case the users are pretty much on their own 
and will have to do things the hard way.  And I could use that as the 
answer here, that, "well, it'll work only within plain-text strings" 
(and I might possibly still have to use that answer), but I would rather 
include the situations you bring up in the supported set and not throw 
up my hands at it.  So, yes, will look at that.

>> +  (let* ((start 0)
>> +	 (regexps
>> +	  (cdr
>> +	   (or
>> +	    (assoc (plist-get info :language)
>> +		   org-export-quotes-regexps)
>> +	    (assoc 'DEFAULT org-export-quotes-regexps))))
> Use `assq' instead of `assoc' in the second case.

Good call.
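
For reference, the distinction in a toy example: `assq' compares keys with `eq', which is enough for a symbol such as DEFAULT, while string keys such as language codes need `assoc', which compares with `equal'. The table and function names below are made up for the illustration.

```elisp
(defvar my-regexps-table
  '((DEFAULT . "default-regexps")
    ("en" . "english-regexps"))
  "Toy alist with one symbol key and one string key.")

(defun my-lookup (language)
  "Look up LANGUAGE (a string) in `my-regexps-table', else DEFAULT."
  (cdr (or (assoc language my-regexps-table)     ; strings need `equal'
           (assq 'DEFAULT my-regexps-table))))   ; symbols: `eq' suffices

;; A freshly allocated string is `equal' but not `eq' to the key in
;; the table, so `assq' cannot find it.
(defvar my-assq-on-string
  (assq (copy-sequence "en") my-regexps-table))
```

Here `(my-lookup "en")' finds the English entry, `(my-lookup "fr")' falls back to DEFAULT, and `my-assq-on-string' is nil.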

>> +	 (subs (cdr (or (assoc (plist-get info :language)
>> +			       replacements)
>> +			(assoc "en" replacements))))
>> +	 (quotes (pairlis regexps subs)))
>> +    (mapc (lambda (p)
>> +	    (let ((re (car p))
>> +		  (su (cdr p)))
>> +	      (while (setq start (string-match re text start))
>> +		(setq text (replace-match su t t text 1)))))
> Use `replace-regexp-in-string' instead.
>
>    (replace-regexp-in-string (car p) (cdr p) text t t 1)

I'd been looking at other functions that didn't have that available; 
thanks for pointing me at it.
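
A sketch of how the loop might look with that change, for unformatted text only. It pairs the two double-quote regexps from the patch with LaTeX-style replacements; the function and variable names are hypothetical, and the pairs are written out by hand rather than built with `pairlis'.

```elisp
(defun my-apply-quote-pairs (text pairs)
  "Return TEXT after applying each (REGEXP . REPLACEMENT) in PAIRS.
Only subexpression 1 of each match is replaced, so the context
characters around the quote are left alone."
  (dolist (p pairs text)
    ;; Arguments: FIXEDCASE t, LITERAL t, SUBEXP 1.
    (setq text (replace-regexp-in-string (car p) (cdr p) text t t 1))))

(defvar my-demo-pairs
  '(("\\(?:\\s-\\|[[(]\\|^\\)\\(\"\\)\\w" . "``")   ; opening double quote
    ("\\(?:\\S-\\)\\(\"\\)\\s-" . "''"))            ; closing double quote
  "Regexps from the patch paired with LaTeX-style replacements.")

(defvar my-demo-result
  (my-apply-quote-pairs "say \"hi\" now" my-demo-pairs))
```

This turns `say "hi" now' into say ``hi'' now. Since `replace-regexp-in-string' handles every match in one call, the manual `start' bookkeeping from the `while' loop is no longer needed.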

~mark
