* Re: Smart Quotes Exporting
2012-06-01 17:11 ` Smart Quotes Exporting Nicolas Goaziou
@ 2012-06-01 22:41 ` Mark E. Shoulson
2012-06-03 3:16 ` Mark E. Shoulson
2012-06-06 2:14 ` Mark E. Shoulson
2 siblings, 0 replies; 23+ messages in thread
From: Mark E. Shoulson @ 2012-06-01 22:41 UTC
To: Nicolas Goaziou; +Cc: emacs-orgmode
On 06/01/2012 01:11 PM, Nicolas Goaziou wrote:
> Hello,
>
> "Mark E. Shoulson"<mark@kli.org> writes:
>
>> Oh, certainly; they're all a disaster. I think I said that in the
>> writeup at the top. This is just proof of concept, nothing is in the
>> right place, nothing is properly documented. They have to be
>> defcustoms, there needs to be a good :type in the defcustom as well as
>> a proper docstring. You'll get no argument from me about the lack (or
>> inaccuracy) of docstrings and such. I hadn't gotten that far yet.
>> I said the patch was only if you wanted to tinker with the development
>> as this progresses.
> No worries, I was just making some comments before forgetting about
> them.
Ah, ok. Good! Thanks.
>>> +(defun org-e-latex--quotation-marks (text info)
>>> + (org-export-quotation-marks text info org-e-latex-quote-replacements))
>>> + ;; (mapc (lambda(l)
>>> + ;; (let ((start 0))
>>> + ;; (while (setq start (string-match (car l) text start))
>>> + ;; (let ((new-quote (concat (match-string 1 text) (cdr l))))
>>> + ;; (setq text (replace-match new-quote t t text))))))
>>> + ;; (cdr (or (assoc (plist-get info :language) org-e-latex-quotes)
>>> + ;; ;; Falls back on English.
>>> + ;; (assoc "en" org-e-latex-quotes))))
>>> + ;; text)
>>> Use directly `org-e-latex-quote-replacements' in code then.
>> Not sure I understand this comment.
> Since `org-e-latex--quotation-marks' just calls
> `org-export-quotation-marks', you can remove completely the former from
> "org-export.el" and use the latter instead.
Well, that was done on purpose, and maybe the reason will make sense.
As I see it, each exporter should be able to have its own smartifier
function, and the export engine should make no assumptions about that:
just call the individual exporter's function. On the other hand, many
(but perhaps not all!) of the exporters may find themselves using
essentially the same code just with different replacement strings. So I
thought that the "general-purpose" function should live in org-export.el, just for the
convenience of exporters should they choose to make use of it. So, many
of the exporters' smartifier functions will really just be calls to the
more general-purpose function.
Does that make sense?
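A minimal sketch of that layering, under stated assumptions: every `my-`
name below is hypothetical and exists only for illustration, the two
regexps are deliberately simplified stand-ins for the real ones, and only
double quotes are handled. The generic helper does the work; each
back-end's smartifier is a one-line call into it, and a back-end with
special needs could define its own function instead.

```elisp
;; Toy illustration only; none of these names exist in Org.
(defvar my-quote-regexps
  '("\\(?:\\s-\\|^\\)\\(?9:\"\\)\\w"     ; opening double quote
    "\\S-\\(?9:\"\\)\\(?:\\s-\\|$\\)")   ; closing double quote
  "Simplified regexps shared by all back-ends.
The quote itself is always captured as group 9.")

(defun my-generic-smartify (text language replacements)
  "Return TEXT with its double quotes replaced.
REPLACEMENTS is an alist of (LANGUAGE OPEN CLOSE); an unknown
LANGUAGE falls back on \"en\"."
  (let ((subs (cdr (or (assoc language replacements)
                       (assoc "en" replacements))))
        (regexps my-quote-regexps))
    ;; Walk the regexps and the replacement strings in parallel.
    (while regexps
      (setq text (replace-regexp-in-string
                  (car regexps) (car subs) text t t 9))
      (setq regexps (cdr regexps)
            subs (cdr subs)))
    text))

(defvar my-latex-replacements '(("en" "``" "''") ("de" ",," "``")))
(defvar my-html-replacements '(("en" "“" "”") ("de" "„" "“")))

;; Each back-end's smartifier just supplies its own strings.
(defun my-latex-smartify (text language)
  (my-generic-smartify text language my-latex-replacements))

(defun my-html-smartify (text language)
  (my-generic-smartify text language my-html-replacements))

(my-latex-smartify "She said \"hi\"" "en")  ; => "She said ``hi''"
```

The export engine would only ever call the back-end's function, so it
needs no knowledge of whether that function delegates or not.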
>> So... there's the filter-parse-tree-functions hook gets applied within
>> the parse tree... so a back-end can add a function to that list which
>> looks over the parse-tree and watches for these border cases (and also
>> the ones within ordinary strings). Looks like it's going to be tough
>> to work in any flexibility to define further per-language or
>> per-backend cleverness to handle anything beyond the "canonical set"
>> of open-double, close-double, open-single, close-single, and mid-word.
>>
>> To be sure, anything we do will most assuredly fail even on some
>> fairly reasonable input, in which case the users are pretty much on
>> their own and will have to do things the hard way. And I could use
>> that as the answer here, that, "well, it'll work only within
>> plain-text strings" (and I might possibly still have to use that
>> answer), but I would rather include the situations you bring up in the
>> supported set and not throw up my hands at it. So, yes, will look at
>> that.
> Actually it isn't very hard to handle this problem. But it will be
> different than the fontification used in an Org buffer.
Yes, the fontification on-screen is different, and uses a rather
different function--but if I can help it, the same regexps! So things
work the same everywhere.
I also started thinking a little about what you write below, how we can
inspect the characters just after or before quotes at the very beginning
or end of each chunk. It would be nice if it could all be encapsulated
neatly in the regexp(s).
> As a first approximation, I can imagine a function accepting an element,
> an object or a secondary string and returning an equivalent element,
> object or secondary string, with its quotes "smartified". The algorithm
> could go like this:
>
> Walk element/object/secondary-string's contents .
Need it be element/object/secondary-string? At the bottom level it's
always about strings; the higher levels don't affect the processing of
each string in isolation. Do we need to intercept it at the element
level or just wait to grab things in the plain-text filter, since we
have access at that point too?
(Might also be that my understanding of the process and the nature of
elements is faulty or limited. Will have to see what works.)
>
> 1. When a string is encountered:
>
> 1. If it has a quote as its first or last position, check for
> objects before or after the string to guess its status. An
> object never starts with a white space, but you may have to
> check :post-blank property in order to know if previous object
> had white spaces at its end.
Hmm, this may in fact answer my question above: you need to be able to
get at the object level to test the post-blank. I'll experiment.
> 2. For each quote everywhere else in the string, your regexp can
> handle it fine.
>
> 2. When an object belonging to `org-element-recursive-objects' is
> encountered, apply the function to this object.
>
> 3. Accumulate returned strings or objects.
>
> Use accumulated data as the contents of the new object to return (i.e.
> just add the type and the same properties at the beginning of this list
> if it was an object or an element, return it as-is if that was
> a secondary string).
>
> On the elements side, only paragraphs, verse-blocks and table-rows can
> directly contain quotes. Also, headline, inlinetask item and
> footnote-reference have secondary strings containing quotes.
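The walk described above could be sketched roughly as follows. This is a
toy model, not code against the real org-element API: objects are plain
lists of the form (TYPE PROPS CHILD...), `my-recursive-types' stands in
for `org-element-recursive-objects', and all `my-' names are
hypothetical. It covers the recursion and accumulation steps plus the
white-space test on the previous sibling; deciding what to do with a
quote at a string boundary is left out.

```elisp
;; Toy illustration only; none of these names exist in Org.
(defvar my-recursive-types '(bold italic)
  "Toy stand-in for `org-element-recursive-objects'.")

(defun my-smartify-contents (contents fn)
  "Return a copy of CONTENTS with FN applied to every string.
CONTENTS is a list of strings and toy objects (TYPE PROPS CHILD...).
Recursive objects are rebuilt with the same type and properties;
anything else is returned unchanged."
  (mapcar (lambda (item)
            (cond
             ((stringp item) (funcall fn item))
             ((memq (car item) my-recursive-types)
              ;; Keep type and properties, recurse into the children.
              (append (list (car item) (cadr item))
                      (my-smartify-contents (cddr item) fn)))
             (t item)))
          contents))

(defun my-blank-before-p (prev)
  "Non-nil if sibling PREV ends with white space, or if PREV is nil.
For a toy object, white space is read from its :post-blank property."
  (cond ((null prev) t)
        ((stringp prev) (string-match-p "\\s-\\'" prev))
        (t (> (or (plist-get (cadr prev) :post-blank) 0) 0))))

(my-smartify-contents '("a " (bold nil "b") " c" (code nil "d")) #'upcase)
;; => ("A " (bold nil "B") " C" (code nil "d"))
```

A string starting with a quote would then be treated as opening when
`my-blank-before-p' is non-nil for the preceding sibling, and as closing
otherwise.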
I also haven't yet worked in smarts (especially in the on-screen
fontifier) for things like not fontifying inside comments or verbatim
strings, etc. That'll come in time.
> I'm not sure yet where and how to install such a function, but I will
> think about it when it is implemented.
Uuum... Maybe org-export-filter-parse-tree-functions?
~mark
* Re: Smart Quotes Exporting
2012-06-01 17:11 ` Smart Quotes Exporting Nicolas Goaziou
2012-06-01 22:41 ` Mark E. Shoulson
@ 2012-06-03 3:16 ` Mark E. Shoulson
2012-06-06 2:14 ` Mark E. Shoulson
2 siblings, 0 replies; 23+ messages in thread
From: Mark E. Shoulson @ 2012-06-03 3:16 UTC
To: Nicolas Goaziou; +Cc: emacs-orgmode
[-- Attachment #1: Type: text/plain, Size: 1090 bytes --]
All right, preliminary patch is attached, *maybe* good enough for more
serious consideration now, but might need some fixes. Still only uses
ordinary regexps and plain-text strings, but can now handle the example
with formatting-breaks next to quotes. Things have been moved into more
appropriate locations, made customs, docstrings and types fixed, etc, etc.
It supports onscreen display of "smart" quotes (when enabled); I have
the quotes displayed in org-document-info face so they are slightly
distinct, to make it clearer that they are "altered" from what they are
in the plain text. This may or may not be a popular (or good) idea. I
have also built it into the new export engine in org-e-latex and
org-e-html as proofs of concept. I'm not positive the latex one will
work properly for German, though; there might need to be something
enabled in LaTeX for it to format ,, into „.
It should probably be set not to smartify quotes onscreen in comments; I
haven't done that yet.
Comments welcome; I hope I didn't complicate matters in the export
engines too much.
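For anyone wondering why the regexps capture the quote in group 9
rather than group 1: the on-screen fontifier joins all the regexps into
one alternation and wraps each in its own numbered group to learn which
kind of quote matched, so the quote itself needs a number that stays
fixed. A small sketch of the idea, with simplified regexps and
hypothetical `my-' names (this is not the code from the patch):

```elisp
;; Toy illustration only; none of these names exist in Org.
(defvar my-kind-regexps
  '("\\(?:\\s-\\|^\\)\\(?9:\"\\)\\w"     ; alternative 1: opening quote
    "\\S-\\(?9:\"\\)\\(?:\\s-\\|$\\)")   ; alternative 2: closing quote
  "Simplified regexps; each captures the quote as group 9.")

(defun my-combined-regexp (regexps)
  "Wrap each of REGEXPS in groups numbered 1, 2, ... and join with \\|."
  (let ((i 0))
    (mapconcat (lambda (re)
                 (setq i (1+ i))
                 (format "\\(?%d:%s\\)" i re))
               regexps "\\|")))

(defun my-classify-first-quote (text)
  "Return (KIND . POSITION) for the first quote found in TEXT, or nil.
KIND says which alternative matched; POSITION comes from group 9."
  (when (string-match (my-combined-regexp my-kind-regexps) text)
    (cons (if (match-beginning 1) 'open 'close)
          (match-beginning 9))))

(my-classify-first-quote "say \"hi\" now")  ; => (open . 4)
```

Groups 1 and 2 tell the caller which replacement to use, while group 9
always marks the character to replace, whichever alternative matched.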
~mark
[-- Attachment #2: 0001-Add-smart-quotes-for-onscreen-display-and-for-latex-.patch --]
[-- Type: text/x-patch; name="0001-Add-smart-quotes-for-onscreen-display-and-for-latex-.patch", Size: 13760 bytes --]
From 1bc507cf69c94d5645436abc6e28e7d96999083e Mon Sep 17 00:00:00 2001
From: Mark Shoulson <mark@kli.org>
Date: Tue, 29 May 2012 23:01:12 -0400
Subject: [PATCH] Add `smart' quotes for onscreen display and for latex and
html export
* lisp/org.el: Add `smart' quotes: custom variables to define
regexps to recognize quotes, to define how and whether to
display them, and org-fontify-quotes to display `smart-quote'
characters when activated.
* contrib/lisp/org-export.el: Add function org-export-quotation-marks
as a utility function usable by individual exporters to apply
`smart' quotes.
* contrib/lisp/org-e-latex.el: Replace org-e-latex-quotes custom with
org-e-latex-quote-replacements and make org-e-latex--quotation-marks
use the org-export-quotation-marks function in org-export.el.
* contrib/lisp/org-e-html.el: Replace org-e-html-quotes custom with
org-e-html-smart-quote-replacements and enable org-e-html--quotation-marks,
using org-export-quotation-marks function in org-export.el.
---
contrib/lisp/org-e-html.el | 57 ++++++++----------------
contrib/lisp/org-e-latex.el | 67 ++++++++++-------------------
contrib/lisp/org-export.el | 26 +++++++++++
lisp/org.el | 101 +++++++++++++++++++++++++++++++++++++++++++
4 files changed, 168 insertions(+), 83 deletions(-)
diff --git a/contrib/lisp/org-e-html.el b/contrib/lisp/org-e-html.el
index 53547a0..d4a505e 100644
--- a/contrib/lisp/org-e-html.el
+++ b/contrib/lisp/org-e-html.el
@@ -1077,37 +1077,24 @@ in order to mimic default behaviour:
;;;; Plain text
-(defcustom org-e-html-quotes
- '(("fr"
- ("\\(\\s-\\|[[(]\\|^\\)\"" . "«~")
- ("\\(\\S-\\)\"" . "~»")
- ("\\(\\s-\\|(\\|^\\)'" . "'"))
- ("en"
- ("\\(\\s-\\|[[(]\\|^\\)\"" . "``")
- ("\\(\\S-\\)\"" . "''")
- ("\\(\\s-\\|(\\|^\\)'" . "`")))
- "Alist for quotes to use when converting english double-quotes.
-
-The CAR of each item in this alist is the language code.
-The CDR of each item in this alist is a list of three CONS:
-- the first CONS defines the opening quote;
-- the second CONS defines the closing quote;
-- the last CONS defines single quotes.
-
-For each item in a CONS, the first string is a regexp
-for allowed characters before/after the quote, the second
-string defines the replacement string for this quote."
+(defcustom org-e-html-smart-quote-replacements
+ '(("fr" "« " " »" "‘" "’" "’")
+ ("en" "“" "”" "‘" "’" "’")
+ ("de" "„" "“" "‚" "‘" "’"))
+ "What to export for `smart-quotes'.
+Each element is a language code followed by five strings:
+ 1. Open double-quotes
+ 2. Close double-quotes
+ 3. Open single-quote
+ 4. Close single-quote
+ 5. Mid-word apostrophe"
:group 'org-export-e-html
:type '(list
- (cons :tag "Opening quote"
- (string :tag "Regexp for char before")
- (string :tag "Replacement quote "))
- (cons :tag "Closing quote"
- (string :tag "Regexp for char after ")
- (string :tag "Replacement quote "))
- (cons :tag "Single quote"
- (string :tag "Regexp for char before")
- (string :tag "Replacement quote "))))
+ (string :tag "Open double-quotes") ; "“"
+ (string :tag "Close double-quotes") ; "”"
+ (string :tag "Open single-quote") ; "‘"
+ (string :tag "Close single-quote") ; "’"
+ (string :tag "Mid-word apostrophe"))) ; "’"
;;;; Compilation
@@ -1497,15 +1484,7 @@ This is used to choose a separator for constructs like \\verb."
"Export quotation marks depending on language conventions.
TEXT is a string containing quotation marks to be replaced. INFO
is a plist used as a communication channel."
- (mapc (lambda(l)
- (let ((start 0))
- (while (setq start (string-match (car l) text start))
- (let ((new-quote (concat (match-string 1 text) (cdr l))))
- (setq text (replace-match new-quote t t text))))))
- (cdr (or (assoc (plist-get info :language) org-e-html-quotes)
- ;; Falls back on English.
- (assoc "en" org-e-html-quotes))))
- text)
+ (org-export-quotation-marks text info org-e-html-smart-quote-replacements))
(defun org-e-html--wrap-label (element output)
"Wrap label associated to ELEMENT around OUTPUT, if appropriate.
@@ -2729,7 +2708,7 @@ contextual information."
;; (format "\\%s{}" (match-string 1 text)) nil t text)
;; start (match-end 0))))
;; Handle quotation marks
- ;; (setq text (org-e-html--quotation-marks text info))
+ (setq text (org-e-html--quotation-marks text info))
;; Convert special strings.
;; (when (plist-get info :with-special-strings)
;; (while (string-match (regexp-quote "...") text)
diff --git a/contrib/lisp/org-e-latex.el b/contrib/lisp/org-e-latex.el
index 67e9197..2543c29 100644
--- a/contrib/lisp/org-e-latex.el
+++ b/contrib/lisp/org-e-latex.el
@@ -687,38 +687,28 @@ during latex export it will output
;;;; Plain text
-(defcustom org-e-latex-quotes
- '(("fr"
- ("\\(\\s-\\|[[(]\\|^\\)\"" . "«~")
- ("\\(\\S-\\)\"" . "~»")
- ("\\(\\s-\\|(\\|^\\)'" . "'"))
- ("en"
- ("\\(\\s-\\|[[(]\\|^\\)\"" . "``")
- ("\\(\\S-\\)\"" . "''")
- ("\\(\\s-\\|(\\|^\\)'" . "`")))
- "Alist for quotes to use when converting english double-quotes.
-
-The CAR of each item in this alist is the language code.
-The CDR of each item in this alist is a list of three CONS:
-- the first CONS defines the opening quote;
-- the second CONS defines the closing quote;
-- the last CONS defines single quotes.
-
-For each item in a CONS, the first string is a regexp
-for allowed characters before/after the quote, the second
-string defines the replacement string for this quote."
+(defcustom org-e-latex-quote-replacements
+ '(("en" "``" "''" "`" "'" "'")
+ ("fr" "«~" "~»" "‹~" "~›" "'")
+ ("de" ",," "``" "," "`" "'"))
+ "What to output for quotes. Each element is a list of six strings.
+The first string specifies the language these quotes apply to (\"en\",
+\"fr\", \"de\", etc.; see the LANGUAGE keyword), and the other five
+define the strings to use for, in order:
+ 1. Open double-quotes
+ 2. Close double-quotes
+ 3. Open single-quote
+ 4. Close single-quote
+ 5. Mid-word apostrophe"
:group 'org-export-e-latex
- :type '(list
- (cons :tag "Opening quote"
- (string :tag "Regexp for char before")
- (string :tag "Replacement quote "))
- (cons :tag "Closing quote"
- (string :tag "Regexp for char after ")
- (string :tag "Replacement quote "))
- (cons :tag "Single quote"
- (string :tag "Regexp for char before")
- (string :tag "Replacement quote "))))
-
+ :type '(repeat
+ (list
+ (string :tag "Language code")
+ (string :tag "Open double-quotes")
+ (string :tag "Close double-quotes")
+ (string :tag "Open single-quote")
+ (string :tag "Close single-quote")
+ (string :tag "Mid-word apostrophe"))))
;;;; Compilation
@@ -852,19 +842,8 @@ nil."
options
","))
-(defun org-e-latex--quotation-marks (text info)
- "Export quotation marks depending on language conventions.
-TEXT is a string containing quotation marks to be replaced. INFO
-is a plist used as a communication channel."
- (mapc (lambda(l)
- (let ((start 0))
- (while (setq start (string-match (car l) text start))
- (let ((new-quote (concat (match-string 1 text) (cdr l))))
- (setq text (replace-match new-quote t t text))))))
- (cdr (or (assoc (plist-get info :language) org-e-latex-quotes)
- ;; Falls back on English.
- (assoc "en" org-e-latex-quotes))))
- text)
+(defun org-e-latex--quotation-marks (text info)
+ (org-export-quotation-marks text info org-e-latex-quote-replacements))
(defun org-e-latex--wrap-label (element output)
"Wrap label associated to ELEMENT around OUTPUT, if appropriate.
diff --git a/contrib/lisp/org-export.el b/contrib/lisp/org-export.el
index b9294e5..87f5c84 100644
--- a/contrib/lisp/org-export.el
+++ b/contrib/lisp/org-export.el
@@ -284,6 +284,32 @@ rules.")
:tag "Org Export General"
:group 'org-export)
+;; Generic function, usable by exporters, but they can define their own
+;; instead.
+(defun org-export-quotation-marks (text info replacements)
+ "Export quotation marks depending on language conventions.
+TEXT is a string containing quotation marks to be replaced. INFO
+is a plist used as a communication channel."
+ ;; (message text)
+ (let* ((regexps
+ (cdr
+ (or
+ (assoc (plist-get info :language)
+ org-smart-quotes-regexps)
+ (assq 'DEFAULT org-smart-quotes-regexps))))
+ (subs (cdr (or (assoc (plist-get info :language)
+ replacements)
+ (assoc "en" replacements))))
+ (quotes (pairlis regexps subs)))
+ (mapc (lambda (p)
+ (let ((re (car p))
+ (su (cdr p)))
+ (setq text (replace-regexp-in-string re su text t t 9))))
+ quotes))
+ text)
+
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
(defcustom org-export-with-archived-trees 'headline
"Whether sub-trees with the ARCHIVE tag should be exported.
diff --git a/lisp/org.el b/lisp/org.el
index 0157e36..70d7266 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -3625,6 +3625,69 @@ When nil, the \\name form remains in the buffer."
:version "24.1"
:type 'boolean)
+(defcustom org-smart-quotes nil
+ "Non-nil means display `smart' quotes on-screen in place
+of \" and ' characters."
+ :group 'org-appearance
+ :type 'boolean)
+
+(defcustom org-smart-quotes-replacements
+ '("“" "”" "‘" "’" "’")
+ "What to display on-screen when `org-smart-quotes' is non-nil.
+A list of five strings:
+ 1. Open double-quotes
+ 2. Close double-quotes
+ 3. Open single-quote
+ 4. Close single-quote
+ 5. Mid-word apostrophe"
+ :group 'org-appearance
+ :type '(list
+ (string :tag "Open double-quotes" "«") ; "“"
+ (string :tag "Close double-quotes" "»") ; "”"
+ (string :tag "Open single-quote" "‹") ; "‘"
+ (string :tag "Close single-quote" "›") ; "’"
+ (string :tag "Mid-word apostrophe" "’"))) ; "’"
+
+(defcustom org-smart-quotes-regexps
+ '((DEFAULT
+ "\\(?:\\s-\\|\\s(\\|^\\)\\(?9:\"\\)\\(?:\\w\\|\\s.\\|\\s_\\)\\|\\s-\\(?9:\"\\)$"
+ "\\(?:\\S-\\)\\(?9:\"\\)\\(?:\\s-\\|$\\|\\s)\\|\\s.\\)\\|^\\(?9:\"\\)\\s-"
+ "\\(?:\\s-\\|(\\|^\\)\\(?9:'\\)\\w\\|\\s-\\(?9:'\\)$"
+ "\\w\\(?9:'\\)\\(?:\\s-\\|\\s.\\|$\\)\\|^\\(?9:'\\)\\s-"
+ "\\w\\(?9:'\\)\\w"))
+ "Regexps for quotes to be made `smart' quotes upon export or onscreen.
+Each element is a list of six strings. The car is a string
+representing the language to which this definition applies (e.g. \"en\",
+\"fr\", \"de\", etc.); the cdr (the other five elements) are five REs
+matching, in order:
+ 1. Opening double-quotes
+ 2. Closing double-quotes
+ 3. Opening single-quotes
+ 4. Closing single-quotes
+ 5. Mid-word apostrophes
+
+Each regexp should surround the actual quote in a capturing group, which
+must be specified as number 9 (so as not to conflict with other processing.)
+
+One element should have as its car the atom DEFAULT, to be used when no
+other element fits. It is also the one used for on-screen display of
+`smart' quotes (see the variable `org-smart-quotes').
+
+As what makes an opening or closing quote is somewhat consistent across
+languages (as opposed to how they are represented in typography), the
+DEFAULT element is likely sufficient for most purposes."
+ :group 'org-export-general
+ :group 'org-appearance
+ :type '(repeat
+ (list
+ (choice (const DEFAULT)
+ (string :tag "Language"))
+ (regexp :tag "Open double-quotes")
+ (regexp :tag "Close double-quotes")
+ (regexp :tag "Open single-quote")
+ (regexp :tag "Close single-quote")
+ (regexp :tag "Mid-word apostrophe"))))
+
(defvar org-emph-re nil
"Regular expression for matching emphasis.
After a match, the match groups contain these elements:
@@ -5927,6 +5990,7 @@ needs to be inserted at a specific position in the font-lock sequence.")
;; Specials
'(org-do-latex-and-special-faces)
'(org-fontify-entities)
+ '(org-fontify-quotes)
'(org-raise-scripts)
;; Code
'(org-activate-code (1 'org-code t))
@@ -5948,6 +6012,43 @@ needs to be inserted at a specific position in the font-lock sequence.")
'(org-font-lock-keywords t nil nil backward-paragraph))
(kill-local-variable 'font-lock-keywords) nil))
+(defun org-fontify-quotes (limit)
+ (require 'org-export)
+ (when org-smart-quotes
+ (let* ((start (point))
+ k su
+ (splice-string (lambda (lst join)
+ (if (null (cdr lst)) (car lst)
+ (concat (car lst) join
+ (splice-string (cdr lst) join)))))
+ (regexps
+ (cdr
+ (assq 'DEFAULT org-smart-quotes-regexps)))
+ (i 1)
+ (allreg
+ (mapconcat (lambda (n) (prog1 (format "\\(?%d:%s\\)" i n)
+ (setq i (1+ i))))
+ regexps "\\|"))
+ (quotes (pairlis regexps org-smart-quotes-replacements)))
+ (catch 'match
+ (while (re-search-forward allreg limit t)
+ (cond ((match-string 1)
+ (setq su (nth 0 org-smart-quotes-replacements)))
+ ((match-string 2)
+ (setq su (nth 1 org-smart-quotes-replacements)))
+ ((match-string 3)
+ (setq su (nth 2 org-smart-quotes-replacements)))
+ ((match-string 4)
+ (setq su (nth 3 org-smart-quotes-replacements)))
+ ((match-string 5)
+ (setq su (nth 4 org-smart-quotes-replacements))))
+ (add-text-properties (match-beginning 9) (match-end 9)
+ (list 'font-lock-fontified t
+ 'face 'org-document-info))
+ (compose-region (match-beginning 9) (match-end 9) su nil)
+ (backward-char 1)
+ (throw 'match t))))))
+
(defun org-toggle-pretty-entities ()
"Toggle the composition display of entities as UTF8 characters."
(interactive)
--
1.7.7.6
* Re: Smart Quotes Exporting
2012-06-01 17:11 ` Smart Quotes Exporting Nicolas Goaziou
2012-06-01 22:41 ` Mark E. Shoulson
2012-06-03 3:16 ` Mark E. Shoulson
@ 2012-06-06 2:14 ` Mark E. Shoulson
2012-06-07 19:21 ` Nicolas Goaziou
2 siblings, 1 reply; 23+ messages in thread
From: Mark E. Shoulson @ 2012-06-06 2:14 UTC
To: Nicolas Goaziou; +Cc: emacs-orgmode
[-- Attachment #1: Type: text/plain, Size: 970 bytes --]
Update on the smart-quotes patch. Supports the odt exporter now too,
which I think covers all the current major "new" exporters for which it
is relevant (adding smart quotes to ASCII export is a contradiction in
terms; should it be in the "publish" exporter? It didn't look like it
to me).
Added an options keyword, '"' (that is, the double-quote mark) to select
smart quotes on/off, and a defcustom for customizing your default. Set
the default default [sic] to nil, though actually it might be reasonable
to set it to t. Slight touch-up to the regexps since last time, but
they will definitely be subject to a lot of fine-tuning as more special
cases are found that break them and ways to fix them are found (the
close-quote still breaks on one of "/a/." or "/a./").
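In practice that might look like the fragment below. The variable name
is the one the patch registers for the new option; the exact
per-file syntax shown in the comment is an assumption, modelled on how
the existing one-character options such as "^" and "-" are written.

```elisp
;; Turn smart quotes on by default for every export
;; (the patch's own default is nil):
(setq org-export-with-smart-quotes t)

;; Per-file override, assuming the usual OPTIONS syntax applies to the
;; new '"' item as well (placed at the top of the Org file):
;;
;;   #+OPTIONS: ":t
```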
It's pretty good on the whole, though, usually guesses right. I know
there's some work being done on the odt exporter; hope this fits in well
with it.
How does it look to you?
~mark
[-- Attachment #2: 0001-Add-smart-quotes-for-onscreen-display-and-for-latex-.patch --]
[-- Type: text/x-patch; name="0001-Add-smart-quotes-for-onscreen-display-and-for-latex-.patch", Size: 18446 bytes --]
From e6df2efd1a9ce36964a20fc06aa2a688acd87efb Mon Sep 17 00:00:00 2001
From: Mark Shoulson <mark@kli.org>
Date: Tue, 29 May 2012 23:01:12 -0400
Subject: [PATCH] Add `smart' quotes for onscreen display and for latex and
html export
* lisp/org.el: Add `smart' quotes: custom variables to define
regexps to recognize quotes, to define how and whether to
display them, and org-fontify-quotes to display `smart-quote'
characters when activated.
* contrib/lisp/org-export.el: Add function org-export-quotation-marks
as a utility function usable by individual exporters to apply
`smart' quotes. Also add keyword '"' for customizing smart quotes,
and custom default for it.
* contrib/lisp/org-e-latex.el: Replace org-e-latex-quotes custom with
org-e-latex-quote-replacements and make org-e-latex--quotation-marks
use the org-export-quotation-marks function in org-export.el.
* contrib/lisp/org-e-html.el: Replace org-e-html-quotes custom with
org-e-html-smart-quote-replacements and enable org-e-html--quotation-marks,
using org-export-quotation-marks function in org-export.el.
* contrib/lisp/org-e-odt.el: Replace org-e-odt-quotes custom with
org-e-odt-quote-replacements and make org-e-odt--quotation-marks
use org-export-quotation-marks function in org-export.el.
---
contrib/lisp/org-e-html.el | 57 ++++++++----------------
contrib/lisp/org-e-latex.el | 67 ++++++++++-------------------
contrib/lisp/org-e-odt.el | 68 ++++++++++-------------------
contrib/lisp/org-export.el | 38 ++++++++++++++++
lisp/org.el | 101 +++++++++++++++++++++++++++++++++++++++++++
5 files changed, 203 insertions(+), 128 deletions(-)
diff --git a/contrib/lisp/org-e-html.el b/contrib/lisp/org-e-html.el
index 4287a59..c49608d 100644
--- a/contrib/lisp/org-e-html.el
+++ b/contrib/lisp/org-e-html.el
@@ -1043,37 +1043,24 @@ in order to mimic default behaviour:
;;;; Plain text
-(defcustom org-e-html-quotes
- '(("fr"
- ("\\(\\s-\\|[[(]\\|^\\)\"" . "«~")
- ("\\(\\S-\\)\"" . "~»")
- ("\\(\\s-\\|(\\|^\\)'" . "'"))
- ("en"
- ("\\(\\s-\\|[[(]\\|^\\)\"" . "``")
- ("\\(\\S-\\)\"" . "''")
- ("\\(\\s-\\|(\\|^\\)'" . "`")))
- "Alist for quotes to use when converting english double-quotes.
-
-The CAR of each item in this alist is the language code.
-The CDR of each item in this alist is a list of three CONS:
-- the first CONS defines the opening quote;
-- the second CONS defines the closing quote;
-- the last CONS defines single quotes.
-
-For each item in a CONS, the first string is a regexp
-for allowed characters before/after the quote, the second
-string defines the replacement string for this quote."
+(defcustom org-e-html-smart-quote-replacements
+ '(("fr" "« " " »" "‘" "’" "’")
+ ("en" "“" "”" "‘" "’" "’")
+ ("de" "„" "“" "‚" "‘" "’"))
+ "What to export for `smart-quotes'.
+Each element is a language code followed by five strings:
+ 1. Open double-quotes
+ 2. Close double-quotes
+ 3. Open single-quote
+ 4. Close single-quote
+ 5. Mid-word apostrophe"
:group 'org-export-e-html
:type '(list
- (cons :tag "Opening quote"
- (string :tag "Regexp for char before")
- (string :tag "Replacement quote "))
- (cons :tag "Closing quote"
- (string :tag "Regexp for char after ")
- (string :tag "Replacement quote "))
- (cons :tag "Single quote"
- (string :tag "Regexp for char before")
- (string :tag "Replacement quote "))))
+ (string :tag "Open double-quotes") ; "“"
+ (string :tag "Close double-quotes") ; "”"
+ (string :tag "Open single-quote") ; "‘"
+ (string :tag "Close single-quote") ; "’"
+ (string :tag "Mid-word apostrophe"))) ; "’"
;;;; Compilation
@@ -1459,15 +1446,7 @@ This is used to choose a separator for constructs like \\verb."
"Export quotation marks depending on language conventions.
TEXT is a string containing quotation marks to be replaced. INFO
is a plist used as a communication channel."
- (mapc (lambda(l)
- (let ((start 0))
- (while (setq start (string-match (car l) text start))
- (let ((new-quote (concat (match-string 1 text) (cdr l))))
- (setq text (replace-match new-quote t t text))))))
- (cdr (or (assoc (plist-get info :language) org-e-html-quotes)
- ;; Falls back on English.
- (assoc "en" org-e-html-quotes))))
- text)
+ (org-export-quotation-marks text info org-e-html-smart-quote-replacements))
(defun org-e-html--wrap-label (element output)
"Wrap label associated to ELEMENT around OUTPUT, if appropriate.
@@ -2691,7 +2670,7 @@ contextual information."
;; (format "\\%s{}" (match-string 1 text)) nil t text)
;; start (match-end 0))))
;; Handle quotation marks
- ;; (setq text (org-e-html--quotation-marks text info))
+ (setq text (org-e-html--quotation-marks text info))
;; Convert special strings.
;; (when (plist-get info :with-special-strings)
;; (while (string-match (regexp-quote "...") text)
diff --git a/contrib/lisp/org-e-latex.el b/contrib/lisp/org-e-latex.el
index 67e9197..2543c29 100644
--- a/contrib/lisp/org-e-latex.el
+++ b/contrib/lisp/org-e-latex.el
@@ -687,38 +687,28 @@ during latex export it will output
;;;; Plain text
-(defcustom org-e-latex-quotes
- '(("fr"
- ("\\(\\s-\\|[[(]\\|^\\)\"" . "«~")
- ("\\(\\S-\\)\"" . "~»")
- ("\\(\\s-\\|(\\|^\\)'" . "'"))
- ("en"
- ("\\(\\s-\\|[[(]\\|^\\)\"" . "``")
- ("\\(\\S-\\)\"" . "''")
- ("\\(\\s-\\|(\\|^\\)'" . "`")))
- "Alist for quotes to use when converting english double-quotes.
-
-The CAR of each item in this alist is the language code.
-The CDR of each item in this alist is a list of three CONS:
-- the first CONS defines the opening quote;
-- the second CONS defines the closing quote;
-- the last CONS defines single quotes.
-
-For each item in a CONS, the first string is a regexp
-for allowed characters before/after the quote, the second
-string defines the replacement string for this quote."
+(defcustom org-e-latex-quote-replacements
+ '(("en" "``" "''" "`" "'" "'")
+ ("fr" "«~" "~»" "‹~" "~›" "'")
+ ("de" ",," "``" "," "`" "'"))
+ "What to output for quotes. Each element is a list of six strings.
+The first string specifies the language these quotes apply to (\"en\",
+\"fr\", \"de\", etc.; see the LANGUAGE keyword), and the other five
+define the strings to use for, in order:
+ 1. Open double-quotes
+ 2. Close double-quotes
+ 3. Open single-quote
+ 4. Close single-quote
+ 5. Mid-word apostrophe"
:group 'org-export-e-latex
- :type '(list
- (cons :tag "Opening quote"
- (string :tag "Regexp for char before")
- (string :tag "Replacement quote "))
- (cons :tag "Closing quote"
- (string :tag "Regexp for char after ")
- (string :tag "Replacement quote "))
- (cons :tag "Single quote"
- (string :tag "Regexp for char before")
- (string :tag "Replacement quote "))))
-
+ :type '(repeat
+ (list
+ (string :tag "Language code")
+ (string :tag "Open double-quotes")
+ (string :tag "Close double-quotes")
+ (string :tag "Open single-quote")
+ (string :tag "Close single-quote")
+ (string :tag "Mid-word apostrophe"))))
;;;; Compilation
@@ -852,19 +842,8 @@ nil."
options
","))
-(defun org-e-latex--quotation-marks (text info)
- "Export quotation marks depending on language conventions.
-TEXT is a string containing quotation marks to be replaced. INFO
-is a plist used as a communication channel."
- (mapc (lambda(l)
- (let ((start 0))
- (while (setq start (string-match (car l) text start))
- (let ((new-quote (concat (match-string 1 text) (cdr l))))
- (setq text (replace-match new-quote t t text))))))
- (cdr (or (assoc (plist-get info :language) org-e-latex-quotes)
- ;; Falls back on English.
- (assoc "en" org-e-latex-quotes))))
- text)
+(defun org-e-latex--quotation-marks (text info)
+ (org-export-quotation-marks text info org-e-latex-quote-replacements))
(defun org-e-latex--wrap-label (element output)
"Wrap label associated to ELEMENT around OUTPUT, if appropriate.
diff --git a/contrib/lisp/org-e-odt.el b/contrib/lisp/org-e-odt.el
index cab4c66..7eb92b6 100644
--- a/contrib/lisp/org-e-odt.el
+++ b/contrib/lisp/org-e-odt.el
@@ -2318,39 +2318,28 @@ in order to mimic default behaviour:
;;;; Plain text
-(defcustom org-e-odt-quotes
- '(("fr"
- ("\\(\\s-\\|[[(]\\|^\\)\"" . "« ")
- ("\\(\\S-\\)\"" . "» ")
- ("\\(\\s-\\|(\\|^\\)'" . "'"))
- ("en"
- ("\\(\\s-\\|[[(]\\|^\\)\"" . "“")
- ("\\(\\S-\\)\"" . "”")
- ("\\(\\s-\\|(\\|^\\)'" . "‘")
- ("\\(\\S-\\)'" . "’")))
- "Alist for quotes to use when converting english double-quotes.
-
-The CAR of each item in this alist is the language code.
-The CDR of each item in this alist is a list of three CONS:
-- the first CONS defines the opening quote;
-- the second CONS defines the closing quote;
-- the last CONS defines single quotes.
-
-For each item in a CONS, the first string is a regexp
-for allowed characters before/after the quote, the second
-string defines the replacement string for this quote."
+(defcustom org-e-odt-quote-replacements
+ '(("en" "“" "”" "‘" "’" "’")
+ ("fr" "« " " »" "‹ " " ›" "’")
+ ("de" "„" "“" "‚" "‘" "’"))
+ "What to output for quotes. Each element is a list of six strings.
+The first string specifies the language these quotes apply to (\"en\",
+\"fr\", \"de\", etc.; see the LANGUAGE keyword), and the other five
+define the strings to use for, in order:
+ 1. Open double-quotes
+ 2. Close double-quotes
+ 3. Open single-quote
+ 4. Close single-quote
+ 5. Mid-word apostrophe"
:group 'org-export-e-odt
- :type '(list
- (cons :tag "Opening quote"
- (string :tag "Regexp for char before")
- (string :tag "Replacement quote "))
- (cons :tag "Closing quote"
- (string :tag "Regexp for char after ")
- (string :tag "Replacement quote "))
- (cons :tag "Single quote"
- (string :tag "Regexp for char before")
- (string :tag "Replacement quote "))))
-
+ :type '(repeat
+ (list
+ (string :tag "Language code")
+ (string :tag "Open double-quotes")
+ (string :tag "Close double-quotes")
+ (string :tag "Open single-quote")
+ (string :tag "Close single-quote")
+ (string :tag "Mid-word apostrophe"))))
;;;; Compilation
@@ -2485,19 +2474,8 @@ This is used to choose a separator for constructs like \\verb."
when (not (string-match (regexp-quote (char-to-string c)) s))
return (char-to-string c))))
-(defun org-e-odt--quotation-marks (text info)
- "Export quotation marks depending on language conventions.
-TEXT is a string containing quotation marks to be replaced. INFO
-is a plist used as a communication channel."
- (mapc (lambda(l)
- (let ((start 0))
- (while (setq start (string-match (car l) text start))
- (let ((new-quote (concat (match-string 1 text) (cdr l))))
- (setq text (replace-match new-quote t t text))))))
- (cdr (or (assoc (plist-get info :language) org-e-odt-quotes)
- ;; Falls back on English.
- (assoc "en" org-e-odt-quotes))))
- text)
+(defun org-e-odt--quotation-marks (text info)
+ (org-export-quotation-marks text info org-e-odt-quote-replacements))
(defun org-e-odt--wrap-label (element output)
"Wrap label associated to ELEMENT around OUTPUT, if appropriate.
diff --git a/contrib/lisp/org-export.el b/contrib/lisp/org-export.el
index b9294e5..4e5f738 100644
--- a/contrib/lisp/org-export.el
+++ b/contrib/lisp/org-export.el
@@ -143,6 +143,7 @@
(:with-priority nil "pri" org-export-with-priority)
(:with-special-strings nil "-" org-export-with-special-strings)
(:with-sub-superscript nil "^" org-export-with-sub-superscripts)
+ (:with-smart-quotes nil "\"" org-export-with-smart-quotes)
(:with-toc nil "toc" org-export-with-toc)
(:with-tables nil "|" org-export-with-tables)
(:with-tags nil "tags" org-export-with-tags)
@@ -284,6 +285,33 @@ rules.")
:tag "Org Export General"
:group 'org-export)
+;; Generic function, usable by exporters, but they can define their own
+;; instead.
+(defun org-export-quotation-marks (text info replacements)
+ "Export quotation marks depending on language conventions.
+TEXT is a string containing quotation marks to be replaced. INFO
+is a plist used as a communication channel."
+ ;; (message text)
+ (when (plist-get info :with-smart-quotes)
+ (let* ((regexps
+ (cdr
+ (or
+ (assoc (plist-get info :language)
+ org-smart-quotes-regexps)
+ (assq 'DEFAULT org-smart-quotes-regexps))))
+ (subs (cdr (or (assoc (plist-get info :language)
+ replacements)
+ (assoc "en" replacements))))
+ (quotes (pairlis regexps subs)))
+ (mapc (lambda (p)
+ (let ((re (car p))
+ (su (cdr p)))
+ (setq text (replace-regexp-in-string re su text t t 9))))
+ quotes)))
+ text)
+
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
(defcustom org-export-with-archived-trees 'headline
"Whether sub-trees with the ARCHIVE tag should be exported.
@@ -445,6 +473,16 @@ e.g. \"e:nil\"."
:group 'org-export-general
:type 'boolean)
+(defcustom org-export-with-smart-quotes t
+ "Non-nil means try to make quotes \"smart\" when exporting.
+
+For example, HTML export would convert \"Hello\" to “Hello”.
+
+The exact style of quotes depends on the language; see the LANGUAGE
+keyword and also the smart-quote custom settings for each exporter."
+ :group 'org-export-general
+ :type 'boolean)
+
(defcustom org-export-with-planning nil
"Non-nil means include planning info in export.
This option can also be set with the #+OPTIONS: line,
diff --git a/lisp/org.el b/lisp/org.el
index b89889d..8a446ec 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -3629,6 +3629,69 @@ When nil, the \\name form remains in the buffer."
:version "24.1"
:type 'boolean)
+(defcustom org-smart-quotes nil
+ "Non-nil means display `smart' quotes on-screen in place
+of \" and ' characters."
+ :group 'org-appearance
+ :type 'boolean)
+
+(defcustom org-smart-quotes-replacements
+ '("“" "”" "‘" "’" "’")
+ "What to display on-screen when `org-smart-quotes' is non-nil.
+A list of five strings:
+ 1. Open double-quotes
+ 2. Close double-quotes
+ 3. Open single-quote
+ 4. Close single-quote
+ 5. Mid-word apostrophe"
+ :group 'org-appearance
+ :type '(list
+ (string :tag "Open double-quotes" "“")
+ (string :tag "Close double-quotes" "”")
+ (string :tag "Open single-quote" "‘")
+ (string :tag "Close single-quote" "’")
+ (string :tag "Mid-word apostrophe" "’")))
+
+(defcustom org-smart-quotes-regexps
+ '((DEFAULT
+ "\\(?:\\s-\\|\\s(\\|^\\)\\(?9:\"\\)\\(?:\\w\\|\\s.\\|\\s_\\)\\|\\s-\\(?9:\"\\)$"
+ "\\(?:\\S-\\)\\(?9:\"\\)\\(?:\\s-\\|$\\|\\s)\\|\\s.\\)\\|^\\(?9:\"\\)\\s-"
+ "\\(?:\\s-\\|(\\|^\\)\\(?9:'\\)\\w\\|\\s-\\(?9:'\\)$"
+ "\\w\\s.*\\(?9:'\\)\\(?:\\s-\\|\\s.\\|$\\)\\|^\\(?9:'\\)\\s-"
+ "\\w\\(?9:'\\)\\w"))
+ "Regexps for quotes to be made `smart' quotes upon export or onscreen.
+Each element is a list of six strings. The car is a string
+representing the language to which this definition applies (e.g. \"en\",
+\"fr\", \"de\", etc.); the cdr (the other five elements) are five REs
+matching, in order:
+ 1. Opening double-quotes
+ 2. Closing double-quotes
+ 3. Opening single-quotes
+ 4. Closing single-quotes
+ 5. Mid-word apostrophes
+
+Each regexp should surround the actual quote in a capturing group, which
+must be specified as number 9 (so as not to conflict with other processing.)
+
+One element should have as its car the atom DEFAULT, to be used when no
+other element fits. It is also the one used for on-screen display of
+`smart' quotes (see the variable `org-smart-quotes').
+
+As what makes an opening or closing quote is somewhat consistent across
+languages (as opposed to how they are represented in typography), the
+DEFAULT element is likely sufficient for most purposes."
+ :group 'org-export-general
+ :group 'org-appearance
+ :type '(repeat
+ (list
+ (choice (const DEFAULT)
+ (string :tag "Language"))
+ (regexp :tag "Open double-quotes")
+ (regexp :tag "Close double-quotes")
+ (regexp :tag "Open single-quote")
+ (regexp :tag "Close single-quote")
+ (regexp :tag "Mid-word apostrophe"))))
+
(defvar org-emph-re nil
"Regular expression for matching emphasis.
After a match, the match groups contain these elements:
@@ -5931,6 +5994,7 @@ needs to be inserted at a specific position in the font-lock sequence.")
;; Specials
'(org-do-latex-and-special-faces)
'(org-fontify-entities)
+ '(org-fontify-quotes)
'(org-raise-scripts)
;; Code
'(org-activate-code (1 'org-code t))
@@ -5952,6 +6016,43 @@ needs to be inserted at a specific position in the font-lock sequence.")
'(org-font-lock-keywords t nil nil backward-paragraph))
(kill-local-variable 'font-lock-keywords) nil))
+(defun org-fontify-quotes (limit)
+ (require 'org-export)
+ (when org-smart-quotes
+ (let* ((start (point))
+ k su
+ (splice-string (lambda (lst join)
+ (if (null (cdr lst)) (car lst)
+ (concat (car lst) join
+ (splice-string (cdr lst) join)))))
+ (regexps
+ (cdr
+ (assq 'DEFAULT org-smart-quotes-regexps)))
+ (i 1)
+ (allreg
+ (mapconcat (lambda (n) (prog1 (format "\\(?%d:%s\\)" i n)
+ (setq i (1+ i))))
+ regexps "\\|"))
+ (quotes (pairlis regexps org-smart-quotes-replacements)))
+ (catch 'match
+ (while (re-search-forward allreg limit t)
+ (cond ((match-string 1)
+ (setq su (nth 0 org-smart-quotes-replacements)))
+ ((match-string 2)
+ (setq su (nth 1 org-smart-quotes-replacements)))
+ ((match-string 3)
+ (setq su (nth 2 org-smart-quotes-replacements)))
+ ((match-string 4)
+ (setq su (nth 3 org-smart-quotes-replacements)))
+ ((match-string 5)
+ (setq su (nth 4 org-smart-quotes-replacements))))
+ (add-text-properties (match-beginning 9) (match-end 9)
+ (list 'font-lock-fontified t
+ 'face 'org-document-info))
+ (compose-region (match-beginning 9) (match-end 9) su nil)
+ (backward-char 1)
+ (throw 'match t))))))
+
(defun org-toggle-pretty-entities ()
"Toggle the composition display of entities as UTF8 characters."
(interactive)
--
1.7.7.6
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: Smart Quotes Exporting
2012-06-06 2:14 ` Mark E. Shoulson
@ 2012-06-07 19:21 ` Nicolas Goaziou
2012-06-11 1:28 ` Mark Shoulson
0 siblings, 1 reply; 23+ messages in thread
From: Nicolas Goaziou @ 2012-06-07 19:21 UTC (permalink / raw)
To: Mark E. Shoulson; +Cc: emacs-orgmode
Hello,
"Mark E. Shoulson" <mark@kli.org> writes:
> Update on the smart-quotes patch. Supports the odt exporter now too,
> which I think covers all the current major "new" exporters for which
> it is relevant (adding smart quotes to ASCII export is a contradiction
> in terms;
The ASCII exporter also handles UTF-8, so it's good to have it there too.
> should it be in the "publish" exporter? It didn't look like it to
> me).
No.
> Added an options keyword, '"' (that is, the double-quote mark) to
> select smart quotes on/off, and a defcustom for customizing your
> default. Set the default default [sic] to nil, though actually it
> might be reasonable to set it to t. Slight touch-up to the regexps
> since last time, but they will definitely be subject to a lot of
> fine-tuning as more special cases are found that break them and ways
> to fix it are found (the close-quote still breaks on one of "/a/." or
> "/a./")
Again, using regexps on plain text objects is the wrong approach, as you
need a better understanding of the whole paragraph structure to handle
quotes properly. I already suggested a possible solution; is there
anything wrong with it?
Regards,
--
Nicolas Goaziou
* Re: Smart Quotes Exporting
2012-06-07 19:21 ` Nicolas Goaziou
@ 2012-06-11 1:28 ` Mark Shoulson
2012-06-12 13:21 ` Nicolas Goaziou
0 siblings, 1 reply; 23+ messages in thread
From: Mark Shoulson @ 2012-06-11 1:28 UTC (permalink / raw)
To: emacs-orgmode
Nicolas Goaziou <n.goaziou <at> gmail.com> writes:
>
> Hello,
>
> "Mark E. Shoulson" <mark <at> kli.org> writes:
>
> > Update on the smart-quotes patch. Supports the odt exporter now too,
> > which I think covers all the current major "new" exporters for which
> > it is relevant (adding smart quotes to ASCII export is a contradiction
> > in terms;
>
> The ASCII exporter also handles UTF-8, so it's good to have it there too.
Really? I would have thought ASCII meant ASCII, as in 7-bit clean text. More
of a "plain text" exporter then. Fair enough. I'll work it in.
> > should it be in the "publish" exporter? It didn't look like it to
> > me).
>
> No.
OK, good.
>
> > Added an options keyword, '"' (that is, the double-quote mark) to
> > select smart quotes on/off, and a defcustom for customizing your
> > default. Set the default default [sic] to nil, though actually it
> > might be reasonable to set it to t. Slight touch-up to the regexps
> > since last time, but they will definitely be subject to a lot of
> > fine-tuning as more special cases are found that break them and ways
> > to fix it are found (the close-quote still breaks on one of "/a/." or
> > "/a./")
>
> Again, using regexps on plain text objects is the wrong approach, as you
> need a better understanding of the whole paragraph structure to handle
> quotes properly. I already suggested a possible solution; is there
> anything wrong with it?
It looked to me like your solution would essentially boil down to "do string
handling when there's a string, otherwise recur down and find the strings,"
which essentially means apply it to all the strings... and there were already
functions out there applying things to strings, so this can just ride along with
them. Here, let's look at your suggestion and see if we can find what I missed:
] Walk element/object/secondary-string's contents.
]
] 1. When a string is encountered:
]
] 1. If it has a quote as its first or last position, check for
] objects before or after the string to guess its status. An
] object never starts with a white space, but you may have to
] check :post-blank property in order to know if previous object
] had white spaces at its end.
]
] 2. For each quote everywhere else in the string, your regexp can
] handle it fine.
]
] 2. When an object belonging to `org-element-recursive-objects' is
] encountered, apply the function to this object.
]
] 3. Accumulate returned strings or objects.
So, if it's a string, use the regexps (if they can be smart enough to look at
beginning and end of the string, which they can--though I haven't been using the
:post-blank property so presumably something is amiss), and if it isn't a
string, recur down until you get to a string... Ah, but only if it's in
org-element-recursive-objects. So the issue with the current state is that it
would wind up applying to too much? (it would hit code and verbatim elements,
for example, and that would be wrong.) And detecting such things at the string
level would be the wrong place... So it remains to find the right place in the
processing to put a function like the one you describe. I'm trying to get a
proper understanding of the code structure to see what you mean. Looks like it
should be something like a transcoder, only called on everything... wait, called
on the top-level parsed tree object, recursively doing its thing before(?) the
transcoders of the individual objects get to it. So almost something replacing
the (lambda (blob contents info) contents) stub in org-export-transcoder; does
that make sense to you? Otherwise, called somehow in org-export-data. In either
case made a hook of some kind so that it is backend-specific.
Does it sound like I am understanding this right, to you?
The on-screen one would still use the plain-string computation, as you said,
since the full parse isn't available. And that seems to work okay (the export
works okay too, for simple cases.) It would also need to be tweaked not to act
on verbatim/comment text, etc.
Thanks,
~mark
* Re: Smart Quotes Exporting
2012-06-11 1:28 ` Mark Shoulson
@ 2012-06-12 13:21 ` Nicolas Goaziou
2012-06-15 16:20 ` Mark Shoulson
0 siblings, 1 reply; 23+ messages in thread
From: Nicolas Goaziou @ 2012-06-12 13:21 UTC (permalink / raw)
To: Mark Shoulson; +Cc: emacs-orgmode
Hello,
Mark Shoulson <mark@kli.org> writes:
>> The ASCII exporter also handles UTF-8, so it's good to have it there too.
>
> Really? I would have thought ASCII meant ASCII, as in 7-bit clean
> text.
org-e-ascii.el (like the old org-ascii.el) handles ASCII, Latin1 and UTF-8
encodings.
> It looked to me like your solution would essentially boil down to "do
> string handling when there's a string, otherwise recur down and find
> the strings," which essentially means apply it to all the
> strings... and there were already functions out there applying things
> to strings, so this can just ride along with them. Here, let's look
> at your suggestion and see if we can find what I missed:
>
> ] Walk element/object/secondary-string's contents.
> ]
> ] 1. When a string is encountered:
> ]
> ] 1. If it has a quote as its first or last position, check for
> ] objects before or after the string to guess its status. An
> ] object never starts with a white space, but you may have to
> ] check :post-blank property in order to know if previous object
> ] had white spaces at its end.
> ]
> ] 2. For each quote everywhere else in the string, your regexp can
> ] handle it fine.
> ]
> ] 2. When an object belonging to `org-element-recursive-objects' is
> ] encountered, apply the function to this object.
> ]
> ] 3. Accumulate returned strings or objects.
>
> So, if it's a string, use the regexps (if they can be smart enough to look at
> beginning and end of the string, which they can--though I haven't been using the
> :post-blank property so presumably something is amiss), and if it isn't a
> string, recur down until you get to a string... Ah, but only if it's in
> org-element-recursive-objects.
You're missing an important part: the regexps cannot be smart enough for
quotes at the beginning or the end of the string. There, you must look
outside the string. Hence:
> ] 1. If it has a quote as its first or last position, check for
> ] objects before or after the string to guess its status. An
> ] object never starts with a white space, but you may have to
> ] check :post-blank property in order to know if previous object
> ] had white spaces at its end.
But you can only do that from the element containing the string, not
from the string itself.
> So the issue with the current state is that it
> would wind up applying to too much? (it would hit code and verbatim elements,
> for example, and that would be wrong.)
No, you are not applying it too much (verbatim elements don't contain
plain-text objects) but your function hasn't got access to enough
information to be useful.
> So it remains to find the right place in the processing to put
> a function like the one you describe. I'm trying to get a proper
> understanding of the code structure to see what you mean. Looks like
> it should be something like a transcoder, only called on
> everything...
Transcoders are type specific, so that's not an option.
> wait, called on the top-level parsed tree object, recursively doing
> its thing before(?) the transcoders of the individual objects get to
> it.
That's called a parse tree filter. That should be a possibility
indeed. The function would be applied on the parse tree and would
replace strings within elements containing plain text (that is,
paragraph, verse-block and table-row types). Parse tree filters are
applied very early in the export process.
Another option would be to integrate it into
`org-element-normalize-contents', but I think the previous way is
better.
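To illustrate the shape of such a filter without depending on the real
parser, here is a minimal sketch over a toy tree. Everything in it is an
assumption made for illustration: the name `my-toy-map-strings' is
hypothetical, and elements are modelled as plain (TYPE PROPS CHILD...)
lists, which only approximates what org-element produces.
#+begin_src emacs-lisp
(defun my-toy-map-strings (node fn allowed &optional inside)
  "Return a copy of NODE with FN applied to some of its strings.
NODE is a string or a toy element (TYPE PROPS CHILD...).  FN is
only applied to strings located under an element whose TYPE
belongs to the list ALLOWED.  INSIDE is non-nil when an ancestor
of NODE already has an allowed type."
  (cond
   ;; A string: transform it only below an allowed element.
   ((stringp node) (if inside (funcall fn node) node))
   ;; A toy element: keep TYPE and PROPS, recurse into the children.
   ((consp node)
    (let ((inside (or inside (memq (car node) allowed))))
      (append (list (car node) (cadr node))
              (mapcar (lambda (child)
                        (my-toy-map-strings child fn allowed inside))
                      (cddr node)))))
   ;; Anything else is returned untouched.
   (t node)))
#+end_src
With `upcase' standing in for the quote replacement, only the strings
below the paragraph change:
#+begin_src emacs-lisp
(my-toy-map-strings
 '(section nil (paragraph nil "a" (bold nil "b")) (src-block nil "c"))
 #'upcase
 '(paragraph verse-block table-row))
;; => (section nil (paragraph nil "A" (bold nil "B")) (src-block nil "c"))
#+end_src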
> The on-screen one would still use the plain-string computation, as you said,
> since the full parse isn't available.
Yes.
> It would also need to be tweaked not to act on verbatim/comment text,
> etc.
Yes. You may want to use `org-element-at-point' and `org-element-type'
to tell if you're somewhere smart quotes are allowed (in table,
table-row, paragraph, verse-block elements).
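A minimal sketch of that check, assuming it runs in an Org buffer; the
predicate name `my-smart-quotes-allowed-p' is hypothetical, and the list
of types is simply the one given above:
#+begin_src emacs-lisp
(require 'org)
(require 'org-element)

(defun my-smart-quotes-allowed-p ()
  "Return non-nil when the element at point may receive smart quotes.
The accepted types are an assumption taken from the discussion."
  (memq (org-element-type (org-element-at-point))
        '(table table-row paragraph verse-block)))
#+end_src
A fontification function could call it before composing a quote and
skip the match when it returns nil.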
Regards,
--
Nicolas Goaziou
* Re: Smart Quotes Exporting
2012-06-12 13:21 ` Nicolas Goaziou
@ 2012-06-15 16:20 ` Mark Shoulson
2012-06-19 9:26 ` Nicolas Goaziou
0 siblings, 1 reply; 23+ messages in thread
From: Mark Shoulson @ 2012-06-15 16:20 UTC (permalink / raw)
To: emacs-orgmode
Nicolas Goaziou <n.goaziou <at> gmail.com> writes:
>
> Hello,
>
> Mark Shoulson <mark <at> kli.org> writes:
>
> >> The ASCII exporter also handles UTF-8, so it's good to have it there too.
> >
> > Really? I would have thought ASCII meant ASCII, as in 7-bit clean
> > text.
>
> org-e-ascii.el (like the old org-ascii.el) handles ASCII, Latin1 and UTF-8
> encodings.
I noticed that after writing my response. The name just threw me a little.
Yes, that exporter needs to handle it too.
> > It looked to me like your solution would essentially boil down to "do
> > string handling when there's a string, otherwise recur down and find
> > the strings," which essentially means apply it to all the
> > strings... and there were already functions out there applying things
> > to strings, so this can just ride along with them. Here, let's look
> > at your suggestion and see if we can find what I missed:
> >
....
> > So, if it's a string, use the regexps (if they can be smart enough to look
> > at beginning and end of the string, which they can--though I haven't been
> > using the :post-blank property so presumably something is amiss), and if
> > it isn't a string, recur down until you get to a string... Ah, but only if
> > it's in org-element-recursive-objects.
>
> You're missing an important part: the regexps cannot be smart enough for
> quotes at the beginning or the end of the string. There, you must look
> outside the string. Hence:
Well, wait; regexps can make some pretty darn good guesses at the beginnings
or ends of strings. Quotations don't normally end in spaces (in the
conventions used with ""; French typography is different, but if you're using
spaces around your quotes you have worse problems (line-breaks) to worry
about). So if a string ends in space(s) followed by a quote, it's very likely
that quote is an open-quote for some stuff that comes after. Conversely, if a
string starts with a quote followed by some spaces, it's very likely a close-
quote to what went on before.
This isn't quite it; beginning-of-string followed by quote, then punctuation
and then spaces is also a close-quote, etc... There is a lot of fine-tuning.
But even what I currently have was able to handle your
Caesar said, "/Alea Jacta est./"
example. Yes, there are edge-cases which this won't catch, and it remains to
be seen how pervasive and annoying those are. It may be that repeated
tweaking of regexps will handle enough of the ordinary cases. It may be that
after a few rounds of regexp-hacking someone will finally decide that regexp-
hacking just won't handle enough of the important cases. But I think even as
it stands now we'd probably handle 80-90% of the normal situations, which
really is as much as we reasonably can hope for.
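To make the edge heuristic above concrete, here is a minimal sketch. It
is not the patch's actual regexps: the function name
`my-smart-quotes-guess-edge' is hypothetical, it only looks at double
quotes, and it uses explicit character classes instead of syntax classes
so that the result does not depend on the buffer's syntax table.
#+begin_src emacs-lisp
(defun my-smart-quotes-guess-edge (s)
  "Guess the role of a double quote sitting at an edge of string S.
Return `open', `close', or nil when S alone gives no clue."
  (cond
   ;; Whitespace then a quote at the very end of S: the quote
   ;; probably opens whatever comes after S.
   ((string-match "[ \t\n]\"\\'" s) 'open)
   ;; A quote at the very start of S, optional punctuation, then
   ;; whitespace: the quote probably closes what came before S.
   ((string-match "\\`\"[.,;:!?]*[ \t\n]" s) 'close)
   ;; Nothing to go on.
   (t nil)))
#+end_src
For instance, (my-smart-quotes-guess-edge "Caesar said, \"") returns
`open'. A string consisting only of the quote gives the function
nothing to work with, so it returns nil.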
Could I trouble someone to apply my patch and try it out to see just how
good or bad the performance is? It seems to work okay for the cases I've been
trying, but maybe my dataset isn't robust enough. Let's give it a test and see
how many actual cases in common usage it gets wrong, and how much can be fixed
by tuning regexps.
>
> > ] 1. If it has a quote as its first or last position, check for
> > ] objects before or after the string to guess its status. An
> > ] object never starts with a white space, but you may have to
> > ] check :post-blank property in order to know if previous object
> > ] had white spaces at its end.
>
> But you can only do that from the element containing the string, not
> from the string itself.
The case where a quote both sits at the edge of a string (i.e. at the border
of some element, formatting, etc) *and* does not have whitespace next to it,
with possible punctuation, does not seem to be a normal occurrence to me. If
I'm wrong, how common *is* it?
>
> > So the issue with the current state is that it
> > would wind up applying to too much? (it would hit code and verbatim
> > elements, for example, and that would be wrong.)
>
> No, you are not applying it too much (verbatim elements don't contain
> plain-text objects) but your function hasn't got access to enough
> information to be useful.
The on-screen version, of course, will have to be smarter and check for
the "face" formatting to make sure it doesn't happen in comments or verbatims;
I am pretty sure it does not do that yet.
> > wait, called on the top-level parsed tree object, recursively doing
> > its thing before(?) the transcoders of the individual objects get to
> > it.
>
> That's called a parse tree filter. That should be a possibility
> indeed. The function would be applied on the parse tree and would
> replace strings within elements containing plain text (that is,
> paragraph, verse-block and table-row types). Parse tree filters are
> applied very early in the export process.
>
> Another option would be to integrate it into
> `org-element-normalize-contents', but I think the previous way is
> better.
Maybe. I know it sounds like I'm fixated on the plain-text solution, but I'm
not convinced the envisioned problems are more than theoretical, or that they
will cause an unacceptable amount of error (keeping in mind that some error
*is* acceptable and unavoidable).
> > The on-screen one would still use the plain-string computation, as you
> > said, since the full parse isn't available.
>
> Yes.
>
> > It would also need to be tweaked not to act on verbatim/comment text,
> > etc.
>
> Yes. You may want to use `org-element-at-point' and `org-element-type'
> to tell if you're somewhere smart quotes are allowed (in table,
> table-row, paragraph, verse-block elements).
Probably. I think I saw some other package make these decisions by peeking at
the formatting and seeing if it is set in comment-face or something, but
checking the element at point is presumably more sensible.
~mark
* Re: Smart Quotes Exporting
2012-06-15 16:20 ` Mark Shoulson
@ 2012-06-19 9:26 ` Nicolas Goaziou
2012-08-07 23:18 ` Bastien
0 siblings, 1 reply; 23+ messages in thread
From: Nicolas Goaziou @ 2012-06-19 9:26 UTC (permalink / raw)
To: Mark Shoulson; +Cc: emacs-orgmode
Hello,
Mark Shoulson <mark@kli.org> writes:
> Well, wait; regexps can make some pretty darn good guesses at the beginnings
> or ends of strings.
I know that. They do a good job, I just want a better one.
> This isn't quite it; beginning-of-string followed by quote, then punctuation
> and then spaces is also a close-quote, etc... There is a lot of fine-tuning.
> But even what I currently have was able to handle your
>
> Caesar said, "/Alea Jacta est./"
>
> example.
No, it doesn't actually handle that; it's just sheer luck. Indeed, for the
closing quote, the quoting function is applied to the string "\"" alone.
There's absolutely no space, punctuation, etc. to save the day. So it makes
a wild guess with a probability of 0.5 of success. Since the guess is always
the same, "/a/" will always fail.
> The case where a quote both sits at the edge of a string (i.e. at the border
> of some element, formatting, etc) *and* does not have whitespace next to it,
> with possible punctuation, does not seem to be a normal occurrence to me. If
> I'm wrong, how common *is* it?
Even if it rarely happens, it can be _very_ annoying to have to cope
with bad guesses. If it can be avoided, I see no reason not to do so.
Now, here is the infrastructure I propose.
Internally, the two following functions are required.
#+begin_src emacs-lisp
(defun org-export--smart-quotes-in-element (element backend)
  "Replace plain quotes with smart quotes in ELEMENT.
ELEMENT is an Org element or a secondary string. BACKEND is the
back-end to check for rules, as a symbol.
This is a destructive operation. Return new element."
  (let* ((type (org-element-type element))
         (properties (and type (nth 1 element))))
    ;; Destructively apply changes to secondary string, if any.
    (let ((secondary (and type (assq type org-element-secondary-value-alist))))
      (when secondary
        (let* ((sec-symbol (cdr secondary))
               (sec-value (plist-get properties sec-symbol)))
          (when sec-value
            (setq properties
                  (plist-put properties
                             sec-symbol
                             (org-export--smart-quotes-in-element
                              sec-value backend)))))))
    ;; Destructively change `:caption' if present. Since it's a dual
    ;; keyword, apply smart quotes to both CAR and CDR, if required.
    (let ((caption (plist-get properties :caption)))
      (when caption
        (setq properties
              (plist-put properties
                         :caption
                         (cons
                          (org-export--smart-quotes-in-element
                           (car caption) backend)
                          (and (cdr caption)
                               (org-export--smart-quotes-in-element
                                (cdr caption) backend)))))))
    ;; Recursively apply changes to contents. Rebuild ELEMENT along
    ;; the way, with updated strings.
    (let ((contents (if type (org-element-contents element) element))
          previous current next acc)
      (while contents
        ;; Remember the object just handled before moving forward.
        (setq previous current
              current (pop contents)
              next (car contents))
        (push
         (cond ((stringp current)
                ;; CURRENT is a string: Call
                ;; `org-export-quotation-marks' with appropriate
                ;; information.
                (org-export-quotation-marks
                 current
                 (and previous
                      (if (stringp previous)
                          (length (and (string-match " +\\'" previous)
                                       (match-string 0 previous)))
                        (org-element-property :post-blank previous)))
                 (and next
                      (if (not (stringp next)) 0
                        (length (and (string-match "\\` +" next)
                                     (match-string 0 next)))))
                 backend))
               ;; CURRENT is recursive: Move into it.
               ((org-element-property :contents-begin current)
                (org-export--smart-quotes-in-element current backend))
               ;; Otherwise, just accumulate CURRENT.
               (t current))
         acc))
      ;; Re-build transformed element.
      (if (or (not type) (eq type 'plain-text)) (nreverse acc)
        (nconc (list type properties) (nreverse acc))))))
(defun org-export-set-smart-quotes (tree backend info)
  "Replace plain quotes with smart quotes in TREE.
BACKEND is the back-end, as a symbol, used for transcoding. INFO
is a plist used as a communication channel.
This is a destructive operation. This function is meant to be
used as a parse tree filter for back-ends activating smart
quotes."
  ;; Destructively apply smart quotes to parsed keywords in info.
  (let ((value (plist-get info :title)))
    (when value
      (setq info
            (plist-put info
                       :title
                       (org-export--smart-quotes-in-element value backend)))))
  ;; Replace smart quotes in elements containing plain text or
  ;; secondary strings across the parse tree.
  (org-element-map
   tree '(paragraph verse-block table-cell headline inlinetask item)
   (lambda (el)
     (org-export-set-element el
                             (org-export--smart-quotes-in-element el backend))))
  ;; Return parse tree.
  tree)
#+end_src
Then, all that is left to do is write the function replacing quotes in
a string, with additional information:
#+begin_src emacs-lisp
(defun org-export-quotation-marks (s &optional prev next backend)
  "Replace plain quotes with smart quotes in string S.
Optional argument PREV (resp. NEXT) is the number of white space
characters before (resp. after) the string, or nil if
S starts (resp. ends) a paragraph.
Optional argument BACKEND is a symbol representing the back-end
to use for substitutions.
The function returns the new string."
  ...)
#+end_src
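As a rough illustration of how PREV and NEXT could settle the role of
a quote sitting at an edge of S, here is a minimal sketch. It is not the
body of the function above: the helper name `my-smart-quotes-role' is
hypothetical, it only classifies one double quote at a time, and treating
a paragraph boundary like a blank is an assumption.
#+begin_src emacs-lisp
(defun my-smart-quotes-role (s pos prev next)
  "Guess whether the double quote at index POS in S opens or closes.
PREV (resp. NEXT) is the number of blanks before (resp. after) S,
or nil when S starts (resp. ends) the paragraph.  Return `open',
`close', or nil when undecided."
  (let* ((blank-before
          (if (> pos 0)
              (memq (aref s (1- pos)) '(?\s ?\t ?\n))
            ;; At the left edge, rely on PREV instead of S.
            (or (null prev) (> prev 0))))
         (blank-after
          (if (< pos (1- (length s)))
              (memq (aref s (1+ pos)) '(?\s ?\t ?\n))
            ;; At the right edge, rely on NEXT instead of S.
            (or (null next) (> next 0)))))
    (cond ((and blank-before (not blank-after)) 'open)
          ((and blank-after (not blank-before)) 'close)
          (t nil))))
#+end_src
With the Caesar example, the string before the italics ends with a quote
and is followed by an object with no blank in between, so
(my-smart-quotes-role "Caesar said, \"" 13 nil 0) returns `open'. The
lone quote after the italics, at the end of the paragraph, is
(my-smart-quotes-role "\"" 0 0 nil), which returns `close'.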
Once this function is written, add `org-export-set-smart-quotes' as
a parse tree filter in `org-BACKEND-filters-alist'.
For example, one can add the following in org-e-latex.el to activate
smart quotes in latex export:
#+begin_src emacs-lisp
(defconst org-e-latex-filters-alist
  '((:filter-parse-tree . org-export-set-smart-quotes))
  "Alist between filters keywords and back-end specific filters.
See `org-export-filters-alist' for more information.")
#+end_src
Could you please try to modify your original
`org-export-quotation-marks' accordingly and test it?
>> Yes. You may want to use `org-element-at-point' and `org-element-type'
>> to tell if you're somewhere smart quotes are allowed (in table,
>> table-row, paragraph, verse-block elements).
>
> Probably. I think I saw some other package make these decisions by peeking at
> the formatting and seeing if it is set in comment-face or something, but
> checking the element at point is presumably more sensible.
Thinking about it, looking at face used will definitely be faster,
though. That's your call.
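For reference, a face-based test could look like the sketch below. The
names `my-smart-quotes-forbidden-faces' and `my-smart-quotes-face-ok-p'
are hypothetical, the list of faces is only a guess at what should be
excluded, and the check assumes those faces have already been applied
when it runs.
#+begin_src emacs-lisp
(defvar my-smart-quotes-forbidden-faces
  '(org-code org-verbatim font-lock-comment-face)
  "Hypothetical list of faces under which quotes are left alone.")

(defun my-smart-quotes-face-ok-p (pos)
  "Return non-nil when no forbidden face is applied at POS."
  (let* ((face (get-text-property pos 'face))
         ;; Normalize the property to a list: it may be nil, a single
         ;; face symbol, or a list of faces or attributes.
         (faces (cond ((symbolp face) (and face (list face)))
                      ((listp (cdr-safe face)) face)
                      (t nil)))
         (ok t))
    (dolist (f faces ok)
      (when (memq f my-smart-quotes-forbidden-faces)
        (setq ok nil)))))
#+end_src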
Regards,
--
Nicolas Goaziou