From: "Mark E. Shoulson"
Subject: Re: Smart Quotes Exporting
Date: Fri, 01 Jun 2012 18:41:56 -0400
Message-ID: <4FC94534.8000309@kli.org>
References: <4FBB08CA.5060705@kli.org> <87d35u8rvk.fsf@gmail.com>
 <4FBDA56E.5030901@kli.org> <87zk8w6v4q.fsf@gmail.com>
 <4FC00CE0.6060308@kli.org> <87r4u75tg9.fsf@gmail.com>
 <4FC426AC.2030109@kli.org> <87ehq227ky.fsf@gmail.com>
 <4FC56F1B.5040201@kli.org> <87r4u031ye.fsf@gmail.com>
 <4FC7FE2C.6040702@kli.org> <878vg72bzy.fsf@gmail.com>
In-Reply-To: <878vg72bzy.fsf@gmail.com>
List-Id: "General discussions about Org-mode."
To: Nicolas Goaziou
Cc: emacs-orgmode@gnu.org

On 06/01/2012 01:11 PM, Nicolas Goaziou wrote:
> Hello,
>
> "Mark E. Shoulson" writes:
>
>> Oh, certainly; they're all a disaster.  I think I said that in the
>> writeup at the top.  This is just proof of concept, nothing is in the
>> right place, nothing is properly documented.  They have to be
>> defcustoms, there needs to be a good :type in the defcustom as well as
>> a proper docstring.
>> You'll get no argument from me about the lack (or
>> inaccuracy) of docstrings and such.  I hadn't gotten that far yet.
>> I said the patch was only if you wanted to tinker with the development
>> as this progresses.
> No worries, I was just making some comments before forgetting about
> them.

Ah, ok.  Good!  Thanks.

>>> +(defun org-e-latex--quotation-marks (text info)
>>> +  (org-export-quotation-marks text info org-e-latex-quote-replacements))
>>> +  ;; (mapc (lambda(l)
>>> +  ;;         (let ((start 0))
>>> +  ;;           (while (setq start (string-match (car l) text start))
>>> +  ;;             (let ((new-quote (concat (match-string 1 text) (cdr l))))
>>> +  ;;               (setq text (replace-match new-quote t t text))))))
>>> +  ;;       (cdr (or (assoc (plist-get info :language) org-e-latex-quotes)
>>> +  ;;                ;; Falls back on English.
>>> +  ;;                (assoc "en" org-e-latex-quotes))))
>>> +  ;; text)
>>> Use directly `org-e-latex-quote-replacements' in code then.
>> Not sure I understand this comment.
> Since `org-e-latex--quotation-marks' just calls
> `org-export-quotation-marks', you can remove completely the former from
> "org-export.el" and use the latter instead.

Well, that was done on purpose, and maybe the reason will make sense.
As I see it, each exporter should be able to have its own smartifier
function, and the export engine should make no assumptions about that:
it should just call the individual exporter's function.  On the other
hand, many (but perhaps not all!) of the exporters may find themselves
using essentially the same code, just with different replacement
strings.  So I thought that a "general-purpose" function should live in
org-export.el, purely as a convenience for exporters that choose to use
it.  Many of the exporters' smartifier functions will then really just
be calls to that general-purpose function.  Does that make sense?

>> So... there's the filter-parse-tree-functions hook gets applied within
>> the parse tree...
>> so a back-end can add a function to that list which
>> looks over the parse-tree and watches for these border cases (and also
>> the ones within ordinary strings).  Looks like it's going to be tough
>> to work in any flexibility to define further per-language or
>> per-backend cleverness to handle anything beyond the "canonical set"
>> of open-double, close-double, open-single, close-single, and mid-word.
>>
>> To be sure, anything we do will most assuredly fail even on some
>> fairly reasonable input, in which case the users are pretty much on
>> their own and will have to do things the hard way.  And I could use
>> that as the answer here, that, "well, it'll work only within
>> plain-text strings" (and I might possibly still have to use that
>> answer), but I would rather include the situations you bring up in the
>> supported set and not throw up my hands at it.  So, yes, will look at
>> that.
> Actually it isn't very hard to handle this problem.  But it will be
> different than the fontification used in an Org buffer.

Yes, the fontification on-screen is different, and uses a rather
different function--but if I can help it, the same regexps!  So things
work the same everywhere.  I also started thinking a little about what
you write below, how we can inspect the characters just after or before
quotes at the very beginning or end of each chunk.  It would be nice if
it could all be encapsulated neatly in the regexp(s).

> As a first approximation, I can imagine a function accepting an element,
> an object or a secondary string and returning an equivalent element,
> object or secondary string, with its quotes "smartified".  The algorithm
> could go like this:
>
> Walk element/object/secondary-string's contents.

Need it be element/object/secondary-string?  At the bottom level it's
always about strings; the higher levels don't affect the processing of
each string in isolation.
Do we need to intercept it at the element level or just wait to grab
things in the plain-text filter, since we have access at that point
too?  (It might also be that my understanding of the process and the
nature of elements is faulty or limited.  Will have to see what works.)

>
>   1. When a string is encountered:
>
>      1. If it has a quote at its first or last position, check for
>         objects before or after the string to guess its status.  An
>         object never starts with a white space, but you may have to
>         check :post-blank property in order to know if previous object
>         had white spaces at its end.

Hmm, this may in fact answer my question above: you need to be able to
get at the object level to test the post-blank.  I'll experiment.

>      2. For each quote everywhere else in the string, your regexp can
>         handle it fine.
>
>   2. When an object belonging to `org-element-recursive-objects' is
>      encountered, apply the function to this object.
>
>   3. Accumulate returned strings or objects.
>
> Use accumulated data as the contents of the new object to return (i.e.
> just add the type and the same properties at the beginning of this list
> if it was an object or an element, return it as-is if that was
> a secondary string).
>
> On the elements side, only paragraphs, verse-blocks and table-rows can
> directly contain quotes.  Also, headline, inlinetask, item and
> footnote-reference have secondary strings containing quotes.

I also haven't yet worked in smarts (especially in the on-screen
fontifier) for things like not fontifying inside comments or verbatim
strings, etc.  That'll come in time.

> I'm not sure yet where and how to install such a function, but I will
> think about it when it is implemented.

Uuum... Maybe org-export-filter-parse-tree-functions?

~mark
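
For reference, a minimal sketch of the walk outlined above.  It is not
code from the patch and does not use the real Org parse tree: contents
are modelled as a plain list of strings and objects shaped like
(TYPE PROPERTIES CHILD...), and every `my-' name and the replacement
table are hypothetical.  The open/close rule is a deliberately crude
stand-in for the real regexps (a quote opens after white space or an
opening bracket, closes otherwise), and it ignores adjacent nested
quotes.  The only point is to show why a quote at the start of a string
needs the previous object's :post-blank.

```elisp
;;; Sketch only: toy tree, hypothetical names, LaTeX-style replacements.

(defvar my-smart-quotes
  '((?\" . ("``" . "''"))
    (?\' . ("`" . "'")))
  "Hypothetical table mapping an ASCII quote to (OPENING . CLOSING).")

(defun my-smartify-string (string before)
  "Return STRING with ASCII quotes replaced per `my-smart-quotes'.
BEFORE is the character assumed to precede STRING, or nil when
STRING follows white space or starts the contents."
  (let ((last-char before))
    (mapconcat
     (lambda (char)
       (let ((pair (cdr (assq char my-smart-quotes)))
             ;; Crude rule: open after blank or opening bracket.
             (openp (or (null last-char)
                        (memq last-char '(?\s ?\t ?\n ?\( ?\[ ?\{)))))
         (setq last-char char)
         (cond ((null pair) (char-to-string char))
               (openp (car pair))
               (t (cdr pair)))))
     string "")))

(defun my-smartify-contents (contents)
  "Return a copy of CONTENTS with the quotes in its strings smartified.
CONTENTS is a list of strings and objects, each object being a
list of the form (TYPE PROPERTIES CHILD...)."
  (let ((prev nil) result)
    (dolist (item contents (nreverse result))
      (push
       (if (stringp item)
           (my-smartify-string
            item
            ;; Guess what precedes ITEM from its previous sibling.
            (cond ((null prev) nil)
                  ((stringp prev)
                   (and (> (length prev) 0)
                        (aref prev (1- (length prev)))))
                  ;; Previous object ended with blanks: treat as space.
                  ((> (or (plist-get (nth 1 prev) :post-blank) 0) 0) nil)
                  ;; Previous object abuts ITEM: any non-blank will do.
                  (t ?x)))
         ;; Object: keep type and properties, recurse into children.
         (append (list (car item) (nth 1 item))
                 (my-smartify-contents (nthcdr 2 item))))
       result)
      (setq prev item))))

(my-smartify-contents
 '("He said \"" (bold (:post-blank 0) "hello") "\" and left."))
;; => ("He said ``" (bold (:post-blank 0) "hello") "'' and left.")
```

Processed in isolation, the last string would begin with an opening
quote; looking at the bold object's :post-blank is what turns it into
a closing one.  Objects without children (a verbatim object carrying
only a :value property, say) pass through untouched in this model.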