From: Mark Shoulson
Subject: Re: Smart Quotes Exporting
Date: Mon, 11 Jun 2012 01:28:12 +0000 (UTC)
To: emacs-orgmode@gnu.org

Nicolas Goaziou gmail.com> writes:

>
> Hello,
>
> "Mark E. Shoulson" kli.org> writes:
>
> > Update on the smart-quotes patch. Supports the odt exporter now too,
> > which I think covers all the current major "new" exporters for which
> > it is relevant (adding smart quotes to ASCII export is a contradiction
> > in terms;
>
> ASCII exporter also handle UTF-8. So it's good to have there too.

Really? I would have thought ASCII meant ASCII, as in 7-bit clean text.
More of a "plain text" exporter then. Fair enough. I'll work it in.

> > should it be in the "publish" exporter? It didn't look like it to
> > me).
>
> No.

OK, good.

>
> > Added an options keyword, '"' (that is, the double-quote mark) to
> > select smart quotes on/off, and a defcustom for customizing your
> > default. Set the default default [sic] to nil, though actually it
> > might be reasonable to set it to t. Slight touch-up to the regexps
> > since last time, but they will definitely be subject to a lot of
> > fine-tuning as more special cases are found that break them and ways
> > to fix it are found (the close-quote still breaks on one of "/a/." or
> > "/a./")
>
> Again, using regexps on plain text objects is a wrong approach, as you
> need a better understanding of the whole paragraph structure to
> properly. I already suggested a possible solution, is there anything
> wrong with it?

It looked to me like your solution would essentially boil down to "do
string handling when there's a string, otherwise recur down and find the
strings," which essentially means apply it to all the strings... and
there were already functions out there applying things to strings, so
this can just ride along with them. Here, let's look at your suggestion
and see if we can find what I missed:

] Walk element/object/secondary-string's contents .
]
] 1. When a string is encountered:
]
]    1. If it has a quote as its first or last position, check for
]       objects before or after the string to guess its status. An
]       object never starts with a white space, but you may have to
]       check :post-blank property in order to know if previous object
]       had white spaces at its end.
]
]    2. For each quote everywhere else in the string, your regexp can
]       handle it fine.
]
] 2. When an object belonging to `org-element-recursive-objects' is
]    encountered, apply the function to this object.
]
] 3. Accumulate returned strings or objects.

So, if it's a string, use the regexps (if they can be smart enough to
look at beginning and end of the string, which they can--though I
haven't been using the :post-blank property so presumably something is
amiss), and if it isn't a string, recur down until you get to a
string... Ah, but only if it's in org-element-recursive-objects. So the
issue with the current state is that it would wind up applying to too
much? (It would hit code and verbatim elements, for example, and that
would be wrong.) And detecting such things at the string level would be
the wrong place...

So it remains to find the right place in the processing to put a
function like the one you describe. I'm trying to get a proper
understanding of the code structure to see what you mean. Looks like it
should be something like a transcoder, only called on everything...
wait, called on the top-level parsed tree object, recursively doing its
thing before(?) the transcoders of the individual objects get to it. So
almost something replacing the (lambda (blob contents info) contents)
stub in org-export-transcoder; does that make sense to you? Otherwise,
it would be called somehow in org-export-data. In either case, it would
be made a hook of some kind so that it is backend-specific. Does it
sound to you like I am understanding this right?

The on-screen one would still use the plain-string computation, as you
said, since the full parse isn't available. And that seems to work okay
(the export works okay too, for simple cases). It would also need to be
tweaked not to act on verbatim/comment text, etc.

Thanks,

~mark