From: Nicolas Goaziou
Subject: Re: Smart Quotes Exporting
Date: Tue, 12 Jun 2012 15:21:05 +0200
Message-ID: <874nqgeke6.fsf@gmail.com>
References: <4FBB08CA.5060705@kli.org> <87d35u8rvk.fsf@gmail.com>
 <4FBDA56E.5030901@kli.org> <87zk8w6v4q.fsf@gmail.com>
 <4FC00CE0.6060308@kli.org> <87r4u75tg9.fsf@gmail.com>
 <4FC426AC.2030109@kli.org> <87ehq227ky.fsf@gmail.com>
 <4FC56F1B.5040201@kli.org> <87r4u031ye.fsf@gmail.com>
 <4FC7FE2C.6040702@kli.org> <878vg72bzy.fsf@gmail.com>
 <4FCEBCF5.1070209@kli.org> <87haunexn8.fsf@gmail.com>
In-Reply-To: (Mark Shoulson's message of "Mon, 11 Jun 2012 01:28:12 +0000 (UTC)")
List-Id: "General discussions about Org-mode."
To: Mark Shoulson
Cc: emacs-orgmode@gnu.org

Hello,

Mark Shoulson writes:

>> ASCII exporter also handle UTF-8. So it's good to have there too.
>
> Really?  I would have thought ASCII meant ASCII, as in 7-bit clean
> text.

org-e-ascii.el (like the old org-ascii.el) handles ASCII, Latin1 and
UTF-8 encodings.

> It looked to me like your solution would essentially boil down to "do
> string handling when there's a string, otherwise recur down and find
> the strings," which essentially means apply it to all the
> strings... and there were already functions out there applying things
> to strings, so this can just ride along with them.  Here, let's look
> at your suggestion and see if we can find what I missed:
>
> ] Walk element/object/secondary-string's contents.
> ]
> ] 1. When a string is encountered:
> ]
> ]    1. If it has a quote as its first or last position, check for
> ]       objects before or after the string to guess its status. An
> ]       object never starts with a white space, but you may have to
> ]       check :post-blank property in order to know if previous object
> ]       had white spaces at its end.
> ]
> ]    2. For each quote everywhere else in the string, your regexp can
> ]       handle it fine.
> ]
> ] 2. When an object belonging to `org-element-recursive-objects' is
> ]    encountered, apply the function to this object.
> ]
> ] 3. Accumulate returned strings or objects.
>
> So, if it's a string, use the regexps (if they can be smart enough to look at
> beginning and end of the string, which they can--though I haven't been using the
> :post-blank property so presumably something is amiss), and if it isn't a
> string, recur down until you get to a string... Ah, but only if it's in
> org-element-recursive-objects.

You're missing an important part: the regexps cannot be smart enough for
quotes at the beginning or the end of the string. There, you must look
outside the string. Hence:

> ]    1. If it has a quote as its first or last position, check for
> ]       objects before or after the string to guess its status. An
> ]       object never starts with a white space, but you may have to
> ]       check :post-blank property in order to know if previous object
> ]       had white spaces at its end.

But you can only do that from the element containing the string, not
from the string itself.

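To make the point about looking outside the string concrete, here is a
minimal Python sketch of the walk quoted above. It is not code from
org-element.el or the exporter: the `Obj' class, `RECURSIVE_TYPES',
`preceded_by_blank' and `smart_quotes' are hypothetical names that model
a contents list as plain strings mixed with objects carrying a
`post_blank' count. It also simplifies in two ways: a quote is treated
as opening whenever white space (or nothing) precedes it, and the first
string inside a recursive object is treated as if nothing preceded it.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Hypothetical stand-in for `org-element-recursive-objects'.
RECURSIVE_TYPES = {"bold", "italic", "underline"}


@dataclass
class Obj:
    """Toy object: a type, nested contents and trailing blanks."""
    type: str
    contents: List["Node"] = field(default_factory=list)
    post_blank: int = 0  # models the :post-blank property


Node = Union[str, Obj]


def preceded_by_blank(contents, i):
    """True when nothing, or white space, comes right before item I."""
    if i == 0:
        return True
    prev = contents[i - 1]
    if isinstance(prev, str):
        return prev == "" or prev[-1].isspace()
    # An object never starts with white space, but it may end with some.
    return prev.post_blank > 0


def smart_quotes(contents):
    """Return a copy of CONTENTS with straight quotes made curly."""
    out = []
    for i, item in enumerate(contents):
        if isinstance(item, str):
            chars = []
            for j, ch in enumerate(item):
                if ch != '"':
                    chars.append(ch)
                elif j == 0:
                    # First position: the answer lies outside the string.
                    opening = preceded_by_blank(contents, i)
                    chars.append("\u201c" if opening else "\u201d")
                else:
                    # Anywhere else, the string itself is enough.
                    opening = item[j - 1].isspace() or item[j - 1] in "([{"
                    chars.append("\u201c" if opening else "\u201d")
            out.append("".join(chars))
        elif item.type in RECURSIVE_TYPES:
            # Step 2: apply the function to the object's own contents.
            out.append(Obj(item.type, smart_quotes(item.contents),
                           item.post_blank))
        else:
            out.append(item)
    return out
```

With contents modelling `"*Hello*" she said', that is
['"', Obj("bold", ["Hello"]), '" she said'], the quote opening the third
string can only be recognised as a closing one by checking that the
bold object before it has no trailing blank.
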
> So the issue with the current state is that it
> would wind up applying to too much? (it would hit code and verbatim elements,
> for example, and that would be wrong.)

No, you are not applying it to too much (verbatim elements don't contain
plain-text objects) but your function hasn't got access to enough
information to be useful.

> So it remains to find the right place in the processing to put
> a function like the one you describe.  I'm trying to get a proper
> understanding of the code structure to see what you mean.  Looks like
> it should be something like a transcoder, only called on
> everything...

Transcoders are type-specific, so that's not an option.

> wait, called on the top-level parsed tree object, recursively doing
> its thing before(?) the transcoders of the individual objects get to
> it.

That's called a parse tree filter. That should be a possibility indeed.
The function would be applied to the parse tree and would replace
strings within elements containing plain text (that is, paragraph,
verse-block and table-row types). Parse tree filters are applied very
early in the export process.

Another option would be to integrate it into
`org-element-normalize-contents', but I think the previous way is
better.

> The on-screen one would still use the plain-string computation, as you said,
> since the full parse isn't available.

Yes.

> It would also need to be tweaked not to act on verbatim/comment text,
> etc.

Yes. You may want to use `org-element-at-point' and `org-element-type'
to tell if you're somewhere smart quotes are allowed (in table,
table-row, paragraph, verse-block elements).


Regards,

-- 
Nicolas Goaziou
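
For illustration only, the parse tree filter idea discussed above can be
sketched in Python on a toy tree of dictionaries. Nothing here is the
exporter's actual API: `TEXT_ELEMENTS', `naive_quotes' and
`smart_quote_filter' are hypothetical names, and the per-string
converter is a deliberately naive placeholder that ignores neighbouring
objects. The sketch only shows where such a function would act: on
strings found under paragraph, verse-block and table-row elements, and
nowhere else.

```python
import re

# Element types named in the message as containing plain text.
TEXT_ELEMENTS = {"paragraph", "verse-block", "table-row"}


def naive_quotes(text):
    """Placeholder converter: a quote at the start of the string or
    after white space opens, every remaining quote closes."""
    text = re.sub(r'(?:^|(?<=\s))"', "\u201c", text)
    return text.replace('"', "\u201d")


def smart_quote_filter(node, inside_text=False):
    """Return a copy of NODE with strings converted where allowed."""
    if isinstance(node, str):
        return naive_quotes(node) if inside_text else node
    inside = inside_text or node["type"] in TEXT_ELEMENTS
    new = dict(node)
    if "contents" in node:
        new["contents"] = [smart_quote_filter(child, inside)
                           for child in node["contents"]]
    # Nodes that only carry a "value" (verbatim, src-block in this toy
    # model) have no string children, so they come back unchanged.
    return new
```

Applied to a tree holding a paragraph and a source block, only the
strings under the paragraph are rewritten; the verbatim object and the
source block keep their straight quotes.
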