From mboxrd@z Thu Jan  1 00:00:00 1970
From: Nicolas Goaziou <n.goaziou@gmail.com>
Subject: Re: Smart Quotes Exporting
Date: Tue, 12 Jun 2012 15:21:05 +0200
Message-ID: <874nqgeke6.fsf@gmail.com>
References: <4FBB08CA.5060705@kli.org> <87d35u8rvk.fsf@gmail.com>
	<4FBDA56E.5030901@kli.org> <87zk8w6v4q.fsf@gmail.com>
	<4FC00CE0.6060308@kli.org> <87r4u75tg9.fsf@gmail.com>
	<4FC426AC.2030109@kli.org> <87ehq227ky.fsf@gmail.com>
	<4FC56F1B.5040201@kli.org> <loom.20120531T025544-595@post.gmane.org>
	<87r4u031ye.fsf@gmail.com> <4FC7FE2C.6040702@kli.org>
	<878vg72bzy.fsf@gmail.com> <4FCEBCF5.1070209@kli.org>
	<87haunexn8.fsf@gmail.com> <loom.20120611T024716-455@post.gmane.org>
Mime-Version: 1.0
Content-Type: text/plain
In-Reply-To: <loom.20120611T024716-455@post.gmane.org> (Mark Shoulson's
	message of "Mon, 11 Jun 2012 01:28:12 +0000 (UTC)")
List-Id: "General discussions about Org-mode." <emacs-orgmode.gnu.org>
List-Archive: <http://lists.gnu.org/archive/html/emacs-orgmode>
To: Mark Shoulson <mark@kli.org>
Cc: emacs-orgmode@gnu.org

Hello,

Mark Shoulson <mark@kli.org> writes:

>> The ASCII exporter also handles UTF-8, so it's good to have it there too.
>
> Really?  I would have thought ASCII meant ASCII, as in 7-bit clean
> text.

org-e-ascii.el (like the old org-ascii.el) handles the ASCII, Latin-1
and UTF-8 encodings.

> It looked to me like your solution would essentially boil down to "do
> string handling when there's a string, otherwise recur down and find
> the strings," which essentially means apply it to all the
> strings... and there were already functions out there applying things
> to strings, so this can just ride along with them.  Here, let's look
> at your suggestion and see if we can find what I missed:
>
> ] Walk element/object/secondary-string's contents.
> ] 
> ]   1. When a string is encountered:
> ]
> ]      1. If it has a quote as its first or last position, check for
> ]         objects before or after the string to guess its status. An
> ]         object never starts with a white space, but you may have to
> ]         check :post-blank property in order to know if previous object
> ]         had white spaces at its end.
> ]
> ]      2. For each quote everywhere else in the string, your regexp can
> ]         handle it fine.
> ]
> ]   2. When an object belonging to `org-element-recursive-objects' is
> ]      encountered, apply the function to this object.
> ]
> ]   3. Accumulate returned strings or objects.
>
> So, if it's a string, use the regexps (if they can be smart enough to look at
> beginning and end of the string, which they can--though I haven't been using the
> :post-blank property so presumably something is amiss), and if it isn't a
> string, recur down until you get to a string... Ah, but only if it's in
> org-element-recursive-objects.

You're missing an important part: the regexps cannot be smart enough for
quotes at the beginning or the end of the string. There, you must look
outside the string. Hence:

> ]      1. If it has a quote as its first or last position, check for
> ]         objects before or after the string to guess its status. An
> ]         object never starts with a white space, but you may have to
> ]         check :post-blank property in order to know if previous object
> ]         had white spaces at its end.

But you can only do that from the element containing the string, not
from the string itself.
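
To make the boundary case concrete, here is a minimal sketch in Python
over a toy model, not Org's actual API. The `Obj` class, its
`post_blank` field (standing in for the :post-blank property) and the
helper names are all assumptions made for illustration. It only shows
why the decision needs the siblings of the string, which are reachable
from the containing element alone.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Obj:
    """Toy stand-in for a non-string object (bold, link, ...)."""
    kind: str
    post_blank: int = 0  # hypothetical mirror of the :post-blank property


Node = Union[str, Obj]


def ends_with_blank(node: Node) -> bool:
    """True if NODE leaves white space just before whatever follows it."""
    if isinstance(node, str):
        return node[-1:].isspace()
    return node.post_blank > 0


def starts_with_blank(node: Node) -> bool:
    """True if NODE begins with white space; an object never does."""
    return isinstance(node, str) and node[:1].isspace()


def leading_quote_status(siblings: List[Node], i: int) -> str:
    """Classify a quote found at the first position of siblings[i]."""
    if i == 0 or ends_with_blank(siblings[i - 1]):
        return "opening"
    return "closing"


def trailing_quote_status(siblings: List[Node], i: int) -> str:
    """Classify a quote found at the last position of siblings[i]."""
    if i == len(siblings) - 1 or starts_with_blank(siblings[i + 1]):
        return "closing"
    return "opening"


# A quote right after an object is closing, unless blanks follow the object.
tight = ["He said ", Obj("bold"), '" and left.']
loose = ["He said ", Obj("bold", post_blank=1), '"and left."']
```

In this model the same leading quote is classified differently in
`tight` and `loose`, and the only difference between them is the
`post_blank` value of the previous sibling, which the string itself
cannot see.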

> So the issue with the current state is that it
> would wind up applying to too much? (it would hit code and verbatim elements,
> for example, and that would be wrong.)

No, you are not applying it to too much (verbatim elements don't contain
plain-text objects), but your function doesn't have access to enough
information to be useful.

> So it remains to find the right place in the processing to put
> a function like the one you describe.  I'm trying to get a proper
> understanding of the code structure to see what you mean.  Looks like
> it should be something like a transcoder, only called on
> everything... 

Transcoders are type specific, so that's not an option.

> wait, called on the top-level parsed tree object, recursively doing
> its thing before(?) the transcoders of the individual objects get to
> it.

That's called a parse tree filter, and it is indeed a possibility. The
function would be applied to the parse tree and would replace strings
within elements containing plain text (that is, paragraph, verse-block
and table-row types). Parse tree filters are applied very early in the
export process.

Another option would be to integrate it into
`org-element-normalize-contents', but I think the previous way is
better.

> The on-screen one would still use the plain-string computation, as you said,
> since the full parse isn't available.

Yes.

> It would also need to be tweaked not to act on verbatim/comment text,
> etc.

Yes. You may want to use `org-element-at-point' and `org-element-type'
to tell if you're somewhere smart quotes are allowed (in table,
table-row, paragraph, verse-block elements).


Regards,

-- 
Nicolas Goaziou