From: "Mark E. Shoulson"
Subject: Re: "Smart" quotes
Date: Tue, 29 May 2012 20:51:39 -0400
Message-ID: <4FC56F1B.5040201@kli.org>
In-Reply-To: <87ehq227ky.fsf@gmail.com>
To: Nicolas Goaziou
Cc: emacs-orgmode@gnu.org

On 05/29/2012 01:57 PM, Nicolas Goaziou wrote:
> Hello,
>
> "Mark E. Shoulson" writes:
>
>> I guess it doesn't actually matter, but it starts to get weird if you
>> find yourself looking arbitrarily far back, and then you start
>> building in exceptions for crossing paragraph boundaries...
> True. I had the exporter in mind, where you always start at the
> beginning of the paragraph. It would be more difficult with search
> starting in the middle of the paragraph.

Maybe the on-screen stuff is no harder; will just have to see.

>> And then there's the fact that multi-paragraph quotes usually have an
>> open-quote for each paragraph but only one close-quote at the end...
> Some french typographers suggest to use a close-quote at the beginning
> of the paragraph to avoid that confusion, or to simply drop them (since
> they are a pain to maintain anyway). I don't know about other languages
> but, if that's the same, is it a good idea to bother implementing it?

I've never heard of it. But I think we may be overthinking this; we can
drive ourselves crazy trying to compress a dozen different typographical
traditions (and informal customs) into a few Elisp rules. On the other
hand, I don't think we need to throw up our hands and give up either! :)

>> Actually keeping count of what level you're at, accurately, is
>> a classic example of a non-regular language; you need a push-down
>> automaton to keep count, and regular expressions don't cut it.
> This is limited to 2 levels.

True.

>> I'm rambling. In sum, I'm going to start off /not/ trying to solve
>> that problem, and assume the writer is going to use alternating " and
>> ' as typography requires and not try to second-guess what level we're
>> at.
> You are right, the problem will be easier to solve with both " and '.
>
> Though, "as typography requires" is not true. In France, the /Imprimerie
> Nationale/ suggests to use guillemots at both levels. Remember that
> typography is localized, which is the main difficulty of the
> implementation.

Also a good point.

All right, bottom line, this is sort of what I'm seeing. I'm not 100%
sure which files should house these things, but something like this:

1) a variable containing, for each language, a regexp for each of: open
double-quote, close double-quote, open single-quote, close single-quote,
and maybe mid-word apostrophe.
Odds are these regexps are going to be the same for just about all
languages (the regexps detecting them, mind you), so there should
probably be some sort of default that the alist can just reference.
A language should also be allowed to define other quote regexps in its
list. We need these to be ordered, with a standard set, so that we can
have...

2) for each *exporter* (including on-screen display), a variable that
defines, for each language, what the *substitution* will be for
open-double-quote, close-double-quote, etc. Other extras can be defined
too. That way we can have an exporter-independent way to detect quotes
to be smartified, but each exporter has its own way to smartify them.

3) Since most exporters are probably going to handle the process in
approximately the same way (match the regexp, stick in the associated
substitution), org-export.el should have a generic function that does
this, which each exporter *may* call in (or as) its quote-smartifier in
its text translator, unless it needs something more specific, which it
can provide itself.

In terms of what is handled, the idea in my head is that we would expect
the writer to be using " or ' to surround their quotes, regardless of
what their native custom is (if they're doing it using their
language-specific quote-marks, we don't need to bother with all this
anyway). The goal is to handle either "quotes" or 'quotes' in either
nesting (or no nesting, if someone does "quote' for some reason), and
with any luck not get too confused with other uses of apostrophe.

It makes sense to me, but I bet I explained it badly and people are
going to have all kinds of issues with it. :) No telling when (if?) I'll
be able to produce something along these lines, but it's something to
start thinking about anyway.

~mark
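
For illustration only, the three pieces proposed above (an ordered
per-language detection table, a per-exporter substitution table, and one
generic function that joins them) could be sketched roughly as follows.
This is Python rather than Elisp, and nothing here is Org's actual code:
the names DEFAULT_DETECT, DETECT, SUBST and smartify, the "utf8" and
"html" exporter keys, and the exact regexps are all assumptions made up
for the sketch. In particular, "an opening quote is one at the start of
the text or after whitespace, an opening bracket, or the other kind of
quote" is only one plausible heuristic.

```python
import re

# (1) Detection: an ORDERED list of (name, regexp) shared by most languages.
# Hypothetical heuristics: a mid-word apostrophe sits between two word
# characters; an opening quote is at the start of the text or follows
# whitespace, an opening bracket, or the other kind of quote.
DEFAULT_DETECT = [
    ("apostrophe",   r"(?<=\w)'(?=\w)"),
    ("open_double",  r'(?<![^\s(\[{\'])"'),
    ("close_double", r'"'),
    ("open_single",  r"(?<![^\s(\[{\"])'"),
    ("close_single", r"'"),
]

# Languages simply reference the default; one could supply its own list.
DETECT = {"en": DEFAULT_DETECT, "fr": DEFAULT_DETECT}

# (2) Substitution: per exporter, per language (values are illustrative).
NBSP = "\u00a0"
SUBST = {
    "utf8": {
        "en": {"open_double": "\u201c", "close_double": "\u201d",
               "open_single": "\u2018", "close_single": "\u2019",
               "apostrophe": "\u2019"},
        # Guillemets at both levels, as mentioned in the thread.
        "fr": {"open_double": "\u00ab" + NBSP, "close_double": NBSP + "\u00bb",
               "open_single": "\u00ab" + NBSP, "close_single": NBSP + "\u00bb",
               "apostrophe": "\u2019"},
    },
    "html": {
        "en": {"open_double": "&ldquo;", "close_double": "&rdquo;",
               "open_single": "&lsquo;", "close_single": "&rsquo;",
               "apostrophe": "&rsquo;"},
    },
}


# (3) Generic function an exporter may call from its text translator.
def smartify(text, lang, exporter):
    detect = DETECT.get(lang, DEFAULT_DETECT)
    subst = SUBST[exporter][lang]
    # One alternation of named groups: alternatives are tried in list
    # order at each position, and matching always sees the ORIGINAL text.
    combined = re.compile("|".join(f"(?P<{n}>{p})" for n, p in detect))
    # Unknown names fall back to leaving the matched character untouched.
    return combined.sub(lambda m: subst.get(m.lastgroup, m.group(0)), text)


print(smartify("\"She said 'don't' twice,\" he wrote.", "en", "utf8"))
# prints: “She said ‘don’t’ twice,” he wrote.
```

Doing the replacement in a single pass over one combined regexp, rather
than running each regexp in turn, means a later rule never sees
characters already substituted by an earlier one, which is why the
ordering of the detection list matters. Like the proposal, the sketch
does not track nesting depth at all.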