From: Nicolas Goaziou
Subject: Re: Citations, continued
Date: Fri, 06 Feb 2015 11:27:15 +0100
Message-ID: <87oap7z664.fsf@nicolasgoaziou.fr>
In-Reply-To: <87mw4tk4m7.fsf@berkeley.edu> (Richard Lawrence's message of "Wed, 04 Feb 2015 08:45:20 -0800")
To: Richard Lawrence
Cc: emacs-orgmode@gnu.org

Richard Lawrence writes:

Thanks for this reverse engineering.
> Specifically I think we need the following categories, all of which
> would be objects:
> - key
> - prefix / pre-text
> - suffix / post-text
> - locator
> - individual citation
> - bracketed citation
> - unbracketed citation
>
> These should have a grammar like the following, based on my
> (reverse-engineered) understanding of the Pandoc syntax for citations:
>
> - A bracketed citation is a list of one or more individual citations,
>   separated by ';' if there are two or more, and surrounded by '[' ']'
> - An individual citation is formatted like: PREFIX KEY LOCATOR SUFFIX
>   The key is obligatory, and the prefix, locator and suffix
>   are optional.
> - A key optionally begins with '-', and obligatorily contains '@'
>   followed by a string of characters which begins with a letter or '_',
>   and may contain alphanumeric characters and the following internal
>   punctuation characters:
>   :.#$%&-+?<>~/
> - A prefix or suffix is a text object (that may contain markup like
>   emphasis or macros)
> - An unbracketed citation consists of a key, optionally followed by a
>   locator which is enclosed in '[' ']'

I don't think all of these should be objects. For example, prefix and
suffix can be properties of a `full-citation' object (like :tag in items).

IIUC, we need three objects (I'm not wedded to the names):

- short-citation (aka unbracketed citation), with :cite-key and :locator
  properties, both strings, and :suppress-author, a boolean;

- full-citation (aka individual citation), with, in addition to the
  properties above, :prefix and :suffix, both being parsed strings.

Since full citations can only exist in a bracketed citation, there is no
reason to create a third object type for the latter. It acts as a mere
container, only useful for the lexer.

> I am not sure about the syntax of locators.
> In particular, I do not
> know if they should allow internal markup, I do not know if they have an
> internal syntax, and I do not know if a comma is required to separate
> them from a key in a bracketed citation.

This needs to be decided indeed. Is there any reason to allow markup
there?

My only concern is speed. A bracketed citation can induce a lot of
backtracking since it can be triggered each time a square bracket is
opened, which is not too uncommon, I think. Basically, at each "[", we
need to find the corresponding "]" and, if there is one, any key between
the two. That's some overhead.

Also, the syntax is ambiguous. For example, in

  [[http://orgmode.org][some @key]]

it is not clear whether @key should be treated as a short-citation in a
link description, or included in a full citation with
"[http://orgmode.org][some " as its prefix. I mean, the answer is clear
for you and me, but not necessarily at the lexer's level. For example,
Eric's parser chose the former, which is good, but it also disallows
square brackets in prefixes, which rules out some objects from this
location (mainly links and footnotes).

That's why I suggested the [cite: ...] part in the first place, which you
dismissed quickly. It greatly reduces backtracking and easily resolves
some confusing situations. Of course I understand the need for
compatibility with existing Pandoc syntax, but I wouldn't want us to
shoot ourselves in the foot.

Even if we don't use "cite:" markup, I think we should carefully specify
the current syntax to avoid loopholes.

Regards,

-- 
Nicolas Goaziou
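A minimal Python sketch of the grammar quoted above, mapped onto the two proposed objects (short-citation and full-citation). All names here (ShortCitation, FullCitation, parse_short, parse_bracketed, require_marker) are hypothetical. The locator rule is an assumption, since the thread leaves locator syntax undecided, and "internal" punctuation is read as "punctuation followed by a letter, digit or '_'". Splitting on ';' ignores semicolons inside prefixes or suffixes; it is a sketch, not a lexer.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Key: optional '-' (suppress author), '@', then a letter or '_'.
# Assumption: the listed punctuation is accepted only when followed by a
# letter, digit or '_', so "@doe99." does not swallow the final period.
KEY_RE = re.compile(
    r"(?P<suppress>-?)@"
    r"(?P<key>[A-Za-z_](?:[A-Za-z0-9_]|[:.#$%&+?<>~/-](?=[A-Za-z0-9_]))*)"
)
# Hypothetical locator rule for bracketed citations (undecided in the thread).
LOCATOR_RE = re.compile(r"(?:p|pp|chap|sec)\.\s*\d+(?:-\d+)?")
# Unbracketed citation: the locator is enclosed in '[' ']' after the key.
SHORT_LOCATOR_RE = re.compile(r"\s*\[([^\[\]]*)\]")


@dataclass
class ShortCitation:
    cite_key: str
    locator: Optional[str]
    suppress_author: bool


@dataclass
class FullCitation(ShortCitation):
    # In addition to the short-citation properties.
    prefix: str = ""
    suffix: str = ""


def parse_short(text: str) -> Optional[ShortCitation]:
    """Unbracketed citation: KEY, optionally followed by '[LOCATOR]'."""
    m = KEY_RE.match(text)
    if m is None:
        return None
    loc = SHORT_LOCATOR_RE.match(text, m.end())
    return ShortCitation(m["key"], loc.group(1) if loc else None,
                         m["suppress"] == "-")


def parse_bracketed(text: str, require_marker: bool = False
                    ) -> Optional[List[FullCitation]]:
    """Bracketed citation: '[' CITATION (';' CITATION)* ']'.

    With require_marker=True the body must start with 'cite:', so any
    '[' not followed by that marker is rejected immediately.
    """
    if not (text.startswith("[") and text.endswith("]")):
        return None
    body = text[1:-1]
    if require_marker:
        if not body.startswith("cite:"):
            return None
        body = body[len("cite:"):]
    cites = []
    for part in body.split(";"):
        m = KEY_RE.search(part)
        if m is None:
            return None  # every individual citation needs a key
        rest = part[m.end():].lstrip(", ")
        loc = LOCATOR_RE.match(rest)
        locator = loc.group(0) if loc else None
        suffix = rest[loc.end():] if loc else rest
        cites.append(FullCitation(m["key"], locator, m["suppress"] == "-",
                                  prefix=part[:m.start()].strip(),
                                  suffix=suffix.strip()))
    return cites


if __name__ == "__main__":
    print(parse_short("@doe99 [p. 33]"))
    print(parse_bracketed("[see @doe99, pp. 33-35 and passim; -@smith04]"))
    print(parse_bracketed("[[http://orgmode.org][some @key]]"))
    print(parse_bracketed("[[http://orgmode.org][some @key]]",
                          require_marker=True))
```

Without the marker, the link in [[http://orgmode.org][some @key]] ends up as the prefix "[http://orgmode.org][some" of a full citation, which is the misreading described above; with require_marker=True the same bracket is rejected after a single prefix check, before any search for a key.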