From: Richard Lawrence
Subject: Re: Citations, continued
Date: Fri, 06 Feb 2015 14:41:19 -0800
Message-ID: <87fvaibr3k.fsf@berkeley.edu>
To: emacs-orgmode@gnu.org

Hi Nicolas and all,

Nicolas Goaziou writes:

> Richard Lawrence writes:
>
> Thanks for this reverse engineering.
>
>> Specifically I think we need the following categories, all of which
>> would be objects:
>>   - key
>>   - prefix / pre-text
>>   - suffix / post-text
>>   - locator
>>   - individual citation
>>   - bracketed citation
>>   - unbracketed citation
>>
>> These should have a grammar like the following, based on my
>> (reverse-engineered) understanding of the Pandoc syntax for citations:
>>
>>   - A bracketed citation is a list of one or more individual citations,
>>     separated by ';' if there are two or more, and surrounded by '[' ']'
>>   - An individual citation is formatted like: PREFIX KEY LOCATOR SUFFIX
>>     The key is obligatory, and the prefix, locator and suffix
>>     are optional.
>>   - A key optionally begins with '-', and obligatorily contains '@'
>>     followed by a string of characters which begins with a letter or '_',
>>     and may contain alphanumeric characters and the following internal
>>     punctuation characters:
>>       :.#$%&-+?<>~/
>>   - A prefix or suffix is a text object (that may contain markup like
>>     emphasis or macros)
>>   - An unbracketed citation consists of a key, optionally followed by a
>>     locator which is enclosed in '[' ']'
>
> I don't think all should be objects. For example, prefix and suffix can
> be properties in a `full-citation' object (like :tag in items).

Yes, sorry, this was dumb of me...for some reason, I was thinking
"everything in Org syntax has to be an object or an element, and these
aren't elements, so they're objects".  But obviously, some of these
categories are merely internal or merely represent properties of
objects.
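
The key grammar quoted above can be approximated with a regular
expression.  What follows is only a minimal sketch in Python, not Org's
or Pandoc's actual parser.  The names INTERNAL_PUNCT, KEY_RE and
find_keys are made up for illustration, and two readings of the grammar
are assumptions: "internal" punctuation is taken to mean punctuation
that is immediately followed by another key character, and '_' is
allowed inside a key as well as at its start.

```python
import re

# Punctuation the quoted grammar allows *inside* a key.
INTERNAL_PUNCT = r":.#$%&\-+?<>~/"

# Assumption: a punctuation character only belongs to the key when it is
# followed by a letter, digit or '_', so that the ',' or '.' after a key
# is left for the suffix or the surrounding sentence.
KEY_RE = re.compile(
    r"(?P<suppress>-)?@"
    r"(?P<name>[A-Za-z_]"
    r"(?:[A-Za-z0-9_]|[" + INTERNAL_PUNCT + r"](?=[A-Za-z0-9_]))*)"
)


def find_keys(text):
    """Return (key, suppress_author) pairs for every key found in text."""
    return [
        (m.group("name"), m.group("suppress") is not None)
        for m in KEY_RE.finditer(text)
    ]
```

Under these assumptions, find_keys("-@Smith99 says") yields one key,
Smith99, with the suppress-author flag set.  The sketch makes no attempt
to avoid false positives such as e-mail addresses.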

> IIUC, we need three objects (I'm not wedded to the names):
>
>   - short-citation (aka unbracketed citation), with :cite-key
>     and :locator properties, both being strings and :suppress-author as
>     a boolean ;
>
>   - full-citation (aka individual citation), with, in addition to the
>     properties above, :prefix and :suffix, both being parsed string.

> Since full citations can only exist in a bracketed citation, there is no
> reason to create a third object type for the latter. It acts as a mere
> container only useful for lexer.

I think this is not quite right: in my original terminology, `individual
citation' is just an intermediate category.  A bracketed/full citation
contains at least one, but may contain many, `individual' citations,
like:

  [See @Doe99, p. 3; also @Doe2000, p. 989.]

This is a bracketed/full citation containing two individual citations,
each with their own prefix and suffix.

>> I am not sure about the syntax of locators.  In particular, I do not
>> know if they should allow internal markup, I do not know if they have an
>> internal syntax, and I do not know if a comma is required to separate
>> them from a key in a bracketed citation.
>
> This needs to be decided indeed. Is there any reason to allow markup
> there?

I had a look at the Pandoc parser; see:

https://github.com/jgm/pandoc/blob/master/src/Text/Pandoc/Readers/Markdown.hs

(Citation stuff starts at line 1843.)  My Haskell is about as good as my
German (which is to say: not very), but I think I learned a few
interesting things.

First, from what I can tell, there is actually no separate category of
`locators', despite the documentation.  There's just the suffix, which
is anything between the key and `;' or `]'.  (The bareloc function seems
to just look for a regular suffix.  But I could be missing something.)
Thus, maybe we can drop that.

Also, it appears that you can write things like

  @Smith99 [p. 33; see also @Doe2014] says something interesting.

That is, an in-text citation with a suffix may also contain further
citations in the brackets, after the suffix for the in-text citations.

> My only concern is speed. A bracketed citation can induce a lot of
> backtracking since it can be triggered each time a square bracket is
> opened, which is not too uncommon, I think. Basically, at each "[", we
> need to find corresponding "]", and if there is, any key between the
> two. That's some overhead.

Good point.  I hadn't thought about this at all.

> Also, syntax is ambiguous. For example, in
>
>   [[http://orgmode.org][some @key]]
>
> it is not clear if @key should be treated as a short-citation in a link
> description, or included in a full citation with
> "[http://orgmode.org][some " as its prefix. I mean, the answer is clear
> for you and me, but not necessarily at lexer's level. For example,
> Eric's parser chose the former, which is good, but also disallows square
> brackets in prefix, which rules out some objects from this location
> (mainly links and footnotes).

Yes, good point.  Also, inline export snippets (@@latex: ...@@) could
prove problematic.

I do think it's important to allow some markup in the prefix and suffix,
because there are obvious uses where you might want emphasis, etc.  My
initial thought is that a prefix or suffix should only allow:

  - Entities and LaTeX fragments
  - Line breaks?
  - Macros
  - Text markup

I'd also be happy without macros and line breaks, personally.

> That's why I suggested the [cite: ...] part in the first place, which
> you dismissed quickly. It reduces backtracking a lot and can solve
> easily some confusing situations.
>
> Of course I understand the need for compatibility with existing Pandoc
> syntax, but I wouldn't want us to shoot ourselves in the foot. Even if
> we don't use "cite:" markup, I think we should carefully specify current
> syntax to avoid loopholes.
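
To make the backtracking point concrete, here is a rough sketch of how
a literal "[cite:" opener lets a scanner skip every other "[" in the
text.  It is written in Python for illustration only and is not how the
Org parser works.  The names CITE_RE, SIMPLE_KEY_RE and parse_citations
are hypothetical, the key pattern is deliberately simplified (letters,
digits and '_' only), square brackets are not allowed inside the
citation body (the same restriction mentioned above for Eric's parser),
and any ';'-separated piece without a key is silently dropped.

```python
import re

# Only a literal "[cite:" opens a citation; other "[" are never examined.
# Assumption: no nested square brackets inside the citation body.
CITE_RE = re.compile(r"\[cite:(?P<body>[^\[\]]*)\]")

# Simplified key pattern: optional '-', then '@', then a letter or '_',
# then letters, digits or '_'.  Internal punctuation is ignored here.
SIMPLE_KEY_RE = re.compile(r"(-)?@([A-Za-z_][A-Za-z0-9_]*)")


def parse_citations(text):
    """Return one list of individual citations per [cite: ...] found."""
    result = []
    for cite in CITE_RE.finditer(text):
        individual = []
        for piece in cite.group("body").split(";"):
            key = SIMPLE_KEY_RE.search(piece)
            if key is None:
                continue  # assumption: pieces without a key are skipped
            individual.append({
                "prefix": piece[:key.start()].strip(),
                "key": key.group(2),
                "suppress_author": key.group(1) is not None,
                "suffix": piece[key.end():].strip(),
            })
        result.append(individual)
    return result
```

With this sketch, "[cite:See @Doe99, p. 3; also @Doe2000, p. 989.]"
gives one citation holding two individual citations, while the link
"[[http://orgmode.org][some @key]]" is never considered a citation
because it lacks the "cite:" opener.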

Another interesting thing I learned from the Pandoc source is that,
should we want to adopt "[cite: ...]" syntax, I think it would be pretty
trivial for Pandoc to support it.  (Worst case, they can copy-and-paste
the Markdown citation parser and then add "cite:" in a couple of
places.)  So if this is necessary on the Org side for performance or
ambiguity reasons, I am not against it.

One question, though, is how this should work with in-text citations.
Should I have to write:

  @Smith99 [cite:p. 33]

or

  @Smith99 [cite:p. 33; see also @Doe2014]

?

Best,
Richard