From: torys.anderson@gmail.com (Tory S. Anderson)
Subject: Re: Citation syntax: a revised proposal
Date: Sun, 15 Feb 2015 06:17:30 -0500
Message-ID: <87fva72zlh.fsf@gmail.com>
References: <87k2zjnc0e.fsf@berkeley.edu>
In-Reply-To: <87k2zjnc0e.fsf@berkeley.edu> (Richard Lawrence's message of "Sat, 14 Feb 2015 18:29:05 -0800")
List-Id: "General discussions about Org-mode."
To: Richard Lawrence
Cc: emacs-orgmode@gnu.org

+1 Thanks for the work substantiating the idea.

Richard Lawrence writes:

> Hi everyone,
>
> Since discussion seems to have petered out on the previous thread (see:
> http://thread.gmane.org/gmane.emacs.orgmode/94524), I took some time to
> go back over the discussion and write up a concrete proposal for
> citation syntax.
>
> This proposal represents my attempt to formulate a syntax that is easy
> to read, easy to parse, and covers all the use-cases that people
> mentioned as being important.
> It is surely not perfect, but I learned a
> lot from the previous thread, and I hope something like this will serve
> the community's needs.
>
> The proposal is below, both inline (for easy quoting) and attached (for
> easy reading). To keep it relatively short, I have mostly not explained
> my reasoning for the choices I made, but I am happy to do so here if
> anyone has questions.
>
> I welcome feedback, comments, criticisms, and objections on any point.
> However, since we've already had a long discussion about this, I
> respectfully request that we try to keep this thread focused. To that
> end, I suggest:
>
>   1) If you have criticisms or objections, please try to indicate
>      whether you think they are `substantive' (e.g., you see a problem
>      that would prevent you from using this syntax, or prevent Org from
>      implementing it) or not (e.g., you would prefer a slightly
>      different but equivalent way of expressing something).
>
>   2) If you wish to express an opinion about the proposal without
>      offering further comments, let us know by just replying with +1
>      (meaning you'd like to see this syntax, or something reasonably
>      similar to it, be adopted), 0, or -1 (meaning you'd prefer not to
>      see this syntax or anything similar to it adopted).
>
> I guess this is my Valentine to the Org community. :) Thanks for reading!
>
> Best,
> Richard
>
> #+TITLE: Citation syntax, a revised proposal
> #+DATE: <2015-02-14 Sat>
>
> #+AUTHOR: Richard Lawrence
> #+EMAIL: richard.lawrence@berkeley.edu
>
> #+LANGUAGE: en
> #+SELECT_TAGS: export
>
> #+EXCLUDE_TAGS: noexport
>
> * Citation syntax
> ** Requirements
> A citation is a textual reference to one or more individual works,
> together with other information about those works, grouped together in
> a single place.
>
> Within a citation, each reference to an individual work needs to be
> capable of containing:
>   1) a database key that references the cited work
>   2) prefix / pre-note
>   3) suffix / post-note
>
> Whole citations also need:
>   4) [@4] a way of specifying whether the citation is in-text or
>      parenthetical
>   5) a way of representing a common prefix and suffix, if the citation
>      is a multi-cite
>   6) a way of specifying whether the citation should produce a
>      complete bibliography entry in-place
>   7) an extensible way of specifying formatting properties to export
>      filters and/or specific export backends
>
> ** Citation definitions
> *** Citation keys; bibliography references vs. complete entries
> A citation key consists of a unique label preceded by a flag, which is
> optionally preceded by a hyphen.
>
> The flag is either `@' or `&'. `@' indicates that the citation should
> produce a normal reference to the bibliography entry for the cited
> work (in whatever style the document uses), located elsewhere.
>
> The `&' flag indicates that the citation should produce a complete
> bibliography entry for the cited work in the place where the citation
> appears.
>
> The optional hyphen (`-') indicates that the author's name should be
> suppressed from the rendered citation. (Note that this is only useful
> in author-X citation styles; it should have no effect in numeric
> styles.)
>
> *** Basic citations: Parenthetical vs. in-text
> There are two basic types of citation: /parenthetical/ and /in-text/.
> Each of these may contain references to one or more individual works.
>
> The difference between parenthetical and in-text citations is
> expressed using parentheses around the /first/ citation key. A
> parenthetical citation has such parentheses around the first citation
> key; an in-text citation lacks them. (Parentheses around non-initial
> keys are permitted for visual consistency and to keep the grammar
> simple, but have no meaning.)
>
> A citation thus consists in general of a bracketed list, beginning
> with `cite:', of one or more individual references, each of which:
>   - may contain a prefix,
>   - must contain a citation key, which may or may not be surrounded by `(...)'
>   - and may contain a suffix
> Individual references are separated by semi-colons.
>
> There are also two special cases to make simple-but-common uses very
> easy to type and read:
>   1) a parenthetical citation for a single work with no prefix and
>      suffix may be written by just surrounding the key with brackets,
>      like: [@Doe99].
>   2) an in-text citation for a single work with no prefix and suffix
>      may be written as a /bare/ key, without brackets, like: @Doe99.
> (Thus, in both of the `simple' cases, one less level of bracketing is
> required.)
>
> Prefix and suffix text are regular Org text, which are allowed to
> contain various kinds of Org markup (see the grammar below for a
> complete list).
>
> *** Multi-cite citations
> Multi-cite citations are distinguished from basic parenthetical and
> in-text citations by the presence of an optional common prefix or
> common suffix (which may not contain keys). If present, the common
> prefix must occur before the first individual reference, and the
> common suffix must occur after the last individual reference. The
> common prefix and suffix are separated from the individual references
> by semi-colons.
>
> *** Examples of main citation syntax
> Basic parenthetical citation:
>
> #+BEGIN_QUOTE
> The nineteenth century was very interesting. [cite: (@Doe99)]
> #+END_QUOTE
>
>
> Basic parenthetical citation using special-case syntax:
>
> #+BEGIN_QUOTE
> The nineteenth century was very interesting. [@Doe99]
> #+END_QUOTE
>
>
> Parenthetical citation with multiple works and prefix and suffix:
>
> #+BEGIN_QUOTE
> The nineteenth century was in fact lovely [cite: see (@Doe99) p. 44;
> @Smith2000 has a review].
> #+END_QUOTE
>
>
> Basic in-text citation with a suffix:
>
> #+BEGIN_QUOTE
> As [cite: @Doe99 p. 44] says, the nineteenth century was very interesting.
> #+END_QUOTE
>
>
> In-text citation using special-case syntax:
>
> #+BEGIN_QUOTE
> @Doe2000 explains that the twentieth century was even more interesting.
> #+END_QUOTE
>
>
> In-text citation with author suppressed:
>
> #+BEGIN_QUOTE
> As Doe explained in his -@Doe2003, the twentieth century was somewhat
> less interesting than previously thought.
> #+END_QUOTE
>
>
> Parenthetical citation with full-entry key:
>
> #+BEGIN_QUOTE
> A complete bibliography entry follows in parentheses. [cite: (&Doe99)]
> A complete bibliography entry follows in parentheses. [&Doe99]
> #+END_QUOTE
>
>
> In-text citation with full-entry key:
>
> #+BEGIN_QUOTE
> A complete bibliography entry follows: [cite: &Doe99].
> A complete bibliography entry follows: &Doe99.
> #+END_QUOTE
>
>
> Full-entry in-text citation, in a footnote:
>
> #+BEGIN_QUOTE
> Doe exhibits unusual scholarship.[fn:: &Doe99.]
> #+END_QUOTE
>
>
> In-text citation, with a complete bibliography entry minus the author
> in a footnote, plus a suffix:
>
> #+BEGIN_QUOTE
> @Doe99 exhibits unusual scholarship.[fn:1]
>
> [fn:1] [cite: -&Doe99 Cf. especially section 4.]
> #+END_QUOTE
>
>
> In-text multi-cite:
>
> #+BEGIN_QUOTE
> Speculation abounds about what the twenty-first century will
> bring. [cite: For an overview of this topic, see; @Smith1998;
> @Jones1999; @Miller2001; and references therein.]
> #+END_QUOTE
>
>
> Parenthetical multi-cite:
>
> #+BEGIN_QUOTE
> Speculation abounds about what the twenty-first century will
> bring. [cite: For an overview of this topic, see; (@Smith1998);
> @Jones1999; @Miller2001; and references therein.]
> #+END_QUOTE
>
>
> *** Syntax for extensions
> Additional information can be supplied in a citation that may affect
> how export filters or particular backends format it.
>
> This additional information may be supplied following the brackets of
> a citation between the following delimiters: `%%( ... )'.
>
> (Note: I am proposing that this expression go /after/ the main
> citation brackets both because it visually separates this extra
> information from the main citation, and in order to avoid imposing any
> further syntactic restriction on suffixes.)
>
> At least for now, any information supplied this way is /strictly the
> user's responsibility/ to interpret (e.g., using an export filter).
> This means that citations that have information like this are not
> portable and might not be exported correctly:
>   - in other users' setups
>   - by particular backends
>   - by future versions of Org
>
> I will not deal with the details of how this additional information
> should be syntactically represented, since this has not really been
> discussed. But I suggest that, to deal with the complexities of
> additional information in full generality, something like a complete
> Lisp list is required. Thus, I suggest that this additional
> information simply be represented as a Lisp list. (Besides
> generality, this has the benefit of making the syntax easy to parse:
> the parser can just call Elisp's read function with a marker after the
> `%%'.)
>
> I provide these examples merely to illustrate the possibilities here:
>
> #+BEGIN_QUOTE
> @vonNeumann1930 %%(:type genitive :capitalize t) model can only handle
> a limited range of observed cases.
>
> @McCarthy1950 %%('s) clever use of Lisp syntax was also used to
> express the Saxon genitive.
>
> For more, see Ref. @Doe99 %%(:type refnum :follow-to "some.pdf").
>
> Even more complicated examples occur after Doe's famous article from
> [cite: @Doe99] %%(:type date-only).
>
> And in [cite: @Doe2000] %%(:attr_latex (:format-string
> "\citeyear{%KEY}") :attr_html (:only-fields (month year))), Doe
> finally realized that arbitrary complexity was a powerful but
> double-edged sword.
>
> @_aParticularlyUGLYkey:is-this-one %%(:overlay "Nice Display")
> #+END_QUOTE
>
> ** Grammar
> This section formally documents the syntax of citations discussed
> above.
>
> To represent the syntax of citations, we need a category of /citation/
> objects, which require the following properties (the names here are not
> important and could be changed):
>   - is-parenthetical (boolean; nil means is in-text)
>   - common-prefix (text)
>   - common-suffix (text)
>   - references (list)
>   - extra-info (list)
>
> Each reference in the list of references should be a plist with the
> following properties:
>   - prefix (text)
>   - suffix (text)
>   - key (string)
>   - is-parenthesized (boolean; t means key was parenthesized; only
>     significant for the first reference in a citation)
>   - suppress-author (boolean; t means author name should not be output)
>   - is-full (boolean; t means a full bibliography entry should be
>     output in-place)
>
> The category of citations has the following grammar:
>   - A CITATION is a PARENTHETICAL-CITATION or an IN-TEXT-CITATION.
>   - A PARENTHETICAL-CITATION is either a SIMPLE-PARENTHETICAL or a
>     CITATION-LIST whose first INDIVIDUAL-REFERENCE contains a
>     PARENTHESIZED-KEY.
>   - An IN-TEXT-CITATION is either a SIMPLE-IN-TEXT, or a
>     CITATION-LIST whose first INDIVIDUAL-REFERENCE contains a BARE-KEY.
>   - A SIMPLE-PARENTHETICAL is a KEY immediately surrounded by square
>     brackets, optionally followed by an EXTRA-INFO clause.
>   - A SIMPLE-IN-TEXT is a BARE-KEY, optionally followed by an
>     EXTRA-INFO clause.
>   - A CITATION-LIST has the format
>       [cite: PREFIX; INDIVIDUAL-REFERENCE; ... INDIVIDUAL-REFERENCE; SUFFIX] EXTRA-INFO
>     where the initial PREFIX, final SUFFIX, and EXTRA-INFO clause are
>     optional. At least one INDIVIDUAL-REFERENCE must be present.
>   - An INDIVIDUAL-REFERENCE has the format:
>       PREFIX KEY-MAYBE-PARENS SUFFIX
>     The KEY-MAYBE-PARENS is obligatory, and the prefix and suffix
>     are optional.
>   - A KEY-MAYBE-PARENS is either a BARE-KEY or PARENTHESIZED-KEY
>   - A BARE-KEY is a KEY with immediately-preceding whitespace
>   - A PARENTHESIZED-KEY is a KEY immediately surrounded by `(' and `)'.
>   - A KEY optionally begins with `-', and obligatorily contains `@' or
>     `&' followed by a string of characters which begins with a letter
>     or `_', and may contain alphanumeric characters and the following
>     internal punctuation characters:
>       :.#$%&-+?<>~/
>   - A PREFIX or SUFFIX is arbitrary text (except `;', `]', and
>     KEY-MAYBE-PARENS) which may contain only the following Org
>     objects:
>     - bold
>     - code
>     - entity
>     - italic
>     - latex-fragment
>     - line-break
>     - strike-through
>     - subscript
>     - superscript
>     - underline
>     (Note that this list could be extended somewhat if necessary.)
>   - An EXTRA-INFO clause consists of data not specified by this
>     grammar, in between `%%(' and `)'
>
> ** Outstanding issues
> It seems to me that there are potential problems with the above
> proposal in a number of areas, but I cannot tell how serious they are,
> or what changes (if any) should be made to solve them. I don't
> pretend that this is an exhaustive list:
>   1) *Nesting.* I have favored LaTeX compatibility for in-text
>      citations with multiple references; but this means there is no
>      way to `nest' citations. Thus, there is no way to express (in
>      the main syntax) what Pandoc expresses as:
>        @Doe99 [p. 34; see also @DoeRoe2000]
>      which renders like:
>        Doe (1999, p. 34; see also Doe and Roe 2000)
>      Instead, since a citation is in-text or parenthetical as a whole,
>      the equivalent in the above syntax
>        [cite: @Doe99 p. 34; see also @DoeRoe2000]
>      should render like:
>        Doe (1999, p. 34), see also Doe and Roe (2000).
>      I am not certain if Pandoc-like output is important in this case.
>      The few people who commented on this said that it was not.
>   2) *Limitations on prefixes and suffixes.* There may be legitimate
>      uses of `@', `;', `]', etc.
>      inside prefix or suffix text that the
>      above syntax does not allow. Examples might include:
>      - use of semi-colons as part of the prefix/suffix text
>      - footnotes, links, or timestamps inside a prefix/suffix
>      I am not certain how important these cases are. If they are
>      important, some of them might be able to be worked around with
>      entities.
>   3) *Edge cases.* The above syntax may make it possible to express
>      things that don't make sense, or would be too difficult to
>      export. The only one I can think of is that it is possible to
>      mix `@'-style and `&'-style keys in the same citation. I am not
>      sure if this should be forbidden; it may sometimes make sense.
>      It may also be possible to express things that external tools,
>      such as citeproc-js, don't know how to process. I do not have a
>      good sense of what, if anything, falls into that category, and
>      what should be done about it.
>   4) *Citation commands.* Rather than introduce an explicit
>      representation for different citation commands/types, I have used
>      different parts of the syntax to express the common distinctions
>      that people mentioned. I suggest that, for now, anything beyond
>      these basic distinctions be left to the user-extension syntax.
>      However, if it becomes clear in the future that there is a need
>      to add a representation for a command to the main syntax, there
>      is a natural place to do so: immediately after the `cite:' tag
>      (as Nicolas suggested).
>
> Also, I have not said anything in this proposal to address how other
> document metadata should be represented, which has not been discussed
> much on the list. I think this should be discussed separately.
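
For anyone who wants to experiment with the KEY rule from the quoted grammar, here is a rough Python sketch of how it might be checked. This is only an illustration, not part of the proposal and not Org code. The names `KEY_RE`, `Key`, `parse_key`, and `classify_simple` are made up for this example. It assumes that "internal punctuation" means a key may not end in one of the listed punctuation characters, and that `_' is only allowed as the first character of the label. Full `[cite: ...]' lists and `%%( ... )' clauses are not handled.

```python
import re
from typing import NamedTuple, Optional

# KEY, as read from the quoted grammar: an optional `-', a flag (`@' or `&'),
# then a label that begins with a letter or `_'. The label may continue with
# alphanumerics and the listed punctuation, but (by assumption) must end on
# an alphanumeric so that a trailing `.' or `:' is not swallowed.
KEY_RE = re.compile(
    r"(?P<suppress>-)?"
    r"(?P<flag>[@&])"
    r"(?P<label>[A-Za-z_](?:[A-Za-z0-9:.#$%&\-+?<>~/]*[A-Za-z0-9])?)"
)


class Key(NamedTuple):
    label: str             # database key without the flag
    suppress_author: bool  # t when the key was written with a leading `-'
    is_full: bool          # t when the flag is `&' (full entry in place)


def parse_key(text: str) -> Optional[Key]:
    """Return the parsed key, or None if TEXT is not exactly one KEY."""
    m = KEY_RE.fullmatch(text)
    if m is None:
        return None
    return Key(
        label=m.group("label"),
        suppress_author=m.group("suppress") is not None,
        is_full=m.group("flag") == "&",
    )


def classify_simple(text: str) -> Optional[str]:
    """Classify only the two special-case forms: [@Doe99] and @Doe99."""
    if text.startswith("[") and text.endswith("]"):
        if parse_key(text[1:-1]) is not None:
            return "parenthetical"
        return None
    if parse_key(text) is not None:
        return "in-text"
    return None
```

With these assumptions, `parse_key("-&Doe99")` gives a key with both suppress_author and is_full set, and `classify_simple("[@Doe99]")` gives "parenthetical", while `classify_simple("[cite: @Doe99]")` gives None because citation lists are outside this sketch.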