Hi All (and Nicolas in particular),

There's recently been a little bit of progress on the Pandoc issue for org-cite support (https://github.com/jgm/pandoc/issues/7329#issuecomment-982805313), and the maintainers are now asking if there is any specification for org-cite syntax.

Looking at https://orgmode.org/worg/dev/org-syntax.html, there isn't, and so I've tried to come up with a draft. Most of it was worked out from what I know of the syntax combined with looking at org-element.el, however I'm not sure what character class applies to the global and key prefix/suffix components.

The draft:

Citation syntax is currently not documented, but from the implementation it looks something like this:

#+begin_example
[cite CITESTYLE: GLOBALPREFIX KEYCITES GLOBALSUFFIX]
#+end_example

The only mandatory component, =KEYCITES= consists of one or more instances of the pattern, separated by semicolons.

#+begin_example
KEYPREFIX @KEY KEYSUFFIX
#+end_example

=KEY= can be made of any word-constituent character, =-=, =.=, =:=, =?=, =!=, =`=, ='=, =/=, =*=, =@=, =+=, =|=, =(=, =)=, ={=, =}=, =<=, =>=, =&=, =_=, =^=, =$=, =#=, =%=, =%=, or =~=.

I have not yet confirmed what =KEYPREFIX= and =KEYSUFFIX= may contain, but as a starting point, any of the characters allowed in =KEY= except =@= plus whitespace would seem fairly safe. =KEYSUFFIX= must start with a whitespace character to be able to be differentiated from =KEY=.

=CITESTYLE= consists of a main =STYLE= and any number of =VARIANT=s (including zero), prefixed by forwards slashes in the following pattern

#+begin_example
/STYLE/VARIANT/VARIANT/VARIANT
#+end_example

=STYLE= and =VARIANT= can be made of any alphanumeric character, =_=, or =-=.

=GLOBALPREFIX= and =GLOBALSUFFIX= can contain the same characters as =KEYPREFIX= and =KEYSUFFIX=, however =GLOBALPREFIX= must end with a semicolon, and =GLOBALSUFFIX= must start with a semicolon.

"cite" and =CITESTYLE=, =KEYCITES= and =GLOBALSUFFIX= are /not/ separated by whitespace. Neither are =KEYPREFIX=, =@KEY=, or =KEYSUFFIX= separated by whitespace.

Do let me know what you think :)

Timothy.
Hi Timothy,

Thanks for putting this together. Comments in line.

Best!
Tom

For reference here is the tokenizer pattern I use in laundry at the moment. There are a number of issues with it ...
https://github.com/tgbugs/laundry/blob/5a396bef98d9a3cd9ee929f21cd47612dd6cb1ac/laundry/lex-abbrev.rkt#L896-L913

> Citation syntax is currently not documented, but from the implementation
> it looks something like this:
> #+begin_example
> [cite CITESTYLE: GLOBALPREFIX KEYCITES GLOBALSUFFIX]
> #+end_example

There is potential confusion here because =[cite= does not have to be followed by a space (rather, cannot be). The top level syntax is =[cite= terminating at the first occurrence of =]=. I think we may also need to include a note that no whitespace is allowed either? It will only be recognized within paragraph context (e.g. headings, paragraphs, and other places where org objects can appear). Stating that up front would clarify that the rest of the syntax described here is how to determine whether the citation is well formed/how to parse it.

> =KEY= can be made of any word-constituent character, =-=, =.=, =:=, =?=,
> =!=, =`=, ='=, =/=, =*=, =@=, =+=, =|=, =(=, =)=, ={=, =}=, =<=, =>=,
> =&=, =_=, =^=, =$=, =#=, =%=, =%=, or =~=.

You have a duplicated =%= here.

> I have not yet confirmed what =KEYPREFIX= and =KEYSUFFIX= may contain,
> but as a starting point, any of the characters allowed in =KEY= except
> =@= plus whitespace would seem fairly safe. =KEYSUFFIX= must start with
> a whitespace character to be able to be differentiated from =KEY=.

I don't think we can allow whitespace here?

> =CITESTYLE= consists of a main =STYLE= and any number of =VARIANT=s
> (including zero), prefixed by forwards slashes in the following pattern
> #+begin_example
> /STYLE/VARIANT/VARIANT/VARIANT
> #+end_example

Need clarification on empty styles e.g. [cite//:]

> "cite" and =CITESTYLE=, =KEYCITES= and =GLOBALSUFFIX= are /not/
> separated by whitespace. Neither are =KEYPREFIX=, =@KEY=, or =KEYSUFFIX=
> separated by whitespace.

I may be missing something, but this is confusing with respect to the statement about =KEYSUFFIX= and whitespace made above.
Hello,

Timothy <tecosaur@gmail.com> writes:

> Looking at https://orgmode.org/worg/dev/org-syntax.html, there isn't,

Yup, I forgot to update it.

> I have not yet confirmed what =KEYPREFIX= and =KEYSUFFIX= may contain,
> but as a starting point, any of the characters allowed in =KEY= except
> =@= plus whitespace would seem fairly safe. =KEYSUFFIX= must start with
> a whitespace character to be able to be differentiated from =KEY=.

KEYPREFIX may not contain a semicolon nor any combination forming a key (at-sign followed by a word character or some symbols). Square brackets are allowed only if they form a symmetric pair. Any other character is allowed.

KEYSUFFIX has the same restrictions, minus the limitation about the key.

> =CITESTYLE= consists of a main =STYLE= and any number of =VARIANT=s
> (including zero), prefixed by forwards slashes in the following pattern
>
> #+begin_example
> /STYLE/VARIANT/VARIANT/VARIANT
> #+end_example

Nope. This is only /STYLE/VARIANT, however VARIANT can contain "/" character.

> =STYLE= and =VARIANT= can be made of any alphanumeric character, =_=, or =-=.
>
> =GLOBALPREFIX= and =GLOBALSUFFIX= can contain the same characters as
> =KEYPREFIX= and =KEYSUFFIX=, however =GLOBALPREFIX= must end with a
> semicolon, and =GLOBALSUFFIX= must start with a semicolon.

Note the semicolons do not belong to affixes.

> "cite" and =CITESTYLE=, =KEYCITES= and =GLOBALSUFFIX= are /not/
> separated by whitespace. Neither are =KEYPREFIX=, =@KEY=, or =KEYSUFFIX=
> separated by whitespace.

Addendum: whitespaces are not significant after the leading colon, and before the closing square bracket. They are significant in any other case.

HTH,

Regards,
-- 
Nicolas Goaziou
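The /STYLE/VARIANT reading described above (one =STYLE=, then at most one =VARIANT= that may itself contain slashes) could be sketched roughly as follows in Python. This is only an illustration, not how org-element.el or Pandoc implement it: the helper name split_cite_style is hypothetical, Python's \w stands in for "alphanumeric or =_=", and rejecting empty styles such as [cite//:] is an assumption, since that case is left open in this thread.

```python
import re

# Hypothetical helper, not part of Org or Pandoc.
# STYLE: alphanumerics, "_" or "-"; VARIANT: the same plus "/".
# Python's \w (alphanumerics and "_") approximates the character class here.
_CITE_STYLE_RE = re.compile(r"/(?P<style>[\w-]+)(?:/(?P<variant>[\w/-]+))?")


def split_cite_style(citestyle):
    """Split the text between "[cite" and ":" into (style, variant)."""
    if citestyle == "":
        # No style given at all, as in "[cite:@key]".
        return (None, None)
    match = _CITE_STYLE_RE.fullmatch(citestyle)
    if match is None:
        # Assumption: empty components (e.g. "//") are treated as malformed.
        raise ValueError("malformed CITESTYLE: %r" % citestyle)
    return (match.group("style"), match.group("variant"))
```

With this sketch, split_cite_style("/text/caps/bare") yields the style "text" and the single variant "caps/bare", rather than two separate variants.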
Hi Nicolas,

Thank you for your feedback and clarifications. They were most helpful. Thomas, I've also made a few tweaks based on your reply.

I've revised the draft in light of your comments, see below.

Citations follow the pattern

#+begin_example
[cite CITESTYLE: GLOBALPREFIX KEYCITES GLOBALSUFFIX]
#+end_example

where "cite" and =CITESTYLE=, =KEYCITES= and =GLOBALSUFFIX= are /not/ separated by whitespace. Whitespace after the leading colon or before the closing square bracket is not significant. All other whitespace is significant.

The only mandatory component, =KEYCITES= consists of one or more instances of the following pattern, separated by semicolons,

#+begin_example
KEYPREFIX @KEY KEYSUFFIX
#+end_example

where =KEYPREFIX=, =@KEY=, and =KEYSUFFIX= are /not/ separated by whitespace.

=KEY= can be made of any word-constituent character, =-=, =.=, =:=, =?=, =!=, =`=, ='=, =/=, =*=, =@=, =+=, =|=, =(=, =)=, ={=, =}=, =<=, =>=, =&=, =_=, =^=, =$=, =#=, =%=, or =~=.

=KEYPREFIX= and =KEYSUFFIX= are optional and can contain any characters other than a semicolon (=;=), so long as square brackets are balanced. =KEYPREFIX= cannot contain any subsequence that forms a =KEY=. Hence, a minimal citation is formed by the pattern ~[cite:@KEY]~.

=CITESTYLE= consists of a main =STYLE= and optionally a =VARIANT=. Both the =STYLE= and =VARIANT= are prefixed by a forwards slash.

#+begin_example
/STYLE/VARIANT
#+end_example

=STYLE= and =VARIANT= can be made of any alphanumeric character, =_=, or =-=. Additionally, =VARIANT= can itself contain forward slashes (=/=).

=GLOBALPREFIX= and =GLOBALSUFFIX= can contain the same characters as =KEYPREFIX= and =KEYSUFFIX=. In the same manner as instances of the =KEYCITES= pattern, =KEYCITES=, =GLOBALPREFIX=, and =GLOBALSUFFIX= must be separated by semicolons.

-- 
Timothy
Hi Nicolas, Tom,

I just thought I’d do a little test to see what markup citations accept with org-element, and I’m not quite sure what to make of the results. I’d appreciate hearing your thoughts on the example below.

Let’s look at a strange citation.

┌────
│ [cite: @key *bold* text @@org:hey@@ and /italic/ and {{{macro()}}} and [fn:3] and
│ <<target>> <2021-12-02 Thu> \latex and \alpha and a statscookie [3/7] src_lang{hi}
│ s^{up} and s_{ub}]
└────

Examining the interpretation of this citation with `org-element', the following inline elements / objects are recognised.
⁃ bold/italic/etc. (text markup elements)
⁃ latex fragment
⁃ super/subscript
but most of the non-markup elements are not.
⁃ Export snippet
⁃ macro
⁃ footnote
⁃ target
⁃ timestamp
⁃ statistics cookie
⁃ inline source code

All the best,
Timothy
Timothy <tecosaur@gmail.com> writes:
> Let’s look at a strange citation.
>
> ┌────
> │ [cite: @key *bold* text @@org:hey@@ and /italic/ and {{{macro()}}} and [fn:3] and
> │ <<target>> <2021-12-02 Thu> \latex and \alpha and a statscookie [3/7] src_lang{hi}
> │ s^{up} and s_{ub}]
> └────
>
> Examining the interpretation of this citation with `org-element', the following
> inline elements / objects are recognised.
> ⁃ bold/italic/etc. (text markup elements)
> ⁃ latex fragment
> ⁃ super/subscript
> but most of the non-markup elements are not.
> ⁃ Export snippet
> ⁃ macro
> ⁃ footnote
> ⁃ target
> ⁃ timestamp
> ⁃ statistics cookie
> ⁃ inline source code
This is all defined in org-element-object-restrictions together with
other restrictions on what kinds of objects can be placed inside other
elements and objects:
(minimal-set '(bold code entity italic latex-fragment strike-through
subscript superscript underline verbatim))
(citation-reference ,@minimal-set)
Best,
Ihor
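As a rough illustration of how a restriction table like this can be consulted, here is a small Python sketch. It mirrors only the two entries quoted above; the names OBJECT_RESTRICTIONS and allowed_inside are hypothetical, and the real table in org-element.el is an Emacs Lisp alist with many more entries.

```python
# Object types copied from the `minimal-set' quoted above.
MINIMAL_SET = frozenset({
    "bold", "code", "entity", "italic", "latex-fragment", "strike-through",
    "subscript", "superscript", "underline", "verbatim",
})

# Hypothetical Python mirror of a single entry of the restriction table:
# a citation reference may only contain objects from the minimal set.
OBJECT_RESTRICTIONS = {
    "citation-reference": MINIMAL_SET,
}


def allowed_inside(container, object_type):
    """Return True if OBJECT_TYPE may appear inside CONTAINER.

    Containers missing from this partial table are treated as allowing
    nothing, which is an assumption of the sketch, not Org's behaviour.
    """
    return object_type in OBJECT_RESTRICTIONS.get(container, frozenset())
```

This matches the observation in the quoted test: allowed_inside("citation-reference", "latex-fragment") is true, while "macro", "timestamp" and "export-snippet" are rejected.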
Hi Ihor,

> This is all defined in org-element-object-restrictions together with
> other restrictions on what kinds of objects can be placed inside other
> elements and objects:
>
> (minimal-set '(bold code entity italic latex-fragment strike-through
> subscript superscript underline verbatim))
> (citation-reference ,@minimal-set)

Ah thanks! I wasn’t aware of that. I think this is something that should be mentioned in the spec.

All the best,
Timothy
Hi All,

I've been away for a bit, but now I'm back and getting back into things 🙂.

For starters, I'm thinking we may as well put a description of the citation syntax into org-syntax. I'm still interested in more/broader changes to improve the clarity and specificity of org-syntax (and have had some more ideas on that front), but we may as well start by adding this missing section.

I think my current version resolves all the comments made so far, see below in case there's anything else that jumps out as needing changing.

The draft v3 follows:

-----

Citations follow the pattern

#+begin_example
[cite CITESTYLE: GLOBALPREFIX KEYCITES GLOBALSUFFIX]
#+end_example

where "cite" and =CITESTYLE=, =KEYCITES= and =GLOBALSUFFIX= are /not/ separated by whitespace. Whitespace after the leading colon or before the closing square bracket is not significant. All other whitespace is significant.

The only mandatory component, =KEYCITES= consists of one or more instances of the following pattern, separated by semicolons,

#+begin_example
KEYPREFIX @KEY KEYSUFFIX
#+end_example

where =KEYPREFIX=, =@KEY=, and =KEYSUFFIX= are /not/ separated by whitespace.

=KEY= can be made of any word-constituent character, =-=, =.=, =:=, =?=, =!=, =`=, ='=, =/=, =*=, =@=, =+=, =|=, =(=, =)=, ={=, =}=, =<=, =>=, =&=, =_=, =^=, =$=, =#=, =%=, or =~=.

=KEYPREFIX= and =KEYSUFFIX= are optional and can contain any characters other than a semicolon (=;=), so long as all square brackets are balanced. =KEYPREFIX= cannot contain any subsequence that forms a =KEY=. Hence, a minimal citation is formed by the pattern ~[cite:@KEY]~.

=CITESTYLE= consists of a main =STYLE= and optionally a =VARIANT=. Both the =STYLE= and =VARIANT= are prefixed by a forwards slash.

#+begin_example
/STYLE/VARIANT
#+end_example

=STYLE= and =VARIANT= can be made of any alphanumeric character, =_=, or =-=. Additionally, =VARIANT= can itself contain forward slashes (=/=).

=GLOBALPREFIX= and =GLOBALSUFFIX= can contain the same characters as =KEYPREFIX= and =KEYSUFFIX=. In the same manner as instances of the =KEYCITES= pattern, =KEYCITES=, =GLOBALPREFIX=, and =GLOBALSUFFIX= must be separated by semicolons.

------

Let me know if this looks alright.

All the best,
Timothy.
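To make the v3 draft concrete, here is a minimal Python sketch of a parser that follows it. It is an illustration under stated assumptions rather than a reference implementation: parse_citation and the returned dictionary layout are hypothetical, \w approximates "word-constituent character", square-bracket balancing inside affixes is not checked, and a semicolon-separated segment without any @KEY is taken to be the global prefix (if first) or global suffix (if last).

```python
import re

# KEY characters from the draft: word constituents plus the listed punctuation.
_KEY_RE = re.compile(r"@([\w\-.:?!`'/*@+|(){}<>&^$#%~]+)")

# "[cite", optional /STYLE/VARIANT, ":", body, "]". Whitespace directly after
# the colon and directly before the closing bracket is not significant.
_CITE_RE = re.compile(
    r"\[cite(?P<style>(?:/[\w/-]+)?):\s*(?P<body>.*?)\s*\]", re.DOTALL)


def parse_citation(text):
    """Parse one citation string into its components (sketch only)."""
    match = _CITE_RE.fullmatch(text)
    if match is None:
        raise ValueError("not a citation: %r" % text)
    segments = match.group("body").split(";")
    global_prefix = global_suffix = None
    references = []
    for index, segment in enumerate(segments):
        key = _KEY_RE.search(segment)
        if key is not None:
            # Splitting at the first key keeps KEYPREFIX free of any key.
            references.append(
                (segment[:key.start()], key.group(1), segment[key.end():]))
        elif index == 0 and len(segments) > 1:
            global_prefix = segment
        elif index == len(segments) - 1 and references:
            global_suffix = segment
        else:
            raise ValueError("segment without a key: %r" % segment)
    if not references:
        raise ValueError("a citation needs at least one @KEY")
    return {
        "style": match.group("style"),
        "global_prefix": global_prefix,
        "references": references,  # (KEYPREFIX, KEY, KEYSUFFIX) triples
        "global_suffix": global_suffix,
    }
```

For instance, parse_citation("[cite/t:see;@a p. 3;@b;fin]") gives the style "/t", the global prefix "see", the references ("", "a", " p. 3") and ("", "b", ""), and the global suffix "fin". The leading space of the key suffix is kept because whitespace there is significant.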