emacs-orgmode@gnu.org archives
* Formal syntax for org-cite
@ 2021-11-30 17:21 Timothy
  2021-11-30 23:24 ` Tom Gillespie
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Timothy @ 2021-11-30 17:21 UTC
  To: emacs-orgmode, Nicolas Goaziou


Hi All (and Nicolas in particular),

There's recently been a little bit of progress on the Pandoc issue for
org-cite support
(https://github.com/jgm/pandoc/issues/7329#issuecomment-982805313), and
the maintainers are now asking if there is any specification for
org-cite syntax.

Looking at https://orgmode.org/worg/dev/org-syntax.html, there isn't,
and so I've tried to come up with a draft. Most of it was worked out
from what I know of the syntax combined with looking at org-element.el;
however, I'm not sure what character class applies to the global and key
prefix/suffix components.

The draft:


Citation syntax is currently not documented, but from the implementation
it looks something like this:
#+begin_example
[cite CITESTYLE: GLOBALPREFIX KEYCITES GLOBALSUFFIX]
#+end_example

The only mandatory component, =KEYCITES=, consists of one or more
instances of the following pattern, separated by semicolons.
#+begin_example
KEYPREFIX @KEY KEYSUFFIX
#+end_example

=KEY= can be made of any word-constituent character, =-=, =.=, =:=, =?=,
=!=, =`=, ='=, =/=, =*=, =@=, =+=, =|=, =(=, =)=, ={=, =}=, =<=, =>=,
=&=, =_=, =^=, =$=, =#=, =%=, =%=, or =~=.

I have not yet confirmed what =KEYPREFIX= and =KEYSUFFIX= may contain,
but as a starting point, any of the characters allowed in =KEY= except
=@= plus whitespace would seem fairly safe. =KEYSUFFIX= must start with
a whitespace character to be able to be differentiated from =KEY=.

=CITESTYLE= consists of a main =STYLE= and any number of =VARIANT=​s
(including zero), prefixed by forwards slashes in the following pattern
#+begin_example
/STYLE/VARIANT/VARIANT/VARIANT
#+end_example

=STYLE= and =VARIANT= can be made of any alphanumeric character, =_=, or =-=.

=GLOBALPREFIX= and =GLOBALSUFFIX= can contain the same characters as
=KEYPREFIX= and =KEYSUFFIX=, however =GLOBALPREFIX= must end with a
semicolon, and =GLOBALSUFFIX= must start with a semicolon.

"cite" and =CITESTYLE=, =KEYCITES= and =GLOBALSUFFIX= are /not/
separated by whitespace. Neither are =KEYPREFIX=, =@KEY=, or =KEYSUFFIX=
separated by whitespace.


Do let me know what you think :)

Timothy.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Formal syntax for org-cite
  2021-11-30 17:21 Formal syntax for org-cite Timothy
@ 2021-11-30 23:24 ` Tom Gillespie
  2021-12-01 13:16 ` Nicolas Goaziou
  2021-12-30  8:27 ` Timothy
  2 siblings, 0 replies; 8+ messages in thread
From: Tom Gillespie @ 2021-11-30 23:24 UTC
  To: Timothy; +Cc: emacs-orgmode, Nicolas Goaziou

Hi Timothy,
    Thanks for putting this together. Comments inline. Best!
Tom

For reference, here is the tokenizer pattern I use in laundry at the moment.
There are a number of issues with it...
https://github.com/tgbugs/laundry/blob/5a396bef98d9a3cd9ee929f21cd47612dd6cb1ac/laundry/lex-abbrev.rkt#L896-L913

> Citation syntax is currently not documented, but from the implementation
> it looks something like this:
> #+begin_example
> [cite CITESTYLE: GLOBALPREFIX KEYCITES GLOBALSUFFIX]
> #+end_example

There is potential confusion here because =[cite= does not have to be
followed by a space (rather, cannot be).

The top level syntax is =[cite= terminating at the first occurrence of =]=.
I think we may also need to include a note that no whitespace is allowed either?
It will only be recognized within paragraph context (e.g. headings, paragraphs,
and other places where org objects can appear). Stating that up front would
clarify that the rest of the syntax described here is how to determine whether
the citation is well formed/how to parse it.

> =KEY= can be made of any word-constituent character, =-=, =.=, =:=, =?=,
> =!=, =`=, ='=, =/=, =*=, =@=, =+=, =|=, =(=, =)=, ={=, =}=, =<=, =>=,
> =&=, =_=, =^=, =$=, =#=, =%=, =%=, or =~=.

You have a duplicated =%= here.

> I have not yet confirmed what =KEYPREFIX= and =KEYSUFFIX= may contain,
> but as a starting point, any of the characters allowed in =KEY= except
> =@= plus whitespace would seem fairly safe. =KEYSUFFIX= must start with
> a whitespace character to be able to be differentiated from =KEY=.

I don't think we can allow whitespace here?

> =CITESTYLE= consists of a main =STYLE= and any number of =VARIANT=s
> (including zero), prefixed by forwards slashes in the following pattern
> #+begin_example
> /STYLE/VARIANT/VARIANT/VARIANT
> #+end_example

Need clarification on empty styles, e.g. =[cite//:]=.

> "cite" and =CITESTYLE=, =KEYCITES= and =GLOBALSUFFIX= are /not/
> separated by whitespace. Neither are =KEYPREFIX=, =@KEY=, or =KEYSUFFIX=
> separated by whitespace.

I may be missing something, but this is confusing with respect to the
statement about =KEYSUFFIX= and whitespace made above.


* Re: Formal syntax for org-cite
  2021-11-30 17:21 Formal syntax for org-cite Timothy
  2021-11-30 23:24 ` Tom Gillespie
@ 2021-12-01 13:16 ` Nicolas Goaziou
  2021-12-02  6:26   ` Timothy
  2021-12-30  8:27 ` Timothy
  2 siblings, 1 reply; 8+ messages in thread
From: Nicolas Goaziou @ 2021-12-01 13:16 UTC
  To: Timothy; +Cc: emacs-orgmode

Hello,

Timothy <tecosaur@gmail.com> writes:

> Looking at https://orgmode.org/worg/dev/org-syntax.html, there isn't,

Yup, I forgot to update it.

> I have not yet confirmed what =KEYPREFIX= and =KEYSUFFIX= may contain,
> but as a starting point, any of the characters allowed in =KEY= except
> =@= plus whitespace would seem fairly safe. =KEYSUFFIX= must start with
> a whitespace character to be able to be differentiated from =KEY=.

KEYPREFIX may contain neither a semicolon nor any combination forming a
key (an at-sign followed by a word character or some symbols). Square
brackets are allowed only if they form a symmetric pair. Any other
character is allowed.

KEYSUFFIX has the same restrictions, minus the limitation about the key.

> =CITESTYLE= consists of a main =STYLE= and any number of =VARIANT=​s
> (including zero), prefixed by forwards slashes in the following pattern
>
> #+begin_example
> /STYLE/VARIANT/VARIANT/VARIANT
> #+end_example

Nope. This is only /STYLE/VARIANT; however, VARIANT can contain the "/" character.

> =STYLE= and =VARIANT= can be made of any alphanumeric character, =_=, or =-=.
>
> =GLOBALPREFIX= and =GLOBALSUFFIX= can contain the same characters as
> =KEYPREFIX= and =KEYSUFFIX=, however =GLOBALPREFIX= must end with a
> semicolon, and =GLOBALSUFFIX= must start with a semicolon.

Note that the semicolons do not belong to the affixes.

> "cite" and =CITESTYLE=, =KEYCITES= and =GLOBALSUFFIX= are /not/
> separated by whitespace. Neither are =KEYPREFIX=, =@KEY=, or =KEYSUFFIX=
> separated by whitespace.

Addendum: whitespace is not significant after the leading colon or
before the closing square bracket. It is significant in all other
cases.

HTH,

Regards,
-- 
Nicolas Goaziou


* Re: Formal syntax for org-cite
  2021-12-01 13:16 ` Nicolas Goaziou
@ 2021-12-02  6:26   ` Timothy
  2021-12-02 12:13     ` Timothy
  0 siblings, 1 reply; 8+ messages in thread
From: Timothy @ 2021-12-02  6:26 UTC
  To: Nicolas Goaziou, Tom Gillespie; +Cc: emacs-orgmode


Hi Nicolas,

Thank you for your feedback and clarifications. They were most helpful.
I've revised the draft in light of your comments; see below.
Thomas, I've also made a few tweaks based on your reply.

Citations follow the pattern
#+begin_example
[cite CITESTYLE: GLOBALPREFIX KEYCITES GLOBALSUFFIX]
#+end_example
where "cite" and =CITESTYLE=, =KEYCITES= and =GLOBALSUFFIX= are /not/
separated by whitespace. Whitespace after the leading colon or before
the closing square bracket is not significant. All other whitespace is
significant.

The only mandatory component, =KEYCITES=, consists of one or more
instances of the following pattern, separated by semicolons,
#+begin_example
KEYPREFIX @KEY KEYSUFFIX
#+end_example
where =KEYPREFIX=, =@KEY=, and =KEYSUFFIX= are /not/ separated by
whitespace.

=KEY= can be made of any word-constituent character, =-=, =.=, =:=, =?=,
=!=, =`=, ='=, =/=, =*=, =@=, =+=, =|=, =(=, =)=, ={=, =}=, =<=, =>=,
=&=, =_=, =^=, =$=, =#=, =%=, or =~=.

=KEYPREFIX= and =KEYSUFFIX= are optional and can contain any characters
other than a semicolon (=;=), so long as square brackets are balanced.
=KEYPREFIX= cannot contain any subsequence that forms a =KEY=.

Hence, a minimal citation is formed by the pattern ~[cite:@KEY]~.

=CITESTYLE= consists of a main =STYLE= and optionally a =VARIANT=​.
Both the =STYLE= and =VARIANT= are prefixed by a forward slash.
#+begin_example
/STYLE/VARIANT
#+end_example
=STYLE= and =VARIANT= can be made of any alphanumeric character, =_=, or
=-=. Additionally, =VARIANT= can itself contain forward slashes (=/=).

=GLOBALPREFIX= and =GLOBALSUFFIX= can contain the same characters as
=KEYPREFIX= and =KEYSUFFIX=. Just as the instances within =KEYCITES= are
separated by semicolons, =GLOBALPREFIX=, =KEYCITES=, and =GLOBALSUFFIX=
must also be separated from each other by semicolons.

--
Timothy


* Re: Formal syntax for org-cite
  2021-12-02  6:26   ` Timothy
@ 2021-12-02 12:13     ` Timothy
  2021-12-02 12:50       ` Ihor Radchenko
  0 siblings, 1 reply; 8+ messages in thread
From: Timothy @ 2021-12-02 12:13 UTC
  To: Nicolas Goaziou, Tom Gillespie; +Cc: emacs-orgmode

Hi Nicolas, Tom,

I just thought I’d do a little test to see what markup citations accept with
org-element, and I’m not quite sure what to make of the results. I’d appreciate
hearing your thoughts on the example below.

Let’s look at a strange citation.

┌────
│ [cite: @key *bold* text @@org:hey@@ and /italic/ and {{{macro()}}} and [fn:3] and
│ <<target>> <2021-12-02 Thu> \latex and \alpha and a statscookie [3/7] src_lang{hi}
│ s^{up} and s_{ub}]
└────

Examining the interpretation of this citation with `org-element', the following
inline elements / objects are recognised.
⁃ bold/italic/etc. (text markup elements)
⁃ latex fragment
⁃ super/subscript
but most of the non-markup elements are not.
⁃ Export snippet
⁃ macro
⁃ footnote
⁃ target
⁃ timestamp
⁃ statistics cookie
⁃ inline source code

All the best,
Timothy

* Re: Formal syntax for org-cite
  2021-12-02 12:13     ` Timothy
@ 2021-12-02 12:50       ` Ihor Radchenko
  2021-12-02 12:55         ` Timothy
  0 siblings, 1 reply; 8+ messages in thread
From: Ihor Radchenko @ 2021-12-02 12:50 UTC
  To: Timothy; +Cc: Tom Gillespie, emacs-orgmode, Nicolas Goaziou

Timothy <tecosaur@gmail.com> writes:

> Let’s look at a strange citation.
>
> ┌────
> │ [cite: @key *bold* text @@org:hey@@ and /italic/ and {{{macro()}}} and [fn:3] and
> │ <<target>> <2021-12-02 Thu> \latex and \alpha and a statscookie [3/7] src_lang{hi}
> │ s^{up} and s_{ub}]
> └────
>
> Examining the interpretation of this citation with `org-element', the following
> inline elements / objects are recognised.
> ⁃ bold/italic/etc. (text markup elements)
> ⁃ latex fragment
> ⁃ super/subscript
> but most of the non-markup elements are not.
> ⁃ Export snippet
> ⁃ macro
> ⁃ footnote
> ⁃ target
> ⁃ timestamp
> ⁃ statistics cookie
> ⁃ inline source code

This is all defined in org-element-object-restrictions together with
other restrictions on what kinds of objects can be placed inside other
elements and objects:

(minimal-set '(bold code entity italic latex-fragment strike-through
               subscript superscript underline verbatim))
(citation-reference ,@minimal-set)
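The quoted entry can be read as plain data. As an illustration only, here is a hypothetical Python restatement (not Org's code; =MINIMAL_SET=, =OBJECT_RESTRICTIONS= and =allowed_inside= are made-up names) of what the two lines above say about citation references. It is consistent with the results quoted above: markup, entities, LaTeX fragments and sub/superscripts are allowed, while macros, timestamps, footnotes and the rest are not.

```python
# The objects listed in minimal-set, as quoted from
# org-element-object-restrictions.
MINIMAL_SET = frozenset({
    "bold", "code", "entity", "italic", "latex-fragment",
    "strike-through", "subscript", "superscript", "underline", "verbatim",
})

# (citation-reference ,@minimal-set): a citation reference may contain
# exactly the object types in the minimal set.  Only this one entry of
# the real alist is restated here.
OBJECT_RESTRICTIONS = {"citation-reference": MINIMAL_SET}


def allowed_inside(container, obj_type):
    """True if obj_type may appear inside an object of type container."""
    return obj_type in OBJECT_RESTRICTIONS.get(container, frozenset())
```

For example, =allowed_inside("citation-reference", "bold")= is true, while the same call with ="macro"= or ="timestamp"= is false.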

Best,
Ihor


* Re: Formal syntax for org-cite
  2021-12-02 12:50       ` Ihor Radchenko
@ 2021-12-02 12:55         ` Timothy
  0 siblings, 0 replies; 8+ messages in thread
From: Timothy @ 2021-12-02 12:55 UTC
  To: Ihor Radchenko; +Cc: Tom Gillespie, emacs-orgmode, Nicolas Goaziou

Hi Ihor,

> This is all defined in org-element-object-restrictions together with
> other restrictions on what kinds of objects can be placed inside other
> elements and objects:
>
> (minimal-set '(bold code entity italic latex-fragment strike-through
>                subscript superscript underline verbatim))
> (citation-reference ,@minimal-set)

Ah thanks! I wasn’t aware of that. I think this is something that should be
mentioned in the spec.

All the best,
Timothy

* Re: Formal syntax for org-cite
  2021-11-30 17:21 Formal syntax for org-cite Timothy
  2021-11-30 23:24 ` Tom Gillespie
  2021-12-01 13:16 ` Nicolas Goaziou
@ 2021-12-30  8:27 ` Timothy
  2 siblings, 0 replies; 8+ messages in thread
From: Timothy @ 2021-12-30  8:27 UTC
  To: emacs-orgmode, Nicolas Goaziou


Hi All,

I've been away for a bit, but now I'm back and catching up 🙂. For
starters, I'm thinking we may as well put a description of the
citation syntax into org-syntax. I'm still interested in more/broader
changes to improve the clarity and specificity of org-syntax (and have
had some more ideas on that front), but we may as well start by adding
this missing section.

I think my current version addresses all the comments made so far; see
below in case anything else jumps out as needing to change.

The draft v3 follows:

-----

Citations follow the pattern
#+begin_example
[cite CITESTYLE: GLOBALPREFIX KEYCITES GLOBALSUFFIX]
#+end_example
where "cite" and =CITESTYLE=, =KEYCITES= and =GLOBALSUFFIX= are /not/
separated by whitespace. Whitespace after the leading colon or before
the closing square bracket is not significant. All other whitespace is
significant.
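To make the top-level shape concrete, here is a minimal Python sketch (an illustration, not org-element's implementation) that recognises the opening =[cite=, an optional =CITESTYLE= and the colon, then finds the closing bracket. The function name =split_citation= is hypothetical, the style part of the regex follows the =CITESTYLE= rule given later in this draft, and square brackets inside the body are assumed to be balanced.

```python
import re

# Opening "[cite", an optional CITESTYLE ("/STYLE" plus an optional
# "/VARIANT" that may itself contain "/"), then the colon.
_HEAD = re.compile(r"\[cite((?:/[\w-]+)(?:/[\w/-]+)?)?:")


def split_citation(text):
    """Return (citestyle_or_None, body) for a citation at the start of text."""
    m = _HEAD.match(text)
    if m is None:
        raise ValueError("not a citation")
    depth = 0
    for i in range(m.end(), len(text)):
        ch = text[i]
        if ch == "[":
            depth += 1
        elif ch == "]":
            if depth == 0:
                # An unmatched "]" closes the citation.  Whitespace right
                # after the colon and right before it is not significant.
                return m.group(1), text[m.end():i].strip()
            depth -= 1
    raise ValueError("unterminated citation")
```

For instance, =split_citation("[cite/t/b: see @doe ]")= returns =("/t/b", "see @doe")=, and a bracketed locator such as =[cite:@a [p. 3]]= keeps =@a [p. 3]= as its body.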

The only mandatory component, =KEYCITES=, consists of one or more instances of the
following pattern, separated by semicolons,
#+begin_example
KEYPREFIX @KEY KEYSUFFIX
#+end_example
where =KEYPREFIX=, =@KEY=, and =KEYSUFFIX= are /not/ separated by whitespace.

=KEY= can be made of any word-constituent character, =-=, =.=, =:=, =?=,
=!=, =`=, ='=, =/=, =*=, =@=, =+=, =|=, =(=, =)=, ={=, =}=, =<=, =>=,
=&=, =_=, =^=, =$=, =#=, =%=, or =~=.
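The character list above translates directly into a regular-expression character class. A minimal Python sketch, assuming Python's Unicode =\w= (letters, digits and =_=) is an acceptable stand-in for Emacs' "word-constituent" syntax class, which it only approximates; =is_valid_key= and =find_keys= are hypothetical helper names:

```python
import re

# KEY characters as listed in the draft.  Python's \w only approximates
# Emacs' "word-constituent" syntax class.
KEY_CHARS = r"[\w\-.:?!`'/*@+|(){}<>&^$#%~]"
KEY_RE = re.compile("@(" + KEY_CHARS + "+)")


def is_valid_key(key):
    """True if every character of key is allowed in a KEY."""
    return re.fullmatch(KEY_CHARS + "+", key) is not None


def find_keys(text):
    """Return the keys of all @KEY occurrences in text, in order."""
    return KEY_RE.findall(text)
```

Note that "," and ";" are not in the class, so =find_keys("see @doe2021, p. 3")= yields =["doe2021"]=, while a trailing "." would be taken as part of the key.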

=KEYPREFIX= and =KEYSUFFIX= are optional and can contain any characters
other than a semicolon (=;=), so long as all square brackets are
balanced. =KEYPREFIX= cannot contain any subsequence that forms a =KEY=.

Hence, a minimal citation is formed by the pattern ~[cite:@KEY]~.
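The affix rules above can be checked mechanically. A minimal Python sketch, where =valid_affix= and =brackets_balanced= are hypothetical helpers and "forms a =KEY=" is assumed to mean an =@= followed by at least one =KEY= character (following Nicolas's description of an at-sign followed by a word character or some symbols):

```python
import re

# Assumption: a subsequence "forms a KEY" when "@" is followed by at
# least one character from the KEY character class.
_KEY_LIKE = re.compile(r"@[\w\-.:?!`'/*@+|(){}<>&^$#%~]")


def brackets_balanced(text):
    """True if every "]" closes an earlier "[" and none are left open."""
    depth = 0
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def valid_affix(text, is_prefix):
    """Check a KEYPREFIX (is_prefix=True) or KEYSUFFIX against the draft."""
    if ";" in text or not brackets_balanced(text):
        return False
    # Only a prefix is forbidden from containing something key-like.
    if is_prefix and _KEY_LIKE.search(text):
        return False
    return True
```

So =valid_affix("email me@x ", True)= is rejected as a prefix but accepted as a suffix, and =", p. [3]"= is a valid suffix because its brackets are balanced.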

=CITESTYLE= consists of a main =STYLE= and optionally a =VARIANT=​.
Both the =STYLE= and =VARIANT= are prefixed by a forward slash.
#+begin_example
/STYLE/VARIANT
#+end_example
=STYLE= and =VARIANT= can be made of any alphanumeric character, =_=, or
=-=. Additionally, =VARIANT= can itself contain forward slashes (=/=).
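Because only =VARIANT= may contain =/=, the first slash-delimited segment is always the =STYLE= and everything after the second slash is the =VARIANT=. A minimal Python sketch of that split (=parse_citestyle= is a hypothetical name; Python's =\w= is assumed to stand in for "alphanumeric or =_=", and empty styles such as =[cite//:]= are simply rejected since the draft does not settle them):

```python
import re

# "/STYLE" followed by an optional "/VARIANT"; only VARIANT may contain "/".
STYLE_RE = re.compile(r"/([\w-]+)(?:/([\w/-]+))?")


def parse_citestyle(s):
    """Split a CITESTYLE such as "/t/b/c" into ("t", "b/c")."""
    m = STYLE_RE.fullmatch(s)
    if m is None:
        raise ValueError(f"invalid CITESTYLE: {s!r}")
    return m.group(1), m.group(2)
```

For example, =parse_citestyle("/t")= gives =("t", None)= and =parse_citestyle("/t/b/c")= gives =("t", "b/c")=.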

=GLOBALPREFIX= and =GLOBALSUFFIX= can contain the same characters as
=KEYPREFIX= and =KEYSUFFIX=. Just as the instances within =KEYCITES= are
separated by semicolons, =GLOBALPREFIX=, =KEYCITES=, and =GLOBALSUFFIX=
must also be separated from each other by semicolons.
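Since neither the affixes nor =KEY= may contain a semicolon, the body of a citation (the text between the colon and the closing bracket) can be split on =;= directly: a keyless first part is the =GLOBALPREFIX=, a keyless last part is the =GLOBALSUFFIX=, and every other part must be a =KEYPREFIX @KEY KEYSUFFIX= instance. A minimal Python sketch under those assumptions (=split_body= is a hypothetical name, and the first =@KEY= in a part is taken as its key, because only =KEYSUFFIX= may contain key-like text):

```python
import re

KEY_RE = re.compile(r"@([\w\-.:?!`'/*@+|(){}<>&^$#%~]+)")


def split_body(body):
    """Return (global_prefix, [(prefix, key, suffix), ...], global_suffix)."""
    parts = body.split(";")
    has_key = [KEY_RE.search(p) is not None for p in parts]
    if not any(has_key):
        raise ValueError("a citation needs at least one KEYCITE")
    first = has_key.index(True)
    last = len(parts) - 1 - has_key[::-1].index(True)
    # Only the very first and very last parts may lack a key.
    if first > 1 or last < len(parts) - 2 or not all(has_key[first:last + 1]):
        raise ValueError("malformed citation body")
    global_prefix = parts[0] if first == 1 else None
    global_suffix = parts[-1] if last == len(parts) - 2 else None
    cites = []
    for p in parts[first:last + 1]:
        m = KEY_RE.search(p)  # the first @KEY in the part is its key
        cites.append((p[:m.start()], m.group(1), p[m.end():]))
    return global_prefix, cites, global_suffix
```

For example, =split_body("See ;@doe, p. 3;@roe; and others")= returns =("See ", [("", "doe", ", p. 3"), ("", "roe", "")], " and others")=; the semicolons themselves are not part of any affix.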

------

Let me know if this looks alright.

All the best,
Timothy.


Code repositories for project(s) associated with this inbox:

	https://git.savannah.gnu.org/cgit/emacs/org-mode.git
