emacs-orgmode@gnu.org archives
* Some commentary on the Org Syntax document
@ 2021-12-02  9:23 Timothy
  2021-12-02 19:00 ` Tom Gillespie
  0 siblings, 1 reply; 13+ messages in thread
From: Timothy @ 2021-12-02  9:23 UTC (permalink / raw)
  To: emacs-orgmode


Hi All (& Nicolas in particular again),

With my recent efforts to write a parser based on
<https://orgmode.org/worg/dev/org-syntax.html>, I’ve developed a few thoughts on
that document. Hopefully, they can lead to some improvements and
clarifications.

――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――

As a general comment, in many places the Org Syntax document states what
characters a component can contain, but not what objects/elements. This feels
like a bit of a hole in the current specifications.


Sections
════════

Heading
───────

⁃ Ok, so `TITLE' can have any character but a newline, but what Org components can it contain?
  I’m going to assume any object?


Affiliated Keywords
═══════════════════


Greater Elements
════════════════

Greater blocks
──────────────

⁃ It is not explained what is meant by a “special block”
⁃ Aren’t lines starting with `#+' also quoted by a comma?


Drawers and Property Drawers
────────────────────────────

⁃ “Contents can contain any element but another drawer”
  • Does “any element” mean “any Element or Greater Element”


Dynamic Blocks
──────────────

⁃ It is not specified what `CONTENTS' may be
⁃ Surely `PARAMETERS' cannot contain a newline?


Plain Lists and Items
─────────────────────

⁃ It is not completely clear what content an item may have.
  I assume any Object?


Tables
──────

⁃ Surely newlines are not allowed in `FORMULAS'


Elements
════════

Clocks
──────

Two allowed forms are listed, but are all four of the below allowed or only two?
┌────
│ CLOCK: INACTIVE-TIMESTAMP
│ CLOCK: INACTIVE-TIMESTAMP DURATION
│ CLOCK: INACTIVE-TIMESTAMP-RANGE
│ CLOCK: INACTIVE-TIMESTAMP-RANGE DURATION
└────

All the best,
Timothy


* Re: Some commentary on the Org Syntax document
  2021-12-02  9:23 Some commentary on the Org Syntax document Timothy
@ 2021-12-02 19:00 ` Tom Gillespie
  2021-12-02 19:16   ` Timothy
  0 siblings, 1 reply; 13+ messages in thread
From: Tom Gillespie @ 2021-12-02 19:00 UTC (permalink / raw)
  To: Timothy; +Cc: emacs-orgmode

Hi Timothy,
    Replies in line. Best!
Tom

On Thu, Dec 2, 2021 at 1:32 AM Timothy <tecosaur@gmail.com> wrote:
>
> Hi All (& Nicolas in particular again),
>
> With my recent efforts to write a parser based on
> <https://orgmode.org/worg/dev/org-syntax.html>, I’ve developed a few thoughts on
> that document. Hopefully, they can lead to some improvements and
> clarifications.
>
> ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
>
> As a general comment, in many places the Org Syntax document states what
> characters a component can contain, but not what objects/elements. This feels
> like a bit of a hole in the current specifications.

This is indeed confusing because there are some implicit constraints that
are not listed because they never come up. For example, you cannot have two
newlines inside an inline footnote because the two newlines break the
paragraph and the thing that appears to be an inline footnote is just plain
text that is never terminated.
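
A rough Python sketch of that behaviour, assuming paragraphs are split on
empty lines before any objects are searched for. The function names and the
simplified footnote pattern are made up for illustration; this is not how
org-element implements it.

```python
import re

# Simplified inline footnote: [fn:LABEL:DEFINITION] with an optional label.
# The definition may span several lines, but it can never cross a paragraph
# boundary because paragraphs are split first.
INLINE_FOOTNOTE = re.compile(r"\[fn:[\w-]*:[^\]]*\]")


def split_paragraphs(text):
    """Split text on empty (or whitespace-only) lines."""
    return [p for p in re.split(r"\n[ \t]*\n", text) if p.strip()]


def inline_footnotes(text):
    """Return the inline footnotes found in each paragraph of text."""
    return [INLINE_FOOTNOTE.findall(p) for p in split_paragraphs(text)]
```

With a single newline inside the brackets the footnote is found; with an
empty line inside them, neither resulting paragraph contains a complete
footnote, so both halves are left as plain text.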

Keeping font locking in sync with org-element and org-export is critical
so that users know what will actually happen.

>
>
> Sections
> ════════
>
> Heading
> ───────
>
> ⁃ Ok, so `TITLE' can have any character but a newline, but what Org components can it contain?
>   I’m going to assume any object?

Via org-element-object-restrictions it is standard-set-no-line-break, which is
all objects except citation-reference, table-cell, and line-break.
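
As a sketch of how that set is derived (in Python rather than Elisp, and with
the object names written down from memory of `org-element-all-objects`, so
treat the list as an assumption to check against your Org version):

```python
# Object type names as recalled from `org-element-all-objects`; this list is
# an assumption and may differ between Org versions.
ALL_OBJECTS = frozenset({
    "bold", "citation", "citation-reference", "code", "entity",
    "export-snippet", "footnote-reference", "inline-babel-call",
    "inline-src-block", "italic", "line-break", "latex-fragment", "link",
    "macro", "radio-target", "statistics-cookie", "strike-through",
    "subscript", "superscript", "table-cell", "target", "timestamp",
    "underline", "verbatim",
})

# The standard set drops the two objects that only make sense inside a
# citation or a table row.
STANDARD_SET = ALL_OBJECTS - {"citation-reference", "table-cell"}

# Heading titles additionally forbid line breaks.
STANDARD_SET_NO_LINE_BREAK = STANDARD_SET - {"line-break"}


def allowed_in_title(object_type):
    """Hypothetical helper: may this object type appear in a heading title?"""
    return object_type in STANDARD_SET_NO_LINE_BREAK
```

`allowed_in_title` is not an Org function; it only illustrates that the
restriction amounts to a set difference.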

>
>
> Affiliated Keywords
> ═══════════════════
>
>
> Greater Elements
> ════════════════
>
> Greater blocks
> ──────────────
>
> ⁃ It is not explained what is meant by a “special block”
> ⁃ Aren’t lines starting with `#+' also quoted by a comma?
>
>
> Drawers and Property Drawers
> ────────────────────────────
>
> ⁃ “Contents can contain any element but another drawer”
>   • Does “any element” mean “any Element or Greater Element”

Any element that does not have greater precedence, so the only thing
excluded would be a heading.

>
> Dynamic Blocks
> ──────────────
>
> ⁃ It is not specified what `CONTENTS' may be

Implicitly follows the same rules as drawers, no headings
and no nesting of dynamic blocks. Text should be added
that states this explicitly.

> ⁃ Surely `PARAMETERS' cannot contain a newline?

Termination by newline is implicit in the example, but the text is confusing.

> Plain Lists and Items
> ─────────────────────
>
> ⁃ It is not completely clear what content an item may have.
>   I assume any Object?

By my reading it may contain anything, objects and elements,
except for a heading, but that is already implied by the de-indent.

To quote from the docs:

An item ends before the next item, the first line less or equally
indented than its starting line, or two consecutive empty lines.
Indentation of lines within other greater elements do not count,
neither do inlinetasks boundaries.

This makes plain lists one of the most complex elements to parse.
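
A minimal Python sketch of the quoted rule, under simplifying assumptions:
indentation is counted in spaces only, and the exceptions for lines inside
other greater elements and for inlinetask boundaries are ignored. The
function names are hypothetical.

```python
from typing import List


def indent_of(line: str) -> int:
    """Number of leading spaces (tabs are not handled in this sketch)."""
    return len(line) - len(line.lstrip(" "))


def item_end(lines: List[str], start: int) -> int:
    """Return the index one past the last line of the item at `start`.

    The item ends at the first non-empty line indented no deeper than its
    first line (this covers a following sibling item), or just before two
    consecutive empty lines, whichever comes first.
    """
    base = indent_of(lines[start])
    blank_run = 0
    for i in range(start + 1, len(lines)):
        if lines[i].strip() == "":
            blank_run += 1
            if blank_run == 2:
                return i - 1  # stop before the pair of empty lines
            continue
        blank_run = 0
        if indent_of(lines[i]) <= base:
            return i
    return len(lines)
```

Nested items are more deeply indented than their parent, so they fall out
naturally as part of the parent item's contents.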

>
> Tables
> ──────
>
> ⁃ Surely newlines are not allowed in `FORMULAS'

The absence of newlines is implicit in the use of "lines", but it is still confusing.

>
> Elements
> ════════
>
> Clocks
> ──────
>
> Two allowed forms are listed, but are all four of the below allowed or only two?
> ┌────
> │ CLOCK: INACTIVE-TIMESTAMP
> │ CLOCK: INACTIVE-TIMESTAMP DURATION
> │ CLOCK: INACTIVE-TIMESTAMP-RANGE
> │ CLOCK: INACTIVE-TIMESTAMP-RANGE DURATION
> └────

No. Only the two are allowed. An inactive timestamp alone is a
starting point, adding a duration without the end point means
that there is no way to check that the range and duration match.
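
To make the two accepted forms concrete, here is a hedged Python sketch. The
timestamp and duration patterns are deliberately simplified (no repeaters or
delays), and the labels "running" and "closed" are my own, not terms from the
syntax document.

```python
import re

# Simplified inactive timestamp: [YYYY-MM-DD], optionally followed by a day
# name and an H:MM or HH:MM time. Real Org timestamps allow more than this.
TS = r"\[\d{4}-\d{2}-\d{2}(?: [^\s\d\]]+)?(?: \d{1,2}:\d{2})?\]"
DURATION = r"=>\s+\d+:\d{2}"

CLOCK_RE = re.compile(
    rf"^\s*CLOCK: (?:(?P<range>{TS}--{TS})\s+(?P<duration>{DURATION})"
    rf"|(?P<start>{TS}))\s*$"
)


def classify_clock(line):
    """Return "running", "closed", or None if the line is not a valid clock."""
    m = CLOCK_RE.match(line)
    if m is None:
        return None
    return "closed" if m.group("range") else "running"
```

A lone timestamp followed by a duration, or a range without one, matches
neither alternative and is rejected.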

> All the best,
> Timothy



* Re: Some commentary on the Org Syntax document
  2021-12-02 19:00 ` Tom Gillespie
@ 2021-12-02 19:16   ` Timothy
  2021-12-04  5:26     ` Tom Gillespie
  2021-12-04  6:17     ` Ihor Radchenko
  0 siblings, 2 replies; 13+ messages in thread
From: Timothy @ 2021-12-02 19:16 UTC (permalink / raw)
  To: Tom Gillespie; +Cc: emacs-orgmode


Hi Tom,

Thanks for your comments, they've been most helpful.
I have some comments on your comments, and have also started drafting
some tweaks to the document in light of your initial comments, put as a
diff excerpt at the end of this email.

For starters, I have some more general comments. However, this has
turned out a bit longer than I intended. Unfortunately I am moments away
from heading to bed, so to quote Pascal "I have only made this letter
longer because I have not had the time to make it shorter".

I think a big problem is the mix of implicit and explicit information.
Some components are rigorously specified in terms of the characters they
may contain, elements and objects that are recognised inside them, and
even the order in which different parts of the pattern are parsed.

As mentioned originally, the current Dynamic Blocks description doesn't
even mention the CONTENTS part of the pattern, and relies on the reader
inferring that it operates similarly to the CONTENTS part of Drawers.

Forcing the reader to start making inferences like this is a treacherous
path, and I think I can blame it for some of the other issues I've
experienced. Take for instance the "surely X can't contain a newline?"
comments I've made. In the Node Properties and Entities descriptions you
have statements along the lines of "X can contain any character [...]
except a newline". In my mind this then sets up the reader to interpret
a similar statement without the "except a newline" clause to mean that
newlines are permitted.

I'm also thinking that the term "element" is overworked in the document.
It's basically pulling triple duty: you have Elements, Greater
Elements, and elements which are Elements and/or Greater Elements 😓.

The naming here is quite understandable, and I think we all know that
naming things well isn't easy, but I think it would behove us to try to
give each term a single unique meaning across the document --- or at
least try to come as close to that as reasonably possible.

I think we may be able to improve this by tweaking the hierarchy of
terms and then applying it rigorously throughout the document.

At the highest level, I think we want to encapsulate Headlines,
Sections, Greater Elements, Elements, and Objects. I suppose we might
call these the *components* of an Org document. Then we have the group
of Element and Greater Elements, which are useful to clump together.
Each component is usually given in terms of a number of forms or
patterns, which usually contain terms which are elucidated in the
description of that component.

So, the hierarchy appears to be something like:

1. (Headline / Section / Greater Element / Element / Object)
2. Headline
3. Section
4. Greater Element
5. (Greater Element / Element)
6. Element
7. Object
8. Pattern / Form
9. Term

We could, say, call (1) Components, (7) Units, (6) Objects, (5) Element or
Object (why not spell it out to avoid telling people to remember
something).

I could have put more thought into this, but it should do for
illustrating my line of thinking. Let me know if you have any good
ideas.

A separate improvement could be using more formatting to distinguish
when terms are used in a particular way.


Now for a few specific comments.

Tom Gillespie <tgbugs@gmail.com> writes:

>> As a general comment, in many places the Org Syntax document states what
>> characters a component can contain, but not what objects/elements. This feels
>> like a bit of a hole in the current specifications.
>
> This is indeed confusing because there are some implicit constraints
> that are not listed because they never come up.

I've sort of covered this before, but I think the document would benefit
from being more explicit in general.

> For example, you cannot have two newlines
> inside an inline footnote because the two newlines break the paragraph and the
> thing that appears to be an inline footnote is just plain text that is
> never terminated.

Specifically regarding newlines, perhaps we could add something like
this to the start of the Objects section?

"Furthermore, while many objects may contain newlines, an empty line
(i.e. a double newline) often terminates the element that the object is
a part of, such as a paragraph."

> Ensuring that font locking is in sync org-element and org-export is
> critical to ensure that users know what will actually happen.

On this, I'm cautiously optimistic about the discussion about using
org-element for fontification.

>> Heading
>> ───────
>>
>> ⁃ Ok, so `TITLE' can have any character but a newline, but what Org components can it contain?
>>   I’m going to assume any object?
>
> Via org-element-object-restrictions it is standard-set-no-line-break, which is
> all objects except citation-reference, table-cell, and line-break.

I must thank you and Ihor for pointing me to
org-element-object-restrictions! I wasn't aware of that till now, and
it's most helpful. Should all the information given by it be included in
the Syntax document? I lean towards saying yes.

>>
>> Drawers and Property Drawers
>> ────────────────────────────
>>
>> ⁃ “Contents can contain any element but another drawer”
>>   • Does “any element” mean “any Element or Greater Element”
>
> Any element that does not have greater precedence, so that would
> be only a heading.

I'm not sure this element = Element / Greater Element "shorthand" is
doing us any favours, but I've discussed that already...

>>
>> Dynamic Blocks
>> ──────────────
>>
>> ⁃ It is not specified what `CONTENTS' may be
>
> Implicitly follows the same rules as drawers, no headings
> and no nesting of dynamic blocks. Text should be added
> that states this explicitly.

I'm drafting some changes, and this change has been added.

>> ⁃ Surely `PARAMETERS' cannot contain a newline?
>
> Termination by newline is implicit in the example, but the text is confusing.

Made explicit in my draft.

>> Plain Lists and Items
>> ─────────────────────
>>
>> ⁃ It is not completely clear what content an item may have.
>>   I assume any Object?
>
> By my reading it may contain anything, objects and elements,
> except for a heading, but that is already implied by the de-indent.
>
> To quote from the docs:
>
> An item ends before the next item, the first line less or equally
> indented than its starting line, or two consecutive empty lines.
> Indentation of lines within other greater elements do not count,
> neither do inlinetasks boundaries.
>
> This makes plain lists one of the most complex elements to parse.

Is it? Perhaps I'm not doing it right but it didn't seem bad to me when
implementing my parser (though I need to add the element support).

All right, that's all I have time for for now.
Hopefully some of this is of use/interest.

--
Timothy



* Re: Some commentary on the Org Syntax document
  2021-12-02 19:16   ` Timothy
@ 2021-12-04  5:26     ` Tom Gillespie
  2021-12-04  6:17     ` Ihor Radchenko
  1 sibling, 0 replies; 13+ messages in thread
From: Tom Gillespie @ 2021-12-04  5:26 UTC (permalink / raw)
  To: Timothy; +Cc: emacs-orgmode

Hi Timothy,
   Replies in line. Some things might seem a bit out of order
because I responded from bottom to top. Best,
Tom

> from heading to bed, so to quote Pascal "I have only made this letter
> longer because I have not had the time to make it shorter".

Likewise, and I've heard it as Mark Twain :D

> I think a big problem is the mix of implicit and explicit information.
> Some components are rigorously specified in terms of the characters they
> may contain, elements and objects that are recognised inside them, and
> even the order in which different parts of the pattern are parsed.

I agree completely.

> As mentioned originally, the current Dynamic Blocks description doesn't
> even mention the CONTENTS part of the pattern, and relies on the reader
> inferring that it operates similarly to the CONTENTS part of Drawers.

Indeed this should be fixed.

> Forcing the reader to start making inferences like this is a treacherous
> path, and I think I can blame it for some of the other issues I've
> experienced. Take for instance the "surely X can't contain a newline?"
> comments I've made. In the Node Properties and Entities descriptions you
> have statements along the lines of "X can contain any character [...]
> except a newline". In my mind this then sets up the reader to interpret
> a similar statement without the "except a newline" clause to mean that
> newlines are permitted.

I agree completely and had almost the exact same experience as you
when I was working on it. As I mention below, my responses were to
illustrate why the explicit information is missing, not to suggest that it
should be left out. We should definitely work to make everything more
explicit so that future readers don't have to go through the same issues
we have.

> I'm also thinking that the term "element" is overworked in the document.
> It's basically pulling triple duty: you have Elements, Greater
> Elements, and elements which are Elements and/or Greater Elements 😓.

In extreme agreement.

> 3. Section

Technically this isn't part of the syntax; rather, it is part of
elisp Org mode's internal representation. I'm not sure I would
even mention sections at all, because they have to do with
the interpretation of the syntax. In a section on the internal
representation, Org sections definitely belong, but they
are incidental. That said, I suspect we will find that they are
useful for talking about the behavior of the file under transformation,
e.g. "headings are not reordered when pressing M-up or M-down,
sections are reordered". This makes it possible to
talk about an Org implementation that has commands that allow
one to switch the headings without moving their associated
sections.

> 5. (Greater Element / Element)

There are issues here with forms that are part of the syntax vs
forms that are part of the intermediate representation. A line
based parser for Org syntax that assembles greater blocks
after the fact and a parser that uses arbitrary lookahead to
truncate on headings won't have the exact same surface
syntax, however they will both have an equivalent in their
intermediate representation that corresponds to a greater
block. Again, very deep in implementation details here,
but trying to force things like sections into the syntax
hierarchy seems confusing to me.

> 7. Object

Paragraph element maybe? Might seem odd for heading titles
to have paragraph scope, but on the other hand it certainly
simplifies the explanation of the grammar. And you can put
an inline footnote in a heading title.

> 8. Pattern / Form

Don't know what to make of this one. Like "Term" these are
incredibly generic.

> 9. Term

Use of "Term" is super confusing to me.

> We could say call (1) Components, (7) Units, (6) Objects, (5) Element or
> Object (why not spell it out to avoid telling people to remember
> something).

I'm not sure we are ready to specify this. One way that we
might try to manage this would be to create a taxonomy of
element types, e.g. top-level elements, paragraph elements,
etc. This would be consistent with the fact that the elisp
implementation of org-element has all of these as an instance
of element.

> I could have put more thought into this, but it should do for
> illustrating my line of thinking. Let me know if you have any good
> ideas.

Let's leave the terminology as is right now. I'm expecting that there
will be quite a few new terms that we will want to introduce and we
will want to separate syntax and intermediate representation.

With progress on using org-element for fontification and on laundry
we should be able to come up with language that can be used to
distinguish between concepts that are needed for syntax, (tokens,
parser) and for intermediate representations. Things like basic syntax
highlighting need only the language for syntax to be specified, but more
complex syntax such as babel font-locking either requires a more
advanced tokenizer or it requires that we talk about it at the level
of the intermediate representation. Other things such as behavior
in response to commands (e.g. M-up and M-down mentioned
above) require the language of the intermediate representation.

> A separate improvement could be using more formatting to distinguish
> when terms are used in a particular way.

I think it will be clearer to come up with distinct terms. There are
times where this stuff has to be talked about in spoken language
and it is hard to speak /*_markup_*/.

> I've sort of covered this before, but I think the document would benefit
> from being more explicit in general.

Yes. The reason I brought this up was to indicate the reason why
an explicit account was not present, not to suggest that we shouldn't
add one. Overall the more explicit we can be the better the document.
I have some stashed changes in worg from the time I was reading this
syntax document deeply. I'll see if any of them are relevant for the pass
you are doing now.

> Specifically regarding newlines, perhaps we could add something like
> this to the start of the Objects section?
>
> "Furthermore, while many objects may contain newlines, an empty line
> (i.e. a double newline) often terminates the element that the object is
> a part of, such as a paragraph."

Good idea.

> On this, I'm cautiously optimistic about the discussion about using
> org-element for fontification.

Likewise. Though I expect there will be some growing pains
based on the divergent behaviors I have seen while developing
the laundry test cases.

> I must thank you and Ihor for pointing me to
> org-element-object-restrictions! I wasn't aware of that till now, and
> it's most helpful. Should all the information given by it be included in
> the Syntax document? I lean towards saying yes.

I'm not entirely sure. I think this may be one area where we don't want
to over-specify. I consider it an implementation detail. For example,
when we were discussing valid scopes for org-cite syntax a few
months ago https://lists.gnu.org/archive/html/emacs-orgmode/2021-09/msg00128.html
I suggested that the [cite:] syntax could appear in property drawers.
Nicolas corrected me on that. However, there is no reason why a
parser should be prevented from recognizing [cite:] syntax wherever
it wants --- so long as it does not immediately expand that syntax
and execute it to add/include such a citation in the exported file.

For example, in laundry I would parse it and have it expand to a no-op
when exporting, but still have it expand for user interaction so that they
could jump to the citation reference by clicking in the buffer. Similar thing
for syntax in comment blocks, where I frequently abuse the fact
that it is possible to jump to org links that are in comment blocks to make
it easier to navigate files.

In short, elisp Org mode doesn't have a single intermediate representation
atm, so syntactic restrictions listed by org-element-object-restrictions
are overly narrow and should not be included in the spec for the syntax
because they can be controlled at other levels of the implementation in
cases where there is a unified intermediate representation.

> I'm not sure this element = Element / Greater Element "shorthand" is
> doing us any favours, but I've discussed that already...

Agree. (see response above, I responded from bottom to top)
The object/element/greater-element/org-element/org-object
is supremely confusing. We got the name for heading updated,
(or are in the process of doing so?), but at some point I think we
should see if we can make this a bit less confusing. Too many
collisions when dropping a single qualifier.

> Is it? Perhaps I'm not doing it right but it didn't seem bad to me when
> implementing my parser (though I need to add the element support).

For a ... fun? time see the test case I cooked up for plain lists (linked
below) and then consider how to deal with cases where someone has
put a source block at some indent level. IIRC the suggested behavior
is to truncate leading whitespace to the #+end_src level. Tracking
the indentation level is required to correctly reassemble the nesting
of the lists and cannot be done during tokenization or during parsing.
As a result, indentation level must be retained for _all_ paragraphs
because they might be preceded by a plain list line. Not hard to
implement, just a lot of things to keep track of, thus complex.

https://github.com/tgbugs/laundry/blame/c90700bd1c15d7b04e5ead44ac10005d8d2ada50/laundry/test.org#L70-L91



* Re: Some commentary on the Org Syntax document
  2021-12-02 19:16   ` Timothy
  2021-12-04  5:26     ` Tom Gillespie
@ 2021-12-04  6:17     ` Ihor Radchenko
  2021-12-04  6:48       ` Timothy
  1 sibling, 1 reply; 13+ messages in thread
From: Ihor Radchenko @ 2021-12-04  6:17 UTC (permalink / raw)
  To: Timothy; +Cc: Tom Gillespie, emacs-orgmode

Timothy <tecosaur@gmail.com> writes:

> So, the hierarchy appears to be something like.
>
> 1. (Headline / Section / Greater Element / Element / Object)
> 2. Headline
> 3. Section
> 4. Greater Element
> 5. (Greater Element / Element)
> 6. Element
> 7. Object
> 8. Pattern / Form
> 9. Term

> We could say call (1) Components, (7) Units, (6) Objects, (5) Element or
> Object (why not spell it out to avoid telling people to remember
> something).

I am against renaming this. We should rather improve the syntax document
keeping the key concepts consistent with Elisp code.

The Org parser distinguishes two principal types of syntax structures:
1. Elements
2. Objects

Neither elements nor objects can intersect their boundaries, but they
can be nested.

An object is always a part of some element or other object.
A greater element can contain other elements and objects.
An element that is not a greater element can only contain objects.

Headings are an example of greater element with the following structure
(headline (optional section) (optional repeat nested-headline))

Sections can only exist inside headings or top-level document (org-data
element):
1. (org-data (optional whitespace) (section) (optional repeat headline))
2. (headline (optional section) ...)

Best,
Ihor



* Re: Some commentary on the Org Syntax document
  2021-12-04  6:17     ` Ihor Radchenko
@ 2021-12-04  6:48       ` Timothy
  2021-12-04  7:40         ` Ihor Radchenko
  0 siblings, 1 reply; 13+ messages in thread
From: Timothy @ 2021-12-04  6:48 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: Tom Gillespie, emacs-orgmode


Hi Ihor,

Because your reply is shorter, you get my first response 😛.

>> [Renaming parts of the Hierarchy]
> I am against renaming this. We should rather improve the syntax document
> keeping the key concepts consistent with Elisp code.

This is certainly something to be conservative about, but I think some small
tweaks could be beneficial. See my comment below.

> The Org parser distinguishes two principal types of syntax structures:
> 1. Elements
> 2. Objects
>
> Neither elements nor objects can intersect their boundaries, but they
> can be nested.
>
> An object is always a part of some element or other object.
> Greater element can contain other elements and objects.
> Element that is not greater element can only contain objects.

A thought has just occurred to me: instead of having
“elements” which are split into “Greater Elements” and other “Elements”, what if
we simply added the prefix “lesser” to the latter?

I.e. go from

⁃ Elements
  • Greater Elements
  • (other) Elements

to

⁃ Elements
  • Greater Elements
  • Lesser Elements

I think having something explicit like this could reduce the chance of
confusion.

> [Comments on headings and sections]

This accords with my reading of the document and the way I’ve implemented things
in OrgMode.jl (see <https://github.com/tecosaur/OrgMode.jl/blob/main/src/types/sections.jl>).

All the best,
Timothy


* Re: Some commentary on the Org Syntax document
  2021-12-04  6:48       ` Timothy
@ 2021-12-04  7:40         ` Ihor Radchenko
  2021-12-04  8:09           ` Timothy
  2021-12-04  9:41           ` Nicolas Goaziou
  0 siblings, 2 replies; 13+ messages in thread
From: Ihor Radchenko @ 2021-12-04  7:40 UTC (permalink / raw)
  To: Timothy; +Cc: Tom Gillespie, emacs-orgmode

Timothy <tecosaur@gmail.com> writes:

> ⁃ Elements
>   • Greater Elements
>   • (other) Elements
>
> to
>
> ⁃ Elements
>   • Greater Elements
>   • Lesser Elements

This sounds reasonable. We can change

- Three categories are used to classify these environments: “Greater
  elements”, “elements”, and “objects”, from the broadest scope to the
  narrowest. The word “element” is used for both Greater and non-Greater
  elements, the context should make that clear.
+ Two main categories are used to classify these environments:
  "elements" and "objects", from the broadest scope to the narrowest.
  "Elements" consist of "greater elements" that can contain other
  elements and objects and "lesser elements" that can only contain
  objects.

>> [Comments on headings and sections]
>
> This accords with my reading of the document and the way I’ve implemented things
> in OrgMode.jl (see <https://github.com/tecosaur/OrgMode.jl/blob/main/src/types/sections.jl>).

One small clarification. The headline structure is actually
(headline (optional whitespace) (optional section) (optional repeat nested-headlines))

A section need not start immediately after the first newline; it starts
only after any leading blank characters have been skipped.

For example:

* This is a headline _without_ section, even though it contains some newlines


* Another headline

Section starts at the word "section" and spans all the way to the next headline or EOB


* Next headline
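
A small Python sketch of the rule this example illustrates, assuming a
heading is any line of one or more stars followed by a space, and ignoring
heading nesting and any text before the first heading. The function name is
hypothetical.

```python
import re
from typing import List, Tuple

HEADING_RE = re.compile(r"^\*+ (.*)$")


def headings_with_sections(text: str) -> List[Tuple[str, bool]]:
    """Return (title, has_section) for every heading in text.

    A heading has a section only if something other than blank lines sits
    between it and the next heading (or the end of the buffer).
    """
    result = []
    title = None
    body: List[str] = []
    for line in text.splitlines():
        m = HEADING_RE.match(line)
        if m:
            if title is not None:
                result.append((title, any(l.strip() for l in body)))
            title, body = m.group(1), []
        elif title is not None:
            body.append(line)
    if title is not None:
        result.append((title, any(l.strip() for l in body)))
    return result
```

Run on the three headings above, only "Another headline" is reported as
having a section.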

Best,
Ihor



* Re: Some commentary on the Org Syntax document
  2021-12-04  7:40         ` Ihor Radchenko
@ 2021-12-04  8:09           ` Timothy
  2021-12-04  9:41           ` Nicolas Goaziou
  1 sibling, 0 replies; 13+ messages in thread
From: Timothy @ 2021-12-04  8:09 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: Tom Gillespie, emacs-orgmode


Hi Ihor,

> This sounds reasonable. We can change
> [snip]

👍 I’ll make a note in my draft then.

>>> [Comments on headings and sections]
>>
>> This accords with my reading of the document and the way I’ve implemented things
>> in OrgMode.jl (see <https://github.com/tecosaur/OrgMode.jl/blob/main/src/types/sections.jl>).
>
> One small clarification. The headline structure is actually
> (headline (optional whitespace) (optional section) (optional repeat nested-headlines))

You may be happy to hear that your example seems to be interpreted correctly by
OrgMode.jl, here’s the parse tree:

┌────
│ Org Parse Tree
│     Heading (This is a headline _without_ section, even though it contains some newlines) (empty)
│     Heading (Another headline)
│         Section
│             Paragraph
│                 TextPlain
│     Heading (Next headline) (empty)
└────

All the best,
Timothy


* Re: Some commentary on the Org Syntax document
  2021-12-04  7:40         ` Ihor Radchenko
  2021-12-04  8:09           ` Timothy
@ 2021-12-04  9:41           ` Nicolas Goaziou
  2021-12-04 14:00             ` Ihor Radchenko
  1 sibling, 1 reply; 13+ messages in thread
From: Nicolas Goaziou @ 2021-12-04  9:41 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: Tom Gillespie, emacs-orgmode, Timothy

Hello,

Ihor Radchenko <yantar92@gmail.com> writes:

> Timothy <tecosaur@gmail.com> writes:
>
>> ⁃ Elements
>>   • Greater Elements
>>   • (other) Elements
>>
>> to
>>
>> ⁃ Elements
>>   • Greater Elements
>>   • Lesser Elements
>
> This sounds reasonable. We can change
>
> - Three categories are used to classify these environments: “Greater
>   elements”, “elements”, and “objects”, from the broadest scope to the
>   narrowest. The word “element” is used for both Greater and non-Greater
>   elements, the context should make that clear.
> + Two main categories are used to classify these environments:
>   "elements" and "objects", from the broadest scope to the narrowest.
>   "Elements" consist of "greater elements" that can contain other
>   elements and objects and "lesser elements" that can only contain
>   objects.

There are actually three types of elements: not all elements can contain
objects.

Regards,
-- 
Nicolas Goaziou



* Re: Some commentary on the Org Syntax document
  2021-12-04  9:41           ` Nicolas Goaziou
@ 2021-12-04 14:00             ` Ihor Radchenko
  2021-12-04 14:43               ` Nicolas Goaziou
  0 siblings, 1 reply; 13+ messages in thread
From: Ihor Radchenko @ 2021-12-04 14:00 UTC (permalink / raw)
  To: Nicolas Goaziou; +Cc: Tom Gillespie, emacs-orgmode, Timothy

Nicolas Goaziou <mail@nicolasgoaziou.fr> writes:

>> This sounds reasonable. We can change
>>
>> - Three categories are used to classify these environments: “Greater
>>   elements”, “elements”, and “objects”, from the broadest scope to the
>>   narrowest. The word “element” is used for both Greater and non-Greater
>>   elements, the context should make that clear.
>> + Two main categories are used to classify these environments:
>>   "elements" and "objects", from the broadest scope to the narrowest.
>>   "Elements" consist of "greater elements" that can contain other
>>   elements and objects and "lesser elements" that can only contain
>>   objects.
>
> There are actually three types of elements: not all elements can contain
> objects.

You are right. However, I am not sure if it is a good idea to mention
this in the introduction part of the syntax document.

Maybe we can just say '... "lesser elements" that cannot contain other
elements'? Then, we mention that some elements cannot contain objects
in the description of those elements.

Best,
Ihor





* Re: Some commentary on the Org Syntax document
  2021-12-04 14:00             ` Ihor Radchenko
@ 2021-12-04 14:43               ` Nicolas Goaziou
  2021-12-05  6:30                 ` Ihor Radchenko
  0 siblings, 1 reply; 13+ messages in thread
From: Nicolas Goaziou @ 2021-12-04 14:43 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: Tom Gillespie, emacs-orgmode, Timothy

Ihor Radchenko <yantar92@gmail.com> writes:

>> There are actually three types of elements: not all elements can contain
>> objects.
>
> You are right. However, I am not sure if it is a good idea to mention
> this in the introduction part of the syntax document.
>
> Maybe we can just say '... "lesser elements" that cannot contain other
> elements'? Then, we mention that some elements cannot contain objects
> in the description of those elements.

But then, you do not remove the ambiguity that is condemned in this
thread. The greater element/element and greater element/lesser element
distinctions are equivalent, albeit not identical.

IIUC, you want three terms for elements (I am not even talking about
secondary strings, which can hold objects that are not part of
contents), and probably two for objects: terminal and non-terminal.

Regards,



* Re: Some commentary on the Org Syntax document
  2021-12-04 14:43               ` Nicolas Goaziou
@ 2021-12-05  6:30                 ` Ihor Radchenko
  2021-12-05  9:28                   ` Nicolas Goaziou
  0 siblings, 1 reply; 13+ messages in thread
From: Ihor Radchenko @ 2021-12-05  6:30 UTC (permalink / raw)
  To: Nicolas Goaziou; +Cc: Tom Gillespie, emacs-orgmode, Timothy

Nicolas Goaziou <mail@nicolasgoaziou.fr> writes:

>> Maybe we can just say '... "lesser elements" that cannot contain other
>> elements'? Then, we mention that some elements cannot contain objects
>> in the description of those elements.
>
> But then, you do not remove the ambiguity that is condemned in this
> thread. The greater element/element and greater element/lesser element
> distinctions are equivalent, albeit not identical.

AFAIU, elements = greater-elements ∪ lesser-elements.
The current syntax draft contains a section "Greater elements" defining
all the greater-elements and a section "Elements" defining lesser-elements.
However, the word "elements" also refers to all possible elements in
some parts of the draft.
I propose to remove the ambiguity by referring to members of
org-element-greater-elements as "greater elements"; to
org-element-all-elements - org-element-greater-elements as "lesser
elements"; and to org-element-all-elements as just "elements".
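
A minimal Python sketch of the naming proposed above. The type names are
a small illustrative subset picked for this example, not the full
contents of org-element-all-elements or org-element-greater-elements,
and the helper element_kind is hypothetical.

```python
# Illustrative subset only (an assumption for this sketch); the real lists
# are the Emacs Lisp constants org-element-greater-elements and
# org-element-all-elements.
GREATER_ELEMENTS = frozenset({"drawer", "plain-list", "item", "quote-block"})
ALL_ELEMENTS = GREATER_ELEMENTS | frozenset(
    {"paragraph", "planning", "clock", "keyword"}
)

# "Lesser elements" in the proposal: all elements minus the greater ones.
LESSER_ELEMENTS = ALL_ELEMENTS - GREATER_ELEMENTS


def element_kind(element_type: str) -> str:
    """Return the proposed label for an element type (hypothetical helper)."""
    if element_type in GREATER_ELEMENTS:
        return "greater element"
    if element_type in LESSER_ELEMENTS:
        return "lesser element"
    raise ValueError(f"not an element type in this sketch: {element_type}")
```

With these definitions the two named sets are disjoint and their union
is the full set, so "elements" can be kept for the union without
ambiguity.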

> IIUC, you want three terms for elements (I am not even talking about
> secondary strings, which can hold objects that are not part of
> contents),

Yep.

> ... and probably two for objects: terminal and non-terminal.

Sorry, I do not understand what you refer to here.

Best,
Ihor



* Re: Some commentary on the Org Syntax document
  2021-12-05  6:30                 ` Ihor Radchenko
@ 2021-12-05  9:28                   ` Nicolas Goaziou
  0 siblings, 0 replies; 13+ messages in thread
From: Nicolas Goaziou @ 2021-12-05  9:28 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: Tom Gillespie, emacs-orgmode, Timothy

Hello,

Ihor Radchenko <yantar92@gmail.com> writes:

> Nicolas Goaziou <mail@nicolasgoaziou.fr> writes:

>> But then, you do not remove the ambiguity that is condemned in this
>> thread. The greater element/element and greater element/lesser element
>> distinctions are equivalent, albeit not identical.
>
> AFAIU, elements = greater-elements ∪ lesser-elements.
> The current syntax draft contains a section "Greater elements" defining
> all the greater-elements and a section "Elements" defining lesser-elements.
> However, the word "elements" also refers to all possible elements in
> some parts of the draft.
> I propose to remove the ambiguity by referring to members of
> org-element-greater-elements as "greater elements"; to
> org-element-all-elements - org-element-greater-elements as "lesser
> elements"; and to org-element-all-elements as just "elements".

I understand the proposal. I'm just pointing out that the distinction
already exists in another form: as noted, what you call lesser elements
is currently the set difference of elements and greater elements (that
is, elements minus greater elements). Therefore, it is hardly a huge
step forward.

In any case, both proposals are incomplete.

>> IIUC, you want three terms for elements (I am not even talking about
>> secondary strings, which can hold objects that are not part of
>> contents),
>
> Yep.

For clarity, I mean three terms /in addition to "elements"/. For
example, a drawer, a paragraph and a planning line all are elements.
Yet, they may be different enough so as to deserve their own label.
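
As a rough illustration of the three-way split, here is a Python sketch
that labels the three examples by what their contents may hold. The
label names and the classification are assumptions made for the
example, not terminology from the syntax document.

```python
from enum import Enum


class Holds(Enum):
    """What an element's contents may be (labels are hypothetical)."""

    ELEMENTS = "other elements"
    OBJECTS = "objects only"
    NOTHING = "neither"


# The three examples named above; the classification is assumed here.
HOLDS_BY_TYPE = {
    "drawer": Holds.ELEMENTS,
    "paragraph": Holds.OBJECTS,
    "planning": Holds.NOTHING,
}


def same_category(a: str, b: str) -> bool:
    """True when two element types fall under the same label."""
    return HOLDS_BY_TYPE[a] is HOLDS_BY_TYPE[b]
```

A two-term scheme (greater/lesser) would have to merge two of these
three labels, which is the incompleteness being pointed out.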

>> ... and probably two for objects: terminal and non-terminal.
>
> Sorry, I do not understand what you refer to here.

Some objects can contain other objects. Others cannot. Per above, it may
be ambiguous to use the term "object" for both categories.
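
The same split can be sketched for objects. The two sets below are
assumed examples only (bold, italic and link taken as objects that may
nest other objects; verbatim, code and entity as objects that may not),
and can_contain_objects is a hypothetical helper.

```python
# Assumed examples for this sketch, not an exhaustive or authoritative list.
NON_TERMINAL_OBJECTS = frozenset({"bold", "italic", "link"})
TERMINAL_OBJECTS = frozenset({"verbatim", "code", "entity"})


def can_contain_objects(object_type: str) -> bool:
    """Return True for non-terminal objects, False for terminal ones."""
    if object_type in NON_TERMINAL_OBJECTS:
        return True
    if object_type in TERMINAL_OBJECTS:
        return False
    raise ValueError(f"not an object type in this sketch: {object_type}")
```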

In a nutshell, naming is hard.

Regards,
-- 
Nicolas Goaziou



Thread overview: 13+ messages
2021-12-02  9:23 Some commentary on the Org Syntax document Timothy
2021-12-02 19:00 ` Tom Gillespie
2021-12-02 19:16   ` Timothy
2021-12-04  5:26     ` Tom Gillespie
2021-12-04  6:17     ` Ihor Radchenko
2021-12-04  6:48       ` Timothy
2021-12-04  7:40         ` Ihor Radchenko
2021-12-04  8:09           ` Timothy
2021-12-04  9:41           ` Nicolas Goaziou
2021-12-04 14:00             ` Ihor Radchenko
2021-12-04 14:43               ` Nicolas Goaziou
2021-12-05  6:30                 ` Ihor Radchenko
2021-12-05  9:28                   ` Nicolas Goaziou
