emacs-orgmode@gnu.org archives
* Extra paragraphs incorrectly spawning when ":end:" appears.
@ 2023-09-30 18:47 Tom Alexander
  2023-09-30 19:15 ` Tom Alexander
  2023-10-01  7:50 ` Ihor Radchenko
  0 siblings, 2 replies; 5+ messages in thread
From: Tom Alexander @ 2023-09-30 18:47 UTC (permalink / raw)
  To: emacs-orgmode

This test document has 1 paragraph:
```
foo
bar
baz
```
which parses as:
```
(section
  (paragraph "foo\nbar\nbaz\n")
)
```

This test document should have 1 paragraph but org-mode is parsing it as 2:
```
foo
:end:
baz
```

which parses as:
```
(section
  (paragraph "foo\n")
  (paragraph ":end:\nbaz\n")
)
```

The paragraph documentation[1] states that:
> Empty lines and other elements end paragraphs.

But the document contains no empty lines and we can see in the output that it only contains paragraphs.

[1] https://orgmode.org/worg/org-syntax.html#Paragraphs

--
Tom Alexander
pgp: https://fizz.buzz/pgp.asc

* Re: Extra paragraphs incorrectly spawning when ":end:" appears.
  2023-09-30 18:47 Extra paragraphs incorrectly spawning when ":end:" appears Tom Alexander
@ 2023-09-30 19:15 ` Tom Alexander
  2023-10-01  7:50 ` Ihor Radchenko
  1 sibling, 0 replies; 5+ messages in thread
From: Tom Alexander @ 2023-09-30 19:15 UTC (permalink / raw)
  To: emacs-orgmode

The same problem occurs with this sample document:
```
foo
#+BEGIN: bar
baz
```

which parses as:
```
(section
  (paragraph "foo\n")
  (paragraph "#+BEGIN: bar\nbaz\n")
)
```

Again, there are no blank lines and no non-paragraph elements, but the single paragraph got split in two.

--
Tom Alexander
pgp: https://fizz.buzz/pgp.asc



* Re: Extra paragraphs incorrectly spawning when ":end:" appears.
  2023-09-30 18:47 Extra paragraphs incorrectly spawning when ":end:" appears Tom Alexander
  2023-09-30 19:15 ` Tom Alexander
@ 2023-10-01  7:50 ` Ihor Radchenko
  2023-10-02 14:17   ` Tom Alexander
  2024-02-13 11:49   ` Ihor Radchenko
  1 sibling, 2 replies; 5+ messages in thread
From: Ihor Radchenko @ 2023-10-01  7:50 UTC (permalink / raw)
  To: Tom Alexander; +Cc: emacs-orgmode

"Tom Alexander" <tom@fizz.buzz> writes:

> This test document should have 1 paragraph but org-mode is parsing it as 2:
> ```
> foo
> :end:
> baz
> ```
>
> which parses as:
> ```
> (section
>   (paragraph "foo\n")
>   (paragraph ":end:\nbaz\n")
> )
> ```
>
> The paragraph documentation[1] states that:
>> Empty lines and other elements end paragraphs.
>
> But the document contains no empty lines and we can see in the output that it only contains paragraphs.

The documentation is not accurate here.

The parser uses anything that _potentially_ looks like the beginning of
another element to calculate paragraph boundaries
(`org-element-paragraph-separate'). ":end:" is potentially a drawer and
thus ends the preceding paragraph.

Later, the ":end:" line is parsed as a new structural element using
`org-element-drawer-parser'. The drawer parser detects that there is no
closing :end: line and thus falls back to paragraph parsing:

(defun org-element-drawer-parser (limit affiliated)
...
	;; Incomplete drawer: parse it as a paragraph.
	(org-element-paragraph-parser limit affiliated)

The same logic applies to a number of other incomplete elements.
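
The behavior described in this message can be sketched with a small toy
model. This is Python rather than Org's Emacs Lisp, and MAYBE_ELEMENT is a
hypothetical, heavily reduced stand-in for `org-element-paragraph-separate'
(the real regexp covers many more cases). The fallback to paragraph parsing
is modeled simply by letting the candidate line start the next paragraph:

```python
import re

# Hypothetical, simplified stand-in for `org-element-paragraph-separate':
# a blank line, a line that *might* belong to a drawer, or a line that
# *might* open a dynamic block.
MAYBE_ELEMENT = re.compile(r"[ \t]*(?:$|:[\w-]+:[ \t]*$|#\+BEGIN:)", re.IGNORECASE)


def split_paragraphs(text):
    """Split text into paragraphs without any look-ahead (toy model)."""
    paragraphs, current = [], []
    for line in text.splitlines():
        # Any line that could start another element ends the paragraph.
        if current and MAYBE_ELEMENT.match(line):
            paragraphs.append("".join(current))
            current = []
        # Blank lines separate paragraphs but are not part of them.
        if line.strip():
            current.append(line + "\n")
    if current:
        paragraphs.append("".join(current))
    return paragraphs


if __name__ == "__main__":
    print(split_paragraphs("foo\nbar\nbaz\n"))
    print(split_paragraphs("foo\n:end:\nbaz\n"))
    print(split_paragraphs("foo\n#+BEGIN: bar\nbaz\n"))
```

Under these assumptions the first document stays one paragraph, while the
":end:" and "#+BEGIN: bar" documents are each split in two, matching the
parses reported earlier in the thread.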

The reason for the current logic, rather than re-parsing the preceding
paragraph when we encounter an incomplete drawer/block/etc., is that the
Org parser is written to do a single pass: we never re-parse already
parsed parts. Doing otherwise, while it could resolve certain
non-intuitive behaviors, would be problematic performance-wise.

So, the actual paragraph separator that should be used is the
`org-element-paragraph-separate' regexp.

We need to fix the WORG syntax description accordingly.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>



* Re: Extra paragraphs incorrectly spawning when ":end:" appears.
  2023-10-01  7:50 ` Ihor Radchenko
@ 2023-10-02 14:17   ` Tom Alexander
  2024-02-13 11:49   ` Ihor Radchenko
  1 sibling, 0 replies; 5+ messages in thread
From: Tom Alexander @ 2023-10-02 14:17 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: emacs-orgmode

Hmm thanks, that makes sense. I guess a post-processing step to merge adjacent paragraphs wouldn't work either, since it wouldn't stitch together objects like the bold in this test document without re-parsing the entire paragraph:
```
foo *bar
:end:
baz*
```
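
A rough illustration of that point, as a Python toy rather than anything from Org itself: BOLD below is a hypothetical regexp that only approximates bold markup and ignores Org's actual emphasis rules. It finds no bold object in either half of the split document, and only finds one when the joined text is scanned again:

```python
import re

# Toy notion of bold: *text* with a non-space character just inside each
# marker. This is an assumption for illustration, not Org's emphasis syntax.
BOLD = re.compile(r"\*(\S(?:.*?\S)?)\*", re.DOTALL)


def bold_spans(paragraph):
    """Return the contents of every toy-bold span found in one paragraph."""
    return BOLD.findall(paragraph)


# The two paragraphs the parser produced for the test document.
split = ["foo *bar\n", ":end:\nbaz*\n"]

# Objects found when each paragraph is parsed on its own.
per_paragraph = [bold_spans(p) for p in split]

# Objects found only after joining the text and scanning it again.
merged = bold_spans("".join(split))

if __name__ == "__main__":
    print(per_paragraph)
    print(merged)
```

Concatenating the two already-parsed paragraphs would carry over their (empty) object lists, so the bold span only shows up if the merged text is parsed again.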

oh well 🤷

--
Tom Alexander
pgp: https://fizz.buzz/pgp.asc



* Re: Extra paragraphs incorrectly spawning when ":end:" appears.
  2023-10-01  7:50 ` Ihor Radchenko
  2023-10-02 14:17   ` Tom Alexander
@ 2024-02-13 11:49   ` Ihor Radchenko
  1 sibling, 0 replies; 5+ messages in thread
From: Ihor Radchenko @ 2024-02-13 11:49 UTC (permalink / raw)
  To: Tom Alexander; +Cc: emacs-orgmode

Ihor Radchenko <yantar92@posteo.net> writes:

> "Tom Alexander" <tom@fizz.buzz> writes:
>
>> This test document should have 1 paragraph but org-mode is parsing it as 2:
>> ```
>> foo
>> :end:
>> baz
>> ```
>>
>> which parses as:
>> ```
>> (section
>>   (paragraph "foo\n")
>>   (paragraph ":end:\nbaz\n")
>> )
>> ```
> ....
> The documentation is not accurate here.
>
> The parser uses anything that _potentially_ looks like the beginning of
> another element to calculate paragraph boundaries
> (`org-element-paragraph-separate'). ":end:" is potentially a drawer and
> thus ends the preceding paragraph.

I was wrong.
`org-element-paragraph-parser' actually does perform forward-checking.
So, your example is a genuine bug in the parser (and the relevant tests
were not very accurate due to copy-pasting).
Fixed on main.
https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=61c235b77
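
As a rough picture of what forward-checking means for the examples in this
thread, here is a Python toy model. It is not the logic of the commit above:
the regexps and the helper names starts_complete_element and first_paragraph
are assumptions made up for this sketch. A candidate line ends the paragraph
only if a matching closing line exists further down:

```python
import re

# Simplified, assumed patterns; Org's real regexps are more involved.
DRAWER_OPEN = re.compile(r"[ \t]*:[\w-]+:[ \t]*$")
DRAWER_CLOSE = re.compile(r"[ \t]*:end:[ \t]*$", re.IGNORECASE)
DYNBLOCK_OPEN = re.compile(r"[ \t]*#\+begin:", re.IGNORECASE)
DYNBLOCK_CLOSE = re.compile(r"[ \t]*#\+end:?[ \t]*$", re.IGNORECASE)


def starts_complete_element(lines, i):
    """True if lines[i] opens a drawer or dynamic block that is closed later."""
    if DRAWER_OPEN.match(lines[i]):
        closer = DRAWER_CLOSE
    elif DYNBLOCK_OPEN.match(lines[i]):
        closer = DYNBLOCK_CLOSE
    else:
        return False
    # Look ahead for the closing line before deciding anything.
    return any(closer.match(later) for later in lines[i + 1:])


def first_paragraph(text):
    """Return the first paragraph of text under the toy forward-checking rule."""
    lines = text.splitlines()
    collected = []
    for i, line in enumerate(lines):
        if not line.strip():
            break  # a blank line always ends the paragraph
        if collected and starts_complete_element(lines, i):
            break  # a complete element really does start here
        collected.append(line)
    return "".join(line + "\n" for line in collected)


if __name__ == "__main__":
    print(repr(first_paragraph("foo\n:end:\nbaz\n")))
    print(repr(first_paragraph("foo\n:note:\nx\n:end:\nbaz\n")))
```

With this rule the lone ":end:" and the unclosed "#+BEGIN: bar" no longer
end the paragraph, while a drawer that is actually closed still does.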

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>


Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs/org-mode.git
