* Extra paragraphs incorrectly spawning when ":end:" appears.
From: Tom Alexander @ 2023-09-30 18:47 UTC (permalink / raw)
To: emacs-orgmode
This test document has 1 paragraph:
```
foo
bar
baz
```
which parses as:
```
(section
(paragraph "foo\nbar\nbaz\n")
)
```
This test document should have 1 paragraph but org-mode is parsing it as 2:
```
foo
:end:
baz
```
which parses as:
```
(section
(paragraph "foo\n")
(paragraph ":end:\nbaz\n")
)
```
The paragraph documentation[1] states that:
> Empty lines and other elements end paragraphs.
But the document contains no empty lines and we can see in the output that it only contains paragraphs.
[1] https://orgmode.org/worg/org-syntax.html#Paragraphs
--
Tom Alexander
pgp: https://fizz.buzz/pgp.asc
* Re: Extra paragraphs incorrectly spawning when ":end:" appears.
From: Tom Alexander @ 2023-09-30 19:15 UTC (permalink / raw)
To: emacs-orgmode
The same problem occurs with this sample document:
```
foo
#+BEGIN: bar
baz
```
which parses as:
```
(section
(paragraph "foo\n")
(paragraph "#+BEGIN: bar\nbaz\n")
)
```
Again, there are no blank lines and no non-paragraph elements, but the single paragraph was split in two.
* Re: Extra paragraphs incorrectly spawning when ":end:" appears.
From: Ihor Radchenko @ 2023-10-01 7:50 UTC (permalink / raw)
To: Tom Alexander; +Cc: emacs-orgmode
"Tom Alexander" <tom@fizz.buzz> writes:
> This test document should have 1 paragraph but org-mode is parsing it as 2:
> ```
> foo
> :end:
> baz
> ```
>
> which parses as:
> ```
> (section
> (paragraph "foo\n")
> (paragraph ":end:\nbaz\n")
> )
> ```
>
> The paragraph documentation[1] states that:
>> Empty lines and other elements end paragraphs.
>
> But the document contains no empty lines and we can see in the output that it only contains paragraphs.
The documentation is not accurate here.
The parser uses anything that _potentially_ looks like the beginning of
another element to calculate paragraph boundaries
(`org-element-paragraph-separate'). ":end:" is potentially a drawer and
thus ends the preceding paragraph.
Later, the ":end:" line is parsed as a new structural element using
`org-element-drawer-parser'. The drawer parser detects that there is no
closing :end: line and thus falls back to paragraph parsing:
(defun org-element-drawer-parser (limit affiliated)
  ...
  ;; Incomplete drawer: parse it as a paragraph.
  (org-element-paragraph-parser limit affiliated)
The same logic applies to a number of other incomplete elements.
The current logic does not re-parse the preceding paragraph when it
encounters an incomplete drawer/block/etc. because the Org parser is
written to do a single pass: we never re-parse already parsed parts.
Doing otherwise, while it could resolve certain non-intuitive
behaviors, would be problematic performance-wise.
So, the actual paragraph separator that should be used is the
`org-element-paragraph-separate' regexp.
We need to fix the WORG syntax description accordingly.
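To make the explanation above concrete, here is a toy model in Python of the single-pass behaviour described in this message. It is only a sketch: `parse_section`, `DRAWER_LIKE` and `DRAWER_END` are hypothetical names, the regexps are a drastic simplification of `org-element-paragraph-separate' (only drawer-like lines are recognised), and none of this is Org's actual code.

```python
import re

# Assumption: a simplified stand-in for `org-element-paragraph-separate'.
# Only drawer-like lines (":name:") count as potential element starts.
DRAWER_LIKE = re.compile(r"^[ \t]*:[\w-]+:[ \t]*$")
DRAWER_END = re.compile(r"^[ \t]*:end:[ \t]*$", re.IGNORECASE)


def parse_section(text):
    """Single-pass toy parser: it never revisits lines already consumed."""
    lines = text.splitlines(keepends=True)
    elements = []
    i = 0
    while i < len(lines):
        if not lines[i].strip():
            # Blank lines separate elements and belong to none of them.
            i += 1
            continue
        if DRAWER_LIKE.match(lines[i]):
            # Look for a closing ":end:" line after the opening line.
            close = next(
                (j for j in range(i + 1, len(lines)) if DRAWER_END.match(lines[j])),
                None,
            )
            if close is not None:
                elements.append(("drawer", "".join(lines[i:close + 1])))
                i = close + 1
                continue
            # Incomplete drawer: fall through and parse it as a paragraph.
        start = i
        i += 1
        # The paragraph ends at a blank line or at anything that merely
        # looks like the start of another element.
        while i < len(lines) and lines[i].strip() and not DRAWER_LIKE.match(lines[i]):
            i += 1
        elements.append(("paragraph", "".join(lines[start:i])))
    return elements
```

With this model, "foo\n:end:\nbaz\n" comes out as two paragraphs ("foo\n" and ":end:\nbaz\n"), matching the parse reported at the top of the thread, while "foo\nbar\nbaz\n" stays a single paragraph.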
--
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>
* Re: Extra paragraphs incorrectly spawning when ":end:" appears.
From: Tom Alexander @ 2023-10-02 14:17 UTC (permalink / raw)
To: Ihor Radchenko; +Cc: emacs-orgmode
Hmm, thanks, that makes sense. I guess a post-processing step that merges adjacent paragraphs wouldn't work either, since it wouldn't stitch together objects such as the bold span in this test document without re-parsing the entire paragraph:
```
foo *bar
:end:
baz*
```
oh well 🤷
* Re: Extra paragraphs incorrectly spawning when ":end:" appears.
From: Ihor Radchenko @ 2024-02-13 11:49 UTC (permalink / raw)
To: Tom Alexander; +Cc: emacs-orgmode
Ihor Radchenko <yantar92@posteo.net> writes:
> "Tom Alexander" <tom@fizz.buzz> writes:
>
>> This test document should have 1 paragraph but org-mode is parsing it as 2:
>> ```
>> foo
>> :end:
>> baz
>> ```
>>
>> which parses as:
>> ```
>> (section
>> (paragraph "foo\n")
>> (paragraph ":end:\nbaz\n")
>> )
>> ```
> ....
> The documentation is not accurate here.
>
> The parser uses anything that _potentially_ looks like the beginning of
> another element to calculate paragraph boundaries
> (`org-element-paragraph-separate'). ":end:" is potentially a drawer and
> thus ends the preceding paragraph.
I was wrong.
`org-element-paragraph-parser' actually does perform forward-checking.
So your example is a genuine bug in the parser (and the relevant tests
were not very accurate due to copy-pasting).
Fixed, on main.
https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=61c235b77
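As an illustration of what forward-checking means here, the Python sketch below ends a paragraph at a drawer-like line only when a closing ":end:" line exists further down. It is a toy model and not the code from the commit above: the function names and the simplified regexps are assumptions made for the example.

```python
import re

# Assumption: simplified patterns, not Org's real regexps.
DRAWER_LIKE = re.compile(r"^[ \t]*:[\w-]+:[ \t]*$")
DRAWER_END = re.compile(r"^[ \t]*:end:[ \t]*$", re.IGNORECASE)


def starts_complete_drawer(lines, i):
    """True if line i opens a drawer that a later ":end:" line closes."""
    if not DRAWER_LIKE.match(lines[i]):
        return False
    return any(DRAWER_END.match(line) for line in lines[i + 1:])


def paragraph_end(lines, start):
    """Return the index just past the paragraph beginning at line `start`."""
    i = start + 1
    while i < len(lines) and lines[i].strip():
        # Forward check: only a complete drawer ends the paragraph;
        # a lone ":end:" line is treated as ordinary paragraph text.
        if starts_complete_drawer(lines, i):
            break
        i += 1
    return i
```

For the three lines of "foo\n:end:\nbaz\n", `paragraph_end(lines, 0)` returns 3, so the whole document remains one paragraph; if a complete drawer follows "foo", the paragraph stops before it.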