emacs-orgmode@gnu.org archives
* Inline markup: How does org identify nested code/verbatim?
@ 2023-01-29 18:05 c.buhtz
  2023-01-29 18:20 ` c.buhtz
  2023-01-30 14:56 ` Ihor Radchenko
  0 siblings, 2 replies; 5+ messages in thread
From: c.buhtz @ 2023-01-29 18:05 UTC (permalink / raw)
  To: emacs-orgmode

Hi folks,

This is a question about Org mode development itself.
It is magic to me how you do this ;) and I would like to learn it,
because I am writing a kind of Org parser in Python.

Here is some text with code nested inside verbatim:

    This =is ~code~ in verbatim= text.

Exporting this to HTML (via org-html-export-as-html) gives:

    This <code>is ~code~ in verbatim</code> text.

Awesome! :D

My point is that I can identify code or verbatim with a regex
containing three capture groups for the content before, between and
after the inline markers.

    for verbatim: "(^|[ .,;:\-?!({\"'])=(.*?)=([ .,;:\-?!)}\"']|$)"
    for code:     "(^|[ .,;:\-?!({\"'])~(.*?)~([ .,;:\-?!)}\"']|$)"
 
But they don't work together. In the example above I have to apply the
verbatim regex first to get it right.

If I applied the code regex first, it would not work: it would find
~code~ without knowing that it is surrounded by =verbatim=.

I don't know what my users will feed into my software: verbatim in code
or code in verbatim. So I have to figure out which regex to apply first.

How does Org solve this problem? I don't need a full working solution,
just an idea.

One approach I have in mind is to run both regexes separately and then
compare the results "somehow":

    Verbatim: ['This', ' ', 'is ~code~ in verbatim', ' ', 'text.']
    Code:     ['This =is', ' ', 'code', ' ', 'in verbatim= text.']

"Somehow"!
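For reference, both lists are exactly what Python's `re.split` returns for the two patterns, since `re.split` keeps the captured groups in its output. A minimal sketch, with the patterns written as raw strings (so the quotes need no backslashes) but otherwise unchanged:

```python
import re

# The two patterns from above: group 1 = delimiter before the marker,
# group 2 = contents, group 3 = delimiter after the marker.
VERBATIM = r"""(^|[ .,;:\-?!({"'])=(.*?)=([ .,;:\-?!)}"']|$)"""
CODE = r"""(^|[ .,;:\-?!({"'])~(.*?)~([ .,;:\-?!)}"']|$)"""

text = "This =is ~code~ in verbatim= text."

# re.split interleaves the plain-text pieces with the captured groups.
print(re.split(VERBATIM, text))
print(re.split(CODE, text))
```

The code split happily reports `code` even though it sits inside verbatim, so the two lists on their own do not say which reading is the intended one.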

Another approach I have in mind is something I would call a nested
regex: construct a regex pattern that looks for verbatim with code in
it, and the other way around, of course.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Inline markup: How does org identify nested code/verbatim?
  2023-01-29 18:05 Inline markup: How does org identify nested code/verbatim? c.buhtz
@ 2023-01-29 18:20 ` c.buhtz
  2023-01-30  2:29   ` Max Nikulin
  2023-01-30 14:56 ` Ihor Radchenko
  1 sibling, 1 reply; 5+ messages in thread
From: c.buhtz @ 2023-01-29 18:20 UTC (permalink / raw)
  To: emacs-orgmode

Let me add the nested-regex approach. I wouldn't call it a solution,
just an approach: nobody can understand that regex, and it is nearly
unmaintainable.

I hope for a more elegant solution.

This matches if we have code in verbatim:
    (?:^|[ .,;:\-?!({\"'])=.*?(?:^|[ .,;:\-?!({\"'])~.*?~(?:[ .,;:\-?!)}\"']|$).*?=(?:[ .,;:\-?!)}\"']|$)

This matches if we have verbatim in code:
    (?:^|[ .,;:\-?!({\"'])~.*?(?:^|[ .,;:\-?!({\"'])=.*?=(?:[ .,;:\-?!)}\"']|$).*?~(?:[ .,;:\-?!)}\"']|$)

If one of these matches, I know which of my "usual" patterns (the ones with capture groups that extract the content) I should apply first.

Just for testing (e.g. on regex101.com), here is the text I used:

This =is ~code~ in verbatim= text.
This =is usual verbatim= text.

This ~is =verbatim= in code~ text.
This ~is usual code~ text.
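A minimal Python check of the two nested patterns against the four sample lines. It assumes every `^|...|$` alternation is wrapped in a non-capturing group; without that grouping, a bare top-level `^` alternative matches the empty string at the start of any line, so every line would "match". `PRE` and `POST` are names introduced here only for readability.

```python
import re

# Delimiter sets from the patterns above, each alternation grouped.
PRE = r"""(?:^|[ .,;:\-?!({"'])"""   # before an opening marker
POST = r"""(?:[ .,;:\-?!)}"']|$)"""  # after a closing marker

CODE_IN_VERBATIM = re.compile(PRE + "=.*?" + PRE + "~.*?~" + POST + ".*?=" + POST)
VERBATIM_IN_CODE = re.compile(PRE + "~.*?" + PRE + "=.*?=" + POST + ".*?~" + POST)

samples = [
    "This =is ~code~ in verbatim= text.",
    "This =is usual verbatim= text.",
    "This ~is =verbatim= in code~ text.",
    "This ~is usual code~ text.",
]

for line in samples:
    print(line,
          "code-in-verbatim:", bool(CODE_IN_VERBATIM.search(line)),
          "verbatim-in-code:", bool(VERBATIM_IN_CODE.search(line)))
```

Only the first sample matches code-in-verbatim and only the third matches verbatim-in-code. Every further combination (bold in code, code in italics, ...) needs yet another pattern, which is where the maintainability problem comes from.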



* Re: Inline markup: How does org identify nested code/verbatim?
  2023-01-29 18:20 ` c.buhtz
@ 2023-01-30  2:29   ` Max Nikulin
  0 siblings, 0 replies; 5+ messages in thread
From: Max Nikulin @ 2023-01-30  2:29 UTC (permalink / raw)
  To: emacs-orgmode

On 30/01/2023 01:20, c.buhtz wrote:
> Please let me add the nested-regex-approach.

You should look for whichever markup starts first. The org-element parser
uses a "first wins" approach. Notice the following:

     /italics ~code/ verbatim~

is exported as

     <p>
     <i>italics ~code</i> verbatim~</p>

Notice that the closing italics marker prevents the code snippet from being recognized.
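A minimal sketch of "first wins" in Python: try every markup pattern from the current position and keep the match that starts earliest. The delimiter sets are the poster's approximation rather than Org's exact rules, only three markers are handled, and `earliest_markup` is a hypothetical helper, not part of org-element.

```python
import re

# Approximate delimiters (the poster's sets). Lookarounds keep them out of
# the match so they remain part of the surrounding plain text.
PRE = r"""(?:^|(?<=[ .,;:\-?!({"']))"""
POST = r"""(?=[ .,;:\-?!)}"']|$)"""

def markup(marker: str) -> re.Pattern:
    m = re.escape(marker)
    return re.compile(PRE + m + r"(.+?)" + m + POST)

PATTERNS = {"italic": markup("/"), "code": markup("~"), "verbatim": markup("=")}

def earliest_markup(text: str, pos: int = 0):
    """Return (kind, match) for the markup that starts first, or None."""
    best = None
    for kind, pattern in PATTERNS.items():
        match = pattern.search(text, pos)
        if match and (best is None or match.start() < best[1].start()):
            best = (kind, match)
    return best

kind, match = earliest_markup("/italics ~code/ verbatim~")
print(kind, repr(match.group(1)))
```

Italics starts at offset 0, before the `~` at offset 9, so it wins and the `~` inside it is just text. Searching again after the italics match finds nothing, because the remaining `~` is preceded by `m` and cannot open code; that is the same result as the HTML above.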




* Re: Inline markup: How does org identify nested code/verbatim?
  2023-01-29 18:05 Inline markup: How does org identify nested code/verbatim? c.buhtz
  2023-01-29 18:20 ` c.buhtz
@ 2023-01-30 14:56 ` Ihor Radchenko
  2023-01-30 23:36   ` Tom Gillespie
  1 sibling, 1 reply; 5+ messages in thread
From: Ihor Radchenko @ 2023-01-30 14:56 UTC (permalink / raw)
  To: c.buhtz; +Cc: emacs-orgmode

<c.buhtz@posteo.jp> writes:

> The point is myself I'm able to identify code or verbatim with regex
> including three catch groups for the content before, between and
> after the inline markers.
>
>     for verbatim: "(^|[ .,;:\-?!({\"'])=(.*?)=([ .,;:\-?!)}\"']|$)"
>     for code:     "(^|[ .,;:\-?!({\"'])~(.*?)~([ .,;:\-?!)}\"']|$)"
>  
> But they don't work together. In the example above I need to use the
> verbatim regex first to make it right. 

See https://orgmode.org/worg/org-syntax.html#Emphasis_Markers
Note that Org is not context-free. Within Org AST elements that can
contain objects, the first match "wins":
1. Org looks at the text and searches for the first matching object regexp.
2. Everything before the match is considered plain text.
3. Everything inside the match is considered the matched object and is then
   parsed recursively.
4. Go to (1).
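The four steps above can be sketched in Python as a loop that picks the earliest match, emits the text before it as plain text, and recurses into the contents only for objects that may contain other objects. The marker table, the delimiter sets and the name `parse` are assumptions for illustration; org-element itself recognises many more object types under stricter rules. Verbatim and code are treated as terminal here, so their contents are kept literally.

```python
import re

# Approximate delimiters (the poster's sets, not Org's exact definition).
PRE = r"""(?:^|(?<=[ .,;:\-?!({"']))"""
POST = r"""(?=[ .,;:\-?!)}"']|$)"""

def markup(marker):
    m = re.escape(marker)
    return re.compile(PRE + m + r"(.+?)" + m + POST)

# Hypothetical object table: name -> (pattern, parse contents recursively?)
OBJECTS = {
    "bold": (markup("*"), True),
    "italic": (markup("/"), True),
    "verbatim": (markup("="), False),  # terminal: contents stay literal
    "code": (markup("~"), False),      # terminal: contents stay literal
}

def parse(text):
    """First-match-wins parse into plain strings and (kind, children) tuples."""
    result, pos = [], 0
    while pos < len(text):
        # Step 1: find the object whose regexp matches earliest.
        best = None
        for kind, (pattern, recursive) in OBJECTS.items():
            m = pattern.search(text, pos)
            if m and (best is None or m.start() < best[1].start()):
                best = (kind, m, recursive)
        if best is None:
            result.append(text[pos:])  # no more objects: rest is plain text
            break
        kind, m, recursive = best
        if m.start() > pos:
            result.append(text[pos:m.start()])  # Step 2: plain text before
        inner = m.group(1)
        # Step 3: recurse into the contents unless the object is terminal.
        result.append((kind, parse(inner) if recursive else [inner]))
        pos = m.end()  # Step 4: continue after the match
    return result

print(parse("This =is ~code~ in verbatim= text."))
print(parse("a *bold /it/ text* b"))
```

The verbatim opener comes before the first `~`, so the tildes inside are never examined and no regex ordering is needed; in the second call, bold wins first and italics is then found inside its contents.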

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>



* Re: Inline markup: How does org identify nested code/verbatim?
  2023-01-30 14:56 ` Ihor Radchenko
@ 2023-01-30 23:36   ` Tom Gillespie
  0 siblings, 0 replies; 5+ messages in thread
From: Tom Gillespie @ 2023-01-30 23:36 UTC (permalink / raw)
  To: c.buhtz; +Cc: emacs-orgmode


In short, you cannot nest code in verbatim or verbatim in code
because they are both terminal (end of the line for nesting).
In fact you can't nest anything inside them by their very nature.

Nothing inside them can have special functionality; even
escape codes don't play well in that part of the grammar.

There is no way around this because you cannot nest inside
things that are by definition terminal. However, from your
examples it seems that you can get the effect you are looking
for using ~is~ =verbatim= ~in code~.



end of thread

Thread overview: 5+ messages
2023-01-29 18:05 Inline markup: How does org identify nested code/verbatim? c.buhtz
2023-01-29 18:20 ` c.buhtz
2023-01-30  2:29   ` Max Nikulin
2023-01-30 14:56 ` Ihor Radchenko
2023-01-30 23:36   ` Tom Gillespie

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs/org-mode.git
