emacs-orgmode@gnu.org archives
* Pandoc and nested emphases
@ 2022-02-18  0:47 Juan Manuel Macías
  2022-02-18 12:06 ` Max Nikulin
  2023-10-11 16:11 ` Max Nikulin
  0 siblings, 2 replies; 6+ messages in thread
From: Juan Manuel Macías @ 2022-02-18  0:47 UTC (permalink / raw)
  To: orgmode

Hi all,

Sorry in advance if this may sound too trivial, imprecise or naive: it's
just for my curiosity, as I've recently been doing some tests with Pandoc
and I've seen something that has caught my attention.

It is known that LaTeX-style nested emphases of the same category are
not possible in Org. For example, the following string does not export
to LaTeX as expected:

#+begin_src org :results latex replace
/lorem /ipsum/ dolor/
#+end_src

#+RESULTS:
#+begin_export latex
\emph{lorem /ipsum} dolor/
#+end_export

However, if you export to LaTeX with pandoc (v. 2.14.2), the result is
(to my surprise) correct:

#+begin_src sh :results latex
str="/lorem /ipsum/ dolor/"
pandoc -f org -t latex <<< "$str"
#+end_src

#+RESULTS:
#+begin_export latex
\emph{lorem \emph{ipsum} dolor}
#+end_export

If memory serves me, I think this was not possible before with Pandoc
(neither from Org nor from Markdown, but I insist that I don't know if
my memory is failing me too much :-)).

Anyway, I wonder if it would be possible for Org to somehow implement
some Pandoc procedure to be able to export nested emphases of the same
category.

Another (more abstract) question arises, although I am not an expert
in matters of grammar and specifications. If nested emphases of the same
category are not possible in Org, should this be understood as a bug or
a feature? What does it imply if an external parser, like Pandoc, parses
them just "fine"?

Best regards,

Juan Manuel



* Re: Pandoc and nested emphases
  2022-02-18  0:47 Pandoc and nested emphases Juan Manuel Macías
@ 2022-02-18 12:06 ` Max Nikulin
  2022-02-18 12:31   ` Juan Manuel Macías
  2023-10-11 16:11 ` Max Nikulin
  1 sibling, 1 reply; 6+ messages in thread
From: Max Nikulin @ 2022-02-18 12:06 UTC (permalink / raw)
  To: emacs-orgmode

On 18/02/2022 07:47, Juan Manuel Macías wrote:
> 
> Otherwise, if you export to LaTeX with pandoc (v. 2.14.2), the result is
> (to my surprise) correct:
> 
> str="/lorem /ipsum/ dolor/"
> pandoc -f org -t latex <<< $str
> \emph{lorem \emph{ipsum} dolor}

2.5-3build2 from Ubuntu-20.04 works in the same way.

I like such behavior:

echo "/lorem =ip/ sum= dolor/" | pandoc -f org -t latex
\emph{lorem \texttt{ip/\ sum} dolor}

I know at least one more person who will be happy as well:
https://list.orgmode.org/87pmtqp79s.fsf@web.de/T/#u 
mid:87pmtqp79s.fsf@web.de
(tracked as a confirmed bug at https://updates.orgmode.org/)

printf '/lorem\nipsum [[https://orgmode.org/,service][dolor]] ipsum/\n' \
   | pandoc -f org -t latex
\emph{lorem ipsum \href{https://orgmode.org/,service}{dolor} ipsum}

> Another (more abstract) doubt that arises, although I am not an expert
> in matters of grammar and specifications. If nested emphases of the same
> category are not possible in Org, should this be understood as a bug or
> a feature? What implication does it have if an external parser, like
> Pandoc, parses them just "fine"?

Nicolas Goaziou explicitly stated that current behavior is correct, see 
"[Patch] to correctly sort the items with emphasis marks in a list". 
Tue, 20 Apr 2021 22:37:31 +0200. mid:874kg0ae0k.fsf@nicolasgoaziou.fr
https://list.orgmode.org/874kg0ae0k.fsf@nicolasgoaziou.fr/

Nicolas confirmed it when I posted a similar example later in the 
following discussion:

Ihor Radchenko. c47b535bb origin/main org-element: Remove dependency on 
‘org-emphasis-regexp-components’
Thu, 18 Nov 2021 20:25:33 +0800. mid:87tug93b2a.fsf@localhost
https://list.orgmode.org/87tug93b2a.fsf@localhost/
> My intuition says that the current parser behaviour is not correct. It
> would make more sense to prioritise link over italics. However, it would
> require a major change in the parser - instead of a single pass, the
> parser may parse different types of objects sequentially.

Nicolas Goaziou. c47b535bb origin/main org-element: Remove dependency on 
‘org-emphasis-regexp-components’
Thu, 18 Nov 2021 13:35:19 +0100. mid:87y25l8wvs.fsf@nicolasgoaziou.fr
https://list.orgmode.org/87y25l8wvs.fsf@nicolasgoaziou.fr/
> I disagree. Priority should be given to the first object being started.
> This is, IMO, the only sane way to handle syntax.

And once more in response to my message:

Nicolas Goaziou. org parser and priorities of inline elements.
Sat, 27 Nov 2021 20:02:31 +0100. mid:87mtlppgl4.fsf@nicolasgoaziou.fr
https://list.orgmode.org/87mtlppgl4.fsf@nicolasgoaziou.fr/
> I don't see any incentive to change the order objects are parsed, once
> you know how Org does it. This is just a red herring. What is useful,
> however, is to fontify them the way Org sees them.

So formally this feature of pandoc is a bug (due to a different kind of
parser). This is the reason why a corpus of tests should exist in a format
that can be easily imported from various programming languages.
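
As a rough illustration of what such a language-neutral corpus could look like, here is a minimal sketch in Python. The JSON layout, the names `CORPUS`, `check` and `pandoc_convert` are all hypothetical (no such format exists in Org as far as this thread says); the two cases are the Org exporter results quoted in this thread, and `pandoc_convert` assumes a `pandoc` binary on PATH.

```python
import json
import subprocess

# Hypothetical corpus format: a JSON list of cases, each pairing an Org
# input with the LaTeX produced by Org's own exporter (from this thread).
CORPUS = r'''
[
  {"org": "/lorem /ipsum/ dolor/",
   "latex": "\\emph{lorem /ipsum} dolor/"},
  {"org": "/lorem /ipsum// /dolor/",
   "latex": "\\emph{lorem \\emph{ipsum}} \\emph{dolor}"}
]
'''


def check(convert, corpus_json=CORPUS):
    """Return (input, expected, got) for every case the converter fails."""
    failures = []
    for case in json.loads(corpus_json):
        got = convert(case["org"]).strip()
        if got != case["latex"]:
            failures.append((case["org"], case["latex"], got))
    return failures


def pandoc_convert(text):
    """Assumption: pandoc is installed; same flags as used in this thread."""
    result = subprocess.run(
        ["pandoc", "-f", "org", "-t", "latex"],
        input=text, capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling `check(pandoc_convert)` would then list the inputs on which pandoc disagrees with the reference output; any other implementation can be plugged in the same way, since JSON is readable from practically every language.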




* Re: Pandoc and nested emphases
  2022-02-18 12:06 ` Max Nikulin
@ 2022-02-18 12:31   ` Juan Manuel Macías
  2022-02-24 12:50     ` Max Nikulin
  0 siblings, 1 reply; 6+ messages in thread
From: Juan Manuel Macías @ 2022-02-18 12:31 UTC (permalink / raw)
  To: Max Nikulin; +Cc: orgmode

Hi Maxim,

Max Nikulin writes:

> So formally this feature of pandoc is a bug (due to a different kind of
> parser). It is the reason why a corpus of tests should exist in a
> format that can be easily imported from various programming languages.

Your conclusions seem logical to me. It may sound a bit surrealistic to
think that Pandoc is doing it wrong precisely for doing it "right", but...
if from Org's point of view this is not something specified in its
syntax, then here Pandoc makes a mistake parsing Org's syntax. All this
is very curious, indeed. I confess that I did not previously see the need
for such a corpus of tests very clearly, but this case has opened my mind.

Best regards,

Juan Manuel 



* Re: Pandoc and nested emphases
  2022-02-18 12:31   ` Juan Manuel Macías
@ 2022-02-24 12:50     ` Max Nikulin
  0 siblings, 0 replies; 6+ messages in thread
From: Max Nikulin @ 2022-02-24 12:50 UTC (permalink / raw)
  To: emacs-orgmode

On 18/02/2022 19:31, Juan Manuel Macías wrote:
> Max Nikulin writes:
> 
>> So formally this feature of pandoc is a bug (due to a different kind of
>> parser). It is the reason why a corpus of tests should exist in a
>> format that can be easily imported from various programming languages.
> 
> Your conclusions seem logical to me. It may sound a bit surrealistic to
> think that Pandoc is doing it wrong precisely for doing it "right", but...

Even pandoc is not perfect (at least not really consistent):

printf '%s' '- [[https://orgmode.org/][Lorem :: Ipsum]]' \
   | pandoc -f org -t html
<dl>
<dt>[[<a href="https://orgmode.org/">https://orgmode.org/</a>][Lorem</dt>
<dd>Ipsum]]
</dd>
</dl>

The default zero-width space workaround does not work for, e.g., code
snippets, since it would break the syntax of the target programming language:

printf '%s' '- src_haskell{monoidBSFold :: FilePath -> IO Counts}' \
   | pandoc -f org -t html

Examples are inspired by the following message:

Cash Weaver. Bug with exporting list with link item containing "::" to 
markdown. Sat, 12 Feb 2022 12:12:45 -0800.
https://list.orgmode.org/CABGRHLkLGXYgGNm4CXK_LjOTGTpsLO=5aWD=FyPd1aMy2QdBxw@mail.gmail.com




* Re: Pandoc and nested emphases
  2022-02-18  0:47 Pandoc and nested emphases Juan Manuel Macías
  2022-02-18 12:06 ` Max Nikulin
@ 2023-10-11 16:11 ` Max Nikulin
  2023-10-22 19:15   ` Juan Manuel Macías
  1 sibling, 1 reply; 6+ messages in thread
From: Max Nikulin @ 2023-10-11 16:11 UTC (permalink / raw)
  To: Juan Manuel Macías, orgmode

It was a long time ago...

On 18/02/2022 07:47, Juan Manuel Macías wrote:
> Otherwise, if you export to LaTeX with pandoc (v. 2.14.2), the result is
> (to my surprise) correct:
> 
> #+begin_src sh :results latex
> str="/lorem /ipsum/ dolor/"
> pandoc -f org -t latex <<< $str
> #+end_src
> 
> #+RESULTS:
> #+begin_export latex
> \emph{lorem \emph{ipsum} dolor}
> #+end_export

Nesting of the same emphasis style may be achieved with the Org parser as well:

(org-export-string-as
  "/lorem /ipsum// /dolor/"
  'latex t)
"\\emph{lorem \\emph{ipsum}} \\emph{dolor}
"

Namely, the closing markers must be placed directly next to each other
("//"). See the recent thread:
Tom Alexander to emacs-orgmode. Inconsistent text markup handling when 
double-nesting markers. Mon, 09 Oct 2023 19:02:18 -0400.
https://list.orgmode.org/dad964f5-c764-4dd5-9829-ca38e3fbeb0d@app.fastmail.com
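
To make the rule concrete, here is a toy model in Python. It is only a sketch, not Org's actual implementation: it handles the "/" marker alone, works on a single line, and the `PRE`/`POST` character sets are my reading of the Org syntax document (the names `PRE`, `POST` and `export_italic` are made up for this example). An opening marker takes the first "/" that qualifies as a closer, and the contents are then parsed recursively.

```python
PRE = set(" \t-({'\"")            # characters allowed before an opening marker
POST = set(" \t-.,;:!?')}[\"\\")  # characters allowed after a closing marker


def export_italic(text):
    """Toy model: turn /.../ into \\emph{...}; the first object started wins."""
    out, i, n = [], 0, len(text)
    while i < n:
        opens = (
            text[i] == "/"
            and (i == 0 or text[i - 1] in PRE)
            and i + 1 < n
            and not text[i + 1].isspace()
        )
        if opens:
            # Accept the first "/" that is a valid closer; never look for a
            # "better" match further to the right.
            for j in range(i + 2, n):
                if (
                    text[j] == "/"
                    and not text[j - 1].isspace()
                    and (j + 1 == n or text[j + 1] in POST)
                ):
                    inner = export_italic(text[i + 1 : j])
                    out.append("\\emph{" + inner + "}")
                    i = j + 1
                    break
            else:
                out.append("/")  # no closer found: keep the marker literally
                i += 1
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


print(export_italic("/lorem /ipsum/ dolor/"))    # \emph{lorem /ipsum} dolor/
print(export_italic("/lorem /ipsum// /dolor/"))  # \emph{lorem \emph{ipsum}} \emph{dolor}
```

In "/ipsum//" the first closing slash is followed by another "/", which is not in `POST`, so the outer emphasis runs up to the second slash and the inner pair is only recognised when the contents are parsed again.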




* Re: Pandoc and nested emphases
  2023-10-11 16:11 ` Max Nikulin
@ 2023-10-22 19:15   ` Juan Manuel Macías
  0 siblings, 0 replies; 6+ messages in thread
From: Juan Manuel Macías @ 2023-10-22 19:15 UTC (permalink / raw)
  To: Max Nikulin; +Cc: orgmode

Sorry for the late reply. I haven't had much time these days to
keep up with the list...

Max Nikulin writes:

> It was a long time ago...
>
> On 18/02/2022 07:47, Juan Manuel Macías wrote:
>> Otherwise, if you export to LaTeX with pandoc (v. 2.14.2), the result is
>> (to my surprise) correct:
>> #+begin_src sh :results latex
>> str="/lorem /ipsum/ dolor/"
>> pandoc -f org -t latex <<< $str
>> #+end_src
>> #+RESULTS:
>> #+begin_export latex
>> \emph{lorem \emph{ipsum} dolor}
>> #+end_export
>
> Nesting of the same emphasis style may be achieved with the Org parser as well:
>
> (org-export-string-as
>  "/lorem /ipsum// /dolor/"
>  'latex t)
> "\\emph{lorem \\emph{ipsum}} \\emph{dolor}
> "
>
> Namely closing markers should be combined. See the recent thread
> Tom Alexander to emacs-orgmode. Inconsistent text markup handling when
> double-nesting markers. Mon, 09 Oct 2023 19:02:18 -0400.
> https://list.orgmode.org/dad964f5-c764-4dd5-9829-ca38e3fbeb0d@app.fastmail.com

Interesting. So it should be understood that in Org syntax this dummy
title would be possible:

/Essays on Homer's /Odyssey//

==>

\emph{Essays on Homer's \emph{Odyssey}}

but not this other one:

/Essays on Homer's /Odyssey/. Part One/

==>

\emph{Essays on Homer's /Odyssey}. Part One/

In any case, as Ihor commented in the thread you link, and as you also
commented in the other old thread of my discovery about Pandoc, this
behavior of Pandoc should be understood as a bug since it does not
respect the Org syntax. Although the previous example leaves me a little
perplexed.

Actually, I think nested emphases of equal category are not that
necessary. In general, putting something like \textbf{lorem
\textbf{ipsum} dolor} in LaTeX doesn't make much sense, because \textbf
is not a switch. It would only make sense with the LaTeX \emph command,
which, contrary to what some may think, is not used to put text in
italics (that's what \textit is for), but to emphasize a text according
to the context: if the context is in normal font, italics; and vice
versa. Unless I am missing something, I think there is nothing similar in
either html or odt/docx.
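
The difference can be sketched with a tiny Python model (purely illustrative; `emph_shapes` is a made-up name and the model assumes LaTeX's default behaviour, where \emph inside italic text switches back to upright):

```python
def emph_shapes(depth):
    """Font shape at each \\emph nesting level, starting from upright text."""
    shapes = ["upright"]
    for _ in range(depth):
        # Each nested \emph flips the shape of its surrounding context.
        shapes.append("italic" if shapes[-1] == "upright" else "upright")
    return shapes


print(emph_shapes(2))  # ['upright', 'italic', 'upright']
```

So in \emph{lorem \emph{ipsum} dolor} the inner word comes out upright again, whereas nesting \textbf would leave everything bold at every level.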


-- 
Juan Manuel Macías


