emacs-orgmode@gnu.org archives
* Multiple underscores crash org latex export; other exporters survive
@ 2016-12-03 19:40 Scott Otterson
  2016-12-03 21:24 ` Scott Randby
  0 siblings, 1 reply; 19+ messages in thread
From: Scott Otterson @ 2016-12-03 19:40 UTC (permalink / raw)
  To: Emacs-orgmode@gnu.org


[-- Attachment #1.1: Type: text/plain, Size: 558 bytes --]

When an org file contains a string with more than one underscore, the
orgmode export result will crash latex (example attached).   On the other
hand, the org html export does finish successfully, and while the result is
odd, it's odd in a way that makes the problem visible and easy to identify.

Many people have orgfiles with heavily underscored code snippets buried
deep inside.  To them, the latex crashes are probably as mysterious as they
were to me.  So I'd like to suggest that the org latex exporter adopt
something like the org html exporter behavior.

[-- Attachment #2: multiple_underscores.org --]
[-- Type: application/octet-stream, Size: 312 bytes --]

This string below will crash latex:

a_variable_deleteThisAndItWorks

Many people have orgfiles with heavily underscored code snippets buried deep inside.  The latex crashes are probably mysterious.

html export of multiple underscores is odd, but in a way that makes the problem visible and easy to debug.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Multiple underscores crash org latex export; other exporters survive
  2016-12-03 19:40 Multiple underscores crash org latex export; other exporters survive Scott Otterson
@ 2016-12-03 21:24 ` Scott Randby
  2016-12-03 22:31   ` Charles C. Berry
  0 siblings, 1 reply; 19+ messages in thread
From: Scott Randby @ 2016-12-03 21:24 UTC (permalink / raw)
  To: emacs-orgmode

On 12/03/2016 02:40 PM, Scott Otterson wrote:
> When an org file contains a string with more than one underscore, the
> orgmode export result will crash latex (example attached).   On the
> other hand, the org html export does finish successfully, and while
> the result is odd, it's odd in a way that makes the problem visible and easy
> to identify.

I exported your file using LaTeX export without trouble. The result does
look odd, but that is because the string you gave is missing the braces
needed to make it export properly. The output messages warn that the
string contains a double subscript, so the problem is with the string
and not Org's LaTeX export.

Your string: a_variable_deleteThisAndItWorks

Properly formed string: a_{variable_{deleteThisAndItWorks}}

Scott Randby

> 
> Many people have orgfiles with heavily underscored code snippets buried
> deep inside.  To them, the latex crashes are probably as mysterious as
> they were to me.  So I'd like to suggest that the org latex exporter
> adopt something like the org html exporter behavior.
> 
> 


* Re: Multiple underscores crash org latex export; other exporters survive
  2016-12-03 21:24 ` Scott Randby
@ 2016-12-03 22:31   ` Charles C. Berry
  2016-12-04  4:53     ` Scott Randby
  0 siblings, 1 reply; 19+ messages in thread
From: Charles C. Berry @ 2016-12-03 22:31 UTC (permalink / raw)
  To: Scott Randby; +Cc: emacs-orgmode

On Sat, 3 Dec 2016, Scott Randby wrote:

> On 12/03/2016 02:40 PM, Scott Otterson wrote:
>> When an org file contains a string with more than one underscore, the
>> orgmode export result will crash latex (example attached).   On the
>> other hand, the org html export does finish successfully, and while
>> the result is odd, it's odd in a way that makes the problem visible and easy
>> to identify.
>
> I exported your file using LaTeX export without trouble. The result does
> look odd, but that is because the string you gave is missing the braces
> needed to make it export properly. The output messages warn that the
> string contains a double subscript, so the problem is with the string
> and not Org's LaTeX export.
>
> Your string: a_variable_deleteThisAndItWorks
>
> Properly formed string: a_{variable_{deleteThisAndItWorks}}

I don't think the OP wanted double subscripts.  If the OP really wanted 
"heavily underscored code snippets", something like:

Code snippet:  ~a_variable_deleteThisAndItWorks~

seems like the way to go.

See (info "(org) Emphasis and monospace")

That way the latex is

: \texttt{a\_variable\_deleteThisAndItWorks}

and that idiom would properly render html as

: <code>a_variable_deleteThisAndItWorks</code>

rather than

: a<sub>variable</sub><sub>deleteThisAndItWorks</sub></p>

as in the OP's case. The latter is valid html, but doesn't seem quite like the 
right thing to do for code snippets.
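
A minimal sketch of two ways to keep the underscores literal in an Org file. The first is the code markup suggested above. The second is the "^:{}" export option, under which only braced forms such as a_{b} are read as subscripts. The sample sentences are made up for illustration.

```org
#+OPTIONS: ^:{}

# Code markup keeps every underscore literal in all exporters.
Code snippet: ~a_variable_deleteThisAndItWorks~

# With ^:{} in effect, a bare underscore is no longer a subscript marker.
Plain text: a_variable_deleteThisAndItWorks

# A subscript now has to be requested with braces.
Subscript: a_{variable}
```

Using "^:nil" instead turns off subscript and superscript interpretation altogether, which may suit files that never use them.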

HTH,

Chuck

>
> Scott Randby
>
>>
>> Many people have orgfiles with heavily underscored code snippets buried
>> deep inside.  To them, the latex crashes are probably as mysterious as
>> they were to me.  So I'd like to suggest that the org latex exporter
>> adopt something like the org html exporter behavior.
>>
>>
>
>


* Re: Multiple underscores crash org latex export; other exporters survive
  2016-12-03 22:31   ` Charles C. Berry
@ 2016-12-04  4:53     ` Scott Randby
  2016-12-04 10:13       ` Nicolas Goaziou
  0 siblings, 1 reply; 19+ messages in thread
From: Scott Randby @ 2016-12-04  4:53 UTC (permalink / raw)
  To: Charles C. Berry; +Cc: emacs-orgmode

On 12/03/2016 05:31 PM, Charles C. Berry wrote:
> On Sat, 3 Dec 2016, Scott Randby wrote:
> 
>> On 12/03/2016 02:40 PM, Scott Otterson wrote:
>>> When an org file contains a string with more than one underscore, the
>>> orgmode export result will crash latex (example attached).   On the
>>> other hand, the org html export does finish successfully, and while
>>> the result is odd, it's odd in a way that makes the problem visible and easy
>>> to identify.
>>
>> I exported your file using LaTeX export without trouble. The result does
>> look odd, but that is because the string you gave is missing the braces
>> needed to make it export properly. The output messages warn that the
>> string contains a double subscript, so the problem is with the string
>> and not Org's LaTeX export.
>>
>> Your string: a_variable_deleteThisAndItWorks
>>
>> Properly formed string: a_{variable_{deleteThisAndItWorks}}
> 
> I don't think the OP wanted double subscripts.  If the OP really wanted
> "heavily underscored code snippets", something like:
> 
> Code snippet:  ~a_variable_deleteThisAndItWorks~
> 
> seems like the way to go.

It seems I misunderstood the original question.

There is an interesting issue here. I sometimes want to use ~ in a code
snippet, so I can't use ~code snippet~. Yet,

Org code: \verb@a_variable_deleteThisAndItWorks@

Exported LaTeX: \verb@a\(_{\text{variable}}_{\text{deleteThisAndItWorks}}\)@

The exported LaTeX is not what we want. Instead,

Org code: #+latex:\verb@a_variable_deleteThisAndItWorks@

Alternative: @@latex:\verb@a_variable_deleteThisAndItWorks@@@

Exported LaTeX: \verb@a_variable_deleteThisAndItWorks@

I've wondered why \verb isn't exported correctly without specifying it
as literal LaTeX, but I don't have the expertise to be able to determine
why.

Scott


* Re: Multiple underscores crash org latex export; other exporters survive
  2016-12-04  4:53     ` Scott Randby
@ 2016-12-04 10:13       ` Nicolas Goaziou
  2016-12-04 16:03         ` Scott Randby
                           ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Nicolas Goaziou @ 2016-12-04 10:13 UTC (permalink / raw)
  To: Scott Randby; +Cc: emacs-orgmode, Charles C. Berry

Hello,

Scott Randby <srandby@gmail.com> writes:

> There is an interesting issue here. I sometimes want to use ~ in a code
> snippet, so I can't use ~code snippet~. Yet,

Indeed, this was discussed in this ML. We need some escape character in
Org. A general escape character is a bit ambitious, and not necessarily
useful, but we could introduce one specifically for verbatim and code
markers, much like in macros and verbatim blocks, e.g.

  ~some\~code\=with special\\ characters~

There is a design decision involved: what character can be escaped? It
could be anything, or be limited to "~" for code and "=" for verbatim
markers. For example macros limit escape-able characters to "," and "\".
This makes the contents easier to read, but the rule is inconsistent.

Thoughts?

> Org code: \verb@a_variable_deleteThisAndItWorks@
>
> Exported LaTeX: \verb@a\(_{\text{variable}}_{\text{deleteThisAndItWorks}}\)@
>
> The exported LaTeX is not what we want. Instead,
>
> Org code: #+latex:\verb@a_variable_deleteThisAndItWorks@
>
> Alternative: @@latex:\verb@a_variable_deleteThisAndItWorks@@@
>
> Exported LaTeX: \verb@a_variable_deleteThisAndItWorks@
>
> I've wondered why \verb isn't exported correctly without specifying it
> as literal LaTeX,

It's because Org recognizes LaTeX commands only if they are followed by
a blank character, the end of buffer, or "{}", which is not the case
with \verb@...@.

Regards,

-- 
Nicolas Goaziou


* Re: Multiple underscores crash org latex export; other exporters survive
  2016-12-04 10:13       ` Nicolas Goaziou
@ 2016-12-04 16:03         ` Scott Randby
  2016-12-05  8:36         ` Scott Otterson
  2016-12-05 17:30         ` Charles C. Berry
  2 siblings, 0 replies; 19+ messages in thread
From: Scott Randby @ 2016-12-04 16:03 UTC (permalink / raw)
  To: emacs-orgmode

On 12/04/2016 05:13 AM, Nicolas Goaziou wrote:
> Scott Randby <srandby@gmail.com> writes:
> 
>> There is an interesting issue here. I sometimes want to use ~ in a code
>> snippet, so I can't use ~code snippet~. Yet,
> 
> Indeed, this was discussed in this ML. We need some escape character in
> Org. A general escape character is a bit ambitious, and not necessarily
> useful, but we could introduce one specifically for verbatim and code
> markers, much like in macros and verbatim blocks, e.g.
> 
>   ~some\~code\=with special\\ characters~
> 
> There is a design decision involved: what character can be escaped? It
> could be anything, or limit to "~" for code and "=" for verbatim
> markers. For example macros limit escape-able characters to "," and "\".
> This makes the contents easier to read, but the rule is inconsistent.
> 
> Thoughts?

I don't think I have a good enough technical understanding of the issue
to provide constructive thoughts. I've adapted to using Org markup when
it works and specifying literal LaTeX when necessary. I don't need to
specify literal LaTeX very often anyway.

>> I've wondered why \verb isn't exported correctly without specifying it
>> as literal LaTeX,
> 
> It's because Org recognizes LaTeX commands only if they are followed by
> a blank character, the end of buffer, or "{}", which is not the case
> with \verb@...@.

Now I understand. Thanks.

Scott


* Re: Multiple underscores crash org latex export; other exporters survive
  2016-12-04 10:13       ` Nicolas Goaziou
  2016-12-04 16:03         ` Scott Randby
@ 2016-12-05  8:36         ` Scott Otterson
  2016-12-06  1:33           ` Scott Randby
  2016-12-05 17:30         ` Charles C. Berry
  2 siblings, 1 reply; 19+ messages in thread
From: Scott Otterson @ 2016-12-05  8:36 UTC (permalink / raw)
  To: Scott Randby, Charles C. Berry, Emacs-orgmode@gnu.org

[-- Attachment #1: Type: text/plain, Size: 2208 bytes --]

Yes, there's a general question of how to escape multiple underscores.

But there's a bigger question too:  Should an org-doc that runs fine in
other exporters cause a messy-to-debug crash when it's exported to Latex?
Is that the Pandoc-like behavior that orgmode seems to be aiming for?

I love org-mode.  For years, I've used it as a project organizer,
brainstorming tool, and extremely versatile notekeeper.  I've already got a
big investment in it, so I'll spend the time to track down this kind of
problem.

But I'd guess that such unexpected Latex crashes have driven new users back
to Word or whatever.

On Sun, Dec 4, 2016 at 11:13 AM, Nicolas Goaziou <mail@nicolasgoaziou.fr>
wrote:

> Hello,
>
> Scott Randby <srandby@gmail.com> writes:
>
> > There is an interesting issue here. I sometimes want to use ~ in a code
> > snippet, so I can't use ~code snippet~. Yet,
>
> Indeed, this was discussed in this ML. We need some escape character in
> Org. A general escape character is a bit ambitious, and not necessarily
> useful, but we could introduce one specifically for verbatim and code
> markers, much like in macros and verbatim blocks, e.g.
>
>   ~some\~code\=with special\\ characters~
>
> There is a design decision involved: what character can be escaped? It
> could be anything, or limit to "~" for code and "=" for verbatim
> markers. For example macros limit escape-able characters to "," and "\".
> This makes the contents easier to read, but the rule is inconsistent.
>
> Thoughts?
>
> > Org code: \verb@a_variable_deleteThisAndItWorks@
> >
> > Exported LaTeX: \verb@a\(_{\text{variable}}_{\text{deleteThisAndItWorks}}\)@
> >
> > The exported LaTeX is not what we want. Instead,
> >
> > Org code: #+latex:\verb@a_variable_deleteThisAndItWorks@
> >
> > Alternative: @@latex:\verb@a_variable_deleteThisAndItWorks@@@
> >
> > Exported LaTeX: \verb@a_variable_deleteThisAndItWorks@
> >
> > I've wondered why \verb isn't exported correctly without specifying it
> > as literal LaTeX,
>
> It's because Org recognizes LaTeX commands only if they are followed by
> a blank character, the end of buffer, or "{}", which is not the case
> with \verb@...@.
>
> Regards,
>
> --
> Nicolas Goaziou
>
>


* Re: Multiple underscores crash org latex export; other exporters survive
  2016-12-04 10:13       ` Nicolas Goaziou
  2016-12-04 16:03         ` Scott Randby
  2016-12-05  8:36         ` Scott Otterson
@ 2016-12-05 17:30         ` Charles C. Berry
  2 siblings, 0 replies; 19+ messages in thread
From: Charles C. Berry @ 2016-12-05 17:30 UTC (permalink / raw)
  To: Nicolas Goaziou; +Cc: Scott Randby, emacs-orgmode

On Sun, 4 Dec 2016, Nicolas Goaziou wrote:

> Hello,
>
> Scott Randby <srandby@gmail.com> writes:
>
>> There is an interesting issue here. I sometimes want to use ~ in a code
>> snippet, so I can't use ~code snippet~. Yet,
>
> Indeed, this was discussed in this ML. We need some escape character in
> Org. A general escape character is a bit ambitious, and not necessarily
> useful, but we could introduce one specifically for verbatim and code
> markers, much like in macros and verbatim blocks, e.g.
>
>  ~some\~code\=with special\\ characters~
>
> There is a design decision involved: what character can be escaped? It
> could be anything, or limit to "~" for code and "=" for verbatim
> markers. For example macros limit escape-able characters to "," and "\".
> This makes the contents easier to read, but the rule is inconsistent.
>
> Thoughts?

I do not think this is needed.

The existing markup is simple and useful, and it complicates neither the 
appearance of the org buffer nor editing it.

More complicated inline markups can be rendered with export snippets, 
macros, and/or inline src blocks. For example:

--
#+MACRO: code src_code[:eval no :exports code]{$1}

{{{code(y~x)}}}
--

will handle many situations.

If one needs to render something with embedded commas and/or backslashes 
like

: y~x(z,w)
: cat("here is a newline\n")

then

: src_code{ y~x(z,w) }
: src_code{ cat("here is a newline\n") }

works if `org-babel-default-header-args:code' is

: '((:eval . "no")(:exports . "code"))
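
A per-file variant of the same idea, as a rough sketch: it assumes that file-level "header-args:LANG" properties are honored for inline source blocks in your Org version, and it reuses the placeholder language name "code" from above. The sentence around the snippets is invented for illustration.

```org
#+PROPERTY: header-args:code :eval no :exports code

# Tildes, commas, and backslashes pass through an inline src block untouched.
A formula such as src_code{y~x(z,w)} or a call such as
src_code{cat("here is a newline\n")} is then exported as code, never evaluated.
```

This keeps the setting inside the document, so the file should export the same way without changes to the Emacs configuration.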

HTH,

Chuck


* Re: Multiple underscores crash org latex export; other exporters survive
  2016-12-05  8:36         ` Scott Otterson
@ 2016-12-06  1:33           ` Scott Randby
       [not found]             ` <CAPY3P0QcpouDjmNpmaY6FzNSNjZyHAGufrbVtSuBzPxr6mUAzg@mail.gmail.com>
  2016-12-06 23:50             ` Nicolas Goaziou
  0 siblings, 2 replies; 19+ messages in thread
From: Scott Randby @ 2016-12-06  1:33 UTC (permalink / raw)
  To: Scott Otterson, Emacs-orgmode@gnu.org

On 12/05/2016 03:36 AM, Scott Otterson wrote:
> Yes, there's a general question of how to escape multiple underscores.  
> 
> But there's a bigger question too:  Should an org-doc that runs fine in
> other exporters cause a messy-to-debug crash when it's exported to
> Latex?  Is that the Pandoc-like behavior that orgmode seems to be aiming
> for?
> 
> I love org-mode.  For years, I've used it as a project organizer,
> brainstorming tool, and extremely versatile notekeeper.  I've already
> got a big investment in it, so I'll spend the time to track down this
> kind of problem.  
> 
> But I'd guess that such unexpected Latex crashes have driven new users
> back to Word or whatever.

I don't think you can blame Org for the crashes you've experienced. Org
exports the string "a_variable_deleteThisAndItWorks" to
"a\(_{\text{variable}}_{\text{deleteThisAndItWorks}}\)" which is not a
valid LaTeX expression. This invalid output is not the fault of Org, it
is the fault of the input string. Since there are different ways of
interpreting "a_variable_deleteThisAndItWorks," you have to tell Org
which interpretation you want it to make when you export to LaTeX. This
means additional markup of the string is required.

That the exported LaTeX file crashes your installation of pdfTeX (or
whatever TeX engine you use) when you process it is not the fault of
Org. The exported LaTeX file contains an invalid LaTeX expression that
came from an ambiguous string, so pdfTeX shows an error when processing
the LaTeX file. Using additional markup of the string so that Org
exports it as a valid LaTeX expression is the solution to the problem.

Scott Randby


* Re: Multiple underscores crash org latex export; other exporters survive
       [not found]             ` <CAPY3P0QcpouDjmNpmaY6FzNSNjZyHAGufrbVtSuBzPxr6mUAzg@mail.gmail.com>
@ 2016-12-06 18:18               ` Scott Randby
  0 siblings, 0 replies; 19+ messages in thread
From: Scott Randby @ 2016-12-06 18:18 UTC (permalink / raw)
  To: Scott Otterson, emacs-orgmode

On 12/06/2016 02:19 AM, Scott Otterson wrote:
> I understand that orgmode is literally copying Latex; I am suggesting
> that orgmode should do more than copy.

Org only literally copies some expressions. The expression "A_B_C" is
not a LaTeX expression and it cannot be literally copied when exporting
to LaTeX. Instead, Org must export "A_B_C" to a LaTeX expression. In
this case, Org exports to "A\(_{\text{B}}_{\text{C}}\)" because in LaTeX
the underscore character is used for subscripts in mathematical
expressions. The exported string is not a valid LaTeX expression because
the original string is incomplete. If you don't want Org LaTeX export to
export the underscore character as a subscript, then you need to tell Org
so with additional markup such as "~A_B_C~" or
something similar. As someone who relies on Org's LaTeX exporting
capability, I would not be in favor of changing how Org exports a
non-LaTeX expression containing underscores.

> 
> This is for the reasons I gave: fixing export problems for one
> export type (usually Latex) breaks it for other export types; new users,
> etc.

I don't think it is possible to have a markup scheme that enables one to
perfectly export every document to every possible format. For instance,
LaTeX and HTML were designed to meet very different needs. The
intersection of these two markup languages is far from the union of
them. Org does a fantastic job of taking care of the intersection, and
it also includes wonderful tools that allow users to deal with instances
outside of the intersection.

I generally avoid exporting an Org file to both LaTeX and HTML (the two
export formats I use) unless the file is very basic. It is too much work
to make a file that exports to both formats well. But I can't blame Org
for that problem. It is the differing natures and purposes of LaTeX and
HTML that cause the trouble.

I'm not saying that Org export is as good as it can be. Certain aspects
of the default LaTeX export could be improved to help new users in my
opinion, but I haven't had the time to suggest those improvements.

I sympathize with the problems some new users may face when exporting an
Org file to various formats. I have friends who moved to Org but had
trouble with PDF export because they had no knowledge of LaTeX or
understanding of TeX distributions. Installing Emacs was a challenge for
some of my friends. The only solution I've been able to devise is to
provide as much help as possible and to encourage my friends to
subscribe to this mailing list.

Scott Randby

> 
> What do you think of those points?
> 
> On Tue, Dec 6, 2016 at 2:33 AM, Scott Randby <srandby@gmail.com
> <mailto:srandby@gmail.com>> wrote:
> 
>     On 12/05/2016 03:36 AM, Scott Otterson wrote:
>     > Yes, there's a general question of how to escape multiple underscores.
>     >
>     > But there's a bigger question too:  Should an org-doc that runs fine in
>     > other exporters cause a messy-to-debug crash when it's exported to
>     > Latex?  Is that the Pandoc-like behavior that orgmode seems to be aiming
>     > for?
>     >
>     > I love org-mode.  For years, I've used it as a project organizer,
>     > brainstorming tool, and extremely versatile notekeeper.  I've already
>     > got a big investment in it, so I'll spend the time to track down this
>     > kind of problem.
>     >
>     > But I'd guess that such unexpected Latex crashes have driven new users
>     > back to Word or whatever.
> 
>     I don't think you can blame Org for the crashes you've experienced. Org
>     exports the string "a_variable_deleteThisAndItWorks" to
>     "a\(_{\text{variable}}_{\text{deleteThisAndItWorks}}\)" which is not a
>     valid LaTeX expression. This invalid output is not the fault of Org, it
>     is the fault of the input string. Since there are different ways of
>     interpreting "a_variable_deleteThisAndItWorks," you have to tell Org
>     which interpretation you want it to make when you export to LaTeX. This
>     means additional markup of the string is required.
> 
>     That the exported LaTeX file crashes your installation of pdfTeX (or
>     whatever TeX engine you use) when you process it is not the fault of
>     Org. The exported LaTeX file contains an invalid LaTeX expression that
>     came from an ambiguous string, so pdfTeX shows an error when processing
>     the LaTeX file. Using additional markup of the string so that Org
>     exports it as a valid LaTeX expression is the solution to the problem.
> 
>     Scott Randby
> 
> 


* Re: Multiple underscores crash org latex export; other exporters survive
  2016-12-06  1:33           ` Scott Randby
       [not found]             ` <CAPY3P0QcpouDjmNpmaY6FzNSNjZyHAGufrbVtSuBzPxr6mUAzg@mail.gmail.com>
@ 2016-12-06 23:50             ` Nicolas Goaziou
  2016-12-07  2:06               ` Scott Randby
  1 sibling, 1 reply; 19+ messages in thread
From: Nicolas Goaziou @ 2016-12-06 23:50 UTC (permalink / raw)
  To: Scott Randby; +Cc: Emacs-orgmode@gnu.org, Scott Otterson

Hello,

Scott Randby <srandby@gmail.com> writes:

> I don't think you can blame Org for the crashes you've experienced. Org
> exports the string "a_variable_deleteThisAndItWorks" to
> "a\(_{\text{variable}}_{\text{deleteThisAndItWorks}}\)" which is not a
> valid LaTeX expression. This invalid output is not the fault of Org, it
> is the fault of the input string. Since there are different ways of
> interpreting "a_variable_deleteThisAndItWorks," you have to tell Org
> which interpretation you want it to make when you export to LaTeX. This
> means additional markup of the string is required.
>
> That the exported LaTeX file crashes your installation of pdfTeX (or
> whatever TeX engine you use) when you process it is not the fault of
> Org. The exported LaTeX file contains an invalid LaTeX expression that
> came from an ambiguous string, so pdfTeX shows an error when processing
> the LaTeX file. Using additional markup of the string so that Org
> exports it as a valid LaTeX expression is the solution to the problem.

Even though the OP used an ambiguous string, I don't think Org should
generate invalid LaTeX.

I fixed it in maint. Basically, a_b_c is translated as a\(_b\)\(_c\).

Regards,

-- 
Nicolas Goaziou


* Re: Multiple underscores crash org latex export; other exporters survive
  2016-12-06 23:50             ` Nicolas Goaziou
@ 2016-12-07  2:06               ` Scott Randby
  2016-12-07 11:21                 ` Nicolas Goaziou
  0 siblings, 1 reply; 19+ messages in thread
From: Scott Randby @ 2016-12-07  2:06 UTC (permalink / raw)
  To: Nicolas Goaziou, emacs-orgmode

On 12/06/2016 06:50 PM, Nicolas Goaziou wrote:
> Hello,
> 
> Scott Randby <srandby@gmail.com> writes:
> 
>> I don't think you can blame Org for the crashes you've experienced. Org
>> exports the string "a_variable_deleteThisAndItWorks" to
>> "a\(_{\text{variable}}_{\text{deleteThisAndItWorks}}\)" which is not a
>> valid LaTeX expression. This invalid output is not the fault of Org, it
>> is the fault of the input string. Since there are different ways of
>> interpreting "a_variable_deleteThisAndItWorks," you have to tell Org
>> which interpretation you want it to make when you export to LaTeX. This
>> means additional markup of the string is required.
>>
>> That the exported LaTeX file crashes your installation of pdfTeX (or
>> whatever TeX engine you use) when you process it is not the fault of
>> Org. The exported LaTeX file contains an invalid LaTeX expression that
>> came from an ambiguous string, so pdfTeX shows an error when processing
>> the LaTeX file. Using additional markup of the string so that Org
>> exports it as a valid LaTeX expression is the solution to the problem.
> 
> Even though the OP used an ambiguous string, I don't think Org should
> generate invalid LaTeX.
> 
> I fixed it in maint. Basically, a_b_c is translated as a\(_b\)\(_c\).

With this solution, "a" will be exported as text while "b" and "c" will
be exported as a single subscript using the math font. I would prefer
that a_b_c be translated as \(a_{b_{c}}\).

Scott Randby

> 
> Regards,
> 


* Re: Multiple underscores crash org latex export; other exporters survive
  2016-12-07  2:06               ` Scott Randby
@ 2016-12-07 11:21                 ` Nicolas Goaziou
  2016-12-10 23:24                   ` Scott Randby
  0 siblings, 1 reply; 19+ messages in thread
From: Nicolas Goaziou @ 2016-12-07 11:21 UTC (permalink / raw)
  To: Scott Randby; +Cc: emacs-orgmode

Hello,

Scott Randby <srandby@gmail.com> writes:

> On 12/06/2016 06:50 PM, Nicolas Goaziou wrote:
>> I fixed it in maint. Basically, a_b_c is translated as a\(_b\)\(_c\).
>
> With this solution, "a" will be exported as text while "b" and "c" will
> be exported as a single subscript using the math font.

Not at all. I simplified the output for clarity but the real generated
code is

  a\(_{\text{b}}\)\(_{\text{c}}\)

> I would prefer that a_b_c be translated as \(a_{b_{c}}\).

As you pointed out, the string is ambiguous anyway. The point of the fix
is to generate valid LaTeX code. You can obtain the above unambiguously
with

  a_{b_c}
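
As a sketch of how the different readings can be spelled out in an Org file (the groupings follow what is described in this thread; the exact exported LaTeX may differ between Org versions):

```org
# Ambiguous: two subscript markers in a row, no grouping given.
a_b_c

# Nested subscript: c is a subscript of b, which is a subscript of a.
a_{b_c}

# Single subscript containing the two letters "bc".
a_{bc}

# No subscripts at all: an identifier with literal underscores.
~a_b_c~
```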

Regards,

-- 
Nicolas Goaziou


* Re: Multiple underscores crash org latex export; other exporters survive
  2016-12-07 11:21                 ` Nicolas Goaziou
@ 2016-12-10 23:24                   ` Scott Randby
  2016-12-11  0:05                     ` Nicolas Goaziou
  0 siblings, 1 reply; 19+ messages in thread
From: Scott Randby @ 2016-12-10 23:24 UTC (permalink / raw)
  To: emacs-orgmode, Nicolas Goaziou

[-- Attachment #1: Type: text/plain, Size: 2426 bytes --]

On 12/07/2016 06:21 AM, Nicolas Goaziou wrote:
> Hello,
> 
> Scott Randby <srandby@gmail.com> writes:
> 
>> On 12/06/2016 06:50 PM, Nicolas Goaziou wrote:
>>> I fixed it in maint. Basically, a_b_c is translated as a\(_b\)\(_c\).
>>
>> With this solution, "a" will be exported as text while "b" and "c" will
>> be exported as a single subscript using the math font.
> 
> Not at all. I simplified the output for clarity but the real generated
> code is
> 
>   a\(_{\text{b}}\)\(_{\text{c}}\)
> 
>> I would prefer that a_b_c be translated as \(a_{b_{c}}\).
> 
> As you pointed out, the string is ambiguous anyway. The point of the fix
> is to generate valid LaTeX code. You can obtain the above unambiguously
> with
> 
>   a_{b_c}

I have a few comments about the choice you've made for the translation
of underscores that do not appear in \(...\) or $...$ expressions. I've
attached a PDF to help explain my points.

First, a_b_c and a_bc give nearly identical results when processed
except that the spacing between letters in the subscript coming from
a_b_c is wider than that in the subscript coming from a_bc. The spacing
issue can be seen more clearly by comparing the processed versions of
a_b_c_d and a_bcd. The wider spacing looks funny when compared to
surrounding text.

Second, I've been considering the issue of what the primary meaning of
a_b_c should be when it is used in a file. Since underscores are used in
LaTeX to designate subscripts in mathematical expressions, it is clear
to me that a_b_c should mean a_(b_c) instead of a_(bc). Similarly, a^b^c
should mean a^(b^c) instead of a^(bc). I really can't think of any case
when I would type in a_b_c in order to get a_(bc).

Finally, I question the use of text style instead of math style when
a_b, a^b, or similar expressions are exported to LaTeX. TeX was designed
to produce beautiful typeset mathematics, and the math style is an
essential part of that beauty. Math style is also the expected way of
presenting mathematical expressions in all of the professional journals
that I read and use. New users may not be aware of professional
standards, but we could help them be aware by translating a_b_c as
\(a_{b_{c}}\) exactly. Similarly a^b^c would be translated as
\(a^{b^{c}}\) exactly.

However, the translation of a_b_c doesn't matter a whole lot to me since
I always use \(a_{b_{c}}\) or $a_{b_{c}}$ in my Org documents.

Scott Randby

> 
> Regards,
> 

[-- Attachment #2: Org-Subscripts-Superscripts.pdf --]
[-- Type: application/pdf, Size: 74654 bytes --]


* Re: Multiple underscores crash org latex export; other exporters survive
  2016-12-10 23:24                   ` Scott Randby
@ 2016-12-11  0:05                     ` Nicolas Goaziou
  2016-12-12  3:37                       ` Scott Randby
  0 siblings, 1 reply; 19+ messages in thread
From: Nicolas Goaziou @ 2016-12-11  0:05 UTC (permalink / raw)
  To: Scott Randby; +Cc: emacs-orgmode

Hello,

Scott Randby <srandby@gmail.com> writes:

> First, a_b_c and a_bc give nearly identical results when processed
> except that the spacing between letters in the subscript coming from
> a_b_c is wider than that in the subscript coming from a_bc. The spacing
> issue can be seen more clearly by comparing the processed versions of
> a_b_c_d and a_bcd. The wider spacing looks funny when compared to
> surrounding text.
>
> Second, I've been considering the issue of what the primary meaning of
> a_b_c should be when it is used in a file. Since underscores are used in
> LaTeX to designate subscripts in mathematical expressions, it is clear
> to me that a_b_c should mean a_(b_c) instead of a_(bc). Similarly, a^b^c
> should mean a^(b^c) instead of a^(bc). I really can't think of any case
> when I would type in a_b_c in order to get a_(bc).

I chose this translation because

  1. It was the easiest to implement;
  2. It is the closest to what Org (unlike LaTeX) really sees;
  3. It doesn't matter since it is a pathological case.

> Finally, I question the use of text style instead of math style when
> a_b, a^b, or similar expressions are exported to LaTeX. TeX was designed
> to produce beautiful typeset mathematics, and the math style is an
> essential part of that beauty. Math style is also the expected way of
> presenting mathematical expressions in all of the professional journals
> that I read and use. New users may not be aware of professional
> standards, but we could help them be aware by translating a_b_c as
> \(a_{b_{c}}\) exactly. Similarly a^b^c would be translated as
> \(a^{b^{c}}\) exactly.

I strongly disagree. 

Org is not a front-end to LaTeX. LaTeX is but one of its export output
formats. In most of the others, sub/superscript doesn't force math mode
on the user. 

In Org, if you want to enter math mode, you need to explicitly request
it, e.g., with:

  a_{$1 + 1$}

> However, the translation of a_b_c doesn't matter a whole lot to me since
> I always use \(a_{b_{c}}\) or $a_{b_{c}}$ in my Org documents.

Exactly.

Regards,

-- 
Nicolas Goaziou


* Re: Multiple underscores crash org latex export; other exporters survive
  2016-12-11  0:05                     ` Nicolas Goaziou
@ 2016-12-12  3:37                       ` Scott Randby
  2016-12-12  7:46                         ` Nicolas Goaziou
  0 siblings, 1 reply; 19+ messages in thread
From: Scott Randby @ 2016-12-12  3:37 UTC (permalink / raw)
  To: emacs-orgmode, Nicolas Goaziou

On 12/10/2016 07:05 PM, Nicolas Goaziou wrote:
> Hello,
> 
> Scott Randby <srandby@gmail.com> writes:
> 
>> First, a_b_c and a_bc give nearly identical results when processed
>> except that the spacing between letters in the subscript coming from
>> a_b_c is wider than that in the subscript coming from a_bc. The spacing
>> issue can be seen more clearly by comparing the processed versions of
>> a_b_c_d and a_bcd. The wider spacing looks funny when compared to
>> surrounding text.
>>
>> Second, I've been considering the issue of what the primary meaning of
>> a_b_c should be when it is used in a file. Since underscores are used in
>> LaTeX to designate subscripts in mathematical expressions, it is clear
>> to me that a_b_c should mean a_(b_c) instead of a_(bc). Similarly, a^b^c
>> should mean a^(b^c) instead of a^(bc). I really can't think of any case
>> when I would type in a_b_c in order to get a_(bc).
> 
> I chose this translation because
> 
>   1. It was the easiest to implement;
>   2. It is the closest to what Org (unlike LaTeX) really sees;

It seems odd to me that Org would see multiple subscripts as being
really one subscript. And the spacing in the subscript when a_b_c is
exported and processed is still bad.

>   3. It doesn't matter since it is a pathological case.

I agree.

> 
>> Finally, I question the use of text style instead of math style when
>> a_b, a^b, or similar expressions are exported to LaTeX. TeX was designed
>> to produce beautiful typeset mathematics, and the math style is an
>> essential part of that beauty. Math style is also the expected way of
>> presenting mathematical expressions in all of the professional journals
>> that I read and use. New users may not be aware of professional
>> standards, but we could help them be aware by translating a_b_c as
>> \(a_{b_{c}}\) exactly. Similarly a^b^c would be translated as
>> \(a^{b^{c}}\) exactly.
> 
> I strongly disagree. 
> 
> Org is not a front-end to LaTeX. LaTeX is but one of its export output
> formats. In most of the others, sub/superscript doesn't force math mode
> on the user.

I realize that Org is not a front-end to LaTeX. But the standards of
other output formats are not the standards of LaTeX. Different output
formats have different purposes. In this case, one could view Org as
forcing text mode on the user when math mode is appropriate.

> 
> In Org, if you want to enter math mode, you need to explicitly request
> it, e.g., with:
> 
>   a_{$1 + 1$}

If this were true, then a_b would be exported to a\_b instead of
a\(_\text{b}\). Math mode is entered any time \(...\) is used, and in
the a_b case, the user did not explicitly request it.

Scott Randby


* Re: Multiple underscores crash org latex export; other exporters survive
  2016-12-12  3:37                       ` Scott Randby
@ 2016-12-12  7:46                         ` Nicolas Goaziou
  2016-12-12 15:18                           ` Scott Otterson
  0 siblings, 1 reply; 19+ messages in thread
From: Nicolas Goaziou @ 2016-12-12  7:46 UTC (permalink / raw)
  To: Scott Randby; +Cc: emacs-orgmode

Hello,

Scott Randby <srandby@gmail.com> writes:

> It seems odd to me that Org would see multiple subscripts as being
> really one subscript.

It doesn't. LaTeX does. Org clearly translates a_b_c as two consecutive
subscript components. Then LaTeX visually merges them.
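
A minimal LaTeX sketch of the readings discussed here. The exact markup Org
emits for a_b_c is an assumption (modelled on the a\(_\text{b}\) form quoted
in this thread), not copied from actual ox-latex output:

```latex
\documentclass{article}
\usepackage{amsmath} % provides \text

\begin{document}

% Two consecutive text subscripts (assumed shape of the export of a_b_c).
a\(_{\text{b}}\)\(_{\text{c}}\)

% One text subscript holding both letters (a_bc), for comparison.
a\(_{\text{bc}}\)

% Nested math subscripts, i.e. the a_(b_c) reading.
\(a_{b_{c}}\)

% Left commented out: TeX rejects this with a "Double subscript" error.
% \(a_{b}_{c}\)

\end{document}
```

Compiling this and comparing the first two lines shows whether the spacing
differs between the consecutive and the single subscript.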

> I realize that Org is not a front-end to LaTeX. But the standards of
> other output formats are not the standards of LaTeX. Different output
> formats have different purposes. In this case, one could view Org as
> forcing text mode on the user when math mode is appropriate.

Who knows what mode is appropriate? Certainly not Org.

>> In Org, if you want to enter math mode, you need to explicitly request
>> it, e.g., with:
>> 
>>   a_{$1 + 1$}
>
> If this were true, then a_b would be exported to a\_b instead of
> a\(_\text{b}\).

Of course not; a\_b is not even a subscript. a\(_\text{b}\) is a robust
way to get a "textual" subscript, which is the default for Org.

> Math mode is entered any time \(...\) is used, and in the a_b case,
> the user did not explicitly request it.

AFAIK, there is no way in LaTeX to get a subscript without entering math
mode, at least temporarily as in this case. Anyway, per \text{}, "b" is not
really in math mode, as you noticed.
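
The three forms mentioned above can be compared side by side with a small
test file. This is only a sketch; the braces around \text{b} are added here
for safety and the amsmath dependency is an assumption about the preamble:

```latex
\documentclass{article}
\usepackage{amsmath} % provides \text

\begin{document}

% Escaped underscore: prints a literal "_", no subscript at all.
a\_b

% Text-style subscript: math mode is entered only to lower the "b",
% and \text switches back to the surrounding text font.
a\(_{\text{b}}\)

% Math-style subscript: both letters are set in math italic.
\(a_{b}\)

\end{document}
```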

Regards,

-- 
Nicolas Goaziou


* Re: Multiple underscores crash org latex export; other exporters survive
  2016-12-12  7:46                         ` Nicolas Goaziou
@ 2016-12-12 15:18                           ` Scott Otterson
  2016-12-13 12:54                             ` Nicolas Goaziou
  0 siblings, 1 reply; 19+ messages in thread
From: Scott Otterson @ 2016-12-12 15:18 UTC (permalink / raw)
  To: Scott Randby, Emacs-orgmode@gnu.org


Thanks to Nicolas and Scott for your painstaking efforts.  At least for me,
a fine stopgap measure is to simply avoid Latex crashes for orgmode
contents that are not explicitly Latex.  Sometime after that, it would be
ideal to produce similar output for all export types, insofar as that's
possible.

I thought I'd see what ox-pandoc does.  As I'm sure you know, pandoc
converts all input formats to a master markup language, and then converts
that to whatever output format is desired -- a design that makes output
uniformity easier to obtain.  Orgmode is already halfway there, since the
master markup language is orgmode itself.

Here's what pandoc does in the three cases I've recently posted about:

1.) *Multiple underscores* (the subject of this thread):  Pandoc doesn't
crash and it exports the same thing for either html or latex: everything
after the first underscore is subscripted and all underscores are deleted.
I don't love that behavior but it's consistent.

2.) *Plain lists with more than four sublevels*: For html export, pandoc
and orgmode do what you'd expect: produce a deeply nested html list.  For
(Windows) latex export, pandoc and orgmode also do the same thing: crash.
Ideally, pandoc would have generated valid Latex for deep list nesting, but
at least it's not completely ornery; it snips out the part of the original
Latex error message that points to the cause.

Perhaps pandoc latex export wouldn't crash in Linux, just as orgmode latex
export doesn't crash in Linux (from Nicolas).  This is still a mystery.
Nicolas's Linux-produced tex file is essentially the same as the one I got
in Windows, and it crashes Windows latexmk just like mine does.  *Nicolas*,
could it be that you're not running latexmk on your exports?

3.) *Web link with a '#' in the URL*: Pandoc never crashes and it exports
nearly the same thing for html or latex pdf:  In either case, clicking on
the link sends you to the right web page, and the only difference is that,
in the output pdf, the link text isn't highlighted; instead there's a
tooltip popup.

The reason pandoc latex export doesn't crash but orgmode does (in Windows)
is that pandoc escapes the '#'.  In the example I posted last week, orgmode
does this:

\section{Some section  \href{http://orgmode.org/manual/Column-groups.html#Column-groups}{A random link}}


while pandoc does this:

\section{\texorpdfstring{Some section \href{http://orgmode.org/manual/Column-groups.html\#Column-groups}{A random link}}{Some section A random link}}


I don't understand why the escape prevents Windows crashes but doesn't
appear to be needed for Linux.  Nevertheless, it looks like pandoc does
something special to prevent this crash.
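
A minimal test file for the escaped form might look like the sketch below.
The preamble is an assumption (only hyperref is loaded); the \section line
is the pandoc output quoted above. One general TeX fact that may be relevant:
a raw '#' inside a macro argument such as that of \section is read as the
parameter character, whereas \# is an ordinary escaped character.

```latex
\documentclass{article}
\usepackage{hyperref} % provides \href and \texorpdfstring

\begin{document}

% First argument of \texorpdfstring: what is typeset in the heading.
% Second argument: the plain-text version used for the PDF bookmark.
\section{\texorpdfstring{Some section \href{http://orgmode.org/manual/Column-groups.html\#Column-groups}{A random link}}{Some section A random link}}

Body text.

\end{document}
```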

Scott




* Re: Multiple underscores crash org latex export; other exporters survive
  2016-12-12 15:18                           ` Scott Otterson
@ 2016-12-13 12:54                             ` Nicolas Goaziou
  0 siblings, 0 replies; 19+ messages in thread
From: Nicolas Goaziou @ 2016-12-13 12:54 UTC (permalink / raw)
  To: Scott Otterson; +Cc: Scott Randby, Emacs-orgmode@gnu.org

Hello,

Scott Otterson <scotto@sharpleaf.org> writes:

> 1.) *Multiple underscores* (the subject of this thread):  Pandoc doesn't
> crash and it exports the same thing for either html or latex: everything
> after the first underscore is subscripted and all underscores are deleted.
> I don't love that behavior but it's consistent.

I think the current Org behaviour is satisfactory.

> 2.) *Plain lists with more than four sublevels*: For html export, pandoc
> and orgmode do what you'd expect: produce a deeply nested html list.  For
> (Windows) latex export, pandoc and orgmode also do the same thing: crash.
> Ideally, pandoc would have generated valid Latex for deep list nesting, but
> at least it's not completely ornery; it snips out the part of the original
> Latex error message that points to the cause.

I'm not versed enough in LaTeX to work around this properly. Depending
on user-defined document header, deep lists may or may not be exported
correctly.

> 3.) *Web link with a '#' in the URL*: Pandoc never crashes and it exports
> nearly the same thing for html or latex pdf:  In either case, clicking on
> the link sends you to the right web page, and the only difference is that,
> in the output pdf, the link text isn't highlighted; instead there's a
> tooltip popup.

This is now fixed in maint. Thank you.

Regards,

-- 
Nicolas Goaziou


