emacs-orgmode@gnu.org archives
* Embedded LaTeX does not work with Unicode quotes
@ 2014-11-11 21:45 Marcin Borkowski
  2014-11-12  6:05 ` Nick Dokos
From: Marcin Borkowski @ 2014-11-11 21:45 UTC
  To: Org-Mode mailing list

Hi list,

I have this: „$n\eps\le b$”, and it seems not to be recognized as a
LaTeX fragment.  The manual says:

================
To avoid conflicts with currency specifications, single `$' characters
are only recognized as math delimiters if the enclosed text contains at
most two line breaks, is directly attached to the `$' characters with no
whitespace in between, and if the closing `$' is followed by whitespace,
punctuation or a dash.
================

When I C-u C-x = on the closing quote, I get

================
             position: 54465 of 108125 (50%), restriction: <52496-56766>, column: 152
            character: ” (displayed as ”) (codepoint 8221, #o20035, #x201d)
    preferred charset: unicode (Unicode (ISO10646))
code point in charset: 0x201D
               syntax: . 	which means: punctuation
             category: .:Base, c:Chinese, h:Korean, j:Japanese
             to input: type "C-x 8 RET HEX-CODEPOINT" or "C-x 8 RET NAME"
          buffer code: #xE2 #x80 #x9D
            file code: #xE2 #x80 #x9D (encoded by coding system utf-8-unix)
              display: by this font (glyph code)
    xft:-unknown-Ubuntu Mono-normal-normal-normal-*-17-*-*-*-m-0-iso10646-1 (#x71)

Character code properties: customize what to show
  name: RIGHT DOUBLE QUOTATION MARK
  old-name: DOUBLE COMMA QUOTATION MARK
  general-category: Pf (Punctuation, Final quote)
  decomposition: (8221) ('”')

There are text properties here:
  fontified            t
================

so I don't know why it is not recognized as punctuation.  Consequently,
it is exported verbatim (with `\$') into LaTeX, and also (obviously) C-c
C-x C-l does not fontify it.  When I change ” into " (the ASCII #x22
quote), everything is ok.

My questions:

1. Isn't it a bug?

2. If not, what can I do in my config so that it is recognized
properly?

PS. I just recalled that using \(...\) should help, and indeed it does.
Still, I'm curious about the answer to my questions (now that I
remembered a workaround, especially #1).

TIA,

-- 
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Adam Mickiewicz University


* Re: Embedded LaTeX does not work with Unicode quotes
  2014-11-11 21:45 Embedded LaTeX does not work with Unicode quotes Marcin Borkowski
@ 2014-11-12  6:05 ` Nick Dokos
  2014-11-12 18:37   ` Marcin Borkowski
  2014-11-12 19:53   ` Florian Beck
From: Nick Dokos @ 2014-11-12  6:05 UTC
  To: emacs-orgmode

Marcin Borkowski <mbork@wmi.amu.edu.pl> writes:

> Hi list,
>
> I have this: „$n\eps\le b$”, and it seems not to be recognized as a
> LaTeX fragment.  The manual says:
>
> ================
> To avoid conflicts with currency specifications, single `$' characters
> are only recognized as math delimiters if the enclosed text contains at
> most two line breaks, is directly attached to the `$' characters with no
> whitespace in between, and if the closing `$' is followed by whitespace,
> punctuation or a dash.
> ================
>
> When I C-u C-x = on the closing quote, I get
>
> ================
> ...
>                syntax: . 	which means: punctuation
> ...
> ================
>
> so I don't know why it is not recognized as punctuation.  Consequently,
> it is exported verbatim (with `\$') into LaTeX, and also (obviously) C-c
> C-x C-l does not fontify it.  When I change ” into " (the ASCII #x22
> quote), everything is ok.
>

The $...$ construct is recognized by a regexp which, while complicated,
is not complicated enough to recognize everything that's marked
"punctuation" in the syntax tables. Look for org-latex-regexps in org.el
(and note that the regexp for "$" is about twice as long as the next
longest regexp - the one for "begin"). The others (for \(...\), \[...\]
and $$..$$) are fairly trivial.
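
A minimal sketch of that mismatch, for illustration only. The helper name
my/trailer-check is hypothetical and not part of Org; the bracket set is
the trailing one from the "$" entry quoted later in this thread. The sketch
assumes the standard syntax table rather than whatever table an Org buffer
actually uses.

#+BEGIN_SRC emacs-lisp
(defun my/trailer-check (char)
  "Return (SYNTAX-CLASS . IN-SET) for CHAR.
SYNTAX-CLASS is CHAR's class in the standard syntax table, as a string.
IN-SET is t when CHAR belongs to the explicit trailing bracket set."
  (with-syntax-table (standard-syntax-table)
    (cons (string (char-syntax char))
          ;; The hand-written set lists individual characters only.
          (and (string-match-p "[- \t.,?;:'\")\000]" (string char)) t))))

;; U+201D RIGHT DOUBLE QUOTATION MARK, the closing quote from the report.
(my/trailer-check ?\u201D)
;; The ASCII quote that the report says works.
(my/trailer-check ?\")
#+END_SRC

Evaluating the two calls with C-x C-e shows each character's syntax class
next to whether the explicit set accepts it, which are two independent
questions as far as the regexp is concerned.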

> My questions:
>
> 1. Isn't it a bug?
>

Yes, probably - but looking at the regexp, I cringe: I don't want to even
try deciphering it, let alone change it - life's too short...

> 2. If not, what can I do in my config so that it is recognized
> properly?
>
> PS. I just recalled that using \(...\) should help, and indeed it does.
> Still, I'm curious about the answer to my questions (now that I
> remembered a workaround, especially #1).
>
That is indeed the best solution.

-- 
Nick


* Re: Embedded LaTeX does not work with Unicode quotes
  2014-11-12  6:05 ` Nick Dokos
@ 2014-11-12 18:37   ` Marcin Borkowski
  2014-11-12 19:53   ` Florian Beck
From: Marcin Borkowski @ 2014-11-12 18:37 UTC
  To: emacs-orgmode


On 2014-11-12, at 07:05, Nick Dokos wrote:

> Marcin Borkowski <mbork@wmi.amu.edu.pl> writes:
>
>> Hi list,
>>
>> I have this: „$n\eps\le b$”, and it seems not to be recognized as a
>> LaTeX fragment.  The manual says:
>>
>> ================
>> To avoid conflicts with currency specifications, single `$' characters
>> are only recognized as math delimiters if the enclosed text contains at
>> most two line breaks, is directly attached to the `$' characters with no
>> whitespace in between, and if the closing `$' is followed by whitespace,
>> punctuation or a dash.
>> ================
>>
>> When I C-u C-x = on the closing quote, I get
>>
>> ================
>> ...
>>                syntax: . 	which means: punctuation
>> ...
>> ================
>>
>> so I don't know why it is not recognized as punctuation.  Consequently,
>> it is exported verbatim (with `\$') into LaTeX, and also (obviously) C-c
>> C-x C-l does not fontify it.  When I change ” into " (the ASCII #x22
>> quote), everything is ok.
>>
>
> The $...$ construct is recognized by a regexp which, while complicated,
> is not complicated enough to recognize everything that's marked
> "punctuation" in the syntax tables. Look for org-latex-regexps in org.el
> (and note that the regexp for "$" is about twice as long as the next
> longest regexp - the one for "begin"). The others (for \(...\), \[...\]
> and $$..$$) are fairly trivial.
>
>> My questions:
>>
>> 1. Isn't it a bug?
>>
>
> Yes, probably - but looking at the regexp, I cringe: I don't want to even
> try deciphering it, let alone change it - life's too short...

Ah, regex.  I have no more questions...

>> 2. If not, what can I do in my config so that it is recognized
>> properly?
>>
>> PS. I just recalled that using \(...\) should help, and indeed it does.
>> Still, I'm curious about the answer to my questions (now that I
>> remembered a workaround, especially #1).
>>
> That is indeed the best solution.

Yep.

Thanks!

-- 
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Adam Mickiewicz University


* Re: Embedded LaTeX does not work with Unicode quotes
  2014-11-12  6:05 ` Nick Dokos
  2014-11-12 18:37   ` Marcin Borkowski
@ 2014-11-12 19:53   ` Florian Beck
  2014-11-13 18:24     ` Nicolas Goaziou
From: Florian Beck @ 2014-11-12 19:53 UTC
  To: emacs-orgmode

Nick Dokos <ndokos@gmail.com> writes:

> "punctuation" in the syntax tables. Look for org-latex-regexps in
> org.el

The line in question is

#+BEGIN_SRC emacs-lisp
("$" "\\([^$]\\|^\\)\\(\\(\\$\\([^ 	\r\n,;.$][^$\n\r]*?\\(\n[^$\n\r]*?\\)\\{0,2\\}[^ 	\r\n,.$]\\)\\$\\)\\)\\([- 	.,?;:'\")\000]\\|$\\)" 2 nil)
#+END_SRC

It's probably not too hard to see that the culprit is the set of
punctuation characters towards the end. Indeed, changing .,?;:'\"
to .,?;:'\"” solves the OP's problem. However, it might be even
better to use a more general character class, [:punct:], which matches
all punctuation (as we want). So:

#+BEGIN_SRC emacs-lisp
("$" "\\([^$]\\|^\\)\\(\\(\\$\\([^ 	\r\n,;.$][^$\n\r]*?\\(\n[^$\n\r]*?\\)\\{0,2\\}[^ 	\r\n,.$]\\)\\$\\)\\)\\([- 	[:punct:]\000]\\|$\\)" 2 nil)
#+END_SRC
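
One way to try both variants without touching org.el is sketched below.
The names my/dollar-regexp, my/match-fragment and the three constants are
hypothetical helpers, not Org functions. The regexp body is copied from the
entries above with only the trailing bracket class made a parameter, and
the match is run under the standard syntax table, which is an assumption
about the environment.

#+BEGIN_SRC emacs-lisp
(defun my/dollar-regexp (trailer)
  "Return the single-dollar regexp with bracket class TRAILER at the end."
  (concat "\\([^$]\\|^\\)"
          "\\(\\(\\$\\([^ \t\r\n,;.$][^$\n\r]*?"
          "\\(\n[^$\n\r]*?\\)\\{0,2\\}[^ \t\r\n,.$]\\)\\$\\)\\)"
          "\\(" trailer "\\|$\\)"))

(defun my/match-fragment (trailer text)
  "Return the $...$ fragment found in TEXT using TRAILER, or nil."
  (with-syntax-table (standard-syntax-table)
    (when (string-match (my/dollar-regexp trailer) text)
      ;; Group 2 spans the fragment including both dollar signs.
      (match-string 2 text))))

(defconst my/original-trailer "[- \t.,?;:'\")\000]"
  "Trailing class as it appears in the current entry.")
(defconst my/punct-trailer "[- \t[:punct:]\000]"
  "Trailing class from the proposed change.")
;; The OP's text: U+201E, the math fragment, then U+201D.
(defconst my/sample "\u201E$n\\eps\\le b$\u201D")

(my/match-fragment my/original-trailer my/sample)
(my/match-fragment my/punct-trailer my/sample)
#+END_SRC

Comparing the two results on my/sample, and on the same text with ASCII
quotes, should reproduce the behaviour described at the start of the
thread.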


-- 
Florian Beck


* Re: Embedded LaTeX does not work with Unicode quotes
  2014-11-12 19:53   ` Florian Beck
@ 2014-11-13 18:24     ` Nicolas Goaziou
From: Nicolas Goaziou @ 2014-11-13 18:24 UTC
  To: Florian Beck; +Cc: emacs-orgmode

Hello,

Florian Beck <fb@miszellen.de> writes:

> Nick Dokos <ndokos@gmail.com> writes:
>
>> "punctuation" in the syntax tables. Look for org-latex-regexps in
>> org.el
>
> The line in question is
>
> #+BEGIN_SRC emacs-lisp
> ("$" "\\([^$]\\|^\\)\\(\\(\\$\\([^ 	\r\n,;.$][^$\n\r]*?\\(\n[^$\n\r]*?\\)\\{0,2\\}[^ 	\r\n,.$]\\)\\$\\)\\)\\([- 	.,?;:'\")\000]\\|$\\)" 2 nil)
> #+END_SRC
>
> It's probably not too hard to see that the culprit is the set of
> punctuation characters towards the end. Indeed, changing .,?;:'\"
> to .,?;:'\"” solves the OP's problem. However, it might be even
> better to use a more general character class, [:punct:], which matches
> all punctuation (as we want). So:
>
> #+BEGIN_SRC emacs-lisp
> ("$" "\\([^$]\\|^\\)\\(\\(\\$\\([^ 	\r\n,;.$][^$\n\r]*?\\(\n[^$\n\r]*?\\)\\{0,2\\}[^ 	\r\n,.$]\\)\\$\\)\\)\\([- 	[:punct:]\000]\\|$\\)" 2 nil)
> #+END_SRC

Actually, this variable is hardly used anywhere in the Org code base. See
org-element-latex-fragment-parser instead (which has the same "problem"
anyway).

Also, according to the Elisp manual, [:punct:] is not ideal either:

  `[:punct:]'
       This matches any punctuation character.  (At present, for multibyte
       characters, it matches anything that has non-word syntax.)

There is also "\s.", which matches only characters with punctuation
syntax.

Anyway, it might be better to know exactly what kind of false positives
we want to avoid.
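
To make the difference between the two classes concrete, here is a small
sketch. The helper my/class-matches-p is hypothetical, and the syntax
classes are assigned by hand in a scratch syntax table purely for
illustration, so nothing here claims what any particular character's
syntax is in an Org buffer.

#+BEGIN_SRC emacs-lisp
(defun my/class-matches-p (regexp char syntax)
  "Return t if REGEXP matches CHAR when CHAR is given SYNTAX.
SYNTAX is a syntax descriptor string such as \".\" or \"_\"."
  (let ((table (make-syntax-table)))
    ;; Assign the syntax class explicitly instead of relying on defaults.
    (modify-syntax-entry char syntax table)
    (with-syntax-table table
      (and (string-match-p regexp (string char)) t))))

;; U+2020 DAGGER, declared here as a symbol constituent ("_").
(my/class-matches-p "[[:punct:]]" ?\u2020 "_")
(my/class-matches-p "\\s." ?\u2020 "_")
;; U+201D, declared here as punctuation (".").
(my/class-matches-p "[[:punct:]]" ?\u201D ".")
(my/class-matches-p "\\s." ?\u201D ".")
#+END_SRC

Per the manual text quoted above, [:punct:] accepts any multibyte
character with non-word syntax, so the first pair of calls is where the
two classes would be expected to disagree.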


Regards,

-- 
Nicolas Goaziou

