* Embedded LaTeX does not work with Unicode quotes
@ 2014-11-11 21:45 Marcin Borkowski
2014-11-12 6:05 ` Nick Dokos
0 siblings, 1 reply; 5+ messages in thread
From: Marcin Borkowski @ 2014-11-11 21:45 UTC
To: Org-Mode mailing list
Hi list,
I have this: „$n\eps\le b$”, and it seems not to be recognized as a
LaTeX fragment. The manual says:
================
To avoid conflicts with currency specifications, single `$' characters
are only recognized as math delimiters if the enclosed text contains at
most two line breaks, is directly attached to the `$' characters with no
whitespace in between, and if the closing `$' is followed by whitespace,
punctuation or a dash.
================
When I C-u C-x = on the closing quote, I get
================
position: 54465 of 108125 (50%), restriction: <52496-56766>, column: 152
character: ” (displayed as ”) (codepoint 8221, #o20035, #x201d)
preferred charset: unicode (Unicode (ISO10646))
code point in charset: 0x201D
syntax: . which means: punctuation
category: .:Base, c:Chinese, h:Korean, j:Japanese
to input: type "C-x 8 RET HEX-CODEPOINT" or "C-x 8 RET NAME"
buffer code: #xE2 #x80 #x9D
file code: #xE2 #x80 #x9D (encoded by coding system utf-8-unix)
display: by this font (glyph code)
xft:-unknown-Ubuntu Mono-normal-normal-normal-*-17-*-*-*-m-0-iso10646-1 (#x71)
Character code properties: customize what to show
name: RIGHT DOUBLE QUOTATION MARK
old-name: DOUBLE COMMA QUOTATION MARK
general-category: Pf (Punctuation, Final quote)
decomposition: (8221) ('”')
There are text properties here:
fontified t
================
so I don't know why it is not recognized as punctuation. Consequently,
it is exported verbatim (with `\$') into LaTeX, and also (obviously) C-c
C-x C-l does not fontify it. When I change ” into " (the ASCII #x22
quote), everything is ok.
My questions:
1. Isn't it a bug?
2. If not, what can I do in my config so that it is recognized
properly?
PS. I just recalled that using \(...\) should help, and indeed it does.
Still, I'm curious about the answer to my questions (now that I
remembered a workaround, especially #1).
TIA,
--
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Adam Mickiewicz University
* Re: Embedded LaTeX does not work with Unicode quotes
From: Nick Dokos @ 2014-11-12 6:05 UTC
To: emacs-orgmode
Marcin Borkowski <mbork@wmi.amu.edu.pl> writes:
> Hi list,
>
> I have this: „$n\eps\le b$”, and it seems not to be recognized as a
> LaTeX fragment. The manual says:
>
> ================
> To avoid conflicts with currency specifications, single `$' characters
> are only recognized as math delimiters if the enclosed text contains at
> most two line breaks, is directly attached to the `$' characters with no
> whitespace in between, and if the closing `$' is followed by whitespace,
> punctuation or a dash.
> ================
>
> When I C-u C-x = on the closing quote, I get
>
> ================
> ...
> syntax: . which means: punctuation
> ...
> ================
>
> so I don't know why it is not recognized as punctuation. Consequently,
> it is exported verbatim (with `\$') into LaTeX, and also (obviously) C-c
> C-x C-l does not fontify it. When I change ” into " (the ASCII #x22
> quote), everything is ok.
>
The $...$ construct is recognized by a regexp which, while complicated,
is not complicated enough to recognize everything that's marked
"punctuation" in the syntax tables. Look for org-latex-regexps in org.el
(and note that the regexp for "$" is about twice as long as the next
longest regexp - the one for "begin"). The others (for \(...\), \[...\]
and $$..$$) are fairly trivial.
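For anyone who wants to look at that entry without opening org.el, here is a minimal sketch. It assumes an Org version that still defines org-latex-regexps (the variable discussed here; it may not exist in other versions, hence the boundp guard). The helper name my/org-dollar-regexp is made up for illustration.

#+BEGIN_SRC emacs-lisp
(defun my/org-dollar-regexp ()
  "Return the regexp string of the \"$\" entry in `org-latex-regexps'.
Return nil when the variable is not defined in the running Org."
  (when (boundp 'org-latex-regexps)
    ;; Each entry has the shape (KEY REGEXP GROUP FLAG), so the
    ;; regexp is the second element.
    (nth 1 (assoc "$" (symbol-value 'org-latex-regexps)))))

;; Evaluate with M-: to see the regexp, or nil if it is unavailable.
(my/org-dollar-regexp)
#+END_SRC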
> My questions:
>
> 1. Isn't it a bug?
>
Yes, probably - but looking at the regexp, I cringe: I don't want to even
try deciphering it, let alone change it - life's too short...
> 2. If not, what can I do in my config so that it is recognized
> properly?
>
> PS. I just recalled that using \(...\) should help, and indeed it does.
> Still, I'm curious about the answer to my questions (now that I
> remembered a workaround, especially #1).
>
That is indeed the best solution.
--
Nick
* Re: Embedded LaTeX does not work with Unicode quotes
From: Marcin Borkowski @ 2014-11-12 18:37 UTC
To: emacs-orgmode
On 2014-11-12, at 07:05, Nick Dokos wrote:
> Marcin Borkowski <mbork@wmi.amu.edu.pl> writes:
>
>> Hi list,
>>
>> I have this: „$n\eps\le b$”, and it seems not to be recognized as a
>> LaTeX fragment. The manual says:
>>
>> ================
>> To avoid conflicts with currency specifications, single `$' characters
>> are only recognized as math delimiters if the enclosed text contains at
>> most two line breaks, is directly attached to the `$' characters with no
>> whitespace in between, and if the closing `$' is followed by whitespace,
>> punctuation or a dash.
>> ================
>>
>> When I C-u C-x = on the closing quote, I get
>>
>> ================
>> ...
>> syntax: . which means: punctuation
>> ...
>> ================
>>
>> so I don't know why it is not recognized as punctuation. Consequently,
>> it is exported verbatim (with `\$') into LaTeX, and also (obviously) C-c
>> C-x C-l does not fontify it. When I change ” into " (the ASCII #x22
>> quote), everything is ok.
>>
>
> The $...$ construct is recognized by a regexp which, while complicated,
> is not complicated enough to recognize everything that's marked
> "punctuation" in the syntax tables. Look for org-latex-regexps in org.el
> (and note that the regexp for "$" is about twice as long as the next
> longest regexp - the one for "begin"). The others (for \(...\), \[...\]
> and $$..$$) are fairly trivial.
>
>> My questions:
>>
>> 1. Isn't it a bug?
>>
>
> Yes, probably - but looking at the regexp, I cringe: I don't want to even
> try deciphering it, let alone change it - life's too short...
Ah, regex. I have no more questions...
>> 2. If not, what can I do in my config so that it is recognized
>> properly?
>>
>> PS. I just recalled that using \(...\) should help, and indeed it does.
>> Still, I'm curious about the answer to my questions (now that I
>> remembered a workaround, especially #1).
>>
> That is indeed the best solution.
Yep.
Thanks!
--
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Adam Mickiewicz University
* Re: Embedded LaTeX does not work with Unicode quotes
From: Florian Beck @ 2014-11-12 19:53 UTC
To: emacs-orgmode
Nick Dokos <ndokos@gmail.com> writes:
> "punctuation" in the syntax tables. Look for org-latex-regexps in
> org.el
The line in question is
#+BEGIN_SRC emacs-lisp
("$" "\\([^$]\\|^\\)\\(\\(\\$\\([^ \r\n,;.$][^$\n\r]*?\\(\n[^$\n\r]*?\\)\\{0,2\\}[^ \r\n,.$]\\)\\$\\)\\)\\([- .,?;:'\")\000]\\|$\\)" 2 nil)
#+END_SRC
It's probably not too hard to see that the culprit is the bunch of
punctuation characters towards the end. Indeed, if you change .,?;:'\"
to .,?;:'\"”, that solves the OP's problem. However, it might be even
better to use a more general syntax, [:punct:], which matches all
punctuation (as we want). So:
#+BEGIN_SRC emacs-lisp
("$" "\\([^$]\\|^\\)\\(\\(\\$\\([^ \r\n,;.$][^$\n\r]*?\\(\n[^$\n\r]*?\\)\\{0,2\\}[^ \r\n,.$]\\)\\$\\)\\)\\([- [:punct:]\000]\\|$\\)" 2 nil)
#+END_SRC
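A quick way to compare the two variants outside Org is sketched below. The names my/dollar-re-old, my/dollar-re-new and my/dollar-fragment are made up for this illustration, and the check runs under the standard syntax table rather than an Org buffer's, so treat it as a sanity check and not as a test of Org itself.

#+BEGIN_SRC emacs-lisp
(defconst my/dollar-re-old
  "\\([^$]\\|^\\)\\(\\(\\$\\([^ \r\n,;.$][^$\n\r]*?\\(\n[^$\n\r]*?\\)\\{0,2\\}[^ \r\n,.$]\\)\\$\\)\\)\\([- .,?;:'\")\000]\\|$\\)"
  "The \"$\" regexp as quoted above, with the literal character list.")

(defconst my/dollar-re-new
  "\\([^$]\\|^\\)\\(\\(\\$\\([^ \r\n,;.$][^$\n\r]*?\\(\n[^$\n\r]*?\\)\\{0,2\\}[^ \r\n,.$]\\)\\$\\)\\)\\([- [:punct:]\000]\\|$\\)"
  "The same regexp with the trailing list replaced by [:punct:].")

(defun my/dollar-fragment (regexp text)
  "Return the $...$ fragment that REGEXP finds in TEXT, or nil."
  (let ((case-fold-search nil))
    ;; Character classes consult the current syntax table for
    ;; multibyte characters, so pin it down for repeatable results.
    (with-syntax-table (standard-syntax-table)
      (when (string-match regexp text)
        ;; Group 2 is the whole fragment including both `$' signs.
        (match-string 2 text)))))

;; Typographic quotes: the old list should fail, [:punct:] should match.
(my/dollar-fragment my/dollar-re-old "„$n\\eps\\le b$”")
(my/dollar-fragment my/dollar-re-new "„$n\\eps\\le b$”")
;; ASCII quotes: handled by the old list already.
(my/dollar-fragment my/dollar-re-old "\"$n\\eps\\le b$\"")
#+END_SRC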
--
Florian Beck
* Re: Embedded LaTeX does not work with Unicode quotes
From: Nicolas Goaziou @ 2014-11-13 18:24 UTC
To: Florian Beck; +Cc: emacs-orgmode
Hello,
Florian Beck <fb@miszellen.de> writes:
> Nick Dokos <ndokos@gmail.com> writes:
>
>> "punctuation" in the syntax tables. Look for org-latex-regexps in
>> org.el
>
> The line in question is
>
> #+BEGIN_SRC emacs-lisp
> ("$" "\\([^$]\\|^\\)\\(\\(\\$\\([^ \r\n,;.$][^$\n\r]*?\\(\n[^$\n\r]*?\\)\\{0,2\\}[^ \r\n,.$]\\)\\$\\)\\)\\([- .,?;:'\")\000]\\|$\\)" 2 nil)
> #+END_SRC
>
> It's probably not too hard to see that the culprit is the bunch of
> punctuation characters towards the end. Indeed, if you change .,?;:'\"
> to .,?;:'\"”, that solves the OP's problem. However, it might be even
> better to use a more general syntax, [:punct:], which matches all
> punctuation (as we want). So:
>
> #+BEGIN_SRC emacs-lisp
> ("$" "\\([^$]\\|^\\)\\(\\(\\$\\([^ \r\n,;.$][^$\n\r]*?\\(\n[^$\n\r]*?\\)\\{0,2\\}[^ \r\n,.$]\\)\\$\\)\\)\\([- [:punct:]\000]\\|$\\)" 2 nil)
> #+END_SRC
Actually this variable is hardly used throughout the Org code base. See
org-element-latex-fragment-parser instead (which has the same "problem"
anyway).
Also, according to the Elisp manual, [:punct:] is not ideal either:
`[:punct:]'
This matches any punctuation character. (At present, for multibyte
characters, it matches anything that has non-word syntax.)
There is also "\s.".
Anyway, it might be better to know exactly what kind of false positives
we want to avoid.
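To get a feel for how the three candidates differ, something like the following sketch can be evaluated. The function name my/closing-char-classes is made up, and the comparison uses the standard syntax table as an assumption; an Org buffer has its own syntax table, so "\s." may give different answers there.

#+BEGIN_SRC emacs-lisp
(defun my/closing-char-classes (char)
  "Return which candidate classes match CHAR.
The result is a list drawn from the symbols `literal' (the current
character list), `punct' ([:punct:]) and `syntax' (\\s.)."
  (let ((s (char-to-string char))
        (case-fold-search nil))
    ;; Both [:punct:] (for multibyte characters) and \s. depend on
    ;; the syntax table in effect, so fix one for the comparison.
    (with-syntax-table (standard-syntax-table)
      (delq nil
            (list (and (string-match-p "[- .,?;:'\")\000]" s) 'literal)
                  (and (string-match-p "[[:punct:]]" s) 'punct)
                  (and (string-match-p "\\s." s) 'syntax))))))

;; Build an alist of (CHARACTER . MATCHING-CLASSES) for a few samples.
(mapcar (lambda (c)
          (cons (char-to-string c) (my/closing-char-classes c)))
        '(?. ?\" ?” ?\( ?a))
#+END_SRC

An opening parenthesis, for instance, is matched by [:punct:] but has open-parenthesis syntax rather than punctuation syntax, which is one kind of difference to weigh when deciding which false positives matter.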
Regards,
--
Nicolas Goaziou
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs/org-mode.git