From: "Juan Manuel Macías" <maciaschain@posteo.net>
To: Ihor Radchenko <yantar92@posteo.net>
Cc: orgmode <emacs-orgmode@gnu.org>,  Timothy <orgmode@tec.tecosaur.net>
Subject: Re: Fallback fonts in LaTeX export for non latin scripts
Date: Mon, 04 Sep 2023 22:22:25 +0000
Message-ID: <87bkehshni.fsf@posteo.net>
In-Reply-To: <87jzt6weae.fsf@localhost> (Ihor Radchenko's message of "Mon, 04 Sep 2023 08:09:13 +0000")

Ihor Radchenko writes:

> Juan Manuel Macías <maciaschain@posteo.net> writes:
>
>>> #+language: ancientgreek russian arabic
>>
>> Of course, this syntax would be the most appropriate and consistent
>> within Org. The problem is LaTeX, specifically babel, and that certain
>> inconsistencies would be created with the rest of the backends. At first
>> some pitfalls come to mind:
>>
>> - The keyword #+language accepts for now only language codes (es, en,
>>   el, ar, ru, etc.). Consistency with other backends should
>>   be maintained in this regard: ancientgreek is not a valid language
>>   code, but a name that only babel understands. If we put something
>>   like (a valid language code):
>>
>>   #+language: el-polyton
>>
>>   this could be translated in babel as polutonikogreek (in the classic
>>   syntax, that is, the languages that are loaded in the options of
>>   \usepackage[options]{babel}), or, in the new syntax, ancientgreek and
>>   polytonicgreek, which are actually two different languages: the first
>>   is ancient polytonic Greek and the second modern polytonic Greek. To
>>   add more confusion to the matter, in classical babel syntax
>>   greek.ancient and greek.polytonic are also supported. But neither of
>>   these things can be deduced by simply putting el-polyton, unless
>>   breaking the consistency with the other backends.
>
> I am now working on unifying Org translation system as discussed in
> https://orgmode.org/list/87o7iw8yem.fsf@bzg.fr
> As a part of the effort, I plan to introduce a new constant that will
> unify language abbreviations across Org and also associate them with
> more human-readable names.
>
> (defconst org-language-abbrevs
>   '(("am" . "Amharic")
>     ("ar" . "Arabic")
>     ("ast" . "Asturian")
>     ("bg" . "Bulgarian")
>     ("bn" . "Bengali")
>     ...))
>
> The idea is to allow
>
> #+language: Austrian German, Greek
> as a valid specifier, in addition to
>
> #+language: de-at, el
>
> Then, across Org, we will make use of the standardized language
> abbreviations.

Great, this is very good news. Yes, I agree with what you say below. I
think Org should move towards multilingual support that is 100% native
to Org. That is, Org would have its own "selectlanguage" mechanism, to
be able to delimit text segments in other languages and control them,
both within Org and when exporting to the different backends. That
scenario seems very desirable to me, and I would like to contribute to
it to the best of my ability (and time).

In LaTeX, as I mentioned, things are complicated. There are Babel and
Polyglossia, and there are LuaTeX and XeTeX. In addition, there is also
pdfTeX, which is still the default engine and (to be honest) is the
engine used by a high percentage of LaTeX users, although perhaps things
will soon change in favor of LuaTeX. Both babel and polyglossia
could be supported, but that means more work, more code, and more
complications. And we cannot be sure that polyglossia will continue to
be maintained. After all, babel is the official LaTeX package for language
support, and polyglossia appeared at a time when babel had no support
for the new unicode engines. Now Babel supports all of that and is much
more powerful, but its interface has also grown in complexity. There is
the problem of the double syntax for loading languages: the old one,
which loads traditional ldf files, and the modern one (\babelprovide),
which loads languages using ini files. The modern syntax is more
powerful, with more options, but has added more verbosity to babel. I
have taken advantage of \babelprovide, specifically its onchar=ids fonts
option, to automatically apply fonts to non-Latin scripts.

>> I like this idea, but with the exception that in the two examples you
>> give the user is declaring two fonts for both languages. In my example
>> there was also Arabic, where the default font for the Arabic script is
>> used.
>
> My idea was that
>
> #+language: ancientgreek russian arabic
>
> implies "use default font for arabic", unless #+latex_font is specified.

This seems the most consistent to me for Org, but, as I mentioned in the
other email, I have some concerns. Currently, what we are talking about
is simply font support for non-Latin languages. If, in the current state
of things, #+language accepts a list of language names, we may give the
user a wrong impression: that of multilingual support which does not
exist as such. What we have is closer to font support for non-Latin
languages, and only in LaTeX, specifically in LuaLaTeX. Furthermore, the
user could mix languages that Babel loads through ldf files with others
that it loads through ini files. For example, something like this:

#+language: spanish, english, french, russian

in Babel it would be:

\usepackage[english,french,spanish]{babel}

and here we need \babelprovide for the font (and to load Russian via its
ini file):

\babelprovide[onchar=ids fonts, import]{russian}
\babelfont[russian]{rm}[options]{somefont}

Org would have to discern which name refers to a non-Latin language
(which wouldn't be complicated with the functionality you're working on)
and then apply the default font by adding a line with \babelprovide.
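
As a rough, untested sketch of what the exporter would have to produce
for that #+language line (assuming LuaLaTeX, since onchar is a LuaTeX
feature, and with FreeSerif standing in as a placeholder for whatever
default Cyrillic font Org would pick), the complete file would look
something like this:

% Compile with lualatex.
\documentclass{article}
% Latin-script languages via the classic ldf mechanism; the last one
% listed (spanish) is the main language.
\usepackage[english,french,spanish]{babel}
% Russian via its ini file; switch locale and font automatically
% whenever Cyrillic characters appear.
\babelprovide[import, onchar=ids fonts]{russian}
% FreeSerif is only a placeholder for the default Cyrillic font.
\babelfont[russian]{rm}{FreeSerif}
\begin{document}
Texto en español con una palabra en ruso: привет.
\end{document}

Note that the Russian word is not wrapped in \foreignlanguage: onchar
detects the script and switches locale and font by itself.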

Of course, English, French and Spanish can also be loaded via ini files:

\babelprovide[main,import]{spanish}
\babelprovide[import]{french}
\babelprovide[import]{english}

Even babel also supports:

\usepackage[english,french,spanish,provide*=*]{babel}

but in that line we cannot put Russian with onchar, etc. And then there
is pdfTeX, where only the classic babel syntax is allowed, without any
"*provide".

In short, I find everything very confusing. I am not opposed to doing it
as you propose (in fact, it is the option I like the most, especially
once Org becomes polyglot in the future), but I also want to warn of
possible complications.

Therefore, since for now we are dealing only with fonts for non-Latin
languages, I think it should be made clear that the keyword is about
fonts (and about LuaLaTeX). Maybe through two keywords:

#+lualatex_fonts_for: language(s)
#+lualatex_fonts[language(s)]: "font" options

?

I think it's ugly, but I can't think of anything else.

By the way, and as a side note, is it currently possible in Org to
define a keyword of the form #+foo[anything] within :options-alist, or
would something like org-collect-keywords have to be modified?

-- 
Juan Manuel Macías

https://juanmanuelmacias.com

https://lunotipia.juanmanuelmacias.com

https://gnutas.juanmanuelmacias.com

