emacs-orgmode@gnu.org archives
From: Aleksandar Dimitrov <mail@aleks.bg>
To: org-mode-email <emacs-orgmode@gnu.org>
Subject: Re: About multilingual documents
Date: Tue, 04 May 2021 10:44:43 +0200
Message-ID: <87bl9qvqf8.fsf@aleks.bg>
In-Reply-To: <874kfjtv5u.fsf@posteo.net>

Hi Juan,

> Thank you very much for your interesting comments. I think your idea of
> applying org-babel to (multi) language support is tremendously
> suggestive and, of course, more org-centric. I suppose it could be
> applied also to languages within the paragraph by inline blocks... I
> really liked what you propose.
> Well, I admit that my marks are a bit exotic :-D. The main problem I see
> is that they are not as robust as Org's own marks, since they are
> controlled by an export filter. Doing some further tests, by the way, I
> think it would be better to add the filter to
> `org-export-filter-plain-text-functions', instead of
> `...final-output-functions'. I also see that it would be convenient to
> avoid their expansion in verbatim texts, with a `(unless
> (org-in-verbatim-emphasis)...)'.

What I like about =org-edit-special= is that it gives you a dedicated
little environment for a different language (either a natural or a
programming language!). This makes it really easy to focus on the task
of editing it.

I must admit that I find the inline org-src notation (which I didn't
know about until now) somewhat jarring, and certainly less pleasant to
read. Perhaps we could use a mechanism similar to
=org-hide-emphasis-markers= to make it more pleasant to read. [1]

> Anyway, I think (in general terms) it would be interesting for Org to
> incorporate some multilingual support and the ability to toggle between
> languages in a document, and the idea you propose seems to
> me that it makes a lot of sense.

I definitely agree that Org would benefit from more multilingual
support. I'm not very experienced in emacs-lisp but would love to contribute.

One problem I foresee is the translation of locales into LaTeX macros
for either (LaTeX-)Babel or Polyglossia (which is what I use). A
string like "en" or "en_GB" (which is readily understood by
([ai]|hun)spell) would have to be translated to the necessary
macros. For example, for Polyglossia [2] the preamble would have to
declare each language used in the document.


And then the inline commands would have to be rendered as
\textenglish{…} or \textlang{english}{…} (the latter would probably be easier).

I forgot what it is for LaTeX-Babel.
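
To make the locale-to-macro translation for Polyglossia a bit more
concrete, here is a minimal sketch. All the my/ names and the contents of
the table are made up for illustration; only \setdefaultlanguage,
\setotherlanguage and \textlang are actual Polyglossia commands. The
region part of a locale ("GB" in "en_GB") is simply dropped here, so the
en_GB vs. en_US distinction is lost.

#+begin_src emacs-lisp
;; Hypothetical table: language code -> Polyglossia language name.
;; Only a handful of entries, for illustration.
(defvar my/polyglossia-languages
  '(("en" . "english")
    ("de" . "german")
    ("fr" . "french")
    ("pl" . "polish")
    ("ru" . "russian"))
  "Alist mapping two-letter language codes to Polyglossia language names.")

(defun my/polyglossia-language (locale)
  "Return the Polyglossia language name for LOCALE.
LOCALE is a string such as \"en\" or \"en_GB\"; the region part is
ignored.  Signal an error if the language is not in the table."
  (or (cdr (assoc (car (split-string locale "[_-]"))
                  my/polyglossia-languages))
      (error "No Polyglossia language known for %s" locale)))

(defun my/polyglossia-preamble (main &rest others)
  "Return preamble lines declaring MAIN and OTHERS (all locale strings)."
  (mapconcat
   #'identity
   (cons (format "\\setdefaultlanguage{%s}" (my/polyglossia-language main))
         (mapcar (lambda (locale)
                   (format "\\setotherlanguage{%s}"
                           (my/polyglossia-language locale)))
                 others))
   "\n"))

(defun my/polyglossia-inline (locale text)
  "Wrap TEXT in a \\textlang command for the language of LOCALE."
  (format "\\textlang{%s}{%s}" (my/polyglossia-language locale) text))
#+end_src

With this, (my/polyglossia-inline "en_GB" "colour") gives
\textlang{english}{colour}, and (my/polyglossia-preamble "en" "de")
gives one \setdefaultlanguage line followed by one \setotherlanguage line.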

Note that the HTML export backend, too, could (or should) support
declaring multiple languages. [3]
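
For HTML, the declaration described in [3] is the lang attribute. A rough
sketch of what an inline snippet could be turned into follows; the
function names are hypothetical, and TEXT is assumed to be HTML-escaped
already.

#+begin_src emacs-lisp
(defun my/html-lang-tag (locale)
  "Convert LOCALE such as \"en_GB\" into an HTML language tag such as \"en-GB\"."
  (replace-regexp-in-string "_" "-" locale t t))

(defun my/html-lang-span (locale text)
  "Wrap TEXT, assumed to be already HTML-escaped, in a span declaring LOCALE."
  (format "<span lang=\"%s\">%s</span>" (my/html-lang-tag locale) text))
#+end_src

For example, (my/html-lang-span "de" "auf deutsch") gives
<span lang="de">auf deutsch</span>.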

There's a lot of work in there, but I would say that any implementation
effort should focus on one thing first. That could be switching the
dictionary on org-edit-special if a :lang header argument is set, or it
could be re-using what you, Juan, already wrote for LaTeX-Babel
exports. Support for Polyglossia or HTML could come at a later time.
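
The dictionary-switching part could start out as small as the sketch
below. The my/ names and the dictionary names in the table are
assumptions (valid names depend on the spell checker and on which
dictionaries are installed); ispell-change-dictionary is the existing
Emacs command. How the :lang value is read from the block and handed over
when org-edit-special opens the edit buffer is not shown; org-src-mode-hook
might be a reasonable place to hook this in.

#+begin_src emacs-lisp
(require 'ispell)

;; Hypothetical table: value of the proposed :lang argument -> dictionary.
;; The dictionary names are assumptions, see above.
(defvar my/lang-dictionaries
  '(("en" . "en_GB")
    ("de" . "de_DE")
    ("pl" . "pl_PL"))
  "Alist mapping :lang values to ispell dictionary names.")

(defun my/lang-dictionary (lang)
  "Return the dictionary name for LANG, or LANG itself if it is not in the table."
  (or (cdr (assoc lang my/lang-dictionaries)) lang))

(defun my/switch-dictionary (lang)
  "Switch the current buffer's spell-checking dictionary to the one for LANG."
  (ispell-change-dictionary (my/lang-dictionary lang)))
#+end_src

Falling back to LANG itself means a full locale such as "en_US" is passed
through unchanged.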


[1] https://stackoverflow.com/questions/20309842/how-to-syntax-highlight-for-org-mode-inline-source-code-src-lang/28059832#28059832
[2] https://ftp.rrze.uni-erlangen.de/ctan/macros/unicodetex/latex/polyglossia/polyglossia.pdf
[3] https://www.w3.org/International/questions/qa-html-language-declarations

> Best regards,
> Juan Manuel 
> Aleksandar Dimitrov writes:
>> Hi Juan,
>> this sounds very interesting to me, as I, too, mostly write in Org
>> and sometimes write documents in multiple languages, usually with
>> different varieties of either Latin or Cyrillic.
>> I have some suggestions:
>> Apart from the export, one of my biggest gripes is
>> flyspell. Specifically, the fact that you have to choose one language to
>> spell check the entire document with. That is insufficient in my case.
>> I think that the syntax you're suggesting looks good, but I'm not
>> sure how well it'd fit into org-mode's ecosystem. I had something in
>> mind that was closer to how org-babel works (it's called *babel*
>> for a reason, isn't it? :D)
>> #+begin_src org :lang pl
>>   … po polsku
>> #+end_src
>> #+begin_src org :lang de
>>   … auf deutsch
>> #+end_src
>> This would make use of org-mode's edit special environment function. It
>> would make it easier to persuade flyspell to do the right thing. You
>> could, perhaps, add
>> #+LANGUAGE: en
>> to the parent document, and then org would take care to set the correct
>> flyspell language (and the correct macros on LaTeX-export) and change
>> these parameters in the special environments.
>> I'm not 100% sure it should be #+begin_src org, maybe introducing a
>> different special environment would be better, say #+begin_lang XX where
>> XX is the ISO-code of said language, or the locale (think en_US
>> vs. en_GB.)
>> The drawback, and the clear disadvantage compared to your method is that
>> this works great only when the languages are separated by paragraph
>> breaks.
>> Therefore, I think our suggestions might be somewhat orthogonal. Yours
>> could be a shorthand syntax for introducing inline foreign-language
>> snippets.
>> What do you think?
>> Regards,
>> Aleks
>> Juan Manuel Macías writes:
>>> Hi all,
>>> I'm curious to see how other Org users deal with multilingual documents,
>>> that is, those documents (for example, philology or linguistics texts)
>>> that contain a significant number of inline quotes in other languages.
>>> Naturally, this makes more sense in the LaTeX backend, since it is
>>> convenient to enclose these quotes in a \foreignlanguage command to
>>> ensure that LaTeX at least applies the correct hyphenation patterns for
>>> words in other languages.
>>> Luckily, in the latest versions of Babel (the Babel of LaTeX) you don't
>>> need to do this when it comes to languages whose script is different
>>> from Latin (e.g. Greek, languages with Cyrillic, Arabic, Hindi, etc.).
>>> We can, for example, define Russian and Greek as:
>>> #+begin_src latex
>>> \babelprovide[onchar=ids fonts,hyphenrules=russian]{russian}
>>> \babelprovide[onchar=ids fonts,hyphenrules=ancientgreek]{greek}
>>> #+end_src
>>> And also the fonts for both languages:
>>> #+begin_src latex
>>> \babelfont[russian]{rm}{Linux Libertine O}
>>> \babelfont[greek]{rm}{Free Serif}
>>> #+end_src
>>> For Latin-based scripts it is still necessary to enclose the text in the
>>> \foreignlanguage command. And now comes the question: how do Org users
>>> who work on multilingual documents obtain this command when exporting
>>> to LaTeX?
>>> I usually use macros, which always tend to work fine. But lately I have
>>> been testing an alternative markup system using an export filter. The
>>> idea would be something like:
>>> %(lang) lorem ipsum dolor %()
>>> I start from a list of the most used languages:
>>> #+begin_src emacs-lisp
>>> (langs '(("en" "english")
>>> 	 ("fr" "french")
>>> 	 ("de" "german")
>>> 	 ("it" "italian")
>>> 	 ("pt" "portuguese")))
>>> #+end_src
>>> And other possible languages that Babel supports can be indicated
>>> explicitly, by prepending "--":
>>> %(fr) ... %()
>>> %(--esperanto) ... %()
>>> (If someone wants to try it, I attach a small Org document).
>>> Best regards,
>>> Juan Manuel
> -- 
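
As a footnote to the %(lang) ... %() marks in Juan's original message
quoted above: the expansion itself could be sketched as a plain string
function like the one below. This is not Juan's filter (his is in the
attached document, which I am not reproducing here); the my/ names are
made up, and the table just repeats the languages from his list. Wiring
it into an export filter is not shown, and by the time a LaTeX filter
runs, the % characters may already have been escaped by the exporter.

#+begin_src emacs-lisp
;; Same short codes as in the list quoted above.
(defvar my/lang-codes
  '(("en" . "english")
    ("fr" . "french")
    ("de" . "german")
    ("it" . "italian")
    ("pt" . "portuguese"))
  "Alist mapping short codes to Babel language names.")

(defun my/expand-lang-marks (text)
  "Expand language marks in TEXT into \\foreignlanguage commands.
A mark opens with a percent sign and a code in parentheses and closes
with a percent sign and empty parentheses.  A code prefixed with two
dashes is used as the language name directly.  Marks with an unknown
code are left untouched."
  (replace-regexp-in-string
   "%(\\(--\\)?\\([a-z]+\\))[ \t\n]*\\(\\(?:.\\|\n\\)*?\\)[ \t\n]*%()"
   (lambda (m)
     (let* ((explicit (match-string 1 m))
            (code (match-string 2 m))
            (body (match-string 3 m))
            (lang (if explicit
                      code
                    (cdr (assoc code my/lang-codes)))))
       (if lang
           (format "\\foreignlanguage{%s}{%s}" lang body)
         m)))
   ;; FIXEDCASE and LITERAL are both t so that the backslash in the
   ;; replacement is inserted as-is.
   text t t))
#+end_src

So "%(fr) bonjour %()" becomes \foreignlanguage{french}{bonjour}, and
"%(--esperanto) saluton %()" becomes \foreignlanguage{esperanto}{saluton}.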

