* About multilingual documents
@ 2021-05-02 20:20 Juan Manuel Macías
  2021-05-03  6:58 ` Aleksandar Dimitrov
  0 siblings, 1 reply; 17+ messages in thread
From: Juan Manuel Macías @ 2021-05-02 20:20 UTC
To: orgmode

Hi all,

I'm curious to see how other Org users deal with multilingual documents,
that is, those documents (for example, philology or linguistics texts)
that contain a significant number of inline quotes in other languages.
Naturally, this makes more sense in the LaTeX backend, since it is
convenient to enclose these quotes in a \foreignlanguage command to
ensure that LaTeX at least applies the correct hyphenation patterns for
words in other languages.

Luckily, in the latest versions of Babel (the Babel of LaTeX) you don't
need to do this when it comes to languages whose script is different
from Latin (e.g. Greek, languages with Cyrillic, Arabic, Hindi, etc.).
We can, for example, define Russian and Greek as:

#+begin_src latex
\babelprovide[onchar=ids fonts,hyphenrules=russian]{russian}
\babelprovide[onchar=ids fonts,hyphenrules=ancientgreek]{greek}
#+end_src

And also the fonts for both languages:

#+begin_src latex
\babelfont[russian]{rm}{Linux Libertine O}
\babelfont[greek]{rm}{Free Serif}
#+end_src

For Latin-based scripts it is still necessary to enclose the text in the
\foreignlanguage command. And now comes the question: how do Org users
who work on multilingual documents obtain this command when exporting
to LaTeX?

I usually use macros, which always tend to work fine. But lately I have
been testing an alternative markup system using an export filter. The
idea would be something like:

%(lang) lorem ipsum dolor %()

I start from a list of the most used languages:

#+begin_src emacs-lisp
(langs '(("en" "english")
         ("fr" "french")
         ("de" "german")
         ("it" "italian")
         ("pt" "portuguese")))
#+end_src

And other possible languages that Babel supports can be indicated
explicitly, by prepending "--":

%(fr) ... %()

%(--esperanto) ... %()

(If someone wants to try it, I attach a small Org document).

Best regards,

Juan Manuel

[-- Attachment #2: test-langs.org --]
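A minimal sketch of how the %(lang) ... %() markup described in the
message above could be turned into \foreignlanguage commands by an
export filter. This is not the code from the attached test-langs.org:
the names my-lang-codes and my-lang-filter are made up for illustration,
and the sketch assumes the LaTeX backend has already escaped "%" as "\%"
by the time a plain-text filter sees the string (the regexps accept both
forms). Opening and closing marks are converted separately, so a span may
cross other markup.

#+begin_src emacs-lisp
(require 'ox)
(require 'ox-latex)

;; Hypothetical table of short codes, modelled on the list in the message.
(defvar my-lang-codes
  '(("en" . "english") ("fr" . "french") ("de" . "german")
    ("it" . "italian") ("pt" . "portuguese")))

(defun my-lang-filter (text backend _info)
  "Turn %(xx) ... %() marks in TEXT into \\foreignlanguage commands.
Only acts for BACKEND `latex' and its derivatives; returning nil
elsewhere leaves TEXT unchanged."
  (when (org-export-derived-backend-p backend 'latex)
    (let ((opened
           ;; Opening mark: %(fr) or %(--esperanto), optionally escaped.
           (replace-regexp-in-string
            "\\\\?%(\\(--\\)?\\([[:alpha:]]+\\))[ \t]*"
            (lambda (m)
              (let* ((explicit (match-string 1 m))
                     (name (match-string 2 m))
                     (lang (if explicit
                               name
                             (or (cdr (assoc name my-lang-codes)) name))))
                (format "\\foreignlanguage{%s}{" lang)))
            text t t)))
      ;; Closing mark: %(), optionally escaped.
      (replace-regexp-in-string "[ \t]*\\\\?%()" "}" opened t t))))

(add-to-list 'org-export-filter-plain-text-functions #'my-lang-filter)

;; Direct call, outside of an export:
(my-lang-filter "\\%(fr) bonjour \\%() and \\%(--esperanto) saluton \\%()"
                'latex nil)
;; => "\\foreignlanguage{french}{bonjour} and \\foreignlanguage{esperanto}{saluton}"
#+end_src

Because the filter works on plain-text strings only, it would not touch
verbatim or code objects, which are transcoded separately.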
* Re: About multilingual documents
From: Aleksandar Dimitrov @ 2021-05-03 6:58 UTC
To: emacs-orgmode

Hi Juan,

this sounds very interesting to me, as I, too, mostly write in Org
and, sometimes write documents in multiple languages, usually with
different varieties of either Latin or Cyrillic.

I have some suggestions:

Apart from the export, one of my biggest gripes is
flyspell. Specifically, the fact that you have to choose one language to
spell check the entire document with. That is insufficient in my case.

I think that the syntax you're suggesting looks good, but I'm not
sure how well it'd fit into org-mode's ecosystem. I had something in
mind that was closer to how org-babel works (it's called *babel*
for a reason, isn't it? :D)

#+begin_src org :lang pl
… po polsku
#+end_src

#+begin_src org :lang de
… auf deutsch
#+end_src

This would make use of org-mode's edit special environment function. It
would make it easier to persuade flyspell to do the right thing. You
could, perhaps, add

#+LANGUAGE: en

to the parent document, and then org would take care to set the correct
flyspell language (and the correct macros on LaTeX-export) and change
these parameters in the special environments.

I'm not 100% sure it should be #+begin_src org, maybe introducing a
different special environment would be better, say #+begin_lang XX where
XX is the ISO-code of said language, or the locale (think en_US
vs. en_GB.)

The drawback, and the clear disadvantage compared to your method is that
this works great only when the languages are separated by paragraph
breaks.

Therefore, I think our suggestions might be somewhat orthogonal. Yours
could be a shorthand syntax for introducing inline foreign-language
snippets.

What do you think?

Regards,
Aleks
* Re: About multilingual documents
From: Greg Minshall @ 2021-05-03 17:47 UTC
To: Aleksandar Dimitrov; +Cc: emacs-orgmode

Aleks, et al.,

> Apart from the export, one of my biggest gripes is
> flyspell. Specifically, the fact that you have to choose one language to
> spell check the entire document with. That is insufficient in my case.

in case it's relevant:

i also switch between languages. but, for me (maybe i'm missing
something?) it means i switch input methods. so, i've code bound to
(toggle-input-method) that, depending on the input method, changes the
dictionary "for" that input method. this is not org-specific, but,
rather, works for all my emacs buffers.

cheers, Greg
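One way the "dictionary follows input method" idea from the message
above could look. This is only a sketch, not Greg's actual code: the
variable and function names are hypothetical, the input method and
dictionary names are examples, and the dictionaries must be installed
for the spell checker in use (ispell-change-dictionary signals an error
for an undefined dictionary).

#+begin_src emacs-lisp
(require 'ispell)

;; Hypothetical mapping from Emacs input methods to ispell dictionaries.
(defvar my-input-method-dictionaries
  '(("russian-computer" . "ru_RU")
    ("german-postfix"   . "de_DE")))

;; Dictionary to use when no input method is active (an assumption).
(defvar my-default-dictionary "en_US")

(defun my-dictionary-for-input-method ()
  "Switch to the dictionary paired with the newly activated input method."
  (let ((dict (cdr (assoc current-input-method
                          my-input-method-dictionaries))))
    (when dict
      (ispell-change-dictionary dict))))

(defun my-dictionary-reset ()
  "Switch back to the default dictionary when the input method is turned off."
  (ispell-change-dictionary my-default-dictionary))

;; `toggle-input-method' runs these hooks, so the usual key binding
;; (C-\) keeps working unchanged.
(add-hook 'input-method-activate-hook #'my-dictionary-for-input-method)
(add-hook 'input-method-deactivate-hook #'my-dictionary-reset)
#+end_src

Since ispell-change-dictionary without a prefix argument only affects
the current buffer, each buffer keeps the dictionary that matches its
own input method.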
* Re: About multilingual documents
From: Aleksandar Dimitrov @ 2021-05-04 7:30 UTC
To: emacs-orgmode

Hi Greg,

>> Apart from the export, one of my biggest gripes is
>> flyspell. Specifically, the fact that you have to choose one language to
>> spell check the entire document with. That is insufficient in my case.
>
> in case it's relevant:
>
> i also switch between languages. but, for me (maybe i'm missing
> something?) it means i switch input methods. so, i've code bound to
> (toggle-input-method) that, depending on the input method, changes the
> dictionary "for" that input method. this is not org-specific, but,
> rather, works for all my emacs buffers.

I don't usually switch input methods. Instead I rely on the X-Server's
facilities, including group toggles and XCompose. For example I use
XCompose to write all languages with a Latin alphabet without having to
switch layouts/input methods.

Cheers,
Aleks
* Re: About multilingual documents
From: Maxim Nikulin @ 2021-05-04 17:09 UTC
To: emacs-orgmode

On 04/05/2021 14:30, Aleksandar Dimitrov wrote:
>
> I don't usually switch input methods. Instead I rely on the X-Server's
> facilities, including group toggles and XCompose. For example I use
> XCompose to write all languages with a Latin alphabet without having to
> switch layouts/input methods.

You mentioned Cyrillic, and it is inconvenient to switch keyboard layout
(Xkb group) for any command (C-c ...). Unfortunately, keymaps in Emacs
are unaware of keysyms from the "base" group when another group is
active. On the other hand, an Emacs input method requires special tricks
to keep the Emacs window (almost) always on a Latin keyboard layout
while other applications rely on xkb.

On 04/05/2021 15:19, Eric S Fraga wrote:
> So, on this note, without hopefully hijacking the thread, maybe somebody
> can tell me: what is the "default" input method, i.e. the one I get when
> I start Emacs and haven't changed input methods at all?

The default input method depends on the locale. E.g. en_US.UTF-8 does
not require anything special.
* Re: About multilingual documents
From: Aleksandar Dimitrov @ 2021-05-04 18:55 UTC
To: emacs-orgmode

Maxim Nikulin writes:

> On 04/05/2021 14:30, Aleksandar Dimitrov wrote:
>>
>> I don't usually switch input methods. Instead I rely on the X-Server's
>> facilities, including group toggles and XCompose. For example I use
>> XCompose to write all languages with a Latin alphabet without having to
>> switch layouts/input methods.
>
> You mentioned Cyrillic, and it is inconvenient to switch keyboard layout
> (Xkb group) for any command (C-c ...). Unfortunately keymaps in emacs
> are unaware of keysyms from "base" group when another group is active.
> On the other hand, emacs input method requires special tricks to keep
> emacs window (almost) always with latin keyboard layout while other
> applications rely on xkb.

Yeah, I know the issue, which is why I rely on XCompose for Latin
scripts. For Cyrillic, alas, that is impossible. It means that I
basically can't control Emacs while using a Cyrillic layout, which is a
pity. I have no good workaround.
* Re: About multilingual documents
From: Maxim Nikulin @ 2021-05-06 16:22 UTC
To: emacs-orgmode

On 05/05/2021 01:55, Aleksandar Dimitrov wrote:
> Yeah, I know the issue, which is why I rely on XCompose for Latin
> scripts. For Cyrillic, alas, that is impossible. It means that I
> basically can't control Emacs while using a Cyrillic layout, which is a
> pity. I have no good workaround.

Generally, the idea is to enable the layout (Xkb group) per window and
to reset the layout to English if the active window is Emacs. I have not
tried the recipes that manage the Xkb group from Emacs itself, e.g.

https://github.com/lislon/emacs-switch-lang
https://github.com/Mihara/kbd-indicator.el

Another approach is to set a global hotkey and, if Emacs is focused,
send some special key event that is bound to switching the input method.
I have some links but the pages are not in English.

Personally, I have not fully polished my setup; however, it works with
some limitations. I started from a bash script calling xdotool, xvkbd,
and xprop. Then I realized that the C code is not dramatically longer,
and it lets me avoid struggling with the limitations of such tools.

Tim Cross suggested that I raise the question concerning keymaps on
emacs-devel once more, but I still do not feel ready to discuss the
technical aspects (e.g. hotkey handling in applications that fixed
similar issues several years ago)
https://orgmode.org/list/87r1lnvjh0.fsf@gmail.com
* Re: About multilingual documents
From: Eric S Fraga @ 2021-05-04 8:19 UTC
To: Greg Minshall; +Cc: emacs-orgmode

On Monday, 3 May 2021 at 20:47, Greg Minshall wrote:
> but, for me (maybe i'm missing something?) it means i switch input
> methods.

Which is what I do.

So, on this note, without hopefully hijacking the thread, maybe somebody
can tell me: what is the "default" input method, i.e. the one I get when
I start Emacs and haven't changed input methods at all? I see no way to
get back to it once I have switched to a different one.

--
: Eric S Fraga via Emacs 28.0.50, Org release_9.4.5-480-g479a3d
* Input methods [was: Re: About multilingual documents]
From: Joost Kremers @ 2021-05-04 8:29 UTC
To: Eric S Fraga; +Cc: emacs-orgmode

On Tue, May 04 2021, Eric S Fraga wrote:
> So, on this note, without hopefully hijacking the thread, maybe somebody
> can tell me: what is the "default" input method, i.e. the one I get when
> I start Emacs and haven't changed input methods at all? I see no way to
> get back to it once I have switched to a different one.

It's not really an input method, more like the lack of one. You're probably
using =set-input-method= to change input methods? Check out
=toggle-input-method=. :-)

--
Joost Kremers
Life has its moments
* Re: Input methods [was: Re: About multilingual documents]
From: Eric S Fraga @ 2021-05-04 9:36 UTC
To: Joost Kremers; +Cc: emacs-orgmode

On Tuesday, 4 May 2021 at 10:29, Joost Kremers wrote:
> It's not really an input method, more like the lack of one. You're probably
> using =set-input-method= to change input methods? Check out
> =toggle-input-method=. :-)

Ah, interesting. A lack of input method. Kind of non-obvious. But the
documentation for toggle-input-method explains it perfectly.

Thank you!

--
: Eric S Fraga via Emacs 28.0.50, Org release_9.4.5-480-g479a3d
* Re: About multilingual documents
From: Joost Kremers @ 2021-05-03 18:48 UTC
To: Aleksandar Dimitrov; +Cc: emacs-orgmode

[Not directly related to the OP, but might be useful to know.]

On Mon, May 03 2021, Aleksandar Dimitrov wrote:
> this sounds very interesting to me, as I, too, mostly write in Org
> and, sometimes write documents in multiple languages, usually with
> different varieties of either Latin or Cyrillic.
[...]
> Apart from the export, one of my biggest gripes is
> flyspell. Specifically, the fact that you have to choose one language to
> spell check the entire document with. That is insufficient in my case.

flyspell is basically just ispell, and ispell can be configured with different
backends. One possible backend is hunspell, which allows you to set multiple
dictionaries. So if you regularly use different languages in a buffer, you
should give hunspell a try.

[...]
> The drawback, and the clear disadvantage compared to your method is that
> this works great only when the languages are separated by paragraph
> breaks.

If that is the case, you could also check out the =guess-language= package:
<https://github.com/tmalsburg/guess-language.el>. It tries to detect the
language of the current paragraph and sets the ispell (and hence flyspell)
dictionary accordingly. I use it because I write in three different languages,
but usually don't mix them in one buffer.

--
Joost Kremers
Life has its moments
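The two suggestions in the message above could be configured roughly as
follows. This is a sketch under assumptions: the dictionary names en_US
and de_DE are examples and must be installed for hunspell, and the
language list for guess-language is likewise only an example. The two
parts are alternatives and need not be combined.

#+begin_src emacs-lisp
;; Option 1: hunspell as the ispell backend, checking against two
;; dictionaries at once.
(with-eval-after-load 'ispell
  (setq ispell-program-name "hunspell")
  (setq ispell-dictionary "en_US,de_DE")
  ;; Must run before `ispell-hunspell-add-multi-dic' so that the
  ;; dictionary tables are initialised.
  (ispell-set-spellchecker-params)
  (ispell-hunspell-add-multi-dic "en_US,de_DE"))

;; Option 2: guess-language (third-party package) picks the dictionary
;; per paragraph. Org buffers run `text-mode-hook' as well.
(with-eval-after-load 'guess-language
  (setq guess-language-languages '(en de)))
(add-hook 'text-mode-hook #'guess-language-mode)
#+end_src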
* Re: About multilingual documents
From: Aleksandar Dimitrov @ 2021-05-04 8:00 UTC
To: Joost Kremers; +Cc: emacs-orgmode

Hi Joost

> [Not directly related to the OP, but might be useful to know.]
>
> On Mon, May 03 2021, Aleksandar Dimitrov wrote:
>> this sounds very interesting to me, as I, too, mostly write in Org
>> and, sometimes write documents in multiple languages, usually with
>> different varieties of either Latin or Cyrillic.
> [...]
>> Apart from the export, one of my biggest gripes is
>> flyspell. Specifically, the fact that you have to choose one language to
>> spell check the entire document with. That is insufficient in my case.
>
> flyspell is basically just ispell, and ispell can be configured with different
> backends. One possible backend is hunspell, which allows you to set multiple
> dictionaries. So if you regularly use different languages in a buffer, you
> should give hunspell a try.
>
> [...]
>> The drawback, and the clear disadvantage compared to your method is that
>> this works great only when the languages are separated by paragraph
>> breaks.
>
> If that is the case, you could also check out the =guess-language= package:
> <https://github.com/tmalsburg/guess-language.el>. It tries to detect the
> language of the current paragraph and sets the ispell (and hence flyspell)
> dictionary accordingly. I use it because I write in three different languages,
> but usually don't mix them in one buffer.

Thanks for your hints! =guess-language= seems really cool! I also didn't
know hunspell supported more than one dictionary.

Thanks!
Aleks
* Re: About multilingual documents
From: Juan Manuel Macías @ 2021-05-03 20:33 UTC
To: Aleksandar Dimitrov; +Cc: orgmode

Hi Aleksandar,

Thank you very much for your interesting comments. I think your idea of
applying org-babel to (multi) language support is tremendously
suggestive and, of course, more org-centric. I suppose it could also be
applied to languages within the paragraph by means of inline blocks... I
really liked what you propose.

Well, I admit that my marks are a bit exotic :-D. The main problem I see
is that they are not as robust as Org's own marks, since they are
controlled by an export filter. Doing some further tests, by the way, I
think it would be better to add the filter to
`org-export-filter-plain-text-functions', instead of
`...final-output-functions'. I also see that it would be convenient to
avoid their expansion in verbatim texts, with a `(unless
(org-in-verbatim-emphasis)...)'.

Anyway, I think (in general terms) it would be interesting for Org to
incorporate some multilingual support and the ability to toggle between
languages in a document, and the idea you propose seems to me to make a
lot of sense.

Best regards,

Juan Manuel
* Re: About multilingual documents
From: Tom Gillespie @ 2021-05-04 1:00 UTC
To: Juan Manuel Macías; +Cc: Aleksandar Dimitrov, orgmode

I like Aleksandar's solution quite a bit because it also works inline
e.g. as src_org[:lang de]{Meine deutsch ist zher schlect!}. In
principle this means that you could leverage the org-babel and org-src
buffer system to get flyspell results in that language in line as well
(though I don't think transporting overlays into the original buffer
has been implemented). Best!

Tom
* Re: About multilingual documents
From: Aleksandar Dimitrov @ 2021-05-04 8:13 UTC
To: orgmode

> I like Aleksandar's solution quite a bit because it also works inline
> e.g. as src_org[:lang de]{Meine deutsch ist zher schlect!}. In
> principle this means that you could leverage the org-babel and org-src
> buffer system to get flyspell results in that language in line as well
> (though I don't think transporting overlays into the original buffer
> has been implemented). Best!

Oh wow, I'm learning lots of new things today, including inline-babel in
Org. I'm not sure highlighting typos in src-blocks is necessary. I think
it's enough if you can see them while you're editing the block.
* Re: About multilingual documents
From: Aleksandar Dimitrov @ 2021-05-04 8:44 UTC
To: org-mode-email

Hi Juan,

> Thank you very much for your interesting comments. I think your idea of
> applying org-babel to (multi) language support is tremendously
> suggestive and, of course, more org-centric. I suppose it could be
> applied also to languages within the paragraph by inline blocks... I
> really liked what you propose.
>
> Well, I admit that my marks are a bit exotic :-D. The main problem I see
> is that they are not as robust as Org's own marks, since they are
> controlled by an export filter. Doing some further tests, by the way, I
> think it would be better to add the filter to
> `org-export-filter-plain-text-functions', instead of
> `...final-output-functions'. I also see that it would be convenient to
> avoid their expansion in verbatim texts, with a `(unless
> (org-in-verbatim-emphasis)...)'.

What I like about =org-edit-special= is that it gives you a dedicated
little environment in a different language (either natural, or
programming language!) This allows me to focus on the task of editing it
really easily. I must admit that I find the inline org-src notation
(which I didn't know about yet) somewhat jarring, and certainly less
pleasant to read. Perhaps we could use a mechanism similar to
=org-hide-emphasis-markers= to make it more pleasant to read. [1]

> Anyway, I think (in general terms) it would be interesting for Org to
> incorporate some multilingual support and the ability to toggle between
> languages in a document, and the idea you propose seems to
> me that it makes a lot of sense.

I definitely agree that Org would benefit from more multilingual
support. I'm not very experienced in emacs-lisp but would love to
contribute.

One problem I foresee is the translation of locales into LaTeX macros
for either (LaTeX)-Babel or Polyglossia (which is what I use.) So a
string like "en" or "en_GB" (which is readily understood by
([ai]|hun)spell) would have to be translated to the necessary
macros. For example for Polyglossia [2] the preamble would read

\setdefaultlanguage[variant=uk]{english}

And then the inline commands would have to be rendered as
\textenglish{…} or \textlang{english}{…} (probably the latter would be
easier.)

I forgot what it is for LaTeX-Babel. Note that the HTML export backend,
too, could (or should) support declaring multiple languages. [3]

There's a lot of work in there, but I would say that any implementation
effort should focus on one thing first. That could be switching the
dictionary on org-edit-special if a :lang-variable is set, or it could
be re-using what you, Juan, already wrote for LaTeX-Babel
exports. Support for Polyglossia or HTML could come at a later time.

Cheers,
Aleks

[1] https://stackoverflow.com/questions/20309842/how-to-syntax-highlight-for-org-mode-inline-source-code-src-lang/28059832#28059832
[2] https://ftp.rrze.uni-erlangen.de/ctan/macros/unicodetex/latex/polyglossia/polyglossia.pdf
[3] https://www.w3.org/International/questions/qa-html-language-declarations
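The locale-to-macro translation mentioned in the message above could be
sketched as a small lookup table. Everything here is hypothetical (table
contents, function names); only the \setdefaultlanguage and \textlang
forms are taken from the message itself, and a real implementation would
need a far more complete table.

#+begin_src emacs-lisp
;; Hypothetical table: locale -> (Polyglossia language, options or nil).
(defvar my-locale-to-polyglossia
  '(("en"    "english" nil)
    ("en_GB" "english" "variant=uk")
    ("de"    "german"  nil)
    ("pl"    "polish"  nil)))

(defun my-polyglossia--entry (locale)
  "Return the (LANGUAGE OPTIONS) entry for LOCALE, or signal an error."
  (or (cdr (assoc locale my-locale-to-polyglossia))
      (error "No Polyglossia mapping for locale %s" locale)))

(defun my-polyglossia-default-language (locale)
  "Return the preamble line that makes LOCALE the main language."
  (let* ((entry (my-polyglossia--entry locale))
         (lang (car entry))
         (opts (cadr entry)))
    (if opts
        (format "\\setdefaultlanguage[%s]{%s}" opts lang)
      (format "\\setdefaultlanguage{%s}" lang))))

(defun my-polyglossia-inline (locale text)
  "Wrap TEXT in a \\textlang command for LOCALE (variant options ignored)."
  (format "\\textlang{%s}{%s}" (car (my-polyglossia--entry locale)) text))

(my-polyglossia-default-language "en_GB")
;; => "\\setdefaultlanguage[variant=uk]{english}"
(my-polyglossia-inline "de" "auf deutsch")
;; => "\\textlang{german}{auf deutsch}"
#+end_src

Keeping the table as data would make it straightforward to add a second
column for (LaTeX) Babel names later.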
* Re: About multilingual documents
  2021-05-04  8:44 ` Aleksandar Dimitrov
@ 2021-05-06 11:11 ` Juan Manuel Macías
  0 siblings, 0 replies; 17+ messages in thread
From: Juan Manuel Macías @ 2021-05-06 11:11 UTC (permalink / raw)
  To: Aleksandar Dimitrov; +Cc: orgmode

[-- Attachment #1: Type: text/plain, Size: 4300 bytes --]

Hi Aleksandar,

Aleksandar Dimitrov writes:

> [...]
> I must admit that I find the inline org-src notation (of which I
> didn't know yet) somewhat jarring, and certainly less pleasant to
> read. Perhaps we could use a similar mechanism to
> =org-hide-emphasis-markers= to make it more pleasant to read. [1]

You may be interested in this thread:
https://orgmode.org/list/87a6r6avgg.fsf@gmail.com/

> I definitely agree that Org would benefit from more multilingual
> support. I'm not very experienced in emacs-lisp but would love to contribute.
>
> One problem I foresee is the translation of locales into LaTeX macros
> for either (LaTeX)-Babel or Polyglossia (which is what I use.) So a
> string like "en" or "en_UK" (which is readily understood by
> ([ai]|hun)spell) would have to be translated to the necessary
> macros. For example for Polyglossia [2] the preamble would read
>
> \setdefaultlanguage[variant=uk]{english}
>
> And then the inline commands would have to be rendered as
> \textenglish{…} or \textlang{english}{…} (probably the latter would be easier.)

Since I have had some free time these days, I have written this little
snippet, based on your idea. Of course, it is only a 'sketch', or a
'proof of concept'. It has obvious limitations and does not cover all
the features that your idea suggests. Here I only apply the (LaTeX)
Babel environments, but they can easily be replaced by those of
Polyglossia [1], or both possibilities could be offered through a
defcustom. I have added two header arguments: `:lang' and
`:lang-quotes'. The second one is meant for use with the csquotes
package.
As I have only focused on exporting to LaTeX, I have not included
support for html (or odt), but I agree with you that some multilingual
support would be needed for these backends as well. And there's no
support for inline blocks either, since the output of the options I've
added is multiline.

Anyway, it is a very hasty sketch (maybe too hasty ;-)), but if you want
to try it, I attach here a small test document.

The code:

#+begin_src emacs-lisp
(require 'ox)      ; org-export-derived-backend-p, org-export-string-as
(require 'ob-org)  ; load first, so the redefinition below is not overwritten

(defun my-lang-org-backend (lang body)
  "Wrap BODY in a Babel `otherlanguage' environment for LANG on LaTeX export.
Any other backend gets BODY unchanged."
  (cond
   ((org-export-derived-backend-p org-export-current-backend 'latex)
    (format "@@latex:\\begin{otherlanguage}{%s}@@\n%s\n@@latex:\\end{otherlanguage}@@"
            lang body))
   ;; html, odt and anything else: no language markup yet
   (t body)))

(defun my-lang-csquotes-org-backend (lang body)
  "Like `my-lang-org-backend', but use `otherlanguage*' and enable csquotes."
  (cond
   ((org-export-derived-backend-p org-export-current-backend 'latex)
    (format "@@latex:\\begin{otherlanguage*}{%s}\n\\EnableQuotes@@\n%s\n@@latex:\\end{otherlanguage*}@@"
            lang body))
   (t body)))

(defun org-babel-execute:org (body params)
  "Execute a block of Org code with Babel.
This function is called by `org-babel-execute-src-block'.
It extends the stock version with the `:lang' and `:lang-quotes'
header arguments."
  (let ((result-params (split-string (or (cdr (assq :results params)) "")))
        (lang (cdr (assq :lang params)))
        (lang-quotes (cdr (assq :lang-quotes params)))
        (body (org-babel-expand-body:org
               (replace-regexp-in-string "^," "" body) params)))
    (cond
     ;; the two new header arguments take precedence over :results
     (lang (my-lang-org-backend lang body))
     (lang-quotes (my-lang-csquotes-org-backend lang-quotes body))
     ((member "latex" result-params)
      (org-export-string-as (concat "#+Title: \n" body) 'latex t))
     ((member "html" result-params) (org-export-string-as body 'html t))
     ((member "ascii" result-params) (org-export-string-as body 'ascii t))
     (t body))))
#+end_src

Best regards,

Juan Manuel

[1] I used Polyglossia for a while, when I migrated to XeTeX and then to
LuaTeX, and Babel at that time did not support these engines. But Babel
now supports them fully and has grown so much that it has (IMHO)
surpassed Polyglossia. I recommend taking a look at all the new features
that the current Babel maintainer, Javier Bezos, has added:
http://mirrors.ctan.org/macros/latex/required/babel/base/babel.pdf

[-- Attachment #2: langs-test.org --]
[-- Type: application/vnd.lotus-organizer, Size: 2120 bytes --]
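The locale-to-macro translation that Aleksandar describes for Polyglossia could be sketched roughly as below. This is a minimal illustration under stated assumptions, not part of the code posted in the thread: the table my-sketch-polyglossia-table, its entries and both function names are hypothetical, and the table uses en_GB where the quoted message writes en_UK.

```emacs-lisp
;; Hypothetical sketch of a locale -> Polyglossia mapping.

(defvar my-sketch-polyglossia-table
  '(("en"    "english" nil)
    ("en_GB" "english" "variant=uk")
    ("en_US" "english" "variant=us")
    ("de"    "german"  nil)
    ("pl"    "polish"  nil))
  "Illustrative map from a locale code to (LANGUAGE OPTIONS).")

(defun my-sketch-polyglossia-default (locale)
  "Return a \\setdefaultlanguage line for LOCALE, or nil if LOCALE is unknown."
  (pcase (cdr (assoc locale my-sketch-polyglossia-table))
    (`(,lang ,opts)
     (if opts
         (format "\\setdefaultlanguage[%s]{%s}" opts lang)
       (format "\\setdefaultlanguage{%s}" lang)))))

(defun my-sketch-polyglossia-inline (locale text)
  "Wrap TEXT in \\textlang for LOCALE.
Return TEXT unchanged if LOCALE is not in the table."
  (let ((entry (cdr (assoc locale my-sketch-polyglossia-table))))
    (if entry
        (format "\\textlang{%s}{%s}" (car entry) text)
      text)))

;; (my-sketch-polyglossia-default "en_GB")
;; => "\\setdefaultlanguage[variant=uk]{english}"
```

Keeping the language name and its options in separate fields means the same table can serve both the preamble line and the inline command, which only needs the language name.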
end of thread, other threads:[~2021-05-06 16:43 UTC | newest]

Thread overview: 17+ messages
2021-05-02 20:20 About multilingual documents Juan Manuel Macías
2021-05-03  6:58 ` Aleksandar Dimitrov
2021-05-03 17:47 ` Greg Minshall
2021-05-04  7:30 ` Aleksandar Dimitrov
2021-05-04 17:09 ` Maxim Nikulin
2021-05-04 18:55 ` Aleksandar Dimitrov
2021-05-06 16:22 ` Maxim Nikulin
2021-05-04  8:19 ` Eric S Fraga
2021-05-04  8:29 ` Input methods [was: Re: About multilingual documents] Joost Kremers
2021-05-04  9:36 ` Eric S Fraga
2021-05-03 18:48 ` About multilingual documents Joost Kremers
2021-05-04  8:00 ` Aleksandar Dimitrov
2021-05-03 20:33 ` Juan Manuel Macías
2021-05-04  1:00 ` Tom Gillespie
2021-05-04  8:13 ` Aleksandar Dimitrov
2021-05-04  8:44 ` Aleksandar Dimitrov
2021-05-06 11:11 ` Juan Manuel Macías