* Fallback fonts in LaTeX export for non latin scripts
From: Juan Manuel Macías @ 2023-08-30 8:25 UTC
  To: orgmode

The Unicode TeX engines, LuaTeX and XeTeX, have certain features for
applying fonts to scripts (Greek, Cyrillic, Arabic, etc.) without the
need to switch fonts explicitly. But LaTeX does not include any
functionality for loading 'fallback fonts' out of the box.

Seen from the TeX and LaTeX side, this is understandable: since LaTeX
is a typographic tool, the user is responsible for choosing the fonts
and knowing which fonts to use. But from the Org side things may look
different, as the average user (who may not be interested in
typographical or font complexities) is looking for immediate
readability of their texts when exporting to any format. We know that,
when exporting to LaTeX, this does not always happen if the text
includes non-Latin scripts.

These days I'm working on some experimental code to try to provide Org
with some sort of fallback fonts on LaTeX export. The functionality
would (for now) be tied to LuaTeX + the babel package, since XeTeX,
although it has the ucharclasses package, is more limited.

The idea is to start from a defcustom that is an alist where each
element has the structure (script font). There would also be a default
script + font, for example ("latin" "Linux Libertine"). At the moment
it would only work for the default roman font, but it can be extended
to the default sans serif, mono, etc.

The functionality would not be activated by default. When activated,
it also enables LuaTeX as the default LaTeX engine, and on each export
a list of the non-Latin scripts in the buffer is extracted. Perhaps
with some code like this, which checks for any non-Latin characters:

(require 'cl-lib)

(let ((scripts))
  (save-excursion
    (goto-char (point-min))
    ;; Anything outside Basic Latin, Latin-1 and Latin Extended-A.
    (while (re-search-forward
            "\\([^\u0000-\u007F\u0080-\u00FF\u0100-\u017F]\\)" nil t)
      (let ((script (aref char-script-table
                          (string-to-char (match-string 1)))))
        ;; `add-to-list' does not work on a lexical variable, so use
        ;; `cl-pushnew' to collect each script only once.
        (when script
          (cl-pushnew script scripts)))))
  scripts)

?

Once the list has been extracted, an ad hoc preamble would be
formatted, assigning each script the chosen font.

WDYT? Do you think this would be a viable path? I think that in a few
days I can offer something usable for discussion.

Best regards,

Juan Manuel

-- 
Juan Manuel Macías
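The script-detection step proposed in the first message can be sketched as a small standalone function, followed by the lookup in a (script font) alist of the kind described for the defcustom. This is a minimal sketch, not code from the thread: the names `my/org-latex-scripts-in-string` and `my/org-latex-fonts-for-scripts` are hypothetical, the scan works on a string rather than the export buffer so it is easy to test, and it skips every character up to U+017F, mirroring the ranges in the regexp above.

```elisp
;; -*- lexical-binding: t; -*-
(require 'cl-lib)

(defun my/org-latex-scripts-in-string (text)
  "Return the distinct scripts of the non-Latin characters in TEXT.
Characters up to U+017F (Basic Latin, Latin-1 Supplement and Latin
Extended-A) are ignored.  Scripts are symbols from `char-script-table',
returned in order of first appearance."
  (let ((scripts nil))
    (dolist (ch (append text nil))
      (when (> ch #x17F)
        (let ((script (aref char-script-table ch)))
          (when script
            (cl-pushnew script scripts)))))
    (nreverse scripts)))

(defun my/org-latex-fonts-for-scripts (scripts alist)
  "Pair each of SCRIPTS with its font from ALIST.
ALIST has elements (SCRIPT-NAME FONT [OPTIONS]), SCRIPT-NAME being a
string such as \"greek\".  Return (FOUND . MISSING): FOUND is a list
of (SCRIPT . FONT) pairs, MISSING the scripts with no entry in ALIST."
  (let ((found nil) (missing nil))
    (dolist (script scripts)
      (let ((entry (assoc (symbol-name script) alist)))
        (if entry
            (push (cons script (nth 1 entry)) found)
          (push script missing))))
    (cons (nreverse found) (nreverse missing))))
```

Reporting the MISSING scripts separately leaves room for the warning or fallback font discussed later in the thread, instead of silently dropping characters that have no assigned font.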
* Re: Fallback fonts in LaTeX export for non latin scripts
From: Ihor Radchenko @ 2023-08-31 8:17 UTC
  To: Juan Manuel Macías, Timothy; +Cc: orgmode

Juan Manuel Macías <maciaschain@posteo.net> writes:

> These days I'm working on some experimental code to try to provide Org
> with some sort of fallback fonts on LaTeX export. The functionality
> would (for now) be tied to LuaTeX + the babel package, since XeTeX,
> although it has the ucharclasses package, is more limited.

Thanks! That would be a welcome addition.

> The idea is to start from a defcustom that is an alist where each
> element has the structure (script font). There would also be a default
> script + font, for example ("latin" "Linux Libertine"). At the moment
> it would only work for the default roman font, but it can be extended
> to the default sans serif, mono, etc.

Are the fonts you have in mind shipped with the LuaTeX distribution?

> The functionality would not be activated by default. When activated,
> it also enables LuaTeX as the default LaTeX engine, and on each export
> a list of the non-Latin scripts in the buffer is extracted. Perhaps
> with some code like this, which checks for any non-Latin characters:
>
> (require 'cl-lib)
>
> (let ((scripts))
>   (save-excursion
>     (goto-char (point-min))
>     ;; Anything outside Basic Latin, Latin-1 and Latin Extended-A.
>     (while (re-search-forward
>             "\\([^\u0000-\u007F\u0080-\u00FF\u0100-\u017F]\\)" nil t)
>       (let ((script (aref char-script-table
>                           (string-to-char (match-string 1)))))
>         ;; `add-to-list' does not work on a lexical variable, so use
>         ;; `cl-pushnew' to collect each script only once.
>         (when script
>           (cl-pushnew script scripts)))))
>   scripts)
>
> ?
>
> Once the list has been extracted, an ad hoc preamble would be
> formatted, assigning each script the chosen font.
>
> WDYT? Do you think this would be a viable path? I think that in a few
> days I can offer something usable for discussion.

Adding Timothy to CC. His WIP conditional preamble branch looks suitable
for adding the proposed functionality.

What will happen if LuaTeX is not installed on the system?

Also, just to double check, is LuaTeX fully compatible with LaTeX? That
is, if we have an existing Org file using LaTeX-specific commands and
packages, will it work with LuaTeX?

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>
* Re: Fallback fonts in LaTeX export for non latin scripts
From: Juan Manuel Macías @ 2023-08-31 11:42 UTC
  To: Ihor Radchenko; +Cc: Timothy, orgmode

Ihor Radchenko writes:

> Juan Manuel Macías <maciaschain@posteo.net> writes:
>> The idea is to start from a defcustom that is an alist where each
>> element has the structure (script font). There would also be a default
>> script + font, for example ("latin" "Linux Libertine"). At the moment
>> it would only work for the default roman font, but it can be extended
>> to the default sans serif, mono, etc.
>
> Are the fonts you have in mind shipped with the LuaTeX distribution?

Yes. In fact, the complete installation of TeX Live includes a wide
catalog of free OpenType fonts with good coverage for non-Latin
scripts. On top of that, more free (as in freedom), easily accessible
fonts can be recommended; many GNU/Linux distros already include them.
In any case, the fonts issue is the most delicate part. Which default
fonts should go in the list? Here the user's taste or preferences come
into play.

It must also be taken into account that, if one has typographical
scruples, not all fonts match each other (for design purposes, I mean).
Computer Modern, which is a modern-style font (similar to Didot or
Bodoni), does not usually pair well with (for example) a Garamond,
which is in the Renaissance style. That's why I think the best solution
would be to offer a basic, purely utilitarian defcustom, and let the
user modify or extend it according to their taste, preferences or
convenience.

Another thing to keep in mind is the following. Offering basic
readability based on Unicode scripts means that we rely on scripts and
not languages. For example, the Cyrillic script covers several
languages, as you well know: Russian, Bulgarian, etc. The Latin script
is used for languages as diverse as English or Vietnamese. Choosing the
font based on the script is low-level LuaTeX functionality; that is, it
does not add features specific to each language, such as hyphenation
patterns. This means that long texts in (for example) Cyrillic or Greek
are not justified well, because LaTeX does not know how to hyphenate
them:

https://i.imgur.com/PSja3x2.png

However, this may be sufficient for documents containing words or short
passages in non-Latin scripts, rather than long texts.

There is another possibility that I am working on in parallel: relying
on languages instead of scripts. This would add both readability and
support for each particular language. There could be two options for
the user: a basic one (the low-level one, based on scripts: it ensures
readability but the document may not look pretty) and an advanced one,
based on language support. Something like this occurred to me:

#+LaTeX_Header: % !enable-fonts-for ancientgreek russian:Old Standard arabic

This means: enable the default fonts for Ancient Greek and Arabic
(associated with the Greek and Arabic scripts). For Russian, enable the
Old Standard font (included in TeX Live). And in the case of Arabic,
also enable 'bidi' (bidirectional text). If the user added that line,
it would be enough to do the magic. I hope :-)

>> The functionality would not be activated by default. When activated,
>> it also enables LuaTeX as the default LaTeX engine, and on each export
>> a list of the non-Latin scripts in the buffer is extracted. Perhaps
>> with some code like this, which checks for any non-Latin characters:
>>
>> (require 'cl-lib)
>>
>> (let ((scripts))
>>   (save-excursion
>>     (goto-char (point-min))
>>     ;; Anything outside Basic Latin, Latin-1 and Latin Extended-A.
>>     (while (re-search-forward
>>             "\\([^\u0000-\u007F\u0080-\u00FF\u0100-\u017F]\\)" nil t)
>>       (let ((script (aref char-script-table
>>                           (string-to-char (match-string 1)))))
>>         ;; `add-to-list' does not work on a lexical variable, so use
>>         ;; `cl-pushnew' to collect each script only once.
>>         (when script
>>           (cl-pushnew script scripts)))))
>>   scripts)
>>
>> ?
>>
>> Once the list has been extracted, an ad hoc preamble would be
>> formatted, assigning each script the chosen font.
>>
>> WDYT? Do you think this would be a viable path? I think that in a few
>> days I can offer something usable for discussion.
>
> Adding Timothy to CC. His WIP conditional preamble branch looks suitable
> for adding the proposed functionality.

Great!

> What will happen if LuaTeX is not installed on the system?

Yes, there should be some kind of warning. Also, it's not just LuaTeX,
but also certain packages for fonts and multilingual support. The
problem is that the TeX Live builds packaged by the different distros
usually name these packages differently. This is another added
problem... Arch or Gentoo offer a more vanilla TeX Live.

> Also, just to double check, is LuaTeX fully compatible with LaTeX? That
> is, if we have an existing Org file using LaTeX-specific commands and
> packages, will it work with LuaTeX?

Yes, it is fully compatible, except that LuaLaTeX does not need to load
the fontenc or inputenc packages. LuaTeX is intended to be the natural
replacement for pdfTeX. The latest edition of The LaTeX Companion is
already very focused on LuaTeX. And 90% of the new LaTeX packages
uploaded to CTAN only work in LuaLaTeX. One of the essential advantages
of LuaTeX is that TeX now (finally!) has a simple scripting language.
With a little Lua you can achieve very low-level things in TeX that
were horribly complicated in 'pure TeX'.

-- 
Juan Manuel Macías

https://juanmanuelmacias.com
https://lunotipia.juanmanuelmacias.com
https://gnutas.juanmanuelmacias.com
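The warning discussed above (what happens when LuaTeX is not installed) could start as a simple availability check run before export. A minimal sketch, assuming that only the compiler executable on `exec-path` is checked; the function name `my/org-latex-check-compiler` is hypothetical, and it does not look for the fonts or the babel/fontspec packages, which, as noted above, distributions may package differently.

```elisp
;; -*- lexical-binding: t; -*-

(defun my/org-latex-check-compiler (&optional program)
  "Return non-nil if PROGRAM (default \"lualatex\") is on `exec-path'.
Otherwise display a warning and return nil.  Hypothetical helper: it
only checks for the executable, not for LaTeX packages or fonts."
  (let* ((program (or program "lualatex"))
         (path (executable-find program)))
    (unless path
      (display-warning
       'org-latex
       (format "%s not found; fonts for non-Latin scripts need LuaLaTeX"
               program)
       :warning))
    (and path t)))
```

Returning a boolean lets the caller decide whether to fall back to the current engine or abort the export.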
* Re: Fallback fonts in LaTeX export for non latin scripts
From: Ihor Radchenko @ 2023-09-01 9:18 UTC
  To: Juan Manuel Macías; +Cc: Timothy, orgmode

Juan Manuel Macías <maciaschain@posteo.net> writes:

> ...In any
> case, the fonts issue is the most delicate part. Which default fonts
> should go in the list? Here the user's taste or preferences come into
> play.

Commonly available libre fonts look like good candidates.

> It must also be taken into account that, if one has typographical
> scruples, not all fonts match each other (for design purposes, I mean).
> Computer Modern, which is a modern-style font (similar to Didot or
> Bodoni), does not usually pair well with (for example) a Garamond,
> which is in the Renaissance style. That's why I think the best solution
> would be to offer a basic, purely utilitarian defcustom, and let the
> user modify or extend it according to their taste, preferences or
> convenience.

+1.

> Another thing to keep in mind is the following. Offering basic
> readability based on Unicode scripts means that we rely on scripts and
> not languages. For example, the Cyrillic script covers several
> languages, as you well know: Russian, Bulgarian, etc. The Latin script
> is used for languages as diverse as English or Vietnamese. Choosing the
> font based on the script is low-level LuaTeX functionality; that is, it
> does not add features specific to each language, such as hyphenation
> patterns. This means that long texts in (for example) Cyrillic or Greek
> are not justified well, because LaTeX does not know how to hyphenate
> them:
> ...
> There is another possibility that I am working on in parallel: relying
> on languages instead of scripts. This would add both readability and
> support for each particular language. There could be two options for
> the user: a basic one (the low-level one, based on scripts: it ensures
> readability but the document may not look pretty) and an advanced one,
> based on language support. Something like this occurred to me:
>
> #+LaTeX_Header: % !enable-fonts-for ancientgreek russian:Old Standard arabic

We already have the #+language keyword and
`org-latex-guess-babel-language'/`org-latex-guess-polyglossia-language'.
We may as well have default fonts for a given language.

As for multiple languages, do we actually support this?

>> What will happen if LuaTeX is not installed on the system?
>
> Yes, there should be some kind of warning. Also, it's not just LuaTeX,
> but also certain packages for fonts and multilingual support. The
> problem is that the TeX Live builds packaged by the different distros
> usually name these packages differently. This is another added
> problem... Arch or Gentoo offer a more vanilla TeX Live.

We might use `org-latex-known-warnings'.

>> Also, just to double check, is LuaTeX fully compatible with LaTeX? That
>> is, if we have an existing Org file using LaTeX-specific commands and
>> packages, will it work with LuaTeX?
>
> Yes, it is fully compatible, except that LuaLaTeX does not need to load
> the fontenc or inputenc packages. LuaTeX is intended to be the natural
> replacement for pdfTeX. The latest edition of The LaTeX Companion is
> already very focused on LuaTeX. And 90% of the new LaTeX packages
> uploaded to CTAN only work in LuaLaTeX. One of the essential advantages
> of LuaTeX is that TeX now (finally!) has a simple scripting language.
> With a little Lua you can achieve very low-level things in TeX that
> were horribly complicated in 'pure TeX'.

Then we might even consider LuaTeX as the new default for
`org-latex-compiler'.

-- 
Ihor Radchenko // yantar92
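Until any default changes, `org-latex-compiler` (mentioned above) can already be set per user, or per file with the `#+LATEX_COMPILER` keyword. A minimal configuration sketch, assuming `ox-latex` is available; it only switches the engine and does not by itself add any fallback fonts or babel/fontspec preamble.

```elisp
;; Use LuaLaTeX for all LaTeX/PDF exports.
(with-eval-after-load 'ox-latex
  (setq org-latex-compiler "lualatex"))

;; Per-file alternative, as an Org keyword:
;; #+LATEX_COMPILER: lualatex
```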
* Re: Fallback fonts in LaTeX export for non latin scripts
From: Juan Manuel Macías @ 2023-09-02 21:39 UTC
  To: orgmode; +Cc: Ihor Radchenko, Timothy

[-- Attachment #1: Type: text/plain, Size: 1551 bytes --]

Finally I can upload some usable code here, in this case to load and
manage fonts for languages with non-Latin scripts, through babel and
fontspec (in LuaLaTeX). It is an attempt to simplify, from Org, the
multiform syntax of babel + fontspec. Of course, it is more limited,
but for regular use I think it may be enough.

Since this code is mostly a proof of concept and the names of many
things (and the things themselves) are still tentative, I thought it
would be more useful to attach it as an *.el file rather than as a
regular patch. After loading that file, everything should work fine. I
also attach an Org document with some usage examples. In any case,
there are more explanations inside the .el file.

One of the big problems I have encountered when trying to create a
"(LaTeX) Babel interface in Org" is the *horrible* multiplicity of
language names in Babel. That is the reason for the :babel-alt property
in 'org-latex-language-alist', which collects the names that babel
supports for \babelprovide; these are not always the same as in the
'classic' babel syntax.

In the end, I find this approach (loading fonts with language support)
more useful than a fallback font system based only on Unicode scripts.
It is less 'automatic', but more precise, and it does not require much
'specialized' intervention on the part of the user.
Best regards,

-- 
Juan Manuel Macías

[-- Attachment #2: test-lang.org --]
[-- Type: application/vnd.lotus-organizer, Size: 2866 bytes --]

[-- Attachment #3: unicode-font-support.el --]
[-- Type: text/plain, Size: 8721 bytes --]

;; -*- lexical-binding: t; -*-

;; A proof of concept for Unicode font support in LaTeX export, using
;; babel and fontspec, with luatex as the default compiler.

;; Use example:

;; It is not necessary to load languages with non-Latin alphabets in
;; the babel options:

;; #+LaTeX_Header: \usepackage[AUTO]{babel}

;; Languages and fonts (there may be multiple lines):

;; #+LaTeX_Header: % !enable-fonts-for ancientgreek:Linux Libertine O(Scale=MatchLowercase)
;; #+LaTeX_Header: % !enable-fonts-for russian:FreeSerif(Numbers=Lowercase,Color=blue) :: arabic

;; Explanation:

;; - lang = enable the default font for lang
;; - lang:font = enable font for lang in the current document
;; - lang:font(options) = enable font for lang in this document, with options
;; - :: = separator

;; code

(require 'cl-lib)

;; This is supposed to be a defcustom.
(setq org-latex-uc-fonts-support t)

;; A mini version of `org-latex-language-alist', for this proof of
;; concept.  Babel uses various names for languages.  The ones that
;; interest us here are those collected in `:babel-alt', which is
;; always a list.  The names sometimes match the `classic' babel name
;; and other times they don't.  And in the case of "el-polyton" there
;; are two possible names.  For a list of these names see:
;; [[https://CTAN/macros/latex/required/babel/base/babel.pdf]],
;; p. 22.
(defconst org-latex-language-alist
  '(("en" :babel "american" :babel-alt ("english-unitedstates")
     :polyglossia "english" :polyglossia-variant "usmax"
     :lang-name "English" :script "latin" :code "latn")
    ("ar" :babel "arabic" :babel-alt ("arabic")
     :polyglossia "arabic"
     :lang-name "Arabic" :script "arabic" :code "arab")
    ("el" :babel "greek" :babel-alt ("greek")
     :polyglossia "greek"
     :lang-name "Greek" :script "greek" :code "grek")
    ("el-polyton" :babel "polutonikogreek"
     :babel-alt ("ancientgreek" "polytonicgreek")
     :polyglossia "greek" :polyglossia-variant "polytonic"
     :lang-name "Polytonic Greek" :script "greek" :code "grek")
    ("ru" :babel "russian" :babel-alt ("russian")
     :polyglossia "russian"
     :lang-name "Russian" :script "cyrillic" :code "cyrl"))
  "Mini language alist for this proof of concept (see the comment above).")

;; This is supposed to be a defcustom for the main fonts.  `default'
;; means 'use the main default fonts'.  Otherwise, the value must be
;; a plist.  Valid props. are:
;; - :main = roman font
;; - :sans = sans font
;; - :mono = mono font
;; - :math = math font
;; - :...-options = font options
;; For the font options and the fontspec package syntax, see
;; [[https://CTAN/macros/unicodetex/latex/fontspec/fontspec.pdf]]
(setq org-latex-uc-fonts-support-default-main-fonts
      '(:main "FreeSerif" :mono "inconsolatan" :mono-options "Scale=0.95"))

;; This is supposed to be a defcustom.  Each element has the structure:
;; script - font - (optional) font options
(setq org-latex-uc-fonts-support-default-scripts-fonts
      '(("greek" "Linux Libertine")
        ("cyrillic" "Old Standard")
        ("arabic" "FreeSerif")))

;; Get main fonts (declared in
;; `org-latex-uc-fonts-support-default-main-fonts').  The doubled
;; backslashes here and below are needed because the generated text
;; ends up in the replacement string of `replace-regexp-in-string'.
(defun org-latex-uc-fonts-support-get-main-fonts (plist prop)
  (when (plist-member plist prop)
    (let* ((value (plist-get plist prop))
           (prop-name (replace-regexp-in-string ":" "" (symbol-name prop)))
           (options (plist-get plist (intern (format ":%s-options" prop-name)))))
      (format "\\\\set%sfont{%s}[%s]" prop-name value (or options "")))))

;; Get non-Latin fonts explicitly added with "% !enable-fonts-for".
(defun org-latex-uc-fonts-support-get-fonts-other-languages (header)
  (let ((lines))
    (with-temp-buffer
      (insert header)
      (goto-char (point-min))
      (while (re-search-forward "%\s+!enable-fonts-for\s+\\(.+\\)" nil t)
        ;; `add-to-list' cannot be used on the lexical variable LINES.
        (cl-pushnew (match-string 1) lines :test #'equal)))
    (let* ((lines-list (mapcar (lambda (x) (split-string x "::")) lines))
           (flat (flatten-list lines-list))
           (format-list
            (mapcar (lambda (x)
                      (org-latex-uc-fonts-support-format-font-for-language
                       (string-trim x)))
                    flat)))
      (mapconcat #'identity format-list "\n\n"))))

;; Format each lang/font.
(defun org-latex-uc-fonts-support-format-font-for-language (lang)
  (let* ((regexp "\\([^:]+\\):*\\([^()]*\\)(*\\([^()]*\\))*")
         (lang-name (when (string-match regexp lang) (match-string 1 lang)))
         (lang-explicit-font
          (when (string-match regexp lang) (match-string 2 lang)))
         (lang-explicit-font-opts
          (when (string-match regexp lang) (match-string 3 lang)))
         ;; Map the babel name back to its language code via :babel-alt.
         (lang-alias
          (let ((candidato))
            (mapc (lambda (x)
                    (when (member :babel-alt x)
                      (let* ((plist (cdr x))
                             (babel-alt (plist-get plist :babel-alt)))
                        (when (member lang-name babel-alt)
                          (setq candidato (car x))))))
                  org-latex-language-alist)
            candidato))
         (plist (cdr (assoc lang-alias org-latex-language-alist)))
         (script (plist-get plist :script))
         (default-script-font
          (assoc script org-latex-uc-fonts-support-default-scripts-fonts))
         (default-font (nth 1 default-script-font))
         (default-font-options (nth 2 default-script-font))
         (default-font-options? (if default-font-options default-font-options "")))
    (format "\\\\babelprovide[onchar=ids fonts]{%s}\n \\\\babelfont[%s]{rm}[%s]{%s}\n"
            lang-name
            lang-name
            (if (not (equal lang-explicit-font-opts ""))
                lang-explicit-font-opts
              default-font-options?)
            (if (not (equal lang-explicit-font ""))
                lang-explicit-font
              default-font))))

;; Make preamble definitions.  This is supposed to be part of
;; `org-latex-guess-babel-language', as in the modified version below.
(defun org-latex-uc-fonts-support-make-preamble (header)
  (let* ((main-fonts
          ;; Compare the variable's value (not its quoted name) with `default'.
          (unless (eq org-latex-uc-fonts-support-default-main-fonts 'default)
            (mapconcat #'identity
                       (delq nil
                             (mapcar (lambda (elt)
                                       (org-latex-uc-fonts-support-get-main-fonts
                                        org-latex-uc-fonts-support-default-main-fonts
                                        elt))
                                     (list :main :sans :mono :math)))
                       "\n")))
         (other-fonts-per-language
          (org-latex-uc-fonts-support-get-fonts-other-languages header)))
    (with-temp-buffer
      (insert "\n\n")
      (when main-fonts (insert main-fonts))
      (insert "\n\n")
      (when other-fonts-per-language (insert other-fonts-per-language))
      (buffer-string))))

(defun org-latex-guess-babel-language (header info)
  "Modified version for this proof of concept."
  (let* ((language-code (plist-get info :language))
         (plist (cdr (assoc language-code org-latex-language-alist)))
         (language (plist-get plist :babel))
         (language-ini-only (plist-get plist :babel-ini-only))
         ;; If no language is set, or Babel package is not loaded, or
         ;; LANGUAGE keyword value is a language served by Babel
         ;; exclusively through ini files, return HEADER as-is.
         (header
          (if (or language-ini-only
                  (not (stringp language-code))
                  (not (string-match "\\\\usepackage\\[\\(.*\\)\\]{babel}" header)))
              header
            (let ((options (save-match-data
                             (org-split-string (match-string 1 header) ",[ \t]*"))))
              ;; If LANGUAGE is already loaded, return header
              ;; without AUTO.  Otherwise, replace AUTO with language or
              ;; append language if AUTO is not present.  Languages that are
              ;; served in Babel exclusively through ini files are not added
              ;; to the babel argument, and must be loaded using
              ;; `\babelprovide'.
              (replace-match
               (mapconcat (lambda (option) (if (equal "AUTO" option) language option))
                          (cond ((member language options) (delete "AUTO" options))
                                ((member "AUTO" options) options)
                                (t (append options (list language))))
                          ", ")
               t nil header 1)))))
    ;;; Addition:
    (when org-latex-uc-fonts-support
      (setq header
            (let ((form (org-latex-uc-fonts-support-make-preamble header)))
              (replace-regexp-in-string
               "\\(\\\\usepackage\\[?.*\\]?{babel}\\)"
               (format "\n\\\\usepackage{fontspec}\n\n\\1\n%s" form)
               header))))
    ;;;
    ;; If `\babelprovide[args]{AUTO}' is present, AUTO is
    ;; replaced by LANGUAGE.
    (if (not (string-match "\\\\babelprovide\\[.*\\]{\\(.+\\)}" header))
        header
      (let ((prov (match-string 1 header)))
        (if (equal "AUTO" prov)
            (replace-regexp-in-string
             (format "\\(\\\\babelprovide\\[.*\\]\\)\\({\\)%s}" prov)
             (format "\\1\\2%s}" (or language language-ini-only))
             header t)
          header)))))
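Two parsing steps inside `org-latex-uc-fonts-support-format-font-for-language` (splitting a `lang:font(options)` token, and mapping a babel name back to an Org language code through `:babel-alt`) can be isolated and tested on their own. The sketch below is not part of the attached file: the function names are hypothetical, the lookup takes the alist as an argument instead of reading `org-latex-language-alist`, and, unlike the original regexp, a missing font or option list comes back as nil rather than an empty string.

```elisp
;; -*- lexical-binding: t; -*-
(require 'cl-lib)
(require 'subr-x)

(defun my/org-latex-parse-font-spec (spec)
  "Parse SPEC, one of LANG, LANG:FONT or LANG:FONT(OPTIONS).
Return (LANG FONT OPTIONS); FONT and OPTIONS are nil when absent."
  (let ((s (string-trim spec)))
    (when (string-match
           "\\`\\([^:()]+\\)\\(?::\\([^()]+\\)\\(?:(\\([^()]*\\))\\)?\\)?\\'"
           s)
      (list (string-trim (match-string 1 s))
            (let ((font (match-string 2 s)))
              (and font (string-trim font)))
            (match-string 3 s)))))

(defun my/org-latex-parse-font-line (line)
  "Split LINE on \"::\" and parse every font spec in it."
  (mapcar #'my/org-latex-parse-font-spec
          (split-string line "::" t "[ \t]+")))

(defun my/org-latex-code-for-babel-name (name alist)
  "Return the first language code in ALIST whose :babel-alt contains NAME.
ALIST has the shape of the mini `org-latex-language-alist' above."
  (car (cl-find-if (lambda (entry)
                     (member name (plist-get (cdr entry) :babel-alt)))
                   alist)))
```

Returning nil for absent parts makes "use the script's default font" an explicit check instead of a comparison against "".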
* Re: Fallback fonts in LaTeX export for non latin scripts
From: Ihor Radchenko @ 2023-09-03 7:22 UTC
  To: Juan Manuel Macías; +Cc: orgmode, Timothy

Juan Manuel Macías <maciaschain@posteo.net> writes:

> Finally I can upload some usable code here, in this case to load and
> manage fonts for languages with non-Latin scripts, through babel and
> fontspec (in LuaLaTeX). It is an attempt to simplify, from Org, the
> multiform syntax of babel + fontspec. Of course, it is more limited,
> but for regular use I think it may be enough.

I can see that you did not add defaults for Chinese, which is one of
the problematic scripts for LaTeX. Can you add them?

> ;; #+LaTeX_Header: % !enable-fonts-for ancientgreek:Linux Libertine O(Scale=MatchLowercase)
> ;; #+LaTeX_Header: % !enable-fonts-for russian:FreeSerif(Numbers=Lowercase,Color=blue) :: arabic

I do not like this approach. It would be more consistent to allow
multiple languages in #+language, plus a #+LATEX_FONT keyword to
optionally specify a per-language font:

#+LANGUAGE: <main language> <other languages...>
#+LATEX_FONT[lang]: font

#+language: ancientgreek russian arabic
#+latex_font[ancientgreek]: "Linux Libertine O" Scale=MatchLowercase
#+latex_font[russian]: "FreeSerif" Numbers=Lowercase,Color=blue

Also, I think that it may still make sense to have some kind of
fallback font if the specified fonts are not sufficient. For example,
when using emoji symbols, which do not correspond to any language.

-- 
Ihor Radchenko // yantar92
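The proposed `#+LATEX_FONT[lang]` keyword needs its value split into a font name and fontspec options. A minimal sketch, assuming the bracketed part has already been extracted and that the font name is either double-quoted (as in the examples above) or, when unquoted, the whole value with no options. The function name is hypothetical, and nothing here reflects an existing Org parser.

```elisp
;; -*- lexical-binding: t; -*-
(require 'subr-x)

(defun my/org-latex-parse-font-value (value)
  "Split VALUE of a hypothetical #+LATEX_FONT[...] keyword into (FONT . OPTIONS).
A double-quoted font name may be followed by fontspec options, e.g.
  \"Linux Libertine O\" Scale=MatchLowercase
An unquoted VALUE is taken as the font name, with no options.
OPTIONS is nil when absent."
  (let ((v (string-trim value)))
    (if (string-match "\\`\"\\([^\"]+\\)\"[ \t]*\\(.*\\)\\'" v)
        (let ((options (match-string 2 v)))
          (cons (match-string 1 v)
                (unless (string-empty-p options) options)))
      (cons v nil))))
```

Quoting is what makes names with spaces, such as "Linux Libertine O", unambiguous when options follow them.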
* Re: Fallback fonts in LaTeX export for non latin scripts 2023-09-03 7:22 ` Ihor Radchenko @ 2023-09-03 11:05 ` Juan Manuel Macías 2023-09-04 8:09 ` Ihor Radchenko 0 siblings, 1 reply; 21+ messages in thread From: Juan Manuel Macías @ 2023-09-03 11:05 UTC (permalink / raw) To: Ihor Radchenko; +Cc: orgmode, Timothy Thanks for your comments! Ihor Radchenko writes: > Juan Manuel Macías <maciaschain@posteo.net> writes: > >> Finally I can upload some usable code here, in this case to be able to >> load and manage fonts for languages with non-Latin scripts, through >> babel and fontspec (in LuaLaTeX). It is an attempt to simplify from Org >> the multiform syntax of babel + fontspec. Of course, it is more limited, >> but for regular use I think it may be enough. > > I can see that you did not add defaults for Chinese, which is one of the > problematic scripts for LaTeX. Can you add it? In that first proof of concept I only put a few scripts, less problematic, simply to show the functionality. In CJK languages things are a little more complicated, but it can be done too. The idea is to cover all scripts. In the next code I submit, when I redo the current one, I will try to introduce the case of CJK scripts. >> ;; #+LaTeX_Header: % !enable-fonts-for ancientgreek:Linux Libertine O(Scale=MatchLowercase) >> ;; #+LaTeX_Header: % !enable-fonts-for russian:FreeSerif(Numbers=Lowercase,Color=blue) :: arabic > > I do not like this approach. I'm not a big fan of doing it like that either. I chose this option because I didn't have to define a new keyword and to be less "intrusive" with the actual code. But on the other hand it adds a new syntax. Well, I discard it, to the detriment of an idea that you mention below. > Would be more consistent to allow multiple languages in #+language + > #+LATEX_FONT keyword to optionally specify per-language font: > #+language: ancientgreek russian arabic Of course, this syntax would be the most appropriate and consistent within Org. 
The problem is LaTeX, specifically babel, and that certain inconsistencies would be created with the rest of the backends. At first some pitfalls come to mind: - The keyword #+language accepts for now only language codes (es, en, el, ar, ru, etc.). Consistency with other backends should be maintained in this regard: ancientgreek is not a valid language code, but a name that only babel understands. If we put something like (a valid language code): #+language: el-polyton this could be translated in babel as polutonikogreek (in the classic syntax, that is, the languages that are loaded in the options of \usepackage[options]{babel}), or, in the new syntax, ancientgreek and polytonicgreek, which are actually two different languages: the first is ancient polytonic Greek and the second modern polytonic Greek. To add more confusion to the matter, in classical babel syntax greek.ancient and greek.polytonic are also supported. But neither of these things can be deduced by simply putting el-polyton, unless breaking the consistency with the other backends. - Added to this is that Babel has two ways to load languages: the classic syntax and the \babelprovide command, which is the one we are interested in here for languages with non-Latin scripts, because the onchar=ids fonts property must be added here. And what happens if the user has already defined several languages with babel, using the current procedure: \usepackage[french, english, AUTO]{babel}? Therefore, the least complicated thing, in my opinion, is to leave the syntax of the keyword #+language as it is. It is not necessary for the user to explicitly define secondary non-latin languages. The idea is that Org is responsible for generating the necessary babel code by simply giving a command like enable font for X language. What we are talking about here is ensuring readability using a series of fonts that LaTeX does not load by default, not even LuaLaTeX. 
And, after all, Org is monolingual: it does not have multilingual support at the moment; that is, there is nothing in Org to switch languages in the middle of the document. What happens is that here we take advantage of the functionality that Babel has to automatically apply a font for a non-Latin language/script, also loading its properties (hyphen rules, captions, etc.). A new keyword #+latex_language could be created, which would understand the babel names, but I think it is unnecessary and would add more complexity. As I said before, defining the necessary fonts would be enough, since my idea in this is a basic practicality to ensure the readability of the documents. And anyone looking for more advanced functions would have to enter LaTeX code explicitly. > #+latex_font[ancientgreek]: "Linux Libertine O" Scale=MatchLowercase > > #+latex_font[russian]: "FreeSerif" Numbers=Lowercase,Color=blue I like this idea, but with the exception that in the two examples you give the user is declaring two fonts for both languages. In my example there was also Arabic, where the default font for the Arabic script is used. Note that each script would have default fonts, which the user can change or not change in their document. A user could simply put something like "enable the default fonts for ancientgreek, russian, malayalam, georgian, chinese". And nothing more. Or choose some other font with or without options for a specific lang. Could be: #+latex_font: ancientgreek, russian, malayalam, sanskrit-devanagari beside: #+latex_font[arabic]: "FreeSerif" Numbers=Lowercase,Color=blue This last syntax would also be valid to modify the main default fonts: #+latex_font[main]: "FreeSerif" Numbers=Lowercase #+latex_font[sans]: "some font" #+latex_font[mono]: "some font" #+latex_font[math]: "some font" A practical use case. Suppose a user has a document in Spanish, which includes passages in Greek and Russian. 
It would be enough to use the Old Standard font (included in TeX Live) for the entire document, ensuring consistency: #+latex_header: \usepackage[AUTO]{babel} #+language:es #+latex_font[main,greek,russian]: Old Standard > Also, I think that it may still make sense to have some kind of fallback > font if the specified fonts are not sufficient. For example, when using > emoji symbols, which do not correspond to any language. Yes, I agree. That could also be included in the generated preamble. -- Juan Manuel Macías https://juanmanuelmacias.com https://lunotipia.juanmanuelmacias.com https://gnutas.juanmanuelmacias.com
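The #+latex_font[...] syntax above is still only a proposal. A short sketch helps show what an exporter would have to emit for it. The Emacs Lisp below is a minimal illustration under stated assumptions. It targets babel's \babelfont[<languages>]{<family>}[<options>]{<font>} command. The names `my-org-latex-font-families' and `my-org-latex-font-to-babel' are hypothetical and are not existing Org code. A "math" target is skipped, because babel does not select math fonts; that would need something like unicode-math's \setmathfont.

```elisp
(require 'cl-lib)
(require 'subr-x)

;; Hypothetical mapping (not part of Org) from the proposed
;; #+latex_font targets to babel font families.
(defconst my-org-latex-font-families
  '(("main" . "rm") ("sans" . "sf") ("mono" . "tt"))
  "Map #+latex_font targets to babel font families (illustrative).")

(defun my-org-latex-font-to-babel (targets font &optional options)
  "Return a list of babel preamble lines that set FONT for TARGETS.
TARGETS is a list of strings such as (\"main\" \"greek\" \"russian\").
\"main\", \"sans\" and \"mono\" change the document-wide family; any
other string except \"math\" is taken as a babel language name and
gets the roman family.  OPTIONS is a fontspec option string, or nil.
This is a hypothetical helper, not an Org function."
  (let ((opts (if (and options (not (string-empty-p options)))
                  (format "[%s]" options)
                ""))
        ;; Language targets: everything that is neither a family nor
        ;; "math" (babel does not handle math fonts, so it is skipped).
        (languages (cl-remove-if
                    (lambda (target)
                      (or (string= target "math")
                          (assoc target my-org-latex-font-families)))
                    targets))
        (lines nil))
    ;; One \babelfont{FAMILY} line per document-wide target.
    (dolist (target targets)
      (let ((family (cdr (assoc target my-org-latex-font-families))))
        (when family
          (push (format "\\babelfont{%s}%s{%s}" family opts font) lines))))
    ;; A single \babelfont[LANG1,LANG2]{rm} line for all language targets.
    (when languages
      (push (format "\\babelfont[%s]{rm}%s{%s}"
                    (string-join languages ",") opts font)
            lines))
    (nreverse lines)))
```

For the Spanish/Greek/Russian case, (my-org-latex-font-to-babel '("main" "greek" "russian") "Old Standard") returns two lines: \babelfont{rm}{Old Standard} and a single \babelfont[greek,russian]{rm}{Old Standard}. Grouping the language targets in one command mirrors the comma-separated target list of the keyword. The languages themselves still have to be declared elsewhere in the preamble, for example with \babelprovide.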
* Re: Fallback fonts in LaTeX export for non latin scripts From: Ihor Radchenko @ 2023-09-04 8:09 UTC To: Juan Manuel Macías; +Cc: orgmode, Timothy Juan Manuel Macías <maciaschain@posteo.net> writes: >> #+language: ancientgreek russian arabic > > Of course, this syntax would be the most appropriate and consistent > within Org. The problem is LaTeX, specifically babel, and that certain > inconsistencies would be created with the rest of the backends. At first > some pitfalls come to mind: > > - The keyword #+language accepts for now only language codes (es, en, > el, ar, ru, etc.). Consistency with other backends should > be maintained in this regard: ancientgreek is not a valid language > code, but a name that only babel understands. If we put something > like (a valid language code): > > #+language: el-polyton > > this could be translated in babel as polutonikogreek (in the classic > syntax, that is, the languages that are loaded in the options of > \usepackage[options]{babel}), or, in the new syntax, ancientgreek and > polytonicgreek, which are actually two different languages: the first > is ancient polytonic Greek and the second modern polytonic Greek. To > add more confusion to the matter, in classical babel syntax > greek.ancient and greek.polytonic are also supported. But neither of > these things can be deduced by simply putting el-polyton, unless > breaking the consistency with the other backends. I am now working on unifying the Org translation system as discussed in https://orgmode.org/list/87o7iw8yem.fsf@bzg.fr As a part of the effort, I plan to introduce a new constant that will unify language abbreviations across Org and also associate them with more human-readable names. (defconst org-language-abbrevs '(("am" . "Amharic") ("ar" . "Arabic") ("ast" . "Asturian") ("bg" . "Bulgarian")
("bn" . "Bengali") ...)) The idea is to allow #+language: Austrian German, Greek as a valid specifier, in addition to #+language: de-at, el Then, across Org, we will make use of the standardized language abbreviations. > - Added to this is that Babel has two ways to load languages: the > classic syntax and the \babelprovide command, which is the one we are > interested in here for languages with non-Latin scripts, because the > onchar=ids fonts property must be added here. And what happens if the > user has already defined several languages with babel, using the > current procedure: \usepackage[french, english, AUTO]{babel}? For LaTeX specifically, `org-latex-language-alist' will be reused to map whatever is allowed in the #+language keyword to its name in babel/polyglossia. Does that make sense? > Therefore, the least complicated thing, in my opinion, is to leave the > syntax of the keyword #+language as it is. It is not necessary for the > user to explicitly define secondary non-latin languages. The idea is > that Org is responsible for generating the necessary babel code by > simply giving a command like enable font for X language. What we are > talking about here is ensuring readability using a series of fonts that > LaTeX does not load by default, not even LuaLaTeX. And, after all, Org > is monolingual: it does not have multilingual support at the moment; > that is, there is nothing in Org to switch languages in the middle of > the document. What happens is that here we take advantage of the > functionality that Babel has to automatically apply a font for a > non-Latin language/script, also loading its properties (hyphen rules, > captions, etc.). > > A new keyword #+latex_language could be created, which would understand > the babel names, but I think it is unnecessary and would add more > complexity.
As I said before, defining the necessary fonts would be > enough, since my idea in this is a basic practicality to ensure the > readability of the documents. And anyone looking for more advanced > functions would have to enter LaTeX code explicitly. I think that we should move towards multi-language support. Such support would considerably simplify the translation process for WORG and orgmode.org, and may also be used as a basis to allow translating the Org manual. My rough idea is to allow specifying the language as an affiliated keyword and, in the future, to allow selective export to a certain target language. Multi-language documents are another potential target to support. >> #+latex_font[ancientgreek]: "Linux Libertine O" Scale=MatchLowercase >> >> #+latex_font[russian]: "FreeSerif" Numbers=Lowercase,Color=blue > > I like this idea, but with the exception that in the two examples you > give the user is declaring two fonts for both languages. In my example > there was also Arabic, where the default font for the Arabic script is > used. My idea was that #+language: ancientgreek russian arabic implies "use default font for arabic", unless #+latex_font is specified. > #+latex_font[arabic]: "FreeSerif" Numbers=Lowercase,Color=blue > > This last syntax would also be valid to modify the main default fonts: > > #+latex_font[main]: "FreeSerif" Numbers=Lowercase > #+latex_font[sans]: "some font" > #+latex_font[mono]: "some font" > #+latex_font[math]: "some font" > > A practical use case. Suppose a user has a document in Spanish, which > includes passages in Greek and Russian. It would be enough to use the > Old Standard font (included in TeX Live) for the entire document, > ensuring consistency: > > #+latex_header: \usepackage[AUTO]{babel} > #+language:es > #+latex_font[main,greek,russian]: Old Standard Looks reasonable. -- Ihor Radchenko // yantar92, Org mode contributor, Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>, or support my work at <https://liberapay.com/yantar92>
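Here is a minimal sketch of the unification Ihor describes: a #+language value written either as codes or as human-readable names resolves to one list of codes. It assumes an alist shaped like the proposed `org-language-abbrevs', with (CODE . NAME) entries. The small `my-org-language-abbrevs' table, including its "de" and "de-at" entries, and the function `my-org-resolve-languages' are both hypothetical.

```elisp
(require 'cl-lib)

;; Hypothetical, illustrative subset shaped like the proposed
;; `org-language-abbrevs'; "de" and "de-at" are added for the example.
(defconst my-org-language-abbrevs
  '(("ar" . "Arabic")
    ("bg" . "Bulgarian")
    ("de" . "German")
    ("de-at" . "Austrian German")
    ("el" . "Greek")
    ("ru" . "Russian")))

(defun my-org-resolve-languages (value)
  "Return the list of language codes named in a #+language VALUE.
Items are comma-separated codes such as \"de-at\" or names such as
\"Austrian German\"; matching ignores case.  Signal an error for an
item that is neither.  Hypothetical helper, not part of Org."
  (mapcar
   (lambda (item)
     (let ((needle (downcase item)))
       (or
        ;; First try the item as a code...
        (car (cl-find needle my-org-language-abbrevs
                      :key (lambda (entry) (downcase (car entry)))
                      :test #'string=))
        ;; ...then as a human-readable name.
        (car (cl-find needle my-org-language-abbrevs
                      :key (lambda (entry) (downcase (cdr entry)))
                      :test #'string=))
        (error "Unknown language: %s" item))))
   (split-string value "," t "[ \t]+")))
```

Both `#+language: Austrian German, Greek' and `#+language: de-at, el' resolve to ("de-at" "el"). A backend mapping such as `org-latex-language-alist' would then only ever need to handle codes.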
* Re: Fallback fonts in LaTeX export for non latin scripts From: Juan Manuel Macías @ 2023-09-04 22:22 UTC To: Ihor Radchenko; +Cc: orgmode, Timothy Ihor Radchenko writes: > Juan Manuel Macías <maciaschain@posteo.net> writes: > >>> #+language: ancientgreek russian arabic >> >> Of course, this syntax would be the most appropriate and consistent >> within Org. The problem is LaTeX, specifically babel, and that certain >> inconsistencies would be created with the rest of the backends. At first >> some pitfalls come to mind: >> >> - The keyword #+language accepts for now only language codes (es, en, >> el, ar, ru, etc.). Consistency with other backends should >> be maintained in this regard: ancientgreek is not a valid language >> code, but a name that only babel understands. If we put something >> like (a valid language code): >> >> #+language: el-polyton >> >> this could be translated in babel as polutonikogreek (in the classic >> syntax, that is, the languages that are loaded in the options of >> \usepackage[options]{babel}), or, in the new syntax, ancientgreek and >> polytonicgreek, which are actually two different languages: the first >> is ancient polytonic Greek and the second modern polytonic Greek. To >> add more confusion to the matter, in classical babel syntax >> greek.ancient and greek.polytonic are also supported. But neither of >> these things can be deduced by simply putting el-polyton, unless >> breaking the consistency with the other backends. > > I am now working on unifying the Org translation system as discussed in > https://orgmode.org/list/87o7iw8yem.fsf@bzg.fr > As a part of the effort, I plan to introduce a new constant that will > unify language abbreviations across Org and also associate them with > more human-readable names.
> > (defconst org-language-abbrevs > '(("am" . "Amharic") > ("ar" . "Arabic") > ("ast" . "Asturian") > ("bg" . "Bulgarian") > ("bn" . "Bengali") > ...)) > > The idea is to allow > > #+language: Austrian German, Greek > as a valid specifier, in addition to > > #+language: de-at, el > > Then, across Org, we will make use of the standardized language > abbreviations. Great! That's great news. Yes, I agree with what you say below. I think Org should move towards multilingual support that is 100% native to Org. That is, Org would have its own "selectlanguage" mechanism, to be able to delimit text segments in other languages and have control over them, both within Org and when exporting to the different backends. That scenario seems very desirable to me, and I would like to contribute my help to the best of my ability (and time). In LaTeX, as I mentioned, things are complicated. There are Babel and Polyglossia, and there are LuaTeX and XeTeX. In addition, there is also pdfTeX, which is still the default engine and (to be honest) is the engine used by a high percentage of LaTeX users, although perhaps things will soon change in favor of LuaTeX. Both babel and polyglossia could be supported, but that means more work, more code, and more complications. And we are not sure whether polyglossia is still maintained. After all, babel is the official LaTeX package for language support, and polyglossia appeared at a time when babel had no support for the new Unicode engines. Now Babel supports all of that and is much more powerful, but its interface has also grown in complexity. There is the problem of the double syntax for loading languages: the old one, which loads traditional ldf files, and the modern one (\babelprovide), which loads languages using ini files. The modern one is more powerful, with more options, but it has made babel more verbose. I have taken advantage of \babelprovide, specifically its onchar=ids fonts property, to automatically apply fonts to non-Latin scripts.
>> I like this idea, but with the exception that in the two examples you >> give the user is declaring two fonts for both languages. In my example >> there was also Arabic, where the default font for the Arabic script is >> used. > > My idea was that > > #+language: ancientgreek russian arabic > > implies "use default font for arabic", unless #+latex_font is specified. This seems the most consistent to me for Org, but, as I mentioned in the other email, I have some concerns. Currently, what we are talking about is simply font support for non-Latin languages. If it is allowed, in the current state of things, that #+language can accept a list of language names, we can give the user a wrong perception of reality. That is: multilingual support that does not exist as such. It is more like font support for non-Latin languages. And only in LaTeX, and specifically in LuaLaTeX. Furthermore, the user could mix languages that in Babel are loaded through ldf and others through ini files. For example, something like this: #+language: spanish, english, french, russian in Babel it would be: \usepackage[english,french,spanish]{babel} and here we need \babelprovide for the font (and load Russian via ini file): \babelprovide[onchar=ids fonts, import]{russian} \babelfont[russian]{rm}[options]{somefont} Org would have to discern which name refers to a non-Latin language (which wouldn't be complicated with the functionality you're working on) and then apply the default font by adding a line with \babelprovide. Of course, English, French and Spanish can also be loaded via ini files: \babelprovide[main,import]{spanish} \babelprovide[import]{french} \babelprovide[import]{english} Babel even supports: \usepackage[english,french,spanish,provide*=*]{babel} but in that line we cannot put Russian with onchar, etc. And then there is pdfTeX, where only the classic babel syntax is allowed, without any "*provide". In short, I find everything very confusing.
I am not opposed to doing it as you propose (in fact, it is the option I like the most, especially once Org becomes multilingual), but I also want to warn of possible complications. Therefore, since for now we are only dealing with fonts for non-Latin languages, I think it should be made clear that the keyword is about fonts (and about LuaLaTeX). Maybe through two keywords: #+lualatex_fonts_for: language(s) #+lualatex_fonts[language(s)]: "font" options ? I think it's ugly, but I can't think of anything else. By the way, and as a side note, is it currently possible in Org to define a keyword within :options-alist of the style #+foo[anything] or would something like org-collect-keywords have to be modified? -- Juan Manuel Macías
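The mixed preamble described in this message could be generated along these lines: classic babel options for Latin-script languages, and \babelprovide with onchar=ids fonts for the others. This is a sketch, not Org code. The table `my-org-babel-languages', its default fonts and the function `my-org-babel-preamble' are hypothetical. The sketch also assumes that the first language listed is the main one and is written in the Latin script.

```elisp
(require 'cl-lib)
(require 'subr-x)

;; Hypothetical table (not part of Org): babel name, script, default
;; font.  nil as the font means "keep the document's main font".
(defconst my-org-babel-languages
  '(("spanish" latin nil)
    ("english" latin nil)
    ("french" latin nil)
    ("russian" cyrillic "FreeSerif")
    ("ancientgreek" greek "Old Standard")))

(defun my-org-babel-preamble (languages)
  "Return babel preamble lines for LANGUAGES, a list of babel names.
The first element is the main language and is assumed to be written
in the Latin script.  Latin-script languages go into the classic
option list, main language last (babel makes the last option the
main language); every other language gets \\babelprovide with
onchar=ids fonts, plus \\babelfont when a default font is known."
  (let* ((main (car languages))
         (secondary (cdr languages))
         (latin-p (lambda (lang)
                    (eq (nth 1 (assoc lang my-org-babel-languages)) 'latin)))
         (latin (cl-remove-if-not latin-p secondary))
         (others (cl-remove-if latin-p secondary))
         (lines (list (format "\\usepackage[%s]{babel}"
                              (string-join (append latin (list main)) ",")))))
    (dolist (lang others)
      ;; Unknown languages still get \babelprovide, but no font.
      (let ((font (nth 2 (assoc lang my-org-babel-languages))))
        (push (format "\\babelprovide[import, onchar=ids fonts]{%s}" lang)
              lines)
        (when font
          (push (format "\\babelfont[%s]{rm}{%s}" lang font) lines))))
    (nreverse lines)))
```

Called with ("spanish" "english" "french" "russian"), it reproduces the three lines of the example above: \usepackage[english,french,spanish]{babel}, \babelprovide[import, onchar=ids fonts]{russian} and \babelfont[russian]{rm}{FreeSerif}. Because onchar only works with LuaTeX, this output is meant for LuaLaTeX. A pdfTeX export would need the classic syntax alone.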
* Re: Fallback fonts in LaTeX export for non latin scripts From: Ihor Radchenko @ 2023-09-05 10:44 UTC To: Juan Manuel Macías; +Cc: orgmode, Timothy Juan Manuel Macías <maciaschain@posteo.net> writes: >> The idea is to allow >> >> #+language: Austrian German, Greek >> as a valid specifier, in addition to >> >> #+language: de-at, el >> >> Then, across Org, we will make use of the standardized language >> abbreviations. > > In LaTeX, as I mentioned, things are complicated. There are Babel and > Polyglossia, and there are LuaTeX and XeTeX. In addition, there is also > pdfTeX, which is still the default engine and (to be honest) is the > engine used by a high percentage of LaTeX users, although perhaps things > will soon change in favor of LuaTeX. Both babel and polyglossia > could be supported, but that means more work, more code, and more > complications. And we are not sure whether polyglossia is still > maintained. After all, babel is the official LaTeX package for language > support, and polyglossia appeared at a time when babel had no support > for the new Unicode engines. Now Babel supports all of that and is much > more powerful, but its interface has also grown in complexity. There is > the problem of the double syntax for loading languages: the old one, > which loads traditional ldf files, and the modern one (\babelprovide), > which loads languages using ini files. The modern one is more powerful, with more > options, but it has made babel more verbose. I have taken advantage > of \babelprovide, specifically its onchar=ids fonts property, to > automatically apply fonts to non-Latin scripts. > ... > multilingual support that does not exist as such. It is more like font > support for non-Latin languages. And only in LaTeX, and specifically in > LuaLaTeX.
Furthermore, the user could mix languages that in Babel are > loaded through ldf and others through ini files. For example, something > like this: > > #+language: spanish, english, french, russian > > in Babel it would be: > > \usepackage[english,french,spanish]{babel} > > and here we need \babelprovide for the font (and load Russian via ini > file): > > \babelprovide[onchar=ids fonts, import]{russian} > \babelfont[russian]{rm}[options]{somefont} > > Org would have to discern which name refers to a non-Latin language > (which wouldn't be complicated with the functionality you're working on) > and then apply the default font by adding a line with \babelprovide. > > Of course, English, French and Spanish can also be loaded via ini files: > > \babelprovide[main,import]{spanish} > \babelprovide[import]{french} > \babelprovide[import]{english} > > Babel even supports: > > \usepackage[english,french,spanish,provide*=*]{babel} > > but in that line we cannot put Russian with onchar, etc. And then there > is pdfTeX, where only the classic babel syntax is allowed, without any > "*provide". Aren't we already handling this problem in `org-latex-make-preamble'? >> My idea was that >> >> #+language: ancientgreek russian arabic >> >> implies "use default font for arabic", unless #+latex_font is specified. > > This seems the most consistent to me for Org, but, as I mentioned in the > other email, I have some concerns. Currently, what we are talking about > is simply font support for non-Latin languages. If it is allowed, in the > current state of things, that #+language can accept a list of language > names, we can give the user a wrong perception of reality. That is: <complications with full support not being possible in all the LaTeX flavors> > In short, I find everything very confusing.
I am not opposed to doing it > as you propose (in fact, it is the option I like the most, especially > once Org becomes multilingual), but I also want to warn of possible > complications. > > Therefore, since for now we are only dealing with fonts for non-Latin languages, I > think it should be made clear that the keyword is about fonts (and about > LuaLaTeX). Maybe through two keywords: > > #+lualatex_fonts_for: language(s) > #+lualatex_fonts[language(s)]: "font" options > > ? > > I think it's ugly, but I can't think of anything else. Maybe just #+lualatex_fonts[language(s)]: default to explicitly request the default fonts. > By the way, and as a side note, is it currently possible in Org to > define a keyword within :options-alist of the style #+foo[anything] or > would something like org-collect-keywords have to be modified? We will need to add the keyword to `org-element-dual-keywords' and make sure that the code expects the keyword value to be a list, as returned by the parser. AFAIU, that should be enough. -- Ihor Radchenko // yantar92
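Whatever the parser ends up storing, the value of such a keyword has to be split into languages, font and options. As a rough sketch, here is a standalone parser for a single raw keyword line. It does not go through `org-element-dual-keywords', and the function name `my-org-parse-lualatex-fonts' is hypothetical. The accepted value format is an assumption based on the examples in this thread: a double-quoted font name optionally followed by fontspec options, an unquoted font name, or the word "default".

```elisp
(defun my-org-parse-lualatex-fonts (line)
  "Parse a proposed #+lualatex_fonts[LANGS]: VALUE keyword LINE.
Return (LANGUAGES FONT OPTIONS), or nil when LINE is not such a
keyword.  FONT is the symbol `default' when VALUE is \"default\";
a double-quoted font name may be followed by fontspec OPTIONS;
otherwise the whole VALUE is the font name.  Hypothetical helper."
  (when (string-match
         "\\`#\\+lualatex_fonts\\[\\([^]]+\\)]:[ \t]*\\(.*?\\)[ \t]*\\'"
         line)
    ;; Read both groups before `split-string' clobbers the match data.
    (let* ((value (match-string 2 line))
           (languages (split-string (match-string 1 line) "," t "[ \t]+")))
      (cond
       ((string= value "default")
        (list languages 'default nil))
       ;; "Font Name" optionally followed by options, e.g. Color=blue.
       ((string-match "\\`\"\\([^\"]+\\)\"[ \t]*\\(.*\\)\\'" value)
        (let ((options (match-string 2 value)))
          (list languages (match-string 1 value)
                (unless (string= options "") options))))
       ;; Unquoted value: the whole value is the font name.
       (t (list languages value nil))))))
```

For example, `#+lualatex_fonts[arabic]: default' yields (("arabic") default nil). That leaves the choice of the actual default font to the exporter's per-script table.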
* Re: Fallback fonts in LaTeX export for non latin scripts From: Juan Manuel Macías @ 2023-09-20 14:03 UTC To: orgmode; +Cc: Ihor Radchenko, Timothy, Max Nikulin Some new information about Babel that may be relevant to this thread. I have received an email from Javier Bezos (whom I know from the Spanish-speaking TeX users' mailing list), who is the current babel maintainer, as well as the person responsible for all the improvements and new features of the package. Although he is not currently an Emacs/Org user, he has been following this thread with great interest, so I am passing on here, with his permission, some interesting comments from him: #+begin_quote [...] I am very interested in all possible improvements in babel so that it integrates as best as possible with automatically generated files. Among them are the possibility of using BCP47 codes or using a language (at least basically) without the need for a prior declaration. These are things already done, but there are others that can still be improved. [...] any suggestion for improvement is very welcome [...] Among the things I agree on is the naming issue. I am aligning the names with the CLDR as much as possible, and already, in fact, it is very advanced: https://latex3.github.io/babel/guides/locale-naming.html [...] The ini files contain information that is not actually used by babel, but that could be useful in other packages or even external applications. One of them is the name of the language in English and in the vernacular form, as they are in the Unicode CLDR. As I explain in the link I gave you, the purpose is that the babel name is based on the CLDR name with mechanical changes. Anyway, CLDR names are also included in the ini files, to establish correspondences more easily.
#+end_quote
* Re: Fallback fonts in LaTeX export for non latin scripts From: Ihor Radchenko @ 2023-09-21 9:00 UTC To: Juan Manuel Macías; +Cc: orgmode, Timothy, Max Nikulin Juan Manuel Macías <maciaschain@posteo.net> writes: > [...] I am very interested in all possible improvements in babel so that > it integrates as best as possible with automatically generated files. > Among them are the possibility of using BCP47 codes or using a language > (at least basically) without the need for a prior declaration. These are > things already done, but there are others that can still be improved. Do I understand correctly that babel, in future, may be able to auto-detect more languages without explicitly declaring them? > [...] any suggestion for improvement is very welcome [...] This is a bit too out of context. Improvement of what? > Among the things I agree on is the naming issue. I am aligning the names with the > CLDR as much as possible, and already, in fact, it is very advanced: > > https://latex3.github.io/babel/guides/locale-naming.html AFAIU, the relevant quote is They are taken from the CLDR. Wherever the CLDR doesn’t provide a name (eg, “Medieval Latin”), the pattern followed in practice for other names is applied, namely, use the ‘natural’ form in English: medievallatin. They should be preferably based on the description field in the IANA registry (eg, polytonicgreek), although some simplifications can be necessary, because some names are “too” descriptive. See also the templates for about 500 locales already available. As a secondary source, Glottolog is used, too. (Wikipedia articles can be taken as a complementary but unreliable source, and its information must be verified; on the other hand, internal data, like this one, is useful for both names and tags.)
I am not very sure about "some simplifications" referring to IANA. I guess it is referring to language names in https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry like "Puter idiom of Romansh". From Org perspective, verbosity is not a primary concern as long as we provide #+language: completion support. Probably, we should favor names that are more likely known (or can be easily found) by the language users. IANA and https://glottolog.org/ look like good sources we can link to. We can also provide multiple language name variants though I don't see a need to bother unless we get user requests to do such thing. > The ini files contain information that is not actually used by babel, > but that could be useful in other packages or even external > applications. One of them is the name of the language in English and in > the vernacular form, as they are in the Unicode CLDR. As I explain in > the link I gave you, the purpose is that the babel name is based on the > CLDR name with mechanical changes. Anyway, CLDR names are also included > in the ini files, to establish correspondences more easily. Are the "verbose" language names (name.english) changed to "simplify" them? Or is it only done for name.babel? -- Ihor Radchenko // yantar92
* Re: Fallback fonts in LaTeX export for non latin scripts From: Juan Manuel Macías @ 2023-09-24 18:24 UTC To: Ihor Radchenko; +Cc: orgmode, Timothy, Max Nikulin Sorry for the late reply. Ihor Radchenko writes: > Juan Manuel Macías <maciaschain@posteo.net> writes: > >> [...] I am very interested in all possible improvements in babel so that >> it integrates as best as possible with automatically generated files. >> Among them are the possibility of using BCP47 codes or using a language >> (at least basically) without the need for a prior declaration. These are >> things already done, but there are others that can still be improved. > > Do I understand correctly that babel, in future, may be able to > auto-detect more languages without explicitly declaring them? Correct. Indeed, it is possible to use the command \foreignlanguage or its environment version (otherlanguage*) without having to declare the language previously. I would say that \foreignlanguage is a command that covers a high percentage of use cases in multilingual documents, since it is intended for short fragments of text and only switches the hyphenation rules (and language-specific extras) of the foreign language, not its captions or dates. >> [...] any suggestion for improvement is very welcome [...] > > This is a bit too out of context. Improvement of what? I think it is related to the previous paragraph: "I am very interested in all possible improvements in babel so that it integrates as best as possible with automatically generated files[...]" >> Among the things I agree on is the naming issue. I am aligning the names with the >> CLDR as much as possible, and already, in fact, it is very advanced: >> >> https://latex3.github.io/babel/guides/locale-naming.html > > AFAIU, the relevant quote is > > They are taken from the CLDR.
Wherever the CLDR doesn’t provide a name > (eg, “Medieval Latin”), the pattern followed in practice for other names > is applied, namely, use the ‘natural’ form in English: medievallatin. > They should be preferably based on the description field in the IANA > registry (eg, polytonicgreek), although some simplifications can be > necessary, because some names are “too” descriptive. See also the > templates for about 500 locales already available. As a secondary > source, Glottolog is used, too. (Wikipedia articles can be taken as a > complementary but unreliable source, and its information must be > verified; on the other hand, internal data, like this one, is useful for > both names and tags.) > > I am not very sure about "some simplifications" referring to IANA. I > guess it is referring to language names in > https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry > like "Puter idiom of Romansh". > > From Org perspective, verbosity is not a primary concern as long as we > provide #+language: completion support. Probably, we should favor names > that are more likely known (or can be easily found) by the language > users. IANA and https://glottolog.org/ look like good sources we can > link to. > > We can also provide multiple language name variants though I don't see a > need to bother unless we get user requests to do such thing. I agree. I even think it would be a good idea to also include the vernacular name of each language. By the way, Javier has also told me that he is going to consider the 'onchar=ids fonts' issue related to the case of several languages that use the same script (already discussed here in past messages). Best regards, Juan Manuel -- Juan Manuel Macías
* Re: Fallback fonts in LaTeX export for non latin scripts From: Ihor Radchenko @ 2023-09-26 10:37 UTC To: Juan Manuel Macías; +Cc: orgmode, Timothy, Max Nikulin Juan Manuel Macías <maciaschain@posteo.net> writes: >>> [...] any suggestion for improvement is very welcome [...] >> >> This is a bit too out of context. Improvement of what? > > I think it is related to the previous paragraph: "I am very interested > in all possible improvements in babel so that it integrates as best as > possible with automatically generated files[...]" That's good to hear. In practical terms, if Javier gives us a contact email, we may CC him when we think that what we discuss is related to Babel. >> We can also provide multiple language name variants though I don't see a >> need to bother unless we get user requests to do such thing. > > I agree. I even think it would be a good idea to also include the > vernacular name of each language. Sounds reasonable, although let's come back to this when we have actual code to discuss. > By the way, Javier has also told me that he is going to consider the > 'onchar=ids fonts' issue related to the case of several languages that > use the same script (already discussed here in past messages). That would be nice, although determining the language may not be trivial. AFAIK, automatic language detection often relies upon word frequencies (for example, see https://pypi.org/project/langdetect/) and cannot be reliable for very short text fragments. In the case of texts combining multiple languages arbitrarily, the problem becomes even more difficult. In some cases (dialects), multiple languages can be valid for the same text fragment. That said, a frequency-based approach can work well in most cases, apart from certain edge cases. But it requires a word corpus. I am not sure how feasible it would be to include one in the TeX distribution.
(Maybe not very hard: it is already quite large, and a few dictionary files will not change much.) -- Ihor Radchenko // yantar92
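The frequency-based idea can be illustrated with a deliberately tiny sketch. Each candidate language is scored by how many words of a fragment appear in a short list of its most frequent words. The word lists and the function `my-org-guess-language' are hypothetical. Libraries such as langdetect instead rely on character n-gram profiles learned from large corpora, which is exactly the corpus problem mentioned above. Short or mixed fragments are where this breaks down, so the sketch returns nil on a tie rather than guessing.

```elisp
;; Hypothetical, tiny lists of very frequent words per language.
(defconst my-org-frequent-words
  '(("spanish" "el" "la" "de" "que" "y" "en" "los" "del")
    ("french" "le" "la" "de" "et" "les" "des" "est" "du")
    ("english" "the" "of" "and" "to" "is" "in" "that")))

(defun my-org-guess-language (text)
  "Guess the language of TEXT from `my-org-frequent-words'.
Each language scores one point per word of TEXT found in its list.
Return the best-scoring language, or nil when nothing scores or
when the top score is tied (not enough evidence to decide).
Hypothetical toy, not a real language detector."
  (let ((words (split-string (downcase text) "[^[:alpha:]]+" t))
        (best nil)
        (best-score 0)
        (tie nil))
    (dolist (entry my-org-frequent-words)
      (let ((score 0))
        (dolist (word words)
          (when (member word (cdr entry))
            (setq score (1+ score))))
        (cond ((> score best-score)
               (setq best (car entry) best-score score tie nil))
              ((and (= score best-score) (> score 0))
               (setq tie t)))))
    (unless tie best)))
```

A two-word fragment such as "la de" scores equally for Spanish and French and yields nil. This is the kind of short-fragment ambiguity that makes automatic detection unreliable.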
* Re: Fallback fonts in LaTeX export for non latin scripts From: Max Nikulin @ 2023-09-05 16:42 UTC To: emacs-orgmode On 05/09/2023 05:22, Juan Manuel Macías wrote: > \usepackage[english,french,spanish,provide*=*]{babel} > > but in that line we cannot put Russian with onchar, etc. Cyrillic letters may appear in languages other than Russian, just as both French and Spanish use the Latin script. So language detection based on symbol code points works only for sufficiently distinct languages. Explicit markup may still be necessary to switch hyphenation rules, dash styles, etc. I have a couple of bookmarks for language detection libraries (not for Emacs), but I am unsure whether they would work for texts containing fragments written in different languages.
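Max's point is easy to check in Emacs itself. `char-script-table', which the script-extraction snippet at the start of this thread also uses, maps characters to scripts, not to languages. The hypothetical helper below collects the distinct non-Latin scripts of a string. Russian and Bulgarian text both come out as `cyrillic', so code points alone cannot tell which language, and hence which hyphenation rules, apply.

```elisp
(defun my-org-non-latin-scripts (text)
  "Return the distinct non-Latin scripts used in TEXT, in order of appearance.
Scripts are looked up in `char-script-table', so the result names
scripts such as `cyrillic' or `greek', never specific languages.
Hypothetical helper, not part of Org."
  (let ((scripts nil))
    (dolist (char (string-to-list text))
      (let ((script (aref char-script-table char)))
        ;; Skip characters filed under `latin' (which includes ASCII
        ;; spaces and punctuation), characters with no script, and
        ;; scripts already collected.
        (when (and script
                   (not (eq script 'latin))
                   (not (memq script scripts)))
          (push script scripts))))
    (nreverse scripts)))
```

Applied to "Русский и български", it returns only (cyrillic). Deciding between Russian and Bulgarian therefore needs explicit markup or statistical detection.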
* Re: Fallback fonts in LaTeX export for non latin scripts From: Juan Manuel Macías @ 2023-09-05 18:33 UTC To: Max Nikulin; +Cc: emacs-orgmode, Ihor Radchenko Max Nikulin writes: > Cyrillic letters may appear in languages other than Russian, just as both French and > Spanish use the Latin script. So language detection based on symbol code > points works only for sufficiently distinct languages. Explicit markup may > still be necessary to switch hyphenation rules, dash styles, etc. True. Thanks for pointing it out. Indeed, \babelprovide with the onchar=ids fonts option only makes sense when 1 foreign language = 1 script. For example, different variants of Greek cannot be combined without an explicit switch. And something like this wouldn't work either: \babelprovide[import,onchar=ids fonts]{russian} \babelprovide[import,onchar=ids fonts]{bulgarian} \babelfont[russian]{rm}[Color=blue]{Old Standard} \babelfont[bulgarian]{rm}[Color=green]{FreeSerif} because bulgarian, declared last, overrides russian. Well, another added complication :-(. -- Juan Manuel Macías
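Since onchar=ids fonts only works when each foreign language has its own script, an exporter could at least detect clashing declarations before writing the preamble. Here is a minimal sketch that assumes a hypothetical language-to-script table. `my-org-script-conflicts', also hypothetical, reports every script claimed by more than one requested language. These are the cases where the last \babelprovide for that script would win.

```elisp
;; Hypothetical mapping (not part of Org) from babel names to scripts.
(defconst my-org-language-scripts
  '(("russian" . cyrillic)
    ("bulgarian" . cyrillic)
    ("ancientgreek" . greek)
    ("arabic" . arabic)))

(defun my-org-script-conflicts (languages)
  "Return ((SCRIPT LANG1 LANG2 ...) ...) for scripts shared by LANGUAGES.
Only scripts claimed by two or more of LANGUAGES are listed, in order
of first appearance; languages missing from the table are ignored.
Hypothetical helper, not part of Org."
  (let ((by-script nil)
        (conflicts nil))
    ;; Group LANGUAGES by script, keeping their original order.
    (dolist (lang languages)
      (let* ((script (cdr (assoc lang my-org-language-scripts)))
             (cell (and script (assq script by-script))))
        (cond ((null script))
              (cell (setcdr cell (append (cdr cell) (list lang))))
              (t (push (list script lang) by-script)))))
    ;; Keep only the groups with more than one language.
    (dolist (cell (nreverse by-script))
      (when (cddr cell)
        (push cell conflicts)))
    (nreverse conflicts)))
```

For ("russian" "bulgarian" "arabic") it returns ((cyrillic "russian" "bulgarian")). An exporter could then warn the user, or fall back to explicit language switching for those languages.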
* Re: Fallback fonts in LaTeX export for non latin scripts

From: Ihor Radchenko @ 2023-09-06 9:29 UTC
To: Juan Manuel Macías
Cc: Max Nikulin, emacs-orgmode, Ihor Radchenko

Juan Manuel Macías <maciaschain@posteo.net> writes:

> True. Thanks for pointing it out. Indeed, \babelprovide with the
> onchar=ids fonts option only makes sense when 1 foreign language = 1
> script. For example, different variants of Greek cannot be combined
> without an explicit switch.
>
> And something like this wouldn't work either:
>
> \babelprovide[import,onchar=ids fonts]{russian}
> \babelprovide[import,onchar=ids fonts]{bulgarian}
> \babelfont[russian]{rm}[Color=blue]{Old Standard}
> \babelfont[bulgarian]{rm}[Color=green]{FreeSerif}
>
> because bulgarian overwrites russian.
>
> Well, another added complication :-(.

AFAIU, there is simply no way to solve this unless the user manually
indicates the intended language.

Do I understand correctly that onchar=ids will not break anything if text
is correctly marked with \selectlanguage{<lang>}?

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>
* Re: Fallback fonts in LaTeX export for non latin scripts

From: Juan Manuel Macías @ 2023-09-06 14:58 UTC
To: Ihor Radchenko
Cc: Max Nikulin, emacs-orgmode, Ihor Radchenko

Ihor Radchenko writes:

> Do I understand correctly that onchar=ids will not break anything if text
> is correctly marked with \selectlanguage{<lang>}?

To load language features (hyphenation rules, captions, etc.) there is no
problem. But to load a font associated with a language, the font of the
last declared language will always be loaded.

Well, it is not a problem, because if a document contains texts in
Russian and Bulgarian, for example, the natural thing is that they are
set in the same font, since both languages share the Cyrillic script.
But there may be cases when the author needs different fonts. In such a
case, the user should not use the onchar=etc. property:

https://i.imgur.com/vmsCNkP.png

In any case (to organize myself mentally) I thought that it could be
done on two levels:

- Level 0: The fonts associated with each script are loaded (from a
  defcustom list) if luatex is the current engine. And low-level code is
  generated in Lua with the luaotfload.add_fallback function. That code
  can be in a Lua file or directly within the preamble, enclosed in the
  \directlua primitive (mode=harf means that HarfBuzz is used for OTF
  rendering):

    \directlua {luaotfload.add_fallback("orgfallback", {
        "oldstandard:mode=harf;script=grek;",
        "oldstandard:mode=harf;script=cyrl;",
        "freeserif:mode=harf;script=arab;",
        "freeserif:mode=harf;script=dev2;",
        etc., etc.
      })
    }

  And, to load the fallback fonts:

    \setmainfont{latinmodernroman}[RawFeature={fallback=orgfallback}]

  At this level per-language properties are not loaded, but at least
  readability is ensured. The user cannot modify the fonts associated
  with each script within the document, but can, of course, modify the
  defcustom.

- Level 1: The user can load language properties and associate fonts
  with each language using Babel's high-level code (via keywords in Org,
  as we have discussed in previous messages). Here you can also modify
  the default fonts (also, as we mentioned before): main, mono, sans and
  math. If the language is declared with an asterisk (for example:
  russian*) the onchar=etc. property will be included in the preamble,
  and it would not be necessary to switch to russian explicitly. It is
  assumed that in this scenario the only language with Cyrillic script
  would be Russian. For language switching in the rest of the cases,
  some babel command would have to be used via @@latex:@@, special
  blocks, etc. Once Org has its own language switching mechanism, that
  would be used instead.

Wdyt?

-- 
Juan Manuel Macías

https://juanmanuelmacias.com
https://lunotipia.juanmanuelmacias.com
https://gnutas.juanmanuelmacias.com
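The Level 0 proposal in the message above can be tried as a complete document. What follows is a minimal sketch, not the code Org would generate. It assumes LuaHBTeX (the engine behind lualatex in recent TeX distributions, which mode=harf requires), a luaotfload version that provides luaotfload.add_fallback, and that the font names from the message resolve on your system. Only the Greek and Cyrillic entries are kept, because Arabic would also need right-to-left handling; the sample words are arbitrary.

#+begin_src latex
% Compile with: lualatex test.tex
\documentclass{article}
\usepackage{fontspec}

% Register a fallback list named "orgfallback": characters missing from
% the main font are looked up in the fonts listed here.
\directlua{luaotfload.add_fallback("orgfallback", {
    "oldstandard:mode=harf;script=grek;",
    "oldstandard:mode=harf;script=cyrl;"
  })}

% Attach the fallback list to the main (roman) font.
\setmainfont{Latin Modern Roman}[RawFeature={fallback=orgfallback}]

\begin{document}
Latin text, followed by Greek (λόγος) and Russian (слово)
without any explicit font switch.
\end{document}
#+end_src

In the proposal, Org would produce the add_fallback entries from the defcustom list, one entry per script detected in the buffer.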
* Re: Fallback fonts in LaTeX export for non latin scripts

From: Ihor Radchenko @ 2023-09-07 10:22 UTC
To: Juan Manuel Macías
Cc: Max Nikulin, emacs-orgmode, Ihor Radchenko

Juan Manuel Macías <maciaschain@posteo.net> writes:

>> Do I understand correctly that onchar=ids will not break anything if text
>> is correctly marked with \selectlanguage{<lang>}?
>
> To load language features (hyphenation rules, captions, etc.) there is no
> problem. But to load a font associated with a language, the font of the
> last declared language will always be loaded.

May we explicitly set the needed font around language environments?

Something like

\setfontforrussian
\selectlanguage{russian}
....

\setfontforbulgarian
\selectlanguage{bulgarian}
....

> In any case (to organize myself mentally) I thought that it could be
> done on two levels:
>
> - Level 0: The fonts associated with each script are loaded (from a
>   defcustom list) if luatex is the current engine. And low-level code is
>   generated in Lua with the luaotfload.add_fallback function. That code
>   can be in a Lua file or directly within the preamble, enclosed in the
>   \directlua primitive (mode=harf means that HarfBuzz is used for OTF
>   rendering):
> ...

Sounds reasonable.

> - Level 1: The user can load language properties and associate fonts
>   with each language using Babel's high-level code (via keywords in Org,
>   as we have discussed in previous messages). Here you can also modify
>   the default fonts (also, as we mentioned before): main, mono, sans and
>   math. If the language is declared with an asterisk (for example:
>   russian*) the onchar=etc. property will be included in the preamble,
>   and it would not be necessary to switch to russian explicitly. It is
>   assumed that in this scenario the only language with Cyrillic script
>   would be Russian. For language switching in the rest of the cases,
>   some babel command would have to be used via @@latex:@@, special
>   blocks, etc. Once Org has its own language switching mechanism, that
>   would be used instead. Wdyt?

I am not sure if I like "russian*" idea. May you explain a bit more
about how onchar works? What if language characters are intersecting,
and not using exactly the same char sets?

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>
* Re: Fallback fonts in LaTeX export for non latin scripts

From: Juan Manuel Macías @ 2023-09-07 12:04 UTC
To: Ihor Radchenko
Cc: Max Nikulin, emacs-orgmode

Ihor Radchenko writes:

> May we explicitly set the needed font around language environments?
>
> Something like
>
> \setfontforrussian
> \selectlanguage{russian}
> ....
>
> \setfontforbulgarian
> \selectlanguage{bulgarian}
> ....

There's no need. With \babelfont you can associate a font with a
language (declared with either the classic syntax or \babelprovide).
And when you use \selectlanguage, \foreignlanguage or any other babel
command or environment to switch languages, the associated font is
activated for that language. For example:

\babelprovide[import]{russian}
\babelprovide[import]{bulgarian}
\babelfont[russian]{rm}[]{Old Standard}
\babelfont[bulgarian]{rm}[]{Freeserif}

and then:

\selectlanguage{russian}
...
\selectlanguage{bulgarian}
...

\babelprovide supports several properties. Adding the onchar=ids
fonts/letters property equates language and script, and everything in
that script is associated with a font. This would only make sense when
there is only one language in the document that uses that script, as we
discussed before. In a case like russian/bulgarian, the font of the last
\babelprovide is applied wherever that script appears.

>> In any case (to organize myself mentally) I thought that it could be
>> done on two levels:
>>
>> - Level 0: The fonts associated with each script are loaded (from a
>>   defcustom list) if luatex is the current engine. And low-level code is
>>   generated in Lua with the luaotfload.add_fallback function. That code
>>   can be in a Lua file or directly within the preamble, enclosed in the
>>   \directlua primitive (mode=harf means that HarfBuzz is used for OTF
>>   rendering):
>> ...
>
> Sounds reasonable.
>
>> - Level 1: The user can load language properties and associate fonts
>>   with each language using Babel's high-level code (via keywords in Org,
>>   as we have discussed in previous messages). Here you can also modify
>>   the default fonts (also, as we mentioned before): main, mono, sans and
>>   math. If the language is declared with an asterisk (for example:
>>   russian*) the onchar=etc. property will be included in the preamble,
>>   and it would not be necessary to switch to russian explicitly. It is
>>   assumed that in this scenario the only language with Cyrillic script
>>   would be Russian. For language switching in the rest of the cases,
>>   some babel command would have to be used via @@latex:@@, special
>>   blocks, etc. Once Org has its own language switching mechanism, that
>>   would be used instead. Wdyt?
>
> I am not sure if I like "russian*" idea. May you explain a bit more
> about how onchar works? What if language characters are intersecting,
> and not using exactly the same char sets?

Basically, it's like I said above. According to the Babel Manual:

#+begin_quote
onchar= ids | fonts | letters

This option is much like an ‘event’ called when a character belonging to
the script of this locale is found (as its name implies, it acts on
characters, not on spaces). There are currently two ‘actions’, which can
be used at the same time (separated by a space): with ids the \language
and the \localeid are set to the values of this locale; with fonts, the
fonts are changed to those of this locale (as set with \babelfont).
Characters can be added or modified with \babelcharproperty.

[...] Option letters restricts the ‘actions’ to letters, in the TeX
sense (i. e., with catcode 11). Digits and punctuation are then
considered part of current locale (as set by a selector). This option is
useful when the main script is non-Latin and there is a secondary one
whose script is Latin.
#+end_quote

-- 
Juan Manuel Macías

https://juanmanuelmacias.com
https://lunotipia.juanmanuelmacias.com
https://gnutas.juanmanuelmacias.com
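The \babelfont example in the message above can be expanded into a complete document. This is a minimal sketch, assuming LuaLaTeX, a babel release that supports \babelprovide[import], and that Old Standard and FreeSerif are installed under those names. It deliberately omits onchar, because Russian and Bulgarian share the Cyrillic script; each \selectlanguage switches the language features and the associated font together.

#+begin_src latex
% Compile with: lualatex test.tex
\documentclass{article}
\usepackage[english]{babel}

% Load both locales from babel's ini files. No onchar here, so every
% switch between Russian and Bulgarian must be explicit.
\babelprovide[import]{russian}
\babelprovide[import]{bulgarian}

% Associate a roman font with each language.
\babelfont[russian]{rm}{Old Standard}
\babelfont[bulgarian]{rm}{FreeSerif}

\begin{document}
English text in the default font.

\selectlanguage{russian}
Текст на русском языке.

\selectlanguage{bulgarian}
Текст на български език.

\selectlanguage{english}
Back to English.
\end{document}
#+end_src

For short inline fragments, \foreignlanguage{russian}{...} can be used instead of \selectlanguage, as the message notes.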
* Re: Fallback fonts in LaTeX export for non latin scripts

From: Ihor Radchenko @ 2023-09-08 7:42 UTC
To: Juan Manuel Macías
Cc: Max Nikulin, emacs-orgmode

Juan Manuel Macías <maciaschain@posteo.net> writes:

>> I am not sure if I like "russian*" idea. May you explain a bit more
>> about how onchar works? What if language characters are intersecting,
>> and not using exactly the same char sets?
>
> Basically, it's like I said above. According to the Babel Manual:
>
> #+begin_quote
> onchar= ids | fonts | letters
>
> This option is much like an ‘event’ called when a character belonging to
> the script of this locale is found (as its name implies, it acts on
> characters, not on spaces). There are currently two ‘actions’, which can
> be used at the same time (separated by a space): with ids the \language
> and the \localeid are set to the values of this locale; with fonts, the
> fonts are changed to those of this locale (as set with \babelfont).
> Characters can be added or modified with \babelcharproperty.
>
> [...] Option letters restricts the ‘actions’ to letters, in the TeX
> sense (i. e., with catcode 11). Digits and punctuation are then
> considered part of current locale (as set by a selector). This option is
> useful when the main script is non-Latin and there is a secondary one
> whose script is Latin.
> #+end_quote

Thanks for the explanation!
Then, language* it is. I have no better idea.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>
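To illustrate the "language*" convention agreed on above, this is a minimal sketch of the preamble that a hypothetical russian* declaration might expand to; the mapping from the asterisk to onchar=ids fonts is only the proposal from this thread, not existing Org behavior. It assumes LuaLaTeX (the engine this thread targets for onchar) and an installed Old Standard font. Cyrillic characters then switch both the language (ids) and the font (fonts) without any explicit command, which is only safe while Russian is the sole Cyrillic-script language in the document.

#+begin_src latex
% Compile with: lualatex test.tex
\documentclass{article}
\usepackage[english]{babel}

% Hypothetical expansion of "russian*": when a Cyrillic character is
% found, ids sets \language and \localeid to Russian, and fonts switches
% to the font declared with \babelfont below.
\babelprovide[import, onchar=ids fonts]{russian}
\babelfont[russian]{rm}{Old Standard}

\begin{document}
An English sentence containing the Russian word слово, with no
explicit language switch.
\end{document}
#+end_src

A plain russian declaration would drop the onchar option and rely on explicit switching instead.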
Thread overview: 21+ messages

2023-08-30  8:25 Fallback fonts in LaTeX export for non latin scripts Juan Manuel Macías
2023-08-31  8:17 ` Ihor Radchenko
2023-08-31 11:42 ` Juan Manuel Macías
2023-09-01  9:18 ` Ihor Radchenko
2023-09-02 21:39 ` Juan Manuel Macías
2023-09-03  7:22 ` Ihor Radchenko
2023-09-03 11:05 ` Juan Manuel Macías
2023-09-04  8:09 ` Ihor Radchenko
2023-09-04 22:22 ` Juan Manuel Macías
2023-09-05 10:44 ` Ihor Radchenko
2023-09-20 14:03 ` Juan Manuel Macías
2023-09-21  9:00 ` Ihor Radchenko
2023-09-24 18:24 ` Juan Manuel Macías
2023-09-26 10:37 ` Ihor Radchenko
2023-09-05 16:42 ` Max Nikulin
2023-09-05 18:33 ` Juan Manuel Macías
2023-09-06  9:29 ` Ihor Radchenko
2023-09-06 14:58 ` Juan Manuel Macías
2023-09-07 10:22 ` Ihor Radchenko
2023-09-07 12:04 ` Juan Manuel Macías
2023-09-08  7:42 ` Ihor Radchenko