* [PATCH][oc-csl] Improve reference parsing @ 2022-10-26 15:40 András Simonyi 2022-10-27 4:10 ` Ihor Radchenko 0 siblings, 1 reply; 16+ messages in thread From: András Simonyi @ 2022-10-26 15:40 UTC (permalink / raw) To: emacs-orgmode list [-- Attachment #1: Type: text/plain, Size: 268 bytes --] Dear All, the attached patch improves the parsing and exporting of cite prefixes, suffixes and locators -- the most noticeable change is probably the support for formatted locators and of underlining in general. Comments are welcome. best wishes, András [-- Attachment #2: 0001-oc-csl.el-Improve-reference-parsing.patch --] [-- Type: text/x-patch, Size: 5233 bytes --] From 5bec7025f66eb65f13a701dc616aca2440110c1a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Andr=C3=A1s=20Simonyi?= <andras.simonyi@gmail.com> Date: Wed, 26 Oct 2022 12:15:42 +0200 Subject: [PATCH] oc-csl.el: Improve reference parsing * lisp/oc-csl.el (org-cite-csl--export-backend): New constant to provide a trivial export back-end for exporting reference affixes and locators with the simple html-based markup expected by citeproc. (org-cite-csl--parse-reference): Do not construct the reference locator and include it in the result, since citeproc does not make use of it. Start the suffix immediately after the locator's ending, skipping the ending comma if necessary. Use `org-cite-csl--export-backend' to export reference affixes and locators. 
--- lisp/oc-csl.el | 45 ++++++++++++++++++++++++++++----------------- 1 file changed, 28 insertions(+), 17 deletions(-) diff --git a/lisp/oc-csl.el b/lisp/oc-csl.el index 1ccb74e92..30eac9f1a 100644 --- a/lisp/oc-csl.el +++ b/lisp/oc-csl.el @@ -140,9 +140,10 @@ (declare-function org-element-property "org-element" (property element)) (declare-function org-element-put-property "org-element" (element property value)) -(declare-function org-export-data "org-export" (data info)) +(declare-function org-export-data-with-backend "org-export" (data backend info)) (declare-function org-export-derived-backend-p "org-export" (backend &rest backends)) (declare-function org-export-get-footnote-number "org-export" (footnote info &optional data body-first)) +(declare-function org-export-create-backend "org-export" (&key transcoders)) \f ;;; Customization @@ -310,6 +311,16 @@ If nil then the Chicago author-date style is used as a fallback.") "Regexp matching a label in a citation reference suffix. Label is in match group 1.") +(defconst org-cite-csl--export-backend + (org-export-create-backend + :transcoders + '((plain-text . (lambda (text _info) text)) + (bold . (lambda (_bold contents _info) (concat "<b>" contents "</b>"))) + (italic . (lambda (_italic contents _info) (concat "<i>" contents "</i>"))) + (underline . (lambda (_underline contents _info) + (concat "<span class=\"underline\">" contents "</span>"))))) + "Custom backend for exporting citation affixes and locators.") + \f ;;; Internal functions (defun org-cite-csl--barf-without-citeproc () @@ -476,11 +487,10 @@ property in INFO." INFO is the export state, as a property list. The result is a association list. Keys are: `id', `prefix',`suffix', -`location', `locator' and `label'." - (let (label location-start locator-start location locator prefix suffix) +`locator' and `label'." + (let (label location-start locator-start locator prefix suffix) ;; Parse suffix. 
Insert it in a temporary buffer to find - ;; different parts: pre-label, label, locator, location (label + - ;; locator), and suffix. + ;; different parts: pre-label, label, locator, and suffix. (with-temp-buffer (save-excursion (insert (org-element-interpret-data @@ -506,12 +516,15 @@ The result is a association list. Keys are: `id', `prefix',`suffix', (let ((re (rx (or "," (group digit))))) (when (re-search-backward re location-start t) (goto-char (or (match-end 1) (match-beginning 0))) - (setq location (buffer-substring location-start (point))) - (setq locator (org-trim (buffer-substring locator-start (point)))) + (setq locator + (org-cite-parse-objects + (buffer-substring locator-start (point)) + t)) ;; Skip comma in suffix. + (when (= (following-char) ?,) (forward-char)) (setq suffix (org-cite-parse-objects - (buffer-substring (match-end 0) (point-max)) + (buffer-substring (point) (point-max)) t))))) (setq prefix (org-cite-concat @@ -525,18 +538,16 @@ The result is a association list. Keys are: `id', `prefix',`suffix', (lambda (data) (org-string-nw-p (org-trim - ;; When Citeproc exports to Org syntax, avoid mix and - ;; matching output formats by also generating Org - ;; syntax for prefix and suffix. - (if (eq 'org (org-cite-csl--output-format info)) - (org-element-interpret-data data) - (org-export-data data info))))))) + ;; Export the parsed prefix, suffix, and locator + ;; with a custom backend, which produces the simple + ;; html markup expected by citeproc. + (org-export-data-with-backend + data org-cite-csl--export-backend info)))))) `((id . ,(org-element-property :key reference)) (prefix . ,(funcall export prefix)) (suffix . ,(funcall export suffix)) - (locator . ,locator) - (label . ,label) - (location . ,location))))) + (locator . ,(funcall export locator)) + (label . ,label))))) (defun org-cite-csl--create-structure (citation info) "Create Citeproc structure for CITATION object. 
-- 2.25.1
* Re: [PATCH][oc-csl] Improve reference parsing 2022-10-26 15:40 [PATCH][oc-csl] Improve reference parsing András Simonyi @ 2022-10-27 4:10 ` Ihor Radchenko 2022-11-01 15:02 ` András Simonyi 0 siblings, 1 reply; 16+ messages in thread From: Ihor Radchenko @ 2022-10-27 4:10 UTC (permalink / raw) To: András Simonyi; +Cc: emacs-orgmode list András Simonyi <andras.simonyi@gmail.com> writes: > the attached patch improves the parsing and exporting of cite > prefixes, suffixes and locators -- the most noticeable change is > probably the support for formatted locators and of underlining in > general. Comments are welcome. Thanks! > +(defconst org-cite-csl--export-backend > + (org-export-create-backend > + :transcoders > + '((plain-text . (lambda (text _info) text)) > + (bold . (lambda (_bold contents _info) (concat "<b>" contents "</b>"))) > + (italic . (lambda (_italic contents _info) (concat "<i>" contents "</i>"))) > + (underline . (lambda (_underline contents _info) > + (concat "<span class=\"underline\">" contents "</span>"))))) > + "Custom backend for exporting citation affixes and locators.") This will render e.g. strike-through empty. Note that citation references may contain the following Org markup objects: '(bold code entity italic latex-fragment strike-through subscript superscript underline verbatim) And we may add more, as discussed in https://orgmode.org/list/87k04xhhw3.fsf@localhost -- Ihor Radchenko // yantar92, Org mode contributor, Learn more about Org mode at <https://orgmode.org/>. Support Org development at <https://liberapay.com/org-mode>, or support my work at <https://liberapay.com/yantar92> ^ permalink raw reply [flat|nested] 16+ messages in thread
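To make the gap described above concrete, here is a minimal sketch in plain Emacs Lisp (no Org needed). It compares the object types allowed in citation references, as listed in the message above, with the types the first version of the patch has transcoders for. The `my/` names are hypothetical and exist only for this illustration.

```elisp
(require 'seq)

;; Object types allowed in citation references, quoted from the
;; message above.  The variable name is hypothetical.
(defvar my/cite-allowed-objects
  '(bold code entity italic latex-fragment strike-through
    subscript superscript underline verbatim)
  "Object types that may appear in a citation reference.")

;; Types that have a transcoder in the first version of the patch.
;; The variable name is hypothetical.
(defvar my/cite-handled-objects
  '(plain-text bold italic underline)
  "Object types handled by the first version of the custom backend.")

(defun my/cite-unhandled-objects ()
  "Return the allowed object types that have no transcoder."
  (seq-difference my/cite-allowed-objects my/cite-handled-objects))

(my/cite-unhandled-objects)
;; → (code entity latex-fragment strike-through subscript superscript verbatim)
```

Every type in the result would be exported as an empty string by a backend that has no transcoder for it, which is the strike-through problem pointed out above.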
* Re: [PATCH][oc-csl] Improve reference parsing
From: András Simonyi @ 2022-11-01 15:02 UTC
To: Ihor Radchenko; +Cc: emacs-orgmode list

[-- Attachment #1: Type: text/plain, Size: 1604 bytes --]

Dear All,

On Thu, 27 Oct 2022 at 06:10, Ihor Radchenko <yantar92@posteo.net> wrote:

> This will render e.g. strike-through empty.
> Note that citation references may contain the following Org markup objects:
> '(bold code entity italic
>   latex-fragment strike-through subscript
>   superscript underline verbatim)

Thanks for pointing out the problem! I've attached a new version of
the patch, in which the custom exporter backend has an (in many cases
trivial) translator for all currently allowed objects.

> And we may add more, as discussed in
> https://orgmode.org/list/87k04xhhw3.fsf@localhost

I don't think that it would make much sense to add a lot more, with
the possible exception of links, since citations are at most
sentence-sized textual units, not to mention the possible
complications arising for the existing export processors. (What type
of objects could the various LaTeX-based exporters support without
complex changes?) Since CSL has only a few types of formatting
attributes (font-style, font-variant, font-weight, text-decoration and
vertical-align), if the set of allowed objects is radically expanded
then it will probably be more reasonable to define a derived backend,
maybe based on the ascii exporter, but I feel that the current set
doesn't require this solution.

thanks & best wishes,
András

>
> --
> Ihor Radchenko // yantar92,
> Org mode contributor,
> Learn more about Org mode at <https://orgmode.org/>.
> Support Org development at <https://liberapay.com/org-mode>, > or support my work at <https://liberapay.com/yantar92> [-- Attachment #2: 0001-oc-csl.el-Improve-reference-parsing.patch --] [-- Type: text/x-patch, Size: 6019 bytes --] From 5dfbb8ef9291f906014800013cdb9a9d5569b728 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Andr=C3=A1s=20Simonyi?= <andras.simonyi@gmail.com> Date: Wed, 26 Oct 2022 12:15:42 +0200 Subject: [PATCH] oc-csl.el: Improve reference parsing * lisp/oc-csl.el (org-cite-csl--export-backend): New constant to provide a trivial export back-end for exporting reference affixes and locators with the simple html-based markup expected by citeproc. (org-cite-csl--parse-reference): Do not construct the reference locator and include it in the result, since citeproc does not make use of it. Start the suffix immediately after the locator's ending, skipping the ending comma if necessary. Use `org-cite-csl--export-backend' to export reference affixes and locators. --- lisp/oc-csl.el | 57 +++++++++++++++++++++++++++++++++++--------------- 1 file changed, 40 insertions(+), 17 deletions(-) diff --git a/lisp/oc-csl.el b/lisp/oc-csl.el index 1ccb74e92..1f40a9e8a 100644 --- a/lisp/oc-csl.el +++ b/lisp/oc-csl.el @@ -140,9 +140,10 @@ (declare-function org-element-property "org-element" (property element)) (declare-function org-element-put-property "org-element" (element property value)) -(declare-function org-export-data "org-export" (data info)) +(declare-function org-export-data-with-backend "org-export" (data backend info)) (declare-function org-export-derived-backend-p "org-export" (backend &rest backends)) (declare-function org-export-get-footnote-number "org-export" (footnote info &optional data body-first)) +(declare-function org-export-create-backend "org-export" (&key transcoders)) \f ;;; Customization @@ -310,8 +311,30 @@ If nil then the Chicago author-date style is used as a fallback.") "Regexp matching a label in a citation reference suffix. 
Label is in match group 1.") +(defconst org-cite-csl--export-backend + (org-export-create-backend + :transcoders + '((bold . (lambda (_bold contents _info) (format "<b>%s</b>" contents))) + (code . org-cite-csl--element-value) + (entity . (lambda (entity _contents _info) + (format "\\%s" (org-element-property :name entity)))) + (italic . (lambda (_italic contents _info) (format "<i>%s</i>" contents))) + (latex-fragment . org-cite-csl--element-value) + (plaintext . (lambda (contents _info) contents)) + (strike-through . (lambda (_strike-through contents _info) contents)) + (subscript . (lambda (_subscript contents _info) (format "<sub>%s</sub>" contents))) + (superscript . (lambda (_superscript contents _info) (format "<sup>%s</sup>" contents))) + (underline . (lambda (_underline contents _info) + (format "<span class=\"underline\">%s</span>" contents))) + (verbatim . org-cite-csl--element-value))) + "Custom backend for exporting citation affixes and locators.") + \f ;;; Internal functions +(defun org-cite-csl--element-value (element _contents _info) + "Return the`:value' property of ELEMENT." + (org-element-property :value element)) + (defun org-cite-csl--barf-without-citeproc () "Raise an error if Citeproc library is not loaded." (unless (featurep 'citeproc) @@ -476,11 +499,10 @@ property in INFO." INFO is the export state, as a property list. The result is a association list. Keys are: `id', `prefix',`suffix', -`location', `locator' and `label'." - (let (label location-start locator-start location locator prefix suffix) +`locator' and `label'." + (let (label location-start locator-start locator prefix suffix) ;; Parse suffix. Insert it in a temporary buffer to find - ;; different parts: pre-label, label, locator, location (label + - ;; locator), and suffix. + ;; different parts: pre-label, label, locator, and suffix. (with-temp-buffer (save-excursion (insert (org-element-interpret-data @@ -506,12 +528,15 @@ The result is a association list. 
Keys are: `id', `prefix',`suffix', (let ((re (rx (or "," (group digit))))) (when (re-search-backward re location-start t) (goto-char (or (match-end 1) (match-beginning 0))) - (setq location (buffer-substring location-start (point))) - (setq locator (org-trim (buffer-substring locator-start (point)))) + (setq locator + (org-cite-parse-objects + (buffer-substring locator-start (point)) + t)) ;; Skip comma in suffix. + (when (= (following-char) ?,) (forward-char)) (setq suffix (org-cite-parse-objects - (buffer-substring (match-end 0) (point-max)) + (buffer-substring (point) (point-max)) t))))) (setq prefix (org-cite-concat @@ -525,18 +550,16 @@ The result is a association list. Keys are: `id', `prefix',`suffix', (lambda (data) (org-string-nw-p (org-trim - ;; When Citeproc exports to Org syntax, avoid mix and - ;; matching output formats by also generating Org - ;; syntax for prefix and suffix. - (if (eq 'org (org-cite-csl--output-format info)) - (org-element-interpret-data data) - (org-export-data data info))))))) + ;; Export the parsed prefix, suffix, and locator + ;; with a custom backend that produces the simple + ;; html markup expected by citeproc. + (org-export-data-with-backend + data org-cite-csl--export-backend info)))))) `((id . ,(org-element-property :key reference)) (prefix . ,(funcall export prefix)) (suffix . ,(funcall export suffix)) - (locator . ,locator) - (label . ,label) - (location . ,location))))) + (locator . ,(funcall export locator)) + (label . ,label))))) (defun org-cite-csl--create-structure (citation info) "Create Citeproc structure for CITATION object. -- 2.25.1 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH][oc-csl] Improve reference parsing 2022-11-01 15:02 ` András Simonyi @ 2022-11-02 6:29 ` Ihor Radchenko 2022-11-02 17:58 ` András Simonyi 0 siblings, 1 reply; 16+ messages in thread From: Ihor Radchenko @ 2022-11-02 6:29 UTC (permalink / raw) To: András Simonyi; +Cc: emacs-orgmode list András Simonyi <andras.simonyi@gmail.com> writes: > thanks for pointing out the problem!! I've attached a new version of > the patch, in which the custom exporter backend has an (in many cases > trivial) translator for all currently allowed objects. Thanks! >> And we may add more, as discussed in >> https://orgmode.org/list/87k04xhhw3.fsf@localhost > > I don't think that it would make much sense to add a lot more, with > the possible exception of links, since citations are at most > sentence-sized textual units, not to mention the possible > complications arising for the existing export processors. (What type > of objects could the various LaTeX-based exporters support without > complex changes?) Since CSL has only a few types of formatting > attributes (font-style, font-variant, font-weight, text-decoration and > vertical-align), if the set of allowed object is radically expanded > then it will probably be more reasonable to define a derived backed, > maybe based on the ascii exporter, but I feel that the current set > doesn't require this solution. I do not think that CSL limitations are really limiting us. - Allowing macros will be handled by ox.el itself automatically - Export snippets can also be processed without much issue (consider direct LaTeX code) - inline-babel-call and inline src blocks may be useful with :exports results when some auto-generation of text is needed. They will also be handled automatically by ob-exp. - latex-fragments are either equivalent to direct LaTeX or to inserting an image - timestamps could be exported as text, although I do not see any obvious utility of timestamps inside references. 
However, oc-csl should not ignore the export processor to support all the above. I am not sure why you need a dedicated export processor instead of passing the string to current processor (or derivative) instead. If you really need to mark certain constructs specially for CSL, you can create a derived export backend for the current backend and replace the transcoders for the object types that must be treated specially. > +(defconst org-cite-csl--export-backend > + (org-export-create-backend > + :transcoders > + '((bold . (lambda (_bold contents _info) (format "<b>%s</b>" contents))) > + (code . org-cite-csl--element-value) > + (entity . (lambda (entity _contents _info) > + (format "\\%s" (org-element-property :name entity)))) Why :name, but not :html? -- Ihor Radchenko // yantar92, Org mode contributor, Learn more about Org mode at <https://orgmode.org/>. Support Org development at <https://liberapay.com/org-mode>, or support my work at <https://liberapay.com/yantar92> ^ permalink raw reply [flat|nested] 16+ messages in thread
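The derived-backend alternative suggested above could look roughly like the following sketch. It assumes that the current backend stored in INFO is a registered, named one (for an anonymous backend the name is nil and no parent would be set). The function name `my/csl-derived-backend` is hypothetical, and only two transcoders are overridden to keep the example short.

```elisp
(require 'ox)

(defun my/csl-derived-backend (info)
  "Return an anonymous backend derived from the current one in INFO.
Only the transcoders listed here are overridden; every other
object type falls back to the parent backend."
  (org-export-create-backend
   ;; Assumption: the current backend is registered under a name.
   :parent (org-export-backend-name (plist-get info :back-end))
   :transcoders
   '((bold . (lambda (_bold contents _info)
               (format "<b>%s</b>" contents)))
     (italic . (lambda (_italic contents _info)
                 (format "<i>%s</i>" contents))))))
```

The result can be passed to `org-export-data-with-backend' in place of a fixed constant. Whether the parent's output for the remaining object types is acceptable to citeproc is exactly the open question in this thread.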
* Re: [PATCH][oc-csl] Improve reference parsing
From: András Simonyi @ 2022-11-02 17:58 UTC
To: Ihor Radchenko; +Cc: emacs-orgmode list

Dear All,

On Wed, 2 Nov 2022 at 07:28, Ihor Radchenko <yantar92@posteo.net> wrote:

> I do not think that CSL limitations are really limiting us.
>
> - Allowing macros will be handled by ox.el itself automatically
> - Export snippets can also be processed without much issue (consider
>   direct LaTeX code)
> - inline-babel-call and inline src blocks may be useful with :exports
>   results when some auto-generation of text is needed. They will also be
>   handled automatically by ob-exp.
> - latex-fragments are either equivalent to direct LaTeX or to inserting
>   an image
> - timestamps could be exported as text, although I do not see any
>   obvious utility of timestamps inside references.

I'm not really familiar with the internals of the Org exporter but,
looking at the ox.el code, macros and babel calls are processed and
resolved before citations are processed, so they seemingly have no
bearing on the org-cite-csl--parse-reference function my patch is
concerned with.

> However, oc-csl should not ignore the export processor to support all
> the above. I am not sure why you need a dedicated export processor
> instead of passing the string to current processor (or derivative)
> instead.
> If you really need to mark certain constructs specially for CSL, you can
> create a derived export backend for the current backend and replace the
> transcoders for the object types that must be treated specially.

For objects other than macros and babel calls (e.g., timestamps, LaTeX
fragments etc.), the problem is that citeproc-el expects and needs the
affixes and locator to be passed in the very limited html-like markup
supported by CSL (see
https://www.zotero.org/support/kb/rich_text_bibliography for a
rudimentary description), and, crucially, the assumption is that
everything else is plain text, which, if necessary, will be escaped
according to the target format, e.g., '$' signs are escaped by
citeproc-el's own LaTeX formatter. The reason for this limitation is
that the affixes and especially the locator have to be parsed into
citeproc-el's internal rich-text representation for further processing
according to the CSL style in use. (Affixes are only concatenated to
other elements but locators can be the subject of any type of
formatting.) As a consequence, I think the only real alternatives are
using a custom backend as I do in the current patch or a backend
derived from the plain text Org exporter -- I don't have a strong
preference as to which solution we choose, I just went with the
seemingly more minimalist option. (The proper way of dealing with
LaTeX fragments in this context, in particular with LaTeX math
fragments, would be to support those in citeproc-el's internal
representation and markup, which is planned but not implemented yet.)

> > +(defconst org-cite-csl--export-backend
> > +  (org-export-create-backend
> > +   :transcoders
> > +   '((bold . (lambda (_bold contents _info) (format "<b>%s</b>" contents)))
> > +     (code . org-cite-csl--element-value)
> > +     (entity . (lambda (entity _contents _info)
> > +		 (format "\\%s" (org-element-property :name entity))))
>
> Why :name, but not :html?

Good point. Thinking about it a bit more, :utf-8 would probably be a
slightly better solution (in keeping with citeproc-el's 'plain text'
requirement). I will change this once we have sorted out the other
details.

best wishes,
András
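The :utf-8 variant mentioned at the end of the message above could be sketched as follows. The function name `my/csl-entity-as-utf8` is hypothetical and this is not the code that was eventually committed; it only shows the property lookup.

```elisp
(require 'org-element)

(defun my/csl-entity-as-utf8 (entity _contents _info)
  "Return the UTF-8 representation of ENTITY.
Intended as an `entity' transcoder, so that citeproc receives
plain text instead of a backslash-prefixed entity name."
  (org-element-property :utf-8 entity))

;; Usage with a hand-built entity object.  A parsed entity carries
;; more properties (:name, :latex, :html, :ascii, ...); only the
;; ones needed here are set.
(my/csl-entity-as-utf8
 (org-element-create 'entity '(:name "alpha" :utf-8 "α"))
 nil nil)
```

It would be registered in the transcoder alist as `(entity . my/csl-entity-as-utf8)`.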
* Re: [PATCH][oc-csl] Improve reference parsing 2022-11-02 17:58 ` András Simonyi @ 2022-11-03 6:34 ` Ihor Radchenko 2023-01-07 12:50 ` M. ‘quintus’ Gülker 0 siblings, 1 reply; 16+ messages in thread From: Ihor Radchenko @ 2022-11-03 6:34 UTC (permalink / raw) To: András Simonyi; +Cc: emacs-orgmode list András Simonyi <andras.simonyi@gmail.com> writes: > On Wed, 2 Nov 2022 at 07:28, Ihor Radchenko <yantar92@posteo.net> wrote: > >> I do not think that CSL limitations are really limiting us. > ... > I'm not really familiar with the internals of the Org exporter but, > looking at the ox.el code, macros and babel calls are processed and > resolved before processing citations, so they seemingly have no > bearing on the org-cite-csl--parse-reference function my patch is > concerned with. > Other than macros and babel calls, e.g., timestamps, LaTeX fragments > etc. the problem is that citeproc-el expects and needs the affixes and > locator to be passed in the very limited html-like markup supported by > CSL (see https://www.zotero.org/support/kb/rich_text_bibliography for > a rudimentary description), and, crucially, the assumption is that > everything else is plain text, which, if necessary, will be escaped > according to the target format, i.e., '$' signs are escaped by > citeproc-el's own LaTeX formatter. The reason for this limitation is > that the affixes and especially the locator have to be parsed into > citeproc-el's internal rich-text representation for further processing > according to the used CSL style. (Affixes are only concatenated to > other elements but locators can be the subject of any type of > formatting.) As a consequence, I think the only real alternatives are > using a custom backend as I do in the current patch or a backend > derived from the plain text Org exporter -- I don't have a strong > preference as to which solution we choose, just went with the > seemingly more minimalist option. 
(The proper way of dealing with > LaTeX fragments in this context, in particular with LaTeX math > fragments, would be to support those in citeproc-el's internal > representation and markup, which is planned but not implemented yet.) Could you please explain in more details why CSL require special export of the prefix/suffix? What will happen if we simply pass the Org markup verbatim? I am asking because org-cite-csl-render-citation uses org-cite-parse-objects so, unless citeproc does something terrible with the original Org syntax, we can re-parse the output string and export appropriately according to the current export backend. -- Ihor Radchenko // yantar92, Org mode contributor, Learn more about Org mode at <https://orgmode.org/>. Support Org development at <https://liberapay.com/org-mode>, or support my work at <https://liberapay.com/yantar92> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH][oc-csl] Improve reference parsing 2022-11-03 6:34 ` Ihor Radchenko @ 2023-01-07 12:50 ` M. ‘quintus’ Gülker 2023-01-15 8:56 ` Ihor Radchenko 0 siblings, 1 reply; 16+ messages in thread From: M. ‘quintus’ Gülker @ 2023-01-07 12:50 UTC (permalink / raw) To: Ihor Radchenko; +Cc: András Simonyi, emacs-orgmode Dear all, I probably have not much to contribute to this rather technical thread, but Ihor has redirected me here two times for my citation formatting questions[1][2]. So I would like to ask if there is something I can do to accelerate its inclusion into org so that I can start using macros in citations? -quintus [1]: https://list.orgmode.org/orgmode/87o7tb8pc1.fsf@localhost/ [2]: https://list.orgmode.org/orgmode/87zgcw8gtd.fsf@localhost/ -- Dipl.-Jur. M. Gülker | https://mg.guelker.eu | PGP: Siehe Webseite Passau, Deutschland | kontakt@guelker.eu | O< ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH][oc-csl] Improve reference parsing 2023-01-07 12:50 ` M. ‘quintus’ Gülker @ 2023-01-15 8:56 ` Ihor Radchenko 2023-01-18 23:08 ` András Simonyi 0 siblings, 1 reply; 16+ messages in thread From: Ihor Radchenko @ 2023-01-15 8:56 UTC (permalink / raw) To: M. ‘quintus’ Gülker; +Cc: András Simonyi, emacs-orgmode M. ‘quintus’ Gülker <post+orgmodeml@guelker.eu> writes: > I probably have not much to contribute to this rather technical thread, > but Ihor has redirected me here two times for my citation formatting > questions[1][2]. So I would like to ask if there is something I can do to > accelerate its inclusion into org so that I can start using macros in > citations? András is the author of citeproc.el. I am not sure who else would be in position to help us to move this forward. My understanding of CSL is non-existing. I can only tell that citeproc.el has its own implementation of citation export (`citeproc-render-citations'), which expects some limited kind of html as input. I am hoping that we can somehow work around limited markup support of citeproc's implementation and instead leverage ox.el to do the job. Otherwise, we will keep stumbling upon citeproc.el limitations when exporting bibliography items. -- Ihor Radchenko // yantar92, Org mode contributor, Learn more about Org mode at <https://orgmode.org/>. Support Org development at <https://liberapay.com/org-mode>, or support my work at <https://liberapay.com/yantar92> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH][oc-csl] Improve reference parsing
From: András Simonyi @ 2023-01-18 23:08 UTC
To: Ihor Radchenko; +Cc: M. ‘quintus’ Gülker, emacs-orgmode

Dear All,

apologies for replying so late. If I understand the situation
correctly, we could handle the question of allowing macros in
citations independently of the handling of other constructs, because
macros are resolved before processing citations, so they have no
effect on the input of Citeproc-el. In light of this, maybe there
could be a separate patch for just allowing macros?

As for the question of other elements, I proposed the custom
backend-based approach because CSL has its own rich-text markup (which
is actually not simply a subset of Org's, for example, it contains
small-caps, which is not in Org), and, consequently, Citeproc-el has
its own internal rich-text representations (ASTs), on which it
performs the operations that are prescribed by the various CSL styles.
When the rich text citation/bibliography is finalized, it can be
"serialized" or "formatted" (analogously to Org's exporting a parse
tree) using one of the Citeproc formatters, e.g. into LaTeX, HTML or
Org. As the prefix, suffix and the locator also need to be operated on
by the processor (concatenated to other rich text elements etc.), they
also have to be parsed into Citeproc-el's internal rich-text
representations. Since this is a given, the only question is in what
format they should be passed, and the simple HTML-like standard which
is already supported by Citeproc-el (see
https://www.zotero.org/support/kb/rich_text_bibliography) seems to be
the simplest solution.

Ihor Radchenko <yantar92@posteo.net> wrote:

> Could you please explain in more details why CSL require special
> export of the prefix/suffix? What will happen if we simply pass the Org
> markup verbatim?

Since Citeproc-el assumes that all formatting in the prefix/suffix is
in the HTML-like markup mentioned above, any Org markup would be
treated as plain text which should be preserved as is, and not
interpreted as formatting. For example, if an Org document with
underlined text in a citation prefix were exported to LaTeX, the
Citeproc LaTeX formatter would escape the underscore characters ("\_")
to preserve them in the output, and the citation would be inserted in
this form into the resulting LaTeX document.

> I am asking because org-cite-csl-render-citation uses
> org-cite-parse-objects so, unless citeproc does something terrible with
> the original Org syntax, we can re-parse the output string and export
> appropriately according to the current export backend.

See above; unfortunately, this wouldn't work, at least not in a
general and safe way.

best wishes,
András
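The escaping effect described above can be imitated with a tiny stand-in. To be clear, this is not citeproc-el's formatter code; `my/latex-escape-underscores` is a hypothetical helper that only mimics what happens to underscores when Org markup is treated as plain text on the way to LaTeX.

```elisp
(defun my/latex-escape-underscores (text)
  "Return TEXT with every underscore escaped for LaTeX.
A stand-in for the plain-text escaping a LaTeX formatter performs;
not citeproc-el's actual implementation."
  ;; FIXEDCASE and LITERAL are both t, so the replacement is
  ;; inserted verbatim as a backslash followed by an underscore.
  (replace-regexp-in-string "_" "\\_" text t t))

;; Org underline markup passed through as plain text:
(my/latex-escape-underscores "see _also_")
;; → "see \\_also\\_"  (i.e. the characters: see \_also\_)
```

The underline is therefore lost: LaTeX prints literal underscores around the word instead of underlining it.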
* Re: [PATCH][oc-csl] Improve reference parsing
From: M. ‘quintus’ Gülker @ 2023-01-19 8:21 UTC
To: András Simonyi; +Cc: Ihor Radchenko, emacs-orgmode

On Thursday, 19 January 2023, András Simonyi wrote:

> apologies for replying that late. If I understand the situation
> correctly, we could handle the question of allowing macros in
> citations independently of the handling of other constructs, because
> macros are resolved before processing citations, so they have no
> effect on the input of Citeproc-el. In light of this, maybe there
> could be a separate patch for just allowing macros?

I am not sure this targets the use case I am pursuing, which is to use
macros to produce @@latex: escape constructs in order to have small-caps
markup in the citation footnotes:

    #+MACRO: name @@latex:\textsc{$1}@@@@html:<span class="name">$1</span>@@

If the macro resolves, but the @@latex construct does not, that would be
problematic.

That being said, I /found/ an alternative that works, albeit it is a bit
ugly. I can create an explicit footnote, use a [cite/default/bare:]
construct (to suppress the terminal period) within it and terminate the
citation before the macro begins. That way, the macro is outside of the
citation construct. This construction is however unfortunate when I want
to cite multiple sources and have the macro used on an earlier one,
e.g.:

    [fn:1] [cite/default/bare:@foo p. 5], countering {{{name(Doe’s)}}} argument; [cite/default/bare:@bar p. 37].

It would be nicer if I could just write into the main text

    [cite:@foo p. 5, countering {{{name(Doe’s)}}} argument;@bar p. 37]

I can however live with the more elaborate construction, if nothing
else.

  -quintus

--
Dipl.-Jur. M. Gülker | https://mg.guelker.eu | PGP: Siehe Webseite
Passau, Deutschland  | kontakt@guelker.eu    | O<
* Re: [PATCH][oc-csl] Improve reference parsing
From: András Simonyi @ 2023-01-19 9:35 UTC
To: András Simonyi, Ihor Radchenko, emacs-orgmode

Dear All,

On Thu, 19 Jan 2023 at 09:35, M. ‘quintus’ Gülker
<post+orgmodeml@guelker.eu> wrote:

> I am not sure this targets the usecase I am pursuing, which is to use
> macros to produce @@latex: escape constructs in order to have small-caps
> markup in the citation footnotes:
>
> #+MACRO: name @@latex:\textsc{$1}@@@@html:<span class="name">$1</span>@@
>
> If the macro resolves, but the @@latex construct does not, that would be
> problematic.

hopefully somebody more knowledgeable than me can comment on how
viable this is, but would a @@csl like export snippet construct help
with the problem?
In that case your macro could be along the lines of

#+MACRO: name @@csl:<span style="font-variant: small-caps">$1</span>@@

and -- assuming the custom export backend approach I proposed in the
patch -- we would only need to make sure that the inline @@csl export
snippets are exported as is by this "csl" backend.

best wishes,
András

* Re: [PATCH][oc-csl] Improve reference parsing
From: Ihor Radchenko @ 2023-01-19 9:59 UTC
To: András Simonyi; +Cc: emacs-orgmode

András Simonyi <andras.simonyi@gmail.com> writes:

> In that case your macro could be along the lines of
>
> #+MACRO: name @@csl:<span style="font-variant: small-caps">$1</span>@@
>
> and -- assuming the custom export backend approach I proposed in the
> patch -- we would only need to make sure that the inline @@csl export
> snippets are exported as is by this "csl" backend.

I think it could be a good option.
Especially if the macro also provides a good fallback for non-CSL
citation backends.

--
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>

* Re: [PATCH][oc-csl] Improve reference parsing
From: M. ‘quintus’ Gülker @ 2023-01-19 10:11 UTC
To: András Simonyi; +Cc: Ihor Radchenko, emacs-orgmode

On Thursday, 19 January 2023, András Simonyi wrote:
> hopefully somebody more knowledgeable than me can comment on how
> viable this is, but would a @@csl like export snippet construct help
> with the problem?
> In that case your macro could be along the lines of
>
> #+MACRO: name @@csl:<span style="font-variant: small-caps">$1</span>@@

It is an interesting approach, but it has a drawback. I use this macro
also in the ordinary text when I refer to persons without an explicit
citation. That is, the macro has to work both in a citation and in
normal text. Even if a @@csl: construct would be ignored in normal text,
I cannot see how to write the macro then, because something like

#+MACRO: name @@csl:<span style="font-variant: small-caps">$1</span>@@@@latex:\textsc{$1}@@@@html:<span class="name">$1</span>@@

would still transfer the @@latex: and @@html: constructs into the
footnote. They would have to be expressly ignored by the citation
processor.

-quintus

* Re: [PATCH][oc-csl] Improve reference parsing
From: András Simonyi @ 2023-01-25 22:44 UTC
To: András Simonyi, Ihor Radchenko, emacs-orgmode

Dear All,

On Thu, 19 Jan 2023 at 11:15, M. ‘quintus’ Gülker
<post+orgmodeml@guelker.eu> wrote:

> That is, the macro has to work both in a citation and in
> normal text. Even if a @@csl: construct would be ignored in normal text,
> I cannot see how to write the macro then, because something like
>
> #+MACRO: name @@csl:<span style="font-variant: small-caps">$1</span>@@@@latex:\textsc{$1}@@@@html:<span class="name">$1</span>@@
>
> would still transfer the @@latex: and @@html: constructs into the
> footnote. They would have to be expressly ignored by the citation
> processor.

If we take the approach I suggested the macro definition you suggested
should work correctly both for LaTeX and HTML export combined with the
CSL citation processor, because in the case of citation locators and
affixes Citeproc would receive only the output produced by the planned
CSL ox backend, which would remove the non-CSL export snippets and
keep only the content of the csl snippet. Citeproc would parse the
produced <span class="name">text</span> into the appropriate
small-caps CSL representation and then format the citation with
small-caps using the Citeproc formatter corresponding to the export
format.

best wishes,
András

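The snippet handling described in the message above can be sketched in
Emacs Lisp, in the style of the `org-cite-csl--export-backend' constant
from the original patch. This is only an illustration under assumptions:
the name `my/csl-affix-backend' is hypothetical, nothing like the
`export-snippet' transcoder below is part of the posted patch, and it
presumes that export snippets are actually parsed inside citation
affixes, which the thread has not settled.

```elisp
;; Sketch only: `my/csl-affix-backend' is a hypothetical name and is not
;; part of oc-csl.el or of the patch under discussion.
(require 'ox)

(defconst my/csl-affix-backend
  (org-export-create-backend
   :transcoders
   '((plain-text . (lambda (text _info) text))
     (bold . (lambda (_bold contents _info)
               (concat "<b>" contents "</b>")))
     (italic . (lambda (_italic contents _info)
                 (concat "<i>" contents "</i>")))
     ;; Keep the body of @@csl:...@@ snippets verbatim and drop snippets
     ;; aimed at any other back-end (@@latex:...@@, @@html:...@@, ...).
     (export-snippet . (lambda (snippet _contents _info)
                         (if (eq (org-export-snippet-backend snippet) 'csl)
                             (org-element-property :value snippet)
                           "")))))
  "Hypothetical back-end for exporting citation affixes and locators.")
```

An affix would then be exported with something like
(org-export-data-with-backend prefix my/csl-affix-backend info), where
`prefix' stands for the parsed affix objects and `info' for the export
state, as in the patch. The macro from the previous message would in
that case contribute only its @@csl: part to what Citeproc receives.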
* Re: [PATCH][oc-csl] Improve reference parsing
From: Ihor Radchenko @ 2023-01-19 9:56 UTC
To: András Simonyi; +Cc: M. ‘quintus’ Gülker, emacs-orgmode

András Simonyi <andras.simonyi@gmail.com> writes:

> As for the question of other elements, I proposed the custom
> backend-based approach because CSL has its own rich-text markup (which
> is actually not simply a subset of Org's, for example, it contains
> small-caps, which is not in Org), and, consequently, Citeproc-el has
> its own internal rich-text representations (ASTs), on which it
> performs the operations that are prescribed by the various CSL styles.
> When the rich text citation/bibliography is finalized, it can be
> "serialized" or "formatted" (analogously to Org's exporting a parse
> tree) using one of the Citeproc formatters, e.g. into LaTeX, HTML or
> Org. As the prefix, suffix and the locator also need to be operated on
> by the processor (concatenated to other rich text elements etc.,),
> they also have to be parsed into Citeproc-el's internal rich-text
> representations. Since this is a given, the only question is in what
> format should they be passed, and the simple HTML-like standard which
> is already supported by Citeproc-el (see
> https://www.zotero.org/support/kb/rich_text_bibliography) seems to be
> the simplest solution.

So, do I understand correctly that italics, bold, subscript,
superscript, small-caps, and nocase must be passed to the CSL processor
in a format understood by CSL? Everything else could just be left in Org
and later exported according to actual export settings?

> Ihor Radchenko <yantar92@posteo.net> wrote:
>> Could you please explain in more details why CSL require special
>> export of the prefix/suffix? What will happen if we simply pass the Org
>> markup verbatim?
>
> Since Citeproc-el assumes that all formatting in the prefix/suffix is
> in the HTML-like markup mentioned above, any Org markup would be
> treated as plain text which should be preserved as is, and not
> interpreted as formatting, so, for example, when an Org document with
> underlined text in a citation prefix were exported to LaTeX then the
> Citeproc LaTeX formatter would escape the underscore characters ("\_")
> to preserve them in the output and the citation would be inserted in
> this form into the resulting LaTeX document.

What if we pass Org constructs as verbatim html? That way, LaTeX
formatter should not alter the text.

>> I am asking because org-cite-csl-render-citation uses
>> org-cite-parse-objects so, unless citeproc does something terrible with
>> the original Org syntax, we can re-parse the output string and export
>> appropriately according to the current export backend.
>
> See above, unfortunately, this wouldn't work, at least not in a
> general and safe way.

May we:
1. Convert the Org markup supported by CSL into CSL-understood HTML
   format
2. Convert all other Org markup into verbatim
3. Convert back non-verbatim markup altered by CSL into Org
4. Perform exporting Org->current export backend as usual.

(In the worst case scenario, we might replace non-convertible Org markup
constructs into dummy text and later replace the dummies back into
original Org markup)

WDYT?

Also, small-caps and nocase are currently not supported by Org. Maybe it
would make sense to document how to pass these constructs to CSL
properly.

* Re: [PATCH][oc-csl] Improve reference parsing
From: Ihor Radchenko @ 2023-01-26 9:43 UTC
To: András Simonyi; +Cc: emacs-orgmode

[ Adding Org ML back to CC ]

András Simonyi <andras.simonyi@gmail.com> writes:

> On Thu, 19 Jan 2023 at 10:56, Ihor Radchenko <yantar92@posteo.net> wrote:
>> So, do I understand correctly that italics, bold, subscript,
>> superscript, small-caps, and nocase must be passed to the CSL processor
>> in a format understood by CSL? Everything else could just be left in Org
>> and later exported according to actual export settings?
>
> Unfortunately, the situation is a bit different -- the solution I see
> the most viable is to export the affixes and the locator to a form in
> which
> the markup elements you listed (plus links, which I haven't mentioned
> but are also supported) are in the required CSL
> input form, but the rest is in plain text.

What do you mean by plain text? Plain text as in Org or plain text as in
ASCII export?

> Anything else would be way
> more complicated to handle in Citeproc and I don't
> really see the benefits/use-case either (we are talking about elements
> within a citation). This would "only" require a custom backend
> exporting CSL-supported elements in the html-like CSL format and
> everything else which is allowed by the Org syntax
> in plain text.

I am not much concerned about CSL format itself. I am concerned about
the formatted citation returned back to Org by CSL.

Consider the following:

[cite:Prefix *bold* +strike-through+ @key]

It will be interpreted by Org export as

(citation
 (:style nil ...)
 (citation-reference
  (:key "key"... :prefix
   ("Prefix "
    (bold (... :post-blank 1 ...) "bold")
    (strike-through (... :post-blank 1 ...) "strike-through")))))

Now, consider that the user has a custom export filter that decorates
"+strike-through+" like "!!strike-through!!" upon export.
If we pass the original citation to the CSL, will the export filter be
applied?

Also, what if user decorates a CSL locator with Org markup like
strike-through?

>> May we:
>> 1. Convert the Org markup supported by CSL into CSL-understood HTML
>> format
>> 2. Convert all other Org markup into verbatim
>
> I'm not sure what you mean by verbatim -- leaving it as Org markup?

Whatever prevents CSL from altering the text. (Like escaping "_" you
mentioned earlier)

>> 3. Convert back non-verbatim markup altered by CSL into Org
>> 4. Perform exporting Org->current export backend as usual.
>
> If verbatim is Org then step 3 could be rather complicated, we'd need
> to identify
> the Org fragments in citeproc's HTML and LaTeX output when those
> backends are used.

But can't CSL output in Org format? Isn't the whole CSL thing supposed
to work for arbitrary export backend, not just HTML and LaTeX?

> Also I'd worry that the result would not pass through Citeproc's
> post-processing steps --
> there is now a user-customizable hook variable for citation post-processing
> which acts on the internal representations.

I envision the conversion back to Org to happen after _all_ the
Citeproc's processing, be it user-customized or not.

> All in all I'd first concentrate on the use-case: is there anything
> important left out
> if we go with simply using a custom backend to export the CSL-supported markup
> in CSL input format and everything else as plain text, then do what we
> do now, namely
> either simply insert the Citeproc-formatted output into the exported document
> without any post-processing (currently this is for LaTeX and HTML), or
> parse and export
> with Org when the Org Citeproc formatter is used (currently for all
> other formats),

What I imagine is doing "parse and export with Org" all the time,
including HTML and LaTeX export.
