emacs-orgmode@gnu.org archives
* [PATCH][oc-csl] Improve reference parsing
@ 2022-10-26 15:40 András Simonyi
  2022-10-27  4:10 ` Ihor Radchenko
  0 siblings, 1 reply; 16+ messages in thread
From: András Simonyi @ 2022-10-26 15:40 UTC (permalink / raw)
  To: emacs-orgmode list

[-- Attachment #1: Type: text/plain, Size: 268 bytes --]

Dear All,

the attached patch improves the parsing and exporting of cite
prefixes, suffixes and locators -- the most noticeable change is
probably the support for formatted locators and for underlining in
general.  Comments are welcome.

best wishes,
András

[-- Attachment #2: 0001-oc-csl.el-Improve-reference-parsing.patch --]
[-- Type: text/x-patch, Size: 5233 bytes --]

From 5bec7025f66eb65f13a701dc616aca2440110c1a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Andr=C3=A1s=20Simonyi?= <andras.simonyi@gmail.com>
Date: Wed, 26 Oct 2022 12:15:42 +0200
Subject: [PATCH] oc-csl.el: Improve reference parsing

* lisp/oc-csl.el (org-cite-csl--export-backend): New constant to
provide a trivial export back-end for exporting reference affixes and
locators with the simple html-based markup expected by citeproc.
(org-cite-csl--parse-reference): Do not construct the reference
location and include it in the result, since citeproc does not make use
of it.  Start the suffix immediately after the locator's ending,
skipping the ending comma if necessary.  Use
`org-cite-csl--export-backend' to export reference affixes and
locators.
---
 lisp/oc-csl.el | 45 ++++++++++++++++++++++++++++-----------------
 1 file changed, 28 insertions(+), 17 deletions(-)

diff --git a/lisp/oc-csl.el b/lisp/oc-csl.el
index 1ccb74e92..30eac9f1a 100644
--- a/lisp/oc-csl.el
+++ b/lisp/oc-csl.el
@@ -140,9 +140,10 @@
 (declare-function org-element-property "org-element" (property element))
 (declare-function org-element-put-property "org-element" (element property value))
 
-(declare-function org-export-data "org-export" (data info))
+(declare-function org-export-data-with-backend "org-export" (data backend info))
 (declare-function org-export-derived-backend-p "org-export" (backend &rest backends))
 (declare-function org-export-get-footnote-number "org-export" (footnote info &optional data body-first))
+(declare-function org-export-create-backend "org-export" (&key transcoders))
 
 \f
 ;;; Customization
@@ -310,6 +311,16 @@ If nil then the Chicago author-date style is used as a fallback.")
   "Regexp matching a label in a citation reference suffix.
 Label is in match group 1.")
 
+(defconst org-cite-csl--export-backend
+  (org-export-create-backend
+   :transcoders
+   '((plain-text . (lambda (text _info) text))
+     (bold . (lambda (_bold contents _info) (concat "<b>" contents "</b>")))
+     (italic . (lambda (_italic contents _info) (concat "<i>" contents "</i>")))
+     (underline . (lambda (_underline contents _info)
+		    (concat "<span class=\"underline\">" contents "</span>")))))
+  "Custom backend for exporting citation affixes and locators.")
+
 \f
 ;;; Internal functions
 (defun org-cite-csl--barf-without-citeproc ()
@@ -476,11 +487,10 @@ property in INFO."
 INFO is the export state, as a property list.
 
 The result is a association list.  Keys are: `id', `prefix',`suffix',
-`location', `locator' and `label'."
-  (let (label location-start locator-start location locator prefix suffix)
+`locator' and `label'."
+  (let (label location-start locator-start locator prefix suffix)
     ;; Parse suffix.  Insert it in a temporary buffer to find
-    ;; different parts: pre-label, label, locator, location (label +
-    ;; locator), and suffix.
+    ;; different parts: pre-label, label, locator, and suffix.
     (with-temp-buffer
       (save-excursion
         (insert (org-element-interpret-data
@@ -506,12 +516,15 @@ The result is a association list.  Keys are: `id', `prefix',`suffix',
         (let ((re (rx (or "," (group digit)))))
           (when (re-search-backward re location-start t)
             (goto-char (or (match-end 1) (match-beginning 0)))
-            (setq location (buffer-substring location-start (point)))
-            (setq locator (org-trim (buffer-substring locator-start (point))))
+            (setq locator
+                  (org-cite-parse-objects
+                   (buffer-substring locator-start (point))
+                   t))
             ;; Skip comma in suffix.
+            (when (= (following-char) ?,) (forward-char))
             (setq suffix
                   (org-cite-parse-objects
-                   (buffer-substring (match-end 0) (point-max))
+                   (buffer-substring (point) (point-max))
                    t)))))
       (setq prefix
             (org-cite-concat
@@ -525,18 +538,16 @@ The result is a association list.  Keys are: `id', `prefix',`suffix',
            (lambda (data)
              (org-string-nw-p
               (org-trim
-               ;; When Citeproc exports to Org syntax, avoid mix and
-               ;; matching output formats by also generating Org
-               ;; syntax for prefix and suffix.
-               (if (eq 'org (org-cite-csl--output-format info))
-                   (org-element-interpret-data data)
-                 (org-export-data data info)))))))
+               ;; Export the parsed prefix, suffix, and locator  
+               ;; with a custom backend, which produces the simple
+               ;; html markup expected by citeproc.
+               (org-export-data-with-backend
+                data org-cite-csl--export-backend info))))))
       `((id . ,(org-element-property :key reference))
         (prefix . ,(funcall export prefix))
         (suffix . ,(funcall export suffix))
-        (locator . ,locator)
-        (label . ,label)
-        (location . ,location)))))
+        (locator . ,(funcall export locator))
+        (label . ,label)))))
 
 (defun org-cite-csl--create-structure (citation info)
   "Create Citeproc structure for CITATION object.
-- 
2.25.1



* Re: [PATCH][oc-csl] Improve reference parsing
  2022-10-26 15:40 [PATCH][oc-csl] Improve reference parsing András Simonyi
@ 2022-10-27  4:10 ` Ihor Radchenko
  2022-11-01 15:02   ` András Simonyi
  0 siblings, 1 reply; 16+ messages in thread
From: Ihor Radchenko @ 2022-10-27  4:10 UTC (permalink / raw)
  To: András Simonyi; +Cc: emacs-orgmode list

András Simonyi <andras.simonyi@gmail.com> writes:

> the attached patch improves the parsing and exporting of cite
> prefixes, suffixes and locators -- the most noticeable change is
> probably the support for formatted locators and for underlining in
> general.  Comments are welcome.

Thanks!

> +(defconst org-cite-csl--export-backend
> +  (org-export-create-backend
> +   :transcoders
> +   '((plain-text . (lambda (text _info) text))
> +     (bold . (lambda (_bold contents _info) (concat "<b>" contents "</b>")))
> +     (italic . (lambda (_italic contents _info) (concat "<i>" contents "</i>")))
> +     (underline . (lambda (_underline contents _info)
> +		    (concat "<span class=\"underline\">" contents "</span>")))))
> +  "Custom backend for exporting citation affixes and locators.")

This will render e.g. strike-through empty.
Note that citation references may contain the following Org markup objects:
'(bold code entity italic
  latex-fragment strike-through subscript
  superscript underline verbatim)

And we may add more, as discussed in
https://orgmode.org/list/87k04xhhw3.fsf@localhost

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>



* Re: [PATCH][oc-csl] Improve reference parsing
  2022-10-27  4:10 ` Ihor Radchenko
@ 2022-11-01 15:02   ` András Simonyi
  2022-11-02  6:29     ` Ihor Radchenko
  0 siblings, 1 reply; 16+ messages in thread
From: András Simonyi @ 2022-11-01 15:02 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: emacs-orgmode list

[-- Attachment #1: Type: text/plain, Size: 1604 bytes --]

Dear All,

On Thu, 27 Oct 2022 at 06:10, Ihor Radchenko <yantar92@posteo.net> wrote:
> This will render e.g. strike-through empty.
> Note that citation references may contain the following Org markup objects:
> '(bold code entity italic
>   latex-fragment strike-through subscript
>   superscript underline verbatim)

Thanks for pointing out the problem!  I've attached a new version of
the patch, in which the custom export backend has a (in many cases
trivial) transcoder for all currently allowed objects.

> And we may add more, as discussed in
> https://orgmode.org/list/87k04xhhw3.fsf@localhost

I don't think that it would make much sense to add a lot more, with
the possible exception of links, since citations are at most
sentence-sized textual units, not to mention the possible
complications arising for the existing export processors. (What type
of objects could the various LaTeX-based exporters support without
complex changes?)  Since CSL has only a few types of formatting
attributes (font-style, font-variant, font-weight, text-decoration and
vertical-align), if the set of allowed objects is radically expanded
then it will probably be more reasonable to define a derived backend,
maybe based on the ascii exporter, but I feel that the current set
doesn't require this solution.

thanks & best wishes,
András


[-- Attachment #2: 0001-oc-csl.el-Improve-reference-parsing.patch --]
[-- Type: text/x-patch, Size: 6019 bytes --]

From 5dfbb8ef9291f906014800013cdb9a9d5569b728 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Andr=C3=A1s=20Simonyi?= <andras.simonyi@gmail.com>
Date: Wed, 26 Oct 2022 12:15:42 +0200
Subject: [PATCH] oc-csl.el: Improve reference parsing

* lisp/oc-csl.el (org-cite-csl--export-backend): New constant to
provide a trivial export back-end for exporting reference affixes and
locators with the simple html-based markup expected by citeproc.
(org-cite-csl--parse-reference): Do not construct the reference
location and include it in the result, since citeproc does not make use
of it.  Start the suffix immediately after the locator's ending,
skipping the ending comma if necessary.  Use
`org-cite-csl--export-backend' to export reference affixes and
locators.
---
 lisp/oc-csl.el | 57 +++++++++++++++++++++++++++++++++++---------------
 1 file changed, 40 insertions(+), 17 deletions(-)

diff --git a/lisp/oc-csl.el b/lisp/oc-csl.el
index 1ccb74e92..1f40a9e8a 100644
--- a/lisp/oc-csl.el
+++ b/lisp/oc-csl.el
@@ -140,9 +140,10 @@
 (declare-function org-element-property "org-element" (property element))
 (declare-function org-element-put-property "org-element" (element property value))
 
-(declare-function org-export-data "org-export" (data info))
+(declare-function org-export-data-with-backend "org-export" (data backend info))
 (declare-function org-export-derived-backend-p "org-export" (backend &rest backends))
 (declare-function org-export-get-footnote-number "org-export" (footnote info &optional data body-first))
+(declare-function org-export-create-backend "org-export" (&key transcoders))
 
 \f
 ;;; Customization
@@ -310,8 +311,30 @@ If nil then the Chicago author-date style is used as a fallback.")
   "Regexp matching a label in a citation reference suffix.
 Label is in match group 1.")
 
+(defconst org-cite-csl--export-backend
+  (org-export-create-backend 
+   :transcoders
+   '((bold . (lambda (_bold contents _info) (format "<b>%s</b>" contents)))
+     (code . org-cite-csl--element-value)
+     (entity . (lambda (entity _contents _info)
+                 (format "\\%s" (org-element-property :name entity))))
+     (italic . (lambda (_italic contents _info) (format "<i>%s</i>" contents)))
+     (latex-fragment . org-cite-csl--element-value)
+     (plain-text . (lambda (contents _info) contents))
+     (strike-through . (lambda (_strike-through contents _info) contents))
+     (subscript . (lambda (_subscript contents _info) (format "<sub>%s</sub>" contents)))
+     (superscript . (lambda (_superscript contents _info) (format "<sup>%s</sup>" contents)))
+     (underline . (lambda (_underline contents _info)
+                    (format "<span class=\"underline\">%s</span>" contents)))
+     (verbatim . org-cite-csl--element-value)))
+  "Custom backend for exporting citation affixes and locators.")
+
 \f
 ;;; Internal functions
+(defun org-cite-csl--element-value (element _contents _info)
+  "Return the `:value' property of ELEMENT."
+  (org-element-property :value element))
+
 (defun org-cite-csl--barf-without-citeproc ()
   "Raise an error if Citeproc library is not loaded."
   (unless (featurep 'citeproc)
@@ -476,11 +499,10 @@ property in INFO."
 INFO is the export state, as a property list.
 
 The result is a association list.  Keys are: `id', `prefix',`suffix',
-`location', `locator' and `label'."
-  (let (label location-start locator-start location locator prefix suffix)
+`locator' and `label'."
+  (let (label location-start locator-start locator prefix suffix)
     ;; Parse suffix.  Insert it in a temporary buffer to find
-    ;; different parts: pre-label, label, locator, location (label +
-    ;; locator), and suffix.
+    ;; different parts: pre-label, label, locator, and suffix.
     (with-temp-buffer
       (save-excursion
         (insert (org-element-interpret-data
@@ -506,12 +528,15 @@ The result is a association list.  Keys are: `id', `prefix',`suffix',
         (let ((re (rx (or "," (group digit)))))
           (when (re-search-backward re location-start t)
             (goto-char (or (match-end 1) (match-beginning 0)))
-            (setq location (buffer-substring location-start (point)))
-            (setq locator (org-trim (buffer-substring locator-start (point))))
+            (setq locator
+                  (org-cite-parse-objects
+                   (buffer-substring locator-start (point))
+                   t))
             ;; Skip comma in suffix.
+            (when (= (following-char) ?,) (forward-char))
             (setq suffix
                   (org-cite-parse-objects
-                   (buffer-substring (match-end 0) (point-max))
+                   (buffer-substring (point) (point-max))
                    t)))))
       (setq prefix
             (org-cite-concat
@@ -525,18 +550,16 @@ The result is a association list.  Keys are: `id', `prefix',`suffix',
            (lambda (data)
              (org-string-nw-p
               (org-trim
-               ;; When Citeproc exports to Org syntax, avoid mix and
-               ;; matching output formats by also generating Org
-               ;; syntax for prefix and suffix.
-               (if (eq 'org (org-cite-csl--output-format info))
-                   (org-element-interpret-data data)
-                 (org-export-data data info)))))))
+               ;; Export the parsed prefix, suffix, and locator  
+               ;; with a custom backend that produces the simple
+               ;; html markup expected by citeproc.
+               (org-export-data-with-backend
+                data org-cite-csl--export-backend info))))))
       `((id . ,(org-element-property :key reference))
         (prefix . ,(funcall export prefix))
         (suffix . ,(funcall export suffix))
-        (locator . ,locator)
-        (label . ,label)
-        (location . ,location)))))
+        (locator . ,(funcall export locator))
+        (label . ,label)))))
 
 (defun org-cite-csl--create-structure (citation info)
   "Create Citeproc structure for CITATION object.
-- 
2.25.1



* Re: [PATCH][oc-csl] Improve reference parsing
  2022-11-01 15:02   ` András Simonyi
@ 2022-11-02  6:29     ` Ihor Radchenko
  2022-11-02 17:58       ` András Simonyi
  0 siblings, 1 reply; 16+ messages in thread
From: Ihor Radchenko @ 2022-11-02  6:29 UTC (permalink / raw)
  To: András Simonyi; +Cc: emacs-orgmode list

András Simonyi <andras.simonyi@gmail.com> writes:

> thanks for pointing out the problem!! I've attached a new version of
> the patch, in which the custom exporter backend has an (in many cases
> trivial) translator for all currently allowed objects.

Thanks!

>> And we may add more, as discussed in
>> https://orgmode.org/list/87k04xhhw3.fsf@localhost
>
> I don't think that it would make much sense to add a lot more, with
> the possible exception of links, since citations are at most
> sentence-sized textual units, not to mention the possible
> complications arising for the existing export processors. (What type
> of objects could the various LaTeX-based exporters support without
> complex changes?)  Since CSL has only a few types of formatting
> attributes (font-style, font-variant, font-weight, text-decoration and
> vertical-align), if the set of allowed objects is radically expanded
> then it will probably be more reasonable to define a derived backend,
> maybe based on the ascii exporter, but I feel that the current set
> doesn't require this solution.

I do not think that CSL limitations are really limiting us.

- Allowing macros will be handled by ox.el itself automatically
- Export snippets can also be processed without much issue (consider
  direct LaTeX code)
- inline-babel-call and inline src blocks may be useful with :exports
  results when some auto-generation of text is needed. They will also be
  handled automatically by ob-exp.
- latex-fragments are either equivalent to direct LaTeX or to inserting
  an image
- timestamps could be exported as text, although I do not see any
  obvious utility of timestamps inside references.

However, oc-csl should not ignore the export processor if it is to
support all of the above. I am not sure why you need a dedicated export
processor instead of passing the string to the current processor (or a
derivative).

If you really need to mark certain constructs specially for CSL, you can
create a derived export backend for the current backend and replace the
transcoders for the object types that must be treated specially.
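
A minimal sketch of the derived-backend idea, assuming an anonymous
backend whose parent is an existing export backend.  The names
`my/csl-underline' and `my/csl-derived-backend' are hypothetical and
not part of oc-csl; only `org-export-create-backend' and its `:parent'
and `:transcoders' keywords come from ox.el:

```emacs-lisp
(require 'ox)
(require 'ox-html)

;; Hypothetical transcoder: always emit the HTML-like underline markup
;; used in the patch under review, whatever the parent backend does.
(defun my/csl-underline (_underline contents _info)
  "Wrap CONTENTS in the underline span used for CSL rich text."
  (format "<span class=\"underline\">%s</span>" contents))

;; Hypothetical helper: build an anonymous backend that inherits every
;; transcoder from PARENT and overrides only `underline'.
(defun my/csl-derived-backend (parent)
  "Return a backend derived from PARENT, a symbol such as `html'."
  (org-export-create-backend
   :parent parent
   :transcoders '((underline . my/csl-underline))))
```

Such a backend could then be passed to `org-export-data-with-backend'
in place of a standalone one; whether its output is acceptable to
citeproc is a separate question.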

> +(defconst org-cite-csl--export-backend
> +  (org-export-create-backend 
> +   :transcoders
> +   '((bold . (lambda (_bold contents _info) (format "<b>%s</b>" contents)))
> +     (code . org-cite-csl--element-value)
> +     (entity . (lambda (entity _contents _info)
> +                 (format "\\%s" (org-element-property :name entity))))

Why :name, but not :html?




* Re: [PATCH][oc-csl] Improve reference parsing
  2022-11-02  6:29     ` Ihor Radchenko
@ 2022-11-02 17:58       ` András Simonyi
  2022-11-03  6:34         ` Ihor Radchenko
  0 siblings, 1 reply; 16+ messages in thread
From: András Simonyi @ 2022-11-02 17:58 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: emacs-orgmode list

Dear All,

On Wed, 2 Nov 2022 at 07:28, Ihor Radchenko <yantar92@posteo.net> wrote:

> I do not think that CSL limitations are really limiting us.
>
> - Allowing macros will be handled by ox.el itself automatically
> - Export snippets can also be processed without much issue (consider
>   direct LaTeX code)
> - inline-babel-call and inline src blocks may be useful with :exports
>   results when some auto-generation of text is needed. They will also be
>   handled automatically by ob-exp.
> - latex-fragments are either equivalent to direct LaTeX or to inserting
>   an image
> - timestamps could be exported as text, although I do not see any
>   obvious utility of timestamps inside references.

I'm not really familiar with the internals of the Org exporter but,
looking at the ox.el code, macros and babel calls are processed and
resolved before processing citations, so they seemingly have no
bearing on the org-cite-csl--parse-reference function my patch is
concerned with.

> However, oc-csl should not ignore the export processor to support all
> the above. I am not sure why you need a dedicated export processor
> instead of passing the string to current processor (or derivative)
> instead.

> If you really need to mark certain constructs specially for CSL, you can
> create a derived export backend for the current backend and replace the
> transcoders for the object types that must be treated specially.

For objects other than macros and babel calls (e.g., timestamps, LaTeX
fragments, etc.), the problem is that citeproc-el expects and needs the
affixes and locator to be passed in the very limited HTML-like markup
supported by CSL (see
https://www.zotero.org/support/kb/rich_text_bibliography for a
rudimentary description), and, crucially, the assumption is that
everything else is plain text, which, if necessary, will be escaped
according to the target format; e.g., '$' signs are escaped by
citeproc-el's own LaTeX formatter. The reason for this limitation is
that the affixes and especially the locator have to be parsed into
citeproc-el's internal rich-text representation for further processing
according to the used CSL style. (Affixes are only concatenated to
other elements but locators can be the subject of any type of
formatting.)  As a consequence, I think the only real alternatives are
using a custom backend as I do in the current patch or a backend
derived from the plain text Org exporter -- I don't have a strong
preference as to which solution we choose, just went with the
seemingly more minimalist option. (The proper way of dealing with
LaTeX fragments in this context, in particular with LaTeX math
fragments, would be to support those in citeproc-el's internal
representation and markup, which is planned but not implemented yet.)
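
The conversion described above can be sketched roughly as follows.
This is an illustration only, assuming a reduced set of objects (bold
and italic); all `my/...' names are hypothetical and the code is not
the patch itself:

```emacs-lisp
(require 'ox)

;; Hypothetical transcoders producing the HTML-like markup CSL expects.
(defun my/csl-plain-text (text _info) text)
(defun my/csl-italic (_italic contents _info)
  (format "<i>%s</i>" contents))
(defun my/csl-bold (_bold contents _info)
  (format "<b>%s</b>" contents))

;; Standalone backend: objects without a transcoder export as nothing.
(defconst my/csl-toy-backend
  (org-export-create-backend
   :transcoders '((plain-text . my/csl-plain-text)
                  (italic . my/csl-italic)
                  (bold . my/csl-bold))))

(defun my/affix-to-csl-markup (string)
  "Parse STRING as Org objects and export it with `my/csl-toy-backend'."
  (org-export-data-with-backend
   ;; Only bold and italic are recognized in this sketch.
   (org-element-parse-secondary-string string '(bold italic))
   my/csl-toy-backend
   nil))
```

Calling `(my/affix-to-csl-markup "see /ibid./ and *42*")' should give a
string in which the italic and bold parts are wrapped in <i> and <b>
tags, while everything else is passed through as plain text.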

> > +(defconst org-cite-csl--export-backend
> > +  (org-export-create-backend
> > +   :transcoders
> > +   '((bold . (lambda (_bold contents _info) (format "<b>%s</b>" contents)))
> > +     (code . org-cite-csl--element-value)
> > +     (entity . (lambda (entity _contents _info)
> > +                 (format "\\%s" (org-element-property :name entity))))
>
> Why :name, but not :html?

Good point. Thinking about it a bit more, :utf-8 would probably be a
slightly better solution (in keeping with citeproc-el's 'plain text'
requirement); I will change this once we have sorted out the
other details.

best wishes,
András



* Re: [PATCH][oc-csl] Improve reference parsing
  2022-11-02 17:58       ` András Simonyi
@ 2022-11-03  6:34         ` Ihor Radchenko
  2023-01-07 12:50           ` M. ‘quintus’ Gülker
  0 siblings, 1 reply; 16+ messages in thread
From: Ihor Radchenko @ 2022-11-03  6:34 UTC (permalink / raw)
  To: András Simonyi; +Cc: emacs-orgmode list

András Simonyi <andras.simonyi@gmail.com> writes:

> On Wed, 2 Nov 2022 at 07:28, Ihor Radchenko <yantar92@posteo.net> wrote:
>
>> I do not think that CSL limitations are really limiting us.
> ...
> I'm not really familiar with the internals of the Org exporter but,
> looking at the ox.el code, macros and babel calls are processed and
> resolved before processing citations, so they seemingly have no
> bearing on the org-cite-csl--parse-reference function my patch is
> concerned with.

> Other than macros and babel calls, e.g., timestamps, LaTeX  fragments
> etc. the problem is that citeproc-el expects and needs the affixes and
> locator to be passed in the very limited html-like markup supported by
> CSL (see https://www.zotero.org/support/kb/rich_text_bibliography for
> a rudimentary description), and, crucially, the assumption is that
> everything else is plain text, which, if necessary, will be escaped
> according to the target format, i.e., '$' signs are escaped by
> citeproc-el's own LaTeX formatter. The reason for this limitation is
> that the affixes and especially the locator have to be parsed into
> citeproc-el's internal rich-text representation for further processing
> according to the used CSL style. (Affixes are only concatenated to
> other elements but locators can be the subject of any type of
> formatting.)  As a consequence, I think the only real alternatives are
> using a custom backend as I do in the current patch or a backend
> derived from the plain text Org exporter -- I don't have a strong
> preference as to which solution we choose, just went with the
> seemingly more minimalist option. (The proper way of dealing with
> LaTeX fragments in this context, in particular with LaTeX math
> fragments, would be to support those in citeproc-el's internal
> representation and markup, which is planned but not implemented yet.)

Could you please explain in more detail why CSL requires special
export of the prefix/suffix? What will happen if we simply pass the Org
markup verbatim?

I am asking because org-cite-csl-render-citation uses
org-cite-parse-objects so, unless citeproc does something terrible with
the original Org syntax, we can re-parse the output string and export
appropriately according to the current export backend.




* Re: [PATCH][oc-csl] Improve reference parsing
  2022-11-03  6:34         ` Ihor Radchenko
@ 2023-01-07 12:50           ` M. ‘quintus’ Gülker
  2023-01-15  8:56             ` Ihor Radchenko
  0 siblings, 1 reply; 16+ messages in thread
From: M. ‘quintus’ Gülker @ 2023-01-07 12:50 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: András Simonyi, emacs-orgmode

Dear all,

I probably have not much to contribute to this rather technical thread,
but Ihor has redirected me here two times for my citation formatting
questions[1][2]. So I would like to ask if there is something I can do to
accelerate its inclusion into org so that I can start using macros in
citations?

  -quintus

[1]: https://list.orgmode.org/orgmode/87o7tb8pc1.fsf@localhost/

[2]: https://list.orgmode.org/orgmode/87zgcw8gtd.fsf@localhost/

-- 
Dipl.-Jur. M. Gülker | https://mg.guelker.eu | PGP: Siehe Webseite
Passau, Deutschland  | kontakt@guelker.eu    | O<



* Re: [PATCH][oc-csl] Improve reference parsing
  2023-01-07 12:50           ` M. ‘quintus’ Gülker
@ 2023-01-15  8:56             ` Ihor Radchenko
  2023-01-18 23:08               ` András Simonyi
  0 siblings, 1 reply; 16+ messages in thread
From: Ihor Radchenko @ 2023-01-15  8:56 UTC (permalink / raw)
  To: M. ‘quintus’ Gülker; +Cc: András Simonyi, emacs-orgmode

M. ‘quintus’ Gülker <post+orgmodeml@guelker.eu> writes:

> I probably have not much to contribute to this rather technical thread,
> but Ihor has redirected me here two times for my citation formatting
> questions[1][2]. So I would like to ask if there is something I can do to
> accelerate its inclusion into org so that I can start using macros in
> citations?

András is the author of citeproc.el. I am not sure who else would be in
a position to help us move this forward.

My understanding of CSL is nonexistent. I can only tell that
citeproc.el has its own implementation of citation export
(`citeproc-render-citations'), which expects some limited kind of html
as input. I am hoping that we can somehow work around limited markup
support of citeproc's implementation and instead leverage ox.el to do
the job. Otherwise, we will keep stumbling upon citeproc.el limitations
when exporting bibliography items.




* Re: [PATCH][oc-csl] Improve reference parsing
  2023-01-15  8:56             ` Ihor Radchenko
@ 2023-01-18 23:08               ` András Simonyi
  2023-01-19  8:21                 ` M. ‘quintus’ Gülker
  2023-01-19  9:56                 ` Ihor Radchenko
  0 siblings, 2 replies; 16+ messages in thread
From: András Simonyi @ 2023-01-18 23:08 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: M. ‘quintus’ Gülker, emacs-orgmode

Dear All,

Apologies for replying so late. If I understand the situation
correctly, we could handle the question of allowing macros in
citations independently of the handling of other constructs, because
macros are resolved before processing citations, so they have no
effect on the input of Citeproc-el.  In light of this, maybe there
could be a separate patch for just allowing macros?

As for the question of other elements, I proposed the custom
backend-based approach because CSL has its own rich-text markup (which
is actually not simply a subset of Org's, for example, it contains
small-caps, which is not in Org), and, consequently, Citeproc-el has
its own internal rich-text representations (ASTs), on which it
performs the operations that are prescribed by the various CSL styles.
When the rich text citation/bibliography is finalized, it can be
"serialized" or "formatted" (analogously to Org's exporting a parse
tree) using one of the Citeproc formatters, e.g. into LaTeX, HTML or
Org. As the prefix, suffix and the locator also need to be operated on
by the processor (concatenated to other rich-text elements, etc.),
they also have to be parsed into Citeproc-el's internal rich-text
representations. Since this is a given, the only question is in what
format they should be passed, and the simple HTML-like standard which
is already supported by Citeproc-el (see
https://www.zotero.org/support/kb/rich_text_bibliography) seems to be
the simplest solution.

Ihor Radchenko <yantar92@posteo.net> wrote:
> Could you please explain in more details why CSL require special
> export of the prefix/suffix? What will happen if we simply pass the Org
> markup verbatim?

Since Citeproc-el assumes that all formatting in the prefix/suffix is
in the HTML-like markup mentioned above, any Org markup would be
treated as plain text which should be preserved as is, and not
interpreted as formatting.  For example, if an Org document with
underlined text in a citation prefix were exported to LaTeX, the
Citeproc LaTeX formatter would escape the underscore characters (as
"\_") to preserve them in the output, and the citation would be
inserted in this form into the resulting LaTeX document.
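
To illustrate the effect, here is a toy stand-in for such an escaping
step.  It is an assumption made for illustration only:
`my/toy-latex-escape' is a hypothetical name, and this is not
Citeproc-el's actual LaTeX formatter:

```emacs-lisp
;; Hypothetical helper: treat TEXT as plain text and protect the
;; underscores, as a LaTeX plain-text formatter has to.
(defun my/toy-latex-escape (text)
  "Backslash-escape every underscore in TEXT."
  (replace-regexp-in-string "_" "\\_" text t t))
```

With this, the Org string "_see_ Smith" comes out with both underscores
backslash-escaped, so LaTeX would print them literally instead of
underlining "see".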

> I am asking because org-cite-csl-render-citation uses
> org-cite-parse-objects so, unless citeproc does something terrible with
> the original Org syntax, we can re-parse the output string and export
> appropriately according to the current export backend.

See above, unfortunately, this wouldn't work, at least not in a
general and safe way.

best wishes,
András





* Re: [PATCH][oc-csl] Improve reference parsing
  2023-01-18 23:08               ` András Simonyi
@ 2023-01-19  8:21                 ` M. ‘quintus’ Gülker
  2023-01-19  9:35                   ` András Simonyi
  2023-01-19  9:56                 ` Ihor Radchenko
  1 sibling, 1 reply; 16+ messages in thread
From: M. ‘quintus’ Gülker @ 2023-01-19  8:21 UTC (permalink / raw)
  To: András Simonyi; +Cc: Ihor Radchenko, emacs-orgmode


Am Donnerstag, dem 19. Januar 2023 schrieb András Simonyi:
> apologies for replying so late. If I understand the situation
> correctly, we could handle the question of allowing macros in
> citations independently of the handling of other constructs, because
> macros are resolved before processing citations, so they have no
> effect on the input of Citeproc-el.  In light of this, maybe there
> could be a separate patch for just allowing macros?

I am not sure this targets the use case I am pursuing, which is to use
macros to produce @@latex: escape constructs in order to have small-caps
markup in the citation footnotes:

    #+MACRO: name @@latex:\textsc{$1}@@@@html:<span class="name">$1</span>@@

If the macro resolves, but the @@latex construct does not, that would be
problematic. That being said, I /found/ an alternative that works,
albeit it is a bit ugly. I can create an explicit footnote, use a
[cite/default/bare:] construct (to suppress the terminal period) within
it and terminate the citation before the macro begins. That way, the
macro is outside of the citation construct. This construction is however
unfortunate when I want to cite multiple sources and have the macro used
on an earlier one, e.g.:

    [fn:1] [cite/default/bare:@foo p. 5], countering {{{name(Doe’s)}}} argument; [cite/default/bare:@bar p. 37].

It would be nicer if I could just write into the main text

    [cite:@foo p. 5, countering {{{name(Doe’s)}}} argument;@bar p. 37]

I can however live with the more elaborate construction, if nothing
else.

  -quintus

-- 
Dipl.-Jur. M. Gülker | https://mg.guelker.eu | PGP: Siehe Webseite
Passau, Deutschland  | kontakt@guelker.eu    | O<



* Re: [PATCH][oc-csl] Improve reference parsing
  2023-01-19  8:21                 ` M. ‘quintus’ Gülker
@ 2023-01-19  9:35                   ` András Simonyi
  2023-01-19  9:59                     ` Ihor Radchenko
  2023-01-19 10:11                     ` M. ‘quintus’ Gülker
  0 siblings, 2 replies; 16+ messages in thread
From: András Simonyi @ 2023-01-19  9:35 UTC (permalink / raw)
  To: András Simonyi, Ihor Radchenko, emacs-orgmode

Dear All,

On Thu, 19 Jan 2023 at 09:35, M. ‘quintus’ Gülker
<post+orgmodeml@guelker.eu> wrote:

> I am not sure this targets the use case I am pursuing, which is to use
> macros to produce @@latex: escape constructs in order to have small-caps
> markup in the citation footnotes:
>
>     #+MACRO: name @@latex:\textsc{$1}@@@@html:<span class="name">$1</span>@@
>
> If the macro resolves, but the @@latex construct does not, that would be
> problematic.

hopefully somebody more knowledgeable than me can comment on how
viable this is, but would a @@csl like export snippet construct help
with the problem?
In that case your macro could be along the lines of

#+MACRO: name @@csl:<span style="font-variant: small-caps">$1</span>@@

and -- assuming the custom export backend approach I proposed in the
patch -- we would only need to make sure that the inline @@csl export
snippets are exported as is by this "csl"  backend.

best wishes,
András



* Re: [PATCH][oc-csl] Improve reference parsing
  2023-01-18 23:08               ` András Simonyi
  2023-01-19  8:21                 ` M. ‘quintus’ Gülker
@ 2023-01-19  9:56                 ` Ihor Radchenko
       [not found]                   ` <CAOWRwxD3pSfao7+G145naE=jaAz6=m2BjvUX0rj_c4r8qeu7rQ@mail.gmail.com>
  1 sibling, 1 reply; 16+ messages in thread
From: Ihor Radchenko @ 2023-01-19  9:56 UTC (permalink / raw)
  To: András Simonyi; +Cc: M. ‘quintus’ Gülker, emacs-orgmode

András Simonyi <andras.simonyi@gmail.com> writes:

> As for the question of other elements, I proposed the custom
> backend-based approach because CSL has its own rich-text markup (which
> is actually not simply a subset of Org's, for example, it contains
> small-caps, which is not in Org), and, consequently, Citeproc-el has
> its own internal rich-text representations (ASTs), on which it
> performs the operations that are prescribed by the various CSL styles.
> When the rich text citation/bibliography is finalized, it can be
> "serialized" or "formatted" (analogously to Org's exporting a parse
> tree) using one of the Citeproc formatters, e.g. into LaTeX, HTML or
> Org. As the prefix, suffix and the locator also need to be operated on
> by the processor (concatenated to other rich text elements etc.,),
> they also have to be parsed into Citeproc-el's internal rich-text
> representations. Since this is a given, the only question is in what
> format should they be passed, and the simple HTML-like standard which
> is already supported by Citeproc-el (see
> https://www.zotero.org/support/kb/rich_text_bibliography) seems to be
> the simplest solution.

So, do I understand correctly that italics, bold, subscript,
superscript, small-caps, and nocase must be passed to the CSL processor
in a format understood by CSL? Everything else could just be left in Org
and later exported according to actual export settings?

> Ihor Radchenko <yantar92@posteo.net> wrote:
>> Could you please explain in more detail why CSL requires special
>> export of the prefix/suffix? What will happen if we simply pass the Org
>> markup verbatim?
>
> Citeproc-el assumes that all formatting in the prefix/suffix uses the
> HTML-like markup mentioned above, so it treats any Org markup as plain
> text that must be preserved as is rather than interpreted as
> formatting.  For example, when an Org document with underlined text in
> a citation prefix is exported to LaTeX, the Citeproc LaTeX formatter
> escapes the underscore characters ("\_") to preserve them, and the
> citation is inserted in this form into the resulting LaTeX document.

What if we pass Org constructs as verbatim html? That way, LaTeX
formatter should not alter the text.

>> I am asking because org-cite-csl-render-citation uses
>> org-cite-parse-objects so, unless citeproc does something terrible with
>> the original Org syntax, we can re-parse the output string and export
>> appropriately according to the current export backend.
>
> See above, unfortunately, this wouldn't work, at least not in a
> general and safe way.

May we:
1. Convert the Org markup supported by CSL into CSL-understood HTML
format
2. Convert all other Org markup into verbatim
3. Convert back non-verbatim markup altered by CSL into Org
4. Perform exporting Org->current export backend as usual.

(In the worst case scenario, we might replace non-convertible Org markup
constructs with dummy text and later replace the dummies with the
original Org markup)
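
The dummy-text idea can be sketched at the string level as follows.
This is only an illustration under stated assumptions: all function
names are hypothetical, the "ORGDUMMYnX" token format is invented, and
the middle step merely stands in for whatever the CSL processor does.

```elisp
;;; -*- lexical-binding: t; -*-
;; String-level sketch; every name here is hypothetical.

(defun my/protect-markup (text regexp)
  "Replace each match of REGEXP in TEXT with a dummy token.
Return (PROTECTED-TEXT . TABLE), where TABLE maps tokens to originals."
  (let ((table nil)
        (count 0))
    (cons (replace-regexp-in-string
           regexp
           (lambda (match)
             (let ((token (format "ORGDUMMY%dX" count)))
               (push (cons token match) table)
               (setq count (1+ count))
               token))
           text t t)
          (nreverse table))))

(defun my/org-bold-to-csl-html (text)
  "Stand-in for step 1: turn Org *bold* in TEXT into <b>bold</b>."
  (replace-regexp-in-string "\\*\\([^*\n]+\\)\\*" "<b>\\1</b>" text t))

(defun my/restore-markup (text table)
  "Replace every dummy token in TEXT with its original string from TABLE."
  (dolist (entry table text)
    (setq text (replace-regexp-in-string
                (regexp-quote (car entry)) (cdr entry) text t t))))

;; Protect strike-through, convert bold, then restore the protected part.
(let* ((protected (my/protect-markup "Prefix *bold* +strike-through+ "
                                     "\\+[^+\n]+\\+"))
       (processed (my/org-bold-to-csl-html (car protected))))
  (my/restore-markup processed (cdr protected)))
;; => "Prefix <b>bold</b> +strike-through+ "
```

The sketch only works if the processor leaves the tokens untouched,
which is the open question for any real implementation.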

WDYT?

Also, small-caps and nocase are currently not supported by Org. Maybe it
would make sense to document how to pass these constructs to CSL
properly.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>



* Re: [PATCH][oc-csl] Improve reference parsing
  2023-01-19  9:35                   ` András Simonyi
@ 2023-01-19  9:59                     ` Ihor Radchenko
  2023-01-19 10:11                     ` M. ‘quintus’ Gülker
  1 sibling, 0 replies; 16+ messages in thread
From: Ihor Radchenko @ 2023-01-19  9:59 UTC (permalink / raw)
  To: András Simonyi; +Cc: emacs-orgmode

András Simonyi <andras.simonyi@gmail.com> writes:

> In that case your macro could be along the lines of
>
> #+MACRO: name @@csl:<span style="font-variant: small-caps">$1</span>@@
>
> and -- assuming the custom export backend approach I proposed in the
> patch -- we would only need to make sure that the inline @@csl export
> snippets are exported as is by this "csl"  backend.

I think it could be a good option. Especially if the macro also provides
a good fallback for non-CSL citation backends.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>



* Re: [PATCH][oc-csl] Improve reference parsing
  2023-01-19  9:35                   ` András Simonyi
  2023-01-19  9:59                     ` Ihor Radchenko
@ 2023-01-19 10:11                     ` M. ‘quintus’ Gülker
  2023-01-25 22:44                       ` András Simonyi
  1 sibling, 1 reply; 16+ messages in thread
From: M. ‘quintus’ Gülker @ 2023-01-19 10:11 UTC (permalink / raw)
  To: András Simonyi; +Cc: Ihor Radchenko, emacs-orgmode


Am Donnerstag, dem 19. Januar 2023 schrieb András Simonyi:
> hopefully somebody more knowledgeable than me can comment on how
> viable this is, but would a @@csl like export snippet construct help
> with the problem?
> In that case your macro could be along the lines of
>
> #+MACRO: name @@csl:<span style="font-variant: small-caps">$1</span>@@

It is an interesting approach, but it has a drawback. I use this macro
also in the ordinary text when I refer to persons without an explicit
citation. That is, the macro has to work both in a citation and in
normal text. Even if a @@csl: construct would be ignored in normal text,
I cannot see how to write the macro then, because something like

    #+MACRO: name @@csl:<span style="font-variant: small-caps">$1</span>@@@@latex:\textsc{$1}@@@@html:<span class="name">$1</span>@@

would still transfer the @@latex: and @@html: constructs into the
footnote. They would have to be expressly ignored by the citation
processor.

  -quintus

-- 
Dipl.-Jur. M. Gülker | https://mg.guelker.eu | PGP: Siehe Webseite
Passau, Deutschland  | kontakt@guelker.eu    | O<



* Re: [PATCH][oc-csl] Improve reference parsing
  2023-01-19 10:11                     ` M. ‘quintus’ Gülker
@ 2023-01-25 22:44                       ` András Simonyi
  0 siblings, 0 replies; 16+ messages in thread
From: András Simonyi @ 2023-01-25 22:44 UTC (permalink / raw)
  To: András Simonyi, Ihor Radchenko, emacs-orgmode

Dear All,

On Thu, 19 Jan 2023 at 11:15, M. ‘quintus’ Gülker
<post+orgmodeml@guelker.eu> wrote:

> That is, the macro has to work both in a citation and in
> normal text. Even if a @@csl: construct would be ignored in normal text,
> I cannot see how to write the macro then, because something like
>
>     #+MACRO: name @@csl:<span style="font-variant: small-caps">$1</span>@@@@latex:\textsc{$1}@@@@html:<span class="name">$1</span>@@
>
> would still transfer the @@latex: and @@html: constructs into the
> footnote. They would have to be expressly ignored by the citation
> processor.

If we take the approach I suggested, the macro definition you proposed
should work correctly for both LaTeX and HTML export combined with the
CSL citation processor, because in the case of citation locators and
affixes Citeproc would receive only the output produced by the planned
CSL ox backend, which would remove the non-CSL export snippets and
keep only the content of the csl snippet.  Citeproc would parse the
resulting <span style="font-variant: small-caps">text</span> into the
appropriate small-caps CSL representation and then format the citation
with small-caps using the Citeproc formatter corresponding to the
export format.
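
A string-level simulation may make the intended behaviour concrete.  It
is a sketch only: the function names are hypothetical, the regexp is a
simplification of Org's export-snippet syntax, and the planned "csl"
backend would work on the parse tree rather than on strings.

```elisp
;;; -*- lexical-binding: t; -*-
;; Hypothetical helpers simulating macro expansion and snippet handling.

(defun my/expand-macro (template arg)
  "Return TEMPLATE with every \"$1\" replaced by ARG."
  (replace-regexp-in-string (regexp-quote "$1") arg template t t))

(defun my/resolve-snippets (text backend)
  "Keep the contents of @@BACKEND:...@@ snippets in TEXT; drop the others."
  (replace-regexp-in-string
   "@@\\([-A-Za-z0-9]+\\):\\(.*?\\)@@"
   (lambda (snippet)
     ;; The match data refer to SNIPPET while this function runs.
     (if (string= (match-string 1 snippet) backend)
         (match-string 2 snippet)
       ""))
   text t t))

(defvar my/name-macro
  (concat "@@csl:<span style=\"font-variant: small-caps\">$1</span>@@"
          "@@latex:\\textsc{$1}@@"
          "@@html:<span class=\"name\">$1</span>@@")
  "The three-snippet macro body discussed in this thread.")

;; Inside a citation affix, only the csl snippet would reach Citeproc:
(my/resolve-snippets (my/expand-macro my/name-macro "Doe") "csl")
;; => "<span style=\"font-variant: small-caps\">Doe</span>"

;; In ordinary text exported to LaTeX, only the latex snippet survives:
(my/resolve-snippets (my/expand-macro my/name-macro "Doe") "latex")
;; => "\\textsc{Doe}"
```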

best wishes,
András

>
>   -quintus
>
> --
> Dipl.-Jur. M. Gülker | https://mg.guelker.eu | PGP: Siehe Webseite
> Passau, Deutschland  | kontakt@guelker.eu    | O<



* Re: [PATCH][oc-csl] Improve reference parsing
       [not found]                   ` <CAOWRwxD3pSfao7+G145naE=jaAz6=m2BjvUX0rj_c4r8qeu7rQ@mail.gmail.com>
@ 2023-01-26  9:43                     ` Ihor Radchenko
  0 siblings, 0 replies; 16+ messages in thread
From: Ihor Radchenko @ 2023-01-26  9:43 UTC (permalink / raw)
  To: András Simonyi; +Cc: emacs-orgmode

[ Adding Org ML back to CC ]

András Simonyi <andras.simonyi@gmail.com> writes:

> On Thu, 19 Jan 2023 at 10:56, Ihor Radchenko <yantar92@posteo.net> wrote:
>> So, do I understand correctly that italics, bold, subscript,
>> superscript, small-caps, and nocase must be passed to the CSL processor
>> in a format understood by CSL? Everything else could just be left in Org
>> and later exported according to actual export settings?
>
> Unfortunately, the situation is a bit different -- the solution I see
> as the most viable is to export the affixes and the locator to a form
> in which the markup elements you listed (plus links, which I haven't
> mentioned but are also supported) are in the required CSL input form,
> but the rest is in plain text.

What do you mean by plain text? Plain text as in Org or plain text as in
ASCII export?

> Anything else would be way
> more complicated to handle in Citeproc and I don't
> really see the benefits/use-case either (we are talking about elements
> within a citation). This would "only" require a custom backend
> exporting CSL-supported elements in the html-like CSL format and
> everything else which is allowed by the Org syntax
> in plain text.

I am not much concerned about CSL format itself.
I am concerned about the formatted citation returned back to Org by CSL.

Consider the following:

[cite:Prefix *bold* +strike-through+ @key]

It will be interpreted by Org export as

(citation
 (:style nil ...)
 (citation-reference
  (:key "key"... :prefix
   ("Prefix "
    (bold
     (... :post-blank 1 ...)
     "bold")
    (strike-through
     (... :post-blank 1 ...)
     "strike-through")))))

Now, consider that the user has a custom export filter that decorates
"+strike-through+" like "!!strike-through!!" upon export.

If we pass the original citation to the CSL, will the export filter be
applied?

Also, what if the user decorates a CSL locator with Org markup like
strike-through?
             
>> May we:
>> 1. Convert the Org markup supported by CSL into CSL-understood HTML
>> format
>> 2. Convert all other Org markup into verbatim
>
> I'm not sure what you mean by verbatim -- leaving it as Org markup?

Whatever prevents CSL from altering the text. (Like escaping "_" you
mentioned earlier)

>> 3. Convert back non-verbatim markup altered by CSL into Org
>> 4. Perform exporting Org->current export backend as usual.
>
> If verbatim is Org then step 3 could be rather complicated, we'd need
> to identify
> the Org fragments in citeproc's HTML and LaTeX output when those
> backends are used.

But can't CSL output in Org format? Isn't the whole CSL thing supposed
to work for arbitrary export backend, not just HTML and LaTeX?

> Also I'd worry that the result would not pass through Citeproc's
> post-processing steps --
> there is now a user-customizable hook variable for citation post-processing
> which acts on the internal representations.

I envision the conversion back to Org to happen after _all_ the
Citeproc's processing, be it user-customized or not.

> All in all I'd first concentrate on the use-case: is there anything
> important left out
> if we go with simply using a custom backend to export the CSL-supported markup
> in CSL input format and everything else as plain text, then do what we
> do now, namely
> either simply insert the Citeproc-formatted output into the exported document
> without any post-processing (currently this is for LaTeX and HTML), or
> parse and export
> with Org when the Org Citeproc formatter is used (currently for all
> other formats),

What I imagine is doing "parse and export with Org" all the time,
including HTML and LaTeX export.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>



Thread overview: 16+ messages
2022-10-26 15:40 [PATCH][oc-csl] Improve reference parsing András Simonyi
2022-10-27  4:10 ` Ihor Radchenko
2022-11-01 15:02   ` András Simonyi
2022-11-02  6:29     ` Ihor Radchenko
2022-11-02 17:58       ` András Simonyi
2022-11-03  6:34         ` Ihor Radchenko
2023-01-07 12:50           ` M. ‘quintus’ Gülker
2023-01-15  8:56             ` Ihor Radchenko
2023-01-18 23:08               ` András Simonyi
2023-01-19  8:21                 ` M. ‘quintus’ Gülker
2023-01-19  9:35                   ` András Simonyi
2023-01-19  9:59                     ` Ihor Radchenko
2023-01-19 10:11                     ` M. ‘quintus’ Gülker
2023-01-25 22:44                       ` András Simonyi
2023-01-19  9:56                 ` Ihor Radchenko
     [not found]                   ` <CAOWRwxD3pSfao7+G145naE=jaAz6=m2BjvUX0rj_c4r8qeu7rQ@mail.gmail.com>
2023-01-26  9:43                     ` Ihor Radchenko

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs/org-mode.git
