emacs-orgmode@gnu.org archives
From: Ihor Radchenko <yantar92@posteo.net>
To: "András Simonyi" <andras.simonyi@gmail.com>
Cc: "M. ‘quintus’ Gülker" <post+orgmodeml@guelker.eu>, emacs-orgmode@gnu.org
Subject: Re: [PATCH][oc-csl] Improve reference parsing
Date: Thu, 19 Jan 2023 09:56:50 +0000
Message-ID: <87h6wm6e59.fsf@localhost>
In-Reply-To: <CAOWRwxC62=3BK=3j7F7ve0srz6VcZZuzAJrJBO7F7GK0t-vNWQ@mail.gmail.com>

András Simonyi <andras.simonyi@gmail.com> writes:

> As for the question of other elements, I proposed the custom
> backend-based approach because CSL has its own rich-text markup (which
> is not simply a subset of Org's; for example, it contains
> small-caps, which is not in Org), and, consequently, Citeproc-el has
> its own internal rich-text representations (ASTs), on which it
> performs the operations that are prescribed by the various CSL styles.
> When the rich text citation/bibliography is finalized, it can be
> "serialized" or "formatted" (analogously to Org's exporting a parse
> tree) using one of the Citeproc formatters, e.g. into LaTeX, HTML or
> Org. As the prefix, suffix and locator also need to be operated on by
> the processor (concatenated with other rich-text elements, etc.), they
> also have to be parsed into Citeproc-el's internal rich-text
> representations. Since this is a given, the only question is in what
> format they should be passed, and the simple HTML-like standard that
> Citeproc-el already supports (see
> https://www.zotero.org/support/kb/rich_text_bibliography) seems to be
> the simplest solution.

So, do I understand correctly that italics, bold, subscript,
superscript, small-caps, and nocase must be passed to the CSL processor
in a format understood by CSL? Everything else could just be left in Org
and later exported according to the actual export settings?
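For reference, a minimal sketch (in Python, for illustration only) of how those six constructs map onto the HTML-like tags listed on the Zotero rich-text page linked above. The `CSL_RICH_TEXT_TAGS` table and the `wrap` helper are hypothetical names, not part of Citeproc-el or Org; only the tag strings come from that page.

```python
# Hypothetical mapping from the six constructs named above to the HTML-like
# tags documented at https://www.zotero.org/support/kb/rich_text_bibliography.
# Org has native syntax only for the first four; small caps and nocase have
# no Org equivalent.
CSL_RICH_TEXT_TAGS = {
    "italic": ("<i>", "</i>"),
    "bold": ("<b>", "</b>"),
    "subscript": ("<sub>", "</sub>"),
    "superscript": ("<sup>", "</sup>"),
    "small-caps": ('<span style="font-variant:small-caps;">', "</span>"),
    "nocase": ('<span class="nocase">', "</span>"),
}


def wrap(construct: str, text: str) -> str:
    """Wrap TEXT in the rich-text tags for CONSTRUCT (hypothetical helper)."""
    open_tag, close_tag = CSL_RICH_TEXT_TAGS[construct]
    return f"{open_tag}{text}{close_tag}"


print(wrap("italic", "cf."))     # <i>cf.</i>
print(wrap("nocase", "iPhone"))  # <span class="nocase">iPhone</span>
```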

> Ihor Radchenko <yantar92@posteo.net> wrote:
>> Could you please explain in more detail why CSL requires special
>> export of the prefix/suffix? What will happen if we simply pass the Org
>> markup verbatim?
>
> Since Citeproc-el assumes that all formatting in the prefix/suffix is
> in the HTML-like markup mentioned above, any Org markup would be
> treated as plain text to be preserved as is rather than interpreted
> as formatting. For example, if an Org document with underlined text in
> a citation prefix were exported to LaTeX, the Citeproc LaTeX formatter
> would escape the underscore characters ("\_") to preserve them in the
> output, and the citation would be inserted in this form into the
> resulting LaTeX document.
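The escaping problem can be illustrated with a toy formatter. This is not Citeproc-el's LaTeX formatter, only a hypothetical Python stand-in that, like the formatter described above, treats its input as plain text and escapes (a subset of) the LaTeX special characters:

```python
# Hypothetical stand-in for a LaTeX formatter: it treats its input as plain
# text and escapes a few LaTeX special characters (not the full set).
LATEX_ESCAPES = {"_": r"\_", "&": r"\&", "%": r"\%", "#": r"\#", "$": r"\$"}


def escape_latex(plain: str) -> str:
    """Escape LaTeX specials so they are printed literally."""
    return "".join(LATEX_ESCAPES.get(ch, ch) for ch in plain)


# Org underline markup in a citation prefix is just text to the formatter.
prefix = "see _especially_"
print(escape_latex(prefix))  # see \_especially\_
```

The underline markup reaches LaTeX as literal underscores instead of being rendered as underlining.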

What if we pass Org constructs as verbatim HTML? That way, the LaTeX
formatter should not alter the text.

>> I am asking because org-cite-csl-render-citation uses
>> org-cite-parse-objects so, unless citeproc does something terrible with
>> the original Org syntax, we can re-parse the output string and export
>> appropriately according to the current export backend.
>
> See above, unfortunately, this wouldn't work, at least not in a
> general and safe way.

May we:
1. Convert the Org markup supported by CSL into the HTML-like format CSL understands
2. Convert all other Org markup into verbatim text
3. Convert the non-verbatim markup altered by CSL back into Org
4. Export from Org to the current export backend as usual.

(In the worst-case scenario, we might replace non-convertible Org markup
constructs with dummy text and later replace the dummies with the
original Org markup.)
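A rough sketch of steps 1 to 3 combined with the dummy-text fallback, written in Python purely for illustration. Assumptions: the prefix has already been split into hypothetical `(kind, text)` segments (a real implementation would use Org's parser), the dummy token survives the CSL processor unchanged (no case transformation, no escaping), and the processor itself is replaced by a one-line stand-in that rewrites `<i>` as Org italics. All names are hypothetical.

```python
import re

# Org constructs that CSL can represent, mapped to Zotero-style rich-text tags.
CSL_TAGS = {
    "italic": ("<i>", "</i>"),
    "bold": ("<b>", "</b>"),
    "subscript": ("<sub>", "</sub>"),
    "superscript": ("<sup>", "</sup>"),
}
# Hypothetical dummy format; assumed to pass through the processor untouched.
PLACEHOLDER = "ORGPLACEHOLDER{}X"
PLACEHOLDER_RE = re.compile(r"ORGPLACEHOLDER(\d+)X")


def encode_for_csl(segments):
    """Steps 1-2 (dummy variant): tag CSL-supported markup, stash the rest.

    SEGMENTS is a list of (kind, text) pairs, where kind is "plain", a key of
    CSL_TAGS, or "org" for raw Org markup that CSL cannot represent.
    """
    parts, stash = [], []
    for kind, text in segments:
        if kind == "plain":
            parts.append(text)
        elif kind in CSL_TAGS:
            open_tag, close_tag = CSL_TAGS[kind]
            parts.append(open_tag + text + close_tag)
        elif kind == "org":
            stash.append(text)
            parts.append(PLACEHOLDER.format(len(stash) - 1))
        else:
            raise ValueError(f"unknown segment kind: {kind}")
    return "".join(parts), stash


def restore_org(processed, stash):
    """Swap each dummy back for the original Org markup it replaced."""
    return PLACEHOLDER_RE.sub(lambda m: stash[int(m.group(1))], processed)


prefix = [("plain", "see "), ("italic", "passim"), ("plain", " and "),
          ("org", "[[https://orgmode.org][the manual]]")]
encoded, stash = encode_for_csl(prefix)
print(encoded)  # see <i>passim</i> and ORGPLACEHOLDER0X

# Stand-in for the CSL processor with an Org output formatter (step 3):
# it only rewrites <i>...</i> as Org /italics/.
processed = encoded.replace("<i>", "/").replace("</i>", "/")
print(restore_org(processed, stash))
# see /passim/ and [[https://orgmode.org][the manual]]
```

Step 4 would then export the restored Org string with the current backend as usual. A real implementation would also need to make sure the dummy tokens cannot collide with text the user actually wrote.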

WDYT?

Also, small-caps and nocase are currently not supported by Org. Maybe it
would make sense to document how to pass these constructs to CSL
properly.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>


Thread overview: 16+ messages
2022-10-26 15:40 [PATCH][oc-csl] Improve reference parsing András Simonyi
2022-10-27  4:10 ` Ihor Radchenko
2022-11-01 15:02   ` András Simonyi
2022-11-02  6:29     ` Ihor Radchenko
2022-11-02 17:58       ` András Simonyi
2022-11-03  6:34         ` Ihor Radchenko
2023-01-07 12:50           ` M. ‘quintus’ Gülker
2023-01-15  8:56             ` Ihor Radchenko
2023-01-18 23:08               ` András Simonyi
2023-01-19  8:21                 ` M. ‘quintus’ Gülker
2023-01-19  9:35                   ` András Simonyi
2023-01-19  9:59                     ` Ihor Radchenko
2023-01-19 10:11                     ` M. ‘quintus’ Gülker
2023-01-25 22:44                       ` András Simonyi
2023-01-19  9:56                 ` Ihor Radchenko [this message]
     [not found]                   ` <CAOWRwxD3pSfao7+G145naE=jaAz6=m2BjvUX0rj_c4r8qeu7rQ@mail.gmail.com>
2023-01-26  9:43                     ` Ihor Radchenko
