From: Ihor Radchenko <yantar92@posteo.net>
To: "András Simonyi" <andras.simonyi@gmail.com>
Cc: "M. ‘quintus’ Gülker" <post+orgmodeml@guelker.eu>, emacs-orgmode@gnu.org
Subject: Re: [PATCH][oc-csl] Improve reference parsing
Date: Thu, 19 Jan 2023 09:56:50 +0000
Message-ID: <87h6wm6e59.fsf@localhost>
In-Reply-To: <CAOWRwxC62=3BK=3j7F7ve0srz6VcZZuzAJrJBO7F7GK0t-vNWQ@mail.gmail.com>
András Simonyi <andras.simonyi@gmail.com> writes:
> As for the question of other elements, I proposed the custom
> backend-based approach because CSL has its own rich-text markup (which
> is actually not simply a subset of Org's, for example, it contains
> small-caps, which is not in Org), and, consequently, Citeproc-el has
> its own internal rich-text representations (ASTs), on which it
> performs the operations that are prescribed by the various CSL styles.
> When the rich text citation/bibliography is finalized, it can be
> "serialized" or "formatted" (analogously to Org's exporting a parse
> tree) using one of the Citeproc formatters, e.g. into LaTeX, HTML or
> Org. As the prefix, suffix and the locator also need to be operated on
> by the processor (concatenated to other rich text elements, etc.),
> they also have to be parsed into Citeproc-el's internal rich-text
> representations. Since this is a given, the only question is in what
> format they should be passed, and the simple HTML-like standard which
> is already supported by Citeproc-el (see
> https://www.zotero.org/support/kb/rich_text_bibliography) seems to be
> the simplest solution.
So, do I understand correctly that italics, bold, subscript,
superscript, small-caps, and nocase must be passed to the CSL processor
in a format understood by CSL? Everything else could just be left as Org
markup and later exported according to the actual export settings?
> Ihor Radchenko <yantar92@posteo.net> wrote:
>> Could you please explain in more detail why CSL requires special
>> export of the prefix/suffix? What will happen if we simply pass the Org
>> markup verbatim?
>
> Since Citeproc-el assumes that all formatting in the prefix/suffix is
> in the HTML-like markup mentioned above, any Org markup would be
> treated as plain text which should be preserved as is, and not
> interpreted as formatting, so, for example, if an Org document with
> underlined text in a citation prefix were exported to LaTeX, then the
> Citeproc LaTeX formatter would escape the underscore characters ("\_")
> to preserve them in the output and the citation would be inserted in
> this form into the resulting LaTeX document.
What if we pass Org constructs as verbatim HTML? That way, the LaTeX
formatter should not alter the text.
>> I am asking because org-cite-csl-render-citation uses
>> org-cite-parse-objects so, unless citeproc does something terrible with
>> the original Org syntax, we can re-parse the output string and export
>> appropriately according to the current export backend.
>
> See above, unfortunately, this wouldn't work, at least not in a
> general and safe way.
Could we:
1. Convert the Org markup supported by CSL into the HTML format
understood by CSL
2. Convert all other Org markup into verbatim
3. Convert the non-verbatim markup altered by CSL back into Org
4. Export from Org to the current export backend as usual.
(In the worst case, we might replace non-convertible Org markup
constructs with dummy text and later replace the dummies with the
original Org markup)
WDYT?
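To make the round trip concrete, here is a rough sketch of steps 1-3
using the dummy-text fallback. It is written in Python only for
illustration and is not oc-csl or Citeproc-el code: the function names,
the ORGDUMMY token format, the marker-to-tag mapping, and the emphasis
regexp are all assumptions, and the regexp is much simpler than Org's
real emphasis rules. Step 4 (the normal Org export) is left out.

```python
import re

# Assumed mapping from Org emphasis markers to the HTML-like tags.
SUPPORTED = {"/": "i", "*": "b"}
# Markers with no counterpart in that markup (underline, strike-through,
# code, verbatim); they are hidden from the processor.
UNSUPPORTED = "_+~="


def _emphasis(markers):
    """Simplified stand-in for Org's emphasis regexp (not the real one)."""
    chars = "".join(re.escape(m) for m in markers)
    return re.compile(rf"(?<!\w)([{chars}])(\S(?:.*?\S)?)\1(?!\w)")


def org_to_csl(text):
    """Steps 1-2: tag the supported markup, hide the rest behind dummies."""
    stash = []

    def hide(match):
        stash.append(match.group(0))
        return f"ORGDUMMY{len(stash) - 1}END"

    text = _emphasis(UNSUPPORTED).sub(hide, text)
    text = _emphasis(SUPPORTED).sub(
        lambda m: f"<{SUPPORTED[m.group(1)]}>{m.group(2)}</{SUPPORTED[m.group(1)]}>",
        text,
    )
    return text, stash


def csl_to_org(text, stash):
    """Step 3: turn the tags back into Org markup and restore the dummies."""
    for marker, tag in SUPPORTED.items():
        text = re.sub(
            rf"<{tag}>(.*?)</{tag}>",
            lambda m, mk=marker: f"{mk}{m.group(1)}{mk}",
            text,
        )
    return re.sub(r"ORGDUMMY(\d+)END", lambda m: stash[int(m.group(1))], text)


csl, stash = org_to_csl("see /also/ _this_ and *that*")
print(csl)
print(csl_to_org(csl, stash))
```

The obvious weak point is that the dummies must survive whatever the
processor does to the text in between. A style that changes case, for
example, could break the token, so the token format would need more
thought than this.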
Also, small-caps and nocase are currently not supported by Org. Maybe it
would make sense to document how to pass these constructs to CSL
properly.
--
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>