From: Ihor Radchenko
To: András Simonyi
Cc: M. ‘quintus’ Gülker, emacs-orgmode@gnu.org
Subject: Re: [PATCH][oc-csl] Improve reference parsing
Date: Thu, 19 Jan 2023 09:56:50 +0000
Message-ID: <87h6wm6e59.fsf@localhost>
References: <87r0ytoqi6.fsf@localhost> <87k04dlvie.fsf@localhost> <87zgd87di9.fsf@localhost> <87zgauqzb8.fsf@guelker.eu> <87ilh889bz.fsf@localhost>
List-Id: "General discussions about Org-mode."

András Simonyi writes:

> As for the question of other elements, I proposed the custom
> backend-based approach because CSL has its own rich-text markup (which
> is actually not simply a subset of Org's, for example, it contains
> small-caps, which is not in Org), and, consequently, Citeproc-el has
> its own internal rich-text representations (ASTs), on which it
> performs the operations that are prescribed by the various CSL styles.
> When the rich text citation/bibliography is finalized, it can be
> "serialized" or "formatted" (analogously to Org's exporting a parse
> tree) using one of the Citeproc formatters, e.g. into LaTeX, HTML or
> Org. As the prefix, suffix and the locator also need to be operated on
> by the processor (concatenated to other rich text elements etc.),
> they also have to be parsed into Citeproc-el's internal rich-text
> representations. Since this is a given, the only question is in what
> format should they be passed, and the simple HTML-like standard which
> is already supported by Citeproc-el (see
> https://www.zotero.org/support/kb/rich_text_bibliography) seems to be
> the simplest solution.

So, do I understand correctly that italics, bold, subscript,
superscript, small-caps, and nocase must be passed to the CSL processor
in a format understood by CSL? Everything else could just be left in Org
and later exported according to actual export settings?

> Ihor Radchenko wrote:
>> Could you please explain in more detail why CSL requires special
>> export of the prefix/suffix? What will happen if we simply pass the Org
>> markup verbatim?
>
> Since Citeproc-el assumes that all formatting in the prefix/suffix is
> in the HTML-like markup mentioned above, any Org markup would be
> treated as plain text which should be preserved as is, and not
> interpreted as formatting, so, for example, when an Org document with
> underlined text in a citation prefix were exported to LaTeX then the
> Citeproc LaTeX formatter would escape the underscore characters ("\_")
> to preserve them in the output and the citation would be inserted in
> this form into the resulting LaTeX document.

What if we pass Org constructs as verbatim HTML? That way, the LaTeX
formatter should not alter the text.

>> I am asking because org-cite-csl-render-citation uses
>> org-cite-parse-objects so, unless citeproc does something terrible with
>> the original Org syntax, we can re-parse the output string and export
>> appropriately according to the current export backend.
>
> See above, unfortunately, this wouldn't work, at least not in a
> general and safe way.

May we:

1. Convert the Org markup supported by CSL into CSL-understood HTML format
2. Convert all other Org markup into verbatim
3. Convert back non-verbatim markup altered by CSL into Org
4. Perform exporting Org->current export backend as usual.

(In the worst case scenario, we might replace non-convertible Org markup
constructs with dummy text and later replace the dummies with the
original Org markup)

WDYT?

Also, small-caps and nocase are currently not supported by Org. Maybe it
would make sense to document how to pass these constructs to CSL
properly.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at .
Support Org development at ,
or support my work at
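
To make the "dummy text" fallback from the proposal concrete, here is a
rough sketch in Python rather than Emacs Lisp. It is only a toy model of
the idea, not how oc-csl or Citeproc-el work: the function names, the
ORGDUMMY placeholder format, the regular expressions, and the handling of
only /italic/ and *bold* (mapped to <i> and <b>) are all assumptions made
for illustration, and the "processor" is a stand-in callable.

```python
import re

# Assumption: only these two Org markers are treated as CSL-representable.
ORG_TO_TAG = {"/": "i", "*": "b"}
TAG_TO_ORG = {tag: mark for mark, tag in ORG_TO_TAG.items()}

# Hypothetical, simplified patterns for Org emphasis (not Org's real grammar).
SUPPORTED = re.compile(r"(?<!\w)([/*])(\S(?:.*?\S)?)\1(?!\w)")
UNSUPPORTED = re.compile(r"(?<!\w)([_+~=])(\S(?:.*?\S)?)\1(?!\w)")
PLACEHOLDER = re.compile(r"ORGDUMMY(\d+)X")


def protect(text):
    """Swap markup the processor cannot represent for alphanumeric dummies."""
    saved = []

    def stash(match):
        saved.append(match.group(0))
        return f"ORGDUMMY{len(saved) - 1}X"

    return UNSUPPORTED.sub(stash, text), saved


def to_csl_html(text):
    """Step 1: convert supported Org markup to HTML-like tags."""
    return SUPPORTED.sub(
        lambda m: f"<{ORG_TO_TAG[m.group(1)]}>{m.group(2)}</{ORG_TO_TAG[m.group(1)]}>",
        text,
    )


def from_csl_html(text):
    """Step 3: convert the HTML-like tags back to Org markup."""
    return re.sub(
        r"<(i|b)>(.*?)</\1>",
        lambda m: f"{TAG_TO_ORG[m.group(1)]}{m.group(2)}{TAG_TO_ORG[m.group(1)]}",
        text,
    )


def restore(text, saved):
    """Put the original Org constructs back in place of the dummies."""
    return PLACEHOLDER.sub(lambda m: saved[int(m.group(1))], text)


def roundtrip(prefix, process=lambda s: s):
    """Protect, convert, run the stand-in processor, then convert back."""
    protected, saved = protect(prefix)
    processed = process(to_csl_html(protected))
    return restore(from_csl_html(processed), saved)
```

The dummies are purely alphanumeric so that a formatter escaping
characters such as "_" would have nothing to escape in them. Whether a
real CSL style could still alter such text (case changes, for instance)
is not covered by this sketch; step 4, the ordinary Org export of the
restored string, is also left out.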