From mboxrd@z Thu Jan  1 00:00:00 1970
From: Aaron Ecay <aaronecay@gmail.com>
Subject: Re: Citation syntax: a revised proposal
Date: Tue, 10 Mar 2015 23:21:33 -0300
Message-ID: <871tkwmh87.fsf@gmail.com>
References: <87k2zjnc0e.fsf@berkeley.edu> <m2iof3q11j.fsf@tsdye.com>
	<87bnkvm8la.fsf@berkeley.edu> <m21tlrot9o.fsf@tsdye.com>
	<87zj8co3se.fsf@berkeley.edu> <m2iof0gdnc.fsf@tsdye.com>
	<87ioezooi2.fsf@berkeley.edu> <87mw4bpaiu.fsf@nicolasgoaziou.fr>
	<8761aznpiq.fsf@berkeley.edu> <87twyjnh0r.fsf@nicolasgoaziou.fr>
	<87oaopx24e.fsf@berkeley.edu> <87k2zd4f3w.fsf@nicolasgoaziou.fr>
	<87egpkv8g9.fsf@berkeley.edu> <877fv6xfaq.fsf@gmail.com>
	<87twya2ak0.fsf@berkeley.edu> <87zj81aa97.fsf@nicolasgoaziou.fr>
	<87ioep2r6p.fsf@berkeley.edu>
	<87y4ngbgm7.fsf@nicolasgoaziou.fr> <87bnkbi61v.fsf@gmail.com>
	<87a8zlmujp.fsf@gmail.com> <87ioe8yftp.fsf@berkeley.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
In-Reply-To: <87ioe8yftp.fsf@berkeley.edu>
To: Richard Lawrence <richard.lawrence@berkeley.edu>
Cc: emacs-orgmode@gnu.org

Hi Richard,

Thanks for your comments, and for your work on an implementation.

On 10 March 2015, Richard Lawrence wrote:
> I have actually been working on the same problem, using citeproc-hs as
> the CSL processor instead of citeproc-java.

This is an interesting approach.  What version of citeproc-hs are you
using?  The version under that name is no longer maintained, and I had
some trouble getting it to build.  The pandoc fork (under the name
pandoc-citeproc) seemed to me to lack command-line functionality.  (But
I only looked briefly and could have missed something.)

> I took a (very) brief look at your code; it seems like you are only
> communicating with citeproc-java via command line arguments and stdout.
> Is that right?

Yes.

>
> My approach to the problems you mention has been the following:
>   1) Generate JSON from citation objects on the Org side.
>   2) Pass that JSON to the processor via stdin.
>   3) Pass the output format, the CSL file, and the bibliography database
>      to the processor via command line arguments.
>   4) Return, to stdout, the formatted citations and the bibliography.
>      These are formatted such that there is one citation or entry per line,
>      and a recognizable separator separates the citations from the
>      bibliography.
>
> This allows passing formatting options for individual citations via the
> JSON object for that citation, so it allows citeproc-hs to do more of
> the work of formatting citations.
>
> (See http://gsl-nagoya-u.net/http/pub/citeproc-doc.html#citation-data-object
> for documentation of the citation data JSON format.)
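
(To make the four steps above concrete, here is a rough Python sketch of
the Org-side plumbing.  It is only an illustration, not anyone's actual
code: the separator string, the function names, and the wrapper command
in the comment are all made up, and the JSON shape follows the
citeproc-js citation data object as I understand it.)

```python
import json
import subprocess

# Hypothetical separator line between citations and bibliography.
SEPARATOR = "=====BIBLIOGRAPHY====="


def citation_to_json(citation_id, items):
    """Step 1: build one citation data object (citeproc-js style)."""
    return {
        "citationID": citation_id,
        "citationItems": items,
        "properties": {"noteIndex": 0},
    }


def run_processor(command, citations):
    """Steps 2-3: JSON goes in on stdin; format, CSL file and database
    are expected to be part of `command` (e.g. a hypothetical
    ["csl-wrapper", "html", "style.csl", "refs.bib"])."""
    payload = json.dumps(citations)
    result = subprocess.run(
        command, input=payload, capture_output=True, text=True, check=True
    )
    return result.stdout


def split_output(stdout):
    """Step 4: one citation or entry per line, split at the separator."""
    lines = stdout.splitlines()
    cut = lines.index(SEPARATOR)
    return lines[:cut], lines[cut + 1:]
```

Per-citation options such as "prefix", "suffix", "locator" or
"suppress-author" would go in the individual entries of `items`.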

Very interesting.  But it looks like the CSL standard does not
differentiate parenthesized/not and capitalized/not citations, whereas
biblatex (taken as the best representative of the latex family of
citation processors) does.  I think we have decided we need to support
these.  So we will always need to do some post-processing of the CSL
output.  Then the question arises (for example) whether it’s better to
let CSL/citeproc handle the prefix and suffix, or to do it ourselves.

I don’t think we can decide this without looking at a working “sketch”
of both implementations.  It would be very good to see your draft code.
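
To illustrate the “do it ourselves” option, here is a minimal Python
sketch of the kind of post-processing I have in mind.  It assumes the
processor hands back a bare citation string (e.g. “Doe 2015”) and that
Org adds the prefix, suffix, parentheses and capitalization itself, in
the spirit of biblatex’s \parencite/\textcite and their capitalized
variants.  The function name and its parameters are hypothetical.

```python
def postprocess(body, prefix="", suffix="", parenthesized=True, capitalized=False):
    """Wrap a bare citation string returned by the CSL processor."""
    text = body.strip()
    if prefix.strip():
        text = prefix.strip() + " " + text
    if suffix.strip():
        s = suffix.strip()
        # No space before a suffix that starts with punctuation (", p. 42").
        text += s if s[0] in ",;:" else " " + s
    if capitalized:
        # Sentence-initial citation: "van Gogh 1888" -> "Van Gogh 1888".
        text = text[:1].upper() + text[1:]
    return "(" + text + ")" if parenthesized else text
```

The alternative is to pass prefix and suffix through to the processor
and only handle parentheses and capitalization on our side.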

>
> I don't know whether this will ultimately be a good design, but the way
> I am picturing it right now is that exporting citations will work sort
> of like footnotes: the exporter will gather them all together as they
> are encountered, then generate the JSON and run a single call to the CSL
> processor at the end of the export process.  It can then replace the
> citations in the document with the result from the CSL processor, and
> insert the bibliography at the end of the document.

My code does something similar.  It processes all citations at the
beginning of export and stashes the data in the info plist, so that
it’s available to transcoders during the “main” export process.  I don’t
know whether footnotes are handled in the same way, or rather processed
in a late step after the transcoders.  But it’s six of one, half a dozen
of the other, I think.
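
In outline, the “process first, look up later” scheme is something like
the following Python sketch.  This is a toy model rather than my actual
elisp: the parse tree is a nested list, `info` is a plain dict standing
in for the info plist, and `format_all` is a hypothetical stand-in for
the single call to the CSL processor.

```python
def collect_citations(tree):
    """Return the keys of all citation nodes, in document order."""
    found = []
    for node in tree:
        if isinstance(node, list):
            found.extend(collect_citations(node))
        elif isinstance(node, dict) and node.get("type") == "citation":
            found.append(node["key"])
    return found


def prepare(tree, info, format_all):
    """Before export: format every citation in one batch, stash in info."""
    keys = collect_citations(tree)
    info["citations"] = dict(zip(keys, format_all(keys)))
    return info


def transcode_citation(node, info):
    """During export: the transcoder only looks up the stored result."""
    return info["citations"][node["key"]]
```

Doing the same work in a late step after the transcoders would only
move the call to `prepare`; the lookup table is the same either way.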

>
> (The code is not very pretty yet, but it does generate citations and
> bibliographies in both plain text and HTML, and it would be
> straightforward to extend it to other output formats.  I can post it
> somewhere if anyone is interested in taking a look.)

Please do!

>
>> Some people have talked about supporting other CSL processors.  I don’t
>> see much utility in that, since CSL is a standard that all processors
>> should implement faithfully.
>
> Indeed!  Though as you have observed, `should' and `already does' come
> apart.  I doubt there are any implementations that are perfectly
> complete.  So it may be worth thinking about how Org can talk to CSL
> processors in a processor-independent way.  That way, different users
> can use different CSL processors if one works better for their
> particular document or environment.

I take your point, but any differences in the implementation just make
it potentially harder to be processor-independent.  I think we should
tightly integrate with one processor, working around whatever warts it
may have.

>
> I think the generate-and-pass-JSON approach is promising for that
> reason.  That is what citeproc-js accepts as input (so maybe that is
> what citeproc-java is doing internally?), and my code aims to allow
> citeproc-hs to interpret the same JSON format as citeproc-js.

... hmm.  Do you mean you’ve written Haskell code?

> I don't know Ruby, but I think it would be easy to make citeproc-ruby
> accept the same JSON format.  Do you have a sense of how easy it would
> be to coax citeproc-java into accepting JSON on stdin?

My understanding is that citeproc-java in its current form can read JSON
database files (in addition to bibtex).  However, it does not accept JSON
to control output of citations – it merely allows passing a key, for
which a “default” citation will be generated (no prefix/suffix/author
suppression/...).

It would not be easy for me to extend it, because I’m not fluent in
Java.

Thanks,

--
Aaron Ecay