emacs-orgmode@gnu.org archives
* Citation processing via Zotero + zotxt
@ 2015-11-28 20:16 Richard Lawrence
  2015-11-30  8:12 ` Martin Yrjölä
                   ` (3 more replies)
  0 siblings, 4 replies; 37+ messages in thread
From: Richard Lawrence @ 2015-11-28 20:16 UTC (permalink / raw)
  To: emacs-orgmode

Hi everyone,

For the past few days, I've been looking more closely at using the
combination of Zotero [1] with Erik Hetzner's zotxt plugin [2] as a
means of processing citations when exporting to non-LaTeX backends.  I
am now thinking that this is probably our best option, but I'd like to
know what other people think before I sink a lot of work into it.

Here are the reasons I think this is the best option:

1) It is really easy for users.

For those unfamiliar, Zotero is a reference manager, and zotxt is a
Zotero plugin that makes it easier to work with Zotero from plain text
documents.  Both are Firefox plugins, which means they can be installed
by a non-technical user with a couple of clicks.  It also means that
users get updates automatically.  I think this is *really* important.

Pretty much all the other options we have talked about seem like they
will require multi-step, non-trivial installation procedures ("First
install {Node.js/Haskell/JVM ...}, then install
{citeproc-node/pandoc-citeproc/citeproc-java...}, then install our
wrapper script...").  Updating could require other manual operations of
similar complexity.  Avoiding that kind of procedure will make citations
a lot more usable from Org for everyone.

Also, unlike the other options, Zotero is a full-featured reference
manager, not just a batch processor.  So we as users get a useful piece
of software with a simple installation procedure; the other options
require a complex installation procedure for a less-useful program.

2) It is quite complete.

Previously, I thought that it would be a similar amount of work to
communicate with Zotero from Emacs as any of the other CSL
implementations out there.  However, after looking at zotxt a bit more
closely, I discovered that it has an (undocumented) API endpoint [3]
that pretty much does exactly what we need: it accepts a list of
citation objects, and returns a list of formatted citations and a
formatted bibliography, which can be inserted into the exported
document.

This endpoint still needs a little bit of work, to generalize it and
make it easier to get the data in the format we need.  (That is probably
why it is undocumented in the README.) But it requires much less work
than I thought it would, and much less work than it would be to get a
full-featured setup with something like citeproc-node.
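
A rough sketch of what such an exchange could look like from the client
side, in Python for illustration only.  The endpoint is undocumented, so
every field name here (styleId, citationGroups, citationItems, key,
citations, bibliography) and the {{cite:N}} placeholder convention are
assumptions, not the actual zotxt API; no network call is made.

```python
import json


def build_request(clusters, style_id):
    """Serialize citation clusters (lists of item keys) as a JSON body.

    The payload shape is hypothetical; the real endpoint may differ.
    """
    payload = {
        "styleId": style_id,
        "citationGroups": [
            {"citationItems": [{"key": key} for key in cluster]}
            for cluster in clusters
        ],
    }
    return json.dumps(payload)


def apply_response(document, response_json):
    """Replace {{cite:N}} placeholders with the Nth formatted citation,
    then append the formatted bibliography.

    Assumes a response with a "citations" list and a "bibliography" string.
    """
    response = json.loads(response_json)
    for index, formatted in enumerate(response["citations"]):
        document = document.replace("{{cite:%d}}" % index, formatted)
    return document + "\n\n" + response["bibliography"]
```

The point is only that the exporter sends one batch of citations and
gets back everything it needs to fill in the document in one round trip.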

Erik has also written a package for communicating with zotxt from Emacs,
zotxt-emacs [4], which is available on MELPA.  This package already
contains a lot of useful functions for querying the Zotero database and
inserting reference data into documents, including links in Org
documents.  I think it would be pretty straightforward to extend this
package to provide a nice UI for writers who are inserting citations
into Org documents, including search-based lookups of keys, etc.
Perhaps org-ref could also be taught to communicate with zotxt (with or
without zotxt-emacs) without too much work.

3) It uses citeproc-js.

In previous discussions, I think we agreed that it would be best for us
to use citeproc-js as a CSL processor, since it is the `canonical' CSL
implementation, as opposed to pandoc-citeproc or citeproc-java.  Zotero
just uses citeproc-js internally to process citations, so it meets this
requirement. 


I know that many people (perhaps especially the `power users' who have
been active in the citations discussion so far) prefer to maintain their
reference database without the aid of a GUI reference manager like
Zotero.  I still think Zotero + zotxt is the best option for non-LaTeX
citation processing, even for these folks.  The ease of installation
(and removal) of the required programs alone makes it worth it, even if
you never actually populate a Zotero database.  So given what I know at
the moment, I think our efforts would best be directed at making the
in-progress org-cite library communicate with Zotero via zotxt.  What do
you think?

Best,
Richard

[1] https://www.zotero.org/
[2] https://gitlab.com/egh/zotxt/
[3] See the bibliographyEndpoint function in:
    https://gitlab.com/egh/zotxt/blob/master/extension/bootstrap.js
[4] https://gitlab.com/egh/zotxt-emacs


* Re: Citation processing via Zotero + zotxt
  2015-11-28 20:16 Citation processing via Zotero + zotxt Richard Lawrence
@ 2015-11-30  8:12 ` Martin Yrjölä
  2015-12-01  0:32   ` John Kitchin
  2015-11-30 11:35 ` Eric S Fraga
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 37+ messages in thread
From: Martin Yrjölä @ 2015-11-30  8:12 UTC (permalink / raw)
  To: Richard Lawrence; +Cc: emacs-orgmode


Hi Richard.

Thank you for your work on improving citation support for org-mode. I
have some experience with Zotero, but at the moment I'm using
org-ref[1] to manage my citations.

Richard Lawrence writes:

> For the past few days, I've been looking more closely at using the
> combination of Zotero [1] with Erik Hetzner's zotxt plugin [2] as a
> means of processing citations when exporting to non-LaTeX backends.  I
> am now thinking that this is probably our best option, but I'd like to
> know what other people think before I sink a lot of work into it.
>
> Here are the reasons I think this is the best option:
>
> 1) It is really easy for users.

I think ease of use is very important. Maybe some day we can talk with
Zotero directly from Emacs, but installing a browser extension seems to
be the easiest we can do for now.

> I know that many people (perhaps especially the `power users' who have
> been active in the citations discussion so far) prefer to maintain their
> reference database without the aid of a GUI reference manager like
> Zotero.

Maybe two-way sync between BibTeX and Zotero would be sufficient for
power users? I know there are already automatic exporters to BibTeX for
Zotero, but some work needs to be done to make it a two-way sync.

I don't mind the GUI for editing the bibliography database, especially
when the Zotero Translators [2] do such a good job of scraping
citation metadata directly from web pages.

I wouldn't like to use the Zotero GUI for bibliography notes. I think
the best feature of org-ref is that the bibliography notes are stored in
an org-mode file, my preferred note-taking tool. Will this be supported
with the new citation processing you are working on?

Best regards,
Martin Yrjölä

[1] https://github.com/jkitchin/org-ref
[2] https://www.zotero.org/support/dev/translators


* Re: Citation processing via Zotero + zotxt
  2015-11-28 20:16 Citation processing via Zotero + zotxt Richard Lawrence
  2015-11-30  8:12 ` Martin Yrjölä
@ 2015-11-30 11:35 ` Eric S Fraga
  2015-11-30 12:02   ` Ramon Diaz-Uriarte
       [not found] ` <CAN_Dec-JCmwQ2bc=U93rRaPMYyXOiN_PRbtyya9x396r3qqMrg@mail.gmail.com>
  2015-12-01 14:36 ` Rasmus
  3 siblings, 1 reply; 37+ messages in thread
From: Eric S Fraga @ 2015-11-30 11:35 UTC (permalink / raw)
  To: Richard Lawrence; +Cc: emacs-orgmode

On Saturday, 28 Nov 2015 at 12:16, Richard Lawrence wrote:
> Hi everyone,
>
> For the past few days, I've been looking more closely at using the
> combination of Zotero [1] with Erik Hetzner's zotxt plugin [2] as a
> means of processing citations when exporting to non-LaTeX backends.  I
> am now thinking that this is probably our best option, but I'd like to
> know what other people think before I sink a lot of work into it.

Richard,

thanks for the long and descriptive email.  What you propose does seem
like a potentially attractive way forward especially in the context of
minimal impact for installation.

I have two questions:

1. Is Zotero open source?  I believe not so I wonder about the
   implications of this for org.

2. How would I use this starting from an org-bibtex database (which I
   typically export to bibtex)?

Regardless, my export requirements are usually satisfied by export to
LaTeX, so my main concern will not be org to odt or HTML but the user
interface aspects within org for inserting citations etc.

Thanks again,
eric
-- 
: Eric S Fraga (0xFFFCF67D), Emacs 24.5.1, Org release_8.3beta-1229-ge900eb


* Re: Citation processing via Zotero + zotxt
  2015-11-30 11:35 ` Eric S Fraga
@ 2015-11-30 12:02   ` Ramon Diaz-Uriarte
  0 siblings, 0 replies; 37+ messages in thread
From: Ramon Diaz-Uriarte @ 2015-11-30 12:02 UTC (permalink / raw)
  To: Eric S Fraga; +Cc: emacs-orgmode

Dear Eric,


On Mon, 30-11-2015, at 12:35, Eric S Fraga <e.fraga@ucl.ac.uk> wrote:
> On Saturday, 28 Nov 2015 at 12:16, Richard Lawrence wrote:
>> Hi everyone,
>>
>> For the past few days, I've been looking more closely at using the
>> combination of Zotero [1] with Erik Hetzner's zotxt plugin [2] as a
>> means of processing citations when exporting to non-LaTeX backends.  I
>> am now thinking that this is probably our best option, but I'd like to
>> know what other people think before I sink a lot of work into it.
>
> Richard,
>
> thanks for the long and descriptive email.  What you propose does seem
> like a potentially attractive way forward especially in the context of
> minimal impact for installation.
>
> I have two questions:
>
> 1. Is Zotero open source?  I believe not so I wonder about the
>    implications of this for org.
>

Yes, it is open source (GNU Affero GPL, v3):

https://www.zotero.org/support/licensing


And the code is available on GitHub:

https://github.com/zotero



Best,

R.


> 2. How would I use this starting from an org-bibtex database (which I
>    typically export to bibtex)?
>
> Regardless, my export requirements are usually satisfied by export to
> LaTeX, so my main concern will not be org to odt or HTML but the user
> interface aspects within org for inserting citations etc.
>
> Thanks again,
> eric


-- 
Ramon Diaz-Uriarte
Department of Biochemistry, Lab B-25
Facultad de Medicina
Universidad Autónoma de Madrid 
Arzobispo Morcillo, 4
28029 Madrid
Spain

Phone: +34-91-497-2412

Email: rdiaz02@gmail.com
       ramon.diaz@iib.uam.es

http://ligarto.org/rdiaz


* Re: Citation processing via Zotero + zotxt
  2015-11-30  8:12 ` Martin Yrjölä
@ 2015-12-01  0:32   ` John Kitchin
  2015-12-01  2:59     ` Richard Lawrence
  2015-12-01  9:34     ` Ramon Diaz-Uriarte
  0 siblings, 2 replies; 37+ messages in thread
From: John Kitchin @ 2015-12-01  0:32 UTC (permalink / raw)
  To: Martin Yrjölä; +Cc: emacs-orgmode


Martin Yrjölä writes:

> Hi Richard.
>
> Thank you for your work on improving citation support for org-mode. I
> have some experience with Zotero, but at the moment I'm using
> org-ref[1] to manage my citations.

This really means you are using bibtex as the reference database
backend, and probably helm-bibtex to insert citations as org-mode links.
Bibliography generation is probably managed by LaTeX/Bibtex. org-ref
mostly provides the functionality of the links, and some utility
functions to help build and maintain bibtex files, and to check all the
cross-reference, label and citation links in the document.

Ideally, something like zotxt would enable Zotero as the reference
database backend with some selection mechanism (it could be helm, or
ido, or some other interface) to insert citation syntax, and at export
time also leverage Zotero to generate the bibliography. I guess this
would not often be used for export to LaTeX, since that would require
syncing with a BibTeX file. For quick and dirty work this would be fine,
but my experience is that converted BibTeX files are never clean enough
for production publication, and require a lot of manual cleaning of
capitalization, journal abbreviations, math, etc. It is OK to do it
once, but not every time you build a document!

A minimal, generalized interface that would probably work with org-ref
would look something like (* is some prefix name like org-cite-):

*-select-citations-from-db opens some selection interface with
  candidates from the db-backend ((org-)bibtex, Zotero, Mendeley, RIS,
  ...), allows user to select some citations with some filtering
  capability then inserts citation syntax into org file. This function
  should allow multiple selections, and allow appending to an existing
  citation. org-ref supports reftex and helm-bibtex for this, and ebib
  would also work.

*-export-citations This is the most difficult part. This code would have
  to do functionally what LaTeX/BibTeX accomplishes, probably with
  multiple passes to collect citations, get formatted bibliographic
  entries, replace citations in the text (e.g. with numbers, author-year,
  etc.), and insert the formatted bibliography in the right place. If
  Zotero can give the formatted entry for each citation, this might be a
  good start. It would be integrated into the export process. It might
  even be backend-agnostic if the formatting was just plain text or Org
  markup.

*-on-cite-click-func This is technically optional, but it is what makes
  org-ref so great. Figures out what citation was clicked on, and runs
  some action on that citation, e.g. open citation, find related
  articles in google scholar, open browser to citation url, open notes
  to the citation, etc... This is org-ref's most clever code, figuring
  out what was clicked on, and providing multiple actions on that
  citation key.

Most other stuff in org-ref covers cross-references, labels and equation
labels, or is very specific to BibTeX.
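
To make the multi-pass idea behind *-export-citations concrete, here is
a minimal sketch in Python.  It is not org-ref or org-cite code: the
[cite:@key; @key] syntax, the numeric style, and the format_entry
callback (standing in for whatever returns a formatted entry, e.g.
Zotero) are all assumptions chosen for illustration.

```python
import re

# Hypothetical citation syntax: [cite:@key1; @key2]
CITE_RE = re.compile(r"\[cite:([^\]]+)\]")


def _keys(match):
    """Split one citation into its bare keys."""
    return [part.strip().lstrip("@") for part in match.group(1).split(";")]


def export_citations(text, format_entry):
    """Replace citations with numbers and append a bibliography.

    format_entry maps a key to its formatted bibliography entry.
    """
    # Pass 1: collect keys in order of first appearance.
    order = []
    for match in CITE_RE.finditer(text):
        for key in _keys(match):
            if key not in order:
                order.append(key)
    numbers = {key: position + 1 for position, key in enumerate(order)}

    # Pass 2: replace each citation in the text with its number(s).
    def replace(match):
        return "[" + ", ".join(str(numbers[key]) for key in _keys(match)) + "]"

    body = CITE_RE.sub(replace, text)

    # Pass 3: build the bibliography; this sketch simply appends it at the end.
    entries = ["[%d] %s" % (numbers[key], format_entry(key)) for key in order]
    return body + "\n\nReferences\n" + "\n".join(entries)
```

A real implementation would let the document say where the bibliography
goes and would delegate the in-text form to the chosen style rather than
hard-coding numbers.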


> Richard Lawrence writes:
>
>> For the past few days, I've been looking more closely at using the
>> combination of Zotero [1] with Erik Hetzner's zotxt plugin [2] as a
>> means of processing citations when exporting to non-LaTeX backends.  I
>> am now thinking that this is probably our best option, but I'd like to
>> know what other people think before I sink a lot of work into it.
>>
>> Here are the reasons I think this is the best option:
>>
>> 1) It is really easy for users.

It is pretty easy to use zotero. I couldn't figure out zotxt though.

>
> I think ease of use is very important. Maybe some day we can talk with
> Zotero directly from Emacs, but installing a browser extension seems to
> be the easiest we can do for now.

It is possible to talk directly to the Zotero SQLite database, but you
need the browser extension or the standalone application to create it!

>> I know that many people (perhaps especially the `power users' who have
>> been active in the citations discussion so far) prefer to maintain their
>> reference database without the aid of a GUI reference manager like
>> Zotero.

These are people who use BibTeX, I think. I don't think it's practical to
directly edit the SQLite database ;)

> Maybe two-way sync between BibTeX and Zotero would be sufficient for
> power users? I know there are already automatic exporters to BibTeX for
> Zotero, but some work needs to be done to make it a two-way sync.

I am personally skeptical of this, having "translated" Endnote libraries
to Bibtex and back. The simple stuff mostly works, but accented names,
chemical formulas, and many other things tend to not translate very well.

> I don't mind the GUI for editing the bibliography database, especially
> when the Zotero Translators [2] do such a good job of scraping
> citation metadata directly from web pages.

You should try doi-utils-add-bibtex-entry-from-doi in org-ref ;) It does
a lot of those things too! Maybe I will get some good ideas there ;)

> I wouldn't like to use the Zotero GUI for bibliography notes. I think
> the best feature of org-ref is that the bibliography notes are stored in
> an org-mode file, my preferred note-taking tool. Will this be supported
> with the new citation processing you are working on?

This should be independent of the citation processing, and is only
related to the functions available when you click on a citation. Notes
can be accessed via helm-bibtex too.

>
> Best regards,
> Martin Yrjölä
>
> [1] https://github.com/jkitchin/org-ref
> [2] https://www.zotero.org/support/dev/translators

--
Professor John Kitchin
Doherty Hall A207F
Department of Chemical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
412-268-7803
@johnkitchin
http://kitchingroup.cheme.cmu.edu


* Re: Citation processing via Zotero + zotxt
  2015-12-01  0:32   ` John Kitchin
@ 2015-12-01  2:59     ` Richard Lawrence
  2015-12-03  0:27       ` Matt Lundin
  2015-12-01  9:34     ` Ramon Diaz-Uriarte
  1 sibling, 1 reply; 37+ messages in thread
From: Richard Lawrence @ 2015-12-01  2:59 UTC (permalink / raw)
  To: John Kitchin, Martin Yrjölä; +Cc: emacs-orgmode

Hi John and all,

Thanks for your input!

John Kitchin <jkitchin@andrew.cmu.edu> writes:

> This really means you are using bibtex as the reference database
> backend, and probably helm-bibtex to insert citations as org-mode links.
> Bibliography generation is probably managed by LaTeX/Bibtex. org-ref
> mostly provides the functionality of the links, and some utility
> functions to help build and maintain bibtex files, and to check all the
> cross-reference, label and citation links in the document.
>
> Ideally, something like zotxt would enable Zotero as the reference
> database backend with some selection mechanism (it could be helm, or
> ido, or some other interface) to insert citation syntax

This is pretty much what the zotxt-emacs package does.  It communicates
with the zotxt API, and it defines a minor mode for use with Org.  This
minor mode provides functions for querying the Zotero database (by
default, it searches by author, title and year) and for inserting and
updating Org links that point to items in the Zotero database.

It should be straightforward to extend these capabilities to insert
citation syntax instead of, or in addition to, links.

> , and at export time also leverage Zotero to generate the
> bibliography. I guess this would not often be used for export to LaTeX,
> since that would require syncing with a BibTeX file. For quick and
> dirty work this would be fine, but my experience is that converted BibTeX
> files are never clean enough for production publication, and require a
> lot of manual cleaning of capitalization, journal abbreviations, math,
> etc. It is OK to do it once, but not every time you build a
> document!

Hmm, that is interesting and a bit concerning.  Zotero has the ability
to export sets of items to a BibTeX file.  Ideally this kind of cleanup
should, and in principle could, be dealt with `upstream' in the Zotero
database.  The CSL representation of items allows for pretty
fine-grained and explicit control over things like capitalization and
abbreviations.  But I do not know how much of that is exposed in Zotero,
and I don't know how well it is preserved upon export to BibTeX.

Certainly, I'd want export to LaTeX to be just as quick and painless as
export to other backends in general.  Some manual cleanup of the .bib
file seems acceptable to me as a final step before publication, but I
think it would count against Zotero if that cleanup were too annoying.

Zotero can also import BibTeX, so it would also be possible in principle
to go in the other direction: keep your database as .bib, but convert it
on the fly when exporting to other backends.

> A minimal, generalized interface that would probably work with org-ref
> would look something like (* is some prefix name like org-cite-):
>
> *-select-citations-from-db opens some selection interface with
>   candidates from the db-backend ((org-)bibtex, Zotero, Mendeley, RIS,
>   ...), allows user to select some citations with some filtering
>   capability then inserts citation syntax into org file. This function
>   should allow multiple selections, and allow appending to an existing
>   citation. org-ref supports reftex and helm-bibtex for this, and ebib
>   would also work.

As I mentioned above, zotxt-emacs already provides an API in this direction.

> *-export-citations This is the most difficult part. This code would have
>   to do functionally what LaTeX/BibTeX accomplishes, probably with
>   multiple passes to collect citations, get formatted bibliographic
>   entries, replace citations in the text (e.g. with numbers, author-year,
>   etc.), and insert the formatted bibliography in the right place. If
>   Zotero can give the formatted entry for each citation, this might be a
>   good start. It would be integrated into the export process. It might
>   even be backend-agnostic if the formatting was just plain text or Org
>   markup.

Yep, this is the hard one, but there is some more-or-less working code
to do this (via my org-citeproc script) in the in-progress org-cite.el.
I'm proposing reworking this code to communicate with Zotero, rather
than org-citeproc or another citeproc implementation.

> *-on-cite-click-func This is technically optional, but it is what makes
>   org-ref so great. Figures out what citation was clicked on, and runs
>   some action on that citation, e.g. open citation, find related
>   articles in google scholar, open browser to citation url, open notes
>   to the citation, etc... This is org-ref's most clever code, figuring
>   out what was clicked on, and providing multiple actions on that
>   citation key.

This needs more work, in org-element.  Right now, as far as I know,
there's no support for making citation-reference objects clickable.  On
the other hand, it doesn't seem like it should be that difficult.

> It is pretty easy to use zotero. I couldn't figure out zotxt though.

Hmm, where did you have trouble?  I was able to install it via this link

https://addons.mozilla.org/en-US/firefox/addon/zotxt/

and that was pretty much all there was to do.  (Figuring out how to use
the API it exposes is another matter, but that's programming work, not
something an ordinary user would have to do.)

>> Maybe two-way sync between BibTeX and Zotero would be sufficient for
>> power users? I know there are already automatic exporters to BibTeX for
>> Zotero, but some work needs to be done to make it a two-way sync.
>
> I am personally skeptical of this, having "translated" Endnote libraries
> to Bibtex and back. The simple stuff mostly works, but accented names,
> chemical formulas, and many other things tend to not translate very well.

Me too.  I think two-way sync is probably too annoying to try to get
right, and there isn't much to be gained by it.  I think instead we
should just support keeping reference information in a single
database---even if we "bless" several different kinds of database
(Zotero, org-bibtex, .bib)---and then provide a way to export citations
that point to items in that database for all relevant export backends.

Eric's question is relevant here:

Eric S Fraga <e.fraga@ucl.ac.uk> writes:

> 2. How would I use this starting from an org-bibtex database (which I
>    typically export to bibtex)?

I can envision a couple of possibilities.  One simple option would be to
switch to managing your reference database with Zotero, by exporting
from org-bibtex to .bib, and then importing the .bib into Zotero.

I don't want to force that on anyone, though.  Another option is to
use org-bibtex to produce .bib at export time, and then use Zotero
to read the .bib and process citations when exporting to non-LaTeX formats.

This second option is more work, as I don't know of any API for loading
items into Zotero's citation processor in BibTeX format.  But given that
Zotero is able to import .bib files, I imagine this API would not be too
much work to build.

So, the short answer is: I think you should be able to continue to use
org-bibtex as your reference database, and gain citation support when
exporting to non-LaTeX backends, perhaps with as little effort as
installing Zotero + zotxt and writing

#+BIBDB: my-org-bibtex-file.org

in your document.  Anyway, that would be my goal.
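
As a rough illustration of how an exporter might act on such a keyword,
here is a small Python sketch.  It finds #+BIBDB: lines and guesses the
kind of database from the file extension.  The backend labels and the
extension mapping are hypothetical; nothing here is existing Org or
org-cite behavior.

```python
import os
import re

# Match "#+BIBDB: path" on a line of its own (keyword is case-insensitive).
BIBDB_RE = re.compile(r"^#\+BIBDB:[ \t]*(\S+)[ \t]*$", re.IGNORECASE | re.MULTILINE)

# Hypothetical mapping from file extension to a database kind.
BACKENDS = {".org": "org-bibtex", ".bib": "bibtex"}


def find_databases(org_text):
    """Return (path, backend) pairs for each #+BIBDB keyword in the text."""
    result = []
    for path in BIBDB_RE.findall(org_text):
        extension = os.path.splitext(path)[1].lower()
        result.append((path, BACKENDS.get(extension, "unknown")))
    return result
```

With the line above in a document, find_databases would report
my-org-bibtex-file.org as an org-bibtex database, and the exporter could
then convert it to .bib before handing the citations to Zotero.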
 
Best,
Richard


* Re: Citation processing via Zotero + zotxt
  2015-12-01  0:32   ` John Kitchin
  2015-12-01  2:59     ` Richard Lawrence
@ 2015-12-01  9:34     ` Ramon Diaz-Uriarte
  1 sibling, 0 replies; 37+ messages in thread
From: Ramon Diaz-Uriarte @ 2015-12-01  9:34 UTC (permalink / raw)
  To: John Kitchin; +Cc: emacs-orgmode, Martin Yrjölä




On Tue, 01-12-2015, at 01:32, John Kitchin <jkitchin@andrew.cmu.edu> wrote:
> Martin Yrjölä writes:
>
>> Maybe two-way sync between BibTeX and Zotero would be sufficient for
>> power users? I know there are already automatic exporters to BibTeX for
>> Zotero, but some work needs to be done to make it a two-way sync.
>
> I am personally skeptical of this, having "translated" Endnote libraries
> to Bibtex and back. The simple stuff mostly works, but accented names,
> chemical formulas, and many other things tend to not translate very well.


In addition, Zotero users often organize their references using
collections, tags, and "related" items (I find the first two essential, both
with Zotero itself and when using Zotero on a tablet). Some of these (e.g.,
collections, tags) might be synced/translated from BibTeX into Zotero,
though not always reliably. And trying to get these to work reliably is
probably too much work.


Best,

R.


-- 
Ramon Diaz-Uriarte
Department of Biochemistry, Lab B-25
Facultad de Medicina
Universidad Autónoma de Madrid 
Arzobispo Morcillo, 4
28029 Madrid
Spain

Phone: +34-91-497-2412

Email: rdiaz02@gmail.com
       ramon.diaz@iib.uam.es

http://ligarto.org/rdiaz


* Fwd:  Citation processing via Zotero + zotxt
       [not found] ` <CAN_Dec-JCmwQ2bc=U93rRaPMYyXOiN_PRbtyya9x396r3qqMrg@mail.gmail.com>
@ 2015-12-01 12:12   ` Matt Price
  2015-12-01 15:28     ` Eric S Fraga
  2015-12-01 21:51     ` Matt Lundin
  0 siblings, 2 replies; 37+ messages in thread
From: Matt Price @ 2015-12-01 12:12 UTC (permalink / raw)
  To: Org Mode


On Sat, Nov 28, 2015 at 3:16 PM, Richard Lawrence <
richard.lawrence@berkeley.edu> wrote:

> Hi everyone,
>
> For the past few days, I've been looking more closely at using the
> combination of Zotero [1] with Erik Hetzner's zotxt plugin [2] as a
> means of processing citations when exporting to non-LaTeX backends.  I
> am now thinking that this is probably our best option, but I'd like to
> know what other people think before I sink a lot of work into it.
>
> Here are the reasons I think this is the best option:
>
> 1) It is really easy for users.
>
> For those unfamiliar, Zotero is a reference manager, and zotxt is a
> Zotero plugin that makes it easier to work with Zotero from plain text
> documents.  Both are Firefox plugins, which means they can be installed
> by a non-technical user with a couple of clicks.  It also means that
> users get updates automatically.  I think this is *really* important.
>
> Pretty much all the other options we have talked about seem like they
> will require multi-step, non-trivial installation procedures ("First
> install {Node.js/Haskell/JVM ...}, then install
> {citeproc-node/pandoc-citeproc/citeproc-java...}, then install our
> wrapper script...").  Updating could require other manual operations of
> similar complexity.  Avoiding that kind of procedure will make citations
> a lot more usable from Org for everyone.
>
> Also, unlike the other options, Zotero is a full-featured reference
> manager, not just a batch processor.  So we as users get a useful piece
> of software with a simple installation procedure; the other options
> require a complex installation procedure for a less-useful program.
>
> 2) It is quite complete.
>
> Previously, I thought that it would be a similar amount of work to
> communicate with Zotero from Emacs as any of the other CSL
> implementations out there.  However, after looking at zotxt a bit more
> closely, I discovered that it has an (undocumented) API endpoint [3]
> that pretty much does exactly what we need: it accepts a list of
> citation objects, and returns a list of formatted citations and a
> formatted bibliography, which can be inserted into the exported
> document.
>
> This endpoint still needs a little bit of work, to generalize it and
> make it easier to get the data in the format we need.  (That is probably
> why it is undocumented in the README.) But it requires much less work
> than I thought it would, and much less work than it would be to get a
> full-featured setup with something like citeproc-node.
>
> Erik has also written a package for communicating with zotxt from Emacs,
> zotxt-emacs [4], which is available on MELPA.  This package already
> contains a lot of useful functions for querying the Zotero database and
> inserting reference data into documents, including links in Org
> documents.  I think it would be pretty straightforward to extend this
> package to provide a nice UI for writers who are inserting citations
> into Org documents, including search-based lookups of keys, etc.
> Perhaps org-ref could also be taught to communicate with zotxt (with or
> without zotxt-emacs) without too much work.
>
> 3) It uses citeproc-js.
>
> In previous discussions, I think we agreed that it would be best for us
> to use citeproc-js as a CSL processor, since it is the `canonical' CSL
> implementation, as opposed to pandoc-citeproc or citeproc-java.  Zotero
> just uses citeproc-js internally to process citations, so it meets this
> requirement.
>
>
> I know that many people (perhaps especially the `power users' who have
> been active in the citations discussion so far) prefer to maintain their
> reference database without the aid of a GUI reference manager like
> Zotero.  I still think Zotero + zotxt is the best option for non-LaTeX
> citation processing, even for these folks.  The ease of installation
> (and removal) of the required programs alone makes it worth it, even if
> you never actually populate a Zotero database.  So given what I know at
> the moment, I think our efforts would best be directed at making the
> in-progress org-cite library communicate with Zotero via zotxt.  What do
> you think?
>
> Best,
> Richard
>
> [1] https://www.zotero.org/
> [2] https://gitlab.com/egh/zotxt/
> [3] See the bibliographyEndpoint function in:
>     https://gitlab.com/egh/zotxt/blob/master/extension/bootstrap.js
> [4] https://gitlab.com/egh/zotxt-emacs
>
I love zotxt. My only concern is for those people who run a remote Emacs.
If, say, you run Emacs on a server that you access by ssh, it will not be
possible to process your citations with Org.

I personally don't use this method, though I have occasionally had to work
off of a server via some random computer when my laptop breaks down.  For
me, this is a very rare case. Is there anyone on the list who might
actually be negatively affected by this?



* Re: Citation processing via Zotero + zotxt
  2015-11-28 20:16 Citation processing via Zotero + zotxt Richard Lawrence
                   ` (2 preceding siblings ...)
       [not found] ` <CAN_Dec-JCmwQ2bc=U93rRaPMYyXOiN_PRbtyya9x396r3qqMrg@mail.gmail.com>
@ 2015-12-01 14:36 ` Rasmus
  2015-12-01 19:37   ` Matt Price
  2015-12-02 15:58   ` Richard Lawrence
  3 siblings, 2 replies; 37+ messages in thread
From: Rasmus @ 2015-12-01 14:36 UTC (permalink / raw)
  To: emacs-orgmode

Hi Richard,

First, thank you for looking into this.  I learned something new from this
post.

Richard Lawrence <richard.lawrence@berkeley.edu> writes:

> 1) It is really easy for users.

I generally don’t buy the argument that a reference manager is nice per
se.

Also, last I checked Zotero also existed as a standalone manager.  If this
also works re your points below, using Zotero is a non-issue.  There
exists a plugin to keep a bibtex file up to date.  I don’t know if it’s
two way, though.

> For those unfamiliar, Zotero is a reference manager, and zotxt is a
> Zotero plugin that makes it easier to work with Zotero from plain text
> documents.  Both are Firefox plugins, which means they can be installed
> by a non-technical user with a couple of clicks.  It also means that
> users get updates automatically.  I think this is *really* important.

I don’t think this is relevant.

> Pretty much all the other options we have talked about seem like they
> will require multi-step, non-trivial installation procedures ("First
> install {Node.js/Haskell/JVM ...}, then install
> {citeproc-node/pandoc-citeproc/citeproc-java...}, then install our
> wrapper script...").  Updating could require other manual operations of
> similar complexity.  Avoiding that kind of procedure will make citations
> a lot more usable from Org for everyone.

I think this is *very* important.

> Also, unlike the other options, Zotero is a full-featured reference
> manager, not just a batch processor.  So we as users get a useful piece
> of software with a simple installation procedure; the other options
> require a complex installation procedure for a less-useful program.

I don’t think this is relevant.

> 2) It is quite complete.
>
> Previously, I thought that it would be a similar amount of work to
> communicate with Zotero from Emacs as any of the other CSL
> implementations out there.  However, after looking at zotxt a bit more
> closely, I discovered that it has an (undocumented) API endpoint [3]

This sounds amazing, but also dangerous.  Do you know whether stabilizing
the API has been discussed upstream?

> that pretty much does exactly what we need: it accepts a list of
> citation objects, and returns a list of formatted citations and a
> formatted bibliography, which can be inserted into the exported
> document.

Could you give an example of the sort of input you give?  I.e. is it based
on keys?  From my bibtex-centric world view I imagine something like:

   I send key or pointer @K to a DB entry as well as a CSL-file pointer C,
   and maybe a desired output format F.  I get a string back that is the
   formatting of the data behind @K formatted according to the rules in C,
   adapted to F.

Is that correct?  If so, does it support html, text and odt?

> This endpoint still needs a little bit of work, to generalize it and
> make it easier to get the data in the format we need.  (That is probably
> why it is undocumented in the README.) But it requires much less work
> than I thought it would, and much less work than it would be to get a
> full-featured setup with something like citeproc-node.

This is a very strong argument.

At some point Matt talked about adding support for org citation syntax in
citeproc-js.  Would this be advantageous if going this route?  I guess not.

> Erik has also written a package for communicating with zotxt from Emacs,
> zotxt-emacs [4], which is available on MELPA.  This package already
> contains a lot of useful functions for querying the Zotero database and
> inserting reference data into documents, including links in Org
> documents.  I think it would be pretty straightforward to extend this
> package to provide a nice UI for writers who are inserting citations
> into Org documents, including search-based lookups of keys, etc.
> Perhaps org-ref could also be taught to communicate with zotxt (with or
> without zotxt-emacs) without too much work.

I guess we’d need to convince Erik to move it to GELPA.  Fortunately (from
that point of view), neither zotxt nor zotxt-emacs seems to have too many
contributors.

> 3) It uses citeproc-js.
>
> In previous discussions, I think we agreed that it would be best for us
> to use citeproc-js as a CSL processor, since it is the `canonical' CSL
> implementation, as opposed to pandoc-citeproc or citeproc-java.  Zotero
> just uses citeproc-js internally to process citations, so it meets this
> requirement. 

Indeed.

> I know that many people (perhaps especially the `power users' who have
> been active in the citations discussion so far) prefer to maintain their
> reference database without the aid of a GUI reference manager like
> Zotero.

I think this solves part of that issue.  Again it could be a dangerous
solution.  Alternatively, perhaps it’s possible to feed-in a bibtex file
to Zotero via an API.  This would also limit the damage.

IMO we can leverage zotero as a tool, but we cannot enforce it as a
bibliography manager.

> I still think Zotero + zotxt is the best option for non-LaTeX
> citation processing, even for these folks.  The ease of installation
> (and removal) of the required programs alone makes it worth it, even if
> you never actually populate a Zotero database.  So given what I know at
> the moment, I think our efforts would best be directed at making the
> in-progress org-cite library communicate with Zotero via zotxt.  What do
> you think?

+1, though re zotxt we should make sure Erik would want to move it to
GELPA.

Rasmus

-- 
The right to be left alone is a human right

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Fwd:  Citation processing via Zotero + zotxt
  2015-12-01 12:12   ` Fwd: " Matt Price
@ 2015-12-01 15:28     ` Eric S Fraga
  2015-12-01 15:51       ` Richard Lawrence
  2015-12-01 21:51     ` Matt Lundin
  1 sibling, 1 reply; 37+ messages in thread
From: Eric S Fraga @ 2015-12-01 15:28 UTC (permalink / raw)
  To: Matt Price; +Cc: Org Mode

On Tuesday,  1 Dec 2015 at 07:12, Matt Price wrote:
> I love Zotxt. my only concern is for those people who run a remote Emacs.
> If, say, you run emacs on a server that you access by ssh, it will not be
> possible to process your citations with org.

I'm not sure I understand this.

Is the implication that one will need to be connected to the Internet to
be able to export citations?  If so, this is not good.  I do much, often
most, of my writing while off-line (long train journeys every day).  It
would be quite intrusive to my writing habits to have to ensure that I
can connect to prepare actual exported documents.

For most, this may not be an issue so obviously my concern should not
derail any progress etc. but I am probably not alone...  Clarification,
in any case, would be welcome.

thanks,
eric
-- 
: Eric S Fraga (0xFFFCF67D), Emacs 25.0.50.2, Org release_8.3.2-355-g18f083

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Fwd:  Citation processing via Zotero + zotxt
  2015-12-01 15:28     ` Eric S Fraga
@ 2015-12-01 15:51       ` Richard Lawrence
  2015-12-01 19:20         ` Matt Price
  2015-12-01 21:34         ` Eric S Fraga
  0 siblings, 2 replies; 37+ messages in thread
From: Richard Lawrence @ 2015-12-01 15:51 UTC (permalink / raw)
  To: Eric S Fraga, Matt Price; +Cc: Org Mode

Hi Eric and all,

Eric S Fraga <e.fraga@ucl.ac.uk> writes:

> On Tuesday,  1 Dec 2015 at 07:12, Matt Price wrote:
>> I love Zotxt. my only concern is for those people who run a remote Emacs.
>> If, say, you run emacs on a server that you access by ssh, it will not be
>> possible to process your citations with org.
>
> I'm not sure I understand this.
>
> Is the implication that one will need to be connected to the Internet to
> be able to export citations?  If so, this is not good.

No, you should not need to be connected to the Internet.  As I envision
things, Emacs will just speak to the Zotero installation on the local
machine.  That's what zotxt does now.  So, there should be no problem as
long as Firefox, Zotero and zotxt are running on the same machine that
is running Emacs.

I suppose the concern is that someone running a remote Emacs is likely
doing so on a machine that does not have a GUI...and hence may not have
Firefox installed, and consequently won't have Zotero or zotxt.  In that
case, citation processing would have to go another route.
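
A client could detect this situation before attempting an export.  The
following is only a minimal sketch: it assumes that zotxt is served by
Zotero's local HTTP server on 127.0.0.1, port 23119 (the port and the
helper name are assumptions, not something stated in this thread), and
it only checks that something accepts connections there.

```python
import socket


def local_server_reachable(host="127.0.0.1", port=23119, timeout=0.5):
    """Return True if something accepts TCP connections on host:port.

    The default port is an assumption about where Zotero's local
    server listens; adjust it for the installation at hand.
    """
    try:
        # create_connection raises OSError (refused, timed out, ...)
        # when nothing is listening at the given address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If the check fails (for example on a headless server without Firefox),
the exporter could fall back to another route or report a clear error.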

Are there a lot of people in that situation?
 
Best,
Richard

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Fwd: Citation processing via Zotero + zotxt
  2015-12-01 15:51       ` Richard Lawrence
@ 2015-12-01 19:20         ` Matt Price
  2015-12-01 21:34         ` Eric S Fraga
  1 sibling, 0 replies; 37+ messages in thread
From: Matt Price @ 2015-12-01 19:20 UTC (permalink / raw)
  To: Org Mode

On Tue, Dec 1, 2015 at 10:51 AM, Richard Lawrence <
richard.lawrence@berkeley.edu> wrote:

> Hi Eric and all,
>
> Eric S Fraga <e.fraga@ucl.ac.uk> writes:
>
> > On Tuesday,  1 Dec 2015 at 07:12, Matt Price wrote:
> >> I love Zotxt. my only concern is for those people who run a remote
> Emacs.
> >> If, say, you run emacs on a server that you access by ssh, it will not
> be
> >> possible to process your citations with org.
> >
> > I'm not sure I understand this.
> >
> > Is the implication that one will need to be connected to the Internet to
> > be able to export citations?  If so, this is not good.
>
> No, you should not need to be connected to the Internet.  As I envision
> things, Emacs will just speak to the Zotero installation on the local
> machine.  That's what zotxt does now.  So, there should be no problem as
> long as Firefox, Zotero and zotxt are running on the same machine that
> is running Emacs.
>
> I suppose the concern is that someone running a remote Emacs is likely
> doing so on a machine that does not have a GUI...and hence may not have
> Firefox installed, and consequently won't have Zotero or zotxt.  In that
> case, citation processing would have to go another route.
>

This is the scenario I was envisioning, thank you both for clarifying.

>
> Are there a lot of people in that situation?
>

... and this is precisely the question I meant to ask.

>
> Best,
> Richard
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Citation processing via Zotero + zotxt
  2015-12-01 14:36 ` Rasmus
@ 2015-12-01 19:37   ` Matt Price
  2015-12-02 15:58   ` Richard Lawrence
  1 sibling, 0 replies; 37+ messages in thread
From: Matt Price @ 2015-12-01 19:37 UTC (permalink / raw)
  To: Rasmus; +Cc: Org Mode

On Tue, Dec 1, 2015 at 9:36 AM, Rasmus <rasmus@gmx.us> wrote:

> Hi Richard,
>
> First, thank you for looking into this.  I learned something new from this
>
> > Pretty much all the other options we have talked about seem like they
> > will require multi-step, non-trivial installation procedures ("First
> > install {Node.js/Haskell/JVM ...}, then install
> > {citeproc-node/pandoc-citeproc/citeproc-java...}, then install our
> > wrapper script...").  Updating could require other manual operations of
> > similar complexity.  Avoiding that kind of procedure will make citations
> > a lot more usable from Org for everyone.
>
> I think this is *very* important.
>
I totally agree.



> > 2) It is quite complete.
> >
> > Previously, I thought that it would be a similar amount of work to
> > communicate with Zotero from Emacs as any of the other CSL
> > implementations out there.  However, after looking at zotxt a bit more
> > closely, I discovered that it has an (undocumented) API endpoint [3]
>
> This sounds amazing, but also dangerous.  Do you know whether stabilizing
> the API has been discussed upstream?
>

I think the API Richard is referring to is *zotxt's* API, not Zotero's. So
"upstream" is a very short distance to an underground spring under our
house (Erik Hetzner).
The somewhat more widely used Better BibTeX plugin also provides an API
to the Zotero database (https://zotplus.github.io/better-bibtex/cayw.html).
In either case, it probably would be relatively easy to provide patches to
the maintainer if we run into trouble.


>
> > that pretty much does exactly what we need: it accepts a list of
> > citation objects, and returns a list of formatted citations and a
> > formatted bibliography, which can be inserted into the exported
> > document.
>
> Could you give an example of the sort of input you give?  I.e. is it based
> on keys?  From my bibtex-centric world view I imagine something like:
>
>    I send key or pointer @K to a DB entry as well as a CSL-file pointer C,
>    and maybe a desired output format F.  I get a string back that is the
>    formatting of the data behind @K formatted according to the rules in C,
>    adapted to F.
>
> Is that correct?  If so, does it support html, text and odt?
>
Right now, IIUC, zotxt accepts only a *style name*, not a CSL file -- it
will locate the CSL file in the Zotero style list.  It supports html and
text output formats, as well as the QuickKey syntax used by the ODF-scan
Zotero plugin (https://github.com/Zotero-ODF-Scan/zotero-odf-scan). My
understanding is that providing fully-formed odt syntax is difficult,
because of the structure of the odt file, which I guess wants a bunch of
metadata that isn't trivial to provide.  The recommended path right now is
to run ODF-scan on the odt from LibreOffice -- it's an annoying extra step
that I was hoping to be able to avoid, but am not competent to program:

https://forums.zotero.org/discussion/29308/7/rtfodf-scan-for-zotero/#Comment_226799

> > This endpoint still needs a little bit of work, to generalize it and
> > make it easier to get the data in the format we need.  (That is probably
> > why it is undocumented in the README.) But it requires much less work
> > than I thought it would, and much less work than it would be to get a
> > full-featured setup with something like citeproc-node.
>
> This is a very strong argument.
>
> At some point Matt talked about adding support for org citation syntax in
> citeproc-js.  Would this be advantageous if going this route?  I guess not.
>

Depends on whether you want to be able to request org-mode syntax from
Zotero. Zotero uses citeproc-js internally; changes we make to citeproc-js
will eventually percolate up to Zotero, and it's not impossible to replace
Zotero's citeproc with one's own copy (even I can do it).

>
> >
>


> IMO we can leverage zotero as a tool, but we cannot enforce it as a
> bibliography manager.
>

yes

>
> > I still think Zotero + zotxt is the best option for non-LaTeX
> > citation processing, even for these folks.  The ease of installation
> > (and removal) of the required programs alone makes it worth it, even if
> > you never actually populate a Zotero database.  So given what I know at
> > the moment, I think our efforts would best be directed at making the
> > in-progress org-cite library communicate with Zotero via zotxt.  What do
> > you think?
>
> +1, though re zotxt we should make sure Erik would want to move it to
> GELPA.
>

Basically I'm enthusiastic and glad you are taking up the challenge, since
matt's programming:snail's pace :: snail's pace:leopard run

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Fwd:  Citation processing via Zotero + zotxt
  2015-12-01 15:51       ` Richard Lawrence
  2015-12-01 19:20         ` Matt Price
@ 2015-12-01 21:34         ` Eric S Fraga
  1 sibling, 0 replies; 37+ messages in thread
From: Eric S Fraga @ 2015-12-01 21:34 UTC (permalink / raw)
  To: Richard Lawrence; +Cc: Org Mode

On Tuesday,  1 Dec 2015 at 07:51, Richard Lawrence wrote:

[...]

> No, you should not need to be connected to the Internet.  As I envision
> things, Emacs will just speak to the Zotero installation on the local
> machine.  That's what zotxt does now.

Ah, okay.  Thanks.

> So, there should be no problem as long as Firefox, Zotero and zotxt
> are running on the same machine that is running Emacs.

I guess the only concern I would have is the impact of Firefox on the
system.  I do use my wee Pandora for writing, albeit slowly, and emacs
already essentially fills the small amount of memory (RAM) the system
has.  Running Firefox simultaneously hits the system hard.  It's not
exactly a sleek program...

But this is definitely an outlier in terms of use cases so no worries!

(and I hope to upgrade to a Pyra when they come out "real soon now" ;-)

Thanks again,
eric

-- 
: Eric S Fraga (0xFFFCF67D), Emacs 25.0.50.2, Org release_8.3.2-355-g18f083

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Fwd:  Citation processing via Zotero + zotxt
  2015-12-01 12:12   ` Fwd: " Matt Price
  2015-12-01 15:28     ` Eric S Fraga
@ 2015-12-01 21:51     ` Matt Lundin
  2015-12-02 15:21       ` Richard Lawrence
  1 sibling, 1 reply; 37+ messages in thread
From: Matt Lundin @ 2015-12-01 21:51 UTC (permalink / raw)
  To: Matt Price; +Cc: Org Mode

Matt Price <moptop99@gmail.com> writes:

> On Sat, Nov 28, 2015 at 3:16 PM, Richard Lawrence
> <richard.lawrence@berkeley.edu> wrote:

>     I know that many people (perhaps especially the `power users' who
>     have been active in the citations discussion so far) prefer to
>     maintain their reference database without the aid of a GUI
>     reference manager like Zotero. I still think Zotero + zotxt is the
>     best option for non-LaTeX citation processing, even for these
>     folks. The ease of installation (and removal) of the required
>     programs alone makes it worth it, even if you never actually
>     populate a Zotero database. So given what I know at the moment, I
>     think our efforts would best be directed at making the in-progress
>     org-cite library communicate with Zotero via zotxt. What do you
>     think?
>
>     Best,
>     Richard
>
>     [1] https://www.zotero.org/
>     [2] https://gitlab.com/egh/zotxt/
>     [3] See the bibliographyEndpoint function in:
>     https://gitlab.com/egh/zotxt/blob/master/extension/bootstrap.js
>     [4] https://gitlab.com/egh/zotxt-emacs
>
> I love Zotxt. my only concern is for those people who run a remote
> Emacs. If, say, you run emacs on a server that you access by ssh, it
> will not be possible to process your citations with org. 
>
> I personally don't use this method, though I have occasionally had to
> work off of a server via some random computer when my laptop breaks
> down. For me, this is a very rare case. Is there anyone on the list
> who might actually be negatively affected by this? 

I do have a remote emacs installation that I use occasionally, but not
enough to make this a concern.

One question (based on complete ignorance of either zotero or zotxt) is
whether those of us who maintain bibtex databases solely in emacs would
have to interact with the zotero GUI. Let's say I have a bib file and an
org document containing citations. Would everything take place transparently
in the background without ever having to interact with zotero? Can
zotero query a bib file without requiring the user manually to
import/export the database?

Thanks,
Matt

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Fwd:  Citation processing via Zotero + zotxt
  2015-12-01 21:51     ` Matt Lundin
@ 2015-12-02 15:21       ` Richard Lawrence
  0 siblings, 0 replies; 37+ messages in thread
From: Richard Lawrence @ 2015-12-02 15:21 UTC (permalink / raw)
  To: Matt Lundin, Matt Price; +Cc: Org Mode

Hi Matt and all,

Matt Lundin <mdl@imapmail.org> writes:

> One question (based on complete ignorance of either zotero or zotxt) is
> whether those of us who maintain bibtex databases solely in emacs would
> have to interact with the zotero GUI.

My goal would be that you wouldn't need to interact with it at all.
(You would need to have Firefox running in the background, but that's it
-- no need to actually interact with the window as a user.)

> Let's say I have a bib file and an org document containing
> citations. Would everything take place transparently in the background
> without ever having to interact with zotero? Can zotero query a bib
> file without requiring the user manually to import/export the
> database?

I do not know of a way to do this at the moment.  But it may already
exist somewhere, and if it doesn't, I don't think it will be too hard to
build.  (If anyone knows better, please let me know!)
 
Best,
Richard

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Citation processing via Zotero + zotxt
  2015-12-01 14:36 ` Rasmus
  2015-12-01 19:37   ` Matt Price
@ 2015-12-02 15:58   ` Richard Lawrence
  1 sibling, 0 replies; 37+ messages in thread
From: Richard Lawrence @ 2015-12-02 15:58 UTC (permalink / raw)
  To: Rasmus, emacs-orgmode

Hi Rasmus and all,

Rasmus <rasmus@gmx.us> writes:

> Also, last I checked Zotero also existed as a standalone manager.  If this
> also works re your points below, using Zotero is a non-issue.  There
> exists a plugin to keep a bibtex file up to date.  I don’t know if it’s
> two way, though.

I haven't looked into whether things work the same way when Zotero is
installed as a standalone program, but I don't imagine they'd be
significantly different.  (What do you mean by "using Zotero is a
non-issue"?  Do you think installing as a standalone program would be
better, or did I misunderstand?)


>> Previously, I thought that it would be a similar amount of work to
>> communicate with Zotero from Emacs as any of the other CSL
>> implementations out there.  However, after looking at zotxt a bit more
>> closely, I discovered that it has an (undocumented) API endpoint [3]
>
> This sounds amazing, but also dangerous.  Do you know whether stabilizing
> the API has been discussed upstream?

As Matt mentioned, "upstream" in this case is zotxt, not Zotero.  I have
heard from Erik that he's open to changing and stabilizing this API.

>> that pretty much does exactly what we need: it accepts a list of
>> citation objects, and returns a list of formatted citations and a
>> formatted bibliography, which can be inserted into the exported
>> document.
>
> Could you give an example of the sort of input you give?  I.e. is it based
> on keys?  From my bibtex-centric world view I imagine something like:
>
>    I send key or pointer @K to a DB entry as well as a CSL-file pointer C,
>    and maybe a desired output format F.  I get a string back that is the
>    formatting of the data behind @K formatted according to the rules in C,
>    adapted to F.

Yes, that's correct, except that ideally you send the data for all
citations at once (because context is important).  So for example, if
Erik accepts a patch for some small changes I wrote, one can query

/zotxt/bibliography?outputFormat=html&style=chicago-fullnote-bibliography

sending POST data like

[
    { "citationItems": [{"key": "0_ZOTKEY1"}],
      "properties": {"noteIndex": 0}},
    { "citationItems": [{"key": "0_ZOTKEY2"}],
      "properties": {"noteIndex": 0}},
    ...
]

and get back data that basically looks like

{
  "bibliography": [
    {
      ...
      "bibstart": "<div class=\"csl-bib-body\">\n",
      "bibend": "</div>"
    },
    [
      "  <div class=\"csl-entry\">Formatted entry for 0_ZOTKEY1</div>\n",
      "  <div class=\"csl-entry\">Formatted entry for 0_ZOTKEY2</div>\n",
      ...
    ]
  ],
  "citationClusters": [
    "Formatted citation for 0_ZOTKEY1",
    "Formatted citation for 0_ZOTKEY2",
    ...
  ]
}
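
The exchange above could be driven from a client roughly as follows.
This is only a sketch under stated assumptions: the request and
response shapes are taken from the example above (which itself depends
on a patch that has not been accepted yet), the base address
http://127.0.0.1:23119 is an assumption, and all function names are
made up for illustration.  It builds the request and reassembles a
response, but does not perform the HTTP call itself.

```python
import json
from urllib.parse import urlencode


def bibliography_url(style, output_format="html",
                     base="http://127.0.0.1:23119"):
    """Build the endpoint URL; the base address is an assumption."""
    query = urlencode({"outputFormat": output_format, "style": style})
    return f"{base}/zotxt/bibliography?{query}"


def citation_payload(keys):
    """Return one citation object per key, in document order."""
    return [
        {"citationItems": [{"key": key}], "properties": {"noteIndex": 0}}
        for key in keys
    ]


def encoded_payload(keys):
    """Serialize the citation objects as a JSON body for the POST."""
    return json.dumps(citation_payload(keys)).encode("utf-8")


def assemble_bibliography(response):
    """Join the wrapper strings and the entries into one string."""
    params, entries = response["bibliography"]
    return params["bibstart"] + "".join(entries) + params["bibend"]
```

Sending all citations in one request, in document order, is what lets
the processor take context into account; the formatted citations then
come back in the same order in "citationClusters".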

> Is that correct?  If so, does it support html, text and odt?

At the moment it supports html and text.  I suppose it could be made to
support ODT, though I'm not sure how difficult it is.  However, I think
a better solution would actually be to have it return *Org* markup, and
then replace citation objects in the document with that.  The main
problem this solves concerns note-based styles: we can insert Org
footnotes into the document at the beginning of the export process, and
then let Org figure out how to number the total set of footnotes in its
usual way.  I think it's a lot harder to deal with citation footnotes if
they come already-formatted in the output format.
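
The footnote idea can be sketched as a plain text substitution.  The
example below makes several assumptions that are not settled in this
thread: the citation syntax "[cite:@KEY]" is hypothetical, the
formatted citations are assumed to arrive already rendered as Org
markup, and the function name is made up.  It replaces each citation
with an anonymous Org inline footnote, leaving the numbering to Org.

```python
import re

# Hypothetical citation syntax, used only for illustration: [cite:@KEY]
CITE_RE = re.compile(r"\[cite:@([A-Za-z0-9_]+)\]")


def citations_to_footnotes(org_text, formatted):
    """Replace each citation with an Org inline footnote.

    `formatted` maps citation keys to citation text already rendered
    as Org markup.  Citations with unknown keys are left untouched.
    """
    def replace(match):
        key = match.group(1)
        if key not in formatted:
            return match.group(0)
        # Anonymous inline footnote; Org numbers it during export.
        return f"[fn:: {formatted[key]}]"

    return CITE_RE.sub(replace, org_text)
```

Because the result is ordinary Org text, citation footnotes and
hand-written footnotes end up in a single numbering sequence,
whichever backend is used afterwards.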

>> Erik has also written a package for communicating with zotxt from Emacs,
>> zotxt-emacs [4], which is available on MELPA.  This package already
>> contains a lot of useful functions for querying the Zotero database and
>> inserting reference data into documents, including links in Org
>> documents.  I think it would be pretty straightforward to extend this
>> package to provide a nice UI for writers who are inserting citations
>> into Org documents, including search-based lookups of keys, etc.
>> Perhaps org-ref could also be taught to communicate with zotxt (with or
>> without zotxt-emacs) without too much work.
>
> I guess we’d need to convince Erik to move it to GELPA.  Fortunately (from
> that point of view), neither zotxt nor zotxt-emacs seems to have too many
> contributors.

Is the concern here that Org should not officially depend on anything
that isn't in GELPA?  Or just that it would make installing easier for
users?

>> I know that many people (perhaps especially the `power users' who have
>> been active in the citations discussion so far) prefer to maintain their
>> reference database without the aid of a GUI reference manager like
>> Zotero.
>
> I think this solves part of that issue.  Again it could be a dangerous
> solution.  Alternatively, perhaps it’s possible to feed-in a bibtex file
> to Zotero via an API.  This would also limit the damage.

I am hoping it will prove simple to feed a .bib file to Zotero via an
existing API, or to add such an API to zotxt.  But at the moment that's
just a hope.  (If anyone knows more about how to access a BibTeX
translator in a Zotero plugin, please get in touch!)

> IMO we can leverage zotero as a tool, but we cannot enforce it as a
> bibliography manager.

Agreed.  

Best,
Richard

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Citation processing via Zotero + zotxt
  2015-12-01  2:59     ` Richard Lawrence
@ 2015-12-03  0:27       ` Matt Lundin
  2015-12-03  1:41         ` Christian Wittern
                           ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Matt Lundin @ 2015-12-03  0:27 UTC (permalink / raw)
  To: Richard Lawrence; +Cc: emacs-orgmode, Martin Yrjölä, John Kitchin

Richard Lawrence <richard.lawrence@berkeley.edu> writes:

> Eric S Fraga <e.fraga@ucl.ac.uk> writes:
>
>> 2. How would I use this starting from an org-bibtex database (which I
>>    typically export to bibtex)?
>
> I can envision a couple of possibilities.  One simple option would be to
> switch to managing your reference database with Zotero, by exporting
> from org-bibtex to .bib, and then importing the .bib into Zotero.
>
> I don't want to force that on anyone, though.  Another option is to
> use org-bibtex to produce .bib at export time, and then use Zotero
> to read the .bib and process citations when exporting to non-LaTeX formats.
>
> This second option is more work, as I don't know of any API for loading
> items into Zotero's citation processor in BibTeX format.  But given that
> Zotero is able to import .bib files, I imagine this API would not be too
> much work to build.

Given these complexities, it seems that if we went the zotero route we
could end up with a fairly large installation chain (firefox, zotero,
zotxt, plugin for zotero). And this would require installing items from
multiple, heterogeneous sources.

I wonder at this point whether pandoc-citeproc (packaged with pandoc)
would actually be the simpler route. It can parse bibtex files directly
and (as a filter within pandoc) can output formatted citations in org
format.
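
As a rough sketch of what that route could look like, the helper below
builds a pandoc command line that runs pandoc-citeproc as a filter
over a .bib file and writes Org output.  The file names and function
names are made up for illustration, and the flags are those of the
pandoc releases that ship pandoc-citeproc as a separate filter (later
pandoc versions use --citeproc instead).  The command is only built
here, not executed.

```python
import shlex


def pandoc_org_command(source, bibliography, csl=None):
    """Build a pandoc command that resolves citations into Org output."""
    cmd = ["pandoc", source,
           "--filter", "pandoc-citeproc",
           "--bibliography", bibliography,
           "-t", "org"]
    if csl is not None:
        # Optional CSL style file; pandoc falls back to its default style.
        cmd += ["--csl", csl]
    return cmd


def as_shell(cmd):
    """Render the argument list as a copy-pasteable shell command."""
    return " ".join(shlex.quote(part) for part in cmd)
```

The argument list can be handed to a process runner directly, which
avoids shell quoting problems with unusual file names.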

As a GNU/Linux user, I would find installing zotero and all the add-ons
messier and more cumbersome than installing pandoc and/or node-js (were
we to use citeproc-js) from the command line.

Best,
Matt

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Citation processing via Zotero + zotxt
  2015-12-03  0:27       ` Matt Lundin
@ 2015-12-03  1:41         ` Christian Wittern
  2015-12-03 14:25           ` Matt Lundin
  2015-12-03  4:32         ` Richard Lawrence
  2015-12-03 23:31         ` John Kitchin
  2 siblings, 1 reply; 37+ messages in thread
From: Christian Wittern @ 2015-12-03  1:41 UTC (permalink / raw)
  To: emacs-orgmode

On 2015-12-03 8:27, Matt Lundin wrote:
> Given these complexities, it seems that if we went the zotero route we
> could end up with a fairly large installation chain (firefox, zotero,
> zotxt, plugin for zotero). And this would require installing items from
> multiple, heterogeneous sources.
I guess it depends on where you come from.  For those who already maintain
the bibliography in Zotero this is not much of an extra load.
>
> I wonder at this point whether pandoc-citeproc (packaged with pandoc)
> would actually be the simpler route. It can parse bibtex files directly
> and (as a filter within pandoc) can output formatted citations in org
> format.
This sounds like those maintaining a BibTeX bibliography would be better
served here.  That also means that the Zotero route would not have to worry
about importing a bibtex file into Zotero behind the scenes.
>
> As a GNU/Linux user, I would find installing zotero and all the add-ons
> messier and more cumbersome than installing pandoc and/or node-js (were
> we to use citeproc-js) from the command line.
Maybe the best situation would be to support both the pandoc and zotero
toolchain as backends and let the user decide what to use. If the result is
in both cases org-formatted citations, that should not make it too difficult, no?

All the best,

Christian

-- 
Christian Wittern, Kyoto

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Citation processing via Zotero + zotxt
  2015-12-03  0:27       ` Matt Lundin
  2015-12-03  1:41         ` Christian Wittern
@ 2015-12-03  4:32         ` Richard Lawrence
  2015-12-03 15:27           ` Matt Lundin
  2015-12-03 23:31         ` John Kitchin
  2 siblings, 1 reply; 37+ messages in thread
From: Richard Lawrence @ 2015-12-03  4:32 UTC (permalink / raw)
  To: Matt Lundin; +Cc: emacs-orgmode

Hi Matt and all,

Matt Lundin <mdl@imapmail.org> writes:

> Given these complexities, it seems that if we went the zotero route we
> could end up with a fairly large installation chain (firefox, zotero,
> zotxt, plugin for zotero). And this would require installing items from
> multiple, heterogeneous sources.

Well, I would guess that many people who are interested in this already
have Firefox installed, and after that, you just need to install two
Firefox plugins: Zotero and zotxt.  Open a couple of links, give your
permission, and that's it.

If you're skeptical, I encourage you to try it:

https://www.zotero.org/download/
https://addons.mozilla.org/en-US/firefox/addon/zotxt/

It's pretty easy.  And removing the plugins via about:addons is just as
easy.

> I wonder at this point whether pandoc-citeproc (packaged with pandoc)
> would actually be the simpler route. It can parse bibtex files directly
> and (as a filter within pandoc) can output formatted citations in org
> format.

We have discussed this before, and in fact, I already started work along
this route: see https://github.com/wyleyr/org-citeproc

I stopped because people objected that distributing a Haskell program is
too difficult.  Even if you can install pandoc-citeproc via your
system's package manager, to build org-citeproc against it you need a
complete Haskell build environment, which is (somewhat notoriously)
difficult to work with, and too much to expect for the average person
who just wants citation support in their Org documents.  Nor has anyone
volunteered to take care of building and distributing a binary for every
platform we'd want to support (including, I assume, Windows and OS
X...).

> As a GNU/Linux user, I would find installing zotero and all the add-ons
> messier and more cumbersome than installing pandoc and/or node-js (were
> we to use citeproc-js) from the command line.

I'm a Debian user, so I can appreciate your concern here.  But it's only
simpler to use the system package manager if all the dependencies are
already packaged for $YOUR_DISTRO, in a version that's up-to-date enough
for you to use.  Given the diversity of Org users, it seems likely that
we won't be able to come up with a solution that goes via system package
managers that will work for everybody, at least not without a lot of
work.

The nice thing about Firefox (and these days, Emacs) is that it's a sort
of cross-platform package manager.  If the citation processing
dependencies are just Firefox plugins, they'll be much more accessible
to a much wider group of people without a lot of work on our part.  So,
that's why I'd prefer depending on Zotero to depending on something like
org-citeproc or citeproc-node.

Best,
Richard


* Re: Citation processing via Zotero + zotxt
  2015-12-03  1:41         ` Christian Wittern
@ 2015-12-03 14:25           ` Matt Lundin
  0 siblings, 0 replies; 37+ messages in thread
From: Matt Lundin @ 2015-12-03 14:25 UTC (permalink / raw)
  To: Christian Wittern; +Cc: emacs-orgmode

Christian Wittern <cwittern@gmail.com> writes:

> On 2015-12-03 8:27, Matt Lundin wrote:
>> Given these complexities, it seems that if we went the zotero route we
>> could end up with a fairly large installation chain (firefox, zotero,
>> zotxt, plugin for zotero). And this would require installing items from
>> multiple, heterogeneous sources.
> I guess it depends on where you come from.  For those who already maintain
> the bibliography in Zotero this is not much of an extra load.

Yes, I think this is the key point. For those of us who maintain bib
databases within emacs, importing into zotero is problematic (overhead,
potential of data loss, etc.).

>> As a GNU/Linux user, I would find installing zotero and all the add-ons
>> messier and more cumbersome than installing pandoc and/or node-js (were
>> we to use citeproc-js) from the command line.
> Maybe the best situation would be to support both the pandoc and zotero
> toolchain as backends and let the user decide what to use. If the result is
> in both cases org-formatted citations, that should not make it too difficult, no?

That sounds right. Ideally, org-cite could support both zotero *and*
bibtex users (i.e., both gui and text-based/command-line solutions).

Matt


* Re: Citation processing via Zotero + zotxt
  2015-12-03  4:32         ` Richard Lawrence
@ 2015-12-03 15:27           ` Matt Lundin
  2015-12-03 17:56             ` Richard Lawrence
  0 siblings, 1 reply; 37+ messages in thread
From: Matt Lundin @ 2015-12-03 15:27 UTC (permalink / raw)
  To: Richard Lawrence; +Cc: emacs-orgmode

Hi Richard, 

Richard Lawrence <richard.lawrence@berkeley.edu> writes:

> Hi Matt and all,
>
> Matt Lundin <mdl@imapmail.org> writes:
>
>> Given these complexities, it seems that if we went the zotero route we
>> could end up with a fairly large installation chain (firefox, zotero,
>> zotxt, plugin for zotero). And this would require installing items from
>> multiple, heterogeneous sources.
>
> Well, I would guess that many people who are interested in this already
> have Firefox installed, and after that, you just need to install two
> Firefox plugins: Zotero and zotxt.  Open a couple of links, give your
> permission, and that's it.
>
> If you're skeptical, I encourage you to try it:
>
> https://www.zotero.org/download/
> https://addons.mozilla.org/en-US/firefox/addon/zotxt/
>
> It's pretty easy.  And removing the plugins via about:addons is just as
> easy.

Thanks for the links. I'll try them out. But for bibtex users, wouldn't
there presumably have to be another zotero plugin that would allow for
live, automated importing of bibtex into zotero? (If anyone knows
whether such a plugin exists, please do let me know.)

> We have discussed this before, and in fact, I already started work along
> this route: see https://github.com/wyleyr/org-citeproc
>
> I stopped because people objected that distributing a Haskell program is
> too difficult.  Even if you can install pandoc-citeproc via your
> system's package manager, to build org-citeproc against it you need a
> complete Haskell build environment, which is (somewhat notoriously)
> difficult to work with, and too much to expect for the average person
> who just wants citation support in their Org documents.  Nor has anyone
> volunteered to take care of building and distributing a binary for every
> platform we'd want to support (including, I assume, Windows and OS
> X...).

Thanks for your work on this. I would agree that having to compile a
Haskell program makes this a nonstarter for many users.

A thought experiment... Do we need a fork of pandoc-citeproc? Or could
we rather write an emacs-lisp wrapper that would feed citation data and
a bibliography to pandoc and receive a string containing citations
formatted in org syntax (one of pandoc's outputs)? This could be done
via an export filter, with all the necessary manipulation being done on
the emacs/org side of things.

Pandoc is available in several linux distributions (in addition to the
deb file released by the project) and installers are available for OS X
and Windows, so most people wouldn't need to compile it.

> The nice thing about Firefox (and these days, Emacs) is that it's a sort
> of cross-platform package manager.  If the citation processing
> dependencies are just Firefox plugins, they'll be much more accessible
> to a much wider group of people without a lot of work on our part.  So,
> that's why I'd prefer depending on Zotero to depending on something like
> org-citeproc or citeproc-node.

Javascript interpreters/engines are widely available for all platforms
if we create a wrapper script around citeproc-js. Node itself is also
easily available for most platforms. But we wouldn't need to set it up
as a node server à la citeproc-node.

Thanks so much for all your work on this. I'm happy to help out wherever
I can.

Best,
Matt


* Re: Citation processing via Zotero + zotxt
  2015-12-03 15:27           ` Matt Lundin
@ 2015-12-03 17:56             ` Richard Lawrence
  2015-12-03 19:14               ` Thomas S. Dye
  2015-12-03 20:45               ` Matt Lundin
  0 siblings, 2 replies; 37+ messages in thread
From: Richard Lawrence @ 2015-12-03 17:56 UTC (permalink / raw)
  To: Matt Lundin; +Cc: emacs-orgmode

Hi Matt and all,

Matt Lundin <mdl@imapmail.org> writes:

> But for bibtex users, wouldn't there presumably have to be another
> zotero plugin that would allow for live, automated importing of bibtex
> into zotero? (If anyone knows whether such a plugin exists, please do
> let me know.)

Well, my hope is that this could be added to zotxt without much effort,
so we could still just depend on Zotero and zotxt.  The translation
capability already exists in Zotero; it's just a matter of exposing it
as an API, and I imagine that Erik would happily accept a patch to zotxt
that does so.

> A thought experiment... Do we need a fork of pandoc-citeproc?

(The name "org-citeproc" might be a bit misleading: it's not a fork of
pandoc-citeproc, but a small wrapper script that takes care of
marshalling data into and out of Pandoc, in a format that's easy to
generate and read from Emacs lisp.  But because it's written in Haskell,
building it still requires a full Haskell build chain, and distributing
it is non-trivial.)

> Or could we rather write an emacs-lisp wrapper that would feed
> citation data and a bibliography to pandoc and receive a string
> containing citations formatted in org syntax (one of pandoc's
> outputs)? This could be done via an export filter, with all the
> necessary manipulation being done on the emacs/org side of things.

Yes, you're basically describing the approach that I eventually realized
org-citeproc should take: use the full capabilities of Pandoc to render
citations and bibliography in Org format, then re-parse these on the Org
side.  I did start to work on this, though I didn't finish and I'm not
sure if I pushed it to the public repo.

If we want to use pandoc-citeproc directly, instead of wrapping it with
something like org-citeproc, what we'd need to do is be able to
translate an Org document (or at least the citations within it) both to
and from pandoc-compatible JSON, since pandoc-citeproc reads and writes
in that format.  This can certainly be done in Emacs Lisp, and maybe it
would be worth doing, because it would mean that we'd gain a nice
serialization format for Org documents.

I am not opposed to this idea -- indeed, I kind of like it, which is why
I started work on org-citeproc in the first place.  Still, it would be a
non-trivial amount of work to develop this solution even to the point
that it can do what Zotero and zotxt can do right now.

> Javascript interpreters/engines are widely available for all platforms
> if we create a wrapper script around citeproc-js. Node itself is also
> easily available for most platforms. But we wouldn't need to set it up
> as a node server à la citeproc-node.

My concern here is with the wrapper script.  Yes, it's pretty easy to
install a javascript interpreter; but getting from there to the point
where you have a fully-working toolchain for processing citations from
Org mode is the problem.  What I think we should avoid is a process that
looks like:

1) Install node (or whatever interpreter)
2) Install citeproc-js and the wrapper script
3) Make sure the wrapper script is available as an executable that can
be called from Emacs
4) ...

Steps 2 and 3 are a bit much to ask people to do manually, and we don't
have the option of doing them through the system package manager.  They
could of course be automated by packaging up citeproc-js and a wrapper
script into a node package.  But again, that requires some work that no
one so far has volunteered to do; and anyway, it doesn't get away from
the worry about having to install packages/plugins from heterogeneous
sources.  (Also, we'd have to develop the wrapper script, and our own
way of translating items stored in BibTeX format into a format
compatible with citeproc-js.  This means duplicating efforts that have
already been put into both pandoc-citeproc and Zotero.)

It's a question of where to focus the limited resources we've got.  My
impression is that going with the combination of Zotero and zotxt will
represent the least amount of effort to get citations working on
non-LaTeX backends, for both Org developers and users.  I understand
that it won't be ideal for everyone, and there is time enough later on
to develop an alternative toolchain if that would better suit the needs
of people who don't want to depend on a GUI, etc.  I fully support that.
But until more people have time to work on this, it seems to me that
Zotero and zotxt represent the most practical path forward.
 
Best,
Richard


* Re: Citation processing via Zotero + zotxt
  2015-12-03 17:56             ` Richard Lawrence
@ 2015-12-03 19:14               ` Thomas S. Dye
  2015-12-03 22:05                 ` Rasmus
  2015-12-03 20:45               ` Matt Lundin
  1 sibling, 1 reply; 37+ messages in thread
From: Thomas S. Dye @ 2015-12-03 19:14 UTC (permalink / raw)
  To: Richard Lawrence; +Cc: Matt Lundin, emacs-orgmode

Aloha all,

At the risk of introducing noise, I've long wondered if tex4ht might
help with citation processing in Org mode?  IIUC, tex4ht uses the
device-independent dvi file produced by TeX to make html, odt, and other
formats. With an Org mode that relied on tex4ht, users would use BibTeX
bibliographies, and Org mode documents with citations would be processed
by LaTeX and a tex4ht script that would yield the correctly formatted
in-text citations and bibliography in the desired output format.  Org mode
would then parse this output and plug in the pieces at the right places.

This approach, if feasible, would please LaTeX users happy with BibTeX
and wary of translating bibliographic databases to Zotero.  It would
only require a working LaTeX installation, and so would avoid the
Haskell hassle, etc.  And, it would make use of BibTeX/BibLaTeX,
arguably the best bibliography software out there.

I'm in way over my head here and won't be surprised to learn that this
idea is a bad one.  Given the wide-ranging nature of this discussion, I
thought this might be the time to suggest it.

All the best,
Tom

-- 
Thomas S. Dye
http://www.tsdye.com


* Re: Citation processing via Zotero + zotxt
  2015-12-03 17:56             ` Richard Lawrence
  2015-12-03 19:14               ` Thomas S. Dye
@ 2015-12-03 20:45               ` Matt Lundin
  2015-12-03 21:04                 ` Matt Lundin
  1 sibling, 1 reply; 37+ messages in thread
From: Matt Lundin @ 2015-12-03 20:45 UTC (permalink / raw)
  To: Richard Lawrence; +Cc: emacs-orgmode

Hi Richard, 

Thanks so much for this very helpful explanation!

Richard Lawrence <richard.lawrence@berkeley.edu> writes:

> Hi Matt and all,
>
> Matt Lundin <mdl@imapmail.org> writes:
>
>> But for bibtex users, wouldn't there presumably have to be another
>> zotero plugin that would allow for live, automated importing of bibtex
>> into zotero? (If anyone knows whether such a plugin exists, please do
>> let me know.)
>
> Well, my hope is that this could be added to zotxt without much effort,
> so we could still just depend on Zotero and zotxt.  The translation
> capability already exists in Zotero; it's just a matter of exposing it
> as an API, and I imagine that Erik would happily accept a patch to zotxt
> that does so.

That sounds like a great plan. 

> Yes, you're basically describing the approach that I eventually realized
> org-citeproc should take: use the full capabilities of Pandoc to render
> citations and bibliography in Org format, then re-parse these on the Org
> side.  I did start to work on this, though I didn't finish and I'm not
> sure if I pushed it to the public repo.
>
> If we want to use pandoc-citeproc directly, instead of wrapping it
> with something like org-citeproc, what we'd need to do is be able to
> translate an Org document (or at least the citations within it) both
> to and from pandoc-compatible JSON, since pandoc-citeproc reads and
> writes in that format.

I'm probably missing something, but would we necessarily need to convert
to pandoc's JSON format? A quick and dirty approach might be to use an
org export filter function to grab citations and insert some temporary
unique ids in the export string as placeholders for each citation. Then
we could create a temporary buffer that looks like this:

--8<---------------cut here---------------start------------->8---
unique_id1 [@some_citation, pp. 1-10]

unique_id2 [@another_citation, p. 23]
--8<---------------cut here---------------end--------------->8---

We could then run a shell command on the buffer (i.e., "pandoc
--filter=pandoc-citeproc --csl=/path/to/csl
--bibliography=/path/to/bibdata -t org"), resulting in formatted
citations for each id. With some simple mapping, we could use a filter
function to insert the citations in the export string/buffer.
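
A minimal sketch of that round trip, in Python rather than Emacs Lisp
purely for illustration.  The placeholder scheme (ORGCITE0001, ...) and
all three function names are assumptions made up for this example; only
the pandoc flags come from the command above.  The sketch builds the
temporary document and the argument list, and parses pandoc's output
back into an id-to-citation mapping; it does not run pandoc itself.

```python
import re


def build_placeholder_doc(citations):
    """Return one 'ID [@key, locator]' paragraph per citation.

    The ORGCITE0001-style ids are a hypothetical naming scheme.
    """
    paragraphs = [
        f"ORGCITE{i:04d} {cite}" for i, cite in enumerate(citations, start=1)
    ]
    return "\n\n".join(paragraphs) + "\n"


def pandoc_command(csl_path, bib_path):
    """Argument list for the shell command described above."""
    return [
        "pandoc",
        "--filter=pandoc-citeproc",
        f"--csl={csl_path}",
        f"--bibliography={bib_path}",
        "-t",
        "org",
    ]


def parse_formatted(output):
    """Map each placeholder id to the citation text that follows it.

    Paragraphs that do not start with an id (for example an appended
    bibliography) are ignored; wrapped lines are joined with spaces.
    """
    result = {}
    for para in re.split(r"\n\s*\n", output.strip()):
        match = re.match(r"(ORGCITE\d{4})\s+(.*)", para, flags=re.DOTALL)
        if match:
            result[match.group(1)] = " ".join(match.group(2).split())
    return result
```

On the Org side, an export filter would then swap each id in the export
string for the corresponding value in that mapping.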

Obviously, JSON would be way more elegant. But we would still need to run
the results through pandoc to get strings of formatted org output.

> I am not opposed to this idea -- indeed, I kind of like it, which is why
> I started work on org-citeproc in the first place.  Still, it would be a
> non-trivial amount of work to develop this solution even to the point
> that it can do what Zotero and zotxt can do right now.

Thanks for this explanation. I vote for you going full speed ahead with
the zotero/zotxt plans. I'd be happy to build on the work you've already
done to try to make pandoc-citeproc work.

>> Javascript interpreters/engines are widely available for all platforms
>> if we create a wrapper script around citeproc-js. Node itself is also
>> easily available for most platforms. But we wouldn't need to set it up
>> as a node server à la citeproc-node.
>
> My concern here is with the wrapper script.  Yes, it's pretty easy to
> install a javascript interpreter; but getting from there to the point
> where you have a fully-working toolchain for processing citations from
> Org mode is the problem.  What I think we should avoid is a process that
> looks like:
>
> 1) Install node (or whatever interpreter)
> 2) Install citeproc-js and the wrapper script
> 3) Make sure the wrapper script is available as an executable that can
> be called from Emacs
> 4) ...

If we chose node, we could try to package the wrapper script so it can
be installed via npm. Then the installation process would be:

a) install node
b) npm install citeproc-js-wrapper [or whatever]

> It's a question of where to focus the limited resources we've got. My
> impression is that going with the combination of Zotero and zotxt will
> represent the least amount of effort to get citations working on
> non-LaTeX backends, for both Org developers and users.... I fully
> support that. But until more people have time to work on this, it
> seems to me that Zotero and zotxt represent the most practical path
> forward.

That makes a lot of sense. Thanks for all the work you've already put
into this. I'm happy to help out wherever I can.

Matt


* Re: Citation processing via Zotero + zotxt
  2015-12-03 20:45               ` Matt Lundin
@ 2015-12-03 21:04                 ` Matt Lundin
  0 siblings, 0 replies; 37+ messages in thread
From: Matt Lundin @ 2015-12-03 21:04 UTC (permalink / raw)
  To: Richard Lawrence; +Cc: emacs-orgmode

Matt Lundin <mdl@imapmail.org> writes:
>
> If we chose node, we could try to package the wrapper script so it can
> be installed via npm. Then the installation process would be:
>
> a) install node
> b) npm install citeproc-js-wrapper [or whatever]

I forgot. They would need to install npm as well.

Matt


* Re: Citation processing via Zotero + zotxt
  2015-12-03 19:14               ` Thomas S. Dye
@ 2015-12-03 22:05                 ` Rasmus
  0 siblings, 0 replies; 37+ messages in thread
From: Rasmus @ 2015-12-03 22:05 UTC (permalink / raw)
  To: emacs-orgmode

Hi,

Thomas S. Dye <tsd@tsdye.com> writes:

> At the risk of introducing noise, I've long wondered if tex4ht might
> help with citation processing in Org mode?  IIUC, tex4ht uses the
> device-independent dvi file produced by TeX to make html, odt, and other
> formats. With an Org mode that relied on tex4ht, users would use BibTeX
> bibliographies, and Org mode documents with citations would be processed
> by LaTeX and a tex4ht script that would yield the correctly formatted
> in-text citations and bibliography in the desired output format.  Org mode
> would then parse this output and plug in the pieces at the right places.

tex4ht is a beautiful piece of software.  For odt it creates perfect
citations and reference lists.

I was thinking about utilizing biber for this purpose at some point.  I
forgot why I stopped looking at it...

It might be possible and if it is, it would be very nice, as biber and
biblatex are arguably the better tools.

> I'm in way over my head here and won't be surprised to learn that this
> idea is a bad one.  Given the wide-ranging nature of this discussion, I
> thought this might be the time to suggest it.

The last time I had to produce a "word document" with citations I used
tex4ht to generate the end-of-file bibliography and bibtex.el + hacks for
creating in-text citations.

Here are snippets from my Makefile:

      paper_.docx: paper_.odm
              soffice --headless --convert-to odt paper_.odm; \
              soffice --headless --convert-to docx:"MS Word 2007 XML" paper_.odt

      paper_.odm: all_refs.odt paper.odt

      paper.odt: paper.org lit.bib
              emacs --batch --no-init-file --load paper-config/org-conf.el paper.org --funcall org-odt-export-to-odt; \
              $(call LO-TBLWIDTH, paper.odt)

      all_refs.odt: all_refs.tex paper.pdf
              biber paper.bcf --output_format=bibtex -O cited.bib; \
              latexmk -dvi all_refs.tex; \
              mk4ht oolatex all_refs.tex; \
              abiword --to=odt all_refs.odt

With all_refs.tex:

     \documentclass{article}
     \usepackage[utf8]{inputenc}
     \usepackage[T1]{fontenc}
     \usepackage[style=authoryear]{biblatex}
     \addbibresource{cited.bib}
     \begin{document}
     \nocite{*}
     \printbibliography[heading=none]
     \end{document}

Rasmus

-- 
May contains speling mistake


* Re: Citation processing via Zotero + zotxt
  2015-12-03  0:27       ` Matt Lundin
  2015-12-03  1:41         ` Christian Wittern
  2015-12-03  4:32         ` Richard Lawrence
@ 2015-12-03 23:31         ` John Kitchin
  2015-12-04  0:03           ` Rasmus
                             ` (2 more replies)
  2 siblings, 3 replies; 37+ messages in thread
From: John Kitchin @ 2015-12-03 23:31 UTC (permalink / raw)
  To: Matt Lundin; +Cc: emacs-orgmode, Martin Yrjölä

I clearly had some super important academic work to do today, so instead
I played around with citations ;)

I am not yet convinced a citation processor will get us where we want
because of the complexity of the external dependencies, and the
potential/probable need for us to define new CSL files for different
backends, or at a minimum for org-formatted citations and
bibliographies. Hacking bst files is no fun, and it doesn't look like
CSL files are much better! Plus you have to find them and install them
somehow.

I wanted to get a sense for how well I could manipulate citation and
bibliography format from org-ref with a bibtex database. The answer is
it is pretty easy, not perfect, but pretty good, and could certainly be
made better with dedicated effort. You can see how here, and some
discussion about its limitations:

http://kitchingroup.cheme.cmu.edu/blog/2015/12/03/Exporting-numbered-citations-in-html-with-unsorted-numbered-bibliography/

I am pointing this out because I think the approach I used could allow
for plugins for different database backends, different ways to get the
replacements, etc... you could substitute org-ref links for the citation
syntax at some point with no real loss of generality. org-ref could
insert the new syntax as soon as it is available in a main org branch.
Some code will have to be rewritten to get the key under point, but that
probably won't be too hard.

We could provide a solution like this for some backends, using bibtex as
the database, for pretty immediate use. Then other more advanced
solutions could come along that would likely be superior in output
quality if they use real citation processors, but only if there are CSLs
for different backends (if I understand how they work). These would be
optional, and only needed if higher quality and flexibility in output
was required.

What do you think?




Matt Lundin writes:

> Richard Lawrence <richard.lawrence@berkeley.edu> writes:
>
>> Eric S Fraga <e.fraga@ucl.ac.uk> writes:
>>
>>> 2. How would I use this starting from an org-bibtex database (which I
>>>    typically export to bibtex)?
>>
>> I can envision a couple of possibilities.  One simple option would be to
>> switch to managing your reference database with Zotero, by exporting
>> from org-bibtex to .bib, and then importing the .bib into Zotero.
>>
>> I don't want to force that on anyone, though.  Another option is to
>> use the org-bibtex to produce .bib at export time, and then use Zotero
>> to read the .bib and process citations when exporting to non-LaTeX formats.
>>
>> This second option is more work, as I don't know of any API for loading
>> items into Zotero's citation processor in BibTeX format.  But given that
>> Zotero is able to import .bib files, I imagine this API would not be too
>> much work to build.
>
> Given these complexities, it seems that if we went the zotero route we
> could end up with a fairly large installation chain (firefox, zotero,
> zotxt, plugin for zotero). And this would require installing items from
> multiple, heterogeneous sources.
>
> I wonder at this point whether pandoc-citeproc (packaged with pandoc)
> would actually be the simpler route. It can parse bibtex files directly
> and (as a filter within pandoc) can output formatted citations in org
> format.
>
> As a GNU/Linux user, I would find installing zotero and all the add-ons
> messier and more cumbersome than installing pandoc and/or node-js (were
> we to use citeproc-js) from the command line.
>
> Best,
> Matt

--
Professor John Kitchin
Doherty Hall A207F
Department of Chemical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
412-268-7803
@johnkitchin
http://kitchingroup.cheme.cmu.edu


* Re: Citation processing via Zotero + zotxt
  2015-12-03 23:31         ` John Kitchin
@ 2015-12-04  0:03           ` Rasmus
  2015-12-04  1:49             ` John Kitchin
  2015-12-04  2:06           ` Matt Lundin
  2015-12-04  9:20           ` Eric S Fraga
  2 siblings, 1 reply; 37+ messages in thread
From: Rasmus @ 2015-12-04  0:03 UTC (permalink / raw)
  To: emacs-orgmode

John Kitchin <jkitchin@andrew.cmu.edu> writes:

> I clearly had some super important academic work to do today, so instead
> I played around with citations ;)
>
> I am not yet convinced a citation processor will get us where we want
> because of the complexity of the external dependencies, and the
> potential/probable need for us to define new CSL files for different
> backends, or at a minimum for org-formatted citations and
> bibliographies. Hacking bst files is no fun, and it doesn't look like
> CSL files are much better! Plus you have to find them and install them
> somehow.
>
> I wanted to get a sense for how well I could manipulate citation and
> bibliography format from org-ref with a bibtex database. The answer is
> it is pretty easy, not perfect, but pretty good, and could certainly be
> made better with dedicated effort. You can see how here, and some
> discussion about its limitations:
>
> http://kitchingroup.cheme.cmu.edu/blog/2015/12/03/Exporting-numbered-citations-in-html-with-unsorted-numbered-bibliography/
>
> I am pointing this out because I think the approach I used could allow
> for plugins for different database backends, different ways to get the
> replacements, etc... you could substitute org-ref links for the citation
> syntax at some point with no real loss of generality. org-ref could
> insert the new syntax as soon as it is available in a main org branch.
> Some code will have to be rewritten to get the key under point, but that
> probably won't be too hard.
>
> We could provide a solution like this for some backends, using bibtex as
> the database, for pretty immediate use. Then other more advanced
> solutions could come along that would likely be superior in output
> quality if they use real citation processors, but only if there are CSLs
> for different backends (if I understand how they work). These would be
> optional, and only needed if higher quality and flexibility in output
> was required.
>
> What do you think?

My guess is that it is orders of magnitude harder with author (year) and
footnotes and whatnot.  But bibtex.el helps in some regards (indeed in an
ancestor to this thread we discussed whether bibtex.el + reftex.el could
be used).  Then comes the formatting and sorting of the bibliography, also
when using different languages (Marcin talks about this) or types of
publications (bibtex.el lists 27 when you switch the dialect to
biblatex)...

A small example comes out of your own example, which has mistakes such as
"17()".

The fact that the documentation of biblatex-chicago (a biblatex *style*)
is 149 pages might also serve as a warning.

Note, I use a similar system myself when exporting citations outside of
latex (lookup stuff from bibtex and cross-fingers), but fitted to
author-year.  It’s certainly doable when you know your fixed needs, but it
is harder to convince yourself it’s a good general solution.

It would be nice if you could prove me wrong.

Rasmus

-- 
What will be next?


* Re: Citation processing via Zotero + zotxt
  2015-12-04  0:03           ` Rasmus
@ 2015-12-04  1:49             ` John Kitchin
  2015-12-05 21:18               ` Matt Lundin
  0 siblings, 1 reply; 37+ messages in thread
From: John Kitchin @ 2015-12-04  1:49 UTC (permalink / raw)
  To: Rasmus; +Cc: emacs-orgmode

>> What do you think?
>
> My guess is that it is orders of magnitude harder with author (year) and
> footnotes and whatnot.

footnotes... The only place I can see this is in org/odt export. In org
I can see it ok with the approach I described. For odt, I guess it would
be about the same, if you know the markup code to insert for a footnote.
Other formats are continuous pages, so the footnotes would end up at the
end of the document.

> But bibtex.el helps in some regards (indeed in an
> ancestor to this thread we discussed whether bibtex.el + reftex.el could
> be used).  Then comes the formatting and sorting of the bibliography, also
> when using different languages (Marcin talks about this) or types of
> publications (bibtex.el lists 27 when you switch the dialect to
> biblatex)...

CSL does not currently support multiple languages according to
http://citationstyles.org/styles/, although a fork of zotero is supposed
to support it. CSL also does not support composite citations, e.g. those
common in chemistry journals. It doesn't support journal
abbreviations.

If a reference type is not listed in the CSL, it also will not be
supported by CSL I suppose.

I also suppose the CSL must be backend specific to output formats
appropriate to org, html, LaTeX, markdown, etc... for any particular
style.

The point is there is no single solution now, or ever, that will easily
handle all reference databases, all reference types, to any output, in
arbitrary format/style.

We should not try to support all of these things. We could support a
small number of things that could be improved or increased in the
future.

> A small example comes out of your own example, which has mistakes such as
> "17()".

this is from missing information in the bibtex file, and the single
format string in reftex. This kind of error is easy to fix. More
difficult changes are to the authors (e.g. initials, et al after 3
authors, etc...), journal abbreviations, and for different types of
references.
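
To illustrate why that kind of error is easy to fix, here is a minimal
sketch in Python (not the reftex/org-ref code).  It assumes "17()" is a
volume with an empty issue number pushed through a fixed format string;
the function name and the output layout are made up for this example.
Each piece is emitted only when the field behind it is present.

```python
def format_article(entry):
    """Format a journal article from a dict of bibtex-like fields.

    Missing or empty fields are skipped, so a volume without an issue
    number yields "17" rather than "17()".
    """
    author = entry.get("author", "").strip()
    title = entry.get("title", "").strip()
    journal = entry.get("journal", "").strip()
    volume = entry.get("volume", "").strip()
    number = entry.get("number", "").strip()
    pages = entry.get("pages", "").strip()
    year = entry.get("year", "").strip()

    # Only add "(number)" when both volume and number exist.
    vol = volume + (f"({number})" if volume and number else "")
    locator = ":".join(p for p in (vol, pages) if p)
    tail = ", ".join(p for p in (journal, locator) if p)
    if year:
        tail = f"{tail} ({year})" if tail else f"({year})"

    pieces = [p for p in (author, title, tail) if p]
    return ". ".join(pieces) + "." if pieces else ""
```

The harder cases listed above (author initials, "et al.", journal
abbreviations, other entry types) are not addressed by this sketch.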

For a blogpost I consider that acceptable. For a publication/proposal
not so much. For that, I would use LaTeX export which wouldn't have that
issue. A key point here is I don't think you can have it both ways. A
bibtex file with LaTeX markup of equations cannot be used to make an
html/org/markdown/etc.. bibliography that looks right without handling
the LaTeX markup in it and converting it in some way to the right format. A
bibtex file that uses strings to switch between full journal names and
abbreviated journal names will be challenging for any system other than
bibtex. This is not special to bibtex, it is general to any reference
database I think. CSL does not save us from this.

No doubt there are many corner cases that could be problematic. I don't
have the feeling there would be more of them than with CSL, though. Good luck
getting the CSL to make the in text citations clickable hyperlinks with
tooltips of the citation in html. I would rather learn to use elisp for
this than CSL for that sole purpose.

> The fact that the documentation of biblatex-chicago (a biblatex *style*)
> is 149 pages might also serve as a warning.

The CSL for this is only 651 lines
(https://github.com/citation-style-language/styles/blob/master/chicago-author-date.csl).
It's either very concise, or not complete ;)

There are plenty of warning signs ;) If you want perfect
biblatex-chicago, I think the answer is to use biblatex. The issue as I
see it is one of weighing something that mostly works and is improvable
against something that currently doesn't exist for org-mode for
anything other than LaTeX.

> Note, I use a similar system myself when exporting citations outside of
> latex (lookup stuff from bibtex and cross-fingers), but fitted to
> author-year.  It’s certainly doable when you know your fixed needs, but it
> is harder to convince yourself it’s a good general solution.
>
> It would be nice if you could prove me wrong.

I cannot. I don't think you are wrong. The only time-tested, publication
quality solutions for citations in my opinion right now are bib(la)tex, MS
Word/reference manager, and "by hand". Even these get "edited" in their
final print versions by journals.

I think we can provide some fixed solutions for non-LaTeX exports that
will be pretty good. At least until a proven external cite-processor
solution exists.

As a finishing thought:

Just imagine if this CSL snippet (https://github.com/citation-style-language/styles/blob/master/acs-nano.csl):
<citation collapse="citation-number">
  <sort>
    <key variable="citation-number"/>
  </sort>
  <layout delimiter="," vertical-align="sup">
    <text variable="citation-number"/>
  </layout>
</citation>

was really this much more readable sexp form :)

(citation (:collapse "citation-number")
          (sort ()
           (key (:variable "citation-number")))
          (layout (:delimiter "," :vertical-align "sup")
                  (text (:variable "citation-number"))))

It might start making more sense to think of a lisp based citation
processor. It might even address some limitations of bib(la)tex.
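
To illustrate that the XML-to-sexp mapping above is mechanical, here is
a minimal Python sketch (not part of org-ref or any CSL tool). It
assumes the snippet has no XML namespace, unlike a full CSL file, and it
ignores element text; `CSL_SNIPPET` and `to_sexp` are names made up for
this example.

```python
import xml.etree.ElementTree as ET

# The acs-nano fragment quoted above, without the CSL namespace (assumption).
CSL_SNIPPET = """
<citation collapse="citation-number">
  <sort>
    <key variable="citation-number"/>
  </sort>
  <layout delimiter="," vertical-align="sup">
    <text variable="citation-number"/>
  </layout>
</citation>
"""


def to_sexp(element):
    """Render an element as (tag (:attr "value" ...) child ...)."""
    attrs = " ".join(f':{name} "{value}"' for name, value in element.attrib.items())
    parts = [element.tag, f"({attrs})"]
    # Recurse into child elements; element text is deliberately ignored.
    parts.extend(to_sexp(child) for child in element)
    return "(" + " ".join(parts) + ")"


sexp = to_sexp(ET.fromstring(CSL_SNIPPET))
print(sexp)
```

The printed form has the same structure as the sexp above, just on a
single line.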

Now, about those cross-references and labels...

>
> Rasmus

--
Professor John Kitchin
Doherty Hall A207F
Department of Chemical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
412-268-7803
@johnkitchin
http://kitchingroup.cheme.cmu.edu


* Re: Citation processing via Zotero + zotxt
  2015-12-03 23:31         ` John Kitchin
  2015-12-04  0:03           ` Rasmus
@ 2015-12-04  2:06           ` Matt Lundin
  2015-12-04 11:22             ` John Kitchin
  2015-12-04  9:20           ` Eric S Fraga
  2 siblings, 1 reply; 37+ messages in thread
From: Matt Lundin @ 2015-12-04  2:06 UTC (permalink / raw)
  To: John Kitchin; +Cc: Martin Yrjölä, emacs-orgmode

John Kitchin <jkitchin@andrew.cmu.edu> writes:
>
> I am not yet convinced a citation processor will get us where we want
> because of the complexity of the external dependencies, and the
> potential/probable need for us to define new CSL files for different
> backends, or at a minimum for org-formatted citations and
> bibliographies. Hacking bst files is no fun, and it doesn't look like
> CSL files are much better! Plus you have to find them and install them
> somehow.

As I understand it, we would not need to hack the CSL files to get org
markup. With a good processor, such as citeproc-js, it should be trivial
to modify the output format.[fn:1]

What CSL implementations do offer is the complexity to handle all the
nuances of multiple citation styles, languages, etc. (e.g., something
like Chicago Manual of Style footnotes). My suspicion is that it would
take years to code something in emacs-lisp that offers all of the
functionality of CSL processors.

> http://kitchingroup.cheme.cmu.edu/blog/2015/12/03/Exporting-numbered-citations-in-html-with-unsorted-numbered-bibliography/
>
> I am pointing this out because I think the approach I used could allow
> for plugins for different database backends, different ways to get the
> replacements, etc... you could substitute org-ref links for the
> citation syntax at some point with no real loss of generality. org-ref
> could insert the new syntax as soon as it is available in a main org
> branch. Some code will have to be rewritten to get the key under
> point, but that probably won't be too hard.

Thanks for sharing this. I use something like this myself for *basic*
Chicago Manual of Style formatting when I can't rely on
biblatex-chicago. Might I ask: What is org-ref syntax as opposed to
citation syntax?

>  Then other more advanced solutions could come along that would likely
> be superior in output quality if they use real citation processors,
> but only if there are CSLs for different backends (if I understand how
> they work).

I don't think modifying CSL styles would be necessary. A huge number
already exist.[fn:2] I think all we would need to do is to convert the
final CSL output to org syntax, which pandoc can already do and which
citeproc-js could do with minor additions.

Best,
Matt

Footnotes:

[fn:1]
See https://bitbucket.org/fbennett/citeproc-js/src/tip/src/formats.js?fileviewer=file-view-default

[fn:2] https://github.com/citation-style-language/styles


* Re: Citation processing via Zotero + zotxt
  2015-12-03 23:31         ` John Kitchin
  2015-12-04  0:03           ` Rasmus
  2015-12-04  2:06           ` Matt Lundin
@ 2015-12-04  9:20           ` Eric S Fraga
  2 siblings, 0 replies; 37+ messages in thread
From: Eric S Fraga @ 2015-12-04  9:20 UTC (permalink / raw)
  To: John Kitchin; +Cc: Martin Yrjölä, Matt Lundin, emacs-orgmode

On Thursday,  3 Dec 2015 at 18:31, John Kitchin wrote:
> I clearly had some super important academic work to do today, so instead
> I played around with citations ;)

Hey, I am doing just that right now ;-)

> I am not yet convinced a citation processor will get us where we want
> because of the complexity of the external dependencies, and the
> potential/probable need for us to define new CSL files for different
> backends, or at a minimum for org-formatted citations and
> bibliographies. Hacking bst files is no fun, and it doesn't look like
> CSL files are much better! Plus you have to find them and install them
> somehow.

I agree completely.

I've looked at your web page for this.  I like your solution.  It is not
complete but we know that a complete solution for citations, given all
the weird and wonderful formats journals etc. expect, requires a very
complex database of style information (witness bst and csl files).  For
a quick 'n' dirty solution, keeping to emacs lisp and org is very
appealing.

However, it is most appealing for those of us coming from the bibtex
world...

-- 
: Eric S Fraga (0xFFFCF67D), Emacs 25.0.50.2, Org release_8.3.2-363-g5c13a6


* Re: Citation processing via Zotero + zotxt
  2015-12-04  2:06           ` Matt Lundin
@ 2015-12-04 11:22             ` John Kitchin
  2015-12-04 14:13               ` Matt Lundin
  0 siblings, 1 reply; 37+ messages in thread
From: John Kitchin @ 2015-12-04 11:22 UTC (permalink / raw)
  To: Matt Lundin; +Cc: Martin Yrjölä, emacs-orgmode@gnu.org


> > I am not yet convinced a citation processor will get us where we want
> > because of the complexity of the external dependencies, and the
> > potential/probable need for us to define new CSL files for different
> > backends, or at a minimum for org-formatted citations and
> > bibliographies. Hacking bst files is no fun, and it doesn't look like
> > CSL files are much better! Plus you have to find them and install them
> > somehow.
>
> As I understand it, we would not need to hack the CSL files to get org
> markup. With a good processor, such as citeproc-js, it should be trivial
> to modify the output format.[fn:1]
>

This is good. It still means adding each output somewhere.


>
> What CSL implementations do offer is the complexity to handle all the
> nuances of multiple citation styles, languages, etc. (e.g., something
> like Chicago Manual of Style footnotes). My suspicion is that it would
> take years to code something in emacs-lisp that offers all of the
> functionality of CSL processors.
>

That sounds right. bibtex was developed for a long time, and there are
still efforts to improve it!


>
> >
> http://kitchingroup.cheme.cmu.edu/blog/2015/12/03/Exporting-numbered-citations-in-html-with-unsorted-numbered-bibliography/
> >
> > I am pointing this out because I think the approach I used could allow
> > for plugins for different database backends, different ways to get the
> > replacements, etc... you could substitute org-ref links for the
> > citation syntax at some point with no real loss of generality. org-ref
> > could insert the new syntax as soon as it is available in a main org
> > branch. Some code will have to be rewritten to get the key under
> > point, but that probably won't be too hard.
>
> Thanks for sharing this. I use something like this myself for *basic*
> Chicago Manual of Style formatting when I can't rely on
> biblatex-chicago. Might I ask: What is org-ref syntax as opposed to
> citation syntax?
>

org-ref syntax for a citation is just a link such as
cite:some-key,another-key. All the cite types in bibtex and biblatex have a
link definition. Here is an example document with a citation with pre and
post text. A limitation is that you can currently only put pre/post text on
a single citation.

Org-mode is great [[cite:Dominik201408][See page 20::for example]].

bibliographystyle:unsrt
bibliography:~/Dropbox/bibliography/references.bib

This exports to LaTeX as

\cite[See page 20][for example]{Dominik201408}.

\bibliographystyle{unsrt}
\bibliography{/Users/jkitchin/Dropbox/bibliography/references}
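
The link-to-LaTeX mapping shown above can be sketched in a few lines of
Python. This is only an illustration of the transformation, not
org-ref's actual implementation; `LINK`, `link_to_latex`, and
`export_cites` are hypothetical names, and only the
[[cite:KEY][PRE::POST]] form is handled.

```python
import re

# Matches [[cite:KEY][DESCRIPTION]] where DESCRIPTION is "PRE::POST".
LINK = re.compile(r"\[\[cite:(?P<keys>[^\]]+)\]\[(?P<desc>[^\]]*)\]\]")


def link_to_latex(match):
    """Turn one bracketed cite link into \\cite[pre][post]{keys}."""
    # partition() leaves post empty when the description has no "::".
    pre, _, post = match.group("desc").partition("::")
    return f"\\cite[{pre}][{post}]{{{match.group('keys')}}}"


def export_cites(text):
    """Replace every bracketed cite link in text."""
    # A function replacement avoids backslash-escape processing in re.sub.
    return LINK.sub(link_to_latex, text)


print(export_cites(
    "Org-mode is great [[cite:Dominik201408][See page 20::for example]]."
))
```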

I looked at an alternative syntax for pre/post text a year ago, but there
doesn't seem to be much demand for it, and we don't use pre/post text.
http://kitchingroup.cheme.cmu.edu/blog/2014/06/26/Another-parsing-of-links-for-citations-with-pre-and-post-text/

org-ref also provides links for labels, cross-references, index entries
and glossaries, and mostly supports biblatex too. See
https://github.com/jkitchin/org-ref/blob/master/org-ref.org for a pretty
good intro to it.


>
> >  Then other more advanced solutions could come along that would likely
> > be superior in output quality if they use real citation processors,
> > but only if there are CSLs for different backends (if I understand how
> > they work).
>
> I don't think modifying CSL styles would be necessary. A huge number
> already exist.[fn:2] I think all we would need to do is to convert the
> final CSL output to org syntax, which pandoc can already do and which
> citeproc-js could do with minor additions.
>
> Best,
> Matt
>
> Footnotes:
>
> [fn:1]
> See
> https://bitbucket.org/fbennett/citeproc-js/src/tip/src/formats.js?fileviewer=file-view-default
>
> [fn:2] https://github.com/citation-style-language/styles
>



* Re: Citation processing via Zotero + zotxt
  2015-12-04 11:22             ` John Kitchin
@ 2015-12-04 14:13               ` Matt Lundin
  2015-12-05  1:23                 ` John Kitchin
  0 siblings, 1 reply; 37+ messages in thread
From: Matt Lundin @ 2015-12-04 14:13 UTC (permalink / raw)
  To: John Kitchin; +Cc: Martin Yrjölä, emacs-orgmode@gnu.org

Hi John,

John Kitchin <jkitchin@andrew.cmu.edu> writes:

>     > I am not yet convinced a citation processor will get us where we
>     > want because of the complexity of the external dependencies, and
>     > the potential/probable need for us to define new CSL files for
>     > different backends, or at a minimum for org-formatted citations
>     > and bibliographies. Hacking bst files is no fun, and it doesn't
>     > look like CSL files are much better! Plus you have to find them
>     > and install them somehow.
>
>     As I understand it, we would not need to hack the CSL files to get
>     org markup. With a good processor, such as citeproc-js, it should
>     be trivial to modify the output format.[fn:1]
>
> This is good. It still means adding each output somewhere.
>

Do you mean adding html, odt, etc.? I don't think this would be
necessary. We could invoke the citation processor (e.g., citeproc-js,
pandoc, etc.) during export, feeding it an ordered list of citation
keys. It would then return the formatted citations in org markup, which
we could insert at the appropriate locations before the final export to
the desired backend. Thus, all we would need to do is define a single
output format (org mode). Or we could parse JSON output directly in
emacs....

>     What CSL implementations do offer is the complexity to handle all
>     the nuances of multiple citation styles, languages, etc. (e.g.,
>     something like Chicago Manual of Style footnotes). My suspicion is
>     that it would take years to code something in emacs-lisp that
>     offers all of the functionality of CSL processors.
>
> That sounds right. bibtex was developed for a long time, and there are
> still efforts to improve it!

Yes. And CSL and biblatex are far more complex and robust than bibtex.
:) I use biblatex (biblatex-chicago), which finally gives those of us in
the humanities robust automation of complex footnote styles. One of my
dreams for citation processing in org mode is to be able to achieve in
org the same level of automation, thus allowing non-LaTeX backends. IMO,
the best hope of accomplishing this is to plug into the existing power
of CSL, which has emerged as the new standard for automated citations.

(A note: the other way to accomplish this right now is to export a file
to org or markdown --- evaluating code blocks, etc. and converting
citations to [@some_cite_key] --- and then running the output through
pandoc, which uses CSL.)

> org-ref syntax for a citation is just a link such as
> cite:some-key,another-key, All the cite types in bibtex and biblatex
> have a link definition. Here is an example document with a citation
> with pre and post text. A limitation is you can only currently put
> pre/post text on a single citation.

> I looked at an alternative syntax for pre/post text a year ago, but
> there doesn't seem to be much demand for it, and we don't use pre/post
> text.
> http://kitchingroup.cheme.cmu.edu/blog/2014/06/26/Another-parsing-of-links-for-citations-with-pre-and-post-text/
>
> org-ref also provides links for labels, cross-references, index
> entries and glossaries, and mostly supports biblatex too. See
> https://github.com/jkitchin/org-ref/blob/master/org-ref.org for a
> pretty good intro to it.

Thanks. This sounds great! I'll check it out. (I've obviously not been
hanging around on the list often enough to learn about all the recent
goodies.)

Best,
Matt


* Re: Citation processing via Zotero + zotxt
  2015-12-04 14:13               ` Matt Lundin
@ 2015-12-05  1:23                 ` John Kitchin
  0 siblings, 0 replies; 37+ messages in thread
From: John Kitchin @ 2015-12-05  1:23 UTC (permalink / raw)
  To: Matt Lundin; +Cc: Martin Yrjölä, emacs-orgmode@gnu.org


>
> Yes. And CSL and biblatex are far more complex and robust than bibtex.
> :) I use biblatex (biblatex-chicago), which finally gives those of us in
> the humanities robust automation of complex footnote styles. One of my
> dreams for citation processing in org mode is to be able to achieve in
> org the same level of automation, thus allowing non-LaTeX backends. IMO,
> the best hope of accomplishing this is to plug into the existing power
> of CSL, which has emerged as the new standard for automated citations.
>
> (A note: the other way to accomplish this right now is to export a file
> to org or markdown --- evaluating code blocks, etc. and converting
> citations to [@some_cite_key] --- and then running the output through
> pandoc, which uses CSL.)

On this note, it is also possible to have org-ref either insert pandoc
syntax, or replace the org-ref links with pandoc syntax as a pre-filter.
Alternatively, when you export markdown documents with org-ref
citations, they export as pandoc syntax right now.
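
A rough Python sketch of such a pre-filter follows. It is an
illustration only, not org-ref's code: `CITE` and `to_pandoc` are
hypothetical names, only plain cite: links are handled (not bracketed
links with pre/post text, and not other cite types), and keys are
assumed to contain only letters, digits, "_", and "-".

```python
import re

# Plain cite:key1,key2 links. The lookbehind skips bracketed [[cite:...]]
# links and other link types that merely end in "cite".
CITE = re.compile(r"(?<![\w\[])cite:([A-Za-z0-9_-]+(?:,[A-Za-z0-9_-]+)*)")


def to_pandoc(text):
    """Rewrite cite:a,b links as pandoc citations [@a; @b]."""
    def replace(match):
        keys = match.group(1).split(",")
        return "[" + "; ".join("@" + key for key in keys) + "]"
    return CITE.sub(replace, text)


print(to_pandoc("Org-mode is great cite:some-key,another-key."))
```

Trailing punctuation stays outside the citation because "." and ", "
are not part of the assumed key syntax.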

>
> Best,
> Matt

--
Professor John Kitchin
Doherty Hall A207F
Department of Chemical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
412-268-7803
@johnkitchin
http://kitchingroup.cheme.cmu.edu


* Re: Citation processing via Zotero + zotxt
  2015-12-04  1:49             ` John Kitchin
@ 2015-12-05 21:18               ` Matt Lundin
  2015-12-05 22:23                 ` John Kitchin
  0 siblings, 1 reply; 37+ messages in thread
From: Matt Lundin @ 2015-12-05 21:18 UTC (permalink / raw)
  To: John Kitchin; +Cc: emacs-orgmode, Rasmus

Hi John,

John Kitchin <jkitchin@andrew.cmu.edu> writes:

> If a reference type is not listed in the CSL, it also will not be
> supported by CSL I suppose.

How is this different than biblatex or bibtex? A user could just modify
the style or put in a request with a maintainer. 

> I also suppose the CSL must be backend specific to output formats
> appropriate to org, html, LaTeX, markdown, etc... for any particular
> style.

AFAIU, CSL styles are backend agnostic (otherwise they wouldn't be of
much use). It is the processor (citeproc-js, pandoc, etc.) that takes
the instructions (e.g., font-style="italic" in a CSL file) and adds the
appropriate markup for a defined backend. So once you add a new output
format to a processor, it works with all styles.
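
One way to picture this division of labor is a per-backend table inside
the processor. The Python sketch below is hypothetical and does not
reproduce citeproc-js or pandoc internals; `FORMATS` and `render` are
invented names, and the three backends and two formatting instructions
are just examples.

```python
# Each backend maps a style-level instruction to its own markup template.
# The style file only says "italic" or "sup"; it never mentions markup.
FORMATS = {
    "html": {"italic": "<i>{}</i>", "sup": "<sup>{}</sup>"},
    "latex": {"italic": "\\textit{{{}}}", "sup": "\\textsuperscript{{{}}}"},
    "org": {"italic": "/{}/", "sup": "^{{{}}}"},
}


def render(text, instruction, backend):
    """Apply one formatting instruction using the chosen backend's markup."""
    return FORMATS[backend][instruction].format(text)


for backend in FORMATS:
    print(backend, render("Journal Name", "italic", backend), render("4", "sup", backend))
```

Adding a new output format then means adding one row to the table, and
every style picks it up without modification.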

> We should not try to support all of these things. We could support a
> small number of things that could be improved or increased in the
> future.

I would suggest that tapping into a CSL tool like Zotero or citeproc-js
is in fact *a small thing* we can do right now that would have a big
payoff for lots of users, even if it does not support 100% of use cases.

> The only time-tested, publication quality solutions for citations in
> my opinion right now are bib(la)tex, MS Word/reference manager, and
> "by hand". Even these get "edited" in their final print versions by
> journals.

Is this assessment based on your particular disciplinary experience? I
ask because many of us in the humanities have not enjoyed the benefits
of automated, text-based citation processing until quite recently, so
*both* biblatex and CSL seem awesome. The citation style in my field
(the Chicago Manual of Style) is more quirky and complex than any
scientific citation style. Thus, it is likely more feasible to implement
basic bibtex functionality in lisp than it is to re-implement
biblatex-chicago.[fn:1] CSL offers the advantage of allowing export to
backends that can easily be converted to Word (the format that
humanities publishers require).

> It might start making more sense to think of a lisp based citation
> processor. It might even address some limitations of bib(la)tex.

That would be very cool, especially if we could import/convert CSL files
(I don't want to rewrite all 1200+ lines of the
chicago-fullnote-bibliography CSL style). :)

Matt

Footnotes:

[fn:1] I cloned the CSL repository and did a quick sort by word count.
Not surprisingly, the longest files were all in the humanities:

    1296     2700    44202 chicago-fullnote-bibliography-fr.csl
    1273     2590    40935 chicago-fullnote-bibliography.csl
    1264     2576    40674 chicago-fullnote-bibliography-no-ibid.csl
    1241     2531    39515 chicago-library-list.csl
    1241     2530    39535 chicago-annotated-bibliography.csl
    1240     2524    39445 chicago-note-bibliography.csl
    1235     2506    40168 zeitschrift-fur-religionswissenschaft-note.csl
    1227     2508    39077 chicago-note-biblio-no-ibid.csl
    1132     2620    41990 mcgill-fr.csl
    1060     2149    34008 moorlands-college.csl
     998     2175    34470 lluelles.csl
     927     2066    31551 lluelles-no-ibid.csl
     911     1862    29828 proinflow.csl
     906     1961    28990 irish-historical-studies.csl
     878     1828    29238 universite-laval-faculte-de-theologie-et-de-sciences-religieuses.csl
     862     1830    29437 chicago-author-date-fr.csl
     856     1782    28244 oxford-studies-in-ancient-philosophy.csl
     809     1941    27717 university-college-dublin-school-of-history-and-archives.csl
     809     1796    26790 turabian-fullnote-bibliography.csl
     806     1857    27140 wheaton-college-phd-in-biblical-and-theological-studies.csl
     796     1791    26472 modern-language-association-6th-edition-note.csl
     793     1671    26453 sheffield-hallam-university-history.csl
     792     1910    30773 pour-reussir-note.csl
     788     1792    26377 svensk-exegetisk-arsbok.csl
     781     1782    26316 early-christianity.csl
     779     1772    26038 society-of-biblical-literature-fullnote-bibliography.csl
     775     1808    26461 new-testament-studies.csl
     768     2350    29928 clio-medica.csl
     714     1533    22993 iso690-author-date-cs.csl
     708     1578    25548 chicago-author-date-basque.csl
     708     1520    22731 iso690-author-date-sk.csl
     707     1605    24582 melbourne-school-of-theology.csl
     701     1533    22280 moore-theological-college.csl
     696     1756    25783 associacao-brasileira-de-normas-tecnicas-ufjf.csl
     694     1474    22229 podzemna-voda.csl
     692     2315    29639 foerster-geisteswissenschaft.csl
     692     1591    24881 oscola.csl
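
A comparable listing can be produced without `wc`. In the Python sketch
below, `wc_counts` and `rank_styles` are hypothetical helper names, and
the directory is assumed to be a local clone of the styles repository.
Rows sort on the first column, which is the line count, as the listing
above appears to be.

```python
from pathlib import Path


def wc_counts(data):
    """Return (lines, words, bytes) for a bytes object, like `wc` does."""
    return data.count(b"\n"), len(data.split()), len(data)


def rank_styles(directory, limit=10):
    """Return the `limit` longest .csl files as (lines, words, bytes, name)."""
    rows = [
        wc_counts(path.read_bytes()) + (path.name,)
        for path in Path(directory).glob("*.csl")
    ]
    # Tuples compare element by element, so this sorts by line count first.
    return sorted(rows, reverse=True)[:limit]


if __name__ == "__main__":
    # "styles" is an assumed path to the cloned repository.
    for lines, words, size, name in rank_styles("styles"):
        print(f"{lines:8d} {words:8d} {size:8d} {name}")
```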


* Re: Citation processing via Zotero + zotxt
  2015-12-05 21:18               ` Matt Lundin
@ 2015-12-05 22:23                 ` John Kitchin
  0 siblings, 0 replies; 37+ messages in thread
From: John Kitchin @ 2015-12-05 22:23 UTC (permalink / raw)
  To: Matt Lundin; +Cc: emacs-orgmode, Rasmus


Matt Lundin writes:

> Hi John,
>
> John Kitchin <jkitchin@andrew.cmu.edu> writes:
>
>> If a reference type is not listed in the CSL, it also will not be
>> supported by CSL I suppose.
>
> How is this different than biblatex or bibtex? A user could just modify
> the style or put in a request with a maintainer.

It's not different. Just to point out CSL has limitations too.

>> I also suppose the CSL must be backend specific to output formats
>> appropriate to org, html, LaTeX, markdown, etc... for any particular
>> style.
>
> AFAIU, CSL styles are backend agnostic (otherwise they wouldn't be of
> much use). It is the processor (citeproc-js, pandoc, etc.) that takes
> the instructions (e.g., font-style="italic" in a CSL file) and adds the
> appropriate markup for a defined backend. So once you add a new output
> format to a processor, it works with all styles.

I believe that. I still don't totally see where font-style="italic" gets
converted to /text/ or <i>text</i> or \textit{text} etc... I trust it
happens, I just don't see it. Where does one do something fancier like:
<i title="italics text"><a href="www.italics.org">text</a></i>

It sounds trivial, but imagine a day when we all have an orcid, and our
names are linked to our orcid pages in bibliographies, e.g. my name
would be <a href="http://orcid.org/0000-0003-2625-9232">J. R.
Kitchin</a> or some other format.

See this highly linked bibliography:
http://kitchingroup.cheme.cmu.edu/dept-publications-2014.html
Where almost every part of each entry is linked to something (true, to
some stuff that is behind a paywall, but let's not get distracted by
that). Is that something a CSL/citation processor could do? These aren't
reasons not to do CSL, and they aren't mission critical to citations.
They just make them richer and more useful. And to be fair, the
information these pieces are linked to came from a bibliographic record
from Scopus, not a bibtex file I maintain, which only has a doi in it to
access that information.

Clearly I still don't see how a cite-processor actually works. I gather
that:
1. You extract citation data from a document.
2. You send the citation data, style, and bibliography data to the
   citation processor.
3. It returns replacement text and the bibliography string.
4. You substitute the replacements in the text and insert the
   bibliography string somewhere.

If the CSL doesn't have the backend information, and the citation
processor doesn't know about org/html/etc... then somewhere between step
3 and 4 you add the formatting right? Does the processor get another
piece of information to tell it how to format the output? For example,
if your CSL says a citation should be superscripted, how does the
citation processor know to output <sup>4</sup> vs. $^4$ or ^{4}?
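
The four steps above can be sketched in Python as follows. This is a toy
stand-in, not how any real CSL processor is invoked: it assumes
pandoc-style [@key] citations, a numbered superscript style, and an
output format passed to the processor as an extra argument. All names
(`CITE`, `SUP`, `extract_keys`, `process`, `export`) are hypothetical.

```python
import re

CITE = re.compile(r"\[@([A-Za-z0-9_-]+)\]")

# Backend-specific markup for a superscripted citation number (assumed).
SUP = {"html": "<sup>{}</sup>", "org": "^{{{}}}"}


def extract_keys(text):
    """Step 1: collect citation keys in order of first appearance."""
    keys = []
    for key in CITE.findall(text):
        if key not in keys:
            keys.append(key)
    return keys


def process(keys, entries, backend):
    """Steps 2-3: stand-in processor returning replacements and bibliography."""
    numbers = {key: n for n, key in enumerate(keys, start=1)}
    replacements = {key: SUP[backend].format(n) for key, n in numbers.items()}
    bibliography = "\n".join(f"{numbers[key]}. {entries[key]}" for key in keys)
    return replacements, bibliography


def export(text, entries, backend):
    """Step 4: substitute the replacements and append the bibliography."""
    replacements, bibliography = process(extract_keys(text), entries, backend)
    body = CITE.sub(lambda match: replacements[match.group(1)], text)
    return body + "\n\n" + bibliography


print(export("Org is great [@a]. So is Emacs [@b].",
             {"a": "Entry A.", "b": "Entry B."}, "html"))
```

In this sketch the backend name is the extra piece of information: the
style decides that the number is superscripted, and the processor looks
up how the chosen backend writes a superscript.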


>> We should not try to support all of these things. We could support a
>> small number of things that could be improved or increased in the
>> future.
>
> I would suggest that tapping into a CSL tool like Zotero or citeproc-js
> is in fact *a small thing* we can do right now that would have a big
> payoff for lots of users, even if it does not support 100% of use
> cases.

I have no objection to it. Getting high quality references into Word
documents from org-mode is one barrier to convincing more people
org-mode is a competitive writing tool for publications in my mind.

Clearly we need an external program for this. I looked at pandoc before
http://kitchingroup.cheme.cmu.edu/blog/2015/01/29/Export-org-mode-to-docx-with-citations-via-pandoc/
and it was ok, but had some issues getting to Word with citations.

Would zotero or citeproc-js be any better?

>> The only time-tested, publication quality solutions for citations in
>> my opinion right now are bib(la)tex, MS Word/reference manager, and
>> "by hand". Even these get "edited" in their final print versions by
>> journals.
>
> Is this assessment based on your particular disciplinary experience?

Certainly ;) Those have met 100% of my needs, and about 10% of my
desires in scientific publishing for the past 15 years. org-ref now
meets 99.9% of my desires ;)

Also, by time-tested, I mean I have published papers by those methods
specifically, so I know they work (including using org-mode to make
bibtex/latex files). I have not published any papers using org mode with
export to Word, so I don't know if it is possible to do it. The final
details of formatting may prove too difficult in some cases for direct
export.

> I ask because many of us in the humanities have not enjoyed the
> benefits of automated, text-based citation processing until quite
> recently, so *both* biblatex and CSL seem awesome.

So even Word with EndNote, Papers, or Zotero has not been a
citation/bibliography solution?

> The citation style in my field
> (the Chicago Manual of Style) is more quirky and complex than any
> scientific citation style. Thus, it is likely more feasible to implement
> basic bibtex functionality in lisp than it is to re-implement
> biblatex-chicago.[fn:1] CSL offers the advantage of allowing export to
> backends that can easily be converted to Word (the format that
> humanities publishers require).

Somewhat ironically, CSL won't support some standard citation formats in
chemistry journals! Luckily, those are supported in bibtex.

>> It might start making more sense to think of a lisp based citation
>> processor. It might even address some limitations of bib(la)tex.
>
> That would be very cool, especially if we could import/convert CSL files
> (I don't want to rewrite all 1200+ lines of the
> chicago-fullnote-bibliography CSL style). :)

It is easy to read in the xml files, and they read into a lisp data
structure. That isn't helpful on its own, of course, without a lisp
citation processor to send them to ;)

>
> Matt

--
Professor John Kitchin
Doherty Hall A207F
Department of Chemical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
412-268-7803
@johnkitchin
http://kitchingroup.cheme.cmu.edu


end of thread, other threads:[~2015-12-05 22:23 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
2015-11-28 20:16 Citation processing via Zotero + zotxt Richard Lawrence
2015-11-30  8:12 ` Martin Yrjölä
2015-12-01  0:32   ` John Kitchin
2015-12-01  2:59     ` Richard Lawrence
2015-12-03  0:27       ` Matt Lundin
2015-12-03  1:41         ` Christian Wittern
2015-12-03 14:25           ` Matt Lundin
2015-12-03  4:32         ` Richard Lawrence
2015-12-03 15:27           ` Matt Lundin
2015-12-03 17:56             ` Richard Lawrence
2015-12-03 19:14               ` Thomas S. Dye
2015-12-03 22:05                 ` Rasmus
2015-12-03 20:45               ` Matt Lundin
2015-12-03 21:04                 ` Matt Lundin
2015-12-03 23:31         ` John Kitchin
2015-12-04  0:03           ` Rasmus
2015-12-04  1:49             ` John Kitchin
2015-12-05 21:18               ` Matt Lundin
2015-12-05 22:23                 ` John Kitchin
2015-12-04  2:06           ` Matt Lundin
2015-12-04 11:22             ` John Kitchin
2015-12-04 14:13               ` Matt Lundin
2015-12-05  1:23                 ` John Kitchin
2015-12-04  9:20           ` Eric S Fraga
2015-12-01  9:34     ` Ramon Diaz-Uriarte
2015-11-30 11:35 ` Eric S Fraga
2015-11-30 12:02   ` Ramon Diaz-Uriarte
     [not found] ` <CAN_Dec-JCmwQ2bc=U93rRaPMYyXOiN_PRbtyya9x396r3qqMrg@mail.gmail.com>
2015-12-01 12:12   ` Fwd: " Matt Price
2015-12-01 15:28     ` Eric S Fraga
2015-12-01 15:51       ` Richard Lawrence
2015-12-01 19:20         ` Matt Price
2015-12-01 21:34         ` Eric S Fraga
2015-12-01 21:51     ` Matt Lundin
2015-12-02 15:21       ` Richard Lawrence
2015-12-01 14:36 ` Rasmus
2015-12-01 19:37   ` Matt Price
2015-12-02 15:58   ` Richard Lawrence
