* More questions about CSL and org-mode
From: John Kitchin @ 2015-12-06 21:25 UTC
To: Org Mode

Hi all,

This is mostly for the people working on citations in org-mode.

I have been reading about CSL more this weekend. IIRC, one of the
reasons to develop the new citation syntax was to make pre/post text in
citations more convenient than is currently possible.

I have not seen any possibility for this with CSL, however. Is my
understanding correct? Is this a problem, or something partially handled
by org-export and partially by a citeproc?

IIUC, the current aim is to get a citeproc that will do the following on
export:

1. Replace in-text citation syntax with org-formatted replacements.
2. Insert an org-formatted bibliography somewhere in the document.
3. Proceed with org-to-something export, with built-in exporters.

The current contenders for a citeproc are Zotero and Pandoc.

Has anyone looked at citeproc-py (https://pypi.python.org/pypi/citeproc-py/)
or citeproc-ruby (https://github.com/inukshuk/citeproc-ruby)?

The Ruby one looks pretty advanced.

--
Professor John Kitchin
Doherty Hall A207F
Department of Chemical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
412-268-7803
@johnkitchin
http://kitchingroup.cheme.cmu.edu

* Re: More questions about CSL and org-mode
From: Richard Lawrence @ 2015-12-06 23:24 UTC
To: John Kitchin, Org Mode

Hi John,

John Kitchin <jkitchin@andrew.cmu.edu> writes:

> Hi all,
>
> This is mostly for the people working on citations in org-mode.
>
> I have been reading about CSL more this weekend. IIRC, one of the
> reasons to develop the new citation syntax was to get the ability to
> have pre/post text in citations more conveniently than what is currently
> possible.

Yes, that is my understanding, too.

> I have not seen any possibility for this with CSL, however. Is my
> understanding correct? Is this a problem, or something partially handled
> by org-export and partially by a citeproc?

The CSL processors I've looked at support prefix and suffix text for
individual references within a citation. See, for example, the
citeproc-js documentation:

http://gsl-nagoya-u.net/http/pub/citeproc-doc.html#citation-data-object

prefix, suffix, and some other fields are supported. pandoc-citeproc
supports the same set of fields.

However, my understanding is that neither citeproc-js nor
pandoc-citeproc supports a BibLaTeX-style "common" prefix/suffix that
belongs to the citation as a whole, rather than to the individual
references within it, as is available in the multi-cite commands. We
currently have support for such common prefixes/suffixes in Org syntax.

My solution to this in my org-citeproc wrapper for pandoc-citeproc is to
prepend the common prefix to the prefix for the first reference in a
citation, and append the common suffix to the suffix of the last
reference. This is not a great solution, because it is not really
defined what kind of punctuation (if any) should separate the common
prefix from the first item's prefix, and so on. But I figured that was
not an important issue to address until we actually have people making
use of common prefix and suffix syntax who are not exporting to LaTeX...

> IIUC, the current aim is to get a citeproc that will do the following on
> export:
> 1. replace in-text citation syntax with org-formatted replacements
> 2. Insert an org-formatted bibliography somewhere in the document
> 3. proceed with org-to-something export, with built-in
> exporters.

That's basically my understanding too. There is one snag with the
"org-formatted replacement" plan, though, which I saw in a Zotero dev
discussion yesterday. CSL processing might result in multiple levels of
formatting, e.g. nested italics like

    <em>Something with an internal <em>Title</em></em>

and that won't translate very well back to Org syntax in general:

    /Something with an internal /Title//

The suggestion was to just use HTML output, and then parse the HTML to
get a data structure that could be directly rendered into HTML, LaTeX,
etc., which support nested italics just fine. I think we could do this,
though maybe there's a better solution. That is, we can take HTML from
the citation processor and go directly to org-element objects, without
producing and re-parsing citations in Org format.

> The current contenders for a citeproc are Zotero and Pandoc.
>
> Has anyone looked at https://pypi.python.org/pypi/citeproc-py/
> or https://github.com/inukshuk/citeproc-ruby
>
> The ruby one looks pretty advanced.

I haven't looked at them closely. My impression was that the Python
version was quite incomplete; and unfortunately, I don't know Ruby, so I
would be the wrong person to evaluate it (or write code for it).

Best,
Richard

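The workaround described in this message, folding a citation-level
("common") prefix and suffix into the first and last references, might
look roughly like the Python sketch below. It is only an illustration
and not the code in org-citeproc: the function name, the dictionary
layout (loosely modelled on per-reference "prefix"/"suffix" fields) and
the single-space separator are all assumptions. The separator in
particular is exactly the undefined punctuation question raised above.

```python
def fold_common_affixes(items, common_prefix="", common_suffix=""):
    """Return copies of `items` with citation-level affixes folded in.

    `items` is a list of dicts with an "id" and optional "prefix" and
    "suffix" keys (a hypothetical layout chosen for this sketch).
    """
    if not items:
        return []
    folded = [dict(item) for item in items]  # copy so the input is untouched
    if common_prefix:
        first = folded[0]
        # Joining with a plain space is an assumption; the right
        # punctuation here is not defined anywhere.
        parts = (common_prefix, first.get("prefix", ""))
        first["prefix"] = " ".join(p for p in parts if p)
    if common_suffix:
        last = folded[-1]
        parts = (last.get("suffix", ""), common_suffix)
        last["suffix"] = " ".join(p for p in parts if p)
    return folded


# Hypothetical citation with two references.
cites = [
    {"id": "doe2010", "prefix": "esp."},
    {"id": "roe2012", "suffix": "ch. 3"},
]
result = fold_common_affixes(
    cites, common_prefix="see", common_suffix="and references therein"
)
print(result[0]["prefix"])  # see esp.
print(result[1]["suffix"])  # ch. 3 and references therein
```

A processor that natively supported citation-level affixes would make
this folding step unnecessary.
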
* Re: More questions about CSL and org-mode
From: Richard Lawrence @ 2015-12-06 23:45 UTC
To: John Kitchin, Org Mode

Richard Lawrence <richard.lawrence@berkeley.edu> writes:

>> IIUC, the current aim is to get a citeproc that will do the following on
>> export:
>> 1. replace in-text citation syntax with org-formatted replacements
>> 2. Insert an org-formatted bibliography somewhere in the document
>> 3. proceed with org-to-something export, with built-in
>> exporters.
>
> That's basically my understanding too. There is one snag with the
> "org-formatted replacement" plan, though, which I saw in a Zotero dev
> discussion yesterday.

Here's the reference for that discussion, by the way:

https://groups.google.com/d/msg/zotero-dev/Bz_IenruxX4/24QWuyEIp_IJ

Best,
Richard

P.S. John, thanks for your continued research on this. I see that our
procrastination habits are on the same schedule. :)

* Re: More questions about CSL and org-mode
From: John Kitchin @ 2015-12-07 11:56 UTC
To: Richard Lawrence; Cc: Org Mode

Thanks.

It's an interesting jam. You want to have multiple outputs as a
possibility, but there isn't a robust markup that readily works across
all backends.

What about this? For now, consider a bibliography database with
org-formatting in the entries, e.g. subscripts, superscripts, etc. (but
not italics on titles or anything else related to bibliography
formatting). So I can have a title like "The role of H_{2}O in
/d/-orbital splitting of \alpha particles" in an entry. I assume it
would also be OK to have UTF-8 characters in it. Equations are still
problematic, as we use LaTeX syntax for those.

On export, the in-text citations are transformed to unique text blobs,
e.g. UUIDs, and the document is exported. The only important features of
these blobs are that they do not get changed on export and that they are
unique, because we replace them later.

The strings in the bibliography entry are "exported" to convert the
org-markup to the output format. The in-text citations, expanded
bibliography and style are sent to the citation processor, which outputs
replacements and a formatted bibliography in the desired output format.

Finally, you replace each UUID with the appropriate replacement, and
insert the bibliography where it belongs. That should be the final
document.

If you did this with a bibtex file, it would probably break its use in
LaTeX without some clever transformation of the bibtex file to a new
file that was LaTeX formatted, and an on-the-fly change to the org
buffer to use this new file. But since the point of this is non-LaTeX
export, I guess this is OK.

I bet you could even expand the bibtex format to include journal
abbreviations, and directly use the fields that CSL uses (although I
strongly dislike "container-title" for the journal name!).

The downside is that the processor now needs to output different
formats, but presumably there are a few standard ones, like HTML, that
are a one-time investment.

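The placeholder scheme proposed in this message can be sketched in a few
lines of Python. Everything here is hypothetical and only meant to make
the data flow concrete: the `[cite:@key]` marker syntax, the `<p>`
wrapper standing in for a real export, and the `formatted` dictionary
standing in for the output of a citation processor are all assumptions,
not part of any existing Org or citeproc API.

```python
import re
import uuid

# Hypothetical in-text citation marker, e.g. [cite:@kitchin2007].
CITE_RE = re.compile(r"\[cite:@([A-Za-z0-9_:-]+)\]")


def insert_placeholders(text):
    """Swap each citation marker for a unique opaque token.

    Returns the rewritten text and a dict mapping token -> citation key.
    """
    tokens = {}

    def swap(match):
        # Letters and digits only, so an exporter has no reason to
        # escape or otherwise alter the token.
        token = "CITE" + uuid.uuid4().hex
        tokens[token] = match.group(1)
        return token

    return CITE_RE.sub(swap, text), tokens


def restore(exported, tokens, formatted):
    """Replace each token with the processor's output for its key."""
    for token, key in tokens.items():
        exported = exported.replace(token, formatted[key])
    return exported


source = "Water matters [cite:@kitchin2007]; see also [cite:@doe2010]."
with_tokens, tokens = insert_placeholders(source)

# Stand-in for the real export step (assumption).
exported = "<p>" + with_tokens + "</p>"

# Stand-in for what a citation processor would return (assumption).
formatted = {"kitchin2007": "(Kitchin 2007)", "doe2010": "(Doe 2010)"}

final = restore(exported, tokens, formatted)
print(final)
# prints: <p>Water matters (Kitchin 2007); see also (Doe 2010).</p>
```

Note that this only substitutes strings in place. It does not add or
renumber anything in the surrounding document.
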
* Re: More questions about CSL and org-mode
From: Richard Lawrence @ 2015-12-07 19:55 UTC
To: John Kitchin; Cc: Org Mode

Hi John,

John Kitchin <jkitchin@andrew.cmu.edu> writes:

> Thanks.
>
> Its an interesting jam. You want to have multiple outputs as a
> possibility, but there isn't a robust markup that readily works across
> all backends.

Yes, indeed.

> On export the in-text citations are transformed to unique text blobs,
> e.g. uuids, and the document exported. The only important features of
> these blobs is that they do not get changed on export, and they are
> unique because we replace them later.
>
> The strings in the bibliography entry are "exported" to convert the
> org-markup to the output format. The in-text citations, expanded
> bibliography and style are sent to the citation processor, which outputs
> replacements and a formatted bibliography in the desired output format.
>
> Finally, you replace each uuid with the appropriate replacement, and
> insert the bibliography where it belongs. That should be the final
> document.

IIUC, the problem with this approach is that it will not work well when
the citation style is note-based rather than inline. The main
motivation for going "back to Org" is that note-based styles require the
document structure to change as a result of citation processing: new
footnotes have to be inserted, and existing ones have to be renumbered.
That is relatively hard to do if the rest of the document is already in
the target format (except with LaTeX). By doing citation processing
early in the export process and converting the results to Org, we can
rely on Org's footnote processing to handle this later in the export
process.

As far as I can see, if it weren't for note-based styles, this approach
would work fine. (Indeed, it is pretty much what the existing org-cite
code does, except that the mapping between citations and their
replacements is done with Lisp data structures rather than via string
replacement in the output buffer. I stopped work on that right about
the time I realized the existing approach wouldn't work very well with
note-based styles.)

But given the problem about nested formatting, going back to Org at the
level of text replacements doesn't work. In other words: both of the
simple-minded approaches (process citations directly to text in the
target format, or process them to Org text, then let Org convert them to
the target format) face problems.

I think probably what we'll have to do to accommodate both note-based
styles and the possibility of nested formatting is to get the results of
citation processing in some unambiguous format like HTML or JSON, then
parse it, and then use the result to directly modify the parse tree for
the Org document before continuing the export process. I can't see an
easier way...can anyone else?

Best,
Richard

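The "parse the processor's HTML into a tree, then render from the tree"
idea in this message can be illustrated with Python's standard
`html.parser` module. This is a minimal sketch under stated assumptions:
it handles only `<em>`/`<i>` emphasis, the tree layout (strings for
text, `("em", children)` tuples for italics) is invented for this
example, and it does not touch org-element or any real citation
processor. The point is only that a tree keeps the nesting that flat
Org markup such as /Something with an internal /Title// cannot express.

```python
from html.parser import HTMLParser


class EmphasisTreeBuilder(HTMLParser):
    """Build a nested tree from HTML that uses <em>/<i> for italics.

    Text becomes plain strings; emphasis becomes ("em", children).
    All other tags are ignored in this sketch.
    """

    def __init__(self):
        super().__init__()
        self.root = []
        self.stack = [self.root]  # innermost open node's children last

    def handle_starttag(self, tag, attrs):
        if tag in ("em", "i"):
            children = []
            self.stack[-1].append(("em", children))
            self.stack.append(children)

    def handle_endtag(self, tag):
        if tag in ("em", "i") and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        self.stack[-1].append(data)


def parse(html):
    builder = EmphasisTreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def to_latex(nodes):
    """Render the tree as LaTeX; nested \\emph is well defined there.

    Escaping of LaTeX special characters is omitted for brevity.
    """
    out = []
    for node in nodes:
        if isinstance(node, str):
            out.append(node)
        else:
            out.append(r"\emph{" + to_latex(node[1]) + "}")
    return "".join(out)


tree = parse("<em>Something with an internal <em>Title</em></em>")
print(tree)
print(to_latex(tree))
# prints: \emph{Something with an internal \emph{Title}}
```

A renderer for each target format would walk the same tree, which is
roughly what modifying the parse tree directly would buy.
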
* Re: More questions about CSL and org-mode
From: John Kitchin @ 2015-12-08 11:41 UTC
To: Richard Lawrence; Cc: Org Mode

>> On export the in-text citations are transformed to unique text blobs,
>> e.g. uuids, and the document exported. The only important features of
>> these blobs is that they do not get changed on export, and they are
>> unique because we replace them later.
>>
>> The strings in the bibliography entry are "exported" to convert the
>> org-markup to the output format. The in-text citations, expanded
>> bibliography and style are sent to the citation processor, which outputs
>> replacements and a formatted bibliography in the desired output format.
>>
>> Finally, you replace each uuid with the appropriate replacement, and
>> insert the bibliography where it belongs. That should be the final
>> document.
>
> IIUC, the problem with this approach is that it will not work well when
> the citation style is note-based rather than inline. The main
> motivation for going "back to Org" is that note-based styles require the
> document structure to change as a result of citation processing: new
> footnotes have to be inserted, and existing ones have to be renumbered.
> That is relatively hard to do if the rest of the document is already in
> the target format (except with LaTeX). By doing citation processing
> early in the export process and converting the results to Org, we can
> rely on Org's footnote processing to handle this later in the export
> process.

I guess I don't understand what note-based citations look like, or why
you would have to renumber footnotes in this process. Does the order
change for some reason? Even if it does, it sounds like this might just
require another pass of calculations to figure out how to replace
things. Any chance you could send me a document with note-based
citations?

One place where text-based replacement doesn't work, I guess, is output
that isn't plain-text based. Maybe, for example, ODT, where the output
consists of multiple XML files in a zip file?

> As far as I can see, if it weren't for note-based styles, this approach
> would work fine. (Indeed, it is pretty much what the existing org-cite
> code does, except that the mapping between citations and their
> replacements is done with Lisp data structures rather than via string
> replacement in the output buffer. I stopped work on that right about
> the time I realized the existing approach wouldn't work very well with
> note-based styles.)
>
> But given the problem about nested formatting, going back to Org at the
> level of text replacements doesn't work. In other words: both of the
> simple-minded approaches (process citations directly to text in the
> target format, or process them to Org text, then let Org convert them to
> the target format) face problems.
>
> I think probably what we'll have to do to accommodate both note-based
> styles and the possibility of nested formatting is to get the results of
> citation processing in some unambiguous format like HTML or JSON, then
> parse it, and then use the result to directly modify the parse tree for
> the Org document before continuing the export process. I can't see an
> easier way...can anyone else?

Like getting an XML citation, and then using XSLT to translate it to the
format you want? Or something equivalent? Your translation would still
have to be clever to avoid nested syntax, which I guess requires some
recursive parsing of the output.

Modifying the parse tree is more elegant than the replacement text idea.
I have to learn how to do this one day ;)

* Re: More questions about CSL and org-mode
From: John Kitchin @ 2015-12-07 16:18 UTC
To: Richard Lawrence; Cc: Org Mode

Richard Lawrence writes:

>> I have not seen any possibility for this with CSL, however. Is my
>> understanding correct? Is this a problem, or something partially handled
>> by org-export and partially by a citeproc?
>
> The CSL processors I've looked at support prefix and suffix text for
> individual references within a citation. See, for example, the
> citeproc-js documentation:
>
> http://gsl-nagoya-u.net/http/pub/citeproc-doc.html#citation-data-object
>
> prefix, suffix, and some other fields are supported. pandoc-citeproc
> supports the same set of fields.

Interesting. I guess these are not standard for all processors?

It also looks like it would be hard to get an inline reference that is
formatted as [1] but refers to Reference 1, e.g. what citenum produces.
It is possible to have (Kitchin 2007) and (2007), but not a citation
reference to Kitchin alone, such as the one a citeauthor command
produces in LaTeX.

I am not raising any objections here, just getting a sense for what is
feasible.

> However, my understanding is that neither citeproc-js nor
> pandoc-citeproc support a BibLaTeX-style "common" prefix/suffix that
> belongs to the citation as a whole, rather than the individual
> references within it, as is available in the multi-cite commands. We
> currently have support for such common prefixes/suffixes in Org syntax.
>
> My solution to this in my org-citeproc wrapper for pandoc-citeproc is to
> prepend the common prefix to the prefix for the first reference in a
> citation, and append the common suffix to the last reference. This is
> not a great solution, because it is not really defined what kind of
> punctuation (if any) should separate the common prefix from the first
> item's prefix, and so on. But I figured that was not an important issue
> to address until we actually have people making use of common prefix and
> suffix syntax who are not exporting to LaTeX...

Agreed.
