Re: Problem with org-ref

emacs-orgmode@gnu.org archives

From: marvin doyley <marvinpas@gmail.com>
To: John Kitchin <jkitchin@andrew.cmu.edu>
Cc: "emacs-orgmode@gnu.org" <emacs-orgmode@gnu.org>
Subject: Re: Problem with org-ref
Date: Wed, 23 Dec 2015 16:44:30 -0500
Message-ID: <F2968A13-CC35-489C-9BCC-2DB397D02434@gmail.com>
In-Reply-To: <CAJ51ETp6x_R7rFy0iTMg_cC2_T6U0fkCVvTp7ThMn8N2ESYj8Q@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 738 bytes --]

Hi John,

I am using a Mac too, and pdftotext seems to be working, at least for an OCR'd file (see below). If I run pdftotext on the original PDF, I get nothing.
cheers,
M

PS I am using brew poppler, which includes pdftotext

> On Dec 23, 2015, at 4:30 PM, John Kitchin <jkitchin@andrew.cmu.edu> wrote:
> 
> Thanks! Do you know if you have pdftotext working on your machine? The pdf drag-n-drop works by converting the pdf to text, and then matching a pattern to find a DOI. If none is found, you get the message you noted. The URL dnd works similarly, but there are a bunch of recipes for what to match depending on the base of the URL.
> 
> I did that on a Mac, and I haven't tested it on a Windows or Linux machine.
> 
> John
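
The conversion-then-match step John describes can be sketched roughly as follows. This is not org-ref's actual code (org-ref is written in Emacs Lisp); the regular expression and the function names `pdf_to_text` and `find_doi` are assumptions made for illustration. The only external tool used is `pdftotext`, called with `-` so that it writes to standard output.

```python
import re
import subprocess

# Assumed pattern: "10.", a 4-9 digit registrant code, "/", then a suffix
# that runs until whitespace, a quote, or an angle bracket.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def pdf_to_text(pdf_path):
    """Return the text layer of a PDF using the pdftotext command-line tool."""
    # "-" asks pdftotext to write to standard output instead of a file.
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def find_doi(text):
    """Return the first DOI-like string in text, or None if there is none."""
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    # Drop punctuation that often trails a DOI at the end of a sentence.
    return match.group(0).rstrip(".,;)")
```

With a sketch like this, a PDF whose text layer is empty yields no match, which would be consistent with the "no DOI found" message when pdftotext returns nothing.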

[-- Attachment #2.1: Type: text/html, Size: 2170 bytes --]

[-- Attachment #2.2: acscatal%2E5b00538-OCR.txt --]
[-- Type: text/plain, Size: 32716 bytes --]

This is an open access article published under an ACS AuthorChoice License, which permits
copying and redistribution of the article or any adaptations for non-commercial purposes.

ACS Editors' Choice

pubs.acs.org/acscatalysis

Examples of Effective Data Sharing in Scientific Publishing
John R. Kitchin*
Department of Chemical Engineering, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, Pennsylvania 15213,
United States

Supporting Information
ABSTRACT: We present a perspective on an approach to data sharing in scientific publications we have been developing in our group.
The essence of the approach is that data can be embedded in a human-readable and machine-addressable way within the traditional
publishing environment. We show this by example for both computational and experimental data. We articulate a need for new authoring
tools to facilitate data sharing, and we discuss the tools we have been developing for this purpose. With these tools, data generation,
analysis, and manuscript preparation can be deeply integrated, resulting in easier and better data sharing in scientific publications.

1. INTRODUCTION

Data sharing and management plans are becoming an increasingly important requirement for
scientific research. The National Science Foundation requires a data management plan (ref 1)
for all proposals and has a formal statement on data sharing. The Department of Energy is
beginning to require this in some proposals and has issued a formal statement on data
management. The National Institutes of Health has a formal data sharing policy (ref 4).
These mandates do not usually say how the data management plan must be implemented, simply
that there must be one that is appropriate for the data being generated and consistent with
the needs of the scientific field.
There are several challenges to meeting these requirements. Data is a generic term that
refers to a broad range of information that varies in size and complexity. There is no single
solution that fits all data. Manuscripts have been the primary mode of data sharing for
decades, but the tools we use to write them typically include only graphical or tabular data.
Those data are usually not machine readable, nor are they easy to reuse. We believe that
progress in data sharing will require the development of new tools and social expectations
about scientific data. These tools will have to integrate data directly into the manuscripts
and into the manuscript preparation workflow, so that by the time one finishes writing the
manuscript, the data is already integrated and ready to share with no further work required.
When this is possible, data will be shared much more frequently.
In this Viewpoint, we illustrate an example of a method we have been developing that enables
data, analysis, and code sharing within the traditional publishing environment. We have
published several papers this way already (e.g., refs 5 and 6), including experimental work,
combined computational and experimental work, and computational work. These manuscripts in
published form are indistinguishable from other manuscripts in those journals. However, the
Supporting Information files are very different. Although, at first glance, the Supporting
Information files are simple PDF files that provide additional information, there are
actually data files embedded (see Figure 1a) in those PDFs which can be extracted and reused.
The data is human readable in most cases, and machine addressable, enabling reuse and
sharing. We will provide an example of this for a

ACS Publications
© 2015 American Chemical Society

[Figure 1 image: (a) a PDF reader's attachment panel listing data.json and
supporting-information.org; (b) a thumbtack icon marking an embedded file in the PDF.]

Figure 1. (a) The attachments in the Supporting Information of ref 11.
There are two attachments. (b) A single attachment in the PDF file.
Double-clicking on the thumbtack (in the Supporting Information file)
will open the attached file.

paper we recently published in ACS Catalysis on "Estimating
Bulk-Composition-Dependent H2 Adsorption Energies on
CuxPd1-x Alloy (111) Surfaces".
The Supporting Information file for that article is available for free. When opened in a PDF
reader that supports attachments, one can see that there are attachments (see Figure 1a).
Alternatively, while reading the PDF file, the reader is alerted that there is an attachment
by the presence of a thumbtack icon (Figure 1b), which can be double-clicked to open the
file. An alternative approach to extracting the data and source file is to use a command-line
program, such as pdftk, to extract the files from the PDF. File attachments are a standard
feature of the PDF specification, and there are a number of ways to create them. This
Supporting Information file has two attachments: one is a JSON data file, and one is the
source file that generated the PDF. It is also possible to embed data files in a Word
document.
An example of using a command-line utility to extract the attachments is shown in Listing 1,
which will extract the two data files into the directory in which the command is run. The
contents of the files that are extracted are explained in the Supporting Information file,
which in this case even contains examples of how to use the data.
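
As a rough illustration of the extraction step, the sketch below drives pdftk from Python. It relies on pdftk's `unpack_files` operation; the function names and the input file name in the usage note are placeholders chosen here, not taken from the article.

```python
import subprocess


def build_unpack_command(pdf_path, output_dir="."):
    """Build the pdftk command that unpacks every attachment in pdf_path."""
    # pdftk <input> unpack_files output <directory>
    return ["pdftk", pdf_path, "unpack_files", "output", output_dir]


def extract_attachments(pdf_path, output_dir="."):
    """Run pdftk; raises CalledProcessError if pdftk reports a failure."""
    subprocess.run(build_unpack_command(pdf_path, output_dir), check=True)
```

Calling `extract_attachments("supporting-info.pdf")` (a placeholder file name) would write the attached files into the current directory, assuming pdftk is installed and on the PATH.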
Received: March 13, 2015
Published: May 11, 2015

DOI: 10.1021/acscatal.5b00538
ACS Catal. 2015, 5, 3894-3899

[Listing 1: command-line extraction of the attached files in the Supporting Information;
body illegible in the OCR text.]

This idea of file attachments is one that almost everyone could adopt immediately. File
attachments can be added using Adobe Professional or other PDF editing tools. They can also
be added in LaTeX using the attachfile package. Word documents also support file
attachments. This small step could significantly improve data sharing. Although some
standardization of data formats could be helpful, if this delays sharing, constrains
creativity, or limits sharing, we should tolerate nonstandard approaches. The shared data
that is easiest to use will get used more often than data that is not easy to use.
In the next sections, we illustrate how easy it is to do this in a few applications. We will
first show an example of an embedded data file and its use. Then, we will show an alternative
example, and a more sophisticated approach to data sharing.

2. USE OF AN EMBEDDED DATA FILE
We embedded a data file (data.json) in the Supporting Information PDF file. The file is in
JSON format. JSON (JavaScript Object Notation) is a standard data format often used in web
programming. It is structured and can be read by a number of programming languages or by
web-based programs (e.g., http://www.jsoneditoronline.org/). The data set here is a
fit-for-purpose format, which is described in the Supporting Information document.
The data.json file embedded in the Supporting Information PDF contains all the geometries and
computational parameters used for every DFT calculation in the paper. Here, we consider how
one could extract the information for a single calculation so that it could be used as the
starting point for a new calculation. To do this in VASP, for example, that means extracting
the data needed to create an INCAR, POSCAR, POTCAR, and KPOINTS file. These are input files
for the VASP density functional theory code. We will choose as the example a clean palladium
hydride slab. The critical point we wish to communicate here is that there is
machine-readable code embedded in the Supporting Information file that can be reused. The
output from each Listing can be found in the Supporting Information.
Reading the Supporting Information file suggests the following Python code (Listing 2) would
extract the INCAR parameters, and then we write each key/value pair to a file called INCAR
(a file containing input parameters for VASP). There is an extra key called "doc" that
provides some documentation on the data, which we delete before creating the INCAR file. We
choose an example of a Pd hydride slab. The terminology that follows is likely to be known to
those familiar with VASP. It is not critical that Python be used here; many other languages
can accomplish the same result, including Perl, Ruby, emacs-lisp, and others. Python loads a
json file as a dictionary, which enables data to be looked up as key:value pairs. A value can
be another dictionary, providing a nested data structure.

[Listing 2: Python code that writes the INCAR file; body illegible in the OCR text.]

Now, we have an INCAR file with these contents, which defines the parameters VASP will use in
the calculation. Listing 3 shows how to view the contents of the INCAR file.

[Listing 3: Shell command to print the contents of the INCAR to the console; body illegible
in the OCR text.]

We can similarly retrieve information about which POTCAR files were used. The VASP license
prohibits sharing these, so information about the files used is all one can do here, and in
order to reproduce the calculation, one is required to have a VASP license. Note that it will
not always be possible to share all data. However, it is often possible to share enough data
so that another expert can reproduce the work.

[Listing 4: Python code that retrieves the POTCAR file information; body illegible in the OCR
text.]

The KPOINTS file would be created from data in the "input" section of the calculator data. We
extract these in Listing 5. Based on the information provided, it is evident a KPOINTS file
should be generated that creates a 10 × 10 × 1 Monkhorst-Pack k-point grid (see the
Supporting Information for the output of Listing 5).

[Listing 5: Python code that extracts the "input" section of the calculator data; body
illegible in the OCR text.]

We can leverage the Atomic Simulation Environment (ASE) to easily create the POSCAR file from
the chemical symbols, positions, and unit cell stored in the json file. ASE can also write
other input file types including GPAW, Wien2k, Castep, Siesta, Turbomol, FHI-AIMS, Gaussian,
and Quantum Espresso, in addition to file formats such as CUBE, xyz, cif, and some database
formats. Once the information is in Python, as we use below, one can write code to convert
the information to nearly any format desired. In Listing 6, we generate a POSCAR file (the
file containing atomic coordinates and unit cell for the VASP code). The generated POSCAR
file is somewhat long, so we do not show it here.

[Listing 6: Python code that generates the POSCAR file from the json data; body illegible in
the OCR text.]
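
To make the POSCAR step concrete, here is a minimal sketch that writes POSCAR text from chemical symbols, Cartesian positions, and a unit cell. The article uses ASE for this; the pure-Python writer below is a stand-in chosen to avoid that dependency, and the function name and the three-atom example data are made up for illustration.

```python
def poscar_text(symbols, positions, cell, comment="generated from JSON data"):
    """Return VASP 5 style POSCAR text with Cartesian coordinates."""
    # POSCAR groups atoms by element; keep the order of first appearance.
    order = []
    for symbol in symbols:
        if symbol not in order:
            order.append(symbol)

    lines = [comment, "1.0"]  # comment line, then the lattice scaling factor
    for vector in cell:       # three lattice vectors, one per line
        lines.append(" ".join(f"{c:.8f}" for c in vector))
    lines.append(" ".join(order))                                  # element symbols
    lines.append(" ".join(str(symbols.count(s)) for s in order))   # atoms per element
    lines.append("Cartesian")
    for element in order:
        for symbol, position in zip(symbols, positions):
            if symbol == element:
                lines.append(" ".join(f"{c:.8f}" for c in position))
    return "\n".join(lines) + "\n"


# Made-up example data, not taken from data.json.
example = poscar_text(
    symbols=["Pd", "H", "Pd"],
    positions=[[0.0, 0.0, 10.0], [1.4, 0.8, 11.0], [2.8, 0.0, 10.0]],
    cell=[[5.6, 0.0, 0.0], [0.0, 5.6, 0.0], [0.0, 0.0, 20.0]],
)
```

The string in `example` could then be written to a file named POSCAR. ASE's own writer handles more cases (constraints, fractional coordinates), so it remains the better choice when it is available.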

Next, one would create a POTCAR file (this file contains the pseudopotentials used by VASP)
consistent with the POSCAR file, and then VASP could be run to reproduce the result, or the
atoms could be modified to start some new calculation. This JSON file contains all of the
data used in making the figures in the paper, and it can all be extracted for reuse using
any kind of scripting language with JSON support. It should be evident that this information
is sufficient for an expert to create very similar calculations in other DFT codes besides
VASP as well.
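
The INCAR and KPOINTS steps described for this section can be sketched in a few lines. The layout of the record below is hypothetical: only the "doc" key and the "input" section are mentioned in the text, while the "incar" and "kpts" key names and all parameter values are invented for the example.

```python
import json


def incar_text(params):
    """Format a dictionary of VASP parameters as INCAR text."""
    lines = []
    for key, value in params.items():
        if key == "doc":  # documentation entry, not a VASP parameter
            continue
        if isinstance(value, bool):  # VASP spells logicals .TRUE. / .FALSE.
            value = ".TRUE." if value else ".FALSE."
        lines.append(f"{key.upper()} = {value}")
    return "\n".join(lines) + "\n"


def kpoints_text(grid):
    """Format an automatic Monkhorst-Pack KPOINTS file for the given grid."""
    nx, ny, nz = grid
    # comment line, 0 = automatic generation, scheme, subdivisions, shift
    return f"Automatic mesh\n0\nMonkhorst-Pack\n{nx} {ny} {nz}\n0 0 0\n"


# Hypothetical record standing in for one calculation in data.json.
record = json.loads(
    '{"incar": {"doc": "INCAR parameters", "encut": 350, "ismear": 1, "lwave": false},'
    ' "input": {"kpts": [10, 10, 1]}}'
)
incar = incar_text(record["incar"])
kpoints = kpoints_text(record["input"]["kpts"])
```

Writing `incar` and `kpoints` to files named INCAR and KPOINTS would complete the step; the real key names should be taken from the Supporting Information document.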
We emphasize here that JSON was used as a data format because (1) it is suitable to organize
the data, (2) it is easy to generate, and (3) it is easy to reuse. However, other data
formats are easily used as well. In ref 7, we embedded csv (comma-separated value) files.
Other formats including netCDF, HDF, sqlite, and so forth, could also be used if they were
fit for purpose. It is even possible to embed data files from proprietary software; in ref 5,
we embedded Excel files into the Supporting Information files. Naturally, these files are
only useful to people who have access to the software required to read the files.
Although it is technically possible to embed any type of file in a PDF, some readers restrict
what you are allowed to open. For example, Adobe Acrobat will not allow one to open
executable or zip files for security reasons. These can still be extracted using the pdftk
software. Alternatively, these files could be provided separately from the PDF Supporting
Information file.
This approach of data file embedding may become impractical for very large data sets. In
ref 19, we archived a large data set (= 1.8 Gb) of DFT calculations in an external data
sharing site which assigned the data set a DOI. An alternative data repository could be an
institutional data repository which also provides a DOI for citing. It remains to be seen if
these repositories remain archival quality repositories and what happens to the data if the
repositories cease to operate due to lack of funding or other reasons.
Finally, the embedded data file is part of the Supporting Information file. The Supporting
Information is in a sense the metadata that informs the reader what is in the file, how it
was made, and by example, how to use it.

3. EMBEDDING DATA WITHIN TEXT
Attaching a data file is not the only way to store data in the Supporting Information. One
challenge with data files is they are often not self-documenting. This means that when they
are separated from the source, it may not be clear what is in the file or how to use it. An
alternative approach we have been investigating is that the data can be stored in a machine
addressable table or code block within the manuscript. Notably, this table is different from
what is read by humans in the PDF. Instead, the table is in a source file that is embedded in
the PDF. This is the other file, named supporting-information.org in our example. This file
is in a plain text format that is lightly marked up using org-mode to differentiate text from
data and code. When this file is opened in Emacs (ref 21), a powerful text editor, new data
sharing and reuse possibilities are available.
Org-mode is a few different things. First, it is a document markup syntax that is
approximately plain text, and that differentiates text, citations, equations, tables, images,
code, and data. Second, org-mode is a library of code in Emacs that provides rich, functional
links, an outline mode, a capability to embed interactive code and data in a document, and
the capability to export the document to another format (e.g., PDF, LaTeX, and html among
others). We actually write our scientific papers using org-mode as mentioned already.
Others have also published papers using org-mode and about org-mode (refs 22 and 23). This
manuscript was prepared in org-mode. The code blocks shown in this manuscript are literally
the code that was run in the document in each example. Emacs is able to run the codes,
capture the output and insert it into the manuscript. Emacs and org-mode are open-source
software, available for free on every major computing platform. One does not need Emacs to
read org-mode; the format is in plain text. We would never consider writing org files without
Emacs; Emacs provides all the functionality that makes org-mode useful for this application.
There is limited support for org-mode in other editors. There are limited but developing
org-mode parsers available in Ruby (e.g., GitHub can render org files as HTML) and other
languages. We will show in the next paragraph that it is possible to read org-mode with other
languages. However, to adapt a quote by Neal Stephenson, when it comes to org-mode, "emacs
outshines all other editing software in approximately the same way that the noonday sun does
the stars. It is not just bigger and brighter; it simply makes everything else vanish."
The key example we want to illustrate here is that tables in org-mode actually have two
different functions. First, they are human-readable and can be rendered as regular tables in
the PDF or html format. Second, they are machine addressable and can serve as data sources
for code blocks. Much of the data used in ref 11 is stored in tables in the org-source.
org-mode provides native support to read these tables in as an array. Alternatively, one can
use any scripting tool to parse the org-file and extract the data. For example, in Listing 7,
we illustrate a small Python code that opens an org-file, finds the table, and returns the
data in the table for further analysis. The key point here is that the data is machine
readable, and one is not necessarily tied to Emacs or org-mode to take advantage of the
embedded data. org-mode is a syntax, which can be parsed by other tools.
We use the code from Listing 7 in Listing 8 to show that one can import the data from a table
in the Supporting Information file and use the data for new purposes. We import a table named
jm-seg which contains data about the experimentally measured surface and bulk composition of
an alloy surface. We use this data to plot the surface composition versus the bulk
composition (Figure 2). It should be evident that further analysis is possible at this point,
using exactly the same data as was used in our manuscript.

src block name: read-an-org-file
language: python
Listing 7: Prototype code to read data tables from an org-file. This code is stored in
an external Python module named pyorg_table.py so it can be used by Python scripts
for analysis.

    def string_to_number(s):
        '''Try to convert the string S to a number; return S unchanged on failure.'''
        if '.' in s:  # S is possibly a float
            try:
                return float(s)
            except ValueError:
                return s
        else:
            try:
                return int(s)
            except ValueError:
                return s

    def read_org_table(filename, tablename, include_header=True):
        '''Read the table named TABLENAME from the org-file named FILENAME.
        If not INCLUDE_HEADER, skip the first row of the table.'''
        with open(filename) as f:
            contents = f.readlines()

        # find the table; it is marked by a #+tblname line
        table_line = None
        for i, line in enumerate(contents):
            if (line.lower().startswith('#+tblname')
                    and tablename in line):
                table_line = i
                break
        if table_line is None:
            raise ValueError('table %s not found in %s' % (tablename, filename))

        # then find the start of the data
        table_data_start = table_line
        for line in contents[table_line:]:
            if line.startswith('|'):
                break
            else:
                table_data_start += 1

        # read the data
        data = []
        for line in contents[table_data_start:]:
            if not line.startswith('|'):
                break
            elif line.startswith('|-'):
                continue  # horizontal rule in the table
            else:
                data += [[string_to_number(x.strip())
                          for x in line.strip().split('|')[1:-1]]]

        if include_header:
            return data
        else:
            return data[1:]

src block name: segregation-data
language: python
Listing 8: Example code for extracting tabular data from a supporting information
org-file.

    from pyorg_table import read_org_table
    import matplotlib.pyplot as plt

    data = read_org_table('supporting-information.org', 'jm-seg', False)

    bulk_comp = [row[0] for row in data]
    surf_comp = [row[1] for row in data]

    plt.figure()  # the figsize argument is illegible in the OCR text
    plt.plot(bulk_comp, surf_comp)
    plt.plot([0, 1], [0, 1], 'k--')
    plt.xlabel('$x_{Cu, bulk}$')
    plt.ylabel('$x_{Cu, surface}$')
    plt.legend(['Expt.', 'parity'], loc='best')
    plt.tight_layout()
    # the final savefig call (output file name) is illegible in the OCR text

[Figure 2 image: surface composition versus bulk composition, both axes from 0 to 1, with
series "Expt." and "Parity".]

Figure 2. Experimentally determined surface composition as a function
of bulk composition at 900 K.

This example brings up an interesting issue. One does not directly measure these
compositions. These compositions were derived from XPS and ion scattering experiments. We did
not share that data because at the time it was not practical. In ref 6, the raw
temperature-programmed desorption spectra can be found in the Supporting Information org file
but not in the PDF. Including them in the PDF would have made it hundreds of pages long of
tabular data. Embedding the tables in the org-file, so they could be used as data sources
within the org-file, and then subsequently embedding the org file in the PDF still makes this
data available. Alternatively, the data could be shared by external or institutional data
stores that provide a citable DOI for the data set. The point is that this approach is very
flexible and allows a broad range of data sharing strategies that can be adapted as
appropriate to the sharing need. We are increasingly integrating org-mode into our manuscript
preparation workflow, so that the data is already embedded as the manuscript is developed.

4. PERSPECTIVE ON THESE APPROACHES


The first approach we outlined is immediately accessible to nearly every author. Within
Supporting Information files, authors could attach data files and explain how to use them.
Ideally, the data files are the same files used in making the figures for the manuscript.
Attaching files can be tedious if there are many of them, or if one has to reattach files on
every iteration of a PDF generation. Tools such as LaTeX make that less tedious by
integrating the attachment process in the manuscript build process. Neither of these
approaches, however, enables facile inclusion of analysis and code in a way that ensures they
are synchronized (i.e., if one copies code into LaTeX, it is no longer guaranteed to be
exactly the code that was run to get some results). When coupled with the fact that it is
difficult to use the data in the published versions (PDF or html) of manuscripts, these
factors, among others, have led to the dearth of practical data sharing strategies today.
We have illustrated how org-mode can streamline the integration of narrative text, data,
analysis, and code into manuscripts. It is ultimately a fail-safe approach to data sharing
because the data is almost always in human-readable, but machine-addressable, form. We
recognize that there is a learning curve associated with writing manuscripts this way, but it
can be a long and shallow curve that grows with an author. org-mode is not more difficult to
learn than LaTeX, and Emacs can be made similar to using Microsoft Word with GUIs and menus.
Scientific publishing is a career-long activity, and one should not shy away from learning a
tool that can have impact over this time scale. If more manuscripts were written in org-mode,
the manuscripts themselves would serve as learning opportunities about how to effectively
organize data and code for sharing and publications.
The way we use org-mode is essentially like a scientific notebook. Other tools have similar
functionality. For example, Matlab has a "notebook" feature that integrates with Microsoft
Word, as well as functionality to export a Matlab script to HTML or LaTeX. There is the
IPython notebook, which provides similar capability through a web browser, and the newer
project Jupyter, which extends the IPython concept to other languages. For specific uses,
each of these tools may provide a solution similar to what we have described here. None of
them, however, has the flexibility and power of org-mode.
It is worth considering the role of standardization in data sharing. No standard can meet
everybody's needs in experimental and computational science. Even with a standard, the lack
of tools in standard authoring tools will limit its use. Our opinion is that we should just
start sharing data as we have described in this manuscript. Any sharing would probably be
better than the general lack of sharing that occurs today. It is true that there may be many
different formats and that some will be better than others. The data that is most valuable
will be reused in new forms, and best practices will emerge. Data that is not reused will
fade into obscurity, an old tradition in the scientific literature.
This approach is distinctly different from any approach that advocates for a centralized
database. In this approach, the data is distributed among the publishers and anyone who
downloads the files. Individual users within domains could curate collections of the data
that are fit for specific purposes. These domain experts would be able to convert the various
formats into a common format used in their research and likely reshared through the same
mechanisms. In contrast, a centralized database would contain a trove of data that is not
useful to the majority of the users. Further compounding this is the need for some
standardized formats for the data for most database approaches, and maintenance costs
associated with the servers that must host this. While there have been some successful
examples of this, e.g., the Protein Databank or crystallography databases, these examples are
in highly specialized fields, with relatively uniform types of data. Efforts to create
centralized databases for catalysis should be supported, but we should not wait for them to
be created to share data we can share now. By sharing now, we can enable this data to
potentially be included in larger database efforts in the future.
The approaches we have presented here certainly do not address every issue of data sharing or
reproducibility. There are other approaches to data sharing and reproducibility in the
literature (e.g., the use of a Java Virtual Machine and the Hierarchical Data Format), and a
number of approaches are discussed in ref 33 and in a recent book, "Implementing Reproducible
Research". Many of these approaches have been developed for domain-specific problems (e.g.,
using R for statistical analysis or using Matlab for a specific kind of problem), and they
tend to focus on computational research. The approaches we present in this work have been
used by us in both experimental and computational research publications. In our opinion, the
approaches presented here address many of the issues in data sharing and reproducibility, and
they provide a path forward that is likely to improve existing efforts in data sharing and in
reproducibility.

ASSOCIATED CONTENT

Supporting Information
The Supporting Information is available free of charge on the
ACS Publications website at DOI: 10.1021/acscatal.5b00538.
The manuscript org source, and the output from the code listings (PDF)

AUTHOR INFORMATION

Corresponding Author
*E-mail: jkitchin@andrew.cmu.edu.

Notes
The authors declare no competing financial interest.

ACKNOWLEDGMENTS

We gratefully acknowledge support from the DOE Office of
Science Early Career Research program (DE-SC0004031). We
also acknowledge support from the Simon Initiative at Carnegie
Mellon University and the Phillip L. Dowd teaching fellowship.

REFERENCES

(1) National Science Foundation, NSF Data Management Plan Requirements.
http://www.nsf.gov/eng/general/dmp.jsp (accessed February 23, 2015).
(2) National Science Foundation, Dissemination and Sharing of Research Results.
http://www.nsf.gov/bfa/dias/policy/dmp.jsp (accessed February 23, 2015).
(3) Department of Energy, Statement on Digital Data Management.
http://science.energy.gov/funding-opportunities/digital-data-management/
(4) National Institutes of Health, NIH Data Sharing Policy.
http://grants.nih.gov/grants/policy/data_sharing/ (accessed February 23, 2015).
(5) Hallenbeck, A. P.; Kitchin, J. R. Ind. Eng. Chem. Res. 2013, 52, 10788-10794.
(6) Miller, S. D.; Pushkarev, V. V.; Gellman, A. J.; Kitchin, J. R. Top. Catal. 2014, 57, 106-117.
(7) Curnan, M. T.; Kitchin, J. R. J. Phys. Chem. C 2014, 118, 28776-28790.
(8) Xu, Z.; Kitchin, J. R. J. Phys. Chem. C 2014, 118, 25597-25602.
(9) Xu, Z.; Kitchin, J. R. Catal. Commun. 2014, 52, 60-64.
(10) Mehta, P.; Salvador, P. A.; Kitchin, J. R. ACS Appl. Mater. Interfaces 2014, 6, 3630-3639.
(11) Boes, J. R.; Gumuslu, G.; Miller, J. B.; Gellman, A. J.; Kitchin, J. R.
ACS Catal. 2015, 5, 1020-1026.
(12) Boes, J. R.; Gumuslu, G.; Miller, J. B.; Gellman, A. J.; Kitchin, J. R.
ACS Catal. 2015, Supporting information. Vol. 5, p 1020, 10.1021/cs501585k
(13) PDF Laboratories, PDFtk the pdf toolkit.
https://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/
(14) Pakin, S. attachfile. http://www.ctan.org/tex-archive/macros/latex/contrib/attachfile, v1.5b.
(15) Python Software Foundation, Python. https://www.python.org
(16) Millman, K. J.; Aivazis, M. Comput. Sci. Eng. 2011, 13, 9-12.
(17) Perkel, J. M. Nature 2015, 518, 125-126.
(18) Bahn, S. R.; Jacobsen, K. W. Comput. Sci. Eng. 2002, 4, 56-66.
(19) Xu, Z.; Rossmeisl, J.; Kitchin, J. R. Supporting data for: A linear
response, DFT+U study of trends in the oxygen evolution activity of
transition metal rutile dioxides. https://zenodo.org/record/12635,
10.5281/zenodo.12635.
(20) Dominik, C. The Org Mode 8 Reference Manual - Organize your life
with GNU Emacs; Samurai Media Limited: Hong Kong, 2014.
(21) Free Software Foundation, Emacs. https://www.gnu.org/software/emacs/emacs.html, v24.3
(22) Schulte, E.; Davison, D. Comput. Sci. Eng. 2011, 13, 66-73.
(23) Schulte, E.; Davison, D.; Dye, T.; Dominik, C. Journal of Statistical
Software 2012, 46, 1-24.
(24) Stephenson, N. In the Beginning...was the Command Line, 1st ed.;
William Morrow Paperbacks: New York, 1999.
(25) Zenodo. https://zenodo.org, Zenodo builds and operates a simple
and innovative service that enables researchers, scientists, EU projects,
and institutions to share and showcase multidisciplinary research results
(data and publications) that are not part of the existing institutional or
subject-based repositories of the research communities.
(26) figshare. http://figshare.com, figshare helps academic institutions
store, share and manage all of their research outputs
(27) Data Science at The Institute for Quantitative and Social Science,
The Dataverse Project. http://dataverse.org (accessed February 23, 2015).
(28) Borgman, C. L. J. Am. Soc. Inf. Sci. Technol. 2012, 63, 1059-1078.
(29) MathWorks, MATLAB Notebook.
http://www.mathworks.com/help/matlab/matlab_prog/create-a-matlab-notebook-with-microsoft-word.html,
The MATLAB notebook integrates Microsoft Word and
MATLAB to create a functional document with integrated code and
results.
(30) Perez, F.; Granger, B. E. Comput. Sci. Eng. 2007, 9, 21-29.
(31) Project Jupyter. http://jupyter.org/, The Jupyter Project provides
a web-browser based computational notebook with a range of
computational backends including Python, Julia, R and others.
(32) Hinsen, K. Procedia Computer Science 2011, 4, 579-588.
(33) Fomel, S.; Claerbout, J. F. Comput. Sci. Eng. 2009, 11, 5-7.
(34) Implementing Reproducible Research; Stodden, V., Leisch, F., Peng,
R. D., Eds.; Chapman and Hall/CRC: London, 2014.

[-- Attachment #2.3: Type: text/html, Size: 233 bytes --]

Thread overview: 19+ messages
2015-12-23 21:22 Problem with org-ref marvin doyley
2015-12-23 21:30 ` John Kitchin
2015-12-23 21:44   ` marvin doyley [this message]
2015-12-23 22:16   ` marvin doyley
2015-12-23 23:56   ` Andreas Kiermeier
2015-12-24  1:33     ` John Kitchin
2015-12-24  6:04       ` Andreas Kiermeier
2015-12-24 14:15         ` John Kitchin
2015-12-25  1:17           ` Andreas Kiermeier
2015-12-25  2:47             ` John Kitchin
2015-12-25  3:16               ` Andreas Kiermeier
2015-12-25 14:07                 ` John Kitchin
2015-12-26  0:03                   ` Andreas Kiermeier
2015-12-26  1:36                     ` John Kitchin
2015-12-26  4:52                       ` Andreas Kiermeier
  -- strict thread matches above, loose matches on Subject: below --
2020-07-20 13:09 problem " Johannes Brauer
2020-07-20 13:49 ` John Kitchin
2021-03-08 15:20 Problem " Marvin M. Doyley
2021-03-10  4:49 ` Kyle Meyer
