From mboxrd@z Thu Jan 1 00:00:00 1970
From: marvin doyley
Subject: Re: Problem with org-ref
Date: Wed, 23 Dec 2015 16:44:30 -0500
Message-ID:
References: <9B7FD49A-1A5F-42D3-9D9F-72BB151AF28C@gmail.com>
Mime-Version: 1.0 (Mac OS X Mail 9.2 \(3112\))
Content-Type: multipart/alternative; boundary="Apple-Mail=_8B1C08BD-0F3A-4E3A-9CA9-DCC292341BF3"
Return-path:
Received: from eggs.gnu.org ([2001:4830:134:3::10]:44886) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aBrD5-0000WT-7L for emacs-orgmode@gnu.org; Wed, 23 Dec 2015 16:44:43 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1aBrD1-0003VS-0f for emacs-orgmode@gnu.org; Wed, 23 Dec 2015 16:44:39 -0500
Received: from mail-qg0-x22b.google.com ([2607:f8b0:400d:c04::22b]:36792) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aBrD0-0003VL-Pt for emacs-orgmode@gnu.org; Wed, 23 Dec 2015 16:44:34 -0500
Received: by mail-qg0-x22b.google.com with SMTP id c96so142169631qgd.3 for ; Wed, 23 Dec 2015 13:44:34 -0800 (PST)
In-Reply-To:
List-Id: "General discussions about Org-mode."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: emacs-orgmode-bounces+geo-emacs-orgmode=m.gmane.org@gnu.org
Sender: emacs-orgmode-bounces+geo-emacs-orgmode=m.gmane.org@gnu.org
To: John Kitchin
Cc: "emacs-orgmode@gnu.org"

--Apple-Mail=_8B1C08BD-0F3A-4E3A-9CA9-DCC292341BF3
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset=us-ascii

Hi John,

I am using a Mac too, and pdftotext seems to be working, well, for an OCR file (see below); if I run pdftotext on the original pdf I get nothing.

cheers,
M

PS I am using brew poppler, which includes pdftotext

> On Dec 23, 2015, at 4:30 PM, John Kitchin wrote:
>
> Thanks! Do you know if you have pdftotext working on your machine? The pdf drag-n-drop works by converting the pdf to text, and then matching a pattern to find a doi.
> If none is found, you get the message you noted. The url dnd works similarly, but there are a bunch of recipes for what to match depending on the base of the url.
>
> I did that on a Mac, and I haven't tested it on a Windows or Linux machine.
>
> John

--Apple-Mail=_8B1C08BD-0F3A-4E3A-9CA9-DCC292341BF3
Content-Type: multipart/mixed; boundary="Apple-Mail=_B055F875-3E11-4813-B99B-1C24A61985FB"

--Apple-Mail=_B055F875-3E11-4813-B99B-1C24A61985FB
Content-Transfer-Encoding: quoted-printable
Content-Type: text/html; charset=us-ascii

Hi John,

I am using a Mac too, and pdftotext seems to be working, well, for an OCR file (see below); if I run pdftotext on the original pdf I get nothing.
cheers,
M

PS I am using brew poppler, which includes pdftotext
 
On Dec 23, 2015, at 4:30 PM, John Kitchin <jkitchin@andrew.cmu.edu> wrote:

Thanks! Do you know if you have pdftotext working on your machine? The pdf drag-n-drop works by converting the pdf to text, and then matching a pattern to find a doi. If none is found, you get the message you noted. The url dnd works similarly, but there are a bunch of recipes for what to match depending on the base of the url.

I did that on a Mac, and I haven't tested it on a Windows or Linux machine.

John
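To make the mechanism John describes concrete (convert the pdf to text, then match a pattern to find a doi), here is a minimal Python sketch. It is not org-ref's code: the regular expression and the function names `find_doi` and `pdf_to_text` are assumptions made for illustration, and `pdf_to_text` only assumes that poppler's `pdftotext` is on the PATH.

```python
import re
import shutil
import subprocess

# Illustrative DOI-like pattern (an assumption, not org-ref's actual pattern).
DOI_PATTERN = re.compile(r'10\.\d{4,9}/[^\s"<>]+')


def find_doi(text):
    """Return the first DOI-like string found in text, or None."""
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    # Drop punctuation that often trails a DOI at the end of a sentence.
    return match.group(0).rstrip('.,;)')


def pdf_to_text(pdf_path):
    """Convert a PDF to text with pdftotext; '-' sends the text to stdout."""
    if shutil.which('pdftotext') is None:
        raise RuntimeError('pdftotext not found on PATH')
    result = subprocess.run(['pdftotext', pdf_path, '-'],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

A PDF that yields no text would make `find_doi(pdf_to_text(path))` return None, which would be consistent with getting "nothing" from the original pdf. OCR damage inside the DOI itself (a stray space, for instance) would also defeat a pattern like this one.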
--Apple-Mail=_B055F875-3E11-4813-B99B-1C24A61985FB
Content-Disposition: attachment; filename=acscatal%2E5b00538-OCR.txt
Content-Type: text/plain; name="acscatal%2E5b00538-OCR.txt"
Content-Transfer-Encoding: quoted-printable

This is an open access article published under an ACS AuthorChoice License, which permits copying and redistribution of the article or any adaptations for non-commercial purposes.

ACS Editors' Choice
pubs.acs.org/acscatalysis

Examples of Effective Data Sharing in Scientific Publishing

John R. Kitchin*
Department of Chemical Engineering, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, Pennsylvania 15213, United States

Supporting Information

ABSTRACT: We present a perspective on an approach to data sharing in scientific publications we have been developing in our group. The essence of the approach is that data can be embedded in a human-readable and machine-addressable way within the traditional publishing environment. We show this by example for both computational and experimental data. We articulate a need for new authoring tools to facilitate data sharing, and we discuss the tools we have been developing for this purpose. With these tools, data generation, analysis, and manuscript preparation can be deeply integrated, resulting in easier and better data sharing in scientific publications.

1. INTRODUCTION

Data sharing and management plans are becoming an increasingly important requirement for scientific research. The National Science Foundation requires a data management plan for all proposals (1) and has a formal statement on data sharing. The Department of Energy is beginning to require this in some proposals and has issued a formal statement on data management.
The National Institutes of Health has a formal data sharing policy (4). These mandates do not usually say how the data management plan must be implemented, simply that there must be one that is appropriate for the data being generated and consistent with the needs of the scientific field.

There are several challenges to meeting these requirements. Data is a generic term that refers to a broad range of information that varies in size and complexity. There is no single solution that fits all data. The tools we use to write manuscripts, which have been the primary mode of data sharing for decades, typically only include graphical or tabular data. Those data are usually not machine readable, nor are they easy to reuse. We believe that progress in data sharing will require the development of new tools and social expectations about scientific data. These tools will have to integrate data directly into the manuscripts and into the manuscript preparation workflow, so that by the time one finishes writing the manuscript, the data is already integrated and ready to share with no further work required. When this is possible, data will be shared much more frequently.

In this Viewpoint, we illustrate an example of a method we have been developing that enables data, analysis, and code sharing within the traditional publishing environment. We have published several papers this way already, including experimental work (5), combined computational and experimental work (6), and computational work. These manuscripts in published form are indistinguishable from other manuscripts in those journals. However, the Supporting Information files are very different. Although, at first glance, the Supporting Information files are simple PDF files that provide additional information, there are actually data files embedded (see Figure 1a) in those PDFs which can be extracted and reused. The data is human readable in most cases, and machine addressable, enabling reuse and sharing.
We will provide an example of this for a paper we recently published in ACS Catalysis on "Estimating Bulk-Composition-Dependent H2 Adsorption Energies on CuxPd1-x Alloy (111) Surfaces" (11). The Supporting Information file for that article is available for free (12). When opened in a PDF reader that supports attachments, one can see that there are attachments (see Figure 1a). Alternatively, while reading the PDF file, the reader is alerted that there is an attachment by the presence of a thumbtack icon (Figure 1b), which can be double clicked to open the file.

Figure 1. (a) The attachments in the Supporting Information of ref 11. There are two attachments. (b) A single attachment in the PDF file. Double-clicking on the thumbtack (in the Supporting Information file) will open the attached file.

© 2015 American Chemical Society

An alternative approach to extracting the data and source file is to use a command-line program, such as pdftk (13), to extract the files from the pdf. File attachments are a standard feature of the PDF specification, and there are a number of ways to create them. This Supporting Information file has two attachments: one is a JSON data file, and one is the source file that generated the PDF. It is also possible to embed data files in a Word document. An example of using a command line utility to extract the attachments is shown in Listing 1, which will extract the two data files into the current directory the command is run in. The contents of the files that are extracted are explained in the Supporting Information file, which in this case even contains examples of how to use the data.
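The command-line extraction described above can be sketched from Python as follows. This is an illustration, not the paper's Listing 1: it assumes pdftk is installed and uses its `unpack_files` operation, and the function names and the file name in the usage note are hypothetical.

```python
import shutil
import subprocess


def build_unpack_command(pdf_path, output_dir='.'):
    """Return the pdftk argument list that unpacks a PDF's attachments."""
    return ['pdftk', pdf_path, 'unpack_files', 'output', output_dir]


def extract_attachments(pdf_path, output_dir='.'):
    """Run pdftk to write every attached file into output_dir."""
    if shutil.which('pdftk') is None:
        raise RuntimeError('pdftk is not installed or not on PATH')
    subprocess.run(build_unpack_command(pdf_path, output_dir), check=True)
```

Calling `extract_attachments('supporting-information.pdf')` (a hypothetical file name) would be expected to leave the attached data and source files in the current directory.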
Received: March 13, 2015
Published: May 11, 2015
DOI: 10.1021/acscatal.5b00538, ACS Catal. 2015, 5, 3894-3899

Listing 1: Command line extraction of the attached files in the Supporting Information file.

[...] also support file attachments. This small step could significantly improve data sharing. Although some standardization of data formats could be helpful, if this delays sharing, constrains creativity, or limits sharing, we should tolerate nonstandard approaches. The shared data that is easiest to use will get used more often than data that is not easy to use.

In the next sections, we illustrate how easy it is to do this in a few applications. We will first show an example of an embedded data file and its use. Then, we will show an alternative example, and a more sophisticated approach to data sharing.

2. USE OF AN EMBEDDED DATA FILE

We embedded a data file (data.json) in the Supporting Information PDF file. The file is in JSON format. JSON (JavaScript Object Notation) is a standard data format often used in web programming. It is structured and can be read by a number of programming languages or by web-based programs (e.g., http://www.jsoneditoronline.org/). The data set here is a fit-for-purpose format, which is described in the Supporting Information document.

The data.json file embedded in the Supporting Information PDF contains all the geometries and computational parameters used for every DFT calculation in the paper. Here, we consider how one could extract the information for a single calculation so that it could be used as the starting point for a new calculation. To do this in VASP, for example, that means extracting the data needed to create an INCAR, POSCAR, POTCAR, and KPOINTS file. These are input files for the VASP density functional theory code. We will choose as the example a clean palladium hydride slab. The critical point we wish to communicate here is that there is machine-readable code embedded in the Supporting Information file that can be reused. The output from each Listing can be found in the Supporting Information.

Reading the Supporting Information file suggests the following Python code (Listing 2) would extract the INCAR parameters, and then we write each key/value pair to a file called INCAR (a file containing input parameters for VASP). There is an extra key called "doc" that provides some [...]

[Listing 2: Python script that extracts the INCAR parameters; code unreadable in this OCR copy.]

Now, we have an INCAR file with these contents, which defines the parameters VASP will use in the calculation. Listing 3 shows how to view the contents of the INCAR file.

Listing 3: Shell command to print the contents of the INCAR file.

We can similarly retrieve information about which POTCAR files were used. The VASP license prohibits sharing these, so information about the files used is all one can do here, and in order to reproduce the calculation, one is required to have a VASP license. Note that it will not always be possible to share all data. However, it is often possible to share enough data so that another expert can reproduce the work.

[Listing 4: Python script that retrieves the POTCAR file information; code unreadable in this OCR copy.]

The KPOINTS file would be created from data in the "input" section of the calculator data. We extract these in Listing 5. Based on the information provided, it is evident a KPOINTS file should be generated that creates a 10 x 10 x 1 Monkhorst-Pack k-point grid (see the Supporting Information for the output of Listing 5).

[Listing 5: caption and code unreadable in this OCR copy.]

We can leverage the Atomic Simulation Environment (ASE) to easily create the POSCAR file from the chemical symbols, positions, and unit cell stored in the json file. ASE can also write other input file types including GPAW, Wien2k, [...]

[Listing 6: Python script that creates a POSCAR file from the JSON data; code unreadable in this OCR copy.]

Next, one would then create a POTCAR file (this file contains the pseudopotentials used by VASP) consistent with the POSCAR file, and then VASP could be run to reproduce the result, or the atoms could be modified to start some new calculation.

This JSON file contains all of the data used in making the figures in the paper, and it can all be extracted for reuse using any kind of scripting language with JSON support. It should be evident that this information is sufficient for an expert to create very similar calculations in other DFT codes besides VASP as well.

We emphasize here that JSON was used as a data format because (1) it is suitable to organize the data, (2) it is easy to generate, and (3) it is easy to reuse. However, other data formats are easily used as well. In ref 7, we embedded csv (comma-separated value) files. Other formats including netCDF, HDF, sqlite, and so forth, could also be used if they were fit for purpose.
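As a rough illustration of the JSON-to-input-file step discussed in this section, the sketch below turns a parameter dictionary into INCAR text and a grid into KPOINTS text. It is not the paper's code: the layout of data.json is not reproduced in this copy, so the keys used in the usage note (`"incar"`, `"input"`, `"kpts"`) are hypothetical, as are the function names and the parameter values.

```python
import json


def format_incar_value(value):
    """Render a Python value the way VASP expects it in an INCAR file."""
    if isinstance(value, bool):
        return '.TRUE.' if value else '.FALSE.'
    if isinstance(value, (list, tuple)):
        return ' '.join(str(v) for v in value)
    return str(value)


def incar_text(params, skip_keys=('doc',)):
    """Build INCAR contents, one 'KEY = value' line per parameter.

    Keys in skip_keys (for example a documentation entry) are left out.
    """
    lines = [f'{key.upper()} = {format_incar_value(value)}'
             for key, value in sorted(params.items())
             if key not in skip_keys]
    return '\n'.join(lines) + '\n'


def kpoints_text(grid):
    """Build KPOINTS contents for an automatic Monkhorst-Pack mesh."""
    nx, ny, nz = grid
    return f'Automatic mesh\n0\nMonkhorst-Pack\n{nx} {ny} {nz}\n0 0 0\n'
```

With a record loaded as `calc = json.loads(text)`, one could write `open('INCAR', 'w').write(incar_text(calc['incar']))` and `open('KPOINTS', 'w').write(kpoints_text(calc['input']['kpts']))`, adjusting the keys to whatever the Supporting Information actually documents.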
It is even possible to embed data files from proprietary software; in ref 5, we embedded Excel files into the Supporting Information files. Naturally, these files are only useful to people who have access to the software required to read the files. Although it is technically possible to embed any type of file in a PDF, some readers restrict what you are allowed to open. For example, Adobe Acrobat will not allow one to open executable or zip files for security reasons. These can still be extracted using the pdftk software. Alternatively, these files could be provided separately from the PDF Supporting Information file.

This approach of data file embedding may become impractical for very large data sets. In ref 19, we archived a large data set (about 1.8 Gb) of DFT calculations in an external data sharing site which assigned the data set a DOI. An alternative data repository could be an institutional data repository which also provides a DOI for citing. It remains to be seen if these repositories remain archival quality repositories and what happens to the data if the repositories cease to operate due to lack of funding or other reasons.

Finally, the embedded data file is part of the Supporting Information file. The Supporting Information is in a sense the metadata that informs the reader what is in the file, how it was made, and by example, how to use it.

3. EMBEDDING DATA WITHIN TEXT

There is another approach to storing data in the Supporting Information than just attaching a data file. One challenge with data files is they are often not self-documenting. This means that when they are separated from the source, it may not be clear what is in the file or how to use it. An alternative approach we have been investigating is that the data can be stored in a machine addressable table or code block within the manuscript. Notably, this table is different than what is read by humans in the PDF.
Instead, the table is in a source file that is embedded in the PDF. This is the other file named supporting-information.org in our example. This file is in a plain text format that is lightly marked up using org-mode (20) to differentiate text from data and code. When this file is opened in Emacs (21), a powerful text editor, then new data sharing and reuse possibilities are available.

Org-mode is a few different things. First, it is a document markup syntax that is approximately plain text, and that differentiates text, citations, equations, tables, images, code, and data. Second, org-mode is a library of code in Emacs that provides rich, functional links, an outline mode, a capability to embed interactive code and data in a document, and the capability to export the document to another format (e.g., PDF, LaTeX, and html among others). We actually write our scientific papers using org-mode as mentioned already. Others have also published papers using org-mode and about org-mode (22, 23). This manuscript was prepared in org-mode. The code blocks shown in this manuscript are literally the code that was run in the document in each example. Emacs is able to run the codes, capture the output and insert it into the manuscript.

Emacs and org-mode are open-source software, available for free on every major computing platform. One does not need Emacs to read org-mode; the format is in plain text. We would never consider writing org files without Emacs; Emacs provides all the functionality that makes org-mode useful for this application. There is limited support for org-mode in other editors. There are limited but developing org-mode parsers available in Ruby (e.g., GitHub can render org files as HTML) and other languages. We will show in the next paragraph that it is possible to read org-mode with other languages.
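Because the format is plain text, reading a named org table outside Emacs needs very little code. The following is a minimal sketch and not the paper's own listing: it assumes the table is introduced by a `#+tblname:` or `#+name:` line, and the function names and the sample table in the usage note are made up for illustration.

```python
def to_number(cell):
    """Convert a table cell to int or float when possible, else keep the text."""
    for convert in (int, float):
        try:
            return convert(cell)
        except ValueError:
            pass
    return cell


def read_org_table(lines, table_name, include_header=True):
    """Return the rows of the org table named table_name as lists of cells."""
    lines = [line.strip() for line in lines]
    start = None
    for i, line in enumerate(lines):
        low = line.lower()
        if low.startswith(('#+tblname:', '#+name:')):
            if low.split(':', 1)[1].strip() == table_name.lower():
                start = i + 1
                break
    if start is None:
        raise ValueError(f'no table named {table_name!r}')
    rows = []
    for line in lines[start:]:
        if not line.startswith('|'):
            if rows:
                break  # first non-table line after the table ends it
            continue   # skip lines between the name and the table
        if line.startswith('|-'):
            continue   # horizontal rule, not data
        cells = [cell.strip() for cell in line.strip('|').split('|')]
        rows.append([to_number(cell) for cell in cells])
    return rows if include_header else rows[1:]
```

For a file containing `#+tblname: demo` followed by a small table, `read_org_table(open(path), 'demo', include_header=False)` would return only the data rows, ready for plotting or further analysis.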
However, to adapt a quote by Neal Stephenson (24), when it comes to org-mode, "emacs outshines all other editing software in approximately the same way that the noonday sun does the stars. It is not just bigger and brighter; it simply makes everything else vanish."

The key example we want to illustrate here is that tables in org-mode actually have two different functions. First, they are human-readable and can be rendered as regular tables in the PDF or html format. Second, they are machine addressable and can serve as data sources for code blocks. Much of the data used in ref 11 is stored in tables in the org-source. org-mode provides native support to read these tables in as an array. Alternatively, one can use any scripting tool to parse the org-file and extract the data. For example, in Listing 7, we illustrate a small Python code that opens an org-file, finds the table, and returns the data in the table for further analysis. The key point here is that the data is machine readable, and one is not necessarily tied to Emacs or org-mode to take advantage of the embedded data. org-mode is a syntax, which can be parsed by other tools.

We use the code from Listing 7 in Listing 8 to show that one can import the data from a table in the Supporting Information file and use the data for new purposes. We import a table named jm-seg which contains data about the experimentally measured surface and bulk composition of an alloy surface. We use this data to plot the surface composition versus the bulk composition (Figure 2). It should be evident that further analysis is possible at this point, using exactly the same data as was used in our manuscript.

Listing 7: Prototype code to read data tables from an org-file. This code is stored in an external Python module so it can be used by Python scripts for analysis.
[Listing 7 is unreadable in this OCR copy, including the module's file name; the recoverable definitions are string_to_number(s) and read_org_table(filename, tablename, include_header).]

Listing 8: Example code for extracting tabular data from a supporting information org-file.
[Listing 8 is unreadable in this OCR copy.]

Figure 2. Experimentally determined surface composition as a function of bulk composition at 900 K.

This example brings up an interesting issue. One does not directly measure these compositions. These compositions were derived from XPS and ion scattering experiments. We did not share that data because at the time it was not practical. In ref 6, the raw temperature-programmed desorption spectra can be found in the Supporting Information org file but not in the PDF. Including them in the PDF would have added hundreds of pages of tabular data. Embedding the tables in the org-file, so they could be used as data sources within the org-file, and then subsequently embedding the org file in the PDF still makes this data available. Alternatively, the data could be shared by external or institutional data stores that provide a citable DOI for the data set. The point is that this approach is very flexible and allows a broad range of data sharing strategies that can be adapted as appropriate to the sharing need. We are increasingly integrating org-mode into our manuscript preparation workflow, so that the data is already embedded as the manuscript is developed.

4. PERSPECTIVE ON THESE APPROACHES

The first approach we outlined is immediately accessible to nearly every author. Within Supporting Information files, authors could attach data files and explain how to use them. Ideally, the data files are the same files used in making the figures for the manuscript. Attaching files can be tedious if there are many of them, or if one has to reattach files on every iteration of a PDF generation. Tools such as LaTeX make that less tedious by integrating the attachment process in the manuscript build process. Neither of these approaches, however, enables facile inclusion of analysis and code in a way that ensures they are synchronized (i.e., if one copies code into LaTeX, it is no longer guaranteed to be exactly the code that was run to get some results). When coupled with the fact that it is difficult to use the data in the published versions (PDF or html) of manuscripts, these factors, among others, have led to the dearth of practical data sharing strategies today.

We have illustrated how org-mode can streamline the integration of narrative text, data, analysis, and code into manuscripts.
It is ultimately a fail-safe approach to data sharing because the data is almost always in human-readable, but machine-addressable, form. We recognize that there is a learning curve associated with writing manuscripts this way, but it can be a long and shallow curve that grows with an author. org-mode is not more difficult to learn than LaTeX, and Emacs can be made similar to using Microsoft Word with GUIs and menus. Scientific publishing is a career-long activity, and one should not shy away from learning a tool that can have impact over this time scale. If more manuscripts were written in org-mode, the manuscripts themselves would serve as learning opportunities about how to effectively organize data and code for sharing and publications.

The way we use org-mode is essentially like a scientific notebook. Other tools have similar functionality. For example, Matlab has a "notebook" feature that integrates with Microsoft Word, as well as functionality to export a Matlab script to HTML or LaTeX. There is the IPython notebook, which provides similar capability through a web browser, and the newer project Jupyter, which extends the IPython concept to other languages. For specific uses, each of these tools may provide a solution similar to what we have described here. None of them, however, has the flexibility and power of org-mode.

It is worth considering the role of standardization in data sharing. No standard can meet everybody's needs in experimental and computational science. Even with a standard, the lack of tools in standard authoring tools will limit its use. Our opinion is that we should just start sharing data as we have described in this manuscript. Any sharing would probably be better than the general lack of sharing that occurs today. It is true that there may be many different formats and that some will be better than others.
The data that is most valuable will be reused in new forms, and best practices will emerge. Data that is not reused will fade into obscurity, an old tradition in the scientific literature.

This approach is distinctly different than any approach that advocates for a centralized database. In this approach, the data is distributed among the publishers and anyone who downloads the files. Individual users within domains could curate collections of the data that is fit for specific purposes. These domain experts would be able to convert the various formats into a common format used in their research and likely reshared through the same mechanisms. In contrast, a centralized database would contain a trove of data that is not useful to the majority of the users. Further compounding this is the need for some standardized formats for the data for most database approaches, and maintenance costs associated with the servers that must host this. While there have been some successful examples of this, e.g., the Protein Data Bank, or crystallography databases, these examples are in highly specialized fields, with relatively uniform types of data. Efforts to create centralized databases for catalysis should be supported, but we should not wait for them to be created to share data we can share now. By sharing now, we can enable this data to potentially be included in larger database efforts in the future.

The approaches we have presented here certainly do not address every issue of data sharing or reproducibility. There are other approaches to data sharing and reproducibility in the literature (e.g., the use of a Java Virtual Machine and the Hierarchical Data Format) (32), and a number of approaches are discussed in ref 33 and in a recent book on "Implementing Reproducible Research" (34).
Many of these approaches have been developed for domain-specific problems (e.g., using R for statistical analysis or using Matlab for a specific kind of problem), and they tend to focus on computational research. The approaches we present in this work have been used by us in both experimental and computational research publications. In our opinion, the approaches presented here address many of the issues in data sharing and reproducibility, and they provide a path forward that is likely to improve existing efforts in data sharing and in reproducibility.

ASSOCIATED CONTENT

Supporting Information
The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acscatal.5b00538.
The manuscript org source, and the output from the code listings (PDF)

AUTHOR INFORMATION

Corresponding Author
*E-mail: jkitchin@andrew.cmu.edu.

Notes
The authors declare no competing financial interest.

ACKNOWLEDGMENTS

We gratefully acknowledge support from the DOE Office of Science Early Career Research program (DE-SC0004031). We also acknowledge support from the Simon Initiative at Carnegie Mellon University and the Phillip L. Dowd teaching fellowship.

REFERENCES

(1) National Science Foundation, NSF Data Management Plan Requirements. http://www.nsf.gov/eng/general/dmp.jsp (accessed February 23, 2015).
(2) National Science Foundation, Dissemination and Sharing of Research Results. http://www.nsf.gov/bfa/dias/policy/dmp.jsp (accessed February 23, 2015).
(3) Department of Energy, Statement on Digital Data Management. http://science.energy.gov/funding-opportunities/digital-data-management/
(4) National Institutes of Health, NIH Data Sharing Policy. http://grants.nih.gov/grants/policy/data_sharing/ (accessed February 23, 2015).
(5) Hallenbeck, A. P.; Kitchin, J. R. Ind. Eng. Chem. Res. 2013, 52, 10788-10794.
(6) Miller, S. D.; Pushkarev, V. V.; Gellman, A. J.; Kitchin, J. R. Top. Catal. 2014, 57, 106-117.
(7) Curnan, M. T.; Kitchin, J. R. J. Phys. Chem. C 2014, 118, 28776-28790.
(8) Xu, Z.; Kitchin, J. R. J. Phys. Chem. C 2014, 118, 25597-25602.
(9) Xu, Z.; Kitchin, J. R. Catal. Commun. 2014, 52, 60-64.
(10) Mehta, P.; Salvador, P. A.; Kitchin, J. R. ACS Appl. Mater. Interfaces 2014, 6, 3630-3639.
(11) Boes, J. R.; Gumuslu, G.; Miller, J. B.; Gellman, A. J.; Kitchin, J. R. ACS Catal. 2015, 5, 1020-1026.
(12) Boes, J. R.; Gumuslu, G.; Miller, J. B.; Gellman, A. J.; Kitchin, J. R. ACS Catal. 2015, Supporting information. Vol. 5, p 1020, 10.1021/cs501585k
(13) PDF Laboratories, PDFtk the pdf toolkit. https://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/
(14) Pakin, S. attachfile. http://www.ctan.org/tex-archive/macros/latex/contrib/attachfile, v1.5b.
(15) Python Software Foundation, Python. https://www.python.org
(16) Millman, K. J.; Aivazis, M. Comput. Sci. Eng. 2011, 13, 9-12.
(17) Perkel, J. M. Nature 2015, 518, 125-126.
(18) Bahn, S. R.; Jacobsen, K. W. Comput. Sci. Eng. 2002, 4, 56-66.
(19) Xu, Z.; Rossmeisl, J.; Kitchin, J. R. Supporting data for: A linear response, DFT+U study of trends in the oxygen evolution activity of transition metal rutile dioxides. https://zenodo.org/record/12635, 10.5281/zenodo.12635.
(20) Dominik, C. The Org Mode 8 Reference Manual - Organize your life with GNU Emacs; Samurai Media Limited: Hong Kong, 2014.
(21) Free Software Foundation, Emacs. https://www.gnu.org/software/emacs/emacs.html, v24.3
(22) Schulte, E.; Davison, D. Comput. Sci. Eng. 2011, 13, 66-73.
(23) Schulte, E.; Davison, D.; Dye, T.; Dominik, C. Journal of Statistical Software 2012, 46, 1-24.
(24) Stephenson, N. In the Beginning was the Command Line, 1st ed.; William Morrow Paperbacks: New York, 1999.
(25) Zenodo. https://zenodo.org, Zenodo builds and operates a simple and innovative service that enables researchers, scientists, EU projects, and institutions to share and showcase multidisciplinary research results (data and publications) that are not part of the existing institutional or subject-based repositories of the research communities.
(26) figshare. http://figshare.com, figshare helps academic institutions store, share and manage all of their research outputs
(27) Data Science at The Institute for Quantitative and Social Science, The Dataverse Project. http://dataverse.org (accessed February 23, 2015).
(28) Borgman, C. L. J. Am. Soc. Inf. Sci. Technol. 2012, 63, 1059-1078.
(29) MathWorks, MATLAB Notebook. http://www.mathworks.com/help/matlab/matlab_prog/create-a-matlab-notebook-with-microsoft-word.html, The MATLAB notebook integrates Microsoft Word and MATLAB to create a functional document with integrated code and results.
(30) Perez, F.; Granger, B. E. Comput. Sci. Eng. 2007, 9, 21-29.
(31) Project Jupyter. http://jupyter.org/, The Jupyter Project provides a web-browser based computational notebook with a range of computational backends including Python, Julia, R and others.
(32) Hinsen, K. Procedia Computer Science 2011, 4, 579-588.
(33) Fomel, S.; Claerbout, J. F. Comput. Sci. Eng. 2009, 11, 5-7.
(34) Implementing Reproducible Research; Stodden, V., Leisch, F., Peng, R. D., Eds.; Chapman and Hall/CRC: London, 2014.

--Apple-Mail=_B055F875-3E11-4813-B99B-1C24A61985FB
Content-Transfer-Encoding: 7bit
Content-Type: text/html; charset=us-ascii
--Apple-Mail=_B055F875-3E11-4813-B99B-1C24A61985FB-- --Apple-Mail=_8B1C08BD-0F3A-4E3A-9CA9-DCC292341BF3--