#+title: Literate programming and reproducible research
#+author: Professor Eric S Fraga, UCL
#+macro: lastchange 2018.06.06 10:36

* Introduction
** The aim & challenge
*** Reproducible research
To ensure that our research is *reproducible* both by ourselves and by others.
*** Coding, processing, writing
In doing research,
1. we all *write* programs ...
2. which generate *results* ...
3. which need to be *processed* ...
4. and which should be *disseminated*.

Currently, we use different tools for each step.
*** Tools
Workflow:
- coding :: IDE, MATLAB editor, ~vi~, ~notepad~, ...
- results :: ~.txt~, ~.xls~, ~.dat~
- processing :: spreadsheet, ~R~
- dissemination :: ~Word~, ~PowerPoint~, LaTeX, ~beamer~
- project management :: ?

leading to constant *transfer* of data from one place to another.
** Literate programming
*** Definition
#+begin_quote
Literate programming is a programming paradigm introduced by Donald Knuth in which a program is given as an explanation of the program logic in a natural language, such as English, interspersed with snippets of macros and traditional source code, from which a compilable source code can be generated.
#+end_quote
https://en.wikipedia.org/wiki/Literate_programming
*** Code and documentation
#+attr_latex: :width 0.8\textwidth
[[file:literate-programming-tangling.png]]

[[http://howardism.org/Technical/Emacs/literate-programming-tutorial.html][Source]]
*** Example: the Strawberry algorithm
[[file:strawberry-outline-withborder.png]]

[[https://www.ucl.ac.uk/~ucecesf/strawberry.html]]
** Emacs
*** Editor
- originally written in 1976
- content-aware editing
- fully extensible in ~Emacs Lisp~
- self-documenting with comprehensive help system
- large ecosystem of packages

https://www.gnu.org/software/emacs/
*** org mode
Text-based (*it's all text*) mode for
- writing & dissemination
- project management
- literate programming

https://orgmode.org/
* org mode
** Programming
*** Example objective function
\begin{align*}
\min_x z &= \sum_{i=1}^{n} \left( x_i^4 + x_i^3 - 2x_i^2 \right) \\
x & \in [a,b]\cap\mathbb{R}^n
\end{align*}
*** Plot of objective function
#+results: plot-f
[[file:f.png]]
:plotcode:
#+name: plot-f
#+header: :term "png crop"
#+begin_src gnuplot :file f.png :exports results
reset
set cntrparam levels 40 bspline
set contour base
set hidden3d
unset key
set view 60,326
set xrange [-2:2]
set yrange [-2:2]
unset xtics
unset ytics
unset ztics
unset zlabel
splot x**4+y**4+x**3+y**3-2*x*x-2*y*y
#+end_src
:end:
*** Tangling
Create an ~octave~ file, ~f.m~, containing the objective function by *tangling*:
#+name: octave-f
#+begin_src octave :tangle "f.m"
function [z g] = f(x)
  z = sum(x.^4 + x.^3 - 2*x.^2);
  g = 0; % unconstrained
endfunction
#+end_src
*** Code segments
Specify the parameters for the optimisation problem:
#+name: problem-setup
#+begin_src octave :noweb yes
n = 2;
x0 = rand(n,1);
a = -2*ones(n,1);
b = 2*ones(n,1);
#+end_src
*** Bringing bits together
Code that includes the problem setup directly and solves the problem:
#+name: strawberry
#+begin_src octave :results value :noweb yes
clear
format short
<<problem-setup>>
[x y] = strawberry(x0, a, b, @f, 10, 10);
[x;y]'
#+end_src
:results:
#+name: strawberry-results
#+results: strawberry
| -1.42716504443494  | -1.443856778049211 | -5.66531336687728  | 0 |
| -1.42716504443494  | -1.443856778049211 | -5.66531336687728  | 0 |
| -1.342508585413797 | -1.55250372785845  | -5.529004698055015 | 0 |
| -1.360506726205515 | -1.372263538977069 | -5.598345289891352 | 0 |
| -1.468158011926931 | -1.544415421188739 | -5.594381839388814 | 0 |
| -1.339213003233875 | -1.36422094977677  | -5.569697214609254 | 0 |
| -1.364417892433951 | -1.491553912171922 | -5.615971023198005 | 0 |
| -1.463919469920687 | -1.450505968993437 | -5.663754045524612 | 0 |
| -1.445565121464613 | -1.420730525034394 | -5.663799364174422 | 0 |
| -1.525782025485549 | -1.5303639151147   | -5.571556830521184 | 0 |
| -1.40352725450047  | -1.472972377299925 | -5.651863813417487 | 0 |
| -1.410286649811288 | -1.414343225362287 | -5.655462642214252 | 0 |
| -1.52277571384937  | -1.446360898144175 | -5.62508035617786  | 0 |
| -1.584751338098946 | -1.373116848847318 | -5.500483131429339 | 0 |
| -1.243018966119808 | -1.476516324299424 | -5.449770831917819 | 0 |
| -1.444138322679675 | -1.666222711962772 | -5.304110381262799 | 0 |
| -1.370072015649149 | -1.525937116930233 | -5.590709124514991 | 0 |
| -1.576526755720007 | -1.530518104917815 | -5.494784345778852 | 0 |
| -1.507786501304049 | -1.37902437675949  | -5.615657626549971 | 0 |
| -1.450567499881924 | -1.408583920258332 | -5.659381058555995 | 0 |
| -1.535526266987041 | -1.456193641066952 | -5.609133134458752 | 0 |
| -1.238285464782562 | -1.25198387068423  | -5.254687157245281 | 0 |
| -1.321030271837289 | -1.420499018517368 | -5.580507706519432 | 0 |
| -1.457276546364113 | -1.510431969520121 | -5.636060671059006 | 0 |
| -1.386994515876665 | -1.452244319822924 | -5.647806776697823 | 0 |
| -1.455869791504375 | -1.446387391362544 | -5.665742757998999 | 0 |
| -1.62382235873763  | -1.442265536388754 | -5.436001199588198 | 0 |
| -1.455447362960834 | -1.404854864255555 | -5.657173423616351 | 0 |
| -1.352852883884089 | -1.468988859997606 | -5.615930658043434 | 0 |
| -1.316282451554356 | -1.481169331825285 | -5.568062394335248 | 0 |
| -1.346872171742821 | -1.559721739239854 | -5.52227606914515  | 0 |
| -1.392193082324921 | -1.540475777984804 | -5.588469535120524 | 0 |
| -1.358676764875123 | -1.533272203185758 | -5.572011634831901 | 0 |
| -1.430102855019641 | -1.499714214014829 | -5.645120932555698 | 0 |
| -1.375336252513826 | -1.351372446488072 | -5.591923631732472 | 0 |
| -1.323532458220509 | -1.344166563266232 | -5.531090335022149 | 0 |
| -1.493260210372538 | -1.448067147541937 | -5.650501934854817 | 0 |
| -1.497055854832323 | -1.437039672362705 | -5.647851947473225 | 0 |
:end:
*** Plotting results
Using the data in the table of results on the previous slide:
#+results: plot-results
[[file:results.png]]
:plotcode:
#+name: plot-results
#+header: :term "png size 600,600 crop"
#+begin_src gnuplot :var data=strawberry-results :file results.png
reset
set contour base
set cntrparam levels 40
set xrange [-2:2]
set yrange [-2:2]
unset key
unset xtics
unset xlabel
unset ytics
unset ylabel
set view 0,0
splot data using 1:2:3 with points pt 6 ps 1, x**4+y**4+x**3+y**3-2*x*x-2*y*y
#+end_src
:end:
*** Processing results
Statistical analysis of the results obtained above:

| Statistic          | Value  |
|--------------------+--------|
| Best               | -5.666 |
| Average            | -5.582 |
| Worst              | -5.255 |
| Standard deviation |  0.094 |
#+TBLFM: @2$2=vmin(remote(strawberry-results,@<$>>..@>$>>));f3::@3$2=vsum(remote(strawberry-results,@<$>>..@>$>>))/vcount(remote(strawberry-results,@<$>>..@>$>>));f3::@4$2=vmax(remote(strawberry-results,@<$>>..@>$>>));f3::@5$2=vsdev(remote(strawberry-results,@<$>>..@>$>>));f3
** Writing
*** Outlines
Example (a recent paper):
[[file:impact-paper-outline.png]]

Can show, hide, and move individual sub-trees.
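Showing, hiding, and moving sub-trees all depend on one rule: a sub-tree runs from its heading to the next heading with the same number of stars or fewer. The Python sketch below illustrates that rule on a list of lines. It is not how org mode itself is implemented (that is Emacs Lisp), and the function names ~level~, ~subtree_end~, ~fold~ and ~move_subtree_down~ are hypothetical.

#+begin_src python
import re

# Hypothetical helper, not org's own code: a heading is one or more stars
# at the start of a line followed by a space; its level is the star count.
HEADING = re.compile(r"^(\*+) ")


def level(line):
    """Return the heading level of LINE, or 0 if LINE is not a heading."""
    match = HEADING.match(line)
    return len(match.group(1)) if match else 0


def subtree_end(lines, start):
    """Index just past the sub-tree whose heading is lines[start].

    The sub-tree ends at the next heading with the same number of stars
    or fewer, or at the end of the list.
    """
    own = level(lines[start])
    for i in range(start + 1, len(lines)):
        if 0 < level(lines[i]) <= own:
            return i
    return len(lines)


def fold(lines, start):
    """Hide the body of the sub-tree at START, keeping only its heading."""
    return lines[:start + 1] + lines[subtree_end(lines, start):]


def move_subtree_down(lines, start):
    """Swap the sub-tree at START with the next sibling sub-tree, if any."""
    end = subtree_end(lines, start)
    if end == len(lines) or level(lines[end]) != level(lines[start]):
        return list(lines)  # no sibling below: leave the outline unchanged
    sibling_end = subtree_end(lines, end)
    return (lines[:start] + lines[end:sibling_end]
            + lines[start:end] + lines[sibling_end:])


outline = ["* Introduction", "** Aim", "text", "** Method", "* Conclusions"]
print(fold(outline, 0))               # ['* Introduction', '* Conclusions']
print(move_subtree_down(outline, 1))  # ['* Introduction', '** Method', '** Aim', 'text', '* Conclusions']
#+end_src

Because the rule only compares star counts, body text, lists and drawers travel with their heading automatically.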
*** COMMENT Content
- export can handle mathematics: \[ y = \sqrt{x} \]
- figures: [[file:~/s/personal/avatar-australia-hat-sunglasses-64x64.png]]
- tables (as we have already seen) but with formatting:
  | Item | Description |
  |      |             |
  |------+-------------|
  | One  | Interesting |
  | Two  | Boring      |
*** COMMENT Inline LaTeX for full control
- We can include inline LaTeX directives.
- For instance, \fbox{in a box} is output as you would expect.
*** Publishable output
~org~ will *export* to LaTeX (and hence to ~PDF~) or ~ODT~ (~MS Word~ compatible).

[[file:odt-export.png]]
*** Project management
Support for tasks, scheduling, and appointments:
**** TODO [3/4] prepare and give presentation on literate programming
DEADLINE: <2018-06-06 Wed>
- [X] collect images
- [X] write slides
- [X] book hotel
- [ ] give presentation
*** Revision control
- A research project is a long-term activity composed of many individual tasks.
- Revision control should (*must*) be an integral element of project management.
- Think *track changes*, but on steroids and working for *data* as well.
- Excellent tools exist: ~git~, ~mercurial~, ~subversion~, ...
*** Example of revision control
[[file:hg-log-view.png]]
* Conclusions
** Summary
*** Emacs & org
Single tool for *writing*, *coding*, *data manipulation*, *data provenance*, *dissemination*, and *project management*.
*** Testimonial I
#+begin_quote
By the age of 35 you should have realized that Emacs is the One True Editor and should have embraced it. If that’s not the case - your life so far has been completely wasted.
#+end_quote
[[https://twitter.com/bbatsov/status/998217369204948992][@bbatsov, 04:02 pm May 20, 2018]]
*** Testimonial II
#+begin_quote
The advantages of plain text are hard to overstate, as is the advantage of having everything from plot notes to research material in a single (large) file under version control. And building up a novel from an outline is a natural process with org-mode.
#+end_quote
Bob Newell, =emacs.help= newsgroup, 2018-05-30.
*** Links
- Emacs :: https://www.gnu.org/software/emacs/
- Complete computing environment :: http://doc.rix.si/cce/cce.html
- org mode :: https://orgmode.org/
- reproducible research :: https://reproducibleresearch.net/links/
- blog :: [[https://dfeich.github.io/www/org-mode/emacs/reproducible-research/2018/05/20/reproducible-research-for-management.html][reproducible research for management]]
*** And finally
[[file:real_programmers.png]]

https://www.xkcd.com/378/
** COMMENT Revision log
#+begin_example
$Revision: 1.33 $
$Log: talk.org,v $
Revision 1.33 2018/06/04 08:14:55 ucecesf
Summary: reformatted (unfilled) second testimonial

Revision 1.32 2018/06/04 08:09:32 ucecesf
Summary: added another Emacs link

Revision 1.31 2018/05/30 17:09:48 ucecesf
Summary: added another testimonial

Revision 1.30 2018/05/28 16:08:22 ucecesf
Summary: some blank lines added for clearer slides

Revision 1.29 2018/05/26 09:46:24 ucecesf
Summary: more commentary and changed order of last two slides

Revision 1.28 2018/05/25 06:51:00 ucecesf
Summary: added slide on project management and one on links for further information

Revision 1.27 2018/05/22 09:15:09 ucecesf
Summary: added xkcd real programmers use Emacs slide

Revision 1.26 2018/05/22 06:12:37 ucecesf
Summary: updated objective function example with subscripts

Revision 1.25 2018/05/21 19:04:07 ucecesf
Summary: minor reformatting and added link to Strawberry

Revision 1.24 2018/05/21 18:58:01 ucecesf
Summary: added mercurial example

Revision 1.23 2018/05/21 18:52:08 ucecesf
Summary: added testimonial on Emacs from twitter

Revision 1.22 2018/05/21 16:28:25 ucecesf
Summary: added some commentary and removed superfluous screenshots

Revision 1.21 2018/05/21 16:14:49 ucecesf
Summary: added aim and LP figure from web

Revision 1.20 2018/05/20 14:01:41 ucecesf
Summary: updated the objective function to be more mathematically precise

Revision 1.19 2018/05/20 12:40:57 ucecesf
Summary: added ODT export image and reformatted conclusions

Revision 1.18 2018/05/20 12:32:19 ucecesf
Summary: minor formatting to make PDF export look good

Revision 1.17 2018/05/20 12:23:46 ucecesf
Summary: added impact paper outline as example

Revision 1.16 2018/05/20 12:09:36 ucecesf
Summary: added project management to summary

Revision 1.15 2018/05/20 12:06:29 ucecesf
Summary: moved plot of results and hid code
Also ensured that LaTeX fragments were displayed and text was scaled appropriately.

Revision 1.14 2018/05/20 11:52:33 ucecesf
Summary: added plot of objective function and show results with contours

Revision 1.13 2018/05/18 22:45:25 ucecesf
Summary: added plotting of results slide

Revision 1.12 2018/05/18 17:46:33 ucecesf
Summary: added Strawberry folded example
For literate programming and for outlines in org.

Revision 1.11 2018/05/18 17:38:10 ucecesf
Summary: made literate programming slide top level

Revision 1.10 2018/05/18 17:37:16 ucecesf
Summary: customised page up/down for slide transitions

Revision 1.9 2018/05/18 17:04:19 ucecesf
Summary: added revision log

Revision 1.8 2018/05/18 17:00:28 ucecesf
Summary: added conclusions

revision 1.7 2018/05/18 16:57:44 ucecesf
Summary: added motivation section

revision 1.6 2018/05/18 08:03:39 ucecesf
Summary: settings for beamer export

revision 1.5 2018/05/17 17:44:07 ucecesf
Summary: started talking about writing

revision 1.4 2018/05/17 17:31:35 ucecesf
Summary: turn off beacon mode to avoid annoying

revision 1.3 2018/05/17 17:28:12 ucecesf
Summary: introduce noweb for code segments

revision 1.2 2018/05/16 19:30:09 ucecesf
Summary: added some octave code and tables

revision 1.1 2018/05/16 16:41:50 ucecesf
Initial revision
#+end_example
* settings :noexport:
** org
#+PROPERTY: cache yes
*** beamer settings
#+startup: beamer

Change this setting to match the structure of the talk: 1 if there are no sections, 2 if there are sections, and 3 if there are subsections as well.
#+options: H:3

The theme can be ~minimal~, ~progressbar~, or anything else.
#+beamer_theme: minimal

If links are used directly, colour them gray.
#+latex_header: \hypersetup{colorlinks=true,urlcolor=gray}
#+macro: actualdate 6 June 2018
#+macro: where EGL2018, Essex
#+institute: University College London
# +LATEX_HEADER: \institute{University College London (UCL)}
*** date formatting with version information :ignoreheading:
**** COMMENT git
#+NAME: mydateline
#+BEGIN_SRC emacs-lisp
(format "#+DATE: \\copyright{} %s\n" *this*)
#+END_SRC

src_shell[:post mydateline() :results raw]{echo -n $(date +%Y) '@@latex:\\ \vspace*{0.1cm} \tiny \color{gray}@@' version $(git log --format=format:"%ad %h" --date=short | head -1 )}
**** COMMENT mercurial
#+NAME: mydateline
#+BEGIN_SRC emacs-lisp
(format "#+DATE: \\copyright{} %s\n" *this*)
#+END_SRC

src_shell[:post mydateline() :results raw]{echo -n $(date +%Y) '@@latex:\\ \vspace*{0.1cm} \tiny \color{gray}@@' version $(hg log slides.org | head -1 | sed -e 's/^.* \([0-9]*\):.*$/\1/')}
**** rcs
#+latex_header: \usepackage{rcs}
#+latex_header: \RCS $Revision: 1.33 $
#+latex_header: \RCS $Date: 2018/06/04 08:14:55 $
#+date: @@latex:\ifdef{\institute}{@@ {{{actualdate}}} @@latex:\\@@ {{{where}}} @@latex:\\ \vfill\hfill{\tiny\color{gray}v\RCSRevision~\RCSDate}}{@@ @@latex:}@@
*** macros
**** calc: shorter, better formatted version of the calculate macro
If the second argument is not given, no variable is stored or shown in the output.
#+macro: calc src_emacs-lisp[:results latex]{(esf/calc-and-output "$1" "$2")}
The macro relies on the following code:
#+name: calc-and-output
#+begin_src emacs-lisp :results silent :exports none
(defun esf/calc-and-output (expression variable)
  "Evaluate EXPRESSION with Emacs calc and return LaTeX showing the result.
If VARIABLE is not empty, also store the result in var-VARIABLE."
  (let ((result (string-to-number (calc-eval (format "evalv(%s)" expression)))))
    (message "Expression %s results in %s" expression result)
    (if (string= "" variable)
        (format "%s = \\fbox{%s}" expression result)
      ;; store the result in var-VARIABLE, as the calculate macro does;
      ;; (eval "...") on a string would only return the string itself
      (set (intern (format "var-%s" variable)) result)
      (format "\\texttt{%s} \\(\\gets\\) %s = \\fbox{%s}" variable expression result))))
#+end_src
**** calculate: use emacs calc to evaluate expressions and assign variables
# use listings to export the code evaluated
#+latex_header: \lstdefinelanguage{calc}{}
# evaluate the code and format the output
#+macro: calculate $2 \(\gets\) src_calc[:exports code]{$1} = @@latex:\fbox{@@ src_emacs-lisp{(setq var-$2 (string-to-number (calc-eval "evalv($1)")))} @@latex:}@@
**** cite: macro for citing work and url to actual source
# +macro: cite @@latex:\vfill\Citation{$1}@@@@html:


@@
# alternative cite macro for LaTeX only but with working link
#+macro: cite [[$2][@@latex:\vfill\Citation{$1}@@]]
#+latex_header: \newcommand{\Citation}[1]{\hfill{\scriptsize{\color{gray}#1}}}
**** overlay: for absolute positioning of images etc.
#+latex_header: \usepackage[overlay]{textpos} \TPGrid[0pt,0pt]{20}{20}
#+macro: overlay @@latex:\begin{textblock}{$4}($2,$3)@@[[file:$1]]@@latex:\end{textblock}@@
*** org startup on file visit
#+name: startup
#+begin_src emacs-lisp :results none
;; Page up/down keys: move between slides when org-tree-slide is active,
;; otherwise scroll as usual.
(defun esf/next-slide-or-page-up ()
  (interactive)
  (if (and (boundp 'org-tree-slide-mode) org-tree-slide-mode)
      (org-tree-slide-move-next-tree)
    (scroll-up-command)))
(local-set-key (kbd "") 'esf/next-slide-or-page-up)
(defun esf/previous-slide-or-page-down ()
  (interactive)
  (if (and (boundp 'org-tree-slide-mode) org-tree-slide-mode)
      (org-tree-slide-move-previous-tree)
    (scroll-down-command)))
(local-set-key (kbd "") 'esf/previous-slide-or-page-down)
;; Presentation mode: hide editing aids, enlarge text, show LaTeX and images.
(add-hook 'org-tree-slide-play-hook
          #'(lambda ()
              (setq display-line-numbers nil)
              (beacon-mode -1)
              (setq evil-normal-state-cursor 'bar)
              ;; (evil-emacs-state)
              (hl-line-mode 0)
              (text-scale-set 2)
              (org-toggle-latex-fragment)
              (org-toggle-inline-images)))
;; Restore the normal editing settings when the presentation stops.
(add-hook 'org-tree-slide-stop-hook
          #'(lambda ()
              (setq display-line-numbers 'visual)
              (beacon-mode 1)
              ;; (evil-normal-state)
              (setq evil-normal-state-cursor 'box)
              (hl-line-mode 1)
              (text-scale-set 0)
              (org-toggle-latex-fragment)
              (org-toggle-inline-images)))
;; Buffer-local settings for babel, export, slides and time stamps.
(org-content 2)
(setq-local org-confirm-babel-evaluate nil)
(setq-local org-export-allow-bind-keywords t)
(setq-local org-fontify-quote-and-verse-blocks t)
(setq-local org-format-latex-options '(:background default :foreground default :scale 4))
(setq-local org-latex-image-default-height "5cm")
(setq-local org-latex-image-default-width nil)
(setq-local org-latex-pdf-process '("pdflatex -interaction nonstopmode %f"))
(setq-local org-tree-slide-breadcrumbs " ‣ ")
;; (setq-local org-tree-slide-cursor-init nil)
(setq-local org-tree-slide-fold-subtrees-skipped nil)
(setq-local org-tree-slide-skip-outline-level 4)
;; (setq-local sentence-highlight-mode nil)
(setq-local time-stamp-line-limit 1000)
(setq-local time-stamp-format "%04y.%02m.%02d %02H:%02M")
(setq-local time-stamp-active t)
(setq-local time-stamp-start "#\\+macro:[ \t]* lastchange[ \t]* ")
(setq-local time-stamp-end "$")
#+end_src
** local variables
# Local Variables:
# org-confirm-babel-evaluate: nil
# eval: (esf/execute-startup-block)
# End: