From: Christopher Genovese <genovese.cr@gmail.com>
To: emacs-orgmode@gnu.org
Subject: Four issues with org-babel-tangle
Date: Wed, 14 Sep 2011 15:51:16 -0400 [thread overview]
Message-ID: <CAPum5FgaooNEZs8Ny8Y3VD1EJYsZXQ6ZwetiT9UR4OEC7wW4=g@mail.gmail.com> (raw)
[-- Attachment #1.1: Type: text/plain, Size: 25518 bytes --]
/Semi-verbose Preamble/.
Having recently begun intensive use of org-mode for tangling source
files, I encountered four issues related to comment extraction (two
bugs, one undesirable behavior, and one ... unfulfilled need), which I
describe in detail below. I started by creating an org file that would
reproduce the problems, and soon started /describing/ the problems in
the org file as well as putting my fixes in the source blocks. At the
risk of it being too meta or annoying, I've included that org file at
the end of this message as the problem description. All the details are
there as well as two fixes. Tangling that file in various ways described
demonstrates the problems, and you can export to PDF for nicer reading.
(I've attached the PDF to this mail for convenience. It looks good;
kudos, org-mode!) I've also attached a tarball with files that make it
easy to try my changes and to restore the original behavior, as well as
tests and results from the org file for easy comparison. See the
included README.
I've been using the revised code now for a few days. It fixes the
problems I describe, and I think it provides a flexible framework for
comment extraction with minimal change to the base code. If the reaction
to this is positive, I will happily submit a patch, sign paperwork, or
whatever is needed, after fixing any problems that you all see. In any
case, I very much look forward to any feedback you can offer. Thanks.
-- Christopher
P.S. In case the attachments get dropped, I've put the PDF and the
tarball at
http://www.stat.cmu.edu/~genovese/depot/tangle-error.pdf
http://www.stat.cmu.edu/~genovese/depot/tangle-bundle.tgz
/Problem Description/
################ Cut Here ####################
# -*- org-return-follows-link: t; comment-empty-lines: t; -*-
#+TITLE: Tangle this file: four issues with org-babel-tangle
#+AUTHOR: Christopher Genovese
#+DATE: 14 Sep 2011\vspace*{-0.5cm}
#+EMAIL: genovese@cmu.edu
#+OPTIONS: toc:1 H:1
#+BABEL: :tangle yes :comments org :results silent :exports code
#+BIND: org-export-latex-hyperref-format "\\hyperlink{%s}{%s}"
#+STARTUP: showall
#+LATEX_HEADER: \usepackage[labelsep=period,labelfont=bf]{caption}
#+LaTeX: \vspace*{-1cm}\vspace*{-2pt}
* Four Related Issues with org-babel-tangle
Running org-mode version 7.7 on both Gnu Emacs 23.2.1 and 24.0.50.1
with Mac OS X 10.5.8 (with and without a -Q option), I encountered the
following issues/problems/bugs when tangling files with code blocks
for which the :comment header argument is org:
1. The subtree associated with the very first code block has
its headline tangled without leading stars, but all subsequent
sub-trees associated with code blocks have the leading stars
included in the comments.
2. If the first code block comes before the first headline, the start
of the comment text will be determined by pre-existing match data
and thus will likely be incorrect.
3. Org structural elements such as headline stars, =#+= control lines,
and drawers are included in the comments.
4. There is no way easy to delimit comment text or transform it that
does not also change the structure of the org file (e.g., by adding
headlines or source blocks).
Issues 1 and 2 seem to be genuine bugs. Issues 3 and 4 are more
subjective, I admit, but seem undesirable. Stars, drawers, and
control lines are org structure rather than content, and they
are often inappropriate in comments.
To reproduce the behaviors for issues 1 and 3, look at the result of
tangling this file. To reproduce issue 2 as well, remove the first two
stars from this file and tangle again. Alternatively, within emacs,
evaluate the =(buffer-substring ...)= sexp from the original-code code
below at the beginning character of a source block. (You
can also export this file to PDF for more pleasant reading.)
Below, I give details on these issues and code for two fixes:
a [[simple-fix][simple fix]] that handles the first two issues and the
stars in the third,
and a [[preferred-fix][better fix]] that handles all four issues in a more
modular, customizable framework. I'd be interested in hearing
feedback on all of this. If the reaction is positive, I will gladly
submit a patch. Thanks for your consideration.
* The Original Code and Details on the Problem
The relevant section of the original code from org-mode version 7.7
is shown below, comprising lines 344 through 357
of ob-tangle.el in function =org-babel-tangle-collect-blocks=. With point
is at a =#+begin_src=, it scans back either
for a heading line or for the end of the previous code block,
whichever comes later. The resulting region becomes the
comment text.
#+latex: \begin{figure}[h]
#+latex: \hypertarget{original-code}{} % <<original-code>>
#+source: original-code
#+begin_src emacs-lisp
(comment
(when (or (string= "both" (cdr (assoc :comments params)))
(string= "org" (cdr (assoc :comments params))))
;; from the previous heading or code-block end
(buffer-substring
(max (condition-case nil
(save-excursion
(org-back-to-heading t) (point))
(error 0))
(save-excursion
(re-search-backward
org-babel-src-block-regexp nil t)
(match-end 0)))
(point))))
#+end_src
#+latex: \caption{Original Code, lines 344--357 in {\tt ob-tangle.el}.}
#+latex: \label{fig::original-code}
#+latex: \end{figure}
/Issue 1/. When in the first code block in the file, the
second search fails (there is no previous code block),
so the (match-end 0) call uses the match data from the (implicit) match
during
=org-back-to-heading=, which skips the stars. (Not a particularly
transparent reference, incidentally.) For subsequent blocks, the
=(match-end 0)= gives the end of the previous code block, which in these
examples is earlier than the previous headline location.
/Issue 2/. When the first code block lies before the first headline
(say with some text before it), the searches fail in /both/ clauses of
the max. So, the =match-end= will return an essentially arbitrary
result, which is a bug.
/Issue 3/. =org-back-to-heading= leaves point at the beginning of the
stars, so a headline included in the text will have stars, except for
the first one.
/Issue 4/. Control lines at the end of the previous code block and
before point are not filtered out and so are included in the comments.
* A Simple Fix for the First Three Issues
A small change addresses issues 1, 2, and the stars for issue 3: in
both cases, simply use the =match-end= and replace 0 values with
=(point-min)=. The latter gives a sensible result even if both
computed positions are trivial (as when the first code block comes
before the first headline) and respects narrowing.
#+latex: \begin{figure}[h]
#+latex: \hypertarget{simple-fix}{} % <<simple-fix>>
#+begin_src emacs-lisp
(comment
(when (or (string= "both" (cdr (assoc :comments params)))
(string= "org" (cdr (assoc :comments params))))
;; from the previous heading or code-block end
(buffer-substring
(max (condition-case nil
(save-excursion
(org-back-to-heading t) ; sets match data
(match-end 0))
(error (point-min)))
(save-excursion
(if (re-search-backward
org-babel-src-block-regexp nil t)
(match-end 0)
(point-min))))
(point))))
#+end_src
#+latex: \caption{Simple Fix, replacement for lines 344--357 in {\tt
ob-tangle.el}.}
#+latex: \label{fig::simple-fix}
#+latex: \end{figure}
* A Fix for All Four Issues
A better fix that handles issues 1--4 starts with the region computed as
in the [[simple-fix][simple fix]]
and then processes that text through a user-configurable sequence of
functions
to derive the final form of the comment text.
The following changes are required.
** Extract Initial Comment Text and State from Org Buffer
The initial comment text ranges from either the most recent headline
at the point after the stars, the beginning of the line after the
=#+end_src= of the most recent code block, or the beginning of the
buffer, whichever is later, through the line before the source
block.[fn:1]
The [[preferred-fix][code]] to extract this is given below.
#+latex: (See Figure \ref{fig::preferred-fix}.)
It replaces lines 344 through 357 of
=ob-tangle.el= from org-mode version 7.7 in the function
=org-babel-tangle-collect-blocks=.
#+latex: \begin{figure}[h]
#+latex: \hypertarget{preferred-fix}{} % <<preferred-fix>>
#+begin_src emacs-lisp
(comment
(when (or (string= "both" (cdr (assoc :comments params)))
(string= "org" (cdr (assoc :comments params))))
(let* ((prev-heading
(condition-case nil
(save-excursion
(org-back-to-heading t) ; sets match data
(match-end 0))
(error (point-min))))
(end-of-prev-src-block
(save-excursion
(if (null (re-search-backward
org-babel-src-block-regexp nil t))
(point-min)
(goto-char (match-end 0))
(forward-line 1)
(point))))
(comment-start
(max prev-heading end-of-prev-src-block))
(comment-end
(save-excursion
(forward-line 0)
(point)))
(state
(list (cons 'org-drawers
org-drawers)
(cons 'after-heading
(= comment-start prev-heading))
(cons 'first-line
(= comment-start (point-min))))))
(org-babel-process-comment-text
(buffer-substring comment-start comment-end) state))))
#+end_src
#+latex: \caption{Better Fix, replacement for lines 344--357 in {\tt
ob-tangle.el}.}
#+latex: \label{fig::preferred-fix}
#+latex: \end{figure}
** Adjust =org-babel-spec-to-string=
The commment block collected by the [[original-code][original code]]
#+latex: (Figure \ref{fig::original-code})
in =org-babel-tangle-collect-blocks= is further processed in \newline
=org-babel-spec-to-string= to trim leading and trailing whitespace
from string. This was needed because spaces after a source block were
included in the comment. In the revised code, however, this space
trimming is handled during text transformation, except for removing
trailing newlines. (Note: trailing /spaces/ are not removed to allow
more flexibility in comment processing.) Hence,
=org-babel-spec-to-string= needs to be slightly adjusted.
#+latex: See Figure \ref{fig::spec-string-diff}.
#+latex: \begin{figure}[h]
#+begin_example
--- ob-tangle.el 2011-09-14 11:48:26.000000000 -0400
+++ new-ob-tangle.el 2011-09-14 11:55:56.000000000 -0400
@@ -398,3 +398,3 @@
(flet ((insert-comment (text)
- (let ((text (org-babel-trim text)))
+ (let ((text (org-babel-chomp text "[\f\t\n\r\v]")))
(when (and comments (not (string= comments "no"))
#+end_example
#+latex: \caption{Changes to {\tt org-spec-to-string} in {\tt
ob-tangle.el}, unified diff, one line of context}
#+latex: \label{fig::spec-string-diff}
#+latex: \end{figure}
** Process Comment Text Through Sequence of Transforms
At the end of the revised [[preferred-fix][comment collection code]],
the comment text is passed to
=org-babel-process-comment-text= which
applies a sequence of transformation functions.
#+latex: (See Figure \ref{fig::comment-transformation}.)
The list of transformation functions is stored in a customizable
variable described [[Define Customization Variable for
Transforms][below]]. Several predefined transformations are
given [[Define A Collection of Transform Functions][below]] as well.
#+latex: \begin{figure}[h]
#+begin_src emacs-lisp
(defun org-babel-process-comment-text (text &optional state)
"Apply list of transforms to comment TEXT assuming bindings in alist
STATE.
Returns the modified text string, which may have text properties.
See `org-babel-comment-processing-functions' for the transforms to be
applied and details on the allowed keys in the STATE alist."
(let ((funcs org-babel-comment-processing-functions))
(with-temp-buffer
(insert text)
(let ((org-drawers
(or (cdr (assoc 'org-drawers state))
org-drawers))
(after-heading
(cdr (assoc 'after-heading state)))
(first-line
(cdr (assoc 'first-line state))))
(while funcs
(goto-char (point-min))
(funcall (car funcs))
(setq funcs (cdr funcs))))
(buffer-substring (point-min) (point-max)))))
#+end_src
#+latex: \caption{Better Fix, comment transformation driver.}
#+latex: \label{fig::comment-transformation}
#+latex: \end{figure}
** Define Customization Variable for Transforms
A list of nullary functions applied in order to
the comment text. The text is inserted
in a temporary buffer, so these functions can use
the entire Emacs library for operating on buffer text.
#+latex: See Figure \ref{fig::comment-transformation-function-list}.
#+latex: \begin{figure}[h]
#+begin_src emacs-lisp
(defcustom org-babel-comment-processing-functions
'(org-babel-comment-delete-file-variables-line
org-babel-comment-delete-org-control-lines
org-babel-comment-delete-drawers
org-babel-comment-trim-blank-lines
org-babel-comment-trim-indent-prefix)
"List of functions to transform source-block comment text before
insertion.
Each function will be called with no arguments with point at the
beginning of a buffer containing only the comment text. Each
function can modify the text at will and leave point anywhere,
but it should *not* modify the narrowing state of the buffer.
Several dynamic state variables are set prior to execution that
each function can reference. These currently include:
+ org-drawers: names of drawers in the original org buffer.
+ from-heading: t if comment starts at an org heading line,
nil otherwise.
+ first-line: t if initial comment starts on first line
of the original org buffer, nil otherwise.
If a function changes the value of these state variables, the new
value will be seen by all following functions in the list, but
this is not generally recommended.
The functions in this list are called *in order*, and this order
can influence the form of the resulting comment text."
:group 'org-babel
:type 'list)
#+end_src
#+latex: \caption{Better Fix, customizable transform list.}
#+latex: \label{fig::comment-transformation-function-list}
#+latex: \end{figure}
** Define A Collection of Transform Functions
An advantage of this design is that transformation of the
comments is modular and customizable. We can include in
=ob-tangle.el= a collection of pre-defined transforms.
The default processing stream in =org-babel-comment-processing-functions=
is as follows:
1. Delete a file variables if on the first line of the buffer.
2. Delete all drawers and their contents.
3. Delete all org control lines from the comment text.
4. Trim blank lines from the beginning and end.
5. Reindent the text by removing the longest common leading
string of spaces.
#+ TANGLE: end-comment
These and several other useful transforms are given below
(e.g., deleting drawer delimiters but not contents)..
#+latex: See Figures \ref{fig::transformA}--\ref{fig::transformZ}.
It is easy to define new transforms; any function that
operates on text in the current buffer beginning at point-min
will work.
#+latex: \begin{figure}[h]
#+begin_src emacs-lisp
(defun org-babel-comment-delete-file-variables-line ()
"Delete file variables comment line if at beginning of buffer.
This only checks the first line of the buffer, and so should be
placed first (or at least early enough) in the list
`org-babel-comment-processing-functions' to ensure that the no
other text has been inserted earlier."
(when (and first-line
(looking-at ; file-variables line
"^#[ \t]*-\\*-.*:.*;[ \t]**-\\*-[ \t]*$"))
(let ((kill-whole-line t))
(kill-line))))
#+end_src
#+latex: \caption{Comment Transform.}
#+latex: \label{fig::transformA}
#+latex: \end{figure}
#+latex: \begin{figure}[h]
#+begin_src emacs-lisp
(defun org-babel-comment-delete-org-control-lines ()
"Remove all org #+ control lines from comment."
(let ((control-regexp "^[ \t]*#\\+.*$"))
(delete-matching-lines control-regexp)))
#+end_src
#+latex: \caption{Comment Transform.}
#+latex: \end{figure}
#+latex: \begin{figure}[h]
#+begin_src emacs-lisp
(defun org-babel-comment-delete-org-in-buffer-settings ()
"Remove all org #+ in-buffer setting lines, leaving other control
lines.
In-buffer setting lines begin with #+ and have all caps keyword
names."
(let ((setting-regexp "^#\\+[ \t]*[A-Z_]+:.*$"))
(delete-matching-lines setting-regexp)))
#+end_src
#+latex: \caption{Comment Transform.}
#+latex: \end{figure}
#+latex: \begin{figure}[h]
#+begin_src emacs-lisp
(defun org-babel-comment-delete-drawers ()
"Delete drawer delimiters and contents from comment.
Drawer names are restricted to those in the `org-drawers' state."
(let ((drawer-start-regexp
(format "^[ \t]*:\\(?:%s\\):[ \t]*$"
(mapconcat 'identity
org-drawers
"\\|")))
(drawer-end-regexp "^[ \t]*:END:[ \t]*$"))
(while (re-search-forward drawer-start-regexp nil t)
(let ((beg (save-excursion
(forward-line 0)
(point)))
(end (save-excursion
(re-search-forward drawer-end-regexp nil t)
(forward-line 1)
(point))))
(goto-char end)
(delete-region beg end)))))
#+end_src
#+latex: \caption{Comment Transform.}
#+latex: \end{figure}
#+latex: \begin{figure}[h]
#+begin_src emacs-lisp
(defun org-babel-comment-delete-drawer-delimiters ()
"Delete drawer delimiters from comment leaving content.
Drawer names are restricted to those given by the `org-drawers'
state."
(let ((drawer-delim-regexp
(format "^[ \t]*:\\(?:%s\\)"
(mapconcat 'identity
(cons "END" org-drawers)
"\\|"))))
(delete-matching-lines drawer-delim-regexp)))
#+end_src
#+latex: \caption{Comment Transform.}
#+latex: \end{figure}
#+latex: \begin{figure}[h]
#+begin_src emacs-lisp
(defun org-babel-comment-trim-blank-lines ()
"Trim whitespace-only lines from beginning and end of text."
(while (and (looking-at "^[ \t\f]*$")
(< (point) (point-max)))
(forward-line 1))
(delete-region (point-min) (point))
(when (< (point) (point-max))
(goto-char (point-max))
(let ((last-point (point)))
(forward-line 0)
(while (and (looking-at "^[ \t\f]*$")
(> (point) (point-min)))
(setq last-point (point))
(forward-line -1))
(delete-region last-point (point-max)))))
#+end_src
#+latex: \caption{Comment Transform.}
#+latex: \end{figure}
#+latex: \begin{figure}[h]
#+begin_src emacs-lisp
(defun org-babel-comment-trim-indent-prefix ()
"Remove longest common leading prefix of spaces from each line of
TEXT.
Prefix is computed from the initial whitespace on each line with
tabs converted to spaces, preserving indentation."
(let* ((common-indent nil)
(common-length (1+ (- (point-max) (point-min))))
(current-indent "") ; enter first loop
(current-length common-length)) ; skip first assignment
(goto-char (point-min))
(while current-indent
(when (< current-length common-length)
(setq common-indent current-indent
common-length current-length))
(setq current-indent
(let* ((found (re-search-forward "^\\([ \t]*\\)\\S-" nil
t))
(bol (match-beginning 0))
(eos (match-end 1))
(space-str (match-string 1))
(indent-tabs-mode nil))
(cond
((not found)
nil)
((not (string-match "\t" space-str))
space-str)
(t ; detabify indent string
(goto-char eos)
(let ((col (current-column)))
(delete-region bol eos)
(indent-to col))
(buffer-substring-no-properties bol (point))))))
(setq current-length (length current-indent)))
(when (and common-indent (> common-length 0))
(let ((indent-re (concat "^" common-indent)))
(goto-char (point-min))
(while (re-search-forward indent-re nil t)
(replace-match "" nil nil))))))
#+end_src
#+latex: \caption{Comment Transform.}
#+latex: \label{fig::transformZ}
#+latex: \end{figure}
#+latex: \end{itemize}
#+latex: \noindent
This kind of customization offers some nice possibilities,
including controlling indentation, eliminating or
transforming org markup, eliminating trailing whitespace, and
automating specialized comment formatting (e.g., javadoc). As
an additional illustration, consider the transform
=org-babel-comment-restrict-comment-range=
#+latex: in Figure \ref{fig::transform-illustration}
below. The idea is that it is sometimes useful to select
from the text under a headline a /part/ of the text for
the comment. We want some org markup that will not affect
either the export or the structure of the org file itself.
To do this, we use the fact that =#+=\textvisiblespace
lines are not exported.[fn:2] So, we can /de facto/ use the
=#+ TANGLE:= construct to control various aspects of tangling.
Here, we use the =#+ TANGLE: start-comment= and
=#+ TANGLE: end-comment= to delimit the comment text.
(This function needs to come earlier in the function list than the
functions that eliminate org control lines. It is sufficient to
prepend it to that list.) This is used in the current file,
for example.
#+latex: \begin{figure}[h]
#+begin_src emacs-lisp
(defun org-babel-comment-restrict-comment-range ()
"Remove all comment text outside start-comment and end-comment
delimiters.
Comment delimiters are #+TANGLE lines with respective keywords
start-comment and end-comment. THE #+TANGLE lines are also
deleted. To be effective, this function should be positioned in
the list `org-babel-comment-processing-functions' before any
functions that remove org control lines or process other
co-occuring attributes of #+TANGLE lines."
(when (re-search-forward "^[ \t]*#\\+[ \t]*TANGLE:.*start-comment.*$"
nil t)
(forward-line 1)
(delete-region (point-min) (point)))
(when (re-search-forward "^[ \t]*#\\+[ \t]*TANGLE:.*end-comment.*$"
nil t)
(forward-line 0)
(delete-region (point) (point-max))))
#+end_src
#+latex: \caption{Transform to illustrate some customization
possibilities.}
#+latex: \label{fig::transform-illustration}
#+latex: \end{figure}
#+latex: \begin{itemize}
[fn:1] In the original code and in the simple fix above, the comment
starts /immediately/ after the =#+end_src= rather than at the start of
the next line. Starting at the next line seems more natural to me
because the comment being constructed relates to the /following/ code
block. But the original behavior is easily restored if people disagree.
[fn:2] A feature request: I would propose that the =#+tangle:= construct
be recognized as non-exported even with spaces preceding the =#= and no
spaces after the =+=. This would enable a variety of interesting
customization for tangled comments. Alternatively, a generic construct
such as =#+noop:= or =#+generic:= could be a valuable for user-based
tags in an org file that serves a similar purpose -- allow customized
processing without directly being exported.
[-- Attachment #1.2: Type: text/html, Size: 28429 bytes --]
[-- Attachment #2: tangle-error.pdf --]
[-- Type: application/pdf, Size: 254903 bytes --]
[-- Attachment #3: tangle-bundle.tgz --]
[-- Type: application/x-gzip, Size: 278048 bytes --]
next reply other threads:[~2011-09-14 19:51 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-09-14 19:51 Christopher Genovese [this message]
2011-09-15 15:49 ` Four issues with org-babel-tangle Eric Schulte
2011-09-15 21:36 ` Christopher Genovese
2011-09-15 22:02 ` Eric Schulte
2011-09-16 4:31 ` Christopher Genovese
2011-09-16 15:51 ` Eric Schulte
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://www.orgmode.org/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAPum5FgaooNEZs8Ny8Y3VD1EJYsZXQ6ZwetiT9UR4OEC7wW4=g@mail.gmail.com' \
--to=genovese.cr@gmail.com \
--cc=emacs-orgmode@gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs/org-mode.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).