;;; org-tag-query-parse.el -- proposed full parser for tag queries ;; Copyright (C) 2012, Christopher R. Genovese, all rights reserved. ;; Author: Christopher Genovese ;; Version: 0.9 ;; ;; Created: Sun 29 Jul 2012 at 10:04 EDT ;; Last-Updated: Fri 03 Aug 2012 at 23:52 EDT ;; Updated By: Christopher R. Genovese ;; Keywords: org-mode, tags, query, search ;; Package-Requires: ((org-mode 7.8)) ;;; Commentary: ;; ;; The current parser for tag query searches does not handle ;; parenthesized expressions and thus does not allow negating complex ;; queries. This code implements a full parser for tag queries with ;; number of useful features (see the labeled Notes below for further ;; comments on these features): ;; ;; 1. Parenthesized expressions to arbitrary depth are allowed. ;; 2. A '-' can be used to negate a parenthesized term. [Note a] ;; 3. Regex's in {} can contain braces escaped by doubling: {{ }}. [Note b] ;; 4. Supports fast property search on HEADING and PRIORITY. [Note c] ;; 5. Handles hyphens in property names properly. [Note d] ;; 6. Allows only the proper comparison operators, including ==. [Note e] ;; 7. Allows spaces around operators and terms for readability. [Note f] ;; 8. Parse trees use the original expression order; not a big ;; deal, but free. ;; 9. Returned parse trees are clean, without trivial operators, ;; and error messages during parsing are reasonably helpful. ;; 10. A few bug fixes and a cleaner `org-make-tags-matcher'. [Note h] ;; ;; I propose that this new parser be incorporated into org.el. ;; ;; The two existing functions that are affected in the code are ;; `org-make-tags-matcher' and `org-scan-tags'. In the new version of ;; the former, I've extracted out both kinds of query parsing, ;; leading to a shorter and cleaner function. The new version of the ;; latter differs in only a couple *very minor* places that capture ;; two values that were already being computed anyway (see the diff ;; reproduced in the comments). ;; ;; Loading org-tag-query-parse.el does not change the original ;; functions. Instead, I've added a `-NEW' to the names of these ;; functions and saved the originals also with a `-ORIGINAL' added. ;; After loading the file, you can choose a version to try by doing ;; ;; (org-tmp-use-tag-parser 'new) ;; and ;; (org-tmp-use-tag-parser 'original) ;; ;; or do (org-tmp-use-tag-parser) to toggle between versions. ;; You can also just use the names with suffixes directly. ;; ;; I think the place to start looking at the code is the new version ;; of `org-make-tags-matcher'. The main entry function for the new ;; parser is `org-tag-query-parse', though the real workhorse is ;; actually the function `org-tag-query-parse-1'. There is also a ;; new function `org-todo-query-parse' which just extracts the ;; existing todo matching method. (I didn't do anything with that ;; method as the manual makes it clear that it is of secondary ;; importance.) I think the modularity here makes ;; `org-make-tags-matcher' and each separate parser easier to read ;; and understand. ;; ;; The other substantial piece (in terms of lines of code) is a utility ;; macro `org-match-cond' that is used throughout and makes the main ;; parser much more readable IMHO. Admittedly, I went a bit overboard ;; in optimizing it; the first version worked fine but this one ;; produces really nice code. I'd suggest ignoring this code (in ;; section "Parsing utility for readable matchers") on first pass. The ;; docstring is pretty complete, and its use is more or less ;; self-explanatory. ;; ;; Notes: ;; a. There is no need to introduce a new character such as ! for ;; negation because the semantics of the - are clear and are ;; consistent with its use for tags. A - binds more tightly ;; than & which in turn binds more tightly than |. A + ;; selector can also be used for positive selection of a ;; parenthesized term but it is equivalent to using no ;; selector, just as for tags. ;; ;; b. Because \'s are so heavily used in regex's and because they ;; have to be doubled in strings, using \'s for an additional ;; escape layer would be messy, ambiguous, and hard to read. ;; Only the {}'s need to be escaped and the doubling escapes ;; {{ -> { and }} -> } are simple, readable, and fast to ;; parse. For example: "+{abc\\{{3,7\\}}}" gives the regex ;; "abc\\{3,7\\}". Parity makes correctness clear at a glance. ;; ;; c. Because headline (and priority) searches can be useful and ;; powerful, and because the information on those fields is ;; *already processed* in `org-scan-tags', we get those ;; special searches *essentially for free*, requiring only two ;; minor changes to `org-scan-tags'. See the unified diff in ;; comments. The special PRIORITY property already exists; I ;; added the special HEADING property for these purposes. I'm ;; open to changing the name of course, but I do think the ;; feature is very useful. ;; ;; d. I did not see it in the manual, but I think that property names ;; with hyphens should have these \-escaped -'s in the query ;; string, with the escaping slashes removed in the produced ;; matcher. This is not currently done, but the new version does. ;; See Note h for details. ;; ;; e. It seems desirable to support both = and == as equality operators ;; since the latter is so common by habit. The new version allows ;; this explicitly. The original version does as well, but the ;; regex for the comparison operator also allows other operators ;; <<, ><, >>, =>, and >= as well, which can produce bad matchers. ;; See Note h for details. ;; ;; f. Currently, spaces are ignored around &, |, the implicit & between ;; terms, around the comparison operators in property searches, ;; and around +/- selectors. Spaces are not ignored inside {}'s ;; for a regexp match. Truth be told, I prefer having no spaces ;; after the +/- selectors, but it seems somewhat...harsh to insist ;; on that for everyone. Live and let live. ;; ;; g. The current code also allows +/- selectors before property ;; comparisons. I don't really like this because ;; +PROP<>"something" and -PROP="something" have the same ;; meaning but look very different. But the new code does ;; support this. As a side note, there's really no need for ;; the & characters as +/- serve the and/and-not function ;; completely. But again, no prob. ;; ;; h. A few bugs detected in the 7.8.11 code: ;; ;; + Faulty test for todo matcher in org-make-tags-matcher ;; (string-match "/+" match) ;; ;; Ex: (org-make-tags-matcher "PROP={^\\s-*// .*$}") produces ;; an erroneous matcher: ;; ;; ("PROP={^\\s-*// .*$}" progn ;; (setq org-cached-props nil) ;; (member "PROP" tags-list)) ;; ;; For all practical purposes it will be enough to do: ;; ;; (string-match "\\(/\\(!\\)?\\s-*\\)[^{}\"]*$" match) ;; ;; instead of the current test in org-make-tags-matcher. ;; This works as long as the TODO keywords do not contain a ;; right brace or quotation marks. (In most other cases, the ;; new parser should give an error message at parse time.) ;; ;; A technicality: this is /not/ a complete solution because ;; arbitrary strings can be TODO keywords. For instance, ;; both PROP={/!} and PROP="/!{/!}" are valid TODO keywords ;; (it works!) *and* valid property comparisons. So, a pattern ;; alone is insufficient. We want to find the first slash ;; that is not enclosed in {}'s or ""'s; if found, a todo ;; match is needed. The function `org-find-todo-query' does ;; this and (org-find-todo-query match) can be plugged in ;; directly replacing the above (string-match ...) in then ;; new `org-make-tags-matcher'. ;; ;; But because the todo parsing uses {}'s for regex matches, ;; TODO keywords with {}'s are ignored anyway. So there's ;; no need to go beyond the fixed string-match above. ;; The function `org-todo-query-parse', which handles todo ;; parsing in the new version, makes this explicit. ;; ;; + Property names with -'s are not handled properly (cf. Note d) ;; ;; Specifically, the escapes are not removed. Example: ;; (org-make-tags-matcher "PROP\\-WITH\\-HYPHENS=2") ;; produces ;; ;; ("PROP\\-WITH\\-HYPHENS=2" and ;; (progn ;; (setq org-cached-props nil) ;; (= ;; (string-to-number ;; (or (org-cached-entry-get nil "PROP\\-WITH\\-HYPHENS") ;; "")) ;; 2)) ;; t) ;; ;; The original code /does/ instead remove -'s from tag ;; names, which shouldn't have them anyway. I suspect that ;; this was intended for property names rather than tag ;; names. The new version fixes up property names but does ;; not allow -'s in tags. ;; ;; + Incorrect comparison operators allowed (cf. Note e) ;; ;; The regular expression used is "[<=>]\\{1,2\\}" is used to ;; detect the comparison operators. But this can produce bad ;; matchers that fail opaquely at match time rather than ;; giving an appropriate error message at parse time. ;; ;; Ex: (org-make-tags-matcher "P<<2") produces ;; ;; ("P<<2" and ;; (progn ;; (setq org-cached-props nil) ;; (nil ;; (string-to-number (or (org-cached-entry-get nil "P") "")) 2)) ;; t) ;; ;; This is fixed in the new version and delivers an error ;; message at parse time. ;; ;; + missing org-re (line 7179 in org.el) with posix classes ;; ;; Minor consistency issue. This line does not occur in the new ;; code. ;; ;; What follows is a grammar for the updated tag query dsl, which is ;; given in an informal hybrid of BNF and regex operators. Hopefully, it's ;; clear enough. The non-obvious terminals are ALL CAPS and are listed below. ;; ;; Grammar: ;; Expression <- Conjunction (OR Conjunction)* ;; Conjunction <- Term (AND? Term)* ;; Term <- SELECTOR? TAG_IDENT ;; | SELECTOR? LBRACE Regex RBRACE ;; | SELECTOR? PROP_IDENT CMP_OP NUMBER ;; | SELECTOR? PROP_IDENT CMP_OP STRING ;; | SELECTOR? PROP_IDENT CMP_OP DATE_STRING ;; | SELECTOR? PROP_IDENT MATCH_OP LBRACE REGEX RBRACE ;; | SELECTOR? LPAREN Expression RPAREN ;; ;; Terminals (the nonobvious ones): ;; OR = | ;; AND = & ;; CMP_OP = (==|=|<=|<>|>=|>|<) ;; MATCH_OP = (==|=|<>) ;; TAG_IDENT = [A-Za-z0-9_@#%]+ ;; PROP_IDENT = ([A-Za-z0-9_]+(\\-)*)+ ;; SELECTOR? = [-+]? ;; STRING = balanced double-quoted string with escapes ;; DATE_STRING = org style date string ;; REGEXP = regular expression with { and } doubled to escape, {{ and }} ;; ;;; License: ;; ;; This program is free software; you can redistribute it and/or ;; modify it under the terms of the GNU General Public License as ;; published by the Free Software Foundation; either version 3, or ;; (at your option) any later version. ;; ;; This program is distributed in the hope that it will be useful, ;; but WITHOUT ANY WARRANTY; without even the implied warranty of ;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ;; General Public License for more details. ;; ;; You should have received a copy of the GNU General Public License ;; along with this program; see the file COPYING. If not, write to ;; the Free Software Foundation, Inc., 51 Franklin Street, Fifth ;; Floor, Boston, MA 02110-1301, USA. ;; ;;; Code: (eval-when-compile (require 'cl)) (require 'org) ;;; Utilities (defun org-read-balanced-string (open-delim close-delim) "Return string delimited by OPEN-DELIM and CLOSE-DELIM with escapes. OPEN-DELIM and CLOSE-DELIM must be *distinct* characters. Reading begins at point in the current buffer. To include OPEN-DELIM and CLOSE-DELIM inside the string being read, those characters must be *doubled* and only one copy of the character is kept in the string. The opening and closing delimiters for the read sequence must be single copies, and an unescaped OPEN-DELIM will raise an error." (when (char-equal open-delim close-delim) (error "Open and close delimiters must be distinct, %c" open-delim)) (unless (char-equal (char-after) open-delim) (error "Missing opening %c in delimited string" open-delim)) (forward-char 1) ; skip initial delimiter (let ((delim-re (format "[%c%c]" open-delim close-delim)) (delims-unbalanced t) (begin (point)) (fragments nil) (ch nil)) (while (and delims-unbalanced (re-search-forward delim-re nil t)) (setq ch (char-before)) (cond ((char-equal ch open-delim) (setq ch (char-after)) (if (not (and ch (char-equal ch open-delim))) (error "Unescaped open delimiter %c in balanced string" open-delim) (push (buffer-substring-no-properties begin (1- (point))) fragments) (setq begin (point)) (forward-char 1))) ((char-equal ch close-delim) (setq ch (char-after)) (if (not (and ch (char-equal ch close-delim))) (setq delims-unbalanced nil) (push (buffer-substring-no-properties begin (1- (point))) fragments) (setq begin (point)) (forward-char 1))))) (when delims-unbalanced (error "Unbalanced delimiters %c%c in balanced string at char %d." open-delim close-delim (point))) (push (buffer-substring-no-properties begin (1- (point))) fragments) (if (null (cdr fragments)) (car fragments) (apply 'concat (nreverse fragments))))) (defun org-read-quoted-string-in-query () "Read a quoted string, with escapes, starting at point. Assume that an opening quote has already been seen. This is just a wrapper for `read' that reports errors nicely." (let ((start (point))) (condition-case exception (read (current-buffer)) (error (org-tquery-error "badly formed quoted value in property comparison" :pos start))))) ;; I'm inclined to define an error symbols here to allow finer control. ;; But error symbols don't seem to be used in the main org code, so I'll ;; forgo them here and use `error' directly. -- CRG 31 July 2012 (defun org-tquery-error (info &rest other-args) "Raise a query parsing error. INFO is an auxilliary message which is appended to the standard message and which is treated as a format string for OTHER-ARGS. This need/should not be capitalized. The end of OTHER-ARGS can contain keyword-value pairs :pos and :type to control the final error message. The former defaults to point (position in the query string) and the latter defaults to `tag'." (let* ((pos (or (cadr (memq :pos other-args)) (point))) (qtype (or (cadr (memq :type other-args)) "tag "))) (error (format "Parse error in %squery at character %d, %s" qtype pos (apply 'format info other-args))))) ;; This is used to define the parser symbol table at compile time. (defmacro org-defhash-at-compile (name options &rest body) "Define a hash table NAME at compile time and/or load-time. OPTIONS is a list, possibly empty, of options to pass to `make-hash-table'. BODY is a list of sequences (lists or vectors), each of which contains a key value pair. The key and the value will be evaluated, so for example, a symbol key should be quoted." (declare (indent 2)) (let* ((docstring (and (stringp (car body)) (car body))) (table (if docstring (cdr body) body))) `(eval-and-compile (defvar ,name (make-hash-table ,@options) ,docstring) ,@(mapcar (lambda (s) `(puthash ,(elt s 0) ,(elt s 1) ,name)) table)))) ;;; Regex comparison functions in a form like the other comparison operators (defun org-string-match= (string regexp) (string-match-p regexp string)) (defun org-string-match<> (string regexp) (not (string-match-p regexp string))) ;;; Parsing utility for readable matchers (org-defhash-at-compile org-tag-query-terminals () "Lexical token regexes for tag query parsing. Hash table also contains the symbols +, * and ? that can be used to represent repetition operators in matched expressions, e.g., (NUMBER)+ or (SELECTOR)\?. (? must be escaped.)" ['TERM-BEGIN (org-re "[-+[:alnum:]_{(]")] ['SELECTOR "[-+]"] ['TAG-IDENT (org-re "[[:alnum:]_@#%:]+")] ['PROP-IDENT (org-re "\\(?:[[:alnum:]_]+\\(?:\\\\-\\)*\\)+")] ['CMP-OP "\\(?:==?\\|<=\\|<>\\|>=\\|<\\|>\\)"] ['MATCH-OP "\\(?:==?\\|<>\\)"] ['CMP-RHS-BEGIN (org-re "\\(?:{\\|\\\"\\|-?[.[:digit:]]\\)")] ['REGEX-OPEN "{"] ['GROUP-OPEN "("] ['GROUP-CLOSE ")"] ['OR-OP "|"] ['AND-OP "&"] ['DATE-STRING "\"[[<].*?[]>]\""] ['INTEGER (org-re "-?[[:digit:]]+")] ['NUMBER "-?\\(?:\\.[0-9]+\\|[0-9]+\\(\\.[0-9]*\\)?\\)\\(?:[eE][-+][0-9]+\\)?"] ['SPACE "[[:blank:]]"] ['SPACE* "[[:blank:]]*"] ['* "*"] ['+ "+"] ['\? "?"]) (org-defhash-at-compile org-tag-query-cmp-ops (:test 'equal) "Maps comparison operator strings to a vector of comparison functions. The functions are arranged in the vector for different types as follows: NUM STRING TIME REGEX" ("=" [= string= org-time= org-string-match=]) ("==" [= string= org-time= org-string-match=]) ("<>" [org<> org-string<> org-time<> org-string-match<>]) ("<" [< string< org-time< nil]) ("<=" [<= org-string<= org-time<= nil]) (">=" [>= org-string>= org-time>= nil]) (">" [> org-string> org-time> nil])) (eval-and-compile ;; I'd prefer gensyms for this, but realistically this will do. (defvar org-tag-query/private-opt-ch- nil "This variable should not be set by the user, even locally. It should remain nil or chaos may ensue.") (defun org-string-as-char-p (s) "Is S a string of a single, possibly backslash-escaped, character? If so, return the character this represents, otherwise nil." (if (stringp s) (let ((len (length s))) (or (and (= 1 len) (string-to-char s)) (and (= 2 len) (char-equal ?\\ (string-to-char s)) (aref s 1)))) nil)) (defun org-tag-query-matcher<- (patterns &optional autogroup terminals next-ch) "Convert a list of PATTERNS to a `looking-at' query. AUTOGROUP, if non-nil, turns on autogrouping of singletons. TERMINALS, if non-nil, should be a hash table for attempted symbol lookup. See `org-match-cond' for more details on PATTERNS's format." (let* ((all-strings (lambda (elements) (let ((ptr elements)) (while (and ptr (stringp (car ptr))) (setq ptr (cdr ptr))) (null ptr)))) (replace-terminal (lambda (item) (cond ((not (hash-table-p terminals)) item) ((symbolp item) (gethash item terminals item)) (t item)))) (push-all (lambda (src dest) (dolist (s src dest) (push s dest)))) (alternatives (let (rhs acc nogroup nocapture) (dolist (p patterns (progn (push (nreverse rhs) acc) (nreverse acc))) (cond ((eq p '|) (if (null rhs) (org-tquery-error "misplaced alternative marker") (push (nreverse rhs) acc) (setq rhs nil))) ((eq p :nogroup) (setq nogroup t)) ((eq p :nocapture) (setq nocapture t)) ((or (consp p) (vectorp p)) ; capturing/noncapturing group resp. (unless nogroup ; allow nogroup/nocapture here for consistency (push (if (or nocapture (vectorp p)) "\\(?:" "\\(") rhs)) (setq rhs (funcall push-all (mapcar replace-terminal p) rhs)) (unless nogroup (push "\\)" rhs)) (setq nogroup nil nocapture nil)) ((and autogroup (not nogroup)) (push (if (or nocapture (vectorp p)) "\\(?:" "\\(") rhs) (push (funcall replace-terminal p) rhs) (push "\\)" rhs) (setq nogroup nil nocapture nil)) (t (push (funcall replace-terminal p) rhs) (setq nogroup nil nocapture nil)))))) (matchers (mapcar (lambda (pterms) (let ((fst (car pterms)) (sch nil)) (cond ((and (cdr pterms) (funcall all-strings pterms)) `(looking-at ,(apply 'concat pterms))) ((cdr pterms) `(looking-at (concat ,@pterms))) ((and next-ch (setq sch (org-string-as-char-p fst))) `(and (char-equal ,next-ch ,sch) (setq org-tag-query/private-opt-ch- ,sch))) (t `(looking-at ,fst))))) alternatives))) (if (cdr matchers) `(or ,@matchers) `,(car matchers))))) (defmacro org-match-cond (options &rest clauses) "Like `cond', but allows special forms in the clauses that lookahead in, extract from, and move in the current buffer. The special forms are described below. OPTIONS, if non-nil, is either a symbol, SYMBOL-TABLE, bound to a hash table for symbol lookup, or a list of the form (SYMBOL-TABLE NEXT-CHAR-SYM &rest BINDINGS). If NEXT-CHAR-SYM is not nil, it should be a symbol that will be bound to the next character and used to optimize the matching of especially simple patterns, as described below. BINDINGS are standard let bindings which will visible in the CLAUSES. CLAUSES are structured like the clauses in `cond'. Lookahead: (<- PATTERN... [| PATTERN...]*) -- symbol lookup in SYMBOL-HASH (<<- PATTERN... [| PATTERN...]*) -- symbol lookup + singleton autogrouping (<<= PATTERN... [| PATTERN...]*) -- no symbol lookup or autogrouping Each pattern in the list is either a string, a symbol, a list of strings and symbols -- representing capturing groups, or a vector of strings and symbols -- representing non-capturing groups. The |'s separate alternatives that are tested lazily (a la `or'); the eventual strings computed for the patterns in each alternative are concatenated together to form a regular expression which is tested with `looking-at'. When symbol lookup is in effect (<- and <<- forms), symbols in a pattern are first looked up in SYMBOL-TABLE if it exists, and replaced with the corresponding value if present. If all the patterns's in an alternative resolve to strings at compile time, the regular expression is computed at compile time and all the forms reduce to a single call to `looking-at'. Otherwise, the regular expression is computed at runtime. In the special case where there are no alternatives, the pattern is a string representing a single character at compile time (one character or a backslash-escaped character), and NEXT-CHAR-SYM is a non-nil symbol, the cond clause is optimized to do a character comparison rather than a `looking-at'. Specifically, NEXT-CHAR-SYM is bound to the character at point before any tests and used via `char-equal' for this optimized match. Movement and string forms with group 0, i.e., (@ end 0), ($ 0), ($$ 0), see below, still work as expected in this case. To suppress the character optimization when NEXT-CHAR-SYM is non-nil, for instance to match a regular expression `.', it is sufficient to include an empty string in the pattern list or to put the term in a non-capturing group.. In the <<- form, each singleton pattern (strings or symbols) is automatically put in a capturing group, unless preceded by :nogroup (inhibiting group) or :nocapture (inhibiting capture). Movement: (@ FROM GROUP [OFFSET]) Moves point to a position relative to a match GROUP, which should be a non-negative integer. This has no effect if the group did not match, except an (@ end 0) always moves to the end of what matched even if the clause was optimized into a character match. FROM can be either the symbol `end' or `begin', which matches the end or beginning of the group, or a function which is called with group as an argument. Optional OFFSET, if non-nil, is added to the specified position. String Extraction: ($ GROUP) The matched string for group GROUP, or nil. A ($ 0) always works, even with character optimization. " (declare (indent 1)) (let* ((opt-listp (consp options)) (sym-table (if opt-listp (car options) options)) (next-char (if opt-listp (cadr options) nil)) (next-sym (if next-char `',next-char nil)) (bindings (if opt-listp (nthcdr 2 options) nil))) `(macrolet ((<- (&rest patterns) ; lookup `,(org-tag-query-matcher<- patterns nil ,sym-table ,next-sym)) (<<- (&rest patterns) ; lookup + autogrouping `,(org-tag-query-matcher<- patterns t ,sym-table ,next-sym)) (<<= (&rest patterns) ; no lookup or autogrouping `,(org-tag-query-matcher<- patterns nil nil ,next-sym)) (@ (from group &optional offset) ; move point relative to group ;; Handling the character-match optimization requires ;; checking if a character match was made so an (@ end 0) ;; moves forward instead of referring to the match ;; data. The idea here is to only check that case when ;; the optimization was requested in the first place ;; and then to do as much work as possible at compile ;; time. Admittedly, this part has gotten a bit crazy, ;; but it does produce good code. -- CRG 02 Aug 2012 ,(if (not next-char) `(let ((ipos (case from (end `(or (match-end ,group) (point))) (begin `(or (match-beginning ,group) (point))) (t `(,from ,group))))) `(goto-char ,(if offset `(+ ,ipos ,offset) ipos))) `(let ((ipos (case from (end `(if org-tag-query/private-opt-ch- ,(cond ((and (integerp group) (zerop group)) '(min (1+ (point)) (point-max))) ((integerp group) '(point)) (t `(min (+ (point) (if (zerop ,group) 1 0)) (point-max)))) (or (match-end ,group) (point)))) (begin `(or (and (not org-tag-query/private-opt-ch-) (match-beginning ,group)) (point))) (t `(,from ,group))))) `(goto-char ,(if offset `(+ ,ipos ,offset) ipos))))) ($ (group) ; string or nil if no match for group ,(if (not next-char) ``(match-string ,group) ``(if org-tag-query/private-opt-ch- ,(if (integerp group) ;; w/literal group, just do it (if (zerop group) ;; likely the common case '(string org-tag-query/private-opt-ch-) nil) `(if (zerop ,group) (string org-tag-query/private-opt-ch-) nil)) (match-string ,group))))) (let (,@bindings ,@(if next-char (list `(,next-char (char-after))) nil) ,@(if next-char (list '(org-tag-query/private-opt-ch- nil)) nil)) (cond ,@clauses))))) ;;; The tag query parser itself ;; Successive matches (\G-style) in a fixed string are not possible in ;; emacs (without making repeated copies of substrings) because there is ;; no way to anchor a string-match at the start position argument. ;; (I have suggested using the zero-length assertion \= in a string to ;; anchor at that position, analogous to its use in buffer searches. If ;; you're with me, spread the word.) ;; ;; So instead of marching through the string copying substrings, we ;; process the query string in a temporary buffer. This is more ;; idiomatic elisp in any case, and is quite fast and convenient it ;; turns out. Doing it the other way is possible as well, by changing ;; the org-match-cond macro and org-tag-query-parse-1 in a few places. ;; But so far I like this approach. (defun org-tag-query-parse (query-string) "Convert an Org tag QUERY-STRING into a matcher lisp form. The matcher is a lisp form" (cond ((or (not query-string) (not (string-match-p "\\S-" query-string))) t) ((string-match "^\\s-*\\([^-+A-Za-z0-9_@%#:{(/]\\)" query-string) (org-tquery-error "invalid characters in query string" :pos (match-beginning 1))) (t (with-temp-buffer (insert query-string) (goto-char (point-min)) (org-tag-query-parse-1))))) (defun org-tag-query-parse-1 () ;; Works in current buffer with string to be parsed starting at point. (let ((parse-stack nil) (paren-count 0) neg-select got-select) (labels ((emit (&rest items) (dolist (item items) (push item parse-stack))) (no-term-p () (symbolp (car parse-stack))) (negate-if (negate item) (if negate `(not ,item) item)) (thread (&rest iterations) (dolist (_ iterations) (let (entries new-entry type) (while (not (symbolp (car parse-stack))) (setq entries (cons (pop parse-stack) entries))) (unless (and entries parse-stack) (org-tquery-error "empty subexpression")) (case (setq type (pop parse-stack)) ((and or) (setq new-entry (if (cdr entries) (cons type entries) (car entries)))) (not (assert (null (cdr entries)) nil "not is unary") (let ((arg (car entries))) (setq new-entry (if (and (consp arg) (eq (car arg) 'not)) (cadr arg) `(not ,arg))))) (identity (assert (null (cdr entries)) nil "() is one expression") (setq new-entry (car entries))) (t ; this really shouldn't happen (org-tquery-error "invalid symbol %s on stack." type))) (emit new-entry)))) (tag-check (id &optional negate) (let ((check `(member ,id tags-list))) (if negate `(not ,check) check))) (prop-check (prop &optional numericp) (cond ((string-equal prop "LEVEL") 'level) ((member prop '("TODO" "HEADING" "PRIORITY")) `(or ,(intern (downcase prop)) "")) ((string-equal prop "CATEGORY") '(or (get-text-property (point) 'org-category) "")) (numericp `(string-to-number (or (org-cached-entry-get nil ,prop) ""))) (t `(or (org-cached-entry-get nil ,prop) ""))))) ;; Seed outermost expression in parse tree (emit 'or 'and) (skip-chars-forward (org-re "[:blank:]")) (while (not (eobp)) ;; Process a term ;; Look for the selector char first (org-match-cond org-tag-query-terminals ((<- (SELECTOR) SPACE*) (@ end 0) (setq got-select t neg-select (char-equal (string-to-char ($ 1)) ?-))) (t (setq got-select nil neg-select nil))) ;; Now look for the rest of the term (org-match-cond (org-tag-query-terminals char-at-point) ((<- (PROP-IDENT) SPACE* (CMP-OP) SPACE* (CMP-RHS-BEGIN)) (@ begin 3) (let* ((prop (save-match-data (replace-regexp-in-string "\\\\-" "-" ($ 1) t t))) (cmp ($ 2)) (indx (case (char-after) (?\{ 3) (?\" (org-match-cond org-tag-query-terminals ((<- DATE-STRING) 2) (t 1))) (t 0))) (op-f (aref (gethash cmp org-tag-query-cmp-ops) indx)) (rhs (case indx (0 (org-match-cond org-tag-query-terminals ((<- NUMBER) (@ end 0) (string-to-number ($ 0))) (t (org-tquery-error "invalid number on rhs of property comparison")))) (1 (org-read-quoted-string-in-query)) (2 (org-matcher-time (org-read-quoted-string-in-query))) (3 (org-read-balanced-string ?\{ ?\})))) (form (list op-f (prop-check prop (zerop indx)) rhs))) (unless op-f (org-tquery-error "invalid operator for property regexp match")) (emit (negate-if neg-select form)))) ((<- TAG-IDENT) (@ end 0) (emit (tag-check ($ 0) neg-select))) ((<- REGEX-OPEN) (let ((regex (org-read-balanced-string ?\{ ?\}))) (emit (negate-if neg-select `(org-match-any-p ,regex tags-list))))) ((<- GROUP-OPEN) (@ end 0) (emit (if neg-select 'not 'identity) 'or 'and) (incf paren-count)) (got-select (org-tquery-error "trailing selector with no term")) ((no-term-p) (org-tquery-error "missing the expected term")) ((<- GROUP-CLOSE) ; end of subexpression, clean up (@ end 0) (decf paren-count) (when (< paren-count 0) (org-tquery-error "mismatched )'s")) (thread 'conjunction 'disjunction 'selector)) ((<- AND-OP) ; continue conjunction, expect a term (@ end 0)) ((<- OR-OP) ; start or continue a disjunction (@ end 0) (thread 'conjunction) (emit 'and)) (t (org-tquery-error "invalid token %c during query parse" (char-after)))) ;; Allow spaces around terms, operators, and parens (skip-chars-forward (org-re "[:blank:]"))) (unless (zerop paren-count) (org-tquery-error "missing )s in query string")) ;; Build the final parse tree by threading the stack (while (cdr parse-stack) (thread 'any)) `(progn (setq org-cached-props nil) ,(car parse-stack))))) ;;; Modified `org-make-tags-matcher' and `org-scan-tags' that use the new parser ;;; The main change to `org-make-tags-matcher' is to insert the new tag ;;; parser, but this allowed me to shorten and clean up the code, fix ;;; one very minor bug (see NOTE below), and update the docstring. ;;; I've also separated out the todo parsing into a separate function ;;; for clarity and symmetry, though I've left the method as is. ;;; ;;; The changes to `org-scan-tags' are minor and essentially free, and ;;; I include the context diff in comments below. ;;; ;;; These both are drawn from the 7.8.11 code. To facilitate testing, ;;; I've added an -NEW to the names of these functions for the moment, ;;; but that should be changed if they are used. The temporary function ;;; `org-tmp-use-tag-parser' allows switching between old and new for ;;; testing. See also the file `tag-query-tests.el' that runs some basic ;;; tests. (defun org-make-tags-matcher-NEW (match) "Create the TAGS/TODO matcher form for the selection string MATCH. The variable `todo-only' is scoped dynamically into this function; it will be set to t if the matcher restricts matching to TODO entries, otherwise will not be touched. Returns a cons of the selection string MATCH and the constructed lisp form implementing the matcher. The matcher is to be evaluated at an Org entry, with point on the headline, and returns t if the entry matches the selection string MATCH. The returned lisp form may reference four variables with information about the entry, which must be bound around the form's evaluation: todo, the TODO keyword at the entry (or nil of none); heading, the text of the heading for the entry; priority, the priority cookie for the entry or nil; and tags-list, the list of all tags at the entry including inherited ones. Additionally, the category of the entry (if any) must be specified as the text property 'org-category on the headline. See also `org-scan-tags'. " (declare (special todo-only)) (unless (boundp 'todo-only) (error "org-make-tags-matcher expects todo-only to be scoped in")) (unless match ;; Get a new match request, with completion (let ((org-last-tags-completion-table (org-global-tags-completion-table))) (setq match (org-completing-read-no-i "Match: " 'org-tags-completion-function nil nil nil 'org-tags-history)))) ;; Parse the string and create a lisp form (let ((match0 match) tagsmatch todomatch) (if (string-match "\\(/\\(!\\)?\\s-*\\)[^{}\"]*$" match) ; See Note h above ;; match contains also a todo-matching request (progn (setq tagsmatch (substring match 0 (match-beginning 1)) todomatch (substring match (match-end 1))) (if (match-end 2) (setq todo-only t)) (if (= (match-end 0) (match-end 1)) ; the space* is greedy (setq todomatch nil))) ;; only matching tags (setq tagsmatch match todomatch nil)) (let* ((tagsmatcher (org-tag-query-parse tagsmatch)) (todomatcher (org-todo-query-parse todomatch)) (matcher (if (eq todomatcher t) ;; NOTE: original kept (and ... t) when no todo matcher -- CRG 31 Jul 2012 tagsmatcher (list 'and tagsmatcher todomatcher)))) (when todo-only (setq matcher (list 'and '(member todo org-not-done-keywords) matcher))) ;; Return the string and lisp forms of the matcher (cons match0 matcher)))) (defun org-todo-query-parse (query-string) (if (or (not query-string) (not (string-match "\\S-" query-string))) t (let ((orterms (org-split-string query-string "|")) (orlist nil) (todomatcher nil) (re (org-re "^&?\\([-+:]\\)?\\({[^}]*}\\|[^-+\"{}&|]+\\)")) term minus kwd re-p mm) (while (setq term (pop orterms)) (while (string-match re term) (setq minus (and (match-end 1) (equal (match-string 1 term) "-")) kwd (match-string 2 term) re-p (equal (string-to-char kwd) ?{) term (substring term (match-end 0)) mm (if re-p `(string-match ,(substring kwd 1 -1) todo) (list 'equal 'todo kwd)) mm (if minus (list 'not mm) mm)) (push mm todomatcher)) (push (if (> (length todomatcher) 1) (cons 'and todomatcher) (car todomatcher)) orlist) (setq todomatcher nil)) (if (> (length orlist) 1) (cons 'or orlist) (car orlist))))) ;; The changes to org-scan-tags are minor and essentially free. ;; A diff -U 2 against org.el from 7.8.11 with only this function ;; changed follows. ;; ;; --- org.el 2012-07-31 15:32:17.000000000 -0400 ;; +++ modified-org.el 2012-07-31 15:20:56.000000000 -0400 ;; @@ -12830,5 +12830,5 @@ ;; " *\\(\\<\\(" ;; (mapconcat 'regexp-quote org-todo-keywords-1 "\\|") ;; - (org-re "\\)\\>\\)? *\\(.*?\\)\\(:[[:alnum:]_@#%:]+:\\)?[ \t]*$"))) ;; + (org-re "\\)\\>\\)?[ \t]*\\(?:\\[#\\(.\\)\\]\\)?[ \t]*\\(.*?\\)\\(:[[:alnum:]_@#%:]+:\\)?[ \t]*$"))) ;; (props (list 'face 'default ;; 'done-face 'org-agenda-done ;; @@ -12848,5 +12848,5 @@ ;; (tags-alist (list (cons 0 org-file-tags))) ;; (llast 0) rtn rtn1 level category i txt ;; - todo marker entry priority) ;; + todo marker entry heading priority priority-num) ;; (when (not (or (member action '(agenda sparse-tree)) (functionp action))) ;; (setq action (list 'lambda nil action))) ;; @@ -12860,5 +12860,7 @@ ;; (catch :skip ;; (setq todo (if (match-end 1) (org-match-string-no-properties 2)) ;; - tags (if (match-end 4) (org-match-string-no-properties 4))) ;; + priority (if (match-end 3) (org-match-string-no-properties 3)) ;; + heading (org-match-string-no-properties 4) ;; + tags (if (match-end 5) (org-match-string-no-properties 5))) ;; (goto-char (setq lspos (match-beginning 0))) ;; (setq level (org-reduced-level (funcall outline-level)) ;; @@ -12938,5 +12940,5 @@ ;; tags-list ;; ) ;; - priority (org-get-priority txt)) ;; + priority-num (org-get-priority txt)) ;; (goto-char lspos) ;; (setq marker (org-agenda-new-marker)) ;; @@ -12944,5 +12946,5 @@ ;; 'org-marker marker 'org-hd-marker marker 'org-category category ;; 'todo-state todo ;; - 'priority priority 'type "tagsmatch") ;; + 'priority priority-num 'type "tagsmatch") ;; (push txt rtn)) ;; ((functionp action) ;; (defun org-scan-tags-NEW (action matcher todo-only &optional start-level) "Scan headline tags with inheritance and produce output ACTION. ACTION can be `sparse-tree' to produce a sparse tree in the current buffer, or `agenda' to produce an entry list for an agenda view. It can also be a Lisp form or a function that should be called at each matched headline, in this case the return value is a list of all return values from these calls. MATCHER is a Lisp form to be evaluated, testing if a given set of tags qualifies a headline for inclusion. When TODO-ONLY is non-nil, only lines with a not-done TODO keyword are included in the output. This should be the same variable that was scoped into and set by `org-make-tags-matcher' when it constructed MATCHER. START-LEVEL can be a string with asterisks, reducing the scope to headlines matching this string." (require 'org-agenda) (let* ((re (concat "^" (if start-level ;; Get the correct level to match (concat "\\*\\{" (number-to-string start-level) "\\} ") org-outline-regexp) " *\\(\\<\\(" (mapconcat 'regexp-quote org-todo-keywords-1 "\\|") (org-re "\\)\\>\\)?[ \t]*\\(?:\\[#\\(.\\)\\]\\)?[ \t]*\\(.*?\\)\\(:[[:alnum:]_@#%:]+:\\)?[ \t]*$"))) (props (list 'face 'default 'done-face 'org-agenda-done 'undone-face 'default 'mouse-face 'highlight 'org-not-done-regexp org-not-done-regexp 'org-todo-regexp org-todo-regexp 'org-complex-heading-regexp org-complex-heading-regexp 'help-echo (format "mouse-2 or RET jump to org file %s" (abbreviate-file-name (or (buffer-file-name (buffer-base-buffer)) (buffer-name (buffer-base-buffer))))))) (case-fold-search nil) (org-map-continue-from nil) lspos tags tags-list (tags-alist (list (cons 0 org-file-tags))) (llast 0) rtn rtn1 level category i txt todo marker entry heading priority priority-num) (when (not (or (member action '(agenda sparse-tree)) (functionp action))) (setq action (list 'lambda nil action))) (save-excursion (goto-char (point-min)) (when (eq action 'sparse-tree) (org-overview) (org-remove-occur-highlights)) (while (re-search-forward re nil t) (setq org-map-continue-from nil) (catch :skip (setq todo (if (match-end 1) (org-match-string-no-properties 2)) priority (if (match-end 3) (org-match-string-no-properties 3)) heading (org-match-string-no-properties 4) tags (if (match-end 5) (org-match-string-no-properties 5))) (goto-char (setq lspos (match-beginning 0))) (setq level (org-reduced-level (funcall outline-level)) category (org-get-category)) (setq i llast llast level) ;; remove tag lists from same and sublevels (while (>= i level) (when (setq entry (assoc i tags-alist)) (setq tags-alist (delete entry tags-alist))) (setq i (1- i))) ;; add the next tags (when tags (setq tags (org-split-string tags ":") tags-alist (cons (cons level tags) tags-alist))) ;; compile tags for current headline (setq tags-list (if org-use-tag-inheritance (apply 'append (mapcar 'cdr (reverse tags-alist))) tags) org-scanner-tags tags-list) (when org-use-tag-inheritance (setcdr (car tags-alist) (mapcar (lambda (x) (setq x (copy-sequence x)) (org-add-prop-inherited x)) (cdar tags-alist)))) (when (and tags org-use-tag-inheritance (or (not (eq t org-use-tag-inheritance)) org-tags-exclude-from-inheritance)) ;; selective inheritance, remove uninherited ones (setcdr (car tags-alist) (org-remove-uninherited-tags (cdar tags-alist)))) (when (and ;; eval matcher only when the todo condition is OK (and (or (not todo-only) (member todo org-not-done-keywords)) (let ((case-fold-search t)) (eval matcher))) ;; Call the skipper, but return t if it does not skip, ;; so that the `and' form continues evaluating (progn (unless (eq action 'sparse-tree) (org-agenda-skip)) t) ;; Check if timestamps are deselecting this entry (or (not todo-only) (and (member todo org-not-done-keywords) (or (not org-agenda-tags-todo-honor-ignore-options) (not (org-agenda-check-for-timestamp-as-reason-to-ignore-todo-item))))) ;; Extra check for the archive tag ;; FIXME: Does the skipper already do this???? (or (not (member org-archive-tag tags-list)) ;; we have an archive tag, should we use this anyway? (or (not org-agenda-skip-archived-trees) (and (eq action 'agenda) org-agenda-archives-mode)))) ;; select this headline (cond ((eq action 'sparse-tree) (and org-highlight-sparse-tree-matches (org-get-heading) (match-end 0) (org-highlight-new-match (match-beginning 1) (match-end 1))) (org-show-context 'tags-tree)) ((eq action 'agenda) (setq txt (org-agenda-format-item "" (concat (if (eq org-tags-match-list-sublevels 'indented) (make-string (1- level) ?.) "") (org-get-heading)) category tags-list ) priority-num (org-get-priority txt)) (goto-char lspos) (setq marker (org-agenda-new-marker)) (org-add-props txt props 'org-marker marker 'org-hd-marker marker 'org-category category 'todo-state todo 'priority priority-num 'type "tagsmatch") (push txt rtn)) ((functionp action) (setq org-map-continue-from nil) (save-excursion (setq rtn1 (funcall action)) (push rtn1 rtn))) (t (error "Invalid action"))) ;; if we are to skip sublevels, jump to end of subtree (unless org-tags-match-list-sublevels (org-end-of-subtree t) (backward-char 1)))) ;; Get the correct position from where to continue (if org-map-continue-from (goto-char org-map-continue-from) (and (= (point) lspos) (end-of-line 1))))) (when (and (eq action 'sparse-tree) (not org-sparse-tree-open-archived-trees)) (org-hide-archived-subtrees (point-min) (point-max))) (nreverse rtn))) ;;; Extras ;; See Note h above. Though it gives a full solution to finding ;; the todo matcher, it is likely not needed in practice, and ;; unless/until that changes, will be removed from the final code. (defun org-find-todo-query (query-string) "Does query string contain a todo match expression? Search for the first / that is not between quotes or braces, and return the index of that character if found, or nil. Set match data for QUERY-STRING so that group 0 spans from the found / to the end of the string, group 1 matches \"/!?\\s-*\" at the found /, and group 2 matches the ! if present." (with-temp-buffer (insert query-string) (goto-char (point-min)) ;; Search for first / that is not between ""'s or {}'s (catch :found-slash (while (re-search-forward "\\(/\\(!\\)?\\s-*\\)\\|[\"{]" nil t) (when (match-end 1) (set-match-data (mapcar '1- ; set indices using string convention (nconc (list (match-beginning 0) (point-max) ;0 / to end (match-beginning 1) (match-end 1)) ;1 /!?\\s-* (if (match-end 2) ;2 !? (list (match-beginning 2) (match-end 2)) nil)))) (throw :found-slash (1- (match-beginning 1)))) (goto-char (match-beginning 0)) (case (char-after) (?\" (org-read-quoted-string-in-query)) (?\{ (org-read-balanced-string ?\{ ?\})))) nil))) ;; Temporary code to help with interactive testing ;; ;; I've added a `-NEW' to the names of the modified functions and save ;; the originals belowo with a `-ORIGINAL' added. After loading this ;; file, you can do ;; ;; (org-tmp-use-tag-parser 'new) ;; and ;; (org-tmp-use-tag-parser 'original) ;; ;; two switch between versions and try them out. Or just use the ;; names with suffixes directly. See also the tests in `tag-query-tests.el'. (fset 'org-scan-tags-ORIGINAL (symbol-function 'org-scan-tags)) (fset 'org-make-tags-matcher-ORIGINAL (symbol-function 'org-make-tags-matcher)) (defvar org-tmp-which-tag-parser 'original) (defun org-tmp-use-tag-parser (&optional which) "Switch between tag query parsers. If non-nil, WHICH must be either 'new or 'original. If nil, it toggles." (setq org-tmp-which-tag-parser (or which (if (eq org-tmp-which-tag-parser 'original) 'new 'original))) (ecase org-tmp-which-tag-parser (new (fset 'org-scan-tags (symbol-function 'org-scan-tags-NEW)) (fset 'org-make-tags-matcher (symbol-function 'org-make-tags-matcher-NEW))) (original (fset 'org-scan-tags (symbol-function 'org-scan-tags-ORIGINAL)) (fset 'org-make-tags-matcher (symbol-function 'org-make-tags-matcher-ORIGINAL)))) org-tmp-which-tag-parser) ;;; org-tag-query-parse.el ends here