emacs-orgmode@gnu.org archives
 help / color / mirror / code / Atom feed
* full parser implementation for tag queries (parentheses, fast heading match, and more)
@ 2012-08-04  7:50 Christopher Genovese
  2012-08-04 10:07 ` Christopher Genovese
  0 siblings, 1 reply; 5+ messages in thread
From: Christopher Genovese @ 2012-08-04  7:50 UTC (permalink / raw)
  To: emacs-orgmode


[-- Attachment #1.1: Type: text/plain, Size: 11154 bytes --]

I am writing an application layer on top of org that uses the
entry mapping API, but I needed both negation of complex
selections and heading searches. Because the current tag query
parser does not handle parenthesized expressions, it does not
allow negating complex queries. At first, I wrote a workaround
solution that mimics/specializes the mapping API, but that
approach seemed inelegant and harder to maintain.

So instead I implemented a full parser for tag queries with a
number of useful features (see the labeled Notes at the bottom
for further comments on these features):

  1. Parenthesized expressions to arbitrary depth are allowed.
  2. A '-' can be used to negate a parenthesized term.             [Note a]
  3. Regex's in {} can contain braces escaped by doubling: {{ }}.  [Note b]
  4. Supports fast property search on HEADING and PRIORITY.        [Note c]
  5. Handles hyphens in property names properly.                   [Note
d,h]
  6. Allows only the proper comparison operators, including ==.    [Note
e,h]
  7. Allows spaces around operators and terms for readability.     [Note f]
  8. Matchers use the original expression order; not a big
     deal, but free.
  9. The error messages during parsing are reasonably helpful.
  10. Several bug fixes and a cleaner `org-make-tags-matcher'.     [Note h]

I'm submitting the code for your consideration, with the
goal of eventually incorporating this into org.el. I would be
happy to hear any comments or suggestions you have. As I'll describe
below, this involves relatively minor changes to two existing
functions and adding a few new support functions. I've attached two
files org-tag-query-parse.el (the code) and tag-query-tests.el (a
collection of tests built on a simple framework). I've also
put the files in http://www.stat.cmu.edu/~genovese/emacs/. The
comments in both files will I hope be helpful.

At the risk of going on too long, I'd like to add a few comments
about the code and tests. First, the two existing functions that
are affected in the code are `org-make-tags-matcher' and
`org-scan-tags'. In the new version of the former, I've extracted
out both kinds of query parsing, leading to a shorter and cleaner
function. The new version of the latter differs in only a couple
*very minor* places that capture two values that were already
being computed anyway (see the diff reproduced in the comments).
Btw, I'm working from the 7.8.11 code.

Loading org-tag-query-parse.el does not change the original
functions. Instead, I've added a `-NEW' to the names of these
functions and saved the originals also with a `-ORIGINAL' added.
After loading the file, you can choose a version to try by doing

    (org-tmp-use-tag-parser 'new)
and
    (org-tmp-use-tag-parser 'original)

or do (org-tmp-use-tag-parser) to toggle between versions.
You can also just use the names with suffixes directly.
I'd also suggest byte-compiling the file.

I think the place to start looking at the code is the new version
of `org-make-tags-matcher'. The main entry function for the new
parser is `org-tag-query-parse', though the real workhorse is
actually the function `org-tag-query-parse-1'. There is also a
new function `org-todo-query-parse' which just extracts the
existing todo matching method. (I didn't do anything with that
method as the manual makes it clear that it is of secondary
importance.) I think the modularity here makes
`org-make-tags-matcher' and each separate parser easier to read
and understand.

The other substantial piece (in terms of lines of code) is a utility
macro `org-match-cond' that is used throughout and makes the main
parser much more readable IMHO. Admittedly, I went a bit
overboard in optimizing it; the first version worked fine
but this one produces really nice code. I'd suggest ignoring this
code (in section "Parsing utility for readable matchers") on
first pass. The docstring is pretty complete, and its use is more
or less self-explanatory. Most of its work is done at compile time.

To run the tests, load org-tag-query-parse.el and tag-query-tests.el
and do

   (tag-test-run :results) ; use :summary for a brief summary of all runs
   (tag-test-other-tests)  ; miscellaneous other tests, including scanning

or name individual suites. They are at the moment:

   (tag-test-run :results 'org-comparison-1)  ; or use :summary
   (tag-test-run :results 'org-comparison-2)
   (tag-test-run :results 'match-results-1)
   (tag-test-run :results 'match-results-2)
   (tag-test-run :results 'should-error-1)

If you have other ideas for tests or find any bugs, please let me
know. Sorry for the homegrown framework; it just sort of grew and
then I was too tired to rewrite the tests. One complication here
is that the original and new algorithms produce different term
orders and use a few different functions. The function
tag-test-transform transforms original results to the new
algorithms conventions, but it does not handle PRIORITY or
HEADING matches at the moment. Use the tree form of the tess (see
match-results-1 for example) on these. Btw, I've run the tests on
GNU Emacs 23.2 and 24.1 (running on OS X lion).

Notes:
   a. There is no need to introduce a new character such as ! for
      negation because the semantics of the - are clear and are
      consistent with its use for tags. A - binds more tightly
      than & which in turn binds more tightly than |. A +
      selector can also be used for positive selection of a
      parenthesized term but it is equivalent to using no
      selector, just as for tags.

   b. Because \'s are so heavily used in regex's and because they
      have to be doubled in strings, using \'s for an additional
      escape layer would be messy, ambiguous, and hard to read.
      Only the {}'s need to be escaped and the doubling escapes
      {{ -> { and }} -> } are simple, readable, and fast to
      parse. For example: "+{abc\\{{3,7\\}}}" gives the regex
      "abc\\{3,7\\}". Parity makes correctness clear at a glance.

   c. Because headline (and priority) searches can be useful and
      powerful, and because the information on those fields is
      *already processed* in `org-scan-tags', we get those
      special searches *essentially for free*, requiring only two
      minor changes to `org-scan-tags'. See the unified diff in
      comments. The special PRIORITY property already exists; I
      added the special HEADING property for these purposes. I'm
      open to changing the name of course, but I do think the
      feature is both useful and elegant. (I'm using it in my
      application, for instance.)

   d. I did not see it in the manual, but I think that property names
      with hyphens should have these \-escaped -'s in the query
      string, with the escaping slashes removed in the produced
      matcher. This is not currently done, but the new version does.
      See Note h for details.

   e. It seems desirable to support both = and == as equality operators
      since the latter is so common by habit. The new version allows
      this explicitly. The original version does as well, but the
      regex for the comparison operator also allows other operators
      <<, ><, >>, =>, and >= as well, which can produce bad matchers.
      See Note h for details.

   f. Currently, spaces are ignored around &, |, the implicit & between
      terms, around the comparison operators in property searches,
      and around +/- selectors. Spaces are not ignored inside {}'s
      for a regexp match.

   g. The current code also allows +/- selectors before property
      comparisons. I don't really like this because
      +PROP<>"something" and -PROP="something" have the same
      meaning but look very different. But the new code does
      support this. As a side note, there's really no need for
      the & characters as +/- serve the and/and-not function
      completely. But again, no prob.

   h. A few bugs detected in the 7.8.11 code:

      + Faulty test for todo matcher in org-make-tags-matcher
        (string-match "/+" match)

        Ex: (org-make-tags-matcher "PROP={^\\s-*// .*$}") produces
        an erroneous matcher:

            ("PROP={^\\s-*// .*$}" progn
             (setq org-cached-props nil)
             (member "PROP" tags-list))

        For all practical purposes it will be enough to do:

         (string-match "\\(/\\(!\\)?\\s-*\\)[^{}\"]*$" match)

        instead of the current test in org-make-tags-matcher.
        This works as long as the TODO keywords do not contain a
        right brace or quotation marks. (In most other cases, the
        new parser should give an error message at parse time.)

        A technicality: this is /not/ a complete solution because
        arbitrary strings can be TODO keywords. For instance,
        both PROP={/!} and PROP="/!{/!}" are valid TODO keywords
        (it works!) *and* valid property comparisons. So, a pattern
        alone is insufficient. We want to find the first slash
        that is not enclosed in {}'s or ""'s; if found, a todo
        match is needed. The function `org-find-todo-query' does
        this and (org-find-todo-query match) can be plugged in
        directly replacing the above (string-match ...) in then
        new `org-make-tags-matcher'.

        But because the todo parsing uses {}'s for regex matches,
        TODO keywords with {}'s are ignored anyway. So there's
        no need to go beyond the fixed string-match above.
        The function `org-todo-query-parse', which handles todo
        parsing in the new version, makes this explicit.

      + Property names with -'s are not handled properly (cf. Note d)

        Specifically, the escapes are not removed. Example:
        (org-make-tags-matcher "PROP\\-WITH\\-HYPHENS=2")
        produces

        ("PROP\\-WITH\\-HYPHENS=2" and
         (progn
         (setq org-cached-props nil)
         (=
          (string-to-number
           (or (org-cached-entry-get nil "PROP\\-WITH\\-HYPHENS")
           ""))
          2))
         t)

        The original code /does/ instead remove -'s from tag
        names, which shouldn't have them anyway. I suspect that
        this was intended for property names rather than tag
        names. The new version fixes up property names but does
        not allow -'s in tags.

      + Incorrect comparison operators allowed (cf. Note e)

        The regular expression used is "[<=>]\\{1,2\\}" is used to
        detect the comparison operators. But this can produce bad
        matchers that fail opaquely at match time rather than
        giving an appropriate error message at parse time.

        Ex: (org-make-tags-matcher "P<<2") produces

         ("P<<2" and
          (progn
            (setq org-cached-props nil)
            (nil
             (string-to-number (or (org-cached-entry-get nil "P") "")) 2))
          t)

        This is fixed in the new version and delivers an error
        message at parse time.

      + missing org-re (line 7179 in org.el) with posix classes

        Minor consistency issue.  This line does not occur in the new
        code.


Thanks and regards,

   Christopher Genovese

[-- Attachment #1.2: Type: text/html, Size: 34952 bytes --]

[-- Attachment #2: org-tag-query-parse.el --]
[-- Type: application/octet-stream, Size: 53001 bytes --]

;;; org-tag-query-parse.el -- proposed full parser for tag queries

;; Copyright (C) 2012, Christopher R. Genovese, all rights reserved.

;; Author:  Christopher Genovese <genovese@cmu.edu>
;; Version: 0.9
;;
;; Created:      Sun 29 Jul 2012 at 10:04 EDT
;; Last-Updated: Fri 03 Aug 2012 at 23:52 EDT
;; Updated By:   Christopher R. Genovese
;; Keywords: org-mode, tags, query, search
;; Package-Requires: ((org-mode 7.8))


;;; Commentary:
;;
;;  The current parser for tag query searches does not handle
;;  parenthesized expressions and thus does not allow negating complex
;;  queries. This code implements a full parser for tag queries with
;;  number of useful features (see the labeled Notes below for further
;;  comments on these features):
;; 
;;   1. Parenthesized expressions to arbitrary depth are allowed.
;;   2. A '-' can be used to negate a parenthesized term.             [Note a]
;;   3. Regex's in {} can contain braces escaped by doubling: {{ }}.  [Note b]
;;   4. Supports fast property search on HEADING and PRIORITY.        [Note c]
;;   5. Handles hyphens in property names properly.                   [Note d]
;;   6. Allows only the proper comparison operators, including ==.    [Note e]
;;   7. Allows spaces around operators and terms for readability.     [Note f]
;;   8. Parse trees use the original expression order; not a big
;;      deal, but free.
;;   9. Returned parse trees are clean, without trivial operators,
;;      and error messages during parsing are reasonably helpful.
;;   10. A few bug fixes and a cleaner `org-make-tags-matcher'.       [Note h]
;;
;;  I propose that this new parser be incorporated into org.el.
;;  
;;  The two existing functions that are affected in the code are
;;  `org-make-tags-matcher' and `org-scan-tags'. In the new version of
;;  the former, I've extracted out both kinds of query parsing,
;;  leading to a shorter and cleaner function. The new version of the
;;  latter differs in only a couple *very minor* places that capture
;;  two values that were already being computed anyway (see the diff
;;  reproduced in the comments).
;;  
;;  Loading org-tag-query-parse.el does not change the original
;;  functions. Instead, I've added a `-NEW' to the names of these
;;  functions and saved the originals also with a `-ORIGINAL' added.
;;  After loading the file, you can choose a version to try by doing
;;  
;;      (org-tmp-use-tag-parser 'new)
;;  and
;;      (org-tmp-use-tag-parser 'original)
;;  
;;  or do (org-tmp-use-tag-parser) to toggle between versions.
;;  You can also just use the names with suffixes directly. 
;;  
;;  I think the place to start looking at the code is the new version
;;  of `org-make-tags-matcher'. The main entry function for the new
;;  parser is `org-tag-query-parse', though the real workhorse is
;;  actually the function `org-tag-query-parse-1'. There is also a
;;  new function `org-todo-query-parse' which just extracts the
;;  existing todo matching method. (I didn't do anything with that
;;  method as the manual makes it clear that it is of secondary
;;  importance.) I think the modularity here makes
;;  `org-make-tags-matcher' and each separate parser easier to read
;;  and understand.
;;  
;;  The other substantial piece (in terms of lines of code) is a utility
;;  macro `org-match-cond' that is used throughout and makes the main
;;  parser much more readable IMHO. Admittedly, I went a bit overboard
;;  in optimizing it; the first version worked fine but this one
;;  produces really nice code. I'd suggest ignoring this code (in
;;  section "Parsing utility for readable matchers") on first pass. The
;;  docstring is pretty complete, and its use is more or less
;;  self-explanatory.
;;  
;;  Notes:
;;    a. There is no need to introduce a new character such as ! for
;;       negation because the semantics of the - are clear and are
;;       consistent with its use for tags. A - binds more tightly
;;       than & which in turn binds more tightly than |. A +
;;       selector can also be used for positive selection of a
;;       parenthesized term but it is equivalent to using no
;;       selector, just as for tags.
;;       
;;    b. Because \'s are so heavily used in regex's and because they
;;       have to be doubled in strings, using \'s for an additional
;;       escape layer would be messy, ambiguous, and hard to read.
;;       Only the {}'s need to be escaped and the doubling escapes
;;       {{ -> { and }} -> } are simple, readable, and fast to
;;       parse. For example: "+{abc\\{{3,7\\}}}" gives the regex
;;       "abc\\{3,7\\}". Parity makes correctness clear at a glance.
;;       
;;    c. Because headline (and priority) searches can be useful and
;;       powerful, and because the information on those fields is
;;       *already processed* in `org-scan-tags', we get those
;;       special searches *essentially for free*, requiring only two
;;       minor changes to `org-scan-tags'. See the unified diff in
;;       comments. The special PRIORITY property already exists; I
;;       added the special HEADING property for these purposes. I'm
;;       open to changing the name of course, but I do think the
;;       feature is very useful.
;; 
;;    d. I did not see it in the manual, but I think that property names
;;       with hyphens should have these \-escaped -'s in the query
;;       string, with the escaping slashes removed in the produced
;;       matcher. This is not currently done, but the new version does.
;;       See Note h for details.
;; 
;;    e. It seems desirable to support both = and == as equality operators
;;       since the latter is so common by habit. The new version allows
;;       this explicitly. The original version does as well, but the
;;       regex for the comparison operator also allows other operators
;;       <<, ><, >>, =>, and >= as well, which can produce bad matchers.
;;       See Note h for details.
;; 
;;    f. Currently, spaces are ignored around &, |, the implicit & between
;;       terms, around the comparison operators in property searches,
;;       and around +/- selectors. Spaces are not ignored inside {}'s
;;       for a regexp match. Truth be told, I prefer having no spaces
;;       after the +/- selectors, but it seems somewhat...harsh to insist
;;       on that for everyone. Live and let live.
;; 
;;    g. The current code also allows +/- selectors before property
;;       comparisons. I don't really like this because
;;       +PROP<>"something" and -PROP="something" have the same
;;       meaning but look very different. But the new code does
;;       support this. As a side note, there's really no need for
;;       the & characters as +/- serve the and/and-not function
;;       completely. But again, no prob.
;; 
;;    h. A few bugs detected in the 7.8.11 code:
;; 
;;       + Faulty test for todo matcher in org-make-tags-matcher
;;         (string-match "/+" match)
;; 
;;         Ex: (org-make-tags-matcher "PROP={^\\s-*// .*$}") produces 
;;         an erroneous matcher:
;; 
;;             ("PROP={^\\s-*// .*$}" progn
;;              (setq org-cached-props nil)
;;              (member "PROP" tags-list))
;; 
;;         For all practical purposes it will be enough to do:
;;         
;;          (string-match "\\(/\\(!\\)?\\s-*\\)[^{}\"]*$" match)
;;               
;;         instead of the current test in org-make-tags-matcher.
;;         This works as long as the TODO keywords do not contain a
;;         right brace or quotation marks. (In most other cases, the
;;         new parser should give an error message at parse time.)
;;         
;;         A technicality: this is /not/ a complete solution because
;;         arbitrary strings can be TODO keywords. For instance,
;;         both PROP={/!} and PROP="/!{/!}" are valid TODO keywords
;;         (it works!) *and* valid property comparisons. So, a pattern
;;         alone is insufficient. We want to find the first slash
;;         that is not enclosed in {}'s or ""'s; if found, a todo
;;         match is needed. The function `org-find-todo-query' does
;;         this and (org-find-todo-query match) can be plugged in
;;         directly replacing the above (string-match ...) in then
;;         new `org-make-tags-matcher'.
;;         
;;         But because the todo parsing uses {}'s for regex matches,
;;         TODO keywords with {}'s are ignored anyway. So there's
;;         no need to go beyond the fixed string-match above.
;;         The function `org-todo-query-parse', which handles todo
;;         parsing in the new version, makes this explicit.
;;         
;;       + Property names with -'s are not handled properly (cf. Note d)
;;         
;;         Specifically, the escapes are not removed. Example:
;;         (org-make-tags-matcher "PROP\\-WITH\\-HYPHENS=2")
;;         produces
;;         
;;         ("PROP\\-WITH\\-HYPHENS=2" and
;;          (progn
;;          (setq org-cached-props nil)
;;          (=
;;           (string-to-number
;;            (or (org-cached-entry-get nil "PROP\\-WITH\\-HYPHENS")
;;            ""))
;;           2))
;;          t)
;;         
;;         The original code /does/ instead remove -'s from tag
;;         names, which shouldn't have them anyway. I suspect that
;;         this was intended for property names rather than tag
;;         names. The new version fixes up property names but does
;;         not allow -'s in tags.
;; 
;;       + Incorrect comparison operators allowed (cf. Note e)
;;         
;;         The regular expression used is "[<=>]\\{1,2\\}" is used to
;;         detect the comparison operators. But this can produce bad
;;         matchers that fail opaquely at match time rather than 
;;         giving an appropriate error message at parse time.
;; 
;;         Ex: (org-make-tags-matcher "P<<2") produces
;; 
;;          ("P<<2" and
;;           (progn
;;             (setq org-cached-props nil)
;;             (nil
;;              (string-to-number (or (org-cached-entry-get nil "P") "")) 2))
;;           t)
;; 
;;         This is fixed in the new version and delivers an error 
;;         message at parse time.
;; 
;;       + missing org-re (line 7179 in org.el) with posix classes
;;         
;;         Minor consistency issue.  This line does not occur in the new
;;         code.
;;
;;  What follows is a grammar for the updated tag query dsl, which is
;;  given in an informal hybrid of BNF and regex operators. Hopefully, it's
;;  clear enough. The non-obvious terminals are ALL CAPS and are listed below.
;; 
;;  Grammar:
;;    Expression  <-  Conjunction (OR Conjunction)*
;;    Conjunction <-  Term (AND? Term)*
;;    Term        <-  SELECTOR? TAG_IDENT
;;                 |  SELECTOR? LBRACE Regex RBRACE
;;                 |  SELECTOR? PROP_IDENT CMP_OP NUMBER
;;                 |  SELECTOR? PROP_IDENT CMP_OP STRING
;;                 |  SELECTOR? PROP_IDENT CMP_OP DATE_STRING
;;                 |  SELECTOR? PROP_IDENT MATCH_OP LBRACE REGEX RBRACE
;;                 |  SELECTOR? LPAREN Expression RPAREN
;;                 
;;  Terminals (the nonobvious ones):
;;    OR = |
;;    AND = &
;;    CMP_OP = (==|=|<=|<>|>=|>|<)
;;    MATCH_OP = (==|=|<>)
;;    TAG_IDENT = [A-Za-z0-9_@#%]+         
;;    PROP_IDENT = ([A-Za-z0-9_]+(\\-)*)+  
;;    SELECTOR? = [-+]?
;;    STRING = balanced double-quoted string with escapes
;;    DATE_STRING = org style date string
;;    REGEXP = regular expression with { and } doubled to escape, {{ and }}
;;


;;; License:
;;
;; This program is free software; you can redistribute it and/or
;; modify it under the terms of the GNU General Public License as
;; published by the Free Software Foundation; either version 3, or
;; (at your option) any later version.
;;
;; This program is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
;; General Public License for more details.
;;
;; You should have received a copy of the GNU General Public License
;; along with this program; see the file COPYING.  If not, write to
;; the Free Software Foundation, Inc., 51 Franklin Street, Fifth
;; Floor, Boston, MA 02110-1301, USA.
;;


;;; Code:

(eval-when-compile
  (require 'cl))
(require 'org)


;;; Utilities

(defun org-read-balanced-string (open-delim close-delim)
  "Return string delimited by OPEN-DELIM and CLOSE-DELIM with escapes.
OPEN-DELIM and CLOSE-DELIM must be *distinct* characters. Reading
begins at point in the current buffer. To include OPEN-DELIM and
CLOSE-DELIM inside the string being read, those characters must
be *doubled* and only one copy of the character is kept in the
string. The opening and closing delimiters for the read sequence
must be single copies, and an unescaped OPEN-DELIM will raise an
error."
  (when (char-equal open-delim close-delim)
    (error "Open and close delimiters must be distinct, %c" open-delim))
  (unless (char-equal (char-after) open-delim)
    (error "Missing opening %c in delimited string" open-delim))
  (forward-char 1) ; skip initial delimiter
  (let ((delim-re (format "[%c%c]" open-delim close-delim))
        (delims-unbalanced t)
        (begin (point))
        (fragments nil)
        (ch nil))
    (while (and delims-unbalanced (re-search-forward delim-re nil t))
      (setq ch (char-before))
      (cond
        ((char-equal ch open-delim)
         (setq ch (char-after))
         (if (not (and ch (char-equal ch open-delim)))
             (error "Unescaped open delimiter %c in balanced string" open-delim)
           (push (buffer-substring-no-properties begin (1- (point))) fragments)
           (setq begin (point))
           (forward-char 1)))
        ((char-equal ch close-delim)
         (setq ch (char-after))
         (if (not (and ch (char-equal ch close-delim)))
             (setq delims-unbalanced nil)
           (push (buffer-substring-no-properties begin (1- (point))) fragments)
           (setq begin (point))
           (forward-char 1)))))
    (when delims-unbalanced
      (error "Unbalanced delimiters %c%c in balanced string at char %d."
             open-delim close-delim (point)))
    (push (buffer-substring-no-properties begin (1- (point))) fragments)
    (if (null (cdr fragments))
        (car fragments)
      (apply 'concat (nreverse fragments)))))

(defun org-read-quoted-string-in-query ()
  "Read a quoted string, with escapes, starting at point.
Assume that an opening quote has already been seen. This is just
a wrapper for `read' that reports errors nicely."
  (let ((start (point)))
    (condition-case exception
        (read (current-buffer))
      (error
       (org-tquery-error
        "badly formed quoted value in property comparison" :pos start)))))

;; I'm inclined to define an error symbols here to allow finer control.
;; But error symbols don't seem to be used in the main org code, so I'll
;; forgo them here and use `error' directly.  -- CRG 31 July 2012
(defun org-tquery-error (info &rest other-args)
  "Raise a query parsing error.
INFO is an auxilliary message which is appended to the standard
message and which is treated as a format string for OTHER-ARGS.
This need/should not be capitalized. The end of OTHER-ARGS can
contain keyword-value pairs :pos <number-or-marker> and :type
<descriptive string> to control the final error message. The
former defaults to point (position in the query string) and the
latter defaults to `tag'."
  (let* ((pos (or (cadr (memq :pos other-args)) (point)))
         (qtype (or (cadr (memq :type other-args)) "tag ")))
    (error (format "Parse error in %squery at character %d, %s"
                   qtype pos (apply 'format info other-args)))))

;; This is used to define the parser symbol table at compile time.
(defmacro org-defhash-at-compile (name options &rest body) 
  "Define a hash table NAME at compile time and/or load-time.
OPTIONS is a list, possibly empty, of options to pass to
`make-hash-table'. BODY is a list of sequences (lists or
vectors), each of which contains a key value pair. The key and
the value will be evaluated, so for example, a symbol key should
be quoted."
  (declare (indent 2))
  (let* ((docstring (and (stringp (car body)) (car body)))
         (table (if docstring (cdr body) body)))
    `(eval-and-compile
       (defvar ,name (make-hash-table ,@options) ,docstring)
       ,@(mapcar (lambda (s) `(puthash ,(elt s 0) ,(elt s 1) ,name)) table))))


;;; Regex comparison functions in a form like the other comparison operators

(defun org-string-match= (string regexp)
  (string-match-p regexp string))

(defun org-string-match<> (string regexp)
  (not (string-match-p regexp string)))


;;; Parsing utility for readable matchers

(org-defhash-at-compile org-tag-query-terminals ()
  "Lexical token regexes for tag query parsing.
Hash table also contains the symbols +, * and ? that can be used
to represent repetition operators in matched expressions,
e.g., (NUMBER)+ or (SELECTOR)\?. (? must be escaped.)"
  ['TERM-BEGIN    (org-re "[-+[:alnum:]_{(]")]
  ['SELECTOR      "[-+]"]
  ['TAG-IDENT     (org-re  "[[:alnum:]_@#%:]+")]
  ['PROP-IDENT    (org-re "\\(?:[[:alnum:]_]+\\(?:\\\\-\\)*\\)+")]
  ['CMP-OP        "\\(?:==?\\|<=\\|<>\\|>=\\|<\\|>\\)"]
  ['MATCH-OP      "\\(?:==?\\|<>\\)"]
  ['CMP-RHS-BEGIN (org-re "\\(?:{\\|\\\"\\|-?[.[:digit:]]\\)")]
  ['REGEX-OPEN    "{"]
  ['GROUP-OPEN    "("]
  ['GROUP-CLOSE   ")"]
  ['OR-OP         "|"]
  ['AND-OP        "&"]
  ['DATE-STRING   "\"[[<].*?[]>]\""]
  ['INTEGER       (org-re "-?[[:digit:]]+")]
  ['NUMBER        "-?\\(?:\\.[0-9]+\\|[0-9]+\\(\\.[0-9]*\\)?\\)\\(?:[eE][-+][0-9]+\\)?"]
  ['SPACE         "[[:blank:]]"]
  ['SPACE*        "[[:blank:]]*"]
  ['*             "*"]
  ['+             "+"]
  ['\?            "?"])

(org-defhash-at-compile org-tag-query-cmp-ops (:test 'equal)
  "Maps comparison operator strings to a vector of comparison functions.
The functions are arranged in the vector for different types as follows:
         NUM   STRING       TIME       REGEX"
  ("="  [=     string=      org-time=  org-string-match=])
  ("==" [=     string=      org-time=  org-string-match=])
  ("<>" [org<> org-string<> org-time<> org-string-match<>])
  ("<"  [<     string<      org-time<  nil])
  ("<=" [<=    org-string<= org-time<= nil])
  (">=" [>=    org-string>= org-time>= nil])
  (">"  [>     org-string>  org-time>  nil]))

(eval-and-compile
  ;; I'd prefer gensyms for this, but realistically this will do.
  (defvar org-tag-query/private-opt-ch- nil
    "This variable should not be set by the user, even locally.
It should remain nil or chaos may ensue.")

  (defun org-string-as-char-p (s)
    "Is S a string of a single, possibly backslash-escaped, character?
If so, return the character this represents, otherwise nil."
    (if (stringp s)
        (let ((len (length s)))
          (or (and (= 1 len) (string-to-char s))
              (and (= 2 len) (char-equal ?\\ (string-to-char s)) (aref s 1))))
      nil))

  (defun org-tag-query-matcher<- (patterns &optional autogroup terminals next-ch)
    "Convert a list of PATTERNS to a `looking-at' query. 
AUTOGROUP, if non-nil, turns on autogrouping of singletons.
TERMINALS, if non-nil, should be a hash table for attempted
symbol lookup. See `org-match-cond' for more details on
PATTERNS's format."
    (let* ((all-strings
            (lambda (elements)
              (let ((ptr elements))
                (while (and ptr (stringp (car ptr)))
                  (setq ptr (cdr ptr)))
                (null ptr))))
           (replace-terminal
            (lambda (item)
              (cond ((not (hash-table-p terminals)) item)
                    ((symbolp item)    (gethash item terminals item))
                    (t                 item))))
           (push-all
            (lambda (src dest) (dolist (s src dest) (push s dest))))
           (alternatives
            (let (rhs acc nogroup nocapture)
              (dolist (p patterns (progn (push (nreverse rhs) acc)
                                         (nreverse acc)))
                (cond 
                 ((eq p '|)
                  (if (null rhs)
                      (org-tquery-error "misplaced alternative marker")
                    (push (nreverse rhs) acc)
                    (setq rhs nil)))
                 ((eq p :nogroup)
                  (setq nogroup t))
                 ((eq p :nocapture)
                  (setq nocapture t))
                 ((or (consp p) (vectorp p)) ; capturing/noncapturing group resp.
                  (unless nogroup ; allow nogroup/nocapture here for consistency
                    (push (if (or nocapture (vectorp p)) "\\(?:" "\\(") rhs))
                  (setq rhs (funcall push-all (mapcar replace-terminal p) rhs))
                  (unless nogroup
                    (push "\\)" rhs))
                  (setq nogroup nil nocapture nil))
                 ((and autogroup (not nogroup))
                  (push (if (or nocapture (vectorp p)) "\\(?:" "\\(") rhs)
                  (push (funcall replace-terminal p) rhs)
                  (push "\\)" rhs)
                  (setq nogroup nil nocapture nil))
                 (t
                  (push (funcall replace-terminal p) rhs)
                  (setq nogroup nil nocapture nil))))))
           (matchers
            (mapcar (lambda (pterms)
                      (let ((fst (car pterms))
                            (sch nil))
                        (cond
                         ((and (cdr pterms) (funcall all-strings pterms))
                          `(looking-at ,(apply 'concat pterms)))
                         ((cdr pterms)
                          `(looking-at (concat ,@pterms)))
                         ((and next-ch (setq sch (org-string-as-char-p fst)))
                          `(and (char-equal ,next-ch ,sch)
                                (setq org-tag-query/private-opt-ch- ,sch)))
                         (t
                          `(looking-at ,fst)))))
                    alternatives)))
      (if (cdr matchers)
          `(or ,@matchers)
        `,(car matchers)))))

(defmacro org-match-cond (options &rest clauses)
  "Like `cond', but allows special forms in the clauses that lookahead in,
extract from, and move in the current buffer. The special forms
are described below. OPTIONS, if non-nil, is either a symbol,
SYMBOL-TABLE, bound to a hash table for symbol lookup, or a list
of the form (SYMBOL-TABLE NEXT-CHAR-SYM &rest BINDINGS). If NEXT-CHAR-SYM
is not nil, it should be a symbol that will be bound to the next character
and used to optimize the matching of especially simple patterns, as described
below. BINDINGS are standard let bindings which will visible in the
CLAUSES. CLAUSES are structured like the clauses in `cond'.

   Lookahead:
     (<-  PATTERN... [| PATTERN...]*)  -- symbol lookup in SYMBOL-HASH
     (<<- PATTERN... [| PATTERN...]*)  -- symbol lookup + singleton autogrouping
     (<<= PATTERN... [| PATTERN...]*)  -- no symbol lookup or autogrouping

     Each pattern in the list is either a string, a symbol, a
     list of strings and symbols -- representing capturing
     groups, or a vector of strings and symbols -- representing
     non-capturing groups. The |'s separate alternatives that are
     tested lazily (a la `or'); the eventual strings computed for
     the patterns in each alternative are concatenated together
     to form a regular expression which is tested with
     `looking-at'. When symbol lookup is in effect (<- and <<-
     forms), symbols in a pattern are first looked up in
     SYMBOL-TABLE if it exists, and replaced with the
     corresponding value if present. If all the patterns's in an
     alternative resolve to strings at compile time, the regular
     expression is computed at compile time and all the forms
     reduce to a single call to `looking-at'. Otherwise, the
     regular expression is computed at runtime.

     In the special case where there are no alternatives, the
     pattern is a string representing a single character at
     compile time (one character or a backslash-escaped
     character), and NEXT-CHAR-SYM is a non-nil symbol, the cond
     clause is optimized to do a character comparison rather than
     a `looking-at'. Specifically, NEXT-CHAR-SYM is bound to the
     character at point before any tests and used via
     `char-equal' for this optimized match. Movement and string
     forms with group 0, i.e., (@ end 0), ($ 0), ($$ 0), see
     below, still work as expected in this case. To suppress the
     character optimization when NEXT-CHAR-SYM is non-nil, for
     instance to match a regular expression `.', it is sufficient
     to include an empty string in the pattern list or to put the
     term in a non-capturing group..

     In the <<- form, each singleton pattern (strings or symbols)
     is automatically put in a capturing group, unless preceded by
     :nogroup (inhibiting group) or :nocapture (inhibiting capture).

   Movement:
     (@ FROM GROUP [OFFSET]) 
        Moves point to a position relative to a match GROUP,
        which should be a non-negative integer. This has no
        effect if the group did not match, except an (@ end 0)
        always moves to the end of what matched even if the
        clause was optimized into a character match. FROM can be
        either the symbol `end' or `begin', which matches the end
        or beginning of the group, or a function which is called
        with group as an argument. Optional OFFSET, if non-nil,
        is added to the specified position.

   String Extraction:
     ($ GROUP)   The matched string for group GROUP, or nil.
                 A ($ 0) always works, even with character optimization.
"
  (declare (indent 1))
  (let* ((opt-listp (consp options))
         (sym-table (if opt-listp (car options)      options))
         (next-char (if opt-listp (cadr options)     nil))
         (next-sym  (if next-char `',next-char nil))
         (bindings  (if opt-listp (nthcdr 2 options) nil)))
   `(macrolet ((<- (&rest patterns)     ; lookup
                   `,(org-tag-query-matcher<- patterns nil ,sym-table ,next-sym))
               (<<- (&rest patterns)    ; lookup + autogrouping
                    `,(org-tag-query-matcher<- patterns t ,sym-table ,next-sym))
               (<<= (&rest patterns)    ; no lookup or autogrouping
                    `,(org-tag-query-matcher<- patterns nil nil ,next-sym))
               (@ (from group &optional offset) ; move point relative to group
                  ;; Handling the character-match optimization requires
                  ;; checking if a character match was made so an (@ end 0)
                  ;; moves forward instead of referring to the match
                  ;; data. The idea here is to only check that case when
                  ;; the optimization was requested in the first place
                  ;; and then to do as much work as possible at compile
                  ;; time. Admittedly, this part has gotten a bit crazy,
                  ;; but it does produce good code. -- CRG 02 Aug 2012
                  ,(if (not next-char)
                       `(let ((ipos
                               (case from
                                 (end `(or (match-end ,group) (point)))
                                 (begin `(or (match-beginning ,group) (point)))
                                 (t `(,from ,group)))))
                          `(goto-char ,(if offset `(+ ,ipos ,offset) ipos)))
                     `(let ((ipos (case from
                                    (end
                                     `(if org-tag-query/private-opt-ch-
                                          ,(cond
                                            ((and (integerp group) (zerop group))
                                             '(min (1+ (point)) (point-max)))
                                            ((integerp group)
                                             '(point))
                                            (t
                                             `(min (+ (point)
                                                      (if (zerop ,group) 1 0))
                                                   (point-max))))
                                        (or (match-end ,group) (point))))
                                    (begin
                                     `(or (and
                                           (not org-tag-query/private-opt-ch-)
                                           (match-beginning ,group))
                                          (point)))
                                    (t `(,from ,group)))))
                        `(goto-char ,(if offset `(+ ,ipos ,offset) ipos)))))
               ($ (group)          ; string or nil if no match for group
                  ,(if (not next-char)
                       ``(match-string ,group)
                     ``(if org-tag-query/private-opt-ch-
                          ,(if (integerp group) ;; w/literal group, just do it
                               (if (zerop group) ;;  likely the common case
                                   '(string org-tag-query/private-opt-ch-) nil)
                             `(if (zerop ,group)
                                  (string org-tag-query/private-opt-ch-) nil))
                        (match-string ,group)))))
      (let (,@bindings
            ,@(if next-char (list `(,next-char (char-after))) nil)
            ,@(if next-char (list '(org-tag-query/private-opt-ch- nil)) nil))
        (cond
         ,@clauses)))))


;;; The tag query parser itself

;; Successive matches (\G-style) in a fixed string are not possible in
;; emacs (without making repeated copies of substrings) because there is
;; no way to anchor a string-match at the start position argument.
;; (I have suggested using the zero-length assertion \= in a string to
;; anchor at that position, analogous to its use in buffer searches. If
;; you're with me, spread the word.)
;;
;; So instead of marching through the string copying substrings, we
;; process the query string in a temporary buffer. This is more
;; idiomatic elisp in any case, and is quite fast and convenient it
;; turns out. Doing it the other way is possible as well, by changing
;; the org-match-cond macro and org-tag-query-parse-1 in a few places.
;; But so far I like this approach.

(defun org-tag-query-parse (query-string)
  "Convert an Org tag QUERY-STRING into a matcher lisp form.
The matcher is a lisp form"
  (cond
   ((or (not query-string) (not (string-match-p "\\S-" query-string)))
    t)
   ((string-match "^\\s-*\\([^-+A-Za-z0-9_@%#:{(/]\\)" query-string)
    (org-tquery-error "invalid characters in query string"
                      :pos (match-beginning 1)))
   (t
    (with-temp-buffer
      (insert query-string)
      (goto-char (point-min))
      (org-tag-query-parse-1)))))

(defun org-tag-query-parse-1 ()
  ;; Works in current buffer with string to be parsed starting at point.
  (let ((parse-stack nil)
        (paren-count 0)
        neg-select got-select)
    (labels
        ((emit (&rest items)
               (dolist (item items) (push item parse-stack)))
         (no-term-p ()
                    (symbolp (car parse-stack)))
         (negate-if (negate item)
                    (if negate `(not ,item) item))
         (thread (&rest iterations)
                 (dolist (_ iterations)
                   (let (entries new-entry type)
                     (while (not (symbolp (car parse-stack)))
                       (setq entries (cons (pop parse-stack) entries)))
                     (unless (and entries parse-stack)
                       (org-tquery-error "empty subexpression"))
                     (case (setq type (pop parse-stack))
                       ((and or)
                        (setq new-entry 
                              (if (cdr entries)
                                  (cons type entries)
                                (car entries))))
                       (not
                        (assert (null (cdr entries)) nil "not is unary")
                        (let ((arg (car entries)))
                          (setq new-entry
                                (if (and (consp arg) (eq (car arg) 'not))
                                    (cadr arg)
                                  `(not ,arg)))))
                       (identity
                        (assert (null (cdr entries)) nil "() is one expression")
                        (setq new-entry (car entries)))
                       (t                  ; this really shouldn't happen
                        (org-tquery-error "invalid symbol %s on stack." type)))
                     (emit new-entry))))
         (tag-check (id &optional negate)
                    (let ((check `(member ,id tags-list)))
                      (if negate `(not ,check) check)))
         (prop-check (prop &optional numericp)
                     (cond
                      ((string-equal prop "LEVEL")
                       'level)
                      ((member prop '("TODO" "HEADING" "PRIORITY"))
                       `(or ,(intern (downcase prop)) ""))
                      ((string-equal prop "CATEGORY")
                       '(or (get-text-property (point) 'org-category) ""))
                       (numericp
                        `(string-to-number
                          (or (org-cached-entry-get nil ,prop) "")))
                       (t
                        `(or (org-cached-entry-get nil ,prop) "")))))
      ;; Seed outermost expression in parse tree
      (emit 'or 'and)
      (skip-chars-forward (org-re "[:blank:]"))

      (while (not (eobp))
        ;; Process a term
        ;;   Look for the selector char first
        (org-match-cond org-tag-query-terminals
          ((<- (SELECTOR) SPACE*)
           (@ end 0)
           (setq got-select t
                 neg-select (char-equal (string-to-char ($ 1)) ?-)))
          (t
           (setq got-select nil
                 neg-select nil)))

        ;;   Now look for the rest of the term
        (org-match-cond (org-tag-query-terminals char-at-point)
         ((<- (PROP-IDENT) SPACE* (CMP-OP) SPACE* (CMP-RHS-BEGIN))
          (@ begin 3)
          (let* ((prop (save-match-data
                         (replace-regexp-in-string
                          "\\\\-" "-" ($ 1) t t)))
                 (cmp  ($ 2))
                 (indx (case (char-after)
                         (?\{ 3)
                         (?\" (org-match-cond org-tag-query-terminals
                                ((<- DATE-STRING) 2)
                                (t 1)))
                         (t 0)))
                 (op-f (aref (gethash cmp org-tag-query-cmp-ops) indx))
                 (rhs
                  (case indx
                    (0 (org-match-cond org-tag-query-terminals
                         ((<- NUMBER) (@ end 0) (string-to-number ($ 0)))
                         (t (org-tquery-error
                             "invalid number on rhs of property comparison"))))
                    (1 (org-read-quoted-string-in-query))
                    (2 (org-matcher-time (org-read-quoted-string-in-query)))
                    (3 (org-read-balanced-string ?\{ ?\}))))
                 (form (list op-f (prop-check prop (zerop indx)) rhs)))
            (unless op-f
              (org-tquery-error "invalid operator for property regexp match"))
            (emit (negate-if neg-select form))))
         ((<- TAG-IDENT)
          (@ end 0)
          (emit (tag-check ($ 0) neg-select)))
         ((<- REGEX-OPEN)
          (let ((regex (org-read-balanced-string ?\{ ?\})))
            (emit (negate-if neg-select `(org-match-any-p ,regex tags-list)))))
         ((<- GROUP-OPEN)
          (@ end 0)
          (emit (if neg-select 'not 'identity) 'or 'and)
          (incf paren-count))
         (got-select
          (org-tquery-error "trailing selector with no term"))
         ((no-term-p)
          (org-tquery-error "missing the expected term"))
         ((<- GROUP-CLOSE)    ; end of subexpression, clean up
          (@ end 0)
          (decf paren-count)
          (when (< paren-count 0) (org-tquery-error "mismatched )'s"))
          (thread 'conjunction 'disjunction 'selector))
         ((<- AND-OP)         ; continue conjunction, expect a term
          (@ end 0))                   
         ((<- OR-OP)          ; start or continue a disjunction
          (@ end 0)
          (thread 'conjunction)
          (emit 'and))
         (t
          (org-tquery-error "invalid token %c during query parse"
                            (char-after))))
        ;; Allow spaces around terms, operators, and parens
        (skip-chars-forward (org-re "[:blank:]")))

      (unless (zerop paren-count)
        (org-tquery-error "missing )s in query string"))

      ;; Build the final parse tree by threading the stack
      (while (cdr parse-stack)
        (thread 'any))

      `(progn
         (setq org-cached-props nil)
         ,(car parse-stack)))))


;;; Modified `org-make-tags-matcher' and `org-scan-tags' that use the new parser

;;; The main change to `org-make-tags-matcher' is to insert the new tag
;;; parser, but this allowed me to shorten and clean up the code, fix
;;; one very minor bug (see NOTE below), and update the docstring.
;;; I've also separated out the todo parsing into a separate function
;;; for clarity and symmetry, though I've left the method as is.
;;; 
;;; The changes to `org-scan-tags' are minor and essentially free, and 
;;; I include the context diff in comments below.
;;;
;;; These both are drawn from the 7.8.11 code. To facilitate testing,
;;; I've added an -NEW to the names of these functions for the moment,
;;; but that should be changed if they are used. The temporary function
;;; `org-tmp-use-tag-parser' allows switching between old and new for
;;; testing. See also the file `tag-query-tests.el' that runs some basic
;;; tests.

(defun org-make-tags-matcher-NEW (match)
  "Create the TAGS/TODO matcher form for the selection string MATCH.

The variable `todo-only' is scoped dynamically into this
function; it will be set to t if the matcher restricts matching
to TODO entries, otherwise will not be touched.

Returns a cons of the selection string MATCH and the constructed
lisp form implementing the matcher. The matcher is to be
evaluated at an Org entry, with point on the headline, and
returns t if the entry matches the selection string MATCH. The
returned lisp form may reference four variables with information
about the entry, which must be bound around the form's
evaluation: todo, the TODO keyword at the entry (or nil of none);
heading, the text of the heading for the entry; priority, the
priority cookie for the entry or nil; and tags-list, the list of
all tags at the entry including inherited ones. Additionally, the
category of the entry (if any) must be specified as the text
property 'org-category on the headline.

See also `org-scan-tags'.
"
  (declare (special todo-only))
  (unless (boundp 'todo-only)
    (error "org-make-tags-matcher expects todo-only to be scoped in"))
  (unless match
    ;; Get a new match request, with completion
    (let ((org-last-tags-completion-table
	   (org-global-tags-completion-table)))
      (setq match (org-completing-read-no-i
		   "Match: " 'org-tags-completion-function nil nil nil
		   'org-tags-history))))

  ;; Parse the string and create a lisp form
  (let ((match0 match)
	tagsmatch todomatch)
    (if (string-match "\\(/\\(!\\)?\\s-*\\)[^{}\"]*$" match) ; See Note h above
	;; match contains also a todo-matching request
	(progn
	  (setq tagsmatch (substring match 0 (match-beginning 1))
		todomatch (substring match (match-end 1)))
	  (if (match-end 2)
	      (setq todo-only t))
	  (if (= (match-end 0) (match-end 1)) ; the space* is greedy
	      (setq todomatch nil)))
      ;; only matching tags
      (setq tagsmatch match todomatch nil))
    (let* ((tagsmatcher (org-tag-query-parse tagsmatch))
           (todomatcher (org-todo-query-parse todomatch))
           (matcher (if (eq todomatcher t) ;; NOTE: original kept (and ... t) when no todo matcher -- CRG 31 Jul 2012
                        tagsmatcher
                      (list 'and tagsmatcher todomatcher))))
      (when todo-only
        (setq matcher (list 'and
                            '(member todo org-not-done-keywords)
                            matcher)))
      ;; Return the string and lisp forms of the matcher
      (cons match0 matcher))))

(defun org-todo-query-parse (query-string)
  (if (or (not query-string) (not (string-match "\\S-" query-string)))
      t
    (let ((orterms (org-split-string query-string "|"))
          (orlist nil)
          (todomatcher nil)
          (re (org-re "^&?\\([-+:]\\)?\\({[^}]*}\\|[^-+\"{}&|]+\\)"))
          term minus kwd re-p mm)
      (while (setq term (pop orterms))
        (while (string-match re term)
          (setq minus (and (match-end 1)
                           (equal (match-string 1 term) "-"))
                kwd (match-string 2 term)
                re-p (equal (string-to-char kwd) ?{)
                term (substring term (match-end 0))
                mm (if re-p
                       `(string-match  ,(substring kwd 1 -1) todo)
                     (list 'equal 'todo kwd))
                mm (if minus (list 'not mm) mm))
          (push mm todomatcher))
        (push (if (> (length todomatcher) 1)
                  (cons 'and todomatcher)
                (car todomatcher))
              orlist)
        (setq todomatcher nil))
      (if (> (length orlist) 1)
          (cons 'or orlist) (car orlist)))))

;; The changes to org-scan-tags are minor and essentially free.
;; A diff -U 2 against org.el from 7.8.11 with only this function
;; changed follows.
;;
;; --- org.el      2012-07-31 15:32:17.000000000 -0400
;; +++ modified-org.el     2012-07-31 15:20:56.000000000 -0400
;; @@ -12830,5 +12830,5 @@
;;                      " *\\(\\<\\("
;;                      (mapconcat 'regexp-quote org-todo-keywords-1 "\\|")
;; -                    (org-re "\\)\\>\\)? *\\(.*?\\)\\(:[[:alnum:]_@#%:]+:\\)?[ \t]*$")))
;; +                    (org-re "\\)\\>\\)?[ \t]*\\(?:\\[#\\(.\\)\\]\\)?[ \t]*\\(.*?\\)\\(:[[:alnum:]_@#%:]+:\\)?[ \t]*$")))
;;          (props (list 'face 'default
;;                       'done-face 'org-agenda-done
;; @@ -12848,5 +12848,5 @@
;;          (tags-alist (list (cons 0 org-file-tags)))
;;          (llast 0) rtn rtn1 level category i txt
;; -        todo marker entry priority)
;; +        todo marker entry heading priority priority-num)
;;      (when (not (or (member action '(agenda sparse-tree)) (functionp action)))
;;        (setq action (list 'lambda nil action)))
;; @@ -12860,5 +12860,7 @@
;;         (catch :skip
;;           (setq todo (if (match-end 1) (org-match-string-no-properties 2))
;; -               tags (if (match-end 4) (org-match-string-no-properties 4)))
;; +                priority (if (match-end 3) (org-match-string-no-properties 3))
;; +                heading (org-match-string-no-properties 4)
;; +               tags (if (match-end 5) (org-match-string-no-properties 5)))
;;           (goto-char (setq lspos (match-beginning 0)))
;;           (setq level (org-reduced-level (funcall outline-level))
;; @@ -12938,5 +12940,5 @@
;;                          tags-list
;;                          )
;; -                   priority (org-get-priority txt))
;; +                   priority-num (org-get-priority txt))
;;               (goto-char lspos)
;;               (setq marker (org-agenda-new-marker))
;; @@ -12944,5 +12946,5 @@
;;                 'org-marker marker 'org-hd-marker marker 'org-category category
;;                 'todo-state todo
;; -               'priority priority 'type "tagsmatch")
;; +               'priority priority-num 'type "tagsmatch")
;;               (push txt rtn))
;;              ((functionp action)
;; 

(defun org-scan-tags-NEW (action matcher todo-only &optional start-level)
  "Scan headline tags with inheritance and produce output ACTION.

ACTION can be `sparse-tree' to produce a sparse tree in the current buffer,
or `agenda' to produce an entry list for an agenda view.  It can also be
a Lisp form or a function that should be called at each matched headline, in
this case the return value is a list of all return values from these calls.

MATCHER is a Lisp form to be evaluated, testing if a given set of tags
qualifies a headline for inclusion.  When TODO-ONLY is non-nil,
only lines with a not-done TODO keyword are included in the output.
This should be the same variable that was scoped into
and set by `org-make-tags-matcher' when it constructed MATCHER.

START-LEVEL can be a string with asterisks, reducing the scope to
headlines matching this string."
  (require 'org-agenda)
  (let* ((re (concat "^"
		     (if start-level
			 ;; Get the correct level to match
			 (concat "\\*\\{" (number-to-string start-level) "\\} ")
		       org-outline-regexp)
		     " *\\(\\<\\("
		     (mapconcat 'regexp-quote org-todo-keywords-1 "\\|")
		     (org-re "\\)\\>\\)?[ \t]*\\(?:\\[#\\(.\\)\\]\\)?[ \t]*\\(.*?\\)\\(:[[:alnum:]_@#%:]+:\\)?[ \t]*$")))
	 (props (list 'face 'default
		      'done-face 'org-agenda-done
		      'undone-face 'default
		      'mouse-face 'highlight
		      'org-not-done-regexp org-not-done-regexp
		      'org-todo-regexp org-todo-regexp
		      'org-complex-heading-regexp org-complex-heading-regexp
		      'help-echo
		      (format "mouse-2 or RET jump to org file %s"
			      (abbreviate-file-name
			       (or (buffer-file-name (buffer-base-buffer))
				   (buffer-name (buffer-base-buffer)))))))
	 (case-fold-search nil)
	 (org-map-continue-from nil)
         lspos tags tags-list
	 (tags-alist (list (cons 0 org-file-tags)))
	 (llast 0) rtn rtn1 level category i txt
	 todo marker entry heading priority priority-num)
    (when (not (or (member action '(agenda sparse-tree)) (functionp action)))
      (setq action (list 'lambda nil action)))
    (save-excursion
      (goto-char (point-min))
      (when (eq action 'sparse-tree)
	(org-overview)
	(org-remove-occur-highlights))
      (while (re-search-forward re nil t)
	(setq org-map-continue-from nil)
	(catch :skip
	  (setq todo (if (match-end 1) (org-match-string-no-properties 2))
                priority (if (match-end 3) (org-match-string-no-properties 3))
                heading (org-match-string-no-properties 4)
		tags (if (match-end 5) (org-match-string-no-properties 5)))
	  (goto-char (setq lspos (match-beginning 0)))
	  (setq level (org-reduced-level (funcall outline-level))
		category (org-get-category))
	  (setq i llast llast level)
	  ;; remove tag lists from same and sublevels
	  (while (>= i level)
	    (when (setq entry (assoc i tags-alist))
	      (setq tags-alist (delete entry tags-alist)))
	    (setq i (1- i)))
	  ;; add the next tags
	  (when tags
	    (setq tags (org-split-string tags ":")
		  tags-alist
		  (cons (cons level tags) tags-alist)))
	  ;; compile tags for current headline
	  (setq tags-list
		(if org-use-tag-inheritance
		    (apply 'append (mapcar 'cdr (reverse tags-alist)))
		  tags)
		org-scanner-tags tags-list)
	  (when org-use-tag-inheritance
	    (setcdr (car tags-alist)
		    (mapcar (lambda (x)
			      (setq x (copy-sequence x))
			      (org-add-prop-inherited x))
			    (cdar tags-alist))))
	  (when (and tags org-use-tag-inheritance
		     (or (not (eq t org-use-tag-inheritance))
			 org-tags-exclude-from-inheritance))
	    ;; selective inheritance, remove uninherited ones
	    (setcdr (car tags-alist)
		    (org-remove-uninherited-tags (cdar tags-alist))))
	  (when (and

		 ;; eval matcher only when the todo condition is OK
		 (and (or (not todo-only) (member todo org-not-done-keywords))
		      (let ((case-fold-search t)) (eval matcher)))

		 ;; Call the skipper, but return t if it does not skip,
		 ;; so that the `and' form continues evaluating
		 (progn
		   (unless (eq action 'sparse-tree) (org-agenda-skip))
		   t)

		 ;; Check if timestamps are deselecting this entry
		 (or (not todo-only)
		     (and (member todo org-not-done-keywords)
			  (or (not org-agenda-tags-todo-honor-ignore-options)
			      (not (org-agenda-check-for-timestamp-as-reason-to-ignore-todo-item)))))

		 ;; Extra check for the archive tag
		 ;; FIXME: Does the skipper already do this????
		 (or
		  (not (member org-archive-tag tags-list))
		  ;; we have an archive tag, should we use this anyway?
		  (or (not org-agenda-skip-archived-trees)
		      (and (eq action 'agenda) org-agenda-archives-mode))))

	    ;; select this headline

	    (cond
	     ((eq action 'sparse-tree)
	      (and org-highlight-sparse-tree-matches
		   (org-get-heading) (match-end 0)
		   (org-highlight-new-match
		    (match-beginning 1) (match-end 1)))
	      (org-show-context 'tags-tree))
	     ((eq action 'agenda)
	      (setq txt (org-agenda-format-item
			 ""
			 (concat
			  (if (eq org-tags-match-list-sublevels 'indented)
			      (make-string (1- level) ?.) "")
			  (org-get-heading))
			 category
			 tags-list
			 )
		    priority-num (org-get-priority txt))
	      (goto-char lspos)
	      (setq marker (org-agenda-new-marker))
	      (org-add-props txt props
		'org-marker marker 'org-hd-marker marker 'org-category category
		'todo-state todo
		'priority priority-num 'type "tagsmatch")
	      (push txt rtn))
	     ((functionp action)
	      (setq org-map-continue-from nil)
	      (save-excursion
		(setq rtn1 (funcall action))
		(push rtn1 rtn)))
	     (t (error "Invalid action")))

	    ;; if we are to skip sublevels, jump to end of subtree
	    (unless org-tags-match-list-sublevels
	      (org-end-of-subtree t)
	      (backward-char 1))))
	;; Get the correct position from where to continue
	(if org-map-continue-from
	    (goto-char org-map-continue-from)
	  (and (= (point) lspos) (end-of-line 1)))))
    (when (and (eq action 'sparse-tree)
	       (not org-sparse-tree-open-archived-trees))
      (org-hide-archived-subtrees (point-min) (point-max)))
    (nreverse rtn)))


;;; Extras

;; See Note h above. Though it gives a full solution to finding
;; the todo matcher, it is likely not needed in practice, and
;; unless/until that changes, will be removed from the final code. 

(defun org-find-todo-query (query-string)
  "Does query string contain a todo match expression? 
Search for the first / that is not between quotes or braces, and
return the index of that character if found, or nil.
Set match data for QUERY-STRING so that group 0 spans from the
found / to the end of the string, group 1 matches \"/!?\\s-*\" at
the found /, and group 2 matches the ! if present."
  (with-temp-buffer
    (insert query-string)
    (goto-char (point-min))
    ;; Search for first / that is not between ""'s or {}'s
    (catch :found-slash
      (while (re-search-forward "\\(/\\(!\\)?\\s-*\\)\\|[\"{]" nil t)
        (when (match-end 1)
          (set-match-data  
           (mapcar '1-    ; set indices using string convention
                   (nconc (list (match-beginning 0) (point-max)    ;0 / to end
                                (match-beginning 1) (match-end 1)) ;1 /!?\\s-*
                          (if (match-end 2)                        ;2 !?
                              (list (match-beginning 2) (match-end 2))
                            nil))))
          (throw :found-slash (1- (match-beginning 1))))
        (goto-char (match-beginning 0))
        (case (char-after)
          (?\" (org-read-quoted-string-in-query))
          (?\{ (org-read-balanced-string ?\{ ?\}))))
      nil)))

;; Temporary code to help with interactive testing
;;
;;  I've added a `-NEW' to the names of the modified functions and save
;;  the originals belowo with a `-ORIGINAL' added. After loading this
;;  file, you can do
;;  
;;      (org-tmp-use-tag-parser 'new)
;;  and
;;      (org-tmp-use-tag-parser 'original)
;;  
;;  two switch between versions and try them out. Or just use the
;;  names with suffixes directly. See also the tests in `tag-query-tests.el'.


(fset 'org-scan-tags-ORIGINAL (symbol-function 'org-scan-tags))
(fset 'org-make-tags-matcher-ORIGINAL (symbol-function 'org-make-tags-matcher))

(defvar org-tmp-which-tag-parser 'original)
(defun org-tmp-use-tag-parser (&optional which)
  "Switch between tag query parsers. 
If non-nil, WHICH must be either 'new or 'original. If nil, it toggles."
  (setq org-tmp-which-tag-parser
        (or which (if (eq org-tmp-which-tag-parser 'original) 'new 'original)))
  (ecase org-tmp-which-tag-parser
    (new
     (fset 'org-scan-tags (symbol-function 'org-scan-tags-NEW))
     (fset 'org-make-tags-matcher (symbol-function 'org-make-tags-matcher-NEW)))
    (original
     (fset 'org-scan-tags (symbol-function 'org-scan-tags-ORIGINAL))
     (fset 'org-make-tags-matcher (symbol-function 'org-make-tags-matcher-ORIGINAL))))
  org-tmp-which-tag-parser)


;;; org-tag-query-parse.el ends here

[-- Attachment #3: tag-query-tests.el --]
[-- Type: application/octet-stream, Size: 21362 bytes --]

;first=>; (load "org-tag-query-parse.el")

(eval-when-compile
  (require 'cl))

;; A very rudimentary test framework

;;; Comparing org-tag-query-parser and org-make-tag-matcher is
;;; complicated by the different ordering of leaves in the trees.
;;; Specifically, the former puts the terms in the given order,
;;; but the latter (the existing org code) reverses the terms.
;;; Parsing the string to reverse would require testing the secondary
;;; parser and turtles all the way down.
;;;
;;; Two approaches then: specify the strings manually in pairs, or
;;; define transform that accounts for the differences. Here
;;; `tag-test-suite' mostly takes the former approach, unless only one
;;; string is given in which case it uses `tag-test-transform' to remap
;;; the original forms. The function `tag-test-transform' tansforms the
;;; existing forms into new forms except it ignores PRIORITY and HEADING
;;; queries which are treated differently in the new code.

(defun tag-test-transform (matcher)
  (let ((spec
         (if (and (eq (cadr matcher) 'and)
                  (eq (car (last matcher)) t))
             (cons (car matcher) (car (cddr matcher)))
           matcher)))
    (if (listp (cdr spec))
        (mapcar 'tag-test-transform-1 spec)
      spec)))

(defun tag-test-transform-1 (spec)
  (if (atom spec)
      spec
    (case (car spec)
      (and
       (cons 'and (nreverse (mapcar 'tag-test-transform-1 (cdr spec)))))
      (or
       (if (and (null (nthcdr 3 spec))
                (equal (car (cddr spec)) ""))
           spec
         (cons 'or (nreverse (mapcar 'tag-test-transform-1 (cdr spec))))))
      (not
       (if (eq (car (cadr spec)) 'string-match)
           (list 'org-string-match<>
                 (car (cddr (cadr spec)))
                 (cadr (cadr spec)))
         spec))
      (string-match
       (list 'org-string-match= (car (cddr spec)) (cadr spec)))
      (t
       (cons (tag-test-transform-1 (car spec))
             (mapcar 'tag-test-transform-1 (cdr spec)))))))

(defun tag-test-m (query &optional originalp)
  "Call `org-make-tags-matcher' on QUERY. 
New version by default, original version if ORIGINALP is non-nil."
  (let ((todo-only nil))
    (funcall
      (if originalp
          'org-make-tags-matcher-ORIGINAL
        'org-make-tags-matcher-NEW)
      query)))

(defun tag-test-parse-tree (query)
  "Return (just) the parse tree for query produced by `org-tag-query-parser'"
  (car (nthcdr 2 (org-tag-query-parse query))))
;; useful at repl: (defun tq (query) (cons query (tag-test-parse-tree query)))


(defun tag-test-compare (new original)
  (if (and (or (eq (cdr new) t) (eq (cadr new) 'progn))
           (eq (cadr original) 'and)
           (eq (car (last original)) t))
      (equal `(and ,(cdr new) t) (cdr original))
    (equal (cdr new) (cdr original))))
;; formerly returned (and __ (format "%s<==>%s" (car new) (car original)))

(org-defhash-at-compile tag-test-suite-table ()
  "Mapping from test name symbols to tag test functions. Each
function takes an optional argument, which if non-nil, causes a
simple boolean summary to be returned. Otherwise, the function
returns the list of results forms. Call with `tag-test-run'
giving name and optional summarize argument.")

(defmacro tag-test-suite (name &rest body)
  "Register test NAME. If NAME is nil, do not save the test, run it
now with summarize argument t."
  (declare (indent 1))
  (flet ((test-it (spec)
                  (let ((query (car spec))
                        (obj   (cdr spec)))
                    (cond
                     ((null obj)
                      `(let ((todo-only nil)) ; needs to be scoped in
                         (condition-case exception
                             (cons
                              (equal
                               (org-make-tags-matcher-NEW ,query)
                               (tag-test-transform
                                (org-make-tags-matcher-ORIGINAL ,query)))
                              ,query)
                           (error (cons nil
                                        (format "ERROR on query %s: %s"
                                                ,query (cadr exception)))))))
                     ((stringp obj)
                      `(let ((todo-only nil)) ; needs to be scoped in
                         (condition-case exception
                             (cons
                              (tag-test-compare
                               (org-make-tags-matcher-NEW ,query)
                               (org-make-tags-matcher-ORIGINAL ,obj))
                              ,query)
                           (error (cons nil
                                        (format "ERROR on query %s: %s"
                                                ,query (cadr exception)))))))
                     ((eq obj 'error)
                      `(condition-case exception
                           (let ((todo-only nil)) ; needs to be scoped in
                             (org-make-tags-matcher-NEW ,query)
                             (cons nil ,query))
                         (error (cons (cadr exception) ,query))))
                     ((eq obj t)
                      `(condition-case exception
                           (cons
                            (equal (tag-test-m ,query)
                                   '(,query . t))
                            ,query)
                         (error (cons nil
                                        (format "ERROR on query %s: %s"
                                                ,query (cadr exception))))))
                     ((and (consp obj)
                           (string-match-p "\\(/\\(!\\)?\\s-*\\)[^{}\"]*$"
                                           query))
                      `(condition-case exception
                           (cons
                            (equal (tag-test-m ,query)
                                   ',(cons query (car obj)))
                            ,query)
                         (error (cons nil
                                        (format "ERROR on query %s: %s"
                                                ,query (cadr exception))))))
                     ((consp obj)
                      `(condition-case exception
                           (cons
                            (equal (tag-test-m ,query)
                                   '(,query progn
                                            (setq org-cached-props nil)
                                            ,(car obj)))
                            ,query)
                         (error (cons nil
                                        (format "ERROR on query %s: %s"
                                                ,query (cadr exception))))))
                     (t (error "Unrecognized test type"))))))
    (let* ((result-forms (mapcar 'test-it body))
           (test `(lambda (&optional summarize?)
                    (let ((tests (list ,@result-forms)))
                      (if summarize?
                          (let ((outcome (catch :failure
                                           (dolist (result tests t)
                                             (unless (car result)
                                               (throw :failure nil))))))
                            (if outcome
                                (message "All %s tests passed." ',name)
                              (message "Some %s tests failed." ',name))
                            outcome)
                        (list ,@result-forms))))))
      (if name
          `(progn (puthash ',name ,test tag-test-suite-table) t)
        `(funcall ,test t)))))

(defun tag-test-run (summarize? &rest suite-symbols)
  (let ((summarize (and summarize? (not (memq summarize? '(:show :results)))))
        (results nil)
        (suites (if suite-symbols
                    suite-symbols
                  (loop for key being the hash-keys of tag-test-suite-table
                        collect key))))
    (dolist (suite suites)
      (condition-case except
          (let ((test (gethash suite tag-test-suite-table)))
            (if (and test (functionp test))
                (push (cons suite (funcall test summarize)) results)
              (push (cons suite
                          (cons nil (format "Test suite %s not found" suite)))
                    results)))
        (error (push (format "Uncaught error on suite %s: %s"
                             suite except)
                     results))))
    (if (cdr results) (nreverse results) (car results))))


;;
;; Some Basic Tests
;;
;; Keep in mind term reversal mentioned above
;;

(tag-test-suite org-comparison-1
  ("")
  ("foo")
  ("-foo")
  ("{^HU+RRAY}")
  ("-{^BO*O!}")
  ("LEVEL<3")
  ("LEVEL>3")
  ("LEVEL<=3")
  ("LEVEL>=3")
  ("LEVEL=3")
  ("LEVEL<>3")
  ("TODO=\"WAIT\"")
  ("TODO<>\"WAIT\"")
  ("A_PROP<\"foo\"")
  ("A_PROP>\"foo\"")
  ("A_PROP<=\"foo\"")
  ("A_PROP>=\"foo\"")
  ("A_PROP<>\"foo\"")
  ("A_PROP=\"foo\"")
  ("A_DATE=\"<2008-12-24 18:30>\"")
  ("A_DATE<\"<2008-12-24 18:30>\"")
  ("A_DATE<=\"<2008-12-24 18:30>\"")
  ("A_DATE>=\"<2008-12-24 18:30>\"")
  ("A_DATE>\"<2008-12-24 18:30>\"")
  ("A_DATE<>\"<2008-12-24 18:30>\"")
  ("DEADLINE<>\"<-2d>\"")
  ("DEADLINE=\"<+1w>\"")
  ("DEADLINE>\"<+60m>\"")
  ("DEADLINE<\"<today>\"")
  ("SCHEDULED>=\"<tomorrow>\"")
  ("SCHEDULED<=\"<+2y>\"")
  ("CATEGORY=\"foo\"")
  ("foo+PROP=\"A\"+Z={abc}-bar")
  ("foo+bar+zap")
  ("foo-bar+zap")
  ("-foo+bar-zap")
  ("foo|bar|-zap")
  ("-foo+bar+zap-{^a.*}")
  ("-{^abc}+{^a}")
  ("-{[0-9]}|zap9@#%it|LEVEL>5")
  ("{^A}|{^.B}|{^C}")
  ("{^A}|ok-zap|{^C}")
  ("work+TODO=\"WAITING\"|home+TODO=\"WAITING\"")
  ("work-TODO=\"WAITING\"|home&TODO=\"WAITING\"")
  ("zap-bar+foo&LEVEL>2")
  ("-zap+bar|LEVEL<=2&TODO<>\"WAIT\"")
  ("+work-boss+PRIORC=\"A\"+Coffee=\"unlimited\"+Effort<2+With={Sarah\\|Denny}+SCHEDULED>=\"<2008-10-11>\"")
  )

;; Some of these are the same as above, but I've explicitly reversed
;; the terms to make sure that the transformer is not masking any problems.
;; This necessarily excludes the other transformations so it is really a meta-test.
;; The rest include todo matchers in the comparison.
(tag-test-suite org-comparison-2
  ("foo+bar+zap" . "zap+bar+foo")
  ("foo-bar+zap" . "zap-bar+foo")
  ("-foo+bar-zap" . "-zap+bar-foo")
  ("foo|bar|-zap" . "-zap|bar|foo")
  ("-foo+bar+zap-{^a.*}" . "-{^a.*}+zap+bar-foo")
  ("-{^abc}+{^a}" . "{^a}-{^abc}")
  ("-{[0-9]}|zap9@#%it|LEVEL>5" . "LEVEL>5|zap9@#%it|-{[0-9]}")
  ("{^A}|{^.B}|{^C}" . "{^C}|{^.B}|{^A}")
  ("{^A}|ok-zap|{^C}" . "{^C}|-zap+ok|{^A}")
  ("work+TODO=\"WAITING\"|home+TODO=\"WAITING\"" . "TODO=\"WAITING\"+home|TODO=\"WAITING\"+work")
  ("work-TODO=\"WAITING\"|home&TODO=\"WAITING\"" . "TODO=\"WAITING\"&home|-TODO=\"WAITING\"+work")
  ("zap -bar   +foo & LEVEL > 2" . "LEVEL>2&foo-bar+zap")
  ("-zap+bar | LEVEL <= 2 & TODO <> \"WAIT\"" . "TODO<>\"WAIT\"&LEVEL<=2|bar-zap")
  ("foo+bar/TODO+WAIT-DONE" . "bar+foo/TODO+WAIT-DONE")
  ("foo+bar/!TODO+WAIT" . "bar+foo/!TODO+WAIT"))

(tag-test-suite match-results-1
  ("" . t)
  ("A_PROP={^0x[0-9A-F]+}"
   (org-string-match=
    (or (org-cached-entry-get nil "A_PROP") "") "^0x[0-9A-F]+"))
  ("A_PROP<>{^[A-Z]+}"
   (org-string-match<>
    (or (org-cached-entry-get nil "A_PROP") "") "^[A-Z]+"))
  ("(a+b-c|A_PROP==2|-(d-e+f&LEVEL>3))"
   (or
    (and
     (member "a" tags-list)
     (member "b" tags-list)
     (not (member "c" tags-list)))
    (= (string-to-number (or (org-cached-entry-get nil "A_PROP") "")) 2)
    (not (and
          (member "d" tags-list)
          (not (member "e" tags-list))
          (member "f" tags-list)
          (> level 3)))))
  ("((c))" (member "c" tags-list))
  ("((-(((-(c))))))" (member "c" tags-list))
  ("((-(-((-(c))))))" (not (member "c" tags-list)))
  ("-(zap -bar   +foo & LEVEL > 2 | HEADING == {Z{{3,7}}})"
   (not (or (and
             (member "zap" tags-list)
             (not (member "bar" tags-list))
             (member "foo" tags-list)
             (> level 2))
            (org-string-match= (or heading "") "Z{3,7}"))))
  ("zap -bar   +foo & LEVEL > 2" 
   (and 
    (member "zap" tags-list)
    (not (member "bar" tags-list))
    (member "foo" tags-list)
    (> level 2)))
  ("-zap+bar | LEVEL <= 2 & TODO <> \"WAIT\"" 
   (or
    (and
     (not (member "zap" tags-list))
     (member "bar" tags-list))
    (and
     (<= level 2)
     (org-string<> (or todo "")  "WAIT"))))
  ("-(zap|{^A}|LEVEL=2)"
   (not (or
         (member "zap" tags-list)
         (org-match-any-p "^A" tags-list)
         (= level 2))))
  ("/!TODO"
   (and
    (member todo org-not-done-keywords)
    (and t (equal todo "TODO"))))
  ("/TODO"
   (and t (equal todo "TODO")))
  ("/!"
   (and (member todo org-not-done-keywords) t))
  ("abc-uvw+xyz/!TODO"
   (and
    (member todo org-not-done-keywords)
    (and
     (progn
       (setq org-cached-props nil)
       (and
        (member "abc" tags-list)
        (not (member "uvw" tags-list))
        (member "xyz" tags-list)))
     (equal todo "TODO"))))
  ("abc-uvw+xyz+LEVEL<=3/TODO|HOLDING-WAITING|AVOIDING-REALLY_AVOIDING" 
   (and
    (progn
      (setq org-cached-props nil)
      (and
       (member "abc" tags-list)
       (not (member "uvw" tags-list))
       (member "xyz" tags-list)
       (<= level 3)))
    (or
     (and
      (not (equal todo "REALLY_AVOIDING"))
      (equal todo "AVOIDING"))
     (and
      (not (equal todo "WAITING"))
      (equal todo "HOLDING"))
     (equal todo "TODO")))))

(tag-test-suite match-results-2
  ("((a +b -c & LEVEL > 2 | c & LEVEL == 1)-(HEADING={<NOTES>}|TODO=\"DONE\")|urgent|_queue&DEADLINE>\"<2012-01-01 04:00>\")"
   (or
    (and
     (or (and
          (member "a" tags-list)
          (member "b" tags-list)
          (not (member "c" tags-list))
          (> level 2))
         (and
          (member "c" tags-list)
          (= level 1)))
     (not
      (or (org-string-match= (or heading "") "<NOTES>")
          (string= (or todo "") "DONE"))))
    (member "urgent" tags-list)
    (and
     (member "_queue" tags-list)
     (org-time> (or (org-cached-entry-get nil "DEADLINE") "") 1325394000.0))))
  ("+you+me-them&PRIORITY==\"A\"+CATEGORY<>\"missing\""
   (and
    (member "you" tags-list)
    (member "me" tags-list)
    (not (member "them" tags-list))
    (string= (or priority "") "A")
    (org-string<>
     (or (get-text-property (point) 'org-category) "") "missing")))
  ("+you+me-them & PRIORITY == \"A\" + CATEGORY <> \"missing\""
   (and
    (member "you" tags-list)
    (member "me" tags-list)
    (not (member "them" tags-list))
    (string= (or priority "") "A")
    (org-string<> (or (get-text-property (point) 'org-category) "") "missing")))
  ("+you+me-them & PRIORITY < \"A\" | CATEGORY <> \"missing\" + us | HEADING={\\(?:[Ss]ecret\\){{1,3}}}"
   (or
    (and
     (member "you" tags-list)
     (member "me" tags-list)
     (not (member "them" tags-list))
     (string< (or priority "") "A"))
    (and
     (org-string<> (or (get-text-property (point) 'org-category) "")
                   "missing")
     (member "us" tags-list))
    (org-string-match= (or heading "") "\\(?:[Ss]ecret\\){1,3}")))
  ("PROP\\-WITH\\-HYPHENS=2"
   (=
    (string-to-number
     (or
      (org-cached-entry-get nil "PROP-WITH-HYPHENS")
      ""))
    2))
  ("PROP={^\\s-*// .*$}"
   (org-string-match=
    (or (org-cached-entry-get nil "PROP") "")
    "^\\s-*// .*$"))
  )

(tag-test-suite should-error-1
  ("()" . error) ; we might just do t here, but the parens suggest an error
  ("(&foo+LEVEL=1)" . error)
  ("(foo+LEVEL=1" . error)
  ("(foo+LEVEL=1))" . error)
  ("(abc+)" . error)
  ("abc+xyz!" . error)
  ("(missing+paren" . error)            
  ("PROP={.*closing brace?" . error) 
  ("PROP=\"abc" . error)
  ("P<<2-bad+cmp+op" . error)
  ("P<={^foo}" . error)
  ("PROP<-.dx" . error)           ; bad number
  ("PROP={^\\s-*// .*$}/A{}B" . error) ; See Note h for why this should raise an error
  ("!?;" . error)
  ("   [x]" . error)
  )

;;; Miscellaneous Other tests -- really need a complete framework here.

(defun tag-test-find-todo-query (s)
  (if (org-find-todo-query s)
      (vector (substring s 0 (match-beginning 1))
              (substring s (match-end 1))
              (match-end 2)
              (= (match-end 0) (match-end 1)))
    (vector s nil nil nil)))

(defun tag-test-approximate-todo-check (s)
  (if (string-match "\\(/\\(!\\)?\\s-*\\)[^{}\"]*$" s)
      (vector (substring s 0 (match-beginning 1))
              (substring s (match-end 1))
              (match-end 2)
              (= (match-end 0) (match-end 1)))
    (vector s nil nil nil)))

(defun tqs (q)
  (let ((todo-only nil))
    (cdr (org-make-tags-matcher-NEW q))))

(defun tag-test-scan (s point-list &optional stay-put)
  (prog1
      (if (atom point-list)
          (org-scan-tags-NEW 'point (tqs s) nil)
        (cons (equal (org-scan-tags-NEW 'point (tqs s) nil) point-list) s))
    (unless stay-put
      (goto-char (point-min)))))

(defun tag-test-scan-do-tests (tests)
  (let ((contents
         "* [#B] A heading One                                                       :xyz:
** Put a One here too at level Two
*** And a level Three, also One   
* [#C] B heading Two                                                       :uvw:
** Another                                                                 :abc:
* [#A] C heading Three                                                     :xyz:
* <NOTES>                                                                  :wtf:
** More more more
*** TODO Onward
*** TODO Upward
*** WATT What comes up...
*** DONE Gliding home
**** Just a test    
* [#A] D heading Four                                                      :uvw:
* TODO E heading Five                                              :abc:uvw:xyz:
  SCHEDULED: <2012-07-31 Tue>
** Answers
   + xyz: 1 314 676
   + xyz-abc: 1 314  
   + Priority A: 314 595
   + Priority not empty: 1 152 314 595
   + Priority empty, Level 1: 395 676
   + Scheduled after <2012-07-01 00:00>: 676
   + TODO=\"TODO\": 494 510 676
   + TODO=\"TODO\", Level>1: 494 510
   + HEADING={<.*>}: 395
   + HEADING={One\\|Two}, Level <= 2: 1 82 152
"))
    (with-temp-buffer
      (set (make-local-variable 'org-tags-column) -80)
      (org-mode)
      (insert contents)
      (goto-char (point-min))
      (mapcar
       (lambda (v) (tag-test-scan (car v) (cdr v)))
       tests))))

(defun tag-test-other-tests ()
  (list
   ;; Testing the org-find-todo-query function
   (mapcar
    (lambda (s-a)
      (cons
       (equal (tag-test-find-todo-query (car s-a)) (cadr s-a))
       (car s-a))) 
    '(("foo+bar/A-B+C"                             ["foo+bar" "A-B+C" nil nil])
      ("foo+bar/"                                  ["foo+bar" "" nil t])
      ("foo+bar/         "                         ["foo+bar" "" nil t])
      ("foo+bar/         a"                        ["foo+bar" "a" nil nil])
      ("foo+bar/!A-B+C"                            ["foo+bar" "A-B+C" 9 nil])
      ("foo+bar/!"                                 ["foo+bar" "" 9 t])
      ("foo+bar/!         "                        ["foo+bar" "" 9 t])
      ("foo+bar/!         a"                       ["foo+bar" "a" 9 nil])
      ("PROP={^\\s-*// .*$}"                       ["PROP={^\\s-*// .*$}" nil nil nil])
      ("PROP={^\\s-*// .*$}+A=\"/!A-B\"/!A-B"      ["PROP={^\\s-*// .*$}+A=\"/!A-B\"" "A-B" 30 nil])
      ("PROP={^\\s-*// .*$}+A=\"/!A-B\"/!        " ["PROP={^\\s-*// .*$}+A=\"/!A-B\"" "" 30 t])))
   ;; Testing that the todo queries work with the existing string-match
   (mapcar
    (lambda (s-a)
      (cons
       (equal (tag-test-approximate-todo-check (car s-a)) (cadr s-a))
       (car s-a))) 
    '(("foo+bar/A-B+C"                             ["foo+bar" "A-B+C" nil nil])
      ("foo+bar/"                                  ["foo+bar" "" nil t])
      ("foo+bar/         "                         ["foo+bar" "" nil t])
      ("foo+bar/         a"                        ["foo+bar" "a" nil nil])
      ("foo+bar/!A-B+C"                            ["foo+bar" "A-B+C" 9 nil])
      ("foo+bar/!"                                 ["foo+bar" "" 9 t])
      ("foo+bar/!         "                        ["foo+bar" "" 9 t])
      ("foo+bar/!         a"                       ["foo+bar" "a" 9 nil])
      ("PROP={^\\s-*// .*$}"                       ["PROP={^\\s-*// .*$}" nil nil nil])
      ("PROP={^\\s-*// .*$}+A=\"/!A-B\"/!A-B"      ["PROP={^\\s-*// .*$}+A=\"/!A-B\"" "A-B" 30 nil])
      ("PROP={^\\s-*// .*$}+A=\"/!A-B\"/!        " ["PROP={^\\s-*// .*$}+A=\"/!A-B\"" "" 30 t])))
   ;; Testing that the matchers work with org-scan-tags
   (let ((scan-tests '(("xyz+LEVEL=1" 1 314 676)
                       ("xyz-abc+LEVEL=1" 1 314)
                       ("xyz" 1 82 117 314 676 787)
                       ("xyz-abc" 1 82 117 314)
                       ("HEADING={<.*>}" 395)
                       ("HEADING={One\\|Two} & LEVEL <= 2" 1 82 152)
                       ("PRIORITY=\"A\"" 314 595)
                       ("PRIORITY<>\"\"" 1 152 314 595)
                       ("PRIORITY=\"\" + LEVEL == 1" 395 676)
                       ("SCHEDULED>=\"<2012-07-01 00:00>\"" 676)
                       ("TODO=\"TODO\"" 494 510 676)
                       ("TODO=\"TODO\"  & LEVEL > 1" 494 510))))
     (tag-test-scan-do-tests scan-tests))))


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: full parser implementation for tag queries (parentheses, fast heading match, and more)
  2012-08-04  7:50 full parser implementation for tag queries (parentheses, fast heading match, and more) Christopher Genovese
@ 2012-08-04 10:07 ` Christopher Genovese
  2012-08-04 18:32   ` Christopher Genovese
  0 siblings, 1 reply; 5+ messages in thread
From: Christopher Genovese @ 2012-08-04 10:07 UTC (permalink / raw)
  To: emacs-orgmode

[-- Attachment #1: Type: text/plain, Size: 12596 bytes --]

A small addendum:

Right after I posted (of course!), I noticed both a small mistake and an
opportunity for
simplification, relating to detecting and processing the todo expressions
after a /.
Specifically, the approximate fix I proposed to the bug in the 7.8 code is
insufficient
to handle regexp matching in todo expressions. The full solution using the
new function org-find-todo-query is needed in the code as I have it, and
that can just
be plugged in instead of the string-match. (See Note h in the original
post, specifically
the string-match given there, and org-find-todo-query.)

But I realized that all that is unnecessary as the todo processing can be
easily built
into the new parser. I've mostly done that just now and will test and post
an update
Saturday, er...today.

Sorry to muddy the waters by not catching that earlier, and sorrier still
if I'm rambling.
Off to bed...

 -- Christopher


On Sat, Aug 4, 2012 at 3:50 AM, Christopher Genovese <genovese@cmu.edu>wrote:

> I am writing an application layer on top of org that uses the
> entry mapping API, but I needed both negation of complex
> selections and heading searches. Because the current tag query
> parser does not handle parenthesized expressions, it does not
> allow negating complex queries. At first, I wrote a workaround
> solution that mimics/specializes the mapping API, but that
> approach seemed inelegant and harder to maintain.
>
> So instead I implemented a full parser for tag queries with a
> number of useful features (see the labeled Notes at the bottom
> for further comments on these features):
>
>   1. Parenthesized expressions to arbitrary depth are allowed.
>   2. A '-' can be used to negate a parenthesized term.             [Note a]
>   3. Regex's in {} can contain braces escaped by doubling: {{ }}.  [Note b]
>   4. Supports fast property search on HEADING and PRIORITY.        [Note c]
>   5. Handles hyphens in property names properly.                   [Note
> d,h]
>   6. Allows only the proper comparison operators, including ==.    [Note
> e,h]
>   7. Allows spaces around operators and terms for readability.     [Note f]
>   8. Matchers use the original expression order; not a big
>      deal, but free.
>   9. The error messages during parsing are reasonably helpful.
>   10. Several bug fixes and a cleaner `org-make-tags-matcher'.     [Note h]
>
> I'm submitting the code for your consideration, with the
> goal of eventually incorporating this into org.el. I would be
> happy to hear any comments or suggestions you have. As I'll describe
> below, this involves relatively minor changes to two existing
> functions and adding a few new support functions. I've attached two
> files org-tag-query-parse.el (the code) and tag-query-tests.el (a
> collection of tests built on a simple framework). I've also
> put the files in http://www.stat.cmu.edu/~genovese/emacs/. The
> comments in both files will I hope be helpful.
>
> At the risk of going on too long, I'd like to add a few comments
> about the code and tests. First, the two existing functions that
> are affected in the code are `org-make-tags-matcher' and
> `org-scan-tags'. In the new version of the former, I've extracted
> out both kinds of query parsing, leading to a shorter and cleaner
> function. The new version of the latter differs in only a couple
> *very minor* places that capture two values that were already
> being computed anyway (see the diff reproduced in the comments).
> Btw, I'm working from the 7.8.11 code.
>
> Loading org-tag-query-parse.el does not change the original
> functions. Instead, I've added a `-NEW' to the names of these
> functions and saved the originals also with a `-ORIGINAL' added.
> After loading the file, you can choose a version to try by doing
>
>     (org-tmp-use-tag-parser 'new)
> and
>     (org-tmp-use-tag-parser 'original)
>
> or do (org-tmp-use-tag-parser) to toggle between versions.
> You can also just use the names with suffixes directly.
> I'd also suggest byte-compiling the file.
>
> I think the place to start looking at the code is the new version
> of `org-make-tags-matcher'. The main entry function for the new
> parser is `org-tag-query-parse', though the real workhorse is
> actually the function `org-tag-query-parse-1'. There is also a
> new function `org-todo-query-parse' which just extracts the
> existing todo matching method. (I didn't do anything with that
> method as the manual makes it clear that it is of secondary
> importance.) I think the modularity here makes
> `org-make-tags-matcher' and each separate parser easier to read
> and understand.
>
> The other substantial piece (in terms of lines of code) is a utility
> macro `org-match-cond' that is used throughout and makes the main
> parser much more readable IMHO. Admittedly, I went a bit
> overboard in optimizing it; the first version worked fine
> but this one produces really nice code. I'd suggest ignoring this
> code (in section "Parsing utility for readable matchers") on
> first pass. The docstring is pretty complete, and its use is more
> or less self-explanatory. Most of its work is done at compile time.
>
> To run the tests, load org-tag-query-parse.el and tag-query-tests.el
> and do
>
>    (tag-test-run :results) ; use :summary for a brief summary of all runs
>    (tag-test-other-tests)  ; miscellaneous other tests, including scanning
>
> or name individual suites. They are at the moment:
>
>    (tag-test-run :results 'org-comparison-1)  ; or use :summary
>    (tag-test-run :results 'org-comparison-2)
>    (tag-test-run :results 'match-results-1)
>    (tag-test-run :results 'match-results-2)
>    (tag-test-run :results 'should-error-1)
>
> If you have other ideas for tests or find any bugs, please let me
> know. Sorry for the homegrown framework; it just sort of grew and
> then I was too tired to rewrite the tests. One complication here
> is that the original and new algorithms produce different term
> orders and use a few different functions. The function
> tag-test-transform transforms original results to the new
> algorithms conventions, but it does not handle PRIORITY or
> HEADING matches at the moment. Use the tree form of the tess (see
> match-results-1 for example) on these. Btw, I've run the tests on
> GNU Emacs 23.2 and 24.1 (running on OS X lion).
>
> Notes:
>    a. There is no need to introduce a new character such as ! for
>       negation because the semantics of the - are clear and are
>       consistent with its use for tags. A - binds more tightly
>       than & which in turn binds more tightly than |. A +
>       selector can also be used for positive selection of a
>       parenthesized term but it is equivalent to using no
>       selector, just as for tags.
>
>    b. Because \'s are so heavily used in regex's and because they
>       have to be doubled in strings, using \'s for an additional
>       escape layer would be messy, ambiguous, and hard to read.
>       Only the {}'s need to be escaped and the doubling escapes
>       {{ -> { and }} -> } are simple, readable, and fast to
>       parse. For example: "+{abc\\{{3,7\\}}}" gives the regex
>       "abc\\{3,7\\}". Parity makes correctness clear at a glance.
>
>    c. Because headline (and priority) searches can be useful and
>       powerful, and because the information on those fields is
>       *already processed* in `org-scan-tags', we get those
>       special searches *essentially for free*, requiring only two
>       minor changes to `org-scan-tags'. See the unified diff in
>       comments. The special PRIORITY property already exists; I
>       added the special HEADING property for these purposes. I'm
>       open to changing the name of course, but I do think the
>       feature is both useful and elegant. (I'm using it in my
>       application, for instance.)
>
>    d. I did not see it in the manual, but I think that property names
>       with hyphens should have these \-escaped -'s in the query
>       string, with the escaping slashes removed in the produced
>       matcher. This is not currently done, but the new version does.
>       See Note h for details.
>
>    e. It seems desirable to support both = and == as equality operators
>       since the latter is so common by habit. The new version allows
>       this explicitly. The original version does as well, but the
>       regex for the comparison operator also allows other operators
>       <<, ><, >>, =>, and >= as well, which can produce bad matchers.
>       See Note h for details.
>
>    f. Currently, spaces are ignored around &, |, the implicit & between
>       terms, around the comparison operators in property searches,
>       and around +/- selectors. Spaces are not ignored inside {}'s
>       for a regexp match.
>
>    g. The current code also allows +/- selectors before property
>       comparisons. I don't really like this because
>       +PROP<>"something" and -PROP="something" have the same
>       meaning but look very different. But the new code does
>       support this. As a side note, there's really no need for
>       the & characters as +/- serve the and/and-not function
>       completely. But again, no prob.
>
>    h. A few bugs detected in the 7.8.11 code:
>
>       + Faulty test for todo matcher in org-make-tags-matcher
>         (string-match "/+" match)
>
>         Ex: (org-make-tags-matcher "PROP={^\\s-*// .*$}") produces
>         an erroneous matcher:
>
>             ("PROP={^\\s-*// .*$}" progn
>              (setq org-cached-props nil)
>              (member "PROP" tags-list))
>
>         For all practical purposes it will be enough to do:
>
>          (string-match "\\(/\\(!\\)?\\s-*\\)[^{}\"]*$" match)
>
>         instead of the current test in org-make-tags-matcher.
>         This works as long as the TODO keywords do not contain a
>         right brace or quotation marks. (In most other cases, the
>         new parser should give an error message at parse time.)
>
>         A technicality: this is /not/ a complete solution because
>         arbitrary strings can be TODO keywords. For instance,
>         both PROP={/!} and PROP="/!{/!}" are valid TODO keywords
>         (it works!) *and* valid property comparisons. So, a pattern
>         alone is insufficient. We want to find the first slash
>         that is not enclosed in {}'s or ""'s; if found, a todo
>         match is needed. The function `org-find-todo-query' does
>         this and (org-find-todo-query match) can be plugged in
>         directly replacing the above (string-match ...) in then
>         new `org-make-tags-matcher'.
>
>         But because the todo parsing uses {}'s for regex matches,
>         TODO keywords with {}'s are ignored anyway. So there's
>         no need to go beyond the fixed string-match above.
>         The function `org-todo-query-parse', which handles todo
>         parsing in the new version, makes this explicit.
>
>       + Property names with -'s are not handled properly (cf. Note d)
>
>         Specifically, the escapes are not removed. Example:
>         (org-make-tags-matcher "PROP\\-WITH\\-HYPHENS=2")
>         produces
>
>         ("PROP\\-WITH\\-HYPHENS=2" and
>          (progn
>          (setq org-cached-props nil)
>          (=
>           (string-to-number
>            (or (org-cached-entry-get nil "PROP\\-WITH\\-HYPHENS")
>            ""))
>           2))
>          t)
>
>         The original code /does/ instead remove -'s from tag
>         names, which shouldn't have them anyway. I suspect that
>         this was intended for property names rather than tag
>         names. The new version fixes up property names but does
>         not allow -'s in tags.
>
>       + Incorrect comparison operators allowed (cf. Note e)
>
>         The regular expression used is "[<=>]\\{1,2\\}" is used to
>         detect the comparison operators. But this can produce bad
>         matchers that fail opaquely at match time rather than
>         giving an appropriate error message at parse time.
>
>         Ex: (org-make-tags-matcher "P<<2") produces
>
>          ("P<<2" and
>           (progn
>             (setq org-cached-props nil)
>             (nil
>              (string-to-number (or (org-cached-entry-get nil "P") "")) 2))
>           t)
>
>         This is fixed in the new version and delivers an error
>         message at parse time.
>
>       + missing org-re (line 7179 in org.el) with posix classes
>
>         Minor consistency issue.  This line does not occur in the new
>         code.
>
>
> Thanks and regards,
>
>    Christopher Genovese
>
>
>

[-- Attachment #2: Type: text/html, Size: 36503 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: full parser implementation for tag queries (parentheses, fast heading match, and more)
  2012-08-04 10:07 ` Christopher Genovese
@ 2012-08-04 18:32   ` Christopher Genovese
  2012-08-16  5:02     ` Samuel Wales
  0 siblings, 1 reply; 5+ messages in thread
From: Christopher Genovese @ 2012-08-04 18:32 UTC (permalink / raw)
  To: emacs-orgmode; +Cc: Christopher Genovese


[-- Attachment #1.1: Type: text/plain, Size: 24469 bytes --]

I have attached an updated version of the files org-tag-query-parse.el
and tag-query-tests.el that I posted about last night, and I have
updated them at http://www.stat.cmu.edu/~genovese/emacs/ as well. I've
also appended an updated version of my original note below for
reference.

The issue had to do with the /!? todo-matches in the queries. The bug in
the 7.8 code that I described in Note h of my original post had to do
with properly detecting when a /!? todo match is present. The
approximate fix I suggested was incorrect; the complete fix was
unnecessarily complicated. I realized all this a few minutes after the
post (d'oh) and have now simplified things accordingly.

The new version unifies the parsing of both kinds of queries. I
eliminated a few functions as a result, and the new org-make-tags-matcher
is even simpler. As a result, todo matching deals with all those edge
cases correctly (like "PROP={^.*//.*$}") for free, also allows the {m,n}
construct in todo regexp matches, just like it does in the tag matches.
(Braces are escaped by doubling, e.g. {^.*B{{1,7}}} as I mentioned last
time.) All the tests pass in a fresh session, both with compiled and
non-compiled code, on 23.2 and 24.1.

I'd appreciate any comments, suggestions, or other feedback you have
on all this.

Thanks for your patience,

  Christopher


=== Updated version of original post follows for reference

I am writing an application layer on top of org that uses the
entry mapping API, but I needed both negation of complex
selections and heading searches. Because the current tag query
parser does not handle parenthesized expressions, it does not
allow negating complex queries. At first, I wrote a workaround
solution that mimics/specializes the mapping API, but that
approach seemed inelegant and harder to maintain.

So instead I implemented a full parser for tag queries with a
number of useful features (see the labeled Notes at the bottom
for further comments on these features):

  1. Parenthesized expressions to arbitrary depth are allowed.
  2. A '-' can be used to negate a parenthesized term.             [Note a]
  3. Regex's in {} can contain braces escaped by doubling: {{ }}.  [Note b]
  4. Supports fast property search on HEADING and PRIORITY.        [Note c]
  5. Handles hyphens in property names properly.                   [Note
d,h]
  6. Allows only the proper comparison operators, including ==.    [Note
e,h]
  7. Allows spaces around operators and terms for readability.     [Note f]
  8. Matchers use the original expression order; not a big
     deal, but free.
  9. The error messages during parsing are reasonably helpful.
  10. Several bug fixes and a cleaner `org-make-tags-matcher'.     [Note h]

I'm submitting the code for your consideration, with the
goal of eventually incorporating this into org.el. I would
be happy to hear any comments or suggestions you have. As
I'll describe below, this involves relatively minor changes
to two existing functions and adding a few new support
functions. I've attached two files org-tag-query-parse.el
(the code) and tag-query-tests.el (a collection of tests
built on a simple framework). I've also put the files in
http://www.stat.cmu.edu/~genovese/emacs/. The comments in
both files will I hope be helpful.

At the risk of going on too long, I'd like to add a few comments
about the code and tests. First, the two existing functions that
are affected in the code are `org-make-tags-matcher' and
`org-scan-tags'. In the new version of the former, I've unified
both kinds of query parsing, leading to a shorter and cleaner
function.. The new version of the latter differs in only a
couple *very minor* places that capture two values that were
already being computed anyway (see the diff reproduced in the
comments). By the way, I'm working from the 7.8.11 code.

Loading org-tag-query-parse.el does not change the original
functions. Instead, I've added a `-NEW' to the names of these
functions and saved the originals also with a `-ORIGINAL' added.
After loading the file, you can choose a version to try by doing

    (org-tmp-use-tag-parser 'new)
and
    (org-tmp-use-tag-parser 'original)

or do (org-tmp-use-tag-parser) to toggle between versions.
You can also just use the names with suffixes directly.
I'd also suggest byte-compiling the file.

I think the place to start looking at the code is the new
version of `org-make-tags-matcher'. The main workhorse for the
new parser is `org-tag-query-parse', with new function
`org-todo-query-parse' handling the /!? todo parsing
at the appropriate point..

The other substantial piece (in terms of lines of code) is a
utility macro `org-match-cond' that is used throughout and makes
the main parser much more readable IMHO. Admittedly, I went a
bit overboard in optimizing it; the first version worked fine
but this one produces really nice code. I'd suggest ignoring
this code (in section "Parsing utility for readable matchers")
on first pass. The docstring is pretty complete, and its use is
more or less self-explanatory.

To run the tests, load org-tag-query-parse.el and tag-query-tests.el
and do

   (tag-test-run :results) ; use :summary for a brief summary of all runs
   (tag-test-other-tests)  ; miscellaneous other tests, including scanning

or name individual suites. They are at the moment:

   (tag-test-run :results 'org-comparison-1)  ; or use :summary
   (tag-test-run :results 'org-comparison-2)
   (tag-test-run :results 'match-results-1)
   (tag-test-run :results 'match-results-2)
   (tag-test-run :results 'should-error-1)

If you have other ideas for tests or find any bugs, please let me
know. Sorry for the homegrown framework; it just sort of grew and
then I was too tired to rewrite the tests. One complication here
is that the original and new algorithms produce different term
orders and use a few different functions. The function
tag-test-transform transforms original results to the new
algorithms conventions, but it does not handle PRIORITY or
HEADING matches at the moment. Use the tree form of the tess (see
match-results-1 for example) on these. Btw, I've run the tests on
GNU Emacs 23.2 and 24.1 (running on OS X lion).

Notes:
   a. There is no need to introduce a new character such as ! for
      negation because the semantics of the - are clear and are
      consistent with its use for tags. A - binds more tightly
      than & which in turn binds more tightly than |. A +
      selector can also be used for positive selection of a
      parenthesized term but it is equivalent to using no
      selector, just as for tags.

   b. Because \'s are so heavily used in regex's and because they
      have to be doubled in strings, using \'s for an additional
      escape layer would be messy, ambiguous, and hard to read.
      Only the {}'s need to be escaped and the doubling escapes
      {{ -> { and }} -> } are simple, readable, and fast to
      parse. For example: "+{abc\\{{3,7\\}}}" gives the regex
      "abc\\{3,7\\}". Parity makes correctness clear at a glance.

   c. Because headline (and priority) searches can be useful and
      powerful, and because the information on those fields is
      *already processed* in `org-scan-tags', we get those
      special searches *essentially for free*, requiring only two
      minor changes to `org-scan-tags'. See the unified diff in
      comments. The special PRIORITY property already exists; I
      added the special HEADING property for these purposes. I'm
      open to changing the name of course, but I do think the
      feature is both useful and elegant. (I'm using it in my
      application, for instance.)

   d. I did not see it in the manual, but I think that property names
      with hyphens should have these \-escaped -'s in the query
      string, with the escaping slashes removed in the produced
      matcher. This is not currently done, but the new version does.
      See Note h for details.

   e. It seems desirable to support both = and == as equality operators
      since the latter is so common by habit. The new version allows
      this explicitly. The original version does as well, but the
      regex for the comparison operator also allows other operators
      <<, ><, >>, =>, and >= as well, which can produce bad matchers.
      See Note h for details.

   f. Currently, spaces are ignored around &, |, the implicit & between
      terms, around the comparison operators in property searches,
      and around +/- selectors. Spaces are not ignored inside {}'s
      for a regexp match.

   g. The current code also allows +/- selectors before property
      comparisons. I don't really like this because
      +PROP<>"something" and -PROP="something" have the same
      meaning but look very different. But the new code does
      support this. As a side note, there's really no need for
      the & characters as +/- serve the and/and-not function
      completely. But again, no prob.

   h. A few bugs detected in the 7.8.11 code:

      + Faulty test for todo matcher in org-make-tags-matcher
        (string-match "/+" match)

        Ex: (org-make-tags-matcher "PROP={^\\s-*// .*$}") produces
        an erroneous matcher:

            ("PROP={^\\s-*// .*$}" progn
             (setq org-cached-props nil)
             (member "PROP" tags-list))

        Because {}'s can appear in /!? todo-match expressions, and
        becauseA arbitrary strings can be TODO keywords, a simple
        pattern match will not give a complete solution. For
        instance, both PROP={/!} and PROP="/!{/!}" are valid TODO
        keywords (it works!) *and* valid property comparisons. We
        want to find the first slash that is not enclosed in {}'s or
        ""'s; if found, a todo match is needed. The new parser
        handles the two types of matching together and so
        automatically detects the todo matches correctly.

      + Property names with -'s are not handled properly (cf. Note d)

        Specifically, the escapes are not removed. Example:
        (org-make-tags-matcher "PROP\\-WITH\\-HYPHENS=2")
        produces

        ("PROP\\-WITH\\-HYPHENS=2" and
         (progn
         (setq org-cached-props nil)
         (=
          (string-to-number
           (or (org-cached-entry-get nil "PROP\\-WITH\\-HYPHENS")
           ""))
          2))
         t)

        The original code /does/ instead remove -'s from tag
        names, which shouldn't have them anyway. I suspect that
        this was intended for property names rather than tag
        names. The new version fixes up property names but does
        not allow -'s in tags.

      + Incorrect comparison operators allowed (cf. Note e)

        The regular expression used is "[<=>]\\{1,2\\}" is used to
        detect the comparison operators. But this can produce bad
        matchers that fail opaquely at match time rather than
        giving an appropriate error message at parse time.

        Ex: (org-make-tags-matcher "P<<2") produces

         ("P<<2" and
          (progn
            (setq org-cached-props nil)
            (nil
             (string-to-number (or (org-cached-entry-get nil "P") "")) 2))
          t)

        This is fixed in the new version and delivers an error
        message at parse time.

      + missing org-re (line 7179 in org.el) with posix classes

        Minor consistency issue.  This line does not occur in the new
        code.


Thanks and regards,

   Christopher Genovese




On Sat, Aug 4, 2012 at 6:07 AM, Christopher Genovese <genovese@cmu.edu>wrote:

> A small addendum:
>
> Right after I posted (of course!), I noticed both a small mistake and an
> opportunity for
> simplification, relating to detecting and processing the todo expressions
> after a /.
> Specifically, the approximate fix I proposed to the bug in the 7.8 code is
> insufficient
> to handle regexp matching in todo expressions. The full solution using the
> new function org-find-todo-query is needed in the code as I have it, and
> that can just
> be plugged in instead of the string-match. (See Note h in the original
> post, specifically
> the string-match given there, and org-find-todo-query.)
>
> But I realized that all that is unnecessary as the todo processing can be
> easily built
> into the new parser. I've mostly done that just now and will test and post
> an update
> Saturday, er...today.
>
> Sorry to muddy the waters by not catching that earlier, and sorrier still
> if I'm rambling.
> Off to bed...
>
>  -- Christopher
>
>
>
> On Sat, Aug 4, 2012 at 3:50 AM, Christopher Genovese <genovese@cmu.edu>wrote:
>
>> I am writing an application layer on top of org that uses the
>> entry mapping API, but I needed both negation of complex
>> selections and heading searches. Because the current tag query
>> parser does not handle parenthesized expressions, it does not
>> allow negating complex queries. At first, I wrote a workaround
>> solution that mimics/specializes the mapping API, but that
>> approach seemed inelegant and harder to maintain.
>>
>> So instead I implemented a full parser for tag queries with a
>> number of useful features (see the labeled Notes at the bottom
>> for further comments on these features):
>>
>>   1. Parenthesized expressions to arbitrary depth are allowed.
>>   2. A '-' can be used to negate a parenthesized term.             [Note
>> a]
>>   3. Regex's in {} can contain braces escaped by doubling: {{ }}.  [Note
>> b]
>>   4. Supports fast property search on HEADING and PRIORITY.        [Note
>> c]
>>   5. Handles hyphens in property names properly.                   [Note
>> d,h]
>>   6. Allows only the proper comparison operators, including ==.    [Note
>> e,h]
>>   7. Allows spaces around operators and terms for readability.     [Note
>> f]
>>   8. Matchers use the original expression order; not a big
>>      deal, but free.
>>   9. The error messages during parsing are reasonably helpful.
>>   10. Several bug fixes and a cleaner `org-make-tags-matcher'.     [Note
>> h]
>>
>> I'm submitting the code for your consideration, with the
>> goal of eventually incorporating this into org.el. I would be
>> happy to hear any comments or suggestions you have. As I'll describe
>> below, this involves relatively minor changes to two existing
>> functions and adding a few new support functions. I've attached two
>> files org-tag-query-parse.el (the code) and tag-query-tests.el (a
>> collection of tests built on a simple framework). I've also
>> put the files in http://www.stat.cmu.edu/~genovese/emacs/. The
>> comments in both files will I hope be helpful.
>>
>> At the risk of going on too long, I'd like to add a few comments
>> about the code and tests. First, the two existing functions that
>> are affected in the code are `org-make-tags-matcher' and
>> `org-scan-tags'. In the new version of the former, I've extracted
>> out both kinds of query parsing, leading to a shorter and cleaner
>> function. The new version of the latter differs in only a couple
>> *very minor* places that capture two values that were already
>> being computed anyway (see the diff reproduced in the comments).
>> Btw, I'm working from the 7.8.11 code.
>>
>> Loading org-tag-query-parse.el does not change the original
>> functions. Instead, I've added a `-NEW' to the names of these
>> functions and saved the originals also with a `-ORIGINAL' added.
>> After loading the file, you can choose a version to try by doing
>>
>>     (org-tmp-use-tag-parser 'new)
>> and
>>     (org-tmp-use-tag-parser 'original)
>>
>> or do (org-tmp-use-tag-parser) to toggle between versions.
>> You can also just use the names with suffixes directly.
>> I'd also suggest byte-compiling the file.
>>
>> I think the place to start looking at the code is the new version
>> of `org-make-tags-matcher'. The main entry function for the new
>> parser is `org-tag-query-parse', though the real workhorse is
>> actually the function `org-tag-query-parse-1'. There is also a
>> new function `org-todo-query-parse' which just extracts the
>> existing todo matching method. (I didn't do anything with that
>> method as the manual makes it clear that it is of secondary
>> importance.) I think the modularity here makes
>> `org-make-tags-matcher' and each separate parser easier to read
>> and understand.
>>
>> The other substantial piece (in terms of lines of code) is a utility
>> macro `org-match-cond' that is used throughout and makes the main
>> parser much more readable IMHO. Admittedly, I went a bit
>> overboard in optimizing it; the first version worked fine
>> but this one produces really nice code. I'd suggest ignoring this
>> code (in section "Parsing utility for readable matchers") on
>> first pass. The docstring is pretty complete, and its use is more
>> or less self-explanatory. Most of its work is done at compile time.
>>
>> To run the tests, load org-tag-query-parse.el and tag-query-tests.el
>> and do
>>
>>    (tag-test-run :results) ; use :summary for a brief summary of all runs
>>    (tag-test-other-tests)  ; miscellaneous other tests, including scanning
>>
>> or name individual suites. They are at the moment:
>>
>>    (tag-test-run :results 'org-comparison-1)  ; or use :summary
>>    (tag-test-run :results 'org-comparison-2)
>>    (tag-test-run :results 'match-results-1)
>>    (tag-test-run :results 'match-results-2)
>>    (tag-test-run :results 'should-error-1)
>>
>> If you have other ideas for tests or find any bugs, please let me
>> know. Sorry for the homegrown framework; it just sort of grew and
>> then I was too tired to rewrite the tests. One complication here
>> is that the original and new algorithms produce different term
>> orders and use a few different functions. The function
>> tag-test-transform transforms original results to the new
>> algorithms conventions, but it does not handle PRIORITY or
>> HEADING matches at the moment. Use the tree form of the tess (see
>> match-results-1 for example) on these. Btw, I've run the tests on
>> GNU Emacs 23.2 and 24.1 (running on OS X lion).
>>
>> Notes:
>>    a. There is no need to introduce a new character such as ! for
>>       negation because the semantics of the - are clear and are
>>       consistent with its use for tags. A - binds more tightly
>>       than & which in turn binds more tightly than |. A +
>>       selector can also be used for positive selection of a
>>       parenthesized term but it is equivalent to using no
>>       selector, just as for tags.
>>
>>    b. Because \'s are so heavily used in regex's and because they
>>       have to be doubled in strings, using \'s for an additional
>>       escape layer would be messy, ambiguous, and hard to read.
>>       Only the {}'s need to be escaped and the doubling escapes
>>       {{ -> { and }} -> } are simple, readable, and fast to
>>       parse. For example: "+{abc\\{{3,7\\}}}" gives the regex
>>       "abc\\{3,7\\}". Parity makes correctness clear at a glance.
>>
>>    c. Because headline (and priority) searches can be useful and
>>       powerful, and because the information on those fields is
>>       *already processed* in `org-scan-tags', we get those
>>       special searches *essentially for free*, requiring only two
>>       minor changes to `org-scan-tags'. See the unified diff in
>>       comments. The special PRIORITY property already exists; I
>>       added the special HEADING property for these purposes. I'm
>>       open to changing the name of course, but I do think the
>>       feature is both useful and elegant. (I'm using it in my
>>       application, for instance.)
>>
>>    d. I did not see it in the manual, but I think that property names
>>       with hyphens should have these \-escaped -'s in the query
>>       string, with the escaping slashes removed in the produced
>>       matcher. This is not currently done, but the new version does.
>>       See Note h for details.
>>
>>    e. It seems desirable to support both = and == as equality operators
>>       since the latter is so common by habit. The new version allows
>>       this explicitly. The original version does as well, but the
>>       regex for the comparison operator also allows other operators
>>       <<, ><, >>, =>, and >= as well, which can produce bad matchers.
>>       See Note h for details.
>>
>>    f. Currently, spaces are ignored around &, |, the implicit & between
>>       terms, around the comparison operators in property searches,
>>       and around +/- selectors. Spaces are not ignored inside {}'s
>>       for a regexp match.
>>
>>    g. The current code also allows +/- selectors before property
>>       comparisons. I don't really like this because
>>       +PROP<>"something" and -PROP="something" have the same
>>       meaning but look very different. But the new code does
>>       support this. As a side note, there's really no need for
>>       the & characters as +/- serve the and/and-not function
>>       completely. But again, no prob.
>>
>>    h. A few bugs detected in the 7.8.11 code:
>>
>>       + Faulty test for todo matcher in org-make-tags-matcher
>>         (string-match "/+" match)
>>
>>         Ex: (org-make-tags-matcher "PROP={^\\s-*// .*$}") produces
>>         an erroneous matcher:
>>
>>             ("PROP={^\\s-*// .*$}" progn
>>              (setq org-cached-props nil)
>>              (member "PROP" tags-list))
>>
>>         For all practical purposes it will be enough to do:
>>
>>          (string-match "\\(/\\(!\\)?\\s-*\\)[^{}\"]*$" match)
>>
>>         instead of the current test in org-make-tags-matcher.
>>         This works as long as the TODO keywords do not contain a
>>         right brace or quotation marks. (In most other cases, the
>>         new parser should give an error message at parse time.)
>>
>>         A technicality: this is /not/ a complete solution because
>>         arbitrary strings can be TODO keywords. For instance,
>>         both PROP={/!} and PROP="/!{/!}" are valid TODO keywords
>>         (it works!) *and* valid property comparisons. So, a pattern
>>         alone is insufficient. We want to find the first slash
>>         that is not enclosed in {}'s or ""'s; if found, a todo
>>         match is needed. The function `org-find-todo-query' does
>>         this and (org-find-todo-query match) can be plugged in
>>         directly replacing the above (string-match ...) in then
>>         new `org-make-tags-matcher'.
>>
>>         But because the todo parsing uses {}'s for regex matches,
>>         TODO keywords with {}'s are ignored anyway. So there's
>>         no need to go beyond the fixed string-match above.
>>         The function `org-todo-query-parse', which handles todo
>>         parsing in the new version, makes this explicit.
>>
>>       + Property names with -'s are not handled properly (cf. Note d)
>>
>>         Specifically, the escapes are not removed. Example:
>>         (org-make-tags-matcher "PROP\\-WITH\\-HYPHENS=2")
>>         produces
>>
>>         ("PROP\\-WITH\\-HYPHENS=2" and
>>          (progn
>>          (setq org-cached-props nil)
>>          (=
>>           (string-to-number
>>            (or (org-cached-entry-get nil "PROP\\-WITH\\-HYPHENS")
>>            ""))
>>           2))
>>          t)
>>
>>         The original code /does/ instead remove -'s from tag
>>         names, which shouldn't have them anyway. I suspect that
>>         this was intended for property names rather than tag
>>         names. The new version fixes up property names but does
>>         not allow -'s in tags.
>>
>>       + Incorrect comparison operators allowed (cf. Note e)
>>
>>         The regular expression used is "[<=>]\\{1,2\\}" is used to
>>         detect the comparison operators. But this can produce bad
>>         matchers that fail opaquely at match time rather than
>>         giving an appropriate error message at parse time.
>>
>>         Ex: (org-make-tags-matcher "P<<2") produces
>>
>>          ("P<<2" and
>>           (progn
>>             (setq org-cached-props nil)
>>             (nil
>>              (string-to-number (or (org-cached-entry-get nil "P") "")) 2))
>>           t)
>>
>>         This is fixed in the new version and delivers an error
>>         message at parse time.
>>
>>       + missing org-re (line 7179 in org.el) with posix classes
>>
>>         Minor consistency issue.  This line does not occur in the new
>>         code.
>>
>>
>> Thanks and regards,
>>
>>    Christopher Genovese
>>
>>
>>
>

[-- Attachment #1.2: Type: text/html, Size: 70497 bytes --]

[-- Attachment #2: org-tag-query-parse.el --]
[-- Type: application/octet-stream, Size: 51049 bytes --]

;;; org-tag-query-parse.el -- proposed full parser for tag queries

;; Copyright (C) 2012, Christopher R. Genovese, all rights reserved.

;; Author:  Christopher Genovese <genovese@cmu.edu>
;; Version: 0.9
;;
;; Created:      Sun 29 Jul 2012 at 10:04 EDT
;; Last-Updated: Fri 03 Aug 2012 at 23:52 EDT
;; Updated By:   Christopher R. Genovese
;; Keywords: org-mode, tags, query, search
;; Package-Requires: ((org-mode 7.8))


;;; Commentary:
;;
;;  The current parser for tag query searches does not handle
;;  parenthesized expressions and thus does not allow negating complex
;;  queries. This code implements a full parser for tag queries with
;;  number of useful features (see the labeled Notes below for further
;;  comments on these features):
;; 
;;   1. Parenthesized expressions to arbitrary depth are allowed.
;;   2. A '-' can be used to negate a parenthesized term.             [Note a]
;;   3. Regex's in {} can contain braces escaped by doubling: {{ }}.  [Note b]
;;   4. Supports fast property search on HEADING and PRIORITY.        [Note c]
;;   5. Handles hyphens in property names properly.                   [Note d]
;;   6. Allows only the proper comparison operators, including ==.    [Note e]
;;   7. Allows spaces around operators and terms for readability.     [Note f]
;;   8. Parse trees use the original expression order; not a big
;;      deal, but free.
;;   9. Returned parse trees are clean, without trivial operators,
;;      and error messages during parsing are reasonably helpful.
;;   10. A few bug fixes and a cleaner `org-make-tags-matcher'.       [Note h]
;;
;;  I propose that this new parser be incorporated into org.el.
;;  
;;  The two existing functions that are affected in the code are
;;  `org-make-tags-matcher' and `org-scan-tags'. In the new version of
;;  the former, I've unified both kinds of query parsing, leading to a
;;  shorter and cleaner function.. The new version of the latter differs
;;  in only a couple *very minor* places that capture two values that
;;  were already being computed anyway (see the diff reproduced in the
;;  comments).
;;  
;;  Loading org-tag-query-parse.el does not change the original
;;  functions. Instead, I've added a `-NEW' to the names of these
;;  functions and saved the originals also with a `-ORIGINAL' added.
;;  After loading the file, you can choose a version to try by doing
;;  
;;      (org-tmp-use-tag-parser 'new)
;;  and
;;      (org-tmp-use-tag-parser 'original)
;;  
;;  or do (org-tmp-use-tag-parser) to toggle between versions.
;;  You can also just use the names with suffixes directly. 
;;  
;;  I think the place to start looking at the code is the new version of
;;  `org-make-tags-matcher'. The main workhorse for the new parser is
;;  `org-tag-query-parse', with new function `org-todo-query-parse'
;;  handling the /!? todo parsing.
;;  
;;  The other substantial piece (in terms of lines of code) is a utility
;;  macro `org-match-cond' that is used throughout and makes the main
;;  parser much more readable IMHO. Admittedly, I went a bit overboard
;;  in optimizing it; the first version worked fine but this one
;;  produces really nice code. I'd suggest ignoring this code (in
;;  section "Parsing utility for readable matchers") on first pass. The
;;  docstring is pretty complete, and its use is more or less
;;  self-explanatory.
;;  
;;  Notes:
;;    a. There is no need to introduce a new character such as ! for
;;       negation because the semantics of the - are clear and are
;;       consistent with its use for tags. A - binds more tightly
;;       than & which in turn binds more tightly than |. A +
;;       selector can also be used for positive selection of a
;;       parenthesized term but it is equivalent to using no
;;       selector, just as for tags.
;;       
;;    b. Because \'s are so heavily used in regex's and because they
;;       have to be doubled in strings, using \'s for an additional
;;       escape layer would be messy, ambiguous, and hard to read.
;;       Only the {}'s need to be escaped and the doubling escapes
;;       {{ -> { and }} -> } are simple, readable, and fast to
;;       parse. For example: "+{abc\\{{3,7\\}}}" gives the regex
;;       "abc\\{3,7\\}". Parity makes correctness clear at a glance.
;;       
;;    c. Because headline (and priority) searches can be useful and
;;       powerful, and because the information on those fields is
;;       *already processed* in `org-scan-tags', we get those
;;       special searches *essentially for free*, requiring only two
;;       minor changes to `org-scan-tags'. See the unified diff in
;;       comments. The special PRIORITY property already exists; I
;;       added the special HEADING property for these purposes. I'm
;;       open to changing the name of course, but I do think the
;;       feature is very useful.
;; 
;;    d. I did not see it in the manual, but I think that property names
;;       with hyphens should have these \-escaped -'s in the query
;;       string, with the escaping slashes removed in the produced
;;       matcher. This is not currently done, but the new version does.
;;       See Note h for details.
;; 
;;    e. It seems desirable to support both = and == as equality operators
;;       since the latter is so common by habit. The new version allows
;;       this explicitly. The original version does as well, but the
;;       regex for the comparison operator also allows other operators
;;       <<, ><, >>, =>, and >= as well, which can produce bad matchers.
;;       See Note h for details.
;; 
;;    f. Currently, spaces are ignored around &, |, the implicit & between
;;       terms, around the comparison operators in property searches,
;;       and around +/- selectors. Spaces are not ignored inside {}'s
;;       for a regexp match. Truth be told, I prefer having no spaces
;;       after the +/- selectors, but it seems somewhat...harsh to insist
;;       on that for everyone. Live and let live.
;; 
;;    g. The current code also allows +/- selectors before property
;;       comparisons. I don't really like this because
;;       +PROP<>"something" and -PROP="something" have the same
;;       meaning but look very different. But the new code does
;;       support this. As a side note, there's really no need for
;;       the & characters as +/- serve the and/and-not function
;;       completely. But again, no prob.
;; 
;;    h. A few bugs detected in the 7.8.11 code:
;; 
;;       + Faulty test for todo matcher in org-make-tags-matcher
;;         (string-match "/+" match)
;; 
;;         Ex: (org-make-tags-matcher "PROP={^\\s-*// .*$}") produces 
;;         an erroneous matcher:
;; 
;;             ("PROP={^\\s-*// .*$}" progn
;;              (setq org-cached-props nil)
;;              (member "PROP" tags-list))
;; 
;;         Because {}'s can appear in /!? todo-match expressions, and
;;         becauseA arbitrary strings can be TODO keywords, a simple
;;         pattern match will not give a complete solution. For
;;         instance, both PROP={/!} and PROP="/!{/!}" are valid TODO
;;         keywords (it works!) *and* valid property comparisons. We
;;         want to find the first slash that is not enclosed in {}'s or
;;         ""'s; if found, a todo match is needed. The new parser
;;         handles the two types of matching together and so
;;         automatically detects the todo matches correctly.
;;         
;;       + Property names with -'s are not handled properly (cf. Note d)
;;         
;;         Specifically, the escapes are not removed. Example:
;;         (org-make-tags-matcher "PROP\\-WITH\\-HYPHENS=2")
;;         produces
;;         
;;         ("PROP\\-WITH\\-HYPHENS=2" and
;;          (progn
;;          (setq org-cached-props nil)
;;          (=
;;           (string-to-number
;;            (or (org-cached-entry-get nil "PROP\\-WITH\\-HYPHENS")
;;            ""))
;;           2))
;;          t)
;;         
;;         The original code /does/ instead remove -'s from tag
;;         names, which shouldn't have them anyway. I suspect that
;;         this was intended for property names rather than tag
;;         names. The new version fixes up property names but does
;;         not allow -'s in tags.
;; 
;;       + Incorrect comparison operators allowed (cf. Note e)
;;         
;;         The regular expression used is "[<=>]\\{1,2\\}" is used to
;;         detect the comparison operators. But this can produce bad
;;         matchers that fail opaquely at match time rather than 
;;         giving an appropriate error message at parse time.
;; 
;;         Ex: (org-make-tags-matcher "P<<2") produces
;; 
;;          ("P<<2" and
;;           (progn
;;             (setq org-cached-props nil)
;;             (nil
;;              (string-to-number (or (org-cached-entry-get nil "P") "")) 2))
;;           t)
;; 
;;         This is fixed in the new version and delivers an error 
;;         message at parse time.
;; 
;;       + missing org-re (line 7179 in org.el) with posix classes
;;         
;;         Minor consistency issue.  This line does not occur in the new
;;         code.
;;
;;  What follows is a grammar for the updated tag query dsl, which is
;;  given in an informal hybrid of BNF and regex operators. Hopefully, it's
;;  clear enough. The non-obvious terminals are ALL CAPS and are listed below.
;; 
;;  Grammar:
;;    Expression  <-  Conjunction (OR Conjunction)*
;;    Conjunction <-  Term (AND? Term)*
;;    Term        <-  SELECTOR? TAG_IDENT
;;                 |  SELECTOR? LBRACE Regex RBRACE
;;                 |  SELECTOR? PROP_IDENT CMP_OP NUMBER
;;                 |  SELECTOR? PROP_IDENT CMP_OP STRING
;;                 |  SELECTOR? PROP_IDENT CMP_OP DATE_STRING
;;                 |  SELECTOR? PROP_IDENT MATCH_OP LBRACE REGEX RBRACE
;;                 |  SELECTOR? LPAREN Expression RPAREN
;;                 
;;  Terminals (the nonobvious ones):
;;    OR = |
;;    AND = &
;;    CMP_OP = (==|=|<=|<>|>=|>|<)
;;    MATCH_OP = (==|=|<>)
;;    TAG_IDENT = [A-Za-z0-9_@#%]+         
;;    PROP_IDENT = ([A-Za-z0-9_]+(\\-)*)+  
;;    SELECTOR? = [-+]?
;;    STRING = balanced double-quoted string with escapes
;;    DATE_STRING = org style date string
;;    REGEXP = regular expression with { and } doubled to escape, {{ and }}
;;


;;; License:
;;
;; This program is free software; you can redistribute it and/or
;; modify it under the terms of the GNU General Public License as
;; published by the Free Software Foundation; either version 3, or
;; (at your option) any later version.
;;
;; This program is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
;; General Public License for more details.
;;
;; You should have received a copy of the GNU General Public License
;; along with this program; see the file COPYING.  If not, write to
;; the Free Software Foundation, Inc., 51 Franklin Street, Fifth
;; Floor, Boston, MA 02110-1301, USA.
;;


;;; Code:

(eval-when-compile
  (require 'cl))
(require 'org)


;;; Utilities

(defun org-read-balanced-string (open-delim close-delim)
  "Return string delimited by OPEN-DELIM and CLOSE-DELIM with escapes.
OPEN-DELIM and CLOSE-DELIM must be *distinct* characters. Reading
begins at point in the current buffer. To include OPEN-DELIM and
CLOSE-DELIM inside the string being read, those characters must
be *doubled* and only one copy of the character is kept in the
string. The opening and closing delimiters for the read sequence
must be single copies, and an unescaped OPEN-DELIM will raise an
error."
  (when (char-equal open-delim close-delim)
    (error "Open and close delimiters must be distinct, %c" open-delim))
  (unless (char-equal (char-after) open-delim)
    (error "Missing opening %c in delimited string" open-delim))
  (forward-char 1) ; skip initial delimiter
  (let ((delim-re (format "[%c%c]" open-delim close-delim))
        (delims-unbalanced t)
        (begin (point))
        (fragments nil)
        (ch nil))
    (while (and delims-unbalanced (re-search-forward delim-re nil t))
      (setq ch (char-before))
      (cond
        ((char-equal ch open-delim)
         (setq ch (char-after))
         (if (not (and ch (char-equal ch open-delim)))
             (error "Unescaped open delimiter %c in balanced string" open-delim)
           (push (buffer-substring-no-properties begin (1- (point))) fragments)
           (setq begin (point))
           (forward-char 1)))
        ((char-equal ch close-delim)
         (setq ch (char-after))
         (if (not (and ch (char-equal ch close-delim)))
             (setq delims-unbalanced nil)
           (push (buffer-substring-no-properties begin (1- (point))) fragments)
           (setq begin (point))
           (forward-char 1)))))
    (when delims-unbalanced
      (error "Unbalanced delimiters %c%c in balanced string at char %d."
             open-delim close-delim (point)))
    (push (buffer-substring-no-properties begin (1- (point))) fragments)
    (if (null (cdr fragments))
        (car fragments)
      (apply 'concat (nreverse fragments)))))

(defun org-read-quoted-string-in-query ()
  "Read a quoted string, with escapes, starting at point.
Assume that an opening quote has already been seen. This is just
a wrapper for `read' that reports errors nicely."
  (let ((start (point)))
    (condition-case exception
        (read (current-buffer))
      (error
       (org-tquery-error
        "badly formed quoted value in property comparison" :pos start)))))

;; I'm inclined to define an error symbols here to allow finer control.
;; But error symbols don't seem to be used in the main org code, so I'll
;; forgo them here and use `error' directly.  -- CRG 31 July 2012
(defun org-tquery-error (info &rest other-args)
  "Raise a query parsing error.
INFO is an auxilliary message which is appended to the standard
message and which is treated as a format string for OTHER-ARGS.
This need/should not be capitalized. The end of OTHER-ARGS can
contain keyword-value pairs :pos <number-or-marker> and :type
<descriptive string> to control the final error message. The
former defaults to point (position in the query string) and the
latter defaults to `tag'."
  (let* ((pos (or (cadr (memq :pos other-args)) (point)))
         (qtype (or (cadr (memq :type other-args)) "tag ")))
    (error (format "Parse error in %squery at character %d, %s"
                   qtype pos (apply 'format info other-args)))))

;; This is used to define the parser symbol table at compile time.
(defmacro org-defhash-at-compile (name options &rest body) 
  "Define a hash table NAME at compile time and/or load-time.
OPTIONS is a list, possibly empty, of options to pass to
`make-hash-table'. BODY is a list of sequences (lists or
vectors), each of which contains a key value pair. The key and
the value will be evaluated, so for example, a symbol key should
be quoted."
  (declare (indent 2))
  (let* ((docstring (and (stringp (car body)) (car body)))
         (table (if docstring (cdr body) body)))
    `(eval-and-compile
       (defvar ,name (make-hash-table ,@options) ,docstring)
       ,@(mapcar (lambda (s) `(puthash ,(elt s 0) ,(elt s 1) ,name)) table))))


;;; Regex comparison functions in a form like the other comparison operators

(defun org-string-match= (string regexp)
  (string-match-p regexp string))

(defun org-string-match<> (string regexp)
  (not (string-match-p regexp string)))


;;; Parsing utility for readable matchers

(org-defhash-at-compile org-tag-query-terminals ()
  "Lexical token regexes for tag query parsing.
Hash table also contains the symbols +, * and ? that can be used
to represent repetition operators in matched expressions,
e.g., (NUMBER)+ or (SELECTOR)\?. (? must be escaped.)"
  ['TERM-BEGIN    (org-re "[-+[:alnum:]_{(]")]
  ['SELECTOR      "[-+]"]
  ['TAG-IDENT     (org-re  "[[:alnum:]_@#%:]+")]
  ['PROP-IDENT    (org-re "\\(?:[[:alnum:]_]+\\(?:\\\\-\\)*\\)+")]
  ['CMP-OP        "\\(?:==?\\|<=\\|<>\\|>=\\|<\\|>\\)"]
  ['MATCH-OP      "\\(?:==?\\|<>\\)"]
  ['CMP-RHS-BEGIN (org-re "\\(?:{\\|\\\"\\|-?[.[:digit:]]\\)")]
  ['REGEX-OPEN    "{"]
  ['GROUP-OPEN    "("]
  ['GROUP-CLOSE   ")"]
  ['OR-OP         "|"]
  ['AND-OP        "&"]
  ['DATE-STRING   "\"[[<].*?[]>]\""]
  ['INTEGER       (org-re "-?[[:digit:]]+")]
  ['NUMBER        "-?\\(?:\\.[0-9]+\\|[0-9]+\\(\\.[0-9]*\\)?\\)\\(?:[eE][-+][0-9]+\\)?"]
  ['SPACE         "[[:blank:]]"]
  ['SPACE*        "[[:blank:]]*"]
  ['TODO-MARKER   "/"]
  ['TODO-ONLY-M   "!"]
  ['TODO-KEY      "[^-+\"{}&|]+"]
  ['*             "*"]
  ['+             "+"]
  ['\?            "?"])

(org-defhash-at-compile org-tag-query-cmp-ops (:test 'equal)
  "Maps comparison operator strings to a vector of comparison functions.
The functions are arranged in the vector for different types as follows:
         NUM   STRING       TIME       REGEX"
  ("="  [=     string=      org-time=  org-string-match=])
  ("==" [=     string=      org-time=  org-string-match=])
  ("<>" [org<> org-string<> org-time<> org-string-match<>])
  ("<"  [<     string<      org-time<  nil])
  ("<=" [<=    org-string<= org-time<= nil])
  (">=" [>=    org-string>= org-time>= nil])
  (">"  [>     org-string>  org-time>  nil]))

(eval-and-compile
  ;; I'd prefer gensyms for this, but realistically this will do.
  (defvar org-tag-query/private-opt-ch- nil
    "This variable should not be set by the user, even locally.
It should remain nil or chaos may ensue.")

  (defun org-string-as-char-p (s)
    "Is S a string of a single, possibly backslash-escaped, character?
If so, return the character this represents, otherwise nil."
    (if (stringp s)
        (let ((len (length s)))
          (or (and (= 1 len) (string-to-char s))
              (and (= 2 len) (char-equal ?\\ (string-to-char s)) (aref s 1))))
      nil))

  (defun org-tag-query-matcher<- (patterns &optional autogroup terminals next-ch)
    "Convert a list of PATTERNS to a `looking-at' query. 
AUTOGROUP, if non-nil, turns on autogrouping of singletons.
TERMINALS, if non-nil, should be a hash table for attempted
symbol lookup. See `org-match-cond' for more details on
PATTERNS's format."
    (let* ((all-strings
            (lambda (elements)
              (let ((ptr elements))
                (while (and ptr (stringp (car ptr)))
                  (setq ptr (cdr ptr)))
                (null ptr))))
           (replace-terminal
            (lambda (item)
              (cond ((not (hash-table-p terminals)) item)
                    ((symbolp item)    (gethash item terminals item))
                    (t                 item))))
           (push-all
            (lambda (src dest) (dolist (s src dest) (push s dest))))
           (alternatives
            (let (rhs acc nogroup nocapture)
              (dolist (p patterns (progn (push (nreverse rhs) acc)
                                         (nreverse acc)))
                (cond 
                 ((eq p '|)
                  (if (null rhs)
                      (org-tquery-error "misplaced alternative marker")
                    (push (nreverse rhs) acc)
                    (setq rhs nil)))
                 ((eq p :nogroup)
                  (setq nogroup t))
                 ((eq p :nocapture)
                  (setq nocapture t))
                 ((or (consp p) (vectorp p)) ; capturing/noncapturing group resp.
                  (unless nogroup ; allow nogroup/nocapture here for consistency
                    (push (if (or nocapture (vectorp p)) "\\(?:" "\\(") rhs))
                  (setq rhs (funcall push-all (mapcar replace-terminal p) rhs))
                  (unless nogroup
                    (push "\\)" rhs))
                  (setq nogroup nil nocapture nil))
                 ((and autogroup (not nogroup))
                  (push (if (or nocapture (vectorp p)) "\\(?:" "\\(") rhs)
                  (push (funcall replace-terminal p) rhs)
                  (push "\\)" rhs)
                  (setq nogroup nil nocapture nil))
                 (t
                  (push (funcall replace-terminal p) rhs)
                  (setq nogroup nil nocapture nil))))))
           (matchers
            (mapcar (lambda (pterms)
                      (let ((fst (car pterms))
                            (sch nil))
                        (cond
                         ((and (cdr pterms) (funcall all-strings pterms))
                          `(looking-at ,(apply 'concat pterms)))
                         ((cdr pterms)
                          `(looking-at (concat ,@pterms)))
                         ((and next-ch (setq sch (org-string-as-char-p fst)))
                          `(and ,next-ch
                                (char-equal ,next-ch ,sch)
                                (setq org-tag-query/private-opt-ch- ,sch)))
                         (t
                          `(looking-at ,fst)))))
                    alternatives)))
      (if (cdr matchers)
          `(or ,@matchers)
        `,(car matchers)))))

(defmacro org-match-cond (options &rest clauses)
  "Like `cond', but allows special forms in the clauses that lookahead in,
extract from, and move in the current buffer. The special forms
are described below. OPTIONS, if non-nil, is either a symbol,
SYMBOL-TABLE, bound to a hash table for symbol lookup, or a list
of the form (SYMBOL-TABLE NEXT-CHAR-SYM &rest BINDINGS). If NEXT-CHAR-SYM
is not nil, it should be a symbol that will be bound to the next character
and used to optimize the matching of especially simple patterns, as described
below. BINDINGS are standard let bindings which will visible in the
CLAUSES. CLAUSES are structured like the clauses in `cond'.

   Lookahead:
     (<-  PATTERN... [| PATTERN...]*)  -- symbol lookup in SYMBOL-HASH
     (<<- PATTERN... [| PATTERN...]*)  -- symbol lookup + singleton autogrouping
     (<<= PATTERN... [| PATTERN...]*)  -- no symbol lookup or autogrouping

     Each pattern in the list is either a string, a symbol, a
     list of strings and symbols -- representing capturing
     groups, or a vector of strings and symbols -- representing
     non-capturing groups. The |'s separate alternatives that are
     tested lazily (a la `or'); the eventual strings computed for
     the patterns in each alternative are concatenated together
     to form a regular expression which is tested with
     `looking-at'. When symbol lookup is in effect (<- and <<-
     forms), symbols in a pattern are first looked up in
     SYMBOL-TABLE if it exists, and replaced with the
     corresponding value if present. If all the patterns's in an
     alternative resolve to strings at compile time, the regular
     expression is computed at compile time and all the forms
     reduce to a single call to `looking-at'. Otherwise, the
     regular expression is computed at runtime.

     In the special case where there are no alternatives, the
     pattern is a string representing a single character at
     compile time (one character or a backslash-escaped
     character), and NEXT-CHAR-SYM is a non-nil symbol, the cond
     clause is optimized to do a character comparison rather than
     a `looking-at'. Specifically, NEXT-CHAR-SYM is bound to the
     character at point before any tests and used via
     `char-equal' for this optimized match. Movement and string
     forms with group 0, i.e., (@ end 0), ($ 0), ($$ 0), see
     below, still work as expected in this case. To suppress the
     character optimization when NEXT-CHAR-SYM is non-nil, for
     instance to match a regular expression `.', it is sufficient
     to include an empty string in the pattern list or to put the
     term in a non-capturing group..

     In the <<- form, each singleton pattern (strings or symbols)
     is automatically put in a capturing group, unless preceded by
     :nogroup (inhibiting group) or :nocapture (inhibiting capture).

   Movement:
     (@ FROM GROUP [OFFSET]) 
        Moves point to a position relative to a match GROUP,
        which should be a non-negative integer. This has no
        effect if the group did not match, except an (@ end 0)
        always moves to the end of what matched even if the
        clause was optimized into a character match. FROM can be
        either the symbol `end' or `begin', which matches the end
        or beginning of the group, or a function which is called
        with group as an argument. Optional OFFSET, if non-nil,
        is added to the specified position.

   String Extraction:
     ($ GROUP)   The matched string for group GROUP, or nil.
                 A ($ 0) always works, even with character optimization.
"
  (declare (indent 1))
  (let* ((opt-listp (consp options))
         (sym-table (if opt-listp (car options)      options))
         (next-char (if opt-listp (cadr options)     nil))
         (next-sym  (if next-char `',next-char nil))
         (bindings  (if opt-listp (nthcdr 2 options) nil)))
   `(macrolet ((<- (&rest patterns)     ; lookup
                   `,(org-tag-query-matcher<- patterns nil ,sym-table ,next-sym))
               (<<- (&rest patterns)    ; lookup + autogrouping
                    `,(org-tag-query-matcher<- patterns t ,sym-table ,next-sym))
               (<<= (&rest patterns)    ; no lookup or autogrouping
                    `,(org-tag-query-matcher<- patterns nil nil ,next-sym))
               (@ (from group &optional offset) ; move point relative to group
                  ;; Handling the character-match optimization requires
                  ;; checking if a character match was made so an (@ end 0)
                  ;; moves forward instead of referring to the match
                  ;; data. The idea here is to only check that case when
                  ;; the optimization was requested in the first place
                  ;; and then to do as much work as possible at compile
                  ;; time. Admittedly, this part has gotten a bit crazy,
                  ;; but it does produce good code. -- CRG 02 Aug 2012
                  ,(if (not next-char)
                       `(let ((ipos
                               (case from
                                 (end `(or (match-end ,group) (point)))
                                 (begin `(or (match-beginning ,group) (point)))
                                 (t `(,from ,group)))))
                          `(goto-char ,(if offset `(+ ,ipos ,offset) ipos)))
                     `(let ((ipos (case from
                                    (end
                                     `(if org-tag-query/private-opt-ch-
                                          ,(cond
                                            ((and (integerp group) (zerop group))
                                             '(min (1+ (point)) (point-max)))
                                            ((integerp group)
                                             '(point))
                                            (t
                                             `(min (+ (point)
                                                      (if (zerop ,group) 1 0))
                                                   (point-max))))
                                        (or (match-end ,group) (point))))
                                    (begin
                                     `(or (and
                                           (not org-tag-query/private-opt-ch-)
                                           (match-beginning ,group))
                                          (point)))
                                    (t `(,from ,group)))))
                        `(goto-char ,(if offset `(+ ,ipos ,offset) ipos)))))
               ($ (group)          ; string or nil if no match for group
                  ,(if (not next-char)
                       ``(match-string ,group)
                     ``(if org-tag-query/private-opt-ch-
                          ,(if (integerp group) ;; w/literal group, just do it
                               (if (zerop group) ;;  likely the common case
                                   '(string org-tag-query/private-opt-ch-) nil)
                             `(if (zerop ,group)
                                  (string org-tag-query/private-opt-ch-) nil))
                        (match-string ,group)))))
      (let (,@bindings
            ,@(if next-char (list `(,next-char (char-after))) nil)
            ,@(if next-char (list '(org-tag-query/private-opt-ch- nil)) nil))
        (cond
         ,@clauses)))))


;;; The tag query parser itself

(defun org-tag-query-parse (todo-only)
  ;; Works in current buffer with string to be parsed starting at point.
  (let ((parse-stack nil)
        (paren-count 0)
        neg-select got-select
        todo-bang todo-match)
    (labels
        ((emit (&rest items)
               (dolist (item items) (push item parse-stack)))
         (no-term-p ()
                    (symbolp (car parse-stack)))
         (negate-if (negate item)
                    (if negate `(not ,item) item))
         (thread (&rest iterations)
                 (dolist (_ iterations)
                   (let (entries new-entry type)
                     (while (or (eq t (car parse-stack))
                                (not (symbolp (car parse-stack))))
                       (setq entries (cons (pop parse-stack) entries)))
                     (unless (and entries parse-stack)
                       (org-tquery-error "empty subexpression"))
                     (case (setq type (pop parse-stack))
                       ((and or)
                        (setq new-entry 
                              (if (cdr entries)
                                  (cons type entries)
                                (car entries))))
                       (not
                        (assert (null (cdr entries)) nil "not is unary")
                        (let ((arg (car entries)))
                          (setq new-entry
                                (if (and (consp arg) (eq (car arg) 'not))
                                    (cadr arg)
                                  `(not ,arg)))))
                       (identity
                        (assert (null (cdr entries)) nil "() is one expression")
                        (setq new-entry (car entries)))
                       (t                  ; this really shouldn't happen
                        (org-tquery-error "invalid symbol %s on stack." type)))
                     (emit new-entry))))
         (tag-check (id &optional negate)
                    (let ((check `(member ,id tags-list)))
                      (if negate `(not ,check) check)))
         (prop-check (prop &optional numericp)
                     (cond
                      ((string-equal prop "LEVEL")
                       'level)
                      ((member prop '("TODO" "HEADING" "PRIORITY"))
                       `(or ,(intern (downcase prop)) ""))
                      ((string-equal prop "CATEGORY")
                       '(or (get-text-property (point) 'org-category) ""))
                       (numericp
                        `(string-to-number
                          (or (org-cached-entry-get nil ,prop) "")))
                       (t
                        `(or (org-cached-entry-get nil ,prop) "")))))
      ;; Seed outermost expression in parse tree
      (emit 'or 'and)
      (skip-chars-forward (org-re "[:blank:]"))

      (while (not (eobp))
        ;; Process a term
        ;;   Look for the selector char first
        (org-match-cond org-tag-query-terminals
          ((<- (SELECTOR) SPACE*)
           (@ end 0)
           (setq got-select t
                 neg-select (char-equal (string-to-char ($ 1)) ?-)))
          (t
           (setq got-select nil
                 neg-select nil)))
        ;;   Now look for the rest of the term
        (org-match-cond (org-tag-query-terminals char-at-point)
         ((<- (PROP-IDENT) SPACE* (CMP-OP) SPACE* (CMP-RHS-BEGIN))
          (@ begin 3)
          (let* ((prop (save-match-data
                         (replace-regexp-in-string
                          "\\\\-" "-" ($ 1) t t)))
                 (cmp  ($ 2))
                 (indx (case (char-after)
                         (?\{ 3)
                         (?\" (org-match-cond org-tag-query-terminals
                                ((<- DATE-STRING) 2)
                                (t 1)))
                         (t 0)))
                 (op-f (aref (gethash cmp org-tag-query-cmp-ops) indx))
                 (rhs
                  (case indx
                    (0 (org-match-cond org-tag-query-terminals
                         ((<- NUMBER) (@ end 0) (string-to-number ($ 0)))
                         (t (org-tquery-error
                             "invalid number on rhs of property comparison"))))
                    (1 (org-read-quoted-string-in-query))
                    (2 (org-matcher-time (org-read-quoted-string-in-query)))
                    (3 (org-read-balanced-string ?\{ ?\}))))
                 (form (list op-f (prop-check prop (zerop indx)) rhs)))
            (unless op-f
              (org-tquery-error "invalid operator for property regexp match"))
            (emit (negate-if neg-select form))))
         ((<- TAG-IDENT)
          (@ end 0)
          (emit (tag-check ($ 0) neg-select)))
         ((<- REGEX-OPEN)
          (let ((regex (org-read-balanced-string ?\{ ?\})))
            (emit (negate-if neg-select `(org-match-any-p ,regex tags-list)))))
         ((<- GROUP-OPEN)
          (@ end 0)
          (emit (if neg-select 'not 'identity) 'or 'and)
          (incf paren-count))
         (got-select
          (org-tquery-error "trailing selector with no term"))
         ((<- TODO-MARKER (TODO-ONLY-M)\? SPACE*)
          (unless (zerop paren-count)
            (org-tquery-error "missing )s in query string"))
          (@ end 0)
          (setq todo-bang  (and ($ 1) t))
          (when (no-term-p) ; no tag query terms before this
            (emit t))
          (unless (eobp)
            (setq todo-match (org-todo-query-parse))))
         ((no-term-p)
          (org-tquery-error "missing the expected term"))
         ((<- GROUP-CLOSE)    ; end of subexpression, clean up
          (@ end 0)
          (decf paren-count)
          (when (< paren-count 0) (org-tquery-error "mismatched )'s"))
          (thread 'conjunction 'disjunction 'selector))
         ((<- AND-OP)         ; continue conjunction, expect a term
          (@ end 0))                   
         ((<- OR-OP)          ; start or continue a disjunction
          (@ end 0)
          (thread 'conjunction)
          (emit 'and))
         (t
          (org-tquery-error "invalid token %c during query parse"
                            (char-after))))
        ;; Allow spaces around terms, operators, and parens
        (skip-chars-forward (org-re "[:blank:]")))

      (unless (zerop paren-count)
        (org-tquery-error "missing )s in query string"))

      ;; Build the final parse tree by threading the stack
      (while (cdr parse-stack)
        (thread 'any))

      ;; Put the pieces together (todo-match is wrapped in a list)
      (let ((tag-matcher
             (if (eq (car parse-stack) t)
                 t
               `(progn
                  (setq org-cached-props nil)
                  ,(car parse-stack))))
            (todo-restrict
             (if (or todo-bang todo-only)
                 '((member todo org-not-done-keywords))
               nil)))
        (if (or todo-restrict todo-match)
            `(and ,@todo-restrict ,tag-matcher ,@todo-match)
          tag-matcher)))))

(defun org-todo-query-parse ()
  ;; Todo terms match TERM (AND-OP? TERM)* between OR-OP's,
  ;; where TERM = "\\([-+:]\\)?\\({[^}]*}\\|[^-+\"{}&|]+\\)"
  (let (orlist andlist neg-select got-select)
    (while (not (eobp))
      (org-match-cond org-tag-query-terminals
        ((<- (SELECTOR) SPACE*)
         (@ end 0)
         (setq got-select t
               neg-select (char-equal (string-to-char ($ 1)) ?-)))
        (t
         (setq got-select nil
               neg-select nil)))
      (org-match-cond (org-tag-query-terminals char-at-point)
        ((<- REGEX-OPEN)
         (let* ((regex (org-read-balanced-string ?\{ ?\})))
            (push (if neg-select
                      `(org-string-match<> todo ,regex)
                    `(org-string-match= todo ,regex))
                  andlist)))
        ((<- TODO-KEY)
         (@ end 0)
         (let ((match `(equal todo ,($ 0))))
           (push (if neg-select `(not ,match) match) andlist)))
        ((<- AND-OP)
         (@ end 0)
         (cond
          (got-select
           (org-tquery-error "trailing selector with no todo term"))
          ((null andlist)
           (org-tquery-error "missing todo term before &"))))
        ((<- OR-OP)
         (@ end 0)
         (cond
          (got-select
           (org-tquery-error "trailing selector with no todo term"))
          ((null andlist)
           (org-tquery-error "missing todo term before |")))
         (push (if (cdr andlist)
                   (cons 'and (nreverse andlist))
                 (car andlist))
               orlist)
         (setq andlist nil))
        (t
         (org-tquery-error "invalid token %c in todo expression"
                           (char-after))))
      (skip-chars-forward (org-re "[:blank:]")))
    (when andlist
      (push (if (cdr andlist) (cons 'and (nreverse andlist)) (car andlist))
            orlist))
    (list ; wrap in a list for later splicing
     (if (cdr orlist)
         (cons 'or (nreverse orlist))
       (car orlist)))))


;;; Modified `org-make-tags-matcher' and `org-scan-tags' that use the new parser

;;; The main change to `org-make-tags-matcher' is to insert the new tag
;;; parser, but this allowed me to shorten and clean up the code, fix
;;; one very minor bug (see NOTE below), and update the docstring.
;;; I've also separated out the todo parsing into a separate function
;;; for clarity and symmetry, though I've left the method as is.
;;; 
;;; The changes to `org-scan-tags' are minor and essentially free, and 
;;; I include the context diff in comments below.
;;;
;;; These both are drawn from the 7.8.11 code. To facilitate testing,
;;; I've added an -NEW to the names of these functions for the moment,
;;; but that should be changed if they are used. The temporary function
;;; `org-tmp-use-tag-parser' allows switching between old and new for
;;; testing. See also the file `tag-query-tests.el' that runs some basic
;;; tests.

;; Note: Successive matches (\G-style) in a fixed string are not possible
;; in emacs (without making repeated copies of substrings) because there
;; is no way to anchor a string-match at the start position argument.
;; (I have suggested using the zero-length assertion \= in a string to
;; anchor at that position, analogous to its use in buffer searches. If
;; you're with me, spread the word.)
;;
;; So instead of marching through the string copying substrings, we
;; process the query string in a temporary buffer. This is more
;; idiomatic elisp in any case, and is quite fast and convenient it
;; turns out. Doing it the other way is possible as well, by making
;; changes in a few places, particularly the org-match-cond macro and
;; org-tag-query-parse in a few places. But so far I like this approach.

(defun org-make-tags-matcher-NEW (match)
  "Create the TAGS/TODO matcher form for the selection string MATCH.

The variable `todo-only' is scoped dynamically into this
function; it will be set to t if the matcher restricts matching
to TODO entries, otherwise will not be touched.

Returns a cons of the selection string MATCH and the constructed
lisp form implementing the matcher. The matcher is to be
evaluated at an Org entry, with point on the headline, and
returns t if the entry matches the selection string MATCH. The
returned lisp form may reference four variables with information
about the entry, which must be bound around the form's
evaluation: todo, the TODO keyword at the entry (or nil of none);
heading, the text of the heading for the entry; priority, the
priority cookie for the entry or nil; and tags-list, the list of
all tags at the entry including inherited ones. Additionally, the
category of the entry (if any) must be specified as the text
property 'org-category on the headline.

See also `org-scan-tags'.
"
  (declare (special todo-only))
  (unless (boundp 'todo-only)
    (error "org-make-tags-matcher expects todo-only to be scoped in"))
  (unless match
    ;; Get a new match request, with completion
    (let ((org-last-tags-completion-table
	   (org-global-tags-completion-table)))
      (setq match (org-completing-read-no-i
		   "Match: " 'org-tags-completion-function nil nil nil
		   'org-tags-history))))

  ;; Parse the string into a matcher lisp form
  (cond
   ((or (not match) (not (string-match-p "\\S-" match)))
    (cons match t))
   ((string-match "^\\s-*\\([^-+A-Za-z0-9_@%#:{(/ \t]\\)" match)
    (org-tquery-error "invalid characters in query string"
                      :pos (match-beginning 1)))
   (t
    (with-temp-buffer
      (insert match)
      (goto-char (point-min))
      (cons match (org-tag-query-parse todo-only))))))

;; The changes to org-scan-tags are minor and essentially free.
;; A diff -U 2 against org.el from 7.8.11 with only this function
;; changed follows.
;;
;; --- org.el      2012-07-31 15:32:17.000000000 -0400
;; +++ modified-org.el     2012-07-31 15:20:56.000000000 -0400
;; @@ -12830,5 +12830,5 @@
;;                      " *\\(\\<\\("
;;                      (mapconcat 'regexp-quote org-todo-keywords-1 "\\|")
;; -                    (org-re "\\)\\>\\)? *\\(.*?\\)\\(:[[:alnum:]_@#%:]+:\\)?[ \t]*$")))
;; +                    (org-re "\\)\\>\\)?[ \t]*\\(?:\\[#\\(.\\)\\]\\)?[ \t]*\\(.*?\\)\\(:[[:alnum:]_@#%:]+:\\)?[ \t]*$")))
;;          (props (list 'face 'default
;;                       'done-face 'org-agenda-done
;; @@ -12848,5 +12848,5 @@
;;          (tags-alist (list (cons 0 org-file-tags)))
;;          (llast 0) rtn rtn1 level category i txt
;; -        todo marker entry priority)
;; +        todo marker entry heading priority priority-num)
;;      (when (not (or (member action '(agenda sparse-tree)) (functionp action)))
;;        (setq action (list 'lambda nil action)))
;; @@ -12860,5 +12860,7 @@
;;         (catch :skip
;;           (setq todo (if (match-end 1) (org-match-string-no-properties 2))
;; -               tags (if (match-end 4) (org-match-string-no-properties 4)))
;; +                priority (if (match-end 3) (org-match-string-no-properties 3))
;; +                heading (org-match-string-no-properties 4)
;; +               tags (if (match-end 5) (org-match-string-no-properties 5)))
;;           (goto-char (setq lspos (match-beginning 0)))
;;           (setq level (org-reduced-level (funcall outline-level))
;; @@ -12938,5 +12940,5 @@
;;                          tags-list
;;                          )
;; -                   priority (org-get-priority txt))
;; +                   priority-num (org-get-priority txt))
;;               (goto-char lspos)
;;               (setq marker (org-agenda-new-marker))
;; @@ -12944,5 +12946,5 @@
;;                 'org-marker marker 'org-hd-marker marker 'org-category category
;;                 'todo-state todo
;; -               'priority priority 'type "tagsmatch")
;; +               'priority priority-num 'type "tagsmatch")
;;               (push txt rtn))
;;              ((functionp action)
;; 

(defun org-scan-tags-NEW (action matcher todo-only &optional start-level)
  "Scan headline tags with inheritance and produce output ACTION.

ACTION can be `sparse-tree' to produce a sparse tree in the current buffer,
or `agenda' to produce an entry list for an agenda view.  It can also be
a Lisp form or a function that should be called at each matched headline, in
this case the return value is a list of all return values from these calls.

MATCHER is a Lisp form to be evaluated, testing if a given set of tags
qualifies a headline for inclusion.  When TODO-ONLY is non-nil,
only lines with a not-done TODO keyword are included in the output.
This should be the same variable that was scoped into
and set by `org-make-tags-matcher' when it constructed MATCHER.

START-LEVEL can be a string with asterisks, reducing the scope to
headlines matching this string."
  (require 'org-agenda)
  (let* ((re (concat "^"
		     (if start-level
			 ;; Get the correct level to match
			 (concat "\\*\\{" (number-to-string start-level) "\\} ")
		       org-outline-regexp)
		     " *\\(\\<\\("
		     (mapconcat 'regexp-quote org-todo-keywords-1 "\\|")
		     (org-re "\\)\\>\\)?[ \t]*\\(?:\\[#\\(.\\)\\]\\)?[ \t]*\\(.*?\\)\\(:[[:alnum:]_@#%:]+:\\)?[ \t]*$")))
	 (props (list 'face 'default
		      'done-face 'org-agenda-done
		      'undone-face 'default
		      'mouse-face 'highlight
		      'org-not-done-regexp org-not-done-regexp
		      'org-todo-regexp org-todo-regexp
		      'org-complex-heading-regexp org-complex-heading-regexp
		      'help-echo
		      (format "mouse-2 or RET jump to org file %s"
			      (abbreviate-file-name
			       (or (buffer-file-name (buffer-base-buffer))
				   (buffer-name (buffer-base-buffer)))))))
	 (case-fold-search nil)
	 (org-map-continue-from nil)
         lspos tags tags-list
	 (tags-alist (list (cons 0 org-file-tags)))
	 (llast 0) rtn rtn1 level category i txt
	 todo marker entry heading priority priority-num)
    (when (not (or (member action '(agenda sparse-tree)) (functionp action)))
      (setq action (list 'lambda nil action)))
    (save-excursion
      (goto-char (point-min))
      (when (eq action 'sparse-tree)
	(org-overview)
	(org-remove-occur-highlights))
      (while (re-search-forward re nil t)
	(setq org-map-continue-from nil)
	(catch :skip
	  (setq todo (if (match-end 1) (org-match-string-no-properties 2))
                priority (if (match-end 3) (org-match-string-no-properties 3))
                heading (org-match-string-no-properties 4)
		tags (if (match-end 5) (org-match-string-no-properties 5)))
	  (goto-char (setq lspos (match-beginning 0)))
	  (setq level (org-reduced-level (funcall outline-level))
		category (org-get-category))
	  (setq i llast llast level)
	  ;; remove tag lists from same and sublevels
	  (while (>= i level)
	    (when (setq entry (assoc i tags-alist))
	      (setq tags-alist (delete entry tags-alist)))
	    (setq i (1- i)))
	  ;; add the next tags
	  (when tags
	    (setq tags (org-split-string tags ":")
		  tags-alist
		  (cons (cons level tags) tags-alist)))
	  ;; compile tags for current headline
	  (setq tags-list
		(if org-use-tag-inheritance
		    (apply 'append (mapcar 'cdr (reverse tags-alist)))
		  tags)
		org-scanner-tags tags-list)
	  (when org-use-tag-inheritance
	    (setcdr (car tags-alist)
		    (mapcar (lambda (x)
			      (setq x (copy-sequence x))
			      (org-add-prop-inherited x))
			    (cdar tags-alist))))
	  (when (and tags org-use-tag-inheritance
		     (or (not (eq t org-use-tag-inheritance))
			 org-tags-exclude-from-inheritance))
	    ;; selective inheritance, remove uninherited ones
	    (setcdr (car tags-alist)
		    (org-remove-uninherited-tags (cdar tags-alist))))
	  (when (and

		 ;; eval matcher only when the todo condition is OK
		 (and (or (not todo-only) (member todo org-not-done-keywords))
		      (let ((case-fold-search t)) (eval matcher)))

		 ;; Call the skipper, but return t if it does not skip,
		 ;; so that the `and' form continues evaluating
		 (progn
		   (unless (eq action 'sparse-tree) (org-agenda-skip))
		   t)

		 ;; Check if timestamps are deselecting this entry
		 (or (not todo-only)
		     (and (member todo org-not-done-keywords)
			  (or (not org-agenda-tags-todo-honor-ignore-options)
			      (not (org-agenda-check-for-timestamp-as-reason-to-ignore-todo-item)))))

		 ;; Extra check for the archive tag
		 ;; FIXME: Does the skipper already do this????
		 (or
		  (not (member org-archive-tag tags-list))
		  ;; we have an archive tag, should we use this anyway?
		  (or (not org-agenda-skip-archived-trees)
		      (and (eq action 'agenda) org-agenda-archives-mode))))

	    ;; select this headline

	    (cond
	     ((eq action 'sparse-tree)
	      (and org-highlight-sparse-tree-matches
		   (org-get-heading) (match-end 0)
		   (org-highlight-new-match
		    (match-beginning 1) (match-end 1)))
	      (org-show-context 'tags-tree))
	     ((eq action 'agenda)
	      (setq txt (org-agenda-format-item
			 ""
			 (concat
			  (if (eq org-tags-match-list-sublevels 'indented)
			      (make-string (1- level) ?.) "")
			  (org-get-heading))
			 category
			 tags-list
			 )
		    priority-num (org-get-priority txt))
	      (goto-char lspos)
	      (setq marker (org-agenda-new-marker))
	      (org-add-props txt props
		'org-marker marker 'org-hd-marker marker 'org-category category
		'todo-state todo
		'priority priority-num 'type "tagsmatch")
	      (push txt rtn))
	     ((functionp action)
	      (setq org-map-continue-from nil)
	      (save-excursion
		(setq rtn1 (funcall action))
		(push rtn1 rtn)))
	     (t (error "Invalid action")))

	    ;; if we are to skip sublevels, jump to end of subtree
	    (unless org-tags-match-list-sublevels
	      (org-end-of-subtree t)
	      (backward-char 1))))
	;; Get the correct position from where to continue
	(if org-map-continue-from
	    (goto-char org-map-continue-from)
	  (and (= (point) lspos) (end-of-line 1)))))
    (when (and (eq action 'sparse-tree)
	       (not org-sparse-tree-open-archived-trees))
      (org-hide-archived-subtrees (point-min) (point-max)))
    (nreverse rtn)))


;; Temporary code to help with interactive testing
;;
;;  I've added a `-NEW' to the names of the modified functions and save
;;  the originals belowo with a `-ORIGINAL' added. After loading this
;;  file, you can do
;;  
;;      (org-tmp-use-tag-parser 'new)
;;  and
;;      (org-tmp-use-tag-parser 'original)
;;  
;;  two switch between versions and try them out. Or just use the
;;  names with suffixes directly. See also the tests in `tag-query-tests.el'.


(fset 'org-scan-tags-ORIGINAL (symbol-function 'org-scan-tags))
(fset 'org-make-tags-matcher-ORIGINAL (symbol-function 'org-make-tags-matcher))

(defvar org-tmp-which-tag-parser 'original)
(defun org-tmp-use-tag-parser (&optional which)
  "Switch between tag query parsers. 
If non-nil, WHICH must be either 'new or 'original. If nil, it toggles."
  (setq org-tmp-which-tag-parser
        (or which (if (eq org-tmp-which-tag-parser 'original) 'new 'original)))
  (ecase org-tmp-which-tag-parser
    (new
     (fset 'org-scan-tags (symbol-function 'org-scan-tags-NEW))
     (fset 'org-make-tags-matcher (symbol-function 'org-make-tags-matcher-NEW)))
    (original
     (fset 'org-scan-tags (symbol-function 'org-scan-tags-ORIGINAL))
     (fset 'org-make-tags-matcher (symbol-function 'org-make-tags-matcher-ORIGINAL))))
  org-tmp-which-tag-parser)


;;; org-tag-query-parse.el ends here

[-- Attachment #3: tag-query-tests.el --]
[-- Type: application/octet-stream, Size: 27033 bytes --]

;first=>; (load "org-tag-query-parse.el")

(eval-when-compile
  (require 'cl))

;; A very rudimentary test framework

;;; Comparing org-tag-query-parser and org-make-tag-matcher is
;;; complicated by the different ordering of leaves in the trees.
;;; Specifically, the former puts the terms in the given order,
;;; but the latter (the existing org code) reverses the terms.
;;; Parsing the string to reverse would require testing the secondary
;;; parser and turtles all the way down.
;;;
;;; Two approaches then: specify the strings manually in pairs, or
;;; define transform that accounts for the differences. Here
;;; `tag-test-suite' mostly takes the former approach, unless only one
;;; string is given in which case it uses `tag-test-transform' to remap
;;; the original forms. The function `tag-test-transform' tansforms the
;;; existing forms into new forms except it ignores PRIORITY and HEADING
;;; queries which are treated differently in the new code.

(defun tag-test-transform (matcher)
  (let ((spec
         (if (and (eq (cadr matcher) 'and)
                  (eq (car (last matcher)) t))
             (cons (car matcher) (car (cddr matcher)))
           matcher)))
    (if (listp (cdr spec))
        (mapcar 'tag-test-transform-1 spec)
      spec)))

(defun tag-test-transform-1 (spec)
  (if (atom spec)
      spec
    (case (car spec)
      (and
       (cons 'and (nreverse (mapcar 'tag-test-transform-1 (cdr spec)))))
      (or
       (if (and (null (nthcdr 3 spec))
                (equal (car (cddr spec)) ""))
           spec
         (cons 'or (nreverse (mapcar 'tag-test-transform-1 (cdr spec))))))
      (not
       (if (eq (car (cadr spec)) 'string-match)
           (list 'org-string-match<>
                 (car (cddr (cadr spec)))
                 (cadr (cadr spec)))
         spec))
      (string-match
       (list 'org-string-match= (car (cddr spec)) (cadr spec)))
      (t
       (cons (tag-test-transform-1 (car spec))
             (mapcar 'tag-test-transform-1 (cdr spec)))))))

(defun tag-test-find-todo-query (query-string)
  "Does query string contain a todo match expression? 
Search for the first / that is not between quotes or braces, and
return the index of that character if found, or nil.
Set match data for QUERY-STRING so that group 0 spans from the
found / to the end of the string, group 1 matches \"/!?\\s-*\" at
the found /, and group 2 matches the ! if present."
  (with-temp-buffer
    (insert query-string)
    (goto-char (point-min))
    ;; Search for first / that is not between ""'s or {}'s
    (catch :found-slash
      (while (re-search-forward "\\(/\\(!\\)?\\s-*\\)\\|[\"{]" nil t)
        (when (match-end 1)
          (set-match-data  
           (mapcar '1-    ; set indices using string convention
                   (nconc (list (match-beginning 0) (point-max)    ;0 / to end
                                (match-beginning 1) (match-end 1)) ;1 /!?\\s-*
                          (if (match-end 2)                        ;2 !?
                              (list (match-beginning 2) (match-end 2))
                            nil))))
          (throw :found-slash (1- (match-beginning 1))))
        (goto-char (match-beginning 0))
        (case (char-after)
          (?\" (org-read-quoted-string-in-query))
          (?\{ (org-read-balanced-string ?\{ ?\}))))
      nil)))

(defun tag-test-m (query &optional originalp)
  "Call `org-make-tags-matcher' on QUERY. 
New version by default, original version if ORIGINALP is non-nil."
  (let ((todo-only nil))
    (funcall
     (if originalp
         'org-make-tags-matcher-ORIGINAL
       'org-make-tags-matcher-NEW)
     query)))

(defun tag-test-parse-tree (query)
  "Return (just) the parse tree for query produced by `org-tag-query-parser'"
  (car (nthcdr 2 (org-tag-query-parse query))))
;; useful at repl: (defun tq (query) (cons query (tag-test-parse-tree query)))


(defun tag-test-compare (new original)
  (if (and (or (eq (cdr new) t) (eq (cadr new) 'progn))
           (eq (cadr original) 'and)
           (eq (car (last original)) t))
      (equal `(and ,(cdr new) t) (cdr original))
    (equal (cdr new) (cdr original))))
;; formerly returned (and __ (format "%s<==>%s" (car new) (car original)))

(org-defhash-at-compile tag-test-suite-table ()
  "Mapping from test name symbols to tag test functions. Each
function takes an optional argument, which if non-nil, causes a
simple boolean summary to be returned. Otherwise, the function
returns the list of results forms. Call with `tag-test-run'
giving name and optional summarize argument.")

(defmacro tag-test-suite (name &rest body)
  "Register test NAME. If NAME is nil, do not save the test, run it
now with summarize argument t."
  (declare (indent 1))
  (flet ((test-it (spec)
                  (let ((query (car spec))
                        (obj   (cdr spec)))
                    (cond
                     ((null obj)
                      `(let ((todo-only nil)) ; needs to be scoped in
                         (condition-case exception
                             (cons
                              (equal
                               (org-make-tags-matcher-NEW ,query)
                               (tag-test-transform
                                (org-make-tags-matcher-ORIGINAL ,query)))
                              ,query)
                           (error (cons nil
                                        (format "ERROR on query %s: %s"
                                                ,query (cadr exception)))))))
                     ((stringp obj)
                      `(let ((todo-only nil)) ; needs to be scoped in
                         (condition-case exception
                             (cons
                              (tag-test-compare
                               (org-make-tags-matcher-NEW ,query)
                               (org-make-tags-matcher-ORIGINAL ,obj))
                              ,query)
                           (error (cons nil
                                        (format "ERROR on query %s: %s"
                                                ,query (cadr exception)))))))
                     ((eq obj 'error)
                      `(condition-case exception
                           (let ((todo-only nil)) ; needs to be scoped in
                             (org-make-tags-matcher-NEW ,query)
                             (cons nil ,query))
                         (error (cons (cadr exception) ,query))))
                     ((eq obj t)
                      `(condition-case exception
                           (cons
                            (equal (tag-test-m ,query)
                                   '(,query . t))
                            ,query)
                         (error (cons nil
                                        (format "ERROR on query %s: %s"
                                                ,query (cadr exception))))))
                     ((and (consp obj)
                           (tag-test-find-todo-query query))
                      `(condition-case exception
                           (cons
                            (equal (tag-test-m ,query)
                                   ',(cons query (car obj)))
                            ,query)
                         (error (cons nil
                                        (format "ERROR on query %s: %s"
                                                ,query (cadr exception))))))
                     ((consp obj)
                      `(condition-case exception
                           (cons
                            (equal (tag-test-m ,query)
                                   '(,query progn
                                            (setq org-cached-props nil)
                                            ,(car obj)))
                            ,query)
                         (error (cons nil
                                        (format "ERROR on query %s: %s"
                                                ,query (cadr exception))))))
                     (t (error "Unrecognized test type"))))))
    (let* ((result-forms (mapcar 'test-it body))
           (test `(lambda (&optional summarize?)
                    (let ((tests (list ,@result-forms)))
                      (if summarize?
                          (let ((outcome (catch :failure
                                           (dolist (result tests t)
                                             (unless (car result)
                                               (throw :failure nil))))))
                            (if outcome
                                (message "All %s tests passed." ',name)
                              (message "Some %s tests failed." ',name))
                            outcome)
                        (list ,@result-forms))))))
      (if name
          `(progn (puthash ',name ,test tag-test-suite-table) t)
        `(funcall ,test t)))))

(defun tag-test-run (summarize? &rest suite-symbols)
  (let ((summarize (and summarize?
                        (not (memq summarize? '(:show :results :result)))))
        (results nil)
        (suites (if suite-symbols
                    suite-symbols
                  (loop for key being the hash-keys of tag-test-suite-table
                        collect key))))
    (dolist (suite suites)
      (condition-case except
          (let ((test (gethash suite tag-test-suite-table)))
            (if (and test (functionp test))
                (push (cons suite (funcall test summarize)) results)
              (push (cons suite
                          (cons nil (format "Test suite %s not found" suite)))
                    results)))
        (error (push (format "Uncaught error on suite %s: %s"
                             suite except)
                     results))))
    (if (cdr results) (nreverse results) (car results))))


;;
;; Some Basic Tests
;;
;; Keep in mind term reversal mentioned above
;;

(tag-test-suite org-comparison-1
  ("")
  ("foo")
  ("-foo")
  ("{^HU+RRAY}")
  ("-{^BO*O!}")
  ("LEVEL<3")
  ("LEVEL>3")
  ("LEVEL<=3")
  ("LEVEL>=3")
  ("LEVEL=3")
  ("LEVEL<>3")
  ("TODO=\"WAIT\"")
  ("TODO<>\"WAIT\"")
  ("A_PROP<\"foo\"")
  ("A_PROP>\"foo\"")
  ("A_PROP<=\"foo\"")
  ("A_PROP>=\"foo\"")
  ("A_PROP<>\"foo\"")
  ("A_PROP=\"foo\"")
  ("A_DATE=\"<2008-12-24 18:30>\"")
  ("A_DATE<\"<2008-12-24 18:30>\"")
  ("A_DATE<=\"<2008-12-24 18:30>\"")
  ("A_DATE>=\"<2008-12-24 18:30>\"")
  ("A_DATE>\"<2008-12-24 18:30>\"")
  ("A_DATE<>\"<2008-12-24 18:30>\"")
  ("DEADLINE<>\"<-2d>\"")
  ("DEADLINE=\"<+1w>\"")
  ("DEADLINE>\"<+60m>\"")
  ("DEADLINE<\"<today>\"")
  ("SCHEDULED>=\"<tomorrow>\"")
  ("SCHEDULED<=\"<+2y>\"")
  ("CATEGORY=\"foo\"")
  ("foo+PROP=\"A\"+Z={abc}-bar")
  ("foo+bar+zap")
  ("foo-bar+zap")
  ("-foo+bar-zap")
  ("foo|bar|-zap")
  ("-foo+bar+zap-{^a.*}")
  ("-{^abc}+{^a}")
  ("-{[0-9]}|zap9@#%it|LEVEL>5")
  ("{^A}|{^.B}|{^C}")
  ("{^A}|ok-zap|{^C}")
  ("work+TODO=\"WAITING\"|home+TODO=\"WAITING\"")
  ("work-TODO=\"WAITING\"|home&TODO=\"WAITING\"")
  ("zap-bar+foo&LEVEL>2")
  ("-zap+bar|LEVEL<=2&TODO<>\"WAIT\"")
  ("+work-boss+PRIORC=\"A\"+Coffee=\"unlimited\"+Effort<2+With={Sarah\\|Denny}+SCHEDULED>=\"<2008-10-11>\"")
  )

;; Some of these are the same as above, but I've explicitly reversed
;; the terms to make sure that the transformer is not masking any problems.
;; This necessarily excludes the other transformations so it is really a meta-test.
;; The rest include todo matchers in the comparison.
(tag-test-suite org-comparison-2
  ("foo+bar+zap" . "zap+bar+foo")
  ("foo-bar+zap" . "zap-bar+foo")
  ("-foo+bar-zap" . "-zap+bar-foo")
  ("foo|bar|-zap" . "-zap|bar|foo")
  ("-foo+bar+zap-{^a.*}" . "-{^a.*}+zap+bar-foo")
  ("-{^abc}+{^a}" . "{^a}-{^abc}")
  ("-{[0-9]}|zap9@#%it|LEVEL>5" . "LEVEL>5|zap9@#%it|-{[0-9]}")
  ("{^A}|{^.B}|{^C}" . "{^C}|{^.B}|{^A}")
  ("{^A}|ok-zap|{^C}" . "{^C}|-zap+ok|{^A}")
  ("work+TODO=\"WAITING\"|home+TODO=\"WAITING\"" . "TODO=\"WAITING\"+home|TODO=\"WAITING\"+work")
  ("work-TODO=\"WAITING\"|home&TODO=\"WAITING\"" . "TODO=\"WAITING\"&home|-TODO=\"WAITING\"+work")
  ("zap -bar   +foo & LEVEL > 2" . "LEVEL>2&foo-bar+zap")
  ("-zap+bar | LEVEL <= 2 & TODO <> \"WAIT\"" . "TODO<>\"WAIT\"&LEVEL<=2|bar-zap")
  ("foo+bar/TODO+WAIT-DONE" . "bar+foo/-DONE+WAIT+TODO")
  ("foo+bar/TODO+WAIT|URGENT" . "bar+foo/URGENT|WAIT+TODO")
  ("/A")
  ("/A-{AIT}")
  ("foo-bar/A+B|C-D|{W+}")
  ("/A-{AIT}")
  ("foo-bar/A+B|C-D|{W+}")
  ("-zap+bar|LEVEL<=2&TODO<>\"WAIT\"/OK-NOT|GOOD-BAD|HURRAY-BOO"))

;; some of the / todo matches are practically silly, but testing structure here
(tag-test-suite match-results-1
  ("" . t)
  ("           /!" 
   (and (member todo org-not-done-keywords) t))
  ("           /!              " 
   (and (member todo org-not-done-keywords) t))
  ("A_PROP={^0x[0-9A-F]+}"
   (org-string-match=
    (or (org-cached-entry-get nil "A_PROP") "") "^0x[0-9A-F]+"))
  ("A_PROP<>{^[A-Z]+}"
   (org-string-match<>
    (or (org-cached-entry-get nil "A_PROP") "") "^[A-Z]+"))
  ("(a+b-c|A_PROP==2|-(d-e+f&LEVEL>3))"
   (or
    (and
     (member "a" tags-list)
     (member "b" tags-list)
     (not (member "c" tags-list)))
    (= (string-to-number (or (org-cached-entry-get nil "A_PROP") "")) 2)
    (not (and
          (member "d" tags-list)
          (not (member "e" tags-list))
          (member "f" tags-list)
          (> level 3)))))
  ("((c))" (member "c" tags-list))
  ("((-(((-(c))))))" (member "c" tags-list))
  ("((-(-((-(c))))))" (not (member "c" tags-list)))
  ("-(zap -bar   +foo & LEVEL > 2 | HEADING == {Z{{3,7}}})"
   (not (or (and
             (member "zap" tags-list)
             (not (member "bar" tags-list))
             (member "foo" tags-list)
             (> level 2))
            (org-string-match= (or heading "") "Z{3,7}"))))
  ("zap -bar   +foo & LEVEL > 2" 
   (and 
    (member "zap" tags-list)
    (not (member "bar" tags-list))
    (member "foo" tags-list)
    (> level 2)))
  ("-zap+bar | LEVEL <= 2 & TODO <> \"WAIT\"" 
   (or
    (and
     (not (member "zap" tags-list))
     (member "bar" tags-list))
    (and
     (<= level 2)
     (org-string<> (or todo "")  "WAIT"))))
  ("-(zap|{^A}|LEVEL=2)"
   (not (or
         (member "zap" tags-list)
         (org-match-any-p "^A" tags-list)
         (= level 2))))
  ("/!TODO"
   (and
    (member todo org-not-done-keywords)
    t
    (equal todo "TODO")))
  ("/TODO"
   (and t (equal todo "TODO")))
  ("/!"
   (and (member todo org-not-done-keywords) t))
  ("abc-uvw+xyz/!TODO"
   (and
    (member todo org-not-done-keywords)
    (progn
      (setq org-cached-props nil)
      (and
       (member "abc" tags-list)
       (not (member "uvw" tags-list))
       (member "xyz" tags-list)))
    (equal todo "TODO")))
  ("abc-uvw+xyz+LEVEL<=3/TODO|HOLDING-WAITING|AVOIDING-REALLY_AVOIDING" 
   (and
    (progn
      (setq org-cached-props nil)
      (and
       (member "abc" tags-list)
       (not (member "uvw" tags-list))
       (member "xyz" tags-list)
       (<= level 3)))
    (or
     (equal todo "TODO")
     (and
      (equal todo "HOLDING")
      (not (equal todo "WAITING")))
     (and
      (equal todo "AVOIDING")
      (not (equal todo "REALLY_AVOIDING"))))))
  ("abc-uvw+xyz+LEVEL<=3/!TODO|HOLDING-WAITING|AVOIDING-REALLY_AVOIDING"
   (and
    (member todo org-not-done-keywords)
    (progn
      (setq org-cached-props nil)
      (and
       (member "abc" tags-list)
       (not (member "uvw" tags-list))
       (member "xyz" tags-list)
       (<= level 3)))
    (or
     (equal todo "TODO")
     (and
      (equal todo "HOLDING")
      (not (equal todo "WAITING")))
     (and
      (equal todo "AVOIDING")
      (not (equal todo "REALLY_AVOIDING"))))))
  ("abc-uvw+xyz+LEVEL<=3/!TODO|{Y}-{Z}|{A}-{B}"
   (and
    (member todo org-not-done-keywords)
    (progn
      (setq org-cached-props nil)
      (and
       (member "abc" tags-list)
       (not (member "uvw" tags-list))
       (member "xyz" tags-list)
       (<= level 3)))
    (or
     (equal todo "TODO")
     (and
      (org-string-match= todo "Y")
      (org-string-match<> todo "Z"))
     (and
      (org-string-match= todo "A")
      (org-string-match<> todo "B")))))
  ("PROP={^\\s-*// .*$}/A{}B"
   (and
    (progn
      (setq org-cached-props nil)
      (org-string-match= (or (org-cached-entry-get nil "PROP") "")
                         "^\\s-*// .*$"))
    (and
     (equal todo "A")
     (org-string-match= todo "")
     (equal todo "B")))) ; See Note h for why the current 7.8 code does not work for this case
  ("PROP={^\\s-*// .*$}/A+{^.*ab$}+B" 
   (and
    (progn
      (setq org-cached-props nil)
      (org-string-match= (or (org-cached-entry-get nil "PROP") "")
                         "^\\s-*// .*$"))
    (and
     (equal todo "A")
     (org-string-match= todo "^.*ab$")
     (equal todo "B"))))
  ("PROP={^\\s-*// .*$}/A+{^.*ab{{1,3}}$}+B" 
   (and
    (progn
      (setq org-cached-props nil)
      (org-string-match= (or (org-cached-entry-get nil "PROP") "")
                         "^\\s-*// .*$"))
    (and
     (equal todo "A")
     (org-string-match= todo "^.*ab{1,3}$")
     (equal todo "B"))))
  ("PROP={^\\s-*// .*$}/A-{^.*ab$}-B|-C+{^c*$}" 
   (and
    (progn
      (setq org-cached-props nil)
      (org-string-match= (or (org-cached-entry-get nil "PROP") "") 
                         "^\\s-*// .*$"))
    (or
     (and
      (equal todo "A")
      (org-string-match<> todo "^.*ab$")
      (not (equal todo "B")))
     (and
      (not (equal todo "C"))
      (org-string-match= todo "^c*$")))))
  ("/!A" 
   (and
    (member todo org-not-done-keywords)
    t
    (equal todo "A")))
  ("foo+bar/!TODO+WAIT-DONE" 
   (and
    (member todo org-not-done-keywords)
    (progn
      (setq org-cached-props nil)
      (and
       (member "foo" tags-list)
       (member "bar" tags-list)))
    (and
     (equal todo "TODO")
     (equal todo "WAIT")
     (not (equal todo "DONE")))))
  ("foo+bar/!TODO+WAIT|URGENT"
   (and
    (member todo org-not-done-keywords)
    (progn
      (setq org-cached-props nil)
      (and
       (member "foo" tags-list)
       (member "bar" tags-list)))
    (or
     (and
      (equal todo "TODO")
      (equal todo "WAIT"))
     (equal todo "URGENT"))))
  ("foo+bar/!TODO|WAIT|URGENT"
   (and
    (member todo org-not-done-keywords)
    (progn
      (setq org-cached-props nil)
      (and
       (member "foo" tags-list)
       (member "bar" tags-list)))
    (or
     (equal todo "TODO")
     (equal todo "WAIT")
     (equal todo "URGENT"))))  
  ("foo+bar/!{TODO}-WAIT"
   (and
    (member todo org-not-done-keywords)
    (progn
      (setq org-cached-props nil)
      (and
       (member "foo" tags-list)
       (member "bar" tags-list)))
    (and
     (org-string-match= todo "TODO")
     (not (equal todo "WAIT"))))))

(tag-test-suite match-results-2
  ("((a +b -c & LEVEL > 2 | c & LEVEL == 1)-(HEADING={<NOTES>}|TODO=\"DONE\")|urgent|_queue&DEADLINE>\"<2012-01-01 04:00>\")"
   (or
    (and
     (or (and
          (member "a" tags-list)
          (member "b" tags-list)
          (not (member "c" tags-list))
          (> level 2))
         (and
          (member "c" tags-list)
          (= level 1)))
     (not
      (or (org-string-match= (or heading "") "<NOTES>")
          (string= (or todo "") "DONE"))))
    (member "urgent" tags-list)
    (and
     (member "_queue" tags-list)
     (org-time> (or (org-cached-entry-get nil "DEADLINE") "") 1325394000.0))))
  ("+you+me-them&PRIORITY==\"A\"+CATEGORY<>\"missing\""
   (and
    (member "you" tags-list)
    (member "me" tags-list)
    (not (member "them" tags-list))
    (string= (or priority "") "A")
    (org-string<>
     (or (get-text-property (point) 'org-category) "") "missing")))
  ("+you+me-them & PRIORITY == \"A\" + CATEGORY <> \"missing\""
   (and
    (member "you" tags-list)
    (member "me" tags-list)
    (not (member "them" tags-list))
    (string= (or priority "") "A")
    (org-string<> (or (get-text-property (point) 'org-category) "") "missing")))
  ("+you+me-them & PRIORITY < \"A\" | CATEGORY <> \"missing\" + us | HEADING={\\(?:[Ss]ecret\\){{1,3}}}"
   (or
    (and
     (member "you" tags-list)
     (member "me" tags-list)
     (not (member "them" tags-list))
     (string< (or priority "") "A"))
    (and
     (org-string<> (or (get-text-property (point) 'org-category) "")
                   "missing")
     (member "us" tags-list))
    (org-string-match= (or heading "") "\\(?:[Ss]ecret\\){1,3}")))
  ("PROP\\-WITH\\-HYPHENS=2"
   (=
    (string-to-number
     (or
      (org-cached-entry-get nil "PROP-WITH-HYPHENS")
      ""))
    2))
  ("PROP={^\\s-*// .*$}"
   (org-string-match=
    (or (org-cached-entry-get nil "PROP") "")
    "^\\s-*// .*$"))
  ("work-TODO=\"WAITING\"|home&TODO=\"WAITING\"/!" 
   (and
    (member todo org-not-done-keywords)
    (progn
      (setq org-cached-props nil)
      (or
       (and
        (member "work" tags-list)
        (not (string= (or todo "") "WAITING")))
       (and
        (member "home" tags-list)
        (string= (or todo "") "WAITING"))))))
  )

(tag-test-suite should-error-1
  ("()" . error) ; we might just do t here, but the parens suggest an error
  ("(&foo+LEVEL=1)" . error)
  ("(foo+LEVEL=1" . error)
  ("(foo+LEVEL=1))" . error)
  ("(abc+)" . error)
  ("abc+xyz!" . error)
  ("(missing+paren" . error)            
  ("PROP={.*closing brace?" . error) 
  ("PROP=\"abc" . error)
  ("P<<2-bad+cmp+op" . error)
  ("P<={^foo}" . error)
  ("PROP<-.dx" . error)           ; bad number
  ("!?;" . error)
  ("   [x]" . error)
  ("(foo+bar/TODO" . error)
  ("(foo+bar)+/TODO" . error)
  ("foo+bar/!TODO|{AC" . error)
  ("(((foo))))/TODO" . error)
  ("foo+" . error)
  ("foo/TODO-" . error)
  )

;;; Miscellaneous Other tests -- really need a complete framework here.

(defun tag-test-find-todo-query (s)
  (if (org-find-todo-query s)
      (vector (substring s 0 (match-beginning 1))
              (substring s (match-end 1))
              (match-end 2)
              (= (match-end 0) (match-end 1)))
    (vector s nil nil nil)))

(defun tag-test-approximate-todo-check (s)
  (if (string-match "\\(/\\(!\\)?\\s-*\\)[^{}\"]*$" s)
      (vector (substring s 0 (match-beginning 1))
              (substring s (match-end 1))
              (match-end 2)
              (= (match-end 0) (match-end 1)))
    (vector s nil nil nil)))

(defun tqs (q)
  (let ((todo-only nil))
    (cdr (org-make-tags-matcher-NEW q))))

(defun tag-test-scan (s point-list &optional stay-put)
  (prog1
      (if (atom point-list)
          (org-scan-tags-NEW 'point (tqs s) nil)
        (cons (equal (org-scan-tags-NEW 'point (tqs s) nil) point-list) s))
    (unless stay-put
      (goto-char (point-min)))))

(defun tag-test-scan-do-tests (tests)
  (let ((contents
         "* [#B] A heading One                                                       :xyz:
** Put a One here too at level Two
*** And a level Three, also One   
* [#C] B heading Two                                                       :uvw:
** Another                                                                 :abc:
* [#A] C heading Three                                                     :xyz:
* <NOTES>                                                                  :wtf:
** More more more
*** TODO Onward
*** TODO Upward
*** WATT What comes up...
*** DONE Gliding home
**** Just a test    
* [#A] D heading Four                                                      :uvw:
* TODO E heading Five                                              :abc:uvw:xyz:
  SCHEDULED: <2012-07-31 Tue>
** Answers
   + xyz: 1 314 676
   + xyz-abc: 1 314  
   + Priority A: 314 595
   + Priority not empty: 1 152 314 595
   + Priority empty, Level 1: 395 676
   + Scheduled after <2012-07-01 00:00>: 676
   + TODO=\"TODO\": 494 510 676
   + TODO=\"TODO\", Level>1: 494 510
   + HEADING={<.*>}: 395
   + HEADING={One\\|Two}, Level <= 2: 1 82 152
"))
    (with-temp-buffer
      (set (make-local-variable 'org-tags-column) -80)
      (org-mode)
      (insert contents)
      (goto-char (point-min))
      (mapcar
       (lambda (v) (tag-test-scan (car v) (cdr v)))
       tests))))

(defun tag-test-other-tests ()
  (list
   ;; Testing the org-find-todo-query function
   (mapcar
    (lambda (s-a)
      (cons
       (equal (tag-test-find-todo-query (car s-a)) (cadr s-a))
       (car s-a))) 
    '(("foo+bar/A-B+C"                             ["foo+bar" "A-B+C" nil nil])
      ("foo+bar/"                                  ["foo+bar" "" nil t])
      ("foo+bar/         "                         ["foo+bar" "" nil t])
      ("foo+bar/         a"                        ["foo+bar" "a" nil nil])
      ("foo+bar/!A-B+C"                            ["foo+bar" "A-B+C" 9 nil])
      ("foo+bar/!"                                 ["foo+bar" "" 9 t])
      ("foo+bar/!         "                        ["foo+bar" "" 9 t])
      ("foo+bar/!         a"                       ["foo+bar" "a" 9 nil])
      ("PROP={^\\s-*// .*$}"                       ["PROP={^\\s-*// .*$}" nil nil nil])
      ("PROP={^\\s-*// .*$}+A=\"/!A-B\"/!A-B"      ["PROP={^\\s-*// .*$}+A=\"/!A-B\"" "A-B" 30 nil])
      ("PROP={^\\s-*// .*$}+A=\"/!A-B\"/!        " ["PROP={^\\s-*// .*$}+A=\"/!A-B\"" "" 30 t])))
   ;; Testing that the todo queries work with the existing string-match
   (mapcar
    (lambda (s-a)
      (cons
       (equal (tag-test-approximate-todo-check (car s-a)) (cadr s-a))
       (car s-a))) 
    '(("foo+bar/A-B+C"                             ["foo+bar" "A-B+C" nil nil])
      ("foo+bar/"                                  ["foo+bar" "" nil t])
      ("foo+bar/         "                         ["foo+bar" "" nil t])
      ("foo+bar/         a"                        ["foo+bar" "a" nil nil])
      ("foo+bar/!A-B+C"                            ["foo+bar" "A-B+C" 9 nil])
      ("foo+bar/!"                                 ["foo+bar" "" 9 t])
      ("foo+bar/!         "                        ["foo+bar" "" 9 t])
      ("foo+bar/!         a"                       ["foo+bar" "a" 9 nil])
      ("PROP={^\\s-*// .*$}"                       ["PROP={^\\s-*// .*$}" nil nil nil])
      ("PROP={^\\s-*// .*$}+A=\"/!A-B\"/!A-B"      ["PROP={^\\s-*// .*$}+A=\"/!A-B\"" "A-B" 30 nil])
      ("PROP={^\\s-*// .*$}+A=\"/!A-B\"/!        " ["PROP={^\\s-*// .*$}+A=\"/!A-B\"" "" 30 t])))
   ;; Testing that the matchers work with org-scan-tags
   (let ((scan-tests '(("xyz+LEVEL=1" 1 314 676)
                       ("xyz-abc+LEVEL=1" 1 314)
                       ("xyz" 1 82 117 314 676 787)
                       ("xyz-abc" 1 82 117 314)
                       ("HEADING={<.*>}" 395)
                       ("HEADING={One\\|Two} & LEVEL <= 2" 1 82 152)
                       ("PRIORITY=\"A\"" 314 595)
                       ("PRIORITY<>\"\"" 1 152 314 595)
                       ("PRIORITY=\"\" + LEVEL == 1" 395 676)
                       ("SCHEDULED>=\"<2012-07-01 00:00>\"" 676)
                       ("TODO=\"TODO\"" 494 510 676)
                       ("TODO=\"TODO\"  & LEVEL > 1" 494 510))))
     (tag-test-scan-do-tests scan-tests))))


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: full parser implementation for tag queries (parentheses, fast heading match, and more)
  2012-08-04 18:32   ` Christopher Genovese
@ 2012-08-16  5:02     ` Samuel Wales
  2012-08-16 17:47       ` Christopher Genovese
  0 siblings, 1 reply; 5+ messages in thread
From: Samuel Wales @ 2012-08-16  5:02 UTC (permalink / raw)
  To: Christopher Genovese; +Cc: emacs-orgmode

You have really dived into this.  I think it's excellent to allow more
flexibility in searches.

Just a brainstorm question, but having just modified the code, how
difficult do you think it would be to provide a sexp syntax?

Despite all of your obvious hard work, I'd find sexp easier to look
up, make sense of, and remember.  I favor identifiers-like-this over
single-character symbols, and (expressions (like this)) over
precedence rules.

Maybe just me though.  :)

Samuel

-- 
The Kafka Pandemic: http://thekafkapandemic.blogspot.com

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: full parser implementation for tag queries (parentheses, fast heading match, and more)
  2012-08-16  5:02     ` Samuel Wales
@ 2012-08-16 17:47       ` Christopher Genovese
  0 siblings, 0 replies; 5+ messages in thread
From: Christopher Genovese @ 2012-08-16 17:47 UTC (permalink / raw)
  To: Samuel Wales; +Cc: emacs-orgmode


[-- Attachment #1.1: Type: text/plain, Size: 10508 bytes --]

Hi Samuel,

  Thanks for your note.  Just FYI, the earlier parser code in this thread
has
been superseded by the code in the post "new tag query parser [3/5]".

   The sexp input is a nice idea, and it would be very easy I think.
The function org-make-tags-matcher now takes a query string
but could easily be modified to operate on a form as well.

    I've included some code below that basically does the job. It defines a
function
`mtrans' that transforms a sexp representation into a matcher.  It can
certainly be
better optimized (and fully tested), but I think it would do just what you
want
if inserted in org-make-tags-matcher. (Note: The car of the matcher is the
query string,
for reasons that aren't entirely clear. Because this is dropped anyway in
practice,
I don't bother making it accurate in this code. As such, I'll just give the
cdr in the examples below.)

    A few examples follow to give the idea and show the results. It seems
to handle
all the cases nicely. In the sexp representation, strings stand for exact
string matches
and  both [<string>] and (re <string>) stand for regex matches, with
symbols
for properties and standard boolean and comparison ops in the form.
The keyword :todo-only acts like /! in the query strings, and = and <> also
allow arbitrary lisp code for the property comparison with equal (as long
as the
form does not start with or, and, not, or re but then it can be shielded
with (identity ...)).

    I then append the code, which is also attached. I see no problems
with adding this to org-make-tags-matcher and would be interested in other
opinions.

    Best,

      Christopher

;;; Examples

(mtrans "foo")     ; corresponds to query string  "foo"
  => (member "foo" tags-list)

(mtrans ["^f"])   ; or (mtrans '(re "^f"))  corresponds to query string
"{^f}"
  =>  (progn
        (setq org-cached-props nil)
        (org-match-any-p "^f" tags-list))

(mtrans '(and "foo" (not "bar") (re "^u.*v")))  ; query string
"foo-bar+{^u.*v}
  => (progn
       (setq org-cached-props nil)
       (and
        (member "foo" tags-list)
        (not (member "bar" tags-list))
        (org-match-any-p "^u.*v" tags-list)))

(mtrans '(or (and "xyz" (= TODO ["^T"]) ["u\\{2,4\\}"] (<= LEVEL 3))
             (> APROP "foo")
             (and (= BPROP 4) (<> HEADING "ignore"))))
                                          ; query string
"xyz+TODO={^T}+{u\\{{2,4\\}}}+LEVEL<=3 | APROP > \"foo\" | BPROP=4&HEADING
<> \"ignore\""
  => (progn
       (setq org-cached-props nil)
       (or
        (and
         (member "xyz" tags-list)
         (org-string-match= (or todo "") "^T")
         (org-match-any-p "u\\{2,4\\}" tags-list)
         (<= level 3))
        (org-string> (or (org-cached-entry-get nil "APROP") "") "foo")
        (and
         (= (org-cached-entry-get nil "BPROP") 4)
         (org-string<> (or heading "") "ignore"))))

(mtrans '(or (and "foo" (not "bar") ["^u.*v"] (> LEVEL 2))
             (= APROP "foo")
             (and (= BPROP ["/.*/"]) (<> BPROP "/ignore/"))
             (<> TODO "TODO")
             (> SCHEDULED "<2008-11-12>")))
              ; query string "foo-bar+{^u.*v}+LEVEL>2 | APROP=\"foo\"|
BPROP={/.*/} & BPROP <> "/ignore/" | TODO<>\"TODO\" | SCHEDULED >
\"<2008-11-12>\""
  => (progn
       (setq org-cached-props nil)
       (or
        (and
         (member "foo" tags-list)
         (not (member "bar" tags-list))
         (org-match-any-p "^u.*v" tags-list)
         (> level 2))
        (string= (or (org-cached-entry-get nil "APROP") "") "foo")
        (and
         (org-string-match= (or (org-cached-entry-get nil "BPROP") "")
"/.*/")
         (org-string<> (or (org-cached-entry-get nil "BPROP") "")
"/ignore/"))
        (org-string<> (or todo "") "TODO")
        (org-time> (or (org-cached-entry-get nil "SCHEDULED") "")
1226466000.0)))

(mtrans '(and :todo-only
              (or (and (not ["^abc"]) ["ex"] (= A_PROP "foo"))
                  (>= B_PROP 1.2e10)
                  (and (< D_PROP "<2008-12-24 18:30>") (= FOO (call other
lisp code here))))))
           ; except for FOO part which has no analogue, query string
"-{^abc}+{ex}&A_PROP=\"foo\" | B_PROP > 1.2e10 | DROP < \"<2008-12-24
18:30>\" & FOO = ..."
  => (progn
       (setq org-cached-props nil)
       (and
        (member todo org-not-done-keywords)
        (or
         (and
          (not (org-match-any-p "^abc" tags-list))
          (org-match-any-p "ex" tags-list)
          (string= (or (org-cached-entry-get nil "A_PROP") "") "foo"))
         (>= (org-cached-entry-get nil "B_PROP") 12000000000.0)
         (and
          (org-time< (or (org-cached-entry-get nil "D_PROP") "")
1230094800.0)
          (equal (org-cached-entry-get nil "FOO") (call other lisp code
here))))))


;;; The Code

(eval-when-compile (require 'cl))

(defun mtrans (matcher-sexp)
  "Create a tag/todo matcher from a sexp representation.
In the sexp representation, components are transformed as follows:

  + A literal string becomes an exact tag match.
  + A [<string>] or (re <string>) becomes a tag regex match
  + (or <item>...), (and <item>...), (not <item>)
    act as boolean operators, and processing continues on the <item>'s
  + (<op> <lhs> <rhs>) is a property comparison, where op must be
    one of   <, <=, >=, >, =, ==, or <>. One of lhs or rhs must be a
    symbol naming a property and the other must be either a number,
    string, [<string>] or (re <string>) for regex, or a generic form.
    (Only =, ==, and <> are allowed on the latter two.) Property
    symbols TODO, PRIORITY, HEADING, CATEGORY, are handled specially,
    otherwise, the symbol name is used as the property name.

  + A keyword :todo-only restricts attention to not done todo keywords,
    like /! does in standard queries.

Returns a tags matcher in the standard form, although the string
in the car of the matcher is (for now) fake, i.e., the query
string would not generate the same (or any useful) matcher."
  (let ((query "!ignored!")) ; ignore making this now, as it is not really
used anyway
    (cons query
          (cond
           ((atom matcher-sexp) (mtrans-1 matcher-sexp))
           ((and (consp matcher-sexp) (listp (cdr matcher-sexp)))
            `(progn
               (setq org-cached-props nil)
               ,(mtrans-1 matcher-sexp)))
           (t (error "Badly formed matcher sexp"))))))

(defun mtrans-1 (mitem)
  (if (atom mitem)
      (cond
       ((eq mitem :todo-only)
        '(member todo org-not-done-keywords))
       ((stringp mitem)
        `(member ,mitem tags-list))
       ((and (vectorp mitem) (stringp (aref mitem 0)))
        `(org-match-any-p ,(aref mitem 0) tags-list))
       (t mitem))
    (let ((head (car mitem)))
      (case head
       ((or and)
        `(,head ,@(mapcar 'mtrans-1 (cdr mitem))))
       (not
        (when (cddr mitem) (error "not is a unary operator"))
        `(not ,(mtrans-1 (cadr mitem))))
       ((< <= >= > = == <>)
        (let* ((arg1 (cadr mitem))
               (arg2 (car (cddr mitem)))
               (rhs  (or (mtrans-cmp-rhs-p arg1)
                         (mtrans-cmp-rhs-p arg2))))
          (cond
           ((and (symbolp arg1) rhs)
            (mtrans-cmp head arg1 rhs))
           ((and (symbolp arg2) rhs)
            (mtrans-cmp head arg2 rhs))
           (t (error "Badly formed property comparison"))
           (mtrans-cmp head (cadr mitem) (car (cddr mitem))))))
       (re
        `(org-match-any-p ,(cadr mitem) tags-list))
       (t mitem)))))

(defun mtrans-cmp-rhs-p (item)
  (cond
   ((numberp item)
    `(number ,item))
   ((and (stringp item) (string-match-p "[[<].*?[]>]" item))
    `(time ,(org-matcher-time item)))
   ((stringp item)
    `(string ,item))
   ((and (vectorp item) (stringp (aref item 0)))
    `(re ,(aref item 0)))
   ((consp item)
    `(form ,item))
   (t nil)))

(defun org-not-equal (a b) (not (equal a b)))

(defvar mtrans-op-alist
  '((<  (number . <)  (string . string<)          (time . org-time<)  (re .
nil)                (form . nil))
    (<= (number . <=) (string . org-string<=)     (time . org-time<=) (re .
nil)                (form . nil))
    (>= (number . >=) (string . org-string>=)     (time . org-time>=) (re .
nil)                (form . nil))
    (>  (number . >)  (string . org-string>)      (time . org-time>)  (re .
nil)                (form . nil))
    (=  (number . =)  (string . string=)          (time . org-time>)  (re .
org-string-match=)  (form . equal))
    (== (number . =)  (string . string=)          (time . org-time=)  (re .
org-string-match=)  (form . equal))
    (<> (number . org<>)  (string . org-string<>) (time . org-time<>) (re .
org-string-match<>) (form . org-not-equal)))
  "Maps comparison operators and types to suitable comparison function.
A nil value means the comparison is erroneous.")

(defvar mtrans-special-props-alist
  `((TODO todo string re form)
    (LEVEL level number)
    (HEADING heading string re form)
    (PRIORITY priority string re form)
    (CATEGORY (get-text-property (point) 'org-category) string re form))
  "Maps special property names to their matcher symbol and constraints.
Each value is of the form (MATCHER-SYMBOL TYPE...), where TYPE is
a symbol for an allowed comparison value type.")

(defun mtrans-cmp (op prop obj)
  (let* ((type (car obj))
         (val  (cadr obj))
         (special (cdr (assoc prop mtrans-special-props-alist)))
         (prop-ref (or (car special)
                       `(org-cached-entry-get nil ,(symbol-name prop))))
         (func-alist (cdr (assq op mtrans-op-alist)))
         (func (cdr (assoc type func-alist))))
    (when (and special (not (memq type (cdr special))))
      (error "Type mismatch in %s comparison" prop))
    (when (null func)
      (error "Improper operator for %s comparison" type))
    `(,func ,(if (memq type '(number form)) prop-ref `(or ,prop-ref ""))
,val)))


On Thu, Aug 16, 2012 at 1:02 AM, Samuel Wales <samologist@gmail.com> wrote:

> You have really dived into this.  I think it's excellent to allow more
> flexibility in searches.
>
> Just a brainstorm question, but having just modified the code, how
> difficult do you think it would be to provide a sexp syntax?
>
> Despite all of your obvious hard work, I'd find sexp easier to look
> up, make sense of, and remember.  I favor identifiers-like-this over
> single-character symbols, and (expressions (like this)) over
> precedence rules.
>
> Maybe just me though.  :)
>
> Samuel
>
> --
> The Kafka Pandemic: http://thekafkapandemic.blogspot.com
>

[-- Attachment #1.2: Type: text/html, Size: 30205 bytes --]

[-- Attachment #2: tag-sexp-matchers.el --]
[-- Type: application/octet-stream, Size: 5102 bytes --]

(eval-when-compile (require 'cl))

(defun mtrans (matcher-sexp)
  "Create a tag/todo matcher from a sexp representation.
In the sexp representation, components are transformed as follows:

  + A literal string becomes an exact tag match.
  + A [<string>] or (re <string>) becomes a tag regex match
  + (or <item>...), (and <item>...), (not <item>)
    act as boolean operators, and processing continues on the <item>'s
  + (<op> <lhs> <rhs>) is a property comparison, where op must be
    one of   <, <=, >=, >, =, ==, or <>. One of lhs or rhs must be a
    symbol naming a property and the other must be either a number,
    string, [<string>] or (re <string>) for regex, or a generic form.
    (Only =, ==, and <> are allowed on the latter two.) Property
    symbols TODO, PRIORITY, HEADING, CATEGORY, are handled specially, 
    otherwise, the symbol name is used as the property name.

  + A keyword :todo-only restricts attention to not done todo keywords,
    like /! does in standard queries.

Returns a tags matcher in the standard form, although the string
in the car of the matcher is (for now) fake, i.e., the query
string would not generate the same (or any useful) matcher."
  (let ((query "!ignored!")) ; ignore making this now, as it is not really used anyway
    (cons query
          (cond
           ((atom matcher-sexp) (mtrans-1 matcher-sexp))
           ((and (consp matcher-sexp) (listp (cdr matcher-sexp)))
            `(progn
               (setq org-cached-props nil)
               ,(mtrans-1 matcher-sexp)))
           (t (error "Badly formed matcher sexp"))))))

(defun mtrans-1 (mitem)
  (if (atom mitem)
      (cond
       ((eq mitem :todo-only)
        '(member todo org-not-done-keywords))
       ((stringp mitem)
        `(member ,mitem tags-list))
       ((and (vectorp mitem) (stringp (aref mitem 0)))
        `(org-match-any-p ,(aref mitem 0) tags-list))
       (t mitem))
    (let ((head (car mitem)))
      (case head
       ((or and)
        `(,head ,@(mapcar 'mtrans-1 (cdr mitem))))
       (not
        (when (cddr mitem) (error "not is a unary operator"))
        `(not ,(mtrans-1 (cadr mitem))))
       ((< <= >= > = == <>)
        (let* ((arg1 (cadr mitem))
               (arg2 (car (cddr mitem)))
               (rhs  (or (mtrans-cmp-rhs-p arg1)
                         (mtrans-cmp-rhs-p arg2))))
          (cond
           ((and (symbolp arg1) rhs)
            (mtrans-cmp head arg1 rhs))
           ((and (symbolp arg2) rhs)
            (mtrans-cmp head arg2 rhs))
           (t (error "Badly formed property comparison"))
           (mtrans-cmp head (cadr mitem) (car (cddr mitem))))))
       (re
        `(org-match-any-p ,(cadr mitem) tags-list))
       (t mitem)))))

(defun mtrans-cmp-rhs-p (item)
  (cond
   ((numberp item)
    `(number ,item))
   ((and (stringp item) (string-match-p "[[<].*?[]>]" item))
    `(time ,(org-matcher-time item)))
   ((stringp item)
    `(string ,item))
   ((and (vectorp item) (stringp (aref item 0)))
    `(re ,(aref item 0)))
   ((consp item)
    `(form ,item))
   (t nil)))

(defun org-not-equal (a b) (not (equal a b)))

(defvar mtrans-op-alist
  '((<  (number . <)  (string . string<)          (time . org-time<)  (re . nil)                (form . nil))
    (<= (number . <=) (string . org-string<=)     (time . org-time<=) (re . nil)                (form . nil))
    (>= (number . >=) (string . org-string>=)     (time . org-time>=) (re . nil)                (form . nil))
    (>  (number . >)  (string . org-string>)      (time . org-time>)  (re . nil)                (form . nil))
    (=  (number . =)  (string . string=)          (time . org-time>)  (re . org-string-match=)  (form . equal))
    (== (number . =)  (string . string=)          (time . org-time=)  (re . org-string-match=)  (form . equal))
    (<> (number . org<>)  (string . org-string<>) (time . org-time<>) (re . org-string-match<>) (form . org-not-equal)))
  "Maps comparison operators and types to suitable comparison function.
A nil value means the comparison is erroneous.")

(defvar mtrans-special-props-alist
  `((TODO todo string re form)
    (LEVEL level number)
    (HEADING heading string re form)
    (PRIORITY priority string re form)
    (CATEGORY (get-text-property (point) 'org-category) string re form))
  "Maps special property names to their matcher symbol and constraints.
Each value is of the form (MATCHER-SYMBOL TYPE...), where TYPE is
a symbol for an allowed comparison value type.")

(defun mtrans-cmp (op prop obj)
  (let* ((type (car obj))
         (val  (cadr obj))
         (special (cdr (assoc prop mtrans-special-props-alist)))
         (prop-ref (or (car special)
                       `(org-cached-entry-get nil ,(symbol-name prop))))
         (func-alist (cdr (assq op mtrans-op-alist)))
         (func (cdr (assoc type func-alist))))
    (when (and special (not (memq type (cdr special))))
      (error "Type mismatch in %s comparison" prop))
    (when (null func)
      (error "Improper operator for %s comparison" type))
    `(,func ,(if (memq type '(number form)) prop-ref `(or ,prop-ref "")) ,val)))


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2012-08-16 17:48 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-08-04  7:50 full parser implementation for tag queries (parentheses, fast heading match, and more) Christopher Genovese
2012-08-04 10:07 ` Christopher Genovese
2012-08-04 18:32   ` Christopher Genovese
2012-08-16  5:02     ` Samuel Wales
2012-08-16 17:47       ` Christopher Genovese

Code repositories for project(s) associated with this inbox:

	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).