From mboxrd@z Thu Jan 1 00:00:00 1970
From: Christopher Genovese
Subject: new tag query parser [1/5] -- the motivating issues
Date: Wed, 15 Aug 2012 23:57:04 -0400
Message-ID:
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary=f46d042e00e391d9e204c75a0acb
Return-path:
Received: from eggs.gnu.org ([208.118.235.92]:39457) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1T1rD5-0004VZ-JS for emacs-orgmode@gnu.org; Wed, 15 Aug 2012 23:57:28 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1T1rD4-0005Ce-Br for emacs-orgmode@gnu.org; Wed, 15 Aug 2012 23:57:27 -0400
Received: from mail-yx0-f169.google.com ([209.85.213.169]:51460) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1T1rD4-0005CX-6B for emacs-orgmode@gnu.org; Wed, 15 Aug 2012 23:57:26 -0400
Received: by yenl1 with SMTP id l1so2807651yen.0 for ; Wed, 15 Aug 2012 20:57:25 -0700 (PDT)
List-Id: "General discussions about Org-mode."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: emacs-orgmode-bounces+geo-emacs-orgmode=m.gmane.org@gnu.org
Sender: emacs-orgmode-bounces+geo-emacs-orgmode=m.gmane.org@gnu.org
To: emacs-orgmode@gnu.org

--f46d042e00e391d9e204c75a0acb
Content-Type: text/plain; charset=ISO-8859-1

My proposed changes to the tag query parser are motivated by the need
and/or desire to do the following. (The example strings work with the
new parser.)

1. Combine and modify tag queries programmatically.

   The leading case is that a function is given a tag query string and
   needs to *exclude* lines matching that query. To do this, we can
   transform query strings like so:

     "foo+bar+zap/TODO"  -->  "-(foo+bar+zap/TODO)"
     "foo|bar|zap"       -->  "-(foo|bar|zap)"

   The key is that we want to do this programmatically while still using
   the mapping or agenda search command.^* I use this a lot in my GTD
   layer for org; other combinations and transformations come up as well.

2. Write complex queries as simply as possible (i.e., using parens).

   Parentheses aren't always necessary, but they can make things nicer.

   "(xyz|{^a}-abc) & LEVEL > 1"  versus "xyz&LEVEL>1|{^a}-abc&LEVEL>1"

3. Make *fast* heading and priority searches.

   That information is *already matched* in the current code, but access
   to it is not given (or is slow in the case of PRIORITY).

   "LEVEL == 2 & HEADING <> {<.*>} & PRIORITY <> \"A\" "

4. Include braces in regular expression matches.

   "+{abc\\{{3,7\\}}}"             -> regex "abc\\{3,7\\}"
   "{[A-Z]+\\S-+{{template}}.*$}"  -> regex "[A-Z]+\\S-+{template}.*$"

   Because \ escapes are used so heavily in regexes and because strings
   require doubling them, using additional \'s would be messy,
   ambiguous, and hard to read.

   Instead, exploit that we only need to protect {}'s by *doubling*
   them: {{ -> { and }} -> }. This is simple, readable, fast, and parity
   makes correctness clear at a glance.^**

5. Allow spaces in query strings for readability.

   Not a big deal, but easy. See the above examples.

6. Get helpful error messages at parse time when there is a problem.


^* It is of course possible to create a matcher from the string
   and do the search directly with lower level functions, but
   that ends up being a clunky solution.

^** The doubling strategy is also familiar from the doubling of \'s
    in quoted strings.
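
To make items 1 and 4 concrete, here is a minimal sketch of the two
string transformations. It is written in Python purely for illustration
and is not the parser code itself (which is Emacs Lisp). The function
names negate_query, undouble_braces, and regex_from_term are hypothetical,
and the sketch assumes every literal brace inside a {...} regex body has
been doubled. The raw strings hold the characters as they are after the
Lisp reader has collapsed each \\ to a single \.

```python
def negate_query(query: str) -> str:
    # Item 1: wrap the whole query in -( ... ) so that it excludes
    # whatever the original query matched.
    return "-(" + query + ")"


def undouble_braces(body: str) -> str:
    # Item 4: inside a regex body, {{ stands for { and }} stands for }.
    # Assumes every literal brace in the body was doubled.
    return body.replace("{{", "{").replace("}}", "}")


def regex_from_term(term: str) -> str:
    # Take the text between the outermost braces of a single term,
    # skipping any leading + or -, and restore the literal braces.
    start = term.index("{")
    end = term.rindex("}")
    return undouble_braces(term[start + 1:end])


if __name__ == "__main__":
    print(negate_query("foo+bar+zap/TODO"))
    print(regex_from_term(r"+{abc\{{3,7\}}}"))
    print(regex_from_term(r"{[A-Z]+\S-+{{template}}.*$}"))
```

A real parser would scan character by character so that it can report
unbalanced or undoubled braces at parse time (item 6); the blanket
replacement above only handles well-formed input.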

--f46d042e00e391d9e204c75a0acb--