From mboxrd@z Thu Jan  1 00:00:00 1970
From: Christopher Genovese <genovese@cmu.edu>
Subject: new tag query parser [1/5] -- the motivating issues
Date: Wed, 15 Aug 2012 23:57:04 -0400
Message-ID: <CAPum5Fg053gQzGgsVjROg1exAbLeYQq5Qw7DjP=-MLiRyYzgBA@mail.gmail.com>
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary=f46d042e00e391d9e204c75a0acb
Return-path: <emacs-orgmode-bounces+geo-emacs-orgmode=m.gmane.org@gnu.org>
Received: from eggs.gnu.org ([208.118.235.92]:39457)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <genovese.cr@gmail.com>) id 1T1rD5-0004VZ-JS
	for emacs-orgmode@gnu.org; Wed, 15 Aug 2012 23:57:28 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <genovese.cr@gmail.com>) id 1T1rD4-0005Ce-Br
	for emacs-orgmode@gnu.org; Wed, 15 Aug 2012 23:57:27 -0400
Received: from mail-yx0-f169.google.com ([209.85.213.169]:51460)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <genovese.cr@gmail.com>) id 1T1rD4-0005CX-6B
	for emacs-orgmode@gnu.org; Wed, 15 Aug 2012 23:57:26 -0400
Received: by yenl1 with SMTP id l1so2807651yen.0
	for <emacs-orgmode@gnu.org>; Wed, 15 Aug 2012 20:57:25 -0700 (PDT)
List-Id: "General discussions about Org-mode." <emacs-orgmode.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-orgmode>,
	<mailto:emacs-orgmode-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-orgmode>
List-Post: <mailto:emacs-orgmode@gnu.org>
List-Help: <mailto:emacs-orgmode-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-orgmode>,
	<mailto:emacs-orgmode-request@gnu.org?subject=subscribe>
Errors-To: emacs-orgmode-bounces+geo-emacs-orgmode=m.gmane.org@gnu.org
Sender: emacs-orgmode-bounces+geo-emacs-orgmode=m.gmane.org@gnu.org
To: emacs-orgmode@gnu.org

--f46d042e00e391d9e204c75a0acb
Content-Type: text/plain; charset=ISO-8859-1

My proposed changes in the tag query parser are motivated by the need and/or
desire to do the following. (The example strings work with the new parser.)

1. Combine and modify tag queries programmatically.

   The leading case is that a function is given a tag query string and
   needs to *exclude* lines matching that query. To do this, we can
   transform query strings like so:

     "foo+bar+zap/TODO"  -->  "-(foo+bar+zap/TODO)"
     "foo|bar|zap"       -->  "-(foo|bar|zap)"

   The key is that we want to do this programmatically while still using
   the mapping or agenda search command.^* I use this a lot in my GTD
   layer for org; other combinations and transformations come up as well.

2. Write complex queries as simply as possible (i.e., using parens).

   Parentheses aren't always necessary, but they can make things nicer.

   "(xyz|{^a}-abc) & LEVEL > 1"  versus "xyz&LEVEL>1|{^a}-abc&LEVEL>1"

3. Make *fast* heading and priority searches.

   That information is *already matched* in the current code, but
   access to it is not given (or is slow in the case of PRIORITY).

   "LEVEL == 2 & HEADING <> {<.*>} & PRIORITY <> \"A\" "

4. Include braces in regular expression matches.

   "+{abc\\{{3,7\\}}}"             -> regex "abc\\{3,7\\}"
   "{[A-Z]+\\S-+{{template}}.*$}"  -> regex "[A-Z]+\\S-+{template}.*$"

   Because \ escapes are used so heavily in regexes and because strings
   require doubling them, using additional \'s would be messy,
   ambiguous, and hard to read.

   Instead, exploit that we only need to protect {}'s by *doubling*
   them: {{ -> { and }} -> }. This is simple, readable, fast, and parity
   makes correctness clear at a glance.^**

5. Allow spaces in query strings for readability.

   Not a big deal, but easy. See the examples above.

6. Get helpful error messages at parse time when there is a problem.
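
To make item 1 concrete, here is a minimal sketch of the string
transformation, written in Python only for illustration (it is not
part of the patch). The name negate_query is hypothetical, and the
sketch assumes the new parser accepts a leading "-" in front of a
parenthesized query, as in the examples under item 1.

```python
def negate_query(query: str) -> str:
    """Wrap a tag query in -( ... ) so it excludes what the original matched.

    Hypothetical helper: it only rewrites the string and relies on the
    new parser to understand a negated, parenthesized group.
    """
    query = query.strip()
    if not query:
        raise ValueError("cannot negate an empty query")
    return "-(" + query + ")"


print(negate_query("foo+bar+zap/TODO"))
print(negate_query("foo|bar|zap"))
```

The point is that the caller never has to inspect the query: one
wrapper works for any input, and the result can still be handed to
the usual mapping or agenda search command.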


^* It is of course possible to create a matcher from the string
   and do the search directly with lower-level functions, but
   that ends up being a clunky solution.

^** The doubling strategy is also familiar from the doubling of \'s
    in quoted strings.
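
The brace-doubling rule from item 4 can be sketched as a small
scanner. This is a Python illustration under stated assumptions, not
the code in the patch: the function names are hypothetical, and
treating a lone "{" inside the regex as an error is my assumption
about how a strict reader would behave.

```python
from typing import Tuple


def read_braced_regex(text: str, start: int = 0) -> Tuple[str, int]:
    """Read a {...} regex starting at text[start].

    Returns the decoded regex and the index just past the closing brace.
    Inside the braces, {{ stands for { and }} stands for }.
    """
    if start >= len(text) or text[start] != "{":
        raise ValueError("expected '{' at position %d" % start)
    out = []
    i = start + 1
    n = len(text)
    while i < n:
        ch = text[i]
        if ch in "{}" and i + 1 < n and text[i + 1] == ch:
            out.append(ch)          # doubled brace: emit one literal brace
            i += 2
        elif ch == "}":
            return "".join(out), i + 1   # single }: end of the regex
        elif ch == "{":
            # Assumption: an undoubled { inside the regex is rejected.
            raise ValueError("unescaped '{' at position %d" % i)
        else:
            out.append(ch)
            i += 1
    raise ValueError("unterminated regex: missing closing '}'")


def write_braced_regex(regex: str) -> str:
    """Inverse direction: double every brace and wrap the result in {}."""
    return "{" + regex.replace("{", "{{").replace("}", "}}") + "}"


# Raw strings hold the characters the parser would see after the Lisp
# reader has already collapsed each \\ in the examples to a single \.
regex, end = read_braced_regex(r"+{abc\{{3,7\}}}", 1)
print(regex, end)
print(write_braced_regex(r"[A-Z]+\S-+{template}.*$"))
```

Because the scan is left to right and a doubled brace is consumed as
a pair, a run such as }}} decodes unambiguously as one literal }
followed by the closing brace.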

--f46d042e00e391d9e204c75a0acb--