From mboxrd@z Thu Jan 1 00:00:00 1970 From: Christopher Genovese Subject: new tag query parser [2/5] -- some bugs/limitations in the current parser Date: Wed, 15 Aug 2012 23:59:11 -0400 Message-ID: Mime-Version: 1.0 Content-Type: multipart/alternative; boundary=e89a8ff2496520774c04c75a12fe Return-path: Received: from eggs.gnu.org ([208.118.235.92]:39781) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1T1rF9-0005Fc-TB for emacs-orgmode@gnu.org; Wed, 15 Aug 2012 23:59:37 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1T1rF7-0005O4-Ob for emacs-orgmode@gnu.org; Wed, 15 Aug 2012 23:59:35 -0400 Received: from mail-pb0-f41.google.com ([209.85.160.41]:63835) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1T1rF7-0005O0-F0 for emacs-orgmode@gnu.org; Wed, 15 Aug 2012 23:59:33 -0400 Received: by pbbro12 with SMTP id ro12so958641pbb.0 for ; Wed, 15 Aug 2012 20:59:32 -0700 (PDT) List-Id: "General discussions about Org-mode." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-orgmode-bounces+geo-emacs-orgmode=m.gmane.org@gnu.org Sender: emacs-orgmode-bounces+geo-emacs-orgmode=m.gmane.org@gnu.org To: emacs-orgmode@gnu.org --e89a8ff2496520774c04c75a12fe Content-Type: text/plain; charset=ISO-8859-1 1. Property names with -'s are not handled properly Specifically, the escapes are not removed. Ex: (org-make-tags-matcher "PROP\\-WITH\\-HYPHENS=2") produces ("PROP\\-WITH\\-HYPHENS=2" and (progn (setq org-cached-props nil) (= (string-to-number (or (org-cached-entry-get nil "PROP\\-WITH\\-HYPHENS") "")) 2)) t) The property name in the matcher should be "PROP-WITH-HYPHENS". The original code /does/ instead remove -'s from tag names, which shouldn't have them anyway. I suspect that this was intended for property names rather than tag names. 2. Incorrect comparison operators allowed, produce bad matchers. The regular expression "[<=>]\\{1,2\\}" is used to detect the comparison operators. But this can produce bad matchers that fail opaquely at match time rather than giving an appropriate error message at parse time. Ex: (org-make-tags-matcher "P<<2") produces ("P<<2" and (progn (setq org-cached-props nil) (nil (string-to-number (or (org-cached-entry-get nil "P") "")) 2)) t) 3. A faulty test for todo matcher in org-make-tags-matcher The current code uses (string-match "/+" match) to detect the presence of the shortcut /!? style todo matchers. But this is insufficient. Ex: (org-make-tags-matcher "PROP={^\\s-*// .*$}") produces an erroneous matcher: ("PROP={^\\s-*// .*$}" progn (setq org-cached-props nil) (member "PROP" tags-list)) We want to find the first slash that is not enclosed in {}'s or ""'s; if found, a todo match is needed. A simple pattern will not be enough for this. As a side note, org allows arbitrary characters in TODO keywords, (For instance, both PROP={/!} and PROP="/!{/!}" are valid TODO keywords (it works!) *and* valid property comparisons.) The assumption of the current version is that {}'s and "'s are excluded. I also exclude ()'s from TODO keywords in the new version for reasons we can discuss later. Neither seems like a big loss. If you are using {}'s, "'s, or ()'s in your TODO keywords, use a TODO= match rather than a /!? match. 4. Regexp matchers in todo queries fail when no TODO for an item. Ex: (org-make-tags-matcher "/{\\S-}") produces ("/{\\S-}" and t (string-match "\\S-" todo)) This will raise an error when todo is nil (no todo keyword on a scanned item) when doing an org-map-entries, say. The todo should be replaced with a (or todo "") as it is for tag-style TODO queries. 5. A minor consistency issue At line 7179 in org.el (v 7.8.11), missing an org-re call in a regex that uses posix classes. The org-re is used elsewhere, for xemacs compatibility, I think. [FWIW, all of these are problems eliminated or made moot in the new parser.] --e89a8ff2496520774c04c75a12fe Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable 1. Property names with -&= #39;s are not handled properly
=A0=A0 <= br style=3D"font-family:courier new,monospace"> =A0=A0 Specifically, the = escapes are not removed.
=A0=A0
=A0=A0 Ex: (org-make-tags= -matcher "PROP\\-WITH\\-HYPHENS=3D2") produces
=A0=A0
=A0=A0 ("PROP\\-WITH= \\-HYPHENS=3D2" and
=A0=A0=A0 (progn
=A0=A0=A0 (setq org-cache= d-props nil)
=A0=A0=A0 (=3D
=A0=A0=A0=A0 (string-to-n= umber
=A0=A0=A0=A0=A0 (or (org-cached-entry-ge= t nil "PROP\\-WITH\\-HYPHENS")
=A0=A0=A0=A0=A0 "&qu= ot;))
=A0=A0=A0=A0 2))
=A0=A0=A0 t)

=A0=A0 The= property name in the matcher should be "PROP-WITH-HYPHENS".
=A0=A0 The original code = /does/ instead remove -'s from tag names, which
=A0=A0 shouldn't have them anyway. I suspect that this was intend= ed for
=A0=A0 property names rat= her than tag names.
<= span style=3D"font-family:courier new,monospace">=A0=A0

2. Incorrect comparison o= perators allowed, produce bad matchers.

=A0=A0 The regular expression &qu= ot;[<=3D>]\\{1,2\\}" is used to detect the
=A0=A0 comparison operato= rs. But this can produce bad matchers that fail
=A0=A0 message at parse t= ime.

=A0=A0 Ex: (org-make-tags-matcher "P<<2") produces

=A0=A0=A0 ("P<<2" and
=A0=A0=A0=A0 (progn
=A0=A0=A0=A0=A0=A0 (setq = org-cached-props nil)
=A0=A0=A0=A0=A0=A0 (nil<= /span>
=A0=A0=A0=A0=A0=A0=A0 (st= ring-to-number (or (org-cached-entry-get nil "P") "")) = 2))
=A0=A0=A0=A0 t)

3. A faulty test for todo matcher in org-make-tags-m= atcher

=A0=A0 The current code u= ses (string-match "/+" match)
=A0=A0= to detect the presence of the shortcut /!? style todo matchers.
=A0=A0 But this is insuff= icient.

=A0=A0 Ex: (org-make-tags-matcher "PROP=3D{^\\s-*// .*$}&quo= t;) produces
=A0=A0 an erroneous match= er:

=A0=A0=A0=A0=A0=A0 ("PROP=3D{^\\s-*// .*$}" progn =A0=A0=A0=A0=A0=A0=A0 (se= tq org-cached-props nil)
=A0=A0=A0=A0=A0=A0=A0= (member "PROP" tags-list))

=A0=A0 We want to find the first slash that is not e= nclosed in {}'s or
=A0=A0 ""'= ;s; if found, a todo match is needed. A simple pattern

=A0=A0 will not be enough= for this.
=A0=A0
=A0=A0 As a side note, or= g allows arbitrary characters in TODO keywords,
=A0=A0 keywords (it works= !) *and* valid property comparisons.)
=A0=A0 = The assumption of the current version is that {}'s and "'s
=A0=A0 are excluded.=A0 I= also exclude ()'s from TODO keywords in the
=A0=A0 new version for reasons we can discuss later. Neither seems
=A0=A0 like a big loss. I= f you are using {}'s, "'s, or ()'s in your
=A0=A0 TODO keywords, use a TODO=3D match rather than a /!? m= atch.

4. Regexp matchers in todo queries fail when no TODO= for an item.
=A0=A0
=A0=A0 Ex: (org-make-tags= -matcher "/{\\S-}") produces
=A0=A0 =
=A0=A0=A0=A0=A0 ("/{= \\S-}" and t (string-match "\\S-" todo))
=A0=A0=A0=A0=A0
=A0=A0 This will raise an= error when todo is nil (no todo keyword on a
= =A0=A0 scanned item) when doing an org-map-entries, say. The todo should be=
=A0=A0 replaced with a (o= r todo "") as it is for tag-style TODO queries.

5. A minor consistency is= sue
=A0=A0
=A0=A0 At line 7179 in or= g.el (v 7.8.11), missing an org-re call in a
= =A0=A0 regex that uses posix classes. The org-re is used elsewhere,<= br style=3D"font-family:courier new,monospace"> =A0=A0 for xemacs compati= bility, I think.

[FWIW, all of these are p= roblems eliminated or made moot in the new parser.]


--e89a8ff2496520774c04c75a12fe--