new tag query parser [1/5] -- the motivating issues

* new tag query parser [1/5] -- the motivating issues
@ 2012-08-16  3:57 Christopher Genovese
  2012-08-18 14:18 ` Martin Pohlack
  0 siblings, 1 reply; 3+ messages in thread
From: Christopher Genovese @ 2012-08-16  3:57 UTC (permalink / raw)
  To: emacs-orgmode

My proposed changes in the tag query parser are motivated by the need and/or
desire to do the following. (The example strings work with the new parser.)

1. Combine and modify tag queries programmatically.

   The leading case is that a function is given a tag query string and
   needs to *exclude* lines matching that query. To do this, we can
   transform query strings like so:

     "foo+bar+zap/TODO"  -->  "-(foo+bar+zap/TODO)"
     "foo|bar|zap"       -->  "-(foo|bar|zap)"

   The key is that we want to do this programmatically while still using
   the mapping or agenda search command.^* I use this a lot in my GTD
   layer for org; other combinations and transformations come up as well.

2. Write complex queries as simply as possible (i.e., using parens).

   Parentheses aren't always necessary, but they can make things nicer.

   "(xyz|{^a}-abc) & LEVEL > 1"  versus "xyz&LEVEL>1|{^a}-abc&LEVEL>1"

3. Make *fast* heading and priority searches.

   That information is *already matched* in the current code but access
   is not given (or is slow in the case of PRIORITY).

   "LEVEL == 2 & HEADING <> {<.*>} & PRIORITY <> \"A\" "

4. Include braces in regular expression matches.

   "+{abc\\{{3,7\\}}}"             -> regex "abc\\{3,7\\}"
   "{[A-Z]+\\S-+{{template}}.*$}"  -> regex "[A-Z]+\\S-+{template}.*$"

   Because \ escapes are used so heavily in regexes and because strings
   require doubling them, using additional \'s would be messy,
   ambiguous, and hard to read.

   Instead, exploit that we only need to protect {}'s by *doubling*
   them: {{ -> { and }} -> }. This is simple, readable, fast, and parity
   makes correctness clear at a glance.^**

5. Allow spaces in query strings for readability.

   Not a big deal, but easy. See the above examples.

6. Get helpful error messages at parse time when there is a problem.
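
The transformation in item 1 is purely textual: the caller does not need to understand the query it is negating. A minimal sketch, in Python for illustration only (the parser itself is Emacs Lisp in org.el); `negate_query` is a hypothetical name, not part of the proposed code:

```python
def negate_query(query: str) -> str:
    """Return a query that excludes everything `query` matches.

    Hypothetical helper: it relies only on the parenthesized negation
    form "-(...)" shown in item 1, so the original query is left intact.
    """
    return f"-({query})"


if __name__ == "__main__":
    for q in ("foo+bar+zap/TODO", "foo|bar|zap"):
        print(f"{q!r:22} --> {negate_query(q)!r}")
```

Because the result is still an ordinary query string, it can be handed to the usual mapping or agenda search commands without building a matcher by hand.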

^* It is of course possible to create a matcher from the string
   and do the search directly with lower level functions, but
   that ends up being a clunky solution.

^** The doubling strategy is also familiar from the doubling of \'s
    in quoted strings.
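
The brace-doubling rule in item 4 can be sketched as a single left-to-right scan. This is an illustration in Python, not the code in the patch: `read_braced_regex` is a hypothetical name, and treating a lone `{` inside the braces as an error is an assumption. The Python raw strings below hold the actual characters of the query, so each `\\` in the Lisp string literals above appears here as a single backslash.

```python
def read_braced_regex(text: str, start: int = 0) -> tuple[str, int]:
    """Read a {...} regex starting at text[start].

    Inside the braces, "{{" stands for "{" and "}}" stands for "}".
    A single "}" ends the regex. Returns the regex and the index
    just past the closing brace.
    """
    if start >= len(text) or text[start] != "{":
        raise ValueError("expected '{' at start of regex")
    out = []
    i = start + 1
    n = len(text)
    while i < n:
        ch = text[i]
        if ch in "{}":
            # A doubled brace is a literal brace.
            if i + 1 < n and text[i + 1] == ch:
                out.append(ch)
                i += 2
                continue
            # A single closing brace terminates the regex.
            if ch == "}":
                return "".join(out), i + 1
            # Assumed behavior: a single "{" inside the regex is an error.
            raise ValueError(f"unescaped '{{' at position {i}")
        out.append(ch)
        i += 1
    raise ValueError("unterminated regex: missing closing '}'")


if __name__ == "__main__":
    print(read_braced_regex(r"{abc\{{3,7\}}}")[0])
    print(read_braced_regex(r"{[A-Z]+\S-+{{template}}.*$}")[0])
```

Backslashes pass through untouched, so the regex keeps its own escaping and only the brace count has to be checked.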

* Re: new tag query parser [1/5] -- the motivating issues
  2012-08-16  3:57 new tag query parser [1/5] -- the motivating issues Christopher Genovese
@ 2012-08-18 14:18 ` Martin Pohlack
  2012-08-18 18:10   ` Christopher Genovese
  0 siblings, 1 reply; 3+ messages in thread
From: Martin Pohlack @ 2012-08-18 14:18 UTC (permalink / raw)
  To: Christopher Genovese; +Cc: emacs-orgmode

Hi Christopher,

If I understand your descriptions correctly, your proposed changes are
very cool.

Could you elaborate a little bit on performance?

* Are we going to see speedups?  In what cases?  How much?

* If we lose performance, could you quantify that a bit with some examples?

A question regarding backwards compatibility (I might have missed that
in the description, sorry):  Are you converting existing queries on the
fly each time, or do we have to convert our queries once?  If the
latter, is there some assisting code?

Thanks,
Martin

* Re: new tag query parser [1/5] -- the motivating issues
  2012-08-18 14:18 ` Martin Pohlack
@ 2012-08-18 18:10   ` Christopher Genovese
  0 siblings, 0 replies; 3+ messages in thread
From: Christopher Genovese @ 2012-08-18 18:10 UTC (permalink / raw)
  To: Martin Pohlack; +Cc: emacs-orgmode

Hi Martin,

   Assuming that org.el (with the new parser code) is byte-compiled, the
performance difference is very minor. The only difference comes in
converting the query string to a matcher form. The new parser has some
additional overhead in function calls and keeping track of state, but in
practice it is negligible.

   For example, in some basic benchmarks, both parsers can convert 10,000
fairly complex query strings in a second or two *total*. If you run the
tests, you'll see that it does over 200 cases plus comparisons and a good
deal of other stuff in the blink of an eye.

   So for any given agenda search or entry mapping, users will not notice
any real difference.

   Regarding backward compatibility, there is no conversion necessary. All
currently valid queries produce equivalent matchers with the new code. The
new parser extends the grammar to incorporate features that would not
produce valid matchers with current code: parenthesized expressions,
spaces, and {}-escapes in regexp matches.

   The only issue in this regard is that I added the name HEADING to the
list of special properties (like LEVEL, CATEGORY, PRIORITY, etc.). This
allows heading matches, which is one of my favorite features. So existing
queries with a user-defined property HEADING would match the real heading
rather than the property. This seems like a minor issue to me, but it
would need to be noted.
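
To make the HEADING caveat concrete, here is a rough sketch of the lookup order described above, in Python for illustration only. The entry layout, the `SPECIAL` table, and `lookup` are all hypothetical and do not reflect how org.el stores entries; the only point is that a special name is resolved before user-defined properties are consulted.

```python
# Hypothetical table of special property names and how to compute them.
SPECIAL = {
    "LEVEL": lambda entry: entry["level"],
    "HEADING": lambda entry: entry["heading"],
}


def lookup(entry: dict, name: str):
    """Resolve a property name, giving special names precedence."""
    getter = SPECIAL.get(name)
    if getter is not None:
        return getter(entry)
    # Only non-special names reach the user-defined properties.
    return entry["properties"].get(name)


if __name__ == "__main__":
    entry = {
        "level": 2,
        "heading": "Buy milk",
        "properties": {"HEADING": "custom value", "Effort": "1:00"},
    }
    print(lookup(entry, "HEADING"))  # the real heading, not the property
    print(lookup(entry, "Effort"))
```

Under this assumed order, a user-defined HEADING property is shadowed by the real heading, while every other user-defined property behaves as before.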

     Regards,

      Christopher

P.S. The provision above (and in the original posts) about byte compiling
the parser code (which would be in org.el) relates to macro-expansion
overhead. I use a macro that makes the new parser function more readable
and maintainable, and does much of its work at compile time to produce
faster code. In interpreted code that macro is expanded each pass through
the loop. The macro could be eliminated if necessary, or made faster in
interpreted code by various tricks (that would add some overhead to
compiled code). But since org.el is typically byte-compiled during
installation, this doesn't seem to me to be a problem. Performance is fine
in practice either way, though faster in the typical compiled case, and I
think the clarity gained from the macro is worthwhile.

But definitely byte compile the new code before testing, as I advise in
the posts.
