From mboxrd@z Thu Jan 1 00:00:00 1970
From: Nick Dokos
Subject: Re: [OT]: Search for missing :END:
Date: Mon, 21 Nov 2011 16:38:22 -0500
Message-ID: <6557.1321911502@alphaville.americas.hpqcorp.net>
References: <0vvcqdxqf0.fsf@gmail.com>
Reply-To: nicholas.dokos@hp.com
In-Reply-To: Message from Markus Heller of "Mon, 21 Nov 2011 10:48:35 PST." <0vvcqdxqf0.fsf@gmail.com>
List-Id: "General discussions about Org-mode."
To: Markus Heller
Cc: nicholas.dokos@hp.com, emacs-orgmode@gnu.org

Markus Heller wrote:

> Hello all,
>
> I have an OT request that can hopefully be answered by emacs gurus in
> less than a minute:
>
> I'm looking for an emacs search expression that finds :PROPERTIES:
> *without* a matching :END: ...
>

If you mean a regexp, you are wasting your time[fn:1]. Regexps are
powerful, but their range of applicability is limited to regular
languages and even then, you have to worry about their efficiency.
The language above (drawers that do not nest) *is* regular: if P
stands for :PROPERTIES: and E stands for :END:, then the regexp is

    ([^EP]*P[^EP]*E)*[^EP]*

(the trailing [^EP]* only allows for "other" things after the last E).
In words, the stuff inside the parens says: 0 or more "other" things
(non-P and non-E), followed by a P, followed by 0 or more "other"
things, followed by an E.
You can then have 0 or more of the parenthesized things. This will
succeed on well-formed "sentences" and fail on others. But it might
have to backtrack over the inner [^EP]* matches and then the outer
matches, and rescan arbitrarily long stretches, which in the worst
case can turn your search into an exponentially slow descent into the
abyss. You might be able to write non-greedy regexps that behave
better in this case. In most cases, you'd end up with a
horrendous-looking regexp: good luck trying to understand it next
week. That's my biggest problem with complicated regexps.

However, a change of tool will simplify the problem enormously. E.g.
here's a simple algorithm that can be used for this kind of problem:
start with a nesting depth of 0; when you see a P, increment the
nesting depth by 1; when you see an E, decrement it by 1. If the
nesting depth ever becomes something other than 0 or 1, you got a
problem; also, if the nesting depth is not 0 at EOF, you got a
problem. Easy variations of this will check well-formedness even when
nesting *is* allowed.

You can easily write such a program in any language you are familiar
with (it does not have to be elisp, although you *can* write it in
elisp - personally, I'd use awk).

But assuming that you are getting some error from org, you don't know
where the problem is and you are trying to find it, it will be simpler
to just use grep -E (a.k.a. egrep):

    grep -E -n ':PROPERTIES:|:END:' foo.org

will filter out the relevant lines, with their line numbers, so all
you have to do is scan the output by eye and spot any irregularity
(consecutive :PROPERTIES: or consecutive :END: lines). Even if you
have hundreds of them, that's *easy* for humans to do.[fn:2]

Or, if you prefer, you can write trivial validation programs to
operate on the output, e.g.:

    grep -E -n ':PROPERTIES:|:END:' foo.org | tee foo.out | grep PROP | wc -l
    grep END foo.out | wc -l

(the counts had better be the same).
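The count comparison can also be wrapped in a small shell function
so that the two numbers are checked for you. This is only a rough
sketch under some assumptions: the name count_drawers is made up for
illustration, the keywords are assumed to sit alone on their lines,
and the file is assumed to contain no other kinds of drawers (in Org,
:END: also closes e.g. :LOGBOOK: drawers, which would throw the
counts off).

```shell
#!/bin/sh
# count_drawers FILE  (hypothetical helper, not part of Org or grep)
# Prints "<opens> <closes>" and returns non-zero if the counts differ.
# Assumes :PROPERTIES: and :END: each appear alone on a line and that
# property drawers are the only drawers in the file.
count_drawers() {
    file=$1
    # grep -c prints the number of matching lines; it exits with
    # status 1 when that number is 0, hence the "|| true".
    opens=$(grep -c -E '^[[:space:]]*:PROPERTIES:[[:space:]]*$' "$file" || true)
    closes=$(grep -c -E '^[[:space:]]*:END:[[:space:]]*$' "$file" || true)
    printf '%s %s\n' "$opens" "$closes"
    # The exit status of the function is the result of this comparison.
    [ "$opens" -eq "$closes" ]
}
```

Something like `count_drawers foo.org || echo "counts differ"` would
then flag a mismatch. Equal counts do not prove that the drawers are
in the right order, which is what the nesting depth algorithm is for.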
Or pipe the filtered lines into a small script:

    grep -E -n ':PROPERTIES:|:END:' foo.org | foo.awk

where foo.awk implements the nesting depth algorithm above - something
like this:

--8<---------------cut here---------------start------------->8---
#! /bin/bash

# Reads "grep -n" output on stdin. Prints the first field of the
# offending line (it starts with the line number) and the depth there.
awk '
BEGIN { d = 0 }
/:PROPERTIES:/ { d++; if (d > 1) { print $1, d; bad = 1; exit } }
/:END:/        { d--; if (d < 0) { print $1, d; bad = 1; exit } }
{ last = $1 }
# exit still runs the END rule, so skip the EOF check after a report
END { if (!bad && d != 0) { print last, d } }'
--8<---------------cut here---------------end--------------->8---

Even on Windoze, you can probably do all this stuff with cygwin.

Nick

> Thanks and Cheers and sorry for the OT ...
>
> Markus
>

Footnotes:

[fn:1] In the (approximate) words of Jamie Zawinski: "You have a
       problem. You think 'I know, let me use regexps to solve it'.
       Now you have two problems."

[fn:2] Of course, if you have formatted your file perversely or done
       other naughty things, this might not work. The point is that
       although this is not foolproof, it should deal with the vast
       majority of "reasonable" files out there.