From mboxrd@z Thu Jan  1 00:00:00 1970
From: Markus Heller
Subject: Re: [OT]: Search for missing :END:
Date: Mon, 21 Nov 2011 15:27:03 -0800
Message-ID: <0vr511xdiw.fsf@gmail.com>
References: <0vvcqdxqf0.fsf@gmail.com> <6557.1321911502@alphaville.americas.hpqcorp.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
List-Id: "General discussions about Org-mode."
To: emacs-orgmode@gnu.org

Nick Dokos writes:

> Markus Heller wrote:
>
>> Hello all,
>>
>> I have an OT request that can hopefully be answered by emacs gurus in
>> less than a minute:
>>
>> I'm looking for an emacs search expression that finds :PROPERTIES:
>> *without* a matching :END: ...
>>
>
> If you mean a regexp, you are wasting your time[fn:1].
> Regexps are powerful, but their range of applicability is limited to
> regular languages and even then, you have to worry about their
> efficiency. The above *is* a regular language: if P stands for
> :PROPERTIES: and E stands for :END:, then the regexp is
>
>       ([^EP]*P[^EP]*E)*
>
> In words, the stuff inside the parens says: 0 or more "other" things
> (non-P and non-E), followed by a P, followed by 0 or more "other"
> things, followed by an E. You can then have 0 or more of the
> parenthesized things. This will succeed on well formed "sentences" and
> fail on others. But it might have to backtrack over the inner [^EP]*
> matches and then the outer matches, and rescan arbitrarily long
> stretches, which in the worst case, can turn your search into an
> exponentially slow descent into the abyss. You might be able to write
> non-greedy regexps that might behave better in this case. In most cases,
> you'd end up with a horrendous-looking regexp: good luck trying to
> understand it next week. That's my biggest problem with complicated
> regexps.
>
> However, a change of tool will simplify the problem enormously. E.g. here's
> a simple algorithm that can be used for this kind of problem: start a
> nesting depth at 0 - when you see a P, increment the nesting depth by 1;
> when you see an E, decrement it by 1. If the nesting depth ever becomes
> something other than 0 or 1, you got a problem - also, if at EOF, the
> nesting depth is not 0, you got a problem. Easy variations of this will
> check well-formedness even when nesting *is* allowed.
>
> You can easily write such a program in any language you are familiar
> with (it does not have to be elisp, although you *can* write it in
> elisp - personally, I'd use awk).
>
> But assuming that you are getting some error from org, you don't know
> where the problem is and you are trying to find it, it will be simpler
> to just use egrep:
>
>       grep -E -n ':PROPERTIES:|:END:' foo.org
>
> will filter out the relevant lines, so all you have to do is scan the
> output by eye and spot any irregularity (consecutive :PROPERTIES: or
> consecutive :END: lines). Even if you have hundreds of them, that's
> *easy* for humans to do.[fn:2]
>
> Or, if you prefer, you can write trivial validation programs to operate
> on the output, e.g.:
>
>       grep -E -n ':PROPERTIES:|:END:' foo.org | tee foo.out | grep PROP | wc -l
>       grep END foo.out | wc -l
>
> (the counts had better be the same).
>
> or
>
>       grep -E -n ':PROPERTIES:|:END:' foo.org | foo.awk
>
> where foo.awk implements the nesting depth algorithm above - something
> like this:
>
>       #! /bin/bash
>
>       awk '
>       # d is the nesting depth; print the offending grep line and the depth
>       BEGIN { d = 0;}
>       /:PROPERTIES:/ { d++; if (d > 1) { print $1, d; exit; }}
>       /:END:/ { d--; if (d < 0) { print $1, d; exit; }}
>       END { if (d != 0) { print $1, d; }}'
>
>
> Even on Windoze, you can probably do all this stuff with cygwin.

Hi Nick,

thanks for this informative reply.

Unfortunately, I cannot install cygwin on my work computer. I'll have
to figure something else out ...

As for an example, I'm in one of my org files and I do C-TAB and get the
following error:

    OVERVIEW
    CONTENTS...done
    SHOW ALL
    if: :END: line missing at position 18720
    Quit
    Mark set

Where is position 18720? I apologize if this is a stupid question, but
I can't seem to figure this out ...

Thanks again
Markus
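
For setups like the one described above, where cygwin (and so grep and
awk) is not available, the nesting-depth check outlined in the quoted
reply can also be sketched in Python. This is only an illustration under
stated assumptions: the function names find_unbalanced and
position_to_line are made up for this example, and like the grep
pipeline it looks only at :PROPERTIES: and :END:, so an :END: that
closes some other drawer (for example :LOGBOOK:) will be reported as
unmatched. The second function assumes that the position in the error
message is a 1-based character offset into the buffer, which is how
Emacs counts positions; inside Emacs, M-x goto-char jumps to such a
position directly.

```python
import re

# Markers taken from the thread; everything else here is illustrative.
MARKER = re.compile(r":PROPERTIES:|:END:")


def find_unbalanced(text):
    """Return (line_number, message) for the first problem, or None.

    Implements the nesting-depth idea: the depth may only be 0 or 1.
    """
    depth = 0
    open_line = None  # line of the :PROPERTIES: that is currently open
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in MARKER.finditer(line):
            if match.group() == ":PROPERTIES:":
                depth += 1
                if depth > 1:
                    # A second opener before the first one was closed.
                    return (open_line, ":PROPERTIES: without matching :END:")
                open_line = lineno
            else:
                depth -= 1
                if depth < 0:
                    return (lineno, ":END: without matching :PROPERTIES:")
    if depth != 0:
        # End of input reached with a drawer still open.
        return (open_line, ":PROPERTIES: without matching :END:")
    return None


def position_to_line(text, pos):
    """Convert a 1-based character position to (line, column).

    Assumption: the line is 1-based and the column 0-based, matching
    what the Emacs mode line normally shows.
    """
    before = text[: pos - 1]
    line = before.count("\n") + 1
    column = len(before) - (before.rfind("\n") + 1)
    return (line, column)
```

For example, reading foo.org into a string and calling
position_to_line(text, 18720) would give the line on which to start
looking, provided the text of the file matches the text of the buffer.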