emacs-orgmode@gnu.org archives
From: Nick Dokos <nicholas.dokos@hp.com>
To: Markus Heller <hellerm2@gmail.com>
Cc: nicholas.dokos@hp.com, emacs-orgmode@gnu.org
Subject: Re: [OT]: Search for missing :END:
Date: Mon, 21 Nov 2011 16:38:22 -0500
Message-ID: <6557.1321911502@alphaville.americas.hpqcorp.net>
In-Reply-To: Message from Markus Heller <hellerm2@gmail.com> of "Mon, 21 Nov 2011 10:48:35 PST." <0vvcqdxqf0.fsf@gmail.com>

Markus Heller <hellerm2@gmail.com> wrote:

> Hello all,
> 
> I have an OT request that can hopefully be answered by emacs gurus in
> less than a minute:
> 
> I'm looking for an emacs search expression that finds :PROPERTIES:
> *without* a matching :END: ...
> 

If you mean a regexp, you are wasting your time[fn:1]. Regexps are
powerful, but their range of applicability is limited to regular
languages and even then, you have to worry about their efficiency. The
above *is* a regular language, as long as drawers never nest: if P stands
for :PROPERTIES: and E stands for :END:, then the regexp is

    ([^EP]*P[^EP]*E)*[^EP]*

In words, the stuff inside the parens says: 0 or more "other" things
(non-P and non-E), followed by a P, followed by 0 or more "other"
things, followed by an E. You can then have 0 or more of the
parenthesized things, followed by 0 or more trailing "other" things
after the last E. This will succeed on well formed "sentences" and
fail on others.  But it might have to backtrack over the inner [^EP]*
matches and then the outer matches, and rescan arbitrarily long
stretches, which in the worst case, can turn your search into an
exponentially slow descent into the abyss. You might be able to write
non-greedy regexps that might behave better in this case. In most cases,
you'd end up with a horrendous-looking regexp: good luck trying to
understand it next week. That's my biggest problem with complicated regexps.
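
As a small illustration, the pattern can be tried on toy strings in which
each character is one token: P for :PROPERTIES:, E for :END:, anything
else for "other" text. This is only a sketch: the helper name `check` is
made up here, and the pattern carries a trailing [^EP]* so that text
after the last E is accepted.

```shell
#!/bin/sh
# Toy check: each character is one token (P = :PROPERTIES:, E = :END:).
# "check" is a hypothetical helper name, not part of any tool.
check() {
    # -x: the whole line must match; -q: no output, exit status only
    if printf '%s\n' "$1" | grep -Eqx '([^EP]*P[^EP]*E)*[^EP]*'; then
        echo "$1: well formed"
    else
        echo "$1: malformed"
    fi
}

check 'xPxExPxEx'   # every P is followed by its E
check 'xPxPxEx'     # a second P arrives before the first E
check 'xExPxE'      # an E with no P before it
```

On real org files the tokens are whole lines rather than single
characters, which is where the regexp starts to get ugly.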

However, a change of tool will simplify the problem enormously. E.g. here's
a simple algorithm that can be used for this kind of problem:  start a
nesting depth at 0 - when you see a P, increment the nesting depth by 1;
when you see an E, decrement it by 1. If the nesting depth ever becomes
something other than 0 or 1, you got a problem - also, if at EOF, the
nesting depth is not 0, you got a problem. Easy variations of this will
check well-formedness even when nesting *is* allowed.

You can easily write such a program in any language you are familiar
with (it does not have to be elisp, although you *can* write it in
elisp - personally, I'd use awk).
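
A minimal sketch of that algorithm in awk, run directly on the org file
so that NR gives the line number. The function name `check_drawers` and
the optional MAXDEPTH argument are assumptions made for illustration;
MAXDEPTH defaults to 1 (no nesting), and a larger value gives the
variation in which nesting is allowed.

```shell
#!/bin/sh
# check_drawers is a hypothetical helper, not an org or awk built-in.
# Usage: check_drawers FILE [MAXDEPTH]
# Prints the first problem found and returns non-zero; silent if all is well.
check_drawers() {
    awk -v max="${2:-1}" '
    /:PROPERTIES:/ {
        d++; last = NR                      # remember the latest opener
        if (d > max) { print FILENAME ":" NR ": depth " d; bad = 1; exit 1 }
    }
    /:END:/ {
        d--
        if (d < 0) { print FILENAME ":" NR ": :END: without :PROPERTIES:"; bad = 1; exit 1 }
    }
    END {
        if (bad) exit 1                     # already reported above
        if (d != 0) {
            print FILENAME ": depth " d " at EOF; last :PROPERTIES: at line " last
            exit 1
        }
    }' "$1"
}

# Example (foo.org is a placeholder file name):
#   check_drawers foo.org
```

Note that `exit` inside a pattern rule still runs the END block, hence
the `bad` flag to avoid reporting the same problem twice.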

But assuming that you are getting some error from org, you don't know
where the problem is and you are trying to find it, it will be simpler
to just use egrep:

    grep -E -n ':PROPERTIES:|:END:' foo.org

will filter out the relevant lines, so all you have to do is scan the
output by eye and spot any irregularity (consecutive :PROPERTIES: or
consecutive :END: lines). Even if you have hundreds of them, that's
*easy* for humans to do.[fn:2]

Or, if you prefer, you can write trivial validation programs to operate
on the output, e.g.:

    grep -E -n ':PROPERTIES:|:END:' foo.org | tee foo.out | grep PROP | wc -l
    grep END foo.out | wc -l

(the two counts had better be the same).

or

    grep -E -n ':PROPERTIES:|:END:' foo.org | ./foo.awk

where foo.awk implements the nesting depth algorithm above - something
like this:

--8<---------------cut here---------------start------------->8---
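#! /bin/bash

# Input: the "grep -n" output on stdin. Prints the first field of the
# offending line (it starts with the line number) and the depth d.
# Note d, not $d: in awk, $d means "field number d". Since exit still
# runs the END rule, the bad flag keeps it from reporting a second time.
awk '
BEGIN          { d = 0 }
/:PROPERTIES:/ { d++; if (d > 1) { print $1, d; bad = 1; exit } }
/:END:/        { d--; if (d < 0) { print $1, d; bad = 1; exit } }
END            { if (!bad && d != 0) { print $1, d } }'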
--8<---------------cut here---------------end--------------->8---


Even on Windoze, you can probably do all this stuff with cygwin.

Nick

> Thanks and Cheers and sorry for the OT ...
> 
> Markus
> 
>

Footnotes:

[fn:1] In the (approximate) words of Jamie Zawinski: "You have a
       problem. You think 'I know, let me use regexps to solve it'. Now
       you have two problems."

[fn:2] Of course, if you have formatted your file perversely or done
other naughty things, this might not work. The point is that although
this is not foolproof, it should deal with the vast majority of
"reasonable" files out there.
