emacs-orgmode@gnu.org archives
* [OT]: Search for missing :END:
@ 2011-11-21 18:48 Markus Heller
  2011-11-21 21:38 ` Nick Dokos
  0 siblings, 1 reply; 11+ messages in thread
From: Markus Heller @ 2011-11-21 18:48 UTC (permalink / raw)
  To: emacs-orgmode

Hello all,

I have an OT request that can hopefully be answered by emacs gurus in
less than a minute:

I'm looking for an emacs search expression that finds :PROPERTIES:
*without* a matching :END: ...

Thanks and Cheers and sorry for the OT ...

Markus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [OT]: Search for missing :END:
  2011-11-21 18:48 [OT]: Search for missing :END: Markus Heller
@ 2011-11-21 21:38 ` Nick Dokos
  2011-11-21 23:27   ` Markus Heller
  2011-11-22  8:46   ` : " Olaf Dietsche
  0 siblings, 2 replies; 11+ messages in thread
From: Nick Dokos @ 2011-11-21 21:38 UTC (permalink / raw)
  To: Markus Heller; +Cc: nicholas.dokos, emacs-orgmode

Markus Heller <hellerm2@gmail.com> wrote:

> Hello all,
> 
> I have an OT request that can hopefully be answered by emacs gurus in
> less than a minute:
> 
> I'm looking for an emacs search expression that finds :PROPERTIES:
> *without* a matching :END: ...
> 

If you mean a regexp, you are wasting your time[fn:1]. Regexps are
powerful, but their range of applicability is limited to regular
languages and even then, you have to worry about their efficiency. The
above *is* a regular language: if P stands for :PROPERTIES: and E stands
for :END:, then the regexp is

    ([^EP]*P[^EP]*E)*

In words, the stuff inside the parens says: 0 or more "other" things
(non-P and non-E), followed by a P, followed by 0 or more "other"
things, followed by an E. You can then have 0 or more of the
parenthesized things. This will succeed on well formed "sentences" and
fail on others.  But it might have to backtrack over the inner [^EP]*
matches and then the outer matches, and rescan arbitrarily long
stretches, which in the worst case, can turn your search into an
exponentially slow descent into the abyss. You might be able to write
non-greedy regexps that might behave better in this case. In most cases,
you'd end up with a horrendous-looking regexp: good luck trying to
understand it next week. That's my biggest problem with complicated regexps.
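
To make the regexp concrete, here is a small sketch that tries it on abstract
token strings with grep -E. It assumes each string is one "sentence" in which P
stands for :PROPERTIES:, E for :END:, and every other character is "other"
text. The function name well_formed is made up for illustration, and the
pattern is anchored with ^ and $ plus a trailing [^EP]* so that ordinary text
after the last E is accepted.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the token string $1 is a well-formed
# sequence of P...E pairs (no nesting, no stray E, no unclosed P).
well_formed() {
    printf '%s\n' "$1" | grep -Eq '^([^EP]*P[^EP]*E)*[^EP]*$'
}

# Try a few sample "sentences" and report how each one is classified.
for s in 'xPxExPE' 'PxP' 'xEx' ''; do
    if well_formed "$s"; then
        echo "ok:  '$s'"
    else
        echo "bad: '$s'"
    fi
done
```

This only says whether a string is well formed; it does not say where the
problem is, which is one more reason to switch tools.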

However, a change of tool will simplify the problem enormously. E.g. here's
a simple algorithm that can be used for this kind of problem:  start a
nesting depth at 0 - when you see a P, increment the nesting depth by 1;
when you see an E, decrement it by 1. If the nesting depth ever becomes
something other than 0 or 1, you got a problem - also, if at EOF, the
nesting depth is not 0, you got a problem. Easy variations of this will
check well-formedness even when nesting *is* allowed.

You can easily write such a program in any language you are familiar
with (it does not have to be elisp, although you *can* write it in
elisp - personally, I'd use awk).
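
As a rough sketch of that algorithm, including the variation where nesting is
allowed, the following walks a token string one character at a time. It is
only an illustration: P and E again stand for :PROPERTIES: and :END:, and the
function name check_depth and its MAX argument are invented here.

```shell
#!/bin/sh
# Hypothetical helper: check_depth TOKENS [MAX]
# P opens, E closes, anything else is ignored.  MAX is the deepest
# nesting allowed (default 1 = no nesting; 0 = no limit).
# Prints a diagnosis and returns non-zero on the first problem.
check_depth() {
    printf '%s\n' "$1" | awk -v max="${2:-1}" '
    {
        d = 0
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (c == "P") d++
            else if (c == "E") d--
            if (d < 0)              { print "stray E at token " i; exit 1 }
            if (max > 0 && d > max) { print "P too deep at token " i; exit 1 }
        }
        if (d != 0) { print "unclosed P at end of input"; exit 1 }
        print "ok"
    }'
}

check_depth 'xPxExPE' || true    # well formed
check_depth 'PxP'     || true    # second P before any E
check_depth 'PPEE' 0  || true    # nesting allowed, so this passes
```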

But assuming that you are getting some error from org, you don't know
where the problem is and you are trying to find it, it will be simpler
to just use egrep:

    grep -E -n ':PROPERTIES:|:END:' foo.org

will filter out the relevant lines, so all you have to do is scan the
output by eye and spot any irregularity (consecutive :PROPERTIES: or
consecutive :END: lines). Even if you have hundreds of them, that's
*easy* for humans to do.[fn:2]

Or, if you prefer, you can write trivial validation programs to operate
on the output, e.g.:

    grep -E -n ':PROPERTIES:|:END:' foo.org | tee foo.out | grep PROP | wc -l
    grep END foo.out | wc -l

(the counts had better be the same).

or

    grep -E -n ':PROPERTIES:|:END:' foo.org | foo.awk

where foo.awk implements the nesting depth algorithm above - something
like this:

--8<---------------cut here---------------start------------->8---
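#! /bin/bash

# Reads "grep -n" output on stdin; -F: makes $1 the line number.
# Prints the line number and the depth at the first irregularity.
awk -F: '
BEGIN          { d = 0 }
               { n = $1 }              # remember the current line number
/:PROPERTIES:/ { if (++d > 1) exit }   # second opener before any :END:
/:END:/        { if (--d < 0) exit }   # :END: with nothing open
END            { if (d != 0) print n, d }'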
--8<---------------cut here---------------end--------------->8---
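
A possible variation, sketched under the assumption that :PROPERTIES: is the
only drawer type in the file: read the org file directly (awk's NR supplies the
line number, so no grep pre-filter is needed) and remember where the current
drawer was opened, so that the report points at the unclosed :PROPERTIES:
line itself. The function name find_unclosed is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: find_unclosed FILE
# Reports the first unbalanced :PROPERTIES:/:END: line and returns
# non-zero; prints nothing and returns 0 if everything is balanced.
find_unclosed() {
    awk '
    /:PROPERTIES:/ {
        # A new opener while one is still open: the earlier one is unclosed.
        if (open) { print "line " open ": :PROPERTIES: without :END:"; bad = 1; exit }
        open = NR
    }
    /:END:/ {
        if (!open) { print "line " NR ": :END: without :PROPERTIES:"; bad = 1; exit }
        open = 0
    }
    END {
        # Still open at end of file and nothing reported yet.
        if (!bad && open) { print "line " open ": :PROPERTIES: without :END:"; bad = 1 }
        exit bad + 0
    }' "$1"
}
```

It would be called as find_unclosed foo.org.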


Even on Windoze, you can probably do all this stuff with cygwin.

Nick

> Thanks and Cheers and sorry for the OT ...
> 
> Markus
> 
>

Footnotes:

[fn:1] In the (approximate) words of Jamie Zawinski: "You have a
       problem. You think 'I know, let me use regexps to solve it'. Now
       you have two problems."

[fn:2] Of course, if you have formatted your file perversely or done
other naughty things, this might not work. The point is that although
this is not foolproof, it should deal with the vast majority of
"reasonable" files out there.


* Re: [OT]: Search for missing :END:
  2011-11-21 21:38 ` Nick Dokos
@ 2011-11-21 23:27   ` Markus Heller
  2011-11-21 23:51     ` Andrew Stribblehill
                       ` (2 more replies)
  2011-11-22  8:46   ` : " Olaf Dietsche
  1 sibling, 3 replies; 11+ messages in thread
From: Markus Heller @ 2011-11-21 23:27 UTC (permalink / raw)
  To: emacs-orgmode

Nick Dokos <nicholas.dokos@hp.com> writes:

[snip]
> Even on Windoze, you can probably do all this stuff with cygwin.

Hi Nick, 

thanks for this informative reply.

Unfortunately, I cannot install cygwin on my work computer.  I'll have
to figure something else out ...

As for an example, I'm in one of my org files and I do C-TAB and get the
following error:

OVERVIEW
CONTENTS...done
SHOW ALL
if: :END: line missing at position 18720
Quit
Mark set

Where is position 18720?  I apologize if this is a stupid question, but
I can't seem to figure this out ...

Thanks again
Markus


* Re: [OT]: Search for missing :END:
  2011-11-21 23:27   ` Markus Heller
@ 2011-11-21 23:51     ` Andrew Stribblehill
  2011-11-21 23:54     ` Nick Dokos
  2011-11-21 23:58     ` Nick Dokos
  2 siblings, 0 replies; 11+ messages in thread
From: Andrew Stribblehill @ 2011-11-21 23:51 UTC (permalink / raw)
  To: Markus Heller; +Cc: emacs-orgmode

My guess is that it's the 18720th character of the buffer (Emacs counts
characters, not bytes). To get there, type M-x goto-char RET 18720 RET;
the position is absolute, so you can do this from anywhere in the buffer.



* Re: [OT]: Search for missing :END:
  2011-11-21 23:27   ` Markus Heller
  2011-11-21 23:51     ` Andrew Stribblehill
@ 2011-11-21 23:54     ` Nick Dokos
  2011-11-21 23:59       ` Jonathan Leech-Pepin
  2011-11-22  0:14       ` Markus Heller
  2011-11-21 23:58     ` Nick Dokos
  2 siblings, 2 replies; 11+ messages in thread
From: Nick Dokos @ 2011-11-21 23:54 UTC (permalink / raw)
  To: Markus Heller; +Cc: nicholas.dokos, emacs-orgmode

Markus Heller <hellerm2@gmail.com> wrote:


> AS for an example, I'm in one of my org files and I do C-TAB and get the
> following error:
> 
> OVERVIEW
> CONTENTS...done
> SHOW ALL
> if: :END: line missing at position 18720
> Quit
> Mark set
> 
> Where is position 18720?  I apologize if this is a stupid question, but
> I can't seem to figure this out ...
> 

M-x goto-char RET 18720 RET

will take you there, but it's probably at the very end of the buffer: search backwards
for :PROPERTIES: and you should be able to find it (if your diagnosis is
indeed correct).
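
Outside Emacs, the reported position can be turned into a line number with a
couple of standard tools. This is only a sketch: the function name pos_to_line
is invented, and it assumes the file contains single-byte characters and Unix
line endings, so that Emacs positions and byte offsets agree.

```shell
#!/bin/sh
# Hypothetical helper: pos_to_line POS FILE
# Emacs positions start at 1, so the POS-1 bytes before the position are
# the ones to scan; the line number is the number of newlines among
# them, plus one.
pos_to_line() {
    n=$(head -c "$(( $1 - 1 ))" "$2" | wc -l | tr -d ' ')
    echo "$(( n + 1 ))"
}
```

For the error above that would be pos_to_line 18720 foo.org, where foo.org
stands for whichever file produced the message.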

Nick


* Re: [OT]: Search for missing :END:
  2011-11-21 23:27   ` Markus Heller
  2011-11-21 23:51     ` Andrew Stribblehill
  2011-11-21 23:54     ` Nick Dokos
@ 2011-11-21 23:58     ` Nick Dokos
  2 siblings, 0 replies; 11+ messages in thread
From: Nick Dokos @ 2011-11-21 23:58 UTC (permalink / raw)
  To: Markus Heller; +Cc: nicholas.dokos, emacs-orgmode

Markus Heller <hellerm2@gmail.com> wrote:

> Unfortunately, I cannot install cygwin on my work computer.  I'll have
> to figure something else out ...
> 

Just copy the file to a Linux/Unix/OSX/... machine and do the searches there.
USB sticks were invented for exactly that purpose :-)


* Re: [OT]: Search for missing :END:
  2011-11-21 23:54     ` Nick Dokos
@ 2011-11-21 23:59       ` Jonathan Leech-Pepin
  2011-11-22  0:14       ` Markus Heller
  1 sibling, 0 replies; 11+ messages in thread
From: Jonathan Leech-Pepin @ 2011-11-21 23:59 UTC (permalink / raw)
  To: nicholas.dokos; +Cc: Markus Heller, emacs-orgmode


I just tested this.

M-x goto-char RET <reported position> RET ;; In this case 18720

Sends you directly to the end of your :PROPERTIES: line that does not have
an associated :END:.

You'll simply have to then add :END: after the appropriate properties, or
remove the :PROPERTIES: line that shouldn't be present (as the case may be).

Jon



* Re: [OT]: Search for missing :END:
  2011-11-21 23:54     ` Nick Dokos
  2011-11-21 23:59       ` Jonathan Leech-Pepin
@ 2011-11-22  0:14       ` Markus Heller
  1 sibling, 0 replies; 11+ messages in thread
From: Markus Heller @ 2011-11-22  0:14 UTC (permalink / raw)
  To: emacs-orgmode

Nick Dokos <nicholas.dokos@hp.com> writes:

> Markus Heller <hellerm2@gmail.com> wrote:
>
>
>> AS for an example, I'm in one of my org files and I do C-TAB and get the
>> following error:
>> 
>> OVERVIEW
>> CONTENTS...done
>> SHOW ALL
>> if: :END: line missing at position 18720
>> Quit
>> Mark set
>> 
>> Where is position 18720?  I apologize if this is a stupid question, but
>> I can't seem to figure this out ...
>> 
>
> M-x goto-char RET 18720 RET
>
> will take you there, but it's probably at the very end of the buffer: search backwards
> for :PROPERTIES: and you should be able to find it (if your diagnosis is
> indeed correct).

Thanks to Nick and the other posters for an enlightening answer.  This
worked beautifully :)

Thanks again and Cheers
Markus, who knew that there was some easy way to solve this but couldn't
figure it out himself


* Re: : Search for missing :END:
  2011-11-21 21:38 ` Nick Dokos
  2011-11-21 23:27   ` Markus Heller
@ 2011-11-22  8:46   ` Olaf Dietsche
  2011-11-22 14:37     ` Nick Dokos
  1 sibling, 1 reply; 11+ messages in thread
From: Olaf Dietsche @ 2011-11-22  8:46 UTC (permalink / raw)
  To: nicholas.dokos; +Cc: Markus Heller, emacs-orgmode

Nick Dokos <nicholas.dokos@hp.com> writes:

[snip]
> But assuming that you are getting some error from org, you don't know
> where the problem is and you are trying to find it, it will be simpler
> to just use egrep:
>
>     grep -E -n ':PROPERTIES:|:END:' foo.org
>
> will filter out the relevant lines, so all you have to do is scan the
> output by eye and spot any irregularity (consecutive :PROPERTIES: or
> consecutive :END: lines). Even if you have hundreds of them, that's
> *easy* for humans to do.[fn:2]
>
> Or, if you prefer, you can write trivial validation programs to operate
> on the output, e.g.:
>
>         grep -E -n ':PROPERTIES:|:END:' foo.org | tee foo.out | grep PROP | wc -l
> 	grep END foo.out | wc -l
>
> (the counts 'd better be the same).

Since you might have other drawers as well (LOGBOOK, CLOCK), the counts
need not be the same.

Regards, Olaf

NB: At least GNU grep knows how to search for multiple regexps:
$ grep -e :PROPERTIES: -e :END: foo.org
So, no need for egrep here.


* Re: : Search for missing :END:
  2011-11-22  8:46   ` : " Olaf Dietsche
@ 2011-11-22 14:37     ` Nick Dokos
  0 siblings, 0 replies; 11+ messages in thread
From: Nick Dokos @ 2011-11-22 14:37 UTC (permalink / raw)
  To: Olaf Dietsche; +Cc: Markus Heller, nicholas.dokos, emacs-orgmode

Olaf Dietsche <olaf+list.orgmode@olafdietsche.de> wrote:

> Nick Dokos <nicholas.dokos@hp.com> writes:
> 
> [snip]
> > But assuming that you are getting some error from org, you don't know
> > where the problem is and you are trying to find it, it will be simpler
> > to just use egrep:
> >
> >     grep -E -n ':PROPERTIES:|:END:' foo.org
> >
> > will filter out the relevant lines, so all you have to do is scan the
> > output by eye and spot any irregularity (consecutive :PROPERTIES: or
> > consecutive :END: lines). Even if you have hundreds of them, that's
> > *easy* for humans to do.[fn:2]
> >
> > Or, if you prefer, you can write trivial validation programs to operate
> > on the output, e.g.:
> >
> >         grep -E -n ':PROPERTIES:|:END:' foo.org | tee foo.out | grep PROP | wc -l
> > 	grep END foo.out | wc -l
> >
> > (the counts 'd better be the same).
> 
> Since you might have other drawers as well (LOGBOOK, CLOCK), the counts
> need not be the same.
> 

Indeed - you might have user-defined drawers as well and they all contribute the
:END: part, so the begin part will have to be fleshed out accordingly. But the
generalization is easy and eyeballing the output is pretty effective: if you
miss some drawer types, the repeating :END: stands out like a sore thumb.
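
One way that generalization might look, as a sketch only: treat any line of the
form :NAME: (alone on its line, optionally indented) as a drawer opener, and
:END: as closing whichever drawer is open. The function name check_drawers is
made up, and the pattern is an assumption about drawer syntax: a property with
an empty value, or a literal :NAME: line inside an example block, would be
misread as an opener.

```shell
#!/bin/sh
# Hypothetical helper: check_drawers FILE
# Checks every drawer in FILE, whatever its name, and reports the first
# opener without an :END: (or :END: without an opener).
check_drawers() {
    awk '
    /^[ \t]*:[A-Za-z0-9_-]+:[ \t]*$/ {
        name = $1                       # the only field on such a line
        if (toupper(name) == ":END:") {
            if (!open) { print "line " NR ": :END: without an open drawer"; bad = 1; exit }
            open = 0
        } else {
            # A new opener while another drawer is still open.
            if (open) { print "line " open ": " opened " without :END:"; bad = 1; exit }
            open = NR; opened = name
        }
    }
    END {
        if (!bad && open) { print "line " open ": " opened " without :END:"; bad = 1 }
        exit bad + 0
    }' "$1"
}
```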

>
> NB: At least GNU grep knows how to search for multiple regexps:
> $ grep -e :PROPERTIES: -e :END: foo.org
> So, no need for egrep here.
> 

Ok - thanks.

Nick


* Re: [OT]: Search for missing :END:
@ 2013-03-04 21:12 Brian Wood
  0 siblings, 0 replies; 11+ messages in thread
From: Brian Wood @ 2013-03-04 21:12 UTC (permalink / raw)
  To: emacs-orgmode

I just found a few of these errors in *Messages*, and since I have more
than 20 .org files, I used this approach:

cd ~/orgmode

# this gives me the byte offset & the filename for each matching line
grep -br  'CLOCK: ' *  > ~/tmp/org-missing-end.txt

# this searches within a few bytes of the byte offset reported by emacs
grep -E ":3007.:" ~/tmp/org-missing-end.txt

Searching within a few bytes was not necessary. In my case I found a 
":CLOCK:" string at the byte-offset in the error message.
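
The same idea can be wrapped up so that the "within a few bytes" part is
explicit. This is a sketch under a few assumptions: grep supports -r and -b,
file names contain no colons, the drawer names listed are the ones in use, and
the position from the error message is close enough to a byte offset (true for
single-byte text with Unix line endings). The function name near_offset and its
arguments are invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: near_offset POS WINDOW DIR
# Lists drawer-opening lines under DIR whose byte offset (as printed by
# grep -b in the form "file:offset:text") lies within WINDOW bytes of POS.
near_offset() {
    grep -rb -e ':PROPERTIES:' -e ':CLOCK:' -e ':LOGBOOK:' "$3" |
        awk -F: -v pos="$1" -v win="$2" '
            { diff = $2 - pos; if (diff < 0) diff = -diff }
            diff <= win'
}
```

For example, near_offset 30075 20 ~/orgmode would list every drawer opener
whose line starts within 20 bytes of position 30075 in any file under that
directory.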

-- 
Brian Wood
Applications Programmer
UC Berkeley IST

