emacs-orgmode@gnu.org archives
* proposal: defconst/defcustom org-tags-regexp
@ 2007-07-16 13:21 Adam Spiers
  2007-07-18 21:24 ` Carsten Dominik
  0 siblings, 1 reply; 8+ messages in thread
From: Adam Spiers @ 2007-07-16 13:21 UTC
  To: org-mode mailing list

Hi again,

There seem to be a number of hardcoded regexps currently used for
matching heading tags, all very similar looking, and typically
something like:

  [ \t]*\\(:[[:alnum:]_@:]+:\\)?[ \t]*\\($\\|\r\\)

Is there any reason why these shouldn't be factored out into a new
defcustom org-tags-regexp?  Not only would this make for more reliable
code as per the DRY principle, but it would also open up some new
possibilities:

It seems a lot of people are already using "@" as a tag prefix to
indicate GTD contexts, which the code currently allows, and is a nice
way of keeping distinct tag "namespaces".  Additionally I would like
to have tags to denote time estimates for actions:

     <10min    i.e. it's a 10 minute job worst case
     <30min    i.e. it's a half an hour job worst case
     <60min    etc.
     <120min
     <4hrs
     <1day

(currently, the "<" prefix confuses the code); and maybe something
else, like "+" to prefix tags which denote actions forming part of a
bigger picture e.g. "+health" for any actions/projects related to
living more healthily (less elisp hacking and more sunlight might be a
good start ;-)
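
A minimal sketch of what such a factored-out constant might look like.
The names `my/org-tag-chars' and `my/org-tags-regexp' are hypothetical
and not part of Org; the extra "<" and "+" characters are the ones
proposed above, added here only for illustration.

```elisp
;; Hypothetical names, not Org's actual variables.
(defconst my/org-tag-chars "[:alnum:]_@<+"
  "Characters allowed in a tag, written as the inside of a bracket expression.")

(defconst my/org-tags-regexp
  ;; Same shape as the hardcoded regexp quoted above, with the
  ;; character set factored out.  The concat runs once, at load time.
  (concat "[ \t]*\\(:[" my/org-tag-chars ":]+:\\)?[ \t]*\\($\\|\r\\)")
  "Regexp matching optional tags at the end of a headline.")

(defun my/org-headline-tags (headline)
  "Return the tag string at the end of HEADLINE, or nil if there is none."
  (when (string-match my/org-tags-regexp headline)
    (match-string 1 headline)))

;; (my/org-headline-tags "* TODO Call dentist  :@phone:<10min:")
;; returns ":@phone:<10min:"
```

With a single definition, allowing a new prefix character means
changing one string rather than every hardcoded copy.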

Thanks,
Adam


* Re: proposal: defconst/defcustom org-tags-regexp
  2007-07-16 13:21 proposal: defconst/defcustom org-tags-regexp Adam Spiers
@ 2007-07-18 21:24 ` Carsten Dominik
  2007-07-19 13:08   ` Eddward DeVilla
  2007-07-20 15:05   ` Adam Spiers
  0 siblings, 2 replies; 8+ messages in thread
From: Carsten Dominik @ 2007-07-18 21:24 UTC
  To: Adam Spiers; +Cc: org-mode mailing list


On Jul 16, 2007, at 15:21, Adam Spiers wrote:

> Hi again,
>
> There seem to be a number of hardcoded regexps currently used for
> matching heading tags, all very similar looking, and typically
> something like:
>
>   [ \t]*\\(:[[:alnum:]_@:]+:\\)?[ \t]*\\($\\|\r\\)
>
> Is there any reason why these shouldn't be factored out into a new
> defcustom org-tags-regexp?

Well, one reason is efficiency.  When a regular expression is a
constant, Emacs is able to cache the compiled version of the regular
expression, and this can speed up code that does a lot of matching
quite a bit.  The token you show above is usually part of a larger
string, so the full regular expression would have to be made with
concat and will therefore be recompiled all the time.

However, we could use a macro to process the concat at compile time.
This would still require that you have set the appropriate variables
at compile time, and you would need to recompile the Lisp file after
each change in this constant - not something that Emacs users expect
to have to do.

Do you know a solution for this problem?

Still, I agree that what I did is bad programming in the sense that
things like this could/should be factored out.

- Carsten



--
Carsten Dominik
Sterrenkundig Instituut "Anton Pannekoek"
Universiteit van Amsterdam
Kruislaan 403
NL-1098SJ Amsterdam
phone: +31 20 525 7477


* Re: proposal: defconst/defcustom org-tags-regexp
  2007-07-18 21:24 ` Carsten Dominik
@ 2007-07-19 13:08   ` Eddward DeVilla
  2007-07-19 15:08     ` Carsten Dominik
  2007-07-20 15:05   ` Adam Spiers
  1 sibling, 1 reply; 8+ messages in thread
From: Eddward DeVilla @ 2007-07-19 13:08 UTC
  To: Carsten Dominik; +Cc: org-mode mailing list

On 7/18/07, Carsten Dominik <dominik@science.uva.nl> wrote:
> Do you know a solution for this problem?

Does Emacs let you manually compile a regular expression?  If so, it
might be possible to recompile REs whenever they change.

Edd


* Re: proposal: defconst/defcustom org-tags-regexp
  2007-07-19 13:08   ` Eddward DeVilla
@ 2007-07-19 15:08     ` Carsten Dominik
  2007-07-19 16:14       ` Eddward DeVilla
  0 siblings, 1 reply; 8+ messages in thread
From: Carsten Dominik @ 2007-07-19 15:08 UTC
  To: Eddward DeVilla; +Cc: org-mode mailing list


On Jul 19, 2007, at 15:08, Eddward DeVilla wrote:

> On 7/18/07, Carsten Dominik <dominik@science.uva.nl> wrote:
>> Do you know a solution for this problem?
>
> Does Emacs let you manually compile a regular expression?  If so, it
> might be possible to recompile REs whenever they change.

This is not the issue.  Yes, Emacs compiles regular expressions
whenever necessary.  However, consider the following loop:

    (while (re-search-forward "^aaa" nil t)
      (do-something))

In this case, Emacs will compile the regexp once and then use
it potentially many times.

However, in the following case:

    (while (re-search-forward (concat "^" some-variable) nil t)
      (do-something))

The regular expression will be re-compiled all the time.
In this case you can work around it by doing:

    (let ((re (concat "^" some-variable)))
      (while (re-search-forward re nil t)
        (do-something...)))

But if the loop is calling functions, and the regular expression
is being built and used in those functions, this might slow down
things.
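
The workaround above can be written out as a complete function.  This
is only a sketch: `my/count-lines-starting-with' is a hypothetical
helper, and the use of `regexp-quote' is an added assumption so that
the variable part is matched literally.

```elisp
(defun my/count-lines-starting-with (prefix)
  "Count lines in the current buffer that start with PREFIX."
  ;; Build the regexp string once, before the loop, so the same
  ;; string is reused for every search.
  (let ((re (concat "^" (regexp-quote prefix)))
        (count 0))
    (save-excursion
      (goto-char (point-min))
      (while (re-search-forward re nil t)
        (setq count (1+ count))))
    count))

;; Usage, in a scratch buffer:
;; (with-temp-buffer
;;   (insert "aaa 1\nbbb\naaa 2\n")
;;   (my/count-lines-starting-with "aaa"))
;; returns 2
```

If the search lives in a helper function that the loop calls, the same
idea applies: pass the prebuilt string in as an argument, or keep it
in a variable that is set once.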

Now, I am not sure how slow this actually is.  I am basically
following the arguments in Jeffrey E. F. Friedl's excellent book
"Mastering Regular Expressions", but I am not sure how large the
effect will be in the end.  I remember that in Perl this was
significant, so I am assuming it is in Emacs as well.

I would like to factor out more regexps into variables, and may do
so with time - but this does not have priority.

- Carsten



--
Carsten Dominik
Sterrenkundig Instituut "Anton Pannekoek"
Universiteit van Amsterdam
Kruislaan 403
NL-1098SJ Amsterdam
phone: +31 20 525 7477


* Re: proposal: defconst/defcustom org-tags-regexp
  2007-07-19 15:08     ` Carsten Dominik
@ 2007-07-19 16:14       ` Eddward DeVilla
  0 siblings, 0 replies; 8+ messages in thread
From: Eddward DeVilla @ 2007-07-19 16:14 UTC
  To: Carsten Dominik; +Cc: org-mode mailing list

On 7/19/07, Carsten Dominik <dominik@science.uva.nl> wrote:
>
> On Jul 19, 2007, at 15:08, Eddward DeVilla wrote:
> > Does Emacs let you manually compile a regular expression?  If so, it
> > might be possible to recompile REs whenever they change.
>
> This is not the issue.  Yes, Emacs compiles regular expressions
> whenever necessary.  However, consider the following loop:

  I meant, could you store a compiled RE in a variable and use the
compiled form?  Then manually recompile them if someone changes the
uncompiled RE string.  This assumes you can store the compiled RE and
that there is a hook in custom to let you know it has changed.  We'd
probably want to have a function that regenerates all of org's REs and
allow others to hook into that to recompile their own REs.
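
  A rough sketch of the "hook in custom" part, using the `:set' keyword
of `defcustom'.  It only rebuilds a derived regexp *string* when the
option changes; it does not store a compiled RE.  All names here
(`my-tags-sketch', `my/tag-chars', `my/tags-regexp',
`my/rebuild-tags-regexp') are hypothetical.

```elisp
(defgroup my-tags-sketch nil
  "Hypothetical group for this sketch."
  :group 'convenience)

(defvar my/tags-regexp nil
  "Derived regexp, rebuilt whenever `my/tag-chars' is set through Custom.")

(defun my/rebuild-tags-regexp (chars)
  "Rebuild `my/tags-regexp' so that tags may contain CHARS."
  (setq my/tags-regexp
        (concat "[ \t]*\\(:[" chars ":]+:\\)?[ \t]*$")))

(defcustom my/tag-chars "[:alnum:]_@"
  "Characters allowed in tags, written as the inside of a bracket expression."
  :type 'string
  :group 'my-tags-sketch
  ;; Called when the option is initialized and on every change made
  ;; through Custom or `customize-set-variable'.
  :set (lambda (symbol value)
         (set-default symbol value)
         (my/rebuild-tags-regexp value)))

;; Usage: allow "<" and "+" in tags; the derived regexp follows.
;; (customize-set-variable 'my/tag-chars "[:alnum:]_@<+")
```

A plain `setq' of the option bypasses the `:set' function, so this only
helps users who change the value through the Custom interface.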

  On the other hand, I just went digging through the elisp manual and
I didn't see anything that would generate a compiled RE to be stored
and (re)used later.  Bummer.  Maybe if I get bored, I might see if
performance is ok with the REs factored out, but I don't suspect I
would get good results.

Edd


* Re: proposal: defconst/defcustom org-tags-regexp
  2007-07-18 21:24 ` Carsten Dominik
  2007-07-19 13:08   ` Eddward DeVilla
@ 2007-07-20 15:05   ` Adam Spiers
       [not found]     ` <931245f7aaa24436377e537552f1beab@science.uva.nl>
  1 sibling, 1 reply; 8+ messages in thread
From: Adam Spiers @ 2007-07-20 15:05 UTC
  To: org-mode mailing list

On Wed, Jul 18, 2007 at 11:24:40PM +0200, Carsten Dominik wrote:
> On Jul 16, 2007, at 15:21, Adam Spiers wrote:
> >There seem to be a number of hardcoded regexps currently used for
> >matching heading tags, all very similar looking, and typically
> >something like:
> >
> >  [ \t]*\\(:[[:alnum:]_@:]+:\\)?[ \t]*\\($\\|\r\\)
> >
> >Is there any reason why these shouldn't be factored out into a new
> >defcustom org-tags-regexp?
> 
> Well, one reason is efficiency.  When a regular expression is a
> constant, Emacs is able to cache the compiled version of the regular
> expression, and this can speed up code that does a lot of matching
> quite a bit.  The token you show above is usually part of a larger
> string, so the full regular expression would have to be made with
> concat and will therefore be recompiled all the time.

Right, point taken - like m//o in Perl.  To be honest, it doesn't
matter too much to me if it's defconst rather than defcustom.  The
main thing is that I can have tags starting with '<' :-)

I suggest that we start by refactoring into a defconst, and then we
can worry about turning that into a defcustom in a way which retains
efficiency at a later date - if anyone needs the defcustom badly
enough.  Makes sense?

Cheers,
Adam


* Re: proposal: defconst/defcustom org-tags-regexp
       [not found]     ` <931245f7aaa24436377e537552f1beab@science.uva.nl>
@ 2007-09-01 11:01       ` Adam Spiers
  2007-09-02  7:30         ` Carsten Dominik
  0 siblings, 1 reply; 8+ messages in thread
From: Adam Spiers @ 2007-09-01 11:01 UTC
  To: Carsten Dominik; +Cc: org-mode mailing list

Carsten Dominik (dominik@science.uva.nl) wrote:
> On Jul 20, 2007, at 17:05, Adam Spiers wrote:
> >On Wed, Jul 18, 2007 at 11:24:40PM +0200, Carsten Dominik wrote:
> >>On Jul 16, 2007, at 15:21, Adam Spiers wrote:
> >>>There seem to be a number of hardcoded regexps currently used for
> >>>matching heading tags, all very similar looking, and typically
> >>>something like:
> >>>
> >>> [ \t]*\\(:[[:alnum:]_@:]+:\\)?[ \t]*\\($\\|\r\\)
> >>>
> >>>Is there any reason why these shouldn't be factored out into a new
> >>>defcustom org-tags-regexp?
> >>
> >>Well, one reason is efficiency.  When a regular expression is a constant,
> >>Emacs is able to cache the compiled version of the regular expression,
> >>and this can speed up code that does a lot of matching quite a bit.
> >>The token you show above is usually part of a larger string, so the full
> >>regular expression would have to be made with concat and will therefore
> >>be recompiled all the time.
> >
> >Right, point taken - like m//o in Perl.  To be honest, it doesn't
> >matter too much to me if it's defconst rather than defcustom.  The
> >main thing is that I can have tags starting with '<' :-)
> 
> Hi Adam,
> 
> this is not about defcustom or defconst, but about the question
> of whether the regexp is built each time with concat, or not.

Ah, I was assuming that the elisp interpreter was intelligent enough
that if you did a concat of two or more constants, it would only build
the regexp the first time, similar to m//o in Perl.  Is that not the
case?  Or maybe it only performs this optimisation if you
byte-compile?  I found this in the elisp manual:

 -- Special Form: eval-when-compile body...
     This form marks BODY to be evaluated at compile time but not when
     the compiled program is loaded.  The result of evaluation by the
     compiler becomes a constant which appears in the compiled program.
     If you load the source file, rather than compiling it, BODY is
     evaluated normally.

     If you have a constant that needs some calculation to produce,
     `eval-when-compile' can do that at compile-time.  For example,

          (defvar my-regexp
            (eval-when-compile (regexp-opt '("aaa" "aba" "abb"))))

Maybe I should practice what I preach and use mercurial to start an
experimental branch to look at the impact on performance of doing this
refactoring :-)
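
A sketch of how `eval-when-compile' could be combined with a
factored-out piece, so that the concat happens at byte-compile time.
The names `my/tag-group' and `my/heading-with-tags-regexp', and the
headline regexp itself, are hypothetical and chosen only for
illustration.

```elisp
;; `eval-and-compile' makes the building block available both to the
;; byte compiler and at load time.
(eval-and-compile
  (defconst my/tag-group "\\(:[[:alnum:]_@:]+:\\)"
    "Hypothetical factored-out regexp fragment matching a tag string."))

(defconst my/heading-with-tags-regexp
  ;; When the file is byte-compiled, the concat is evaluated by the
  ;; compiler and only the resulting string appears in the .elc file.
  ;; When the source is loaded directly, it is evaluated once at load.
  (eval-when-compile
    (concat "^\\*+ .*?[ \t]+" my/tag-group "[ \t]*$"))
  "Hypothetical regexp matching a headline that ends in tags.")

(defun my/heading-tags (line)
  "Return the tag string of headline LINE, or nil if it has no tags."
  (when (string-match my/heading-with-tags-regexp line)
    (match-string 1 line)))

;; (my/heading-tags "** TODO Buy milk :@shop:")  returns ":@shop:"
```

Either way the string is built once rather than inside a search loop;
the catch mentioned earlier in the thread still applies, in that a
changed fragment only takes effect after recompiling.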


* Re: proposal: defconst/defcustom org-tags-regexp
  2007-09-01 11:01       ` Adam Spiers
@ 2007-09-02  7:30         ` Carsten Dominik
  0 siblings, 0 replies; 8+ messages in thread
From: Carsten Dominik @ 2007-09-02  7:30 UTC
  To: Adam Spiers; +Cc: org-mode mailing list


On Sep 1, 2007, at 13:01, Adam Spiers wrote:
> Ah, I was assuming that the elisp interpreter was intelligent enough
> that if you did a concat of two or more constants, it would only build
> the regexp the first time, similar to m//o in Perl.  Is that not the
> case?  Or maybe it only performs this optimisation if you
> byte-compile?  I found this in the elisp manual:
>
>  -- Special Form: eval-when-compile body...
>      This form marks BODY to be evaluated at compile time but not when
>      the compiled program is loaded.  The result of evaluation by the
>      compiler becomes a constant which appears in the compiled program.
>      If you load the source file, rather than compiling it, BODY is
>      evaluated normally.
>
>      If you have a constant that needs some calculation to produce,
>      `eval-when-compile' can do that at compile-time.  For example,
>
>           (defvar my-regexp
>             (eval-when-compile (regexp-opt '("aaa" "aba" "abb"))))
>
> Maybe I should practice what I preach and use mercurial to start an
> experimental branch to look at the impact on performance of doing this
> refactoring :-)

I made some tests a while ago, and the impact was very small.
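
The tests themselves are not shown in this thread.  For anyone who
wants to measure on their own machine, a comparison along these lines
is one possibility; it is only a sketch, and the function names, the
sample buffer contents and the repetition count are all assumptions.

```elisp
(require 'benchmark)

(defvar my/bench-prefix "aaa"
  "Hypothetical variable part of the regexp.")

(defun my/bench-search (make-regexp)
  "Search the whole current buffer, calling MAKE-REGEXP before each search."
  (goto-char (point-min))
  (while (re-search-forward (funcall make-regexp) nil t)))

(defun my/bench-constant-vs-concat (lines)
  "Time a constant regexp against a concat-built one over LINES lines.
Return a plist of elapsed seconds; the numbers depend on the machine."
  (with-temp-buffer
    (dotimes (_ lines)
      (insert "aaa some heading text\n"))
    (list :constant
          (car (benchmark-run 10
                 (my/bench-search (lambda () "^aaa"))))
          :concat
          (car (benchmark-run 10
                 (my/bench-search
                  (lambda () (concat "^" my/bench-prefix))))))))

;; Usage: (my/bench-constant-vs-concat 10000)
```

The concat variant also pays for allocating a new string on every
iteration, so any difference is not purely regexp compilation.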

- Carsten
