emacs-orgmode@gnu.org archives
* Context-sensitive word count in org mode (elisp)
@ 2011-02-16  3:51 Paul Sexton
  2011-02-16  9:12 ` Christian Moe
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Paul Sexton @ 2011-02-16  3:51 UTC (permalink / raw)
  To: emacs-orgmode

I am trying to reduce the word count in a document I am writing. The
existing word count functionality for Emacs is surprisingly lacking.
I wanted a word count function for Org mode that excluded tables and
comments, and ended up writing one myself.

If this function is called with a region highlighted, it counts the words in
the region. Otherwise it counts words in the whole buffer.
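
A minimal sketch of the same region-or-buffer behaviour, using an
`interactive' list instead of "r" (which can signal an error if no mark
has been set in the buffer yet). The name `my-count-words-dwim' is
hypothetical, and it simply counts runs of word characters with
`how-many', so it does not skip comments or tables:

#+begin_src emacs-lisp
(defun my-count-words-dwim (beg end)
  "Report and return the number of words between BEG and END.
Interactively, BEG and END cover the active region if there is
one, otherwise the whole buffer."
  (interactive
   (if (use-region-p)
       (list (region-beginning) (region-end))
     (list (point-min) (point-max))))
  ;; `how-many' returns the match count when called from Lisp.
  (let ((n (how-many "\\w+" beg end)))
    (message "%d words in %s." n (if (use-region-p) "region" "buffer"))
    n))
#+end_src

Because it also returns the count, other code can call it directly,
e.g. (my-count-words-dwim (point-min) (point-max)).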

It ignores commented lines and tables. LaTeX-style macros such as 
\foo{bar,baz} are counted as 1 word, as a compromise (more often than not 
they should count as 0, but they do sometimes expand to 1 or more words
in the final document). 

Limitations:
- Does not ignore BEGIN_SRC/END_SRC or inline src_* blocks (babel).
  Should be easy enough to add, however.
- There is probably a better way of identifying LaTeX macros
  than my 'latex-macro-regexp' below.
- Ignores all Org links. I couldn't figure out how to extract the
  "description" text from links, but I didn't look very hard.
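
For the first limitation, a plain-regexp sketch of an "inside a source
block" test. `my-org-in-src-block-p' is a hypothetical name; it matches
#+BEGIN_SRC/#+END_SRC case-insensitively, treats the delimiter lines as
part of the block, and ignores other block types such as #+BEGIN_EXAMPLE:

#+begin_src emacs-lisp
(defun my-org-in-src-block-p ()
  "Return non-nil if point is on or between #+BEGIN_SRC and #+END_SRC lines."
  (let ((case-fold-search t)
        (here (line-beginning-position)))
    (save-excursion
      ;; Start from the end of the line so that a #+BEGIN_SRC on the
      ;; current line is still found by the backward search.
      (end-of-line)
      (and (re-search-backward "^[ \t]*#\\+begin_src\\([ \t]\\|$\\)" nil t)
           ;; The first #+END_SRC after that opener must not close the
           ;; block before the current line.
           (re-search-forward "^[ \t]*#\\+end_src\\([ \t]\\|$\\)" nil t)
           (>= (match-beginning 0) here)))))
#+end_src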

Improvements welcome.

Paul

------------------------------------------------------------------------

(defun in-comment-p ()
  "Return non-nil if point is in a comment."
  (if (or (null comment-start-skip)
	  (eq (preceding-char) ?\r))
      nil
    (save-excursion
      (let ((pos (point)))
	(re-search-backward "^\\|\r" nil t)
	(or (looking-at comment-start-skip)
	    (re-search-forward comment-start-skip pos t))))))

(defun in-org-table-p ()
  "Return non-nil if point is in an org-mode table."
  (if (or (not (boundp 'org-table-any-line-regexp))
          (null org-table-any-line-regexp)
	  (eq (preceding-char) ?\r))
      nil
    (save-excursion
      (let ((pos (point)))
	(re-search-backward "^\\|\r" nil t)
	(looking-at org-table-any-line-regexp)))))


(defvar latex-macro-regexp "\\\\[A-Za-z]+\\(\\[[^]]*\\]\\|\\){\\([^}]*\\)}")


(defun org-word-count (beg end)
  (interactive "r")
  (unless mark-active
    (setf beg (point-min)
          end (point-max)))
  (let ((wc 0))
    (save-excursion
      (goto-char beg)
      (while (and (< (point) end)
                  (re-search-forward "\\w+\\W*" end t))
        (cond
         ((or (in-comment-p) (in-org-table-p))
          nil)
         ((looking-at org-any-link-re)
          (goto-char (match-end 0)))
         ((save-excursion
            (backward-char)
            (looking-at latex-macro-regexp))
          (goto-char (match-end 0))
          (setf wc (+ 2 wc)))
         (t
          (incf wc)))))
    (message "%d words in %s." wc
             (if mark-active "region" "buffer"))))
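
To see what `latex-macro-regexp' captures, here is a small sketch (the
helper `my-latex-macro-arg' is hypothetical) that returns group 2, the
{argument}. The last two calls show its weak spots: a macro with no
braces, such as \LaTeX, is not matched at all, and `[^}]*' stops at the
first closing brace, so nested braces are cut short:

#+begin_src emacs-lisp
(defun my-latex-macro-arg (string)
  "Return the {argument} of the first LaTeX macro in STRING, or nil."
  ;; Same pattern as `latex-macro-regexp': \name, an optional [opt],
  ;; then {arg}, with the argument captured in group 2.
  (let ((re "\\\\[A-Za-z]+\\(\\[[^]]*\\]\\|\\){\\([^}]*\\)}"))
    (when (string-match re string)
      (match-string 2 string))))

(my-latex-macro-arg "\\foo{bar,baz}")            ; => "bar,baz"
(my-latex-macro-arg "see \\cite[p.~3]{knuth84}") ; => "knuth84"
(my-latex-macro-arg "\\LaTeX")                   ; => nil
(my-latex-macro-arg "\\textbf{a {nested} b}")    ; => "a {nested"
#+end_src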


* Re: Context-sensitive word count in org mode (elisp)
  2011-02-16  3:51 Paul Sexton
@ 2011-02-16  9:12 ` Christian Moe
  2011-02-16  9:47   ` Dan Davison
  2011-02-16  9:45 ` Christian Moe
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Christian Moe @ 2011-02-16  9:12 UTC (permalink / raw)
  To: Paul Sexton; +Cc: emacs-orgmode

Hi,

This would be really neat to have as a part of org-mode eventually. 
Another itch scratched.

You can probably replace your custom predicate functions with the 
existing org-at-table-p and org-in-commented-line (works for me). 
Maybe there's something to replace the latex-macro-regexp too.
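
A sketch of a combined skip test along those lines, assuming a recent
Org where `org-at-table-p' is available after (require 'org). Since the
availability of `org-in-commented-line' varies between Org versions, the
comment test here is a hypothetical regexp helper, `my-org-comment-line-p',
for lines whose first non-blank character is # followed by a space or
the end of the line:

#+begin_src emacs-lisp
(require 'org)

(defun my-org-comment-line-p ()
  "Return non-nil if the current line is an Org comment line."
  ;; "# text" or a bare "#"; "#+KEYWORD" lines do not match.
  (save-excursion
    (beginning-of-line)
    (looking-at "[ \t]*#\\( \\|$\\)")))

(defun my-org-skip-line-p ()
  "Return non-nil if words on the current line should not be counted."
  (or (my-org-comment-line-p)
      (org-at-table-p)))
#+end_src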

Yours,
Christian


* Re: Context-sensitive word count in org mode (elisp)
  2011-02-16  3:51 Paul Sexton
  2011-02-16  9:12 ` Christian Moe
@ 2011-02-16  9:45 ` Christian Moe
  2011-02-16 20:34   ` Paul Sexton
  2011-02-16 10:14 ` Bastien
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Christian Moe @ 2011-02-16  9:45 UTC (permalink / raw)
  To: Paul Sexton; +Cc: emacs-orgmode

Forgot to add the code.

#+begin_src emacs-lisp
   ;; Adapted from code posted by Paul Sexton <2011-02-16 Wed 4:51am>
   ;; - Everything now contained in one function
   ;; - Will count correct number of words inside Latex macro

   (defun org-word-count (beg end)
   (interactive "r")
   (unless mark-active
     (setf beg (point-min)
	  end (point-max)))
   (let ((wc 0)
	(latex-macro-regexp "\\\\[A-Za-z]+\\(\\[[^]]*\\]\\|\\){\\([^}]*\\)}"))   ; CHANGED
     (save-excursion
       (goto-char beg)
       (while (and (< (point) end)
                   (re-search-forward "\\w+\\W*" end t))
         (cond
          ((or (org-in-commented-line) (org-at-table-p)) ; CHANGED
           nil)
          ((looking-at org-any-link-re)
           (goto-char (match-end 0)))
          ((save-excursion
             (backward-char)
             (looking-at latex-macro-regexp))
           (goto-char (match-beginning 2))  ; CHANGED
           (setf wc (+ 2 wc)))
          (t
           (incf wc)))))
     (message "%d words in %s." wc
              (if mark-active "region" "buffer"))))
#+end_src

Yours,
Christian



* Re: Context-sensitive word count in org mode (elisp)
  2011-02-16  9:12 ` Christian Moe
@ 2011-02-16  9:47   ` Dan Davison
  0 siblings, 0 replies; 15+ messages in thread
From: Dan Davison @ 2011-02-16  9:47 UTC (permalink / raw)
  To: mail; +Cc: emacs-orgmode, Paul Sexton

Hi Paul,

Sounds useful to me. Is it appropriate to use the existing Org functions
`org-in-regexps-block-p' or `org-context'? (I'm not speaking from any
experience of using these functions; I just noticed them.)

Dan



* Re: Context-sensitive word count in org mode (elisp)
  2011-02-16  3:51 Paul Sexton
  2011-02-16  9:12 ` Christian Moe
  2011-02-16  9:45 ` Christian Moe
@ 2011-02-16 10:14 ` Bastien
  2011-02-16 18:15   ` Samuel Wales
  2011-02-16 13:03 ` Joost Kremers
  2011-02-16 23:28 ` Paul Sexton
  4 siblings, 1 reply; 15+ messages in thread
From: Bastien @ 2011-02-16 10:14 UTC (permalink / raw)
  To: Paul Sexton; +Cc: emacs-orgmode

Hi Paul,

Paul Sexton <psexton@xnet.co.nz> writes:

> I am trying to reduce the word count in a document I am writing. The
> existing word count functionality for emacs is surprisingly lacking.
> I wanted a word count function for org mode which excluded tables and 
> comments, and ended up writing one myself. 
>
> If this function is called with a region highlighted, it counts the words in
> the region. Otherwise it counts words in the whole buffer.

Looks useful, thanks!  

I'll let this discussion grow a bit until we have something we can
perhaps include in Org.

Also, counting words from Emacs is a frequent request from users.  If
your solution for Org could be integrated into a more general solution
for Emacs (working on various modes), that would be a win.  See:

  http://stray-notes.blogspot.com/2009/12/emacs-word-count.html
  http://www.sultanik.com/Word_count_in_Emacs

.. and other pointers.
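
For comparison, Emacs 24.1 (released after this thread) added a generic
`count-words' function and `count-words-region' command. A minimal sketch
of using it as a mode-agnostic baseline; `my-count-words-baseline' is a
hypothetical wrapper, and its count still includes words in tables and
comments:

#+begin_src emacs-lisp
(defun my-count-words-baseline ()
  "Return the plain word count of the accessible part of the buffer."
  ;; `count-words' returns the number when called from Lisp.
  (count-words (point-min) (point-max)))
#+end_src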

-- 
 Bastien


* Re: Context-sensitive word count in org mode (elisp)
  2011-02-16  3:51 Paul Sexton
                   ` (2 preceding siblings ...)
  2011-02-16 10:14 ` Bastien
@ 2011-02-16 13:03 ` Joost Kremers
  2011-02-16 23:28 ` Paul Sexton
  4 siblings, 0 replies; 15+ messages in thread
From: Joost Kremers @ 2011-02-16 13:03 UTC (permalink / raw)
  To: emacs-orgmode

On Wed, Feb 16, 2011 at 03:51:23AM +0000, Paul Sexton wrote:
> Improvements welcome.

one remark about your naming conventions:

(info "(elisp) Coding Conventions")

especially the second point. in short, you shouldn't use names such as
"in-comment-p" or "latex-macro-regexp" for global variables and/or functions.
use a prefix to distinguish them from other programs/packages.
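
For instance, names like those could share one package prefix, with a
double hyphen marking internal helpers; the `psx-org-wc' prefix below is
purely illustrative:

#+begin_src emacs-lisp
(defvar psx-org-wc-latex-macro-regexp
  "\\\\[A-Za-z]+\\(\\[[^]]*\\]\\|\\){\\([^}]*\\)}"
  "Regexp matching a LaTeX macro with an optional [option] and one {argument}.")

(defun psx-org-wc--table-line-p ()
  "Return non-nil if the current line starts like an Org table row.
The double hyphen marks this as an internal helper."
  (save-excursion
    (beginning-of-line)
    (looking-at "[ \t]*|")))
#+end_src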


-- 
Joost Kremers
Life has its moments


* Re: Context-sensitive word count in org mode (elisp)
@ 2011-02-16 16:22 Benjamin Beckwith
  2011-02-16 16:32 ` Bastien
  2011-02-16 23:31 ` Paul Sexton
  0 siblings, 2 replies; 15+ messages in thread
From: Benjamin Beckwith @ 2011-02-16 16:22 UTC (permalink / raw)
  To: Paul Sexton, emacs-orgmode

Hi Paul,

I have a small Emacs project that provides a word-count mode and function.
It currently does not ignore sections the way your proposal does, but I
think that would be interesting functionality.

Mine does keep a running tally of additions/deletions in the mode line.  It
also lets you set a goal and provides a visual indication when that goal is
met -- I believe that this feature may be of interest to you.

You can find it at:
https://github.com/bnbeckwith/wc-mode

Regards,
Ben


* Re: Re: Context-sensitive word count in org mode (elisp)
  2011-02-16 16:22 Context-sensitive word count in org mode (elisp) Benjamin Beckwith
@ 2011-02-16 16:32 ` Bastien
  2011-02-16 18:56   ` Benjamin Beckwith
  2011-02-16 23:31 ` Paul Sexton
  1 sibling, 1 reply; 15+ messages in thread
From: Bastien @ 2011-02-16 16:32 UTC (permalink / raw)
  To: Benjamin Beckwith; +Cc: emacs-orgmode, Paul Sexton

Hi Benjamin,

Benjamin Beckwith <bnbeckwith@gmail.com> writes:

> You can find it at:
> https://github.com/bnbeckwith/wc-mode

Nice.  How hard would it be to make it org-aware (i.e. don't count
syntactic elements in an org buffer)?

-- 
 Bastien


* Re: Context-sensitive word count in org mode (elisp)
  2011-02-16 10:14 ` Bastien
@ 2011-02-16 18:15   ` Samuel Wales
  0 siblings, 0 replies; 15+ messages in thread
From: Samuel Wales @ 2011-02-16 18:15 UTC (permalink / raw)
  To: Bastien; +Cc: emacs-orgmode, Paul Sexton

I am enjoying this discussion.  Some ideas:

  1) ignore :noexport: and anything else that prevents export (ideally
including whether the user has set certain TODO keywords to prevent export,
although that is not currently a feature in Org, so it would be too
complicated now)
  2) handle word counts just like C-c C-x C-d -- then you can find out
which sections have the most words
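
A rough sketch of point 2 that also honours point 1, using plain
regexps rather than Org's own heading API. `my-org-section-word-counts'
is hypothetical; it counts only each heading's own body, up to the next
heading of any level, so unlike C-c C-x C-d children are not summed into
their parents. Headings tagged :noexport: are skipped (their children are
still listed), and heading text is returned as written, other tags
included:

#+begin_src emacs-lisp
(defun my-org-section-word-counts ()
  "Return an alist of (HEADING . WORDS) for the headings in the buffer.
WORDS counts only the text between a heading and the next heading of
any level.  Headings tagged :noexport: are left out."
  (let (result)
    (save-excursion
      (goto-char (point-min))
      (while (re-search-forward "^\\*+ +\\(.*\\)$" nil t)
        ;; Point is now at the end of the heading line.
        (let ((heading (match-string-no-properties 1))
              (body-end (save-excursion
                          (if (re-search-forward "^\\*+ " nil t)
                              (match-beginning 0)
                            (point-max)))))
          (unless (string-match-p ":noexport:" heading)
            (push (cons heading (how-many "\\w+" (point) body-end))
                  result)))))
    (nreverse result)))
#+end_src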

Samuel

-- 
The Kafka Pandemic:
http://thekafkapandemic.blogspot.com/2010/12/welcome-to-kafka-pandemic-two-forces_9182.html
I support the Whittemore-Peterson Institute (WPI)
===
I want to see the original (pre-hold) Lo et al. 2010 NIH/FDA/Harvard MLV paper.


* Re: Re: Context-sensitive word count in org mode (elisp)
  2011-02-16 16:32 ` Bastien
@ 2011-02-16 18:56   ` Benjamin Beckwith
  0 siblings, 0 replies; 15+ messages in thread
From: Benjamin Beckwith @ 2011-02-16 18:56 UTC (permalink / raw)
  To: Bastien; +Cc: emacs-orgmode, Paul Sexton

> Nice.  How hard would it be to make it org-aware (i.e. don't count
> syntactic elements in an org buffer)?

I use the `how-many' function to do the counting.  I'm not sure off the
top of my head, but some rework would be required to make it
work properly and ignore those items.

I'll gladly accept any ideas or patches into my code to make it org-friendly.

-Ben


* Re: Context-sensitive word count in org mode (elisp)
  2011-02-16  9:45 ` Christian Moe
@ 2011-02-16 20:34   ` Paul Sexton
  0 siblings, 0 replies; 15+ messages in thread
From: Paul Sexton @ 2011-02-16 20:34 UTC (permalink / raw)
  To: emacs-orgmode

Christian Moe <mail <at> christianmoe.com> writes:

> 
> Forgot to add the code.

Thanks, I wasn't aware of those pre-existing functions.

I don't agree with changing '(match-end 0)' to '(match-beginning 2)',
however. For most LaTeX macros, I don't want to count the words inside
the macro's arguments. For example, I don't want the text of footnotes
to be included in the word count. However, others may differ, and there
will always be cases where one DOES want to count the macro arguments - so
maybe org-word-count should do this optionally.

Paul


* Re: Context-sensitive word count in org mode (elisp)
  2011-02-16  3:51 Paul Sexton
                   ` (3 preceding siblings ...)
  2011-02-16 13:03 ` Joost Kremers
@ 2011-02-16 23:28 ` Paul Sexton
  2011-02-17 16:50   ` Samuel Wales
  2011-02-18 14:34   ` Bastien
  4 siblings, 2 replies; 15+ messages in thread
From: Paul Sexton @ 2011-02-16 23:28 UTC (permalink / raw)
  To: emacs-orgmode

Thanks for all the suggestions. Here is version 2.

Improvements:
- ignores source code blocks
- ignores tags and TODO keywords in headings
- ignores footnotes by default (option to force counting them)
- skips any sections tagged as not for export
- option to count words in latex macro arguments (they are ignored
  by default)

I would still like to count hyperlink descriptions but am not sure
how -- is there a function that fetches the description of the hyperlink
at point?
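
Short of an Org function, a plain-regexp sketch can pull the description
out of a [[LINK][DESCRIPTION]] link. `my-org-link-description' is
hypothetical; it is not Org's own link regexp and does not handle
escaped brackets or other link syntaxes:

#+begin_src emacs-lisp
(defun my-org-link-description (string)
  "Return the description of the first [[LINK][DESCRIPTION]] in STRING.
Return nil if there is no bracket link or it has no description."
  ;; Group 1 is the link target, group 2 the optional description.
  (when (string-match
         "\\[\\[\\([^]]+\\)\\]\\(?:\\[\\([^]]+\\)\\]\\)?\\]" string)
    (match-string 2 string)))

;; Number of words in the description:
(length (split-string (my-org-link-description
                       "see [[http://orgmode.org][the Org site]]")
                      "\\W+" t))        ; => 3
#+end_src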

Paul

-----------------------------------------------------------------------

(defun org-word-count (beg end
                           &optional count-latex-macro-args?
                           count-footnotes?)
  "Report the number of words in the Org mode buffer or selected region.
Ignores:
- comments
- tables
- source code blocks (#+BEGIN_SRC ... #+END_SRC, and inline blocks)
- hyperlinks
- tags, priorities, and TODO keywords in headers
- sections tagged as 'not for export'.

The text of footnote definitions is ignored, unless the optional argument
COUNT-FOOTNOTES? is non-nil.

If the optional argument COUNT-LATEX-MACRO-ARGS? is non-nil, the word count
includes LaTeX macro arguments (the material between {curly braces}).
Otherwise, and by default, every LaTeX macro counts as 1 word regardless
of its arguments."
  (interactive "r")
  (unless mark-active
    (setf beg (point-min)
	  end (point-max)))
  (let ((wc 0)
	(latex-macro-regexp "\\\\[A-Za-z]+\\(\\[[^]]*\\]\\|\\){\\([^}]*\\)}"))
    (save-excursion
      (goto-char beg)
      (while (and (< (point) end)
                  (re-search-forward "\\w+\\W*" end t))
        (cond
         ;; Ignore comments and tables.
         ((or (org-in-commented-line) (org-at-table-p))
          nil)
         ;; Ignore hyperlinks.
         ;; TODO need to count text of the link's description.
         ((looking-at org-any-link-re)
          (goto-char (match-end 0)))
         ;; Ignore source code blocks.
         ((org-in-regexps-block-p "^#\\+BEGIN_SRC\\W" "^#\\+END_SRC\\W")
          nil)
         ;; Ignore inline source blocks, counting them as 1 word.
         ((save-excursion
            (backward-char)
            (looking-at org-babel-inline-src-block-regexp))
          (goto-char (match-end 0))
          (setf wc (+ 2 wc)))
         ;; Count LaTeX macros as 1 word; skip their arguments unless
         ;; COUNT-LATEX-MACRO-ARGS? is non-nil.
         ((save-excursion
            (backward-char)
            (looking-at latex-macro-regexp))
          (goto-char (if count-latex-macro-args?
                         (match-beginning 2)
                       (match-end 0)))
          (setf wc (+ 2 wc)))
         ;; Ignore footnotes.
         ((and (not count-footnotes?)
               (or (org-footnote-at-definition-p)
                   (org-footnote-at-reference-p)))
          nil)
         (t
          (let ((contexts (org-context)))
            (cond
             ;; Ignore tags and TODO keywords, etc.
             ((or (assoc :todo-keyword contexts)
                  (assoc :priority contexts)
                  (assoc :keyword contexts)
                  (assoc :checkbox contexts))
              nil)
             ;; Ignore sections marked with tags that are
             ;; excluded from export.
             ((assoc :tags contexts)
              (if (intersection (org-get-tags-at) org-export-exclude-tags
                                :test 'equal)
                  (org-forward-same-level 1)
                nil))
             (t
              (incf wc))))))))
    (message "%d words in %s." wc
             (if mark-active "region" "buffer"))))


* Re: Context-sensitive word count in org mode (elisp)
  2011-02-16 16:22 Context-sensitive word count in org mode (elisp) Benjamin Beckwith
  2011-02-16 16:32 ` Bastien
@ 2011-02-16 23:31 ` Paul Sexton
  1 sibling, 0 replies; 15+ messages in thread
From: Paul Sexton @ 2011-02-16 23:31 UTC (permalink / raw)
  To: emacs-orgmode

That looks really good. My suggestion would be to modify it so that
'wc-count' can be redefined on a per-major-mode or per-buffer basis,
e.g. via a buffer-local variable 'wc-count-function'.

Then my org-word-count function could be slotted in fairly easily - 
I would just have to modify it so that it *returns* the count, rather
than reporting it.
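
A sketch of that hook, assuming Emacs 24.3 or later for `defvar-local'.
`my-wc-count-function', `my-wc-default-count' and `my-wc-count' are
hypothetical stand-ins for the suggested 'wc-count-function'; none of
them is part of wc-mode:

#+begin_src emacs-lisp
(defvar-local my-wc-count-function #'my-wc-default-count
  "Function called with BEG and END that returns a word count.
Major modes or individual buffers may set their own value.")

(defun my-wc-default-count (beg end)
  "Count runs of word characters between BEG and END."
  (how-many "\\w+" beg end))

(defun my-wc-count (&optional beg end)
  "Return the word count between BEG and END (default: whole buffer).
Dispatches to the buffer's `my-wc-count-function'."
  (funcall my-wc-count-function
           (or beg (point-min))
           (or end (point-max))))
#+end_src

An Org-specific counter that returns a number could then be installed
from `org-mode-hook' by setting `my-wc-count-function' in that buffer.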

Paul


* Re: Context-sensitive word count in org mode (elisp)
  2011-02-17 16:50   ` Samuel Wales
@ 2011-02-17 18:55     ` Paul Sexton
  0 siblings, 0 replies; 15+ messages in thread
From: Paul Sexton @ 2011-02-17 18:55 UTC (permalink / raw)
  To: emacs-orgmode

Samuel Wales <samologist <at> gmail.com> writes:
> 
> This looks great.
> 
> How hard do you think it would be to show how many words there are for
> every subtree in a similar way to how clock durations are summed in
> c-c c-x c-d and displayed in the subtree itself in reverse video?

I might leave that functionality to you -- it's pretty far beyond what I need 
out of a word count function.

Cheers
P


* Re: Context-sensitive word count in org mode (elisp)
  2011-02-18 14:34   ` Bastien
@ 2011-02-20 21:49     ` Paul Sexton
  0 siblings, 0 replies; 15+ messages in thread
From: Paul Sexton @ 2011-02-20 21:49 UTC (permalink / raw)
  To: emacs-orgmode

Bastien <bastien.guerry <at> wikimedia.fr> writes:
> #+begin_src emacs-lisp
>   (when (looking-at org-bracket-link-analytic-regexp)
>     (match-string-no-properties 5))
> #+end_src emacs-lisp

Thanks. Here is version 3 of the function, which is now able to count
words in link descriptions.

The code to advance to the next word has been moved to the end of the 
loop, which improves accuracy.
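
A stripped-down sketch of that loop order: classify the word at point,
count it unless SKIP-P says otherwise, then advance with a search bounded
by END, so the loop neither runs past the region nor signals a "Search
failed" error when no word remains. `my-count-words-skipping' and its
SKIP-P argument are hypothetical:

#+begin_src emacs-lisp
(defun my-count-words-skipping (beg end skip-p)
  "Count words between BEG and END, ignoring those where SKIP-P is non-nil.
SKIP-P is called with no arguments and point at the start of a word."
  (let ((wc 0))
    (save-excursion
      (goto-char beg)
      ;; Move to the first word character, if any.
      (skip-syntax-forward "^w" end)
      (while (< (point) end)
        ;; Classify the word at point before moving past it.
        (unless (funcall skip-p)
          (setq wc (1+ wc)))
        ;; Advance past this word and the non-word text after it;
        ;; jump to END if the bounded search finds nothing.
        (unless (re-search-forward "\\w+\\W*" end t)
          (goto-char end))))
    wc))
#+end_src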

Paul

----------------------------------------------------------------------

(defun org-word-count (beg end
                           &optional count-latex-macro-args?
                           count-footnotes?)
  "Report the number of words in the Org mode buffer or selected region.
Ignores:
- comments
- tables
- source code blocks (#+BEGIN_SRC ... #+END_SRC, and inline blocks)
- hyperlinks (but does count words in hyperlink descriptions)
- tags, priorities, and TODO keywords in headers
- sections tagged as 'not for export'.

The text of footnote definitions is ignored, unless the optional argument
COUNT-FOOTNOTES? is non-nil.

If the optional argument COUNT-LATEX-MACRO-ARGS? is non-nil, the word count
includes LaTeX macro arguments (the material between {curly braces}).
Otherwise, and by default, every LaTeX macro counts as 1 word regardless
of its arguments."
  (interactive "r")
  (unless mark-active
    (setf beg (point-min)
	  end (point-max)))
  (let ((wc 0)
	(latex-macro-regexp "\\\\[A-Za-z]+\\(\\[[^]]*\\]\\|\\){\\([^}]*\\)}"))
    (save-excursion
      (goto-char beg)
      (while (< (point) end)
        (cond
         ;; Ignore comments and tables.
         ((or (org-in-commented-line) (org-at-table-p))
          nil)
         ;; Ignore hyperlinks. But if link has a description, count
         ;; the words within the description.
         ((looking-at org-bracket-link-analytic-regexp)
          (when (match-string-no-properties 5)
            (let ((desc (match-string-no-properties 5)))
              (save-match-data 
                (incf wc (length (remove "" (org-split-string
                                             desc "\\W")))))))
          (goto-char (match-end 0)))
         ((looking-at org-any-link-re)
          (goto-char (match-end 0)))
         ;; Ignore source code blocks.
         ((org-in-regexps-block-p "^#\\+BEGIN_SRC\\W" "^#\\+END_SRC\\W")
          nil)
         ;; Ignore inline source blocks, counting them as 1 word.
         ((save-excursion
            (backward-char)
            (looking-at org-babel-inline-src-block-regexp))
          (goto-char (match-end 0))
          (setf wc (+ 2 wc)))
         ;; Count LaTeX macros as 1 word; skip their arguments unless
         ;; COUNT-LATEX-MACRO-ARGS? is non-nil.
         ((save-excursion
            (backward-char)
            (looking-at latex-macro-regexp))
          (goto-char (if count-latex-macro-args?
                         (match-beginning 2)
                       (match-end 0)))
          (setf wc (+ 2 wc)))
         ;; Ignore footnotes.
         ((and (not count-footnotes?)
               (or (org-footnote-at-definition-p)
                   (org-footnote-at-reference-p)))
          nil)
         (t
          (let ((contexts (org-context)))
            (cond
             ;; Ignore tags and TODO keywords, etc.
             ((or (assoc :todo-keyword contexts)
                  (assoc :priority contexts)
                  (assoc :keyword contexts)
                  (assoc :checkbox contexts))
              nil)
             ;; Ignore sections marked with tags that are
             ;; excluded from export.
             ((assoc :tags contexts)
              (if (intersection (org-get-tags-at) org-export-exclude-tags
                                :test 'equal)
                  (org-forward-same-level 1)
                nil))
             (t
              (incf wc))))))
        ;; Advance past the next word; stop at END if none remains.
        (unless (re-search-forward "\\w+\\W*" end t)
          (goto-char end))))
    (message "%d words in %s." wc
             (if mark-active "region" "buffer"))))
