emacs-orgmode@gnu.org archives
* Context-sensitive word count in org mode (elisp)
@ 2011-02-16  3:51 Paul Sexton
  2011-02-16  9:12 ` Christian Moe
                   ` (4 more replies)
  0 siblings, 5 replies; 18+ messages in thread
From: Paul Sexton @ 2011-02-16  3:51 UTC (permalink / raw)
  To: emacs-orgmode

I am trying to reduce the word count in a document I am writing. The
existing word count functionality for emacs is surprisingly lacking.
I wanted a word count function for org mode which excluded tables and 
comments, and ended up writing one myself. 

If this function is called with a region highlighted, it counts the words in
the region. Otherwise it counts words in the whole buffer.
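
The region-or-buffer choice described above can be sketched on its own. This is only an illustration, not part of the posted function: the helper name my/word-count-bounds is made up here, and it relies on the standard `use-region-p' rather than testing `mark-active' directly.

```elisp
(defun my/word-count-bounds ()
  "Return (BEG . END) for counting words.
Use the active region if there is one, otherwise the whole buffer."
  (if (use-region-p)
      ;; An active, non-empty region: count only inside it.
      (cons (region-beginning) (region-end))
    ;; No active region: fall back to the accessible buffer.
    (cons (point-min) (point-max))))
```

A caller could destructure the result and pass both positions on to the counting loop.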

It ignores commented lines and tables. LaTeX-style macros such as 
\foo{bar,baz} are counted as 1 word, as a compromise (more often than not 
they should count as 0, but they do sometimes expand to 1 or more words
in the final document). 

Limitations:
- Does not ignore BEGIN_SRC/END_SRC or inline src_* blocks (babel).
  Should be easy enough to add however.
- There is probably a better way of identifying latex macros
  than my 'latex-macro-regexp' below.
- Ignores all org links. I couldn't figure out how to extract "description"
  text from links, but I didn't look very hard.
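
As a rough sketch of the first limitation, a predicate for "point is inside a source block" can be written with a plain backward search. The name my/in-src-block-p is hypothetical, and the sketch assumes blocks are not nested and are delimited by #+BEGIN_SRC / #+END_SRC lines; it does not handle inline src_* blocks.

```elisp
(defun my/in-src-block-p ()
  "Return non-nil if point lies after a #+BEGIN_SRC line with no #+END_SRC yet.
The nearest delimiter line before point decides the answer."
  (save-excursion
    (save-match-data
      (let ((case-fold-search t))      ; accept #+begin_src and #+BEGIN_SRC
        (and (re-search-backward
              "^[ \t]*#\\+\\(begin\\|end\\)_src\\b" nil t)
             ;; Inside a block only if the closest delimiter opens one.
             (string-equal (downcase (match-string 1)) "begin"))))))
```

The counting loop could call this in its `cond' and skip the word when it returns non-nil.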

Improvements welcome.

Paul

------------------------------------------------------------------------

(defun in-comment-p ()
  "Return non-nil if point is in a comment."
  (if (or (null comment-start-skip)
	  (eq (preceding-char) ?\r))
      nil
    (save-excursion
      (let ((pos (point)))
	(re-search-backward "^\\|\r" nil t)
	(or (looking-at comment-start-skip)
	    (re-search-forward comment-start-skip pos t))))))

(defun in-org-table-p ()
  "Return non-nil if point is in an org-mode table."
  (if (or (not (boundp 'org-table-any-line-regexp))
          (null org-table-any-line-regexp)
	  (eq (preceding-char) ?\r))
      nil
    (save-excursion
      (let ((pos (point)))
	(re-search-backward "^\\|\r" nil t)
	(looking-at org-table-any-line-regexp)))))


(defvar latex-macro-regexp "\\\\[A-Za-z]+\\(\\[[^]]*\\]\\|\\){\\([^}]*\\)}")


(defun org-word-count (beg end)
  (interactive "r")
  (unless mark-active
    (setf beg (point-min)
          end (point-max)))
  (let ((wc 0))
    (save-excursion
      (goto-char beg)
      (while (< (point) end)
        (re-search-forward "\\w+\\W*")
        (cond
         ((or (in-comment-p) (in-org-table-p))
          nil)
         ((looking-at org-any-link-re)
          (goto-char (match-end 0)))
         ((save-excursion
            (backward-char)
            (looking-at latex-macro-regexp))
          (goto-char (match-end 0))
          (setf wc (+ 2 wc)))
         (t
          (incf wc)))))
    (message (format "%d words in %s." wc
                     (if mark-active "region" "buffer")))))


* Re: Context-sensitive word count in org mode (elisp)
  2011-02-16  3:51 Context-sensitive word count in org mode (elisp) Paul Sexton
@ 2011-02-16  9:12 ` Christian Moe
  2011-02-16  9:47   ` Dan Davison
  2011-02-16  9:45 ` Christian Moe
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 18+ messages in thread
From: Christian Moe @ 2011-02-16  9:12 UTC (permalink / raw)
  To: Paul Sexton; +Cc: emacs-orgmode

Hi,

This would be really neat to have as a part of org-mode eventually. 
Another itch scratched.

You can probably replace your custom predicate functions with the 
existing org-at-table-p and org-in-commented-line (works for me). 
Maybe there's something to replace the latex-macro-regexp too.

Yours,
Christian


* Re: Context-sensitive word count in org mode (elisp)
  2011-02-16  3:51 Context-sensitive word count in org mode (elisp) Paul Sexton
  2011-02-16  9:12 ` Christian Moe
@ 2011-02-16  9:45 ` Christian Moe
  2011-02-16 20:34   ` Paul Sexton
  2011-02-16 10:14 ` Bastien
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 18+ messages in thread
From: Christian Moe @ 2011-02-16  9:45 UTC (permalink / raw)
  To: Paul Sexton; +Cc: emacs-orgmode

Forgot to add the code.

#+begin_src emacs-lisp
   ;; Adapted from code posted by Paul Sexton <2011-02-16 Wed 4:51am>
   ;; - Everything now contained in one function
   ;; - Will count correct number of words inside Latex macro

   (defun org-word-count (beg end)
   (interactive "r")
   (unless mark-active
     (setf beg (point-min)
	  end (point-max)))
   (let ((wc 0)
	(latex-macro-regexp "\\\\[A-Za-z]+\\(\\[[^]]*\\]\\|\\){\\([^}]*\\)}"))   ; CHANGED
     (save-excursion
       (goto-char beg)
       (while (< (point) end)
         (re-search-forward "\\w+\\W*")
         (cond
          ((or (org-in-commented-line) (org-at-table-p)) ; CHANGED
           nil)
          ((looking-at org-any-link-re)
           (goto-char (match-end 0)))
          ((save-excursion
             (backward-char)
             (looking-at latex-macro-regexp))
           (goto-char (match-beginning 2))  ; CHANGED
           (setf wc (+ 2 wc)))
          (t
           (incf wc)))))
     (message (format "%d words in %s." wc
                      (if mark-active "region" "buffer")))))
#+end_src

Yours,
Christian



* Re: Context-sensitive word count in org mode (elisp)
  2011-02-16  9:12 ` Christian Moe
@ 2011-02-16  9:47   ` Dan Davison
  0 siblings, 0 replies; 18+ messages in thread
From: Dan Davison @ 2011-02-16  9:47 UTC (permalink / raw)
  To: mail; +Cc: emacs-orgmode, Paul Sexton

Hi Paul,

Sounds useful to me. Is it appropriate to use the existing org function
`org-in-regexps-block-p'? Or `org-context'? (Not speaking from any
experience of using these functions I just noticed them.)

Dan

Christian Moe <mail@christianmoe.com> writes:

> Hi,
>
> This would be really neat to have as a part of org-mode
> eventually. Another itch scratched.
>
> You can probably replace your custom predicate functions with the
> existing org-at-table-p and org-in-commented-line (works for
> me). Maybe there's something to replace the latex-macro-regexp too.
>
> Yours,
> Christian


* Re: Context-sensitive word count in org mode (elisp)
  2011-02-16  3:51 Context-sensitive word count in org mode (elisp) Paul Sexton
  2011-02-16  9:12 ` Christian Moe
  2011-02-16  9:45 ` Christian Moe
@ 2011-02-16 10:14 ` Bastien
  2011-02-16 18:15   ` Samuel Wales
  2011-02-16 13:03 ` Joost Kremers
  2011-02-16 23:28 ` Paul Sexton
  4 siblings, 1 reply; 18+ messages in thread
From: Bastien @ 2011-02-16 10:14 UTC (permalink / raw)
  To: Paul Sexton; +Cc: emacs-orgmode

Hi Paul,

Paul Sexton <psexton@xnet.co.nz> writes:

> I am trying to reduce the word count in a document I am writing. The
> existing word count functionality for emacs is surprisingly lacking.
> I wanted a word count function for org mode which excluded tables and 
> comments, and ended up writing one myself. 
>
> If this function is called with a region highlighted, it counts the words in
> the region. Otherwise it counts words in the whole buffer.

Looks useful, thanks!  

I'll let this discussion grow a bit until we have something we can
perhaps include in Org.

Also, counting words from Emacs is a frequent request from users.  If
your solution for Org could be integrated into a more general solution
for Emacs (working on various modes), that would be a win.  See:

  http://stray-notes.blogspot.com/2009/12/emacs-word-count.html
  http://www.sultanik.com/Word_count_in_Emacs

.. and other pointers.

-- 
 Bastien


* Re: Context-sensitive word count in org mode (elisp)
  2011-02-16  3:51 Context-sensitive word count in org mode (elisp) Paul Sexton
                   ` (2 preceding siblings ...)
  2011-02-16 10:14 ` Bastien
@ 2011-02-16 13:03 ` Joost Kremers
  2011-02-16 23:28 ` Paul Sexton
  4 siblings, 0 replies; 18+ messages in thread
From: Joost Kremers @ 2011-02-16 13:03 UTC (permalink / raw)
  To: emacs-orgmode

On Wed, Feb 16, 2011 at 03:51:23AM +0000, Paul Sexton wrote:
> Improvements welcome.

one remark about your naming conventions: 

(info "(elisp) Coding Conventions")

especially the second point. in short, you shouldn't use names such as
"in-comment-p" or "latex-macro-regexp" for global variables and/or functions.
use a prefix to distinguish them from other programs/packages.
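
To illustrate the convention, the two global names from the original post could be given a package prefix along these lines. The prefix org-wc- is only an example chosen for this sketch, not a name used anywhere in the thread.

```elisp
;; Hypothetical prefixed replacement for the global `latex-macro-regexp'.
(defvar org-wc-latex-macro-regexp
  "\\\\[A-Za-z]+\\(\\[[^]]*\\]\\|\\){\\([^}]*\\)}"
  "Regexp matching a LaTeX-style macro such as \\foo[opt]{arg}.")

;; Hypothetical prefixed helper, following the same naming scheme.
(defun org-wc-latex-macro-at-point-p ()
  "Return non-nil if a LaTeX-style macro starts at point."
  (looking-at-p org-wc-latex-macro-regexp))
```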


-- 
Joost Kremers
Life has its moments


* Re: Context-sensitive word count in org mode (elisp)
  2011-02-16 10:14 ` Bastien
@ 2011-02-16 18:15   ` Samuel Wales
  0 siblings, 0 replies; 18+ messages in thread
From: Samuel Wales @ 2011-02-16 18:15 UTC (permalink / raw)
  To: Bastien; +Cc: emacs-orgmode, Paul Sexton

I am enjoying this discussion.  Some ideas:

  1) ignore :noexport and anything else that prevents export (ideally
including whether the user has set certain todo kw to prevent export,
although that is not currently a feature in org, so would be too
complicated now)
  2) handle word counts just like C-c C-x C-d -- then you can find out
what sections have the most words

Samuel

-- 
The Kafka Pandemic:
http://thekafkapandemic.blogspot.com/2010/12/welcome-to-kafka-pandemic-two-forces_9182.html
I support the Whittemore-Peterson Institute (WPI)
===
I want to see the original (pre-hold) Lo et al. 2010 NIH/FDA/Harvard MLV paper.


* Re: Context-sensitive word count in org mode (elisp)
  2011-02-16  9:45 ` Christian Moe
@ 2011-02-16 20:34   ` Paul Sexton
  2011-02-17 10:02     ` Christian Moe
  0 siblings, 1 reply; 18+ messages in thread
From: Paul Sexton @ 2011-02-16 20:34 UTC (permalink / raw)
  To: emacs-orgmode

Christian Moe <mail <at> christianmoe.com> writes:

> 
> Forgot to add the code.
> 
> #+begin_src emacs-lisp
>    ;; Adapted from code posted by Paul Sexton <2011-02-16 Wed 4:51am>
>    ;; - Everything now contained in one function
>    ;; - Will count correct number of words inside Latex macro
> 
>    (defun org-word-count (beg end)
>    (interactive "r")
>    (unless mark-active
>      (setf beg (point-min)
> 	  end (point-max)))
>    (let ((wc 0)
> 	(latex-macro-regexp "\\\\[A-Za-z]+\\(\\[[^]]*\\]\\|\\){\\([^}]*\\)}"))   ; CHANGED
>      (save-excursion
>        (goto-char beg)
>        (while (< (point) end)
>          (re-search-forward "\\w+\\W*")
>          (cond
>           ((or (org-in-commented-line) (org-at-table-p)) ; CHANGED
>            nil)
>           ((looking-at org-any-link-re)
>            (goto-char (match-end 0)))
>           ((save-excursion
>              (backward-char)
>              (looking-at latex-macro-regexp))
>            (goto-char (match-beginning 2))  ; CHANGED
>            (setf wc (+ 2 wc)))
>           (t
>            (incf wc)))))
>      (message (format "%d words in %s." wc
>                       (if mark-active "region" "buffer")))))
> #+end_src
> 

Thanks, I wasn't aware of those pre-existing functions.

I don't agree with changing '(match-end 0)' to '(match-beginning 2)'
however. For most latex macros, I don't want to count the words inside
the macro's arguments. For example, I don't want the text of footnotes
to be included in the word count. However others differ, and there will
always be cases where one DOES want to count the macro arguments - so maybe
org-word-count should do this optionally.
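
The difference between the two positions can be seen on a plain string. The sketch below reuses the regexp from this thread; the names my/latex-macro-regexp and my/macro-resume-points are invented for the illustration. It returns the text that would still be scanned after a macro under each choice.

```elisp
(defvar my/latex-macro-regexp
  "\\\\[A-Za-z]+\\(\\[[^]]*\\]\\|\\){\\([^}]*\\)}"
  "Copy of the macro regexp from the thread; group 2 is the {argument}.")

(defun my/macro-resume-points (text)
  "Return what remains to be scanned after the first macro in TEXT.
The result is a cons (AFTER-MACRO . FROM-ARGUMENT), or nil if no macro matches."
  (when (string-match my/latex-macro-regexp text)
    (cons (substring text (match-end 0))          ; arguments are skipped
          (substring text (match-beginning 2))))) ; arguments are re-scanned
```

With (match-end 0) the scan resumes after the closing brace, so the argument words are never seen; with (match-beginning 2) it resumes at the first character inside the braces, so they are counted like ordinary text.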

Paul


* Re: Context-sensitive word count in org mode (elisp)
  2011-02-16  3:51 Context-sensitive word count in org mode (elisp) Paul Sexton
                   ` (3 preceding siblings ...)
  2011-02-16 13:03 ` Joost Kremers
@ 2011-02-16 23:28 ` Paul Sexton
  2011-02-17 16:50   ` Samuel Wales
  2011-02-18 14:34   ` Bastien
  4 siblings, 2 replies; 18+ messages in thread
From: Paul Sexton @ 2011-02-16 23:28 UTC (permalink / raw)
  To: emacs-orgmode

Thanks for all the suggestions. Here is version 2.

Improvements:
- ignores source code blocks
- ignores tags and TODO keywords in headings
- ignores footnotes by default (option to force counting them)
- skips any sections tagged as not for export
- option to count words in latex macro arguments (they are ignored
  by default)
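
The "not for export" check boils down to intersecting a headline's tags with a list of excluded tags. A standalone sketch on plain strings follows, assuming tags appear as :tag1:tag2: at the end of the headline; the helper names are hypothetical, and the real function below gets its tags from Org rather than from a regexp.

```elisp
(require 'cl-lib)

(defun my/headline-tags (headline)
  "Return the list of tags at the end of HEADLINE, or nil if there are none."
  (when (string-match "[ \t]:\\([[:alnum:]_@#%:]+\\):[ \t]*\\'" headline)
    (split-string (match-string 1 headline) ":" t)))

(defun my/headline-excluded-p (headline exclude-tags)
  "Return non-nil if HEADLINE carries any tag listed in EXCLUDE-TAGS."
  (cl-intersection (my/headline-tags headline) exclude-tags
                   :test #'string=))
```

For example, (my/headline-excluded-p "* Draft notes :noexport:" '("noexport")) returns a non-nil list, while a headline without that tag returns nil.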

I would still like to count hyperlink descriptions but am not sure
how -- is there a function that fetches the description of the hyperlink
at point?

Paul

-----------------------------------------------------------------------

(defun org-word-count (beg end
                           &optional count-latex-macro-args?
                           count-footnotes?)
  "Report the number of words in the Org mode buffer or selected region.
Ignores:
- comments
- tables
- source code blocks (#+BEGIN_SRC ... #+END_SRC, and inline blocks)
- hyperlinks
- tags, priorities, and TODO keywords in headers
- sections tagged as 'not for export'.

The text of footnote definitions is ignored, unless the optional argument
COUNT-FOOTNOTES? is non-nil.

If the optional argument COUNT-LATEX-MACRO-ARGS? is non-nil, the word count
includes LaTeX macro arguments (the material between {curly braces}).
Otherwise, and by default, every LaTeX macro counts as 1 word regardless
of its arguments."
  (interactive "r")
  (unless mark-active
    (setf beg (point-min)
	  end (point-max)))
  (let ((wc 0)
	(latex-macro-regexp "\\\\[A-Za-z]+\\(\\[[^]]*\\]\\|\\){\\([^}]*\\)}"))
    (save-excursion
      (goto-char beg)
      (while (< (point) end)
        (re-search-forward "\\w+\\W*")
        (cond
         ;; Ignore comments and tables.
         ((or (org-in-commented-line) (org-at-table-p))
          nil)
         ;; Ignore hyperlinks.
         ;; TODO need to count text of the link's description.
         ((looking-at org-any-link-re)
          (goto-char (match-end 0)))
         ;; Ignore source code blocks.
         ((org-in-regexps-block-p "^#\\+BEGIN_SRC\\W" "^#\\+END_SRC\\W")
          nil)
         ;; Ignore inline source blocks, counting them as 1 word.
         ((save-excursion
            (backward-char)
            (looking-at org-babel-inline-src-block-regexp))
          (goto-char (match-end 0))
          (setf wc (+ 2 wc)))
         ;; Count latex macros as 1 word, ignoring their arguments.
         ((save-excursion
            (backward-char)
            (looking-at latex-macro-regexp))
          (goto-char (if count-latex-macro-args?
                         (match-beginning 2)
                       (match-end 0)))
          (setf wc (+ 2 wc)))
         ;; Ignore footnotes.
         ((and (not count-footnotes?)
               (or (org-footnote-at-definition-p)
                   (org-footnote-at-reference-p)))
          nil)
         (t
          (let ((contexts (org-context)))
            (cond
             ;; Ignore tags and TODO keywords, etc.
             ((or (assoc :todo-keyword contexts)
                  (assoc :priority contexts)
                  (assoc :keyword contexts)
                  (assoc :checkbox contexts))
              nil)
             ;; Ignore sections marked with tags that are
             ;; excluded from export.
             ((assoc :tags contexts)
              (if (intersection (org-get-tags-at) org-export-exclude-tags
                                :test 'equal)
                  (org-forward-same-level 1)
                nil))
             (t
              (incf wc))))))))
    (message (format "%d words in %s." wc
                     (if mark-active "region" "buffer")))))


* Re: Re: Context-sensitive word count in org mode (elisp)
  2011-02-16 20:34   ` Paul Sexton
@ 2011-02-17 10:02     ` Christian Moe
  2011-02-17 18:57       ` Eric Schulte
  0 siblings, 1 reply; 18+ messages in thread
From: Christian Moe @ 2011-02-17 10:02 UTC (permalink / raw)
  To: Paul Sexton; +Cc: emacs-orgmode

Hi,

> I don't agree with changing '(match-end 0)' to '(match-beginning 2)'
> however. For most latex macros, I don't want to count the words inside
> the macro's arguments.

Maybe you're right. I rarely use latex, so I can't think of a use case 
off the top of my head. Default no, optional yes sounds good.

Yours,
Christian


* Re: Re: Context-sensitive word count in org mode (elisp)
  2011-02-16 23:28 ` Paul Sexton
@ 2011-02-17 16:50   ` Samuel Wales
  2011-02-17 18:55     ` Paul Sexton
  2011-02-18 14:34   ` Bastien
  1 sibling, 1 reply; 18+ messages in thread
From: Samuel Wales @ 2011-02-17 16:50 UTC (permalink / raw)
  To: Paul Sexton; +Cc: emacs-orgmode

This looks great.

How hard do you think it would be to show how many words there are for
every subtree in a similar way to how clock durations are summed in
C-c C-x C-d and displayed in the subtree itself in reverse video?
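
A very rough starting point for per-section counts, leaving the overlay display aside, might look like the sketch below. It is not context-sensitive: it uses the built-in `count-words' on the raw text under each headline, does not sum child sections into their parents, and the function name my/words-per-heading is made up for this example.

```elisp
(defun my/words-per-heading ()
  "Return an alist of (HEADING . WORD-COUNT) for each headline in the buffer.
Each count covers the text from the headline to the next headline of any level."
  (save-excursion
    (goto-char (point-min))
    (let (result)
      (while (re-search-forward "^\\*+ \\(.*\\)$" nil t)
        (let ((title (match-string-no-properties 1))
              ;; Body starts on the line after the headline.
              (body-start (min (1+ (line-end-position)) (point-max)))
              ;; Body ends where the next headline begins, or at end of buffer.
              (body-end (save-excursion
                          (if (re-search-forward "^\\*+ " nil t)
                              (match-beginning 0)
                            (point-max)))))
          (push (cons title (count-words body-start body-end)) result)))
      (nreverse result))))
```

Sorting the returned alist by its cdr would show which sections have the most words.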


* Re: Context-sensitive word count in org mode (elisp)
  2011-02-17 16:50   ` Samuel Wales
@ 2011-02-17 18:55     ` Paul Sexton
  2011-03-27 19:40       ` [Orgmode] " Samuel Wales
  0 siblings, 1 reply; 18+ messages in thread
From: Paul Sexton @ 2011-02-17 18:55 UTC (permalink / raw)
  To: emacs-orgmode

Samuel Wales <samologist <at> gmail.com> writes:
> 
> This looks great.
> 
> How hard do you think it would be to show how many words there are for
> every subtree in a similar way to how clock durations are summed in
> c-c c-x c-d and displayed in the subtree itself in reverse video?

I might leave that functionality to you -- it's pretty far beyond what I need 
out of a word count function.

Cheers
P


* Re: Re: Context-sensitive word count in org mode (elisp)
  2011-02-17 10:02     ` Christian Moe
@ 2011-02-17 18:57       ` Eric Schulte
  0 siblings, 0 replies; 18+ messages in thread
From: Eric Schulte @ 2011-02-17 18:57 UTC (permalink / raw)
  To: mail; +Cc: emacs-orgmode, Paul Sexton

Christian Moe <mail@christianmoe.com> writes:

> Hi,
>
>> I don't agree with changing '(match-end 0)' to '(match-beginning 2)'
>> however. For most latex macros, I don't want to count the words inside
>> the macro's arguments.
>
> Maybe you're right. I rarely use latex, so I can't think of a use case
> off the top of my head. Default no, optional yes sounds good.
>

I find the following handy for word-count of latex documents...

#+begin_src sh :var basename=(file-name-sans-extension (buffer-file-name))
  pdftotext $basename.pdf - |wc -w
#+end_src


* Re: Re: Context-sensitive word count in org mode (elisp)
  2011-02-16 23:28 ` Paul Sexton
  2011-02-17 16:50   ` Samuel Wales
@ 2011-02-18 14:34   ` Bastien
  2011-02-20 21:49     ` Paul Sexton
       [not found]     ` <4D601314.8000701@xnet.co.nz>
  1 sibling, 2 replies; 18+ messages in thread
From: Bastien @ 2011-02-18 14:34 UTC (permalink / raw)
  To: Paul Sexton; +Cc: emacs-orgmode

Hi Paul,

Paul Sexton <psexton@xnet.co.nz> writes:

> Thanks for all the suggestions. Here is version 2.

Thanks for that.  Maybe you could put it on Worg/org-hacks.org?

> I would still like to count hyperlink descriptions but am not sure
> how -- is there a function that fetches the description of the hyperlink
> at point?

#+begin_src emacs-lisp
  (when (looking-at org-bracket-link-analytic-regexp)
    (match-string-no-properties 5))
#+end_src emacs-lisp

HTH,

-- 
 Bastien


* Re: Context-sensitive word count in org mode (elisp)
  2011-02-18 14:34   ` Bastien
@ 2011-02-20 21:49     ` Paul Sexton
  2011-02-21 23:30       ` Samuel Wales
       [not found]     ` <4D601314.8000701@xnet.co.nz>
  1 sibling, 1 reply; 18+ messages in thread
From: Paul Sexton @ 2011-02-20 21:49 UTC (permalink / raw)
  To: emacs-orgmode

Bastien <bastien.guerry <at> wikimedia.fr> writes:
> #+begin_src emacs-lisp
>   (when (looking-at org-bracket-link-analytic-regexp)
>     (match-string-no-properties 5))
> #+end_src emacs-lisp

Thanks. Here is version 3 of the function, which is now able to count 
words in link descriptions.
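
The idea of counting only the description can be tried on a plain string. The sketch below uses a simplified bracket-link regexp written for this example (my/bracket-link-re is not an Org variable) and splits the description on non-word characters, much as the function below does.

```elisp
(defconst my/bracket-link-re
  "\\[\\[\\([^][]+\\)\\]\\(?:\\[\\([^][]+\\)\\]\\)?\\]"
  "Simplified regexp for [[target]] and [[target][description]] links.
Group 1 is the target, group 2 the optional description.")

(defun my/link-description-words (link)
  "Return the number of words in the description part of LINK, or 0."
  (if (and (string-match my/bracket-link-re link)
           (match-string 2 link))
      ;; Split on runs of non-word characters and drop empty strings.
      (length (split-string (match-string 2 link) "\\W+" t))
    0))
```

A link without a description contributes nothing, which matches the behaviour described above: the target itself is never counted.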

The code to advance to the next word has been moved to the end of the 
loop, which improves accuracy.

Paul

----------------------------------------------------------------------

(defun org-word-count (beg end
                           &optional count-latex-macro-args?
                           count-footnotes?)
  "Report the number of words in the Org mode buffer or selected region.
Ignores:
- comments
- tables
- source code blocks (#+BEGIN_SRC ... #+END_SRC, and inline blocks)
- hyperlinks (but does count words in hyperlink descriptions)
- tags, priorities, and TODO keywords in headers
- sections tagged as 'not for export'.

The text of footnote definitions is ignored, unless the optional argument
COUNT-FOOTNOTES? is non-nil.

If the optional argument COUNT-LATEX-MACRO-ARGS? is non-nil, the word count
includes LaTeX macro arguments (the material between {curly braces}).
Otherwise, and by default, every LaTeX macro counts as 1 word regardless
of its arguments."
  (interactive "r")
  (unless mark-active
    (setf beg (point-min)
	  end (point-max)))
  (let ((wc 0)
	(latex-macro-regexp "\\\\[A-Za-z]+\\(\\[[^]]*\\]\\|\\){\\([^}]*\\)}"))
    (save-excursion
      (goto-char beg)
      (while (< (point) end)
        (cond
         ;; Ignore comments and tables.
         ((or (org-in-commented-line) (org-at-table-p))
          nil)
         ;; Ignore hyperlinks. But if link has a description, count
         ;; the words within the description.
         ((looking-at org-bracket-link-analytic-regexp)
          (when (match-string-no-properties 5)
            (let ((desc (match-string-no-properties 5)))
              (save-match-data 
                (incf wc (length (remove "" (org-split-string
                                             desc "\\W")))))))
          (goto-char (match-end 0)))
         ((looking-at org-any-link-re)
          (goto-char (match-end 0)))
         ;; Ignore source code blocks.
         ((org-in-regexps-block-p "^#\\+BEGIN_SRC\\W" "^#\\+END_SRC\\W")
          nil)
         ;; Ignore inline source blocks, counting them as 1 word.
         ((save-excursion
            (backward-char)
            (looking-at org-babel-inline-src-block-regexp))
          (goto-char (match-end 0))
          (setf wc (+ 2 wc)))
         ;; Count latex macros as 1 word, ignoring their arguments.
         ((save-excursion
            (backward-char)
            (looking-at latex-macro-regexp))
          (goto-char (if count-latex-macro-args?
                         (match-beginning 2)
                       (match-end 0)))
          (setf wc (+ 2 wc)))
         ;; Ignore footnotes.
         ((and (not count-footnotes?)
               (or (org-footnote-at-definition-p)
                   (org-footnote-at-reference-p)))
          nil)
         (t
          (let ((contexts (org-context)))
            (cond
             ;; Ignore tags and TODO keywords, etc.
             ((or (assoc :todo-keyword contexts)
                  (assoc :priority contexts)
                  (assoc :keyword contexts)
                  (assoc :checkbox contexts))
              nil)
             ;; Ignore sections marked with tags that are
             ;; excluded from export.
             ((assoc :tags contexts)
              (if (intersection (org-get-tags-at) org-export-exclude-tags
                                :test 'equal)
                  (org-forward-same-level 1)
                nil))
             (t
              (incf wc))))))
        (re-search-forward "\\w+\\W*")))
    (message "%d words in %s." wc
             (if mark-active "region" "buffer"))))


* Re: Re: Context-sensitive word count in org mode (elisp)
  2011-02-20 21:49     ` Paul Sexton
@ 2011-02-21 23:30       ` Samuel Wales
  0 siblings, 0 replies; 18+ messages in thread
From: Samuel Wales @ 2011-02-21 23:30 UTC (permalink / raw)
  To: Paul Sexton; +Cc: emacs-orgmode

Out of curiosity, has anybody compared the results that this provides
with exporting to HTML, rendering with a browser (including
emacs-w3m), and counting words that way?



* Re: Re: Context-sensitive word count in org mode (elisp)
       [not found]     ` <4D601314.8000701@xnet.co.nz>
@ 2011-02-22 11:28       ` Bastien
  0 siblings, 0 replies; 18+ messages in thread
From: Bastien @ 2011-02-22 11:28 UTC (permalink / raw)
  To: Paul Sexton; +Cc: emacs-orgmode

Hi Paul,

Paul Sexton <psexton@xnet.co.nz> writes:

> The process of contributing to Worg is too complicated for me. 

You can clone the Worg.git repository with 

  $ git clone git://repo.or.cz/Worg.git

Then make some edits to Worg.

Then commit your changes with
  
  $ git commit -a -m "Your message"

Then create a patch with 

  $ git format-patch master

Then send the patch.

> If someone else wants to add it somewhere on my behalf that is fine
> with me.

I added a section "Count words in an Org buffer" with your code in
Worg/org-hacks.org.  Thanks.

> Have you thought of making Worg a wiki?
> Wikis hosted on github can be written in org-mode format.

Yes, but this format is limited.  I think it's good to have all Org's
functionalities in Worg, not only the one needed for publication.  In
particular, org-issues.org contains lots of information that we want 
to handle locally, not just for publishing.

Best,

-- 
 Bastien


* Re: [Orgmode] Re: Context-sensitive word count in org mode (elisp)
  2011-02-17 18:55     ` Paul Sexton
@ 2011-03-27 19:40       ` Samuel Wales
  0 siblings, 0 replies; 18+ messages in thread
From: Samuel Wales @ 2011-03-27 19:40 UTC (permalink / raw)
  To: Paul Sexton; +Cc: emacs-orgmode

If I could I would refactor org-clock-put-overlay into a function that
puts overlays on the current headline but I am not able to do so.

