emacs-orgmode@gnu.org archives
* Implemented word count for subtrees
@ 2011-04-23  8:57 Simon Guest
  2011-04-27 17:51 ` Eric S Fraga
  2011-04-28  7:53 ` Implemented word count for subtrees Sébastien Vauban
  0 siblings, 2 replies; 19+ messages in thread
From: Simon Guest @ 2011-04-23  8:57 UTC (permalink / raw)
  To: emacs-org list

Dear Org mode people,

I implemented word counting for Org mode sub-trees.  That is, count
each sub-tree, and accumulate totals into the parent heading lines.
Others have asked about this, so I attach my code below.
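
The idea (count each subtree, then have every heading report the total for everything beneath it) can be illustrated without Org at all. The sketch below is hypothetical and standalone: `my-wc-subtree-counts` is an invented name, any line starting with stars and a space is treated as a heading, and Emacs's built-in `count-words` replaces org-wc's rules for skipping blocks, comments and drawers.

```elisp
;; Hypothetical standalone sketch, not part of org-wc.el.
;; Headings are lines matching "^\\*+ "; heading lines themselves are
;; not counted, mirroring org-wc's choice to skip them.
(defun my-wc-subtree-counts ()
  "Return an alist of (TITLE . WORDS) for every heading, in buffer order.
WORDS is the number of words in the heading's whole subtree,
excluding heading lines."
  (save-excursion
    (goto-char (point-min))
    (let (result)
      (while (re-search-forward "^\\(\\*+\\) \\(.*\\)$" nil t)
        (let ((level (length (match-string 1)))
              (title (match-string-no-properties 2))
              (words 0))
          (save-excursion
            (forward-line 1)
            ;; Stop at the next heading of the same or a higher level.
            (while (not (or (eobp)
                            (and (looking-at "\\(\\*+\\) ")
                                 (<= (length (match-string 1)) level))))
              ;; Count body lines; deeper headings are skipped.
              (unless (looking-at "\\*+ ")
                (setq words (+ words (count-words (point) (line-end-position)))))
              (forward-line 1)))
          (push (cons title words) result)))
      (nreverse result))))
```

This rescans each subtree once per enclosing heading; org-wc.el instead walks backwards with `outline-previous-heading` and narrows to each subtree in turn.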

I started with Paul Sexton's code posted to the list on 21/2/11.  I
had some different requirements, so I hacked this mercilessly.  Sorry,
Paul.

I was most concerned about speed, so I removed any check that caused
repeated hunting around in the org mode buffer - all the contextual
stuff.

I also skip heading lines, as I didn't want them in my total.
(I'm using the wonderful Org mode to write a novel, and the heading
lines are for my organisation only, not part of the text.)

Anyway, here it is.  I use this key-binding in my .emacs.
     (define-key org-mode-map "\C-c\C-xw" 'org-wc-display)

By the way, it complains if you call it without the mark being set.  I
want to use (interactive "r") to handle regions, but don't know how to
handle this error case.  Suggestions welcome.
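
One common way around this is to compute the arguments with a Lisp `interactive` form instead of the `"r"` code, falling back to the whole accessible buffer when no region is active. The function names below are hypothetical, and `count-words` stands in for `org-word-count-aux`:

```elisp
;; Hypothetical names; `count-words' stands in for org-wc's counter.
(defun my-region-or-buffer-bounds ()
  "Return (BEG END) for the active region, else the accessible buffer."
  (if (use-region-p)
      (list (region-beginning) (region-end))
    (list (point-min) (point-max))))

(defun my-org-word-count (beg end)
  "Report the number of words between BEG and END.
Interactively, use the active region, or the whole buffer if there is none."
  (interactive (my-region-or-buffer-bounds))
  (message "%d words in %s." (count-words beg end)
           (if (use-region-p) "region" "buffer")))
```

Because the bounds are computed rather than read from the mark, calling the command in a buffer where the mark was never set does not signal an error.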

cheers,
Simon

;; org-wc.el
;;
;; Count words in org mode trees.
;; Shows word count per heading line, summed over sub-headings.
;; Aims to be fast, so doesn't check carefully what it's counting.  ;-)
;;
;; Simon Guest, 23/4/11
;;
;; Implementation based on:
;; - Paul Sexton's word count posted on org-mode mailing list 21/2/11.
;; - clock overlays

(defun org-in-heading-line ()
  "Is point in a line starting with `*'?"
  (equal (char-after (point-at-bol)) ?*))

(defun org-word-count (beg end) 
  "Report the number of words in the Org mode buffer or selected region."
  (interactive "r")
  (unless mark-active
    (setq beg (point-min)
          end (point-max)))
  (let ((wc (org-word-count-aux beg end)))
    (message "%d words in %s." wc
             (if mark-active "region" "buffer"))))

(defun org-word-count-aux (beg end)
  "Return the number of words between BEG and END.
Ignores: heading lines,
         blocks,
         comments,
         drawers.
LaTeX macros are counted as 1 word."

  (let ((wc 0)
        (block-begin-re "^#\\\+BEGIN")
        (block-end-re "^#\\+END")
	(latex-macro-regexp "\\\\[A-Za-z]+\\(\\[[^]]*\\]\\|\\){\\([^}]*\\)}")
        (drawers-re (concat "^[ \t]*:\\("
                            (mapconcat 'regexp-quote org-drawers "\\|")
                            "\\):[ \t]*$"))
        (drawers-end-re "^[ \t]*:END:"))
    (save-excursion
      (goto-char beg)
      (while (< (point) end)
        (cond
         ;; Ignore heading lines.
         ((org-in-heading-line)
          (forward-line))
         ;; Ignore blocks.
         ((looking-at block-begin-re)
          (re-search-forward block-end-re end 'move))
         ;; Ignore comments.
         ((org-in-commented-line)
          (forward-line))
         ;; Ignore drawers.
         ((looking-at drawers-re)
          (re-search-forward drawers-end-re end 'move))
         ;; Count latex macros as 1 word, ignoring their arguments.
         ((and (> (point) (point-min))
               (save-excursion
                 (backward-char)
                 (looking-at latex-macro-regexp)))
          (goto-char (match-end 0))
          (setq wc (1+ wc)))
         (t
          (progn
            (re-search-forward "\\w+\\W*")
            (setq wc (1+ wc)))))))
    wc))

(defun org-wc-count-subtrees ()
  "Count words in each subtree, putting result as the property :org-wc on that heading."
  (interactive)
  (remove-text-properties (point-min) (point-max)
                          '(:org-wc t))
  (save-excursion
    (goto-char (point-max))
    (while (outline-previous-heading)
      (org-narrow-to-subtree)
      (let ((wc (org-word-count-aux (point-min) (point-max))))
        (put-text-property (point) (point-at-eol) :org-wc wc)
        (goto-char (point-min))
        (widen)))))

(defun org-wc-display (beg end total-only)
  "Show subtree word counts in the entire buffer.
With prefix argument, only show the total wordcount for the buffer or region
in the echo area.

Use \\[org-wc-remove-overlays] to remove the subtree word counts.

Ignores: heading lines,
         blocks,
         comments,
         drawers.
LaTeX macros are counted as 1 word."
  (interactive "r\nP")
  (org-wc-remove-overlays)
  (unless total-only
    (let (wc p)
      (org-wc-count-subtrees)
      (save-excursion
        (goto-char (point-min))
        (while (or (and (equal (setq p (point)) (point-min))
                        (get-text-property p :org-wc))
                   (setq p (next-single-property-change
                            (point) :org-wc)))
          (goto-char p)
          (when (setq wc (get-text-property p :org-wc))
            (org-wc-put-overlay wc (funcall outline-level))))
        ;; Arrange to remove the overlays upon next change.
        (when org-remove-highlights-with-change
          (org-add-hook 'before-change-functions 'org-wc-remove-overlays
                        nil 'local)))))
  (if mark-active
      (org-word-count beg end)
    (org-word-count (point-min) (point-max))))

(defvar org-wc-overlays nil)
(make-variable-buffer-local 'org-wc-overlays)

(defun org-wc-put-overlay (wc &optional level)
  "Put an overlay on the current line, displaying word count.
LEVEL is currently ignored.
This creates a new overlay and stores it in `org-wc-overlays', so that it
will be easy to remove."
  (let* ((c 60)
         (l (if level (org-get-valid-level level 0) 0))
	 (off 0)
	 ov tx)
    (org-move-to-column c)
    (unless (eolp) (skip-chars-backward "^ \t"))
    (skip-chars-backward " \t")
    (setq ov (make-overlay (1- (point)) (point-at-eol))
	  tx (concat (buffer-substring (1- (point)) (point))
		     (make-string (+ off (max 0 (- c (current-column)))) ?.)
		     (org-add-props (format "%s" (number-to-string wc))
			 (list 'face 'org-wc-overlay))
		     ""))
    (if (not (featurep 'xemacs))
	(overlay-put ov 'display tx)
      (overlay-put ov 'invisible t)
      (overlay-put ov 'end-glyph (make-glyph tx)))
    (push ov org-wc-overlays)))

(defun org-wc-remove-overlays (&optional beg end noremove)
  "Remove the word-count overlays from the buffer.
BEG and END are ignored.  If NOREMOVE is nil, remove this function
from the `before-change-functions' in the current buffer."
  (interactive)
  (unless org-inhibit-highlight-removal
    (mapc 'delete-overlay org-wc-overlays)
    (setq org-wc-overlays nil)
    (unless noremove
      (remove-hook 'before-change-functions
		   'org-wc-remove-overlays 'local))))

(provide 'org-wc)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Implemented word count for subtrees
  2011-04-23  8:57 Implemented word count for subtrees Simon Guest
@ 2011-04-27 17:51 ` Eric S Fraga
  2011-04-27 22:41   ` Simon Guest
  2011-04-28  7:53 ` Implemented word count for subtrees Sébastien Vauban
  1 sibling, 1 reply; 19+ messages in thread
From: Eric S Fraga @ 2011-04-27 17:51 UTC (permalink / raw)
  To: Simon Guest; +Cc: emacs-org list

Simon Guest <simon.guest@tesujimath.org> writes:

> Dear Org mode people,
>
> I implemented word counting for Org mode sub-trees.  That is, count
> each sub-tree, and accumulate totals into the parent heading lines.
> Others have asked about this, so I attach my code below.

This is quite nice.  Thanks!

I cannot help you with the mark problem; hopefully others can.

One suggestion, however: I wonder if you could introduce the word count
in a different way?  I do use my headings and having the dots and the
actual count inserted is not great (although easy work around:
org-wc-display and then undo immediately after looking at the
counts...).  Maybe you could automatically define a tag for each
heading, something like :wcNNNN:?  Of course, this won't be good for
anybody that wishes to export documents *with* tags...

Anyway, I've incorporated your code into my emacs configuration.  Thanks
again!

-- 
: Eric S Fraga (GnuPG: 0xC89193D8FFFCF67D) in Emacs 24.0.50.1
: using Org-mode version 7.5 (release_7.5.211.gb0094)

* Re: Implemented word count for subtrees
  2011-04-27 17:51 ` Eric S Fraga
@ 2011-04-27 22:41   ` Simon Guest
  2011-04-28  8:34     ` Eric S Fraga
  0 siblings, 1 reply; 19+ messages in thread
From: Simon Guest @ 2011-04-27 22:41 UTC (permalink / raw)
  To: Eric S Fraga; +Cc: emacs-org list

At Wed, 27 Apr 2011 18:51:06 +0100,
Eric S Fraga wrote:
> This is quite nice.  Thanks!
> 
> I cannot help you with the mark problem; hopefully others can.
> 
> One suggestion, however: I wonder if you could introduce the word count
> in a different way?  I do use my headings and having the dots and the
> actual count inserted is not great (although easy work around:
> org-wc-display and then undo immediately after looking at the
> counts...).  Maybe you could automatically define a tag for each
> heading, something like :wcNNNN:?  Of course, this won't be good for
> anybody that wishes to export documents *with* tags...
> 
> Anyway, I've incorporated your code into my emacs configuration.  Thanks
> again!

Hi Eric,

Thanks for your comments.

The counts are not actually inserted as text in the file.  It's just
an overlay, done with text properties like the clock durations (from
where I stole the code).  They don't get saved in the file, and in
fact as soon as you edit the buffer, they all disappear.  At least,
they do for me.
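
The point about overlays can be checked directly: an overlay's display text is not part of the buffer contents, so it is neither saved nor treated as a modification. This sketch uses an `after-string` overlay (a simpler technique than the `display` property that org-wc.el uses) and a hypothetical function name:

```elisp
;; Hypothetical sketch: show WC after the current line with an overlay.
(defun my-wc-show-count (wc)
  "Display WC at the end of the current line and return the overlay.
The buffer text itself is left untouched."
  (let ((ov (make-overlay (line-end-position) (line-end-position))))
    (overlay-put ov 'after-string (format " ....%d" wc))
    ov))
```

Calling `delete-overlay` on the returned object removes the count again.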

Does it do something different for you?  If so, it may be something to
do with the file coding system that is letting them get saved.  Is
your Org mode file a simple ASCII file, or something else?

cheers,
Simon

* Re: Implemented word count for subtrees
  2011-04-23  8:57 Implemented word count for subtrees Simon Guest
  2011-04-27 17:51 ` Eric S Fraga
@ 2011-04-28  7:53 ` Sébastien Vauban
  2011-04-28  8:35   ` Eric S Fraga
  2011-05-02 12:19   ` Daniel Clemente
  1 sibling, 2 replies; 19+ messages in thread
From: Sébastien Vauban @ 2011-04-28  7:53 UTC (permalink / raw)
  To: emacs-orgmode@gnu.org

Hi Simon,

Simon Guest wrote:
> Dear Org mode people,
>
> I implemented word counting for Org mode sub-trees.  That is, count
> each sub-tree, and accumulate totals into the parent heading lines.
> Others have asked about this, so I attach my code below.

Another suggestion: a variable to choose between a word-count and a
line-count?

Just thinking at that because of the Org-clone in VIM...

Best regards,
  Seb

-- 
Sébastien Vauban

* Re: Implemented word count for subtrees
  2011-04-27 22:41   ` Simon Guest
@ 2011-04-28  8:34     ` Eric S Fraga
  2011-04-28 22:05       ` Sub-tree word count v2 Simon Guest
  0 siblings, 1 reply; 19+ messages in thread
From: Eric S Fraga @ 2011-04-28  8:34 UTC (permalink / raw)
  To: Simon Guest; +Cc: emacs-org list

Simon Guest <simon.guest@tesujimath.org> writes:

> At Wed, 27 Apr 2011 18:51:06 +0100,
> Eric S Fraga wrote:
>> This is quite nice.  Thanks!
>> 
>> I cannot help you with the mark problem; hopefully others can.
>> 
>> One suggestion, however: I wonder if you could introduce the word count
>> in a different way?  I do use my headings and having the dots and the
>> actual count inserted is not great (although easy work around:
>> org-wc-display and then undo immediately after looking at the
>> counts...).  Maybe you could automatically define a tag for each
>> heading, something like :wcNNNN:?  Of course, this won't be good for
>> anybody that wishes to export documents *with* tags...
>> 
>> Anyway, I've incorporated your code into my emacs configuration.  Thanks
>> again!
>
> Hi Eric,
>
> Thanks for your comments.
>
> The counts are not actually inserted as text in the file.  It's just
> an overlay, done with text properties like the clock durations (from
> where I stole the code).  They don't get saved in the file, and in
> fact as soon as you edit the buffer, they all disappear.  At least,
> they do for me.
>
> Does it do something different for you?  If so, it may be something to
> do with the file coding system that is letting them get saved.  Is
> your Org mode file a simple ASCII file, or something else?

Ah, I see that the counts do disappear as soon as I do anything.  My
confusion came about because as soon as I invoke =org-wc-display=, my
buffer becomes /modified/, which led me to believe that the counts were
actually inserted into the document.  As they are not inserted, please
ignore my previous comments.

However, it would be helpful if the modification flag were not changed
by asking for the word count.  I don't know enough elisp to suggest what
to change but you should be able to add the text properties without
causing the buffer modification flag to change?  Column view, for
instance, doesn't do this.
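
One way to add such properties without touching the modification flag, offered as an alternative rather than what org-wc.el does, is Emacs's `with-silent-modifications` macro, which also suppresses undo recording and modification hooks for the changes made inside it. The function name is hypothetical:

```elisp
;; Hypothetical helper; :org-wc is the property name org-wc.el uses.
(defun my-wc-store-count (wc)
  "Store WC as the :org-wc text property of the current line.
The buffer's modified flag is left as it was."
  (with-silent-modifications
    (put-text-property (line-beginning-position) (line-end-position)
                       :org-wc wc)))
```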

Thanks again,
eric

-- 
: Eric S Fraga (GnuPG: 0xC89193D8FFFCF67D) in Emacs 24.0.50.1
: using Org-mode version 7.5 (release_7.5.209.g1a687)

* Re: Implemented word count for subtrees
  2011-04-28  7:53 ` Implemented word count for subtrees Sébastien Vauban
@ 2011-04-28  8:35   ` Eric S Fraga
  2011-05-02 12:19   ` Daniel Clemente
  1 sibling, 0 replies; 19+ messages in thread
From: Eric S Fraga @ 2011-04-28  8:35 UTC (permalink / raw)
  To: Sébastien Vauban; +Cc: emacs-orgmode

Sébastien Vauban <wxhgmqzgwmuf@spammotel.com> writes:

> Hi Simon,
>
> Simon Guest wrote:
>> Dear Org mode people,
>>
>> I implemented word counting for Org mode sub-trees.  That is, count
>> each sub-tree, and accumulate totals into the parent heading lines.
>> Others have asked about this, so I attach my code below.
>
> Another suggestion: a variable to choose between a word-count and a
> line-count?

Or display both at the same time?  =NNN/MM=?
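
Showing both figures is mostly a formatting question. A minimal sketch, with a hypothetical function name and Emacs's generic `count-words` and `count-lines` standing in for org-wc's filtered counter (so headings, blocks and drawers are not skipped here):

```elisp
;; Hypothetical helper: format "WORDS/LINES" for the text in BEG..END.
(defun my-wc-words-and-lines (beg end)
  "Return a string \"WORDS/LINES\" for the text between BEG and END."
  (format "%d/%d" (count-words beg end) (count-lines beg end)))
```
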
-- 
: Eric S Fraga (GnuPG: 0xC89193D8FFFCF67D) in Emacs 24.0.50.1
: using Org-mode version 7.5 (release_7.5.209.g1a687)

* Sub-tree word count v2
  2011-04-28  8:34     ` Eric S Fraga
@ 2011-04-28 22:05       ` Simon Guest
  2011-04-28 22:47         ` Samuel Wales
  2011-04-28 23:16         ` Sub-tree word count v2 Eric S Fraga
  0 siblings, 2 replies; 19+ messages in thread
From: Simon Guest @ 2011-04-28 22:05 UTC (permalink / raw)
  To: Eric S Fraga; +Cc: emacs-org list

At Thu, 28 Apr 2011 09:34:35 +0100,
Eric S Fraga wrote:
> However, it would be helpful if the modification flag were not changed
> by asking for the word count.  I don't know enough elisp to suggest what
> to change but you should be able to add the text properties without
> causing the buffer modification flag to change?  Column view, for
> instance, doesn't do this.

Hi Eric,

OK, I fixed this.  Now buffer modification state is preserved.  Also
now handles empty sections properly.

Version 2 attached below.

cheers,
Simon

;; org-wc.el
;;
;; Count words in org mode trees.
;; Shows word count per heading line, summed over sub-headings.
;; Aims to be fast, so doesn't check carefully what it's counting.  ;-)
;;
;; Simon Guest, 23/4/11
;;
;; Implementation based on:
;; - Paul Sexton's word count posted on org-mode mailing list 21/2/11.
;; - clock overlays
;;
;; v2 
;; 29/4/11
;; Don't modify buffer, and fixed handling of empty sections.

(defun org-in-heading-line ()
  "Is point in a line starting with `*'?"
  (equal (char-after (point-at-bol)) ?*))

(defun org-word-count (beg end) 
  "Report the number of words in the Org mode buffer or selected region."
  (interactive "r")
  (unless mark-active
    (setq beg (point-min)
          end (point-max)))
  (let ((wc (org-word-count-aux beg end)))
    (message "%d words in %s." wc
             (if mark-active "region" "buffer"))))

(defun org-word-count-aux (beg end)
  "Return the number of words between BEG and END.
Ignores: heading lines,
         blocks,
         comments,
         drawers.
LaTeX macros are counted as 1 word."

  (let ((wc 0)
        (block-begin-re "^#\\\+BEGIN")
        (block-end-re "^#\\+END")
	(latex-macro-regexp "\\\\[A-Za-z]+\\(\\[[^]]*\\]\\|\\){\\([^}]*\\)}")
        (drawers-re (concat "^[ \t]*:\\("
                            (mapconcat 'regexp-quote org-drawers "\\|")
                            "\\):[ \t]*$"))
        (drawers-end-re "^[ \t]*:END:"))
    (save-excursion
      (goto-char beg)
      (while (< (point) end)
        (cond
         ;; Ignore heading lines.
         ((org-in-heading-line)
          (forward-line))
         ;; Ignore blocks.
         ((looking-at block-begin-re)
          (re-search-forward block-end-re end 'move))
         ;; Ignore comments.
         ((org-in-commented-line)
          (forward-line))
         ;; Ignore drawers.
         ((looking-at drawers-re)
          (re-search-forward drawers-end-re end 'move))
         ;; Count latex macros as 1 word, ignoring their arguments.
         ((and (> (point) (point-min))
               (save-excursion
                 (backward-char)
                 (looking-at latex-macro-regexp)))
          (goto-char (match-end 0))
          (setq wc (1+ wc)))
         (t
          (progn
            (and (re-search-forward "\\w+\\W*" end 'skip)
                 (setq wc (1+ wc))))))))
    wc))

(defun org-wc-count-subtrees ()
  "Count words in each subtree, putting result as the property :org-wc on that heading."
  (interactive)
  (remove-text-properties (point-min) (point-max)
                          '(:org-wc t))
  (save-excursion
    (goto-char (point-max))
    (while (outline-previous-heading)
      (org-narrow-to-subtree)
      (let ((wc (org-word-count-aux (point-min) (point-max))))
        (put-text-property (point) (point-at-eol) :org-wc wc)
        (goto-char (point-min))
        (widen)))))

(defun org-wc-display (beg end total-only)
  "Show subtree word counts in the entire buffer.
With prefix argument, only show the total wordcount for the buffer or region
in the echo area.

Use \\[org-wc-remove-overlays] to remove the subtree word counts.

Ignores: heading lines,
         blocks,
         comments,
         drawers.
LaTeX macros are counted as 1 word."
  (interactive "r\nP")
  (org-wc-remove-overlays)
  (unless total-only
    (let ((bmp (buffer-modified-p))
          wc 
          p)
      (org-wc-count-subtrees)
      (save-excursion
        (goto-char (point-min))
        (while (or (and (equal (setq p (point)) (point-min))
                        (get-text-property p :org-wc))
                   (setq p (next-single-property-change
                            (point) :org-wc)))
          (goto-char p)
          (when (setq wc (get-text-property p :org-wc))
            (org-wc-put-overlay wc (funcall outline-level))))
        ;; Arrange to remove the overlays upon next change.
        (when org-remove-highlights-with-change
          (org-add-hook 'before-change-functions 'org-wc-remove-overlays
                        nil 'local)))
    (set-buffer-modified-p bmp)))
  (if mark-active
      (org-word-count beg end)
    (org-word-count (point-min) (point-max))))

(defvar org-wc-overlays nil)
(make-variable-buffer-local 'org-wc-overlays)

(defun org-wc-put-overlay (wc &optional level)
  "Put an overlay on the current line, displaying word count.
LEVEL is currently ignored.
This creates a new overlay and stores it in `org-wc-overlays', so that it
will be easy to remove."
  (let* ((c 60)
         (l (if level (org-get-valid-level level 0) 0))
	 (off 0)
	 ov tx)
    (org-move-to-column c)
    (unless (eolp) (skip-chars-backward "^ \t"))
    (skip-chars-backward " \t")
    (setq ov (make-overlay (1- (point)) (point-at-eol))
	  tx (concat (buffer-substring (1- (point)) (point))
		     (make-string (+ off (max 0 (- c (current-column)))) ?.)
		     (org-add-props (format "%s" (number-to-string wc))
			 (list 'face 'org-wc-overlay))
		     ""))
    (if (not (featurep 'xemacs))
	(overlay-put ov 'display tx)
      (overlay-put ov 'invisible t)
      (overlay-put ov 'end-glyph (make-glyph tx)))
    (push ov org-wc-overlays)))

(defun org-wc-remove-overlays (&optional beg end noremove)
  "Remove the word-count overlays from the buffer.
BEG and END are ignored.  If NOREMOVE is nil, remove this function
from the `before-change-functions' in the current buffer."
  (interactive)
  (unless org-inhibit-highlight-removal
    (mapc 'delete-overlay org-wc-overlays)
    (setq org-wc-overlays nil)
    (unless noremove
      (remove-hook 'before-change-functions
		   'org-wc-remove-overlays 'local))))

(provide 'org-wc)

* Re: Sub-tree word count v2
  2011-04-28 22:05       ` Sub-tree word count v2 Simon Guest
@ 2011-04-28 22:47         ` Samuel Wales
  2011-04-29  1:57           ` v3, with support for narrowing Simon Guest
  2011-04-28 23:16         ` Sub-tree word count v2 Eric S Fraga
  1 sibling, 1 reply; 19+ messages in thread
From: Samuel Wales @ 2011-04-28 22:47 UTC (permalink / raw)
  To: Simon Guest; +Cc: emacs-org list

This looks great, Simon.

Is it possible to make it so that you can show the overlays for just a
subtree or region instead of the entire buffer?

Also, I have a plugin-compatible backend that will get you the /exact/
word count (uses w3m).  If you're interested you can have it be an
alternate backend.

Samuel

-- 
The Kafka Pandemic:
  http://thekafkapandemic.blogspot.com/2010/12/welcome-to-kafka-pandemic-two-forces_9182.html
I support the Whittemore-Peterson Institute (WPI)
===
I want to see the original (pre-hold) Lo et al. 2010 NIH/FDA/Harvard MRV paper.

* Re: Sub-tree word count v2
  2011-04-28 22:05       ` Sub-tree word count v2 Simon Guest
  2011-04-28 22:47         ` Samuel Wales
@ 2011-04-28 23:16         ` Eric S Fraga
  1 sibling, 0 replies; 19+ messages in thread
From: Eric S Fraga @ 2011-04-28 23:16 UTC (permalink / raw)
  To: Simon Guest; +Cc: emacs-org list

Simon Guest <simon.guest@tesujimath.org> writes:

> At Thu, 28 Apr 2011 09:34:35 +0100,
> Eric S Fraga wrote:
>> However, it would be helpful if the modification flag were not changed
>> by asking for the word count.  I don't know enough elisp to suggest what
>> to change but you should be able to add the text properties without
>> causing the buffer modification flag to change?  Column view, for
>> instance, doesn't do this.
>
> Hi Eric,
>
> OK, I fixed this.  Now buffer modification state is preserved.  Also
> now handles empty sections properly.
>
> Version 2 attached below.

Excellent.  Works very well indeed.  Thanks!
-- 
: Eric S Fraga (GnuPG: 0xC89193D8FFFCF67D) in Emacs 24.0.50.1
: using Org-mode version 7.5 (release_7.5.223.g4a86.dirty)

* v3, with support for narrowing
  2011-04-28 22:47         ` Samuel Wales
@ 2011-04-29  1:57           ` Simon Guest
  2011-04-29  2:41             ` Eric Abrahamsen
  2011-05-14 22:48             ` v3, with support for narrowing Samuel Wales
  0 siblings, 2 replies; 19+ messages in thread
From: Simon Guest @ 2011-04-29  1:57 UTC (permalink / raw)
  To: Samuel Wales; +Cc: emacs-org list

At Thu, 28 Apr 2011 15:47:07 -0700,
Samuel Wales wrote:
> Is it possible to make it so that you can show the overlays for just a
> subtree or region instead of the entire buffer?

Hi Samuel,

Good idea!  So I just fixed it to handle narrowing properly, so narrow
to your region or subtree of interest first, and then count as usual.

Attached v3 which does this.  I'm not normally this responsive, you
just caught me at a good time.  ;-)
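
The v3 change replaces `org-narrow-to-subtree` followed by `widen` with the same narrowing wrapped in `save-restriction`, so whatever restriction the user had set is restored afterwards instead of being discarded. A small standalone illustration of that difference, with a hypothetical function name and `count-words` as the counter:

```elisp
;; Hypothetical sketch: count words in BEG..END by narrowing temporarily.
(defun my-count-words-narrowed (beg end)
  "Return the number of words between BEG and END.
Any narrowing already in effect is restored afterwards."
  (save-restriction
    (narrow-to-region beg end)
    (count-words (point-min) (point-max))))
```

Ending with a plain `(widen)` instead of `save-restriction` would leave `(point-max)` at the end of the whole buffer, losing the user's narrowing.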

> Also, I have a plugin-compatible backend that will get you the /exact/
> word count (uses w3m).  If you're interested you can have it be an
> alternate backend.

I'm not that interested in higher fidelity counting.  I'm especially
interested in speed.

cheers,
Simon

;; org-wc.el
;;
;; Count words in org mode trees.
;; Shows word count per heading line, summed over sub-headings.
;; Aims to be fast, so doesn't check carefully what it's counting.  ;-)
;;
;; Simon Guest, 23/4/11
;;
;; Implementation based on:
;; - Paul Sexton's word count posted on org-mode mailing list 21/2/11.
;; - clock overlays
;;
;; v2 
;; 29/4/11
;; Don't modify buffer, and fixed handling of empty sections.
;;
;; v3
;; 29/4/11
;; Handle narrowing correctly, so partial word count works on narrowed regions.

(defun org-in-heading-line ()
  "Is point in a line starting with `*'?"
  (equal (char-after (point-at-bol)) ?*))

(defun org-word-count (beg end) 
  "Report the number of words in the Org mode buffer or selected region."
  (interactive "r")
  (unless mark-active
    (setq beg (point-min)
          end (point-max)))
  (let ((wc (org-word-count-aux beg end)))
    (message "%d words in %s." wc
             (if mark-active "region" "buffer"))))

(defun org-word-count-aux (beg end)
  "Return the number of words between BEG and END.
Ignores: heading lines,
         blocks,
         comments,
         drawers.
LaTeX macros are counted as 1 word."

  (let ((wc 0)
        (block-begin-re "^#\\\+BEGIN")
        (block-end-re "^#\\+END")
	(latex-macro-regexp "\\\\[A-Za-z]+\\(\\[[^]]*\\]\\|\\){\\([^}]*\\)}")
        (drawers-re (concat "^[ \t]*:\\("
                            (mapconcat 'regexp-quote org-drawers "\\|")
                            "\\):[ \t]*$"))
        (drawers-end-re "^[ \t]*:END:"))
    (save-excursion
      (goto-char beg)
      (while (< (point) end)
        (cond
         ;; Ignore heading lines.
         ((org-in-heading-line)
          (forward-line))
         ;; Ignore blocks.
         ((looking-at block-begin-re)
          (re-search-forward block-end-re end 'move))
         ;; Ignore comments.
         ((org-in-commented-line)
          (forward-line))
         ;; Ignore drawers.
         ((looking-at drawers-re)
          (re-search-forward drawers-end-re end 'move))
         ;; Count latex macros as 1 word, ignoring their arguments.
         ((and (> (point) (point-min))
               (save-excursion
                 (backward-char)
                 (looking-at latex-macro-regexp)))
          (goto-char (match-end 0))
          (setq wc (1+ wc)))
         (t
          (progn
            (and (re-search-forward "\\w+\\W*" end 'skip)
                 (setq wc (1+ wc))))))))
    wc))

(defun org-wc-count-subtrees ()
  "Count words in each subtree, putting result as the property :org-wc on that heading."
  (interactive)
  (remove-text-properties (point-min) (point-max)
                          '(:org-wc t))
  (save-excursion
    (goto-char (point-max))
    (while (outline-previous-heading)
      (save-restriction
        (org-narrow-to-subtree)
        (let ((wc (org-word-count-aux (point-min) (point-max))))
          (put-text-property (point) (point-at-eol) :org-wc wc)
          (goto-char (point-min)))))))

(defun org-wc-display (beg end total-only)
  "Show subtree word counts in the entire buffer.
With prefix argument, only show the total wordcount for the buffer or region
in the echo area.

Use \\[org-wc-remove-overlays] to remove the subtree word counts.

Ignores: heading lines,
         blocks,
         comments,
         drawers.
LaTeX macros are counted as 1 word."
  (interactive "r\nP")
  (org-wc-remove-overlays)
  (unless total-only
    (let ((bmp (buffer-modified-p))
          wc 
          p)
      (org-wc-count-subtrees)
      (save-excursion
        (goto-char (point-min))
        (while (or (and (equal (setq p (point)) (point-min))
                        (get-text-property p :org-wc))
                   (setq p (next-single-property-change
                            (point) :org-wc)))
          (goto-char p)
          (when (setq wc (get-text-property p :org-wc))
            (org-wc-put-overlay wc (funcall outline-level))))
        ;; Arrange to remove the overlays upon next change.
        (when org-remove-highlights-with-change
          (org-add-hook 'before-change-functions 'org-wc-remove-overlays
                        nil 'local)))
    (set-buffer-modified-p bmp)))
  (if mark-active
      (org-word-count beg end)
    (org-word-count (point-min) (point-max))))

(defvar org-wc-overlays nil)
(make-variable-buffer-local 'org-wc-overlays)

(defun org-wc-put-overlay (wc &optional level)
  "Put an overlay on the current line, displaying word count.
LEVEL is currently ignored.
This creates a new overlay and stores it in `org-wc-overlays', so that it
will be easy to remove."
  (let* ((c 60)
         (l (if level (org-get-valid-level level 0) 0))
	 (off 0)
	 ov tx)
    (org-move-to-column c)
    (unless (eolp) (skip-chars-backward "^ \t"))
    (skip-chars-backward " \t")
    (setq ov (make-overlay (1- (point)) (point-at-eol))
	  tx (concat (buffer-substring (1- (point)) (point))
		     (make-string (+ off (max 0 (- c (current-column)))) ?.)
		     (org-add-props (format "%s" (number-to-string wc))
			 (list 'face 'org-wc-overlay))
		     ""))
    (if (not (featurep 'xemacs))
	(overlay-put ov 'display tx)
      (overlay-put ov 'invisible t)
      (overlay-put ov 'end-glyph (make-glyph tx)))
    (push ov org-wc-overlays)))

(defun org-wc-remove-overlays (&optional beg end noremove)
  "Remove the word-count overlays from the buffer.
BEG and END are ignored.  If NOREMOVE is nil, remove this function
from the `before-change-functions' in the current buffer."
  (interactive)
  (unless org-inhibit-highlight-removal
    (mapc 'delete-overlay org-wc-overlays)
    (setq org-wc-overlays nil)
    (unless noremove
      (remove-hook 'before-change-functions
		   'org-wc-remove-overlays 'local))))

(provide 'org-wc)

* Re: v3, with support for narrowing
  2011-04-29  1:57           ` v3, with support for narrowing Simon Guest
@ 2011-04-29  2:41             ` Eric Abrahamsen
  2011-04-29  3:35               ` Nick Dokos
  2011-05-14 22:48             ` v3, with support for narrowing Samuel Wales
  1 sibling, 1 reply; 19+ messages in thread
From: Eric Abrahamsen @ 2011-04-29  2:41 UTC (permalink / raw)
  To: emacs-orgmode

On Fri, Apr 29 2011, Simon Guest wrote:

> At Thu, 28 Apr 2011 15:47:07 -0700,
> Samuel Wales wrote:
>> Is it possible to make it so that you can show the overlays for just a
>> subtree or region instead of the entire buffer?
>
> Hi Samuel,
>
> Good idea!  So I just fixed it to handle narrowing properly, so narrow
> to your region or subtree of interest first, and then count as usual.
>
> Attached v3 which does this.  I'm not normally this responsive, you
> just caught me at a good time.  ;-)

Is this still a good time? Because what I'd really like is
inclusion/exclusion tags, so that I can mark some subtrees to count, and
others to ignore. I tried doing this last night, based on other parts of
the export code, and my elisp failed. Perhaps just exclusion tags?
Here's hoping you still have a bit of time/interest for improvements!

(I'm translating a novel, not writing one, and want to keep the original
text out of the count. Plus I've got a bunch of research/notes sections
that should be excluded.)

If you're out of time, I'll have another whack at it, and maybe bring my
broken efforts to the list.

Thanks,

Eric

* Re: v3, with support for narrowing
  2011-04-29  2:41             ` Eric Abrahamsen
@ 2011-04-29  3:35               ` Nick Dokos
  2011-04-29  4:31                 ` Eric Abrahamsen
  0 siblings, 1 reply; 19+ messages in thread
From: Nick Dokos @ 2011-04-29  3:35 UTC (permalink / raw)
  To: Eric Abrahamsen; +Cc: nicholas.dokos, emacs-orgmode

Eric Abrahamsen <eric@ericabrahamsen.net> wrote:

> On Fri, Apr 29 2011, Simon Guest wrote:
> 
> > At Thu, 28 Apr 2011 15:47:07 -0700,
> > Samuel Wales wrote:
> >> Is it possible to make it so that you can show the overlays for just a
> >> subtree or region instead of the entire buffer?
> >
> > Hi Samuel,
> >
> > Good idea!  So I just fixed it to handle narrowing properly, so narrow
> > to your region or subtree of interest first, and then count as usual.
> >
> > Attached v3 which does this.  I'm not normally this responsive, you
> > just caught me at a good time.  ;-)
> 
> Is this still a good time? Because what I'd really like is
> inclusion/exclusion tags, so that I can mark some subtrees to count, and
> others to ignore. I tried doing this last night, based on other parts of
> the export code, and my elisp failed. Perhaps just exclusion tags?
> Here's hoping you still have a bit of time/interest for improvements!
> 
> (I'm translating a novel, not writing one, and want to keep the original
> text out of the count. Plus I've got a bunch of research/notes sections
> that should be excluded.)
> 
> If you're out of time, I'll have another whack at it, and maybe bring my
> broken efforts to the list.
> 

The mapping API allows you to walk through the entries, filter the ones
you want and apply a function on each remaining entry. So cribbing heavily
from Simon's code, the following should count all the entries with tag "foo":

--8<---------------cut here---------------start------------->8---
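;; Relies on org-wc-count-subtrees, org-wc-put-overlay and
;; org-wc-remove-overlays from Simon's org-wc.el (v3).
(defun show-count ()
  ;; Put the word-count overlay on the headline at point, using the
  ;; :org-wc text property that org-wc-count-subtrees has already set.
  (let ((p (point))
        wc)
    (when (setq wc (get-text-property p :org-wc))
      (org-wc-put-overlay wc (funcall outline-level)))
    ;; Remove the overlays again as soon as the buffer is edited.
    (when org-remove-highlights-with-change
      (add-hook 'before-change-functions 'org-wc-remove-overlays
                nil 'local))))

(defun count-foo ()
  (interactive)
  ;; Remember the modified flag (note the double parentheses).
  (let ((bmp (buffer-modified-p)))
    ;; Count every subtree; this may mark the buffer as modified.
    (org-wc-count-subtrees)
    ;; Show overlays only on entries tagged "foo", across the whole file.
    (org-map-entries 'show-count "+foo" 'file)
    ;; Restore the original modified flag.
    (set-buffer-modified-p bmp)))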
--8<---------------cut here---------------end--------------->8---

Nick

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: v3, with support for narrowing
  2011-04-29  3:35               ` Nick Dokos
@ 2011-04-29  4:31                 ` Eric Abrahamsen
  2011-04-29 14:56                   ` Nick Dokos
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Abrahamsen @ 2011-04-29  4:31 UTC (permalink / raw)
  To: emacs-orgmode

On Fri, Apr 29 2011, Nick Dokos wrote:

> Eric Abrahamsen <eric@ericabrahamsen.net> wrote:
>
>> On Fri, Apr 29 2011, Simon Guest wrote:
>> 
>> > At Thu, 28 Apr 2011 15:47:07 -0700,
>> > Samuel Wales wrote:
>> >> Is it possible to make it so that you can show the overlays for just a
>> >> subtree or region instead of the entire buffer?
>> >
>> > Hi Samuel,
>> >
>> > Good idea!  So I just fixed it to handle narrowing properly, so narrow
>> > to your region or subtree of interest first, and then count as usual.
>> >
>> > Attached v3 which does this.  I'm not normally this responsive, you
>> > just caught me at a good time.  ;-)
>> 
>> Is this still a good time? Because what I'd really like is
>> inclusion/exclusion tags, so that I can mark some subtrees to count, and
>> others to ignore. I tried doing this last night, based on other parts of
>> the export code, and my elisp failed. Perhaps just exclusion tags?
>> Here's hoping you still have a bit of time/interest for improvements!
>> 
>> (I'm translating a novel, not writing one, and want to keep the original
>> text out of the count. Plus I've got a bunch of research/notes sections
>> that should be excluded.)
>> 
>> If you're out of time, I'll have another whack at it, and maybe bring my
>> broken efforts to the list.
>> 
>
> The mapping API allows you to walk through the entries, filter the ones
> you want and apply a function on each remaining entry. So cribbing heavily
> from Simon's code, the following should count all the entries with tag "foo":

Nice! Seems like the mapping API could be used as the basis for this
whole functionality. Not that I'm quite volunteering yet, I'm still
working out how to make all this work…

Thanks,
Eric

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: v3, with support for narrowing
  2011-04-29  4:31                 ` Eric Abrahamsen
@ 2011-04-29 14:56                   ` Nick Dokos
  2011-04-29 21:42                     ` Simon Guest
  0 siblings, 1 reply; 19+ messages in thread
From: Nick Dokos @ 2011-04-29 14:56 UTC (permalink / raw)
  To: Eric Abrahamsen; +Cc: nicholas.dokos, emacs-orgmode

Eric Abrahamsen <eric@ericabrahamsen.net> wrote:


> > The mapping API allows you to walk through the entries, filter the ones
> > you want and apply a function on each remaining entry. So cribbing heavily
> > from Simon's code, the following should count all the entries with tag "foo":
> 
> Nice! Seems like the mapping API could be used as the basis for this
> whole functionality.
> 

Indeed: it would require a bit of refactoring of Simon's code to provide the
function(s) to apply to each entry, and changes to the top level functions to
use the mapping API instead of looping explicitly.
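A minimal sketch of what that refactoring might look like, assuming Org's `org-map-entries` and the `count-words` function from Emacs 24 and later. The `my-org-wc-` names are hypothetical and not part of Simon's org-wc.el. Like Simon's code, it skips heading lines, and each count covers the whole subtree:

```elisp
(require 'org)

(defun my-org-wc--count-body-words (beg end)
  "Count words between BEG and END, skipping lines that start with `*'.
Hypothetical helper; it mirrors the choice to leave headings out."
  (let ((total 0))
    (save-excursion
      (goto-char beg)
      (while (< (point) end)
        ;; Heading lines start with `*' in column 0; count everything else.
        (unless (eq (char-after (line-beginning-position)) ?*)
          (setq total (+ total (count-words (point)
                                            (min end (line-end-position))))))
        (forward-line 1)))
    total))

(defun my-org-wc-subtree-counts (&optional match)
  "Return a list of (HEADING . WORDS), one per entry matching MATCH.
Hypothetical function.  MATCH is an `org-map-entries' tags/property
match string; nil means every entry."
  (org-map-entries
   (lambda ()
     (let ((beg (point))
           ;; End of this subtree: start of the next same-or-higher heading.
           (end (save-excursion (org-end-of-subtree t t) (point))))
       (cons (org-get-heading t t)
             (my-org-wc--count-body-words beg end))))
   match))
```

Passing a match string such as "+draft-noexport" would then restrict the count to tagged subtrees, which is the include/exclude behaviour asked for earlier in the thread.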

Nick

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: v3, with support for narrowing
  2011-04-29 14:56                   ` Nick Dokos
@ 2011-04-29 21:42                     ` Simon Guest
  2011-06-09  1:33                       ` v4, now with properties and inclusion tags Eric Abrahamsen
  0 siblings, 1 reply; 19+ messages in thread
From: Simon Guest @ 2011-04-29 21:42 UTC (permalink / raw)
  To: emacs-orgmode

At Fri, 29 Apr 2011 10:56:51 -0400,
Nick Dokos wrote:
> Indeed: it would require a bit of refactoring of Simon's code to provide the
> function(s) to apply to each entry, and changes to the top level functions to
> use the mapping API instead of looping explicitly.

That sounds like a good idea.  I may at some stage want to exclude
counting certain trees, and then I may have another look at this if
someone else hasn't already done it.

But for now, time pressure dictates I stop hacking on my word count
function.

cheers,
Simon

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Implemented word count for subtrees
  2011-04-28  7:53 ` Implemented word count for subtrees Sébastien Vauban
  2011-04-28  8:35   ` Eric S Fraga
@ 2011-05-02 12:19   ` Daniel Clemente
  1 sibling, 0 replies; 19+ messages in thread
From: Daniel Clemente @ 2011-05-02 12:19 UTC (permalink / raw)
  To: Sébastien Vauban; +Cc: emacs-orgmode

On Thu, 28 Apr 2011 09:53:17 +0200, Sébastien Vauban wrote:
> 
> Another suggestion: a variable to choose between a word-count and a
> line-count?
> 

Or a page count, with a rough estimator that predicts how many pages each section would take (maybe even by invoking LaTeX?).

This feature is the one I most missed when I had to write a thesis where each section had to have a known number of pages (e.g. introduction 1 page, chapter 1 10 pages, … total 60 pages).
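A very rough estimator could just divide a word count by an assumed words-per-page figure. The value 300 below is an arbitrary placeholder, and all `my-org-wc-` names are hypothetical; actually typesetting with LaTeX would be far more accurate:

```elisp
(require 'org)

(defvar my-org-wc-words-per-page 300
  "Assumed average number of words per printed page (hypothetical).
Placeholder value; calibrate it against your own document class.")

(defun my-org-wc-estimate-pages (words)
  "Return a rough page estimate, as a float, for WORDS words."
  (/ (float words) my-org-wc-words-per-page))

(defun my-org-wc-subtree-pages ()
  "Report words and estimated pages for the subtree at point.
Hypothetical command; the count includes heading text."
  (interactive)
  (save-excursion
    (org-back-to-heading t)
    (let* ((beg (point))
           (end (progn (org-end-of-subtree t t) (point)))
           (words (count-words beg end)))
      (message "%d words, about %.1f pages"
               words (my-org-wc-estimate-pages words)))))
```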

I think it's useful enough to go into contrib/.

Many thanks!

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: v3, with support for narrowing
  2011-04-29  1:57           ` v3, with support for narrowing Simon Guest
  2011-04-29  2:41             ` Eric Abrahamsen
@ 2011-05-14 22:48             ` Samuel Wales
  2011-05-15  4:25               ` Simon Guest
  1 sibling, 1 reply; 19+ messages in thread
From: Samuel Wales @ 2011-05-14 22:48 UTC (permalink / raw)
  To: Simon Guest; +Cc: emacs-org list

Hi Simon,

On 2011-04-28, Simon Guest <simon.guest@tesujimath.org> wrote:
> I'm not that interested in higher fidelity counting.  I'm especially
> interested in speed.

What I meant was that, if desired, you could route the call to the
word-count function through (funcall variable), so that the counting
function can be swapped out.
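In other words, the counter is looked up through a variable at call time. A minimal sketch, assuming Emacs 24's `count-words` as the default; the variable and function names are hypothetical:

```elisp
(defvar my-org-wc-count-function #'count-words
  "Hypothetical variable: function of (BEG END) returning a word count.")

(defun my-org-wc-count-region (beg end)
  "Count words between BEG and END via `my-org-wc-count-function'."
  ;; Looked up on every call, so rebinding the variable swaps the counter.
  (funcall my-org-wc-count-function beg end))
```

Setting or let-binding the variable to any other function of (BEG END) that returns an integer, for example one that counts rendered text more exactly, changes the counting without touching the rest of the code.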

(Actually I'm finding that using w3m to count exact words is pretty fast.)

Samuel


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: v3, with support for narrowing
  2011-05-14 22:48             ` v3, with support for narrowing Samuel Wales
@ 2011-05-15  4:25               ` Simon Guest
  0 siblings, 0 replies; 19+ messages in thread
From: Simon Guest @ 2011-05-15  4:25 UTC (permalink / raw)
  To: Samuel Wales; +Cc: emacs-org list

At Sat, 14 May 2011 15:48:10 -0700,
Samuel Wales wrote:
> What I meant was, if desired you can make the call to the word count
> function be (funcall variable) to allow substitution of that function.
> 
> (Actually I'm finding that using w3m to count exact words is pretty fast.)
 
Hi Samuel,

OK, I see now.  That's clearly a good idea.

Next time I fiddle with this code (don't know when that might be),
I'll try to remember that.  Otherwise anyone else please feel free to
do what is necessary.

cheers,
Simon

^ permalink raw reply	[flat|nested] 19+ messages in thread

* v4, now with properties and inclusion tags
  2011-04-29 21:42                     ` Simon Guest
@ 2011-06-09  1:33                       ` Eric Abrahamsen
  0 siblings, 0 replies; 19+ messages in thread
From: Eric Abrahamsen @ 2011-06-09  1:33 UTC (permalink / raw)
  To: Simon Guest; +Cc: emacs-orgmode

[-- Attachment #1: Type: text/plain, Size: 2422 bytes --]

Simon Guest <simon.guest@tesujimath.org> writes:

> At Fri, 29 Apr 2011 10:56:51 -0400,
> Nick Dokos wrote:
>> Indeed: it would require a bit of refactoring of Simon's code to provide the
>> function(s) to apply to each entry, and changes to the top level functions to
>> use the mapping API instead of looping explicitly.
>
> That sounds like a good idea.  I may at some stage want to exclude
> counting certain trees, and then I may have another look at this if
> someone else hasn't already done it.
>
> But for now, time pressure dictates I stop hacking on my word count
> function.
>
> cheers,
> Simon

I'm afraid this is a bit of a two-steps-forward, one-step-back
situation, but I've rejiggered Simon's code so that it now:

1. Uses the mapping and property APIs
2. Allows selection of subtrees for count via a tag
3. Sets wordcount totals for each subtree as a property, instead of an
   overlay

First of all, this requires the fix to
=org-end-of-meta-data-and-drawers= that I sent (and then re-sent) to
this list earlier today. Otherwise it will "work funny".

I changed it to use properties instead of overlays because I wanted
something that was persistent, and available for programmatic
manipulation. Plus, you can get an overlay effect with column view.

So right now M-x org-word-count will do the following:

1. Add the subtree word count as a property (=org-wc-prop-name=) to each
   headline in the buffer
2. Respect the region, if it's active
3. Operate only on trees tagged with =org-wc-include-tag=, if that tag
   is present
4. Report a buffer/region word count total in the minibuffer
5. With a prefix arg, *only* give a minibuffer report, don't set
   properties
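The property-based approach can be sketched with `org-map-entries` and `org-entry-put`. This is not the attached org-wc.el: the property name and all `my-org-wc-` names are placeholders, counts are collected before any drawer is inserted, and lines starting with `*` (headings) or `:` (drawer lines) are skipped as a crude heuristic:

```elisp
(require 'org)

(defvar my-org-wc-prop-name "WORDCOUNT"
  "Hypothetical property used to store each subtree's word count.")

(defun my-org-wc--count (beg end)
  "Count words from BEG to END, skipping heading and drawer lines."
  (let ((total 0))
    (save-excursion
      (goto-char beg)
      (while (< (point) end)
        ;; Headings start with `*' in column 0; drawer lines with `:'.
        (unless (looking-at-p "\\*\\|[ \t]*:")
          (setq total (+ total (count-words (point)
                                            (min end (line-end-position))))))
        (forward-line 1)))
    total))

(defun my-org-wc-update (&optional report-only)
  "Store subtree word counts in properties and report the buffer total.
With prefix arg REPORT-ONLY, only report the total; set no properties."
  (interactive "P")
  (let ((total (my-org-wc--count (point-min) (point-max))))
    (unless report-only
      ;; First pass: measure every subtree before modifying the buffer.
      (let ((counts (org-map-entries
                     (lambda ()
                       (cons (point-marker)
                             (my-org-wc--count
                              (point)
                              (save-excursion (org-end-of-subtree t t)
                                              (point))))))))
        ;; Second pass: markers follow their headings as drawers are added.
        (dolist (c counts)
          (org-entry-put (car c) my-org-wc-prop-name
                         (number-to-string (cdr c)))
          (set-marker (car c) nil))))
    (message "%d words" total)
    total))
```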

The two variables =org-wc-include-tag= and =org-wc-prop-name= are buffer
local, unless I've misunderstood how buffer local works and screwed it
up.
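For reference, the usual way to get that behaviour is `make-variable-buffer-local` (or `defvar-local` in Emacs 24.3 and later), so that a `setq` in one buffer leaves the default untouched. The name below is a placeholder, not the one used in the attached file:

```elisp
(defvar my-org-wc-include-tag nil
  "Hypothetical tag selecting subtrees to count, or nil to count all.")
;; From now on, every `setq' of this variable affects only the current buffer.
(make-variable-buffer-local 'my-org-wc-include-tag)
```

Changing the value for all buffers then requires `setq-default`.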

There's a helper function, =org-wc-remove-all-props= that can be used to
remove the =org-wc-prop-name= property from all headlines. Does anyone
else think that =org-entry-delete= should remove the whole drawer if
there are no other properties left?
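Such a helper can be little more than a call to `org-entry-delete` on every headline via `org-map-entries`. A minimal sketch, with a hypothetical name and the property passed as an argument:

```elisp
(require 'org)

(defun my-org-wc-remove-all-props (prop)
  "Delete property PROP from every headline in the buffer.
Hypothetical helper; respects narrowing, like `org-map-entries'."
  (interactive "sProperty: ")
  (org-map-entries (lambda () (org-entry-delete (point) prop))))
```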

This is very much a proposal, and I've got a bit of time to work on it,
so I'm willing to field requests, though my elisp is bad. Two immediate
possibilities would be: automatically excluding subtrees tagged
"noexport", and using =org-context= to be cleverer about what to avoid.
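Excluding "noexport" subtrees maps naturally onto a negative tag match such as "-noexport". A minimal sketch, assuming the default `org-use-tag-inheritance` (so descendants of an excluded entry are excluded too); it sums each entry's own text, without heading or children, so nothing is counted twice. The names are hypothetical:

```elisp
(require 'org)

(defun my-org-wc--entry-body-words ()
  "Count words in the entry at point, excluding its heading and children."
  (save-excursion
    (let ((beg (line-beginning-position 2))
          (end (progn (outline-next-heading) (point))))
      (if (< beg end) (count-words beg end) 0))))

(defun my-org-wc-total-excluding (tag)
  "Sum entry words over all entries not carrying TAG (hypothetical).
With tag inheritance on, TAG also excludes every descendant."
  (apply #'+ (org-map-entries #'my-org-wc--entry-body-words
                              (concat "-" tag))))
```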

Further suggestions (and code fixes) welcome!

Eric


[-- Attachment #2: org-wc.el --]
[-- Type: application/emacs-lisp, Size: 3125 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2011-06-09  1:34 UTC | newest]

Thread overview: 19+ messages
2011-04-23  8:57 Implemented word count for subtrees Simon Guest
2011-04-27 17:51 ` Eric S Fraga
2011-04-27 22:41   ` Simon Guest
2011-04-28  8:34     ` Eric S Fraga
2011-04-28 22:05       ` Sub-tree word count v2 Simon Guest
2011-04-28 22:47         ` Samuel Wales
2011-04-29  1:57           ` v3, with support for narrowing Simon Guest
2011-04-29  2:41             ` Eric Abrahamsen
2011-04-29  3:35               ` Nick Dokos
2011-04-29  4:31                 ` Eric Abrahamsen
2011-04-29 14:56                   ` Nick Dokos
2011-04-29 21:42                     ` Simon Guest
2011-06-09  1:33                       ` v4, now with properties and inclusion tags Eric Abrahamsen
2011-05-14 22:48             ` v3, with support for narrowing Samuel Wales
2011-05-15  4:25               ` Simon Guest
2011-04-28 23:16         ` Sub-tree word count v2 Eric S Fraga
2011-04-28  7:53 ` Implemented word count for subtrees Sébastien Vauban
2011-04-28  8:35   ` Eric S Fraga
2011-05-02 12:19   ` Daniel Clemente

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs/org-mode.git
