emacs-orgmode@gnu.org archives
From: Paul Sexton <psexton@xnet.co.nz>
To: emacs-orgmode@gnu.org
Subject: Re: Context-sensitive word count in org mode (elisp)
Date: Sun, 20 Feb 2011 21:49:16 +0000 (UTC)
Message-ID: <loom.20110220T224023-532@post.gmane.org>
In-Reply-To: <87zkptbee7.fsf@gnu.org>

Bastien <bastien.guerry <at> wikimedia.fr> writes:
> #+begin_src emacs-lisp
>   (when (looking-at org-bracket-link-analytic-regexp)
>     (match-string-no-properties 5))
> #+end_src emacs-lisp

Thanks. Here is version 3 of the function, which is now able to count 
words in link descriptions.
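
A minimal sketch of the description-counting step on its own, for illustration. The helper name `my-count-words-in-string` is hypothetical and not part of Org. It uses the built-in `split-string` with its OMIT-NULLS argument, whereas the function below calls `org-split-string` and removes empty strings by hand.

```emacs-lisp
(defun my-count-words-in-string (text)
  "Return the number of words in TEXT.
TEXT is split on runs of non-word characters, as defined by the
syntax table of the current buffer."
  ;; The third argument (OMIT-NULLS) drops empty strings, so leading
  ;; or trailing punctuation does not inflate the count.
  (length (split-string text "\\W+" t)))

;; Example: a link description of three words.
(my-count-words-in-string "the Org manual")
```

Because `\\W` depends on the active syntax table, the result can differ between major modes for text containing apostrophes or underscores.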

The code to advance to the next word has been moved to the end of the 
loop, which improves accuracy.
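
The loop shape described above (classify the word at point first, advance afterwards) might be sketched roughly as follows. This is a simplified illustration, not the function itself: `my-count-unless` and its SKIP-P argument are hypothetical names, and the sketch passes a NOERROR argument to `re-search-forward` so that the loop ends quietly when no words remain.

```emacs-lisp
(defun my-count-unless (skip-p)
  "Count words in the current buffer, ignoring those where SKIP-P is non-nil.
SKIP-P is called with no arguments, with point at the start of each word."
  (let ((wc 0))
    (save-excursion
      (goto-char (point-min))
      ;; Move to the first word so leading punctuation is not counted.
      (skip-syntax-forward "^w")
      (while (< (point) (point-max))
        ;; Classify the word at point first ...
        (unless (funcall skip-p)
          (setq wc (1+ wc)))
        ;; ... and only then advance to the start of the next word.
        ;; On failure, `move' puts point at the end of the buffer.
        (re-search-forward "\\w+\\W*" nil 'move)))
    wc))

;; Example: skip every word that starts with a digit.
(with-temp-buffer
  (insert "one 2two three")
  (my-count-unless (lambda () (looking-at "[0-9]"))))
```

Advancing last means every test in the loop body sees point at the beginning of a word rather than just past it.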

Paul

----------------------------------------------------------------------

;; `incf' and `intersection' below come from the `cl' library.
(require 'cl)

(defun org-word-count (beg end
                           &optional count-latex-macro-args?
                           count-footnotes?)
  "Report the number of words in the Org mode buffer or selected region.
Ignores:
- comments
- tables
- source code blocks (#+BEGIN_SRC ... #+END_SRC, and inline blocks)
- hyperlinks (but does count words in hyperlink descriptions)
- tags, priorities, and TODO keywords in headers
- sections tagged as 'not for export'.

The text of footnote definitions is ignored, unless the optional argument
COUNT-FOOTNOTES? is non-nil.

If the optional argument COUNT-LATEX-MACRO-ARGS? is non-nil, the word count
includes LaTeX macro arguments (the material between {curly braces}).
Otherwise, and by default, every LaTeX macro counts as 1 word regardless
of its arguments."
  (interactive "r")
  (unless mark-active
    (setf beg (point-min)
	  end (point-max)))
  (let ((wc 0)
	(latex-macro-regexp "\\\\[A-Za-z]+\\(\\[[^]]*\\]\\|\\){\\([^}]*\\)}"))
    (save-excursion
      (goto-char beg)
      (while (< (point) end)
        (cond
         ;; Ignore comments and tables.
         ((or (org-in-commented-line) (org-at-table-p))
          nil)
         ;; Ignore hyperlinks. But if link has a description, count
         ;; the words within the description.
         ((looking-at org-bracket-link-analytic-regexp)
          (when (match-string-no-properties 5)
            (let ((desc (match-string-no-properties 5)))
              (save-match-data 
                (incf wc (length (remove "" (org-split-string
                                             desc "\\W")))))))
          (goto-char (match-end 0)))
         ((looking-at org-any-link-re)
          (goto-char (match-end 0)))
         ;; Ignore source code blocks.
         ((org-in-regexps-block-p "^#\\+BEGIN_SRC\\W" "^#\\+END_SRC\\W")
          nil)
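         ;; Skip inline source blocks, counting each as 1 word. The
         ;; count goes up by 2, apparently to make up for the following
         ;; word, which the search at the end of the loop steps over.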
         ((save-excursion
            (backward-char)
            (looking-at org-babel-inline-src-block-regexp))
          (goto-char (match-end 0))
          (setf wc (+ 2 wc)))
         ;; Count a LaTeX macro as 1 word. Its arguments are skipped
         ;; unless COUNT-LATEX-MACRO-ARGS? is non-nil.
         ((save-excursion
            (backward-char)
            (looking-at latex-macro-regexp))
          (goto-char (if count-latex-macro-args?
                         (match-beginning 2)
                       (match-end 0)))
          (setf wc (+ 2 wc)))
         ;; Ignore footnotes.
         ((and (not count-footnotes?)
               (or (org-footnote-at-definition-p)
                   (org-footnote-at-reference-p)))
          nil)
         (t
          (let ((contexts (org-context)))
            (cond
             ;; Ignore tags and TODO keywords, etc.
             ((or (assoc :todo-keyword contexts)
                  (assoc :priority contexts)
                  (assoc :keyword contexts)
                  (assoc :checkbox contexts))
              nil)
             ;; Ignore sections marked with tags that are
             ;; excluded from export.
             ((assoc :tags contexts)
              (if (intersection (org-get-tags-at) org-export-exclude-tags
                                :test 'equal)
                  (org-forward-same-level 1)
                nil))
             (t
              (incf wc))))))
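        ;; Advance to the start of the next word. If there is none,
        ;; move to the end of the buffer instead of signalling an error.
        (re-search-forward "\\w+\\W*" nil 'move)))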
    (message "%d words in %s." wc
             (if mark-active "region" "buffer"))))
