From: Paul Sexton
Subject: Context-sensitive word count in org mode (elisp)
Date: Wed, 16 Feb 2011 03:51:23 +0000 (UTC)
To: emacs-orgmode@gnu.org

I am trying to reduce the word count in a document I am writing. The
existing word count functionality for emacs is surprisingly lacking. I
wanted a word count function for org mode which excluded tables and
comments, and ended up writing one myself.

If this function is called with a region highlighted, it counts the
words in the region. Otherwise it counts words in the whole buffer. It
ignores commented lines and tables.
LaTeX-style macros such as \foo{bar,baz} are counted as 1 word, as a
compromise (more often than not they should count as 0, but they do
sometimes expand to 1 or more words in the final document).

Limitations:
- Does not ignore BEGIN_SRC/END_SRC or inline src_* blocks (babel).
  Should be easy enough to add, however.
- There is probably a better way of identifying latex macros than my
  'latex-macro-regexp' below.
- Ignores all org links. I couldn't figure out how to extract
  "description" text from links, but I didn't look very hard.

Improvements welcome.

Paul

------------------------------------------------------------------------

(defun in-comment-p ()
  "Return non-nil if point is in a comment."
  (if (or (null comment-start-skip)
          (eq (preceding-char) ?\r))
      nil
    (save-excursion
      (let ((pos (point)))
        ;; Go to the start of the line, then look for a comment
        ;; starter between there and the original position.
        (re-search-backward "^\\|\r" nil t)
        (or (looking-at comment-start-skip)
            (re-search-forward comment-start-skip pos t))))))

(defun in-org-table-p ()
  "Return non-nil if point is in an org-mode table."
  (if (or (not (boundp 'org-table-any-line-regexp))
          (null org-table-any-line-regexp)
          (eq (preceding-char) ?\r))
      nil
    (save-excursion
      (re-search-backward "^\\|\r" nil t)
      (looking-at org-table-any-line-regexp))))

(defvar latex-macro-regexp
  "\\\\[A-Za-z]+\\(\\[[^]]*\\]\\|\\){\\([^}]*\\)}"
  "Regexp matching a LaTeX-style macro such as \\foo[opt]{bar,baz}.")

(defun org-word-count (beg end)
  "Count the words between BEG and END and report the total.
If no region is active, count the words in the whole buffer.
Lines in comments and tables are skipped, Org links are skipped,
and each LaTeX-style macro counts as one word."
  (interactive "r")
  (unless mark-active
    (setq beg (point-min)
          end (point-max)))
  (let ((wc 0))
    (save-excursion
      (goto-char beg)
      ;; Bound the search by END and pass NOERROR so that running out
      ;; of words ends the loop instead of signalling an error.
      (while (and (< (point) end)
                  (re-search-forward "\\w+\\W*" end t))
        ;; Point is now just past the matched word and any trailing
        ;; non-word characters; the checks below look at the text
        ;; at point.
        (cond
         ((or (in-comment-p) (in-org-table-p))
          nil)
         ((looking-at org-any-link-re)
          (goto-char (match-end 0)))
         ((save-excursion
            (backward-char)
            (looking-at latex-macro-regexp))
          (goto-char (match-end 0))
          ;; +2: one for the word just matched, one for the macro.
          (setq wc (+ 2 wc)))
         (t
          (setq wc (1+ wc))))))
    ;; Pass the format string to `message' directly rather than
    ;; formatting twice.
    (message "%d words in %s."
             wc
             (if mark-active "region" "buffer"))))
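
The first limitation above (BEGIN_SRC/END_SRC blocks are not ignored) could perhaps be handled with a predicate in the same style as in-comment-p and in-org-table-p. What follows is only a minimal sketch and is not part of the code as posted: the name org-wc-in-src-block-p is hypothetical, it assumes blocks are delimited by lines beginning with #+BEGIN_SRC and #+END_SRC, it treats the #+END_SRC line itself as outside the block, and it does nothing about inline src_* calls.

```elisp
;; Hypothetical helper, not part of the original post.
(defun org-wc-in-src-block-p ()
  "Return non-nil if point is on a #+BEGIN_SRC line or inside the block.
The test is purely textual: find the nearest #+BEGIN_SRC or
#+END_SRC line at or before the current line and check which
of the two it is."
  (save-excursion
    (let ((case-fold-search t))
      ;; A backward search cannot match text after point, so start
      ;; from the end of the line to include a delimiter on this line.
      (end-of-line)
      (and (re-search-backward "^[ \t]*#\\+\\(begin\\|end\\)_src" nil t)
           (string= (downcase (match-string 1)) "begin")))))
```

One way to try it would be to add (org-wc-in-src-block-p) to the first cond clause of org-word-count, next to (in-comment-p) and (in-org-table-p), so that positions inside a source block are passed over in the same way.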