I am trying to reduce the word count in a document I am writing. The
existing word count functionality for emacs is surprisingly lacking.
I wanted a word count function for org mode which excluded tables and
comments, and ended up writing one myself.

If this function is called with a region highlighted, it counts the words in
the region. Otherwise it counts words in the whole buffer.

It ignores commented lines and tables. LaTeX-style macros such as
\foo{bar,baz} are counted as 1 word, as a compromise (more often than not
they should count as 0, but they do sometimes expand to 1 or more words
in the final document).

Limitations:
- Does not ignore BEGIN_SRC/END_SRC or inline src_* blocks (babel).
  Should be easy enough to add however.
- There is probably a better way of identifying latex macros
  than my 'latex-macro-regexp' below.
- Ignores all org links. I couldn't figure out how to extract "description"
  text from links, but I didn't look very hard.

Improvements welcome.

Paul

------------------------------------------------------------------------

(defun in-comment-p ()
  "Return non-nil if point is in a comment."
  (if (or (null comment-start-skip)
          (eq (preceding-char) ?\r))
      nil
    (save-excursion
      (let ((pos (point)))
        (re-search-backward "^\\|\r" nil t)
        (or (looking-at comment-start-skip)
            (re-search-forward comment-start-skip pos t))))))

(defun in-org-table-p ()
  "Return non-nil if point is in an org-mode table."
  (if (or (not (boundp 'org-table-any-line-regexp))
          (null org-table-any-line-regexp)
          (eq (preceding-char) ?\r))
      nil
    (save-excursion
      (let ((pos (point)))
        (re-search-backward "^\\|\r" nil t)
        (looking-at org-table-any-line-regexp)))))

(defvar latex-macro-regexp "\\\\[A-Za-z]+\\(\\[[^]]*\\]\\|\\){\\([^}]*\\)}")

(defun org-word-count (beg end)
  (interactive "r")
  (unless mark-active
    (setf beg (point-min)
          end (point-max)))
  (let ((wc 0))
    (save-excursion
      (goto-char beg)
      (while (< (point) end)
        (re-search-forward "\\w+\\W*")
        (cond
         ((or (in-comment-p) (in-org-table-p))
          nil)
         ((looking-at org-any-link-re)
          (goto-char (match-end 0)))
         ((save-excursion
            (backward-char)
            (looking-at latex-macro-regexp))
          (goto-char (match-end 0))
          (setf wc (+ 2 wc)))
         (t
          (incf wc)))))
    (message (format "%d words in %s." wc
                     (if mark-active "region" "buffer")))))
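The following check shows what `latex-macro-regexp` actually captures, including the nested-brace case behind the remark that there is probably a better way to identify macros. The regexp is copied verbatim from the post. The `my/` names are hypothetical helpers for illustration and are not part of Org.

```emacs-lisp
;; Paul's regexp, copied verbatim: a backslash, a macro name, an optional
;; [optional-argument] (group 1), then one {argument} captured as group 2.
(defconst my/latex-macro-regexp
  "\\\\[A-Za-z]+\\(\\[[^]]*\\]\\|\\){\\([^}]*\\)}")

(defun my/latex-macro-argument (string)
  "Return the {argument} of the first LaTeX macro in STRING, or nil."
  (when (string-match my/latex-macro-regexp string)
    (match-string 2 string)))

;; (my/latex-macro-argument "\\foo{bar,baz}")            ; => "bar,baz"
;; (my/latex-macro-argument "see \\cite[p.~3]{knuth}.")  ; => "knuth"
;; Nested braces are cut short, because [^}]* stops at the first "}":
;; (my/latex-macro-argument "\\emph{a {b} c}")           ; => "a {b"
;; (my/latex-macro-argument "no macro here")             ; => nil
```

The nested-brace result is the main weakness of a single regexp. Balanced braces would need something like `forward-sexp` in a buffer instead.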
Hi,

This would be really neat to have as a part of org-mode
eventually. Another itch scratched.

You can probably replace your custom predicate functions with the
existing org-at-table-p and org-in-commented-line (works for
me). Maybe there's something to replace the latex-macro-regexp too.

Yours,
Christian
Forgot to add the code.
#+begin_src emacs-lisp
;; Adapted from code posted by Paul Sexton <2011-02-16 Wed 4:51am>
;; - Everything now contained in one function
;; - Will count correct number of words inside Latex macro
(defun org-word-count (beg end)
  (interactive "r")
  (unless mark-active
    (setf beg (point-min)
          end (point-max)))
  (let ((wc 0)
        (latex-macro-regexp "\\\\[A-Za-z]+\\(\\[[^]]*\\]\\|\\){\\([^}]*\\)}")) ; CHANGED
    (save-excursion
      (goto-char beg)
      (while (< (point) end)
        (re-search-forward "\\w+\\W*")
        (cond
         ((or (org-in-commented-line) (org-at-table-p)) ; CHANGED
          nil)
         ((looking-at org-any-link-re)
          (goto-char (match-end 0)))
         ((save-excursion
            (backward-char)
            (looking-at latex-macro-regexp))
          (goto-char (match-beginning 2)) ; CHANGED
          (setf wc (+ 2 wc)))
         (t
          (incf wc)))))
    (message (format "%d words in %s." wc
                     (if mark-active "region" "buffer")))))
#+end_src
Yours,
Christian
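In Emacs Lisp, a line break inside a string literal becomes part of the string. A regexp that gets wrapped, for instance by a mail client, therefore silently changes meaning. In `latex-macro-regexp`, a break right after `\\|` turns the empty alternative into "a newline". A macro without an [optional argument] then matches only if a newline precedes its {argument}. Here is a quick comparison using two hypothetical constants:

```emacs-lisp
;; The regexp as intended, on one line.
(defconst my/macro-re-joined
  "\\\\[A-Za-z]+\\(\\[[^]]*\\]\\|\\){\\([^}]*\\)}")

;; The same regexp with a line break right after "\\|", as happens when
;; the string literal is wrapped.  The "\n" below stands for that break.
(defconst my/macro-re-wrapped
  "\\\\[A-Za-z]+\\(\\[[^]]*\\]\\|\n\\){\\([^}]*\\)}")

;; (string-match my/macro-re-joined  "\\foo{bar}")   ; => 0
;; (string-match my/macro-re-wrapped "\\foo{bar}")   ; => nil
;; (string-match my/macro-re-wrapped "\\foo\n{bar}") ; => 0
```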
On 2/16/11 4:51 AM, Paul Sexton wrote:
> _______________________________________________
> Emacs-orgmode mailing list
> Please use `Reply All' to send replies to the list.
> Emacs-orgmode@gnu.org
> http://lists.gnu.org/mailman/listinfo/emacs-orgmode
>
Hi Paul,
Sounds useful to me. Is it appropriate to use the existing org function
`org-in-regexps-block-p'? Or `org-context'? (Not speaking from any
experience of using these functions I just noticed them.)
Dan
Christian Moe <mail@christianmoe.com> writes:
> Hi,
>
> This would be really neat to have as a part of org-mode
> eventually. Another itch scratched.
>
> You can probably replace your custom predicate functions with the
> existing org-at-table-p and org-in-commented-line (works for
> me). Maybe there's something to replace the latex-macro-regexp too.
>
> Yours,
> Christian
Hi Paul,

Paul Sexton <psexton@xnet.co.nz> writes:

> I am trying to reduce the word count in a document I am writing. The
> existing word count functionality for emacs is surprisingly lacking.
> I wanted a word count function for org mode which excluded tables and
> comments, and ended up writing one myself.
>
> If this function is called with a region highlighted, it counts the words in
> the region. Otherwise it counts words in the whole buffer.

Looks useful, thanks! I'll let this discussion grow a bit until we
have something we can perhaps include in Org.

Also, counting words from Emacs is a frequent request from users. If
your solution for Org could be integrated into a more general solution
for Emacs (working on various modes), that would be a win. See:

http://stray-notes.blogspot.com/2009/12/emacs-word-count.html
http://www.sultanik.com/Word_count_in_Emacs

.. and other pointers.

--
Bastien
On Wed, Feb 16, 2011 at 03:51:23AM +0000, Paul Sexton wrote:
> Improvements welcome.
One remark about your naming conventions:
(info "(elisp) Coding Conventions")
especially the second point. In short, you shouldn't use names such as
"in-comment-p" or "latex-macro-regexp" for global variables and/or functions.
Use a prefix to distinguish them from other programs/packages.
--
Joost Kremers
Life has its moments
Hi Paul,

I have a small emacs project that creates a word-count mode and function
for use. It currently does not ignore sections like your proposal, but I
think that would be interesting functionality.

Mine does keep a running tally of adds/deletes in the mode line. It also
lets you set a goal and provides a visual indication when that goal is
met -- I believe that this feature may be of interest to you.

You can find it at: https://github.com/bnbeckwith/wc-mode

Regards,
Ben
I am enjoying this discussion. Some ideas:

1) ignore :noexport: and anything else that prevents export (ideally
   including whether the user has set certain todo kw to prevent export,
   although that is not currently a feature in org, so would be too
   complicated now)

2) handle word counts just like C-c C-x C-d -- then you can find out what
   sections have the most words

Samuel

--
The Kafka Pandemic: http://thekafkapandemic.blogspot.com/2010/12/welcome-to-kafka-pandemic-two-forces_9182.html
I support the Whittemore-Peterson Institute (WPI)
===
I want to see the original (pre-hold) Lo et al. 2010 NIH/FDA/Harvard MLV paper.
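Samuel's first idea, skipping anything tagged as not for export, comes down to reading a heading's tags and checking them against an exclusion list. Below is a rough plain-Emacs sketch of that check. The tag regexp is a simplification of Org's tag syntax, and the `my/` names and the `("noexport")` list are assumptions for illustration. Inside Org itself one would rather ask Org for the tags.

```emacs-lisp
(require 'cl-lib)

(defun my/heading-tags (heading)
  "Return the tags of Org HEADING as a list of strings.
Simplified: only looks for a trailing :tag1:tag2: group."
  (when (string-match "[ \t]:\\([[:alnum:]_@#%:]+\\):[ \t]*\\'" heading)
    (split-string (match-string 1 heading) ":" t)))

(defun my/heading-excluded-p (heading exclude-tags)
  "Return t if any tag of HEADING is a member of EXCLUDE-TAGS."
  (and (cl-intersection (my/heading-tags heading) exclude-tags
                        :test #'equal)
       t))

;; (my/heading-tags "* Notes :draft:noexport:")                     ; => ("draft" "noexport")
;; (my/heading-excluded-p "* Notes :draft:noexport:" '("noexport"))  ; => t
;; (my/heading-excluded-p "* Results :final:" '("noexport"))        ; => nil
```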
Christian Moe <mail <at> christianmoe.com> writes:
>
> Forgot to add the code.
Thanks, I wasn't aware of those pre-existing functions.
I don't agree with changing '(match-end 0)' to '(match-beginning 2)'
however. For most latex macros, I don't want to count the words inside
the macro's arguments. For example, I don't want the text of footnotes
to be included in the word count. However others differ, and there will
always be cases where one DOES want to count the macro arguments - so maybe
org-word-count should do this optionally.
Paul
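The two policies Paul describes can be tried out on plain strings. In the default, a macro counts as one word. Optionally, the words in its {argument} count too. The sketch below reuses his regexp. `my/count-words-with-macros` and its COUNT-ARGS flag are hypothetical names, and a "word" here is simply a run of word-constituent characters.

```emacs-lisp
(defconst my/latex-macro-regexp
  "\\\\[A-Za-z]+\\(\\[[^]]*\\]\\|\\){\\([^}]*\\)}")

(defun my/count-plain-words (string)
  "Count runs of word-constituent characters in STRING."
  (length (split-string string "\\W+" t)))

(defun my/count-words-with-macros (string &optional count-args)
  "Count words in STRING, counting each LaTeX macro as one word.
With COUNT-ARGS non-nil, also count the words in each macro's {argument}."
  (let ((count 0)
        (start 0))
    (while (string-match my/latex-macro-regexp string start)
      ;; Save the match data first: `split-string' clobbers it.
      (let ((macro-beg (match-beginning 0))
            (macro-end (match-end 0))
            (arg (match-string 2 string)))
        (setq count (+ count
                       (my/count-plain-words (substring string start macro-beg))
                       1
                       (if count-args (my/count-plain-words arg) 0)))
        (setq start macro-end)))
    (+ count (my/count-plain-words (substring string start)))))

;; (my/count-words-with-macros "A note\\footnote{with three words} here.")   ; => 4
;; (my/count-words-with-macros "A note\\footnote{with three words} here." t) ; => 7
```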
Thanks for all the suggestions. Here is version 2.

Improvements:
- ignores source code blocks
- ignores tags and TODO keywords in headings
- ignores footnotes by default (option to force counting them)
- skips any sections tagged as not for export
- option to count words in latex macro arguments (they are ignored by
  default)

I would still like to count hyperlink descriptions but am not sure how --
is there a function that fetches the description of the hyperlink at
point?

Paul

-----------------------------------------------------------------------

(defun org-word-count (beg end
                           &optional count-latex-macro-args?
                           count-footnotes?)
  "Report the number of words in the Org mode buffer or selected region.
Ignores:
- comments
- tables
- source code blocks (#+BEGIN_SRC ... #+END_SRC, and inline blocks)
- hyperlinks
- tags, priorities, and TODO keywords in headers
- sections tagged as 'not for export'.
The text of footnote definitions is ignored, unless the optional argument
COUNT-FOOTNOTES? is non-nil.
If the optional argument COUNT-LATEX-MACRO-ARGS? is non-nil, the word count
includes LaTeX macro arguments (the material between {curly braces}).
Otherwise, and by default, every LaTeX macro counts as 1 word regardless
of its arguments."
  (interactive "r")
  (unless mark-active
    (setf beg (point-min)
          end (point-max)))
  (let ((wc 0)
        (latex-macro-regexp "\\\\[A-Za-z]+\\(\\[[^]]*\\]\\|\\){\\([^}]*\\)}"))
    (save-excursion
      (goto-char beg)
      (while (< (point) end)
        (re-search-forward "\\w+\\W*")
        (cond
         ;; Ignore comments.
         ((or (org-in-commented-line) (org-at-table-p))
          nil)
         ;; Ignore hyperlinks.
         ;; TODO need to count text of the link's description.
         ((looking-at org-any-link-re)
          (goto-char (match-end 0)))
         ;; Ignore source code blocks.
         ((org-in-regexps-block-p "^#\\+BEGIN_SRC\\W" "^#\\+END_SRC\\W")
          nil)
         ;; Ignore inline source blocks, counting them as 1 word.
         ((save-excursion
            (backward-char)
            (looking-at org-babel-inline-src-block-regexp))
          (goto-char (match-end 0))
          (setf wc (+ 2 wc)))
         ;; Count latex macros as 1 word, ignoring their arguments.
         ((save-excursion
            (backward-char)
            (looking-at latex-macro-regexp))
          (goto-char (if count-latex-macro-args?
                         (match-beginning 2)
                       (match-end 0)))
          (setf wc (+ 2 wc)))
         ;; Ignore footnotes.
         ((and (not count-footnotes?)
               (or (org-footnote-at-definition-p)
                   (org-footnote-at-reference-p)))
          nil)
         (t
          (let ((contexts (org-context)))
            (cond
             ;; Ignore tags and TODO keywords, etc.
             ((or (assoc :todo-keyword contexts)
                  (assoc :priority contexts)
                  (assoc :keyword contexts)
                  (assoc :checkbox contexts))
              nil)
             ;; Ignore sections marked with tags that are
             ;; excluded from export.
             ((assoc :tags contexts)
              (if (intersection (org-get-tags-at) org-export-exclude-tags
                                :test 'equal)
                  (org-forward-same-level 1)
                nil))
             (t
              (incf wc))))))))
    (message (format "%d words in %s." wc
                     (if mark-active "region" "buffer")))))
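Version 2 skips source blocks by asking `org-in-regexps-block-p` whether point lies between the BEGIN_SRC and END_SRC regexps. A rough plain-Emacs equivalent of that test might look like the following. `my/in-src-block-p` is a hypothetical name, and the sketch assumes blocks are well formed and not nested.

```emacs-lisp
(defun my/in-src-block-p ()
  "Return non-nil if point is inside a #+BEGIN_SRC ... #+END_SRC block.
The delimiter lines themselves count as inside.  Case-insensitive.
Assumes blocks are well formed and not nested."
  (let ((case-fold-search t)
        (bol (line-beginning-position)))
    (save-excursion
      (end-of-line)
      ;; Find the nearest delimiter line at or above point.
      (when (re-search-backward "^[ \t]*#\\+\\(begin\\|end\\)_src" nil t)
        (or (string-equal (downcase (match-string 1)) "begin")
            ;; Point is on the #+END_SRC line itself.
            (= (point) bol))))))

;; For the five lines
;;   text / #+begin_src sh / echo hi / #+END_SRC / after
;; the predicate gives nil, t, t, t, nil.
```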
That looks really good. My suggestion would be to modify it so that 'wc-count' can be redefined on a per-major-mode or per-buffer basis, e.g. via a buffer-local variable 'wc-count-function'. Then my org-word-count function could be slotted in fairly easily - I would just have to modify it so that it *returns* the count, rather than reporting it.

Paul
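Paul's suggestion is a buffer-local variable naming the counting function, so that Org (or any mode) can plug in its own counter. It could look roughly like the sketch below. The names are hypothetical and this is not wc-mode's actual code. The default counter simply counts runs of word characters with `how-many`.

```emacs-lisp
(defvar-local my/wc-count-function
  (lambda (beg end) (how-many "\\w+" beg end))
  "Function called with BEG and END that returns a word count.
Modes can set this buffer-locally, e.g. to an Org-aware counter.")

(defun my/wc-count (&optional beg end)
  "Return the word count between BEG and END (default: whole buffer).
Delegates to the buffer-local `my/wc-count-function'."
  (funcall my/wc-count-function
           (or beg (point-min))
           (or end (point-max))))

;; In an Org buffer one might then write the following.  It assumes a
;; hypothetical variant of `org-word-count' that *returns* its count:
;; (setq-local my/wc-count-function #'my/org-word-count-returning)
```

Keeping the variable buffer-local means each buffer can pick its own counter without advising or redefining the global command.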
Hi,
> I don't agree with changing '(match-end 0)' to '(match-beginning 2)'
> however. For most latex macros, I don't want to count the words inside
> the macro's arguments.
Maybe you're right. I rarely use latex, so I can't think of a use case
off the top of my head. Default no, optional yes sounds good.
Yours,
Christian
This looks great.

How hard do you think it would be to show how many words there are for every subtree in a similar way to how clock durations are summed in C-c C-x C-d and displayed in the subtree itself in reverse video?
Samuel Wales <samologist <at> gmail.com> writes:
>
> This looks great.
>
> How hard do you think it would be to show how many words there are for
> every subtree in a similar way to how clock durations are summed in
> c-c c-x c-d and displayed in the subtree itself in reverse video?
I might leave that functionality to you -- it's pretty far beyond what I need
out of a word count function.
Cheers
P
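Samuel's per-subtree idea needs a count for each subtree, loosely in the spirit of how C-c C-x C-d sums clock times. The plain-Emacs sketch below computes one. For every heading, it counts up to the next heading of the same or a higher level. It is hypothetical code, not part of Org. It counts runs of word characters with `how-many`, treats child heading titles as text, and does none of the filtering that `org-word-count` does.

```emacs-lisp
(defun my/subtree-word-counts ()
  "Return a list of (LEVEL TITLE WORDS), one entry per Org-style heading.
WORDS counts runs of word characters from the line after the heading up
to the next heading of the same or a higher level, so it includes the
text (and heading titles) of all descendants."
  (save-excursion
    (goto-char (point-min))
    (let (result)
      (while (re-search-forward "^\\(\\*+\\)[ \t]+\\(.*\\)$" nil t)
        (let* ((level (length (match-string 1)))
               (title (match-string-no-properties 2))
               ;; Point is at the end of the heading line.
               (start (min (1+ (point)) (point-max)))
               ;; The subtree ends at the next heading with at most LEVEL stars.
               (end (save-excursion
                      (if (re-search-forward
                           (format "^\\*\\{1,%d\\}[ \t]" level) nil t)
                          (line-beginning-position)
                        (point-max)))))
          (push (list level title (how-many "\\w+" start end)) result)))
      (nreverse result))))

;; In a buffer containing
;;   * Intro / One two three. / ** Details / Four five. / * Next / Six.
;; (my/subtree-word-counts) => ((1 "Intro" 6) (2 "Details" 2) (1 "Next" 1))
```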
Christian Moe <mail@christianmoe.com> writes:
> Hi,
>
>> I don't agree with changing '(match-end 0)' to '(match-beginning 2)'
>> however. For most latex macros, I don't want to count the words inside
>> the macro's arguments.
>
> Maybe you're right. I rarely use latex, so I can't think of a use case
> off the top of my head. Default no, optional yes sounds good.
>
I find the following handy for word-count of latex documents...
#+begin_src sh :var basename=(file-name-sans-extension (buffer-file-name))
pdftotext $basename.pdf - |wc -w
#+end_src
Hi Paul,

Paul Sexton <psexton@xnet.co.nz> writes:

> Thanks for all the suggestions. Here is version 2.

Thanks for that. Maybe you could put it on Worg/org-hacks.org?

> I would still like to count hyperlink descriptions but am not sure
> how -- is there a function that fetches the description of the hyperlink
> at point?

#+begin_src emacs-lisp
(when (looking-at org-bracket-link-analytic-regexp)
  (match-string-no-properties 5))
#+end_src

HTH,

--
Bastien
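Bastien's snippet matches a bracket link with `org-bracket-link-analytic-regexp` and reads the description from group 5. The same idea can be shown without relying on Org's internal regexp variables. The sketch below uses a deliberately simplified, hypothetical regexp that only handles `[[target]]` and `[[target][description]]`. It counts the description's words with `split-string`, much as Paul's version 3 does with `org-split-string`.

```emacs-lisp
;; Simplified stand-in for Org's link regexps: [[TARGET]] or
;; [[TARGET][DESCRIPTION]], with the description in group 2.
(defconst my/bracket-link-regexp
  "\\[\\[\\([^]]+\\)]\\(?:\\[\\([^]]+\\)]\\)?]")

(defun my/link-description-words (link)
  "Return the number of words in the description of bracket LINK.
Links without a description count as 0 words."
  (if (and (string-match my/bracket-link-regexp link)
           (match-string 2 link))
      (length (split-string (match-string 2 link) "\\W+" t))
    0))

;; (my/link-description-words "[[https://orgmode.org][The Org manual]]") ; => 3
;; (my/link-description-words "[[file:notes.org]]")                      ; => 0
```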
Bastien <bastien.guerry <at> wikimedia.fr> writes:
> #+begin_src emacs-lisp
> (when (looking-at org-bracket-link-analytic-regexp)
> (match-string-no-properties 5))
> #+end_src emacs-lisp
Thanks. Here is version 3 of the function, which is now able to count
words in link descriptions.
The code to advance to the next word has been moved to the end of the
loop, which improves accuracy.
Paul
----------------------------------------------------------------------
(defun org-word-count (beg end
                           &optional count-latex-macro-args?
                           count-footnotes?)
  "Report the number of words in the Org mode buffer or selected region.
Ignores:
- comments
- tables
- source code blocks (#+BEGIN_SRC ... #+END_SRC, and inline blocks)
- hyperlinks (but does count words in hyperlink descriptions)
- tags, priorities, and TODO keywords in headers
- sections tagged as 'not for export'.
The text of footnote definitions is ignored, unless the optional argument
COUNT-FOOTNOTES? is non-nil.
If the optional argument COUNT-LATEX-MACRO-ARGS? is non-nil, the word count
includes LaTeX macro arguments (the material between {curly braces}).
Otherwise, and by default, every LaTeX macro counts as 1 word regardless
of its arguments."
  (interactive "r")
  (unless mark-active
    (setf beg (point-min)
          end (point-max)))
  (let ((wc 0)
        (latex-macro-regexp "\\\\[A-Za-z]+\\(\\[[^]]*\\]\\|\\){\\([^}]*\\)}"))
    (save-excursion
      (goto-char beg)
      (while (< (point) end)
        (cond
         ;; Ignore comments.
         ((or (org-in-commented-line) (org-at-table-p))
          nil)
         ;; Ignore hyperlinks. But if link has a description, count
         ;; the words within the description.
         ((looking-at org-bracket-link-analytic-regexp)
          (when (match-string-no-properties 5)
            (let ((desc (match-string-no-properties 5)))
              (save-match-data
                (incf wc (length (remove "" (org-split-string
                                             desc "\\W")))))))
          (goto-char (match-end 0)))
         ((looking-at org-any-link-re)
          (goto-char (match-end 0)))
         ;; Ignore source code blocks.
         ((org-in-regexps-block-p "^#\\+BEGIN_SRC\\W" "^#\\+END_SRC\\W")
          nil)
         ;; Ignore inline source blocks, counting them as 1 word.
         ((save-excursion
            (backward-char)
            (looking-at org-babel-inline-src-block-regexp))
          (goto-char (match-end 0))
          (setf wc (+ 2 wc)))
         ;; Count latex macros as 1 word, ignoring their arguments.
         ((save-excursion
            (backward-char)
            (looking-at latex-macro-regexp))
          (goto-char (if count-latex-macro-args?
                         (match-beginning 2)
                       (match-end 0)))
          (setf wc (+ 2 wc)))
         ;; Ignore footnotes.
         ((and (not count-footnotes?)
               (or (org-footnote-at-definition-p)
                   (org-footnote-at-reference-p)))
          nil)
         (t
          (let ((contexts (org-context)))
            (cond
             ;; Ignore tags and TODO keywords, etc.
             ((or (assoc :todo-keyword contexts)
                  (assoc :priority contexts)
                  (assoc :keyword contexts)
                  (assoc :checkbox contexts))
              nil)
             ;; Ignore sections marked with tags that are
             ;; excluded from export.
             ((assoc :tags contexts)
              (if (intersection (org-get-tags-at) org-export-exclude-tags
                                :test 'equal)
                  (org-forward-same-level 1)
                nil))
             (t
              (incf wc))))))
        ;; Move to the next word.  If no word is left, move to the end of
        ;; the buffer (ending the loop) instead of signalling "Search failed".
        (re-search-forward "\\w+\\W*" nil 'move)))
    (message "%d words in %s." wc
             (if mark-active "region" "buffer"))))
Out of curiosity, has anybody compared the results that this provides with exporting to HTML, rendering with a browser (including emacs-w3m), and counting words that way?
Hi Paul,

Paul Sexton <psexton@xnet.co.nz> writes:

> The process of contributing to Worg is too complicated for me.

You can clone the Worg.git repository with

  $ git clone git://repo.or.cz/Worg.git

Then make some edits to Worg. Then commit your changes with

  $ git commit -a -m "Your message"

Then create a patch with

  $ git format-patch origin/master

Then send the patch.

> If someone else wants to add it somewhere on my behalf that is fine
> with me.

I added a section "Count words in an Org buffer" with your code in
Worg/org-hacks.org. Thanks.

> Have you thought of making Worg a wiki?
> Wikis hosted on github can be written in org-mode format.

Yes, but this format is limited. I think it's good to have all Org's
functionalities in Worg, not only the ones needed for publication. In
particular, org-issues.org contains lots of information that we want to
handle locally, not just for publishing.

Best,

--
Bastien
If I could, I would refactor org-clock-put-overlay into a function that puts overlays on the current headline, but I am not able to do so.
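One way to get the display half of Samuel's idea without touching `org-clock-put-overlay` is to use plain Emacs overlays. The sketch below is hypothetical. For each heading, it counts the section's words (text up to the next heading of any level). It shows that count in reverse video at the end of the heading line. Running the command again replaces the old overlays.

```emacs-lisp
(defun my/put-word-count-overlays ()
  "Show each heading's section word count at the end of its line.
The section runs from the line after the heading to the next heading of
any level.  Old count overlays are removed first.  Returns the new overlays."
  (interactive)
  (remove-overlays (point-min) (point-max) 'my/word-count t)
  (let (overlays)
    (save-excursion
      (goto-char (point-min))
      (while (re-search-forward "^\\*+[ \t]+.*$" nil t)
        (let* ((eol (point))
               ;; The section ends at the next heading of any level.
               (end (save-excursion
                      (if (re-search-forward "^\\*+[ \t]" nil t)
                          (line-beginning-position)
                        (point-max))))
               (count (how-many "\\w+" (min (1+ eol) end) end))
               ;; A zero-length overlay at the end of the heading line.
               (ov (make-overlay eol eol)))
          (overlay-put ov 'my/word-count t)
          (overlay-put ov 'after-string
                       (propertize (format " [%d words]" count)
                                   'face '(:inverse-video t)))
          (push ov overlays))))
    (nreverse overlays)))
```

Tagging the overlays with a private `my/word-count` property lets `remove-overlays` clear exactly these overlays and leave any others in the buffer alone.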