* Re: Context-sensitive word count in org mode (elisp)
2011-02-16 3:51 Context-sensitive word count in org mode (elisp) Paul Sexton
@ 2011-02-16 9:12 ` Christian Moe
2011-02-16 9:47 ` Dan Davison
2011-02-16 9:45 ` Christian Moe
` (3 subsequent siblings)
4 siblings, 1 reply; 18+ messages in thread
From: Christian Moe @ 2011-02-16 9:12 UTC (permalink / raw)
To: Paul Sexton; +Cc: emacs-orgmode
Hi,
This would be really neat to have as a part of org-mode eventually.
Another itch scratched.
You can probably replace your custom predicate functions with the
existing org-at-table-p and org-in-commented-line (works for me).
Maybe there's something to replace the latex-macro-regexp too.
Yours,
Christian
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Context-sensitive word count in org mode (elisp)
2011-02-16 9:12 ` Christian Moe
@ 2011-02-16 9:47 ` Dan Davison
0 siblings, 0 replies; 18+ messages in thread
From: Dan Davison @ 2011-02-16 9:47 UTC (permalink / raw)
To: mail; +Cc: emacs-orgmode, Paul Sexton
Hi Paul,
Sounds useful to me. Is it appropriate to use the existing org function
`org-in-regexps-block-p'? Or `org-context'? (Not speaking from any
experience of using these functions; I just noticed them.)
Dan
Christian Moe <mail@christianmoe.com> writes:
> Hi,
>
> This would be really neat to have as a part of org-mode
> eventually. Another itch scratched.
>
> You can probably replace your custom predicate functions with the
> existing org-at-table-p and org-in-commented-line (works for
> me). Maybe there's something to replace the latex-macro-regexp too.
>
> Yours,
> Christian
* Re: Context-sensitive word count in org mode (elisp)
2011-02-16 3:51 Context-sensitive word count in org mode (elisp) Paul Sexton
2011-02-16 9:12 ` Christian Moe
@ 2011-02-16 9:45 ` Christian Moe
2011-02-16 20:34 ` Paul Sexton
2011-02-16 10:14 ` Bastien
` (2 subsequent siblings)
4 siblings, 1 reply; 18+ messages in thread
From: Christian Moe @ 2011-02-16 9:45 UTC (permalink / raw)
To: Paul Sexton; +Cc: emacs-orgmode
Forgot to add the code.
#+begin_src emacs-lisp
;; Adapted from code posted by Paul Sexton <2011-02-16 Wed 4:51am>
;; - Everything now contained in one function
;; - Will count correct number of words inside Latex macro
(defun org-word-count (beg end)
  (interactive "r")
  (unless mark-active
    (setf beg (point-min)
          end (point-max)))
  (let ((wc 0)
        (latex-macro-regexp "\\\\[A-Za-z]+\\(\\[[^]]*\\]\\|\\){\\([^}]*\\)}")) ; CHANGED
    (save-excursion
      (goto-char beg)
      (while (< (point) end)
        (re-search-forward "\\w+\\W*")
        (cond
         ((or (org-in-commented-line) (org-at-table-p)) ; CHANGED
          nil)
         ((looking-at org-any-link-re)
          (goto-char (match-end 0)))
         ((save-excursion
            (backward-char)
            (looking-at latex-macro-regexp))
          (goto-char (match-beginning 2)) ; CHANGED
          (setf wc (+ 2 wc)))
         (t
          (incf wc)))))
    (message (format "%d words in %s." wc
                     (if mark-active "region" "buffer")))))
#+end_src
Yours,
Christian
On 2/16/11 4:51 AM, Paul Sexton wrote:
> I am trying to reduce the word count in a document I am writing. The
> existing word count functionality for emacs is surprisingly lacking.
> I wanted a word count function for org mode which excluded tables and
> comments, and ended up writing one myself.
>
> If this function is called with a region highlighted, it counts the words in
> the region. Otherwise it counts words in the whole buffer.
>
> It ignores commented lines and tables. LaTeX-style macros such as
> \foo{bar,baz} are counted as 1 word, as a compromise (more often than not
> they should count as 0, but they do sometimes expand to 1 or more words
> in the final document).
>
> Limitations:
> - Does not ignore BEGIN_SRC/END_SRC or inline src_* blocks (babel).
> Should be easy enough to add however.
> - There is probably a better way of identifying latex macros
> than my 'latex-macro-regexp' below.
> - Ignores all org links. I couldn't figure out how to extract "description"
> text from links, but I didn't look very hard.
>
> Improvements welcome.
>
> Paul
>
> ------------------------------------------------------------------------
>
> (defun in-comment-p ()
> "Return non-nil if point is in a comment."
> (if (or (null comment-start-skip)
> (eq (preceding-char) ?\r))
> nil
> (save-excursion
> (let ((pos (point)))
> (re-search-backward "^\\|\r" nil t)
> (or (looking-at comment-start-skip)
> (re-search-forward comment-start-skip pos t))))))
>
> (defun in-org-table-p ()
> "Return non-nil if point is in an org-mode table."
> (if (or (not (boundp 'org-table-any-line-regexp))
> (null org-table-any-line-regexp)
> (eq (preceding-char) ?\r))
> nil
> (save-excursion
> (let ((pos (point)))
> (re-search-backward "^\\|\r" nil t)
> (looking-at org-table-any-line-regexp)))))
>
>
> (defvar latex-macro-regexp "\\\\[A-Za-z]+\\(\\[[^]]*\\]\\|\\){\\([^}]*\\)}")
>
>
> (defun org-word-count (beg end)
> (interactive "r")
> (unless mark-active
> (setf beg (point-min)
> end (point-max)))
> (let ((wc 0))
> (save-excursion
> (goto-char beg)
> (while (< (point) end)
> (re-search-forward "\\w+\\W*")
> (cond
> ((or (in-comment-p) (in-org-table-p))
> nil)
> ((looking-at org-any-link-re)
> (goto-char (match-end 0)))
> ((save-excursion
> (backward-char)
> (looking-at latex-macro-regexp))
> (goto-char (match-end 0))
> (setf wc (+ 2 wc)))
> (t
> (incf wc)))))
> (message (format "%d words in %s." wc
> (if mark-active "region" "buffer")))))
* Re: Context-sensitive word count in org mode (elisp)
2011-02-16 9:45 ` Christian Moe
@ 2011-02-16 20:34 ` Paul Sexton
2011-02-17 10:02 ` Christian Moe
0 siblings, 1 reply; 18+ messages in thread
From: Paul Sexton @ 2011-02-16 20:34 UTC (permalink / raw)
To: emacs-orgmode
Christian Moe <mail <at> christianmoe.com> writes:
>
> Forgot to add the code.
>
> #+begin_src emacs-lisp
> ;; Adapted from code posted by Paul Sexton <2011-02-16 Wed 4:51am>
> ;; - Everything now contained in one function
> ;; - Will count correct number of words inside Latex macro
>
> (defun org-word-count (beg end)
> (interactive "r")
> (unless mark-active
> (setf beg (point-min)
> end (point-max)))
> (let ((wc 0)
> (latex-macro-regexp "\\\\[A-Za-z]+\\(\\[[^]]*\\]\\|\\){\\([^}]*\\)}")) ; CHANGED
> (save-excursion
> (goto-char beg)
> (while (< (point) end)
> (re-search-forward "\\w+\\W*")
> (cond
> ((or (org-in-commented-line) (org-at-table-p)) ; CHANGED
> nil)
> ((looking-at org-any-link-re)
> (goto-char (match-end 0)))
> ((save-excursion
> (backward-char)
> (looking-at latex-macro-regexp))
> (goto-char (match-beginning 2)) ; CHANGED
> (setf wc (+ 2 wc)))
> (t
> (incf wc)))))
> (message (format "%d words in %s." wc
> (if mark-active "region" "buffer")))))
> #+end_src
>
Thanks, I wasn't aware of those pre-existing functions.
I don't agree with changing '(match-end 0)' to '(match-beginning 2)'
however. For most latex macros, I don't want to count the words inside
the macro's arguments. For example, I don't want the text of footnotes
to be included in the word count. However others differ, and there will
always be cases where one DOES want to count the macro arguments - so maybe
org-word-count should do this optionally.
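To make the difference between the two positions concrete, here is a minimal sketch (not part of the posted function) that applies the same regexp to a string and reports where counting would resume in each case. The names `my-latex-macro-regexp' and `my-latex-macro-resume-position' are made up for illustration.
#+begin_src emacs-lisp
;; Sketch only: both names below are hypothetical.
(defvar my-latex-macro-regexp
  "\\\\[A-Za-z]+\\(\\[[^]]*\\]\\|\\){\\([^}]*\\)}"
  "Regexp matching a LaTeX macro with one braced argument.")

(defun my-latex-macro-resume-position (string count-args)
  "Return the index in STRING where word counting would resume.
STRING must start with a LaTeX macro.  If COUNT-ARGS is non-nil,
resume at the start of the braced argument (group 2); otherwise
resume after the whole macro (end of group 0).  Return nil if
STRING does not start with a macro."
  (when (eq (string-match my-latex-macro-regexp string) 0)
    (if count-args
        (match-beginning 2)
      (match-end 0))))

;; Skip the whole macro, so the footnote text is not counted:
(my-latex-macro-resume-position "\\footnote{some text here} after" nil) ; => 25
;; Resume inside the braces, so the footnote text is counted:
(my-latex-macro-resume-position "\\footnote{some text here} after" t)   ; => 10
#+end_src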
Paul
* Re: Re: Context-sensitive word count in org mode (elisp)
2011-02-16 20:34 ` Paul Sexton
@ 2011-02-17 10:02 ` Christian Moe
2011-02-17 18:57 ` Eric Schulte
0 siblings, 1 reply; 18+ messages in thread
From: Christian Moe @ 2011-02-17 10:02 UTC (permalink / raw)
To: Paul Sexton; +Cc: emacs-orgmode
Hi,
> I don't agree with changing '(match-end 0)' to '(match-beginning 2)'
> however. For most latex macros, I don't want to count the words inside
> the macro's arguments.
Maybe you're right. I rarely use latex, so I can't think of a use case
off the top of my head. Default no, optional yes sounds good.
Yours,
Christian
* Re: Re: Context-sensitive word count in org mode (elisp)
2011-02-17 10:02 ` Christian Moe
@ 2011-02-17 18:57 ` Eric Schulte
0 siblings, 0 replies; 18+ messages in thread
From: Eric Schulte @ 2011-02-17 18:57 UTC (permalink / raw)
To: mail; +Cc: emacs-orgmode, Paul Sexton
Christian Moe <mail@christianmoe.com> writes:
> Hi,
>
>> I don't agree with changing '(match-end 0)' to '(match-beginning 2)'
>> however. For most latex macros, I don't want to count the words inside
>> the macro's arguments.
>
> Maybe you're right. I rarely use latex, so I can't think of a use case
> off the top of my head. Default no, optional yes sounds good.
>
I find the following handy for word-count of latex documents...
#+begin_src sh :var basename=(file-name-sans-extension (buffer-file-name))
pdftotext $basename.pdf - |wc -w
#+end_src
* Re: Context-sensitive word count in org mode (elisp)
2011-02-16 3:51 Context-sensitive word count in org mode (elisp) Paul Sexton
2011-02-16 9:12 ` Christian Moe
2011-02-16 9:45 ` Christian Moe
@ 2011-02-16 10:14 ` Bastien
2011-02-16 18:15 ` Samuel Wales
2011-02-16 13:03 ` Joost Kremers
2011-02-16 23:28 ` Paul Sexton
4 siblings, 1 reply; 18+ messages in thread
From: Bastien @ 2011-02-16 10:14 UTC (permalink / raw)
To: Paul Sexton; +Cc: emacs-orgmode
Hi Paul,
Paul Sexton <psexton@xnet.co.nz> writes:
> I am trying to reduce the word count in a document I am writing. The
> existing word count functionality for emacs is surprisingly lacking.
> I wanted a word count function for org mode which excluded tables and
> comments, and ended up writing one myself.
>
> If this function is called with a region highlighted, it counts the words in
> the region. Otherwise it counts words in the whole buffer.
Looks useful, thanks!
I'll let this discussion grow a bit until we have something we can
perhaps include in Org.
Also, counting words from Emacs is a frequent request from users. If
your solution for Org could be integrated into a more general solution
for Emacs (working on various modes), that would be a win. See:
http://stray-notes.blogspot.com/2009/12/emacs-word-count.html
http://www.sultanik.com/Word_count_in_Emacs
.. and other pointers.
--
Bastien
* Re: Context-sensitive word count in org mode (elisp)
2011-02-16 3:51 Context-sensitive word count in org mode (elisp) Paul Sexton
` (2 preceding siblings ...)
2011-02-16 10:14 ` Bastien
@ 2011-02-16 13:03 ` Joost Kremers
2011-02-16 23:28 ` Paul Sexton
4 siblings, 0 replies; 18+ messages in thread
From: Joost Kremers @ 2011-02-16 13:03 UTC (permalink / raw)
To: emacs-orgmode
On Wed, Feb 16, 2011 at 03:51:23AM +0000, Paul Sexton wrote:
> Improvements welcome.
one remark about your naming conventions:
(info "(elisp) Coding Conventions")
especially the second point. in short, you shouldn't use names such as
"in-comment-p" or "latex-macro-regexp" for global variables and/or functions.
use a prefix to distinguish them from other programs/packages.
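as an illustration only, assuming a package prefix such as `my-org-wc-' (a made-up prefix; any prefix unique to the package would do), the global names from the original post might become:
#+begin_src emacs-lisp
;; Sketch only: `my-org-wc-' is a hypothetical package prefix.
(defvar my-org-wc-latex-macro-regexp
  "\\\\[A-Za-z]+\\(\\[[^]]*\\]\\|\\){\\([^}]*\\)}"
  "Regexp matching a LaTeX macro with one braced argument.")

(defun my-org-wc-latex-macro-p (string)
  "Return non-nil if STRING starts with a LaTeX macro."
  (eq (string-match-p my-org-wc-latex-macro-regexp string) 0))
#+end_src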
--
Joost Kremers
Life has its moments
* Re: Context-sensitive word count in org mode (elisp)
2011-02-16 3:51 Context-sensitive word count in org mode (elisp) Paul Sexton
` (3 preceding siblings ...)
2011-02-16 13:03 ` Joost Kremers
@ 2011-02-16 23:28 ` Paul Sexton
2011-02-17 16:50 ` Samuel Wales
2011-02-18 14:34 ` Bastien
4 siblings, 2 replies; 18+ messages in thread
From: Paul Sexton @ 2011-02-16 23:28 UTC (permalink / raw)
To: emacs-orgmode
Thanks for all the suggestions. Here is version 2.
Improvements:
- ignores source code blocks
- ignores tags and TODO keywords in headings
- ignores footnotes by default (option to force counting them)
- skips any sections tagged as not for export
- option to count words in latex macro arguments (they are ignored
by default)
I would still like to count hyperlink descriptions but am not sure
how -- is there a function that fetches the description of the hyperlink
at point?
Paul
-----------------------------------------------------------------------
(defun org-word-count (beg end
                           &optional count-latex-macro-args?
                           count-footnotes?)
  "Report the number of words in the Org mode buffer or selected region.
Ignores:
- comments
- tables
- source code blocks (#+BEGIN_SRC ... #+END_SRC, and inline blocks)
- hyperlinks
- tags, priorities, and TODO keywords in headers
- sections tagged as 'not for export'.
The text of footnote definitions is ignored, unless the optional argument
COUNT-FOOTNOTES? is non-nil.
If the optional argument COUNT-LATEX-MACRO-ARGS? is non-nil, the word count
includes LaTeX macro arguments (the material between {curly braces}).
Otherwise, and by default, every LaTeX macro counts as 1 word regardless
of its arguments."
  (interactive "r")
  (unless mark-active
    (setf beg (point-min)
          end (point-max)))
  (let ((wc 0)
        (latex-macro-regexp "\\\\[A-Za-z]+\\(\\[[^]]*\\]\\|\\){\\([^}]*\\)}"))
    (save-excursion
      (goto-char beg)
      (while (< (point) end)
        (re-search-forward "\\w+\\W*")
        (cond
         ;; Ignore comments.
         ((or (org-in-commented-line) (org-at-table-p))
          nil)
         ;; Ignore hyperlinks.
         ;; TODO need to count text of the link's description.
         ((looking-at org-any-link-re)
          (goto-char (match-end 0)))
         ;; Ignore source code blocks.
         ((org-in-regexps-block-p "^#\\+BEGIN_SRC\\W" "^#\\+END_SRC\\W")
          nil)
         ;; Ignore inline source blocks, counting them as 1 word.
         ((save-excursion
            (backward-char)
            (looking-at org-babel-inline-src-block-regexp))
          (goto-char (match-end 0))
          (setf wc (+ 2 wc)))
         ;; Count latex macros as 1 word, ignoring their arguments.
         ((save-excursion
            (backward-char)
            (looking-at latex-macro-regexp))
          (goto-char (if count-latex-macro-args?
                         (match-beginning 2)
                       (match-end 0)))
          (setf wc (+ 2 wc)))
         ;; Ignore footnotes.
         ((and (not count-footnotes?)
               (or (org-footnote-at-definition-p)
                   (org-footnote-at-reference-p)))
          nil)
         (t
          (let ((contexts (org-context)))
            (cond
             ;; Ignore tags and TODO keywords, etc.
             ((or (assoc :todo-keyword contexts)
                  (assoc :priority contexts)
                  (assoc :keyword contexts)
                  (assoc :checkbox contexts))
              nil)
             ;; Ignore sections marked with tags that are
             ;; excluded from export.
             ((assoc :tags contexts)
              (if (intersection (org-get-tags-at) org-export-exclude-tags
                                :test 'equal)
                  (org-forward-same-level 1)
                nil))
             (t
              (incf wc))))))))
    (message (format "%d words in %s." wc
                     (if mark-active "region" "buffer")))))
* Re: Re: Context-sensitive word count in org mode (elisp)
2011-02-16 23:28 ` Paul Sexton
@ 2011-02-17 16:50 ` Samuel Wales
2011-02-17 18:55 ` Paul Sexton
2011-02-18 14:34 ` Bastien
1 sibling, 1 reply; 18+ messages in thread
From: Samuel Wales @ 2011-02-17 16:50 UTC (permalink / raw)
To: Paul Sexton; +Cc: emacs-orgmode
This looks great.
How hard do you think it would be to show how many words there are for
every subtree in a similar way to how clock durations are summed in
C-c C-x C-d and displayed in the subtree itself in reverse video?
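As a rough starting point, and only as a sketch: the following collects a plain word count per subtree using `org-map-entries' and the built-in `count-words' (so it assumes an Emacs that provides `count-words'). It applies none of the exclusions from `org-word-count' and displays nothing in the buffer; `my-org-subtree-word-counts' is a hypothetical name.
#+begin_src emacs-lisp
(require 'org)

;; Sketch only: a raw per-subtree count, without the context-sensitive
;; exclusions discussed in this thread.
(defun my-org-subtree-word-counts ()
  "Return an alist of (HEADING . WORDS) for every heading in the buffer.
WORDS is the raw word count of the whole subtree, heading included."
  (org-map-entries
   (lambda ()
     ;; `org-map-entries' calls us with point at the start of the headline.
     (let ((beg (point))
           (end (save-excursion (org-end-of-subtree t) (point))))
       (cons (org-get-heading t t) (count-words beg end))))))
#+end_src
Turning the alist into overlays on each headline, in the manner of the clock display, would be a separate step.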
* Re: Context-sensitive word count in org mode (elisp)
2011-02-17 16:50 ` Samuel Wales
@ 2011-02-17 18:55 ` Paul Sexton
2011-03-27 19:40 ` [Orgmode] " Samuel Wales
0 siblings, 1 reply; 18+ messages in thread
From: Paul Sexton @ 2011-02-17 18:55 UTC (permalink / raw)
To: emacs-orgmode
Samuel Wales <samologist <at> gmail.com> writes:
>
> This looks great.
>
> How hard do you think it would be to show how many words there are for
> every subtree in a similar way to how clock durations are summed in
> C-c C-x C-d and displayed in the subtree itself in reverse video?
I might leave that functionality to you -- it's pretty far beyond what I need
out of a word count function.
Cheers
P
* Re: Re: Context-sensitive word count in org mode (elisp)
2011-02-16 23:28 ` Paul Sexton
2011-02-17 16:50 ` Samuel Wales
@ 2011-02-18 14:34 ` Bastien
2011-02-20 21:49 ` Paul Sexton
[not found] ` <4D601314.8000701@xnet.co.nz>
1 sibling, 2 replies; 18+ messages in thread
From: Bastien @ 2011-02-18 14:34 UTC (permalink / raw)
To: Paul Sexton; +Cc: emacs-orgmode
Hi Paul,
Paul Sexton <psexton@xnet.co.nz> writes:
> Thanks for all the suggestions. Here is version 2.
Thanks for that. Maybe you could put it on Worg/org-hacks.org?
> I would still like to count hyperlink descriptions but am not sure
> how -- is there a function that fetches the description of the hyperlink
> at point?
#+begin_src emacs-lisp
(when (looking-at org-bracket-link-analytic-regexp)
(match-string-no-properties 5))
#+end_src
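Given the description string returned by the form above, counting its words needs only `split-string'. A minimal sketch, with `my-count-words-in-string' as a made-up helper name:
#+begin_src emacs-lisp
;; Sketch only: `my-count-words-in-string' is a hypothetical helper.
(defun my-count-words-in-string (string)
  "Return the number of words in STRING.
STRING is split on runs of non-word characters; empty pieces are dropped."
  (length (split-string string "\\W+" t)))

(my-count-words-in-string "the Org manual") ; => 3
#+end_src
Note that what counts as a word character depends on the syntax table of the current buffer.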
HTH,
--
Bastien
* Re: Context-sensitive word count in org mode (elisp)
2011-02-18 14:34 ` Bastien
@ 2011-02-20 21:49 ` Paul Sexton
2011-02-21 23:30 ` Samuel Wales
[not found] ` <4D601314.8000701@xnet.co.nz>
1 sibling, 1 reply; 18+ messages in thread
From: Paul Sexton @ 2011-02-20 21:49 UTC (permalink / raw)
To: emacs-orgmode
Bastien <bastien.guerry <at> wikimedia.fr> writes:
> #+begin_src emacs-lisp
> (when (looking-at org-bracket-link-analytic-regexp)
> (match-string-no-properties 5))
> #+end_src
Thanks. Here is version 3 of the function, which is now able to count
words in link descriptions.
The code to advance to the next word has been moved to the end of the
loop, which improves accuracy.
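The classify-then-advance shape of the loop can be sketched in isolation. This is a simplified illustration, not the function posted here: `my-count-words-skipping' and its SKIP-P argument are hypothetical, and the search is bounded and given a NOERROR argument so that a failed search moves point to END instead of signalling an error.
#+begin_src emacs-lisp
;; Sketch only: a stripped-down version of the counting loop.
(defun my-count-words-skipping (beg end skip-p)
  "Count words between BEG and END, ignoring words that SKIP-P rejects.
SKIP-P is called with no arguments and with point at the start of
each word; the word is not counted if it returns non-nil."
  (let ((wc 0))
    (save-excursion
      (goto-char beg)
      ;; Move to the first word, if there is one.
      (skip-syntax-forward "^w" end)
      (while (< (point) end)
        ;; Classify the word at point first ...
        (unless (save-excursion (funcall skip-p))
          (setq wc (1+ wc)))
        ;; ... then advance past it and any following non-word characters.
        (re-search-forward "\\w+\\W*" end 'move)))
    wc))

;; Example: skip every word on a line that starts with "#".
(with-temp-buffer
  (insert "one two\n# three four\nfive")
  (my-count-words-skipping
   (point-min) (point-max)
   (lambda () (save-excursion (beginning-of-line) (looking-at-p "#")))))
;; => 3
#+end_src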
Paul
----------------------------------------------------------------------
(defun org-word-count (beg end
                           &optional count-latex-macro-args?
                           count-footnotes?)
  "Report the number of words in the Org mode buffer or selected region.
Ignores:
- comments
- tables
- source code blocks (#+BEGIN_SRC ... #+END_SRC, and inline blocks)
- hyperlinks (but does count words in hyperlink descriptions)
- tags, priorities, and TODO keywords in headers
- sections tagged as 'not for export'.
The text of footnote definitions is ignored, unless the optional argument
COUNT-FOOTNOTES? is non-nil.
If the optional argument COUNT-LATEX-MACRO-ARGS? is non-nil, the word count
includes LaTeX macro arguments (the material between {curly braces}).
Otherwise, and by default, every LaTeX macro counts as 1 word regardless
of its arguments."
  (interactive "r")
  (unless mark-active
    (setf beg (point-min)
          end (point-max)))
  (let ((wc 0)
        (latex-macro-regexp "\\\\[A-Za-z]+\\(\\[[^]]*\\]\\|\\){\\([^}]*\\)}"))
    (save-excursion
      (goto-char beg)
      (while (< (point) end)
        (cond
         ;; Ignore comments.
         ((or (org-in-commented-line) (org-at-table-p))
          nil)
         ;; Ignore hyperlinks. But if link has a description, count
         ;; the words within the description.
         ((looking-at org-bracket-link-analytic-regexp)
          (when (match-string-no-properties 5)
            (let ((desc (match-string-no-properties 5)))
              (save-match-data
                (incf wc (length (remove "" (org-split-string
                                             desc "\\W")))))))
          (goto-char (match-end 0)))
         ((looking-at org-any-link-re)
          (goto-char (match-end 0)))
         ;; Ignore source code blocks.
         ((org-in-regexps-block-p "^#\\+BEGIN_SRC\\W" "^#\\+END_SRC\\W")
          nil)
         ;; Ignore inline source blocks, counting them as 1 word.
         ((save-excursion
            (backward-char)
            (looking-at org-babel-inline-src-block-regexp))
          (goto-char (match-end 0))
          (setf wc (+ 2 wc)))
         ;; Count latex macros as 1 word, ignoring their arguments.
         ((save-excursion
            (backward-char)
            (looking-at latex-macro-regexp))
          (goto-char (if count-latex-macro-args?
                         (match-beginning 2)
                       (match-end 0)))
          (setf wc (+ 2 wc)))
         ;; Ignore footnotes.
         ((and (not count-footnotes?)
               (or (org-footnote-at-definition-p)
                   (org-footnote-at-reference-p)))
          nil)
         (t
          (let ((contexts (org-context)))
            (cond
             ;; Ignore tags and TODO keywords, etc.
             ((or (assoc :todo-keyword contexts)
                  (assoc :priority contexts)
                  (assoc :keyword contexts)
                  (assoc :checkbox contexts))
              nil)
             ;; Ignore sections marked with tags that are
             ;; excluded from export.
             ((assoc :tags contexts)
              (if (intersection (org-get-tags-at) org-export-exclude-tags
                                :test 'equal)
                  (org-forward-same-level 1)
                nil))
             (t
              (incf wc))))))
        (re-search-forward "\\w+\\W*")))
    (message (format "%d words in %s." wc
                     (if mark-active "region" "buffer")))))
[parent not found: <4D601314.8000701@xnet.co.nz>]
* Re: Re: Context-sensitive word count in org mode (elisp)
[not found] ` <4D601314.8000701@xnet.co.nz>
@ 2011-02-22 11:28 ` Bastien
0 siblings, 0 replies; 18+ messages in thread
From: Bastien @ 2011-02-22 11:28 UTC (permalink / raw)
To: Paul Sexton; +Cc: emacs-orgmode
Hi Paul,
Paul Sexton <psexton@xnet.co.nz> writes:
> The process of contributing to Worg is too complicated for me.
You can clone the Worg.git repository with
$ git clone git://repo.or.cz/Worg.git
Then make some edits to Worg.
Then commit your changes with
$ git commit -a -m "Your message"
Then create a patch with
$ git format-patch master
Then send the patch.
> If someone else wants to add it somewhere on my behalf that is fine
> with me.
I added a section "Count words in an Org buffer" with your code in
Worg/org-hacks.org. Thanks.
> Have you thought of making Worg a wiki?
> Wikis hosted on github can be written in org-mode format.
Yes, but this format is limited. I think it's good to have all Org's
functionalities in Worg, not only those needed for publication. In
particular, org-issues.org contains lots of information that we want
to handle locally, not just for publishing.
Best,
--
Bastien