* Re: Context-sensitive word count in org mode (elisp)
@ 2011-02-16 16:22 Benjamin Beckwith
2011-02-16 16:32 ` Bastien
2011-02-16 23:31 ` Paul Sexton
0 siblings, 2 replies; 10+ messages in thread
From: Benjamin Beckwith @ 2011-02-16 16:22 UTC (permalink / raw)
To: Paul Sexton, emacs-orgmode
Hi Paul,
I have a small emacs project that creates a word-count mode and function
for use. It currently does not ignore sections like your proposal, but I think
that would be interesting functionality.
Mine does keep a running tally of additions and deletions in the mode line. It also
lets you set a goal and provides a visual indication when that goal is
met -- I believe that this feature may be of interest to you.
You can find it at:
https://github.com/bnbeckwith/wc-mode
Regards,
Ben
* Context-sensitive word count in org mode (elisp)
@ 2011-02-16 3:51 Paul Sexton
2011-02-16 9:45 ` Christian Moe
2011-02-16 23:28 ` Paul Sexton
0 siblings, 2 replies; 10+ messages in thread
From: Paul Sexton @ 2011-02-16 3:51 UTC (permalink / raw)
To: emacs-orgmode
I am trying to reduce the word count in a document I am writing. The
existing word count functionality for emacs is surprisingly lacking.
I wanted a word count function for org mode which excluded tables and
comments, and ended up writing one myself.
If this function is called with a region highlighted, it counts the words in
the region. Otherwise it counts words in the whole buffer.
It ignores commented lines and tables. LaTeX-style macros such as
\foo{bar,baz} are counted as 1 word, as a compromise (more often than not
they should count as 0, but they do sometimes expand to 1 or more words
in the final document).
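The region-or-buffer behaviour described above can be sketched on its own. This is only a minimal illustration, not the function posted below: the name `my/count-words-region-or-buffer` is hypothetical, and it relies on the built-in `count-words` and `use-region-p`, so it skips nothing (no comment, table, link, or macro handling).

```emacs-lisp
(defun my/count-words-region-or-buffer ()
  "Count words in the active region, or in the whole buffer if there is none.
Hypothetical helper for illustration; it does not ignore any Org syntax."
  (interactive)
  (let* ((regionp (use-region-p))
         ;; Fall back to the accessible portion of the buffer.
         (beg (if regionp (region-beginning) (point-min)))
         (end (if regionp (region-end) (point-max)))
         (n (count-words beg end)))
    (message "%d words in %s." n (if regionp "region" "buffer"))
    n))
```

Called with `M-x my/count-words-region-or-buffer`, it reports the count in the echo area and also returns it, which makes it easy to compare against the context-sensitive count.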
Limitations:
- Does not ignore BEGIN_SRC/END_SRC or inline src_* blocks (babel).
Should be easy enough to add however.
- There is probably a better way of identifying latex macros
than my 'latex-macro-regexp' below.
- Ignores all org links. I couldn't figure out how to extract "description"
text from links, but I didn't look very hard.
Improvements welcome.
Paul
------------------------------------------------------------------------
(defun in-comment-p ()
"Return non-nil if point is in a comment."
(if (or (null comment-start-skip)
(eq (preceding-char) ?\r))
nil
(save-excursion
(let ((pos (point)))
(re-search-backward "^\\|\r" nil t)
(or (looking-at comment-start-skip)
(re-search-forward comment-start-skip pos t))))))
(defun in-org-table-p ()
"Return non-nil if point is in an org-mode table."
(if (or (not (boundp 'org-table-any-line-regexp))
(null org-table-any-line-regexp)
(eq (preceding-char) ?\r))
nil
(save-excursion
(let ((pos (point)))
(re-search-backward "^\\|\r" nil t)
(looking-at org-table-any-line-regexp)))))
(defvar latex-macro-regexp "\\\\[A-Za-z]+\\(\\[[^]]*\\]\\|\\){\\([^}]*\\)}")
(defun org-word-count (beg end)
(interactive "r")
(unless mark-active
(setf beg (point-min)
end (point-max)))
(let ((wc 0))
(save-excursion
(goto-char beg)
(while (< (point) end)
(re-search-forward "\\w+\\W*")
(cond
((or (in-comment-p) (in-org-table-p))
nil)
((looking-at org-any-link-re)
(goto-char (match-end 0)))
((save-excursion
(backward-char)
(looking-at latex-macro-regexp))
(goto-char (match-end 0))
(setf wc (+ 2 wc)))
(t
(incf wc)))))
(message (format "%d words in %s." wc
(if mark-active "region" "buffer")))))
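To see what `latex-macro-regexp` captures, the pattern can be tried against plain strings. The sketch below copies the regexp from the code above; the names `my/latex-macro-regexp` and `my/latex-macro-parts` are hypothetical and exist only for this illustration.

```emacs-lisp
;; Same pattern as `latex-macro-regexp' in the message above.
(defvar my/latex-macro-regexp
  "\\\\[A-Za-z]+\\(\\[[^]]*\\]\\|\\){\\([^}]*\\)}")

(defun my/latex-macro-parts (string)
  "Return (WHOLE OPTIONAL ARGS) if STRING starts with a LaTeX-style macro.
WHOLE is the full match, OPTIONAL the [bracketed] part (possibly empty),
and ARGS the text between the curly braces.  Return nil otherwise."
  ;; Anchor the pattern at the start of the string.
  (when (string-match (concat "\\`" my/latex-macro-regexp) string)
    (list (match-string 0 string)
          (match-string 1 string)
          (match-string 2 string))))

;; Example calls:
;; (my/latex-macro-parts "\\foo{bar,baz} rest")
;; (my/latex-macro-parts "\\cite[p. 3]{knuth}")
```

Group 2 is the argument text, which is what the later versions in this thread jump to when macro arguments are to be counted. Because the argument part is `[^}]*`, the match stops at the first closing brace, so nested braces are not handled.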
* Re: Context-sensitive word count in org mode (elisp)
2011-02-16 3:51 Paul Sexton
@ 2011-02-16 9:45 ` Christian Moe
2011-02-16 20:34 ` Paul Sexton
2011-02-16 23:28 ` Paul Sexton
1 sibling, 1 reply; 10+ messages in thread
From: Christian Moe @ 2011-02-16 9:45 UTC (permalink / raw)
To: Paul Sexton; +Cc: emacs-orgmode
Forgot to add the code.
#+begin_src emacs-lisp
;; Adapted from code posted by Paul Sexton <2011-02-16 Wed 4:51am>
;; - Everything now contained in one function
;; - Will count correct number of words inside Latex macro
(defun org-word-count (beg end)
(interactive "r")
(unless mark-active
(setf beg (point-min)
end (point-max)))
(let ((wc 0)
(latex-macro-regexp "\\\\[A-Za-z]+\\(\\[[^]]*\\]\\|\\){\\([^}]*\\)}")) ; CHANGED
(save-excursion
(goto-char beg)
(while (< (point) end)
(re-search-forward "\\w+\\W*")
(cond
((or (org-in-commented-line) (org-at-table-p)) ; CHANGED
nil)
((looking-at org-any-link-re)
(goto-char (match-end 0)))
((save-excursion
(backward-char)
(looking-at latex-macro-regexp))
(goto-char (match-beginning 2)) ; CHANGED
(setf wc (+ 2 wc)))
(t
(incf wc)))))
(message (format "%d words in %s." wc
(if mark-active "region" "buffer")))))
#+end_src
Yours,
Christian
* Re: Context-sensitive word count in org mode (elisp)
2011-02-16 9:45 ` Christian Moe
@ 2011-02-16 20:34 ` Paul Sexton
2011-02-17 10:02 ` Christian Moe
0 siblings, 1 reply; 10+ messages in thread
From: Paul Sexton @ 2011-02-16 20:34 UTC (permalink / raw)
To: emacs-orgmode
Christian Moe <mail <at> christianmoe.com> writes:
>
> Forgot to add the code.
>
> #+begin_src emacs-lisp
> ;; Adapted from code posted by Paul Sexton <2011-02-16 Wed 4:51am>
> ;; - Everything now contained in one function
> ;; - Will count correct number of words inside Latex macro
>
> (defun org-word-count (beg end)
> (interactive "r")
> (unless mark-active
> (setf beg (point-min)
> end (point-max)))
> (let ((wc 0)
> (latex-macro-regexp "\\\\[A-Za-z]+\\(\\[[^]]*\\]\\|\\){\\([^}]*\\)}")) ; CHANGED
> (save-excursion
> (goto-char beg)
> (while (< (point) end)
> (re-search-forward "\\w+\\W*")
> (cond
> ((or (org-in-commented-line) (org-at-table-p)) ; CHANGED
> nil)
> ((looking-at org-any-link-re)
> (goto-char (match-end 0)))
> ((save-excursion
> (backward-char)
> (looking-at latex-macro-regexp))
> (goto-char (match-beginning 2)) ; CHANGED
> (setf wc (+ 2 wc)))
> (t
> (incf wc)))))
> (message (format "%d words in %s." wc
> (if mark-active "region" "buffer")))))
> #+end_src
>
Thanks, I wasn't aware of those pre-existing functions.
I don't agree with changing '(match-end 0)' to '(match-beginning 2)'
however. For most latex macros, I don't want to count the words inside
the macro's arguments. For example, I don't want the text of footnotes
to be included in the word count. However others differ, and there will
always be cases where one DOES want to count the macro arguments - so maybe
org-word-count should do this optionally.
Paul
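The difference between the two jump targets can be shown in isolation. This is a sketch under stated assumptions: `my/count-words-after-macro` is a hypothetical helper, the input is assumed to start with a macro, and the macro itself is deliberately not counted so that only the effect of the jump is visible.

```emacs-lisp
(defun my/count-words-after-macro (text count-args)
  "Count the words in TEXT that follow the jump out of a leading macro.
TEXT is assumed to start with a LaTeX-style macro.  If COUNT-ARGS is
non-nil, resume at the start of the macro's arguments, as with
`(match-beginning 2)'; otherwise skip the whole macro, as with
`(match-end 0)'.  The macro name itself is not counted."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (when (looking-at "\\\\[A-Za-z]+\\(\\[[^]]*\\]\\|\\){\\([^}]*\\)}")
      (goto-char (if count-args (match-beginning 2) (match-end 0))))
    ;; Count every remaining run of word characters.
    (let ((n 0))
      (while (re-search-forward "\\w+" nil t)
        (setq n (1+ n)))
      n)))

;; (my/count-words-after-macro "\\footnote{three short words} tail" nil) ; => 1
;; (my/count-words-after-macro "\\footnote{three short words} tail" t)   ; => 4
```

With `(match-end 0)` only "tail" is left to count; with `(match-beginning 2)` the three words of the footnote are counted as well, which is the behaviour being debated here.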
* Re: Re: Context-sensitive word count in org mode (elisp)
2011-02-16 20:34 ` Paul Sexton
@ 2011-02-17 10:02 ` Christian Moe
2011-02-17 18:57 ` Eric Schulte
0 siblings, 1 reply; 10+ messages in thread
From: Christian Moe @ 2011-02-17 10:02 UTC (permalink / raw)
To: Paul Sexton; +Cc: emacs-orgmode
Hi,
> I don't agree with changing '(match-end 0)' to '(match-beginning 2)'
> however. For most latex macros, I don't want to count the words inside
> the macro's arguments.
Maybe you're right. I rarely use latex, so I can't think of a use case
off the top of my head. Default no, optional yes sounds good.
Yours,
Christian
* Re: Re: Context-sensitive word count in org mode (elisp)
2011-02-17 10:02 ` Christian Moe
@ 2011-02-17 18:57 ` Eric Schulte
0 siblings, 0 replies; 10+ messages in thread
From: Eric Schulte @ 2011-02-17 18:57 UTC (permalink / raw)
To: mail; +Cc: emacs-orgmode, Paul Sexton
Christian Moe <mail@christianmoe.com> writes:
> Hi,
>
>> I don't agree with changing '(match-end 0)' to '(match-beginning 2)'
>> however. For most latex macros, I don't want to count the words inside
>> the macro's arguments.
>
> Maybe you're right. I rarely use latex, so I can't think of a use case
> off the top of my head. Default no, optional yes sounds good.
>
I find the following handy for word-count of latex documents...
#+begin_src sh :var basename=(file-name-sans-extension (buffer-file-name))
pdftotext $basename.pdf - |wc -w
#+end_src
* Re: Context-sensitive word count in org mode (elisp)
2011-02-16 3:51 Paul Sexton
2011-02-16 9:45 ` Christian Moe
@ 2011-02-16 23:28 ` Paul Sexton
2011-02-17 16:50 ` Samuel Wales
2011-02-18 14:34 ` Bastien
1 sibling, 2 replies; 10+ messages in thread
From: Paul Sexton @ 2011-02-16 23:28 UTC (permalink / raw)
To: emacs-orgmode
Thanks for all the suggestions. Here is version 2.
Improvements:
- ignores source code blocks
- ignores tags and TODO keywords in headings
- ignores footnotes by default (option to force counting them)
- skips any sections tagged as not for export
- option to count words in latex macro arguments (they are ignored
by default)
I would still like to count hyperlink descriptions but am not sure
how -- is there a function that fetches the description of the hyperlink
at point?
Paul
-----------------------------------------------------------------------
(defun org-word-count (beg end
&optional count-latex-macro-args?
count-footnotes?)
"Report the number of words in the Org mode buffer or selected region.
Ignores:
- comments
- tables
- source code blocks (#+BEGIN_SRC ... #+END_SRC, and inline blocks)
- hyperlinks
- tags, priorities, and TODO keywords in headers
- sections tagged as 'not for export'.
The text of footnote definitions is ignored, unless the optional argument
COUNT-FOOTNOTES? is non-nil.
If the optional argument COUNT-LATEX-MACRO-ARGS? is non-nil, the word count
includes LaTeX macro arguments (the material between {curly braces}).
Otherwise, and by default, every LaTeX macro counts as 1 word regardless
of its arguments."
(interactive "r")
(unless mark-active
(setf beg (point-min)
end (point-max)))
(let ((wc 0)
(latex-macro-regexp "\\\\[A-Za-z]+\\(\\[[^]]*\\]\\|\\){\\([^}]*\\)}"))
(save-excursion
(goto-char beg)
(while (< (point) end)
(re-search-forward "\\w+\\W*")
(cond
;; Ignore comments and tables.
((or (org-in-commented-line) (org-at-table-p))
nil)
;; Ignore hyperlinks.
;; TODO need to count text of the link's description.
((looking-at org-any-link-re)
(goto-char (match-end 0)))
;; Ignore source code blocks.
((org-in-regexps-block-p "^#\\+BEGIN_SRC\\W" "^#\\+END_SRC\\W")
nil)
;; Ignore inline source blocks, counting them as 1 word.
((save-excursion
(backward-char)
(looking-at org-babel-inline-src-block-regexp))
(goto-char (match-end 0))
(setf wc (+ 2 wc)))
;; Count latex macros as 1 word, ignoring their arguments.
((save-excursion
(backward-char)
(looking-at latex-macro-regexp))
(goto-char (if count-latex-macro-args?
(match-beginning 2)
(match-end 0)))
(setf wc (+ 2 wc)))
;; Ignore footnotes.
((and (not count-footnotes?)
(or (org-footnote-at-definition-p)
(org-footnote-at-reference-p)))
nil)
(t
(let ((contexts (org-context)))
(cond
;; Ignore tags and TODO keywords, etc.
((or (assoc :todo-keyword contexts)
(assoc :priority contexts)
(assoc :keyword contexts)
(assoc :checkbox contexts))
nil)
;; Ignore sections marked with tags that are
;; excluded from export.
((assoc :tags contexts)
(if (intersection (org-get-tags-at) org-export-exclude-tags
:test 'equal)
(org-forward-same-level 1)
nil))
(t
(incf wc))))))))
(message (format "%d words in %s." wc
(if mark-active "region" "buffer")))))
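The check that skips sections excluded from export boils down to a set intersection on tag strings. A minimal sketch, assuming `cl-lib` is available; `my/excluded-from-export-p` is a hypothetical name, and in the function above the two arguments would come from `(org-get-tags-at)` and `org-export-exclude-tags`.

```emacs-lisp
(require 'cl-lib)

(defun my/excluded-from-export-p (tags exclude-tags)
  "Return non-nil if any string in TAGS also appears in EXCLUDE-TAGS.
Both arguments are lists of tag strings."
  ;; Tags are strings, so compare with `equal'; the default test, `eql',
  ;; would only match strings that are the very same object.
  (cl-intersection tags exclude-tags :test #'equal))

;; (my/excluded-from-export-p '("draft" "noexport") '("noexport")) ; non-nil
;; (my/excluded-from-export-p '("draft") '("noexport"))            ; nil
```

When the intersection is non-empty, the function above moves on with `org-forward-same-level`, so the whole subtree is left out of the count.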
* Re: Re: Context-sensitive word count in org mode (elisp)
2011-02-16 23:28 ` Paul Sexton
@ 2011-02-17 16:50 ` Samuel Wales
2011-02-18 14:34 ` Bastien
1 sibling, 0 replies; 10+ messages in thread
From: Samuel Wales @ 2011-02-17 16:50 UTC (permalink / raw)
To: Paul Sexton; +Cc: emacs-orgmode
This looks great.
How hard do you think it would be to show how many words there are for
every subtree in a similar way to how clock durations are summed in
C-c C-x C-d and displayed in the subtree itself in reverse video?
* Re: Re: Context-sensitive word count in org mode (elisp)
2011-02-16 23:28 ` Paul Sexton
2011-02-17 16:50 ` Samuel Wales
@ 2011-02-18 14:34 ` Bastien
2011-02-20 21:49 ` Paul Sexton
[not found] ` <4D601314.8000701@xnet.co.nz>
1 sibling, 2 replies; 10+ messages in thread
From: Bastien @ 2011-02-18 14:34 UTC (permalink / raw)
To: Paul Sexton; +Cc: emacs-orgmode
Hi Paul,
Paul Sexton <psexton@xnet.co.nz> writes:
> Thanks for all the suggestions. Here is version 2.
Thanks for that. Maybe you could put it on Worg/org-hacks.org?
> I would still like to count hyperlink descriptions but am not sure
> how -- is there a function that fetches the description of the hyperlink
> at point?
#+begin_src emacs-lisp
(when (looking-at org-bracket-link-analytic-regexp)
(match-string-no-properties 5))
#+end_src
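The same idea can be tried outside an Org buffer with a stand-in pattern. Note the assumptions: `my/bracket-link-regexp` below is a hypothetical, simplified regexp written for this illustration, not Org's `org-bracket-link-analytic-regexp`, and its group 2 plays the role that group 5 plays in the snippet above.

```emacs-lisp
;; Hypothetical, simplified pattern for [[target]] and [[target][description]].
;; Group 1 is the target, group 2 the optional description.
(defvar my/bracket-link-regexp
  "\\[\\[\\([^]]+\\)\\]\\(?:\\[\\([^]]+\\)\\]\\)?\\]")

(defun my/link-description (string)
  "Return the description of the first bracket link in STRING, or nil.
Nil is returned both when there is no link and when the link has no
description part."
  (when (string-match my/bracket-link-regexp string)
    (match-string 2 string)))

;; (my/link-description "See [[http://orgmode.org][the Org site]] now")
;; (my/link-description "[[file:notes.org]]")
```

A link without a description yields nil, so a caller has to test the result before counting words in it.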
HTH,
--
Bastien
* Re: Context-sensitive word count in org mode (elisp)
2011-02-18 14:34 ` Bastien
@ 2011-02-20 21:49 ` Paul Sexton
2011-02-21 23:30 ` Samuel Wales
[not found] ` <4D601314.8000701@xnet.co.nz>
1 sibling, 1 reply; 10+ messages in thread
From: Paul Sexton @ 2011-02-20 21:49 UTC (permalink / raw)
To: emacs-orgmode
Bastien <bastien.guerry <at> wikimedia.fr> writes:
> #+begin_src emacs-lisp
> (when (looking-at org-bracket-link-analytic-regexp)
> (match-string-no-properties 5))
> #+end_src
Thanks. Here is version 3 of the function, which is now able to count
words in link descriptions.
The code to advance to the next word has been moved to the end of the
loop, which improves accuracy.
Paul
----------------------------------------------------------------------
(defun org-word-count (beg end
&optional count-latex-macro-args?
count-footnotes?)
"Report the number of words in the Org mode buffer or selected region.
Ignores:
- comments
- tables
- source code blocks (#+BEGIN_SRC ... #+END_SRC, and inline blocks)
- hyperlinks (but does count words in hyperlink descriptions)
- tags, priorities, and TODO keywords in headers
- sections tagged as 'not for export'.
The text of footnote definitions is ignored, unless the optional argument
COUNT-FOOTNOTES? is non-nil.
If the optional argument COUNT-LATEX-MACRO-ARGS? is non-nil, the word count
includes LaTeX macro arguments (the material between {curly braces}).
Otherwise, and by default, every LaTeX macro counts as 1 word regardless
of its arguments."
(interactive "r")
(unless mark-active
(setf beg (point-min)
end (point-max)))
(let ((wc 0)
(latex-macro-regexp "\\\\[A-Za-z]+\\(\\[[^]]*\\]\\|\\){\\([^}]*\\)}"))
(save-excursion
(goto-char beg)
(while (< (point) end)
(cond
;; Ignore comments and tables.
((or (org-in-commented-line) (org-at-table-p))
nil)
;; Ignore hyperlinks. But if link has a description, count
;; the words within the description.
((looking-at org-bracket-link-analytic-regexp)
(when (match-string-no-properties 5)
(let ((desc (match-string-no-properties 5)))
(save-match-data
(incf wc (length (remove "" (org-split-string
desc "\\W")))))))
(goto-char (match-end 0)))
((looking-at org-any-link-re)
(goto-char (match-end 0)))
;; Ignore source code blocks.
((org-in-regexps-block-p "^#\\+BEGIN_SRC\\W" "^#\\+END_SRC\\W")
nil)
;; Ignore inline source blocks, counting them as 1 word.
((save-excursion
(backward-char)
(looking-at org-babel-inline-src-block-regexp))
(goto-char (match-end 0))
(setf wc (+ 2 wc)))
;; Count latex macros as 1 word, ignoring their arguments.
((save-excursion
(backward-char)
(looking-at latex-macro-regexp))
(goto-char (if count-latex-macro-args?
(match-beginning 2)
(match-end 0)))
(setf wc (+ 2 wc)))
;; Ignore footnotes.
((and (not count-footnotes?)
(or (org-footnote-at-definition-p)
(org-footnote-at-reference-p)))
nil)
(t
(let ((contexts (org-context)))
(cond
;; Ignore tags and TODO keywords, etc.
((or (assoc :todo-keyword contexts)
(assoc :priority contexts)
(assoc :keyword contexts)
(assoc :checkbox contexts))
nil)
;; Ignore sections marked with tags that are
;; excluded from export.
((assoc :tags contexts)
(if (intersection (org-get-tags-at) org-export-exclude-tags
:test 'equal)
(org-forward-same-level 1)
nil))
(t
(incf wc))))))
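;; Advance to the next word.  If none is left, move to the end of the
;; buffer instead of signalling a "Search failed" error.
(re-search-forward "\\w+\\W*" nil 'move)))
(message "%d words in %s." wc
(if mark-active "region" "buffer"))))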
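The word count for a link description is obtained by splitting the description on non-word characters and counting the non-empty pieces. A minimal sketch of that step using only built-ins; `my/count-words-in-string` is a hypothetical name, and `split-string` with its OMIT-NULLS argument stands in for the `remove` plus `org-split-string` combination used above.

```emacs-lisp
(defun my/count-words-in-string (string)
  "Count runs of word-constituent characters in STRING.
The standard syntax table is used so that the result does not depend on
the major mode of the current buffer."
  (with-syntax-table (standard-syntax-table)
    ;; Split on runs of non-word characters and drop empty strings.
    (length (split-string string "\\W+" t))))

;; (my/count-words-in-string "the Org site")     ; => 3
;; (my/count-words-in-string "Org-mode: manual") ; => 3
```

Hyphenated terms are split at the hyphen, so "Org-mode" counts as two words here, just as it would with a plain `\\W` split.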
[parent not found: <4D601314.8000701@xnet.co.nz>]
* Re: Re: Context-sensitive word count in org mode (elisp)
[not found] ` <4D601314.8000701@xnet.co.nz>
@ 2011-02-22 11:28 ` Bastien
0 siblings, 0 replies; 10+ messages in thread
From: Bastien @ 2011-02-22 11:28 UTC (permalink / raw)
To: Paul Sexton; +Cc: emacs-orgmode
Hi Paul,
Paul Sexton <psexton@xnet.co.nz> writes:
> The process of contributing to Worg is too complicated for me.
You can clone the Worg.git repository with
$ git clone git://repo.or.cz/Worg.git
Then make some edits to Worg.
Then commit your changes with
$ git commit -a -m "Your message"
Then create a patch with
$ git format-patch master
Then send the patch.
> If someone else wants to add it somewhere on my behalf that is fine
> with me.
I added a section "Count words in an Org buffer" with your code in
Worg/org-hacks.org. Thanks.
> Have you thought of making Worg a wiki?
> Wikis hosted on github can be written in org-mode format.
Yes, but this format is limited. I think it's good to have all of Org's
functionality in Worg, not only the parts needed for publication. In
particular, org-issues.org contains lots of information that we want
to handle locally, not just for publishing.
Best,
--
Bastien