emacs-orgmode@gnu.org archives
* building tagcloud datastructure in elisp
@ 2012-09-12 18:41 Marcelo de Moraes Serpa
  2012-09-12 18:58 ` Eric Schulte
  2012-09-12 23:34 ` Jonathan Leech-Pepin
  0 siblings, 2 replies; 3+ messages in thread
From: Marcelo de Moraes Serpa @ 2012-09-12 18:41 UTC (permalink / raw)
  To: Org Mode

Hi list,

How hard would it be to parse a bunch of org files and build an elisp data
structure (Hash?) that represents a tagcloud? All tags in all headlines and
subtrees should be taken into account (for all org files that are parsed).
Could I use org-element to help me parse this or is there a better way?

I'm just learning the org API, and I've only done a bunch of elisp hacks,
so any insight would be greatly appreciated!

Thanks,

- Marcelo.


* Re: building tagcloud datastructure in elisp
  2012-09-12 18:41 building tagcloud datastructure in elisp Marcelo de Moraes Serpa
@ 2012-09-12 18:58 ` Eric Schulte
  2012-09-12 23:34 ` Jonathan Leech-Pepin
  1 sibling, 0 replies; 3+ messages in thread
From: Eric Schulte @ 2012-09-12 18:58 UTC (permalink / raw)
  To: Marcelo de Moraes Serpa; +Cc: Org Mode

Marcelo de Moraes Serpa <celoserpa@gmail.com> writes:

> Hi list,
>
> How hard would it be to parse a bunch of org files and build an elisp data
> structure (Hash?) that represents a tagcloud? All tags in all headlines and
> subtrees should be taken into account (for all org files that are parsed).
> Could I use org-element to help me parse this or is there a better way?
>
> I'm just learning the org API, and I've only done a bunch of elisp hacks,
> so any insight would be greatly appreciated!
>
> Thanks,
>
> - Marcelo.

My favorite method of getting word frequencies from text files is the
following.  Sometimes it is easier to just treat Org-mode files as text
files rather than to use e-lisp.

    # -*- shell-script -*-
    many=20 # to print the 20 most popular words
    cat org-file.org \
        |tr -cs A-Za-z '\n' \
        |tr A-Z a-z \
        |sort \
        |uniq -c \
        |sort -rn \
        |sed ${many}q \
        |sed 's/^ *//' \
        |sed 's/\([^ ]*\) \([^ ]*\)/\2:\1/' \
        |tr '\n' ' ' \
        |sed 's/ $/\n/'

Adapted from http://www.leancrew.com/all-this/2011/12/more-shell-less-egg/
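
Note that the pipeline above counts every word in the file, not only
tags.  A minimal sketch of a tag-only variant follows, assuming tags
appear in the usual `:tag1:tag2:` form at the end of a headline.  The
function name `org_tag_freq` is made up for illustration, and inherited
tags are not counted:

```shell
# Hypothetical helper (not part of Org): print "count tag" lines,
# most frequent first, for the Org files given as arguments.
org_tag_freq() {
    # 1. keep only headlines that end in a :tag: group
    # 2. strip everything but that group
    # 3. split the group into one tag per line, dropping empty lines
    # 4. count and sort by frequency
    grep -hE '^\*+ .*[[:space:]]:[[:alnum:]_@#%:]+:[[:space:]]*$' "$@" \
        | sed -E 's/^.*[[:space:]](:[[:alnum:]_@#%:]+:)[[:space:]]*$/\1/' \
        | tr ':' '\n' \
        | sed '/^$/d' \
        | sort \
        | uniq -c \
        | sort -rn \
        | sed 's/^ *//'
}
```

Something like `org_tag_freq notes.org todo.org | sed 20q` (file names
are placeholders) would then print the 20 most frequent tags.  Lines in
the body text that merely contain `:word:` are skipped because only
lines starting with stars are considered.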

Best,

-- 
Eric Schulte
http://cs.unm.edu/~eschulte

* Re: building tagcloud datastructure in elisp
  2012-09-12 18:41 building tagcloud datastructure in elisp Marcelo de Moraes Serpa
  2012-09-12 18:58 ` Eric Schulte
@ 2012-09-12 23:34 ` Jonathan Leech-Pepin
  1 sibling, 0 replies; 3+ messages in thread
From: Jonathan Leech-Pepin @ 2012-09-12 23:34 UTC (permalink / raw)
  To: Marcelo de Moraes Serpa; +Cc: Org Mode

Hello Marcelo,

On 12 September 2012 14:41, Marcelo de Moraes Serpa <celoserpa@gmail.com> wrote:
> Hi list,
>
> How hard would it be to parse a bunch of org files and build an elisp data
> structure (Hash?) that represents a tagcloud? All tags in all headlines and
> subtrees should be taken into account (for all org files that are parsed).
> Could I use org-element to help me parse this or is there a better way?
>
> I'm just learning the org API, and I've only done a bunch of elisp hacks, so
> any insight would be greatly appreciated!

I'm learning as well, mostly by providing a feature I could use, or by
seeing a problem I find interesting and deciding I want to find a
solution to it.

> Thanks,
>
> - Marcelo.

Org-element doesn't seem to include tag-inheritance when providing
tags for a given headline, so counting inherited tags becomes slightly
more complex.

The following should provide what you want:

#+begin_src emacs-lisp
  (require 'org-element)

  (defun zin/org-tag-cloud-freq (&optional inherit file)
    "Return an alist of (TAG . COUNT) pairs.

  When INHERIT is given, the frequency of a tag includes the number
  of subheadings (to indicate tag inheritance).  FILE allows for an
  arbitrary file to be retrieved and used for tag counting; without
  it the current buffer is used."
    (interactive "P")
    (with-current-buffer (if file (find-file-noselect file) (current-buffer))
      (let ((source (org-element-parse-buffer 'headline))
            result)
        (org-element-map source 'headline
          (lambda (headline)
            ;; Weight of this headline's own tags: 1, or with INHERIT the
            ;; number of headlines in its subtree (itself included).
            (let ((count (if inherit
                             (length (org-element-map headline
                                         'headline #'identity))
                           1)))
              (dolist (tag (org-element-property :tags headline))
                (let ((sofar (assoc tag result)))
                  (if sofar
                      (setcdr sofar (+ count (cdr sofar)))
                    (push (cons tag count) result)))))
            ;; Return nil so `org-element-map' accumulates nothing.
            nil))
        ;; Echo the counts only for M-x use; callers get the alist itself.
        (when (called-interactively-p 'interactive)
          (message "%s" result))
        result)))

  (defun zin/org-tag-freq-list (files &optional inherit)
    "Merge the tag counts of FILES into a single alist.

  Each file is processed by `zin/org-tag-cloud-freq'; INHERIT is
  passed through to it unchanged."
    (let (result)
      (dolist (file files result)
        (dolist (entry (zin/org-tag-cloud-freq inherit file))
          (let ((sofar (assoc (car entry) result)))
            (if sofar
                (setcdr sofar (+ (cdr entry) (cdr sofar)))
              (push (cons (car entry) (cdr entry)) result)))))))
#+end_src

The dolist loop for counting the tags themselves comes from
http://stackoverflow.com/questions/6050033/elegant-way-to-count-items.
There may be a cleaner way to obtain the list of tags and associated
counts but this provides the values.

The first function will work on any Org buffer to return the list of
tags while the second will do so for a list of org files (for example
org-agenda-files).

I hope this helps

Regards,

--
Jon
