emacs-orgmode@gnu.org archives
* Email -> Org-mode: charset problem
@ 2011-06-27  8:30 Karl Voit
  2011-06-27 20:04 ` David Maus
  0 siblings, 1 reply; 4+ messages in thread
From: Karl Voit @ 2011-06-27  8:30 UTC (permalink / raw)
  To: emacs-orgmode

Hi!

Is there somebody who managed to develop an email to Org-mode bridge
without having charset problems? (No, I do not use Emacs as a
MUA[4])

    Disclaimer: This is not directly related to Org-mode but I guess
    there are people here with the very same problem.

I am using procmail[1] with an entry to forward emails containing
a keyword in the subject into my inbox.org (where also MobileOrg[2]
entries are written to).

This is quite handy since some tasks arrive as emails and I want to
capture them from anywhere (smartphone, webmail, ...) using a simple
email forward.

Unfortunately I get weird stuff like uuencoded things, UTF-8 in
ASCII, ... which messes up my inbox.org and I am not able to read it
afterwards.[5]

I guess this is because procmail and formail[3] – the tools I am
using to extract mail infos to append to the org-file – are
7-bit-only or similar.

If this is the case, I guess I'll have to find a different approach
for this purpose.

Any ideas?

  1. http://www.procmail.org/
  2. https://github.com/matburt/mobileorg-android/wiki/
  3. http://www.linuxcommand.org/man_pages/formail1.html
  4. http://en.wikipedia.org/wiki/Mail_user_agent
  5. I can give examples if you want.
-- 
Karl Voit


* Re: Email -> Org-mode: charset problem
  2011-06-27  8:30 Email -> Org-mode: charset problem Karl Voit
@ 2011-06-27 20:04 ` David Maus
  2011-06-28  4:43   ` Richard Lawrence
  2011-06-28 17:13   ` Capture from GNU screen to Org-mode (was: Email -> Org-mode: charset problem) Karl Voit
  0 siblings, 2 replies; 4+ messages in thread
From: David Maus @ 2011-06-27 20:04 UTC (permalink / raw)
  To: news1142, Karl Voit; +Cc: emacs-orgmode


At Mon, 27 Jun 2011 10:30:06 +0200,
Karl Voit wrote:
> 
> Hi!
> 
> Is there somebody who managed to develop an email to Org-mode bridge
> without having charset problems? (No, I do not use Emacs as a
> MUA[4])
> 
>     Disclaimer: This is not directly related to Org-mode but I guess
>     there are people here with the very same problem.
> 
> I am using procmail[1] with an entry to forward emails containing
> a keyword in the subject into my inbox.org (where also MobileOrg[2]
> entries are written to).
> 
> This is quite handy since some tasks arrive as emails and I want to
> capture them from anywhere (smartphone, webmail, ...) using a simple
> email forward.
> 
> Unfortunately I get weird stuff like uuencoded things, UTF-8 in
> ASCII, ... which messes up my inbox.org and I am not able to read it
> afterwards.[5]
> 
> I guess this is because procmail and formail[3] – the tools I am
> using to extract mail infos to append to the org-file – are
> 7-bit-only or similar.
>
> If this is the case, I guess I'll have to find a different approach
> for this purpose.
> 
> Any ideas?


As far as I know a lot, if not most, email messages are transferred in
7bit for backward compatibility reasons and getting things other than
text/plain; charset=us-ascii across the net via email requires a lot
of things to consider (RFC2045-49 in all their glory). And if you
receive messages from others, their MUA might get things wrong and you
end up with things like utf-8 in ascii.
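
To make the "utf-8 in ascii" effect concrete, here is a minimal sketch
in Python (chosen only for illustration; nothing in this thread uses
it). It assumes a body that was sent as quoted-printable UTF-8. The
sample word and the function name decode_body are made up for the
example.

```python
import quopri


def decode_body(wire_bytes: bytes, charset: str) -> str:
    """Undo quoted-printable, then interpret the octets in CHARSET."""
    raw = quopri.decodestring(wire_bytes)  # 7-bit wire form -> original octets
    return raw.decode(charset)


# "für" encoded as UTF-8 and then quoted-printable: pure ASCII on the wire.
wire = b"f=C3=BCr"

print(decode_body(wire, "utf-8"))    # für  (declared charset honoured)
print(decode_body(wire, "latin-1"))  # fÃ¼r (wrong charset: "ü" becomes two characters)
```

Appending the wire form unchanged, or decoding it with the wrong
charset, gives exactly the kind of unreadable text described in the
original question.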

So dumping the raw message won't help, you need functions for
processing the raw message information.
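
Outside Emacs, one possible set of such functions is Python's standard
email package. The following is only a sketch under assumptions: the
script would receive the raw message as bytes (for instance piped in
from a procmail recipe), and the name extract_fields and the sample
message are hypothetical.

```python
from email import message_from_bytes, policy


def extract_fields(raw: bytes) -> dict:
    """Parse a raw message and return decoded subject, sender and text body."""
    # policy.default yields header values that are already RFC 2047 decoded.
    msg = message_from_bytes(raw, policy=policy.default)
    # Pick a text/plain part; multipart messages are searched for one.
    part = msg.get_body(preferencelist=("plain",))
    # get_content() undoes the transfer encoding and applies the declared charset.
    body = part.get_content() if part is not None else ""
    return {
        "subject": str(msg["subject"] or ""),
        "from": str(msg["from"] or ""),
        "body": body,
    }


# Hypothetical sample: encoded subject, quoted-printable ISO-8859-1 body.
SAMPLE = (
    b"From: Karl <karl@example.org>\n"
    b"Subject: =?UTF-8?Q?f=C3=BCr_inbox?=\n"
    b"MIME-Version: 1.0\n"
    b"Content-Type: text/plain; charset=iso-8859-1\n"
    b"Content-Transfer-Encoding: quoted-printable\n"
    b"\n"
    b"Gr=FC=DFe\n"
)

fields = extract_fields(SAMPLE)
print(fields["subject"])
print(fields["body"])
```

The returned strings are ordinary Unicode text, so they can be written
to the Org file as UTF-8 regardless of how the sender encoded them. A
message that declares a charset Python does not know would still raise
an error here; the sketch does not handle that case.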

Here's an idea (or two).

I do use an Emacs-based MUA (Wanderlust) and recently started to file
reference notes for certain messages. Because in some cases I want to
keep the message (e.g. an interesting usenet post) I save the raw
message to a temporary file and add it as an attachment. This works
nicely, although I'd really like to have an automated way of
save+attach when capturing.

So, what about org-capture? You could use a script that serializes the
raw message to disk and -- somehow -- calls capture with a reference
to this file. Emacs opens the mail file and extracts information
required for the template using the built-in MIME libraries (mml-*
IIRC) which are capable of correctly parsing and if necessary decoding
(e.g. quoted-printable encoded letters in the subject). After
extracting the information Emacs creates the appropriate capture
entry, somehow adds the message file as attachment and finishes the
capture process.

-or-

Don't dump the message in the org buffer "as is", but wrap it in a
#+begin_src <MIME-MODE> where MIME-MODE is the built-in mode for
decoding MIME messages. You could then view the original message with
C-c ' -- the only task left is dealing with the subject line for the
entry headline.
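
For the subject line, a small sketch in Python (an illustration only,
not something used in this thread): it assumes the raw Subject value
has already been extracted as a string, e.g. by formail, and may
contain RFC 2047 encoded words. The function name org_headline and the
TODO keyword default are hypothetical.

```python
import re
from email.header import decode_header, make_header


def org_headline(raw_subject: str, keyword: str = "TODO") -> str:
    """Turn a raw (possibly RFC 2047 encoded) Subject into an Org headline."""
    decoded = str(make_header(decode_header(raw_subject)))
    # Folded headers contain line breaks; a headline must stay on one line.
    one_line = re.sub(r"\s+", " ", decoded).strip()
    return f"* {keyword} {one_line}"


print(org_headline("=?UTF-8?Q?f=C3=BCr_inbox?="))  # * TODO für inbox
```

Subjects without encoded words pass through unchanged, so the same
function can be applied to every message.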

Best,
  -- David
-- 
OpenPGP... 0x99ADB83B5A4478E6
Jabber.... dmjena@jabber.org
Email..... dmaus@ictsoc.de



* Re: Email -> Org-mode: charset problem
  2011-06-27 20:04 ` David Maus
@ 2011-06-28  4:43   ` Richard Lawrence
  2011-06-28 17:13   ` Capture from GNU screen to Org-mode (was: Email -> Org-mode: charset problem) Karl Voit
  1 sibling, 0 replies; 4+ messages in thread
From: Richard Lawrence @ 2011-06-28  4:43 UTC (permalink / raw)
  To: emacs-orgmode


> At Mon, 27 Jun 2011 10:30:06 +0200,
> Karl Voit wrote:
>> Is there somebody who managed to develop an email to Org-mode bridge
>> without having charset problems? (No, I do not use Emacs as a
>> MUA[4])

David Maus <dmaus@ictsoc.de> writes:
> I do use an Emacs-based MUA (Wanderlust) and recently started to file
> reference notes for certain messages. Because in some cases I want to
> keep the message (e.g. an interesting usenet post) I save the raw
> message to a temporary file and add it as an attachment. This works
> nicely, although I'd really like to have an automated way of
> save+attach when capturing.
>
> So, what about org-capture? You could use a script that serializes the
> raw message to disk and -- somehow -- calls capture with a reference
> to this file. Emacs opens the mail file and extracts information
> required for the template using the built-in MIME libraries (mml-*
> IIRC) which are capable of correctly parsing and if necessary decoding
> (e.g. quoted-printable encoded letters in the subject). After
> extracting the information Emacs creates the appropriate capture
> entry, somehow adds the message file as attachment and finishes the
> capture process.

I do something like this, though I also use Emacs as an MUA (Gnus).  

My approach was to write a little parser that binds the various parts of
a message in a pre-determined format to variables that I grab in an Org
capture template, and dynamically scope those variables into a call to
org-capture.  I have included my code below.

But to address your original question:  unfortunately, I do not have a
solution to the encoding problem.  Despite my "pre-determined format",
the emails I got (student paper submissions) were often in strange
encodings that I ended up fixing by hand during the capture process.

However, as David points out, I believe that it would be possible to use
Emacs' MIME libraries to do the dirty work.  I just haven't had time (or
desire, honestly) to figure out how.  If you figure it out, though, I'd
love to hear about it!

My (quite simplistic) code is below.  I hope you find it useful if you
decide to go this route!

Best,
Richard


;;;; simple working example: Gnus side
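;; handling paper submissions
;; Declared special so the `let' in capture-paper-part binds it
;; dynamically even when this file uses lexical-binding.
(defvar paper nil
  "Alist for the paper being captured; bound by `capture-paper-part'.")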
(defun capture-paper-part ()
  "Call parse-paper and capture the result using org-capture"
  (interactive)
  (let ((paper (parse-paper *phil100-paper-header-re*)))
    (if paper
        (org-capture nil "ap") ; "ap" is Org-capture key combo for template shown below
      (message "No paper found in this part"))))

(if (not (boundp 'gnus-mime-action-alist))
    (setq gnus-mime-action-alist '()))     
(add-to-list 'gnus-mime-action-alist '("capture paper" . capture-paper-part))

(defconst *phil100-paper-header-re*
  "^\\(?1:.*\\)\n\\(?2:.*\\)\n\\(?3:[Pp]aper *\\(?31:[0-9]+\\).*\\)")

(defun parse-paper (header-re)
  "Return an alist representing a paper, by parsing the buffer using header-re"
  (save-excursion
    (goto-char (point-min))
    (if (re-search-forward header-re nil t)
	(list `(header . ,(or (match-string 0) ""))
	      `(author . ,(or (match-string 2) ""))
	      `(title . ,(or (match-string 1) ""))
	      `(date . ,(or (match-string 3) ""))
	      `(num . ,(or (match-string 31) ""))
	      `(body . ,(buffer-substring (match-end 0) (point-max))))
      nil)))

(defun paper-property (key)
  "Return the value for KEY in the alist `paper', or an empty string."
  ;; paper is dynamically scoped in by capture-paper-part
  (or (cdr (assoc key paper)) ""))
  
(defun string-replace-downcase (from to in)
  "Replace FROM with TO in string IN, and also downcase the result"
  (with-temp-buffer
    (insert in)
    (goto-char (point-min))
    (while (search-forward from nil t)
      (replace-match to nil t))
    (downcase-region (point-min) (point-max))
    (buffer-substring (point-min) (point-max))))

#### simple working example: Org capture template
** %(paper-property 'author)
   :PROPERTIES:
   :EXPORT_TITLE: %(paper-property 'title)
   :EXPORT_AUTHOR: %(paper-property 'author)
   :EXPORT_DATE: Paper %(paper-property 'num)
   :EXPORT_FILE_NAME: %(concat (paper-property 'num) "-" (string-replace-downcase " " "-" (paper-property 'author)) ".tex")
   :END:

%(paper-property 'body)


* Capture from GNU screen to Org-mode (was: Email -> Org-mode: charset problem)
  2011-06-27 20:04 ` David Maus
  2011-06-28  4:43   ` Richard Lawrence
@ 2011-06-28 17:13   ` Karl Voit
  1 sibling, 0 replies; 4+ messages in thread
From: Karl Voit @ 2011-06-28 17:13 UTC (permalink / raw)
  To: emacs-orgmode

* David Maus <dmaus@ictsoc.de> wrote:
>
> As far as I know a lot, if not most, email messages are transferred in
> 7bit for backward compatibility reasons and getting things other than
> text/plain; charset=us-ascii across the net via email requires a lot
> of things to consider (RFC2045-49 in all their glory). 

Yes. The header is always 7-bit ASCII. The body has to be
interpreted by very advanced algorithms in order to show the various
types of content properly.

> So dumping the raw message won't help, you need functions for
> processing the raw message information.

I came to the very same conclusion. :-( I thought «formail» would be
able to handle charset for me.

> Here's an idea (or two).

Thanks for your ideas!

Those are perfectly good tips for someone who is using a MUA and
Org-mode on the same computer.

But my system is different since I am using «mutt»[1] as MUA
(running in a GNU screen[2] session) on a remote server (accessed by
ssh). 

Org-mode is (only) running on my local machine since the (very slow)
root server is an old Debian stable one (with emacs21 only).

My personal solution for the moment is the following system:

Since I am using GNU screen, which can dump its (non-graphical) screen
content into a file, I wrote a shell script that generates an Org-mode
entry and appends it to the inbox.org, which is synchronized across
all my systems anyway (by cron and unison[3]; dvcs-autosync[4] also
seems to be very handy for building your own Dropbox[5] alternative).

,----[ *one* line from my .screenrc ]
| bind O eval "hardcopy_append off" "hardcopy
|   $HOME/screen-capture.tmp" "screen sh
|   $HOME/bin/add-screensnapshot-to-org.sh"
`----

... this one creates a screenshot when I press «Ctrl-a O» in GNU
screen. Then «$HOME/screen-capture.tmp» gets filled with the screen
content (which is the start of an email or a usenet posting[6] or
similar - whichever is shown currently in GNU screen).

Then the following script generates the (simple) Org-mode entry:

,----[ $HOME/bin/add-screensnapshot-to-org.sh ]
| #!/bin/sh
| ## generates an org-file-entry from a screen capture file from GNU screen
|
| SCREENCONTENTFILE="$HOME/screen-capture.tmp"
| ORGFILE="$HOME/org-mode/inbox.org"
|
| ## append the whole entry with a single redirect:
| {
|   echo "* TODO screen $(date '+%Y-%m-%dT%H:%M')"
|   echo ":PROPERTIES:"
|   echo ":CREATED: <$(date '+%Y-%m-%d %a %H:%M')>"
|   echo ":END:"
|   echo
|   echo "#+BEGIN_VERB"
|   cat "${SCREENCONTENTFILE}"
|   echo "#+END_VERB"
|   echo
| } >> "${ORGFILE}"
|
| #end
`----

Probably this method could be handy for other users of GNU screen
and Org-mode too.
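
One caveat with dumping arbitrary screen content: a captured line that
starts with "*" or "#+" can be mistaken by Org for a headline or a
keyword. Below is a hedged sketch of a filter, in Python rather than
sh. It assumes the dump is wrapped in #+BEGIN_EXAMPLE (the script
above uses #+BEGIN_VERB) and follows Org's comma-escaping convention
for such blocks as I understand it. The names org_escape and
example_block are hypothetical.

```python
import re

# Lines that start with "*" or "#+", optionally indented or already comma-escaped.
_ORG_SPECIAL = re.compile(r"^([ \t]*)(,*(?:\*|#\+))", re.MULTILINE)


def org_escape(text: str) -> str:
    """Prefix a comma to lines Org could mistake for headlines or keywords."""
    return _ORG_SPECIAL.sub(r"\1,\2", text)


def example_block(text: str) -> str:
    """Wrap TEXT in an Org example block, escaping special lines."""
    body = org_escape(text.rstrip("\n"))
    return f"#+BEGIN_EXAMPLE\n{body}\n#+END_EXAMPLE\n"


print(example_block("* Re: hello\nplain line\n#+END_EXAMPLE\n"))
```

Such a filter could take the place of the plain "cat" step, so that a
dumped mail index or usenet posting cannot break the outline of
inbox.org.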

  1. http://www.mutt.org/
  2. https://secure.wikimedia.org/wikipedia/en/wiki/GNU_Screen
  3. http://www.cis.upenn.edu/~bcpierce/unison/
  4. http://www.mayrhofer.eu.org/dvcs-autosync
  5. https://www.dropbox.com/
  6. http://www.slrn.org/
-- 
Karl Voit
