I've had this post flagged since it first appeared and I've been intending to read it and comment on it fully. Since I appear not to be getting around to it, I thought I'd make a brief post for now. I haven't read your proposal fully, but I've implemented some code that might be relevant, or at least a close cousin.

I use my cell phone's voice memo capability as one of my GTD inbox collection points. I record brief voice memos when something occurs to me, and they are saved to my phone's microSD card. Later I export the voice memos to a special "in" folder on my laptop. Then I review the memos, refile them next to whatever org-mode file they apply to, create any new headlines/TODOs if appropriate, and transcribe them into special notes on the headline.

To make all this fast and efficient, I've written a library called transcribe.el, which I've attached. I start by populating the bongo playlist buffer with all the memo files from my in folder (i f "~/in/phone/sounds/*.qcp") and playing the first file. The transcribe.el library provides a global key binding for a command that moves the currently playing file; once I've heard enough of a memo, I use that keystroke to refile it to an appropriate location. As a result, the contents of the "in" folder always represent the memos that have not yet been processed, even if I'm interrupted.

Next I create any appropriate headlines/TODOs for the memo. Then I use the org-add-transcription command, bound to "C-c v z", to add a special kind of note to the headline/TODO. The note is pre-populated with a link to the memo file and, if supported (see below), a timestamp for when the memo was taken. I transcribe the memo, seeking around in it as necessary with the global key binding for bongo-seek, "C-c v s". When I'm done, I save the note and use bongo-seek again to advance to the next memo. I repeat this "move, add headlines, transcribe note" cycle until I'm done.

With this approach I can process my voice memos while moving freely around my org-mode buffers as appropriate, without ever having to switch to a bongo buffer, doing everything from key bindings. The only context switches I have to make are the ones directly related to the contexts of the voice memos themselves. I find it works quite well for me.

The memos are *.qcp files in Qualcomm's PureVoice format. The transcribe.el library includes a bongo backend that plays the PureVoice files using Qualcomm's pvconv converter:

  http://www.qctconnect.com/products/purevoice_downloads.html

The backend converts the files to *.wav files next to the original *.qcp files and plays the *.wav files. The pvconv converter is pretty fast, but even so, long *.qcp recordings can take a couple of seconds to convert before bongo can start playing the file. If someone can work out how to convert the *.qcp files asynchronously, so that bongo can start playing the *.wav before pvconv has finished, that would be great. Because the *.qcp files are so much smaller than the converted *.wav files, the backend deletes the *.wav file once it stops playing it.

Phones that record Qualcomm PureVoice memos embed a timestamp in the memo's filename, and transcribe.el can currently extract that timestamp for use in the transcription note. I'd be interested in contributions for extracting timestamps from voice memos that store them differently.

I'd like to hear any thoughts on this, on whether it can or should be integrated with your concept of a conversation manager, or even kept independent of it. I also hope to review your proposal more thoroughly in the near future.

Ross
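P.S. To make a few of these pieces more concrete, here are some rough, untested sketches. None of this is the actual transcribe.el code (that's in the attachment), and every "my-" name and key binding below is made up purely for illustration. First, the refiling step: the real command acts on whatever file bongo is currently playing, while this sketch just takes a file and a target directory.

  (defun my-refile-memo (file dir)
    "Move FILE into DIR, keeping its basename, and return the new path."
    (interactive "fMemo file: \nDRefile into: ")
    (let ((target (expand-file-name (file-name-nondirectory file) dir)))
      ;; Third argument 1 means: ask before clobbering an existing file.
      (rename-file file target 1)
      target))

  ;; A global binding in the spirit of transcribe.el's move command
  ;; (the key transcribe.el actually uses may well differ):
  (global-set-key (kbd "C-c v m") #'my-refile-memo)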
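The timestamp extraction depends entirely on how the phone names its files. Purely for illustration, this sketch assumes a made-up "YYYYMMDD_HHMMSS.qcp" naming scheme and turns it into an org-style inactive timestamp; the regexp would need adjusting for whatever your phone actually does.

  (defun my-memo-timestamp (file)
    "Return an org timestamp string parsed from FILE's name, or nil."
    (let ((name (file-name-nondirectory file)))
      (when (string-match
             (concat "\\`\\([0-9]\\{4\\}\\)\\([0-9]\\{2\\}\\)\\([0-9]\\{2\\}\\)"
                     "_\\([0-9]\\{2\\}\\)\\([0-9]\\{2\\}\\)\\([0-9]\\{2\\}\\)")
             name)
        (format-time-string
         "[%Y-%m-%d %a %H:%M]"
         (encode-time (string-to-number (match-string 6 name))    ; seconds
                      (string-to-number (match-string 5 name))    ; minutes
                      (string-to-number (match-string 4 name))    ; hours
                      (string-to-number (match-string 3 name))    ; day
                      (string-to-number (match-string 2 name))    ; month
                      (string-to-number (match-string 1 name)))))))  ; year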
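The real org-add-transcription adds a proper note to the headline; what follows is only the "pre-populate it with a link and a timestamp" part, reusing the hypothetical helper above.

  (defun my-insert-memo-note (file)
    "Insert at point a link to FILE plus the memo's timestamp, if known."
    (interactive "fMemo file: ")
    (insert (format "- [[file:%s][%s]] %s\n  "
                    (expand-file-name file)
                    (file-name-nondirectory file)
                    (or (my-memo-timestamp file) ""))))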
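Finally, the convert-and-clean-up dance the backend does, boiled down to one synchronous function. It assumes pvconv is on your PATH and that running it on a *.qcp file writes a *.wav next to it (check your copy's usage), and it plays the file with Emacs' built-in play-sound-file rather than through bongo, so it only illustrates the shape of the thing. An asynchronous version would presumably use start-process with a sentinel, though as I said above, what I'd really like is playback that starts before the conversion finishes.

  (defun my-play-qcp (file)
    "Convert FILE with pvconv, play the resulting wav, then delete the wav."
    (interactive "fQCP file: ")
    (let* ((qcp (expand-file-name file))
           (wav (concat (file-name-sans-extension qcp) ".wav")))
      (unless (eq 0 (call-process "pvconv" nil nil nil qcp))
        (error "pvconv failed on %s" qcp))
      (unwind-protect
          ;; Needs an Emacs built with sound support; blocks until playback ends.
          (play-sound-file wav)
        ;; The wav is only a temporary artifact, so delete it again.
        (when (file-exists-p wav)
          (delete-file wav)))))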