emacs-orgmode@gnu.org archives
* html to org-mode
@ 2014-01-04  2:40 John Kitchin
From: John Kitchin @ 2014-01-04  2:40 UTC
  To: emacs-orgmode@gnu.org

Hi everyone,

I was playing around with org-rss today, and it is pretty cool. I would
like to customize how the subheading bodies look, though: primarily to
unescape HTML entities such as &lt;, strip all the HTML tags, convert
<a ...> to org-mode links, download <img ...> targets so they can be
displayed, and so on.

For example, the body of an RSS entry looks like:

     <title>Philip Herron: Cython Book</title>
     <guid>http://redbrain.co.uk/?p=147</guid>
     <link>http://redbrain.co.uk/cython-book/</link>
     <description><p>Hey all i thought i should really share that i actually
     wrote a book on Cython. The book has detailed examples and even shows you
     how you can extend native C/C++ applications in python by doing it for
     Tmux. <a href="http://bit.ly/195ahQs">http://bit.ly/195ahQs</a></p>
     <p><a href="http://redbrain.co.uk/wp-content/uploads/2013/12/photo.jpg"><img
     class="aligncenter size-full wp-image-148" alt="photo"
     src="http://redbrain.co.uk/wp-content/uploads/2013/12/photo.jpg"
     width="640" height="480" /></a>The code can be found: <a
     href="https://github.com/redbrain/cython-book">https://github.com/redbrain/cython-book</a></p></description>
     <pubDate>Tue, 10 Dec 2013 14:45:08 +0000</pubDate>

I would like this simplified to something like:
Philip Herron: Cython Book

http://redbrain.co.uk/?p=147

http://redbrain.co.uk/cython-book/
Hey all i thought i should really share that i actually wrote a book on
Cython. The book has detailed examples and even shows you how you can
extend native C/C++ applications in python by doing it for Tmux.
http://bit.ly/195ahQs

[[feed-images/photo.jpg]]

The code can be found: https://github.com/redbrain/cython-book

Basically, I want to get the HTML as close to org as is reasonable. I found
a way to get an HTML parse tree (libxml-parse-html-region start end), but I
can't figure out how to convert that to the text I want.
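
The kind of conversion meant here could probably be done with a recursive
walk over that tree. What follows is only a rough sketch under several
assumptions, not a tested solution: the names my/html-node-to-org and
my/html-string-to-org are made up for illustration, Emacs is assumed to be
built with libxml2, and images are assumed to have been saved under
feed-images/ by some other code (the download step is not shown). Each node
in the tree is either a string or a list of the form
(TAG ATTRIBUTES . CHILDREN). Entities such as &lt; should already be decoded
by the parser, so the walk only has to deal with tags.

```emacs-lisp
(require 'subr-x)                       ; string-trim, string-empty-p

(defun my/html-node-to-org (node)
  "Return Org-style text for NODE.
NODE is a string or a list (TAG ATTRIBUTES . CHILDREN), the shape
returned by `libxml-parse-html-region'.  Hypothetical helper."
  (cond
   ((stringp node) node)                ; text node: keep as is
   ((not (consp node)) "")              ; nil or anything unexpected
   ((memq (car node) '(comment script style)) "") ; drop non-content nodes
   (t
    (let* ((tag (car node))
           (attrs (cadr node))
           (children (cddr node))
           ;; Convert the children first, then wrap the result per tag.
           (inner (mapconcat #'my/html-node-to-org children "")))
      (cond
       ((eq tag 'img)
        ;; Assumption: the image already exists under feed-images/;
        ;; only a link to that local copy is produced here.
        (let ((src (string-trim (or (cdr (assq 'src attrs)) ""))))
          (if (string-empty-p src)
              ""
            (format "\n\n[[feed-images/%s]]\n\n"
                    (car (last (split-string src "/" t)))))))
       ((eq tag 'a)
        (let ((href (string-trim (or (cdr (assq 'href attrs)) "")))
              (text (string-trim inner)))
          (cond
           ((assq 'img children) inner)  ; link wrapping an image
           ((string-empty-p href) inner) ; anchor without a target
           ;; Link text is empty or repeats the URL: emit the bare URL.
           ((or (string-empty-p text) (string= text href)) href)
           (t (format "[[%s][%s]]" href text)))))
       ((eq tag 'p) (concat (string-trim inner) "\n\n"))
       ((eq tag 'br) "\n")
       (t inner))))))                   ; any other tag: keep only its text

(defun my/html-string-to-org (html)
  "Parse the string HTML with libxml and return Org-style text.
Hypothetical wrapper; needs an Emacs built with libxml2."
  (with-temp-buffer
    (insert html)
    (string-trim
     (my/html-node-to-org
      (libxml-parse-html-region (point-min) (point-max))))))
```

Calling (my/html-string-to-org "<p>See <a href=\"http://x.org\">docs</a></p>")
would be the intended entry point. Tags that are not handled explicitly fall
through to the last branch, so their text is kept and the markup is dropped.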

Has anyone done anything like this?

John

-----------------------------------
John Kitchin
Associate Professor
Doherty Hall A207F
Department of Chemical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
412-268-7803
http://kitchingroup.cheme.cmu.edu
