emacs-orgmode@gnu.org archives
* html to org-mode
@ 2014-01-04  2:40 John Kitchin
  2014-01-04  4:56 ` Feng Shu
  0 siblings, 1 reply; 5+ messages in thread
From: John Kitchin @ 2014-01-04  2:40 UTC
  To: emacs-orgmode@gnu.org

Hi everyone,

I was playing around with org-rss today, and it is pretty cool. I would
like to customize the way the subheading bodies look, though, primarily to
unescape HTML entities like &lt;, get rid of all the HTML tags, convert
<a ...> to Org mode links, download <img ...> images so they can be
displayed, etc.

For example, the body of an RSS entry looks like this:

    <title>Philip Herron: Cython Book</title>
    <guid>http://redbrain.co.uk/?p=147</guid>
    <link>http://redbrain.co.uk/cython-book/</link>
    <description><p>Hey all i thought i should really share that i
    actually wrote a book on Cython. The book has detailed examples and
    even shows you how you can extend native C/C++ applications in python
    by doing it for Tmux. <a href="http://bit.ly/195ahQs">http://bit.ly/195ahQs</a></p>
    <p><a href="http://redbrain.co.uk/wp-content/uploads/2013/12/photo.jpg"><img
    class="aligncenter size-full wp-image-148" alt="photo"
    src="http://redbrain.co.uk/wp-content/uploads/2013/12/photo.jpg"
    width="640" height="480" /></a>The code can be found:
    <a href="https://github.com/redbrain/cython-book">https://github.com/redbrain/cython-book</a></p></description>
    <pubDate>Tue, 10 Dec 2013 14:45:08 +0000</pubDate>

I would like this simplified to something like:
Philip Herron: Cython Book

http://redbrain.co.uk/?p=147

http://redbrain.co.uk/cython-book/
Hey all i thought i should really share that i actually wrote a book on
Cython. The book has detailed examples and even shows you how you can
extend native C/C++ applications in python by doing it for Tmux.
http://bit.ly/195ahQs

[[feed-images/photo.jpg]]

The code can be found: https://github.com/redbrain/cython-book
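
The [[feed-images/photo.jpg]] line in the desired output implies fetching
each image into a local directory. A minimal sketch, assuming that
directory is feed-images/ relative to the current default-directory;
my-feed-image-file and my-feed-image-link are hypothetical names, there is
no error handling, and an existing file with the same name is overwritten.

#+BEGIN_SRC emacs-lisp
(require 'url-parse)     ; url-generic-parse-url, url-filename
(require 'url-handlers)  ; url-copy-file

;; Hypothetical helper: map an image URL to a local file name.
(defun my-feed-image-file (url &optional dir)
  "Return the local file name under DIR (default \"feed-images\") for URL."
  (let ((name (file-name-nondirectory
               (url-filename (url-generic-parse-url url)))))
    (concat (file-name-as-directory (or dir "feed-images")) name)))

;; Hypothetical helper: download URL and return an Org link to the copy.
(defun my-feed-image-link (url &optional dir)
  "Download URL into DIR and return an Org link to the local copy."
  (let ((file (my-feed-image-file url dir)))
    (make-directory (file-name-directory file) t)
    ;; Third argument t: overwrite an existing file of the same name.
    (url-copy-file url file t)
    (format "[[%s]]" file)))
#+END_SRC

With these definitions, (my-feed-image-link
"http://redbrain.co.uk/wp-content/uploads/2013/12/photo.jpg") saves the
image and returns "[[feed-images/photo.jpg]]", which Org can display
inline.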

Basically, I want to get the HTML as close to Org as is reasonable. I
found a way to get an HTML parse tree, (libxml-parse-html-region start
end), but I can't figure out how to convert that into the text I want.

Has anyone done anything like this?

John

-----------------------------------
John Kitchin
Associate Professor
Doherty Hall A207F
Department of Chemical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
412-268-7803
http://kitchingroup.cheme.cmu.edu

* Re: html to org-mode
  2014-01-04  2:40 html to org-mode John Kitchin
@ 2014-01-04  4:56 ` Feng Shu
  2014-01-04  6:22   ` York Zhao
  0 siblings, 1 reply; 5+ messages in thread
From: Feng Shu @ 2014-01-04  4:56 UTC
  To: emacs-orgmode

John Kitchin <jkitchin@andrew.cmu.edu> writes:

> Hi everyone,
>
> I was playing around with org-rss today, and it is pretty cool. I
> would like to customize the way the subheading bodies look though,
> primarily to unescape some html things like &lt;, to get rid of all
> the html tags, convert <a ..> to org-mode links, to download <img ...>
> so they can be displayed, etc... 
>
> for example a body of an rss entry looks like: 
>
> <title>Philip Herron: Cython Book</title>
> <guid>http://redbrain.co.uk/?p=147</guid>
> <link>http://redbrain.co.uk/cython-book/</link> <description><p>Hey
> all i thought i should really share that i actually wrote a book on
> Cython. The book has detailed examples and even shows you how you can
> extend native C/C++ applications in python by doing it for Tmux. <a
> href="http://bit.ly/195ahQs">http://bit.ly/195ahQs</a></p> <p><a
> href="http://redbrain.co.uk/wp-content/uploads/2013/12/photo.jpg"><img
> class="aligncenter size-full wp-image-148" alt="photo"
> src="http://redbrain.co.uk/wp-content/uploads/2013/12/photo.jpg"
> width="640" height="480" /></a>The code can be found: <a
> href="https://github.com/redbrain/cython-book">https://github.com/redbrain/cython-book</a></p></description>
> <pubDate>Tue, 10 Dec 2013 14:45:08 +0000</pubDate>
>
> I would like this simplified to something like:
> Philip Herron: Cython Book
>
> http://redbrain.co.uk/?p=147
>
> http://redbrain.co.uk/cython-book/
> Hey all i thought i should really share that i actually wrote a book
> on Cython. The book has detailed examples and even shows you how you
> can extend native C/C++ applications in python by doing it for Tmux.
> http://bit.ly/195ahQs
>
> [[feed-images/photo.jpg]]
>
> The code can be found: https://github.com/redbrain/cython-book
>
> basically, get the html code as close to org as reasonable. i found a
> way to get an html parse tree (libxml-parse-html-region start end),
> but I can't figure out how to convert that to the text I want. 
>
> Has anyone done anything like this?
>
> John

Maybe eww can help you...
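
eww renders pages with the shr library, so one way to follow this
suggestion is to let shr turn a feed body into plain text. A minimal
sketch, assuming an Emacs whose shr.el provides shr-render-region (not
every 2014-era release does) and that was built with libxml2;
my-html-to-text is a hypothetical name. This produces rendered text, not
Org markup: links come out as their link text rather than as [[...]]
links.

#+BEGIN_SRC emacs-lisp
(require 'shr)

;; Hypothetical helper: render an HTML string the way eww would.
(defun my-html-to-text (html)
  "Return the HTML string rendered as plain text by shr."
  (with-temp-buffer
    (insert html)
    ;; Parses the region with libxml and replaces it with the rendering.
    (shr-render-region (point-min) (point-max))
    (buffer-substring-no-properties (point-min) (point-max))))
#+END_SRC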

-- 

* Re: html to org-mode
  2014-01-04  4:56 ` Feng Shu
@ 2014-01-04  6:22   ` York Zhao
  2014-01-04 10:54     ` Bastien
  0 siblings, 1 reply; 5+ messages in thread
From: York Zhao @ 2014-01-04  6:22 UTC
  To: Feng Shu; +Cc: emacs-orgmode

Hi John Kitchin,

What is org-rss? I couldn't find it. Can you give a link?

Thanks

* Re: html to org-mode
  2014-01-04  6:22   ` York Zhao
@ 2014-01-04 10:54     ` Bastien
  2014-01-04 13:48       ` John Kitchin
  0 siblings, 1 reply; 5+ messages in thread
From: Bastien @ 2014-01-04 10:54 UTC
  To: York Zhao; +Cc: Feng Shu, emacs-orgmode

York Zhao <gtdplatform@gmail.com> writes:

> What is org-rss, I couldn't find it. Can you give a link?

This is ox-rss.el in contrib/lisp/ if you clone Org or get it
as a .zip/.tar.gz archive.

-- 
 Bastien

* Re: html to org-mode
  2014-01-04 10:54     ` Bastien
@ 2014-01-04 13:48       ` John Kitchin
  0 siblings, 0 replies; 5+ messages in thread
From: John Kitchin @ 2014-01-04 13:48 UTC
  To: Bastien; +Cc: York Zhao, Feng Shu, emacs-orgmode

It was actually org-feed (http://orgmode.org/worg/org-contrib/org-feed.html).

Here is an example that downloads recent Org mode mailing-list entries and
formats them as a list of Org headings. The formatter function does some
character replacements and makes the headings TODO items. I also defined a
little function to speed up deleting headlines I don't want to keep.

#+BEGIN_SRC emacs-lisp
(defun textify (s)
  "Unescape &lt; &gt; &quot; in S and turn <br> tags into newlines."
  (let ((output (or s "")))  ; the entry field may be missing (nil)
    ;; Disabled: replace carriage returns and newlines with spaces.
    ;; (setq output (replace-regexp-in-string "\r\\|\n" " " output))
    (setq output (replace-regexp-in-string "&lt;" "<" output))
    (setq output (replace-regexp-in-string "&gt;" ">" output))
    (setq output (replace-regexp-in-string "&quot;" "\"" output))
    ;; <br>, <br/> and <br /> all become newlines.
    (setq output (replace-regexp-in-string "<br */?>" "\n" output))
    output))

(defun my-formatter (e)
  "Format the org-feed entry plist E as a TODO headline."
  (format "* TODO %s\n%s\n\n%s"
          (textify (plist-get e :title))
          ;; Use the guid when it is a permalink, otherwise the link.
          (or (and (plist-get e :guid-permalink)
                   (plist-get e :guid))
              (plist-get e :link))
          (textify (plist-get e :description))))

(let ((org-feed-alist
       '(("org" "http://rss.gmane.org/messages/complete/gmane.emacs.orgmode"
          "feeds-org.org" "org-mode" :formatter my-formatter))))
  (org-feed-update-all))

;; convenience to delete uninteresting articles
(defun delete-headline ()
  "Delete the subtree at point."
  (interactive)
  (org-mark-subtree)
  ;; With the subtree marked as the active region, this deletes it
  ;; (given the default `delete-active-region').
  (delete-forward-char 1))

(global-set-key (kbd "<f5>") 'delete-headline)
#+END_SRC
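
The character replacements in textify can also be driven from a table,
which makes it easy to add more entities. A minimal sketch; the names
my-html-entities and my-unescape-html are hypothetical. &amp; is decoded
last so that an escaped entity such as &amp;lt; ends up as the literal
text &lt; rather than <, and <br> tags are converted after decoding so
that an escaped &lt;br /&gt; in a feed description also becomes a newline.

#+BEGIN_SRC emacs-lisp
;; Hypothetical table of entities, applied in order.
(defvar my-html-entities
  '(("&lt;" . "<")
    ("&gt;" . ">")
    ("&quot;" . "\"")
    ("&#39;" . "'")
    ("&amp;" . "&"))                    ; must stay last
  "Alist of (ENTITY . TEXT) pairs, applied in order.")

(defun my-unescape-html (s)
  "Decode the entities in `my-html-entities' in S, then turn <br> into newlines."
  (let ((out (or s "")))
    (dolist (pair my-html-entities)
      ;; FIXEDCASE and LITERAL are t: insert TEXT exactly as written.
      (setq out (replace-regexp-in-string
                 (regexp-quote (car pair)) (cdr pair) out t t)))
    ;; <br>, <br/> and <br /> all become newlines.
    (replace-regexp-in-string "<br */?>" "\n" out)))
#+END_SRC

For example, (my-unescape-html "a &lt;b&gt; &amp;lt;<br />c") returns
"a <b> &lt;" followed by a newline and "c".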

John

Thread overview: 5+ messages
2014-01-04  2:40 html to org-mode John Kitchin
2014-01-04  4:56 ` Feng Shu
2014-01-04  6:22   ` York Zhao
2014-01-04 10:54     ` Bastien
2014-01-04 13:48       ` John Kitchin

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs/org-mode.git