From: Adam Porter
Subject: Re: org-board -- bookmarking and archival
Date: Thu, 15 Sep 2016 12:07:33 -0500
Message-ID: <87oa3pdrca.fsf@alphapapa.net>
To: emacs-orgmode@gnu.org

Hi Charles,

Thanks for sharing that, I will check it out. As was mentioned, it seems ripe for integrating with browser capture. On that note, have you seen org-protocol-capture-html? For articles that are primarily text, I've been capturing articles directly in Org format, but your package sounds good for capturing pages as-is.

By the way, you might want to consider integrating something like Readability or the Python package python-readability (aka readability-lxml) for reducing web pages to the primary content. It's worked out well in org-protocol-capture-html.

Also, here's some code I've been using to read and/or capture web pages from URLs on the clipboard:

#+BEGIN_SRC elisp
(require 's)  ; provides `s-trim'

(defun url-to-org-with-readability (url)
  "Get page content of URL with python-readability, convert to Org with Pandoc, and display in buffer."
  (let (title content new-buffer)
    (with-temp-buffer
      (unless (= 0 (call-process "python" nil '(t t) nil
                                 "-m" "readability.readability" "-u" url))
        (error "Python readability-lxml script failed: %s" (buffer-string)))

      ;; Get title
      (goto-char (point-min))
      (setq title (buffer-substring-no-properties (search-forward "Title:")
                                                  (line-end-position)))

      ;; Newer Pandoc versions replace --no-wrap with --wrap=none.
      (unless (= 0 (call-process-region (point-min) (point-max) "pandoc" t t nil
                                        "--no-wrap" "-f" "html" "-t" "org"))
        (error "Pandoc failed."))
      (setq content (buffer-substring (point-min) (buffer-end 1))))

    ;; Make new buffer
    (setq new-buffer (generate-new-buffer title))
    (with-current-buffer new-buffer
      (insert (concat "* [[" url "][" title "]]\n\n"))
      (insert content)
      (org-mode)
      (goto-char (point-min))
      (org-cycle)
      (switch-to-buffer new-buffer))))

(defun read-url-with-org ()
  "Call `url-to-org-with-readability' on URL in kill ring."
  (interactive)
  (url-to-org-with-readability (car kill-ring)))

(defun org-capture-web-page-with-readability (&optional url)
  "Return string containing entire capture to be inserted in org-capture template."
  (let ((url (or url (car kill-ring)))
        ;; From org-insert-time-stamp
        (timestamp (format-time-string
                    (concat "[" (substring (cdr org-time-stamp-formats) 1 -1) "]")))
        title title-linked content)
    (with-temp-buffer
      (unless (= 0 (call-process "python" nil '(t t) nil
                                 "-m" "readability.readability" "-u" url))
        (error "Python readability-lxml script failed: %s" (buffer-string)))

      ;; Get title
      (goto-char (point-min))
      (setq title (buffer-substring-no-properties (search-forward "Title:")
                                                  (line-end-position)))
      (setq title-linked (concat "[[" url "][" title "]]"))

      ;; Newer Pandoc versions replace --no-wrap with --wrap=none.
      (unless (= 0 (call-process-region (point-min) (point-max) "pandoc" t t nil
                                        "--no-wrap" "-f" "html" "-t" "org"))
        (error "Pandoc failed."))

      ;; Demote page headings in capture buffer to below the
      ;; top-level Org heading and "Article" 2nd-level heading
      (save-excursion
        (goto-char (point-min))
        (while (re-search-forward (rx bol (1+ "*") (1+ space)) nil t)
          (beginning-of-line)
          (insert "**")
          (end-of-line)))

      ;; Skip the first line, then grab the rest as the content
      (goto-char (point-min))
      (forward-line 1)
      (setq content (s-trim (buffer-substring (point) (buffer-end 1))))

      ;; Return capture for insertion
      (concat title-linked " :website:\n\n" timestamp "\n\n** Article\n\n" content))))

;; org-capture template
("wr" "Capture Web site with python-readability" entry
 (file "~/org/articles.org")
 "* %(org-capture-web-page-with-readability)")
#+END_SRC
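
The template entry above is a bare list, so it only takes effect once it is part of `org-capture-templates'. Here is a minimal sketch of one way to register it. The "w" prefix entry and its "Web" description are assumptions on my part (Org expects a prefix entry for two-letter template keys); adjust or drop it if your configuration already defines one.

#+BEGIN_SRC elisp
(with-eval-after-load 'org-capture
  ;; Assumed prefix entry so the two-letter key "wr" is selectable.
  (add-to-list 'org-capture-templates
               '("w" "Web")
               t)
  ;; The template from above, appended after the prefix entry.
  (add-to-list 'org-capture-templates
               '("wr" "Capture Web site with python-readability" entry
                 (file "~/org/articles.org")
                 "* %(org-capture-web-page-with-readability)")
               t))
#+END_SRC

With that in place, copying a URL and then running `org-capture' with the keys "w r" should file the article under ~/org/articles.org.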