From: Robert Horn
Subject: Re: How do you store web pages for reference?
Date: Mon, 16 Jan 2017 11:52:59 -0500
To: Karl Voit
Cc: emacs-orgmode@gnu.org

There is also a Firefox plugin, "ScrapBook X", which is a successor to Scrapbook. It can capture a single web page (leaving its links pointing to the outside world), or you can select additional pages to capture along with it, either by link depth or by individual link. (If you have infinite time and storage, with the right links you might attempt to capture the entire Internet. Something like "capture all pages to link depth 1000" comes to mind.)

I use it to capture a variety of things. Each capture is stored in a directory tree of html, css, etc., rooted at a time-date tag for when the capture was performed. I have neither seen nor attempted any integration with org or other tools.
This is feasible in theory, since the file /index.html is a valid starting point for the page and the links have been rewritten as appropriate. Something like "firefox scrapbook-root/20170115205014/index.html" would be a proper reference.

The more the page content consists of active content like javascript, the less likely it is that the capture will save what you want, but that is inherent in active content. It would be nice to capture more metadata (as Zotero does), but ScrapBook X preserves only minimal metadata about the capture.

R Horn
rjhorn@alum.mit.edu
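
As a rough illustration of the index.html reference described above, here is a minimal Python sketch that walks a capture root and prints one Org file link per capture. It is not something ScrapBook X provides. It assumes that every capture directory is named with a 14-digit timestamp like the 20170115205014 in the example, and that each one contains an index.html. The function name org_links and the link description text are made up for the example.

```python
import re
from datetime import datetime
from pathlib import Path

# Assumption: capture directories are named YYYYMMDDhhmmss,
# as in the example path scrapbook-root/20170115205014/.
STAMP = re.compile(r"^\d{14}$")


def org_links(root):
    """Return one Org file link per capture directory found under root."""
    root = Path(root)
    if not root.is_dir():
        return []
    links = []
    for entry in sorted(root.iterdir()):
        index = entry / "index.html"
        # Skip anything that does not look like a capture directory.
        if not (entry.is_dir() and STAMP.match(entry.name) and index.is_file()):
            continue
        try:
            when = datetime.strptime(entry.name, "%Y%m%d%H%M%S")
        except ValueError:
            # 14 digits, but not a valid date and time.
            continue
        links.append(
            f"[[file:{index.as_posix()}][Capture {when:%Y-%m-%d %H:%M:%S}]]"
        )
    return links
```

Calling print("\n".join(org_links("scrapbook-root"))) would give lines that can be pasted into an Org file. Whether the captured page then opens in a browser or in Emacs depends on how file links are configured locally.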