From: stardiviner
To: Matthew Lundin
Subject: Re: [Feature] add a new org-attach dispatcher command to offline save web page
Date: Sat, 30 May 2020 00:32:18 +0800
Message-ID: <87v9keem7h.fsf@gmail.com>
In-reply-to: <87h7vysqlq.fsf@fastmail.fm>
References: <87y2pbtrh7.fsf@localhost> <87h7vysqlq.fsf@fastmail.fm>
User-agent: mu4e 1.4; emacs 28.0.50
Reply-To: numbchild@gmail.com
Cc: Org Mode, Ihor Radchenko
List-Id: "General discussions about Org-mode."

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256


Matthew Lundin writes:

> Ihor Radchenko writes:
>
>>> As I said, PATCH welcome, I admired many times I don't have ability to build a
>>> complex archive functionality on url.el or wget or curl.
>>
>> I have found the following solution [1] using wget:
>>
>> wget --mirror -p --convert-links -P ./LOCAL-DIR WEBSITE-URL
>>
>
> I don't think --mirror is what we want this context, since that will
> initiate a recursive download of the entire site. (Ironically, my IP is
> now banned from a personal blog that provides a how-to for using wget
> after I tried to run the above command on it.)
> From the wget manual:
>
>   -m
>   --mirror
>       Turn on options suitable for mirroring.  This option turns on recursion and
>       time-stamping, sets infinite recursion depth and keeps FTP directory listings.
>       It is currently equivalent to -r -N -l inf --no-remove-listing.
>
> AFAICT, org-board uses the following options, which limit the archiving
> to a single page and all its resources:
>
> wget -e robots=off --page-requisites --adjust-extension --convert-links [...]

Interesting, I learned a new skill.

>
>> This will not bundle the page into a single file, but it is better than
>> nothing. org-attach does not have to attach exactly one file.
>
> You can also create a warc (web archive) file with wget, but then you
> need a web archive replayer to view it, which is not exactly convenient.
>

It looks like you have already gone through most of the technologies in the
awesome-web-archive list. Does that mean most of them are not suitable for Org
Mode for now?

> Best,
>
> Matt


- -- 
[ stardiviner ]
       I try to make every word tell the meaning that I want to express.

       Blog: https://stardiviner.github.io/
       IRC(freenode): stardiviner, Matrix: stardiviner
       GPG: F09F650D7D674819892591401B5DF1C95AE89AC3

-----BEGIN PGP SIGNATURE-----

iQFIBAEBCAAyFiEE8J9lDX1nSBmJJZFAG13xyVromsMFAl7RORIUHG51bWJjaGls
ZEBnbWFpbC5jb20ACgkQG13xyVromsPyxwf5ATSzoz/vxLnlWGJ4jKxH/HtKlleE
LJP3BhLDuRl6nrjJTBYfwCM/SrFbcWQjpzLtoXFRvL6Ro60yhNWxCgYSu74WPF6B
Or3oEQrnQGvQ8SKNcyEOimXbKCUDImNetCAyLpxtbPJ0pHtf6py49p9i/M7ZTZK/
jIF2g0E1AezDwGPG00jpdO1cDOiGYczvRfaaMr2OQ1EBtDrZOn5z+Cb6YiaHqfF5
x5HT5Z8fCqxAFOj9ixzOdqZ9C2hRncxATyiSFH0vAzwDYpYw9A20ZqoevThTVZRI
IUmirJM7pUQ4MM2SXidkLaT5T/Zuy4PjCrM8yfXybbnrSkWrVgjeEpXhYg==
=gsT0
-----END PGP SIGNATURE-----