From: Matthew Lundin
To: Ihor Radchenko, stardiviner
Subject: Re: [Feature] add a new org-attach dispatcher command to offline save web page
In-Reply-To: <87y2pbtrh7.fsf@localhost>
Date: Fri, 29 May 2020 10:33:37 -0500
Message-ID: <87h7vysqlq.fsf@fastmail.fm>
MIME-Version: 1.0
Content-Type: text/plain
Cc: Org Mode
List-Id: "General discussions about Org-mode."

Ihor Radchenko writes:

>> As I said, PATCH welcome, I admired many times I don't have ability to build a
>> complex archive functionality on url.el or wget or curl.
>
> I have found the following solution [1] using wget:
>
> wget --mirror -p --convert-links -P ./LOCAL-DIR WEBSITE-URL
>

I don't think --mirror is what we want in this context, since that will
initiate a recursive download of the entire site.
(Ironically, my IP is now banned from a personal blog that provides a
how-to for using wget after I tried to run the above command on it.)

From the wget manual:

    -m
    --mirror
        Turn on options suitable for mirroring. This option turns on
        recursion and time-stamping, sets infinite recursion depth and
        keeps FTP directory listings. It is currently equivalent to
        -r -N -l inf --no-remove-listing.

AFAICT, org-board uses the following options, which limit the archiving
to a single page and all its resources:

    wget -e robots=off --page-requisites --adjust-extension --convert-links [...]

> This will not bundle the page into a single file, but it is better than
> nothing. org-attach does not have to attach exactly one file.

You can also create a WARC (web archive) file with wget, but then you
need a web archive replayer to view it, which is not exactly convenient.

Best,
Matt
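
For illustration, the single-page invocation discussed above could be wrapped roughly as follows. This is only a sketch: the function name archive_page and the DRY_RUN switch are hypothetical and are not part of org-attach or org-board. It uses only the wget flags quoted in this message, plus -P from the earlier command to choose the target directory, and it does not fill in the options elided by "[...]".

```shell
#!/bin/sh
# Sketch only: archive_page and DRY_RUN are hypothetical names, not part of
# org-attach or org-board. The wget flags are the ones quoted in the message.
archive_page() {
    url=$1
    dest=${2:-./LOCAL-DIR}
    # No --mirror here, so wget fetches the page and its requisites
    # (images, CSS, scripts) without recursing through the whole site.
    set -- wget -e robots=off --page-requisites --adjust-extension \
        --convert-links -P "$dest" "$url"
    if [ -n "${DRY_RUN:-}" ]; then
        # Print the command instead of running it.
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

Running it as `DRY_RUN=1 archive_page WEBSITE-URL ./LOCAL-DIR` prints the wget command without touching the network, which is a cheap way to check the flags before pointing it at someone's server.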