From: Ihor Radchenko
To: Matthew Lundin, stardiviner
Subject: Re: [Feature] add a new org-attach dispatcher command to offline save web page
In-Reply-To: <87h7vysqlq.fsf@fastmail.fm>
References: <87h7vysqlq.fsf@fastmail.fm>
Date: Sat, 30 May 2020 14:09:44 +0800
Message-ID: <87tuzykl7b.fsf@localhost>
MIME-Version: 1.0
Content-Type: text/plain
List-Id: "General discussions about Org-mode."
Cc: Org Mode

> AFAICT, org-board uses the following options, which limit the archiving
> to a single page and all its resources:
>
> wget -e robots=off --page-requisites --adjust-extension --convert-links [...]

This is certainly better. I believe that wget would be a better default
(with the right flags). It is much more likely to be installed for the
average user.

> You can also create a warc (web archive) file with wget, but then you
> need a web archive replayer to view it, which is not exactly convenient.

Interesting. I did not know about warc.

Matthew Lundin writes:

> Ihor Radchenko writes:
>
>>> As I said, PATCH welcome, I admitted many times that I don't have the ability to build a
>>> complex archive functionality on url.el or wget or curl.
>>
>> I have found the following solution [1] using wget:
>>
>> wget --mirror -p --convert-links -P ./LOCAL-DIR WEBSITE-URL
>>
>
> I don't think --mirror is what we want in this context, since that will
> initiate a recursive download of the entire site. (Ironically, my IP is
> now banned from a personal blog that provides a how-to for using wget
> after I tried to run the above command on it.) From the wget manual:
>
>     -m
>     --mirror
>         Turn on options suitable for mirroring. This option turns on recursion and
>         time-stamping, sets infinite recursion depth and keeps FTP directory listings.
>         It is currently equivalent to -r -N -l inf --no-remove-listing.
>
> AFAICT, org-board uses the following options, which limit the archiving
> to a single page and all its resources:
>
> wget -e robots=off --page-requisites --adjust-extension --convert-links [...]
>
>> This will not bundle the page into a single file, but it is better than
>> nothing. org-attach does not have to attach exactly one file.
>
> You can also create a warc (web archive) file with wget, but then you
> need a web archive replayer to view it, which is not exactly convenient.
>
> Best,
>
> Matt
>

-- 
Ihor Radchenko, PhD,
Center for Advancing Materials Performance from the Nanoscale (CAMP-nano)
State Key Laboratory for Mechanical Behavior of Materials, Xi'an Jiaotong University, Xi'an, China
Email: yantar92@gmail.com, ihor_radchenko@alumni.sutd.edu.sg
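
For illustration only, the single-page invocation quoted above could be wrapped roughly as follows. This is a minimal sketch, assuming Python 3 and a wget binary on PATH. The names build_wget_command and archive_page are hypothetical and are not part of org-attach or org-board. The flags are the ones quoted in the thread, with -P (directory prefix) borrowed from the --mirror example. Whatever org-board passes in the elided "[...]" is not reproduced.

```python
import shutil
import subprocess

# Flags quoted in the thread for archiving one page plus its resources.
# Note the absence of --mirror, which would recurse over the entire site.
SINGLE_PAGE_FLAGS = [
    "-e", "robots=off",
    "--page-requisites",
    "--adjust-extension",
    "--convert-links",
]


def build_wget_command(url, dest_dir):
    """Return the wget argument list for saving one page into dest_dir.

    Hypothetical helper: -P sets the download directory, as in the
    --mirror example from the thread.
    """
    return ["wget", *SINGLE_PAGE_FLAGS, "-P", dest_dir, url]


def archive_page(url, dest_dir):
    """Run wget if it is installed and return its exit status."""
    if shutil.which("wget") is None:
        raise RuntimeError("wget not found on PATH")
    # check=False: wget can exit non-zero even though the main page was
    # saved, for example when one page requisite returns an error.
    return subprocess.run(build_wget_command(url, dest_dir), check=False).returncode


if __name__ == "__main__":
    # Only print the command here; call archive_page() to actually download.
    print(" ".join(build_wget_command("https://example.org/", "./attach")))
```

Building the argument list separately from running it keeps the flag choice easy to inspect and change, and since the result is usually several files rather than one, the destination directory as a whole is what would be attached.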