From: Ihor Radchenko
To: stardiviner, Matthew Lundin
Subject: Re: [Feature] add a new org-attach dispatcher command to offline save web page
In-Reply-To: <87y2pbsdj0.fsf@gmail.com>
References: <87sgflu2gw.fsf@gmail.com> <87r1v4wyy4.fsf@fastmail.fm> <87r1v4bodg.fsf@localhost> <874ks0vxpk.fsf@fastmail.fm> <87y2pbsdj0.fsf@gmail.com>
Date: Fri, 29 May 2020 10:17:08 +0800
Message-ID: <87y2pbtrh7.fsf@localhost>
MIME-Version: 1.0
Content-Type: text/plain
List-Id: "General discussions about Org-mode."
Cc: Org Mode

> As I said, PATCH welcome. I have admitted many times that I don't have the
> ability to build a complex archive functionality on url.el or wget or curl.

I have found the following solution [1] using wget:

wget --mirror -p --convert-links -P ./LOCAL-DIR WEBSITE-URL

This will not bundle the page into a single file, but it is better than
nothing. org-attach does not have to attach exactly one file.

[1] https://askubuntu.com/questions/411540/how-to-get-wget-to-download-exact-same-web-page-html-as-browser

Best,
Ihor

stardiviner writes:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
>
> Matthew Lundin writes:
>
>> Ihor Radchenko writes:
>>
>>> It does not mean that attaching URL directly is not worth including
>>> into org. This sounds like a pretty common use case, especially considering
>>> the number of packages providing a similar feature. You mentioned
>>> org-board, but there is also org-download and org-web-tools.
>>
>> My uneasiness has more to do with the specificity of the dependence on
>> monolith and the way that is hard-coded into the patch. When it comes to
>> patches, I think priority should go to those that are configurable,
>> accessible, and useful for everyone as opposed to those that have
>> hard-coded work-flows or highly-specific user configurations.
>
> I don't think it's hardcoded. As I said in my first email, it's a `defcustom',
> isn't it? The user can totally change that. And I used `executable-find'; the
> value will be `nil' when monolith is not available. And I don't have the
> ability to achieve the same functionality with wget or curl. And as I said
> before, PATCH welcome.
>
>>
>>> I agree that monolith is a completely uncommon tool and I would not expect
>>> the majority of users to have it installed, but the same functionality
>>> utilising built-in url.el (as a default) should be acceptable.
>>
>> The question is: which functionality? A simple downloading tool or a
>> full archival tool? Achieving similar functionality to org-board or
>> monolith would be a big task, since they aim to download an archival
>> version of a webpage (including all resources). In addition, with
>> archiving you also quickly run into the complexity of versioning based
>> on time archived. There's also the challenge of mapping the downloaded
>> files to metadata (specifically the original url). Org-board currently
>> handles both of these very well.
>
> If you mean adding a timestamp and the original URL, I don't think that is the
> point of our argument, because the user can totally do it manually, or automate
> this task in some way. WDYT?
>
>>
>> I suppose there would be a few options depending on what the aims are:
>>
>> 1. At the simple end, include little more than a quick and dirty
>>    way of downloading a single resource (html, pdf, jpeg) using url.el
>>    or wget (or optionally, monolith) and putting that in the attachment
>>    folder.
>>    Those who want full archiving of all resources could use
>>    other tools like org-board or org-web-tools.
>
> The [u] url downloading method already does the simple task.
>
>>
>> 2. At the (much) more complex end, it would be to code out a robust
>>    archiving solution on top of url.el or wget.
>
> As I said, PATCH welcome. I have admitted many times that I don't have the
> ability to build a complex archive functionality on url.el or wget or curl.
>
> Isn't Org Mode built up from small things? Can it just become a big tool at once?
>
>>
>> 3. Another, possibly simpler option... Add a command to the dispatcher
>>    that allows the user to invoke a custom function that is called with
>>    the attachment directory as the default-directory. This would enable
>>    more end-user flexibility, such as the ability to use
>>    wkhtmltoimage/wkhtmltopdf, monolith, phantom.js, archive.is, etc.
>>
>
> This is left to the `defcustom' option. I will improve the download function
> to make it flexible enough.
>
> - --
> [ stardiviner ]
> I try to make every word tell the meaning that I want to express.
>
> Blog: https://stardiviner.github.io/
> IRC(freenode): stardiviner, Matrix: stardiviner
> GPG: F09F650D7D674819892591401B5DF1C95AE89AC3
>
> -----BEGIN PGP SIGNATURE-----
>
> iQFIBAEBCAAyFiEE8J9lDX1nSBmJJZFAG13xyVromsMFAl7QbYMUHG51bWJjaGls
> ZEBnbWFpbC5jb20ACgkQG13xyVromsOFvAgAjTPtt5ZikNZnkp1gfuEGunVzblk6
> D6O+qgZAamT0EpR2nwK9U+x470KZAhXiLURkVb0l9/6uvA6MQZtj+umX/Qer/6bw
> vB4RICWenFKyyBMy9U1pZ9WzTNw2ZNfeaEiJ0XlgXx9ZPGjBgcLi/7FSM4NybV4O
> qbGwHbp4AzZ4mJOrMchNwlQfDYpg381xDewnHeuwtkVeBc34HpuFEGYKdmtOJ16S
> mnL/5LrbI7MahlIZuiqyAzeyU3DkSvFOC/ovvch9YlAk1SeK/BslUcaGrwl/Il70
> XG/28XaAvVW2OYrO9vOmwpdoguQqARhKeMuof6p85waf1OF8itd97P95Lw==
> =V/fu
> -----END PGP SIGNATURE-----

--
Ihor Radchenko,
PhD,
Center for Advancing Materials Performance from the Nanoscale (CAMP-nano)
State Key Laboratory for Mechanical Behavior of Materials, Xi'an Jiaotong University, Xi'an, China
Email: yantar92@gmail.com, ihor_radchenko@alumni.sutd.edu.sg