emacs-orgmode@gnu.org archives
* [Feature] add a new org-attach dispatcher command to offline save web page
@ 2020-05-27  9:55 stardiviner
  2020-05-28  2:55 ` Matthew Lundin
                   ` (3 more replies)
  0 siblings, 4 replies; 29+ messages in thread
From: stardiviner @ 2020-05-27  9:55 UTC (permalink / raw)
  To: Org Mode


[-- Attachment #1.1: Type: text/plain, Size: 553 bytes --]


I have attached the patch.

I think this feature will be helpful for users who, like me, regularly archive
web pages. To make it more portable, I also added a defcustom option so that
another external command can be used, and an `if' condition that checks whether
the external command is available and warns the user otherwise.

-- 
[ stardiviner ]
       I try to make every word tell the meaning that I want to express.

       Blog: https://stardiviner.github.io/
       IRC(freenode): stardiviner, Matrix: stardiviner
       GPG: F09F650D7D674819892591401B5DF1C95AE89AC3
      

[-- Attachment #1.2: 0001-org-attach.el-add-a-new-command-to-offline-save-web-.patch --]
[-- Type: text/x-patch, Size: 3139 bytes --]

From fcf78b5a5447ae81479e7839df001da659ede5e3 Mon Sep 17 00:00:00 2001
From: stardiviner <numbchild@gmail.com>
Date: Wed, 27 May 2020 17:42:24 +0800
Subject: [PATCH] * org-attach.el: add a new command to offline save web page

* lisp/org-attach.el (org-attach-url-offline): A new org-attach
dispatcher command that saves a web page offline as a single file.

* lisp/org-attach.el (org-attach-url-offline-command): A customization
option specifying the external command used to save a web page offline.

* lisp/org-attach.el (org-attach-offline-url-with-monolith): A function
that invokes the external command "monolith" to save a web page offline.
---
 lisp/org-attach.el | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/lisp/org-attach.el b/lisp/org-attach.el
index 57d1360fc..0d13e0da1 100644
--- a/lisp/org-attach.el
+++ b/lisp/org-attach.el
@@ -204,6 +204,8 @@ (defcustom org-attach-commands
      "Attach a file using symbolic-link method.")
     ((?u ?\C-u) org-attach-url
      "Attach a file from URL (downloading it).")
+    ((?U) org-attach-url-offline
+     "Attach an offline version of web page URL (use monolith).")
     ((?b) org-attach-buffer
      "Select a buffer and attach its contents to the task.")
     ((?n ?\C-n) org-attach-new
@@ -467,6 +469,27 @@ (defun org-attach-url (url)
   (let ((org-attach-method 'url))
     (org-attach-attach url)))
 
+(defun org-attach-url-offline (url)
+  (interactive "MURL of the web page be offline to attach: \n")
+  (let ((org-attach-method 'offline))
+    (org-attach-attach url)))
+
+(defcustom org-attach-url-offline-command (executable-find "monolith")
+  "The command used to offline save web page."
+  :type 'string
+  :safe #'stringp
+  :group 'org-attach)
+
+(defun org-attach-offline-url-with-monolith (url target-file)
+  "Save an offline archive of web page with monolith."
+  (if org-attach-url-offline-command
+      (make-process
+       :name "org-attach-offline"
+       :command (list org-attach-url-offline-command url "-o" target-file)
+       :sentinel (lambda (proc event) (message "org-attach-offline finished!"))
+       :buffer "*org-attach-offline*")
+    (warn "You must have a command available to save web pages offline!\n Set variable `org-attach-url-offline-command'.")))
+
 (defun org-attach-buffer (buffer-name)
   "Attach BUFFER-NAME's contents to current outline node.
 BUFFER-NAME is a string.  Signals a `file-already-exists' error
@@ -504,7 +527,8 @@ (defun org-attach-attach (file &optional visit-dir method)
        ((eq method 'cp) (copy-file file attach-file))
        ((eq method 'ln) (add-name-to-file file attach-file))
        ((eq method 'lns) (make-symbolic-link file attach-file))
-       ((eq method 'url) (url-copy-file file attach-file)))
+       ((eq method 'url) (url-copy-file file attach-file))
+       ((eq method 'offline) (org-attach-offline-url-with-monolith file attach-file)))
       (run-hook-with-args 'org-attach-after-change-hook attach-dir)
       (org-attach-tag)
       (cond ((eq org-attach-store-link-p 'attached)
-- 
2.26.2


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 515 bytes --]

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [Feature] add a new org-attach dispatcher command to offline save web page
  2020-05-27  9:55 [Feature] add a new org-attach dispatcher command to offline save web page stardiviner
@ 2020-05-28  2:55 ` Matthew Lundin
  2020-05-28  5:49   ` Ihor Radchenko
                     ` (2 more replies)
  2020-05-28 22:24 ` Samuel Wales
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 29+ messages in thread
From: Matthew Lundin @ 2020-05-28  2:55 UTC (permalink / raw)
  To: numbchild, Org Mode

stardiviner <numbchild@gmail.com> writes:

> I attached the patch.
>
> I think this feature will be helpful for use who archive web page data usually
> like me. To be more portable, I also added an defcustom option for other
> external command. And use an if condition to detect whether external command
> available, else warning user.

This looks helpful, but I think this should be an add-on rather than
integrated into the org-attach. I'm glad to learn about monolith, but
far fewer users are likely to have it installed than have wget or curl
installed. You might also want to check out org-board, which has similar
functionality:

https://github.com/scallywag/org-board

Best,

Matt



* Re: [Feature] add a new org-attach dispatcher command to offline save web page
  2020-05-28  2:55 ` Matthew Lundin
@ 2020-05-28  5:49   ` Ihor Radchenko
  2020-05-28  6:39     ` stardiviner
  2020-05-28 16:19     ` Matthew Lundin
  2020-05-28  6:37   ` stardiviner
  2020-05-28  6:40   ` stardiviner
  2 siblings, 2 replies; 29+ messages in thread
From: Ihor Radchenko @ 2020-05-28  5:49 UTC (permalink / raw)
  To: Matthew Lundin, numbchild, Org Mode

> This looks helpful, but I think this should be an add-on rather than
> integrated into the org-attach. I'm glad to learn about monolith, but
> far fewer users are likely to have it installed than have wget or curl
> installed. You might also want to check out org-board, which has similar
> functionality:

That does not mean that attaching a URL directly is not worth including in
Org. This sounds like a pretty common use case, especially considering the
number of packages providing a similar feature. You mentioned org-board, but
there is also org-download and org-web-tools.

I agree that monolith is an uncommon tool and I would not expect
the majority of users to have it installed, but the same functionality
utilising the built-in url.el (as a default) should be acceptable.

Best,
Ihor


-- 
Ihor Radchenko,
PhD,
Center for Advancing Materials Performance from the Nanoscale (CAMP-nano)
State Key Laboratory for Mechanical Behavior of Materials, Xi'an Jiaotong University, Xi'an, China
Email: yantar92@gmail.com, ihor_radchenko@alumni.sutd.edu.sg



* Re: [Feature] add a new org-attach dispatcher command to offline save web page
  2020-05-28  2:55 ` Matthew Lundin
  2020-05-28  5:49   ` Ihor Radchenko
@ 2020-05-28  6:37   ` stardiviner
  2020-05-28  6:40   ` stardiviner
  2 siblings, 0 replies; 29+ messages in thread
From: stardiviner @ 2020-05-28  6:37 UTC (permalink / raw)
  To: Matthew Lundin; +Cc: Org Mode

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256


Matthew Lundin <mdl@imapmail.org> writes:

> stardiviner <numbchild@gmail.com> writes:
>
>> I attached the patch.
>>
>> I think this feature will be helpful for use who archive web page data usually
>> like me. To be more portable, I also added an defcustom option for other
>> external command. And use an if condition to detect whether external command
>> available, else warning user.
>
> This looks helpful.

Yes, it is helpful, especially for archiving modern web pages.

> but I think this should be an add-on rather than integrated into the
> org-attach.

I see it more as an alternative to [C-c C-a u] that saves the whole web page
rather than downloading a single file.

> I'm glad to learn about monolith, but far fewer users are likely to have it
> installed than have wget or curl installed.

This feature is neither enabled by default nor forced on anyone. The defcustom
option tries to find the command; it is just an option. Many Windows users have
neither wget nor curl installed, so that line of reasoning would rule out a lot
of functionality. I searched the Org Mode source code and found files such as
org-feed.el that use wget and curl. Most of the Babel libraries, ob-plantuml.el
for example, also rely on external programs.

> You might also want to check out org-board, which has similar
> functionality:
>
> https://github.com/scallywag/org-board
>

I have used org-board before, but it does not quite fit my needs (this is just
my personal experience). It has some disadvantages: it is complex to operate,
archiving a web page takes too many steps, and it produces many files, which is
bad for backing up the Org directory. wget and curl are also poor at archiving
modern web pages; they are old tools suited to old web pages.

The only disadvantage of my patch is that "monolith" is not built into Emacs or
Org Mode. Org Mode is an open source project, so I agree that it should not
contain anything that is not open source, but I don't think a line of code such
as ~(executable-find "monolith")~ goes against this principle.

Best regards.

- -- 
[ stardiviner ]
       I try to make every word tell the meaning that I want to express.

       Blog: https://stardiviner.github.io/
       IRC(freenode): stardiviner, Matrix: stardiviner
       GPG: F09F650D7D674819892591401B5DF1C95AE89AC3
      
-----BEGIN PGP SIGNATURE-----

iQFIBAEBCAAyFiEE8J9lDX1nSBmJJZFAG13xyVromsMFAl7PXCgUHG51bWJjaGls
ZEBnbWFpbC5jb20ACgkQG13xyVromsNBgQf/TCfGrMuLzGQSx0N1zzLmd2nX+tTj
6CUJd82SWpHqwI4QZX1BNd8JLpp5OKOw8o3LZi9y5UigOIxrh8DG51vUn+sJBvl5
iaBx3ao/K6Ff1ytMFnCcs+d9ohCj9mUTD1ZSt3LYjG1tJI1uqEoPi16smd84KSUQ
Q8QqYauYarotF+JityGBII3RqpY1GlH5xWYFauE/t5KVPaf8tOueYwbm9FehOWuq
ZKDzmuNkDnJfLRcz7ovEMJk0wrk6mwTjIYKhnrZiPDj9vGyR7ZfrRC0yZDjYgwaG
p9vDZkFwTbPeQUrE1BJOo4N/h2HNoNB0b2ojpvgm52PnlvYvQEyWJxVaLg==
=FsnW
-----END PGP SIGNATURE-----



* Re: [Feature] add a new org-attach dispatcher command to offline save web page
  2020-05-28  5:49   ` Ihor Radchenko
@ 2020-05-28  6:39     ` stardiviner
  2020-05-28 14:03       ` Ihor Radchenko
  2020-05-28 16:19     ` Matthew Lundin
  1 sibling, 1 reply; 29+ messages in thread
From: stardiviner @ 2020-05-28  6:39 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: Matthew Lundin, Org Mode

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256


Ihor Radchenko <yantar92@gmail.com> writes:

>> This looks helpful, but I think this should be an add-on rather than
>> integrated into the org-attach. I'm glad to learn about monolith, but
>> far fewer users are likely to have it installed than have wget or curl
>> installed. You might also want to check out org-board, which has similar
>> functionality:
>
> It does not mean that attaching URL directly is not worth including into
> org. This sounds pretty common use case, especially considering the number
> of packages providing similar feature. You mentioned org-board, but
> there is also org-download and org-web-tools.

That's right. The number of such packages shows that there is an obvious need,
and Org Mode should have this functionality.

>
> I agree that monolith is completely uncommon tool and I would not expect
> the majority of users to have it installed, but the same functionality
> utilising built-in url.el (as a default) should be acceptable.

If there is a better solution, I am all for it.


- -- 
[ stardiviner ]
       I try to make every word tell the meaning that I want to express.

       Blog: https://stardiviner.github.io/
       IRC(freenode): stardiviner, Matrix: stardiviner
       GPG: F09F650D7D674819892591401B5DF1C95AE89AC3
      
-----BEGIN PGP SIGNATURE-----

iQFIBAEBCAAyFiEE8J9lDX1nSBmJJZFAG13xyVromsMFAl7PXKoUHG51bWJjaGls
ZEBnbWFpbC5jb20ACgkQG13xyVromsOiMQf9EY7ES42hWxdjgFqiN0UqPGssRd+l
fcvhb+yiT5/71TYA8YbQOF0Pa2SpUBgGUw+hcHUd9aDXxAP+4ucmP4bZ/vx29UTN
T7YpOvuc5ZkafiN5bWghij1dLYhfi2CMDGGduU9rvjSleaRMhoX12wo8OM2qBHDm
2bqWHrZsGVPsNlAK3ymN3PdmwOtxktRVUxWZOug3ZF6LYnEHWRp/rb6lvS2BF3Ej
qNazerYVcpfDsOiCWTSrjos2gnITT2pSlnxmrQFO69Pvex+T2dEJL7raNyuXi3Qd
jn5uayrnHn4ZeUwSfmAQiqvq2TsFZ5zvtH0nKM+4tp/6gSgEuTw9gYaoBQ==
=qrFy
-----END PGP SIGNATURE-----



* Re: [Feature] add a new org-attach dispatcher command to offline save web page
  2020-05-28  6:39     ` stardiviner
@ 2020-05-28 14:03       ` Ihor Radchenko
  2020-05-28 16:00         ` stardiviner
  0 siblings, 1 reply; 29+ messages in thread
From: Ihor Radchenko @ 2020-05-28 14:03 UTC (permalink / raw)
  To: numbchild; +Cc: Matthew Lundin, Org Mode

> If have better solution, I totally agree with that.

In its current state, your patch will be pretty much useless for users
without a technical background.

url-retrieve-synchronously + save-buffer should be a better default. It
will be available for all users.

Ideally, there should be several customisation options. Something like:
 - url-retrieve-synchronously
 - curl
 - wget
 - monolith

Also, it would be great if you provided a description of each
customisation in the docstring. For example, a link to monolith's GitHub
page and a brief description of how it differs from other URL
retrieval methods would be useful for the general user.
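
The customisation idea above could look roughly like the sketch below. It is
only an illustration: `my/org-attach-url-retriever' and its group are
hypothetical names that do not exist in Org, and the docstring wording is an
assumption about what such a description might say.

```elisp
;; Hypothetical sketch; none of these names exist in Org.
(defgroup my/org-attach-offline nil
  "Illustrative group for the retriever option sketched below."
  :group 'applications)

(defcustom my/org-attach-url-retriever 'url
  "How to fetch a URL before attaching it.

The value is one of the following symbols:

`url'       Emacs' built-in url.el.  Always available.
`curl'      The external curl program.
`wget'      The external wget program.
`monolith'  The external monolith program, which bundles a page
            and the resources it needs into a single HTML file.

The first three fetch only the resource the URL points to."
  ;; A `choice' of constants gives the user a menu in the Customize UI.
  :type '(choice (const :tag "Built-in url.el" url)
                 (const :tag "curl" curl)
                 (const :tag "wget" wget)
                 (const :tag "monolith" monolith))
  :group 'my/org-attach-offline)
```

Defaulting to `url' keeps the option usable without any external program
installed.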

Best,
Ihor


-- 
Ihor Radchenko,
PhD,
Center for Advancing Materials Performance from the Nanoscale (CAMP-nano)
State Key Laboratory for Mechanical Behavior of Materials, Xi'an Jiaotong University, Xi'an, China
Email: yantar92@gmail.com, ihor_radchenko@alumni.sutd.edu.sg



* Re: [Feature] add a new org-attach dispatcher command to offline save web page
  2020-05-28 14:03       ` Ihor Radchenko
@ 2020-05-28 16:00         ` stardiviner
  2020-05-28 18:16           ` Ihor Radchenko
  0 siblings, 1 reply; 29+ messages in thread
From: stardiviner @ 2020-05-28 16:00 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: Matthew Lundin, Org Mode

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256


Ihor Radchenko <yantar92@gmail.com> writes:

>> If have better solution, I totally agree with that.
>
> In the current state, your patch will be pretty much useless for users
> without technical background.
>
> url-retrieve-synchronously + save-buffer should be a better default. It
> will be available for all users.
>

I don't know how to implement monolith-like functionality with
url-retrieve-synchronously + save-buffer. Patches are welcome.

> Ideally, there should be several customisation options. Something like:
>  - url-retrieve-synchronously
>  - curl
>  - wget
>  - monolith

Of course curl and wget are alternative options, but they have the same problem:
I don't know how to use them to download a web page as a single file and save it
as an archive. If we go another way, the result is just another org-board, and
then this patch would be unnecessary.

>
> Also, it would be great if you provide some description about each
> customisation in the docstring. For example, a link to github page of
> monolith and a brief description how it is different from other url
> retrieval methods should be useful for general user.

This is a good suggestion. I will improve the option's description.


- -- 
[ stardiviner ]
       I try to make every word tell the meaning that I want to express.

       Blog: https://stardiviner.github.io/
       IRC(freenode): stardiviner, Matrix: stardiviner
       GPG: F09F650D7D674819892591401B5DF1C95AE89AC3
      
-----BEGIN PGP SIGNATURE-----

iQFIBAEBCAAyFiEE8J9lDX1nSBmJJZFAG13xyVromsMFAl7P4CwUHG51bWJjaGls
ZEBnbWFpbC5jb20ACgkQG13xyVromsMnIgf+MR3SvdlaxKofkSPZjYknvJ/MwUy4
iJOFYe0VRUXcNyjkQVGToNVVM50rWodR29LSXy8F1WbvgjDZb0arktsjOpjy+0Oq
R/7JiE7Y8lXYPlmn+Gcqoijcuoo0tNNaE8ia6U9uYbhp8I1Whd0Jhx5kWiXCxlWe
AMf2Yz4mw2CJc67W1MVUPM/2nq8+GhduLmY7UPw56A6ph+H2HD/AO+FzgL0dlivP
lV3EKCc7j5kA5uzXqbqfWJZdylH4cUbzdHHePsW+EpWodF/tcJcGJmlIm/PUNuEX
MsIkkKQt7QFLvr47vROVrJczEjT70ZdjWqjTZf5/esEsRUhb+m8uDbgc5g==
=c9Pj
-----END PGP SIGNATURE-----



* Re: [Feature] add a new org-attach dispatcher command to offline save web page
  2020-05-28  5:49   ` Ihor Radchenko
  2020-05-28  6:39     ` stardiviner
@ 2020-05-28 16:19     ` Matthew Lundin
  2020-05-28 17:11       ` Ihor Radchenko
  2020-05-29  2:03       ` stardiviner
  1 sibling, 2 replies; 29+ messages in thread
From: Matthew Lundin @ 2020-05-28 16:19 UTC (permalink / raw)
  To: Ihor Radchenko, numbchild, Org Mode

Ihor Radchenko <yantar92@gmail.com> writes:

> It does not mean that attaching URL directly is not worth including
> into org. This sounds pretty common use case, especially considering
> the number of packages providing similar feature. You mentioned
> org-board, but there is also org-download and org-web-tools.

My uneasiness has more to do with the specificity of the dependence on
monolith and the way that is hard-coded into the patch. When it comes to
patches, I think priority should go to those that are configurable,
accessible, and useful for everyone as opposed to those that have
hard-coded work-flows or highly-specific user configurations.

> I agree that monolith is completely uncommon tool and I would not expect
> the majority of users to have it installed, but the same functionality
> utilising built-in url.el (as a default) should be acceptable.

The question is: which functionality? A simple downloading tool or a
full archival tool? Achieving similar functionality to org-board or
monolith would be a big task, since they aim to download an archival
version of a webpage (including all resources). In addition, with
archiving you also quickly run into the complexity of versioning based
on time archived. There's also the challenge of mapping the downloaded
files to metadata (specifically the original url). Org-board currently
handles both of these very well.

I suppose there would be a few options depending on what the aims are:

1. At the simple end, include little more than a quick and dirty
   way of downloading a single resource (html, pdf, jpeg) using url.el
   or wget (or optionally, monolith) and putting that in the attachment
   folder. Those who want full archiving of all resources could use
   other tools like org-board or org-web-tools.

2. At the (much) more complex end, it would be to code out a robust
   archiving solution on top of url.el or wget.

3. Another, possibly simpler option... Add a command to the dispatcher
   that allows the user to invoke a custom function that is called with
   the attachment directory as the default-directory. This would enable
   more end-user flexibility, such as the ability to use
   wkhtmltoimage/wkhtmltopdf, monolith, phantom.js, archive.is, etc.
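
Option 3 might be sketched as follows. This is a minimal illustration under
stated assumptions, not a proposed patch: `my/org-attach-custom-function',
`my/org-attach-run-custom' and the `x' key are hypothetical, while
`org-attach-dir' and `org-attach-commands' are existing org-attach names.

```elisp
(require 'org-attach)

;; Hypothetical variable: the user sets it to a function of no arguments.
(defvar my/org-attach-custom-function nil
  "Function called with `default-directory' bound to the attachment directory.")

;; Hypothetical command; it only binds the directory and delegates.
(defun my/org-attach-run-custom ()
  "Call `my/org-attach-custom-function' inside the attachment directory."
  (interactive)
  (unless (functionp my/org-attach-custom-function)
    (user-error "Set `my/org-attach-custom-function' first"))
  ;; `org-attach-dir' with a non-nil argument creates the directory if needed.
  (let ((default-directory (file-name-as-directory (org-attach-dir t))))
    (funcall my/org-attach-custom-function)))

;; Register the command in the dispatcher.  The key `x' is assumed to be
;; unused; pick another one if it clashes with an existing entry.
(add-to-list 'org-attach-commands
             '((?x) my/org-attach-run-custom
               "Run a user-defined function in the attachment directory.")
             t)
```

The user-supplied function could then start wkhtmltopdf, monolith or any other
tool and write its output to the current directory.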

Best,

Matt



* Re: [Feature] add a new org-attach dispatcher command to offline save web page
  2020-05-28 16:19     ` Matthew Lundin
@ 2020-05-28 17:11       ` Ihor Radchenko
  2020-05-28 22:15         ` Matthew Lundin
  2020-05-29  2:06         ` stardiviner
  2020-05-29  2:03       ` stardiviner
  1 sibling, 2 replies; 29+ messages in thread
From: Ihor Radchenko @ 2020-05-28 17:11 UTC (permalink / raw)
  To: Matthew Lundin, numbchild, Org Mode

> My uneasiness has more to do with the specificity of the dependence on
> monolith and the way that is hard-coded into the patch. When it comes to
> patches, I think priority should go to those that are configurable,
> accessible, and useful for everyone as opposed to those that have
> hard-coded work-flows or highly-specific user configurations.

Agreed, though I can see the use of having monolith as one of the options,
to help people discover what kinds of tools they can use. I personally
had a hard time finding a command-line tool like monolith. In fact, this is
the first time I have heard of an offline tool that handles JS without the
need to write Python or Ruby code.

> The question is: which functionality? A simple downloading tool or a
> full archival tool? Achieving similar functionality to org-board or
> monolith would a big task, since they aim to download an archival
> version of a webpage (including all resources). 

My view on this is a bare-bones download, in the spirit of org-attach
itself. There is already a 'url method in org-attach-attach, but it is
hard-coded to url.el (url-copy-file). It would be handy if the user
could configure alternative retrievers (like monolith, wget, curl, or
some user-defined function).
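
A configurable retriever could be dispatched roughly as in the sketch below.
The function names are hypothetical. The monolith argument order is taken from
the patch in this thread; `curl -L -o FILE URL' and `wget -O FILE URL' are the
usual ways to write a download to a named file.

```elisp
(require 'url-handlers)                 ; provides `url-copy-file'

;; Hypothetical helper: map a retriever symbol to a command line.
(defun my/org-attach-retriever-command (retriever url target-file)
  "Return the command list RETRIEVER uses to save URL as TARGET-FILE.
Return nil for `url', which is handled inside Emacs."
  (pcase retriever
    ('url nil)
    ('curl (list "curl" "-L" "-o" target-file url))
    ('wget (list "wget" "-O" target-file url))
    ('monolith (list "monolith" url "-o" target-file))
    (_ (error "Unknown retriever: %S" retriever))))

;; Hypothetical entry point: run the retriever synchronously.
(defun my/org-attach-fetch (retriever url target-file)
  "Save URL as TARGET-FILE using RETRIEVER."
  (let ((command (my/org-attach-retriever-command retriever url target-file)))
    (cond
     ;; Built-in path: no external program needed.
     ((null command) (url-copy-file url target-file))
     ;; External path: run the program and check its exit status.
     ((executable-find (car command))
      (unless (eq 0 (apply #'call-process (car command) nil nil nil
                           (cdr command)))
        (error "`%s' failed for %s" (car command) url)))
     (t (user-error "Cannot find `%s' in `exec-path'" (car command))))))
```

Running the external program synchronously means the file exists by the time
the caller continues, at the cost of blocking Emacs during the download.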

Note that monolith does not crawl the website. It only collects
everything needed to show the page as you see it in the browser into a
single HTML file. This is the behaviour one expects when saving a
full web page from a browser.

> In addition, with
> archiving you also quickly run into the complexity of versioning based
> on time archived. 

I guess that org-attach-git can be used for versioning, but I don't
think that versioning is within the scope of this patch. Monolith does
not even support versioning.

> There's also the challenge of mapping the downloaded
> files to metadata (specifically the original url). Org-board currently
> handles both of these very well.

org-board is a great package, but it is not built-in. I do not think
that all the org-board functionality needs to be included in
org-attach. At least not within the scope of this patch as I understand it.

Best,
Ihor

-- 
Ihor Radchenko,
PhD,
Center for Advancing Materials Performance from the Nanoscale (CAMP-nano)
State Key Laboratory for Mechanical Behavior of Materials, Xi'an Jiaotong University, Xi'an, China
Email: yantar92@gmail.com, ihor_radchenko@alumni.sutd.edu.sg


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Feature] add a new org-attach dispatcher command to offline save web page
  2020-05-28 16:00         ` stardiviner
@ 2020-05-28 18:16           ` Ihor Radchenko
  0 siblings, 0 replies; 29+ messages in thread
From: Ihor Radchenko @ 2020-05-28 18:16 UTC (permalink / raw)
  To: stardiviner; +Cc: Matthew Lundin, Org Mode

> I don't know how to implement similar functionality like monolith with
> url-retrieve-synchronously + save-buffer. PATCH welcome.

Sorry, I missed that the default 'url attach method exists already.

> Of course curl and wget is alternative options, just still same problem. Don't
> know how to download web page as a single file and save it as archive file. If
> use other ways, it's just another org-board. Then this patch will be unnecessary.

Well. org-board is not built-in. And curl or wget are much more likely
to be installed. They can serve as a good fallback alternative to
monolith.
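
As a rough illustration of the fallback idea, the sketch below returns
the first program from a preference list that is actually installed.
The variable and function names are hypothetical, not part of Org:

```emacs-lisp
;; Hypothetical sketch: names prefixed my/ are not part of Org.
(require 'seq)

(defvar my/org-attach-archive-candidates '("monolith" "wget" "curl")
  "External programs to try, in order of preference.")

(defun my/org-attach-first-available (&optional candidates)
  "Return the first of CANDIDATES found on `exec-path', or nil.
CANDIDATES defaults to `my/org-attach-archive-candidates'."
  (seq-find #'executable-find
            (or candidates my/org-attach-archive-candidates)))
```

A caller would still need to map each program to its own argument list,
since monolith, wget and curl take different options.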

Best,
Ihor

-- 
Ihor Radchenko,
PhD,
Center for Advancing Materials Performance from the Nanoscale (CAMP-nano)
State Key Laboratory for Mechanical Behavior of Materials, Xi'an Jiaotong University, Xi'an, China
Email: yantar92@gmail.com, ihor_radchenko@alumni.sutd.edu.sg


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Feature] add a new org-attach dispatcher command to offline save web page
  2020-05-28 17:11       ` Ihor Radchenko
@ 2020-05-28 22:15         ` Matthew Lundin
  2020-05-29  2:15           ` stardiviner
  2020-05-29  2:06         ` stardiviner
  1 sibling, 1 reply; 29+ messages in thread
From: Matthew Lundin @ 2020-05-28 22:15 UTC (permalink / raw)
  To: Ihor Radchenko, numbchild, Org Mode

Ihor Radchenko <yantar92@gmail.com> writes:

> My view on this is bare-bones download, in a spirit of org-attach
> itself. There is already 'url method in org-attach-attach, but it is
> hard-coded to url-retrieve-synchronously. It would be handy if user
> could configure alternative retrievers (like monolith, wget, curl, or
> some user-defined function).

Thanks. I'm learning about this for the first time after using
org-attach for years! I have no idea why I've never noticed it.

> Note that monolith does not crawl the website. It only collects
> everything needed to show the page as you see it in browser into single
> html file. This behaviour is what one expects to obtain when saving a
> full web-page from browser.

Agreed. This looks like the way to go. Keep it simple and abstract the
call to url-copy-file to allow the user to configure the command for
downloading the page/file. 

Best,

Matt




^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Feature] add a new org-attach dispatcher command to offline save web page
  2020-05-27  9:55 [Feature] add a new org-attach dispatcher command to offline save web page stardiviner
  2020-05-28  2:55 ` Matthew Lundin
@ 2020-05-28 22:24 ` Samuel Wales
  2020-05-29  2:23 ` [PATCH updated] " stardiviner
  2020-05-29  2:27 ` stardiviner
  3 siblings, 0 replies; 29+ messages in thread
From: Samuel Wales @ 2020-05-28 22:24 UTC (permalink / raw)
  To: numbchild; +Cc: Org Mode

i keep wondering if this is similar to the idea of making all external
links cached.  or specified ones.  using a specified cache dir.

for example, you could have various external links in your
html-exportable document, and run a command to cache them all.
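
A sketch of the first half of that idea, collecting the external links
of a buffer so that each one could later be handed to a downloader. The
function name is hypothetical; the org-element calls are the standard
parser API:

```emacs-lisp
;; Hypothetical sketch: my/org-external-links is not an Org function.
(require 'org)
(require 'org-element)
(require 'seq)

(defun my/org-external-links ()
  "Return the unique http(s) URLs linked from the current Org buffer."
  (seq-uniq
   (org-element-map (org-element-parse-buffer) 'link
     (lambda (link)
       ;; Keep only web links; file:, id: and similar types are skipped.
       (when (member (org-element-property :type link) '("http" "https"))
         (org-element-property :raw-link link))))))
```

Where the cached copies go, and how they map back to their links, is
left open here.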


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Feature] add a new org-attach dispatcher command to offline save web page
  2020-05-28 16:19     ` Matthew Lundin
  2020-05-28 17:11       ` Ihor Radchenko
@ 2020-05-29  2:03       ` stardiviner
  2020-05-29  2:17         ` Ihor Radchenko
  1 sibling, 1 reply; 29+ messages in thread
From: stardiviner @ 2020-05-29  2:03 UTC (permalink / raw)
  To: Matthew Lundin; +Cc: Org Mode, Ihor Radchenko

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256


Matthew Lundin <mdl@imapmail.org> writes:

> Ihor Radchenko <yantar92@gmail.com> writes:
>
>> It does not mean that attaching URL directly is not worth including
>> into org. This sounds pretty common use case, especially considering
>> the number of packages providing similar feature. You mentioned
>> org-board, but there is also org-download and org-web-tools.
>
> My uneasiness has more to do with the specificity of the dependence on
> monolith and the way that is hard-coded into the patch. When it comes to
> patches, I think priority should go to those that are configurable,
> accessible, and useful for everyone as opposed to those that have
> hard-coded work-flows or highly-specific user configurations.

I don't think it's hard-coded. As I said in my first email, it's a `defcustom',
isn't it? The user can change it freely. And I used `executable-find', so the
value will be `nil' when monolith is not available. I don't have the ability to
achieve the same functionality with wget or curl. As I said before, patches
are welcome.
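
One caveat when computing such a default: `executable-find' returns nil
for a missing program, and passing nil on to a string function such as
`file-name-nondirectory' signals an error instead of yielding nil. A
minimal sketch that checks for nil first (the function name is
hypothetical):

```emacs-lisp
;; Hypothetical sketch: my/default-archive-command is an illustrative name.
(defun my/default-archive-command (program)
  "Return PROGRAM if it is found on `exec-path', otherwise nil.
Calling `file-name-nondirectory' directly on the result of
`executable-find' would signal `wrong-type-argument' when PROGRAM
is missing, so the nil case is checked first."
  (when (executable-find program)
    program))
```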

>
>> I agree that monolith is completely uncommon tool and I would not expect
>> the majority of users to have it installed, but the same functionality
>> utilising built-in url.el (as a default) should be acceptable.
>
> The question is: which functionality? A simple downloading tool or a
> full archival tool? Achieving similar functionality to org-board or
> monolith would be a big task, since they aim to download an archival
> version of a webpage (including all resources). In addition, with
> archiving you also quickly run into the complexity of versioning based
> on time archived. There's also the challenge of mapping the downloaded
> files to metadata (specifically the original url). Org-board currently
> handles both of these very well.

If you mean adding a timestamp and the original URL, I don't think that is the
point at issue, because the user can do that manually or automate it in some
way. WDYT?
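
For instance, automating it could be as small as the sketch below, which
stores both values as properties of the current entry. The function name
and the property names ARCHIVED_URL and ARCHIVED_AT are hypothetical:

```emacs-lisp
;; Hypothetical sketch: function and property names are illustrative only.
(require 'org)

(defun my/org-attach-record-source (url)
  "Store URL and the current time as properties of the entry at point."
  (org-entry-put nil "ARCHIVED_URL" url)
  (org-entry-put nil "ARCHIVED_AT"
                 (format-time-string "[%Y-%m-%d %a %H:%M]")))
```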

>
> I suppose there would be a few options depending on what the aims are:
>
> 1. At the simple end, include little more than a quick and dirty
>    way of downloading a single resource (html, pdf, jpeg) using url.el
>    or wget (or optionally, monolith) and putting that in the attachment
>    folder. Those who want full archiving of all resources could use
>    other tools like org-board or org-web-tools.

The [u] URL download method already handles the simple task.

>
> 2. At the (much) more complex end, it would be to code out a robust
>    archiving solution on top of url.el or wget.

As I said, patches are welcome. I have admitted many times that I don't have the
ability to build complex archiving functionality on top of url.el, wget, or curl.

Isn't Org Mode built up from small things? Can it just become a big tool all at once?

>
> 3. Another, possibly simpler option... Add a command to the dispatcher
>    that allows the user to invoke a custom function that is called with
>    the attachment directory as the default-directory. This would enable
>    more end-user flexibility, such as the ability to use
>    wkhtmltoimage/wkhtmltopdf, monolith, phantom.js, archive.is, etc.
>

This is left to the `defcustom' option. I will improve the download function to
make it flexible enough.
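
A minimal sketch of option 3 as quoted above: a user-supplied function
is called with the attachment directory bound as `default-directory'.
The my/ names are hypothetical and not part of org-attach:

```emacs-lisp
;; Hypothetical sketch: my/ names are illustrative, not org-attach API.
(defvar my/org-attach-custom-function nil
  "Function called with a URL inside the attachment directory.")

(defun my/org-attach-call-in-dir (dir url)
  "Call `my/org-attach-custom-function' with URL inside directory DIR."
  (unless (functionp my/org-attach-custom-function)
    (user-error "Set `my/org-attach-custom-function' first"))
  ;; Relative file names used by the custom function now resolve in DIR.
  (let ((default-directory (file-name-as-directory (expand-file-name dir))))
    (funcall my/org-attach-custom-function url)))
```

In a dispatcher command, DIR would presumably come from
(org-attach-dir t), which creates the attachment directory if needed.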

- -- 
[ stardiviner ]
       I try to make every word tell the meaning that I want to express.

       Blog: https://stardiviner.github.io/
       IRC(freenode): stardiviner, Matrix: stardiviner
       GPG: F09F650D7D674819892591401B5DF1C95AE89AC3
      
-----BEGIN PGP SIGNATURE-----

iQFIBAEBCAAyFiEE8J9lDX1nSBmJJZFAG13xyVromsMFAl7QbYMUHG51bWJjaGls
ZEBnbWFpbC5jb20ACgkQG13xyVromsOFvAgAjTPtt5ZikNZnkp1gfuEGunVzblk6
D6O+qgZAamT0EpR2nwK9U+x470KZAhXiLURkVb0l9/6uvA6MQZtj+umX/Qer/6bw
vB4RICWenFKyyBMy9U1pZ9WzTNw2ZNfeaEiJ0XlgXx9ZPGjBgcLi/7FSM4NybV4O
qbGwHbp4AzZ4mJOrMchNwlQfDYpg381xDewnHeuwtkVeBc34HpuFEGYKdmtOJ16S
mnL/5LrbI7MahlIZuiqyAzeyU3DkSvFOC/ovvch9YlAk1SeK/BslUcaGrwl/Il70
XG/28XaAvVW2OYrO9vOmwpdoguQqARhKeMuof6p85waf1OF8itd97P95Lw==
=V/fu
-----END PGP SIGNATURE-----


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Feature] add a new org-attach dispatcher command to offline save web page
  2020-05-28 17:11       ` Ihor Radchenko
  2020-05-28 22:15         ` Matthew Lundin
@ 2020-05-29  2:06         ` stardiviner
  1 sibling, 0 replies; 29+ messages in thread
From: stardiviner @ 2020-05-29  2:06 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: Matthew Lundin, Org Mode

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256


Thanks, Ihor, your explanation helps a lot!


- -- 
[ stardiviner ]
       I try to make every word tell the meaning that I want to express.

       Blog: https://stardiviner.github.io/
       IRC(freenode): stardiviner, Matrix: stardiviner
       GPG: F09F650D7D674819892591401B5DF1C95AE89AC3
      
-----BEGIN PGP SIGNATURE-----

iQFIBAEBCAAyFiEE8J9lDX1nSBmJJZFAG13xyVromsMFAl7QbiAUHG51bWJjaGls
ZEBnbWFpbC5jb20ACgkQG13xyVromsP+zQf/fR+EfMEqnXjb7GHaQFy6c7oKwYV6
ha8Wn4surO/drg5jgGXAyCpU72ru9Q1hKlxxlUYI/ZVexiCZU8U4masVHxOIbMWG
2PrtBAJgVcC87jrYufTF+bnWfDBmNMgMtpCALa4NQ2tH83vMKSkpBK42vRSIWK61
YUbGUD0aPdUCjVz5Cwa5xfZe2i9phPPg6ipjBCm+sIdzOeFL8Dj/34dtPW1G/sOE
LTFyntcWn44xpb9mjSgN6EWC1Y9LJYSPTyP0PWVu5JXBoQfA+4vf5i7UakLurI46
+fBPPWNkb48yRc5i5OTxnI3Nxxk5YlQQbXZStMmKuqpIKmsNJsVjqfGa+g==
=8LEN
-----END PGP SIGNATURE-----


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Feature] add a new org-attach dispatcher command to offline save web page
  2020-05-28 22:15         ` Matthew Lundin
@ 2020-05-29  2:15           ` stardiviner
  0 siblings, 0 replies; 29+ messages in thread
From: stardiviner @ 2020-05-29  2:15 UTC (permalink / raw)
  To: Matthew Lundin; +Cc: Org Mode, Ihor Radchenko

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256


Matthew Lundin <mdl@imapmail.org> writes:

> Ihor Radchenko <yantar92@gmail.com> writes:
>
>> My view on this is bare-bones download, in a spirit of org-attach
>> itself. There is already 'url method in org-attach-attach, but it is
>> hard-coded to url-retrieve-synchronously. It would be handy if user
>> could configure alternative retrievers (like monolith, wget, curl, or
>> some user-defined function).
>
> Thanks. I'm learning about this for the first time after using
> org-attach for years! I have no idea why I've never noticed it.
>
>> Note that monolith does not crawl the website. It only collects
>> everything needed to show the page as you see it in browser into single
>> html file. This behaviour is what one expects to obtain when saving a
>> full web-page from browser.
>
> Agreed. This looks like the way to go. Keep it simple and abstract the
> call to url-copy-file to allow the user to configure the command for
> downloading the page/file. 

I tried to adapt my code to fit this design. But later I realized that these two
things are totally different functionality: url-retrieve-synchronously is used
to download the file at a URL, not for offline archiving.

>
> Best,
>
> Matt


- -- 
[ stardiviner ]
       I try to make every word tell the meaning that I want to express.

       Blog: https://stardiviner.github.io/
       IRC(freenode): stardiviner, Matrix: stardiviner
       GPG: F09F650D7D674819892591401B5DF1C95AE89AC3
      
-----BEGIN PGP SIGNATURE-----

iQFIBAEBCAAyFiEE8J9lDX1nSBmJJZFAG13xyVromsMFAl7QcEAUHG51bWJjaGls
ZEBnbWFpbC5jb20ACgkQG13xyVromsOJpgf/Yzib6/loAB+N3CMY7LBKxmH/xZ7d
u+isMgMlB2b9lVcC0BojwVG8Vo372Midjzu0+TJr2KuU/l9QV8xLqK90FOsLKzha
y2gbCghl2ehPvIfIKfI/LUPW3NOFIq01T4RYHgpEQ2GCWBdpNZ6Hki3fnvou2YzO
oIdtyablG5tZowvgRNQDNSg8XvaLTkkPqvXQUX1n7wba+buMADmns+rc3OTWBQqW
BHa8HC0X0n4TIcGy/pnet7+ooJtkHT05Ua9W0SrsaQthbaoqiDrfEyirL3Izm89D
rKYTo8dM7hKWLp7gQckPH6TYa8eiK0oQjp1riHa7myB2DjeWRqz2tKs2VA==
=JzUc
-----END PGP SIGNATURE-----


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Feature] add a new org-attach dispatcher command to offline save web page
  2020-05-29  2:03       ` stardiviner
@ 2020-05-29  2:17         ` Ihor Radchenko
  2020-05-29  6:16           ` stardiviner
  2020-05-29 15:33           ` Matthew Lundin
  0 siblings, 2 replies; 29+ messages in thread
From: Ihor Radchenko @ 2020-05-29  2:17 UTC (permalink / raw)
  To: stardiviner, Matthew Lundin; +Cc: Org Mode

> As I said, PATCH welcome, I admitted many times I don't have ability to build a
> complex archive functionality on url.el or wget or curl.

I have found the following solution [1] using wget:

wget --mirror -p --convert-links -P ./LOCAL-DIR WEBSITE-URL

This will not bundle the page into a single file, but it is better than
nothing. org-attach does not have to attach exactly one file.
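
A sketch of how the wget command above might be driven from Emacs Lisp.
The function names are hypothetical, and wget is assumed to be
installed; the flags are exactly those of the command line above:

```emacs-lisp
;; Hypothetical sketch: my/ names are illustrative; wget must be installed.
(defun my/wget-mirror-args (url dir)
  "Return the wget argument list that mirrors URL into directory DIR."
  (list "--mirror" "-p" "--convert-links" "-P" (expand-file-name dir) url))

(defun my/wget-mirror (url dir)
  "Run wget synchronously to mirror URL into DIR; return its exit status.
Output is collected in the buffer *wget-mirror*."
  (apply #'call-process "wget" nil "*wget-mirror*" nil
         (my/wget-mirror-args url dir)))
```

Note that --mirror turns on unlimited recursion, so this follows links
across the site; for a single page, dropping --mirror and keeping -p and
--convert-links may be closer to what monolith does.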

[1] https://askubuntu.com/questions/411540/how-to-get-wget-to-download-exact-same-web-page-html-as-browser

Best,
Ihor

-- 
Ihor Radchenko,
PhD,
Center for Advancing Materials Performance from the Nanoscale (CAMP-nano)
State Key Laboratory for Mechanical Behavior of Materials, Xi'an Jiaotong University, Xi'an, China
Email: yantar92@gmail.com, ihor_radchenko@alumni.sutd.edu.sg


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH updated] Re: [Feature] add a new org-attach dispatcher command to offline save web page
  2020-05-27  9:55 [Feature] add a new org-attach dispatcher command to offline save web page stardiviner
  2020-05-28  2:55 ` Matthew Lundin
  2020-05-28 22:24 ` Samuel Wales
@ 2020-05-29  2:23 ` stardiviner
  2020-05-29  2:27 ` stardiviner
  3 siblings, 0 replies; 29+ messages in thread
From: stardiviner @ 2020-05-29  2:23 UTC (permalink / raw)
  To: numbchild; +Cc: Org Mode


[-- Attachment #1.1: Type: text/plain, Size: 133 bytes --]


I adapted part of my patch and made the function names and docstrings clearer,
following the long discussion in this thread.
      

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.2: updated patch v2 --]
[-- Type: text/x-patch, Size: 3130 bytes --]

From 7e682ccd8d0d2a567de1bbbc0c8e02ee59e156bb Mon Sep 17 00:00:00 2001
From: stardiviner <numbchild@gmail.com>
Date: Fri, 29 May 2020 09:46:15 +0800
Subject: [PATCH] * org-attach.el: add a new command to archive web page

* lisp/org-attach.el (org-attach-url-archive): A new org-attach
dispatcher command to archive a web page to a single file.

* lisp/org-attach.el (org-attach-url-archive-command): A customization
option to specify the external command for archiving a web page.

* lisp/org-attach.el (org-attach-url-archive-page): A function that
invokes the external command to save an offline archive of a web page.
---
 lisp/org-attach.el | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/lisp/org-attach.el b/lisp/org-attach.el
index 57d1360fc..14ac0f828 100644
--- a/lisp/org-attach.el
+++ b/lisp/org-attach.el
@@ -204,6 +204,8 @@ (defcustom org-attach-commands
      "Attach a file using symbolic-link method.")
     ((?u ?\C-u) org-attach-url
      "Attach a file from URL (downloading it).")
+    ((?U) org-attach-url-archive
+     "Attach an offline version of web page URL.")
     ((?b) org-attach-buffer
      "Select a buffer and attach its contents to the task.")
     ((?n ?\C-n) org-attach-new
@@ -467,6 +469,30 @@ (defun org-attach-url (url)
   (let ((org-attach-method 'url))
     (org-attach-attach url)))
 
+(defun org-attach-url-archive (url)
+  (interactive "MURL of the web page to archive and attach: \n")
+  (let ((org-attach-method 'archive))
+    (org-attach-attach url)))
+
+(defcustom org-attach-url-archive-command
+  (and (executable-find "monolith") "monolith")
+  "The command used to offline archive web page."
+  :type 'string
+  :safe #'stringp
+  :group 'org-attach)
+
+(defun org-attach-url-archive-page (url target-file)
+  "Save an offline archive of web page."
+  (pcase org-attach-url-archive-command
+    ("monolith"
+     (make-process
+      :name "org-attach-offline"
+      :command (list org-attach-url-archive-command url "-o" target-file)
+      :sentinel (lambda (proc event) (message "org-attach-offline finished!"))
+      :buffer "*org-attach-url-archive*"))
+    (nil (warn "You must have a command available to save web pages offline!
+Set variable `org-attach-url-archive-command'."))))
+
 (defun org-attach-buffer (buffer-name)
   "Attach BUFFER-NAME's contents to current outline node.
 BUFFER-NAME is a string.  Signals a `file-already-exists' error
@@ -504,7 +530,8 @@ (defun org-attach-attach (file &optional visit-dir method)
        ((eq method 'cp) (copy-file file attach-file))
        ((eq method 'ln) (add-name-to-file file attach-file))
        ((eq method 'lns) (make-symbolic-link file attach-file))
-       ((eq method 'url) (url-copy-file file attach-file)))
+       ((eq method 'url) (url-copy-file file attach-file))
+       ((eq method 'archive) (org-attach-url-archive-page file attach-file)))
       (run-hook-with-args 'org-attach-after-change-hook attach-dir)
       (org-attach-tag)
       (cond ((eq org-attach-store-link-p 'attached)
-- 
2.26.2


[-- Attachment #1.3: Type: text/plain, Size: 603 bytes --]


stardiviner <numbchild@gmail.com> writes:

> I attached the patch.
>
> I think this feature will be helpful for use who archive web page data usually
> like me. To be more portable, I also added an defcustom option for other
> external command. And use an if condition to detect whether external command
> available, else warning user.


-- 
[ stardiviner ]
       I try to make every word tell the meaning that I want to express.

       Blog: https://stardiviner.github.io/
       IRC(freenode): stardiviner, Matrix: stardiviner
       GPG: F09F650D7D674819892591401B5DF1C95AE89AC3

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 515 bytes --]

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [Feature] add a new org-attach dispatcher command to offline save web page
  2020-05-27  9:55 [Feature] add a new org-attach dispatcher command to offline save web page stardiviner
                   ` (2 preceding siblings ...)
  2020-05-29  2:23 ` [PATCH updated] " stardiviner
@ 2020-05-29  2:27 ` stardiviner
  2020-06-02 12:20   ` Bastien
  3 siblings, 1 reply; 29+ messages in thread
From: stardiviner @ 2020-05-29  2:27 UTC (permalink / raw)
  To: numbchild; +Cc: Org Mode


[-- Attachment #1.1: Type: text/plain, Size: 88 bytes --]


I have now added the monolith link to the defcustom option's docstring. Updated patch attached.
      

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.2: updated patch v3 --]
[-- Type: text/x-patch, Size: 3194 bytes --]

From 6c667461b45e93059c6f801e485f7da4bfc3606c Mon Sep 17 00:00:00 2001
From: stardiviner <numbchild@gmail.com>
Date: Fri, 29 May 2020 09:46:15 +0800
Subject: [PATCH] * org-attach.el: add a new command to archive web page

* lisp/org-attach.el (org-attach-url-archive): A new org-attach
dispatcher command to archive web page to a single file.

* lisp/org-attach.el (org-attach-url-archive-command): A customize
option to specify the external command for archiving a web page.

* lisp/org-attach.el (org-attach-url-archive-page): A function that
invokes the external command to save an offline archive of a web page.
---
 lisp/org-attach.el | 30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/lisp/org-attach.el b/lisp/org-attach.el
index 57d1360fc..80855d147 100644
--- a/lisp/org-attach.el
+++ b/lisp/org-attach.el
@@ -204,6 +204,8 @@ (defcustom org-attach-commands
      "Attach a file using symbolic-link method.")
     ((?u ?\C-u) org-attach-url
      "Attach a file from URL (downloading it).")
+    ((?U) org-attach-url-archive
+     "Attach an offline version of web page URL.")
     ((?b) org-attach-buffer
      "Select a buffer and attach its contents to the task.")
     ((?n ?\C-n) org-attach-new
@@ -467,6 +469,31 @@ (defun org-attach-url (url)
   (let ((org-attach-method 'url))
     (org-attach-attach url)))
 
+(defun org-attach-url-archive (url)
+  (interactive "MURL of the web page to archive and attach: \n")
+  (let ((org-attach-method 'archive))
+    (org-attach-attach url)))
+
+(defcustom org-attach-url-archive-command
+  (and (executable-find "monolith") "monolith")
+  "The command used to offline archive web page.
+monolith can be found here: https://github.com/Y2Z/monolith."
+  :type 'string
+  :safe #'stringp
+  :group 'org-attach)
+
+(defun org-attach-url-archive-page (url target-file)
+  "Save an offline archive of web page."
+  (pcase org-attach-url-archive-command
+    ("monolith"
+     (make-process
+      :name "org-attach-offline"
+      :command (list org-attach-url-archive-command url "-o" target-file)
+      :sentinel (lambda (proc event) (message "org-attach-offline finished!"))
+      :buffer "*org-attach-url-archive*"))
+    (nil (warn "You must have a command available to save web pages offline!
+Set variable `org-attach-url-archive-command'."))))
+
 (defun org-attach-buffer (buffer-name)
   "Attach BUFFER-NAME's contents to current outline node.
 BUFFER-NAME is a string.  Signals a `file-already-exists' error
@@ -504,7 +531,8 @@ (defun org-attach-attach (file &optional visit-dir method)
        ((eq method 'cp) (copy-file file attach-file))
        ((eq method 'ln) (add-name-to-file file attach-file))
        ((eq method 'lns) (make-symbolic-link file attach-file))
-       ((eq method 'url) (url-copy-file file attach-file)))
+       ((eq method 'url) (url-copy-file file attach-file))
+       ((eq method 'archive) (org-attach-url-archive-page file attach-file)))
       (run-hook-with-args 'org-attach-after-change-hook attach-dir)
       (org-attach-tag)
       (cond ((eq org-attach-store-link-p 'attached)
-- 
2.26.2


[-- Attachment #1.3: Type: text/plain, Size: 603 bytes --]


stardiviner <numbchild@gmail.com> writes:

> I attached the patch.
>
> I think this feature will be helpful for use who archive web page data usually
> like me. To be more portable, I also added an defcustom option for other
> external command. And use an if condition to detect whether external command
> available, else warning user.


-- 
[ stardiviner ]
       I try to make every word tell the meaning that I want to express.

       Blog: https://stardiviner.github.io/
       IRC(freenode): stardiviner, Matrix: stardiviner
       GPG: F09F650D7D674819892591401B5DF1C95AE89AC3

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 515 bytes --]

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [Feature] add a new org-attach dispatcher command to offline save web page
  2020-05-29  2:17         ` Ihor Radchenko
@ 2020-05-29  6:16           ` stardiviner
  2020-05-29 15:33           ` Matthew Lundin
  1 sibling, 0 replies; 29+ messages in thread
From: stardiviner @ 2020-05-29  6:16 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: Matthew Lundin, Org Mode

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256


Ihor Radchenko <yantar92@gmail.com> writes:

>> As I said, PATCH welcome, I admired many times I don't have ability to build a
>> complex archive functionality on url.el or wget or curl.
>
> I have found the following solution [1] using wget:
>
> wget --mirror -p --convert-links -P ./LOCAL-DIR WEBSITE-URL

I know about the "--mirror" option; I have used it to archive web pages before.
Making this command work in org-attach is a bit complicated, though. I looked at
how it would fit into the current org-attach code, but with my Elisp ability I
cannot write it. I am still not very good at Elisp :(

>
> This will not bundle the page into a single file, but it is better than
> nothing. org-attach does not have to attach exactly one file.

You can check the source code of `org-attach-attach`: its "cond" form covers
most of the commands.

I remember that the org-attach dispatcher used to have a command to move a
directory directly, but it seems to have been removed. Now all org-attach
commands only support files. I also searched the Elisp functions and could not
find one to move a directory around. So PATCH welcome. I remember you sent a
patch to the Org Mode mailing list, so I believe you are at least the same kind
of newbie as me who can write a little Elisp. :)
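
For reference, the built-in `rename-file' and `copy-directory' functions do
accept directories. A minimal sketch, assuming a reasonably recent Emacs; the
names `my-move-directory' and `my-copy-directory-contents' are hypothetical
helpers, not part of Org or Emacs:

```elisp
(defun my-move-directory (directory target)
  "Move DIRECTORY so that it becomes TARGET and return TARGET.
Parent directories of TARGET are created when missing."
  (let ((from (directory-file-name (expand-file-name directory)))
        (to (directory-file-name (expand-file-name target))))
    ;; Make sure the parent of the destination exists.
    (make-directory (file-name-directory to) t)
    ;; `rename-file' also works when FROM is a directory.
    (rename-file from to)
    to))

(defun my-copy-directory-contents (directory target)
  "Copy the contents of DIRECTORY into TARGET, creating TARGET if needed."
  ;; Arguments after the names: KEEP-TIME, PARENTS, COPY-CONTENTS.
  (copy-directory directory (file-name-as-directory target) t t t))
```

Something like this could let an org-attach method place a whole directory
(for example the output of wget) into the attachment directory.
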

>
> [1] https://askubuntu.com/questions/411540/how-to-get-wget-to-download-exact-same-web-page-html-as-browser
>
> Best,
> Ihor
>
> stardiviner <numbchild@gmail.com> writes:
>
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA256
>>
>>
>> Matthew Lundin <mdl@imapmail.org> writes:
>>
>>> Ihor Radchenko <yantar92@gmail.com> writes:
>>>
>>>> It does not mean that attaching URL directly is not worth including
>>>> into org. This sounds pretty common use case, especially considering
>>>> the number of packages providing similar feature. You mentioned
>>>> org-board, but there is also org-download and org-web-tools.
>>>
>>> My uneasiness has more to do with the specificity of the dependence on
>>> monolith and the way that is hard-coded into the patch. When it comes to
>>> patches, I think priority should go to those that are configurable,
>>> accessible, and useful for everyone as opposed to those that have
>>> hard-coded work-flows or highly-specific user configurations.
>>
>> I don't think it's hardcoded, As I said in my first email, It's an `defcustom',
>> isn't it? User totally can change that. And I used `executable-find', the value
>> will be `nil' when monolith is not available. And I don't have ability to
>> archive same functionality with wget or curl. And as I said before, PATCH
>> welcome.
>>
>>>
>>>> I agree that monolith is completely uncommon tool and I would not expect
>>>> the majority of users to have it installed, but the same functionality
>>>> utilising built-in url.el (as a default) should be acceptable.
>>>
>>> The question is: which functionality? A simple downloading tool or a
>>> full archival tool? Achieving similar functionality to org-board or
>>> monolith would a big task, since they aim to download an archival
>>> version of a webpage (including all resources). In addition, with
>>> archiving you also quickly run into the complexity of versioning based
>>> on time archived. There's also the challenge of mapping the downloaded
>>> files to metadata (specifically the original url). Org-board currently
>>> handles both of these very well.
>>
>> If you mean adding a timestamp and original URL, I don't think its our argument
>> point, because user totally can do it manually, or automatic this task in some
>> kind of way. WDYT?
>>
>>>
>>> I suppose there would be a few options depending on what the aims are:
>>>
>>> 1. At the simple end, include little more than than a quick and dirty
>>>    way of downloading a single resource (html, pdf, jpeg) using url.el
>>>    or wget (or optionally, monolith) and putting that in the attachment
>>>    folder. Those who want full archiving of all resources could use
>>>    other tools like org-board or org-web-tools.
>>
>> the [u] url downloading method already did the simple task.
>>
>>>
>>> 2. At the (much) more complex end, it would be to code out a robust
>>>    archiving solution on top of url.el or wget.
>>
>> As I said, PATCH welcome, I admired many times I don't have ability to build a
>> complex archive functionality on url.el or wget or curl.
>>
>> Isn't Org Mode build up from small things? Can just become a big tool at once?
>>
>>>
>>> 3. Another, possibly simpler option... Add a command to the dispatcher
>>>    that allows the user to invoke a custom function that is called with
>>>    the attachment directory as the default-directory. This would enable
>>>    more end-user flexibility, such as the ability to use
>>>    wkhtmtoimage/wkhtmltopdf, monolith, phantom.js, archive.is, etc.
>>>
>>
>> This is leaved to `defcustom' option. I will improve the download function to
>> have enough elasticity.
>>
>> - -- 
>> [ stardiviner ]
>>        I try to make every word tell the meaning that I want to express.
>>
>>        Blog: https://stardiviner.github.io/
>>        IRC(freenode): stardiviner, Matrix: stardiviner
>>        GPG: F09F650D7D674819892591401B5DF1C95AE89AC3
>>       
>> -----BEGIN PGP SIGNATURE-----
>>
>> iQFIBAEBCAAyFiEE8J9lDX1nSBmJJZFAG13xyVromsMFAl7QbYMUHG51bWJjaGls
>> ZEBnbWFpbC5jb20ACgkQG13xyVromsOFvAgAjTPtt5ZikNZnkp1gfuEGunVzblk6
>> D6O+qgZAamT0EpR2nwK9U+x470KZAhXiLURkVb0l9/6uvA6MQZtj+umX/Qer/6bw
>> vB4RICWenFKyyBMy9U1pZ9WzTNw2ZNfeaEiJ0XlgXx9ZPGjBgcLi/7FSM4NybV4O
>> qbGwHbp4AzZ4mJOrMchNwlQfDYpg381xDewnHeuwtkVeBc34HpuFEGYKdmtOJ16S
>> mnL/5LrbI7MahlIZuiqyAzeyU3DkSvFOC/ovvch9YlAk1SeK/BslUcaGrwl/Il70
>> XG/28XaAvVW2OYrO9vOmwpdoguQqARhKeMuof6p85waf1OF8itd97P95Lw==
>> =V/fu
>> -----END PGP SIGNATURE-----


- -- 
[ stardiviner ]
       I try to make every word tell the meaning that I want to express.

       Blog: https://stardiviner.github.io/
       IRC(freenode): stardiviner, Matrix: stardiviner
       GPG: F09F650D7D674819892591401B5DF1C95AE89AC3
      
-----BEGIN PGP SIGNATURE-----

iQFIBAEBCAAyFiEE8J9lDX1nSBmJJZFAG13xyVromsMFAl7QqLsUHG51bWJjaGls
ZEBnbWFpbC5jb20ACgkQG13xyVromsNlIQf/RSbkVNiqJ1PetBjIM7s7OT/frOkW
s8hJQPtmCLbG0fj/aUCZyRz0qIjEfr8MY30j3tb7cfC2dlrVACrG18fQMCCzBexy
Kdqc2Grq16m1dj83tvNoxXeZQdvnGtba8x13dNc3RnpjVBC10n1tfUQ96GFHEtP6
WoLW8wHs6neES2wFzOt0xnO0Z5fAjG1KmF18F+t/AcerxlJLh/7Rd0rNxxrap4fP
5PTI1tq02xX2+lzg80BuAhlLFO/NWm+dg6ON7xUoVdIbPbaTJxh58e+8EQw93CFD
JQGxEvimD2w7K1hUBnspLo0cuHcL+QTQJr7ibDAxyCbqda9BvKB7fbjyCQ==
=uZRf
-----END PGP SIGNATURE-----


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Feature] add a new org-attach dispatcher command to offline save web page
  2020-05-29  2:17         ` Ihor Radchenko
  2020-05-29  6:16           ` stardiviner
@ 2020-05-29 15:33           ` Matthew Lundin
  2020-05-29 16:32             ` stardiviner
  2020-05-30  6:09             ` Ihor Radchenko
  1 sibling, 2 replies; 29+ messages in thread
From: Matthew Lundin @ 2020-05-29 15:33 UTC (permalink / raw)
  To: Ihor Radchenko, stardiviner; +Cc: Org Mode

Ihor Radchenko <yantar92@gmail.com> writes:

>> As I said, PATCH welcome, I admired many times I don't have ability to build a
>> complex archive functionality on url.el or wget or curl.
>
> I have found the following solution [1] using wget:
>
> wget --mirror -p --convert-links -P ./LOCAL-DIR WEBSITE-URL
>

I don't think --mirror is what we want in this context, since that will
initiate a recursive download of the entire site. (Ironically, my IP is
now banned from a personal blog that provides a how-to for using wget
after I tried to run the above command on it.) From the wget manual:

    -m
    --mirror
        Turn on options suitable for mirroring.  This option turns on recursion and
        time-stamping, sets infinite recursion depth and keeps FTP directory listings.
        It is currently equivalent to -r -N -l inf --no-remove-listing.

AFAICT, org-board uses the following options, which limit the archiving
to a single page and all its resources:

wget -e robots=off --page-requisites --adjust-extension --convert-links [...]
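
A minimal Emacs Lisp sketch of how the flags above could be run
asynchronously from Emacs. The names `my-wget-archive-flags',
`my-wget-archive-command' and `my-wget-archive-page' are hypothetical; the
flags are only the ones quoted above (the elided "[...]" part is left out),
and `-P' is the directory-prefix option taken from the earlier wget command
in this thread:

```elisp
;;; -*- lexical-binding: t; -*-
(require 'subr-x)                       ; `string-trim' on older Emacs

(defconst my-wget-archive-flags
  '("-e" "robots=off" "--page-requisites" "--adjust-extension"
    "--convert-links")
  "Flags limiting the download to one page and its resources.")

(defun my-wget-archive-command (url directory)
  "Return the wget command line, as a list, saving URL under DIRECTORY."
  (append '("wget") my-wget-archive-flags (list "-P" directory url)))

(defun my-wget-archive-page (url directory)
  "Asynchronously save URL and its page requisites under DIRECTORY."
  (unless (executable-find "wget")
    (user-error "Cannot find the `wget' executable"))
  (make-process
   :name "my-wget-archive"
   :buffer "*my-wget-archive*"
   :command (my-wget-archive-command url (expand-file-name directory))
   ;; Report how the process ended (e.g. "finished" or an exit code).
   :sentinel (lambda (_proc event)
               (message "wget: %s" (string-trim event)))))
```

Building the command in a separate function keeps the flag list easy to
inspect and change without touching the process handling.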

> This will not bundle the page into a single file, but it is better than
> nothing. org-attach does not have to attach exactly one file.

You can also create a warc (web archive) file with wget, but then you
need a web archive replayer to view it, which is not exactly convenient.

Best,

Matt



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Feature] add a new org-attach dispatcher command to offline save web page
  2020-05-29 15:33           ` Matthew Lundin
@ 2020-05-29 16:32             ` stardiviner
  2020-05-30  6:09             ` Ihor Radchenko
  1 sibling, 0 replies; 29+ messages in thread
From: stardiviner @ 2020-05-29 16:32 UTC (permalink / raw)
  To: Matthew Lundin; +Cc: Org Mode, Ihor Radchenko

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256


Matthew Lundin <mdl@imapmail.org> writes:

> Ihor Radchenko <yantar92@gmail.com> writes:
>
>>> As I said, PATCH welcome, I admired many times I don't have ability to build a
>>> complex archive functionality on url.el or wget or curl.
>>
>> I have found the following solution [1] using wget:
>>
>> wget --mirror -p --convert-links -P ./LOCAL-DIR WEBSITE-URL
>>
>
> I don't think --mirror is what we want in this context, since that will
> initiate a recursive download of the entire site. (Ironically, my IP is
> now banned from a personal blog that provides a how-to for using wget
> after I tried to run the above command on it.) From the wget manual:
>
>     -m
>     --mirror
>         Turn on options suitable for mirroring.  This option turns on recursion and
>         time-stamping, sets infinite recursion depth and keeps FTP directory listings.
>         It is currently equivalent to -r -N -l inf --no-remove-listing.
>
> AFAICT, org-board uses the following options, which limit the archiving
> to a single page and all its resources:
>
> wget -e robots=off --page-requisites --adjust-extension --convert-links [...]

Interesting, I learned something new.

>
>> This will not bundle the page into a single file, but it is better than
>> nothing. org-attach does not have to attach exactly one file.
>
> You can also create a warc (web archive) file with wget, but then you
> need a web archive replayer to view it, which is not exactly convenient.
>

It looks like you have already gone through most of the technologies in the
awesome-web-archive list. Does that mean most of them are not suitable for Org
Mode for now?

> Best,
>
> Matt


- -- 
[ stardiviner ]
       I try to make every word tell the meaning that I want to express.

       Blog: https://stardiviner.github.io/
       IRC(freenode): stardiviner, Matrix: stardiviner
       GPG: F09F650D7D674819892591401B5DF1C95AE89AC3
      
-----BEGIN PGP SIGNATURE-----

iQFIBAEBCAAyFiEE8J9lDX1nSBmJJZFAG13xyVromsMFAl7RORIUHG51bWJjaGls
ZEBnbWFpbC5jb20ACgkQG13xyVromsPyxwf5ATSzoz/vxLnlWGJ4jKxH/HtKlleE
LJP3BhLDuRl6nrjJTBYfwCM/SrFbcWQjpzLtoXFRvL6Ro60yhNWxCgYSu74WPF6B
Or3oEQrnQGvQ8SKNcyEOimXbKCUDImNetCAyLpxtbPJ0pHtf6py49p9i/M7ZTZK/
jIF2g0E1AezDwGPG00jpdO1cDOiGYczvRfaaMr2OQ1EBtDrZOn5z+Cb6YiaHqfF5
x5HT5Z8fCqxAFOj9ixzOdqZ9C2hRncxATyiSFH0vAzwDYpYw9A20ZqoevThTVZRI
IUmirJM7pUQ4MM2SXidkLaT5T/Zuy4PjCrM8yfXybbnrSkWrVgjeEpXhYg==
=gsT0
-----END PGP SIGNATURE-----


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Feature] add a new org-attach dispatcher command to offline save web page
  2020-05-29 15:33           ` Matthew Lundin
  2020-05-29 16:32             ` stardiviner
@ 2020-05-30  6:09             ` Ihor Radchenko
  1 sibling, 0 replies; 29+ messages in thread
From: Ihor Radchenko @ 2020-05-30  6:09 UTC (permalink / raw)
  To: Matthew Lundin, stardiviner; +Cc: Org Mode

> AFAICT, org-board uses the following options, which limit the archiving
> to a single page and all its resources:
>
> wget -e robots=off --page-requisites --adjust-extension --convert-links [...]

This is certainly better. I believe that wget would be a better default
(with the right flags). It is much more likely to be installed for the average
user.

> You can also create a warc (web archive) file with wget, but then you
> need a web archive replayer to view it, which is not exactly convenient.

Interesting. I did not know about warc.

Matthew Lundin <mdl@imapmail.org> writes:

> Ihor Radchenko <yantar92@gmail.com> writes:
>
>>> As I said, PATCH welcome, I admired many times I don't have ability to build a
>>> complex archive functionality on url.el or wget or curl.
>>
>> I have found the following solution [1] using wget:
>>
>> wget --mirror -p --convert-links -P ./LOCAL-DIR WEBSITE-URL
>>
>
> I don't think --mirror is what we want in this context, since that will
> initiate a recursive download of the entire site. (Ironically, my IP is
> now banned from a personal blog that provides a how-to for using wget
> after I tried to run the above command on it.) From the wget manual:
>
>     -m
>     --mirror
>         Turn on options suitable for mirroring.  This option turns on recursion and
>         time-stamping, sets infinite recursion depth and keeps FTP directory listings.
>         It is currently equivalent to -r -N -l inf --no-remove-listing.
>
> AFAICT, org-board uses the following options, which limit the archiving
> to a single page and all its resources:
>
> wget -e robots=off --page-requisites --adjust-extension --convert-links [...]
>
>> This will not bundle the page into a single file, but it is better than
>> nothing. org-attach does not have to attach exactly one file.
>
> You can also create a warc (web archive) file with wget, but then you
> need a web archive replayer to view it, which is not exactly convenient.
>
> Best,
>
> Matt
>

-- 
Ihor Radchenko,
PhD,
Center for Advancing Materials Performance from the Nanoscale (CAMP-nano)
State Key Laboratory for Mechanical Behavior of Materials, Xi'an Jiaotong University, Xi'an, China
Email: yantar92@gmail.com, ihor_radchenko@alumni.sutd.edu.sg


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Feature] add a new org-attach dispatcher command to offline save web page
  2020-05-29  2:27 ` stardiviner
@ 2020-06-02 12:20   ` Bastien
  2020-06-02 14:06     ` stardiviner
  0 siblings, 1 reply; 29+ messages in thread
From: Bastien @ 2020-06-02 12:20 UTC (permalink / raw)
  To: stardiviner; +Cc: Org Mode

Hi,

stardiviner <numbchild@gmail.com> writes:

> I added monolith link to defcustom option docstring now. Update
> patch again.

thanks for working on this, sorry for coming late to the party.

The initial need was to be able to download big web documents
asynchronously* but the conversation drifted and the patch now is
about adding a new org-attach command, using monolith.

org-attach-commands is an option that users can customize to suit
their needs.  Adding the ability to use monolith is fine, and we can
advertize through Worg, but I don't think it should be in Org's core,
because (1) monolith is never preinstalled and (2) there seems to be a
thin and non-obvious difference between what `org-attach-url' and what
`org-attach-url-archive' provide -- unless I miss something?

So please don't hesitate to document this on
https://orgmode.org/worg/org-hacks.html or any other page!

Thanks,

* BTW, I still think implementing url-copy-file-asynchronously
is something that would be useful within Emacs core!

-- 
 Bastien


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Feature] add a new org-attach dispatcher command to offline save web page
  2020-06-02 12:20   ` Bastien
@ 2020-06-02 14:06     ` stardiviner
  2020-06-02 14:26       ` Bastien
  0 siblings, 1 reply; 29+ messages in thread
From: stardiviner @ 2020-06-02 14:06 UTC (permalink / raw)
  To: Bastien; +Cc: Org Mode

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256


Bastien <bzg@gnu.org> writes:

> Hi,
>
> stardiviner <numbchild@gmail.com> writes:
>
>> I added monolith link to defcustom option docstring now. Update
>> patch again.
>
> thanks for working on this, sorry for coming late to the party.
>
> The initial need was to be able to download big web documents
> asynchronously* but the conversation drifted and the patch now is
> about adding a new org-attach command, using monolith.

No, my patch's purpose is to offline archive the whole web page completely.

>
> org-attach-commands is an option that users can customize to suit
> their needs.  Adding the ability to use monolith is fine, and we can
> advertize through Worg, but I don't think it should be in Org's core,
> because (1) monolith is never preinstalled and (2) there seems to be a
> thin and non-obvious difference between what `org-attach-url' and what
> `org-attach-url-archive' provide -- unless I miss something?

Well, that is fine if you think relying on monolith is unacceptable.

I think maybe the Org maintainers could improve the org-attach code to make it
easier to add a custom function. With my patch, if I want to add a custom
archive function without modifying Org Mode's org-attach code, customization
alone is not enough. So do you have any idea how to improve the current
org-attach code to make it suitable for things like my patch?

>
> So please don't hesitate to document this on
> https://orgmode.org/worg/org-hacks.html or any other page!
>
> Thanks,
>
> * BTW, I still think implementing url-copy-file-asynchronously
> is something that would be useful within Emacs core!

I remember sending an email to the emacs-dev mailing list, but it seems this
has not been implemented. So ....

- -- 
[ stardiviner ]
       I try to make every word tell the meaning that I want to express.

       Blog: https://stardiviner.github.io/
       IRC(freenode): stardiviner, Matrix: stardiviner
       GPG: F09F650D7D674819892591401B5DF1C95AE89AC3
-----BEGIN PGP SIGNATURE-----

iQFIBAEBCAAyFiEE8J9lDX1nSBmJJZFAG13xyVromsMFAl7WXNoUHG51bWJjaGls
ZEBnbWFpbC5jb20ACgkQG13xyVromsOJ1wgAwxeVRr8DS5VmN49SP6g5BfcNUYGy
V6BdIBH2DFFU1F68df8VR/zIgq94QylsoAGR/9R6zyDv7iWBiHZxCBdnn4CXkYsh
k18td+lcRmG9M2sr40FvLeNW5bNseI1Sz0quNrSOlc+1n0DDH9AiTEWUQEGCE2gW
53u2zQak4woFCuQToNP2AWm9vd+U7yBDBtkdz2vFYnqQ5TUMLC17bDcKYDPEXMVj
X0QulekeVhFw+iheQ2s63bU/IpaeU3fdylZhctNECNUqHRf9If+mOInpKYmZ3xsP
JDly8+tjtOPHlr0Llizzd07805NwNGhONyH8DXdeI7AeAQJudYkAt6x8lw==
=PEyv
-----END PGP SIGNATURE-----


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Feature] add a new org-attach dispatcher command to offline save web page
  2020-06-02 14:06     ` stardiviner
@ 2020-06-02 14:26       ` Bastien
  2020-06-02 14:40         ` stardiviner
  0 siblings, 1 reply; 29+ messages in thread
From: Bastien @ 2020-06-02 14:26 UTC (permalink / raw)
  To: stardiviner; +Cc: Org Mode

Hi Stardiviner,

stardiviner <numbchild@gmail.com> writes:

> I think maybe Org maintainer can improve the org-attach code to suitable to add
> custom function. 

Sorry, I don't understand.

Can you use M-x customize-option RET org-attach-commands RET then
add a custom command pointing to the command you have written, bound
to a keybinding of your choice?

If something does not work with this approach, can you show me what
you tried and what does not work?
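
A minimal sketch of that approach, assuming an Org version where
`org-attach-dir' accepts a "create" argument and assuming monolith is invoked
as "monolith URL -o FILE" (as in the patch). The names
`my-org-attach-archive-file-name' and `my-org-attach-url-archive' are
hypothetical and not part of Org:

```elisp
;;; -*- lexical-binding: t; -*-
(require 'org-attach)
(require 'subr-x)                       ; `string-trim' on older Emacs
(require 'url-parse)

(defun my-org-attach-archive-file-name (url)
  "Return a file name such as \"example.com-page.html\" for URL."
  (let* ((parsed (url-generic-parse-url url))
         (path (or (car (url-path-and-query parsed)) ""))
         (base (file-name-sans-extension
                (file-name-nondirectory (directory-file-name path)))))
    (concat (url-host parsed)
            (if (string= base "") "" (concat "-" base))
            ".html")))

(defun my-org-attach-url-archive (url)
  "Archive the web page at URL into the current entry's attachment directory."
  (interactive "sURL of the web page to archive: ")
  (let* ((program (or (executable-find "monolith")
                      (user-error "Cannot find the `monolith' executable")))
         (attach-dir (org-attach-dir 'create))
         (target (expand-file-name (my-org-attach-archive-file-name url)
                                   attach-dir)))
    (make-process
     :name "my-org-attach-archive"
     :buffer "*my-org-attach-archive*"
     :command (list program url "-o" target)
     :sentinel (lambda (_proc event)
                 (message "monolith: %s" (string-trim event))))))

;; Register the command in the dispatcher under the key "U".
(add-to-list 'org-attach-commands
             '((?U) my-org-attach-url-archive
               "Attach an offline archive of a web page (via monolith)."))
```

Because the command writes into the attachment directory itself, it does not
go through `org-attach-attach', so the tagging and hooks that function runs
are not triggered here.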

-- 
 Bastien


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Feature] add a new org-attach dispatcher command to offline save web page
  2020-06-02 14:26       ` Bastien
@ 2020-06-02 14:40         ` stardiviner
  2020-06-03 15:10           ` Bastien
  0 siblings, 1 reply; 29+ messages in thread
From: stardiviner @ 2020-06-02 14:40 UTC (permalink / raw)
  To: Bastien; +Cc: Org Mode

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256


Bastien <bzg@gnu.org> writes:

> Hi Stardiviner,
>
> stardiviner <numbchild@gmail.com> writes:
>
>> I think maybe Org maintainer can improve the org-attach code to suitable to add
>> custom function. 
>
> Sorry, I don't understand.
>
> Can you use M-x customize-option RET org-attach-commands RET then
> add a custom command pointing to the command you have written, bound
> to a keybinding of your choice?
>
> If something does not work with this approach, can you show me what
> you tried and what does not work?

Here is my patch; I have annotated the code to explain what I mean:

#+begin_src diff
- From 6c667461b45e93059c6f801e485f7da4bfc3606c Mon Sep 17 00:00:00 2001
From: stardiviner <numbchild@gmail.com>
Subject: [PATCH] * org-attach.el: add a new command to archive web page

,* lisp/org-attach.el (org-attach-url-archive): A new org-attach
dispatcher command to archive web page to a single file.

,* lisp/org-attach.el (org-attach-url-archive-command): A customize
option to specify the external command for archiving a web page.

,* lisp/org-attach.el (org-attach-url-archive-page): A function that
invokes the external command to save an offline archive of a web page.
- ---
 lisp/org-attach.el | 30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/lisp/org-attach.el b/lisp/org-attach.el
index 57d1360fc..80855d147 100644
- --- a/lisp/org-attach.el
+++ b/lisp/org-attach.el
@@ -204,6 +204,8 @@ (defcustom org-attach-commands
      "Attach a file using symbolic-link method.")
     ((?u ?\C-u) org-attach-url
      "Attach a file from URL (downloading it).")
+    ((?U) org-attach-url-archive
+     "Attach an offline version of web page URL.")             <----- This `org-attach-commands' is an option which can indeed be customized.
     ((?b) org-attach-buffer
      "Select a buffer and attach its contents to the task.")
     ((?n ?\C-n) org-attach-new
@@ -467,6 +469,31 @@ (defun org-attach-url (url)
   (let ((org-attach-method 'url))
     (org-attach-attach url)))
 
+(defun org-attach-url-archive (url)                            <-----------------+
+  (interactive "MURL of the web page to archive and attach: \n")                 |
+  (let ((org-attach-method 'archive))                                            |
+    (org-attach-attach url)))                                                    |
+                                                                                 |
+(defcustom org-attach-url-archive-command                                        |  As you can see, all of this code can be put in the user's configuration. Yes.
+  (and (executable-find "monolith") "monolith")                                  |
+  "The command used to offline archive web page.                                 |
+monolith can be found here: https://github.com/Y2Z/monolith."                    |
+  :type 'string                                                                  |
+  :safe #'stringp                                                                |
+  :group 'org-attach)                                                            |
+                                                                                 |
+(defun org-attach-url-archive-page (url target-file)                             |
+  "Save an offline archive of web page."                                         |
+  (pcase org-attach-url-archive-command                                          |
+    ("monolith"                                                                  |
+     (make-process                                                               |
+      :name "org-attach-offline"                                                 |
+      :command (list org-attach-url-archive-command url "-o" target-file)        |
+      :sentinel (lambda (proc event) (message "org-attach-offline finished!"))   |
+      :buffer "*org-attach-url-archive*"))                                       |
+    (nil (warn "You must have a command available to save web pages offline!     |
+Set variable `org-attach-url-archive-command'."))))               <--------------+
+
 (defun org-attach-buffer (buffer-name)
   "Attach BUFFER-NAME's contents to current outline node.
 BUFFER-NAME is a string.  Signals a `file-already-exists' error
@@ -504,7 +531,8 @@ (defun org-attach-attach (file &optional visit-dir method)
        ((eq method 'cp) (copy-file file attach-file))
        ((eq method 'ln) (add-name-to-file file attach-file))
        ((eq method 'lns) (make-symbolic-link file attach-file))
- -       ((eq method 'url) (url-copy-file file attach-file)))
+       ((eq method 'url) (url-copy-file file attach-file))        <-------------- But this can't be done through advice or anything else, short of copying the whole function.
+       ((eq method 'archive) (org-attach-url-archive-page file attach-file)))     With my limited Elisp knowledge, I have no idea how to add a new org-attach method to `org-attach-attach'.
       (run-hook-with-args 'org-attach-after-change-hook attach-dir)               Maybe you have a better idea. I might consider publishing this as a tiny Emacs extension.
       (org-attach-tag)
       (cond ((eq org-attach-store-link-p 'attached)
- -- 
2.26.2
#+end_src

- -- 
[ stardiviner ]
       I try to make every word tell the meaning that I want to express.

       Blog: https://stardiviner.github.io/
       IRC(freenode): stardiviner, Matrix: stardiviner
       GPG: F09F650D7D674819892591401B5DF1C95AE89AC3
-----BEGIN PGP SIGNATURE-----

iQFIBAEBCAAyFiEE8J9lDX1nSBmJJZFAG13xyVromsMFAl7WZN4UHG51bWJjaGls
ZEBnbWFpbC5jb20ACgkQG13xyVromsNmdQgAymuUs1u6lq1Rf8hA2OIRp1I+SC1X
v4N1GbG8u94VuMeSHjp8zAliKuYPyBpZ6UhFahvZoXlKUX0jdZMybcQAPDUlRdSg
NbC+qJ7OFyuTVMCzPb9RJlHS8lm9oJKWcj6KlMwvOeKoIZUIeCFAV3RW+eM3diso
FNfzGOwS97YPHI9KD4IfWpM7UqtYADx3HrKYn1k0X2m46cmoSyF3FNyu9dnYfmaw
5evX62WtWOMNa1eflIO9wBr//e9bJJFSIAI0Rl2sI+/eo5uas4YJ4/ezDsyrd3nL
dDb/HywwCXjhgthzKHXQGBgXMyrx+8ZbrDaXz4ywEaUKGVL+5ri54k2Sbg==
=Elhx
-----END PGP SIGNATURE-----



* Re: [Feature] add a new org-attach dispatcher command to offline save web page
  2020-06-02 14:40         ` stardiviner
@ 2020-06-03 15:10           ` Bastien
  2020-06-03 23:34             ` stardiviner
  0 siblings, 1 reply; 29+ messages in thread
From: Bastien @ 2020-06-03 15:10 UTC (permalink / raw)
  To: stardiviner; +Cc: Org Mode

Hi,

stardiviner <numbchild@gmail.com> writes:

> Here is my patch, I try to annotate on my patch code to express my
> words:

Thanks - I see what is hardcoded and why.

Since the whole idea is to use something other than url-copy-file for
downloading URLs, I would rather add an option to allow using a custom
function.

Something like `org-attach-url-copy-file-function' which would be
url-copy-file by default but could be set to another function with 
the same signature.
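
A minimal sketch of that idea, not the eventual implementation.  It
assumes the proposed `org-attach-url-copy-file-function' option; the
`my/' helpers are hypothetical names used only for illustration, and
the monolith invocation (URL, "-o", output file) is the one used in
the patch earlier in this thread.

```elisp
;; Sketch only: `org-attach-url-copy-file-function' is the option
;; proposed in this thread, not an existing Org variable.
(require 'url-handlers)                 ; provides `url-copy-file'

(defcustom org-attach-url-copy-file-function #'url-copy-file
  "Function called to fetch a URL into an attachment file.
It is called with two arguments, URL and NEWNAME, like `url-copy-file'."
  :type 'function
  :group 'org-attach)

(defun my/org-attach-archive-url (url newname &rest _ignored)
  "Save URL as a single-file archive at NEWNAME using monolith.
Hypothetical helper taking the same leading arguments as `url-copy-file'."
  (let ((program (executable-find "monolith")))
    (unless program
      (user-error "Cannot find the monolith executable"))
    ;; Run synchronously so the file exists when the caller continues.
    (unless (eq 0 (call-process program nil nil nil url "-o" newname))
      (error "Archiving %s with monolith failed" url))))

(defun my/org-attach-fetch-url (url newname)
  "Fetch URL into NEWNAME through `org-attach-url-copy-file-function'.
Hypothetical stand-in for the `url' branch of `org-attach-attach'."
  (funcall org-attach-url-copy-file-function url newname))
```

With something like this, a user who prefers archived pages would only
need (setq org-attach-url-copy-file-function #'my/org-attach-archive-url),
and no new attach method would have to be hardcoded.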

Don't rush on implementing this though, I'd like to reread the code
in org-attach.el again to decide if this is the right way - and this
won't be before 9.5.

Best,

-- 
 Bastien



* Re: [Feature] add a new org-attach dispatcher command to offline save web page
  2020-06-03 15:10           ` Bastien
@ 2020-06-03 23:34             ` stardiviner
  0 siblings, 0 replies; 29+ messages in thread
From: stardiviner @ 2020-06-03 23:34 UTC (permalink / raw)
  To: Bastien; +Cc: Org Mode

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256


Bastien <bzg@gnu.org> writes:

> Hi,
>
> stardiviner <numbchild@gmail.com> writes:
>
>> Here is my patch, I try to annotate on my patch code to express my
>> words:
>
> Thanks - I see what is hardcoded and why.
>
> Since the whole idea is to use something other than url-copy-file for
> downloading URLs, I would rather add an option to allow using a custom
> function.
>
> Something like `org-attach-url-copy-file-function' which would be
> url-copy-file by default but could be set to another function with 
> the same signature.
>
> Don't rush on implementing this though, I'd like to reread the code
> in org-attach.el again to decide if this is the right way - and this
> won't be before 9.5.
>
> Best,

Thanks a lot, that'll be great!

I guess my ability to express myself is limited, so discussions with people do
not always go smoothly. I will try to improve it.

- -- 
[ stardiviner ]
       I try to make every word tell the meaning that I want to express.

       Blog: https://stardiviner.github.io/
       IRC(freenode): stardiviner, Matrix: stardiviner
       GPG: F09F650D7D674819892591401B5DF1C95AE89AC3
-----BEGIN PGP SIGNATURE-----

iQFIBAEBCAAyFiEE8J9lDX1nSBmJJZFAG13xyVromsMFAl7YM4wUHG51bWJjaGls
ZEBnbWFpbC5jb20ACgkQG13xyVromsPbsQgAtnuLIcgq3l+k5PcFu/YZuhvgjLEg
qp3F9Vvo/yu8eoZvGB2IMA+B1AUVF5Af81OvSIpeOV4ef5lis7smvNtLTWSHxLVI
5m/MGeCATThJ4cbFVGB1fEi3BxFAk5fGerVc+Mm8YYDwLkjhxuERDxkWYAJeLkBj
6vLrFoRTDJc6UvOg36tlMKqQuGkEFke8zNvqwQ/AM3IHwEmOfa9mpCaqAAtKxOtL
mBNSs1aa/+zR4MBYwmIDoloZPoUvWQERj2kJo9KEu+DqHsgHGeS9rT6OSKccZA0P
HJSrtcjmfhKAafLDH4ERtoBoyli+Y8v0LA5aEj2MEl9BIGFmMa14f9dXkA==
=X23S
-----END PGP SIGNATURE-----



Thread overview: 29+ messages
2020-05-27  9:55 [Feature] add a new org-attach dispatcher command to offline save web page stardiviner
2020-05-28  2:55 ` Matthew Lundin
2020-05-28  5:49   ` Ihor Radchenko
2020-05-28  6:39     ` stardiviner
2020-05-28 14:03       ` Ihor Radchenko
2020-05-28 16:00         ` stardiviner
2020-05-28 18:16           ` Ihor Radchenko
2020-05-28 16:19     ` Matthew Lundin
2020-05-28 17:11       ` Ihor Radchenko
2020-05-28 22:15         ` Matthew Lundin
2020-05-29  2:15           ` stardiviner
2020-05-29  2:06         ` stardiviner
2020-05-29  2:03       ` stardiviner
2020-05-29  2:17         ` Ihor Radchenko
2020-05-29  6:16           ` stardiviner
2020-05-29 15:33           ` Matthew Lundin
2020-05-29 16:32             ` stardiviner
2020-05-30  6:09             ` Ihor Radchenko
2020-05-28  6:37   ` stardiviner
2020-05-28  6:40   ` stardiviner
2020-05-28 22:24 ` Samuel Wales
2020-05-29  2:23 ` [PATCH updated] " stardiviner
2020-05-29  2:27 ` stardiviner
2020-06-02 12:20   ` Bastien
2020-06-02 14:06     ` stardiviner
2020-06-02 14:26       ` Bastien
2020-06-02 14:40         ` stardiviner
2020-06-03 15:10           ` Bastien
2020-06-03 23:34             ` stardiviner