emacs-orgmode@gnu.org archives
* [FR] Make notion of "modification time" configurable during publishing
From: Suhail Singh @ 2023-09-20 20:55 UTC
  To: emacs-orgmode

Ihor Radchenko <post@gwene.org> writes:

> - weary-traveler asked to add a new feature to the Org publishing system.
>
>   The use case is publishing using CI tasks, where a new, fresh image
>   is created every time a website is re-published. Such an image does
>   not preserve the original modification times of the source Org
>   files, making Org publish re-generate everything every single time.
>
>   The new suggested feature is (optionally) using last git commit time
>   instead of file modification time.

>   - Conclusion: the feature request is to be submitted to the mailing
>     list for more detailed discussion.

The requested feature is for the notion of "modification time" to be
configurable by the user. Even in the specific situation quoted above,
different users may desire different alternative notions of
"modification time". For instance,

- the value of the DATE keyword
- if using git as vc, the "git author date"
- if using git as vc, the "git commit date"

It would help for the notion of "modification time" (currently
hard-coded as file-attribute-modification-time) to be configurable in
the following functions:

- org-publish-cache-mtime-of-src: this is used to decide whether or not
  a particular file needs to be republished. I.e., the function at the
  heart of the motivating situation discussed during the meetup.
  
- org-publish-find-date: this is used during the publishing of sitemaps
  where the modification time is used as the default value (for
  directories the filesystem modification time is the only value ever
  used).

The user-supplied "modification time" function would take the FILE as an
input and its output would be in the style of `current-time'.  A default
value could be provided for above cases which preserves current
semantics.  Additional inputs to such a "modification time" function
that may be helpful (though may not be necessary) are the PROJECT-PLIST
and the PUB-DIR.
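
A minimal sketch of what such a hook could look like. Every name
prefixed with my/ is hypothetical and does not exist in Org; only
file-attributes, file-truename and file-attribute-modification-time are
existing Emacs functions:

```elisp
;; Hypothetical sketch: the `my/' names are illustrative, not Org API.

(defun my/org-publish-fs-mtime (file &optional _project-plist _pub-dir)
  "Return the filesystem modification time of FILE (current semantics).
The optional arguments are accepted but ignored by this default."
  (let ((attrs (file-attributes (file-truename file))))
    (unless attrs
      (error "No such file: %S" file))
    (file-attribute-modification-time attrs)))

(defvar my/org-publish-mtime-function #'my/org-publish-fs-mtime
  "Function computing the \"modification time\" of a source file.
It is called with FILE and, optionally, PROJECT-PLIST and PUB-DIR, and
must return a time value in the style of `current-time'.")

(defun my/org-publish-mtime (file &optional project-plist pub-dir)
  "Return the \"modification time\" of FILE via the configured function."
  (funcall my/org-publish-mtime-function file project-plist pub-dir))
```

With something like this, the default keeps the filesystem semantics,
and a user could point my/org-publish-mtime-function at a function that
returns, say, a git date or the value of the DATE keyword.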

>   - Org already has somewhat similar functionality in the form of the
>     ={{{modification-time ...}}}= macro. See the
>     [[info:org#Macro Replacement][org#Macro Replacement]] section
>     of the Org manual.

I believe the modification-time macro for git uses the "author date".
This may not always be the desirable "vc-modification-time" notion.
However, when it's not, the user has the freedom to create (and use)
their own macro.

>   - Also, several other places in Org are relying on file modification
>     time and might need to be considered.

org-html-format-spec is one such place that may additionally benefit.
However, in that case it may be sufficient to simply mirror what's done
for the modification-time macro. I.e., consider only one notion of
"vc-modification-time" and have it be the "author date" semantics.

-- 
Suhail



* Re: [FR] Make notion of "modification time" configurable during publishing
From: Ihor Radchenko @ 2023-09-22  9:41 UTC
  To: Suhail Singh; +Cc: emacs-orgmode

Suhail Singh <suhailsingh247@gmail.com> writes:

>>   The new suggested feature is (optionally) using last git commit time
>>   instead of file modification time.
>
> The requested feature is for the notion of "modification time" to be
> configurable by the user. Even in the specific situation quoted above,
> different users may desire different alternative notions of
> "modification time". For instance,
>
> - the value of the DATE keyword
> - if using git as vc, the "git author date"
> - if using git as vc, the "git commit date"

I am a bit confused. What do you mean by "git author date" and "git
commit date"?

> It would help for the notion of "modification time" (currently
> hard-coded as file-attribute-modification-time) to be configurable in
> the following functions:
>
> - org-publish-cache-mtime-of-src: this is used to decide whether or not
>   a particular file needs to be republished. I.e., the function at the
>   heart of the motivating situation discussed during the meetup.

I think that we should use an alternative approach. Both "git time" and
"fs time" are only an approximation. The true decision to re-publish an
article should be triggered by the article text being modified. So, we
had better decide based on the file text hash, not the modification times.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>



* Re: [FR] Make notion of "modification time" configurable during publishing
From: Suhail Singh @ 2023-09-22 19:56 UTC
  To: emacs-orgmode

Ihor Radchenko writes:

> I am a bit confused. What do you mean by "git author date" and "git
> commit date"?

In the output of `git log --pretty=fuller`, there is an AuthorDate which
is distinct from the CommitDate. In case the distinction is unfamiliar,
it is elaborated on here: <https://stackoverflow.com/a/11857467>.
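
As a rough illustration, either date can be read for the last commit
touching a file with the `%at` (author date) and `%ct` (committer date)
format placeholders of `git log`, both of which print a UNIX timestamp.
The function name my/git-file-date below is hypothetical, and the sketch
assumes that git is on the PATH and that FILE lives inside a work tree:

```elisp
;; Hypothetical helper: `my/git-file-date' is not part of Org or Emacs.

(defun my/git-file-date (file &optional kind)
  "Return the date of the last commit touching FILE as a Lisp time value.
KIND is the symbol `author' for the AuthorDate; any other value selects
the CommitDate.  Return nil when git lists no commit for FILE.
`process-lines' signals an error if git exits with a non-zero status,
for example when FILE is not inside a git work tree."
  (let* ((default-directory (file-name-directory (expand-file-name file)))
         (format-arg (if (eq kind 'author) "--format=%at" "--format=%ct"))
         (lines (process-lines "git" "log" "-1" format-arg "--"
                               (file-name-nondirectory file))))
    (when lines
      (seconds-to-time (string-to-number (car lines))))))
```

The two values typically diverge after operations such as a rebase, an
amend or a cherry-pick, which rewrite the CommitDate but, by default,
keep the AuthorDate.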

> I think that we should use an alternative approach. Both "git time" and
> "fs time" are only an approximation. The true decision to re-publish an
> article should be triggered by the article text being modified. So, we
> had better decide based on the file text hash, not the modification times.

For it to work, the "file text hash" would have to also take into
account the "file text hash" of included files, or the decision to
re-publish would have to be predicated on the hash of included files as
well. I.e., the equivalent of this logic in
org-publish-cache-file-needs-publishing:

	(let ((mtime (org-publish-cache-mtime-of-src filename)))
	  (or (time-less-p pstamp mtime)
	      (cl-some (lambda (ct) (time-less-p mtime ct))
		       included-files-mtime)))

But assuming the existence of equivalent logic, yes something like a
file hash would work. In fact, at least in the case of git, the VCS
could even be queried for it (via git hash-object).

-- 
Suhail



* Re: [FR] Make notion of "modification time" configurable during publishing
From: Ihor Radchenko @ 2023-09-23 10:36 UTC
  To: Suhail Singh; +Cc: emacs-orgmode

Suhail Singh <suhailsingh247@gmail.com> writes:

> Ihor Radchenko writes:
>
>> I am a bit confused. What do you mean by "git author date" and "git
>> commit date"?
>
> In the output of `git log --pretty=fuller`, there is an AuthorDate which
> is distinct from the CommitDate. In case the distinction is unfamiliar,
> it is elaborated on here: <https://stackoverflow.com/a/11857467>.

I see now.
But do you actually use one but not the other in practice?

>> I think that we should use an alternative approach. Both "git time" and
>> "fs time" are only an approximation. The true decision to re-publish an
>> article should be triggered by the article text being modified. So, we
>> had better decide based on the file text hash, not the modification times.
>
> For it to work, the "file text hash" would have to also take into
> account the "file text hash" of included files, or the decision to
> re-publish would have to be predicated on the hash of included files as
> well. I.e., the equivalent of this logic in
> org-publish-cache-file-needs-publishing:
>
> 	(let ((mtime (org-publish-cache-mtime-of-src filename)))
> 	  (or (time-less-p pstamp mtime)
> 	      (cl-some (lambda (ct) (time-less-p mtime ct))
> 		       included-files-mtime)))

And the code right above the quoted snippet already looks into INCLUDEs.
It should not be too hard to add buffer hash calculation there.

> But assuming the existence of equivalent logic, yes something like a
> file hash would work. In fact, at least in the case of git, the VCS
> could even be queried for it (via git hash-object).

Probably overkill. It will only work for files without includes and
force us to use exactly the same hash algorithm.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>



* Re: [FR] Make notion of "modification time" configurable during publishing
From: Suhail Singh @ 2023-09-23 15:45 UTC
  To: emacs-orgmode

Ihor Radchenko writes:

> But do you actually use one but not the other in practice?

As in, could users have a preference for one vs the other in practice?
Yes, since the choice isn't without consequence, it's conceivable
(generally speaking) that some would prefer one over the other. FWIW, in
my specific case, I use CommitDate, but I am not convinced it's "the
right thing" in all situations.

Not having conducted a survey, I also cannot comment on the frequency
with which users have a desired preference for one vs the other. I am
also not aware of general rules where users would necessarily prefer one
over the other, but it's possible they may exist. My point was simply
that there is more than one interpretation of "vc-modification-time".

> It should not be too hard to add buffer hash calculation there.

No disagreement there.

> It will only work for files without includes and force us to use
> exactly the same hash algorithm.

I don't follow. I was stating that the concept of a "file hash" could be
obtained in more than one way. I.e., in addition to it being calculated "by
hand" it would also be possible to query an oracle (the VCS in this
case) for it. This is distinct and orthogonal from the decision of how a
"file with includes" is handled.

If I understand you correctly, the logic you have in mind, would be
something like this:

- during publish, compare the file hash of the file being published as
  well as all included files
- if the values for all are the same as in the cache, don't publish (if
  user has signalled such intent via the equivalent of
  org-publish-use-timestamps-flag)
- if the value of any one is different, re-publish and update cache with
  the updated file hashes

It doesn't matter how the specific file hash is obtained, as long as the
mechanism is being used consistently and the file hash of the included
files are also being consulted in an appropriate way.
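
The steps above can be sketched roughly as follows. This is only an
illustration under stated assumptions: the names my/publish-hash-cache,
my/file-text-hash and my/needs-publishing-p are hypothetical, the cache
is a plain in-memory hash table rather than the real publishing cache,
and the list of included files is taken as given instead of being
parsed from INCLUDE keywords:

```elisp
;; Hypothetical sketch: none of the `my/' names exist in Org.

(defvar my/publish-hash-cache (make-hash-table :test #'equal)
  "Map a source file name to the hashes recorded at the last check.
Each value is a list of (FILE . HASH) pairs covering the source file
itself followed by its included files.")

(defun my/file-text-hash (file)
  "Return the SHA-256 hex digest of the raw bytes of FILE."
  (with-temp-buffer
    (set-buffer-multibyte nil)
    (insert-file-contents-literally file)
    (secure-hash 'sha256 (current-buffer))))

(defun my/needs-publishing-p (file included-files)
  "Return non-nil if FILE or any of INCLUDED-FILES changed since last check.
As a side effect, store the current hashes under FILE in
`my/publish-hash-cache'."
  (let* ((current (mapcar (lambda (f) (cons f (my/file-text-hash f)))
                          (cons file included-files)))
         (changed (not (equal current
                              (gethash file my/publish-hash-cache)))))
    (puthash file current my/publish-hash-cache)
    changed))
```

The hashes are stored per source file, together with those of its
includes, so that a change in a shared include is detected for every
file that includes it. Swapping my/file-text-hash for a VCS query would
not alter the rest of the logic.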

-- 
Suhail



* Re: [FR] Make notion of "modification time" configurable during publishing
From: Ihor Radchenko @ 2023-09-26 11:02 UTC
  To: Suhail Singh; +Cc: emacs-orgmode

Suhail Singh <suhailsingh247@gmail.com> writes:

> Ihor Radchenko writes:
>
>> But do you actually use one but not the other in practice?
>
> As in, could users have a preference for one vs the other in practice?
> Yes, since the choice isn't without consequence, it's conceivable
> (generally speaking) that some would prefer one over the other. FWIW, in
> my specific case, I use CommitDate, but I am not convinced it's "the
> right thing" in all situations.
>
> Not having conducted a survey, I also cannot comment on the frequency
> with which users have a desired preference for one vs the other. I am
> also not aware of general rules where users would necessarily prefer one
> over the other, but it's possible they may exist. My point was simply
> that there is more than one interpretation of "vc-modification-time".

I see your point. Although, from the point of view of Org development,
we do not want to add features that nobody uses in practice. Such
features add a burden on maintainers and do not practically benefit
users. So, I am inclined to reuse the existing approach and only add
more granularity if users ask for it.

>> It will only work for files without includes and force us to use
>> exactly the same hash algorithm.
>
> I don't follow. I was stating that the concept of a "file hash" could be
> obtained in more than one way. I.e., in addition to it being calculated "by
> hand" it would also be possible to query an oracle (the VCS in this
> case) for it. This is distinct and orthogonal from the decision of how a
> "file with includes" is handled.

Sure. But the way the "file hash" is calculated should be consistent. The
way Emacs calculates a hash is different from git's. So, supporting both
the Emacs and the git ways means more code (not a good thing unless
necessary). We should thus prefer the Emacs API to calculate the hash.
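
To illustrate the difference: in a SHA-1 repository, `git hash-object`
does not hash the bare file contents but the contents prefixed with a
"blob <size>" header and a NUL byte, so its result differs from a plain
`secure-hash' of the same text. A small sketch, where my/git-blob-hash
is a hypothetical helper name and CONTENT is assumed to be ASCII or
UTF-8 text:

```elisp
;; Hypothetical helper: `my/git-blob-hash' is not part of Org or Emacs.

(defun my/git-blob-hash (content)
  "Return the SHA-1 that git computes for a blob holding string CONTENT.
Git hashes the header \"blob <size>\" plus a NUL byte, followed by the
content, where <size> is the content length in bytes."
  (secure-hash 'sha1
               (concat (format "blob %d" (string-bytes content))
                       "\0"
                       content)))

;; The plain digest and the git-style digest of the same text differ:
(list (secure-hash 'sha1 "some article text\n")
      (my/git-blob-hash "some article text\n"))
```

Reproducing the git value inside Emacs is possible, but it is extra
code on top of a plain `secure-hash' call.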

> If I understand you correctly, the logic you have in mind, would be
> something like this:
>
> - during publish, compare the file hash of the file being published as
>   well as all included files
> - if the values for all are the same as in the cache, don't publish (if
>   user has signalled such intent via the equivalent of
>   org-publish-use-timestamps-flag)
> - if the value of any one is different, re-publish and update cache with
>   the updated file hashes
>
> It doesn't matter how the specific file hash is obtained, as long as the
> mechanism is being used consistently and the file hash of the included
> files are also being consulted in an appropriate way.

Yup, you are correct.

Now, to practice: let's start by updating ox-publish to use a text hash
to decide about re-publishing. I will put it on my todo list, but
patches are more than welcome (my todo list is rather long). The
function to start looking into is `org-publish-cache-file-needs-publishing'.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>

