Re: archiving speed [was Re: Tips on maintaining history in Org Mode]

From: Tim Cross <theophilusx@gmail.com>
To: Samuel Wales <samologist@gmail.com>
Cc: Org-mode <emacs-orgmode@gnu.org>,
	Ihor Radchenko <yantar92@gmail.com>,
	David Masterson <dsmasterson92630@outlook.com>
Subject: Re: archiving speed [was Re: Tips on maintaining history in Org Mode]
Date: Thu, 12 Aug 2021 15:56:34 +1000
Message-ID: <87a6lnchcd.fsf@gmail.com>
In-Reply-To: <CAJcAo8tG4eKSXHUJNigZOEa46ymr8yjk_MpZfEbe_aNBNsNp7A@mail.gmail.com>

It is for me. However, this all depends on how you configure things.

In my case, I have a standard structure for my org files. I have a
different org file for each 'topic' and each org file has headings for

* Tasks

* Notes

* Resources

* Comms

* General

and each of those headings has a property list with an ARCHIVE property
which looks like %s_archive::<heading> i.e. %s_archive::* Tasks, so when
I archive a tree/subtree it is placed under the heading according to the
ARCHIVE property for the tree it comes from.
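
As a rough illustration, the way such a location spec resolves can be
sketched in Python. This is not Org's actual Emacs Lisp; the function
name is made up for the example, and it assumes that %s stands for the
name of the file being archived from and that the archive file sits next
to it.

```python
import os
from typing import Tuple


def expand_archive_location(spec: str, current_file: str) -> Tuple[str, str]:
    """Split an ARCHIVE spec of the form "file::heading" and fill in %s.

    Hypothetical helper for illustration only. Assumption: %s is replaced
    by the base name of the file the entry is archived from.
    """
    if "::" not in spec:
        raise ValueError(f"invalid archive location: {spec!r}")
    file_fmt, heading_fmt = spec.split("::", 1)
    base = os.path.basename(current_file)
    if file_fmt.strip():
        # Assumption: the archive file lives beside the current file.
        archive_file = os.path.join(
            os.path.dirname(current_file), file_fmt.replace("%s", base)
        )
    else:
        # An empty file part means "archive within the current file".
        archive_file = current_file
    return archive_file, heading_fmt.replace("%s", base)


print(expand_archive_location("%s_archive::* Tasks", "/home/me/org/projects.org"))
```

With a spec like this, every entry archived from under the Tasks heading
is directed to the Tasks heading of the matching archive file.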

Almost all data I enter into org files comes via a capture template. All
captured data initially goes into a refile.org file and then I refile to
the appropriate topic org file each morning.

For completed tasks, I will usually mark them with an ARCHIVE tag soon
after they have been completed and then every 6 months or so, I will
archive into an archive file (where heading hierarchies are retained).
This is most common with completed tasks. I have a custom agenda which
shows tasks broken up into "Completed", "In Progress", "Next" and
"Backlog". I will archive the DONE items so that the "Completed" list in
the agenda does not grow too large.

Every 12 months I move the archive files to an archive folder where they
are renamed to include the year in the filename.
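
A yearly rotation like this could be scripted along the following lines.
This is only a sketch: the function names, the "archive" subfolder and
the "name-YEAR.org_archive" naming scheme are assumptions made for the
example, not the scheme described above.

```python
import shutil
from pathlib import Path
from typing import List


def yearly_archive_path(archive_file: Path, archive_dir: Path, year: int) -> Path:
    """Build the target name, e.g. tasks.org_archive -> tasks-2021.org_archive.

    Assumption: the year is inserted before the final extension.
    """
    return archive_dir / f"{archive_file.stem}-{year}{archive_file.suffix}"


def rotate_archives(org_dir: Path, archive_dir: Path, year: int) -> List[Path]:
    """Move every *.org_archive file in org_dir into archive_dir, adding the year."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    # glob() is not recursive, so files already in a subfolder are left alone.
    for source in sorted(org_dir.glob("*.org_archive")):
        target = yearly_archive_path(source, archive_dir, year)
        if target.exists():
            # Refuse to overwrite an archive from an earlier rotation.
            raise FileExistsError(target)
        shutil.move(str(source), str(target))
        moved.append(target)
    return moved
```

Calling rotate_archives(Path("org"), Path("org/archive"), 2021) once a
year would leave the working directory with no archive files, so the
next archive operation starts a fresh, small one.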

I don't find archiving terribly slow. This is mainly because few of my
org files are particularly large (because they are broken up into
topics) and because I move older archives out into an archive directory
after 12 months. It is very rare that I need to go digging around in
archive files (either current or older year archives).

Of course this could all change down the track. My org files are slowly
getting larger and I expect at some point, I will hit a tipping point
where things become slow. So far, even with the larger files (around
5 MB), performance is fine. I also don't put 'everything' in the org file.
If I have another file of data, I will just link to that from the org
file rather than have that data actually reside in the org file. So, for
example, my org-mode.org file has a lot of links to interesting messages
from the org list, but the messages themselves are all in my mu4e
maildir folders.

My main point was that, because configurations like mine exist, simply
appending archived items to the archive file would not work. I
like having my archive records in a similar 'shape' to my normal org
files because when I do need to dig into the archive, I don't want to
have to go through the whole file looking for something. I generally
know if I'm looking for an old task, note, general entry or comms record
and it is handy to know I only have to look in that section of the
file.

This is one of the big challenges for org mode. Because it is so
flexible and people take advantage of that flexibility, what may appear
like a simple way to solve an issue often ends up being far more complex
than it initially seemed. If, for example, you could not archive based
on heading, date, etc, just appending entries would probably work fine.
However, as the archiving policy might be more complex, org needs to
examine/parse the archive file to work out where to insert the archived
entry.
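
The difference between the two strategies can be sketched in Python.
This is an illustration of the idea only, not how Org implements
archiving; the function names are invented for the example and the
archive file is modelled as a plain list of lines.

```python
import re
from typing import List

HEADING = re.compile(r"^(\*+) ")


def level(line: str) -> int:
    """Return the Org heading level of a line, or 0 if it is not a heading."""
    match = HEADING.match(line)
    return len(match.group(1)) if match else 0


def archive_append(archive: List[str], parent: str, entry: List[str]) -> List[str]:
    """Naive strategy: write the parent heading and the entry at the end."""
    return archive + [parent] + entry


def archive_under_heading(archive: List[str], parent: str, entry: List[str]) -> List[str]:
    """Heading-aware strategy: scan for the parent, insert at the end of its subtree."""
    parent_level = level(parent)
    for i, line in enumerate(archive):
        if line == parent:
            end = i + 1
            # Advance to the next heading at the parent's level or above.
            while end < len(archive) and not (0 < level(archive[end]) <= parent_level):
                end += 1
            # Step back over blank separator lines so they stay after the entry.
            while end > i + 1 and archive[end - 1].strip() == "":
                end -= 1
            return archive[:end] + entry + archive[end:]
    # Parent heading not present yet: create it at the end of the file.
    return archive + [parent] + entry


archive = ["* Notes", "** Note 1", "", "* Tasks", "** DONE task 1"]
print(archive_append(archive, "* Notes", ["** Note 2"]))
print(archive_under_heading(archive, "* Notes", ["** Note 2"]))
```

The append version never has to read the existing archive but ends up
with a second Notes heading. The heading-aware version keeps the shape,
at the cost of scanning the archive for every entry it files.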

Tim

Samuel Wales <samologist@gmail.com> writes:

> what is the current status of hierarchy in archive files?  surely they
> don't deal with updating categories and updating hierarchy structure
> [sounds brittle and syncy]?  i'm thinking it isn't hierarchical at
> present, except when you have a doneified task with children?
>
>
> On 8/11/21, Tim Cross <theophilusx@gmail.com> wrote:
>> I think the problem with just using append to file is that it won't
>> preserve the shape of the file. For example, if I had a file with
>>
>> * Notes
>> ** Note 1
>>    blah blah
>> ** Note 2
>>    blah blah
>>
>> * Tasks
>> ** DONE task 1
>> ** TODO Task 2
>>
>> and I decide to archive note 1 and task 1, I would like them to both appear
>> under the same headings and with the same level. If the process just uses
>> append to file, I can have this for the first archiving i.e.
>>
>> * Notes
>> ** Note 1
>>
>> * Tasks
>> ** DONE task 1
>>
>> but then later, I decide to archive note 2, if append file is used, I will
>> end up with
>>
>> * Notes
>> ** Note 1
>>
>> * Tasks
>> ** DONE task 1
>>
>> * Notes
>> ** Note 2
>>
>> which is not what I want. I want
>>
>> * Notes
>> ** Note 1
>> ** Note 2
>>
>> * Tasks
>> ** DONE task 1
>>
>> So, if we want to preserve hierarchies in our archive files and not have
>> everything jumbled up together, the system has to parse the file. If you
>> are also using something like Categories, then even more work needs to be
>> done to update the category lists.
>>
>> What I tend to do is mark items with the ARCHIVE tag and leave them in the
>> file and then every few months, move archived data to archive files.  It
>> can still get slow, but I don't do it often, so it isn't too much of a
>> hassle.
>>
>>
>> On Thu, 12 Aug 2021 at 08:23, Samuel Wales <samologist@gmail.com> wrote:
>>
>>> thanks for the clarification.  are you saying that, for every archived
>>> entry, it calculates the category property, using the original org
>>> file, in order to add a category property to just one archived entry?
>>>
>>> that would certainly slow down more and more, but it sends me back to
>>> my question about whether append to file would work.
>>> i.e. build the single entry in a temporary buffer then write that
>>> region to a file on disk.
>>>
>>> On 8/10/21, Ihor Radchenko <yantar92@gmail.com> wrote:
>>> > Samuel Wales <samologist@gmail.com> writes:
>>> >
>>> >> i should clarify.  bulk archiving slows down even with /nonexistent/
>>> >> (have not tried empty) archives.  as part of normal and expected
>>> >> operation, bulk creates the archive for the first entry, and then
>>> >> subsequent entries are added.  those get slower and slower.
>>> >
>>> > That's what I suspected. I also see this and my suggestion helped
>>> > archiving speed in my case.
>>> >
>>> >> i use (olpath category itags).  i will try (file time) when i can, if
>>> >> that still applies.  my brain needs to be more operational.
>>> >
>>> > When you use category, every time you modify the original file (not the
>>> > archive!), Org mode re-calculates *all* the categories in the original
>>> > Org file. It happens for every single archived heading. If your
>>> > original
>>> > Org file is large, re-calculations make things extremely slow.
>>> >
>>> > Best,
>>> > Ihor
>>> >
>>>
>>>
>>> --
>>> The Kafka Pandemic
>>>
>>> Please learn what misopathy is.
>>>
>>> https://thekafkapandemic.blogspot.com/2013/10/why-some-diseases-are-wronged.html
>>>
>>>
>>
>> --
>> regards,
>>
>> Tim
>>
>> --
>> Tim Cross
>>
