New feature? Remove duplicate subheadings, preserving order


* New feature? Remove duplicate subheadings, preserving order
@ 2018-01-01  2:42 Allen Li
  2018-01-01  5:55 ` Marcin Borkowski
  2018-01-01 10:04 ` Nicolas Goaziou
  0 siblings, 2 replies; 18+ messages in thread
From: Allen Li @ 2018-01-01  2:42 UTC (permalink / raw)
  To: emacs-orgmode

I wrote a command to remove duplicate subheadings, which I use to
remove duplicate captured links among other things.  Would this be a
useful addition to Org mode?

I have included it below for reference.  I will clean it up a bit if
it's a worthy feature.

(defun mir-org-uniq ()
  "Remove duplicate subheadings, preserving order."
  (interactive)
  (let ((seen (make-hash-table :test 'equal))
        (removed 0))
    (save-excursion
      (org-map-entries (lambda ()
                         (let ((heading (org-get-heading t t t t)))
                           (if (not (gethash heading seen))
                               (puthash heading t seen)
                             (org-cut-subtree)
                             ;; Resume from the entry that now occupies this
                             ;; position; otherwise it would be skipped.
                             (setq org-map-continue-from (point))
                             (setq removed (1+ removed)))))
                       (format "LEVEL=%s" (1+ (org-current-level)))
                       'tree))
    (message "Removed %d duplicates" removed)))


* Re: New feature? Remove duplicate subheadings, preserving order
  2018-01-01  2:42 New feature? Remove duplicate subheadings, preserving order Allen Li
@ 2018-01-01  5:55 ` Marcin Borkowski
  2018-01-01 10:04 ` Nicolas Goaziou
  1 sibling, 0 replies; 18+ messages in thread
From: Marcin Borkowski @ 2018-01-01  5:55 UTC (permalink / raw)
  To: Allen Li; +Cc: emacs-orgmode


On 2018-01-01, at 03:42, Allen Li <vianchielfaura@gmail.com> wrote:

> I wrote a command to remove duplicate subheadings, which I use to
> remove duplicate captured links among other things.  Would this be a
> useful addition to Org mode?

IMHO yes.

-- 
Marcin Borkowski


* Re: New feature? Remove duplicate subheadings, preserving order
  2018-01-01  2:42 New feature? Remove duplicate subheadings, preserving order Allen Li
  2018-01-01  5:55 ` Marcin Borkowski
@ 2018-01-01 10:04 ` Nicolas Goaziou
  2018-01-01 11:59   ` Allen Li
  1 sibling, 1 reply; 18+ messages in thread
From: Nicolas Goaziou @ 2018-01-01 10:04 UTC (permalink / raw)
  To: Allen Li; +Cc: emacs-orgmode

Hello,

Allen Li <vianchielfaura@gmail.com> writes:

> I wrote a command to remove duplicate subheadings, which I use to
> remove duplicate captured links among other things.  Would this be a
> useful addition to Org mode?
>
> I have included it below for reference.  I will clean it up a bit if
> it's a worthy feature.
>
> (defun mir-org-uniq ()
>   "Remove duplicate subheadings, preserving order."
>   (interactive)
>   (let ((seen (make-hash-table :test 'equal))
>         (removed 0))
>     (save-excursion
>       (org-map-entries (lambda ()
>                          (let ((heading (org-get-heading t t t t)))
>                            (if (not (gethash heading seen))
>                                (puthash heading t seen)
>                              (org-cut-subtree)
>                              ;; Resume from the entry that now occupies this
>                              ;; position; otherwise it would be skipped.
>                              (setq org-map-continue-from (point))
>                              (setq removed (1+ removed)))))
>                        (format "LEVEL=%s" (1+ (org-current-level)))
>                        'tree))
>     (message "Removed %d duplicates" removed)))

Duplicate headings are not necessarily wrong. I think this is too
specific to be integrated in Org proper.

Maybe we could add a check for duplicate headings in Org Lint instead,
and add this to Worg, in a "tools" page.

Or we could check for duplicate headings _including contents_, which are
more likely to be an error.

WDYT?

Regards,

-- 
Nicolas Goaziou


* Re: New feature? Remove duplicate subheadings, preserving order
  2018-01-01 10:04 ` Nicolas Goaziou
@ 2018-01-01 11:59   ` Allen Li
  2018-01-01 18:26     ` Nicolas Goaziou
  0 siblings, 1 reply; 18+ messages in thread
From: Allen Li @ 2018-01-01 11:59 UTC (permalink / raw)
  To: Nicolas Goaziou; +Cc: emacs-orgmode

On Mon, Jan 1, 2018 at 2:04 AM, Nicolas Goaziou <mail@nicolasgoaziou.fr> wrote:
> Duplicate headings are not necessarily wrong. I think this is too
> specific to be integrated in Org proper.
>
> Maybe we could add a check for duplicate headings in Org Lint instead,
> and add this to Worg, in a "tools" page.
>
> Or we could check for duplicate headings _including contents_, which are
> more likely to be an error.
>
> WDYT?

Org mode is fundamentally an outliner, and one often makes lists with
an outliner.  Filtering out duplicates from a list seems to me like a
common need.  I wrote such a command to support some of my work flows,
and I posted this here because I think there is a possibility that
other Org users might also find it useful.

If this is not so, I’m perfectly okay just keeping this in my personal
config.


* Re: New feature? Remove duplicate subheadings, preserving order
  2018-01-01 11:59   ` Allen Li
@ 2018-01-01 18:26     ` Nicolas Goaziou
  2018-01-01 23:04       ` Allen Li
  2018-01-02 15:28       ` Florian Beck
  0 siblings, 2 replies; 18+ messages in thread
From: Nicolas Goaziou @ 2018-01-01 18:26 UTC (permalink / raw)
  To: Allen Li; +Cc: emacs-orgmode

Allen Li <vianchielfaura@gmail.com> writes:

> Org mode is fundamentally an outliner, and one often makes lists with
> an outliner.  Filtering out duplicates from a list seems to me like a
> common need.

AFAIK, this is the first time this need is expressed on this ML. There
is no equivalent in "org-list.el" either.

Anyway, I'm not questioning the usefulness of the feature in your
workflow. AIUI, in your implementation, duplicates are headlines with
the same title, but without considering TODO keyword, priority, comment
status, tags or contents. So,

  * DONE Summary :Alice:
  * TODO Summary :Bob:

are duplicates. Isn't it a bit too tolerant? We may be able to find
a more general function that still suits you.

> I wrote such a command to support some of my work flows, and I posted
> this here because I think there is a possibility that other Org users
> might also find it useful.

You didn't answer any of my suggestions, tho. Are they fundamentally
wrong? I.e., wouldn't warning instead of deleting be more useful? Or would
it make more sense to include contents when looking for duplicates? In
the latter case, maybe a prefix argument could toggle headline check and
full check.
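
One way to read the toggle described above, as a minimal sketch only:
the comparison key is either the bare title or the whole subtree text.
The helper name `my/org-entry-key' is hypothetical, "full check" is
assumed here to mean "identical subtree text", and the four-argument
call to `org-get-heading' assumes Org 9.1 or later.

```emacs-lisp
(require 'org)

(defun my/org-entry-key (&optional full)
  "Return a comparison key for the entry at point.
Hypothetical helper.  With FULL nil the key is the bare title, without
tags, TODO keyword, priority or COMMENT marker.  With FULL non-nil it is
the text of the whole subtree, so two entries match only when keyword,
tags and contents are identical as well."
  (if full
      (save-excursion
        (org-back-to-heading t)
        (let ((beg (point)))
          ;; Move to the end of the subtree at point.
          (org-end-of-subtree t)
          (buffer-substring-no-properties beg (point))))
    (org-get-heading t t t t)))
```

A command could pass its prefix argument as FULL, so that plain
invocation compares titles and C-u compares full entries.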


* Re: New feature? Remove duplicate subheadings, preserving order
  2018-01-01 18:26     ` Nicolas Goaziou
@ 2018-01-01 23:04       ` Allen Li
  2018-01-02  4:07         ` Adam Porter
  2018-01-02 15:28       ` Florian Beck
  1 sibling, 1 reply; 18+ messages in thread
From: Allen Li @ 2018-01-01 23:04 UTC (permalink / raw)
  To: Nicolas Goaziou; +Cc: emacs-orgmode

On Mon, Jan 1, 2018 at 10:26 AM, Nicolas Goaziou <mail@nicolasgoaziou.fr> wrote:
> Allen Li <vianchielfaura@gmail.com> writes:
>
>> Org mode is fundamentally an outliner, and one often makes lists with
>> an outliner.  Filtering out duplicates from a list seems to me like a
>> common need.
>
> AFAIK, this is the first time this need is expressed on this ML. There
> is no equivalent in "org-list.el" either.
>
> Anyway, I'm not questioning the usefulness of the feature in your
> workflow. AIUI, in your implementation, duplicates are headlines with
> the same title, but without considering TODO keyword, priority, comment
> status, tags or contents. So,
>
>   * DONE Summary :Alice:
>   * TODO Summary :Bob:
>
> are duplicates. Isn't it a bit too tolerant? We may be able to find
> a more general function that still suits you.

I see this feature as primarily being useful on lists.  So for example:

* Things to buy
** cabbage
** milk
** carrots
** milk

I don’t know if a more intelligent way of handling tags and todo
keywords is worth the extra complexity, but in the use case that I
imagine it makes sense to match using only the heading/list item:

* Things to buy
** TODO cabbage
** DONE milk :store1:
   Maybe I forgot a tag here.  Oh well, I already bought the milk.
** TODO carrots
** TODO milk :store1:store2:

>
>> I wrote such a command to support some of my work flows, and I posted
>> this here because I think there is a possibility that other Org users
>> might also find it useful.
>
> You didn't answer any of my suggestions, tho. Are they fundamentally
> wrong? I.e., wouldn't warning instead of deleting be more useful? Or would
> it make more sense to include contents when looking for duplicates? In
> the latter case, maybe a prefix argument could toggle headline check and
> full check.

Since the point would be to remove duplicates from lists, I don’t think
warning is very useful.  I would want to remove the duplicate list
items, not get a warning about it and delete them manually.  Perhaps
that would be a useful additional feature however (like uniq -d).

It doesn’t make sense to include the contents because I see this as
primarily being useful for list items.  In particular, we would want
to ignore log entries and properties for the sake of matching
(intelligent property or logbook merging might be useful, but adds
complexity).

I don’t think doing a full text check is useful, but if someone has a
use case for that, please speak up.


* Re: New feature? Remove duplicate subheadings, preserving order
  2018-01-01 23:04       ` Allen Li
@ 2018-01-02  4:07         ` Adam Porter
  2018-01-02  7:40           ` Allen Li
  0 siblings, 1 reply; 18+ messages in thread
From: Adam Porter @ 2018-01-02  4:07 UTC (permalink / raw)
  To: emacs-orgmode

Allen Li <vianchielfaura@gmail.com> writes:

> I don’t know if a more intelligent way of handling tags and todo
> keywords is worth the extra complexity, but in the use case that I
> imagine it makes sense to match using only the heading/list item:
>
> * Things to buy
> ** TODO cabbage
> ** DONE milk :store1:
>    Maybe I forgot a tag here.  Oh well, I already bought the milk.
> ** TODO carrots
> ** TODO milk :store1:store2:
>
> ...
>
> It doesn’t make sense to include the contents because I see this as
> primarily being useful for list items.  In particular, we would want
> to ignore log entries and properties for the sake of matching
> (intelligent property or logbook merging might be useful, but adds
> complexity).

I think such a command should check all heading data by default,
because that's the safest option.  A user who commonly needs to ignore
one or more types of data could use a custom function that calls the
command with arguments to disable checking of certain types.

> Since the point would be to remove duplicates from lists, I don’t think
> warning is very useful.  I would want to remove the duplicate list
> items, not get a warning about it and delete them manually.  Perhaps
> that would be a useful additional feature however (like uniq -d).

I think warning or asking for confirmation should be the default action,
because it's the safest option.  Users who want to skip that could use a
prefix argument or call it from a custom command.
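
A hedged sketch of what "report first, confirm by default" could look
like.  Both function names are hypothetical and the prompt wording is
an assumption.  The first function only reports duplicates (in the
spirit of uniq -d); the second asks before deleting unless a prefix
argument is given.

```emacs-lisp
(require 'org)

(defun my/org-duplicate-children ()
  "Return positions of direct children whose title repeats an earlier one.
Hypothetical helper; it only reports and never modifies the buffer."
  (let ((seen (make-hash-table :test 'equal))
        (child-level (1+ (org-current-level))))
    (delq nil
          (org-map-entries
           (lambda ()
             (let ((title (org-get-heading t t t t)))
               (if (gethash title seen)
                   (point)
                 (puthash title t seen)
                 nil)))
           (format "LEVEL=%d" child-level)
           'tree))))

(defun my/org-uniq-children-confirm (&optional no-confirm)
  "Remove duplicate children of the heading at point after confirmation.
With prefix argument NO-CONFIRM, skip the question."
  (interactive "P")
  (let ((dups (my/org-duplicate-children)))
    (cond
     ((null dups) (message "No duplicates found"))
     ((or no-confirm
          (y-or-n-p (format "Remove %d duplicate subtree(s)? "
                            (length dups))))
      (save-excursion
        ;; Delete from the bottom up so earlier positions stay valid.
        (dolist (pos (sort dups #'>))
          (goto-char pos)
          (org-cut-subtree)))
      (message "Removed %d duplicates" (length dups))))))
```

Collecting positions first and deleting afterwards keeps the reporting
step usable on its own, for example to build a warning list.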

> I don’t think doing a full text check is useful, but if someone has a
> use case for that, please speak up.

An example where this would be useful is if the user has copied and
pasted subtrees and accidentally pasted one more than once.

I argue here for the safest behavior by default because I've found that,
in very large Org buffers, it's easy to lose my place in the file, and
it's easy to accidentally do something that I didn't mean to, without
noticing.  IMO this is simply a consequence of Org buffers still being
plain-text.

So it is quite conceivable to me that a user might intentionally give
two headings the same name (e.g. a user who captures quotations to an
inbox file might have two "Quote" headings that are completely
different), or might accidentally duplicate a subtree and then make a
desired change to one of them without realizing there was a
duplicate--then he might use this deduplication command and accidentally
delete a subtree he didn't mean to, resulting in data loss.


* Re: New feature? Remove duplicate subheadings, preserving order
  2018-01-02  4:07         ` Adam Porter
@ 2018-01-02  7:40           ` Allen Li
  2018-01-02 14:36             ` Robert Horn
  2018-01-02 16:36             ` Nick Dokos
  0 siblings, 2 replies; 18+ messages in thread
From: Allen Li @ 2018-01-02  7:40 UTC (permalink / raw)
  To: Adam Porter; +Cc: emacs-orgmode

On Mon, Jan 1, 2018 at 8:07 PM, Adam Porter <adam@alphapapa.net> wrote:
> Allen Li <vianchielfaura@gmail.com> writes:
>
>> I don’t know if a more intelligent way of handling tags and todo
>> keywords is worth the extra complexity, but in the use case that I
>> imagine it makes sense to match using only the heading/list item:
>>
>> * Things to buy
>> ** TODO cabbage
>> ** DONE milk :store1:
>>    Maybe I forgot a tag here.  Oh well, I already bought the milk.
>> ** TODO carrots
>> ** TODO milk :store1:store2:
>>
>> ...
>>
>> It doesn’t make sense to include the contents because I see this as
>> primarily being useful for list items.  In particular, we would want
>> to ignore log entries and properties for the sake of matching
>> (intelligent property or logbook merging might be useful, but adds
>> complexity).
>
> I think such a command should check all heading data by default,
> because that's the safest option.  A user who commonly needs to ignore
> one or more types of data could use a custom function that calls the
> command with arguments to disable checking of certain types.

I don’t see a use case for checking all heading data.

>> Since the point would be to remove duplicates from lists, I don’t think
>> warning is very useful.  I would want to remove the duplicate list
>> items, not get a warning about it and delete them manually.  Perhaps
>> that would be a useful additional feature however (like uniq -d).
>
> I think warning or asking for confirmation should be the default action,
> because it's the safest option.  Users who want to skip that could use a
> prefix argument or call it from a custom command.

There is always undo and automatic Emacs file backups.

>> I don’t think doing a full text check is useful, but if someone has a
>> use case for that, please speak up.
>
> An example where this would be useful is if the user has copied and
> pasted subtrees and accidentally pasted one more than once.

In that case, the user should use undo instead of a remove duplicates
command.

> I argue here for the safest behavior by default because I've found that,
> in very large Org buffers, it's easy to lose my place in the file, and
> it's easy to accidentally do something that I didn't mean to, without
> noticing.  IMO this is simply a consequence of Org buffers still being
> plain-text.

I don’t agree with this philosophy.  Org and Emacs already have lots of
commands that can cause damage, for example org-sort-entries, which my
remove-duplicates command is modeled after (both irreversibly modify
the direct children under the heading at point, undo aside).  If this
command should warn, then org-sort-entries should also warn.  If
org-sort-entries does not need to warn, then this command does not need
to warn.

Emacs makes backups by default and supports undo, which under my
philosophy is good enough; we shouldn’t be constantly asking for
confirmation to prevent user error.  That just causes pop-up dialog
fatigue.  For example, everyone clicks OK on pop-up confirmation boxes
without reading them.  These kinds of confirmation prompts are worse
than useless; they slow down and annoy the user without providing any
protection.  Undo is the better solution.

> So it is quite conceivable to me that a user might intentionally give
> two headings the same name (e.g. a user who captures quotations to an
> inbox file might have two "Quote" headings that are completely
> different), or might accidentally duplicate a subtree and then make a
> desired change to one of them without realizing there was a
> duplicate--then he might use this deduplication command and accidentally
> delete a subtree he didn't mean to, resulting in data loss.

I think it would be more useful for list members to post actual use
cases than hypothesize about what people want.

For me, the use case is filtering out duplicates from a list,
e.g. groceries to buy or links to read captured with timestamps and
other metadata, so checking the tags, todo, or body text is not useful,
and neither is warning.

Based on the responses I have gotten, it sounds like this feature is
too specialized to be worth including in Org mode, so I will stop
pursuing this unless people post actual use cases/desire for
the feature.


* Re: New feature? Remove duplicate subheadings, preserving order
  2018-01-02  7:40           ` Allen Li
@ 2018-01-02 14:36             ` Robert Horn
  2018-01-02 21:34               ` Allen Li
  2018-01-02 16:36             ` Nick Dokos
  1 sibling, 1 reply; 18+ messages in thread
From: Robert Horn @ 2018-01-02 14:36 UTC (permalink / raw)
  To: Allen Li; +Cc: Adam Porter, emacs-orgmode

Allen Li writes:

> On Mon, Jan 1, 2018 at 8:07 PM, Adam Porter <adam@alphapapa.net> wrote:
>
> I don’t see a use case for checking all heading data.
>

I can see such cases arising from templates and time tracking.  I can
have a template that captures telephone calls.  The call comes in and I
start the template.  At this point the heading is just "Received Phone
Call" and a time tracking start.  Time tracking is eventually kept in a
drawer, not in the headline.

I might eventually go back and revise the headline based on notes from
the call, but that will not happen during the call.  It's quite likely
that sorting out the calls will happen at the end of the day or the next
day.

Similarly, I receive lab results.  These will initially be a headline
with just "Lab Result", a time tag like CLOCK, and a tag to indicate
that it is a lab result.  Some time later I might move them around to
attach them to a patient or project, but often by just moving them as
elements into the right section for that patient or project.  So these
also have the same headline contents and different headline data.

R Horn


* Re: New feature? Remove duplicate subheadings, preserving order
  2018-01-01 18:26     ` Nicolas Goaziou
  2018-01-01 23:04       ` Allen Li
@ 2018-01-02 15:28       ` Florian Beck
  2018-01-02 21:28         ` Allen Li
  1 sibling, 1 reply; 18+ messages in thread
From: Florian Beck @ 2018-01-02 15:28 UTC (permalink / raw)
  To: emacs-orgmode


> AFAIK, this is the first time this need is expressed on this ML. There
> is no equivalent in "org-list.el" either.

A way to handle duplicates would be useful, indeed. But a basic function 
should only remove duplicates that are truly identical (same properties, 
same tags, same/no content). Still, removing true duplicates from 
subtrees (AND lists) would be useful.

More useful would be a slightly more general approach. I have three 
kinds of duplicates:
  - duplicate IDs (which are handled rather poorly),
  - duplicate content (which often is only almost identical) and
  - duplicate headings (which usually I want to rectify when they are on
    the same level of the same subtree)

As you can see, a fixed concept of duplication is probably not going to 
work.

What I'd like is a function that finds duplicates according to scope, match 
(as in `org-map-entries') and a user defined function. This function 
should then display the problem cases (via agenda view?). Then we need a 
couple of convenience functions like
  - delete all duplicates but the one at point,
  - mark duplicates I want to keep,
  - uniquify entries (tricky; for headlines maybe prompt the user; for
    IDs, we should check if the ID is referenced from somewhere)
  - merge entries.

But then, I also have duplicates (in content) I want to keep, e.g. one 
in my notes and in a writing project. So, we'd need a property like 
"DUPLICATE_OF".
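
The finder described above could be sketched roughly as follows.  This
is an illustration under assumptions, not an existing Org function: the
name `my/org-find-duplicates' and its return format are made up here,
and only `org-map-entries' and `org-entry-get' are real Org calls.

```emacs-lisp
(require 'org)

(defun my/org-find-duplicates (key-fn &optional match scope)
  "Group entries by the value KEY-FN returns at each heading.
MATCH and SCOPE are passed on to `org-map-entries'.  Return a list of
\(KEY . POSITIONS) for every key that occurs more than once.  Entries
for which KEY-FN returns nil are ignored.  Hypothetical sketch."
  (let ((table (make-hash-table :test 'equal))
        (result nil))
    (org-map-entries
     (lambda ()
       (let ((key (funcall key-fn)))
         (when key
           (push (point) (gethash key table)))))
     match scope)
    ;; Keep only keys that were seen at least twice.
    (maphash (lambda (key positions)
               (when (cdr positions)
                 (push (cons key (reverse positions)) result)))
             table)
    result))
```

For example, (my/org-find-duplicates (lambda () (org-entry-get nil "ID")))
would list duplicate IDs in the current buffer.  Positions are plain
buffer positions, so a scope spanning several files would need markers
instead; displaying, merging or marking the results is left out.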


* Re: New feature? Remove duplicate subheadings, preserving order
  2018-01-02  7:40           ` Allen Li
  2018-01-02 14:36             ` Robert Horn
@ 2018-01-02 16:36             ` Nick Dokos
  2018-01-02 21:22               ` Allen Li
  2018-01-03  7:40               ` Adam Porter
  1 sibling, 2 replies; 18+ messages in thread
From: Nick Dokos @ 2018-01-02 16:36 UTC (permalink / raw)
  To: emacs-orgmode

Allen Li <vianchielfaura@gmail.com> writes:

>
> I don’t see a use case for checking all heading data.
>
>>> Since the point would be to remove duplicates from lists, I don’t think
>>> warning is very useful.  I would want to remove the duplicate list
>>> items, not get a warning about it and delete them manually.  Perhaps
>>> that would be a useful additional feature however (like uniq -d).
>>
>> I think warning or asking for confirmation should be the default action,
>> because it's the safest option.  Users who want to skip that could use a
>> prefix argument or call it from a custom command.
>
> There is always undo and automatic Emacs file backups.
>

There be dragons.

The problem is that some things happen invisibly and far away from
where you are, so you don't know about it and you don't find out for a
couple of weeks.  Undo and automatic backups are useless in that case.

That *has* happened: there have been multiple postings in the ML about
such problems. Whenever it has happened, the devs have always modified
org to make it safer: that is the prudent thing to do and the correct
course of action IMO.

Hell hath no fury like an orgmode user who lost part of his/her
precious org file because of an errant keystroke a month ago and was
not aware of the loss until it was too late.

-- 
Nick


* Re: New feature? Remove duplicate subheadings, preserving order
  2018-01-02 16:36             ` Nick Dokos
@ 2018-01-02 21:22               ` Allen Li
  2018-01-03  7:24                 ` Adam Porter
  2018-01-03  7:40               ` Adam Porter
  1 sibling, 1 reply; 18+ messages in thread
From: Allen Li @ 2018-01-02 21:22 UTC (permalink / raw)
  To: Nick Dokos; +Cc: emacs-orgmode

On Tue, Jan 2, 2018 at 8:36 AM, Nick Dokos <ndokos@gmail.com> wrote:
> Allen Li <vianchielfaura@gmail.com> writes:
>>
>> There is always undo and automatic Emacs file backups.
>>
>
> There be dragons.
>
> The problem is that some things happen invisibly and far away from
> where you are, so you don't know about it and you don't find out for a
> couple of weeks.  Undo and automatic backups are useless in that case.
>
> That *has* happened: there have been multiple postings in the ML about
> such problems. Whenever it has happened, the devs have always modified
> org to make it safer: that is the prudent thing to do and the correct
> course of action IMO.
>
> Hell hath no fury like an orgmode user who lost part of his/her
> precious org file because of an errant keystroke a month ago and was
> not aware of the loss until it was too late.

I can see where you're coming from, but for me there are various reasons
why I don’t think warning is right:

1. org-sort-entries, which performs an action of similar scope and
   destructiveness, does not need to warn so far.

2. Since I see the only use case for warning is checking beforehand, a
   user that uses this command frequently is not going to type C-c d C-u
   C-c d every time (assuming the user has bound this command to C-c d),
   they’re just going to type C-u C-c d or get frustrated and just bind
   the actual command without warning to C-c d.  So warning provides
   zero safety in practice.

   Another possibility is using a y or n confirmation prompt before
   removing duplicates, however this falls into the same trap that a
   user who uses this frequently is just going to bind the command to a
   key and disable this check.

3. I don’t propose binding this command to any key by default, and I
   don’t think M-x org-remove-duplicates RET is a very common typo.

4. The only commands in Emacs that warn beforehand are truly
   irreversible commands, like deleting in Dired or killing a buffer.
   Everything else in Emacs follows the philosophy of using undo if the
   user makes a mistake, including lots of commands that could have
   unintentional, low visibility effects.  I would prefer following this
   policy unless it proves to actually be a problem.  It seems like
   org-sort-entries in practice has not suffered from this problem, so I
   believe a remove duplicates command will similarly not suffer from
   this problem in practice.

5. Everyone should be keeping reliable backups.  This is reiterated all
   the time, yet no one seems to follow it? =)

>
> --
> Nick
>
>


* Re: New feature? Remove duplicate subheadings, preserving order
  2018-01-02 15:28       ` Florian Beck
@ 2018-01-02 21:28         ` Allen Li
  0 siblings, 0 replies; 18+ messages in thread
From: Allen Li @ 2018-01-02 21:28 UTC (permalink / raw)
  To: Florian Beck; +Cc: emacs-orgmode

On Tue, Jan 2, 2018 at 7:28 AM, Florian Beck <fb@fbeck.net> wrote:
>
>> AFAIK, this is the first time this need is expressed on this ML. There
>> is no equivalent in "org-list.el" either.
>
>
> A way to handle duplicates would be useful, indeed. But a basic function
> should only remove duplicates that are truly identical (same properties,
> same tags, same/no content). Still, removing true duplicates from subtrees
> (AND lists) would be useful.
>
> More useful would be a slightly more general approach. I have three kinds of
> duplicates:
>  - duplicate IDs (which are handled rather poorly),
>  - duplicate content (which often is only almost identical) and
>  - duplicate headings (which usually I want to rectify when they are on
>    the same level of the same subtree)
>
> As you can see, a fixed concept of duplication is probably not going to
> work.

It sounds like this problem might vary too much across use cases to
generalize and include in Org mode because...

>
> What I'd like is a function that finds duplicates according to scope, match (as
> in `org-map-entries') and a user defined function. This function should then
> display the problem cases (via agenda view?). Then we need a couple of
> convenience functions like
>  - delete all duplicates but the one at point,
>  - mark duplicates I want to keep,
>  - uniquify entries (tricky; for headlines maybe prompt the user; for
>    IDs, we should check if the ID is referenced from somewhere)
>  - merge entries.
>
> But then, I also have duplicates (in content) I want to keep, e.g. one in my
> notes and in a writing project. So, we'd need a property like
> "DUPLICATE_OF".

You’re requesting a different feature, removing duplicates across all
agenda files.  My initial proposal was just for removing duplicate
direct children (whether by heading or full text pending discussion)
under the heading at point.


* Re: New feature? Remove duplicate subheadings, preserving order
  2018-01-02 14:36             ` Robert Horn
@ 2018-01-02 21:34               ` Allen Li
  0 siblings, 0 replies; 18+ messages in thread
From: Allen Li @ 2018-01-02 21:34 UTC (permalink / raw)
  To: Robert Horn; +Cc: Adam Porter, emacs-orgmode

On Tue, Jan 2, 2018 at 6:36 AM, Robert Horn <rjhorniii@gmail.com> wrote:
>
> Allen Li writes:
>
>> On Mon, Jan 1, 2018 at 8:07 PM, Adam Porter <adam@alphapapa.net> wrote:
>>
>> I don’t see a use case for checking all heading data.
>>
>
> I can see such cases arising from templates and time tracking.  I can
> have a template that captures telephone calls.  The call comes in and I
> start the template.  At this point the heading is just "Received Phone
> Call" and a time tracking start.  Time tracking is eventually kept in a
> drawer, not in the headline.
>
> I might eventually go back and revise the headline based on notes from
> the call, but that will not happen during the call.  It's quite likely
> that sorting out the calls will happen at the end of the day or the next
> day.
>
> Similarly, I receive lab results.  These will initially be a headline
> with just "Lab Result", a time tag like CLOCK, and a tag to indicate
> that it is a lab result.  Some time later I might move them around to
> attach them to a patient or project, but often by just moving them as
> elements into the right section for that patient or project.  So these
> also have the same headline contents and different headline data.

It doesn’t sound like you end up with duplicates though?  How do you
envision using duplicate removal for your workflows?

In any case, it sounds like you want to assign unique IDs to each entry
as suggested by Florian and remove duplicates using IDs instead of
matching against heading data, which could collide through sheer luck.


* Re: New feature? Remove duplicate subheadings, preserving order
  2018-01-02 21:22               ` Allen Li
@ 2018-01-03  7:24                 ` Adam Porter
  0 siblings, 0 replies; 18+ messages in thread
From: Adam Porter @ 2018-01-03  7:24 UTC (permalink / raw)
  To: emacs-orgmode

Allen Li <vianchielfaura@gmail.com> writes:

> Designing around actual use cases that users have an immediate use for
> is better than trying to predict what users might need in the far
> future, especially if adding those features requires extra complexity.

You seem to be approaching this from a "use case" perspective.  I and
several other posters on the list are concerned from the "potential
damage" perspective.  It's vitally important as software developers to
anticipate potential user *actions*, regardless of their intentions or
needs, and to proactively defend against mistakes that may cause data
loss.

>  Everyone should be keeping reliable backups.  This is reiterated all
>  the time, yet no one seems to follow it? =)

As I mentioned, I am not merely hypothesizing: I have experienced such
data loss myself, which I only recovered months after the fact when I
noticed and was able to track it down in the git repository I
automatically commit most of my Org files to.  This was a
time-consuming, laborious, manual process which we cannot expect most
Org users to be able to do; how many Org users do you think know how to
use git and commit all their Org files to it?  On top of that, as I and
others mentioned, due to the nature of Emacs, Org, and plain-text
buffers, it is very easy for such changes to happen outside the visible
portion of the buffer, in which case even the most experienced user is
unlikely to notice such data loss.  In that case, it might go
permanently unnoticed.  For example, I have some large Org files that I
capture data into, with hundreds or thousands of top-level headings.  If
I accidentally cut one of those subtrees, outside the visible portion of
the buffer, which I had captured weeks or months earlier, how would I
even know that it was missing?  Most likely I would not, because the
whole point of storing them in Org is that I will forget about them if I
don't.

An analogy is, if you ran an "rm -rf" command with a wildcard, and
accidentally left off a slash somewhere without realizing it, how long
would it be before you noticed that you had deleted the wrong data?
What if you were operating on a directory deep in a hierarchy that is
essentially an archive of rarely accessed files?  How long would it be
before you noticed the mistake?  Would you still be able to recover from
it?  What if this happened in a script you wrote, rather than an
interactive command?  Have you ever made a mistake in a Bash script that
had undesirable effects?  I can't help but be reminded of this recent
classic:

https://github.com/ValveSoftware/steam-for-linux/issues/3671

Another anecdote: I once almost lost my GPG private key, because somehow
(I still have no idea what happened) it was truncated.  This went
unnoticed by me for a long time, and the truncated file was *backed up*
over and over again.  When I finally noticed, all of my readily
available backups had the truncated version of the file.  I was
only--very luckily!--able to recover it by digging out old CD-R backups
I had made years earlier, one of which had an intact copy of the file.
That experience taught me some lessons, among which are 1. Keep your old
backups, and 2. Respect Murphy's Law.

> If you had an immediate use case in mind, I would love to hear it.
> There's no need to suggest your use case as conceivable if it's
> something you could benefit from immediately.

So the "use case" here is simply "using Org without losing data by
accidentally activating a command in Emacs that operates on data that
may or may not be currently visible."  Because Emacs and Org are
incredibly complex software, extensively customized by users in ways
that the developers often do not anticipate, we should use extra
caution in code that manipulates user data in destructive ways.

I hope that these explanations help clarify our perspectives.  We are
not attacking you, your code, or its potential usefulness.  We are
simply concerned about implementing such things in ways that are best
for users.

Thanks,
Adam

* Re: New feature? Remove duplicate subheadings, preserving order
  2018-01-02 16:36             ` Nick Dokos
  2018-01-02 21:22               ` Allen Li
@ 2018-01-03  7:40               ` Adam Porter
  2018-01-03  8:19                 ` Ihor Radchenko
  1 sibling, 1 reply; 18+ messages in thread
From: Adam Porter @ 2018-01-03  7:40 UTC (permalink / raw)
  To: emacs-orgmode

Nick Dokos <ndokos@gmail.com> writes:

> There be dragons.
>
> The problem is that some things happen invisibly and far away from
> where you are, so you don't know about it and you don't find out for a
> couple of weeks.  Undo and automatic backups are useless in that case.
>
> That *has* happened: there have been multiple postings in the ML about
> such problems. Whenever it has happened, the devs have always modified
> org to make it safer: that is the prudent thing to do and the correct
> course of action IMO.
>
> Hell hath no fury like an orgmode user who lost part of his/her
> precious org file because of an errant keystroke a month ago and was
> not aware of the loss until it was too late.

Indeed.  Maybe I'm just paranoid, but having worked with Org code a bit,
I still wonder sometimes if I have ever accidentally wiped out a subtree
without noticing.  Would I ever notice that it's missing?  Even if it's
stored in git or a backup, how can I restore something that I don't know
needs restoring?

Some of this is simply the nature of computers, I think--a keystroke
here, a blink of the eye there, and poof, the data is gone.  If the
point is in one buffer when my fingers press C-c C-x C-w, but my eyes
are in another buffer, does the subtree still get deleted?  :)

And despite how great Emacs and Org are, this is one area in which their
power may make them more vulnerable to such issues.  Their use of global
state and special variables also makes unintended consequences easier to
achieve.

This is why I think we should always be very careful.  Org is nothing if
we can't trust it to keep our data safe!  :)

* Re: New feature? Remove duplicate subheadings, preserving order
  2018-01-03  7:40               ` Adam Porter
@ 2018-01-03  8:19                 ` Ihor Radchenko
  2018-01-03  9:39                   ` Adam Porter
  0 siblings, 1 reply; 18+ messages in thread
From: Ihor Radchenko @ 2018-01-03  8:19 UTC (permalink / raw)
  To: Adam Porter, emacs-orgmode

Is there any possible way to prevent it more reliably?

I am aware of org-catch-invisible-edits, but this is obviously not
enough. Does it make sense to generate some kind of subtree based diff
after each change, so that user can review all recent changes in org
files?

Ihor

-- 
Ihor Radchenko,
PhD Student
Singapore University of Technology and Design,
8 Somapah Road Singapore 487372
Email: yantar92@gmail.com, ihor_radchenko@mymail.sutd.edu.sg
Tel: +6584017977

* Re: New feature? Remove duplicate subheadings, preserving order
  2018-01-03  8:19                 ` Ihor Radchenko
@ 2018-01-03  9:39                   ` Adam Porter
  0 siblings, 0 replies; 18+ messages in thread
From: Adam Porter @ 2018-01-03  9:39 UTC (permalink / raw)
  To: emacs-orgmode

Ihor Radchenko <yantar92@gmail.com> writes:

> Is there any possible way to prevent it more reliably?
>
> I am aware of org-catch-invisible-edits, but this is obviously not
> enough. Does it make sense to generate some kind of subtree based diff
> after each change, so that user can review all recent changes in org
> files?

That's a good question.  If you want to be max-paranoid, I guess you
should put all your Org files in git, and review and commit all changes
with magit.  For certain things that might make sense (e.g. I do that
with readme files in published projects), but for my personal Org files,
that would feel like a burden to me.

I do store my personal Org files in git, but I don't review the changes
manually.  I commit the changes automatically with a cron job and when my
"emacs-raise-or-run" script raises or minimizes the Emacs window.  If I
ever need to review the changes, I can use magit or gitk.

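If you add this to the .git/config file in a git repo of Org files, and
assign the "org" diff driver to those files (e.g. with a "*.org diff=org"
line in .gitattributes), git shows the nearest Org heading in each diff
hunk header, which helps when reviewing changes: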

[diff "org"]
    xfuncname = "^\\*+ +.*$"
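
Note that a [diff "org"] section only takes effect for files that are
assigned the "org" driver through gitattributes.  A minimal sketch,
assuming the files end in .org and the commands are run from the top of
the repository (the sample heading lines are made up for illustration):

```shell
# Assign the "org" diff driver from the config above to every *.org file.
printf '%s\n' '*.org diff=org' >> .gitattributes

# The xfuncname value as git sees it once the config-file escaping is undone.
pattern='^\*+ +.*$'

# xfuncname is an extended regex, so grep -E can show which lines would
# qualify as hunk headers: the two headings match, the other two do not.
matches=$(printf '%s\n' '* Inbox' '** Captured link' 'plain text' '*bold* text' \
  | grep -cE "$pattern")
echo "$matches heading lines matched"
```

After that, "git diff" on a changed .org file should name the enclosing
heading after the @@ markers of each hunk.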

Thread overview: 18+ messages
2018-01-01  2:42 New feature? Remove duplicate subheadings, preserving order Allen Li
2018-01-01  5:55 ` Marcin Borkowski
2018-01-01 10:04 ` Nicolas Goaziou
2018-01-01 11:59   ` Allen Li
2018-01-01 18:26     ` Nicolas Goaziou
2018-01-01 23:04       ` Allen Li
2018-01-02  4:07         ` Adam Porter
2018-01-02  7:40           ` Allen Li
2018-01-02 14:36             ` Robert Horn
2018-01-02 21:34               ` Allen Li
2018-01-02 16:36             ` Nick Dokos
2018-01-02 21:22               ` Allen Li
2018-01-03  7:24                 ` Adam Porter
2018-01-03  7:40               ` Adam Porter
2018-01-03  8:19                 ` Ihor Radchenko
2018-01-03  9:39                   ` Adam Porter
2018-01-02 15:28       ` Florian Beck
2018-01-02 21:28         ` Allen Li
