Dear Ihor,
   
> Regarding the question about buffer-lens interaction. Let's take even
> more complicated example:  To run the command, the user hits some key
> combination, which happens to be bound to different commands in the main
> buffer and in the lense buffer (i.e. the main buffer is in org-mode, the
> lense is in mingus-mode, and the key is C-d). What should be the
> behaviour in such a case? run the commands in both the buffers? decide
> depending on the point position? It is easy to make up similar
> complicated examples if you consider some exotic major modes in the
> lense buffer.

This is basically a question of customization, a client-side decision.
In other words, it depends on what the user wants to happen.

This customization is done through the controller of the lens.

Take your example.
If you, as a user, want C-d to run in the lens, add "C-d" to the controller of the lens.
Then, whenever point is inside the lens area, C-d runs in the lens unconditionally.
(For the sake of terminology, we can say that the keybinding is "redirected".)
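
To make the redirection rule concrete, here is a rough sketch in Python, used only as a language-neutral toy model. None of these names (dispatch, lens_region, redirected_keys) exist in Emacs or in any lens implementation; they are assumptions made up for illustration, and the real logic would of course live in Emacs itself.

```python
# Toy model only: these names are hypothetical, not an Emacs API.

def dispatch(key, point, lens_region, redirected_keys):
    """Return "lens" if KEY should run in the lens buffer, else "main"."""
    start, end = lens_region
    in_lens_area = start <= point < end
    if in_lens_area and key in redirected_keys:
        return "lens"   # redirected: the lens buffer's binding is used
    return "main"       # otherwise the main buffer handles the key


controller = {"C-d"}    # the user added C-d to the lens controller
region = (100, 200)     # made-up lens area inside the main buffer

print(dispatch("C-d", 150, region, controller))  # point inside the lens area
print(dispatch("C-d", 50, region, controller))   # point outside the lens area
print(dispatch("C-k", 150, region, controller))  # key not in the controller
```

The point is only that the decision needs two inputs: where point is, and whether the key is in the controller.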

If you want C-d to work conditionally (sometimes doing the org-mode thing and sometimes the mingus-mode thing), I am afraid there is nothing better than updating the controller yourself on the fly.
And that's fine, because that is exactly what the user is asking for: the same binding doing two different things in the same place at different times.

(BTW, the controller could also be asked to work "in reverse" and redirect all keybindings except the ones on its blacklist.)
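
As a sketch of what "in reverse" could mean, again in Python as a toy model: the function name is_redirected and the inverted flag are my own assumptions, not part of any existing implementation.

```python
# Toy model only: is_redirected and its parameters are hypothetical names.

def is_redirected(key, key_list, inverted=False):
    """Decide whether KEY goes to the lens.

    Normal mode:   redirect only the keys in KEY_LIST.
    Inverted mode: treat KEY_LIST as a blacklist and redirect everything else.
    """
    listed = key in key_list
    return (not listed) if inverted else listed


blacklist = {"C-x o"}   # made-up example: keep this key for the main buffer

print(is_redirected("C-d", blacklist, inverted=True))    # not on the blacklist
print(is_redirected("C-x o", blacklist, inverted=True))  # on the blacklist
```

The same key list serves both modes; only its interpretation flips.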

But speaking of the larger picture and integration, a user can define a list of key combinations for any major mode, and that list is added to the controller whenever the lens runs that mode.
I think this should cover the vast majority of use-cases.
Of course, nothing prevents the key-addition logic from being flexible enough to cover more exotic cases.
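
The per-mode defaults could look roughly like this (Python as a toy model once more). The table DEFAULT_KEYS_BY_MODE, the function initial_controller, and the org-mode entry are all made up for illustration; only the mingus-mode/C-d pairing comes from your example.

```python
# Toy model only: every name and every key list here is hypothetical.

# User-defined keys to redirect, indexed by the major mode of the lens buffer.
DEFAULT_KEYS_BY_MODE = {
    "mingus-mode": ["C-d"],       # from the example in this thread
    "org-mode": ["TAB"],          # invented entry, for illustration only
}


def initial_controller(lens_mode, extra_keys=()):
    """Build the controller's key set for a lens running LENS_MODE."""
    keys = set(DEFAULT_KEYS_BY_MODE.get(lens_mode, []))
    keys.update(extra_keys)       # hook for anything more exotic
    return keys


print(sorted(initial_controller("mingus-mode")))
print(sorted(initial_controller("mingus-mode", extra_keys=["C-k"])))
print(sorted(initial_controller("some-unknown-mode")))
```

A mode without an entry simply starts with an empty controller, so nothing is redirected until the user asks for it.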

> I think that it would be more effective if someone decide on some basic
> approach for the low-level implementation of the lense-mode (which
> probably involves modifying emacs C-level source code) and continue the
> discussion according to the benefits/limitations of that kind of
> implementation.

I too look forward to hearing from someone about the low-level implementation possibilities :)
I especially hope the approach for the border-case issue (as described in my previous message) can work.

Best regards,
Dmitrii.