From: Tom Gillespie
Date: Sun, 18 Oct 2020 02:01:20 -0400
Subject: Re: best practices query: non-emacs packages based on tangled source
To: Greg Minshall
Cc: emacs-orgmode, TEC

Hi Greg,

Great question. This came out a bit longer than I anticipated since I wrote up a couple of relevant workflows.

Sync between org source blocks and tangled code is something that I think needs improvement. I have covered the difference in semantics between tangled code and babel-evaluated code (along with some other factors) since it can affect which workflows you might choose.

Following on Timothy's suggestion, one key thing that I think is needed is the ability to detangle nested and arbitrary code. Detangling code that was tangled with :comments noweb is not fully implemented. A full detangling implementation would make it easier and safer to automatically detangle back to the org source blocks using a pre-commit hook or similar. Detangling from the org sources is also something that needs to be implemented for this to work.
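To make the simple case concrete: a file tangled with :comments link wraps each block in an opening link comment and a closing "ends here" comment, so recovering the individual block bodies is mostly a parsing problem. Below is a minimal Python sketch of that step only. It assumes Org's default comment formats (~[[%link][%source-name]]~ and ~%source-name ends here~), a ~#~ comment prefix, and no nesting. The helper name split_tangled is hypothetical, and writing the bodies back into the org file is not shown. Nested noweb references are exactly what this sketch cannot handle.

```python
import re
from typing import List, Tuple

# Opening marker, e.g. "# [[file:guide.org::*Setup][Setup:1]]" (assumed default format)
BEGIN = re.compile(r"^\s*#\s*\[\[(?P<link>[^\]]+)\]\[(?P<name>[^\]]+)\]\]\s*$")
# Closing marker, e.g. "# Setup:1 ends here" (assumed default format)
END = re.compile(r"^\s*#\s*(?P<name>.+?) ends here\s*$")


def split_tangled(text: str) -> List[Tuple[str, str, str]]:
    """Return (link, name, body) for each top-level block in a tangled file.

    Hypothetical helper: assumes '#' comments and no nested (noweb) blocks.
    Lines outside any block are ignored.
    """
    blocks = []
    current = None  # (link, name, list of body lines) while inside a block
    for line in text.splitlines():
        if current is None:
            m = BEGIN.match(line)
            if m:
                current = (m["link"], m["name"], [])
            continue
        m = END.match(line)
        if m and m["name"] == current[1]:
            link, name, body = current
            blocks.append((link, name, "\n".join(body)))
            current = None
        else:
            # Anything else, including other "ends here" comments, is body text.
            current[2].append(line)
    if current is not None:
        raise ValueError(f"unterminated block: {current[1]}")
    return blocks
```

Each returned link points back at the org location the block came from, which is the piece of information a detangler needs in order to know where a body should be written back.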
I think that a more complete detangling implementation could go a long way toward making it easier for those who are not used to org to contribute to a project. In the absence of this, I have found that most of my existing workflows actively avoid keeping tangled code and org sources tracked in git at the same time unless absolutely necessary, and even with git there to back me up I have shot myself in the foot by tangling over files that I forgot to detangle.

Below are a few examples. In all cases I have had to consciously work around the issue of having tangled code that lives outside the source of truth, which is the org file.

As an additional note before the examples, I have found that the trade-offs when tangling code also depend on the language you are using. For example, I consider some languages, such as Python, to be obligate tanglers since their semantics conflate modules and files. Org babel might be able to work around this in some cases, but it would mean, for example, that ob-python would have to explicitly compensate for this deficiency, either by treating source blocks as modules that can be loaded into a session, or by pre-parsing blocks for import statements and pulling the referenced code in during the prologue. The deficiencies of a language mean that if you want certain functionality for that language, org babel can't just treat the code as text, and might have to go to great lengths to keep the semantics of babel evaluation and of tangled code aligned.

The workflow that I have found to be the most reasonable is one I developed while working on an elisp project (it is public, but I'm not quite ready to link it on this list). In this workflow I add a hook via ~(add-hook 'before-save-hook #'org-babel-tangle nil t)~ to any org file that should be tangled, so that I don't have to worry about remembering to tangle. However, there are a few issues. It does not work in reverse: you still always have to edit the org blocks.
It becomes annoyingly slow if you have many blocks to tangle; you really want it to run only when the source blocks change, not whenever the org file changes. Finally, the exact semantics of tangling multiple blocks to the same file can have a major impact on performance. So even this best case is not great and doesn't enable your specific use case (the detangling issues in particular are a showstopper).

Since this is an elisp project, I have to run tests on the tangled file in a separate instance of Emacs to ensure that it works as expected.

In this project I also have a completely unreadable file that is valid and executable in 3 languages. Tangled blocks that are easily readable in the org sources are intentionally commented out in the tangled file. This is the worst case for detangling. I suspect that it can be done, but it will push a detangling implementation to the limit. At the moment there is no way to detangle this file back to readable form at all, and it is not clear that anyone should try to edit the tangled file in the first place. All this to say, if we reason from this extreme example, maybe the best thing is to tangle at the last possible moment and never keep the tangled form under version control. Unfortunately, the use case for this file is to bootstrap Emacs, which means that in order for it to be useful it _must_ be tangled and put under version control, since the systems it needs to run on don't have Emacs.

The worst experience I have had was when I was developing Python code where I needed to capture the output of a block in order to populate tables. Over time the code grew to the point where it needed to become a library. This is where Python being an obligate tangler reared its head, and the differences in semantics between tangled and evaluated code became a major pain.
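As a rough sketch of what "treating source blocks as modules" could mean for a Python session: the import system checks sys.modules before searching sys.path, so a session could execute a block's text into a fresh module object and register it there, after which a plain import statement in another block finds it without any file on disk. This assumes the block bodies have already been extracted from the org file; register_block_module is a hypothetical name, and this is not something ob-python currently provides.

```python
import sys
import types


def register_block_module(name: str, source: str) -> types.ModuleType:
    """Execute a source block's text as a module and register it under `name`.

    Hypothetical helper. After this call, `import name` in the same
    interpreter returns this module instead of searching sys.path for a file.
    Note that it silently replaces any existing module with the same name.
    """
    module = types.ModuleType(name)
    module.__file__ = f"<org-block {name}>"  # pseudo-path shown in tracebacks
    # Register before executing, as the normal import machinery does, so that
    # blocks importing each other can see a partially initialised module.
    sys.modules[name] = module
    try:
        exec(compile(source, module.__file__, "exec"), module.__dict__)
    except BaseException:
        sys.modules.pop(name, None)  # do not leave a broken module behind
        raise
    return module
```

For example, after register_block_module("tables", "def double(x):\n    return 2 * x\n"), a later block in the same session can run import tables and call tables.double(21). The semantics still differ from a tangled package in details such as relative imports and __file__, so this narrows the gap rather than closing it.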
On top of that, my testing workflows in Python essentially require me to edit the tangled code to stay sane, but I would forget to detangle and sometimes overwrite my edits on retangle, which left me quite unhappy. As a result I eventually gave up and moved all Python development out of the org file except for the few critical parts that were needed to produce the tables. There were simply too many steps between modifying a file and being able to test changes (my time writing elisp and common lisp has massively reduced my tolerance for this kind of thing). Once I did that, the risk of forgetting whether I had tangled, or needed to tangle or detangle, went to zero. I was much more productive and could do sane things like safely import the Python code into other modules.

Even if you automatically tangle code to the file system on save, you still have to be able to use it from the org file. In obligate tanglers like Python this means that you must figure out how to do something like setting ~PYTHONPATH~ so that org babel can find it. There weren't good ways to do this inside a single org file, and adding a random path to your .bashrc for each one of these would be a nightmare, not to mention that it completely defeats the purpose of using org to simplify documentation of code (this is one use case for the elisp project discussed above).

A slightly better experience with Python is one where I have an existing code base with a single module containing most of the implementation. I then wrote a developer guide as an org mode file, and I tangle that code to a submodule:
https://github.com/SciCrunch/sparc-curation/blob/master/docs/developer-guide.org#datasets
This was not nearly as bad as the other Python project because I wrote each source block as if it were its own complete file and module. This severely limited the style that I could use, and recombination and reuse within the org file is difficult (as noted).
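One cheap guard against retangling over a tangled file that was edited by hand and never detangled is to record a hash of every tangled file right after tangling and check those hashes before the next tangle (or in a pre-commit hook). A minimal Python sketch, assuming a hypothetical JSON manifest named .tangle-manifest.json that the tangling script maintains; Org itself does not write such a file, and the function names are illustrative.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, Iterable, List

MANIFEST = Path(".tangle-manifest.json")  # hypothetical manifest location


def digest(path: Path) -> str:
    """SHA-256 of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def record(paths: Iterable[Path], manifest: Path = MANIFEST) -> None:
    """Call right after tangling: remember what each tangled file looked like."""
    data = {str(p): digest(p) for p in paths}
    manifest.write_text(json.dumps(data, indent=2, sort_keys=True))


def hand_edited(manifest: Path = MANIFEST) -> List[str]:
    """Return tangled files whose contents changed since the last recorded tangle.

    Files that were deleted are skipped: retangling recreates them and loses
    nothing. With no manifest yet, nothing is reported.
    """
    if not manifest.exists():
        return []
    recorded: Dict[str, str] = json.loads(manifest.read_text())
    return sorted(
        name
        for name, sha in recorded.items()
        if Path(name).exists() and digest(Path(name)) != sha
    )
```

A wrapper script would call hand_edited() first and refuse to tangle (prompting a detangle instead) if it returns anything, then call record() on the freshly tangled files. It only detects the problem; it does not solve detangling.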
I still have to tangle everything before I test, and I have to add the tangled files to .gitignore (if I have not already) so that other developers cannot accidentally edit them (runnable documentation is cool, except when people don't read it and start modifying just the runnable part). To compensate for this I now have a build-time dependency on Emacs (major WTF right there, ya?) that none of the Python packaging tools know anything about, just so that there is only a single source of truth for the Python code.

Now, you would think that I could use the source block header arguments with the modularized example code to run the code via org babel directly, but that is not really possible: when tangled, the code imports other source blocks as modules, but under org babel those files would still have to be tangled, otherwise the Python import system could not find the code. Maybe ob-python could be enhanced to dynamically load other source blocks as modules? I'm sure that other languages have similar issues.

Just to be a bit less harsh on Python, I have had similar issues developing code in bash that needed to be sourced in order for the functions to be available for use in a shell. Sometimes I would find myself accidentally editing the tangled source and forgetting to detangle, or I was unable to detangle because I was using :comments noweb. As mentioned above, I think this is the single largest issue preventing sane workflows for keeping tangled files and org sources in sync.

Another example of how the quality of the workflow depends on the language you are working in: I started a project (https://github.com/tgbugs/git-share) in common lisp that includes other languages such as sql, bash, and elisp. I wanted to write everything in a single org file. In this case I have been able to develop two separate workflows. For production release I tangle all the files and then run ~save-lisp-and-die~ in sbcl.
For development I have a workflow where everything is set up and runs via slime, and all modifications can be made and run via org babel directly. For this project I also explicitly never commit the tangled code to git. This winds up being less of an issue for this project than for the others because dumping the files to disk is only needed to create the production build (and might not even be required for that). One disadvantage of this approach (which also applies to a pure elisp babel approach) is that there are no concrete source files, so you can't use ~xref-find-definitions~ or ~slime-edit-definition~ to jump to a definition. I imagine that this could be fixed though, so that the source location for definitions could point to lines in an org file.

This kind of split setup is really only possible in languages where the semantics for an org babel session are the same as the semantics when tangled (common lisp and elisp being two examples). As mentioned, in Python this is virtually impossible because the semantics of the babel session and the semantics of a tangled file that start from the same block(s) are radically different. This makes sense given the enormous amount of effort the CL community put into making sure that compiled and interpreted code, and top-level and nested code, had semantics as close to each other as possible (and it shows).

Best!
Tom