Re: [PATCH] ob-shell-test, test-ob-shell and introduction

emacs-orgmode@gnu.org archives
 help / color / mirror / code / Atom feed

From: Matt <matt@excalamus.com>
To: "Thomas S. Dye" <tsd@tsdye.online>
Cc: emacs-orgmode <emacs-orgmode@gnu.org>, Timothy <tecosaur@gmail.com>
Subject: Re: [PATCH] ob-shell-test, test-ob-shell and introduction
Date: Sat, 01 Jan 2022 23:32:00 -0500	[thread overview]
Message-ID: <17e190f0110.dffbf4bc1032.6816869643164380190@excalamus.com> (raw)
In-Reply-To: <8735m8ifvr.fsf@tsdye.online>

Apologies for the book.  I've been sitting on this stuff for two months and am wondering how to proceed.

IANAL but AFAIK/CT, my contract contains an exception for making contributions to projects like Org. I've gotten confirmation from my manager and by HR.  However, until the CEO signs the FSF disclaimer, I can't officially contribute.  I'm confident that I can publish changes (e.g. to a personal website); the FSF just can't accept my changes (yet).

I could start working on ob-shell.el separately now and publish that myself. It's not clear how I would then contribute those changes back once I'm cleared by the FSF.  I'm inclined towards some refactoring and I'm not sure how I could break that down in such a way that if it took two more months to get the copyright stuff worked out that anyone could make sense of the changes.  I would much prefer to gradually submit patches, discuss them, and then eventually have them merged in turn.  If I have a heap of changes elsewhere, I feel like it would be harder to have a conversion about them.

Regardless, I think I should define test cases first.  If those are considered valid, then any refactoring would be moot if they pass, right?  With (agreed upon) test cases, it shouldn't matter if I refactor as long as functionality remains the same?

Overall, what advice do you have?

It looks to me like ob-shell.el could use some love in other respects besides async evaluation.  I've detailed where below, partly for my own organization and partly for posterity, but mainly because this isn't my house, so to speak, and I don't want to barge in and start rearranging the furniture and eating whatever's in the fridge.  Or, is it like Worg in that once I have the keys I can drive where I like, so long as there're no crashes?

I'm interested in people's thoughts on these notes on ob-shell.el:

* Tests
There are several code paths, like shebang, cmdline, and basic execution, which don't always have something to do with one another in the code.  Having tests would be really helpful to make sure everything jives.  Before doing anything with the code base, I intend to create tests for all the functionality.

* 2D-array
I documented two options for the =:var= header[fn:1]. The ob-shell.el code actually defines a third option for 2D-arrays.  I couldn't get it to work.  It was one of several things not documented anywhere, at least as far as I could find, and which I had to figure out straight from the code.  Between not being great at shell scripting and having a hard time deciphering that ob-shell.el code, I'm not sure 2D-arrays are actually fully or correctly implemented.

* M-up behavior <<M-up>>
The =org-babel-load-session:shell= code path only works when M-up is used on a code block[fn:2]. Is M-up actually documented anywhere?  Furthermore, the =org-babel-load-session:shell= only works for the "shell" language, which is not actually a "proper" shell (i.e. it's not listed in =org-babel-shell-names=). The M-up defaults to using $ESHELL or shell-file-name through the =shell= command.

For example, try calling M-up on these:

#+comment: (opaquely) calls the system shell
#+begin_src shell :session my-shell
echo "hello, world!"  #+end_src

#+comment: fails
#+begin_src sh :session my-sh
echo "hello, world!"  #+end_src

#+comment: fails
#+begin_src bash :session my-bash
echo "hello, world!"  #+end_src

To fix this, there needs to be an =org-babel-load-session:<lang>= for each language in =org-babel-shell-names=.  This would probably make the most sense in =org-babel-shell-initialize=.  However, that function [[org-babel-shell-initialize][needs attention]].

* Refactoring <<refactoring>>
The ob-shell.el code appears inconsistent to me.  Clearly, this is somewhat subjective.  I've tried to give a rationale for each point to make it less so.  My goal is to be maintainer of ob-shell.el, but that's not forever.  These things were an obstacle for me and my aim is to remove them for the next person.

** =org-babel-shell-initialize= <<org-babel-shell-initialize>>
As alluded to elsewhere, =org-babel-shell-initialize= doesn't appear to adhere to the (elisp) Coding Conventions,

#+begin_quote
   • Constructs that define a function or variable should be macros, not functions, and their names should start with ‘define-’.  The macro should receive the name to be defined as the first argument.  That will help various tools find the definition automatically.  Avoid constructing the names in the macro itself, since that would confuse these tools.  #+end_quote

The =org-babel-shell-initialize= function defines =org-babel-execute:<lang>=, =org-babel-variable-assignments:<lang>=, and =org-babel-default-header-args:<lang>= for the "languages" in =org-babel-shell-names=.  As it stands, that =org-babel-shell-initialize= is a function does no harm (aside from being confusing by way of straying from convention).  However, if the [[M-up][M-up]] issue is to be resolved, it seems to me that =org-babel-shell-initialize= should be updated to match the convention (i.e. be a macro).

** Organization
Definitions are introduced in different orders.  For example, the =org-babel-shell-initialize= function whose sole purpose is to work with =org-babel-shell-names= is defined before the reader knows what =org-babel-shell-names= is.  Later, this pattern of defining the component pieces after they're used is reversed. For example, =org-babel-load-session:shell= relies on =org-babel-prep-session:shell= which is defined first.  I find defining terms before they're used makes a document more easy to comprehend than otherwise.  I want to reorder the definitions.

Similarly, some functions could use breaking out.  The most important is probably =org-babel-sh-evaluate= which handles the various header arguments.  There are various paths of execution each requiring separate logic, yet all live in the one large function.  Breaking these out would allow them to have separate docstrings and, I expect, be easier to understand since the logic of one would be (lexically) separated from the rest.

Other functionality might be better served by consolidating functions. There's a lot of fiddly code dedicated to variable assignment. Actually, much of the ob-shell.el file is related to variable assignment. The assignments are done using separate functions, yet all apply to the same task. They'll never be used for anything else.  Do they need to be split out?  Is there a technical reason? I don't see one.  Does it help comprehension?  When split out as they are, I found it hard to make sense of the context they're used in since they're defined apart from the logic that uses them (i.e. what header uses them, what form does the header take, etc.).  I think it's worth seeing if there's a better way to present that part of the code base.

** Naming
The following apply to all shells, not just "sh" and should be updated to be "shell". This looks like cruft from when ob-shell.el was called ob-sh.el AFAICT.

- =org-babel-sh-evaluate=
- =org-babel-sh-eoe-indicator=
- =org-babel-sh-eoe-output=
- =org-babel--variable-assignments:sh-generic=
- =org-babel-sh-var-to-sh=
- =org-babel-sh-var-to-string=
- =org-babel-sh-initiate-session=
- =org-babel-sh-evaluate=
- =org-babel-sh-strip-weird-long-prompt=

Generally speaking, I find the Org Babel code base tricky to read (especially at first).  I spent a good deal of time untangling what lived where and who did what. I can play along fine now that I'm familiar. However, since understanding took longer than I think was necessary, I want to detail the pain points as they have made contributing to Babel harder.

Overall, Babel somewhat breaks the (elisp) Coding Conventions for naming,

#+begin_quote
You should choose a short word to distinguish your program from other Lisp programs.  #+end_quote

I understand the variable/function name prefix to be the file name, typically.  The file name is often the package name, or more precisely the feature provided by the file[fn:3]. For Org Babel, there's not a solid file-to-prefix relation.  We say "Org Babel", but the main functionality is in ob-core and the various "ob-" files either extend or implement implied behavior (e.g. =org-babel-<lang>-execute=).  Is the "program" ob-core, ob-lang, or the whole suite of files?  This is a subjective question which the Org Babel "program" answers with, "the whole suite of files".  All components, across all "ob-" files, bear the name "org-babel-". This is still something that trips me up: is the current symbol core or not?  Who is responsible for what?

I would expect the core API to have its own prefix. The extensions would then define their code and have a different prefix, "ob-<lang>-". This way, readers/contributors could open the pertinent ob-* file, see the expected symbol prefix (e.g. "ob-shell-") and another prefix (e.g. "org-babel-") and be able to distinguish which is which.  As it stands, ob-core.el could be renamed to org-babel.el or the "org-babel-" prefix could be changed to "ob-core-".

Another possible solution, or a stopgap, would be to have a document detailing the Org Babel API[fn:4].

* Process interaction
Emacs has several different ways of interacting with processes. The ob-shell.el code uses a few of them. Since async is another way to interact with a process, a single process pattern could be used. The goal would be to make each of the different functionalities provided by ob-shell.el have a similar implementation.  The expectation is that this would benefit maintenance.

* Footnotes

[fn:1] https://orgmode.org/worg/org-contrib/babel/languages/ob-doc-shell.html#orgfa6b7c5

[fn:2] M-up is bound to =org-metaup-hook= and
=ob-core:org-babel-load-in-session= by default.

[fn:3] It's not clear to me if there's a technical definition for an
Emacs package.

[fn:4] I may extend my personal notes into a document detailing the
Org API. http://excalamus.com/2021-11-03-org_babel.html

next prev parent reply	other threads:[~2022-01-02  4:32 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-22 18:37 [PATCH] ob-shell-test, test-ob-shell and introduction Matt
2021-11-22 18:43 ` Timothy
2021-11-24 18:48   ` Matt
2021-12-02  9:39     ` Timothy
2021-12-06  3:50       ` Matt
2021-12-06  4:50         ` Thomas S. Dye
2021-12-18  7:03           ` Matt
2021-12-18 20:51             ` Thomas S. Dye
2021-12-31 17:04             ` Thomas S. Dye
2021-12-31 19:18               ` Matt
2021-12-31 22:11                 ` Thomas S. Dye
2022-01-02  4:32                   ` Matt [this message]
2022-01-02 18:57                     ` Thomas S. Dye
2022-01-05 17:12                 ` Max Nikulin
2022-01-06  3:47                   ` Matt
2022-01-07 16:18                     ` Max Nikulin
2021-11-22 19:08 ` Thomas S. Dye
2021-11-23  4:42   ` [PATCH] ob-doc-shell.org (was [PATCH] ob-shell-test, test-ob-shell and introduction) Matt
2021-11-23  4:59     ` Thomas S. Dye

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.orgmode.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=17e190f0110.dffbf4bc1032.6816869643164380190@excalamus.com \
    --to=matt@excalamus.com \
    --cc=emacs-orgmode@gnu.org \
    --cc=tecosaur@gmail.com \
    --cc=tsd@tsdye.online \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).