From: "Sebastien Vauban"
Subject: Re: org babel support for tcl and awk
Date: Wed, 25 May 2011 14:30:01 +0200
Message-ID: <80aaeb2cae.fsf@somewhere.org>
References: <20110524113109.fo2dcd0mwno0c4w4@webmail.dds.nl>
 <8739k46z2p.fsf@gmail.com> <874o4kvvcb.fsf@ucl.ac.uk>
 <87vcwz9b02.fsf@gmail.com> <80boyr50w4.fsf@somewhere.org>
 <87lixvd5ei.fsf@gmail.com>
List-Id: "General discussions about Org-mode."
To: emacs-orgmode-mXXj517/zsQ@public.gmane.org

Hi Eric,

Eric Schulte wrote:
> "Sebastien Vauban" writes:
>> Eric Schulte wrote:
>>> Eric S Fraga writes:
>>> I've made a quick change so that any variable named "stdin" is treated
>>> specially, in that, rather than using its value to replace strings of
>>> $stdin in the text of the awk code, the value of the stdin variable is
>>> saved into the file processed by awk. This allows awk to operate over
>>> Org-mode references.
>>>
>>> If babel code block supported a pipe or an actual stdin header argument,
>>> that would be the ideal way to add this behavior, but currently nothing of
>>> that nature exists.
>>>
>>> Please let me know if this misses part of your suggestion, or more
>>> generally what else may be advisable before we add this to the core.
>>
>> Could this be implemented for sh as well?
>>
>> AFAI understand, this is exactly the missing piece for me to be able to:
>
> Unfortunately this simple hack for ob-awk does not address the need you link
> to below -- which I am aware of and which is on my list of larger
> longer-term Babel development items.
> I think that a future piping
> implementation will be the ultimate solution to the issues you address.

Glad to hear you understand my wish. It's not always easy to express myself
clearly, as English is not my mother tongue, especially when trying to tackle
difficult subjects.

> Such an implementation -- allowing data to flow between concurrently
> executing blocks utilizing posix pipes -- will require more sophisticated
> processes interaction and possibly some form of multi-threaded elisp
> execution.

Just for the sake of clarity, I don't need concurrent or multi-threaded
execution of any kind. My twofold goal is:

1. to cut a shell script into small parts, and explain what every part does,
   with a runnable example (=C-c C-v C-e=).

2. to tangle the executable script out of the Babel document, by concatenating
   all its parts (=C-c C-v C-t=).

A quite "dumb" example follows. I've made it as _minimal_ and as _complete_ as
possible, to be able to _express my point_, for further reference.

* Abstract

This script "americanizes" a European CSV file.

* Sample data

The following is a sample CSV file:

#+results: sample-csv
#+begin_example
Date;Amount;Account
28-05-2010;-6.806,25;999-1974050-30
04-06-2009;420,00;999-1500974-23
24-02-2009;-54,93;999-1974050-30
#+end_example

* Script

What the script must do is:

** Load the data

Read the raw contents of the input file.

#+srcname: load-data
#+begin_src sh :var data=sample-csv :results output :exports both
echo "$data"
#+end_src

#+results: load-data
#+begin_example
Date;Amount;Account
28-05-2010;-6.806,25;999-1974050-30
04-06-2009;420,00;999-1500974-23
24-02-2009;-54,93;999-1974050-30
#+end_example

** Convert the date to American format

Convert the date to =MM/DD/YYYY= format.

#+srcname: convert-date
#+begin_src sh :var data=load-data :results output :exports both
echo "$data" |\
sed -r 's/^([[:digit:]]{2})-([[:digit:]]{2})-([[:digit:]]{4})/\2\/\1\/\3/g' |\
sed -r 's/^([[:digit:]]{2})\/([[:digit:]]{2})\/([[:digit:]]{2})/\2\/\1\/20\3/g'
#+end_src

#+results: convert-date
#+begin_example
Date;Amount;Account
28/05/202010;-6.806,25;999-1974050-30
04/06/202009;420,00;999-1500974-23
24/02/202009;-54,93;999-1974050-30
#+end_example

** Convert the separators

Apply the following operations in order to "americanize" the CSV file received
from the bank:

- remove the dot used as thousands separator (=.= -> ==)
- replace the comma used as decimal separator by a dot (=,= -> =.=)
- replace other commas by a dot (=,= -> =.=)
- replace the semicolon used as field separator by a comma (=;= -> =,=)

#+srcname: convert-separators
#+begin_src sh :var data=convert-date :results output :exports both
echo "$data" |\
sed -r 's/([[:digit:]])\.([[:digit:]]{3})/\1\2/g' |\
sed -r 's/([[:digit:]]),([[:digit:]]{2})/\1.\2/g' |\
sed -r 's/,/./g' |\
sed -r 's/;/,/g'
#+end_src

#+results: convert-separators
#+begin_example
Date,Amount,Account
28/05/202010,-6806.25,999-1974050-30
04/06/202009,420.00,999-1500974-23
24/02/202009,-54.93,999-1974050-30
#+end_example

* Full code

The script is then:

#+begin_src sh :tangle americanize-csv.sh :noweb yes
#!/bin/bash
# americanize-csv.sh -- Convert CSV file to American format

# Usage: americanize-csv FILE.CSV

cat $1 |\
<<convert-date>> |\
<<convert-separators>>

exit 0

# americanize-csv.sh ends here
#+end_src

As you can see, the tangled script does not work anymore, because I've been
forced to put an =echo $data= command as the first command of every separate
code block.

#+begin_src sh
#!/bin/bash
# americanize-csv.sh -- Convert CSV file to American format

# Usage: americanize-csv FILE.CSV

cat $1 |\
echo "$data" |\
sed -r 's/^([[:digit:]]{2})-([[:digit:]]{2})-([[:digit:]]{4})/\2\/\1\/\3/g' |\
sed -r 's/^([[:digit:]]{2})\/([[:digit:]]{2})\/([[:digit:]]{2})/\2\/\1\/20\3/g' |\
echo "$data" |\
sed -r 's/([[:digit:]])\.([[:digit:]]{3})/\1\2/g' |\
sed -r 's/([[:digit:]]),([[:digit:]]{2})/\1.\2/g' |\
sed -r 's/,/./g' |\
sed -r 's/;/,/g'

exit 0

# americanize-csv.sh ends here
#+end_src

If I could work with =stdin=, I could "hide" that first line and assume that
all the code I'm writing is executed against whatever is read from =stdin=,
both in the Org buffer and in the stand-alone shell script. Right?

#+begin_src sh
#!/bin/bash
# americanize-csv.sh -- Convert CSV file to American format

# Usage: americanize-csv FILE.CSV

cat $1 |\
sed -r 's/^([[:digit:]]{2})-([[:digit:]]{2})-([[:digit:]]{4})/\2\/\1\/\3/g' |\
sed -r 's/^([[:digit:]]{2})\/([[:digit:]]{2})\/([[:digit:]]{2})/\2\/\1\/20\3/g' |\
sed -r 's/([[:digit:]])\.([[:digit:]]{3})/\1\2/g' |\
sed -r 's/([[:digit:]]),([[:digit:]]{2})/\1.\2/g' |\
sed -r 's/,/./g' |\
sed -r 's/;/,/g'

exit 0

# americanize-csv.sh ends here
#+end_src

* Conclusions

As you can see, I did not mean any concurrent execution. I simply want to be
able to execute parts of the code in situ, in the Org buffer, to document (and
test) what I'm writing, and to assemble all the parts into one single script
file by means of literate programming.

Best regards,
  Seb

-- 
Sébastien Vauban
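
For further reference, here is a minimal stand-alone sketch of the stdin-based layout wished for above. It is only an illustration, not an existing Babel feature. The function names =convert_date= and =convert_separators= and the variable names are hypothetical, =-E= is used as the portable spelling of =-r=, and only the first date expression is kept, on the assumption that the input always has four-digit years (the second expression in the example also matches the first two digits of a four-digit year, which is where =202010= in the results comes from).

```sh
#!/bin/bash
# Sketch only: every part is a pure filter (stdin -> stdout), so no part
# needs to start with echo "$data". All names here are hypothetical.

convert_date() {
    # DD-MM-YYYY at the start of a line becomes MM/DD/YYYY.
    sed -E 's/^([[:digit:]]{2})-([[:digit:]]{2})-([[:digit:]]{4})/\2\/\1\/\3/'
}

convert_separators() {
    # Same four steps as the convert-separators block, in the same order.
    sed -E 's/([[:digit:]])\.([[:digit:]]{3})/\1\2/g' |
    sed -E 's/([[:digit:]]),([[:digit:]]{2})/\1.\2/g' |
    sed -E 's/,/./g' |
    sed -E 's/;/,/g'
}

# Sample data copied from the sample-csv block.
sample='Date;Amount;Account
28-05-2010;-6.806,25;999-1974050-30
04-06-2009;420,00;999-1500974-23
24-02-2009;-54,93;999-1974050-30'

# Why the tangled script breaks: echo never reads its stdin, so whatever
# arrives through the pipe is discarded and only the variable is printed.
data='from variable'
printf 'from pipe\n' | echo "$data"    # prints "from variable"

# With pure filters, the parts simply chain, in a test or in the script.
printf '%s\n' "$sample" | convert_date | convert_separators
```

In a tangled script the last line would become =cat "$1" | convert_date | convert_separators=, while each part could still be tried on its own by feeding it the sample data on =stdin=.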