From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Sebastien Vauban" <wxhgmqzgwmuf-geNee64TY+gS+FvcfC7Uqw@public.gmane.org>
Subject: Re: org babel support for tcl and awk
Date: Wed, 25 May 2011 14:30:01 +0200
Message-ID: <80aaeb2cae.fsf@somewhere.org>
References: <20110524113109.fo2dcd0mwno0c4w4@webmail.dds.nl>
	<8739k46z2p.fsf@gmail.com> <874o4kvvcb.fsf@ucl.ac.uk>
	<87vcwz9b02.fsf@gmail.com> <80boyr50w4.fsf@somewhere.org>
	<87lixvd5ei.fsf@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Return-path: <emacs-orgmode-bounces+geo-emacs-orgmode=m.gmane.org-mXXj517/zsQ@public.gmane.org>
List-Id: "General discussions about Org-mode." <emacs-orgmode.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-orgmode>,
	<mailto:emacs-orgmode-request-mXXj517/zsQ@public.gmane.org?subject=unsubscribe>
List-Archive: </archive/html/emacs-orgmode>
List-Post: <mailto:emacs-orgmode-mXXj517/zsQ@public.gmane.org>
List-Help: <mailto:emacs-orgmode-request-mXXj517/zsQ@public.gmane.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-orgmode>,
	<mailto:emacs-orgmode-request-mXXj517/zsQ@public.gmane.org?subject=subscribe>
Errors-To: emacs-orgmode-bounces+geo-emacs-orgmode=m.gmane.org-mXXj517/zsQ@public.gmane.org
Sender: emacs-orgmode-bounces+geo-emacs-orgmode=m.gmane.org-mXXj517/zsQ@public.gmane.org
To: emacs-orgmode-mXXj517/zsQ@public.gmane.org

Hi Eric,

Eric Schulte wrote:
> "Sebastien Vauban" <wxhgmqzgwmuf-geNee64TY+gS+FvcfC7Uqw@public.gmane.org> writes:
>> Eric Schulte wrote:
>>> Eric S Fraga <e.fraga-hclig2XLE9Zaa/9Udqfwiw@public.gmane.org> writes: I've made a quick change so that
>>> any variable named "stdin" is treated specially, in that, rather than
>>> using its value to replace strings of $stdin in the text of the awk code,
>>> the value of the stdin variable is saved into the file processed by awk.
>>> This allows awk to operate over Org-mode references.
>>>
>>> If babel code block supported a pipe or an actual stdin header argument,
>>> that would be the ideal way to add this behavior, but currently nothing of
>>> that nature exists.
>>>
>>> Please let me know if this misses part of your suggestion, or more
>>> generally what else may be advisable before we add this to the core.
>>
>> Could this be implemented for sh as well?
>>
>> AFAI understand, this is exactly the missing piece for me to be able to:
>
> Unfortunately this simple hack for ob-awk does not address the need you link
> to below -- which I am aware of and which is on my list of larger
> longer-term Babel development items. I think that a future piping
> implementation will be the ultimate solution to the issues you address.

Glad to hear you understand my wish. It's not always easy to express myself
clearly, with English not being my mother tongue, especially when trying to
tackle difficult subjects.

> Such an implementation -- allowing data to flow between concurrently
> executing blocks utilizing posix pipes -- will require more sophisticated
> processes interaction and possibly some form of multi-threaded elisp
> execution.

Just for the sake of clarity, I don't need concurrent or multi-threaded
execution of any kind.

My goal is twofold:

1. to cut a shell script into small parts, and explain what every part does,
   with a runnable example (=C-c C-v C-e=).

2. to tangle the executable script out of the Babel document, by concatenating
   all its parts (=C-c C-v C-t=).

A quite "dumb" example follows. I've made it as _minimal_ and as _complete_ as
possible, to be able to _express my point_, for further reference.

* Abstract

This script "americanizes" a European CSV file.

* Sample data

The following is a sample CSV file:

#+results: sample-csv
#+begin_example
Date;Amount;Account
28-05-2010;-6.806,25;999-1974050-30
04-06-2009;420,00;999-1500974-23
24-02-2009;-54,93;999-1974050-30
#+end_example

* Script

What the script must do is:

** Load the data

Read the raw contents of the input file.

#+srcname: load-data
#+begin_src sh :var data=sample-csv :results output :exports both
echo "$data"
#+end_src

#+results: load-data
#+begin_example
Date;Amount;Account
28-05-2010;-6.806,25;999-1974050-30
04-06-2009;420,00;999-1500974-23
24-02-2009;-54,93;999-1974050-30
#+end_example

** Convert the date to American format

Convert the date to =MM/DD/YYYY= format.

#+srcname: convert-date
#+begin_src sh :var data=load-data :results output :exports both
echo "$data" |\
sed -r 's/^([[:digit:]]{2})-([[:digit:]]{2})-([[:digit:]]{4})/\2\/\1\/\3/g' |\
sed -r 's/^([[:digit:]]{2})\/([[:digit:]]{2})\/([[:digit:]]{2})/\2\/\1\/20\3/g'
#+end_src

#+results: convert-date
#+begin_example
Date;Amount;Account
28/05/202010;-6.806,25;999-1974050-30
04/06/202009;420,00;999-1500974-23
24/02/202009;-54,93;999-1974050-30
#+end_example
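
Note that the years in the result above come out as =202010= and =202009=, and
the day is back in first position: the second =sed= expression also matches
the first two digits of the four-digit year written by the first expression,
and swaps day and month a second time. A minimal sketch of one way to avoid
that, assuming the date is always the first field and is followed by the =;=
separator. The function name =convert_date= is hypothetical, and =sed -E= is
used here in place of =sed -r=:

```sh
# convert_date (hypothetical name): read CSV lines on stdin and rewrite a
# leading DD-MM-YYYY or DD/MM/YY date as MM/DD/YYYY.
# Requiring the ';' right after the year keeps the second expression from
# matching the output of the first one.
convert_date() {
    sed -E \
        -e 's/^([[:digit:]]{2})-([[:digit:]]{2})-([[:digit:]]{4});/\2\/\1\/\3;/' \
        -e 's/^([[:digit:]]{2})\/([[:digit:]]{2})\/([[:digit:]]{2});/\2\/\1\/20\3;/'
}

printf '%s\n' '28-05-2010;-6.806,25;999-1974050-30' | convert_date
# prints: 05/28/2010;-6.806,25;999-1974050-30
```

The header line =Date;Amount;Account= matches neither expression and passes
through unchanged.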

** Convert the separators

Apply the following operations in order to "americanize" the CSV file received
from the bank:

- remove the dot used as thousands separator (=.= -> ==)
- replace the comma used as decimal separator by a dot (=,= -> =.=)
- replace other commas by a dot (=,= -> =.=)
- replace the semicolon used as field separator by a comma (=;= -> =,=)

#+srcname: convert-separators
#+begin_src sh :var data=convert-date :results output :exports both
echo "$data" |\
sed -r 's/([[:digit:]])\.([[:digit:]]{3})/\1\2/g' |\
sed -r 's/([[:digit:]]),([[:digit:]]{2})/\1.\2/g' |\
sed -r 's/,/./g' |\
sed -r 's/;/,/g'
#+end_src

#+results: convert-separators
#+begin_example
Date,Amount,Account
28/05/202010,-6806.25,999-1974050-30
04/06/202009,420.00,999-1500974-23
24/02/202009,-54.93,999-1974050-30
#+end_example
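
The four substitutions above could also be handed to a single =sed= process as
several =-e= expressions, which are applied to each line in the order written.
This is only a sketch of an equivalent form, not a change to the block above;
the function name =convert_separators= is hypothetical and =sed -E= stands in
for =sed -r=:

```sh
# convert_separators (hypothetical name): read CSV lines on stdin and apply
# the four separator rewrites in the same order as the piped version.
convert_separators() {
    sed -E \
        -e 's/([[:digit:]])\.([[:digit:]]{3})/\1\2/g' \
        -e 's/([[:digit:]]),([[:digit:]]{2})/\1.\2/g' \
        -e 's/,/./g' \
        -e 's/;/,/g'
}

printf '%s\n' '05/28/2010;-6.806,25;999-1974050-30' | convert_separators
# prints: 05/28/2010,-6806.25,999-1974050-30
```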

* Full code

The script is then:

#+begin_src sh :tangle americanize-csv.sh :noweb yes
#!/bin/bash
# americanize-csv.sh -- Convert CSV file to American format

# Usage: americanize-csv FILE.CSV

cat $1 |\
<<convert-date>> |\
<<convert-separators>>

exit 0

# americanize-csv.sh ends here
#+end_src

As you can see, the tangled script does not work anymore, as I've been forced
to put an =echo $data= command in every separate code block, as its first
command to run.

#+begin_src sh
#!/bin/bash
# americanize-csv.sh -- Convert CSV file to American format

# Usage: americanize-csv FILE.CSV

cat $1 |\
echo "$data" |\
sed -r 's/^([[:digit:]]{2})-([[:digit:]]{2})-([[:digit:]]{4})/\2\/\1\/\3/g' |\
sed -r 's/^([[:digit:]]{2})\/([[:digit:]]{2})\/([[:digit:]]{2})/\2\/\1\/20\3/g' |\
echo "$data" |\
sed -r 's/([[:digit:]])\.([[:digit:]]{3})/\1\2/g' |\
sed -r 's/([[:digit:]]),([[:digit:]]{2})/\1.\2/g' |\
sed -r 's/,/./g' |\
sed -r 's/;/,/g'

exit 0

# americanize-csv.sh ends here
#+end_src

If I had the possibility to play with =stdin=, I could "hide" that first line,
and assume all the code I'm writing will be executed against what's read on
=stdin=, in the Org buffer as well as in the stand-alone shell script. Right?

#+begin_src sh
#!/bin/bash
# americanize-csv.sh -- Convert CSV file to American format

# Usage: americanize-csv FILE.CSV

cat $1 |\
sed -r 's/^([[:digit:]]{2})-([[:digit:]]{2})-([[:digit:]]{4})/\2\/\1\/\3/g' |\
sed -r 's/^([[:digit:]]{2})\/([[:digit:]]{2})\/([[:digit:]]{2})/\2\/\1\/20\3/g' |\
sed -r 's/([[:digit:]])\.([[:digit:]]{3})/\1\2/g' |\
sed -r 's/([[:digit:]]),([[:digit:]]{2})/\1.\2/g' |\
sed -r 's/,/./g' |\
sed -r 's/;/,/g'

exit 0

# americanize-csv.sh ends here
#+end_src
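
On the shell side, the "every part reads =stdin=" idea can be sketched with
plain functions. This is an illustration under assumptions, not a Babel
feature: the names =convert_date=, =convert_separators= and =americanize= are
hypothetical, =sed -E= replaces =sed -r=, and only the =DD-MM-YYYY= date form
is handled. How such parts would be fed from Org references inside the buffer
is left open.

```sh
# Each part reads stdin and writes stdout, so the parts chain with pipes.
# All three function names are hypothetical.

convert_date() {
    # DD-MM-YYYY at the start of a line becomes MM/DD/YYYY.
    sed -E 's/^([[:digit:]]{2})-([[:digit:]]{2})-([[:digit:]]{4})/\2\/\1\/\3/'
}

convert_separators() {
    # Drop thousands dots, turn commas into dots, then ';' into ','.
    sed -E \
        -e 's/([[:digit:]])\.([[:digit:]]{3})/\1\2/g' \
        -e 's/([[:digit:]]),([[:digit:]]{2})/\1.\2/g' \
        -e 's/,/./g' \
        -e 's/;/,/g'
}

americanize() {
    convert_date | convert_separators
}

# Trying the whole chain on two lines of the sample data:
data='Date;Amount;Account
28-05-2010;-6.806,25;999-1974050-30'
printf '%s\n' "$data" | americanize
# prints:
#   Date,Amount,Account
#   05/28/2010,-6806.25,999-1974050-30

# In a stand-alone script the same chain would read the file instead:
#   americanize < "$1"
```

The same function body then serves both uses: fed from a variable when a part
is tried on its own, and fed from the input file in the assembled script.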

* Conclusions

As you can see, I did not really mean any concurrent execution. I simply want
to be able to execute parts of the code in situ, in the Org buffer, to
document (and test) what I'm writing, and to be able to assemble all the parts
into one single script file, by means of literate programming.

Best regards,
  Seb

-- 
Sébastien Vauban