emacs-orgmode@gnu.org archives
From: "Sebastien Vauban" <wxhgmqzgwmuf-geNee64TY+gS+FvcfC7Uqw@public.gmane.org>
To: emacs-orgmode-mXXj517/zsQ@public.gmane.org
Subject: Re: org babel support for tcl and awk
Date: Wed, 25 May 2011 14:30:01 +0200
Message-ID: <80aaeb2cae.fsf@somewhere.org>
In-Reply-To: 87lixvd5ei.fsf@gmail.com

Hi Eric,

Eric Schulte wrote:
> "Sebastien Vauban" <wxhgmqzgwmuf-geNee64TY+gS+FvcfC7Uqw@public.gmane.org> writes:
>> Eric Schulte wrote:
>>> Eric S Fraga <e.fraga-hclig2XLE9Zaa/9Udqfwiw@public.gmane.org> writes:
>>>
>>> I've made a quick change so that any variable named "stdin" is treated
>>> specially, in that, rather than using its value to replace strings of
>>> $stdin in the text of the awk code, the value of the stdin variable is
>>> saved into the file processed by awk. This allows awk to operate over
>>> Org-mode references.
>>>
>>> If babel code block supported a pipe or an actual stdin header argument,
>>> that would be the ideal way to add this behavior, but currently nothing of
>>> that nature exists.
>>>
>>> Please let me know if this misses part of your suggestion, or more
>>> generally what else may be advisable before we add this to the core.
>>
>> Could this be implemented for sh as well?
>>
>> AFAI understand, this is exactly the missing piece for me to be able to:
>
> Unfortunately this simple hack for ob-awk does not address the need you link
> to below -- which I am aware of and which is on my list of larger
> longer-term Babel development items. I think that a future piping
> implementation will be the ultimate solution to the issues you address.

Glad to hear you understand my wish. It's not always easy to express myself
very clearly, as English is not my mother tongue, especially when trying to
tackle difficult subjects.

> Such an implementation -- allowing data to flow between concurrently
> executing blocks utilizing posix pipes -- will require more sophisticated
> processes interaction and possibly some form of multi-threaded elisp
> execution.

Just for the sake of clarity, I don't need concurrent or multi-threaded
execution of any kind.

My goal is twofold:

1. to cut a shell script into small parts, and explain what each part does,
   with a runnable example (=C-c C-v C-e=).

2. to tangle the executable script out of the Babel document, by concatenating
   all its parts (=C-c C-v C-t=).

A rather "dumb" example follows. I've made it as _minimal_ and as _complete_ as
possible, to _make my point_ and to serve as a future reference.

* Abstract

This script "americanizes" a European CSV file.

* Sample data

The following is a sample CSV file:

#+results: sample-csv
#+begin_example
Date;Amount;Account
28-05-2010;-6.806,25;999-1974050-30
04-06-2009;420,00;999-1500974-23
24-02-2009;-54,93;999-1974050-30
#+end_example

* Script

What the script must do is:

** Load the data

Read the raw contents of the input file.

#+srcname: load-data
#+begin_src sh :var data=sample-csv :results output :exports both
echo "$data"
#+end_src

#+results: load-data
#+begin_example
Date;Amount;Account
28-05-2010;-6.806,25;999-1974050-30
04-06-2009;420,00;999-1500974-23
24-02-2009;-54,93;999-1974050-30
#+end_example

** Convert the date to American format

Convert the date to =MM/DD/YYYY= format.

#+srcname: convert-date
#+begin_src sh :var data=load-data :results output :exports both
echo "$data" |\
sed -r 's/^([[:digit:]]{2})-([[:digit:]]{2})-([[:digit:]]{4})/\2\/\1\/\3/g' |\
sed -r 's/^([[:digit:]]{2})\/([[:digit:]]{2})\/([[:digit:]]{2})/\2\/\1\/20\3/g'
#+end_src

#+results: convert-date
#+begin_example
Date;Amount;Account
28/05/202010;-6.806,25;999-1974050-30
04/06/202009;420,00;999-1500974-23
24/02/202009;-54,93;999-1974050-30
#+end_example
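
Note that the second expression above also matches the first two digits of a
four-digit year, which is how =2010= ends up as =202010= in the results. A
minimal sketch of a guarded variant follows, assuming the second expression is
only meant for dates that already use slashes and a two-digit year
(=DD/MM/YY=). The function name =convert_date= and the trailing guard are my
own illustration, not part of the block above.

```shell
# Hypothetical helper: reads CSV lines on stdin, writes converted lines on stdout.
convert_date() {
  sed -E \
    -e 's/^([[:digit:]]{2})-([[:digit:]]{2})-([[:digit:]]{4})/\2\/\1\/\3/' \
    -e 's/^([[:digit:]]{2})\/([[:digit:]]{2})\/([[:digit:]]{2})(;|$)/\2\/\1\/20\3\4/'
}

# The guard (;|$) requires the two-digit year to be followed by the field
# separator or the end of the line, so a four-digit year is left alone.
printf '%s\n' '28-05-2010;-6.806,25;999-1974050-30' | convert_date
# prints 05/28/2010;-6.806,25;999-1974050-30
```

With this guard, a date such as =28/05/10= would still be expanded to
=05/28/2010=.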

** Convert the separators

Apply the following operations in order to "americanize" the CSV file received
from the bank:

- remove the dot used as thousands separator (=.= -> nothing)
- replace the comma used as decimal separator with a dot (=,= -> =.=)
- replace any other comma with a dot (=,= -> =.=)
- replace the semicolon used as field separator with a comma (=;= -> =,=)

#+srcname: convert-separators
#+begin_src sh :var data=convert-date :results output :exports both
echo "$data" |\
sed -r 's/([[:digit:]])\.([[:digit:]]{3})/\1\2/g' |\
sed -r 's/([[:digit:]]),([[:digit:]]{2})/\1.\2/g' |\
sed -r 's/,/./g' |\
sed -r 's/;/,/g'
#+end_src

#+results: convert-separators
#+begin_example
Date,Amount,Account
28/05/202010,-6806.25,999-1974050-30
04/06/202009,420.00,999-1500974-23
24/02/202009,-54.93,999-1974050-30
#+end_example
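
The order of the last two operations matters: the remaining commas have to be
turned into dots before the semicolons become commas. A small sketch showing
what happens otherwise; the function names are hypothetical, and only the last
two substitutions are applied, so the thousands dot is still there.

```shell
line='28-05-2010;-6.806,25;999-1974050-30'

# Order used above: remaining commas first, then the field separators.
commas_first() { sed -E -e 's/,/./g' -e 's/;/,/g'; }

# Swapped order: the freshly created commas are turned into dots as well.
semicolons_first() { sed -E -e 's/;/,/g' -e 's/,/./g'; }

printf '%s\n' "$line" | commas_first      # 28-05-2010,-6.806.25,999-1974050-30
printf '%s\n' "$line" | semicolons_first  # 28-05-2010.-6.806.25.999-1974050-30
```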

* Full code

The script is then:

#+begin_src sh :tangle americanize-csv.sh :noweb yes
#!/bin/bash
# americanize-csv.sh -- Convert CSV file to American format

# Usage: americanize-csv FILE.CSV

cat $1 |\
<<convert-date>> |\
<<convert-separators>>

exit 0

# americanize-csv.sh ends here
#+end_src

As you can see, the tangled script does not work anymore, because I've been
forced to put an =echo "$data"= command at the start of every code block:
=echo= ignores its standard input, and =$data= is not set outside of Org.

#+begin_src sh
#!/bin/bash
# americanize-csv.sh -- Convert CSV file to American format

# Usage: americanize-csv FILE.CSV

cat $1 |\
echo "$data" |\
sed -r 's/^([[:digit:]]{2})-([[:digit:]]{2})-([[:digit:]]{4})/\2\/\1\/\3/g' |\
sed -r 's/^([[:digit:]]{2})\/([[:digit:]]{2})\/([[:digit:]]{2})/\2\/\1\/20\3/g' |\
echo "$data" |\
sed -r 's/([[:digit:]])\.([[:digit:]]{3})/\1\2/g' |\
sed -r 's/([[:digit:]]),([[:digit:]]{2})/\1.\2/g' |\
sed -r 's/,/./g' |\
sed -r 's/;/,/g'

exit 0

# americanize-csv.sh ends here
#+end_src
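
The effect is easy to reproduce in isolation. A minimal sketch, assuming
=data= is unset, as it is in the stand-alone script; the header line is taken
from the sample data:

```shell
# Outside of Org, $data is not set, and echo never reads its standard input,
# so whatever the previous stage wrote is dropped.
unset data
printf 'Date;Amount;Account\n' | echo "$data" | sed -E 's/;/,/g'
# prints an empty line
```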

If I could play with =stdin=, I could "hide" that first line and assume that
all the code I'm writing is executed against whatever is read on =stdin=, both
in the Org buffer and in the stand-alone shell script. Right?

#+begin_src sh
#!/bin/bash
# americanize-csv.sh -- Convert CSV file to American format

# Usage: americanize-csv FILE.CSV

cat $1 |\
sed -r 's/^([[:digit:]]{2})-([[:digit:]]{2})-([[:digit:]]{4})/\2\/\1\/\3/g' |\
sed -r 's/^([[:digit:]]{2})\/([[:digit:]]{2})\/([[:digit:]]{2})/\2\/\1\/20\3/g' |\
sed -r 's/([[:digit:]])\.([[:digit:]]{3})/\1\2/g' |\
sed -r 's/([[:digit:]]),([[:digit:]]{2})/\1.\2/g' |\
sed -r 's/,/./g' |\
sed -r 's/;/,/g'

exit 0

# americanize-csv.sh ends here
#+end_src
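
To make the idea concrete outside of Org, here is a rough sketch in which each
part is a filter reading =stdin=, so the same code can be run alone on sample
data or chained into the full script. The function names are hypothetical, and
the date part is reduced to the =DD-MM-YYYY= case to keep it short.

```shell
# Hypothetical part names; each part is a filter from stdin to stdout.
convert_date() {
  # Simplified to the DD-MM-YYYY case only.
  sed -E 's/^([[:digit:]]{2})-([[:digit:]]{2})-([[:digit:]]{4})/\2\/\1\/\3/'
}

convert_separators() {
  sed -E \
    -e 's/([[:digit:]])\.([[:digit:]]{3})/\1\2/g' \
    -e 's/([[:digit:]]),([[:digit:]]{2})/\1.\2/g' \
    -e 's/,/./g' \
    -e 's/;/,/g'
}

# Assembling the parts is then plain piping.
americanize() {
  convert_date | convert_separators
}

# A part can be tried alone on sample data...
printf '%s\n' '24-02-2009;-54,93;999-1974050-30' | convert_date
# prints 02/24/2009;-54,93;999-1974050-30

# ...and the whole chain runs on the same input.
printf '%s\n' '24-02-2009;-54,93;999-1974050-30' | americanize
# prints 02/24/2009,-54.93,999-1974050-30
```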

* Conclusions

As you can see, I did not really mean any concurrent execution. I simply want
to be able to execute parts of the code in situ, in the Org buffer, to document
(and test) what I'm writing.

And to be able to assemble all the parts into one single script file, by
means of literate programming.

Best regards,
  Seb

-- 
Sébastien Vauban
