From: "Alan E. Davis"
Subject: Re: Keeping an advanced dictionary in Org-mode?
Date: Tue, 7 Jun 2011 00:50:29 +1000
To: mail@christianmoe.com
Cc: Org Mode
In-Reply-To: <4DECA029.2000905@christianmoe.com>

FWIW:

I won't get into it much for now, but I have used a "band format" for lexical data. There are other names for this type of free-form database. I wrote a crude elisp routine to turn entries into LaTeX-formatted files.

A "band" is a record, so to speak. I am not very well qualified in this, but was able to use it to record lexical data. You may find some linguists' websites where this or similar formats are explained. A record starts with a double-dotted key, and information categories may be made up on the fly, marked by single-dotted keys preceded by at least two spaces. I think it's convenient for records to be separated by a line feed, as well.

..HW <headword>  .D <local dialect>  .GE <English gloss>  .NS <scientific name>  .NCE <Common Name>  .NCs <Spanish Common Name>  .R <remark>  .RC <Remark on Cultural Significance>
This is just a made-up case, but perhaps you can catch the drift.
Here are a couple of simple cases from my files:

..hw tutubi  .lang vis  .nce dragonfly  .source FSD
..HW sigai  .lang vis .ge (mollusc) shell, when empty
..hw soksok  .ec gecko  .cg  .la ilo  .src hanna .n
..hw locus  .ec octopus  .cg  .la ilongo  .src hannah .n see nucus [vis]; kuus [chuukese]
..hw tikling  .ec heron  .cg  .la vis  .from fsd
..hw nucus  .ec octopus  .cg  .la vis  .src fsd, hannah  .n related to chuukese kuus
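
As a rough illustration only (in Python, not the elisp routine mentioned in this message), one such line could be split into key/value pairs along these lines. The function name parse_band_record is hypothetical, and the sketch assumes that a key is one or two dots followed by letters, that every key after the first is preceded by whitespace, and that values never contain a word starting with a dot:

```python
import re

# Assumption: a key is "." or ".." followed by letters, at the start of the
# line or right after whitespace. Values are whatever text follows a key.
KEY = re.compile(r"(?:^|\s)(\.\.?[A-Za-z]+)")


def parse_band_record(line):
    """Split one band-format line into a list of (key, value) pairs.

    Keys are lowercased so that "..HW" and "..hw" are treated alike.
    A key with nothing after it gets an empty string as its value.
    """
    matches = list(KEY.finditer(line))
    fields = []
    for i, m in enumerate(matches):
        # The value runs from the end of this key to the start of the next.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(line)
        value = line[m.end():end].strip()
        fields.append((m.group(1).lower(), value))
    return fields


record = parse_band_record(
    "..hw nucus  .ec octopus  .cg  .la vis  .src fsd, hannah  .n related to chuukese kuus"
)
for key, value in record:
    print(key, "=>", value)
```

Keeping the pairs in a list rather than a dictionary preserves the order of the bands and allows a key to appear more than once in a record.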

Fairly straightforward elisp would scan a record, wrapping each item in a particular typeface.

To give an idea of the output: each line was output as an \item in a list. This got to be a LITTLE cumbersome, perhaps, and someone good at coding would do it differently. The idea is that a lisp routine scans the records and spits out list items. This could be any kind of output, and perhaps org mode would be a good way to rig a routine to scan list items and output different band types as slanted (\sl), roman, or italicized components.

\item [{\sl k\'{u}\'{u}s\/}$_{3}$] \index{k\'{u}\'{u}s} \quad Small, night-time octopus. \HADJ E\'{e}t.

\item [{\sl k\'{u}\'{u}s\/}$_{4}$] \index{k\'{u}\'{u}s} \quad Daytime octopus. {\sc syn\/}:\ {\sl nippach}. {\sc alt\/}:\ {\sl k\'{u}\'{u}h}. \HADJ F\'{o}n\'{o}.

\item [{\sl k\'{u}\'{u}sen neepwin\/}] \index{k\'{u}\'{u}sen neepwin} \quad {\sc see\/}:\ {\sl nippachin neepwin}. Even though this is not said, it would be the correct way to say it. \HADJ Wonip.
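
The scan-and-emit idea could be sketched as follows. This is a Python illustration under stated assumptions, not the original elisp: the function name latex_item, the WRAPPERS table, and the choice of which band gets which typeface (including the "note" label) are all made up for the example. Input is a list of (key, value) pairs for one record.

```python
# Assumed mapping from band key to a (prefix, suffix) pair of LaTeX markup.
# Keys missing from the table are emitted as plain roman text.
WRAPPERS = {
    ".gs": (r"{\it ", r"\/}"),           # scientific name in italics
    ".n": (r"{\sc note\/}:\ ", ""),      # note with a small-caps label
}


def latex_item(fields):
    """Render one record, given as (key, value) pairs, as a LaTeX \\item line."""
    headword = ""
    parts = []
    for key, value in fields:
        if not value:
            continue  # skip bands that were left empty
        if key == "..hw":
            headword = value
        else:
            prefix, suffix = WRAPPERS.get(key, ("", ""))
            parts.append(prefix + value + suffix)
    return (r"\item [{\sl " + headword + r"\/}] \index{" + headword
            + r"} \quad " + " ".join(parts))


print(latex_item([
    ("..hw", "nucus"),
    (".ec", "octopus"),
    (".cg", ""),
    (".n", "related to chuukese kuus"),
]))
```

Because the typeface choices live in one table, a new band key made up during data entry only needs a new table row, or none at all if plain roman output is acceptable.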


This may not be an appealing approach. I am still pleased with the ability to flexibly add band keys on the fly, during data entry, and the potential to use LaTeX as a frontend. HTML would also be useful, depending on how you wish to read your dictionary.

Not a perfect system. Linguists have done better. Robert Hsu of the University of Hawaii built a system around SPITBOL and maybe SNOBOL4. I was hopelessly lost trying to use those, but elisp did what little I needed. I think that it may be possible to organize a database using org-mode.
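
One way such an Org-based database might look is a heading per headword with the bands stored in a property drawer, as in the example quoted further down. The Python sketch below is only an illustration: the function name org_entry and the PROPERTY_NAMES table are assumptions, and unknown keys simply fall back to the key itself in upper case.

```python
# Assumed mapping from band keys to Org property names (no spaces allowed).
PROPERTY_NAMES = {
    ".ec": "English_common_name",
    ".la": "Language",
    ".src": "Source",
    ".n": "Note",
}


def org_entry(fields):
    """Turn one record, given as (key, value) pairs, into an Org heading
    with a property drawer. Empty bands are left out of the drawer."""
    headword = ""
    props = []
    for key, value in fields:
        if key == "..hw":
            headword = value
        elif value:
            # Fall back to the bare key in upper case for unlisted bands.
            name = PROPERTY_NAMES.get(key, key.lstrip(".").upper())
            props.append(":" + name + ": " + value)
    lines = ["* " + headword, ":PROPERTIES:"] + props + [":END:"]
    return "\n".join(lines)


print(org_entry([
    ("..hw", "tikling"),
    (".ec", "heron"),
    (".cg", ""),
    (".la", "vis"),
    (".from", "fsd"),
]))
```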

For now, I have a capture template for data entry, such as it is:
("=" "lex" entry (file+headline "lexicon.org" "Unsorted")
 "* ..hw %^{Headword}  .gs %^{Scientific Name}  .ge %^{English Gloss}  .ec %^{English Common Name}  .cg %^{Category}  .la %^{Language}  .src %^{Informant} .n %^{Note} %?  .dt %u "
 :prepend t :immediate-finish t)

Again, FWIW. To me, a great deal. Maybe to others, not so great of a deal.

Alan


On Mon, Jun 6, 2011 at 7:38 PM, Christian Moe <mail@christianmoe.com> wrote:
Hi,

Is anybody using Org-mode to build an advanced dictionary with sub-entries, tags etc.? Would you be willing to share a setup?

For example, the obvious way to build a dictionary would be to use a dictionary list (I borrow a few English-French lines from the wonderful WordReference.com site):

- pine ::
 (/paɪn/)
 1. /m noun/ [bot.] pin; *stripped ~* pin décapé.
 2. /intr verb/ languir (*for* après; *to do* de faire)

This looks nice, but unfortunately, you cannot set tags or properties on dictionary terms, so it's not particularly amenable to fancy searching, mapping etc.

On the other hand, you could do something like this:

* pine
 :PROPERTIES:
 :Pronunciation: /paɪn/
 :END:
** pin                                 :bot:
  :PROPERTIES:
  :Word_class: noun
  :Gender:   m
  :END:
  *stripped ~* pin décapé.
** languir
  :PROPERTIES:
  :Word_class: verb
  :Transitivity: intr
  :END:
  (*for* après; *to do* de faire)

It's a pain to do, and because of outline folding, it could be a pain to look up meanings, and you might need to do some serious post-processing on the export to make it look anything like a dictionary. But when you're done, you could extract a list of all botanical terms (:bot:), or of words and pronunciations only... etc.
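
Inside Org, a tag search would normally do this kind of extraction. Purely to illustrate what the extraction amounts to, here is a small Python sketch that lists the headings carrying a given tag in Org-formatted text. The function name headings_with_tag and the simplified headline pattern are assumptions; the pattern ignores TODO keywords, priorities, and inherited tags.

```python
import re

# Simplified Org headline: stars, a title, and an optional trailing
# :tag1:tag2: group. This is an approximation, not Org's full syntax.
HEADLINE = re.compile(r"^(\*+)\s+(.*?)(?:\s+(:[\w@#%:]+:))?\s*$")


def headings_with_tag(org_text, tag):
    """Return the titles of all headlines that carry the given tag directly."""
    found = []
    for line in org_text.splitlines():
        m = HEADLINE.match(line)
        if m and m.group(3):
            tags = m.group(3).strip(":").split(":")
            if tag in tags:
                found.append(m.group(2))
    return found


sample = """* pine
** pin                                 :bot:
  :PROPERTIES:
  :Word_class: noun
  :END:
** languir
"""
print(headings_with_tag(sample, "bot"))
```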

So for my growing pile of translation notes, I might like to keep that kind of thing. But there are so many ways it could be organized - what do you put in subheadings? what in entry text below subheadings? what in tags, what in properties? etc. So if someone has an example that works for them, I'd like to see one.

(Org may not be the best tool for this job, of course, but it's the right tool for me...)

Yours,
Christian


