emacs-orgmode@gnu.org archives
* Converting org-mode/org-contacts to VCard (importing to Android)
@ 2013-11-22 16:37 Karl Voit
  2013-11-22 16:54 ` Rüdiger Sonderfeld
  2013-11-23  8:29 ` Feng Shu
  0 siblings, 2 replies; 19+ messages in thread
From: Karl Voit @ 2013-11-22 16:37 UTC (permalink / raw)
  To: emacs-orgmode

Hi!

I wrote a Python script that parses an Org-mode file in order to
generate a VCard 2.1 compatible output file I am using to import to
my Android 4.4 device:

    https://github.com/novoid/org-contacts2vcard

The reason I wrote it in Python is that I don't know ELISP well
enough. The reason I wrote the script instead of using existing
export methods: I only want to export a small sub-set (names, phone
numbers, email addresses, contact image) due to privacy reasons.

So far, it is a one-direction approach and no synchronization
solution.



By the way: does somebody know of any somewhat intelligent tool that
is able to compare two different VCard files? The main issue here is
the fact that VCard order and property order within a single VCard
can be different but the VCard file could still contain the same
information. So line-by-line comparisons like diff do not work here.
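
A minimal sketch of an order-insensitive comparison, assuming simple
VCards without quoted-printable soft line breaks. The function names
are made up for illustration and nothing here is taken from
org-contacts2vcard. Each card becomes a multiset of its property
lines and each file a multiset of cards, so neither the card order
nor the property order matters:

```python
from collections import Counter


def unfold(text):
    """Join folded continuation lines (lines starting with space or tab)."""
    lines = []
    for raw in text.splitlines():
        if raw[:1] in (" ", "\t") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)
    return lines


def parse_vcards(text):
    """Return a Counter of cards; each card is a hashable multiset of lines."""
    cards = Counter()
    current = None
    for line in unfold(text):
        stripped = line.strip()
        upper = stripped.upper()
        if upper == "BEGIN:VCARD":
            current = []
        elif upper == "END:VCARD" and current is not None:
            cards[frozenset(Counter(current).items())] += 1
            current = None
        elif current is not None and stripped:
            current.append(stripped)
    return cards


def same_contacts(text_a, text_b):
    """True if both files hold the same cards, regardless of ordering."""
    return parse_vcards(text_a) == parse_vcards(text_b)
```

Parameter order and letter case inside a single property line still
count as differences in this sketch.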

-- 
mail|git|SVN|photos|postings|SMS|phonecalls|RSS|CSV|XML to Org-mode:
       > get Memacs from https://github.com/novoid/Memacs <

https://github.com/novoid/extract_pdf_annotations_to_orgmode + more on github


* Re: Converting org-mode/org-contacts to VCard (importing to Android)
  2013-11-22 16:37 Converting org-mode/org-contacts to VCard (importing to Android) Karl Voit
@ 2013-11-22 16:54 ` Rüdiger Sonderfeld
  2013-11-22 17:09   ` Karl Voit
  2013-11-23  8:29 ` Feng Shu
  1 sibling, 1 reply; 19+ messages in thread
From: Rüdiger Sonderfeld @ 2013-11-22 16:54 UTC (permalink / raw)
  To: emacs-orgmode, news1142

On Friday 22 November 2013 17:37:01 Karl Voit wrote:
> The reason I wrote it in Python is that I don't know ELISP well
> enough. The reason I wrote the script instead of using existing
> export methods: I only want to export a small sub-set (names, phone
> numbers, email addresses, contact image) due to privacy reasons.

That should be possible with the existing VCard export.  See
`org-contacts-ignore-property' to ignore specific properties.  And
`org-contacts-export-as-vcard' takes a NAME parameter to limit the names.

Regards,
Rüdiger


* Re: Converting org-mode/org-contacts to VCard (importing to Android)
  2013-11-22 16:54 ` Rüdiger Sonderfeld
@ 2013-11-22 17:09   ` Karl Voit
  2013-11-23  0:00     ` Rüdiger Sonderfeld
  2013-11-23  8:36     ` Converting org-mode/org-contacts to VCard (importing to Android) Feng Shu
  0 siblings, 2 replies; 19+ messages in thread
From: Karl Voit @ 2013-11-22 17:09 UTC (permalink / raw)
  To: emacs-orgmode

* Rüdiger Sonderfeld <ruediger@c-plusplus.de> wrote:
> On Friday 22 November 2013 17:37:01 Karl Voit wrote:
>> The reason I wrote it in Python is that I don't know ELISP well
>> enough. The reason I wrote the script instead of using existing
>> export methods: I only want to export a small sub-set (names, phone
>> numbers, email addresses, contact image) due to privacy reasons.
>
> That should be possible with the existing VCard export.  See
> `org-contacts-ignore-property' to ignore specific properties.  And
> `org-contacts-export-as-vcard' takes a NAME parameter to limit the names.

Fair enough :-)

However, I did additional things like checks, filtering, and so
forth that were important to my data-set. E.g., my contact template
does contain "0043/" as a pre-filled content for phone numbers. I
wanted to ignore those fields that got only this template and not a
complete phone number. I also wanted to get warnings in case some
data does not fulfill certain other requirements.
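
A minimal sketch of this kind of check, assuming the properties of
one contact are already available as a dictionary. The property
names (MOBILE, HOMEPHONE, WORKPHONE) and the function names are
hypothetical and not taken from org-contacts2vcard:

```python
import re

PHONE_TEMPLATE = "0043/"  # pre-filled value from the contact template


def usable_phone(value, template=PHONE_TEMPLATE):
    """Return the stripped number, or None if nothing follows the template."""
    cleaned = value.strip()
    rest = cleaned[len(template):] if cleaned.startswith(template) else cleaned
    if not any(ch.isdigit() for ch in rest):
        return None
    return cleaned


def collect_phones(properties, keys=("MOBILE", "HOMEPHONE", "WORKPHONE")):
    """Collect usable phone numbers and warnings for suspicious values."""
    phones, warnings = {}, []
    for key in keys:
        raw = properties.get(key)
        if raw is None:
            continue
        number = usable_phone(raw)
        if number is None:
            continue  # only the template (or no digits at all): skip silently
        if not re.fullmatch(r"[0-9/+ ()-]+", number):
            warnings.append(f"{key}: unexpected characters in {number!r}")
        phones[key] = number
    return phones, warnings
```

Unlike simply stripping the prefix, this keeps complete numbers
untouched and drops only the fields that were never filled in.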

I have to admit that I don't know the feature-set of the Org-mode
export. I would be very surprised if the Org-mode export method is
able to follow my custom "photo:" link I am using, grab the image
file, test if it has an image format that works with VCard
2.1 on Android, and encode it in base64 accordingly.
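
A minimal sketch of such a step, assuming that only JPEG images are
accepted and that the property is written as
PHOTO;ENCODING=BASE64;TYPE=JPEG with the data folded onto
continuation lines and ended by a blank line. Both the accepted
format and the exact property layout are assumptions to verify
against the target device; the function name is hypothetical:

```python
import base64

JPEG_MAGIC = b"\xff\xd8\xff"  # first bytes of every JPEG file


def photo_property(image_bytes, width=72):
    """Return a VCard PHOTO property for JPEG data, or None for other formats."""
    if not image_bytes.startswith(JPEG_MAGIC):
        return None
    encoded = base64.b64encode(image_bytes).decode("ascii")
    chunks = [encoded[i:i + width] for i in range(0, len(encoded), width)]
    # Continuation lines start with a space; a blank line ends the data.
    return "PHOTO;ENCODING=BASE64;TYPE=JPEG:" + "\n ".join(chunks) + "\n\n"
```

Checking the leading bytes instead of the file extension catches
images that were merely renamed.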

You see: I want to have ways to tweak the export process. And as
long as I don't know ELISP that well, I stick to the tools I know.


A side remark of mine: a couple of months ago I tried to find out
how to store address information, phone numbers, and so on in
org-contact properties. AFAIR I could not find anything except the
:EMAIL: property. Is there a standard out there that answers
questions like "separate street from house number?", "how to cope
with multiple addresses for one contact?", and so forth? I created
something on my own as you can see on [1].


I am happy if you can get benefit from my little project and I am
also happy when Org-mode offers a great export functionality for the
rest of us :-)

  1. https://raw.github.com/novoid/org-contacts2vcard/master/testdata/testcontacts.org
-- 
mail|git|SVN|photos|postings|SMS|phonecalls|RSS|CSV|XML to Org-mode:
       > get Memacs from https://github.com/novoid/Memacs <

https://github.com/novoid/extract_pdf_annotations_to_orgmode + more on github


* Re: Converting org-mode/org-contacts to VCard (importing to Android)
  2013-11-22 17:09   ` Karl Voit
@ 2013-11-23  0:00     ` Rüdiger Sonderfeld
  2013-11-24 13:06       ` Implementing Org-mode tools in languages other than ELISP (was: Converting org-mode/org-contacts to VCard (importing to Android)) Karl Voit
  2013-11-23  8:36     ` Converting org-mode/org-contacts to VCard (importing to Android) Feng Shu
  1 sibling, 1 reply; 19+ messages in thread
From: Rüdiger Sonderfeld @ 2013-11-23  0:00 UTC (permalink / raw)
  To: emacs-orgmode, news1142

On Friday 22 November 2013 18:09:42 Karl Voit wrote:
> I have to admit that I don't know the feature-set of the Org-mode
> export. I would be very surprised, if the Org-mode export method is
> able to follow my custom "photo:" link I am using, grab the image
> file, test if it has a image format that works with VCard
> 2.1 on Android, and encodes it in base64 accordingly.

Org-contacts has an :ICON: property and supports Gravatar.  It doesn't seem to 
be handled in the VCard export though.
 
> You see: I want to have ways to tweak the export process. And as
> long as I don't know ELISP that well, I stick to the tools I know.

I understand that and it solved your problem for now.  But having an external
tool in a different programming language is usually not a good idea to solve
the problem in the long run.  The code bases of org-contacts and your tool are
at risk of diverging quickly.  If it's in org-contacts then it is
maintained in one piece and easily accessible to other users.

So my point is you should take a look at elisp.  It's a lot of fun to use and 
if you are using org-mode and Emacs then you will have to learn it sooner or 
later.

> A side remark of mine: a couple of months ago I tried to find out
> how to store address information, phone numbers, and so on in
> org-contact properties. AFAIR I could not find anything except the
> :EMAIL: property. Is there a standard out there that answers
> questions like "separate street from house number?", "how to cope
> with multiple addresses for one contact?", and so forth? I created
> something on my own as you can see on [1].

I have to admit the org-contacts format is pretty much ad hoc and not really
well designed.  It is documented a bit in the file itself
(contrib/lisp/org-contacts.el).  M-x customize-group RET org-contacts RET should
also tell you more about the options.

Your format choice is not fully compatible with the existing org-contacts.  
Right now multiple entries are separated by space (which sadly breaks for 
addresses) and different entry names are used.
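
A minimal illustration of why whitespace separation breaks for
addresses, using plain Python string splitting as a stand-in for the
org-contacts behaviour described above (the helper name and the
sample values are made up):

```python
def split_values(value):
    """Split a property value on whitespace, as described above."""
    return value.split()


# Two email addresses split cleanly into two entries.
emails = split_values("jane@example.com jane@example.org")

# One postal address falls apart into five unrelated pieces.
address = split_values("Example Street 5, 8010 Graz")

print(emails)
print(address)
```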

However I'd look forward to some new ideas and improvements.  Right now it's
not an ideal solution.

Regards,
Rüdiger


* Re: Converting org-mode/org-contacts to VCard (importing to Android)
  2013-11-22 16:37 Converting org-mode/org-contacts to VCard (importing to Android) Karl Voit
  2013-11-22 16:54 ` Rüdiger Sonderfeld
@ 2013-11-23  8:29 ` Feng Shu
  1 sibling, 0 replies; 19+ messages in thread
From: Feng Shu @ 2013-11-23  8:29 UTC (permalink / raw)
  To: emacs-orgmode

Karl Voit <devnull@Karl-Voit.at> writes:

> Hi!
>
> I wrote a Python script that parses an Org-mode file in order to
> generate a VCard 2.1 compatible output file I am using to import to
> my Android 4.4 device:
>
>     https://github.com/novoid/org-contacts2vcard
>
> The reason I wrote it in Python is that I don't know ELISP well
> enough. The reason I wrote the script instead of using existing
> export methods: I only want to export a small sub-set (names, phone
> numbers, email addresses, contact image) due to privacy reasons.

The function below exports only the name, phone numbers, and email addresses:

#+begin_src emacs-lisp
(defun org-contacts-vcard-format (contact)
  "Formats CONTACT in VCard 3.0 format."
  (let* ((properties (caddr contact))
	 (name (org-contacts-vcard-escape (car contact)))
	 (n (org-contacts-vcard-encode-name name))
	 (email (cdr (assoc-string org-contacts-email-property properties)))
	 (tel (cdr (assoc-string org-contacts-tel-property properties)))
	 (ignore-list (cdr (assoc-string org-contacts-ignore-property properties)))
	 (ignore-list (when ignore-list
			(org-contacts-split-property ignore-list)))
	 (head (format "BEGIN:VCARD\nVERSION:3.0\nN:%s\nFN:%s\n" n name))
	 emails-list result phones-list)
    (concat head
	    (when email (progn
			  (setq emails-list (org-contacts-remove-ignored-property-values ignore-list (org-contacts-split-property email)))
			  (setq result "")
			  (while emails-list
			    (setq result (concat result  "EMAIL:" (org-contacts-strip-link (car emails-list)) "\n"))
			    (setq emails-list (cdr emails-list)))
			  result))
	    (when tel (progn
			(setq phones-list (org-contacts-remove-ignored-property-values ignore-list (org-contacts-split-property tel)))
			(setq result "")
			(while phones-list
			  (setq result (concat result  "TEL:" (org-link-unescape (org-contacts-strip-link (car phones-list))) "\n"))
			  (setq phones-list (cdr phones-list)))
			result))
	    "END:VCARD\n\n")))

#+end_src

>
> So far, it is a one-direction approach and no synchronization
> solution.
>
>
>
> By the way: does somebody know of any somewhat intelligent tool that
> is able to compare two different VCard files? The main issue here is
> the fact that VCard order and property order within a single VCard
> can be different but the VCard file could still contain the same
> information. So line-by-line comparisons like diff do not work here.

This may be difficult.  I use org-contacts and an elisp function to
merge all the contacts which have the same name, then export the contacts
to a vcard file.


* Re: Converting org-mode/org-contacts to VCard (importing to Android)
  2013-11-22 17:09   ` Karl Voit
  2013-11-23  0:00     ` Rüdiger Sonderfeld
@ 2013-11-23  8:36     ` Feng Shu
  1 sibling, 0 replies; 19+ messages in thread
From: Feng Shu @ 2013-11-23  8:36 UTC (permalink / raw)
  To: emacs-orgmode

Karl Voit <devnull@Karl-Voit.at> writes:

> * Rüdiger Sonderfeld <ruediger@c-plusplus.de> wrote:
>> On Friday 22 November 2013 17:37:01 Karl Voit wrote:
>>> The reason I wrote it in Python is that I don't know ELISP well
>>> enough. The reason I wrote the script instead of using existing
>>> export methods: I only want to export a small sub-set (names, phone
>>> numbers, email addresses, contact image) due to privacy reasons.
>>
>> That should be possible with the existing VCard export.  See
>> `org-contacts-ignore-property' to ignore specific properties.  And
>> `org-contacts-export-as-vcard' takes a NAME parameter to limit the names.
>
> Fair enough :-)
>
> However, I did additional things like checks, filtering, and so
> forth that were important to my data-set. E.g., my contact template
> does contain "0043/" as a pre-filled content for phone numbers. I
> wanted to ignore those fields that got only this template and not a
> complete phone number. I also wanted to get warnings in case some
> data does not fulfill certain other requirements.

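You can strip such a prefix with replace-regexp-in-string, e.g.
(replace-regexp-in-string "^[0-9]\\{4,4\\}/" "" "0043/333/333")
returns "333/333".  The TEL branch of the function below does this:

#+begin_src emacs-lisp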
(defun org-contacts-vcard-format (contact)
  "Formats CONTACT in VCard 3.0 format."
  (let* ((properties (caddr contact))
	 (name (org-contacts-vcard-escape (car contact)))
	 (n (org-contacts-vcard-encode-name name))
	 (email (cdr (assoc-string org-contacts-email-property properties)))
	 (tel (cdr (assoc-string org-contacts-tel-property properties)))
	 (ignore-list (cdr (assoc-string org-contacts-ignore-property properties)))
	 (ignore-list (when ignore-list
			(org-contacts-split-property ignore-list)))
	 (note (cdr (assoc-string org-contacts-note-property properties)))
	 (bday (org-contacts-vcard-escape (cdr (assoc-string org-contacts-birthday-property properties))))
	 (addr (cdr (assoc-string org-contacts-address-property properties)))
	 (nick (org-contacts-vcard-escape (cdr (assoc-string org-contacts-nickname-property properties))))
	 (head (format "BEGIN:VCARD\nVERSION:3.0\nN:%s\nFN:%s\n" n name))
	 emails-list result phones-list)
    (concat head
	    (when email (progn
			  (setq emails-list (org-contacts-remove-ignored-property-values ignore-list (org-contacts-split-property email)))
			  (setq result "")
			  (while emails-list
			    (setq result (concat result  "EMAIL:" (org-contacts-strip-link (car emails-list)) "\n"))
			    (setq emails-list (cdr emails-list)))
			  result))
	    (when addr
	      (format "ADR:;;%s\n" (replace-regexp-in-string "\\, ?" ";" addr)))
	    (when tel (progn
			(setq phones-list (org-contacts-remove-ignored-property-values ignore-list (org-contacts-split-property tel)))
			(setq result "")
			(while phones-list
			  (setq result (concat result  "TEL:"  (replace-regexp-in-string "^[0-9]\\{4,4\\}/" ""  (org-link-unescape (org-contacts-strip-link (car phones-list)))) "\n"))
			  (setq phones-list (cdr phones-list)))
			result))
	    (when bday
	      (let ((cal-bday (calendar-gregorian-from-absolute (org-time-string-to-absolute bday))))
		(format "BDAY:%04d-%02d-%02d\n"
			(calendar-extract-year cal-bday)
			(calendar-extract-month cal-bday)
			(calendar-extract-day cal-bday))))
	    (when nick (format "NICKNAME:%s\n" nick))
	    (when note (format "NOTE:%s\n" note))
	    "END:VCARD\n\n")))

#+end_src

>
> I have to admit that I don't know the feature-set of the Org-mode
> export. I would be very surprised, if the Org-mode export method is
> able to follow my custom "photo:" link I am using, grab the image
> file, test if it has a image format that works with VCard
> 2.1 on Android, and encodes it in base64 accordingly.
>
> You see: I want to have ways to tweak the export process. And as
> long as I don't know ELISP that well, I stick to the tools I know.
>
>
> A side remark of mine: a couple of months ago I tried to find out
> how to store address information, phone numbers, and so on in
> org-contact properties. AFAIR I could not find anything except the
> :EMAIL: property. Is there a standard out there that answers
> questions like "separate street from house number?", "how to cope
> with multiple addresses for one contact?", and so forth? I created
> something on my own as you can see on [1].
>
>
> I am happy if you can get benefit from my little project and I am
> also happy when Org-mode offers a great export functionality for the
> rest of us :-)
>
>   1. https://raw.github.com/novoid/org-contacts2vcard/master/testdata/testcontacts.org
> -- 
> mail|git|SVN|photos|postings|SMS|phonecalls|RSS|CSV|XML to Org-mode:
>        > get Memacs from https://github.com/novoid/Memacs <
>
> https://github.com/novoid/extract_pdf_annotations_to_orgmode + more on github


* Implementing Org-mode tools in languages other than ELISP (was: Converting org-mode/org-contacts to VCard (importing to Android))
  2013-11-23  0:00     ` Rüdiger Sonderfeld
@ 2013-11-24 13:06       ` Karl Voit
  2014-01-04 14:58         ` Implementing Org-mode tools in languages other than ELISP Bastien
  0 siblings, 1 reply; 19+ messages in thread
From: Karl Voit @ 2013-11-24 13:06 UTC (permalink / raw)
  To: emacs-orgmode

Executive summary of this rather long email: I am aware that ELISP
is the language of choice for Org-mode features/tools. Here, I
describe my motivation behind using Python instead.


* Rüdiger Sonderfeld <ruediger@c-plusplus.de> wrote:
> On Friday 22 November 2013 18:09:42 Karl Voit wrote:
>
> Org-contacts has an :ICON: property and supports Gravatar.  It doesn't seem to 
> be handled in the VCard export though.

:ICON:, I see. Thanks.

>> You see: I want to have ways to tweak the export process. And as
>> long as I don't know ELISP that well, I stick to the tools I know.
>
> I understand that and it solved your problem for now.  

Exactly.

> But having an external tool in a different programming language is
> usually not a good idea to solve the problem in the long run.  The
> code bases of org-contacts and your tool are at risk of
> diverging quickly.  If it's in org-contacts then it is maintained
> in one piece and easily accessible to other users.

Don't worry: I totally agree. :-)

> So my point is you should take a look at elisp.  It's a lot of fun
> to use and if you are using org-mode and Emacs then you will have
> to learn it sooner or later.

I tried, but I could not make decent progress implementing the
features I want to use. The learning effort is rather high. I am
not only referring to ELISP as a language. The basics are not that
hard to learn. However, the more important part is to get into the
existing libraries and their feature-set. For me, I could not get
into it or I am not patient any more :-) It might be laziness or my
brain might not be compatible with the world of functional
programming languages.

Therefore, I develop all my Org-mode tools in Python, which I am
comfortable with. I have done various things and put them on
http://github.com/novoid

I agree that implementing this stuff in ELISP would have been better
for the community. However, as long as I don't have an ELISP
code-monkey who implements my ideas and wishes, I have to stick to
Python, which works well for me, and I don't have to invest a couple
of weeks/months of not being that productive. You should not
forget that I am not a programmer - I am an advanced user who
is tweaking his personal set-up in a small sub-set of his spare
time.

If the features of my tools are implemented in Org-mode as well, I
feel happy about it. I don't want to write "please add this highly
sophisticated feature to Org-mode"-messages on the ML and wait for
somebody to implement it. I can do it on my own (in Python) and I am
able to do it the way I need/want and I am able to *use* it right
away. Works for me.

Additionally, I would never have been able to implement Memacs (see sig)
without the help of several students of mine. And here is the next
thing: I could get several students with Python knowledge and no
one(!) with (E)LISP knowledge. Sad but true. I have the feeling that
ELISP knowledge is found only among a small set of experts.

Therefore: I did it in Python and I am aware that this is not the
best thing to do. However, if somebody finds my stuff handy, she/he
can grab it from github. If somebody re-implements it in ELISP, I am
fine as well.

It is even "worse" than that: I totally insist on writing a complete
stand-alone blog system which parses my Org-mode files and generates
(static) HTML5: https://github.com/novoid/lazyblorg Bam! Worst case
scenario! :-)

I tried to get other people infected with my thoughts [1] on an IMHO
perfect blog system. So far, it seems that everybody is happy with
the blog generating systems we do have now.

When I stick to my current development velocity of lazyblorg, it
will be finished right for the Christmas season ... of 2014 ;-)

> I have to admit the org-contacts format is pretty much ad hoc and
> not really well designed.  It is documented a bit in the file
> itself (contrib/lisp/org-contacts.el).  M-x customize-group RET
> org-contacts RET should also tell you more about the options.

Thanks for the pointer.

However, I consider my template a bit more elaborate since I want
to distinguish things like, e.g., mobile phone, work phone, land-line
phone, and so forth.

> Your format choice is not fully compatible with the existing
> org-contacts.  Right now multiple entries are separated by space
> (which sadly breaks for addresses) and different entry names are
> used.
>
> However I'd look forward to some new ideas and improvements.
> Right now it's not an ideal solution.

I am glad to help here as well if my help is needed. The current
examples in org-contacts.el were not able to suit my personal
requirements. Therefore, I did my own definitions. In future, I will
derive my complete mailserver whitelist directly from my Org-mode
contacts and more.

  1. http://article.gmane.org/gmane.emacs.orgmode/49747/
-- 
mail|git|SVN|photos|postings|SMS|phonecalls|RSS|CSV|XML to Org-mode:
       > get Memacs from https://github.com/novoid/Memacs <

https://github.com/novoid/extract_pdf_annotations_to_orgmode + more on github


* Re: Implementing Org-mode tools in languages other than ELISP
  2013-11-24 13:06       ` Implementing Org-mode tools in languages other than ELISP (was: Converting org-mode/org-contacts to VCard (importing to Android)) Karl Voit
@ 2014-01-04 14:58         ` Bastien
  2014-01-04 16:43           ` François Pinard
  0 siblings, 1 reply; 19+ messages in thread
From: Bastien @ 2014-01-04 14:58 UTC (permalink / raw)
  To: Karl Voit; +Cc: news1142, emacs-orgmode

Karl Voit <devnull@Karl-Voit.at> writes:

> Therefore, I develop all my Org-mode tools with Python which I am
> comfortably with. I have done various things and put it on
> http://github.com/novoid

FWIW, I think it's good to develop tools for Org not only in Elisp but
also in other languages: Org is not just an Emacs module, it's also a
format, used outside Emacs.  E.g. .org files on github.

-- 
 Bastien


* Re: Implementing Org-mode tools in languages other than ELISP
  2014-01-04 14:58         ` Implementing Org-mode tools in languages other than ELISP Bastien
@ 2014-01-04 16:43           ` François Pinard
  2014-01-05 16:43             ` Daniel Clemente
  0 siblings, 1 reply; 19+ messages in thread
From: François Pinard @ 2014-01-04 16:43 UTC (permalink / raw)
  To: emacs-orgmode

Bastien <bzg@gnu.org> writes:

> Karl Voit <devnull@Karl-Voit.at> writes:

>> Therefore, I develop all my Org-mode tools with Python which I am
>> comfortably with. I have done various things and put it on
>> http://github.com/novoid

> FWIW, I think it's good to develop tools for Org not only in Elisp but
> also in other languages: Org is not just an Emacs module, it's also a
> format, used outside Emacs.  E.g. .org files on github.

I dream of having a general Python parser for Org mode files, knowing
every bit about the current syntax for Org files, surrounded by enough
Python machinery to make it useful.

One non-negligible problem is that such a tool, to be very complete,
would need an Emacs Lisp interpreter, which is quite an undertaking in
itself.  Maybe some half-hearted compromise could be developed?  A
hundredth-hearted compromise is likely the most I could do!  :-)

François


* Re: Implementing Org-mode tools in languages other than ELISP
  2014-01-04 16:43           ` François Pinard
@ 2014-01-05 16:43             ` Daniel Clemente
  2014-01-06 10:44               ` Parsing Org-mode in Python (was: Implementing Org-mode tools in languages other than ELISP) Karl Voit
  2014-01-07  1:47               ` Implementing Org-mode tools in languages other than ELISP François Pinard
  0 siblings, 2 replies; 19+ messages in thread
From: Daniel Clemente @ 2014-01-05 16:43 UTC (permalink / raw)
  To: François Pinard; +Cc: emacs-orgmode

> 
> I dream of having a general Python parser for Org mode files, knowing
> every bit about the current syntax for Org files, surrounded by enough
> Python machinery to make it useful.
> 

Try PyOrgMode (https://github.com/bjonnh/PyOrgMode), it works for some files (but still needs corrections: it crashes with date formats, with bold markers, etc.).

You don't need a Lisp interpreter written in Python, only Python code that understands org syntax without getting confused.


* Parsing Org-mode in Python (was: Implementing Org-mode tools in languages other than ELISP)
  2014-01-05 16:43             ` Daniel Clemente
@ 2014-01-06 10:44               ` Karl Voit
  2014-01-07  2:33                 ` Parsing Org-mode in Python François Pinard
  2014-01-07 16:09                 ` Brett Viren
  2014-01-07  1:47               ` Implementing Org-mode tools in languages other than ELISP François Pinard
  1 sibling, 2 replies; 19+ messages in thread
From: Karl Voit @ 2014-01-06 10:44 UTC (permalink / raw)
  To: emacs-orgmode

Hi!

* Daniel Clemente <n142857@gmail.com> wrote:
>> 
>> I dream of having a general Python parser for Org mode files, knowing
>> every bit about the current syntax for Org files, surrounded by enough
>> Python machinery to make it useful.

Oh, this would be great since there are way more Python coders out
there than ELISP coders.

> Try PyOrgMode (https://github.com/bjonnh/PyOrgMode), it works for
> some files (but still needs corrections: it crashes with date
> formats, with bold markers, etc.).

For the blogging system I am implementing [4], I did some
research on current Org-parsers in Python.

My notes about PyOrgMode (2013-05) were that there is not much
documentation on how to use it properly and that the list of open todos
contains things too basic to consider it elaborate enough.

I consider my own Python parser[1] the most advanced
Python parser so far (unfortunately). However, I am completely aware
of its downsides:

- it's a very primitive line-by-line parser and not using any classical
  parsing tool at all (works for me so far!)
- it's currently limited to a few Org-mode elements so that I can
  continue to develop my blogging system
  - more Org-mode elements (not all!) will be added when my blogging
    system gets stable enough to add Org-mode syntax features such
    as tables.
- it's not written with the premise to be a stand-alone Org-mode
  parser since I only need it for my blogging system
  - feel free to use it and modify it to be a stand-alone parser

I do think that for a more general approach, somebody should develop
an Org-mode Python parser with classical parsing engines. I do have
some experience with ply[2]. Unfortunately, I have to say that using
ply feels a bit awkward in Python. I did not get the impression that
this is a parsing engine that is done the Python way. A lot of
things are done by convention (naming stuff, and so on) which has
certain limitations in details. And AFAIR there were more things that
puzzled me. However, it got my (simple) job [3] done back then.

> You don't need a Lisp interpreter written in Python, only Python
> code that understands org syntax without getting confused.

I am no expert in this. I do feel that if you are going to use an
ELISP interpreter to parse Org-mode syntax for Python, this should
completely re-use the original Org-parser and nothing else. I have
no idea if this is possible or not.

If you have to implement a parser on your own, you probably should
stick to Python-only.

In order to avoid confusion, your own Python parser should implement only
a very well defined and documented sub-set of Org-mode syntax and
should accept/parse everything else as ordinary text (content).
IMHO.
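
A minimal sketch of that idea, assuming a sub-set of just headings
and property drawers. It is an illustration only and not the code of
lazyblorg's orgparser.py; all names are hypothetical. Anything the
parser does not recognize is kept as plain content lines:

```python
import re

HEADING = re.compile(r"^(\*+)\s+(.*)$")
PROPERTY = re.compile(r"^\s*:([A-Za-z0-9_-]+):\s*(.*)$")


def parse_org(text):
    """Parse headings and property drawers; keep the rest as plain text."""
    entries = []
    current = None
    in_drawer = False
    for line in text.splitlines():
        heading = HEADING.match(line)
        if heading:
            current = {
                "level": len(heading.group(1)),
                "title": heading.group(2).strip(),
                "properties": {},
                "content": [],
            }
            entries.append(current)
            in_drawer = False
            continue
        if current is None:
            continue  # text before the first heading is ignored in this sketch
        marker = line.strip().upper()
        prop = PROPERTY.match(line)
        if marker == ":PROPERTIES:":
            in_drawer = True
        elif marker == ":END:" and in_drawer:
            in_drawer = False
        elif in_drawer and prop:
            current["properties"][prop.group(1).upper()] = prop.group(2).strip()
        else:
            current["content"].append(line)  # unknown syntax stays ordinary text
    return entries
```

Tables, bold markers, and timestamps all end up in "content"
unchanged, so unsupported syntax cannot confuse the parser.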

HTH.

  1. https://github.com/novoid/lazyblorg/blob/master/lib/orgparser.py
  2. http://www.dabeaz.com/ply/
  3. https://github.com/novoid/2011-04-tagstore-formal-experiment/tree/master/analysis_and_derived_data/scripts
  4. https://github.com/novoid/lazyblorg
-- 
mail|git|SVN|photos|postings|SMS|phonecalls|RSS|CSV|XML to Org-mode:
       > get Memacs from https://github.com/novoid/Memacs <

https://github.com/novoid/extract_pdf_annotations_to_orgmode + more on github


* Re: Implementing Org-mode tools in languages other than ELISP
  2014-01-05 16:43             ` Daniel Clemente
  2014-01-06 10:44               ` Parsing Org-mode in Python (was: Implementing Org-mode tools in languages other than ELISP) Karl Voit
@ 2014-01-07  1:47               ` François Pinard
  1 sibling, 0 replies; 19+ messages in thread
From: François Pinard @ 2014-01-07  1:47 UTC (permalink / raw)
  To: emacs-orgmode

Daniel Clemente <n142857@gmail.com> writes:

>> I dream of having a general Python parser for Org mode files, knowing
>> every bit about the current syntax for Org files, surrounded by enough
>> Python machinery to make it useful.

> Try PyOrgMode (https://github.com/bjonnh/PyOrgMode), it works for some
> files (but still needs corrections: it crashes with date formats, with
> bold markers, etc.).

Hi, Daniel.  As Karl points out (in a kind way), PyOrgMode is rather far
from "knowing every bit about the current syntax for Org files".  My
feeling is that this effort should be restarted afresh.

> You don't need a Lisp interpreter written in Python, only Python code
> that understands org syntax without getting confused.

Well, I would prefer a Python-only solution, rather than requiring Emacs
and using it behind the scenes.

François


* Re: Parsing Org-mode in Python
  2014-01-06 10:44               ` Parsing Org-mode in Python (was: Implementing Org-mode tools in languages other than ELISP) Karl Voit
@ 2014-01-07  2:33                 ` François Pinard
  2014-01-07 16:09                 ` Brett Viren
  1 sibling, 0 replies; 19+ messages in thread
From: François Pinard @ 2014-01-07  2:33 UTC (permalink / raw)
  To: emacs-orgmode

Karl Voit <devnull@Karl-Voit.at> writes:

> I did not get the impression that [ply] is a parsing engine that is
> done the Python way.

PLY has pros and cons.  SPARK[1] always attracted me as being more
elegant.  While it accepts a wider set of grammars than PLY, SPARK can
become quite slow on grammars which are less "natural" (admittedly a
very fuzzy, subjective term).  For simpler grammars, recursive descent
does the job at good enough speed, and often, grammars can be rearranged
a bit so the lexer could cleverly help the parser.  Of course, it looks
like more work writing a recursive descent parser, yet many times in my
experience, the programmer is amply repaid with simplicity and clarity.
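
A minimal sketch of recursive descent applied to Org headlines,
assuming that only the outline structure matters. The lexer
pre-classifies each headline into a (level, title) token so the
parser only has to compare levels. Function names are made up for
illustration:

```python
import re

HEADLINE = re.compile(r"^(\*+)\s+(.*)$")


def tokenize(text):
    """Lexer: turn headline lines into (level, title) tokens."""
    tokens = []
    for line in text.splitlines():
        match = HEADLINE.match(line)
        if match:
            tokens.append((len(match.group(1)), match.group(2)))
    return tokens


def parse_children(tokens, pos, parent_level):
    """Parser: collect all nodes deeper than parent_level, recursively."""
    children = []
    while pos < len(tokens) and tokens[pos][0] > parent_level:
        level, title = tokens[pos]
        node = {"title": title, "children": []}
        node["children"], pos = parse_children(tokens, pos + 1, level)
        children.append(node)
    return children, pos


def parse_tree(text):
    """Return the outline of an Org text as a nested list of nodes."""
    tree, _ = parse_children(tokenize(text), 0, 0)
    return tree
```

Each call handles one nesting level and returns the position where
it stopped, so no parser generator or grammar file is involved.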

>> You don't need a Lisp interpreter written in Python, only Python
>> code that understands org syntax without getting confused.

> if you are going to use a ELISP interpreter to parse Org-mode syntax
> for Python, this should completely re-use the original Org-parser and
> nothing else.  I have no idea if this is possible or not.  If you have
> to implement a parser on your own, you probably should stick to
> Python-only.

Hey hey, it's fun! :-) You misunderstood me, but this is constructive
actually, as you raise good points.  In my dreams, a pure Python parser
parses Org mode files.  However, here and there in the parsed files, as
data, we can see bits of Emacs Lisp code, or even Calc syntax at some
places.  That Emacs Lisp code could be mere constants or identifiers,
but sometimes more complex, evalable S-expressions.

A parser is probably of limited use if it does not come with some
extra-tools covering most frequent use cases around the syntax, and I
guess that pressure will develop to have some kind of Emacs Lisp
interpreter, hardly complete, probably only mild or even ridiculous.

The interesting idea in your comments is that, *if* we had an Emacs Lisp
interpreter of serious quality, that interpreter could use "the original
Org-parser and nothing else".  That would solve maintenance, as the
parser would be wholly external, to be found in Org mode distribution,
all standard.  But this avenue is quite unlikely: it looks like a major
undertaking to me, and while such a parser would be useful on small data
excerpts within an Org file, it might be inordinately slow if it had to
interpret a lot of Lisp code while deciphering big Org files.

Worse, keeping a Python parser in sync with the true Emacs Lisp parser
would require much energy, maybe only once in a while, but extended over
a long period of time.  Unless great enthusiasm exists, shared among
many people, such projects are always doomed to fail.  Not many people
are ready to commit themselves for life to the required maintenance.

François

---------------
[1] http://pages.cpsc.ucalgary.ca/~aycock/spark/

* Re: Parsing Org-mode in Python
  2014-01-06 10:44               ` Parsing Org-mode in Python (was: Implementing Org-mode tools in languages other than ELISP) Karl Voit
  2014-01-07  2:33                 ` Parsing Org-mode in Python François Pinard
@ 2014-01-07 16:09                 ` Brett Viren
  2014-01-08 12:33                   ` François Pinard
  1 sibling, 1 reply; 19+ messages in thread
From: Brett Viren @ 2014-01-07 16:09 UTC (permalink / raw)
  To: news1142; +Cc: emacs-orgmode

Hi Karl,

Karl Voit <devnull@Karl-Voit.at> writes:

> Hi!
>
> * Daniel Clemente <n142857@gmail.com> wrote:
>>> 
>>> I dream of having a general Python parser for Org mode files, knowing
>>> every bit about the current syntax for Org files, surrounded by enough
>>> Python machinery to make it useful.
>
> Oh, this would be great since there are way more Python-coders out
> there than ELISP coders.

I agree.

I'm also (slowly) working toward some Python-based org processing.  My
strategy is to produce an intermediate file in JSON format which is
designed to capture the full org document structure.  I am calling this
a "shunt" export as it is meant to do as little interpretation of the
document as possible.

If this is interesting to you and you haven't already seen it, please
check the thread from December where I got a lot of help to output this
JSON via the new org export mechanism (I'm a LISP newbie).  Here is the
concluding post with a working example:

  http://permalink.gmane.org/gmane.emacs.orgmode/79838

Besides any eventual Python-side development, one remaining gap in my
plan is how to produce some kind of schema description using the org
exporter machinery.  I want to have this description generated
automatically so that any future changes to the org format can be
accommodated with some level of automation.

So, my current thinking is to find a way to exploit org export machinery
to generate this schema (call it a "meta-shunt" export?).  If I can find
one, I'll output the schema as another JSON file.  Then, on the Python
side, I will read this schema file in and generate
collections.namedtuple classes from it.  Finally, a reader of the JSON
org document will be developed to produce objects of these namedtuple
classes.

At the end of the day one will have a DOM-style data structure
representing the initial org document.
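
To make the namedtuple step concrete, here is a rough sketch.  Both the
schema layout (a JSON object mapping each element type to its property
names) and the node layout (a "type" key, the properties, and an optional
"children" list) are assumptions made only for illustration; org does not
produce anything like them yet.  The helper names make_classes and build
are likewise hypothetical.

```python
import json
from collections import namedtuple

# Hypothetical schema: element type -> list of property names.
SCHEMA_JSON = '{"headline": ["level", "title"], "paragraph": ["text"]}'


def make_classes(schema):
    """Create one namedtuple class per element type named in the schema."""
    classes = {}
    for type_name, fields in schema.items():
        # Every class also gets a 'children' field for the nested nodes.
        classes[type_name] = namedtuple(type_name, list(fields) + ["children"])
    return classes


def build(node, classes):
    """Recursively convert a JSON node (a dict) into a namedtuple object."""
    cls = classes[node["type"]]
    children = [build(child, classes) for child in node.get("children", [])]
    # Properties missing from the node are filled in with None.
    values = {name: node.get(name) for name in cls._fields if name != "children"}
    return cls(children=children, **values)


classes = make_classes(json.loads(SCHEMA_JSON))
document = json.loads(
    '{"type": "headline", "level": 1, "title": "Intro",'
    ' "children": [{"type": "paragraph", "text": "Hello"}]}'
)
root = build(document, classes)
```

Here root.title and root.children[0].text give attribute-style access to
the tree.  Note that namedtuple only accepts valid Python identifiers, so
hyphenated org names such as src-block would have to be translated first.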

-Brett.

* Re: Parsing Org-mode in Python
  2014-01-07 16:09                 ` Brett Viren
@ 2014-01-08 12:33                   ` François Pinard
  2014-01-08 15:42                     ` Brett Viren
  0 siblings, 1 reply; 19+ messages in thread
From: François Pinard @ 2014-01-08 12:33 UTC (permalink / raw)
  To: emacs-orgmode

Brett Viren <bv@bnl.gov> writes:

> I'm also (slowly) working toward some Python-based org processing.  My
> strategy is to produce an intermediate file in JSON format which is
> designed to capture the full org document structure.  I am calling
> this a "shunt" export as it is meant to do as little interpretation of
> the document as possible.

Might be interesting, indeed!

>   http://permalink.gmane.org/gmane.emacs.orgmode/79838

This yields:

,----
| Not Found
| 
| The requested URL /gmane.emacs.orgmode/79838 was not found on this server.
`----

> At the end of the day one will have a DOM-style data structure
> representing the initial org document.

Keep me (us!) posted! :-)

François

* Re: Parsing Org-mode in Python
  2014-01-08 12:33                   ` François Pinard
@ 2014-01-08 15:42                     ` Brett Viren
  2014-01-08 16:11                       ` François Pinard
  2014-01-09  4:13                       ` Daniel Clemente
  0 siblings, 2 replies; 19+ messages in thread
From: Brett Viren @ 2014-01-08 15:42 UTC (permalink / raw)
  To: François Pinard; +Cc: emacs-orgmode

François Pinard <pinard@iro.umontreal.ca> writes:

> Brett Viren <bv@bnl.gov> writes:
>
>>   http://permalink.gmane.org/gmane.emacs.orgmode/79838
>
> This yields:
>
> ,----
> | Not Found
> | 
> | The requested URL /gmane.emacs.orgmode/79838 was not found on this server.
> `----

Huh, maybe a transient failure?  It's there for me right now.  Here is
the same message from GNU's archive:

  http://lists.gnu.org/archive/html/emacs-orgmode/2013-12/msg00415.html

In any case, here is the salient chunk:

#+BEGIN_SRC elisp
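  (require 'json)
  (require 'org-element)
  ;; Parse the current Org buffer down to object granularity.
  (let ((tree (org-element-parse-buffer 'object nil)))
    ;; Overwrite the properties that hold circular references,
    ;; because json-encode cannot serialize them.
    (org-element-map tree (append org-element-all-elements
                                  org-element-all-objects '(plain-text))
      (lambda (x)
        (when (org-element-property :parent x)
          (org-element-put-property x :parent "none"))
        (when (org-element-property :structure x)
          (org-element-put-property x :structure "none"))))
    ;; Write the resulting JSON string to a file.
    (write-region (json-encode tree) nil "foo.dat"))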
#+END_SRC

This test is meant to run from inside an org-mode buffer which itself
provides the fodder for the test.  But, it shows the steps that I'll
need to integrate into some new org export mechanism.  The important
part is nulling out the :parent and :structure (and maybe others?)
properties in order to break their circular references.  The heavy
lifting is all in org-element-parse-buffer and json-encode.

>> At the end of the day one will have a DOM-style data structure
>> representing the initial org document.
>
> Keep me (us!) posted! :-)

Definitely!  
-Brett.

* Re: Parsing Org-mode in Python
  2014-01-08 15:42                     ` Brett Viren
@ 2014-01-08 16:11                       ` François Pinard
  2014-01-09  4:13                       ` Daniel Clemente
  1 sibling, 0 replies; 19+ messages in thread
From: François Pinard @ 2014-01-08 16:11 UTC (permalink / raw)
  To: Brett Viren; +Cc: emacs-orgmode

2014/1/8 Brett Viren <bv@bnl.gov>

> Huh, maybe a transient failure?  It's there for me right now.  Here is
> the same message from GNU's archive:
>
>   http://lists.gnu.org/archive/html/emacs-orgmode/2013-12/msg00415.html


Got it, thanks! :-)

-- 
François Pinard http://pinard.progiciels-bpi.ca

* Re: Parsing Org-mode in Python
  2014-01-08 15:42                     ` Brett Viren
  2014-01-08 16:11                       ` François Pinard
@ 2014-01-09  4:13                       ` Daniel Clemente
  2014-01-09 14:13                         ` Brett Viren
  1 sibling, 1 reply; 19+ messages in thread
From: Daniel Clemente @ 2014-01-09  4:13 UTC (permalink / raw)
  To: Brett Viren; +Cc: François Pinard, emacs-orgmode

On Wed, 08 Jan 2014 10:42:17 -0500 Brett Viren wrote:
> 
>   http://lists.gnu.org/archive/html/emacs-orgmode/2013-12/msg00415.html
> 
> In any case, here is the salient chunk:
> 
> #+BEGIN_SRC elisp
>   (require 'json)
>   (let* ((tree (org-element-parse-buffer 'object nil)))
>     (org-element-map tree (append org-element-all-elements
>     org-element-all-objects '(plain-text))
>       (lambda (x) 
>         (if (org-element-property :parent x)
>             (org-element-put-property x :parent "none"))
>         (if (org-element-property :structure x)
>             (org-element-put-property x :structure "none"))
>         ))
>     (write-region
>      (json-encode tree) 
>       nil "foo.dat"))
> #+END_SRC
> 

  I like this very much. This output is much easier to parse than the source .org file, and it's still using the original Elisp parser (so you don't need a Python parser).
  I hope ox-json.el gets into org-mode some day.

  Are there already Python parsers for it?
  Should ox-json's output be as raw as possible (e.g. what your code produces now) or transformed to simpler JSON?
  (I think both formats should coexist).
  

* Re: Parsing Org-mode in Python
  2014-01-09  4:13                       ` Daniel Clemente
@ 2014-01-09 14:13                         ` Brett Viren
  0 siblings, 0 replies; 19+ messages in thread
From: Brett Viren @ 2014-01-09 14:13 UTC (permalink / raw)
  To: Daniel Clemente; +Cc: François Pinard, emacs-orgmode

Hi Daniel,

Daniel Clemente <n142857@gmail.com> writes:

>   Are there already Python parsers for it?

Parsing generic JSON is fairly trivial in Python.

  import json
  with open('file.json') as f:
      data = json.load(f)

The resulting "data" is then a bunch of Python lists and/or dicts
matching whatever structure was output from org and is in the .json
file.  The schema in these three contexts is (will be) identical.
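
For instance, a generic walk over such a structure needs nothing beyond
the standard library.  The sketch below makes no assumption about the
exact layout that json-encode produces; the sample string and the helper
name walk are invented for illustration.

```python
import json


def walk(data, path=()):
    """Yield (path, value) for every scalar leaf in nested lists and dicts."""
    if isinstance(data, dict):
        for key, value in data.items():
            yield from walk(value, path + (key,))
    elif isinstance(data, list):
        for index, value in enumerate(data):
            yield from walk(value, path + (index,))
    else:
        yield path, data


# Invented sample standing in for the contents of the .json file.
sample = '["headline", {"level": 1, "title": "Intro"}, ["paragraph", null, "Hello"]]'
leaves = list(walk(json.loads(sample)))
```

Each entry in leaves pairs a value with the sequence of list indexes and
dict keys that leads to it, which is a handy way to explore the dump
before settling on a schema.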

At this point, Pythonistas can do what they want with "data".  Although,
as I mentioned, I'd like to put another layer on this "raw" data
structure which expresses/enforces the org schema as understood by the
org-exporter.  If I can figure out how to dump a representation of this
schema from org I'll express it as a set of generated
collections.namedtuple instances.  We'll see.

>   Should ox-json's output be as raw as possible (e.g. what your code
> produces now) or transformed to simpler JSON?
>   (I think both formats should coexist).

I suppose it may be useful to "winnow down" the structure.  One
thing I'm thinking about here is the narrowing done to support the "blog
from anywhere" feature of Karl's lazyblorg mentioned in this thread.

That can be done either on the emacs side or the Python side (or both, in
principle).  However, my intention is to do as little modification of
the org document structure as possible on the emacs side, in order to
preserve details that may be interesting on the Python side in the
future.  Also, I'm still learning LISP but know Python fairly well, so
I would rather do as much processing as possible on the Python side. :)

So far the only thing I see that needs to be stripped is the :parent
property (and the :structure, which really should be resolved as a copy
instead of being stripped) which cause the emacs-side data structure to
become a Circular Object and thus break the emacs JSON dumper.  

I just noticed that Python's JSON dumper can do this kind of stripping
implicitly and in general.  It might be nice if someone were to add such
a feature to the emacs JSON dumper but I don't plan to try this.

-Brett.


