Presenting Hyperorg version 0.1.0: The Org to HTML Converter


* Presenting Hyperorg version 0.1.0: The Org to HTML Converter
@ 2024-03-19  9:23 c.buhtz
  2024-03-20 13:09 ` Ihor Radchenko
  0 siblings, 1 reply; 10+ messages in thread
From: c.buhtz @ 2024-03-19  9:23 UTC (permalink / raw)
  To: Emacs orgmode

Hello,

I am excited to introduce Hyperorg 0.1.0 [1], designed to efficiently
convert files from Emacs Org-mode and Org-roam into clean HTML files.
Ideal for Zettelkasten enthusiasts and personal wiki creators, Hyperorg
simplifies the conversion process, ensuring your nodes are correctly
interlinked and accurately transformed into HTML.

Key Features:
- Converts Emacs Org-mode and Org-roam files to pure HTML5 and CSS.
- Tailored for Zettelkasten and personal wiki use cases.
- Written in Python3.

Visit the project website for clear installation instructions [2].

Have questions, feature ideas, or encountered any bugs?
I appreciate your feedback and am now working on the issues for the next
release.

I am also very thankful for the support of this community, which helped
and inspired me with some parsing details.

Best regards
Christian Buhtz

[1] -- <https://codeberg.org/buhtz/hyperorg>
[2] -- <https://codeberg.org/buhtz/hyperorg#installation>


* Re: Presenting Hyperorg version 0.1.0: The Org to HTML Converter
  2024-03-19  9:23 Presenting Hyperorg version 0.1.0: The Org to HTML Converter c.buhtz
@ 2024-03-20 13:09 ` Ihor Radchenko
  2024-03-23 13:50   ` c.buhtz
  0 siblings, 1 reply; 10+ messages in thread
From: Ihor Radchenko @ 2024-03-20 13:09 UTC (permalink / raw)
  To: c.buhtz; +Cc: Emacs orgmode

c.buhtz@posteo.jp writes:

> I am excited to introduce Hyperorg 0.1.0 [1], designed to efficiently 
> convert files
> from Emacs Org-mode and Org-roam into clean HTML files. Ideal for 
> Zettelkasten
> enthusiasts and personal wiki creators, Hyperorg simplifies the 
> conversion process,
> ensuring your nodes correctly interlinked and accurately transformed 
> into HTML format.

Thanks for sharing!
I am wondering why you used pure Python rather than the built-in export
backend extension support.

> Key Features:
> - Converts Emacs Org-mode and Org-roam files to pure HTML5 and CSS.

May you outline the benefits compared to built-in ox-html (with
org-html-html5-fancy enabled), ox-publish, and possibly with other Org
mode blog generators (see "Publishing" section in
https://orgmode.org/tools.html)?

> - Tailored for Zettelkasten and personal wiki use cases.

May you elaborate what exactly is specially tailored?

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>



* Re: Presenting Hyperorg version 0.1.0: The Org to HTML Converter
  2024-03-20 13:09 ` Ihor Radchenko
@ 2024-03-23 13:50   ` c.buhtz
  2024-03-23 13:58     ` Ihor Radchenko
  0 siblings, 1 reply; 10+ messages in thread
From: c.buhtz @ 2024-03-23 13:50 UTC (permalink / raw)
  To: Emacs orgmode

On 2024-03-20 13:09 Ihor Radchenko <yantar92@posteo.net> wrote:
> May you outline the benefits

I have now added a section on that topic. Of course, I am open to
discussion and feedback.

    https://codeberg.org/buhtz/hyperorg#benefits-compared-to-similar-tools



* Re: Presenting Hyperorg version 0.1.0: The Org to HTML Converter
  2024-03-23 13:50   ` c.buhtz
@ 2024-03-23 13:58     ` Ihor Radchenko
  2024-03-23 19:45       ` c.buhtz
  0 siblings, 1 reply; 10+ messages in thread
From: Ihor Radchenko @ 2024-03-23 13:58 UTC (permalink / raw)
  To: c.buhtz; +Cc: Emacs orgmode

<c.buhtz@posteo.jp> writes:

> On 2024-03-20 13:09 Ihor Radchenko <yantar92@posteo.net> wrote:
>> May you outline the benefits
>
> Now I added a section about that topic. Of course I am open for
> discussion and feedback.
>
>     https://codeberg.org/buhtz/hyperorg#benefits-compared-to-similar-tools

Thanks! Although I am actually more interested in the other questions:
why a custom parser, and what is tailored for Zettelkasten?




* Re: Presenting Hyperorg version 0.1.0: The Org to HTML Converter
  2024-03-23 13:58     ` Ihor Radchenko
@ 2024-03-23 19:45       ` c.buhtz
  2024-03-24 13:31         ` Ihor Radchenko
  0 siblings, 1 reply; 10+ messages in thread
From: c.buhtz @ 2024-03-23 19:45 UTC (permalink / raw)
  To: Emacs orgmode

On 2024-03-23 13:58 Ihor Radchenko <yantar92@posteo.net> wrote:
> Although I am actually more interested in other questions -
> why custom parser and what is tailored for zettelkasten.

What do you mean by "custom parser"?

Zettelkasten? Hyperorg handles the links between nodes out of the box,
including backlinks. It also creates an index (nodes sorted by title,
tags, etc.).
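
To illustrate what computing backlinks involves, here is a minimal
sketch. It is not taken from Hyperorg; the function name
`build_backlinks` and the input shape (a dict mapping each node id to
the ids it links to) are assumptions made for this example only.

```python
from collections import defaultdict


def build_backlinks(links):
    """Invert a mapping of node id -> ids that node links to.

    Returns a dict mapping every known node id to the sorted list of
    node ids that link to it. Links to unknown ids are ignored here.
    """
    backlinks = defaultdict(set)
    for source, targets in links.items():
        for target in targets:
            if target in links:  # skip links that point to unknown nodes
                backlinks[target].add(source)
    # Every known node gets an entry, even if nothing links to it.
    return {node: sorted(backlinks[node]) for node in links}


# Hypothetical three-node Zettelkasten.
example_links = {"a": ["b", "c"], "b": ["c"], "c": []}
example_backlinks = build_backlinks(example_links)
```

With this input, node "c" ends up with backlinks from "a" and "b",
while "a" has none.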

Of course, everything is possible with Emacs, even making coffee. The
difference is the resources you have to invest in configuring it. That
is a lot, even if you know Lisp.


* Re: Presenting Hyperorg version 0.1.0: The Org to HTML Converter
  2024-03-23 19:45       ` c.buhtz
@ 2024-03-24 13:31         ` Ihor Radchenko
  2024-03-24 14:22           ` c.buhtz
  0 siblings, 1 reply; 10+ messages in thread
From: Ihor Radchenko @ 2024-03-24 13:31 UTC (permalink / raw)
  To: c.buhtz; +Cc: Emacs orgmode

<c.buhtz@posteo.jp> writes:

> On 2024-03-23 13:58 Ihor Radchenko <yantar92@posteo.net> wrote:
>> Although I am actually more interested in other questions -
>> why custom parser and what is tailored for zettelkasten.
>
> What do you mean by "custom parser"?

Hmm. I thought that you had implemented an Org parser in Python from
scratch. Now I see that you are using orgparse.

I am wondering what you are referring to when mentioning "resilient when
dealing with parser issues".

> Zettelkasten? Hyperorg handles the links between nodes out of the box
> including the backlinks. It also creates an index (nodes sorted by
> title, tags, etc pp).
>
> Of course with Emacs is everything possible even Coffee making. But the
> difference are the resources you have to invest into configure it. This
> is much even if you know Lisp.

An index can be produced with minimal configuration via ox-publish.
Backlinks are certainly a novelty. I do not recall any Org publishing
systems that produce backlinks automatically (not via a dynamic block).




* Re: Presenting Hyperorg version 0.1.0: The Org to HTML Converter
  2024-03-24 13:31         ` Ihor Radchenko
@ 2024-03-24 14:22           ` c.buhtz
  2024-03-24 14:40             ` Ihor Radchenko
  0 siblings, 1 reply; 10+ messages in thread
From: c.buhtz @ 2024-03-24 14:22 UTC (permalink / raw)
  To: Emacs orgmode

Dear Ihor,

Thanks for your reply.

On 24.03.2024 14:31 Ihor Radchenko wrote:
> Hmm. I thought that you implemented Org parser in python from scratch.
> Now, I see that you are using orgparse.

The orgparse package does not parse much of an Org file. It does parse
the meta information (property drawers, etc.) but not the content of an
Org file. In the long run I might replace orgparse to reduce
dependencies.

Besides using orgparse, yes, I implemented an Org parser.

> Wondering what you are referring to when mentioning "resilient when
> dealing with parser issues".

orgparse throws exceptions, e.g. UnicodeDecodeError, or when timestamps
are invalid. Hyperorg catches those exceptions and goes on with the next
node without interrupting the whole process.
Other things are "invalid" links, e.g. unknown orgids, unknown roam
links, and unsupported "link kinds" ("protocols" in Org syntax?; e.g.
"inkscape:").
Additionally, there are multiple fancy Org features (e.g. tables) that
are currently not supported. Hyperorg shouldn't stop or crash at this
point.
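
The "catch and go on with the next node" behaviour can be sketched as
follows. This is only an illustration under stated assumptions, not
Hyperorg's code: `convert_all` and `decode_node` are hypothetical names,
and a plain UTF-8 decode stands in for the real parsing step so that the
example runs without orgparse installed.

```python
import logging


def convert_all(sources, convert):
    """Convert every node; log and skip the ones that fail.

    `sources` maps a node name to its raw bytes, `convert` is any
    callable that may raise on bad input.
    """
    converted, failed = {}, {}
    for name, raw in sources.items():
        try:
            converted[name] = convert(raw)
        except (UnicodeDecodeError, ValueError) as exc:
            # One bad node must not interrupt the whole run.
            logging.warning("Skipping %s: %s", name, exc)
            failed[name] = exc
    return converted, failed


def decode_node(raw):
    """Stand-in for a parser: decoding fails on invalid UTF-8 bytes."""
    return raw.decode("utf-8")


converted, failed = convert_all(
    {"good.org": b"* Heading", "bad.org": b"\xff\xfe"},
    decode_node,
)
```

Here "good.org" is converted and "bad.org" is recorded in `failed`
instead of aborting the loop.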

> index can be produced with minimal configuration via ox-publish.

"minimal" is a subjective term here. Again I don't blame the tools or 
the Emacs universe.
But for me it is not even minimal effort to get ox-publish running in
the first place, not to mention further modifications, e.g. an index.

Emacs, Lisp and their "documentation" are a special thing that not
everybody can or wants to handle. I would have to invest so many
resources into basics like Lisp just to understand the documentation
well enough to modify the publishing feature in a way that satisfies
me. I am the problem, not Emacs and Co. ;)

Kind regards
Christian Buhtz


* Re: Presenting Hyperorg version 0.1.0: The Org to HTML Converter
  2024-03-24 14:22           ` c.buhtz
@ 2024-03-24 14:40             ` Ihor Radchenko
  2024-03-24 16:59               ` c.buhtz
  0 siblings, 1 reply; 10+ messages in thread
From: Ihor Radchenko @ 2024-03-24 14:40 UTC (permalink / raw)
  To: c.buhtz; +Cc: Emacs orgmode

c.buhtz@posteo.jp writes:

> On 24.03.2024 14:31 Ihor Radchenko wrote:
>> Hmm. I thought that you implemented Org parser in python from scratch.
>> Now, I see that you are using orgparse.
>
> The orgparse package do not parse much of on org file. IT does parse the 
> meta infos (property drawers, etc) but not the content of an orgfile. In 
> the long run I might replay orgparse to reduce dependencies.
>
> Beside orgparse yes I implement an org parser.

Thanks for the clarification.
If it is an option, it would be nice if you upstreamed your additions to
orgparse. This way, we can get a better Python-based Org parser for
everyone's benefit.

>> Wondering what you are referring to when mentioning "resilient when
>> dealing with parser issues".
>
> Orgparse do throw exceptions e.g. UnicodeDecodeError or when timestamps 
> are invalid. Hyperorg catch that exceptions and go on with the next node 
> without interrupting the whole process.

I see. FYI, it is a bug to throw an error when parsing an Org document.
Any kind of text file is a valid Org document. There is no notion of
invalid syntax in Org markup.

> Other things are "invalid" links, e.g. unknown orgids, unknown roam 
> links, unsupported "link kinds" ("protocols" in org syntax?; e.g. 
> "inkscape:").

In Org terminology, we call these "broken" links.
"link kinds" are link "types".

> Additionally there are multiple fancy but not supported org features 
> (e.g. tables) currently not supported. Hyperorg shouldn't stop or crash 
> at this point.

Do you mean that orgparse throws an error when encountering tables?
If so, it is slightly odd to see this implementation detail listed in
"Benefits".

Generally, part of the "Benefits" section is a bit hand-wavy. I
recommend using clearer statements. Otherwise, it is not clear what
exactly the benefits are.

I'd suggest the following:

1. Drop "Fairly resilient when dealing with parser issues."
2. Reword "Fairly resilient managing dead and problematic links which
are a common phenomenon when working with a constantly evolving
Zettelkasten or personal wiki." And instead clearly explain how broken
links are exported.
3. Supply "Generates a comprehensive index of all nodes." with a
   screenshot
4. Drop "Adhers to World Wide Web Consortium (W3C) standards for HTML5
   and CSS (<!DOCTYPE html>)." Most other blog exporters for Org mode
   adhere to standards, and those that do not are probably not of
   interest for the purposes of comparison.
5. Maybe mention the "tag cloud" visible in the example screenshot (btw,
   the screenshot is not very sexy; compare it with something like
   https://one.tonyaldon.com/).

>> index can be produced with minimal configuration via ox-publish.
>
> "minimal" is a subjective term here. Again I don't blame the tools or 
> the Emacs universe.
> But for me it is not even minimal to get ox-publish run in the first 
> place. Not speaking about further modifications, e.g. an index.

Clear.



* Re: Presenting Hyperorg version 0.1.0: The Org to HTML Converter
  2024-03-24 14:40             ` Ihor Radchenko
@ 2024-03-24 16:59               ` c.buhtz
  2024-03-24 18:15                 ` Ihor Radchenko
  0 siblings, 1 reply; 10+ messages in thread
From: c.buhtz @ 2024-03-24 16:59 UTC (permalink / raw)
  To: Emacs orgmode

Dear Ihor,

I will consider your points and take them into account.

On 2024-03-24 14:40 Ihor Radchenko <yantar92@posteo.net> wrote:
> If it is an option, it would be nice if you upstreamed your additions
> to orgparse. This way, we can get a better Python-based Org parser for
> everyone's benefit.

The code is free. The orgparse maintainer is free to re-use it, of
course. On the other hand, in the long run I will consider separating
my parsing code into an extra package. But currently it is too unstable
and supports only a small subset of all Org(-roam) features.

> > Orgparse do throw exceptions e.g. UnicodeDecodeError or when
> > timestamps are invalid. Hyperorg catch that exceptions and go on
> > with the next node without interrupting the whole process.
> 
> I see. FYI, it is a bug to throw an error when parsing Org document.
> Any kind of text file is a valid Org document. There is no notion of
> invalid syntax in Org markup.

You mean throwing an error is a bug because it is not possible to
write invalid Org documents?

I am not convinced yet. But I am open to it and willing to learn.

Even org-html-export* itself throws errors and stops processing when
there are unknown orgids.

What about an inconsistent block?

#begin_src
foobar
#end_example

> > Other things are "invalid" links, e.g. unknown orgids, unknown roam 
> > links, unsupported "link kinds" ("protocols" in org syntax?; e.g. 
> > "inkscape:").
> 
> In Org terminology, we call these "broken" links.
> "link kinds" are link "types".

The term "types" is too broad and conflicts with Python's built-in
functions. ;) That is the main reason why I used "kind". On the other
hand, the Org syntax reference IMHO also uses the term "protocol".

> Do you mean that orgparse throws an error when encountering tables?

No. I was referring to Hyperorg. orgparse does not parse any Org content
except headings and properties. Nearly everything else is unparsed and
passed raw to me.

> Generally, part of the "Benefits" section is a bit hand-wavy. I
> recommend using more clear statements. Otherwise, it is not clear what
> exactly the benefits are.

Again, it is also not "clear" to me. There are benefits just for
myself as a low-level Emacs and Org user, someone who gets headaches
reading Lisp code and feels very comfortable using Python. In short:
my opinion is very subjective, and I don't have enough experience to
compare my tool to others.
I tried to make this point clear in my benefits section. This is also
the reason why there was no benefits section in the first place: I was
not sure what to write in there.

Maybe I should rename the section to "Benefits and design goals".

> 1. Drop "Fairly resilient when dealing with parser issues."

Why? The "design goal" is to process all nodes no matter how
bad/invalid they are.

> 2. Reword "Fairly resilient managing dead and problematic links which
> are a common phenomenon when working with a constantly evolving
> Zettelkasten or personal wiki." And instead clearly explain how broken
> links are exported.

I don't want to bloat the text. I am not sure what you expect here. The
node is exported as HTML, but the link is highlighted in color and a
tooltip explaining the problem is added.
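
A highlighted broken link with a tooltip could look roughly like the
following sketch. The function `render_link`, the CSS class name
`broken-link`, the tooltip wording and the `<id>.html` target scheme are
all assumptions for illustration; the actual markup Hyperorg emits may
differ.

```python
from html import escape


def render_link(target, label, known_ids):
    """Render a link to a node, or a highlighted marker if it is broken."""
    if target in known_ids:
        href = escape(target, quote=True)
        return f'<a href="{href}.html">{escape(label)}</a>'
    # Broken link: keep the label visible and explain the problem in a
    # tooltip (the HTML "title" attribute); a stylesheet can color the class.
    reason = escape(f"Unknown node id: {target}", quote=True)
    return f'<span class="broken-link" title="{reason}">{escape(label)}</span>'


known = {"n1"}
ok_html = render_link("n1", "First note", known)
broken_html = render_link("nope", "Lost note", known)
```

The node itself is still exported; only the unresolved link is replaced
by the marked-up span.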

> 4. Drop "Adhers to World Wide Web Consortium (W3C) standards for HTML5
>    and CSS (<!DOCTYPE html>)." Most other blog exporters for Org mode
>    adhere to standards. And those that are not are probably out of
>    interest for the purposes of comparison.

Why?

Btw: Even code generated by org-html-export* (XHTML 1.0 Strict) gives
errors on W3C. e.g. "type" attribute is missing in <style> tag.

> 5. Maybe mention the "tag cloud" visible in the example screenshot
> (btw, the screenshot is not very sexy; compare it with something like
>    https://one.tonyaldon.com/).

There is nothing as fancy as a "tag cloud". ;) Btw: there is no cloud at
the link you provided.

About the "sexy"ness of Hyperorg output: There is a specific label for
such issues:

    <https://codeberg.org/buhtz/hyperorg/issues?labels=180551>

But as you can see from the "milestone", the priority is low.

Thanks for your effort and the discussion. I am learning a lot.

Kind regards
Christian Buhtz


* Re: Presenting Hyperorg version 0.1.0: The Org to HTML Converter
  2024-03-24 16:59               ` c.buhtz
@ 2024-03-24 18:15                 ` Ihor Radchenko
  0 siblings, 0 replies; 10+ messages in thread
From: Ihor Radchenko @ 2024-03-24 18:15 UTC (permalink / raw)
  To: c.buhtz; +Cc: Emacs orgmode

<c.buhtz@posteo.jp> writes:

> I will consider your points and take them into account.

Thanks!

> On 2024-03-24 14:40 Ihor Radchenko <yantar92@posteo.net> wrote:
>> If it is an option, it would be nice if you upstreamed your additions
>> to orgparse. This way, we can get a better Python-based Org parser for
>> everyone's benefit.
>
> The code is free. The orgparse maintainer is free to re-use it of
> course. On the other hand in the long run I will consider to separate
> my parsing code into an extra package. But currently it is to unstable
> and do support only a small subset of all org(roam) features.

It would help if you notified the orgparse maintainer once your code
gets more stable ;)

>> I see. FYI, it is a bug to throw an error when parsing Org document.
>> Any kind of text file is a valid Org document. There is no notion of
>> invalid syntax in Org markup.
>
> You mean throw an error is a bug because it is not possible to
> write invalid org documents?

Yup.

> I am not convinced yet. But I am open to it and willing to learn.
>
> Even org-html-export* itself do throw errors and stop processing when
> there are unknown orgids.

This has nothing to do with the parser.
Raising an error on unknown ids/paths is a deliberate _feature_ of the
Org exporter, controlled by the `org-export-with-broken-links' variable.
`org-export-with-broken-links' is nil by default simply because (1) Org
export has no sensible way to export links that point to nowhere; (2)
such links are generally unwanted and need to be corrected by the user
in many use cases.

> What is about an inconsistent block?
>
> #begin_src
> foobar
> #end_example

With your example, the following AST will be produced by Org parser
(`org-element-parse-buffer'):

(org-data
 (section
  (paragraph
   "#+begin" (subscript "src")
   "\nfoobar\n#+end" (subscript "example")
   "\n")))

>> > Other things are "invalid" links, e.g. unknown orgids, unknown roam 
>> > links, unsupported "link kinds" ("protocols" in org syntax?; e.g. 
>> > "inkscape:").
>> 
>> In Org terminology, we call these "broken" links.
>> "link kinds" are link "types".
>
> The term "types" is to broad and conflicts with Pythons in build
> functions. ;) That is the main reason why I used "kind". On the other
> hand the org syntax reference IMHO also use the term "protocol".

Syntax reference says the following:

PROTOCOL
A string which is one of the link type strings in org-link-parameters
                             ^^^^^^^^^

We also always say "link type" in the manual.

I just made things more explicit, replacing PROTOCOL with LINKTYPE:
https://git.sr.ht/~bzg/worg/commit/0634eed3

>> Generally, part of the "Benefits" section is a bit hand-wavy. I
>> recommend using more clear statements. Otherwise, it is not clear what
>> exactly the benefits are.
>
> Again. It is also not "clear" for me. There are benefits just for
> myself as an low-level-Emacs-and-org-user, someone who get headaches
> reading Lisp code and feeling very comfortable using Python. In short:
> My opinion is very subjective. And I don't have enough experience to
> compare my tool to others.
> I tried to make this point clear in my benefits section. And this is
> also the reason why there was no benefits section in the first place
> because I wasn't clear enough about what to write in there.
>
> Maybe I should rephrase the section to "Benefits and design goals".

Maybe something like "Motivation", to emphasize that the listed points
are your subjective reasons for writing the exporter.

Still, it would be useful to have an objective comparison if you want
to get others to use your package. Having a clear list of reasons why
your package is better is important then. (I implicitly assumed that you
are interested in attracting users, since you announced the package in
public.)

>> 1. Drop "Fairly resilient when dealing with parser issues."
>
> Why? The "design goal" is to process all nodes no matter how
> bad/invalid they are.

Simply because Org mode has no notion of invalid nodes.
So, this kind of goal sounds very strange to me.

>> 2. Reword "Fairly resilient managing dead and problematic links which
>> are a common phenomenon when working with a constantly evolving
>> Zettelkasten or personal wiki." And instead clearly explain how broken
>> links are exported.
>
> I don't want to blow up the text. Not sure what you expect here. The
> node is exported as HTML but the link is colorful highlighted and a
> tooltip explaining the problem is added.

Is it something akin to setting `org-export-with-broken-links' to 'mark?

>> 4. Drop "Adhers to World Wide Web Consortium (W3C) standards for HTML5
>>    and CSS (<!DOCTYPE html>)." Most other blog exporters for Org mode
>>    adhere to standards. And those that are not are probably out of
>>    interest for the purposes of comparison.
>
> Why?

If Org export does not adhere to standards, it is a bug, it should, and
it will be fixed. And some other blog generators that do not use Org
export (like Hugo) do conform to the standards, AFAIK.

> Btw: Even code generated by org-html-export* (XHTML 1.0 Strict) give
> errors on W3C. e.g. "type" attribute is missing in <style> tag.

Could you please provide an example? I cannot reproduce this.

>> 5. Maybe mention the "tag cloud" visible in the example screenshot
>> (btw, the screenshot is not very sexy; compare it with something like
>>    https://one.tonyaldon.com/).
>
> There nothing fancy as a "tag cloud". ;) btw: There is no cloud on the
> link you provided.

I was referring to the top line listing tags.

> About the "sexy"ness of Hyperorg output: There is a specific label for
> that issues:
>
>     <https://codeberg.org/buhtz/hyperorg/issues?labels=180551>
>
> But as you can see on the "milestone" the priority is low.

Fair. Do note that "sexyness" is what attracts users :)


