From mboxrd@z Thu Jan 1 00:00:00 1970
From: Sebastian Rose
Subject: Re: Re: Custom entry IDs in HTML export
Date: Fri, 17 Apr 2009 00:37:49 +0200
Message-ID: <87prfcgr7m.fsf@kassiopeya.MSHEIMNETZ>
References: <87myb7w2s9.fsf@CPU107.opentrends.net>
 <6BF0FCBC-4343-4B8C-9A16-F4B9AC9B0F48@gmail.com>
 <87eiwiluft.fsf@gollum.intra.norang.ca>
 <87y6uqwsjw.fsf@kassiopeya.MSHEIMNETZ>
 <871vsfjkm3.fsf@CPU107.opentrends.net>
 <1FEE16B4-2913-487C-8822-094FF4EC725C@gmail.com>
 <878wm1ugml.fsf@kassiopeya.MSHEIMNETZ>
 <43758593-D9D0-43BC-B4D9-14E036C66271@gmail.com>
 <87vdp4u4e8.fsf@kassiopeya.MSHEIMNETZ>
 <87hc0ob9wc.fsf@kassiopeya.MSHEIMNETZ>
 <70EC5312-4BB8-4A7F-A2AD-7B96CBF7C068@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Return-path:
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1LuaAB-0002ZT-Gh for emacs-orgmode@gnu.org; Thu, 16 Apr 2009 18:34:31 -0400
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1LuaA6-0002Xv-VT for emacs-orgmode@gnu.org; Thu, 16 Apr 2009 18:34:31 -0400
Received: from [199.232.76.173] (port=58681 helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1LuaA6-0002Xs-Fm for emacs-orgmode@gnu.org; Thu, 16 Apr 2009 18:34:26 -0400
Received: from mail.gmx.net ([213.165.64.20]:40776) by monty-python.gnu.org with smtp (Exim 4.60) (envelope-from ) id 1LuaA5-00033M-NB for emacs-orgmode@gnu.org; Thu, 16 Apr 2009 18:34:26 -0400
In-Reply-To: <70EC5312-4BB8-4A7F-A2AD-7B96CBF7C068@gmail.com> (Carsten Dominik's message of "Thu, 16 Apr 2009 23:26:48 +0200")
List-Id: "General discussions about Org-mode."
Sender: emacs-orgmode-bounces+geo-emacs-orgmode=m.gmane.org@gnu.org
Errors-To: emacs-orgmode-bounces+geo-emacs-orgmode=m.gmane.org@gnu.org
To: Carsten Dominik
Cc: org-mode mailing list, Bernt Hansen

Carsten Dominik writes:
> On Apr 16, 2009, at 10:50 PM, Sebastian Rose wrote:
>
>> Carsten Dominik writes:
>>> Hi Sebastian,
>>>
>>> On Apr 16, 2009, at 3:14 PM, Sebastian Rose wrote:
>>>
>>>> Hm - counter arguments?
>>>>
>>>> The only counter argument is, that hand made IDs for links are prone to
>>>> error. But that risk should be up to the user.
>>>
>>> Yes. and during the export, I can actually check and throw a warning or an
>>> error if the same custom ID shows up twice.
>>>
>>>>
>>>> I actually changed my mind a little in this concern.
>>>>
>>>> If the user clicks a section link in the toc to jump to a section, he
>>>> can bookmark the page with exactly that jump target. If the jump target
>>>> (the ID) is human readable, the bookmark is more verbose.
>>>
>>> Yes, this is really the best application. Also, when hovering over internal
>>> links, it is helpful if the link displays the human-readable form.
>>>
>>>> Just one wish:
>>>>
>>>> The containers should reflect that change (HRID = human readable id):
>>>>
>>>>
>>>>

headline

>>>>
>>>> sections content... >>>>
>>>>
>>>
>>>
>>> Sure, we can do this. I would then add sec-xxx as one
>>> of the alternative anchors as well.
>>>
>>> However: If I make the structure as you indicate above,
>>> do I understand correctly that the structure of a section without a
>>> human-readable id should be changed to this:
>>>
>>>
>>>

headline

>>>
>>> sections content... >>>
>>>
>>>
>>>
>>> Note the "sec-" which is added to the stuff that currently
>>> defines the structure.
>>
>>
>>
>> I considered the `sec-' part of the automatic IDs.
>>
>> In either case I'd have to adjust org-info.js. So why not go for the
>> human readable IDs without `sec-'?
>>
>>
>> Right now we have:
>>
>>
>>

2 Things I want to find out

>>
>>
>> The `sec-' part is in the headlines ID only.
>
>
> Why? Because this introduced a parsing inconsistency for you between automatic
> and custom IDs. Because for the automatic ones, you need to strip "sec-" to
> retrieve the correct suffix for the container etc names. With the custom IDs,
> no such stripping should be done. Does this not make things harder?
>
> - Carsten


That's the way it is _now_. The structure above is taken from one of my
exported org-files. But it's not that hard to strip `sec-' :)

Right now the scanning treats `sec-' as a prefix - just like
`outline-container-' and `outline-text-'.

But in the future:

If we now plan to use human readable IDs in the TOC, those IDs would be
the IDs of the section headings. That's why those IDs should have no
`sec-' prefix. Otherwise, bookmark URLs would not be what we want:

  http://orgmode.org/org-faq.php#sec-isearch-in-links

instead of

  http://orgmode.org/org-faq.php#isearch-in-links

Automatic IDs, on the other hand, must have a prefix, since an ID may
_not_ start with a number.

So wouldn't it make sense to change the IDs of the containers this way:

Case _automatic_:

....

....
Case _human-readable_:

....

....
??


   Sebastian

>>
>>
>>
>> Sebastian
>>
>>
>>
>>
>>>> That way the script would keep working with older pages.
>>>> Automatic IDs and human readable ones could be mixed.
>>>>
>>>>
>>>> The '' anchors are scanned anyway, as are all jump targets in
>>>> the page.
>>>
>>> Yes, you implemented that some time ago, I remember.
>>>
>>>>
>>>> Maybe this is even the point to re-work the parser of org-info.js to
>>>> become independent of the TOC at all. The script could search for
>>>> headings instead. That's more work, but the script would then work for
>>>> all HTML pages with a structure similar to the org-export's one:
>>>
>>> So this would mean, we could read web pages with your java
>>> support even if those webpages were not created with Org?
>>> Pretty cool.
>>>
>>>>
content
>>>>
>>>> but I could postpone this, if you fulfill my wish above.
>>>
>>>
>>> Best wishes
>>>
>>> - Carsten
>>>
>>>>
>>>>
>>>> Best wishes
>>>>
>>>> Sebastian
>>>>
>>>>
>>>>
>>>>
>>>> Carsten Dominik writes:
>>>>> On Apr 16, 2009, at 10:50 AM, Sebastian Rose wrote:
>>>>>
>>>>>> Carsten Dominik writes:
>>>>>>> Hi Sebastian,
>>>>>>>
>>>>>>> I kind of like the idea to have a property that can be
>>>>>>> used to set an ID, as an alternative to the <>
>>>>>>> notation. Actually, using a property seems a lot cleaner,
>>>>>>> thanks for coming up with this idea, Daniel.
>>>>>>>
>>>>>>> I can also follow the reasoning that it is useful to have
>>>>>>> the table of contents link to the human-readable id, because
>>>>>>> it provides a general, simple workflow to retrieve a link that
>>>>>>> will persist through changes of the document. This workflow
>>>>>>> was described also by Bernt earlier in this thread.
>>>>>>>
>>>>>>> Finally, I also agree that the main id in the

tag
>>>>>>> should be the automatically generated one because this is
>>>>>>> best for automatic processing and because of all the arguments
>>>>>>> you have presented.
>>>>>>>
>>>>>>> Would it cause problems for org-info.js if the toc points to
>>>>>>> a user specified anchor in the headline, instead of the main
>>>>>>> ID that is inside the

tag? This would really be the only
>>>>>>> required change.
>>>>>>
>>>>>>
>>>>>> I'll have to test this before I can give a final answer to this
>>>>>> question.
>>>>>>
>>>>>> But regardless of the results, I will adjust the script to reflect that
>>>>>> change. The script should not rule the HTML export and it will be an
>>>>>> easy thing to do.
>>>>>
>>>>> But I do want to hear any counter arguments you might have....
>>>>>
>>>>> - Carsten
>>>>>
>>>>>>
>>>>>> Sebastian
>>>>>>
>>>>>>
>>>>>>
>>>>>>> - Carsten
>>>>>>>
>>>>>>>
>>>>>>> On Mar 30, 2009, at 1:49 PM, Daniel Clemente wrote:
>>>>>>>
>>>>>>>> On Fri, Mar 27 2009, Sebastian Rose wrote:
>>>>>>>>>
>>>>>>>>> What we have now, just as Carsten said:
>>>>>>>>>
>>>>>>>>> # <>
>>>>>>>>> * Section B
>>>>>>>>>
>>>>>>>>> Creates this headline in HTML:
>>>>>>>>>
>>>>>>>>>

>>>>>>>>> 2 Section B
>>>>>>>>>

>>>>>>>>>
>>>>>>>>> This is enough for all the use cases I can think of.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Yes, this is enough except for two things:
>>>>>>>> 1. The TOC still links to #sec-2 and the user can't change that
>>>>>>>> 2. Your syntax doesn't fold very well in the outliner. I mean: if you use
>>>>>>>>
>>>>>>>>> # <>
>>>>>>>>> * Section B
>>>>>>>>
>>>>>>>> then the comment appears at the end of the previous section, and you can
>>>>>>>> miss it when you are viewing the heading „Section B“. I would swap both
>>>>>>>> lines (solution 1):
>>>>>>>>
>>>>>>>>> * Section B
>>>>>>>>> # <>
>>>>>>>>
>>>>>>>> But since there are already LOGBOOK drawers under the heading, it would
>>>>>>>> be a lot clearer to use a property, like EXPORT_ID (solution 2):
>>>>>>>>
>>>>>>>>> * Section B
>>>>>>>>> :PROPERTIES:
>>>>>>>>> :EXPORT_ID: human-readable
>>>>>>>>> :END:
>>>>>>>>
>>>>>>>>
>>>>>>>> In this way, the TOC can reliably find the EXPORT_ID, and then generate:
>>>>>>>>>

>>>>>>>>> 2 Section B
>>>>>>>>>

>>>>>>>>
>>>>>>>> (You could also leave *just* the human-readable id, but having two is not
>>>>>>>> bad.)
>>>>>>>>
>>>>>>>>
>>>>>>>> I would prefer solution 1, but I don't because I'm not sure that the TOC
>>>>>>>> can find the ID if it is written as a comment anywhere under the heading
>>>>>>>> (and together with other things).
>>>>>>>>
>>>>>>>> Solution 2 involves thus: a new property to specify the human-readable
>>>>>>>> entry ID, which will be used to link to the entry. The automatic ID
>>>>>>>> (#sec-2) will still work for all entries.
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> * Distinguishing automatic and human readable IDs
>>>>>>>>>
>>>>>>>>> One thing I like is, that we now _can_ distinguish the
>>>>>>>>> `human-readable-target' (human readable) from the `sec-2' (not human
>>>>>>>>> readable and not context related) using a regular expression.
>>>>>>>>>
>>>>>>>>> In org-info.js, I can now prefer the human readable ID in from an
>>>>>>>>> automatic created one, and thus use that to create the links for `l'
>>>>>>>>> and `L'. The same holds true for other programming languages and
>>>>>>>>> parsers.
>>>>>>>>>
>>>>>>>>> If we open the

's ID for user defined values (bad), we can not
>>>>>>>>> distinguish those IDs using a regular expression and there is no way
>>>>>>>>> to detect the human readable one. There will be no way to _know_ that
>>>>>>>>> the 's ID is the preferred one used for human readable links.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Solution 2 doesn't break the parsing techniques you use; in fact it can
>>>>>>>> also make clearer which ID is the human readable one and which one not.
>>>>>>>>
>>>>>>>>
>>>>>>>> This is not extremely important; just useful:
>>>>>>>> - for pages with many incoming links from external sites
>>>>>>>> - to ensure link integrity (now you can't assure that links will still
>>>>>>>>   work in 1 year ... or in some weeks)
>>>>>>>> - to avoid that HTML visitors get directed to a wrong section and can't
>>>>>>>>   find what they searched
>>>>>>>>
>>>>>>>>
>>>>>>>> Greetings,
>>>>>>>> Daniel
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Emacs-orgmode mailing list
>>>>>>>> Remember: use `Reply All' to send replies to the list.
>>>>>>>> Emacs-orgmode@gnu.org
>>>>>>>> http://lists.gnu.org/mailman/listinfo/emacs-orgmode
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

-- 
Sebastian Rose, EMMA STIL - mediendesign, Niemeyerstr.6, 30449 Hannover
Tel.:  +49 (0)511 - 36 58 472
Fax:   +49 (0)1805 - 233633 - 11044
mobil: +49 (0)173 - 83 93 417
Email: s.rose@emma-stil.de, sebastian_rose@gmx.de
Http:  www.emma-stil.de
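
P.S. To make the parsing question from the thread concrete, here is a rough sketch of how a script could tell automatic IDs from human readable ones and derive the container IDs from a heading ID. This is not the code of org-info.js or of the exporter. All function names are made up for illustration, the pattern for automatic IDs (`sec-' followed by a section number) is an assumption, and the `outline-container-' and `outline-text-' prefixes are simply the ones named in this message.

```javascript
// Hypothetical helpers for illustration only; none of these names exist in
// org-info.js or in the Org HTML exporter.

// Assumed shape of an automatic heading ID: "sec-" followed by a section
// number such as 2, 1.3 or 1-3.
const AUTO_ID = /^sec-(\d+(?:[.\-_]\d+)*)$/;

// True if the heading ID looks automatically generated.
function isAutomaticId(headingId) {
  return AUTO_ID.test(headingId);
}

// Suffix used to build the related IDs: automatic IDs lose their "sec-"
// prefix, human readable IDs are used unchanged.
function suffixFor(headingId) {
  const match = AUTO_ID.exec(headingId);
  return match ? match[1] : headingId;
}

// IDs of the elements that belong to one section, built with the prefixes
// mentioned in the message.
function relatedIds(headingId) {
  const suffix = suffixFor(headingId);
  return {
    heading: headingId,
    container: "outline-container-" + suffix,
    text: "outline-text-" + suffix,
  };
}

// IDs that occur more than once, each reported once, in the order in which
// the first repetition was seen (the check Carsten mentions for the export).
function duplicateIds(ids) {
  const seen = new Set();
  const duplicates = new Set();
  for (const id of ids) {
    if (seen.has(id)) {
      duplicates.add(id);
    }
    seen.add(id);
  }
  return [...duplicates];
}

// An ID may not start with a number, which is why automatic IDs built from
// section numbers need a prefix at all.
function startsWithDigit(id) {
  return /^\d/.test(id);
}
```

With these assumptions, relatedIds("sec-2") gives the container ID "outline-container-2", while relatedIds("isearch-in-links") keeps the human readable ID untouched in all three names. A custom ID such as "sec-intro" does not match the automatic pattern and is therefore left alone as well.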