From: Carsten Dominik
Subject: Re: Re: Custom entry IDs in HTML export
Date: Fri, 17 Apr 2009 06:11:49 +0200
To: Sebastian Rose
Cc: org-mode mailing list, Bernt Hansen

On Apr 17, 2009, at 12:37 AM, Sebastian Rose wrote:

> Carsten Dominik writes:
>> On Apr 16, 2009, at 10:50 PM, Sebastian Rose wrote:
>>
>>> Carsten Dominik writes:
>>>> Hi Sebastian,
>>>>
>>>> On Apr 16, 2009, at 3:14 PM, Sebastian Rose wrote:
>>>>
>>>>> Hm - counter arguments?
>>>>>
>>>>> The only counter argument is, that hand made IDs for links are prone to
>>>>> error. But that risk should be up to the user.
>>>>
>>>> Yes. and during the export, I can actually check and throw a warning or an
>>>> error if the same custom ID shows up twice.
>>>>
>>>>>
>>>>> I actually changed my mind a little in this concern.
>>>>>
>>>>> If the user clicks a section link in the toc to jump to a section, he
>>>>> can bookmark the page with exactly that jump target. If the jump target
>>>>> (the ID) is human readable, the bookmark is more verbose.
>>>>
>>>> Yes, this is really the best application. Also, when hovering over internal
>>>> links, it is helpful if the link displays the human-readable form.
>>>>
>>>>> Just one wish:
>>>>>
>>>>> The containers should reflect that change (HRID = human readable id):
>>>>>
>>>>>
>>>>>

headline

>>>>>
>>>>> sections content... >>>>>
>>>>>
>>>>
>>>>
>>>> Sure, we can do this. I would then add sec-xxx as one
>>>> of the alternative anchors as well.
>>>>
>>>> However: If I make the structure as you indicate above,
>>>> do I understand correctly that the structure of a section without a
>>>> human-readable id should be changed to this:
>>>>
>>>>
>>>>

headline

>>>>
>>>> sections content... >>>>
>>>>
>>>>
>>>>
>>>> Note the "sec-" which is added to the stuff that currently
>>>> defines the structure.
>>>
>>>
>>>
>>> I considered the `sec-' part of the automatic IDs.
>>>
>>> In either case I'd have to adjust org-info.js. So why not go for the
>>> human readable IDs without `sec-'?
>>>
>>>
>>> Right now we have:
>>>
>>>
>>>

2 Things I want to find out

>>>
>>>
>>> The `sec-' part is in the headlines ID only.
>>
>>
>> Why? Because this introduced a parsing inconsistency for you between automatic
>> and custom IDs. Because for the automatic ones, you need to strip "sec-" to
>> retrieve the correct suffix for the container etc names. With the custom IDs,
>> no such stripping should be done. Does this not make things harder?
>>
>> - Carsten
>
>
> That's the way it is _now_. The structure above is taken from one of my
> exported org-files. But it's not that hard to strip `sec-' :)
>
> Now the scanning considers `sec-' a prefix - just like
> `outline-container-' and `outline-text-'.
>
>
> But in the future:
>
>
> If we now plan to use human readable IDs in the TOC, those IDs would be
> the IDs of the section heading. That's why those IDs should have no
> `sec-' prefix.
>
> Otherwise, bookmark URLs would not be what we want them:
>
>   http://orgmode.org/org-faq.php#sec-isearch-in-links
>
> instead of
>
>   http://orgmode.org/org-faq.php#isearch-in-links
>
>
>
> Automatic IDs on the other hand must have a prefix, since an ID may
> _not_ start with a number.
>
>
> So wouldn't it make sense, to change the IDs of the containers this way:
>
> Case _automatic_:
>
>
>

....

>
> .... >
>
>
> Case _human-readable_:
>
>
>

....

>
> .... >
>
Yes, it does make sense. It only introduces one tiny restriction: a
human-readable ID may not be something like sec-555, but that is
reasonable, and we can document and enforce this.

OK. This is what I have done now. You need to use the property
CUSTOM_ID. Please do some testing, and then I will document this change.

Daniel, could you help with testing, please?

- Carsten

>
> ??
>
>
>   Sebastian
>
>
>
>>>
>>>
>>>
>>> Sebastian
>>>
>>>
>>>
>>>
>>>>> That way the script would keep working with older pages.
>>>>> Automatic IDs and human readable ones could be mixed.
>>>>>
>>>>>
>>>>> The '' anchors are scanned anyway, as are all jump targets in
>>>>> the page.
>>>>
>>>> Yes, you implemented that some time ago, I remember.
>>>>
>>>>>
>>>>> Maybe this is even the point to re-work the parser of org-info.js to
>>>>> become independent of the TOC at all. The script could search for
>>>>> headings instead. That's more work, but the script would then work for
>>>>> all HTML pages with a structure similar to the org-export's one:
>>>>
>>>> So this would mean, we could read web pages with your java
>>>> support even if those webpages were not created with Org?
>>>> Pretty cool.
>>>>
>>>>>
content
>>>>>
>>>>> but I could postpone this, if you fulfill my wish above.
>>>>
>>>>
>>>> Best wishes
>>>>
>>>> - Carsten
>>>>
>>>>>
>>>>>
>>>>> Best wishes
>>>>>
>>>>> Sebastian
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Carsten Dominik writes:
>>>>>> On Apr 16, 2009, at 10:50 AM, Sebastian Rose wrote:
>>>>>>
>>>>>>> Carsten Dominik writes:
>>>>>>>> Hi Sebastian,
>>>>>>>>
>>>>>>>> I kind of like the idea to have a property that can be
>>>>>>>> used to set an ID, as an alternative to the <>
>>>>>>>> notation. Actually, using a property seems a lot cleaner,
>>>>>>>> thanks for coming up with this idea, Daniel.
>>>>>>>>
>>>>>>>> I can also follow the reasoning that it is useful to have
>>>>>>>> the table of contents link to the human-readable id, because
>>>>>>>> it provides a general, simple workflow to retrieve a link that
>>>>>>>> will persist through changes of the document. This workflow
>>>>>>>> was described also by Bernt earlier in this thread.
>>>>>>>>
>>>>>>>> Finally, I also agree that the main id in the

tag
>>>>>>>> should be the automatically generated one because this is
>>>>>>>> best for automatic processing and because of all the arguments
>>>>>>>> you have presented.
>>>>>>>>
>>>>>>>> Would it cause problems for org-info.js if the toc points to
>>>>>>>> a user specified anchor in the headline, instead of the main
>>>>>>>> ID that is inside the

tag? This would really be the only
>>>>>>>> required change.
>>>>>>>
>>>>>>>
>>>>>>> I'll have to test this before I can give a final answer to this
>>>>>>> question.
>>>>>>>
>>>>>>> But regardless of the results, I will adjust the script to reflect that
>>>>>>> change. The script should not rule the HTML export and it will be an
>>>>>>> easy thing to do.
>>>>>>
>>>>>> But I do want to hear any counter arguments you might have....
>>>>>>
>>>>>> - Carsten
>>>>>>
>>>>>>>
>>>>>>> Sebastian
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> - Carsten
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mar 30, 2009, at 1:49 PM, Daniel Clemente wrote:
>>>>>>>>
>>>>>>>>> On Fri, Mar 27 2009, Sebastian Rose wrote:
>>>>>>>>>>
>>>>>>>>>> What we have now, just as Carsten said:
>>>>>>>>>>
>>>>>>>>>> # <>
>>>>>>>>>> * Section B
>>>>>>>>>>
>>>>>>>>>> Creates this headline in HTML:
>>>>>>>>>>
>>>>>>>>>>

>>>>>>>>> readable">>>>>>>>>> a>2 Section B >>>>>>>>>>

>>>>>>>>>>
>>>>>>>>>> This is enough for all the use cases I can think of.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Yes, this is enough except for two things:
>>>>>>>>> 1. The TOC still links to #sec-2 and the user can't change that
>>>>>>>>> 2. Your syntax doesn't fold very well in the outliner. I mean: if you use
>>>>>>>>>
>>>>>>>>>> # <>
>>>>>>>>>> * Section B
>>>>>>>>>
>>>>>>>>> then the comment appears at the end of the previous section, and you can miss
>>>>>>>>> it when you are viewing the heading „Section B“. I would swap both lines
>>>>>>>>> (solution 1):
>>>>>>>>>
>>>>>>>>>> * Section B
>>>>>>>>>> # <>
>>>>>>>>>
>>>>>>>>> But since there are already LOGBOOK drawers under the heading, it would be a
>>>>>>>>> lot clearer to use a property, like EXPORT_ID (solution 2):
>>>>>>>>>
>>>>>>>>>> * Section B
>>>>>>>>>> :PROPERTIES:
>>>>>>>>>> :EXPORT_ID: human-readable
>>>>>>>>>> :END:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> In this way, the TOC can reliably find the EXPORT_ID, and then generate:
>>>>>>>>>>

>>>>>>>>> readable">>>>>>>>>> a>2 Section B >>>>>>>>>>

>>>>>>>>>
>>>>>>>>> (You could also leave *just* the human-readable id, but having two is not
>>>>>>>>> bad.)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I would prefer solution 1, but I don't because I'm not sure that the TOC can
>>>>>>>>> find the ID if it is written as a comment anywhere under the heading (and
>>>>>>>>> together with other things).
>>>>>>>>>
>>>>>>>>> Solution 2 involves thus: a new property to specify the human-readable
>>>>>>>>> entry ID, which will be used to link to the entry. The automatic ID
>>>>>>>>> (#sec-2) will still work for all entries.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> * Distinguishing automatic and human readable IDs
>>>>>>>>>>
>>>>>>>>>> One thing I like is, that we now _can_ distinguish the
>>>>>>>>>> `human-readable-target' (human readable) from the `sec-2' (not human
>>>>>>>>>> readable and not context related) using a regular expression.
>>>>>>>>>>
>>>>>>>>>> In org-info.js, I can now prefer the human readable ID over an
>>>>>>>>>> automatically created one, and thus use that to create the links for `l'
>>>>>>>>>> and `L'. The same holds true for other programming languages and
>>>>>>>>>> parsers.
>>>>>>>>>>
>>>>>>>>>> If we open the

's ID for user defined values (bad), we can not
>>>>>>>>>> distinguish those ID's using a regular expression and there is no way
>>>>>>>>>> to detect the human readable one. There will be no way to _know_ that
>>>>>>>>>> the 's ID is the preferred one used for human readable links.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Solution 2 doesn't break the parsing techniques you use; in fact it can also
>>>>>>>>> make clearer which ID is the human readable one and which one not.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This is not extremely important; just useful:
>>>>>>>>> - for pages with many incoming links from external sites
>>>>>>>>> - to ensure link integrity (now you can't assure that links will still work in
>>>>>>>>>   1 year ... or in some weeks)
>>>>>>>>> - to avoid that HTML visitors get directed to a wrong section and can't find
>>>>>>>>>   what they searched
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Greetings,
>>>>>>>>> Daniel
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Sebastian Rose, EMMA STIL - mediendesign, Niemeyerstr.6, 30449 Hannover
>>>>>>> Tel.: +49 (0)511 - 36 58 472
>>>>>>> Fax: +49 (0)1805 - 233633 - 11044
>>>>>>> mobil: +49 (0)173 - 83 93 417
>>>>>>> Email: s.rose@emma-stil.de, sebastian_rose@gmx.de
>>>>>>> Http: www.emma-stil.de
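
A footnote on the restrictions discussed in this thread: a custom ID should
be unique, should not look like an automatic one such as sec-555, and may
not start with a number. The following is only a minimal Python sketch of
such a check, not the code in the Org exporter. The function names are
hypothetical, and the pattern used for automatic IDs ("sec-" followed by
numbers, optionally separated by "." or "_") is an assumption, since the
thread only shows examples like sec-2 and sec-555.

```python
import re
from collections import Counter

# Assumed shape of automatic IDs: "sec-2", "sec-2.1", "sec-2_1".
# The thread only shows "sec-2" and "sec-555"; the rest is a guess.
AUTOMATIC_ID = re.compile(r"sec-\d+(?:[._]\d+)*")


def is_automatic(section_id):
    """Return True if section_id matches the assumed automatic pattern."""
    return AUTOMATIC_ID.fullmatch(section_id) is not None


def check_custom_ids(custom_ids):
    """Return a list of problems found in the given custom IDs.

    Each distinct ID is reported at most once per kind of problem,
    in order of first appearance.
    """
    problems = []
    for cid, count in Counter(custom_ids).items():
        if count > 1:
            problems.append(f"duplicate custom ID: {cid}")
        if is_automatic(cid):
            problems.append(f"custom ID looks like an automatic one: {cid}")
        if cid[:1].isdigit():
            problems.append(f"custom ID starts with a number: {cid}")
    return problems
```

With this sketch, check_custom_ids(["isearch-in-links"]) returns an empty
list, while an ID such as sec-555 or a repeated ID is reported. A parser
like org-info.js could use a test similar to is_automatic() to decide
whether a "sec-" prefix needs to be stripped.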