From: Sebastian Rose
Subject: Re: [bug] org-link-escape and (wrong-type-argument stringp nil)
Date: Mon, 20 Sep 2010 21:31:18 +0200
To: David Maus
Cc: Sébastien Vauban, emacs-orgmode@gnu.org

David Maus writes:
> Sébastien Vauban wrote:
>>Hello,
>
>>With current git pull, and such an Org file (in UTF-8 encoding):
>
>> ...
>
>>I get the following error when trying to export it via PDFLaTeX:
>
> The problem is that the 'É' character is not in Org's default list
> for link escapes but `string-match' matches for the lower case
> character. Adding more chars to `org-link-escape-chars' would solve
> the problem, but this seems to be a broader issue:
>
> Regular links (URIs) are restricted to a special set of ASCII
> characters and non-ASCII chars are hex-encoded.
> Currently Org escapes links to Org mode headlines using the table
> mentioned above. But Org files and hence Org headlines might be
> Unicode, containing multibyte characters that cannot be hex-escaped
> in the normal fashion.
>
> Maybe something like this would be a solution:
>
>  - Org only escapes square brackets when escaping a link to an Org
>    mode headline
>  - `org-link-escape' uses a shotgun-approach: Every char that is not
>    allowed according to the specs (Cf. RFC3986) is percent encoded if
>    the link sequence does not contain multibyte chars; If the sequence
>    does contain multibyte chars, `org-link-escape' produces an IRI
>    (Cf. RFC3987).

Is there a reason for this distinction between multibyte and unibyte?
I favour the "shotgun-approach" if not. It's bullet-proof.

The JavaScript function `encodeURIComponent()' encodes the German
umlaut `ü' as `%C3%BC' regardless of the source's encoding. That's why
I wrote the two functions `org-protocol-unhex-string' and
`org-protocol-unhex-compound' (see org-protocol.el).

I'll have to take a look at that RFC you mentioned :)

Best wishes

  Sebastian
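
For illustration only, the "shotgun approach" can be sketched in a few lines. The sketch is in Python rather than Emacs Lisp, it is not Org's or org-protocol.el's code, and the helper names `escape_link' and `unescape_link' are made up for this example. It assumes the link text is encoded as UTF-8 before escaping, which is also what `encodeURIComponent()' does, so multibyte and unibyte input take the same path.

```python
from urllib.parse import quote, unquote


def escape_link(text: str) -> str:
    """Percent-encode every character outside RFC 3986's unreserved set.

    Hypothetical helper. The text is encoded as UTF-8 first, so a
    multibyte character simply becomes several %XX escapes.
    """
    # safe="" means that "/" is escaped too, like everything else
    # that is not a letter, digit, "-", ".", "_" or "~".
    return quote(text, safe="", encoding="utf-8")


def unescape_link(text: str) -> str:
    """Reverse escape_link: decode %XX escapes and read the bytes as UTF-8."""
    return unquote(text, encoding="utf-8")


if __name__ == "__main__":
    headline = "Übung [1]"
    escaped = escape_link(headline)
    print(escaped)
    # The round trip gives back the original headline.
    print(unescape_link(escaped) == headline)
```

With this scheme `ü' (U+00FC) comes out as its two UTF-8 bytes, C3 BC, and square brackets are escaped along with everything else, so no separate table of characters to escape is needed.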