emacs-orgmode@gnu.org archives
* Improve percent escaping links in Org mode (pull request / OK to push)
@ 2011-01-02 19:37 David Maus
  2011-02-12 22:17 ` Bastien
From: David Maus @ 2011-01-02 19:37 UTC
  To: org-mode, bastien.guerry


This is a pull request or push announcement for the first set of
patches to improve Org mode's percent escaping functions.  This set of
changes solves the problems with percent escaping non-ASCII
characters.

git@github.com:dmj/dmj-org-mode.git feature/org-percent-escaping

I do have commit access, but because this set of changes might break
things seriously I'd like to get an "OK to push" or have someone pull
and review the changeset first.

The problem:

The current implementation of percent escaping URIs uses a whitelist
approach, i.e. it only percent escapes characters that are in
`org-link-escape-chars' or in a user supplied list.  This is a problem
because URIs are limited to plain ASCII: to get a properly percent
escaped string, a caller must know and list literally every character
that could occur in a URI, together with its escape sequence.

The changes:

- `org-link-escape' percent escapes every character that matches one
  of the following conditions:

  * equal 37 (percent sign)
  * equal 127 (DEL, control character)
  * below 32 (control character)
  * above 127 (non-ASCII character)
  * a character in the escaping table (e.g. `org-link-escape-chars')

  The character in question is first encoded in UTF-8, then each byte
  of the resulting byte sequence is percent escaped.  If converting to
  UTF-8 fails, Org throws an error indicating this problem.

  The function got an optional third argument which can be set to
  merge the user defined table with the default escaping table.

- `org-link-unescape' unescapes every percent-escape sequence.  It is
  no longer possible to supply a list of characters that should be
  unescaped.  No function in core used `org-link-unescape' with an
  unescaping table.

  Internally the `org-protocol-unhex-*' functions were renamed to
  `org-link-unescape-*', moved to org.el and refactored (thanks to
  Vincent Belaïche for suggesting some of the changes).  The old names
  are declared obsolete as of 2010-11-21 and aliased to the new ones.

  The unescaping function is backward compatible and unescapes the old
  percent escape format for non-ASCII characters (thanks to Sebastian
  Rose).

  It is possible that the new implementation will break links in at
  least this (known) case: the user stored a link to a file or
  directory whose name contains a percent sign.  Currently Org mode
  does not percent escape the percent sign, so the new variant of
  `org-link-unescape' will try to unescape the alleged percent escape
  sequence.[1]

- The format of `org-link-escape-chars' changed.  It is now just a
  list of characters to escape; the percent escape sequence is implied
  by the character.

  Functions in core that used a custom escaping table are changed
  accordingly to use the new table format.
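
The escaping rule above can be sketched in a few lines.  This is only
an illustration in Python, not the Emacs Lisp code from the patches.
The names `link_escape' and `DEFAULT_ESCAPE_CHARS', the contents of
the default table and the upper-case hex digits are assumptions made
for the example.

```python
# Illustrative sketch only; not the Emacs Lisp code from the patch set.

# Hypothetical default table in the new format: a plain list of
# characters, with the escape sequence implied by each character.
DEFAULT_ESCAPE_CHARS = [" ", "[", "]"]


def link_escape(text, table=None, merge=False):
    """Percent escape TEXT following the conditions listed above."""
    if table is None:
        chars = set(DEFAULT_ESCAPE_CHARS)
    elif merge:
        # Third argument set: user table is merged with the default one.
        chars = set(DEFAULT_ESCAPE_CHARS) | set(table)
    else:
        # Otherwise the user table is used on its own.
        chars = set(table)

    out = []
    for ch in text:
        code = ord(ch)
        if code == 37 or code == 127 or code < 32 or code > 127 or ch in chars:
            # Encode in UTF-8 first, then escape every resulting byte.
            # encode() raises UnicodeEncodeError when the character has
            # no UTF-8 encoding, which stands in for "throws an error".
            out.append("".join("%%%02X" % b for b in ch.encode("utf-8")))
        else:
            out.append(ch)
    return "".join(out)
```

With this sketch, link_escape("Belaïche") gives "Bela%C3%AFche", and a
percent sign is escaped even if the table passed in does not list it.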

What is next:

  - check if we can fall back to use `url-hexify-string' and
    `url-unhex-string' instead of our own functions
  - check if the recent problems with percent escaping are solved

Best,
  -- David

[1] Not escaping the percent sign is actually a glitch: Try to store
and open a link to a file literally called "foo%20baz.org".
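
The glitch and the backward compatible unescaping can be illustrated
with a small sketch.  Again this is Python, not the patched Emacs Lisp
code.  The name `link_unescape' is made up for the example, and
treating the old format as one Latin-1 byte per character is an
assumption.

```python
# Illustrative sketch only; not the Emacs Lisp code from the patch set.
import re

# A run of one or more consecutive percent escape sequences.
_ESCAPE_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def _decode_run(match):
    """Turn one run of escape sequences back into characters."""
    raw = bytes.fromhex(match.group(0).replace("%", ""))
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumed fallback for the old format: decode byte by byte.
        return raw.decode("latin-1")


def link_unescape(text):
    """Unescape every percent escape sequence in TEXT."""
    return _ESCAPE_RUN.sub(_decode_run, text)
```

Here link_unescape("foo%20baz.org") returns "foo baz.org", which is
the wrong file name if the link was stored without escaping the
percent sign.  Once the percent sign is always escaped, the stored
link is "foo%2520baz.org" and unescaping yields "foo%20baz.org" again.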


Thread overview: 22+ messages
2011-01-02 19:37 Improve percent escaping links in Org mode (pull request / OK to push) David Maus
2011-02-12 22:17 ` Bastien
2011-02-13 12:01   ` David Maus
2011-02-13 13:41     ` Bastien
2011-02-14  6:38       ` David Maus
2011-02-14 10:09         ` Bastien
2011-02-13 12:01   ` [PATCH 01/16] Decode single byte sequence if decoding unicode failed David Maus
2011-02-13 12:01   ` [PATCH 02/16] New unicode aware percent encoding algorithm David Maus
2011-02-13 12:01   ` [PATCH 03/16] New format of percent escape table David Maus
2011-02-13 12:01   ` [PATCH 04/16] Fixup doc string David Maus
2011-02-13 12:01   ` [PATCH 05/16] New optional argument: Merge user table with default table David Maus
2011-02-13 12:01   ` [PATCH 06/16] Inline function to properly decode utf8 characters in Emacs 22 David Maus
2011-02-13 12:01   ` [PATCH 07/16] Unescape functions moved and renamed from org-protocol.el David Maus
2011-02-13 12:01   ` [PATCH 08/16] Declare obsolete & alias to respective org-link-unescape-* functions David Maus
2011-02-13 12:01   ` [PATCH 09/16] Remove obsolete argument in call to org-link-unescape David Maus
2011-02-13 12:01   ` [PATCH 10/16] Use new percent escape character table format David Maus
2011-02-13 12:01   ` [PATCH 11/16] Add percent sign to list of escape chars David Maus
2011-02-13 12:01   ` [PATCH 12/16] Rename lambda argument David Maus
2011-02-13 12:01   ` [PATCH 13/16] Refactor unescaping functions David Maus
2011-02-13 12:01   ` [PATCH 14/16] Always percent escape the percent sign David Maus
2011-02-13 12:01   ` [PATCH 15/16] Use `org-link-unescape' instead of obsolete unhex string function David Maus
2011-02-13 12:01   ` [PATCH 16/16] Throw error if encoding character in utf8 fails David Maus
