This is a pull request or push announcement for the first set of
patches to improve Org mode's percent escaping functions.  This set of
changes solves the problems with percent escaping non-ASCII
characters.

  git@github.com:dmj/dmj-org-mode.git feature/org-percent-escaping

I do have commit access, but because this set of changes might break
things seriously I'd like to get an "OK to push", or someone who pulls
and reviews the changeset.

The problem:

The current implementation of percent escaping URIs uses a whitelist
approach, i.e. it only percent escapes characters that are in
`org-link-escape-chars' or in a user supplied list.  This is a problem
because using this function requires knowledge about all possible
characters that could occur in a URI -- and since URIs are limited to
plain ASCII, a call to the function must list literally all possible
characters and their escapings to get a properly percent escaped
string.

The changes:

 - `org-link-escape' percent escapes every character that matches one
   of the following conditions:

   * equal 37 (percent sign)
   * equal 127 (DEL, control character)
   * below 32 (control character)
   * above 127 (non-ASCII character)
   * a character in the escaping table (e.g. `org-link-escape-chars')

   The character in question is first encoded in UTF-8, then all bytes
   of the resulting byte sequence are percent escaped.  If converting
   to UTF-8 fails, Org throws an error indicating this problem.

   The function gained an optional third argument which, when set,
   merges the user defined table with the default escaping table.

 - `org-link-unescape' unescapes every percent-escape sequence.  It is
   no longer possible to supply a list of characters that should be
   unescaped.  No function in core used `org-link-unescape' with an
   unescaping table.

   Internally the `org-protocol-unhex-*' functions were renamed to
   `org-link-unescape-*', moved to org.el and refactored (thanks to
   Vincent Belaïche for suggesting some of the changes).  The old
   names are declared obsolete and aliased as of 2010-11-21.
   The unescaping function is backward compatible and unescapes the
   old percent escape format for non-ASCII characters (thanks to
   Sebastian Rose).

   It is possible that the new implementation will break links in at
   least this (known) case: the user stored a link to a file or
   directory whose name contains a percent sign.  Currently Org mode
   does not percent escape the percent sign, so the new variant of
   `org-link-unescape' will try to unescape the alleged percent escape
   sequence.[1]

 - The format of `org-link-escape-chars' changed.  It is now just a
   list of characters to escape; the percent escape sequence is
   implied by the character.  Functions in core that used a custom
   escaping table are changed accordingly to use the new table format.

What is next:

 - check if we can fall back to use `url-hexify-string' and
   `url-unhex-string' instead of our own functions

 - check if the recent problems with percent escaping are solved

Best,
  -- David

[1] Not escaping the percent sign is actually a glitch: Try to store
    and open a link to a file literally called "foo%20baz.org".
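
For readers who want to check the escaping rule from "The changes"
outside of Emacs, here is a minimal sketch in Python.  It is only an
illustration and not the code in the branch: the names
`percent_escape' and `percent_unescape' and the one-character default
table are made up for this example, the real default table is
`org-link-escape-chars', and the backward compatible handling of the
old escape format is left out.

```python
from urllib.parse import unquote


def percent_escape(text, table=" "):
    """Percent escape TEXT following the rule from the announcement.

    TABLE is a hypothetical stand-in for `org-link-escape-chars'; the
    default here (a single space) is an assumption for illustration.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if code == 37 or code == 127 or code < 32 or code > 127 or ch in table:
            # Encode to UTF-8 first, then escape every resulting byte.
            # str.encode raises UnicodeEncodeError if the conversion fails.
            out.extend("%{:02X}".format(byte) for byte in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)


def percent_unescape(text):
    """Unescape every percent-escape sequence, decoding the bytes as UTF-8."""
    return unquote(text, encoding="utf-8", errors="strict")


if __name__ == "__main__":
    print(percent_escape("Belaïche"))
    print(percent_escape("foo%20baz.org"))
    print(percent_unescape("foo%20baz.org"))
```

The last call shows the glitch from footnote [1]: a file name that was
stored without escaping its percent sign comes back from the
unescaping step as "foo baz.org", while a name escaped under the new
rule ("foo%2520baz.org") round-trips unchanged.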