org-protocol is really a great extension for org-mode! However, I
experience an annoyance regarding non-ascii character encoding.

When using org-protocol with remember and firefox, all non-ascii
characters get b0rked in the remember buffer (on linux, with emacs 23
and org-mode as of today and latest ff 3.0). It doesn't matter if the
source page uses html entities or literal non-ascii-characters. Does
anyone share this experience and has a suggestion on where to look for
the cause?

Ulf
Ulf Stegemann <ulf-news@zeitform.de> writes:
> org-protocol is really a great extension for org-mode! However, I
> experience an annoyance regarding non-ascii character encoding.
>
> When using org-protocol with remember and firefox, all non-ascii
> characters get b0rked in the remember buffer (on linux, with emacs 23
> and org-mode as of today and latest ff 3.0). It doesn't matter if the
> source page uses html entities or literal non-ascii-characters. Does
> anyone share this experience and has a suggestion on where to look for
> the cause?
Yes. Same here.
This seems to be an emacs/remember problem though.
If I open a file `xy.txt' and select this text:
lkäüüäüpüpjüpjsf
and then 'C-x r' to remember it, I get this in my remember buffer:
[[file:~/xy.txt::lk%20p%20pj%20pjsf][file:~/xy.txt::lk p pj pjsf]]
Not sure how to work around this yet. Seems to be encoding-related...
Maybe I'll find some time to look into this later today.
Sebastian
Sebastian Rose <sebastian_rose@gmx.de> writes:
> This seems to be an emacs/remember problem though.
>
>
> If I open a file `xy.txt' and select this text:
>
> lkäüüäüpüpjüpjsf
>
> and then 'C-x r' to remember it, I get this in my remember buffer:
>
> [[file:~/xy.txt::lk%20p%20pj%20pjsf][file:~/xy.txt::lk p pj pjsf]]
>
While this is true, incoming text looks corrupted in org-protocol.el too.
On Apr 6, 2009, at 1:53 PM, Sebastian Rose wrote:
> Ulf Stegemann <ulf-news@zeitform.de> writes:
>> org-protocol is really a great extension for org-mode! However, I
>> experience an annoyance regarding non-ascii character encoding.
>>
>> When using org-protocol with remember and firefox, all non-ascii
>> characters get b0rked in the remember buffer (on linux, with emacs 23
>> and org-mode as of today and latest ff 3.0). It doesn't matter if the
>> source page uses html entities or literal non-ascii-characters. Does
>> anyone share this experience and has a suggestion on where to look for
>> the cause?
>
>
> Yes. Same here.
>
> This seems to be an emacs/remember problem though.
>
>
> If I open a file `xy.txt' and select this text:
>
> lkäüüäüpüpjüpjsf
>
> and then 'C-x r' to remember it, I get this in my remember buffer:
>
> [[file:~/xy.txt::lk%20p%20pj%20pjsf][file:~/xy.txt::lk p pj pjsf]]
>
>
This problem might be partially resolved by pulling from git and then
setting
(setq org-url-encoding-use-url-hexify t)
This is for testing only right now.
- Carsten
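
For readers following along: the variable name suggests that, when it is
non-nil, link escaping is handed to `url-hexify-string' from url-util.el.
That reading is an assumption based on the name alone; the thread does
not show the implementation. A minimal sketch of what that function does
to text like the sample above (the helper `my/hexify-demo' is
hypothetical and not part of Org):

```emacs-lisp
(require 'url-util)

;; Hypothetical helper, not part of Org: percent-encode TEXT with
;; `url-hexify-string'.  Multibyte text is converted to UTF-8 bytes
;; first, so each non-ASCII character becomes one or more %XX escapes
;; instead of silently disappearing from the link.
(defun my/hexify-demo (text)
  "Return TEXT percent-encoded by `url-hexify-string'."
  (url-hexify-string text))

;; Evaluate this to inspect the escaped form of the umlauts and the space.
(my/hexify-demo "lkäü pj")
```

Compare the result with the link in the remember buffer quoted above,
where only the spaces survived as %20 and the umlauts were lost.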
Hi Ulf and Carsten,

here appended is a patch that does two things.


1. Decode hex-encoded unicode
=============================

The new function `org-protocol-unhex-string' correctly decodes
hex-encoded unicode as produced by the JavaScript function
`encodeURIComponent'.

I tested with several unicode and german websites.

This is text fetched per org-protocol.el after patching:

From the mew homepage (http://www.mew.org/index.html.ja):

=> --->8----------------------------->8----------------------------->8---
Quelle: [2009-04-16 Do], [[http://www.mew.org/index.html.ja][Mew のオフィシャルページ]]

Mewに関する質問はMew-distメーリングリストへ送ってください。
作者個人宛に送っても,返事は戻ってこないかもしれません。
このページへのリンク、書籍・雑誌等での紹介は、
公序良俗に反しない範囲で自由にどうぞ。

<= ---8<-----------------------------8<-----------------------------8<---


2. Allow a function as second argument to org-protocol-split-data
=================================================================

The default decoding function is now `org-protocol-unhex-string', if the
second parameter to `org-protocol-split-data' is non-nil. If that
parameter is a function, that function is used to decode the split
parts.


The patch still contains some lines with debugging code that
may be uncommented to see what's going on.

[-- Attachment #2: patch-org-protocol.el --]
[-- Type: application/emacs-lisp, Size: 3467 bytes --]

Best

   Sebastian
Sorry for replying to my own mail.

Reading `man utf-8' might help to understand what the patch does. And,
utf-8 is what is decoded here.

   Sebastian
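
A rough sketch of the decoding idea, for illustration only. This is not
the attached patch: `my/unhex-utf8' and `my/split-data' are hypothetical
names, and the sketch leans on the existing `url-unhex-string' and
`decode-coding-string' instead of a hand-written decoder. The point is
that the %XX escapes are raw bytes, and those bytes have to be read as
UTF-8 to get the original characters back. The second function mirrors
the described behaviour of the optional decoder argument.

```emacs-lisp
(require 'url-util)

;; Hypothetical helper: turn %XX escapes back into bytes, then
;; interpret the resulting byte string as UTF-8.
(defun my/unhex-utf8 (str)
  "Decode percent-escapes in STR and read the bytes as UTF-8."
  (decode-coding-string (url-unhex-string str) 'utf-8))

;; Hypothetical splitter: split DATA on SEPARATOR (default \"/\").
;; If UNHEXIFY is a function, it decodes each part; if it is any
;; other non-nil value, `my/unhex-utf8' is used; if nil, the parts
;; are returned undecoded.
(defun my/split-data (data &optional unhexify separator)
  "Split DATA on SEPARATOR and optionally decode each part."
  (let ((decode (cond ((functionp unhexify) unhexify)
                      (unhexify #'my/unhex-utf8)
                      (t #'identity))))
    (mapcar decode (split-string data (or separator "/")))))

;; Example calls: default decoder, then a caller-supplied one.
(my/split-data "http%3A%2F%2Fexample.org/T%C3%A4st" t)
(my/split-data "a%20b/c" #'upcase)
```

Splitting happens before decoding, so an encoded slash (%2F) inside a
part does not get mistaken for a separator.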
Hi Sebastian,
this looks like a good solution!
Send me a final patch when you are convinced yourself.
Ulf, can you do some testing, please?
Thanks.
- Carsten
On Apr 16, 2009, at 12:22 AM, Sebastian Rose wrote:
> Hi Ulf and Carsten,
>
>
> here appended is a patch, that does two things.
>
>
> 1. Decode hex-encoded unicode
> =============================
>
> The new function `org-protocol-unhex-string' correctly decodes
> hex-encoded unicode as produced by the JavaScript function
> `encodeURIComponent'.
>
> I tested with several unicode and german websites.
>
> This is text fetched per org-protocol.el after patching:
>
> From the mew homepage (http://www.mew.org/index.html.ja):
>
> => --->8----------------------------->8----------------------------->8---
> Quelle: [2009-04-16 Do], [[http://www.mew.org/index.html.ja][Mew のオフィシャルページ]]
>
> Mewに関する質問はMew-distメーリングリストへ送ってください。
> 作者個人宛に送っても,返事は戻ってこないかもしれません。
> このページへのリンク、書籍・雑誌等での紹介は、
> 公序良俗に反しない範囲で自由にどうぞ。
>
> <=
> ---8<-----------------------------8<-----------------------------8<---
>
>
>
>
> 2. Allow a function as second argument to org-protocol-split-data
> =================================================================
>
> The default decoding function is now `org-protocol-unhex-string', if the
> second parameter to `org-protocol-split-data' is non-nil. If that
> parameter is a function, that function is used to decode the split
> parts.
>
>
>
>
>
> The patch still contains some lines with debugging code that
> may be uncommented to see what's going on.
>
>
> <patch-org-protocol.el>
>
>
> Best
>
> Sebastian
>
>
>
Sebastian, Carsten,
Carsten Dominik <carsten.dominik@gmail.com> wrote:
> Ulf, can you do some testing, please?
done that using Emacs 23 (cvs today), Org (git today + patch), Firefox
3.0.8 on linux. Everything seems to work fine, I haven't found a single
page/text that hasn't been encoded correctly. So I assume that the patch
is working ... at least in the environment mentioned. Thanks for your
effort and the good work :)
Ulf