emacs-orgmode@gnu.org archives
* Unicode problem with export of literal contents
@ 2023-02-16 10:46 Jens Lechtenboerger
  2023-02-17  5:51 ` Bruno Barbier
  2023-02-17 11:02 ` Ihor Radchenko
  0 siblings, 2 replies; 14+ messages in thread
From: Jens Lechtenboerger @ 2023-02-16 10:46 UTC
  To: emacs-orgmode

Hi there,

consider this piece of code, where unicode-file.org contains umlauts
(say, just the word “Lechtenbörger”):

(org-export-string-as
 (with-temp-buffer
   (insert-file-contents-literally "unicode-file.org")
   (buffer-string))
 'html t)

With Org 9.6.1 from Emacs master, I get the following warning, and I
am asked to select a coding system:

> These default coding systems were tried to encode the following
> problematic characters in the buffer ‘ *temp*’:
> ...

With previous Org versions, this did not happen, export would just
work.  Note that I insert contents literally because I do not want
‘find-file-hook’, automatic uncompression, etc. (which are avoided
according to the doc string of insert-file-contents-literally).

Could the old behavior be restored?

Best wishes
Jens



* Re: Unicode problem with export of literal contents
  2023-02-16 10:46 Unicode problem with export of literal contents Jens Lechtenboerger
@ 2023-02-17  5:51 ` Bruno Barbier
  2023-02-17  7:48   ` Jens Lechtenboerger
  2023-02-17 11:02 ` Ihor Radchenko
  1 sibling, 1 reply; 14+ messages in thread
From: Bruno Barbier @ 2023-02-17  5:51 UTC
  To: Jens Lechtenboerger, emacs-orgmode


Hi Jens,

Jens Lechtenboerger <lechten@wi.uni-muenster.de> writes:

> ...
> Note that I insert contents literally because I do not want
> ‘find-file-hook’, automatic uncompression, etc. (which are avoided
> according to the doc string of insert-file-contents-literally).
>
> Could the old behavior be restored?

By using `insert-file-contents-literally' (as opposed to
`insert-file-contents'), you're also forbidding Emacs to decode the
binary content of your file into text.

My guess is that it was working by chance in previous versions.

In case somebody might help you, here is a simple way to trigger the
encoding question with a recent version of org (mine is Org mode version 9.6.1).

   (with-temp-buffer
      (insert "Lechtenb\303\266rger")
      (org-mode))
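
To make the point about decoding concrete: a string literal written with octal escapes, such as "Lechtenb\303\266rger", is unibyte and holds the two UTF-8 bytes of “ö”, not the character itself. The minimal sketch below (the function name is made up for illustration, and UTF-8 is assumed) contrasts it with the decoded text. Inserted into an ordinary multibyte buffer, the undecoded bytes presumably become raw-byte characters rather than “ö”, which is the kind of buffer content the recipe above asks a coding system for.

```elisp
(defun my/raw-vs-decoded-demo ()
  "Contrast undecoded UTF-8 bytes with decoded text.
Hypothetical demo function; assumes the bytes are UTF-8."
  (let* ((raw  "Lechtenb\303\266rger")              ; unibyte: 14 bytes
         (text (decode-coding-string raw 'utf-8))) ; multibyte: 13 chars
    (list (multibyte-string-p raw) (length raw)
          (multibyte-string-p text) (length text)
          ;; Inserting RAW into a (multibyte) temp buffer keeps the byte
          ;; #xC3 as the raw-byte character #x3FFFC3 at position 9; it
          ;; does not become the first half of an "ö".
          (with-temp-buffer
            (insert raw)
            (char-after 9)))))
```

Calling (my/raw-vs-decoded-demo) should report 14 for the literal and 13 for the decoded string.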



Bruno




* Re: Unicode problem with export of literal contents
  2023-02-17  5:51 ` Bruno Barbier
@ 2023-02-17  7:48   ` Jens Lechtenboerger
  2023-02-17 17:30     ` Bruno Barbier
  0 siblings, 1 reply; 14+ messages in thread
From: Jens Lechtenboerger @ 2023-02-17  7:48 UTC
  To: Bruno Barbier; +Cc: emacs-orgmode

[-- Attachment #1: Type: text/plain, Size: 1096 bytes --]

Hi Bruno,

On 2023-02-17, Bruno Barbier wrote:

> Hi Jens,
>
> Jens Lechtenboerger <lechten@wi.uni-muenster.de> writes:
>
>> ...
>> Note that I insert contents literally because I do not want
>> ‘find-file-hook’, automatic uncompression, etc. (which are avoided
>> according to the doc string of insert-file-contents-literally).
>>
>> Could the old behavior be restored?
>
> By using `insert-file-contents-literally' (as opposed to
> `insert-file-contents'), you're also forbidding Emacs to decode the
> binary content of your file into text.
>
> My guess is that it was working by chance in previous versions.

In any case, this will introduce failures elsewhere.

> In case somebody might help you, here is a simple way to trigger the
> encoding question with a recent version of org (mine is Org mode version 9.6.1).
>
>    (with-temp-buffer
>       (insert "Lechtenb\303\266rger")
>       (org-mode))

Thank you for the simpler recipe.  This indeed fails now.

So, maybe my question is: Must text be decoded for Org mode from now on?

Best wishes
Jens

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 6187 bytes --]


* Re: Unicode problem with export of literal contents
  2023-02-16 10:46 Unicode problem with export of literal contents Jens Lechtenboerger
  2023-02-17  5:51 ` Bruno Barbier
@ 2023-02-17 11:02 ` Ihor Radchenko
  2023-02-17 16:56   ` Jens Lechtenboerger
  1 sibling, 1 reply; 14+ messages in thread
From: Ihor Radchenko @ 2023-02-17 11:02 UTC
  To: Jens Lechtenboerger; +Cc: emacs-orgmode

Jens Lechtenboerger <lechten@wi.uni-muenster.de> writes:

> With Org 9.6.1 from Emacs master, I get the following warning, and I
> am asked to select a coding system:
>
>> These default coding systems were tried to encode the following
>> problematic characters in the buffer ‘ *temp*’:
>> ...
>
> With previous Org versions, this did not happen, export would just
> work.  Note that I insert contents literally because I do not want
> ‘find-file-hook’, automatic uncompression, etc. (which are avoided
> according to the doc string of insert-file-contents-literally).

This warning appears upon Org calling `secure-hash'.
Org is doing nothing wrong here - your file does not have proper encoding.
You did not see this error in the past by chance.

Not a bug. You need to fix your files with improper encoding.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>



* Re: Unicode problem with export of literal contents
  2023-02-17 11:02 ` Ihor Radchenko
@ 2023-02-17 16:56   ` Jens Lechtenboerger
  2023-02-17 17:23     ` Bruno Barbier
  2023-02-17 18:22     ` Ihor Radchenko
  0 siblings, 2 replies; 14+ messages in thread
From: Jens Lechtenboerger @ 2023-02-17 16:56 UTC
  To: Ihor Radchenko; +Cc: emacs-orgmode

[-- Attachment #1: Type: text/plain, Size: 1323 bytes --]

On 2023-02-17, Ihor Radchenko wrote:

> Jens Lechtenboerger <lechten@wi.uni-muenster.de> writes:
>
>> With Org 9.6.1 from Emacs master, I get the following warning, and I
>> am asked to select a coding system:
>>
>>> These default coding systems were tried to encode the following
>>> problematic characters in the buffer ‘ *temp*’:
>>> ...
>>
>> With previous Org versions, this did not happen, export would just
>> work.  Note that I insert contents literally because I do not want
>> ‘find-file-hook’, automatic uncompression, etc. (which are avoided
>> according to the doc string of insert-file-contents-literally).
>
> This warning appears upon Org calling `secure-hash'.
> Org is doing nothing wrong here - your file does not have proper encoding.
> You did not see this error in the past by chance.

I was afraid you would say so.  To me, this is a breaking change.

Also, when I call secure-hash on the literal buffer-string, no
problem arises.

> Not a bug. You need to fix your files with improper encoding.

The file has the proper encoding.  I insert literally on purpose as
stated above.

It is not obvious that Org tries to write something here and why
that fails now (I could use the results in exporters writing to
files just fine previously).

Best wishes
Jens

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 6187 bytes --]


* Re: Unicode problem with export of literal contents
  2023-02-17 16:56   ` Jens Lechtenboerger
@ 2023-02-17 17:23     ` Bruno Barbier
  2023-02-20  9:18       ` Jens Lechtenboerger
  2023-02-17 18:22     ` Ihor Radchenko
  1 sibling, 1 reply; 14+ messages in thread
From: Bruno Barbier @ 2023-02-17 17:23 UTC
  To: Jens Lechtenboerger, Ihor Radchenko; +Cc: emacs-orgmode

Jens Lechtenboerger <lechten@wi.uni-muenster.de> writes:

> On 2023-02-17, Ihor Radchenko wrote:
>
>> Jens Lechtenboerger <lechten@wi.uni-muenster.de> writes:
>>
>
>> Not a bug. You need to fix your files with improper encoding.
>
> The file has the proper encoding.  I insert literally on purpose as
> stated above.

IIUC, the file has the proper encoding. But, when loading it with
`insert-file-contents-literally', it doesn't: that's part of the
"literally" feature I guess.

When loading it with `insert-file-contents', it should work (it does in
my case).


Here is a way to reproduce that doesn't use org, in case it might help
to manually fix your encoding issue:

   (with-temp-buffer
      (insert "Lechtenb\303\266rger")
      (let ((buffer-file-name (make-temp-file "mailtest")))
        (save-buffer)))

Does it work with your old config (with your old org) ?

What kind of failure do you get elsewhere if you let Emacs use the
correct encoding (i.e. if you use `insert-file-contents') ?



Bruno



* Re: Unicode problem with export of literal contents
  2023-02-17  7:48   ` Jens Lechtenboerger
@ 2023-02-17 17:30     ` Bruno Barbier
  0 siblings, 0 replies; 14+ messages in thread
From: Bruno Barbier @ 2023-02-17 17:30 UTC
  To: Jens Lechtenboerger; +Cc: emacs-orgmode


Jens Lechtenboerger <lechten@wi.uni-muenster.de> writes:

> So, maybe my question is: Must text be decoded for Org mode from now on?

Yes. Since forever.  Emacs must know how to read/write from/to files and
what text to display to you. Org is just relying on Emacs for that part.

Bruno




* Re: Unicode problem with export of literal contents
  2023-02-17 16:56   ` Jens Lechtenboerger
  2023-02-17 17:23     ` Bruno Barbier
@ 2023-02-17 18:22     ` Ihor Radchenko
  2023-02-20  9:28       ` Jens Lechtenboerger
  1 sibling, 1 reply; 14+ messages in thread
From: Ihor Radchenko @ 2023-02-17 18:22 UTC
  To: Jens Lechtenboerger; +Cc: emacs-orgmode

Jens Lechtenboerger <lechten@wi.uni-muenster.de> writes:

> I was afraid you would say so.  To me, this is a breaking change.

It is not a breaking change. It is Org's change revealing issues with
your files. If you need to edit or act upon that part of the file, you
could see the same problem.

> Also, when I call secure-hash on the literal buffer-string, no
> problem arises.

Org is calling secure-hash on buffer. Calling on buffer-string would
require unnecessary memory allocation to create the string.

> It is not obvious that Org tries to write something here and why
> that fails now

Org is not trying to write something. In your example, Org is just trying
to calculate a hash of the buffer contents - nothing wrong on the Org side.
"Something wrong with encoding" was my guess. If you think that your case
should be perfectly fine, I recommend asking Emacs devs by filing a bug
report to them.
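
When a buffer object is hashed, Emacs has to pick a coding system to turn the buffer's characters into bytes, and that choice is where the question comes from. If the caller already knows the encoding, one option is `md5', which, unlike `secure-hash', accepts an explicit CODING-SYSTEM argument. A minimal sketch, assuming UTF-8 is wanted; the function name is hypothetical and this is not how Org itself computes its hash:

```elisp
(defun my/md5-buffer-as-utf-8 (&optional buffer)
  "Return the MD5 of BUFFER (default: current buffer) encoded as UTF-8.
Hypothetical helper."
  ;; START and END are nil, so the whole buffer text is hashed.  The
  ;; explicit coding system means Emacs does not have to ask which
  ;; coding system to use for turning characters into bytes.
  (md5 (or buffer (current-buffer)) nil nil 'utf-8))
```

Binding `coding-system-for-write', as suggested later in this thread, serves a similar purpose when calling `secure-hash'.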

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>



* Re: Unicode problem with export of literal contents
  2023-02-17 17:23     ` Bruno Barbier
@ 2023-02-20  9:18       ` Jens Lechtenboerger
  2023-02-20  9:40         ` Bruno Barbier
  0 siblings, 1 reply; 14+ messages in thread
From: Jens Lechtenboerger @ 2023-02-20  9:18 UTC
  To: Bruno Barbier; +Cc: Ihor Radchenko, emacs-orgmode

[-- Attachment #1: Type: text/plain, Size: 857 bytes --]

On 2023-02-17, Bruno Barbier wrote:

> Here is a way to reproduce that doesn't use org, in case it might help
> to manually fix your encoding issue:
>
>    (with-temp-buffer
>       (insert "Lechtenb\303\266rger")
>       (let ((buffer-file-name (make-temp-file "mailtest")))
>         (save-buffer)))
>
> Does it work with your old config (with your old org) ?

This also asks for an encoding.

> What kind of failure do you get elsewhere if you let Emacs use the
> correct encoding (i.e. if you use `insert-file-contents') ?

I want to be sure to use the file contents in unchanged form, as
promised by insert-file-contents-literally.  For now, I copied part
of the code from insert-file-contents-literally to avoid
after-insert processing and file handlers.  I still do not
understand what is happening differently in my case, though...
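
One way to keep the guarantees of `insert-file-contents-literally' (its doc string says format decoding, character code conversion, `find-file-hook' and automatic uncompression are skipped) while still handing Org decoded text is to decode the bytes explicitly afterwards. A minimal sketch, assuming the files are UTF-8 unless told otherwise; the helper name is hypothetical:

```elisp
(defun my/file-contents-decoded (file &optional coding)
  "Return the text of FILE, decoded with CODING (default `utf-8').
Hypothetical helper: the bytes are read with
`insert-file-contents-literally', then decoded explicitly."
  (with-temp-buffer
    (insert-file-contents-literally file)
    ;; The buffer now holds undecoded bytes (raw-byte characters);
    ;; decode them in place into proper characters.
    (decode-coding-region (point-min) (point-max) (or coding 'utf-8))
    (buffer-string)))
```

The result could then be passed on as before, e.g. (org-export-string-as (my/file-contents-decoded "unicode-file.org") 'html t); since the string then contains real characters such as “ö”, the coding-system question should presumably no longer arise.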

Best wishes
Jens

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 6187 bytes --]


* Re: Unicode problem with export of literal contents
  2023-02-17 18:22     ` Ihor Radchenko
@ 2023-02-20  9:28       ` Jens Lechtenboerger
  0 siblings, 0 replies; 14+ messages in thread
From: Jens Lechtenboerger @ 2023-02-20  9:28 UTC
  To: Ihor Radchenko; +Cc: emacs-orgmode

[-- Attachment #1: Type: text/plain, Size: 1068 bytes --]

On 2023-02-17, Ihor Radchenko wrote:

> Jens Lechtenboerger <lechten@wi.uni-muenster.de> writes:

>> Also, when I call secure-hash on the literal buffer-string, no
>> problem arises.
>
> Org is calling secure-hash on buffer. Calling on buffer-string would
> require unnecessary memory allocation to create the string.

I can call secure-hash on the buffer with literally inserted
contents without problems.

>> It is not obvious that Org tries to write something here and why
>> that fails now
>
> Org is not trying to write something. In your example, Org is just trying
> to calculate a hash of the buffer contents - nothing wrong on the Org side.
> "Something wrong with encoding" was my guess. If you think that your case
> should be perfectly fine, I recommend asking Emacs devs by filing a bug
> report to them.

Thank you for the clarifications.  Probably I have to do that.

For the record, if I insert "Lechtenb\303\266rger" as a string into a
buffer, secure-hash asks for an encoding.  If I insert that literally
via a UTF-8 encoded file, secure-hash works.

Best wishes
Jens

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 6187 bytes --]


* Re: Unicode problem with export of literal contents
  2023-02-20  9:18       ` Jens Lechtenboerger
@ 2023-02-20  9:40         ` Bruno Barbier
  2023-02-20 10:16           ` Jens Lechtenboerger
  0 siblings, 1 reply; 14+ messages in thread
From: Bruno Barbier @ 2023-02-20  9:40 UTC
  To: Jens Lechtenboerger; +Cc: Ihor Radchenko, emacs-orgmode


Jens Lechtenboerger <lechten@wi.uni-muenster.de> writes:

> On 2023-02-17, Bruno Barbier wrote:
>
>> Here is a way to reproduce that doesn't use org, in case it might help
>> to manually fix your encoding issue:
>>
>>    (with-temp-buffer
>>       (insert "Lechtenb\303\266rger")
>>       (let ((buffer-file-name (make-temp-file "mailtest")))
>>         (save-buffer)))
>>
>> Does it work with your old config (with your old org) ?
>
> This also asks for an encoding.

If you're always using utf-8, here is a way to force it so that
secure-hash can compute the hash. This should work:

   (with-temp-buffer
      (let ((coding-system-for-write 'utf-8))
        (insert "Lechtenb\303\266rger")
        (secure-hash 'md5 (current-buffer))))

Without setting coding-system-for-write to utf-8, it asks for an
encoding:

   (with-temp-buffer
      (insert "Lechtenb\303\266rger")
      (secure-hash 'md5 (current-buffer)))


I'm still not getting your use case, but let's hope that this naive hack
is enough for you :-)


Bruno


> Best wishes
> Jens



* Re: Unicode problem with export of literal contents
  2023-02-20  9:40         ` Bruno Barbier
@ 2023-02-20 10:16           ` Jens Lechtenboerger
  2023-02-20 19:00             ` Bruno Barbier
  0 siblings, 1 reply; 14+ messages in thread
From: Jens Lechtenboerger @ 2023-02-20 10:16 UTC
  To: Bruno Barbier; +Cc: Ihor Radchenko, emacs-orgmode

[-- Attachment #1: Type: text/plain, Size: 700 bytes --]

On 2023-02-20, Bruno Barbier wrote:

> If you're always using utf-8, here is a way to force it so that
> secure-hash can compute the hash. This should work:
>
>    (with-temp-buffer
>       (let ((coding-system-for-write 'utf-8))
>         (insert "Lechtenb\303\266rger")
>         (secure-hash 'md5 (current-buffer))))

Yes, that works.

However, if I use insert-file-contents-literally with a unicode
file, I do *not* have to set the coding-system-for-write.  This just
works:

   (with-temp-buffer
      (insert-file-contents-literally "~/unicode.org")
      (secure-hash 'md5 (current-buffer)))

In the context of Org export, secure-hash seems to require a coding
system.  Why?

Best wishes
Jens

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 6187 bytes --]


* Re: Unicode problem with export of literal contents
  2023-02-20 10:16           ` Jens Lechtenboerger
@ 2023-02-20 19:00             ` Bruno Barbier
  2023-02-21  8:30               ` Jens Lechtenboerger
  0 siblings, 1 reply; 14+ messages in thread
From: Bruno Barbier @ 2023-02-20 19:00 UTC
  To: Jens Lechtenboerger; +Cc: Ihor Radchenko, emacs-orgmode


Jens Lechtenboerger <lechten@wi.uni-muenster.de> writes:

> On 2023-02-20, Bruno Barbier wrote:
>
> However, if I use insert-file-contents-literally with a unicode
> file, I do *not* have to set the coding-system-for-write.  This just
> works:
>
>    (with-temp-buffer
>       (insert-file-contents-literally "~/unicode.org")
>       (secure-hash 'md5 (current-buffer)))

Humm. Emacs is amazing: it managed to guess the right encoding, from the
buffer context, probably...

But, what you are giving to 'org-export-string-as' is not the buffer,
it's a string. So, let's try the same without using an org function:

     (with-temp-buffer
       (insert (with-temp-buffer
                 (insert-file-contents-literally "~/unicode.org")
                 (buffer-string)))
       (secure-hash 'md5 (current-buffer)))
       
And, that fails, requesting an encoding.


> In the context of Org export, secure-hash seems to require a coding
> system.  Why?

I'm not an expert, so, you'll need to confirm with other sources.  But
secure-hash requires an encoding in all cases, to compute the hash of
some text, because it needs the array of bytes that represents that text
to compute its hash.
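
This can be checked directly: once text has been turned into bytes with `encode-coding-string', the result is a unibyte string, which `secure-hash' appears to hash byte by byte without choosing a coding system. A minimal sketch, assuming UTF-8 is the encoding you want to hash; the function name is hypothetical:

```elisp
(defun my/sha256-of-text (text &optional coding)
  "Return the SHA-256 of TEXT after encoding it with CODING.
CODING defaults to `utf-8'.  Hypothetical helper."
  ;; Encoding first makes the byte sequence explicit; hashing the
  ;; resulting unibyte string involves no coding-system guess.
  (secure-hash 'sha256 (encode-coding-string text (or coding 'utf-8))))
```

Hashing the same text with 'latin-1 instead gives a different digest, because “ö” becomes one byte instead of two; that is why some coding system always has to be chosen.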

I don't see any bug in org, and, I don't see any bug in secure-hash either.

You literally should stop using "literally" ;-)

And, you might want to read:
   (info "(elisp) Non-ASCII Characters")



Bruno


>
> Best wishes
> Jens



* Re: Unicode problem with export of literal contents
  2023-02-20 19:00             ` Bruno Barbier
@ 2023-02-21  8:30               ` Jens Lechtenboerger
  0 siblings, 0 replies; 14+ messages in thread
From: Jens Lechtenboerger @ 2023-02-21  8:30 UTC
  To: Bruno Barbier; +Cc: Ihor Radchenko, emacs-orgmode

[-- Attachment #1: Type: text/plain, Size: 1676 bytes --]

On 2023-02-20, Bruno Barbier wrote:

> Jens Lechtenboerger <lechten@wi.uni-muenster.de> writes:
>
>> On 2023-02-20, Bruno Barbier wrote:
>>
>> However, if I use insert-file-contents-literally with a unicode
>> file, I do *not* have to set the coding-system-for-write.  This just
>> works:
>>
>>    (with-temp-buffer
>>       (insert-file-contents-literally "~/unicode.org")
>>       (secure-hash 'md5 (current-buffer)))
>
> Humm. Emacs is amazing: it managed to guess the right encoding, from the
> buffer context, probably...
>
> But, what you are giving to 'org-export-string-as' is not the buffer,
> it's a string. So, let's try the same without using an org function:
>
>      (with-temp-buffer
>        (insert (with-temp-buffer
>                  (insert-file-contents-literally "~/unicode.org")
>                  (buffer-string)))
>        (secure-hash 'md5 (current-buffer)))
>        
> And, that fails, requesting an encoding.

Thank you for this example.

>> In the context of Org export, secure-hash seems to require a coding
>> system.  Why?
>
> I'm not an expert, so, you'll need to confirm with other sources.  But
> secure-hash requires an encoding in all cases, to compute the hash of
> some text, because it needs the array of bytes that represents that text
> to compute its hash.
>
> I don't see any bug in org, and, I don't see any bug in secure-hash either.
>
> You literally should stop using "literally" ;-)

Indeed.  

> And, you might want to read:
>    (info "(elisp) Non-ASCII Characters")

The first section was already helpful, thanks!  (I still need to
read more of this...)

Best wishes
Jens

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 6187 bytes --]


Thread overview: 14+ messages
2023-02-16 10:46 Unicode problem with export of literal contents Jens Lechtenboerger
2023-02-17  5:51 ` Bruno Barbier
2023-02-17  7:48   ` Jens Lechtenboerger
2023-02-17 17:30     ` Bruno Barbier
2023-02-17 11:02 ` Ihor Radchenko
2023-02-17 16:56   ` Jens Lechtenboerger
2023-02-17 17:23     ` Bruno Barbier
2023-02-20  9:18       ` Jens Lechtenboerger
2023-02-20  9:40         ` Bruno Barbier
2023-02-20 10:16           ` Jens Lechtenboerger
2023-02-20 19:00             ` Bruno Barbier
2023-02-21  8:30               ` Jens Lechtenboerger
2023-02-17 18:22     ` Ihor Radchenko
2023-02-20  9:28       ` Jens Lechtenboerger

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs/org-mode.git
