From: Nick Dokos
Subject: Re: Org mode, minted, and non-ASCII
Date: Thu, 05 Jan 2012 14:47:49 -0500
Message-ID: <7757.1325792869@alphaville.americas.hpqcorp.net>
References: <87vcoqdeoq.fsf@iro.umontreal.ca>
In-Reply-To: Message from pinard@iro.umontreal.ca (François Pinard) of "Wed, 04 Jan 2012 22:40:56 EST." <87vcoqdeoq.fsf@iro.umontreal.ca>
Reply-To: nicholas.dokos@hp.com
To: François Pinard
Cc: nicholas.dokos@hp.com, emacs-orgmode@gnu.org

François Pinard wrote:

> Hi, Org people.
>
> Still experimenting around for this report, I installed *minted* so one
> of the appendices might nicely display a bulky bit of Python code.
>
> It works satisfactorily (and speedily enough) if I squash out all
> diacriticized and other Unicode special symbols in the file.  However,
> no output is produced if I leave the tiniest non-ASCII character in the
> file.  OK, OK, don't kill me :-).
> Agreed that all non-ASCII characters
> are neither tinier nor bigger than one another in this context.
>
> The Org document, the Python sources, and the default charset for this
> machine are all UTF-8.  I saw no Unicode problem between Unicode and
> LaTeX when minted is not in the picture.  pygmentize also appears to do
> well with Unicode input.
>
> So the problem likely lies either between Org mode and minted LaTeX, or
> within minted.  Is that a known problem or limitation?
>
> This problem is a bit more hurtful here, as the Python code really uses
> Unicode, and mangling out Unicode characters really changes the semantic
> of the code as displayed in the report.  If it was not for this problem,
> the minted output is attractive, at least more than what I saw with the
> listings package.  On last resort and of course, I may still include an
> unfontified Python source in the appendix, or produce it by other means;
> not such a big deal, it's just that I would have liked to impress my
> coworkers a bit more with Org mode integration and capabilities. :-).
>
> To confuse me a little more, I'm getting random (I mean, unpredictable
> by me) "org-mode fontification error" diagnostics while creating the PDF
> output.  Perusing org.el tells me that this is likely a mere
> coincidence, as those fontification errors seem wholly unrelated to
> LaTeX processing.
>

Yes, indeed it seems to be something that minted is doing (or not
doing). The following tex file, python program and Makefile illustrate
that pygmentize and latex are fine, as you stated. But when minted is
inserted into the mix, all hell breaks loose.

I tried modifying minted.sty to introduce utf-8 encoding options in the
two places where pygmentize is called, but this still does not work for
me.

I have never used pygmentize from the command line before. I believe the
Makefile describes the proper usage, but I'd appreciate corrections
before I dive into minted.

Nick

PS ...
and yes, I know that the "german" in the following is just nonsense ;-)

Makefile:

--8<---------------cut here---------------start------------->8---
view: fp.pdf
	xpdf fp.pdf

fp.pdf: fp.tex fp.py
	pygmentize -S default -f latex -P "encoding=utf-8" > fp.pyg
	pygmentize -l python -f latex -F tokenmerge -P "encoding=utf-8" -P "verboptions= " -o fp.out.pyg fp.py
	pdflatex -shell-escape fp.tex
	pdflatex -shell-escape fp.tex
	pdflatex -shell-escape fp.tex

clean:
	rm -f *~ fp.aux fp.pyg fp.out.pyg fp.log fp.toc fp.dvi fp.pdf
--8<---------------cut here---------------end--------------->8---

fp.py:

--8<---------------cut here---------------start------------->8---
#! /usr/bin/env python
# -*- coding: utf-8 -*-

"""
"""

import sys

x = 'This is a unicode string mit ümläute und großen problemen.'

def main(args):
    print x
    return 0

if __name__ == '__main__':
    status = main(sys.argv[1:])
    sys.exit(status)
--8<---------------cut here---------------end--------------->8---

fp.tex:

--8<---------------cut here---------------start------------->8---
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage{fancyvrb}
\usepackage{color}

\begin{document}

\section{foo}

Pygmentize can deal with ünicode with nö problems (given -P
``encoding=utf-8'' options).

\input{fp.pyg}
\input{fp.out.pyg}

\end{document}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
%%% End:
--8<---------------cut here---------------end--------------->8---
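
Before digging into minted itself, it may be worth ruling out the input
file as the culprit. The following is only a rough sketch and not part of
the test case above: the helper name non_ascii_chars is made up for
illustration, and it assumes the source file is meant to be UTF-8. It
lists every non-ASCII character with its position, and raises
UnicodeDecodeError if the file is not valid UTF-8 at all.

```python
# -*- coding: utf-8 -*-
# Hypothetical helper, not part of the original test case.
import io


def non_ascii_chars(path, encoding="utf-8"):
    """Return (line, column, character) for each non-ASCII character.

    Line and column numbers start at 1.  io.open raises
    UnicodeDecodeError while reading if the file is not valid
    in the given encoding.
    """
    found = []
    with io.open(path, encoding=encoding) as handle:
        for lineno, line in enumerate(handle, 1):
            for col, char in enumerate(line, 1):
                if ord(char) > 127:
                    found.append((lineno, col, char))
    return found
```

Calling non_ascii_chars("fp.py") should then report the umlauts and the
sharp s in the string assigned to x. If it does, and the file decodes
cleanly, the input is presumably fine and the problem lies further down
the chain.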