From: Tim Cross
Subject: Re: minimal testing setup for pdf export?
Date: Sat, 31 Aug 2019 11:46:38 +1000
To: Jude DaShiell
Cc: Nick Dokos, emacs-orgmode@gnu.org

To be a little more precise, Org mode PDF documents created using the default LaTeX classes are not going to meet minimal accessibility standards. The extent to which they can be accessed using accessibility software will depend largely on the structure of the underlying PDF.

I have not investigated other workflows for generating PDFs from Org (for example, exporting to some other intermediate format, such as HTML or Markdown, and then using a different tool to generate the final PDF).
Likewise, I don't know whether some of the TeX PDF generators are better than others (this is partly why it gets complicated: there are multiple workflows for generating PDFs from LaTeX).

There are people working on additional LaTeX packages to address this accessibility requirement. However, they are either only at experimental status, require significant configuration and setup, or require the author to manually add additional data, making them inappropriate for Org mode.

With PDFs, the level of accessibility depends a lot on the structure of the underlying document. Even PDFs without full tagging can be reasonably accessible if the structure of the data in the PDF is straightforward, i.e. without lots of tables, multi-column layouts, footnotes, internal references and so on. If the data flow in the document is reasonably 'linear', then it isn't too bad. If the document has lots of embedded PostScript or any image-like data, that will not be accessible and will not be tagged adequately. Likewise, I've found that content generated in math mode is typically inaccessible.

Text extraction tools such as pdftotext are able to extract the text. However, because they cannot determine the structure with any accuracy, the output tends to be somewhat jumbled and to contain a bit of 'garbage'. Again, how good or bad this is depends on the underlying PDF structure.

You might find the following links useful:

https://www.tug.org/twg/accessibility/
https://tug.org/pipermail/accessibility/2016q4/000005.html

You might be able to improve the accessibility of PDFs generated from LaTeX by adding some of the (mostly experimental) additional packages to your Org mode setup. However, this will probably have some unfortunate side effects or corner cases (which is why I don't just recommend adding these packages as defaults in Org itself). Experimenting with this stuff has been on my todo list for a long time, but I just never seem to get that far down the list.
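
As a rough illustration of the pdftotext check mentioned above, here is a minimal Python sketch. It assumes the pdftotext command-line tool is installed and on the PATH. The function names and the garbage_ratio heuristic are hypothetical, made up for this example; only the pdftotext invocation itself (the -layout option and "-" as the output file to write to stdout) comes from the tool. The score is a crude signal for comparing exports, not an accessibility test.

```python
import shutil
import string
import subprocess


def build_pdftotext_command(pdf_path, layout=True):
    """Build the argument list for a pdftotext run that writes to stdout."""
    cmd = ["pdftotext"]
    if layout:
        # -layout asks pdftotext to preserve the physical page layout
        cmd.append("-layout")
    # Passing "-" as the text file sends the extracted text to stdout
    cmd.extend([str(pdf_path), "-"])
    return cmd


def extract_text(pdf_path, layout=True):
    """Return the extracted text, or None if pdftotext is not installed."""
    if shutil.which("pdftotext") is None:
        return None
    result = subprocess.run(
        build_pdftotext_command(pdf_path, layout),
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=True,
    )
    return result.stdout


def garbage_ratio(text):
    """Hypothetical heuristic: share of visible characters that are neither
    alphanumeric nor ASCII punctuation (e.g. replacement characters)."""
    visible = [c for c in text if not c.isspace()]
    if not visible:
        return 0.0
    odd = sum(1 for c in visible if not (c.isalnum() or c in string.punctuation))
    return odd / len(visible)
```

Running extract_text on the same document exported with different LaTeX setups, once with layout=True and once with layout=False, and comparing the resulting text and garbage_ratio values gives a quick, if coarse, way to see whether a configuration change made the extracted text more or less jumbled.
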

If you do find some configurations which help make the PDFs more accessible, I would be happy to try adding them to my setup for further testing. We may find some additional packages which improve the situation without an unacceptable impact on general Org performance and stability. I'm confident that if we can demonstrate this, getting these additions into the Org mode defaults would be possible.

Tim

Jude DaShiell writes:

> Okay, orgmode pdf files will be inaccessible for the foreseeable future.
> Has anyone had any luck extracting text however mangled from one of
> these with pdftotext or similar tools? If that's not possible that will
> be another good thing to know.
>
> On Sat, 31 Aug 2019, Tim Cross wrote:
>
>> Date: Fri, 30 Aug 2019 19:31:32
>> From: Tim Cross
>> To: emacs-orgmode@gnu.org
>> Cc: Nick Dokos
>> Subject: Re: [O] minimal testing setup for pdf export?
>>
>>
>> I think the main thing which needs to be in the PDF is structure
>> 'tagging'. Unfortunately, making truly accessible PDFs is the one area
>> I've found where the TeX suite is weak. I was tracking some discussions
>> about this on the various TeX and Latex lists and it seems that to add
>> the necessary information needed to create accessible PDFs requires a
>> major redesign of TeX internals.
>>
>> It has been a while since I looked at this, but I do believe there are
>> some add-on latex packages which can help a bit, but creating PDFs which
>> meet minimal accessibility requirement tests is currently not possible.
>>
>> IIRC the speech-disabling feature is not part of the PDF spec. This is
>> something added by Adobe (along with other DRM support). There is no
>> 'switch' so to speak in plain PDF documents as the PDF spec predates
>> considerations like TTS or even accessibility.
>>
>>
>> Jude DaShiell writes:
>>
>> > most of the books sold on google play books are speech-disabled by
>> > publishers. The adobe accessibility site has speech-enabled
>> > accessibility examples.
>> > I think it's a matter of a single control that
>> > is either enabled or disabled. Oh, the IRS has speech-enabled pdf tax
>> > forms anyone can download. I nearly forgot about that one. The 1099R
>> > form is a short one so it ought to be pretty quick to find the setting
>> > in one of those forms.
>> >
>> > On Fri, 30 Aug 2019, Nick Dokos wrote:
>> >
>> >> Date: Fri, 30 Aug 2019 16:07:49
>> >> From: Nick Dokos
>> >> To: emacs-orgmode@gnu.org
>> >> Subject: Re: [O] minimal testing setup for pdf export?
>> >>
>> >> Jude DaShiell writes:
>> >>
>> >> > It would be helpful if when pdf get exported from orgmode they have
>> >> > speech enabled by default.
>> >> >
>> >>
>> >> Not sure that org mode can do anything about it, since it's LaTeX that produces
>> >> the PDF. That said, I'm not sure what needs to be done: what's the difference
>> >> between a speech-enabled PDF and a non-speech-enabled one?
>> >>
>> >>
>> >>

--
Tim Cross