emacs-orgmode@gnu.org archives
From: Tim Cross <theophilusx@gmail.com>
To: Jude DaShiell <jdashiel@panix.com>
Cc: Nick Dokos <ndokos@gmail.com>, emacs-orgmode@gnu.org
Subject: Re: minimal testing setup for pdf export?
Date: Sat, 31 Aug 2019 11:46:38 +1000
Message-ID: <87ef12hzsx.fsf@gmail.com>
In-Reply-To: <alpine.NEB.2.21.1908301957220.21050@panix1.panix.com>


To be a little more precise, Org mode PDF documents created using the
default LaTeX classes are not going to meet minimal accessibility
standards. The extent to which they can be read with accessibility
software will depend largely on the structure of the underlying PDF. I
have not investigated other workflows for generating PDFs from Org (for
example, exporting to an intermediate format such as HTML or Markdown
and then using a different tool to generate the final PDF). Likewise, I
don't know whether some of the TeX PDF generators are better than
others (this is partly why it gets complicated: there are multiple
workflows for generating PDFs from LaTeX). There are people working on
additional LaTeX packages to address this accessibility requirement.
However, they are either only at experimental status, require
significant configuration and setup, or require the author to manually
add additional data, making them inappropriate for Org mode.

With PDFs, the level of accessibility depends a lot on the structure of
the underlying document. Even PDFs without full tagging can be
reasonably accessible if the structure of the data in the PDF is
straightforward, i.e. it does not contain lots of tables, multi-column
layouts, footnotes, internal references and so on. If the data flow in
the document is reasonably 'linear', then it isn't too bad. If the
document has lots of embedded PostScript or any image-like data, that
content will not be accessible and will not be tagged adequately.
Likewise, I've found that material generated in math mode is typically
inaccessible.

The various text extraction tools, such as pdftotext, are able to
extract the text. However, because they cannot determine the structure
with any accuracy, the output tends to be somewhat jumbled and to
contain a bit of 'garbage'. Again, how good or bad this is depends on
the underlying PDF structure.
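
As a rough sketch of how you could check this on your own files, the
following Python wraps pdftotext and gives a crude measure of how much
'garbage' ends up in the output. It assumes pdftotext (from
poppler-utils or xpdf) is on your PATH. The function names and the
garbage heuristic are my own inventions for illustration, not part of
any tool.

```python
import shutil
import subprocess


def build_pdftotext_command(pdf_path, layout=False):
    """Return the argument list for a pdftotext call that writes to stdout.

    Hypothetical helper: the name and signature are illustrative only.
    """
    cmd = ["pdftotext"]
    if layout:
        # -layout asks pdftotext to keep the physical page layout rather
        # than emitting the text in its default reading order.
        cmd.append("-layout")
    # A trailing "-" tells pdftotext to write the text to stdout.
    cmd += [str(pdf_path), "-"]
    return cmd


def extract_text(pdf_path, layout=False):
    """Run pdftotext on pdf_path and return the extracted text."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    result = subprocess.run(
        build_pdftotext_command(pdf_path, layout),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def garbage_ratio(text):
    """Crude heuristic: fraction of characters that look like debris.

    Counts Unicode replacement characters and non-printable characters
    other than ordinary whitespace and the form feeds pdftotext emits
    between pages. This is an assumption about what 'garbage' means,
    not an accessibility measurement.
    """
    if not text:
        return 0.0
    allowed_whitespace = "\n\t\r\f"
    bad = sum(
        1
        for ch in text
        if ch == "\ufffd" or not (ch.isprintable() or ch in allowed_whitespace)
    )
    return bad / len(text)
```

Comparing extract_text(path) with extract_text(path, layout=True) on
the same export is one way to see how much the result depends on the
structure of the underlying PDF. The ratio only flags obvious debris.
It says nothing about whether the reading order is sensible.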

You might find the following links useful:

https://www.tug.org/twg/accessibility/

https://tug.org/pipermail/accessibility/2016q4/000005.html

You might be able to improve the accessibility of PDFs generated from
LaTeX by adding some of the (mostly experimental) additional packages to
your Org mode setup. However, this will probably have some unfortunate
side effects or corner cases (which is why I don't just recommend adding
these packages as defaults in Org itself).

There has been an item on my todo list to experiment with this stuff for
a long time, but I just never seem to get that far down the list. If you
do find some configurations which help make the PDFs more accessible, I
would be happy to try adding them to my setup for further testing. We
may find some additional packages which improve the situation and don't
have an unacceptable impact on general Org performance and stability.
I'm confident that if we can demonstrate this, getting these additions
into the Org mode defaults would be possible.

Tim

Jude DaShiell <jdashiel@panix.com> writes:

> Okay, orgmode pdf files will be inaccessible for the foreseeable future.
>  Has anyone had any luck extracting text however mangled from one of
> these with pdftotext or similar tools?  If that's not possible that will
> be another good thing to know.
>
> On Sat, 31 Aug 2019, Tim Cross wrote:
>
>> Date: Fri, 30 Aug 2019 19:31:32
>> From: Tim Cross <theophilusx@gmail.com>
>> To: emacs-orgmode@gnu.org
>> Cc: Nick Dokos <ndokos@gmail.com>
>> Subject: Re: [O] minimal testing setup for pdf export?
>>
>>
>> I think the main thing which needs to be in the PDF is structure
>> 'tagging'. Unfortunately, making truly accessible PDFs is the one area
>> I've found where the TeX suite is weak. I was tracking some discussions
>> about this on the various TeX and LaTeX lists and it seems that adding
>> the information needed to create accessible PDFs requires a major
>> redesign of TeX internals.
>>
>> It has been a while since I looked at this, but I do believe there are
>> some add-on LaTeX packages which can help a bit, but creating PDFs which
>> meet minimal accessibility requirement tests is currently not possible.
>>
>> IIRC the speech-disabling feature is not part of the PDF spec. This is
>> something added by Adobe (along with other DRM support). There is no
>> 'switch', so to speak, in plain PDF documents, as the PDF spec predates
>> considerations like TTS or even accessibility.
>>
>>
>> Jude DaShiell <jdashiel@panix.com> writes:
>>
>> > most of the books sold on google play books are speech-disabled by
>> > publishers.  The adobe accessibility site has speech-enabled
>> > accessibility examples.  I think it's a matter of a single control that
>> > is either enabled or disabled.  Oh, the IRS has speech-enabled pdf tax
>> > forms anyone can download.  I nearly forgot about that one.  The 1099R
>> > form is a short one so it ought to be pretty quick to find the setting
>> > in one of those forms.
>> >
>> > On Fri, 30 Aug 2019, Nick Dokos wrote:
>> >
>> >> Date: Fri, 30 Aug 2019 16:07:49
>> >> From: Nick Dokos <ndokos@gmail.com>
>> >> To: emacs-orgmode@gnu.org
>> >> Subject: Re: [O] minimal testing setup for pdf export?
>> >>
>> >> Jude DaShiell <jdashiel@panix.com> writes:
>> >>
>> >> > It would be helpful if when pdf get exported from orgmode they have
>> >> > speech enabled by default.
>> >> >
>> >>
>> >> Not sure that org mode can do anything about it, since it's LaTeX that produces
>> >> the PDF. That said, I'm not sure what needs to be done: what's the difference
>> >> between a speech-enabled PDF and a non-speech-enabled one?
>> >>
>> >>
>>
>>
>>


--
Tim Cross

