From: Tim Cross <theophilusx@gmail.com>
To: "Garreau, Alexandre" <galex-713@galex-713.eu>
Cc: emacs-org list <emacs-orgmode@gnu.org>,
    Kaushal Modi <kaushal.modi@gmail.com>
Subject: Re: *markup*, /markup/ and _markup_ true semantics [Was: Re: Ox-html: Replace <b> with <strong> and <i> with <em>]
Date: Sat, 27 Oct 2018 07:15:30 +1100
Message-ID: <87ftwslb19.fsf@gmail.com>
In-Reply-To: <87a7n1i8lr.fsf_-_@portable.galex-713.eu>

Garreau, Alexandre <galex-713@galex-713.eu> writes:
> Sorry, just found out that interesting (to me) thread I shouldn’t have
> let go:
>
> On 2018-10-25 at 08:00, Tim Cross wrote:
>> Kaushal Modi <kaushal.modi@gmail.com> writes:
>>> […]
>>> - b and i are not deprecated
>>> - b and strong are both valid but their use depends on the writer's
>>> context (but Org mode has just one mark for either "*")
>>> - i and em are both valid but their use depends on the writer's
>>> context (but Org mode has just one mark for either "/").
>>>
>>> […]
>>>
>>> From "em" docs[em], in the NOTE section there:
>>>> The em element isn’t a generic "italics" element. Sometimes, text
>>>> is intended to stand out from the rest of the paragraph, as if it
>>>> was in a different mood or voice. For this, the i element is more
>>>> appropriate.
>>>
>>> […]
>>>
>>> I guess there's no need to change what "*" and "/" do right now in
>>> ox-html, as there doesn't seem "one right way" to do things here.
>>>
>>> And folks strongly wanting to use <strong> and <em> for bold and
>>> italic can customize org-html-text-markup-alist.
>>>
>>> HTML experts, please chime in.
>>
>> I'll start by stating I'm definitely not an HTML expert.
>
> I don’t exactly know what an expert is, at least I’m not a
> professional, but I have passed some time figuring out various HTML
> specs semantic meaning.
>
> More especially, I’ve a big interest in semantics and typography, and
> spent much time on my now deleted-recreated-then-lost github account,
> and mail, to convince people to switch to more semantic markup (oh,
> and to use complex CSS selectors rather than classes, and stop using
> <div> and <span> at all) and better typography (such as curly quotes,
> simple quotes inside quotes, and many things specific to french).
>
>> The problem with b and i is that they specify how rather than what and
>> don't always make sense for all possible media types. For example, what
>> does 'bold' or 'italic' mean for a screen reader?
>
> Italic means often pronounced with a different pitch afair. Bold
> probably means pronounced differently too but I don’t know how this is
> pronounced iirc. I need to recheck with orca and firefox addons (I’ll
> do for a next mail). That might change across screenreaders so I
> might have to find some friend having a windows computer with NVDA, JAWS
> or some other non-free program to either ask or check.
>
> I believe the most correct handling for screen readers would be to use
> the appropriate language from the attribute lang or xml:lang of <i> tag,
> otherwise slower and slightly higher pitch, and for <b> the exact same
> higher pitch as caps, without changing speed, plus adding it to an
> easily reachable “keyword-list”, just as <dfn>.
>
> Fyi: both italic, bold, and underline, have been invented in typography
> as special ways of *purposely* making text harder to read. Both the
> intent and result is that the reader taking more time to read something
> in italic, for instance, will memorize it better, and have more free
> time to think about it, hence increasing the importance of this
> something.
>
> In the following “from far” means when you look at the global document
> and are not focusing reading a particular part of it. It doesn’t mean
> you are at a far distance and you can still read it, like it is for
> uppercase.
>
> Italic is the best way, the most readable, as it’s only seen when
> reading, near the text, but not “from far” and doesn’t break structure,
> flowing, or “typographic grey” (“gris typographique”, I’m not aware of
> the english term). It is hence commonly used for emphasis (best usage:
> if ever it gets long, it gets hard to read, but that reflects the fact
> original meaning was hard to grasp or hear or say originally), citation
> of artistical work names (such as books: conventional usage, but still
> okay, as these are mostly short anyway), and quotations (discouraged
> usage as they can get long (and thus unreadable) and quote marks cover
> this, *not* to be used *along* with them, never, as it is terribly
> redundant and almost no serious professional printer does that).
>
> Bold is sometimes harder to read, and sometimes, if not too bold,
> easier, however it’s really easy to “notice” its text from looking afar:
> therefore it’s normally *exclusively* recommended for text structures,
> whose *role* is to purposely cut in parts the text, that is: *outlines*.
> However, in an attempt of pseudo-backward compatibility and “but look
> everybody was okay since the beginning”, by the W3C, another usage for
> bold than in outlines has been found: keywords. These are *meant* to be
> seen from far, are usually small (one word), and yet wouldn’t alter text
> structure, and might not be candidate for <dfn> (however most time they
> should).
>
> Underline is to be banned from everywhere, theoretically. It is an
> especially simple and awful way of making text unreadable: it cuts the
> legs of non-zero-ascent letters (making as hard to read as italic) *and*
> is easy to lookup from far, yet you can notice the underline without
> having the word easily and quickly grasped when seen from far, like
> bold. Iirc it has been invented for typewriters because italic wasn’t
> available, for which it is the poorest candidate ever. It is also used
> in manuscript text, as people actually trying to manually write in
> italics or bold are nowadays few and others are often unable to do so.
> Most time I saw it used manuscriptly to annotate and highlight text.
> Conventions have been developed around this: in typewriter as well as
> manuscript text, you normally *only* use it for artistic works names
> (instead of italic), and blue hyperlinks. It is sad it has developed as
> a such important convention but it is done, clear, and well established.
>
> The W3C meaning of “added text” seems somewhat artificial to me,
> as it is not more conventional to use it for “added changes” than any
> other typographic convention. However it is necessarily *one of these*,
> as it is commonly used to highlight and annotate text (however the <mark>
> tag is here for that, in HTML).
>
>> I do believe we should move away from b/i to strong/em as I think these
>> are the correct semantic tags to use and are generally what is
>> preferred. This means they are also likely to already have appropriate
>> 'styling' in many 'canned' styles and valid consistent interpretations
>> for different media types.
>
> This is unsemantic (and is giving org markup a presentational rather
> than semantic role, so I strongly oppose this) and could break true
> accessibility. I’d say ideally what we should have is more markup to be
> compatible with HTML, as recently, with XHTML1, 2 and HTML5, it has
> become one of the richer and most clearly defined markup language
> available. However as org, comparably to markdown and rst, is trying to
> achieve some compatibility with classical clear-text markuping, such as
> in email, and from what semantics I detected, I’d say the following:
> –tag “*” with <em>, maybe find cases where “<b>” might be appropriate
> (for keywords, typically): I’d say an interesting experiment would,
> for some given languages (such as english, to begin) detect if an
> article (“the”, “a”, “an”…) is part of the markup: then it’s not a
> keyword (hence <em>), if it’s *preceding* the markup, then more
> probably it is a markup (but not necessarily);
> —tag “/” with <cite>, as this match the most accurate and commonly
> meaning of “/”, “_” might be appropriate as well, but may be redundant
> (so a safe (potentially usable as buffer-local) custom var would do
> better). However there are some cases where “/” would be more
> appropriate as <i> (I’d say the vast majority of occurrences are words
> from foreign languages, other are most often incorrect and abusive
> usage of “/”);
> —tag “_” as either <cite>, if correct var is of the correct value, or
> <ins>, *only* if near “+” markup. Otherwise, as org only use “[]” for
> hyperlinks, I don’t know.
>
> Note that, indeed, “<strong>” has no usage. If it was up to me it
> should be banned. Maybe its most accurate usage would be for upcase
> urgent emphasis-text: *URGENT: READ THIS NOW OR YOU WILL DIE* (you might
> use <strong> if absolutely wanting to, for upcase emphasis text, or
> emphasis text containing “urgent:” or “important:”, and differently
> localized versions (format-level linguistic imperialism, bla bla: note
> for the same very reason this would work as is for french, but me and
> many people would funnily feel more reassured, respected or whatever if
> they were blessed by being in a list whose car is "fr")).
>
>> I don't think this is something that is urgent, but it is the
>> direction we should go. The only real reason for sooner rather than
>> later is that we can probably simplify some of the exporters and
>> ensure any new exporters are correct and won't need to be changed
>> retrospectively.
>
> This has to be a semantics work to be reported on *all* semantic
> backends. As there are “accessibility” workaround for almost all
> formats (even PDF, which is understandable as it got important and
> widely used, while normally meant only for printing, hence display, not
> semantics (but you know, these days, you can put javascript in these…)),
> this may mean “every backend”.

I have either misunderstood most of your position or I simply disagree
with it - I'm not sure which.

- Much of what you argue seems to be based around ideas associated with
  typography. IMO this is where things fall down. Typography is really
  only relevant to 'printing' (either on paper or screen). Markup is not
  just about printing - it is about conveying what the author wanted.
  How that is best interpreted will depend on the medium being used
  (i.e. how the content is 'rendered') and should largely be up to the
  consumer.

- I am a screen reader user. While you are correct that pitch, tone,
  speed and different voices are often used to convey things like 'bold'
  or 'italic', there is no universally accepted rule for this
  interpretation, at least not in the same sense as there is with
  typography. We all know what bold or italic looks like, but there is
  no agreement as to what these should sound like. When you use JAWS,
  you will get a different result from when you use Orca or Emacspeak or
  Window-Eyes or ... However, this shouldn't really matter - how these
  are 'rendered' should ideally be under the control of the individual
  consuming the content. When I consume a document, it should be my
  decision as to how the content is presented, and for me, interpreting
  'strong' or 'emphasis' is far clearer than interpreting 'bold' or
  'italic'.

- I don't believe there is any strong reason for the markup used by Org
  to be closely tied to HTML. Org supports many different backends, many
  of which don't have anything to do with HTML at all. It is perhaps
  unfortunate that Org syntax and markdown are quite different (though I
  feel the unfortunate part is that markdown didn't follow Org more
  closely, as I much prefer Org's syntax to most markdown semantics).

- Probably the number 1 issue I come across when dealing with markup is
  the expectation too many authors have that things will be rendered in
  the browser in a specific way (a particular font, colour, position,
  size, etc). This is a mistake. The big advantage of electronic
  presentation is that for the first time, the consumer can have control
  over the presentation - they can customise it to meet their
  requirements or preferences. The problem with <b> and <i> is that they
  give authors an expectation their content will be rendered in a
  specific way. Some may argue that the author should be able to control
  how their content is rendered. I think this is misleading because,
  unlike with printed material, the author has no control over the
  presentation medium - they don't know how large the screen is, what
  the capabilities of the screen are, what fonts are installed,
  etc. Therefore, tags which focus on meaning, i.e. 'I want this to
  stand out' or 'I want this to be emphasised', are clearer than tags
  which say 'make this bold' or 'make this italic'.

The debate over <i>, <b>, <strong> and <em> is likely to continue for
some years yet. I do think things are moving towards <strong>/<em> and
nearly everything I read these days recommends these over <i> and
<b>. It is pretty well accepted that XHTML was a mistake and HTML5 goes
a long way to address the issues introduced with XHTML - I think XHTML
as a standard is pretty much relegated to an evolutionary dead end. I do
agree <div> is overused. In particular, HTML5 has a number of new tags
which should be used to convey document structure and which would be a
better choice than <div> with different 'class' attributes. However, we
will continue to see a lot of div tags, even when authors begin to use
newer tags - at least it is a lot better than the early days when
everything was stuck inside tables! Backends which generate HTML should
be generating HTML5-compliant output, if for no other reason than that
it is clearer and easier than XHTML.
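
For anyone who wants HTML5 output from the HTML exporter today, a
configuration along the following lines may be a reasonable starting
point. It is only a sketch: it assumes the ox-html user options
org-html-doctype and org-html-html5-fancy (names not mentioned elsewhere
in this thread), and the exact output can differ between Org versions.

```elisp
;; Sketch only: assumes the ox-html options named below exist in
;; your Org version.
(with-eval-after-load 'ox-html
  ;; Ask the HTML exporter for an HTML5 doctype instead of XHTML.
  (setq org-html-doctype "html5")
  ;; Let the exporter use HTML5 structural elements where it supports
  ;; them, instead of wrapping everything in <div>.
  (setq org-html-html5-fancy t))
```

The second option is, as far as I know, only honoured when the doctype
is HTML5, so the two settings are meant to be used together.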

As to the OP's question regarding changing <b> and <i> in HTML
backends - while I would vote for strong/em over b/i, I don't think
there is any real need to do this, certainly not in the short term. As
was pointed out, b/i have not been deprecated, so they are still valid.
There is no suggestion to change Org's own internal markup (ironically
referred to as bold and italic!), so overall, the status quo seems fine.
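
For those who do want strong/em right now, the
org-html-text-markup-alist customization Kaushal mentioned could look
roughly like this. It is a minimal sketch, assuming the alist uses the
keys `bold' and `italic' with a "%s" format string as its values; check
the variable's documentation in your Org version before relying on it.

```elisp
;; Sketch only: assumes `bold' and `italic' are the alist keys and
;; that each value is a format string with one "%s" placeholder.
(with-eval-after-load 'ox-html
  ;; Export *text* as <strong> instead of <b>.
  (setf (alist-get 'bold org-html-text-markup-alist)
        "<strong>%s</strong>")
  ;; Export /text/ as <em> instead of <i>.
  (setf (alist-get 'italic org-html-text-markup-alist)
        "<em>%s</em>"))
```

Only the two entries are changed; the other markup entries (code,
verbatim and so on) keep whatever values they already had.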

Tim
--
Tim Cross

Thread overview: 10+ messages
2018-10-24 0:38 Ox-html: Replace <b> with <strong> and <i> with <em> Kaushal Modi
2018-10-24 6:04 ` Nicolas Goaziou
2018-10-24 15:14 ` Kaushal Modi
2018-10-24 21:00 ` Tim Cross
2018-10-26 5:24 ` *markup*, /markup/ and _markup_ true semantics [Was: Re: Ox-html: Replace <b> with <strong> and <i> with <em>] Garreau, Alexandre
2018-10-26 20:15 ` Tim Cross [this message]
2018-10-27 12:52 ` Garreau, Alexandre
2018-10-28 21:19 ` Tim Cross
2018-10-28 21:46 ` Neil Jerram
2018-10-28 22:43 ` *markup*, /markup/ and _markup_ true semantics Garreau, Alexandre