emacs-orgmode@gnu.org archives
From: Mathias Bauer <mbauer@gmx.org>
To: emacs-orgmode@gnu.org
Subject: Re: Bug: text export and multi-word link descriptions with line breaks
Date: Thu, 3 Apr 2014 18:30:24 +0200
Message-ID: <20140403163024.GB27299@gmx.org>
In-Reply-To: <87ha6a4er6.fsf@gmail.com>

Hello Nicolas,

* Nicolas Goaziou wrote on 2014-04-03 at 17:25 (+0200):

> Mathias Bauer <mbauer@gmx.org> writes:
>
> > I just stumbled over Org's plain text export and how it works on
> > links with descriptions consisting of multiple words and line
> > breaks between them.  I'm running Org stable version 8.2.5h.
> >
> > Org source (spaces at the end of line 1 and 2 don't matter):
> >
> > --------------------snip--------------------
> > "OpenPGP Message Format" ([[https://tools.ietf.org/html/rfc4880][RFC
> > 4880]] which obsoletes [[https://tools.ietf.org/html/rfc1991][RFC
> > 1991]] and [[https://tools.ietf.org/html/rfc2440][RFC 2440]])...
> > ...
> > foo [[https://tools.ietf.org/html/rfc4880][RFC 4880]] bar
> > baz [[https://tools.ietf.org/html/rfc1991][RFC 1991]] foo
> > bar [[https://tools.ietf.org/html/rfc2440][RFC 2440]] baz
> > --------------------snip--------------------
> >
> > Text export result:
> >
> > --------------------snip--------------------
> > "OpenPGP Message Format" ([RFC 4880] which obsoletes [RFC 1991] and [RFC
> > 2440])...  ...  foo [RFC 4880] bar baz [RFC 1991] foo bar [RFC 2440] baz
> >
> >
> > [RFC 4880] https://tools.ietf.org/html/rfc4880
> >
> > [RFC 1991] https://tools.ietf.org/html/rfc1991
> >
> > [RFC 2440] https://tools.ietf.org/html/rfc2440
> >
> > [RFC 4880] https://tools.ietf.org/html/rfc4880
> >
> > [RFC 1991] https://tools.ietf.org/html/rfc1991
> > --------------------snip--------------------
> >
> > These multiple references look quite bad.  Is it possible to
> > "normalize" the descriptions in some way *before* checking
> > them for uniqueness and output them thereafter?
>
> Could you be more explicit? What does look quite bad? What did
> you expect instead? How is related to line breaks in the
> descriptions?

OK, let's go into more detail.  See the Org source text:

1. There are three links and each of them appears twice, both
   times with the same link target.

2. Both "[...][RFC 2440]" links sit entirely on one line; of the
   "[...][RFC 4880]" and "[...][RFC 1991]" links, one occurrence
   of each has a newline in its description.  They are in fact
   "[...][RFC\n4880]" and "[...][RFC 4880]" and, respectively,
   "[...][RFC\n1991]" and "[...][RFC 1991]".

So, now let's examine the Org text export:

The final reference part - the five links below the paragraph -
lists [RFC 4880] and [RFC 1991] twice each, but [RFC 2440] only
once.

This is inconsistent, to say the least.

The point is that Org obviously considers "[...][RFC 4880]" and
"[...][RFC\n4880]" to be two different links internally and
lists both of them in the reference part.  For this listing, the
\n is removed.  This is what I called "normalization" in my
first post.
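
To illustrate the effect, here is a small Python sketch.  It only
mimics my reading of the behaviour and is not Org's exporter code;
the simplified link regex and all names in it are assumptions made
up for this illustration.  Comparing the descriptions literally
leaves five distinct (description, target) pairs for the six links:

```python
import re

# Simplified pattern for [[target][description]]; an assumption for
# this sketch, not Org's real link parser.
LINK_RE = re.compile(r"\[\[([^\]]+)\]\[([^\]]+)\]\]")

ORG_SOURCE = (
    '"OpenPGP Message Format" ([[https://tools.ietf.org/html/rfc4880][RFC\n'
    '4880]] which obsoletes [[https://tools.ietf.org/html/rfc1991][RFC\n'
    '1991]] and [[https://tools.ietf.org/html/rfc2440][RFC 2440]])...\n'
    '...\n'
    'foo [[https://tools.ietf.org/html/rfc4880][RFC 4880]] bar\n'
    'baz [[https://tools.ietf.org/html/rfc1991][RFC 1991]] foo\n'
    'bar [[https://tools.ietf.org/html/rfc2440][RFC 2440]] baz\n'
)

# Literal comparison: "RFC\n4880" and "RFC 4880" count as different,
# so only the second "RFC 2440" link is recognised as a duplicate.
raw_pairs = []
for target, description in LINK_RE.findall(ORG_SOURCE):
    if (description, target) not in raw_pairs:
        raw_pairs.append((description, target))

print(len(raw_pairs))
```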

Human eyes, however, won't see any difference between these two
forms, so the reader will be surprised.

I expect Org to do the following steps while parsing the source
text:

1. "Normalize" or clean the link description, i.e. remove any
   newlines, leading and trailing spaces, and replace any
   occurrences of "[ \t]+" in the interior by a single space
   only.  (To be done.)

2. Check the tuple (description,target) for duplicates and drop
   them.  (Seems ok to me.)

3. Below the paragraph list the tuples as "[description] target"
   in the order of occurrence in the original text.  (Also seems
   ok to me.)
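
The three steps could be sketched roughly like this in Python.
Again, this is only an illustration under my own assumptions and
not a patch against Org: the function names are hypothetical, and
I treat a newline like any other whitespace (collapsing it to a
single space) because removing it outright would turn "RFC\n4880"
into "RFC4880".

```python
import re


def normalize_description(description):
    """Collapse whitespace runs (newlines included) to one space, trim the ends."""
    return re.sub(r"\s+", " ", description).strip()


def unique_references(links):
    """Return "[description] target" lines without duplicates.

    `links` is an iterable of (description, target) pairs in the
    order they occur in the source text; that order is preserved.
    """
    seen = set()
    lines = []
    for description, target in links:
        key = (normalize_description(description), target)
        if key not in seen:
            seen.add(key)
            lines.append("[%s] %s" % key)
    return lines


# The six links from the Org source above, in order of occurrence.
links = [
    ("RFC\n4880", "https://tools.ietf.org/html/rfc4880"),
    ("RFC\n1991", "https://tools.ietf.org/html/rfc1991"),
    ("RFC 2440", "https://tools.ietf.org/html/rfc2440"),
    ("RFC 4880", "https://tools.ietf.org/html/rfc4880"),
    ("RFC 1991", "https://tools.ietf.org/html/rfc1991"),
    ("RFC 2440", "https://tools.ietf.org/html/rfc2440"),
]

for line in unique_references(links):
    print(line)
```

With the descriptions normalized before the duplicate check, the
reference part shrinks to three entries, one per RFC.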

I hope this makes the issue a little clearer now.

Kind regards,
Mathias
