From: Mathias Bauer <mbauer@gmx.org>
To: emacs-orgmode@gnu.org
Subject: Re: Bug: text export and multi-word link descriptions with line breaks
Date: Thu, 3 Apr 2014 18:30:24 +0200
Message-ID: <20140403163024.GB27299@gmx.org>
In-Reply-To: <87ha6a4er6.fsf@gmail.com>

Hello Nicolas,

* Nicolas Goaziou wrote on 2014-04-03 at 17:25 (+0200):
> Mathias Bauer <mbauer@gmx.org> writes:
>
> > I just stumbled over Org's plain text export and how it works on
> > links with descriptions consisting of multiple words and line
> > breaks between them. I'm running Org stable version 8.2.5h.
> >
> > Org source (spaces at the end of line 1 and 2 don't matter):
> >
> > --------------------snip--------------------
> > "OpenPGP Message Format" ([[https://tools.ietf.org/html/rfc4880][RFC
> > 4880]] which obsoletes [[https://tools.ietf.org/html/rfc1991][RFC
> > 1991]] and [[https://tools.ietf.org/html/rfc2440][RFC 2440]])...
> > ...
> > foo [[https://tools.ietf.org/html/rfc4880][RFC 4880]] bar
> > baz [[https://tools.ietf.org/html/rfc1991][RFC 1991]] foo
> > bar [[https://tools.ietf.org/html/rfc2440][RFC 2440]] baz
> > --------------------snip--------------------
> >
> > Text export result:
> >
> > --------------------snip--------------------
> > "OpenPGP Message Format" ([RFC 4880] which obsoletes [RFC 1991] and [RFC
> > 2440])... ... foo [RFC 4880] bar baz [RFC 1991] foo bar [RFC 2440] baz
> >
> >
> > [RFC 4880] https://tools.ietf.org/html/rfc4880
> >
> > [RFC 1991] https://tools.ietf.org/html/rfc1991
> >
> > [RFC 2440] https://tools.ietf.org/html/rfc2440
> >
> > [RFC 4880] https://tools.ietf.org/html/rfc4880
> >
> > [RFC 1991] https://tools.ietf.org/html/rfc1991
> > --------------------snip--------------------
> >
> > These multiple references look quite bad. Is it possible to
> > "normalize" the descriptions in some way *before* checking
> > them for uniqueness and output them thereafter?
>
> Could you be more explicit? What does look quite bad? What did
> you expect instead? How is related to line breaks in the
> descriptions?

OK, let's go into more detail. See the Org source text:

1. There are three links, and each of them appears twice. The
   two occurrences of each link share the same target.

2. Both "[...][RFC 2440]" links fit on a single line; the links
   "[...][RFC 4880]" and "[...][RFC 1991]" each have a newline in
   the description of their first occurrence. They are in fact
   "[...][RFC\n4880]" and "[...][RFC 4880]" and, respectively,
   "[...][RFC\n1991]" and "[...][RFC 1991]".

So now let's examine the Org text export:

The final reference part - the five links below the paragraph -
shows two links, [RFC 4880] and [RFC 1991], that appear twice,
whereas the link [RFC 2440] appears only once.

This is, at the least, inconsistent.

The point is that Org obviously considers "[...][RFC 4880]" and
"[...][RFC\n4880]" to be two different links internally and
lists both of them in the reference part. For this listing, the
\n is removed. This is what I called "normalization" in my
first post.

Human eyes, however, won't see any difference between these two
forms and will be surprised.
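
The behaviour described above can be reproduced with a toy model. This is only a sketch of what the output suggests, not Org's actual export code, and it assumes the uniqueness check compares the raw description while whitespace is cleaned only when the entry is printed. The names `collapse_whitespace`, `LINKS` and `reference_list` are made up for this illustration.

```python
def collapse_whitespace(text):
    """Turn any run of whitespace (including newlines) into one space."""
    return " ".join(text.split())


# (target, description) pairs in the order they occur in the Org source.
LINKS = [
    ("https://tools.ietf.org/html/rfc4880", "RFC\n4880"),
    ("https://tools.ietf.org/html/rfc1991", "RFC\n1991"),
    ("https://tools.ietf.org/html/rfc2440", "RFC 2440"),
    ("https://tools.ietf.org/html/rfc4880", "RFC 4880"),
    ("https://tools.ietf.org/html/rfc1991", "RFC 1991"),
    ("https://tools.ietf.org/html/rfc2440", "RFC 2440"),
]


def reference_list(links):
    """Assumed behaviour: deduplicate on the raw description."""
    seen = set()
    lines = []
    for target, description in links:
        key = (description, target)  # "RFC\n4880" differs from "RFC 4880"
        if key in seen:
            continue
        seen.add(key)
        # Whitespace is cleaned only here, after the uniqueness check.
        lines.append(f"[{collapse_whitespace(description)}] {target}")
    return lines


for line in reference_list(LINKS):
    print(line)
```

Under that assumption the sketch prints five entries, with [RFC 4880] and [RFC 1991] listed twice and [RFC 2440] once, which is the same pattern as in the export result.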

I expect Org to do the following steps while parsing the source
text:

1. "Normalize" or clean the link description, i.e. remove any
   newlines and leading and trailing spaces, and replace any
   occurrence of "[ \t]+" in the interior with a single space.
   (To be done.)

2. Check the tuple (description,target) for duplicates and drop
   them. (Seems ok to me.)

3. Below the paragraph, list the tuples as "[description] target"
   in the order of occurrence in the original text. (Also seems
   ok to me.)
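
The three steps can be sketched outside of Org like this. It is a minimal illustration in Python, not a patch for the exporter. The regular expression `LINK_RE` and the functions `normalize_description` and `collect_references` are hypothetical names, and the sketch assumes a newline inside a description should count as ordinary whitespace, so that "RFC\n4880" becomes "RFC 4880".

```python
import re

# Hypothetical, simplified pattern for [[target][description]] links.
LINK_RE = re.compile(r"\[\[([^\]]+)\]\[([^\]]+)\]\]")


def normalize_description(description):
    """Step 1: collapse whitespace runs (newlines included) and trim."""
    return re.sub(r"\s+", " ", description).strip()


def collect_references(org_text):
    """Steps 2 and 3: drop duplicate tuples, keep order of occurrence."""
    seen = set()
    references = []
    for target, description in LINK_RE.findall(org_text):
        key = (normalize_description(description), target)
        if key not in seen:
            seen.add(key)
            references.append(key)
    return references


ORG_SOURCE = '''"OpenPGP Message Format" ([[https://tools.ietf.org/html/rfc4880][RFC
4880]] which obsoletes [[https://tools.ietf.org/html/rfc1991][RFC
1991]] and [[https://tools.ietf.org/html/rfc2440][RFC 2440]])...
...
foo [[https://tools.ietf.org/html/rfc4880][RFC 4880]] bar
baz [[https://tools.ietf.org/html/rfc1991][RFC 1991]] foo
bar [[https://tools.ietf.org/html/rfc2440][RFC 2440]] baz
'''

for description, target in collect_references(ORG_SOURCE):
    print(f"[{description}] {target}")
```

Because the description is normalized before the duplicate check, the example source yields three references instead of five, in the order RFC 4880, RFC 1991, RFC 2440.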

I hope this makes the issue a little clearer now.

Kind regards,
Mathias