* Bug: text export and multi-word link descriptions with line breaks
From: Mathias Bauer @ 2014-04-03 14:28 UTC
To: emacs-orgmode

Dear Maintainers,

I just stumbled over Org's plain text export and how it works on
links with descriptions consisting of multiple words and line
breaks between them. I'm running Org stable version 8.2.5h.

Org source (spaces at the end of line 1 and 2 don't matter):

--------------------snip--------------------
"OpenPGP Message Format" ([[https://tools.ietf.org/html/rfc4880][RFC
4880]] which obsoletes [[https://tools.ietf.org/html/rfc1991][RFC
1991]] and [[https://tools.ietf.org/html/rfc2440][RFC 2440]])...
...
foo [[https://tools.ietf.org/html/rfc4880][RFC 4880]] bar
baz [[https://tools.ietf.org/html/rfc1991][RFC 1991]] foo
bar [[https://tools.ietf.org/html/rfc2440][RFC 2440]] baz
--------------------snip--------------------

Text export result:

--------------------snip--------------------
"OpenPGP Message Format" ([RFC 4880] which obsoletes [RFC 1991] and [RFC
2440])... ... foo [RFC 4880] bar baz [RFC 1991] foo bar [RFC 2440] baz


[RFC 4880] https://tools.ietf.org/html/rfc4880

[RFC 1991] https://tools.ietf.org/html/rfc1991

[RFC 2440] https://tools.ietf.org/html/rfc2440

[RFC 4880] https://tools.ietf.org/html/rfc4880

[RFC 1991] https://tools.ietf.org/html/rfc1991
--------------------snip--------------------

These multiple references look quite bad. Is it possible to
"normalize" the descriptions in some way *before* checking them
for uniqueness and output them thereafter?

Thanks for considering this issue.

Kind regards
Mathias

* Re: Bug: text export and multi-word link descriptions with line breaks
From: Nicolas Goaziou @ 2014-04-03 15:25 UTC
To: emacs-orgmode

Hello,

Mathias Bauer <mbauer@gmx.org> writes:

> I just stumbled over Org's plain text export and how it works on
> links with descriptions consisting of multiple words and line
> breaks between them. I'm running Org stable version 8.2.5h.
>
> Org source (spaces at the end of line 1 and 2 don't matter):
>
> --------------------snip--------------------
> "OpenPGP Message Format" ([[https://tools.ietf.org/html/rfc4880][RFC
> 4880]] which obsoletes [[https://tools.ietf.org/html/rfc1991][RFC
> 1991]] and [[https://tools.ietf.org/html/rfc2440][RFC 2440]])...
> ...
> foo [[https://tools.ietf.org/html/rfc4880][RFC 4880]] bar
> baz [[https://tools.ietf.org/html/rfc1991][RFC 1991]] foo
> bar [[https://tools.ietf.org/html/rfc2440][RFC 2440]] baz
> --------------------snip--------------------
>
> Text export result:
>
> --------------------snip--------------------
> "OpenPGP Message Format" ([RFC 4880] which obsoletes [RFC 1991] and [RFC
> 2440])... ... foo [RFC 4880] bar baz [RFC 1991] foo bar [RFC 2440] baz
>
>
> [RFC 4880] https://tools.ietf.org/html/rfc4880
>
> [RFC 1991] https://tools.ietf.org/html/rfc1991
>
> [RFC 2440] https://tools.ietf.org/html/rfc2440
>
> [RFC 4880] https://tools.ietf.org/html/rfc4880
>
> [RFC 1991] https://tools.ietf.org/html/rfc1991
> --------------------snip--------------------
>
> These multiple references look quite bad. Is it possible to
> "normalize" the descriptions in some way *before* checking them
> for uniqueness and output them thereafter?
>
> Thanks for considering this issue.

Could you be more explicit? What does look quite bad? What did you expect instead?
How is this related to line breaks in the descriptions?

Regards,

-- 
Nicolas Goaziou

* Re: Bug: text export and multi-word link descriptions with line breaks
From: Mathias Bauer @ 2014-04-03 16:30 UTC
To: emacs-orgmode

Hello Nicolas,

* Nicolas Goaziou wrote on 2014-04-03 at 17:25 (+0200):

> Mathias Bauer <mbauer@gmx.org> writes:
>
> > I just stumbled over Org's plain text export and how it works on
> > links with descriptions consisting of multiple words and line
> > breaks between them. I'm running Org stable version 8.2.5h.
> >
> > Org source (spaces at the end of line 1 and 2 don't matter):
> >
> > --------------------snip--------------------
> > "OpenPGP Message Format" ([[https://tools.ietf.org/html/rfc4880][RFC
> > 4880]] which obsoletes [[https://tools.ietf.org/html/rfc1991][RFC
> > 1991]] and [[https://tools.ietf.org/html/rfc2440][RFC 2440]])...
> > ...
> > foo [[https://tools.ietf.org/html/rfc4880][RFC 4880]] bar
> > baz [[https://tools.ietf.org/html/rfc1991][RFC 1991]] foo
> > bar [[https://tools.ietf.org/html/rfc2440][RFC 2440]] baz
> > --------------------snip--------------------
> >
> > Text export result:
> >
> > --------------------snip--------------------
> > "OpenPGP Message Format" ([RFC 4880] which obsoletes [RFC 1991] and [RFC
> > 2440])... ... foo [RFC 4880] bar baz [RFC 1991] foo bar [RFC 2440] baz
> >
> >
> > [RFC 4880] https://tools.ietf.org/html/rfc4880
> >
> > [RFC 1991] https://tools.ietf.org/html/rfc1991
> >
> > [RFC 2440] https://tools.ietf.org/html/rfc2440
> >
> > [RFC 4880] https://tools.ietf.org/html/rfc4880
> >
> > [RFC 1991] https://tools.ietf.org/html/rfc1991
> > --------------------snip--------------------
> >
> > These multiple references look quite bad. Is it possible to
> > "normalize" the descriptions in some way *before* checking
> > them for uniqueness and output them thereafter?
>
> Could you be more explicit? What does look quite bad?
> What did you expect instead? How is related to line breaks in the
> descriptions?

OK, let's go into more detail. Look at the Org source text:

1. There are three links, and each of them appears twice. Both
   occurrences of each link have the same target.

2. Both "[...][RFC 2440]" links sit on a single line; the links
   "[...][RFC 4880]" and "[...][RFC 1991]" each have one occurrence
   with a newline in the description. They are in fact
   "[...][RFC\n4880]" and "[...][RFC 4880]" and, respectively,
   "[...][RFC\n1991]" and "[...][RFC 1991]".

Now let's examine the Org text export:

The final reference part, i.e. the five links below the paragraph,
lists [RFC 4880] and [RFC 1991] twice each, but [RFC 2440] only
once. This is, at the least, inconsistent.

The point is that Org obviously treats "[...][RFC 4880]" and
"[...][RFC\n4880]" as two different links internally and lists both
of them in the reference part. For this listing, the \n is removed.
This is what I called "normalization" in my first post. Human eyes,
however, won't see any difference between these two forms, and the
reader will be surprised.

I expect Org to do the following steps while parsing the source
text:

1. "Normalize" or clean the link description, i.e. remove any
   newlines and any leading and trailing spaces, and replace every
   occurrence of "[ \t]+" in the interior with a single space.
   (To be done.)

2. Check the tuple (description,target) for duplicates and drop
   them. (Seems ok to me.)

3. Below the paragraph, list the tuples as "[description] target"
   in the order of occurrence in the original text. (Also seems
   ok to me.)

I hope this makes the issue a little clearer now.

Kind regards,
Mathias
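
The three steps proposed above can be sketched in a few lines. This is only an illustration in Python, not Org's exporter code (which is written in Emacs Lisp). The names `normalize_description` and `unique_links` are hypothetical, and the sketch assumes that a newline inside a description should collapse to a single space, as the exported "[RFC 4880]" suggests.

```python
import re


def normalize_description(description: str) -> str:
    """Collapse each run of whitespace (spaces, tabs, newlines) into one
    space and strip leading and trailing whitespace."""
    return re.sub(r"\s+", " ", description).strip()


def unique_links(links):
    """Return normalized (description, target) pairs without duplicates,
    in order of first occurrence."""
    seen = set()
    result = []
    for description, target in links:
        key = (normalize_description(description), target)
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


# The six links from the report, in source order. Two descriptions
# contain a newline, exactly as in the Org source.
links = [
    ("RFC\n4880", "https://tools.ietf.org/html/rfc4880"),
    ("RFC\n1991", "https://tools.ietf.org/html/rfc1991"),
    ("RFC 2440", "https://tools.ietf.org/html/rfc2440"),
    ("RFC 4880", "https://tools.ietf.org/html/rfc4880"),
    ("RFC 1991", "https://tools.ietf.org/html/rfc1991"),
    ("RFC 2440", "https://tools.ietf.org/html/rfc2440"),
]

# Step 3: list the surviving tuples as "[description] target".
for description, target in unique_links(links):
    print(f"[{description}] {target}")
```

Because normalization happens before the duplicate check, the reference list for the example shrinks from five entries to three, one per RFC.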

* Re: Bug: text export and multi-word link descriptions with line breaks
From: Nicolas Goaziou @ 2014-04-03 20:54 UTC
To: emacs-orgmode

Mathias Bauer <mbauer@gmx.org> writes:

> I expect, Org to do the following steps while parsing the source
> text:
>
> 1. "Normalize" or clean the link description, i.e. remove any
>    newlines, starting and trailing spaces, and replace any
>    occurrences of "[ \t]+" in the interior by a single space
>    only. (To be done.)
>
> 2. Check the tuple (description,target) for duplicates and drop
>    them. (Seems ok to me.)
>
> 3. Below the paragraph list the tuples as "[description] target"
>    in the order of occurrence in the original text. (Also seems
>    ok to me.)
>
> I hope this makes this issue a little bit more clear now.

Indeed. I missed the duplicate links. This should be fixed.

Thank you for the report.

Regards,

-- 
Nicolas Goaziou