From: Kaushal Modi
Subject: Canonical way to strip off all markup from an element in Org exporter backend?
Date: Wed, 20 Dec 2017 18:30:20 +0000
To: emacs-org list

Hello,

What's the canonical way to strip off all markup from an element in an Org exporter backend?

I do it in this roundabout way in ox-hugo; it works but feels convoluted. The trick is to remove all markup chars from an element while retaining the *, /, `, etc. chars that are *not* used for any markup.

I export Org subtrees to individual posts, where the subtree headline becomes the post title. So I need to sanitize that headline of any markup.

Step 1: Get the HTMLized version of the title.

    (org-export-data-with-backend (plist-get info :title) 'html info)

By getting the HTMLized version of the title, it becomes easy to strip off the HTML tags, which are inserted only for formatting (bold, italics, etc.).

Step 2: Strip off the HTML tags.

    (while (string-match "<\\(?1:[a-z]+\\)[^>]*>\\(?2:[^<]+\\)</\\1>" title)
      (setq title (replace-match "\\2" nil nil title)))

If I use any other exporter, like md, I lose the ability to distinguish a literal * in the title from a * meant for bold/italics markup in Markdown. Even ascii is not good, because then I'd need to do some intensive parsing to figure out whether a ` is a literal ` or part of `code'.

So the question: is this the best way, or is there a canonical way to export an element without any markup chars?

Full actual code[1].

[1]: https://github.com/kaushalmodi/ox-hugo/blob/dffb7e970f33959a0b97fb8df267a54d01a98a2a/ox-hugo.el#L1769-L1802

-- 
Kaushal Modi
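
For illustration, here is a minimal sketch of Step 2 wrapped as a standalone function, so the tag-stripping loop can be tried on a plain string without running an export. The name my/strip-html-tags is hypothetical; it is not part of Org or ox-hugo. The sketch assumes the input is the lowercase-tag HTML produced by Step 1, and it makes two small changes from the loop above: it binds case-fold-search to nil and passes FIXEDCASE as t, so the kept text is never re-cased.

```elisp
;;; -*- lexical-binding: t; -*-

(defun my/strip-html-tags (html)
  "Return HTML with simple paired tags removed, keeping their inner text.
Hypothetical helper for illustration; not part of Org or ox-hugo."
  (let ((case-fold-search nil)          ; match lowercase tag names only
        (text html))
    ;; Each iteration unwraps one <tag ...>text</tag> pair whose body
    ;; contains no further tags.  Repeating until nothing matches also
    ;; unwraps nested markup such as <b><i>x</i></b>, innermost first.
    (while (string-match "<\\(?1:[a-z]+\\)[^>]*>\\(?2:[^<]+\\)</\\1>" text)
      ;; FIXEDCASE is t so the replacement text is inserted unchanged.
      (setq text (replace-match "\\2" t nil text)))
    text))

;; Literal characters that are not markup survive untouched:
(my/strip-html-tags "A <b>bold</b> and <code>a * b</code> title")
;; => "A bold and a * b title"
```

The sketch only handles paired tags with non-empty text bodies. It does not decode HTML entities such as &amp; or remove unpaired tags, so it is a demonstration of the loop rather than a general HTML-to-text converter.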