emacs-orgmode@gnu.org archives
* Canonical way to strip off all markup from an element in Org exporter backend?
@ 2017-12-20 18:30 Kaushal Modi
  2017-12-20 22:04 ` Nicolas Goaziou
  0 siblings, 1 reply; 7+ messages in thread
From: Kaushal Modi @ 2017-12-20 18:30 UTC (permalink / raw)
  To: emacs-org list

Hello,

What's the canonical way to strip off all markup from an element in an Org
exporter backend?

I do it in this roundabout way in ox-hugo. It works, but feels convoluted.
The trick is to remove all markup chars from an element while retaining the
*, /, `, etc. chars that are *not* used for any markup.

I export Org subtrees to individual posts, where the subtree headline will
become the post title. So I need to sanitize that headline of any markup.

Step 1: I get the HTMLized version of the title:

(org-export-data-with-backend (plist-get info :title) 'html info)

By getting the HTMLized version of the title, it becomes easy to strip
off the HTML tags, which are inserted only for formatting (bold,
italics, etc.).

Step 2: Strip off the HTML tags.

(while (string-match "<\\(?1:[a-z]+\\)[^>]*>\\(?2:[^<]+\\)</\\1>" title)
  (setq title (replace-match "\\2" nil nil title)))

If I use any other exporter, such as md, I lose the ability to distinguish
a literal * in the title from a * meant for bold/italics markup in
Markdown. Even ascii is not good, because then I'd need to do some intensive
parsing to figure out whether ` is a literal ` or part of `code'.
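
To make Step 2 concrete, here is a minimal sketch that wraps the loop above in
a function operating on a plain string. The helper name
`my/strip-simple-html-tags' is hypothetical and not taken from ox-hugo, and
passing t as FIXEDCASE to `replace-match' is an assumption of this sketch (the
loop above passes nil).

```elisp
;; Hypothetical helper; the name is illustrative, not part of ox-hugo.
(defun my/strip-simple-html-tags (title)
  "Return TITLE with simple paired HTML tags removed, keeping their text.
Only tags of the form <tag ...>text</tag> are handled."
  ;; Group 1 is the tag name, group 2 the text between the tags.
  (while (string-match "<\\(?1:[a-z]+\\)[^>]*>\\(?2:[^<]+\\)</\\1>" title)
    ;; FIXEDCASE is t (an assumption of this sketch) so that the kept
    ;; text is never re-cased by `replace-match'.
    (setq title (replace-match "\\2" t nil title)))
  title)
```

In the setting described here, the argument would be the string returned by
(org-export-data-with-backend (plist-get info :title) 'html info). Nested tags
are unwrapped from the inside out, one pair per loop iteration, and a literal
* outside any tag is left alone.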

So the question: Is this the best way, or is there a canonical way to
export an element without any markup chars?

Full actual code[1].

[1]:
https://github.com/kaushalmodi/ox-hugo/blob/dffb7e970f33959a0b97fb8df267a54d01a98a2a/ox-hugo.el#L1769-L1802
-- 

Kaushal Modi

* Re: Canonical way to strip off all markup from an element in Org exporter backend?
  2017-12-20 18:30 Canonical way to strip off all markup from an element in Org exporter backend? Kaushal Modi
@ 2017-12-20 22:04 ` Nicolas Goaziou
  2017-12-20 22:11   ` Kaushal Modi
  0 siblings, 1 reply; 7+ messages in thread
From: Nicolas Goaziou @ 2017-12-20 22:04 UTC (permalink / raw)
  To: Kaushal Modi; +Cc: emacs-org list

Hello,

Kaushal Modi <kaushal.modi@gmail.com> writes:

> What's the canonical way to strip off all markup from an element in an Org
> exporter backend.

You could write a dedicated (anonymous) back-end for that, if you have
a clear idea about what "strip off all markup" means.

Regards,

-- 
Nicolas Goaziou

* Re: Canonical way to strip off all markup from an element in Org exporter backend?
  2017-12-20 22:04 ` Nicolas Goaziou
@ 2017-12-20 22:11   ` Kaushal Modi
  2017-12-20 22:27     ` Nicolas Goaziou
  0 siblings, 1 reply; 7+ messages in thread
From: Kaushal Modi @ 2017-12-20 22:11 UTC (permalink / raw)
  To: Nicolas Goaziou; +Cc: emacs-org list

On Wed, Dec 20, 2017 at 5:04 PM Nicolas Goaziou <mail@nicolasgoaziou.fr>
wrote:

> You could write a dedicated (anonymous) back-end for that,


Thanks! I feared so. Then the strip-HTML-tags approach seems to be the
quickest.


> if you have a clear idea about what "strip off all markup" means.
>

Would there be any interest in adding that to the core, as something like a
"base class" for exporter backends?

"strip off all markup" simply means export something like "*abc* /def/
=ghi= ~jkl~ +mno+ _pqr_" as "abc def ghi jkl mno pqr". Think of that as a
backend without even the minimal adornment that ox-ascii has... and
ox-ascii can be a derived backend from this one.
-- 

Kaushal Modi

* Re: Canonical way to strip off all markup from an element in Org exporter backend?
  2017-12-20 22:11   ` Kaushal Modi
@ 2017-12-20 22:27     ` Nicolas Goaziou
  2017-12-20 22:41       ` Kaushal Modi
  0 siblings, 1 reply; 7+ messages in thread
From: Nicolas Goaziou @ 2017-12-20 22:27 UTC (permalink / raw)
  To: Kaushal Modi; +Cc: emacs-org list

Kaushal Modi <kaushal.modi@gmail.com> writes:

> Thanks! I feared so. Then the strip-HTML-tags approach seems to be the
> quickest.

You must be kidding. It must be around 8 locs. See for example
`org-export-toc-entry-backend'.

> Would there be an interest to add that to the core something like a "base
> class" of exporter backends?

I fail to see how it could be generally useful.

> "strip off all markup" simply means export something like "*abc* /def/
> =ghi= ~jkl~ +mno+ _pqr_" as "abc def ghi jkl mno pqr". Think of that as a
> backend without even the minimal adornment that ox-ascii has... and
> ox-ascii can be a derived backend from this one.

What are you doing with entities, footnotes, latex-snippets...?

* Re: Canonical way to strip off all markup from an element in Org exporter backend?
  2017-12-20 22:27     ` Nicolas Goaziou
@ 2017-12-20 22:41       ` Kaushal Modi
  2017-12-21 14:22         ` Nicolas Goaziou
  0 siblings, 1 reply; 7+ messages in thread
From: Kaushal Modi @ 2017-12-20 22:41 UTC (permalink / raw)
  To: Nicolas Goaziou; +Cc: emacs-org list

On Wed, Dec 20, 2017 at 5:27 PM Nicolas Goaziou <mail@nicolasgoaziou.fr>
wrote:

> You must be kidding. It must be around 8 locs. See for example
> `org-export-toc-entry-backend'.
>

Thank you! That function is educational. I'll play more with that idea. It
will be a lot more verbose than the 3-line solution I have right now, but
I am intrigued enough to still try it out and see how it turns out.

> I fail to see how it could be generally useful.
>

It can be used wherever just the element content is needed without
formatting properties, as in my case, where the element title needs to
be extracted without any formatting.


> What are you doing with entities, footnotes, latex-snippets...?
>

I haven't yet invested any time into serious development of this "base
class" backend. The idea of this exporter is to give formatting-free output
(like when you select the plain text option in an email client). So, off the
top of my head, entities would be translated to the correct Unicode chars,
footnote behavior could be the same as in ox-ascii, and latex-snippets could
stay in their raw ASCII form.

If there is interest in moving this forward, I can come up with a "raw"
backend spec, and we can discuss the details.
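
As a rough illustration of the behavior described above, and not a proposed
spec, an anonymous backend derived from ox-ascii could override only the
entity and latex-fragment transcoders and inherit everything else, footnotes
included. The variable name `my/org-raw-backend' is hypothetical. The sketch
assumes that entity objects carry a :utf-8 property and that latex fragments
keep their source text in :value, which as far as I can tell is what ox-ascii
itself reads.

```elisp
(require 'ox)
(require 'ox-ascii)

;; Hypothetical variable name; this is a sketch, not part of Org.
(defvar my/org-raw-backend
  (org-export-create-backend
   ;; Everything not listed below, footnotes included, falls back to ox-ascii.
   :parent 'ascii
   :transcoders
   (list
    ;; Entities such as \alpha become their Unicode character.
    (cons 'entity
          (lambda (entity _contents _info)
            (org-element-property :utf-8 entity)))
    ;; LaTeX fragments are passed through as written in the source.
    (cons 'latex-fragment
          (lambda (fragment _contents _info)
            (org-element-property :value fragment)))))
  "Sketch of a \"raw\" export backend layered on top of ox-ascii.")
```

A string can be run through it with
(org-export-string-as "\\alpha" my/org-raw-backend t), where the final t asks
for the body only.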
-- 

Kaushal Modi

* Re: Canonical way to strip off all markup from an element in Org exporter backend?
  2017-12-20 22:41       ` Kaushal Modi
@ 2017-12-21 14:22         ` Nicolas Goaziou
  2017-12-22 20:31           ` Kaushal Modi
  0 siblings, 1 reply; 7+ messages in thread
From: Nicolas Goaziou @ 2017-12-21 14:22 UTC (permalink / raw)
  To: Kaushal Modi; +Cc: emacs-org list

Hello,

Kaushal Modi <kaushal.modi@gmail.com> writes:

> Thank you! That function is educational. I'll play more with that idea. It
> will be a lot more verbose than the 3 line solution I have right now..

    (let ((no-thrill (lambda (o c _) (or c (org-element-property :value o)))))
      (org-export-create-backend
       :parent 'ascii           ;or `hugo', depending on what you mean
       :transcoders (mapcar (lambda (type) (cons type no-thrill))
                            '(bold code italic strike-through underline verbatim))))

Five locs. Not bad either.
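
A possible way to try the snippet on a string, offered as a sketch rather than
as what ox-hugo should do: the wrapper name `my/org-strip-emphasis' is
hypothetical, and calling `org-export-string-as' with BODY-ONLY set to t is an
assumption made here to keep the example self-contained.

```elisp
(require 'ox)
(require 'ox-ascii)
(require 'subr-x)                       ; for `string-trim' on older Emacs

;; Hypothetical wrapper name; only the backend definition comes from above.
(defun my/org-strip-emphasis (string)
  "Export Org STRING through ox-ascii with emphasis markers removed."
  (let* ((no-thrill (lambda (o c _) (or c (org-element-property :value o))))
         (backend (org-export-create-backend
                   :parent 'ascii
                   :transcoders
                   (mapcar (lambda (type) (cons type no-thrill))
                           '(bold code italic strike-through
                             underline verbatim)))))
    ;; BODY-ONLY is t so that no title or author block is emitted; the
    ;; trailing newline added by the exporter is trimmed.
    (string-trim (org-export-string-as string backend t))))
```

The `or' in `no-thrill' covers both kinds of objects: bold, italic,
strike-through and underline have contents, while code and verbatim have none
and keep their text in :value. Inside a back-end, the same anonymous backend
could instead be passed to `org-export-data-with-backend' together with the
title data and the info plist.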

> It can be used wherever just the element content is needed without
> formatting properties, like in my case where the element title is needed to
> be extracted without any formatting.

So far, no major back-end needs this. Also, it is very simple to provide
the back-end above.

> I haven't yet invested any time into serious development of this "base
> class" backend. The idea of this exporter is to give formatting-free output
> (like when you select plain text option in an email client).. so at whim,
> entities will be translated to the correct unicode chars, footnotes
> behavior could be the same as ox-ascii, and latex-snippets can stay in the
> raw ascii form.

You're basically describing `ox-ascii' with stripped emphasis markers.
At this point, I'm not convinced we need this in Org proper.


Regards,

-- 
Nicolas Goaziou                                                0x80A93738

* Re: Canonical way to strip off all markup from an element in Org exporter backend?
  2017-12-21 14:22         ` Nicolas Goaziou
@ 2017-12-22 20:31           ` Kaushal Modi
  0 siblings, 0 replies; 7+ messages in thread
From: Kaushal Modi @ 2017-12-22 20:31 UTC (permalink / raw)
  To: Nicolas Goaziou; +Cc: emacs-org list

On Thu, Dec 21, 2017 at 9:22 AM Nicolas Goaziou <mail@nicolasgoaziou.fr>
wrote:

>     (let ((no-thrill (lambda (o c _) (or c (org-element-property :value o)))))
>       (org-export-create-backend
>        :parent 'ascii           ;or `hugo', depending on what you mean
>        :transcoders (mapcar (lambda (type) (cons type no-thrill))
>                             '(bold code italic strike-through underline verbatim))))
>
> Five locs. Not bad either.
>

Thank you. That also looks like a cleaner way to implement what I want.

> You're basically describing `ox-ascii' with stripped emphasis markers.
>

Exactly. That's why I suggested extending ox-ascii from this "raw" backend.

> At this point, I'm not convinced we need this in Org proper.
>

That's understood. No problem. The snippet you suggested above serves the
purpose very well for now.

Thanks!
-- 

Kaushal Modi
