emacs-orgmode@gnu.org archives
* HTML --> Org-mode?
@ 2015-01-26 20:15 Matt Price
  2015-01-26 20:31 ` Tory S. Anderson
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Matt Price @ 2015-01-26 20:15 UTC (permalink / raw)
  To: Org Mode


Hmm,

Looks like I asked this about a year ago and didn't follow up on it.  Does
anyone know a way to generate org-mode syntax from an html string? I would
like to extend zotxt slightly (see my last post) and at present zotxt can
pull citations & bibliography entries from Zotero only in plain-text and
HTML form.  The plaintext form loses information, so I would like to
translate the HTML into org-mode syntax.

Since this would have to happen in the context of an

(org-add-link-type  )

invocation, it would be best if this could be done directly in emacs
somehow...

Thanks as always,

Matt


* Re: HTML --> Org-mode?
From: Tory S. Anderson @ 2015-01-26 20:31 UTC (permalink / raw)
  To: Matt Price; +Cc: Org Mode

Using the magic wizard program Pandoc, I just had success with a simple little example: 

    pandoc -o test.org test.html

Input test.html:
    <html>
      <body>
          <strong>TEST strong!</strong>
        <div class='table'>
          <div class='cell'>Cell 1</div>
          <div class='cell'>Cell 2</div>
          <div class='cell'>Cell 3</div>
          <div class='cell'>Cell 4</div>
        </div>
      </body>
    </html>

Output test.org:
    *TEST strong!*
    Cell 1
    Cell 2
    Cell 3
    Cell 4

I'm not sure how sophisticated the strings you are dealing with are, but pandoc might do the trick for you.
- Tory



* Re: HTML --> Org-mode?
From: Matt Price @ 2015-01-27  4:42 UTC (permalink / raw)
  To: Org Mode


I think the answer may be something like:

(shell-command-to-string (concat  "pandoc -f html -t org <<< '"  :html "'" ))

Though I'm not quite sure how to go about it just yet.
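
One way to avoid shell quoting altogether is to hand the string to pandoc on standard input. The following is a minimal sketch, not a tested zotxt patch: `my/html-to-org' is a hypothetical helper name, and it assumes a `pandoc' executable can be found on `exec-path'.

```elisp
(defun my/html-to-org (html)
  "Return the string HTML converted to Org syntax by pandoc.
Hypothetical helper; assumes pandoc is installed and on `exec-path'."
  (with-temp-buffer
    (insert html)
    ;; Send the buffer contents to pandoc on stdin.  DELETE = t removes
    ;; the HTML, BUFFER = t inserts pandoc's output in its place.
    (let ((status (call-process-region (point-min) (point-max)
                                       "pandoc" t t nil
                                       "-f" "html" "-t" "org")))
      ;; STATUS is 0 on success; anything else is an exit code or a
      ;; signal description, and the buffer then holds pandoc's message.
      (unless (eq status 0)
        (error "pandoc failed: %s" (buffer-string)))
      (buffer-string))))
```

Calling (my/html-to-org "<b>how are you?</b>") returns whatever pandoc prints for that input, so no here-string or single-quote escaping is needed.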

On Mon, Jan 26, 2015 at 3:50 PM, Tory S. Anderson <torys.anderson@gmail.com>
wrote:

> man pandoc will be your friend. It guided me to the following simple
> (interactive) use:
>
> pandoc -f html -t org
> <b> how are you? </b>
> <i> I am good </i>
> *how are you?* /I am good/
>
> I won't be able to help you much farther than that, though.
> - Tory
>
> Matt Price <moptop99@gmail.com> writes:
>
> > That should be enough. I would need to feed a string from Emacs to
> > pandoc, then capture the output as a new string that can be output in
> > the export filter. Do you know how to do that part?
> > Thanks,
> > Matt

* Re: HTML --> Org-mode?
From: Willem Rein Oudshoorn @ 2015-01-27  8:23 UTC (permalink / raw)
  To: emacs-orgmode

Matt Price <moptop99@gmail.com> writes:

> Hmm,
>
> Looks like I asked this about a year ago and didn't follow up on it.  Does
> anyone know a way to generate org-mode syntax from an html string? I would
> like to extend zotxt slightly (see my last post) and at present zotxt can
> pull citations & bibliography entries from Zotero only in plain-text and
> HTML form.  The plaintext form loses information, so I would like to
> translate the HTML into org-mode syntax.
>
> Since this would have to happen in the context of an
>
> (org-add-link-type  )
>
> invocation, it would be best if this could be done directly in emacs
> somehow...
>
> Thanks as always,

Not sure if this helps, but I recently hacked some code to convert a
simple subset of HTML to org.   It is quite hackish and not nearly
finished (and I might never finish).

However it is pure elisp.

Code is at:

http://github.com/woudshoo/html-2-org

Usage is:

 (h-2-o-insert-org-source-for-html HTML-STRING)

which inserts an Org blob in the current buffer at point.

Wim Oudshoorn.


* Re: HTML --> Org-mode?
From: Eric S Fraga @ 2015-01-27  8:27 UTC (permalink / raw)
  To: Matt Price; +Cc: Org Mode

On Monday, 26 Jan 2015 at 15:15, Matt Price wrote:
> Hmm,
>
> Looks like I asked this about a year ago and didn't follow up on it.  Does
> anyone know a way to generate org-mode syntax from an html string? I would
> like to extend zotxt slightly (see my last post) and at present zotxt can

If you see my response to your last post, maybe use Zotero to generate
bibtex entries and use these directly...
-- 
: Eric S Fraga (0xFFFCF67D), Emacs 25.0.50.1, Org release_8.3beta-726-gd34b80.dirty


* Re: HTML --> Org-mode?
From: Albert Krewinkel @ 2015-01-27  9:51 UTC (permalink / raw)
  To: Matt Price; +Cc: Org Mode

Matt Price <moptop99@gmail.com> writes:
> Looks like I asked this about a year ago and didn't follow up on it.  Does
> anyone know a way to generate org-mode syntax from an html string?

My usual answer to this is Pandoc together with Pandoc-mode.  Whether
or not this counts as

> done directly in emacs

is debatable, however.

Cheers,
Albert

Pandoc: http://johnmacfarlane.net/pandoc/
Pandoc-mode: https://joostkremers.github.io/pandoc-mode/


-- 
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe  e836 388d c0b2 1f63 1124


* Re: HTML --> Org-mode?
From: Matt Price @ 2015-01-27 11:55 UTC (permalink / raw)
  To: Willem Rein Oudshoorn, Org Mode


Thank you Willem,

This looks very helpful.

I am trying the code out in a scratch buffer and I am unable to generate org
syntax, e.g. with this test string:

(h-2-o-insert-org-source-for-html  "<p><a
href=\"http://example.org/\">hello</a>
<span style=\"font-style:italic;\">world<br/> foo</span></p>" )


Am I doing something wrong?  Thanks,
Matt



* Re: HTML --> Org-mode?
From: Wim Oudshoorn @ 2015-01-27 13:58 UTC (permalink / raw)
  To: Matt Price; +Cc: Org Mode, Willem Rein Oudshoorn

> Thank you Willem,
>
> This looks very helpful.
>
> I am trying the code out in a scratch buffer and I am unable to generate
> org
> syntax, e.g. with this test string:
>
> (h-2-o-insert-org-source-for-html  "<p><a
> href=\"http://example.org/\">hello</a>
> <span style=\"font-style:italic;\">world<br/> foo</span></p>" )
>
>
> Am I doing something wrong?  Thanks,
> Matt


No, you are not doing anything wrong.
What happens is that:

1. I have not implemented links.  This should be easy
   to fix and I will do this when I get home.
2. Similarly for the <span> element.   However, I do not
   know exactly what the corresponding Org syntax should be.
   Would you expect:

   /world/
   /foo/

   Or is there a structured element in org mode that renders
   a block as italics?

I will see what I can do about point 2.

In general, I am a bit torn between:

- faithful transformation.
- Getting readable org source.

I am leaning towards the readable org source and
sacrificing the faithfulness.

Wim Oudshoorn.


* Re: HTML --> Org-mode?
From: Willem Rein Oudshoorn @ 2015-01-27 18:59 UTC (permalink / raw)
  To: emacs-orgmode

Matt Price <moptop99@gmail.com> writes:

> Thank you Willem,
>
> This looks very helpful.
>
> I am trying the code out in a scratch buffer and I am unable to generate org
> syntax, e.g. with this test string:
>
> (h-2-o-insert-org-source-for-html  "<p><a
> href=\"http://example.org/\">hello</a>
> <span style=\"font-style:italic;\">world<br/> foo</span></p>" )
>
> Am I doing something wrong?  Thanks,

I added the <a href...> conversion, so that part should work now.
However, I am not sure I want to do the <span style=...> thing.
It seems that this requires some CSS parsing to do the right thing.

Also in this example, the italics would span multiple lines, for which I
think the only way to represent this in org is by enclosing each line in
a pair of '/'.
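
The per-line approach described above can be sketched as follows. This is only an illustration and not code from html-2-org: `my/org-italicize-lines' is a hypothetical name, and it assumes the <br/> elements have already been turned into newlines.

```elisp
(require 'subr-x) ; for `string-trim' and `string-empty-p'

(defun my/org-italicize-lines (text)
  "Wrap each non-blank line of TEXT in Org italic markers.
Hypothetical helper; lines are trimmed because Org emphasis
markers must sit directly against non-whitespace text."
  (mapconcat (lambda (line)
               (let ((trimmed (string-trim line)))
                 (if (string-empty-p trimmed)
                     ""                          ; keep blank lines blank
                   (concat "/" trimmed "/"))))
             (split-string text "\n")
             "\n"))
```

For the test string in this thread, (my/org-italicize-lines "world\n foo") gives the two lines /world/ and /foo/.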

However, is there a specific use case where you need the conversion from
the <span style='...'> to be precise?   (At the moment it will still
output the content of the <span> but it will not be formatted.)

If there is a specific use case, I could maybe make it work for that.

Personally, I use it to edit text fields in Salesforce,
for which I do not care too much whether the round trip yields perfect
results.


Wim Oudshoorn


* Re: HTML --> Org-mode?
From: Matt Price @ 2015-01-28  2:12 UTC (permalink / raw)
  To: Willem Rein Oudshoorn; +Cc: Org Mode


Hi Wim,

I am trying to parse the output of citeproc-js, which does produce those
spans -- not very elegant, I must say, but that's what I'm working with.
I will try this again in about a day, when I am done with the hard part of
my week!  Thank you very much,
Matt

