emacs-orgmode@gnu.org archives
From: Le Wang <l26wang@gmail.com>
To: Puneeth Chaganti <punchagan@gmail.com>
Cc: emacs-orgmode@gnu.org
Subject: Re: importing google docs document into org
Date: Fri, 18 Feb 2011 20:27:31 +0800
Message-ID: <AANLkTinUbgiKAAqm+LAQdysKBnw0qdPUY3AsgUpCLb6_@mail.gmail.com>
In-Reply-To: <AANLkTikw0Mko3AyRBfxbJ5-0C9kSUmQiz+aKXf=a7q4c@mail.gmail.com>


On Fri, Feb 18, 2011 at 1:24 PM, Puneeth Chaganti <punchagan@gmail.com> wrote:

> You could try using Pandoc [1]. It can parse various markups
> (including html) and reformat it into various other markups (including
> org). There may be a few rough ends in the org-exporter, but it might
> be good enough for you.
>

Outstanding.  I exported the document from Google Docs to HTML, then used
pandoc to convert it to Org.  The conversion wasn't perfect and I had to
edit a few things by hand, but it wasn't too painful.

Here are some support functions I wrote to scrub the hyperlinks.

(require 'url-util)  ; provides `url-unhex-string'

(defun le::fix-google-doc-link ()
  "Unhexify Google redirect URLs and collapse any repeated links."
  (interactive)
  ;; Matches "[http://www.google.com/url?q=TARGET&sa=...]"; group 1 is
  ;; the percent-encoded TARGET.
  (let ((link-regexp
         "\\[http://www\\.google\\.com/url\\?q=\\(.*?\\)&sa=.*?\\]"))
    (save-excursion
      (goto-char (point-min))
      (le::collapse-org-consecutive-links)
      (goto-char (point-min))
      (while (re-search-forward link-regexp nil t)
        (replace-match (concat "["
                               (save-match-data
                                 (url-unhex-string
                                  (match-string-no-properties 1)))
                               "]")
                       t t)))))


(defun le::collapse-org-consecutive-links ()
  "Collapse adjacent Org links that share a URL into one link.
Pandoc's conversion of Google Docs HTML sometimes breaks a
multi-word hyperlink into individual links.  Starting at point,
this function merges such links back into one."
  (interactive)
  (let ((link-regex
         "\\[\\[\\([^[:space:]]*?\\)\\]\\[\\(\\(?:.\\|\n\\)*?\\)\\]\\]")
        url
        text)
    (while (re-search-forward link-regex nil t)
      (setq url (match-string-no-properties 1)
            text (match-string-no-properties 2))
      ;; Absorb every immediately following link with the same URL.
      (save-match-data
        (while (and (looking-at link-regex)
                    (string-equal url (match-string-no-properties 1)))
          (setq text (concat text (match-string-no-properties 2)))
          (replace-match "" t t)))
      ;; Match data again describes the first link; update its text.
      (unless (string-equal text (match-string-no-properties 2))
        (replace-match text t t nil 2)))))
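
To see what the unwrapping step does to a single URL, here is a minimal
sketch that works on a string instead of a buffer.  The helper name
`my/unwrap-google-redirect' and the sample URL are made up for
illustration; only `url-unhex-string' and `string-match' are stock Emacs.

```elisp
(require 'url-util)  ; provides `url-unhex-string'

;; Hypothetical helper, not part of the functions above.
(defun my/unwrap-google-redirect (url)
  "Return the target of a Google redirect URL, or URL itself if it is not one."
  (if (string-match
       "\\`http://www\\.google\\.com/url\\?q=\\(.*?\\)&sa=" url)
      ;; Group 1 is the percent-encoded target; decode it.
      (url-unhex-string (match-string 1 url))
    url))

;; Made-up sample input in the shape Google Docs exports.
(my/unwrap-google-redirect
 "http://www.google.com/url?q=http%3A%2F%2Forgmode.org%2F&sa=D&sntz=1")
;; => "http://orgmode.org/"
```

A URL that is not a Google redirect comes back unchanged, so the helper
should be safe to map over every link in a file.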



-- 
Le


Thread overview: 8+ messages
2011-02-17 15:43 importing google docs document into org Le Wang
2011-02-17 18:41 ` Puneeth Chaganti
2011-02-18  5:02   ` Le Wang
2011-02-18  5:24     ` Puneeth Chaganti
2011-02-18 12:27       ` Le Wang [this message]
2011-02-18 12:57         ` Puneeth Chaganti
2011-02-20  6:31           ` Le Wang
2011-02-20  8:00             ` Puneeth Chaganti
