On Fri, Feb 18, 2011 at 1:24 PM, Puneeth Chaganti <punchagan@gmail.com> wrote:
You could try using Pandoc [1]. It can parse various markups
(including HTML) and reformat them into various other markups (including
Org). There may be a few rough edges in the Org exporter, but it might
be good enough for you.

Outstanding.  I exported the document from Google Docs to HTML, then used pandoc to convert it to Org.  The conversion wasn't perfect and I had to fix a few things by hand, but it wasn't too painful.
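
For anyone who wants to run the conversion step from inside Emacs, here is a minimal sketch. It assumes pandoc is installed and on the PATH; the function name le::html-to-org is hypothetical and not part of the original workflow, and the only pandoc options used are the standard -f, -t and -o.

```elisp
;; Hypothetical helper, for illustration only.
(defun le::html-to-org (html-file org-file)
  "Convert HTML-FILE to Org markup with pandoc, writing ORG-FILE.
Return ORG-FILE on success and signal an error otherwise."
  (unless (executable-find "pandoc")
    (error "pandoc not found on PATH"))
  ;; Equivalent to: pandoc -f html -t org -o ORG-FILE HTML-FILE
  (let ((status (call-process "pandoc" nil nil nil
                              "-f" "html" "-t" "org"
                              "-o" (expand-file-name org-file)
                              (expand-file-name html-file))))
    (unless (eql status 0)
      (error "pandoc failed with status %s" status))
    org-file))
```

Something like (le::html-to-org "export.html" "export.org") would then produce the Org file; both file names are placeholders.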

Here are some support functions I made to scrub the hyperlinks.

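(require 'url-util)  ; provides `url-unhex-string'

(defun le::fix-google-doc-link ()
  "Unhexify Google redirect URLs and collapse any repeated links.
Rewrites each link target of the form
http://www.google.com/url?q=TARGET&sa=... to the decoded TARGET."
  (interactive)
  ;; Group 1 captures the percent-encoded target URL.
  (let ((link-regexp "\\[http://www\\.google\\.com/url\\?q=\\(.*?\\)&sa=.*?\\]"))
    (save-excursion
      (goto-char (point-min))
      ;; Merge split links first, so fewer URLs need rewriting.
      (le::collapse-org-consecutive-links)
      (goto-char (point-min))
      (while (re-search-forward link-regexp nil t)
        (replace-match (concat "["
                               (save-match-data
                                 (url-unhex-string (match-string-no-properties 1)))
                               "]") t t)))))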
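
To make the unhexify step concrete, here is a small sketch that applies the same idea to a single string instead of a buffer. The helper name le::google-redirect-target and the sample URL are made up for illustration; only `url-unhex-string' and `string-match' are relied on.

```elisp
(require 'url-util)  ; provides `url-unhex-string'

;; Hypothetical helper, for illustration only.
(defun le::google-redirect-target (url)
  "Return the decoded target of Google redirect URL, or nil if none."
  (when (string-match "\\`http://www\\.google\\.com/url\\?q=\\(.*?\\)&sa=" url)
    ;; Group 1 is the percent-encoded target; decode it.
    (url-unhex-string (match-string 1 url))))

(le::google-redirect-target
 "http://www.google.com/url?q=http%3A%2F%2Forgmode.org%2F&sa=D&sntz=1")
;; => "http://orgmode.org/"
```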


(defun le::collapse-org-consecutive-links ()
  "Collapse consecutive Org links that share the same URL into one link.
Pandoc's conversion of Google Docs HTML sometimes breaks a
multi-word hyperlink into individual links.  Works from point to
the end of the buffer."
  (interactive)
  ;; Group 1 is the link URL, group 2 the description (may span lines).
  (let ((link-regex "\\[\\[\\([^[:space:]]*?\\)\\]\\[\\(\\(?:.\\|\n\\)*?\\)\\]\\]")
        url
        text)
    (while (re-search-forward link-regex nil t)
      (setq url (match-string-no-properties 1)
            text (match-string-no-properties 2))
      ;; Absorb every immediately following link with the same URL.
      (save-match-data
        (while (and (looking-at link-regex)
                    (string-equal url (match-string-no-properties 1)))
          (setq text (concat text (match-string-no-properties 2)))
          (replace-match "")))
      ;; Match data again describes the first link; update its description.
      (unless (string-equal text (match-string-no-properties 2))
        (replace-match text t t nil 2)))))
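
A quick way to try both functions on a string rather than a whole buffer is sketched below. It assumes the two functions above have already been evaluated; the wrapper le::fix-links-in-string and the sample links are hypothetical.

```elisp
;; Hypothetical wrapper, for illustration only.  Requires
;; `le::fix-google-doc-link' and `le::collapse-org-consecutive-links'.
(defun le::fix-links-in-string (org-text)
  "Return ORG-TEXT with split links merged and redirect URLs decoded."
  (with-temp-buffer
    (insert org-text)
    (le::fix-google-doc-link)
    (buffer-string)))

;; Two adjacent links with the same redirect URL become one clean link.
(le::fix-links-in-string
 (concat
  "[[http://www.google.com/url?q=http%3A%2F%2Forgmode.org%2F&sa=D&sntz=1][Org]]"
  "[[http://www.google.com/url?q=http%3A%2F%2Forgmode.org%2F&sa=D&sntz=1][ mode]]"))
;; => "[[http://orgmode.org/][Org mode]]"
```

Note that only links that directly follow one another are merged; links separated by whitespace are left alone.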


--
Le