You could try Pandoc [1]. It can parse various markup formats (including HTML) and convert them into various others (including
Org). The Org writer may have a few rough edges, but it might
be good enough for you.

(require 'url-util)  ; provides `url-unhex-string'

(defun le::fix-google-doc-link ()
  "Unhexify URLs and collapse any repeated links."
  (interactive)
  (let ((link-regexp "\\[http://www.google.com/url\\?q=\\(.*?\\)&sa=.*?\\]"))
    (save-excursion
      (goto-char (point-min))
      ;; First merge links that pandoc split into several pieces.
      (le::collapse-org-consecutive-links)
      (goto-char (point-min))
      ;; Then replace each Google redirect URL with its decoded target.
      (while (re-search-forward link-regexp nil t)
        (replace-match (concat "["
                               (save-match-data
                                 (url-unhex-string (match-string-no-properties 1)))
                               "]")
                       t t)))))

(defun le::collapse-org-consecutive-links ()
  "Collapse consecutive Org links that share a URL.
Pandoc conversion of Google Docs HTML sometimes breaks a
multi-word hyperlink into individual links.  This function
collapses them back into one link."
  (interactive)
  (let ((link-regex "\\[\\[\\([^[:space:]]*?\\)\\]\\[\\(\\(?:.\\|\n\\)*?\\)\\]\\]")
        url text)
    (while (re-search-forward link-regex nil t)
      (setq url (match-string-no-properties 1)
            text (match-string-no-properties 2))
      (save-match-data
        (catch 'done
          ;; Absorb every immediately following link with the same URL.
          (while (looking-at link-regex)
            (if (string-equal url (match-string-no-properties 1))
                (progn
                  (setq text (concat text (match-string-no-properties 2)))
                  (replace-match ""))
              (throw 'done nil)))))
      ;; Rewrite the description only if something was merged into it.
      (unless (string-equal text (match-string-no-properties 2))
        (replace-match text t t nil 2)))))
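
For what it's worth, here is a rough sketch of how the two steps might be chained from inside Emacs: run pandoc on an HTML file, then clean up the links in the result. The name my/html-file-to-org and the *pandoc-org* buffer are made up for illustration. It assumes pandoc is on your exec-path and that le::fix-google-doc-link from above is already loaded.

```elisp
;; Hypothetical helper, not part of the functions above.
;; Assumes `pandoc' is on `exec-path' and `le::fix-google-doc-link' is defined.
(defun my/html-file-to-org (html-file)
  "Convert HTML-FILE to Org with pandoc and show the cleaned result."
  (interactive "fHTML file: ")
  (let ((buf (generate-new-buffer "*pandoc-org*")))
    (with-current-buffer buf
      ;; -f html -t org: read HTML, write Org.  Output is inserted into BUF.
      (unless (eq 0 (call-process "pandoc" nil t nil
                                  "-f" "html" "-t" "org"
                                  (expand-file-name html-file)))
        (error "pandoc failed: %s" (buffer-string)))
      ;; Unhexify Google redirect links and merge split links.
      (le::fix-google-doc-link)
      (org-mode))
    (pop-to-buffer buf)))
```

Then M-x my/html-file-to-org and pick the exported HTML file. The result lands in a scratch buffer rather than a file, so you can look it over before saving it anywhere.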