From: Le Wang
Subject: Re: importing google docs document into org
Date: Fri, 18 Feb 2011 20:27:31 +0800
To: Puneeth Chaganti
Cc: emacs-orgmode@gnu.org
List-Id: "General discussions about Org-mode."

On Fri, Feb 18, 2011 at 1:24 PM, Puneeth Chaganti wrote:

> You could try using Pandoc [1]. It can parse various markups
> (including html) and reformat it into various other markups (including
> org). There may be a few rough ends in the org-exporter, but it might
> be good enough for you.

Outstanding.  I exported the document from google docs to html then used
pandoc to convert to org-mode.  The conversion wasn't perfect.  I had to
manually edit a few things.  But it wasn't too painful.

Here are some support functions I made to scrub the hyperlinks.

(require 'url-util)  ; provides `url-unhex-string'

(defun le::fix-google-doc-link ()
  "Unhexify URLs and collapse any repeated links."
  (interactive)
  (let ((link-regexp "\\[http://www.google.com/url\\?q=\\(.*?\\)&sa=.*?\\]"))
    (save-excursion
      (goto-char (point-min))
      (le::collapse-org-consecutive-links)
      (goto-char (point-min))
      (while (re-search-forward link-regexp nil t)
        ;; Replace the Google redirect with the decoded target URL.
        (replace-match (concat "["
                               (save-match-data
                                 (url-unhex-string (match-string-no-properties 1)))
                               "]") t t)))))

(defun le::collapse-org-consecutive-links ()
  "Collapse consecutive Org links to the same URL into one link.
Pandoc conversion of Google Docs HTML sometimes breaks a
multi-word hyperlink into individual links."
  (interactive)
  (let ((link-regex "\\[\\[\\([^[:space:]]*?\\)\\]\\[\\(\\(?:.\\|\n\\)*?\\)\\]\\]")
        url
        text)
    (while (re-search-forward link-regex nil t)
      (setq url (match-string-no-properties 1)
            text (match-string-no-properties 2))
      (save-match-data
        (catch 'done
          ;; Absorb each immediately following link that has the same URL.
          (while (looking-at link-regex)
            (if (string-equal url (match-string-no-properties 1))
                (progn
                  (setq text (concat text (match-string-no-properties 2)))
                  (replace-match ""))
              (throw 'done nil)))))
      (when (not (string-equal text (match-string-no-properties 2)))
        (replace-match text t t nil 2)))))

--
Le
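
For anyone who wants to see the unhexify step in isolation, here is a minimal sketch that applies the same idea to a single string rather than to the buffer. The helper name `my/unwrap-google-redirect` and the sample URL are made up for illustration and are not part of the functions above; the sketch assumes the link has the `[http://www.google.com/url?q=...&sa=...]` shape that `link-regexp` targets.

```elisp
(require 'url-util)  ; provides `url-unhex-string'

;; Hypothetical helper, for illustration only: unwrap one Google
;; redirect link held in a string instead of in the current buffer.
(defun my/unwrap-google-redirect (link)
  "Return LINK with a Google redirect wrapper replaced by its target URL.
LINK is returned unchanged when it contains no such wrapper."
  (if (string-match
       "\\[http://www\\.google\\.com/url\\?q=\\(.*?\\)&sa=.*?\\]" link)
      ;; `url-unhex-string' may clobber the match data, so protect it
      ;; until `replace-match' has run.
      (let ((target (save-match-data
                      (url-unhex-string (match-string 1 link)))))
        (replace-match (concat "[" target "]") t t link))
    link))

(my/unwrap-google-redirect
 "[http://www.google.com/url?q=http%3A%2F%2Forgmode.org%2F&sa=D&sntz=1]")
;; => "[http://orgmode.org/]"
```

The `save-match-data` wrapper mirrors the one in `le::fix-google-doc-link`: without it, the decoding call could overwrite the match that `replace-match` relies on.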

_______________________________________________
Emacs-orgmode mailing list
Please use `Reply All' to send replies to the list.
Emacs-orgmode@gnu.org
http://lists.gnu.org/mailman/listinfo/emacs-orgmode