From: Le Wang <l26wang@gmail.com>
To: Puneeth Chaganti <punchagan@gmail.com>
Cc: emacs-orgmode@gnu.org
Subject: Re: importing google docs document into org
Date: Fri, 18 Feb 2011 20:27:31 +0800
Message-ID: <AANLkTinUbgiKAAqm+LAQdysKBnw0qdPUY3AsgUpCLb6_@mail.gmail.com>
In-Reply-To: <AANLkTikw0Mko3AyRBfxbJ5-0C9kSUmQiz+aKXf=a7q4c@mail.gmail.com>
On Fri, Feb 18, 2011 at 1:24 PM, Puneeth Chaganti <punchagan@gmail.com> wrote:
> You could try using Pandoc [1]. It can parse various markups
> (including html) and reformat it into various other markups (including
> org). There may be a few rough ends in the org-exporter, but it might
> be good enough for you.
>
Outstanding. I exported the document from Google Docs to HTML, then used
pandoc to convert it to org-mode. The conversion wasn't perfect and I had
to edit a few things manually, but it wasn't too painful.
Here are some support functions I made to scrub the hyperlinks.
(require 'url-util)  ; provides `url-unhex-string'

(defun le::fix-google-doc-link ()
  "Unhexify URLs and collapse any repeated links."
  (interactive)
  ;; The real link target is the q= parameter of the Google redirect URL.
  (let ((link-regexp
         "\\[http://www.google.com/url\\?q=\\(.*?\\)&sa=.*?\\]"))
    (save-excursion
      (goto-char (point-min))
      (le::collapse-org-consecutive-links)
      (goto-char (point-min))
      (while (re-search-forward link-regexp nil t)
        (replace-match (concat "["
                               (save-match-data
                                 (url-unhex-string
                                  (match-string-no-properties 1)))
                               "]")
                       t t)))))
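To illustrate what the replacement step does, here is a minimal sketch that applies the pattern from le::fix-google-doc-link to a single string. The helper name my/demo-unwrap-google-link and the sample link are made up for this illustration; the exact redirect parameters in a real export may differ.

```elisp
(require 'url-util)  ; provides `url-unhex-string'

(defun my/demo-unwrap-google-link (org-text)
  "Return ORG-TEXT with the first Google redirect link unwrapped.
Hypothetical demo helper, not part of the original message."
  (with-temp-buffer
    (insert org-text)
    (goto-char (point-min))
    (when (re-search-forward
           "\\[http://www.google.com/url\\?q=\\(.*?\\)&sa=.*?\\]" nil t)
      ;; Swap the redirect URL for the decoded target held in group 1.
      (replace-match
       (concat "["
               (save-match-data
                 (url-unhex-string (match-string-no-properties 1)))
               "]")
       t t))
    (buffer-string)))

;; Assumed sample input, shaped like a pandoc-converted redirect link.
(my/demo-unwrap-google-link
 "[[http://www.google.com/url?q=http%3A%2F%2Forgmode.org%2F&sa=D&sntz=1][Org]]")
;; => "[[http://orgmode.org/][Org]]"
```

The save-match-data wrapper matters here because url-unhex-string does its own matching, which would otherwise clobber the match that replace-match relies on.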
(defun le::collapse-org-consecutive-links ()
  "Collapse consecutive Org links that share a URL into one link.
Pandoc's conversion of Google Docs HTML sometimes breaks a
multi-word hyperlink into individual links."
  (interactive)
  (let ((link-regex
         "\\[\\[\\([^[:space:]]*?\\)\\]\\[\\(\\(?:.\\|\n\\)*?\\)\\]\\]")
        url
        text)
    (while (re-search-forward link-regex nil t)
      (setq url (match-string-no-properties 1)
            text (match-string-no-properties 2))
      ;; Absorb each immediately following link that has the same URL.
      (save-match-data
        (catch 'done
          (while (looking-at link-regex)
            (if (string-equal url (match-string-no-properties 1))
                (progn
                  (setq text (concat text (match-string-no-properties 2)))
                  (replace-match ""))
              (throw 'done nil)))))
      ;; Rewrite the first link's description only if something was merged.
      (unless (string-equal text (match-string-no-properties 2))
        (replace-match text t t nil 2)))))
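For anyone reading the link regexp in le::collapse-org-consecutive-links, this small sketch shows what its two groups capture: group 1 is the URL and group 2 is the description. The helper name my/demo-first-org-link and the sample string are assumptions for the example only.

```elisp
(defun my/demo-first-org-link (org-text)
  "Return (URL DESCRIPTION) for the first Org link in ORG-TEXT, or nil.
Hypothetical demo helper, not part of the original message."
  (let ((link-regex
         "\\[\\[\\([^[:space:]]*?\\)\\]\\[\\(\\(?:.\\|\n\\)*?\\)\\]\\]"))
    (when (string-match link-regex org-text)
      (list (match-string 1 org-text) (match-string 2 org-text)))))

;; Assumed sample: one hyperlink that was split into two adjacent links.
(my/demo-first-org-link
 "[[http://orgmode.org/][Org]][[http://orgmode.org/][ mode]]")
;; => ("http://orgmode.org/" "Org")
```

Since both links in the sample share the same URL, the collapse function would merge their descriptions into a single link reading "Org mode".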
--
Le