From mboxrd@z Thu Jan  1 00:00:00 1970
From: Peter Davis <pfd@pfdstudio.com>
Subject: Re: Importing from Oddmuse?
Date: Mon, 28 Oct 2013 11:01:30 -0400
Message-ID: <526E7C4A.7050804@pfdstudio.com>
References: <CAE-e6gkqaWRf=QGHjYXoTR9BjgE+98mecY_w4w9Dc+uzPTCw-Q@mail.gmail.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="------------040606060106040103050908"
Return-path: <emacs-orgmode-bounces+geo-emacs-orgmode=m.gmane.org@gnu.org>
Received: from eggs.gnu.org ([2001:4830:134:3::10]:48646)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <pfd@pfdstudio.com>) id 1VaoK5-0007zq-Sh
	for emacs-orgmode@gnu.org; Mon, 28 Oct 2013 11:01:45 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <pfd@pfdstudio.com>) id 1VaoK0-00032h-Bf
	for emacs-orgmode@gnu.org; Mon, 28 Oct 2013 11:01:41 -0400
Received: from out4-smtp.messagingengine.com ([66.111.4.28]:36755)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <pfd@pfdstudio.com>) id 1VaoK0-00032Y-7N
	for emacs-orgmode@gnu.org; Mon, 28 Oct 2013 11:01:36 -0400
In-Reply-To: <CAE-e6gkqaWRf=QGHjYXoTR9BjgE+98mecY_w4w9Dc+uzPTCw-Q@mail.gmail.com>
List-Id: "General discussions about Org-mode." <emacs-orgmode.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-orgmode>,
	<mailto:emacs-orgmode-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-orgmode>
List-Post: <mailto:emacs-orgmode@gnu.org>
List-Help: <mailto:emacs-orgmode-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-orgmode>,
	<mailto:emacs-orgmode-request@gnu.org?subject=subscribe>
Errors-To: emacs-orgmode-bounces+geo-emacs-orgmode=m.gmane.org@gnu.org
Sender: emacs-orgmode-bounces+geo-emacs-orgmode=m.gmane.org@gnu.org
To: emacs-orgmode@gnu.org

This is a multi-part message in MIME format.
--------------040606060106040103050908
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Just to answer my own question, I shamelessly took Alex Schroeder's 
raw.pl script and hacked it up a bit to do some conversion from Oddmuse 
markup to org-mode. The attached Perl script should run through all the 
pages in an Oddmuse Wiki and generate .org versions of them in a 
separate directory.

This is still very much a work in progress, but I think the general 
framework is useful. One thing I still have to fix is the hyperlinks. 
Right now, if the Wiki page is "one two.pg", this script will generate a 
file named "one_two.org", but any links to that page will come out as 
"[[file:one two.org][one two]]", which does not match the file name.
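
A minimal sketch of one possible fix, assuming that replacing each space 
in the link target with an underscore is enough to match the generated 
file names. The helpers link_target and fix_links are hypothetical names 
and are not part of the attached script. Only plain [[page name]] links 
are handled, as in the script's current substitution.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in om2org.pl): build the file name a link
# should point at by replacing each space with an underscore.
sub link_target {
    my ($name) = @_;
    (my $target = $name) =~ tr/ /_/;
    return "$target.org";
}

# Rewrite [[page name]] links into Org file links. The original page
# name, spaces included, is kept as the link description.
sub fix_links {
    my ($line) = @_;
    $line =~ s/\[\[([^\]]*)\]\]/'[[file:' . link_target($1) . '][' . $1 . ']]'/ge;
    return $line;
}

print fix_links("See [[one two]] for details."), "\n";
# prints: See [[file:one_two.org][one two]] for details.
```

In the attached script, the same substitution could stand in for the 
single "hyperlinks" line inside FixMarkUp.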

I concentrated on the small subset of Oddmuse markup that I'm using, but 
I think it's easily extensible.

Let me know if this is at all useful to anyone else.

-pd


On 10/25/13 10:54 AM, Peter Davis wrote:
> I'm comparatively new to Org mode (actually, I've used it for years, 
> but only a small subset of its functionality). I've used Oddmuse for 
> years to maintain my own personal Wiki, but now I'm looking to move to 
> Org mode.
>
> I know there are lots of tools for exporting or publishing from Org 
> mode to Oddmuse, but how about the other direction? Any tools or tips 
> for importing a large number of Oddmuse pages into Org mode? Ideally, 
> I'd like to keep them as separate files, with links converted to file 
> links, etc.
>


-- 
Peter Davis
The Tech Curmudgeon
www.techcurmudgeon.com


--------------040606060106040103050908
Content-Type: text/x-perl-script;
 name="om2org.pl"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="om2org.pl"

#! /usr/bin/perl -w

# Copyright (C) 2005, 2007  Alex Schroeder <alex@emacswiki.org>
#
# Portions copyright (c) 2013, Peter Davis <pfd@pfdstudio.com>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.

sub ParseData {
  my $data = shift;
  my %result;
  while ($data =~ /(\S+?): (.*?)(?=\n[^ \t]|\Z)/sg) {
    my ($key, $value) = ($1, $2);
    $value =~ s/\n\t/\n/g;
    $result{$key} = $value;
  }
  return %result;
}

sub FixMarkUp {
    my $data = shift;
    my $orgout = "#+STARTUP: showeverything logdone\n#+options: num:nil\n\n";
    my $csvMode = 0;
    foreach (split /\n/, $data) {
	if (length($_)) {
	    s/\r//g;
	    # csv tables
	    if ($_ =~ /<csv>/) {
		$csvMode = 1;
		s/<csv>/#+ATTR_HTML: :border 2 :rules all :frame border/g;
	    } elsif ($_ =~ /^\s*$/) {
		$csvMode = 0;
	    } elsif ($csvMode) {
		s/^/|/g;
		s/,/|/g;
		s/$/|/g;
	    }
	    # hyperlinks
	    s/\[\[([^]]*)\]\]/[[file:$1.org][$1]]/g;
	    # strike through
	    s/<\/?s>/+/g;
	    # verse
	    s/:::/#+BEGIN_VERSE/g;
	    # bold and italic
	    s/'''/*/g;
	    s/''/\//g;
	    # bullet lists
	    s/^\*\*\*\*/    */g;
	    s/^\*\*\*/   */g;
	    s/^\*\*/  */g;
	    s/^\*/ */g;
	    # headers
	    s/^\=\=\=\=/****/g;
	    s/^\=\=\=/***/g;
	    s/^\=\=/**/g;
	    s/^\=/*/g;
#	    s/ \=?$//g;
	    s/ \=\=\=\=$//g;
	    s/ \=\=\=$//g;
	    s/ \=\=$//g;
	    s/ \=$//g;
	    s/^# / 1. /g;
	} else {
	    $csvMode = 0;
	}
	$orgout = $orgout . $_ . "\n";
    }
    return $orgout;
}

sub main {
  my ($regexp, $PageDir, $OrgDir) = @_;
  # include dotfiles!
  local $/ = undef;   # Read complete files
  foreach my $file (glob("$PageDir/*/*.pg $PageDir/*/.*.pg")) {
    next unless $file =~ m|/.*/(.+)\.pg$|;
    my $page = $1;
    next if $regexp && $page !~ m|$regexp|o;
    $page = $page . ".org";
    mkdir($OrgDir) or die "Cannot create $OrgDir directory: $!"
      unless -d $OrgDir;
    # Three-argument open with a lexical handle; report the file that failed.
    open(my $in, '<', $file) or die "Cannot read $file: $!";
    my $data = <$in>;
    close($in);
    my $ts = (stat("$OrgDir/$page"))[9];
    my %result1 = ParseData($data);
    my $result2 = FixMarkUp($result1{text});
    if ($ts && $ts == $result1{ts}) {
      print "skipping $page because it is up to date\n" if $verbose;
    } else {
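      # $ts is undefined when the .org file does not exist yet.
      my $old = defined $ts ? $ts : 'no existing file';
      print "writing $page because $old != $result1{ts}\n" if $verbose;
      open(my $out, '>', "$OrgDir/$page")
        or die "Cannot write $OrgDir/$page: $!";
      # print $out $result1{text};
      print $out $result2;
      close($out);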
      utime $result1{ts}, $result1{ts}, "$OrgDir/$page"; # touch file
    }
  }
}

use Getopt::Long;
my $regexp = undef;
my $page = 'page';
my $dir = 'org';
GetOptions ("regexp=s" => \$regexp,
	    "page=s"   => \$page,
	    "dir=s"    => \$dir,
	    "verbose"  => \$verbose,
	    "help"     => \$help);

if ($help) {
  print qq{
Usage: $0 [--regexp REGEXP] [--page DIR] [--dir DIR] [--verbose]

Converts Oddmuse wiki pages into Org files, one file per page.

--regexp selects a subset of pages whose names match the regular
  expression. Note that spaces have been translated to underscores.

--page designates the page directory. By default this is 'page' in the
  current directory. If you run this script in your data directory,
  the default should be fine.

--dir designates an output directory. By default this is 'org' in the
  current directory.

--verbose reports each page as it is written or skipped.

Example: $0 --regexp '\\.el\$' --dir elisp
}
} else {
  main ($regexp, $page, $dir);
}

--------------040606060106040103050908--