Convert Microsoft Word to DocBook XML using Ruby and OpenOffice
August 10, 2006
The following script shows how to convert Microsoft Word files to DocBook XML using OpenOffice on Windows. It drives OpenOffice through OLE (Object Linking and Embedding) automation and converts every .doc file in the current directory in one batch.
The script assumes that OpenOffice is installed. You also need the Ruby programming language; the script was tested with Ruby 1.8.4, the most recent version at the time of writing.
require 'win32ole'

# Directory containing the Word files, written as a file URL.
# Run the script from this same directory: Dir.glob below
# searches the current working directory.
PATH = "file:///c|/path/to/doc/files/"

# Converts a Word file to DocBook XML.
# The XML file is named after the original file,
# e.g.: ABC.doc -> ABC.xml
def convert_word_to_docbook(file, path)
  # Start (or connect to) OpenOffice through its OLE automation bridge.
  serviceManager = WIN32OLE.new("com.sun.star.ServiceManager")
  desktop = serviceManager.createInstance("com.sun.star.frame.Desktop")

  # Load the Word document.
  url = path + file
  document = desktop.loadComponentFromURL(url, "_blank", 0, [])

  # Replace only the trailing ".doc" extension with ".xml".
  url_to = path + file.sub(/\.doc\z/i, ".xml")

  # Select the DocBook export filter.
  fprops = []
  property = serviceManager.Bridge_GetStruct("com.sun.star.beans.PropertyValue")
  property["Name"] = "FilterName"
  property["Value"] = "DocBook File"
  fprops << property

  begin
    document.storeToURL(url_to, fprops)
  ensure
    # Always close the document, even if the export fails.
    document.close(true)
  end
end

# Convert all ".doc" files in the current directory to DocBook XML.
Dir.glob("*.doc").each do |file|
  puts "converting #{file}..."
  $stdout.flush
  convert_word_to_docbook(file, PATH)
end
Original script by Julian Elve: http://www.synesthesia.co.uk/blog/.../openoffice-and-ruby/.
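The script hard-codes PATH as a file URL and derives the output name with a regular expression. As a minimal sketch, the two helpers below show one way to build both values from an ordinary Windows path and file name. The names windows_dir_to_file_url and docbook_name are hypothetical and not part of the original script. The URL form simply mirrors the file:///c|/... style used for PATH above, and no escaping of spaces or other special characters is attempted.

```ruby
# Hypothetical helpers for illustration; not part of the original script.

# Turn a Windows directory path into a file URL with a trailing slash,
# using the same "drive letter plus |" form as PATH in the script.
def windows_dir_to_file_url(dir)
  normalized = dir.tr("\\", "/")                        # backslashes -> forward slashes
  normalized = normalized.sub(/\A([A-Za-z]):/, '\1|')   # "C:" -> "C|"
  normalized += "/" unless normalized =~ %r{/\z}        # ensure a trailing slash
  "file:///" + normalized
end

# Derive the DocBook file name from the Word file name,
# replacing only a trailing ".doc" (case-insensitive).
def docbook_name(file)
  file.sub(/\.doc\z/i, ".xml")
end

puts windows_dir_to_file_url('C:\docs\word')   # prints file:///C|/docs/word/
puts docbook_name("ABC.doc")                   # prints ABC.xml
```

The result of windows_dir_to_file_url could be assigned to PATH in place of the hand-written URL.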