Course project: spreadsheet to XML converter

XML Data Management (Winter 2009) offered by Professor Michael Carey


Members:
Shengyue Ji
Mingyan Gao

Download:
ss2xml.py (Command line tool)
xsdgen.py (GUI tool on top of ss2xml)

Requirements:
Python, SQLite (should come with Python)
PyGTK and GTK+ (for xsdgen.py)

Brief Description

Many scientists use spreadsheets as a "poor man's database" for storing scientific data that may need to be shared. Design and implement a facility for automatically inferring "good" XML Schemas and extracting XML data from data stored in such spreadsheets. Make sure that your approach works for a variety of the sorts of spreadsheets that are found in practice, and consider including a "human assistance" capability to allow a domain expert to help the system arrive at an appropriate schema design. Test your approach on a variety of samples, some provided by Google and others provided by crawling the web and/or talking to various data owners in science-land.

Part 1 - Command line tool ss2xml

ss2xml.py reads a spreadsheet file from stdin, generates an XML file according to predefined conversion rules, and writes it to stdout. Conversion rules are specified in an extended XML Schema (XSD) file. The idea is to add special attributes to standard XSD to guide the conversion (e.g., how to map values, and how to wrap data from flat form into nested form). SQLite is used heavily to avoid duplicating effort on things such as expression evaluation. In general, each input spreadsheet file is defined in @bbx:define on xsd:annotation. Data are mapped from spreadsheet to XML using @bbx:value on xsd:element and xsd:attribute. Flat data are wrapped using @bbx:foreach to create nested structure. See the following for conversion rule details. (This is too hard to use, I'm gonna jump to Part 2 to see the GUI tool.)
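
As a rough illustration of the flat-to-nested idea (this is not the actual ss2xml code, and it uses none of the @bbx rule syntax), the sketch below loads CSV text into an in-memory SQLite table and emits one wrapper element per distinct value of a chosen column. The function names load_csv and nest, the table name sheet, the element names data/group/item, and the sample data are all assumptions made for this example. It also assumes the spreadsheet is rectangular with a header row.

```python
import csv
import io
import sqlite3
import xml.etree.ElementTree as ET


def load_csv(conn, table, text):
    """Load CSV text (first row = header) into a SQLite table.

    Hypothetical helper: columns are created without types, values stay text.
    """
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    # Quote identifiers so header names with spaces still work.
    cols = ", ".join('"%s"' % name.replace('"', '""') for name in header)
    marks = ", ".join("?" for _ in header)
    conn.execute('CREATE TABLE "%s" (%s)' % (table, cols))
    conn.executemany('INSERT INTO "%s" VALUES (%s)' % (table, marks), data)


def nest(conn, table, group_col, value_col):
    """Wrap flat rows into one <group> element per distinct group_col value."""
    root = ET.Element("data")
    groups = conn.execute(
        'SELECT DISTINCT "%s" FROM "%s" ORDER BY 1' % (group_col, table)
    ).fetchall()
    for (key,) in groups:
        group = ET.SubElement(root, "group", {"name": key})
        values = conn.execute(
            'SELECT "%s" FROM "%s" WHERE "%s" = ? ORDER BY rowid'
            % (value_col, table, group_col),
            (key,),
        )
        for (value,) in values:
            ET.SubElement(group, "item").text = value
    return root


# Made-up sample data standing in for a spreadsheet exported as CSV.
SAMPLE = "species,sample\nfinch,a1\nwren,b7\nfinch,a2\n"

conn = sqlite3.connect(":memory:")
load_csv(conn, "sheet", SAMPLE)
xml_text = ET.tostring(nest(conn, "sheet", "species", "sample"), encoding="unicode")
print(xml_text)
```

Letting SQLite do the grouping and filtering keeps the Python side small, which is in the spirit of the SQLite remark above. How ss2xml itself drives SQLite from the rule file is not shown here.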

Example

./ss2xml.py rules.xsd < input.csv > output.xml

Conversion rule XSD

Tables/variables

Tables/variables shall never be referenced beyond their valid scopes.

@bbx:test

Misc



Part 2 - GUI tool xsdgen

Additional requirement: PyGTK and GTK+. "It seems the conversion rule XSD is very interesting, but who's willing to write the complicated syntax you defined?" xsdgen is. The purpose of this tool is to let the user generate the XSD rule file and then convert a spreadsheet to XML by clicking through the GUI. This tool depends on ss2xml.

Step 1: Add table(s).


Step 2: Verify table.


Step 3: Add cursor(s).


Step 4: Create XML tree and map values.


Step 5: Generate the XSD conversion rule.


Step 6: Generate the XML.