THE XML
I wanted to validate the XML sheets produced by auteur
example XML produced by auteur.
<?xml version="1.0" encoding="UTF-8"?>
<auteur>
<source>
<location>/home/neil/Videos/2010_04_canada/M2U00547.MPG</location>
<timestamp pos="11" />
<clip end="7.5" id="0001" start="1.5" />
<clip end="13.6" id="0002" start="7.5" />
<clip end="1.5" id="0003" start="0.0" />
</source>
<source>
<location>/home/neil/Videos/2010_04_canada/M2U00549.MPG</location>
<timestamp pos="2.678" />
</source>
</auteur>
The Schema
and here's the schema I wrote to check the validity of that data.
(saved as foo.xsd)
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:annotation>
<xs:documentation>
The rules in this schema will be used to validate
the content of an auteur project file.
</xs:documentation>
<xs:appinfo source="http://www.auteur-project.org" > </xs:appinfo>
</xs:annotation>
<!--RULES START HERE -->
<!-- ROOT NODE -->
<xs:element name="auteur" >
<xs:complexType>
<xs:sequence>
<!-- SOURCES -->
<xs:element name="source" minOccurs="0" maxOccurs="unbounded">
<xs:complexType>
<xs:sequence>
<!-- only 1 location allowed per source -->
<xs:element name = "location" type="xs:string" minOccurs="1" maxOccurs="1"/>
<!-- TIMESTAMPS -->
<xs:element name = "timestamp" minOccurs="0" maxOccurs="unbounded" >
<xs:complexType>
<xs:attribute name="pos" type="xs:decimal" use="required" />
</xs:complexType>
</xs:element>
<!-- CLIPS -->
<xs:element name="clip" minOccurs="0" maxOccurs="unbounded" >
<xs:complexType>
<xs:attribute name = "id" type="xs:string" use="required" />
<xs:attribute name = "start" type="xs:decimal" />
<xs:attribute name = "end" type="xs:decimal" />
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
THE PYTHON to put it all together
a short python script to check foo.xml's validity with foo.xsd
(note lxml is NOT in python standard lib - a real pity!)
#! /usr/bin/env python
from lxml import etree
try:
doc = etree.parse("foo.xml")
xsd = etree.parse("foo.xsd")
xmlschema = etree.XMLSchema(xsd)
xmlschema.assertValid(doc)
print ("document validates!")
except etree.XMLSyntaxError as e:
print ("PARSING ERROR", e)
except AssertionError as e:
print ("INVALID DOCUMENT", e)