Saturday, 15 January 2011

validating xml with python

THE XML

I wanted to validate the XML sheets produced by auteur
example XML produced by auteur.

<?xml version="1.0" encoding="UTF-8"?>
    
<auteur>
  <source>
    <location>/home/neil/Videos/2010_04_canada/M2U00547.MPG</location>
      <timestamp pos="11" />
      <clip end="7.5" id="0001" start="1.5" />
      <clip end="13.6" id="0002" start="7.5" />
      <clip end="1.5" id="0003" start="0.0" />
  </source>
  
  <source>
    <location>/home/neil/Videos/2010_04_canada/M2U00549.MPG</location>
    <timestamp pos="2.678" />
  </source>

</auteur>


The Schema

and here's the schema I wrote to check the validity of that data.
(saved as foo.xsd)

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xs:annotation>
    <xs:documentation>
       The rules in this schema will be used to validate 
       the content of an auteur project file.
    </xs:documentation>
    <xs:appinfo source="http://www.auteur-project.org" > </xs:appinfo>
</xs:annotation>
<!--RULES START HERE -->

<!-- ROOT NODE -->
<xs:element name="auteur" >
<xs:complexType>
<xs:sequence>

  
  <!-- SOURCES -->
  <xs:element name="source" minOccurs="0" maxOccurs="unbounded">
  
    <xs:complexType>
    <xs:sequence>
        <!-- only 1 location allowed per source -->
        <xs:element name = "location"  type="xs:string" minOccurs="1" maxOccurs="1"/>
        
        <!-- TIMESTAMPS -->
        <xs:element name = "timestamp" minOccurs="0" maxOccurs="unbounded" >
          <xs:complexType>
          <xs:attribute name="pos" type="xs:decimal" use="required" />
          </xs:complexType>
        </xs:element>
          
        <!-- CLIPS -->
        <xs:element name="clip" minOccurs="0" maxOccurs="unbounded" >
            <xs:complexType>
            <xs:attribute name = "id" type="xs:string"  use="required" />
            <xs:attribute name = "start" type="xs:decimal" />
            <xs:attribute name = "end" type="xs:decimal" />
            </xs:complexType>
        </xs:element>
        
    
    </xs:sequence>
    </xs:complexType>  
    
  </xs:element>

</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>

THE PYTHON to put it all together

a short python script to check foo.xml's validity with foo.xsd

(note lxml is NOT in python standard lib - a real pity!)

#! /usr/bin/env python
from lxml import etree

try:
    doc = etree.parse("foo.xml")    
    xsd = etree.parse("foo.xsd")

    xmlschema = etree.XMLSchema(xsd)
    xmlschema.assertValid(doc)
    
    print ("document validates!")

except etree.XMLSyntaxError as e:
    print ("PARSING ERROR", e)
    
except AssertionError as e:
    print ("INVALID DOCUMENT", e)

No comments: