Saturday, 15 January 2011

Using PyQt4 to validate XML against an XML Schema

A follow-up to the problem solved in the last posting.
lxml worked great, but I don't want to burden folks with yet another third-party module.

So I once again looked to PyQt4 for help...

from PyQt4.QtCore import QByteArray
from PyQt4.QtXmlPatterns import QXmlSchemaValidator, QXmlSchema

# open up the xsd file and load it into QXmlSchema
f = open("foo.xsd", "rb")
xsd = QByteArray(f.read())
f.close()

schema = QXmlSchema()
schema.load(xsd)

# now the xml itself
f = open("foo.xml", "rb")
xml = QByteArray(f.read())
f.close()

validator = QXmlSchemaValidator(schema)
print (validator.validate(xml))

# Returns True :)

Validating XML with Python


I wanted to validate the XML files produced by auteur.
Here is an example of the XML auteur produces.

<?xml version="1.0" encoding="UTF-8"?>
      <timestamp pos="11" />
      <clip end="7.5" id="0001" start="1.5" />
      <clip end="13.6" id="0002" start="7.5" />
      <clip end="1.5" id="0003" start="0.0" />
    <timestamp pos="2.678" />
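
To make the structure of such a file a bit more concrete, here is a minimal sketch that reads one with nothing but the standard library and works out the length of each clip. It assumes the clips and timestamps sit inside a source element under an auteur root, as the schema describes. The sample document, its location value, and the helper name clip_lengths are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical project file: the <auteur>/<source>/<location> nesting follows
# the schema, and the location value is made up for this example.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<auteur>
  <source>
    <location>/tmp/example.ogv</location>
    <timestamp pos="11" />
    <clip end="7.5" id="0001" start="1.5" />
    <clip end="13.6" id="0002" start="7.5" />
    <clip end="1.5" id="0003" start="0.0" />
  </source>
</auteur>
"""


def clip_lengths(xml_text):
    """Return {clip id: end - start} for every clip in the document."""
    # Encode first so the encoding declaration is honoured by the parser.
    root = ET.fromstring(xml_text.encode("utf-8"))
    lengths = {}
    for clip in root.iter("clip"):
        lengths[clip.get("id")] = float(clip.get("end")) - float(clip.get("start"))
    return lengths


if __name__ == "__main__":
    for clip_id, length in sorted(clip_lengths(SAMPLE).items()):
        print(clip_id, length)
```

This only reads the data. It does nothing to check that the file is valid, which is what the schema is for.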


The Schema

And here's the schema I wrote to check the validity of that data
(saved as foo.xsd).

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xs:annotation>
    <xs:documentation>
       The rules in this schema will be used to validate
       the content of an auteur project file.
    </xs:documentation>
    <xs:appinfo source="" > </xs:appinfo>
</xs:annotation>

<!-- ROOT NODE -->
<xs:element name="auteur" >
<xs:complexType>
<xs:sequence>

  <!-- SOURCES -->
  <xs:element name="source" minOccurs="0" maxOccurs="unbounded">
    <xs:complexType>
      <xs:sequence>
        <!-- only 1 location allowed per source -->
        <xs:element name = "location"  type="xs:string" minOccurs="1" maxOccurs="1"/>
        <!-- TIMESTAMPS -->
        <xs:element name = "timestamp" minOccurs="0" maxOccurs="unbounded" >
          <xs:complexType>
            <xs:attribute name="pos" type="xs:decimal" use="required" />
          </xs:complexType>
        </xs:element>
        <!-- CLIPS -->
        <xs:element name="clip" minOccurs="0" maxOccurs="unbounded" >
          <xs:complexType>
            <xs:attribute name = "id" type="xs:string"  use="required" />
            <xs:attribute name = "start" type="xs:decimal" />
            <xs:attribute name = "end" type="xs:decimal" />
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
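
For anyone who has neither lxml nor PyQt4 to hand, the rules the schema expresses can be roughly approximated by hand with the standard library. The sketch below is not schema validation: it ignores element order and unexpected elements, and its decimal test is looser than xs:decimal (it accepts exponent notation, for instance). The names check_project and _is_decimal are made up for this example.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal, InvalidOperation


def _is_decimal(text):
    """True if text parses as a finite decimal (a loose stand-in for xs:decimal)."""
    try:
        return Decimal(text).is_finite()
    except (InvalidOperation, TypeError, ValueError):
        return False


def check_project(xml_text):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    root = ET.fromstring(xml_text)
    if root.tag != "auteur":
        problems.append("root element is <%s>, expected <auteur>" % root.tag)
    for n, source in enumerate(root.findall("source"), 1):
        # only 1 location allowed per source
        if len(source.findall("location")) != 1:
            problems.append("source %d: needs exactly one <location>" % n)
        # every timestamp needs a decimal 'pos'
        for stamp in source.findall("timestamp"):
            if not _is_decimal(stamp.get("pos")):
                problems.append("source %d: timestamp needs a decimal 'pos'" % n)
        # every clip needs an 'id'; 'start' and 'end' are optional decimals
        for clip in source.findall("clip"):
            if clip.get("id") is None:
                problems.append("source %d: clip is missing 'id'" % n)
            for name in ("start", "end"):
                value = clip.get(name)
                if value is not None and not _is_decimal(value):
                    problems.append("source %d: clip '%s' is not a decimal" % (n, name))
    return problems


if __name__ == "__main__":
    good = ("<auteur><source><location>x</location>"
            "<clip id='0001' start='1.5' end='7.5'/></source></auteur>")
    bad = "<auteur><source><timestamp pos='abc'/><clip start='1.5'/></source></auteur>"
    print(check_project(good))
    print(check_project(bad))
```

Malformed XML is not caught here; ET.fromstring raises a ParseError in that case, so wrap the call if that matters to you.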


The Python to put it all together

A short Python script to check foo.xml's validity against foo.xsd.

(Note: lxml is NOT in the Python standard library. A real pity!)

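#! /usr/bin/env python
from lxml import etree

try:
    # parse the document and the schema
    doc = etree.parse("foo.xml")
    xsd = etree.parse("foo.xsd")

    # compile the schema, then check the document against it;
    # assert_ raises AssertionError if the document does not comply
    xmlschema = etree.XMLSchema(xsd)
    xmlschema.assert_(doc)
    print ("document validates!")

except etree.XMLSyntaxError as e:
    print ("PARSING ERROR", e)
except AssertionError as e:
    print ("INVALID DOCUMENT", e)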