Why Schemas?

Avato December 21, 2018

Schemas are specifications for the structure and content of a data object or document. They are used to determine whether a given data object conforms to a set of rules that dictate its structure (e.g., the order in which data items are listed) and its content (e.g., what type of information a data item contains). Through the use of schemas, a data object can be programmatically validated against its specification: in other words, schemas help ensure that one’s data is correct and not the source of processing errors.

XML itself is defined by a specification that works much like a schema: the W3C XML specification provides the rules by which a document can be determined to be well-formed XML. For instance, one of those rules is that a tag name cannot begin with a digit:

<tagname> is a valid tag </tagname>
<2tagit> is _not_ a valid tag </2tagit>
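Because these rules belong to XML itself, any conforming parser enforces them. A minimal Python sketch, using the standard library's `xml.etree.ElementTree` parser to test the two lines above (the helper name `is_well_formed` is hypothetical):

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as a well-formed XML document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        # The parser rejects text that breaks XML's own syntax rules,
        # such as an element name that begins with a digit.
        return False
    return True


print(is_well_formed("<tagname> is a valid tag </tagname>"))      # True
print(is_well_formed("<2tagit> is _not_ a valid tag </2tagit>"))  # False
```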

Schemas for other XML data objects or documents build upon these base rules. A schema for a tag that wraps a temperature measurement, for instance, might define the tag's name, the name of one of its attributes and the values that attribute may take, and the type of data the tag contains. In a simplified, illustrative notation (not the syntax of any particular schema language), such a schema might look like this:

<element name="temp" type="integer">
    <attribute name="unit" type="character">
        <constraint>
            <value>F</value>
            <value>C</value>
        </constraint>
    </attribute>
</element>

When this schema is applied to the following data object:

<temp unit="K">99.9</temp>

the object is assessed as invalid: while the tag name temp is correct, the value K of the unit attribute is not one of the two allowed values (F or C), and the content 99.9 is a decimal number, not an integer.
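Python's standard library has no XSD validator, so the sketch below hand-codes the three rules of the simplified temperature schema to show what a validator checks. The function and constant names are hypothetical, and treating a missing `unit` attribute as invalid is an assumption the simplified schema does not state; a real project would hand a real schema to a schema-aware validator instead.

```python
import re
import xml.etree.ElementTree as ET

# Hand-coded rules from the simplified temperature schema (hypothetical names).
ALLOWED_UNITS = {"F", "C"}
INTEGER = re.compile(r"[+-]?[0-9]+")  # close to the lexical form of xs:integer


def validate_temp(xml_text: str) -> list[str]:
    """Return the rule violations found; an empty list means the object is valid."""
    elem = ET.fromstring(xml_text)
    errors = []
    if elem.tag != "temp":
        errors.append(f"expected <temp>, found <{elem.tag}>")
    unit = elem.get("unit")  # None if absent; treated as invalid (an assumption)
    if unit not in ALLOWED_UNITS:
        errors.append(f"unit {unit!r} is not one of {sorted(ALLOWED_UNITS)}")
    content = (elem.text or "").strip()
    if INTEGER.fullmatch(content) is None:
        errors.append(f"content {content!r} is not an integer")
    return errors


print(validate_temp('<temp unit="K">99.9</temp>'))  # two violations: unit and content
print(validate_temp('<temp unit="C">21</temp>'))    # []
```

Collecting every violation, rather than stopping at the first, mirrors how validators typically report all the problems in a document at once.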

There are several varieties of XML schema in use today, with differing capabilities for using the content of one data element to restrict the content of another. In particular, XSD (W3C XML Schema Definition Language) is the schema language used by schema-aware XSLT processors to provide data structure and basic content type-checking. Often, XSD is supplemented with a RELAX NG or Schematron schema, which is used during validation to perform more complex assessments of content validity that cannot be achieved with XSD alone.

An Example XSD

<!-- http://www.xmlvalidation.com/example/ -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="addresses">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="address" minOccurs="1" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="address">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="name" minOccurs="0" maxOccurs="1"/>
        <xs:element ref="street" minOccurs="0" maxOccurs="1"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="name" type="xs:string"/>
  <xs:element name="street" type="xs:string"/>
</xs:schema>
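The rules this XSD enforces can be approximated in plain Python, which may make them easier to read: `<addresses>` must contain at least one `<address>`, and each `<address>` may hold an optional `<name>` followed by an optional `<street>`, both text-only. This is a sketch with a hypothetical function name. It assumes `<addresses>` is the document root (XSD itself would also accept any other globally declared element, such as `<name>`, as a root), and it does not cover every XSD rule, e.g. stray text directly inside `<address>`.

```python
import xml.etree.ElementTree as ET

# Child orderings permitted by the (name?, street?) sequence inside <address>.
ALLOWED_ADDRESS_CHILDREN = ([], ["name"], ["street"], ["name", "street"])


def validate_addresses(xml_text: str) -> list[str]:
    """Check a document against the structural rules of the example XSD."""
    root = ET.fromstring(xml_text)
    if root.tag != "addresses":
        return [f"root element must be <addresses>, found <{root.tag}>"]
    errors = []
    children = list(root)
    if not children:  # minOccurs="1" on address
        errors.append("<addresses> must contain at least one <address>")
    for i, addr in enumerate(children, start=1):
        if addr.tag != "address":
            errors.append(f"item {i}: expected <address>, found <{addr.tag}>")
            continue
        tags = [child.tag for child in addr]
        if tags not in ALLOWED_ADDRESS_CHILDREN:
            errors.append(f"address {i}: children {tags} do not match (name?, street?)")
        for child in addr:
            if len(child) > 0:  # xs:string elements may hold text only
                errors.append(f"address {i}: <{child.tag}> must not contain elements")
    return errors


doc = """<addresses>
  <address><name>A. Example</name><street>1 Main Street</street></address>
</addresses>"""
print(validate_addresses(doc))  # []
print(validate_addresses("<addresses><address><street/><name/></address></addresses>"))
```

The second call reports one violation, because `<street>` appears before `<name>`, which the XSD's sequence does not allow.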

An Example Schematron Rule

<!-- https://en.wikipedia.org/wiki/Schematron -->
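<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
   <pattern>
      <title>Date rules</title>
      <rule context="Contract">
         <!-- "<" must be escaped as &lt; inside an attribute value;
              current-date() needs XPath 2.0, hence queryBinding="xslt2" -->
         <assert test="ContractDate &lt; current-date()">ContractDate should be
            in the past because future contracts are not allowed.</assert>
      </rule>
   </pattern>
</schema>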
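A Schematron processor evaluates each `assert` against every node that matches the rule's `context` and reports the assertion's message when the test is false. The Python sketch below mimics that behaviour for this one rule only. The function name is hypothetical, the `today` parameter stands in for XPath's `current-date()` so the check can be tested, and dates are assumed to be plain `YYYY-MM-DD` values (`xs:date` also allows a timezone suffix).

```python
import datetime
import xml.etree.ElementTree as ET
from typing import Optional

RULE_MESSAGE = ("ContractDate should be in the past "
                "because future contracts are not allowed.")


def check_contract_dates(xml_text: str,
                         today: Optional[datetime.date] = None) -> list[str]:
    """Apply the date rule to every <Contract>; return the failure messages."""
    if today is None:
        today = datetime.date.today()  # plays the role of current-date()
    root = ET.fromstring(xml_text)
    errors = []
    # iter() visits the root too, so <Contract> matches anywhere in the tree
    for n, contract in enumerate(root.iter("Contract"), start=1):
        raw = contract.findtext("ContractDate")
        if raw is None:
            # comparing an empty sequence is false in XPath, so the assert fails
            errors.append(f"Contract {n}: ContractDate is missing")
            continue
        try:
            contract_date = datetime.date.fromisoformat(raw.strip())
        except ValueError:
            errors.append(f"Contract {n}: {raw!r} is not a YYYY-MM-DD date")
            continue
        if not contract_date < today:  # the assert's test
            errors.append(f"Contract {n}: {RULE_MESSAGE}")
    return errors


doc = """<Contracts>
  <Contract><ContractDate>2018-12-01</ContractDate></Contract>
  <Contract><ContractDate>2030-01-01</ContractDate></Contract>
</Contracts>"""
print(check_contract_dates(doc, today=datetime.date(2018, 12, 21)))
# one message, for the second Contract
```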

Being able to assess the validity of XML data objects enables one to ensure that programs receive the data that they expect. An entire class of errors is avoided right from the start, simply by rejecting data that does not match the program’s requirements. Further, the XML output of a program can be verified as correct, eliminating another class of errors that all too frequently slip by unnoticed when less rigorous data standards are used.

Using typed data when developing a program also enables integrated development environments (IDEs) to detect errors as the code is being written, giving the programmer an immediate heads-up and eliminating yet another class of costly run-time errors.

Finally, using schemas prevents decisions from being made based on incomplete, inaccurate, or just plain bad data.

Why schemas?

Because an error caught during data entry is orders of magnitude less expensive to fix than an error caught during program run-time, and an error caught during run-time is many orders of magnitude less expensive than one caught after a corporate decision has been made based on bad data.

The bottom line is that schemas save money.

David Priest
