XML, the eXtensible Markup Language, is a formal, open specification that describes a class of data objects called XML documents and partially describes the behavior of the programs that process them. XML decouples data from the specialized, often short-lived software that creates and uses it. In short, it helps ensure that one’s data is not rendered unusable by changes in technology.
Much of XML’s success can be attributed to its simplicity. Indeed, the success of the World Wide Web is in large part due to its inventor choosing a simple markup language, HTML, which, like XML, is derived from SGML.
The basics of XML can be expressed in only a few sentences:
- Raw “data” comprises unadorned information, such as a name or quantity.
- To be understood, the context or meaning of the data must be associated with the data: if the data is a name, what does it name? If a quantity, what does it measure? In other words, the meaning of the data is “meta-data”: information about the data, but not the actual data itself.
You are already intimately familiar with these two ideas. Every form you have filled out demonstrates the use of meta-data and data. The boxes where you provide a name, an address, or a date are labeled with meta-data, and each checkbox is accompanied by a description.
In XML, data content is “tagged” using angle-bracket markers containing the name of the tag. Tags usually occur in pairs, a start tag and a matching end tag marked with a slash, surrounding the data content:
<tag>data content</tag>
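To see how a parser separates the two ideas introduced above, the following sketch uses Python's standard-library xml.etree.ElementTree module (one toolkit among many) on a single tagged value. The element name quantity and its value are hypothetical examples, not part of any standard vocabulary.

```python
import xml.etree.ElementTree as ET

# A single tagged value; the element name "quantity" is a hypothetical example.
document = "<quantity>42</quantity>"

element = ET.fromstring(document)  # parse the string into an Element

meta_data = element.tag  # the tag name says what the data means
data = element.text      # the text between the tags is the raw data

print(f"{meta_data}: {data}")  # prints "quantity: 42"
```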
Typically, a tag’s name conveys meaning; if needed, additional meaning can be conveyed through attributes, which are name-value pairs placed inside the start tag:
<address location="home">1234 Whosville</address>
<address location="work">9876 Grinchtown</address>
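Because a well-formed XML document must have exactly one root element, the sketch below wraps the two addresses in a hypothetical contact element before parsing them with ElementTree. The get() method reads an attribute by name, so the location attribute can serve as a lookup key.

```python
import xml.etree.ElementTree as ET

# XML requires a single root element, so the two addresses are wrapped in a
# hypothetical <contact> element.
document = """<contact>
  <address location="home">1234 Whosville</address>
  <address location="work">9876 Grinchtown</address>
</contact>"""

root = ET.fromstring(document)

# Map each address's "location" attribute (meta-data) to its text (data).
addresses = {addr.get("location"): addr.text for addr in root.iter("address")}

print(addresses["work"])  # prints "9876 Grinchtown"
```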
Tagged content can itself contain tagged content:
<text>There is <bold>bold</bold> text in this sentence.</text>
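Content that mixes text and child elements, as in the sentence above, is called mixed content. The following sketch shows how Python's ElementTree represents it; the text/tail split is a convention of that particular library, and other XML APIs, such as the DOM, expose the same content differently.

```python
import xml.etree.ElementTree as ET

sentence = ET.fromstring(
    "<text>There is <bold>bold</bold> text in this sentence.</text>"
)

# ElementTree stores mixed content in three places:
#   sentence.text -> text before the first child element
#   bold.text     -> text inside the child element
#   bold.tail     -> text after the child's end tag
bold = sentence.find("bold")
print(repr(sentence.text))  # 'There is '
print(repr(bold.text))      # 'bold'
print(repr(bold.tail))      # ' text in this sentence.'

# itertext() yields all text in document order, recovering the plain sentence.
plain = "".join(sentence.itertext())
print(plain)  # There is bold text in this sentence.
```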
Tags cannot, however, overlap:
<text>This is <italics>NOT valid XML!</text></italics>
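A conforming parser must report a fatal error for such a document. Strictly speaking, the XML specification calls a document that obeys its syntax rules well-formed and reserves the term valid for a well-formed document that also conforms to a DTD or schema, so the example above is, more precisely, not well-formed. The hypothetical helper is_well_formed below is a minimal sketch that relies on ElementTree raising ParseError for such input.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True if ElementTree can parse the document, False otherwise."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


nested = "<text>This is <italics>valid</italics> XML.</text>"
overlapping = "<text>This is <italics>NOT valid XML!</text></italics>"

print(is_well_formed(nested))       # True
print(is_well_formed(overlapping))  # False: the parser reports a mismatched tag
```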
A few other rules define what constitutes a valid name; specify how to write characters such as < and & (for example, as the entity references &lt; and &amp;) so that they don’t accidentally start a tag or a reference; reserve special delimiters for comments (<!-- ... -->) and processing instructions (<? ... ?>); and set aside a few special attributes, such as xml:lang and xml:space, that convey important meta-data to the processor.
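The escaping rule can be seen in action by letting a library handle serialization. In this sketch, ElementTree escapes the special characters in a hypothetical formula element when writing it out and restores them when parsing it back. ElementTree also escapes >, which XML permits in text but generally does not require.

```python
import xml.etree.ElementTree as ET

# Hypothetical element whose text contains characters that are special in XML.
formula = ET.Element("formula")
formula.text = "a < b & b > c"

# Serializing escapes the special characters as entity references.
serialized = ET.tostring(formula, encoding="unicode")
print(serialized)  # <formula>a &lt; b &amp; b &gt; c</formula>

# Parsing reverses the escaping, so the original text round-trips intact.
assert ET.fromstring(serialized).text == formula.text
```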
XML by itself was an important step toward a robust data object specification that is rigorously defined yet adaptable to nearly any imaginable use case. Other data object formats exist, but some are proprietary and can leave data unusable once the associated software becomes obsolete, while others are plain text but lack the expressiveness required of a complete solution or the robustness required to maintain data integrity.
Schemas, which validate data content and data structure, and XPath, which addresses parts of an XML document and performs basic data manipulation, extend XML’s capabilities far beyond those available through other data object specifications.
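Python's standard library includes no schema validator, so the sketch below illustrates only the XPath side, using the limited XPath subset that ElementTree's findall() accepts. The inventory document and its element and attribute names are hypothetical. Full XPath 1.0 support, including functions such as sum(), and XML Schema validation are available in third-party libraries such as lxml; in full XPath 1.0, the total computed below could be written as sum(/inventory/item[@unit='kg']/quantity).

```python
import xml.etree.ElementTree as ET

# Hypothetical inventory document; element and attribute names are illustrative.
inventory = ET.fromstring("""<inventory>
  <item unit="kg"><name>flour</name><quantity>2.5</quantity></item>
  <item unit="kg"><name>sugar</name><quantity>1.0</quantity></item>
  <item unit="each"><name>eggs</name><quantity>12</quantity></item>
</inventory>""")

# ElementTree supports only a subset of XPath: location paths and simple
# predicates such as [@attribute='value'], but not XPath functions.
by_weight = inventory.findall("./item[@unit='kg']")

# The arithmetic XPath's sum() would do is performed here in Python instead.
names = [item.findtext("name") for item in by_weight]
total_kg = sum(float(item.findtext("quantity")) for item in by_weight)

print(names)     # ['flour', 'sugar']
print(total_kg)  # 3.5
```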
Why XML?
Because XML can represent virtually any data structure, has a complete and comprehensive open specification, stores data as plain text that should remain readable long after the software that produced it is gone, and is supported by toolsets in myriad languages. When interoperability and longevity are concerns, XML is the best choice.