QUESTION: What is XML, and how does it work?

ANSWER: XML stands for Extensible Markup Language. You can think of XML as a method for describing information so computers can understand it easily.

On this Web page, we discuss some basic XML concepts. We also give some rudimentary examples that show how XML technology works.


What does XML look like?

Here is some XML code:

<?xml version='1.0' ?>

<students>

  <student>
    <name>
      <first>John</first>
      <middle>Michael</middle>
      <last>Green</last>
    </name>
    <birth_date>
      <month>8</month>
      <day>28</day>
      <year>1992</year>
    </birth_date>
    <sex>male</sex>
    <grade>3</grade>
  </student>

  <student>
    <name>
      <first>Jane</first>
      <middle>Marie</middle>
      <last>Boyd</last>
    </name>
    <birth_date>
      <month>2</month>
      <day>26</day>
      <year>1993</year>
    </birth_date>
    <sex>female</sex>
    <grade>2</grade>
  </student>
  
  <student>
    <name>
      <first>Nancy</first>
      <middle>Elizabeth</middle>
      <last>Butterworth</last>
    </name>
    <birth_date>
      <month>4</month>
      <day>05</day>
      <year>1993</year>
    </birth_date>
    <sex>female</sex>
    <grade>2</grade>
  </student>
    
</students>

You can probably understand this information quite easily after looking at it for a few moments. Let's break it down.

  • The line that says <?xml version='1.0' ?> tells computer programs and humans that are reading the file that it is an XML file and that the XML code in the file conforms to Version 1.0 of the W3C's XML Standard.

  • The portion of the XML code that contains data begins with <students> and ends with </students>. You can assume that the file contains information about (duh!) students!

  • The code is sectioned off into areas that begin with <student> and end with </student>. Each of these sections gives information about a particular student.

  • For each student, each individual tidbit of information is enclosed within a set of labels. For example, grade is a label, and the student's grade is preceded by <grade> and followed by </grade>.
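
To make the breakdown above concrete, here is a minimal sketch of how a program might read the students XML. We chose Python and its standard xml.etree.ElementTree module purely for illustration; the names STUDENTS_XML and list_students are our own, and the sample is shortened to one student.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the students example (one student instead of three).
STUDENTS_XML = """<?xml version='1.0' ?>
<students>
  <student>
    <name>
      <first>John</first>
      <middle>Michael</middle>
      <last>Green</last>
    </name>
    <birth_date>
      <month>8</month>
      <day>28</day>
      <year>1992</year>
    </birth_date>
    <sex>male</sex>
    <grade>3</grade>
  </student>
</students>
"""

def list_students(xml_text):
    """Return a (full name, grade) pair for every <student> section."""
    root = ET.fromstring(xml_text)            # root is the <students> element
    results = []
    for student in root.findall("student"):   # each <student> ... </student> section
        first = student.findtext("name/first")
        last = student.findtext("name/last")
        grade = student.findtext("grade")     # the text between <grade> and </grade>
        results.append((f"{first} {last}", grade))
    return results

print(list_students(STUDENTS_XML))
```

Notice that the program finds each tidbit of information by its label, which is exactly how a human reader finds it.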


XML is not a language!

Technically speaking, XML is not a language.

(Yes, we know we just told you XML stands for "Extensible Markup Language" — and now we are saying XML isn't even a language! We're sorry this is somewhat confusing, but that's just the way it is. Bear with us, and we will explain.)


If XML isn't a language, then what is it?

XML is a set of rules that explain how to create languages.

Stretching this definition a bit, some people say that "XML is a language that specifies how to create other languages" (which is why the folks who created XML can get away with calling it "Extensible Markup Language").

In our discussion here, languages that conform to XML's rules will be referred to as "XML-based languages." We will call the above XML-based language the students language.


A set of rules that explain how to create languages

As we've said, XML is a set of rules that explain how to create languages. To understand this concept, think for a moment about two languages that are probably familiar to you: English and Spanish. There is a set of basic syntax rules that apply to both languages, such as:

  • Words are separated from each other by spaces.
  • Words are contained in sentences.
  • A sentence begins with a capitalized word.
  • A sentence ends with a period, question mark, or exclamation point.

Just as English and Spanish have syntax rules, so does XML. Here are some of them:

  • Each piece of information starts and ends with a label.
  • Each label is enclosed between < and >.
  • The label that precedes a piece of information does not begin with a slash ("/"). The label that follows a piece of information does begin with a slash.
  • Small pieces of information are contained within larger pieces of information.

All XML-based languages conform to these rules.


Tags

In XML parlance, the labels that enclose chunks of information are called "tags." In our example above, students is a tag, student is a tag, name is a tag, first is a tag, and so on.

To specify a hierarchy within a collection of information, you nest tags inside each other. For example, in the students language, <first>, <middle>, and <last> are nested inside <name>, which in turn is nested inside <student>, which in turn is nested inside <students>.
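
The nesting described above can be made visible with a short program. The following is only a sketch, written in Python with the standard xml.etree.ElementTree module; the function name outline and the trimmed-down sample are our own choices.

```python
import xml.etree.ElementTree as ET

# A trimmed-down piece of the students language.
XML_TEXT = """<students>
  <student>
    <name>
      <first>John</first>
      <middle>Michael</middle>
      <last>Green</last>
    </name>
    <grade>3</grade>
  </student>
</students>"""

def outline(element, depth=0):
    """Return one indented line per tag, showing how the tags are nested."""
    lines = ["  " * depth + element.tag]
    for child in element:    # iterating over an element yields its direct children
        lines.extend(outline(child, depth + 1))
    return lines

print("\n".join(outline(ET.fromstring(XML_TEXT))))
```

Each level of indentation in the printed outline corresponds to one level of nesting in the XML.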


Another XML-based language

Now look at the following XML-based faculty_members language:

<?xml version='1.0' ?>

<faculty_members>

  <faculty_member>
    <name>
      <first>Joanne</first>
      <middle>Gayle</middle>
      <last>Jones</last>
    </name>
    <date_of_birth>08/28/1955</date_of_birth>
    <gender>female</gender>
    <job_title>teacher</job_title>
    <email>[email protected]</email>
    <mobile>555-488-3499</mobile>
    <salary>42,000</salary>
  </faculty_member>
  
  <faculty_member>
    <name>
      <first>Ruth</first>
      <middle>Ann</middle>
      <last>Green</last>
    </name>
    <date_of_birth>08/25/1978</date_of_birth>
    <gender>female</gender>
    <job_title>teacher's assistant</job_title>
    <email>[email protected]</email>
    <mobile>555-488-1261</mobile>
    <salary>21,500</salary>
  </faculty_member>  
  
  <faculty_member>
    <name>
      <first>Vincent</first>
      <middle>John</middle>
      <last>Pellio</last>
    </name>
    <date_of_birth>07/16/1959</date_of_birth>
    <gender>male</gender>
    <job_title>teacher</job_title>
    <email>[email protected]</email>
    <mobile>555-488-7528</mobile>
    <salary>36,500</salary>
  </faculty_member>    
  
</faculty_members>


Right away, you can see there are some similarities and some differences between the students language and the faculty_members language.

The two languages are similar in that they both follow the XML syntax rules.

However, beyond the obvious distinction between information that pertains to students and information that pertains to faculty members, the two languages are different in a couple of ways.

  1. First, the two languages use different terminology for the same types of information. For example, the students language has tags called birth_date and sex, but the faculty_members language uses tags called date_of_birth and gender to express the same concepts.

  2. Second, the information is organized differently in the two languages. In the students language, the date of birth is broken down into three pieces: month, day, and year. In the faculty_members language, however, the date of birth is expressed as one piece of text.
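
These differences matter to software, because a program has to know which tags to look for and how the information inside them is organized. The sketch below (Python, standard library only) reads a birth date from each language. The function names are our own, and we are assuming that the faculty_members dates are written month/day/year, which is what the sample data suggests.

```python
import datetime
import xml.etree.ElementTree as ET

def student_birth_date(student):
    """students language: <birth_date> holds <month>, <day>, and <year>."""
    birth = student.find("birth_date")
    return datetime.date(int(birth.findtext("year")),
                         int(birth.findtext("month")),
                         int(birth.findtext("day")))

def faculty_birth_date(member):
    """faculty_members language: <date_of_birth> is one piece of text.

    Assumption: the text is in MM/DD/YYYY order.
    """
    text = member.findtext("date_of_birth")
    return datetime.datetime.strptime(text, "%m/%d/%Y").date()

student = ET.fromstring(
    "<student><birth_date><month>8</month><day>28</day>"
    "<year>1992</year></birth_date></student>")
member = ET.fromstring(
    "<faculty_member><date_of_birth>08/28/1955</date_of_birth></faculty_member>")

print(student_birth_date(student))
print(faculty_birth_date(member))
```

Both functions return the same kind of date value, but each one has to follow the vocabulary and organization of its own language to get there.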

Continuing with our analogy, let's go over some of the differences between English and Spanish.

The words themselves are different.

Here are some English words:
  house hello bicycle dove

Here are the Spanish equivalents:
  casa hola bicicleta paloma

The rules of grammar are different.

In English, an adjective comes before the noun it modifies:
  a white dove

In Spanish, an adjective comes after the noun it modifies:
  una paloma blanca

To sum up: the differences between two XML-based languages are a lot like the differences between English and Spanish. That is: the syntax rules are basically the same, but some of the words are different, and the rules of grammar are different.


* * * TIME TO REVIEW * * *

Before we go on, let's review what we've learned so far about XML:

  • XML is not a language. It is a set of rules that specify how to create a language.
  • XML's rules specify the syntax that is used for all XML-based languages.
  • XML tags specify the meaning of each individual chunk of information.
  • XML tags can be nested to describe hierarchical relationships among chunks of information.

How do you use XML to create a language?

To create an XML-based language, all you have to do is figure out what kind of information is needed; figure out a logical, hierarchical structure for the information; and then use XML rules to put it together.

For example, to create the students language, the first thing we did was think about the characteristics of students. Basically, what we came up with was this: each student has a name, was born on a certain date, is either a boy or a girl, and is assigned to a grade number.

After thinking about the characteristics of students and after thinking about the hierarchy we needed, we were ready to create the students language, which we did by typing the XML code into a text editor. We used a text editor called XMLwriter, which is designed specifically for creating XML files. However, you don't need a special text editor; you can use pretty much any text editor you have on your computer. For example, if you are using a Windows computer, you could use Notepad, WordPad, or Microsoft Word. Just make sure you save your XML files in plain-text (ASCII) format, and give each XML file an .XML filename extension.

We think you'll agree that the XML-based students language and the XML-based faculty_members language can be understood pretty easily by most literate, English-speaking humans.

But what about computers? Can a computer understand an XML-based language well enough to make use of the information in XML files?

The answer is: "Yes, but a computer needs help understanding each particular XML-based language."


How do you enable computers to understand XML-based languages?

It turns out there are two widely used methods for describing XML-based languages. They are:

  1. The Document Type Definition (DTD)
  2. The XML Schema

In this discussion, we will talk about only one of them: the DTD. Most of the things we say about the DTD apply to the XML Schema as well.


What is a DTD?

A DTD is a series of expressions that define the logical structure of an XML document. In other words, the expressions in a DTD specify the rules for a particular XML-based language.

Humans and computers use DTDs to understand XML-based languages.


What does a DTD look like?

Below is a DTD that specifies the rules for the "faculty_members" language.


<!ELEMENT faculty_members
  (faculty_member+)>
<!ELEMENT faculty_member
  (name, date_of_birth, gender, job_title, email, mobile, salary)>
<!ELEMENT name
  (first, middle, last)>
<!ELEMENT date_of_birth
  (#PCDATA)>
<!ELEMENT gender
  (#PCDATA)>
<!ELEMENT job_title 
  (#PCDATA)>
<!ELEMENT email 
  (#PCDATA)>
<!ELEMENT mobile
  (#PCDATA)>
<!ELEMENT salary 
  (#PCDATA)>
<!ELEMENT first 
  (#PCDATA)>
<!ELEMENT middle 
  (#PCDATA)>
<!ELEMENT last 
  (#PCDATA)>

A DTD enables a computer program to understand a particular XML-based language. A computer program designed for XML technology can digest a DTD as easily as a dog digests Alpo. Once two computer programs have digested the above DTD, they can talk to each other about faculty_members for forty days and forty nights or even longer.

As mentioned above, DTDs are also used by humans. For example, when a person is creating an XML-based language, that person usually creates a DTD that describes the language. Creating a DTD helps the person to organize his thoughts while he is creating the XML-based language. Also, if other people need to understand the language being created, the person creating the language can give the DTD to the other people, and they can use the DTD to learn the language.

Just like when you create an XML file, you can use any text editor to create a DTD, as long as you save your DTD file in plain-text (ASCII) format. You should give a DTD file a .DTD filename extension.

Note that you are not required to create a DTD when you create an XML-based language. If you are creating a very simple XML-based language that needs to be understood by humans but does not need to be understood by computers, there is no need to create a DTD. In real-life situations, though, XML-based languages are usually somewhat complicated, and they usually need to be understood by computers.


This is probably a good time to mention that in real-life situations you probably would create DTD files by hand, by typing into a text editor; however, you probably would NOT create XML files by hand. Typically, XML files are created by software and are read by software. But on this Web page we are explaining how to create XML files by hand so you can do it as an exercise to help you learn about XML.
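
To give you an idea of what "created by software" means, here is a minimal sketch of a program that writes XML in the students language. It uses Python's standard xml.etree.ElementTree module. The function name build_students and the dictionary keys are our own inventions for this illustration.

```python
import xml.etree.ElementTree as ET

def build_students(records):
    """Build a <students> tree from a list of plain Python dictionaries."""
    root = ET.Element("students")
    for rec in records:
        student = ET.SubElement(root, "student")
        name = ET.SubElement(student, "name")
        for part in ("first", "middle", "last"):
            ET.SubElement(name, part).text = rec[part]
        birth = ET.SubElement(student, "birth_date")
        for part in ("month", "day", "year"):
            ET.SubElement(birth, part).text = str(rec[part])
        ET.SubElement(student, "sex").text = rec["sex"]
        ET.SubElement(student, "grade").text = str(rec["grade"])
    return root

records = [{"first": "John", "middle": "Michael", "last": "Green",
            "month": 8, "day": 28, "year": 1992, "sex": "male", "grade": 3}]

# Turn the tree into XML text (all on one line, without indentation).
xml_text = ET.tostring(build_students(records), encoding="unicode")
print(xml_text)
```

Because the program adds the opening and closing tags itself, it cannot forget a closing tag the way a person typing by hand might.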


"Well-formed" and "valid" XML

At this point, it's appropriate to discuss the concepts of "well-formed" XML and "valid" XML.


Well-formed XML

"Well-formed" XML is simply XML that is structured according to XML syntax rules. Here is an XML file that is well formed.

<?xml version = "1.0" ?>

<Name>
  <First>Sally</First>
  <Middle>Jessy</Middle>
  <Last>Raphael</Last>
</Name>

When working with XML, it is essential to make sure your XML is well-formed. Otherwise, computer programs won't be able to parse it.

You can obtain software that will tell you whether or not your XML is well formed. Such software is called an XML parser. An XML parser reads XML code, analyzes it, and puts it into computer memory. There is an XML parser in the XMLwriter software we mentioned above.
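
If you don't have XMLwriter, many programming languages come with an XML parser. As a sketch, the Python code below uses the parser in the standard xml.etree.ElementTree module to check whether some XML is well formed. The function name is_well_formed is our own, and the "bad" sample is something we made up by damaging the closing </Last> tag.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it reports a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

GOOD = """<?xml version = "1.0" ?>
<Name>
  <First>Sally</First>
  <Middle>Jessy</Middle>
  <Last>Raphael</Last>
</Name>"""

# Remove the slash from the closing tag, so <Last> is opened twice and never closed.
BAD = GOOD.replace("</Last>", "<Last>")

print(is_well_formed(GOOD))
print(is_well_formed(BAD))
```

Note that this parser checks only whether XML is well formed. It does not compare the XML against a DTD.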


Valid XML

"Valid" XML is XML that adheres to a specific DTD or a specific XML Schema.

For example, the following XML file is valid with respect to the faculty_members DTD we presented earlier, because the information is structured according to that DTD.

<?xml version = "1.0" ?>

<faculty_members>

  <faculty_member>
    <name>
      <first>Ronald</first>
      <middle>George</middle>
      <last>Cutler</last>
    </name>
    <date_of_birth>05/17/1974</date_of_birth>
    <gender>male</gender>
    <job_title>teacher</job_title>
    <email>[email protected]</email>
    <mobile>555-923-0056</mobile>
    <salary>45,700</salary>
  </faculty_member>
  
  <faculty_member>
    <name>
      <first>Frederick</first>
      <middle>James</middle>
      <last>Adler</last>
    </name>
    <date_of_birth>03/06/1958</date_of_birth>
    <gender>male</gender>
    <job_title>teacher</job_title>
    <email>[email protected]</email>
    <mobile>555-799-4521</mobile>
    <salary>46,250</salary>
  </faculty_member>  
  
  <faculty_member>
    <name>
      <first>Vivian</first>
      <middle>Arlene</middle>
      <last>Sullivan</last>
    </name>
    <date_of_birth>09/12/1971</date_of_birth>
    <gender>female</gender>
    <job_title>teacher's assistant</job_title>
    <email>[email protected]</email>
    <mobile>555-214-1145</mobile>
    <salary>22,500</salary>
  </faculty_member>    

</faculty_members>


Now look at the following XML file. It is not valid with respect to the DTD we showed, because it uses the tag <birthdate> for each faculty_member's date of birth, but the DTD stipulates that the tag <date_of_birth> must be used.


<?xml version = "1.0" ?>

<faculty_members>

  <faculty_member>
    <name>
      <first>Ronald</first>
      <middle>George</middle>
      <last>Cutler</last>
    </name>
    <birthdate>05/17/1954</birthdate>
    <gender>male</gender>
    <job_title>teacher</job_title>
    <email>[email protected]</email>
    <mobile>555-623-1196</mobile>
    <salary>45,700</salary>
  </faculty_member>
  
  <faculty_member>
    <name>
      <first>Frederick</first>
      <middle>James</middle>
      <last>Adler</last>
    </name>
    <birthdate>03/06/1958</birthdate>
    <gender>male</gender>
    <job_title>teacher</job_title>
    <email>[email protected]</email>
    <mobile>555-467-2121</mobile>
    <salary>46,250</salary>
  </faculty_member>  
  
  <faculty_member>
    <name>
      <first>Vivian</first>
      <middle>Arlene</middle>
      <last>Sullivan</last>
    </name>
    <birthdate>09/12/1971</birthdate>
    <gender>female</gender>
    <job_title>teacher's assistant</job_title>
    <email>[email protected]</email>
    <mobile>555-945-2311</mobile>
    <salary>22,500</salary>
  </faculty_member>    

</faculty_members>


The concept of valid XML doesn't apply unless you have a DTD or an XML Schema for the particular XML-based language you are working with. Remember, you are not required to create a DTD or an XML Schema when you create an XML-based language. In real-life situations, however, there is usually a DTD or an XML Schema; and you should make sure each XML file you create is valid with respect to the DTD or XML Schema on which it is based.

XML parsers have a feature that will tell you whether or not your XML is valid. The XMLwriter software mentioned above has an XML parser. You tell XMLwriter what XML file you want to validate and what DTD to validate against, and XMLwriter tells you whether or not the code in the XML file is valid.

As we've said (but it bears repeating), XML can be well-formed without being valid. That is the case with the faculty_members example above: the XML follows all the XML syntax rules; but because the tag used for birth-date information is not the tag specified in the DTD, the XML is invalid.
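
To show the difference between well-formed and valid in a program, here is a rough sketch in Python. Please note the assumptions. Python's standard library can check that XML is well formed, but it does not validate against a DTD, so the RULES table below is our own hand-written stand-in for the faculty_members DTD, not a real DTD validator. (Third-party libraries such as lxml can validate against an actual DTD.) The email address in the sample is a made-up placeholder.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for the faculty_members DTD. Each tag maps to the exact
# sequence of child tags it must contain. An empty list plays the role of
# (#PCDATA): text only, no child tags.
RULES = {
    "faculty_member": ["name", "date_of_birth", "gender", "job_title",
                       "email", "mobile", "salary"],
    "name": ["first", "middle", "last"],
    "date_of_birth": [], "gender": [], "job_title": [], "email": [],
    "mobile": [], "salary": [], "first": [], "middle": [], "last": [],
}

def follows_rules(element):
    """Return True if this element and everything nested inside it match RULES."""
    children = [child.tag for child in element]
    if element.tag == "faculty_members":
        # Mirrors (faculty_member+): one or more faculty_member children.
        ok = len(children) >= 1 and all(tag == "faculty_member" for tag in children)
    elif element.tag in RULES:
        ok = children == RULES[element.tag]
    else:
        ok = False    # a tag the language does not define, such as birthdate
    return ok and all(follows_rules(child) for child in element)

TEMPLATE = """<faculty_members>
  <faculty_member>
    <name><first>Ronald</first><middle>George</middle><last>Cutler</last></name>
    <{tag}>05/17/1974</{tag}>
    <gender>male</gender>
    <job_title>teacher</job_title>
    <email>rcutler@example.com</email>
    <mobile>555-923-0056</mobile>
    <salary>45,700</salary>
  </faculty_member>
</faculty_members>"""

VALID = TEMPLATE.format(tag="date_of_birth")
INVALID = TEMPLATE.format(tag="birthdate")

# Both documents parse without error, so both are well formed.
print(follows_rules(ET.fromstring(VALID)))
print(follows_rules(ET.fromstring(INVALID)))
```

Both samples get past the parser, but only the first one follows the rules of the language.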



* * * TIME TO REVIEW * * *

Let's do another review.

  • XML is not a language. It is a set of rules that specify how to create a language.

  • XML's rules specify the syntax that is used for all XML-based languages.

  • XML tags specify the meanings of individual bits of information and describe hierarchies within a collection of information.

  • A particular XML-based language is described by a DTD or an XML Schema that was prepared for that specific language.

  • Humans and computers use DTDs and XML Schemas to understand XML-based languages.

  • You should make sure all your XML files are well formed, which means that they are structured according to the syntax rules of XML.

  • If you are working with a DTD or an XML Schema, you should also make sure that all your XML files are valid, which means each XML file follows the rules that are specified in the associated DTD or XML Schema.

  • To see if your XML is well formed and valid, you use a software program called an XML parser.

  • Remember that XML can be invalid even if it is well formed.


How to make use of XML code

The general idea is that software applications read XML files (that is, they retrieve information from XML files), and software applications also write XML files (that is, they put information into XML files).  Since XML has a rigid structure, it is easy to develop software that reads and writes XML files.

When a software application reads an XML file, it can do various things with the information it retrieves.

For example, a school might have a software application that generates HTML code for displaying some faculty_members information on a Web site. The information might be laid out something like this:


Name            Gender  Job Title
Joanne Jones    female  teacher
Ruth Green      female  teacher's assistant
Vincent Pellio  male    teacher
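
Here is a minimal sketch of what such an application might look like, written in Python with the standard library. The function name faculty_table is our own, the sample XML is cut down to the tags the table needs, and a real application would likely add styling and read the XML from a file.

```python
import html
import xml.etree.ElementTree as ET

# Cut-down faculty_members data: only the tags used in the table.
FACULTY_XML = """<faculty_members>
  <faculty_member>
    <name><first>Joanne</first><middle>Gayle</middle><last>Jones</last></name>
    <gender>female</gender>
    <job_title>teacher</job_title>
  </faculty_member>
  <faculty_member>
    <name><first>Ruth</first><middle>Ann</middle><last>Green</last></name>
    <gender>female</gender>
    <job_title>teacher's assistant</job_title>
  </faculty_member>
</faculty_members>"""

def faculty_table(xml_text):
    """Return HTML code for a table with one row per <faculty_member>."""
    root = ET.fromstring(xml_text)
    rows = ["<tr><th>Name</th><th>Gender</th><th>Job Title</th></tr>"]
    for member in root.findall("faculty_member"):
        name = f"{member.findtext('name/first')} {member.findtext('name/last')}"
        cells = [name, member.findtext("gender"), member.findtext("job_title")]
        # html.escape protects the page if the data contains <, >, or &.
        tds = "".join(f"<td>{html.escape(cell, quote=False)}</td>" for cell in cells)
        rows.append(f"<tr>{tds}</tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"

print(faculty_table(FACULTY_XML))
```

The application never needs to guess where a name or a job title is, because the tags tell it.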


The school might have another software application that uses the <email> information to send an email message to each faculty member.

And the school might have another software application that uses the <mobile> information to send a text to each faculty member.


For more information about XML, go to www.w3.org/XML/.