XML-Basic Introduction for Beginners

XML stands for Extensible Markup Language. It is case sensitive, and a document should begin with the XML declaration (the first part of the XML prolog), which opens with "<?xml" and closes with "?>".

Syntax

<?xml version = "version-number" encoding = "encoding-declaration" standalone = "standalone-status" ?>
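
To see the three parts of the declaration in action, here is a small Python sketch that reads them back with the standard library's `xml.dom.minidom`. The sample document and the helper name `read_declaration` are made up for illustration.

```python
from xml.dom import minidom

# Hypothetical sample document; only the declaration matters here.
SAMPLE = b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><list/>'


def read_declaration(xml_bytes):
    """Return the version, encoding and standalone values of the XML declaration."""
    doc = minidom.parseString(xml_bytes)
    return {
        "version": doc.version,
        "encoding": doc.encoding,
        "standalone": doc.standalone,  # True for "yes", False for "no"
    }


declaration = read_declaration(SAMPLE)
print(declaration)
```

The three values are explained one by one in the sections that follow.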

Version

  • defines: XML standard version
  • value: 1.0 (version 1.1 also exists but is rarely used)

Encoding

  • defines: character encoding of a document
  • value (one of these): UTF-8, UTF-16, ISO-10646-UCS-2, ISO-10646-UCS-4, ISO-8859-1 to ISO-8859-9, ISO-2022-JP, Shift_JIS, EUC-JP
  • default: UTF-8

Why is UTF-8 the default and the most commonly used encoding? As unwesen explained on Stack Overflow:

There are essentially two different types of encodings: one expands the value range by adding more bits. Examples of these encodings would be UCS2[Unicode Character Set Coded in 2 octets] (2 bytes = 16 bits) and UCS4[Unicode Character Set Coded in 4 octets] (4 bytes = 32 bits). They suffer from inherently the same problem as the ASCII and ISO-8859 standards, as their value range is still limited, even if the limit is vastly higher.

The other type of encoding uses a variable number of bytes per character, and the most commonly known encodings for this are the UTF[Unicode Transformation Format] encodings. All UTF encodings work in roughly the same manner: you choose a unit size, which for UTF-8 is 8 bits, for UTF-16 is 16 bits, and for UTF-32 is 32 bits. The standard then defines a few of these bits as flags: if they're set, then the next unit in a sequence of units is to be considered part of the same character. If they're not set, this unit represents one character fully. Thus the most common (English) characters only occupy one byte in UTF-8 (two in UTF-16, 4 in UTF-32), but other language characters can occupy six bytes or more.

Windows handles so-called "Unicode" strings as UTF-16 strings, while most UNIXes default to UTF-8 these days. Communications protocols such as HTTP tend to work best with UTF-8, as the unit size in UTF-8 is the same as in ASCII, and most such protocols were designed in the ASCII era. On the other hand, UTF-16 gives the best average space/processing performance when representing all living languages.

The Unicode standard defines fewer code points than can be represented in 32 bits. Thus for all practical purposes, UTF-32 and UCS4 became the same encoding, as you're unlikely to have to deal with multi-unit characters in UTF-32.
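
The unit sizes described above are easy to check. The following Python sketch counts how many bytes a few sample characters take in each encoding; the helper name `encoded_sizes` and the sample characters are my own choice. Note that in current UTF-8 a character takes between one and four bytes.

```python
def encoded_sizes(character):
    """Return how many bytes one character takes in UTF-8, UTF-16 and UTF-32."""
    # The "-le" codecs are used so that no byte order mark is added to the count.
    return {
        "utf-8": len(character.encode("utf-8")),
        "utf-16": len(character.encode("utf-16-le")),
        "utf-32": len(character.encode("utf-32-le")),
    }


# Latin letter A, e with acute accent, euro sign, and an emoji outside the 16-bit range.
for character in ("A", "\u00e9", "\u20ac", "\U0001F600"):
    print(repr(character), encoded_sizes(character))
```

Plain English text stays at one byte per character in UTF-8, which is one reason it works well as a default for XML files.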

Standalone

  • defines: whether the document depends on external markup declarations, such as an external DTD
  • value (one of these): yes (no external dependency), no
  • default: no

Now, what are external entities?

  • An external Document Type Definition (DTD) is one example of an external entity.
  • A DTD defines custom rules for your XML document and is used to validate it: it specifies the names of the elements that are allowed in the document, which elements may be nested inside other elements, and which elements can only contain data.
  • A DTD itself can be either internal or external: an internal DTD is specified within the document, and an external DTD is specified in a separate file.

Syntax:

External Document Type Definition

.dtd

   element1/attribute1 declaration
   element2/attribute2 declaration
   ...

.xml

<!DOCTYPE <root element> SYSTEM "file path">

e.g. list.dtd

<!ELEMENT list (song)+>
<!ELEMENT song (singer-name,name)+>
<!ELEMENT singer-name (#PCDATA)>
<!ELEMENT name (#PCDATA)>

song.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE list SYSTEM "./list.dtd">
<list>
  <song>
    <singer-name>
      Fash
    </singer-name>
    <name>
      Lights on red
    </name>
  </song>
</list>
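
As a rough illustration of how a program sees the DOCTYPE line, this Python sketch parses a compact copy of song.xml with the standard library's `xml.dom.minidom` and reads the root element name and the DTD path from it. As far as I know, this parser only records the reference and does not open list.dtd, so the file does not need to exist for the sketch to run.

```python
from xml.dom import minidom

# Compact copy of song.xml from above (whitespace inside the elements removed).
SONG_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE list SYSTEM "./list.dtd">
<list>
  <song>
    <singer-name>Fash</singer-name>
    <name>Lights on red</name>
  </song>
</list>"""

doc = minidom.parseString(SONG_XML)
doctype = doc.doctype

print(doctype.name)      # root element named in the DOCTYPE
print(doctype.systemId)  # path of the external DTD
```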

Internal Document Type Definition

   <!DOCTYPE <root-element>
   [
    element1/attribute1 declaration
    element2/attribute2 declaration
   ...
   ]>

e.g. song.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE list [
<!ELEMENT list (song)>
<!ELEMENT song (singer-name)+>
<!ELEMENT singer-name (#PCDATA)>
]>
<list>
  <song>
    <singer-name>
      Fash
    </singer-name>
  </song>
</list>
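
  • If the document does not match the declarations in the DTD, a validating parser reports errors like these:

Cause: genre attribute not declared in the DTD

[image: validation error for the genre attribute]

Cause: year element not declared in the DTD

[image: validation error for the year element]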
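
Errors like these only appear when a validating parser or editor checks the document against its DTD. As a hedged illustration, Python's standard `xml.etree.ElementTree` is a non-validating parser: it reads the internal DTD but does not enforce it, so the sample below (my own variation of song.xml with an undeclared attribute and element) still parses.

```python
import xml.etree.ElementTree as ET

# The internal DTD declares neither a genre attribute nor a <year> element.
DOCUMENT = """<!DOCTYPE list [
<!ELEMENT list (song)>
<!ELEMENT song (singer-name)+>
<!ELEMENT singer-name (#PCDATA)>
]>
<list>
  <song genre="pop">
    <singer-name>Fash</singer-name>
    <year>2022</year>
  </song>
</list>"""

# No exception is raised: the document is well-formed, even though it is not valid.
root = ET.fromstring(DOCUMENT)
song = root.find("song")
print(song.get("genre"), song.findtext("year"))
```

So "well-formed" (follows the XML syntax rules) and "valid" (also follows the DTD) are two separate checks.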

Elements

Elements are the building blocks of an XML document, and a document can have one or more of them. An element is delimited either by a start tag and an end tag, as in <song>Lights in red</song>, or, when it is empty, by a start tag immediately followed by an end tag, <song></song>, or by a single empty-element tag, <song/>.

Rules:

  • An element name may contain only alphanumeric characters, hyphen (-), underscore (_) and period (.), and it must start with a letter or an underscore.

e.g. <singer-name/>

  • As XML is case-sensitive, the start and end tags of an element must match exactly.

e.g.

<singer-name></singer-name>
<singer-name></singerName> //causes an error
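
The two rules above can be tried out quickly. This is a minimal Python sketch using the standard `xml.etree.ElementTree` parser; the helper name `is_well_formed` is an assumption made for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False if it raises ParseError."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<singer-name></singer-name>"))  # matching start and end tags
print(is_well_formed("<singer-name></singerName>"))   # end tag differs in case
print(is_well_formed("<singer.name_1/>"))             # period, underscore, digit inside the name
print(is_well_formed("<1singer/>"))                   # name starts with a digit
```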

Elements Nesting

  • An element can contain multiple child elements, but tags must be closed in the reverse of the order in which they were opened.

<root>
  <child>
    <subchild1>.....</subchild1>
    <subchild2>.....</subchild2>
    <subchild3>.....</subchild3>
  </child>
</root>

e.g.

<?xml version="1.0" encoding="UTF-8"?>
<list>
  <song>
    <singer-name>
    Fash
    </singer-name>
    <name>
    Lighting in red
    </name>
  </song>
</list>
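
Nesting is what turns the document into a parent/child tree. As a small sketch (Python, standard `xml.etree.ElementTree`), the example above can be walked from the root down to the text of each child. The `strip()` call removes the line breaks and spaces that surround the text in the sample.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<list>
  <song>
    <singer-name>
    Fash
    </singer-name>
    <name>
    Lighting in red
    </name>
  </song>
</list>"""

root = ET.fromstring(DOCUMENT)
song = root[0]  # first (and only) child of <list>

# Map each child tag of <song> to its text, without the surrounding whitespace.
children = {child.tag: child.text.strip() for child in song}
print(root.tag, song.tag, children)
```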

Root Element

  • Each XML document must have exactly one root element, which contains all the other elements.

<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>

e.g.

<?xml version="1.0" encoding="UTF-8"?>
<list>
  <song>Lights in red</song>
</list>

Attributes

  • Attributes are part of XML elements. They define properties of an element as name-value pairs, and each attribute name must be unique within its element: the same attribute cannot be defined more than once.
  • Attribute names in XML are case sensitive.
  • Attribute names are written without quotation marks, and attribute values must be enclosed in single or double quotation marks.

e.g.

<singer-name genre='classic soul' />
<singer-name genre='classic soul' genre="pop"/> //causes an error
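
Here is how the attribute rules look from code, as a short Python sketch with the standard `xml.etree.ElementTree` parser. The variable names are only for illustration.

```python
import xml.etree.ElementTree as ET

element = ET.fromstring("<singer-name genre='classic soul' />")

print(element.attrib)        # all attributes as a dictionary
print(element.get("genre"))  # lookup by the exact attribute name
print(element.get("Genre"))  # different case, so there is no such attribute

# Defining the same attribute twice makes the document not well-formed.
try:
    ET.fromstring("""<singer-name genre='classic soul' genre="pop"/>""")
    duplicate_rejected = False
except ET.ParseError as error:
    duplicate_rejected = True
    print("rejected:", error)
```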

Element vs Attribute

//Element
<?xml version="1.0" encoding="UTF-8"?>
<list>
  <song>
    <singer-name>
    Fash
    </singer-name>
    <name>
    Lighting in red
    </name>
    <genre>
    Classic Soul
    </genre>
  </song>
</list>

//Attribute
<?xml version="1.0" encoding="UTF-8"?>
<list>
  <song genre="Classic Soul">
    <singer-name>
    Fash
    </singer-name>
    <name>
    Lighting in red
    </name>
  </song>
</list>
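
Both styles carry the same information; the difference shows up in how you read it back. A minimal Python sketch with the standard `xml.etree.ElementTree` parser, using compact copies of the two documents above:

```python
import xml.etree.ElementTree as ET

AS_ELEMENT = (
    "<list><song><name>Lighting in red</name>"
    "<genre>Classic Soul</genre></song></list>"
)
AS_ATTRIBUTE = (
    '<list><song genre="Classic Soul">'
    "<name>Lighting in red</name></song></list>"
)

song_a = ET.fromstring(AS_ELEMENT).find("song")
song_b = ET.fromstring(AS_ATTRIBUTE).find("song")

genre_from_element = song_a.findtext("genre")  # text of the <genre> child
genre_from_attribute = song_b.get("genre")     # value of the genre attribute

print(genre_from_element, genre_from_attribute)
```

A child element can be repeated or given children of its own later, while an attribute can hold only one plain value per name.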

XML Rules

  • Every XML element must have a closing tag or be written as an empty-element tag.

valid:

<?xml version="1.0" encoding="UTF-8"?>
<list>
  <song>Lights in red</song>
  <singer-name/>
</list>

invalid:

<?xml version="1.0" encoding="UTF-8"?>
<list>
  <song>Lights in red
  <singer-name>
</list>

  • XML tags are case-sensitive.

valid:

<?xml version="1.0" encoding="UTF-8"?>
<list>
  <song>Lights in red</song>
</list>

invalid:

<?xml version="1.0" encoding="UTF-8"?>
<list>
  <song>Lights in red</Song>
</list>

  • Every XML element must be properly nested.

valid:

<?xml version="1.0" encoding="UTF-8"?>
<list>
  <song>
    <singer-name>
    Fash
    </singer-name>
    <name>
    Lighting in red
    </name>
  </song>
</list>

invalid:

<?xml version="1.0" encoding="UTF-8"?>
<list>
  <song>
    <singer-name>
    Fash
      <name>
      Lighting in red
    </singer-name>
      </name>
  </song>
</list>

  • Every XML document must have exactly one root element.

valid:

<?xml version="1.0" encoding="UTF-8"?>
<list>
  <song>Lights in red</song>
</list>

invalid:

<?xml version="1.0" encoding="UTF-8"?>
   <singer-name>
    Fash
    </singer-name>
    <name>
    Lighting in red
   </name>

  • Attribute values must always be quoted.

valid:

<?xml version="1.0" encoding="UTF-8"?>
<list>
  <singer-name genre='classic soul'/>
</list>

invalid:

<?xml version="1.0" encoding="UTF-8"?>
<list>
    <singer-name 'genre'='classic soul'/>
    <singer-name genre=classic soul'/>
</list>
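
To check the rules above without a browser, here is a hedged Python sketch that feeds one broken sample per rule to the standard `xml.etree.ElementTree` parser and prints the first error it reports. The samples are shortened variations of the invalid examples, and the names `INVALID_SAMPLES` and `first_error` are made up for this sketch.

```python
import xml.etree.ElementTree as ET

# One deliberately broken document per rule.
INVALID_SAMPLES = {
    "missing closing tags": "<list><song>Lights in red<singer-name></list>",
    "tag case mismatch": "<list><song>Lights in red</Song></list>",
    "overlapping tags": "<list><singer-name>Fash<name>x</singer-name></name></list>",
    "two root elements": "<singer-name>Fash</singer-name><name>x</name>",
    "unquoted attribute value": "<list><singer-name genre=classic/></list>",
}


def first_error(xml_text):
    """Return the parser's error message, or None if xml_text is well-formed."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as error:
        return str(error)
    return None


for rule, sample in INVALID_SAMPLES.items():
    print(f"{rule}: {first_error(sample)}")
```

The exact wording of the messages depends on the parser, so treat them as hints about where the document breaks rather than as fixed text.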

Example

song.xml

<?xml version="1.0" encoding="UTF-8"?>

<list>
  <song genre="pop">
    <singer-name>
      Fash
    </singer-name>
    <name>
      Lights on red
    </name>
    <year>
      2022
    </year>
    <singer-name>
      Carly Rae Jepsen
    </singer-name>
    <name>
      Call Me Maybe
    </name>
    <year>
      2011
    </year>
  </song>
  <song genre="edm">
    <singer-name>
      Marshmello, Anne-Marie
    </singer-name>
    <name>
      Friends
    </name>
    <year>
      2018
    </year>
  </song>
</list>
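
Once the document is well-formed, reading it from a program is straightforward. The sketch below (Python, standard `xml.etree.ElementTree`) uses a compact copy of song.xml and collects the song names under each genre. The function name `names_by_genre` is an assumption for this example.

```python
import xml.etree.ElementTree as ET

# Compact copy of song.xml from above (whitespace inside the elements removed).
SONG_XML = """<list>
  <song genre="pop">
    <singer-name>Fash</singer-name>
    <name>Lights on red</name>
    <year>2022</year>
    <singer-name>Carly Rae Jepsen</singer-name>
    <name>Call Me Maybe</name>
    <year>2011</year>
  </song>
  <song genre="edm">
    <singer-name>Marshmello, Anne-Marie</singer-name>
    <name>Friends</name>
    <year>2018</year>
  </song>
</list>"""


def names_by_genre(xml_text):
    """Group the text of every <name> element under its song's genre attribute."""
    root = ET.fromstring(xml_text)
    result = {}
    for song in root.findall("song"):
        names = [name.text.strip() for name in song.findall("name")]
        result.setdefault(song.get("genre"), []).extend(names)
    return result


print(names_by_genre(SONG_XML))
```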

Tree structure

[image: tree structure of song.xml]
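
A text version of such a tree can be produced with a few lines of Python. This is only a sketch: the `outline` helper is hypothetical, and for brevity it is run on a single song from the example rather than on the whole file.

```python
import xml.etree.ElementTree as ET


def outline(element, depth=0):
    """Return one indented line per element, depth-first, as a list of strings."""
    label = element.tag
    if element.attrib:
        # Show attributes next to the tag name, e.g. song genre="edm".
        label += " " + " ".join(f'{key}="{value}"' for key, value in element.attrib.items())
    lines = ["  " * depth + label]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


SAMPLE = (
    '<list><song genre="edm">'
    "<singer-name>Marshmello, Anne-Marie</singer-name>"
    "<name>Friends</name><year>2018</year>"
    "</song></list>"
)

tree_lines = outline(ET.fromstring(SAMPLE))
print("\n".join(tree_lines))
```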

In browser

[image: song.xml displayed in a browser]
