XML is a form of semi-structured data organized as a tree. Semi-structured data is helpful when you serialize program data to save it in a file or ship it across a network. XML defines a standardized document format that is easy to read and interpret. XML stands for eXtensible Markup Language.
XML consists of two basic building blocks: text and tags. Text is a sequence of characters. A start tag consists of a less-than sign, an alphanumeric label and a greater-than sign. An end tag looks the same as a start tag except that it has a slash just before the label. The start tag and end tag must have the same label.
For example;
    <school>
      <standard>4</standard>
    </school>
The above is valid XML because every start tag has a matching end tag.
    <school><standard>6</standard>
The above is invalid XML because the end tag for school is missing.
    <school><standard>8 </school></standard>
The above XML is also invalid because the standard tag, which is the child, must be closed before the parent tag school.
Since tags have to match, XML is structured as nested elements. A start tag and its end tag form a pair that delimits an element, and elements can be nested within each other. In the above example, standard is the nested element.
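One way to try these rules out is to hand the strings to a parser. The sketch below assumes the scala-xml library is on the classpath and is meant to be pasted into the REPL; isWellFormed is a hypothetical helper name, not part of the library.

```scala
import scala.util.Try
import scala.xml.XML

// Hypothetical helper: true when the parser accepts the string as XML.
// XML.loadString throws a parse exception on malformed input; Try captures it.
def isWellFormed(s: String): Boolean = Try(XML.loadString(s)).isSuccess

// Matching start and end tags
val ok = isWellFormed("<school><standard>4</standard></school>")
// End tag for school is missing
val unclosed = isWellFormed("<school><standard>4</standard>")
// Child closed after its parent
val misnested = isWellFormed("<school><standard>4</school></standard>")
```

Only the first string should be accepted; the other two break the matching and nesting rules described above.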
A shorthand notation combines the start and end tag: a single tag with a slash just before the greater-than sign denotes an empty element.
For instance, in the XML below standard is an empty element.
    <school>
      <standard />
    </school>
Start tags can have attributes. An attribute is a name-value pair with an equals sign in the middle. The attribute value is surrounded by double quotes or single quotes.
For instance
    <standard section="A" strings="true"></standard>
Now that we have a brief knowledge of XML, let’s look over different things we can do in Scala for XML processing.
Scala XML Literals
Scala lets you type XML directly as an expression: type a start tag and continue writing the XML content. The XML content is read until the matching end tag is seen.
For example, open the Scala REPL shell and execute the following code:
    <a>Scala is a functional Programming language</a>
A Scala expression can be evaluated inside an element by wrapping it in curly braces. For example;
    <a> {"hi"+",Reena"} </a>
Output: res1: scala.xml.Elem = <a> hi,Reena </a>
A brace escape can include arbitrary scala content including XML literals. For example;
    val marks = 78
    <a> { if (marks < 80) <marks> {marks} </marks> else xml.NodeSeq.Empty } </a>
Output: res3: scala.xml.Elem = <a> <marks> 78 </marks> </a>
The code inside the curly braces evaluates to an XML node or a sequence of XML nodes. In the above example, if marks is less than 80 it is added to the <a> element; otherwise nothing is added.
If the expression inside the braces does not evaluate to an XML node, its value is converted to a string and inserted as text.
    <a> {9+40} </a>
Output: res4: scala.xml.Elem = <a> 49 </a>
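A brace escape that yields a sequence of nodes is handy for turning a collection into repeated elements. The following is a minimal sketch for the REPL, assuming scala-xml is on the classpath; the subjects list and the element names are made up for illustration.

```scala
// Illustrative data, not from the examples above
val subjects = List("Maths", "Physics", "Art")

// The brace escape evaluates to a List of elements; each becomes a child node
val list = <subjects>{ subjects.map(s => <subject>{s}</subject>) }</subjects>

println(list)
```

Each element of the list ends up as one subject child, in the original order.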
The <, >, and & characters in the text will be escaped if you print the node.
    <a> {"</a>Hello Scala<a>"} </a>
Output: res5: scala.xml.Elem = <a> &lt;/a&gt;Hello Scala&lt;a&gt; </a>
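To see the difference between the printed form and the stored text, here is a small sketch for the REPL, assuming scala-xml is on the classpath. It also calls scala.xml.Utility.escape, which applies the same escaping to a plain string; the sample string is made up.

```scala
import scala.xml.Utility

val node = <a>{"1 < 2 & 3 > 2"}</a>

// Printing the node replaces markup characters with entities
val printed = node.toString

// The text method returns the characters as originally supplied
val raw = node.text

// The same escaping applied to a plain string
val escaped = Utility.escape("1 < 2 & 3 > 2")
```

Escaping happens only when the node is rendered, so the value you read back with text is unchanged.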
Serialization in Scala
Serialization converts an internal data structure to XML so that the data can be stored, transmitted or reused. To do this, add a toXML method to the class and build the XML in it with XML literals and brace escapes.
For example, first we define a Student class and create an instance of it.
    scala> abstract class Student {
      val name: String
      val id: Int
      val marks: Int
      override def toString = name
      def toXML =
        <student>
        <name>{name}</name>
        <id>{id}</id>
        <marks>{marks}</marks>
        </student>
    }

    scala> val stud = new Student {
      val name = "Rob"
      val id = 12
      val marks = 90
    }

    scala> stud.toXML
    res7: scala.xml.Elem =
    <student>
    <name>Rob</name>
    <id>12</id>
    <marks>90</marks>
    </student>
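The same idea extends to a collection of objects. The sketch below is one possible variation, assuming scala-xml is on the classpath and a REPL session; Pupil, the students wrapper element and the sample data are hypothetical and only stand in for the Student class above.

```scala
import scala.xml.Elem

// Hypothetical case-class counterpart of the Student class above
final case class Pupil(name: String, id: Int, marks: Int) {
  def toXML: Elem =
    <student><name>{name}</name><id>{id}</id><marks>{marks}</marks></student>
}

val pupils = List(Pupil("Rob", 12, 90), Pupil("Adam", 13, 65))

// Serialize every pupil and wrap the results in a single root element
val classroom: Elem = <students>{ pupils.map(_.toXML) }</students>
```

Wrapping the elements in one root keeps the result a single well-formed document, which matters once it is written to a file.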
Scala XML Parsing
The XML classes offer many methods. Let us look at the most useful ones: extracting text, sub-elements and attributes.
Extracting Text
The text method on the XML node retrieves the text within that node. For example;
    scala> <a>Scala is a <p>programming</p> language </a>.text
    Output: res8: String = "Scala is a programming language "
Here the tags are excluded from the output.
Extracting sub-elements
Sub-elements are extracted by calling \ followed by the tag name, which searches only the direct children. Use \\ instead to search the element itself and all of its nested descendants. For example;
    scala> <school><standard><section>C</section></standard></school> \\ "section"
    Output: res21: scala.xml.NodeSeq = NodeSeq(<section>C</section>)

    scala> <school><standard><section>C</section></standard></school> \\ "school"
    Output: res22: scala.xml.NodeSeq = NodeSeq(<school><standard><section>C</section></standard></school>)
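The single and double backslash methods differ in how deep they search. A short sketch for the REPL, assuming scala-xml is on the classpath, makes the difference visible on the same school element:

```scala
val school = <school><standard><section>C</section></standard></school>

// \ looks only at the direct children of school
val children = school \ "standard"

// section is a grandchild, so \ does not find it
val notFound = school \ "section"

// \\ searches the element itself and all of its descendants
val descendants = school \\ "section"
```

A search that finds nothing returns an empty NodeSeq rather than failing, so results can be checked with isEmpty.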
Tag attributes are extracted using the same \ and \\ methods with an at sign (@) before the attribute name. For example;
    scala> val adam = <student name="Adam" id="12" marks="65" />
    Output: adam: scala.xml.Elem = <student name="Adam" id="12" marks="65"/>

    scala> adam \ "@name"
    Output: res3: scala.xml.NodeSeq = NodeSeq(Adam)

    scala> adam \ "@id"
    Output: res5: scala.xml.NodeSeq = NodeSeq(12)
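Attribute lookups return a NodeSeq, so the value usually still needs converting. The sketch below, for the REPL with scala-xml on the classpath, shows one way to get typed values and to read all attributes at once through attributes.asAttrMap; the grade attribute is deliberately absent.

```scala
val adam = <student name="Adam" id="12" marks="65"/>

// Convert the attribute NodeSeq to plain values
val name = (adam \ "@name").text
val id = (adam \ "@id").text.toInt

// A missing attribute yields an empty NodeSeq, not an exception
val absent = adam \ "@grade"

// All attributes as an ordinary Map[String, String]
val all = adam.attributes.asAttrMap
```

Note that calling text.toInt on a missing or non-numeric attribute would throw, so check isEmpty first when the attribute is optional.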
Scala De-serialization example
De-serialization converts the XML back to the internal data structure for the program to use. For example, we reuse the Student class and its toXML method from the serialization section and add a fromXML method that reads each field back from a node.
    scala> def fromXML(node: scala.xml.Node): Student =
      new Student {
        val name = (node \ "name").text
        val id = (node \ "id").text.toInt
        val marks = (node \ "marks").text.toInt
      }
Output: fromXML: (node: scala.xml.Node)Student
Now recreate the stud instance from the serialization example as below.
    scala> val stud = new Student {
      val name = "Rob"
      val id = 12
      val marks = 90
    }
Now invoke the toXML method as;
    scala> val st = stud.toXML
    st: scala.xml.Elem =
    <student>
    <name>Rob</name>
    <id>12</id>
    <marks>90</marks>
    </student>
Call the fromXML method as;
    scala> fromXML(st)
    Output: res17: Student = Rob
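Because toInt throws on non-numeric text, a de-serializer may want to report bad input instead of crashing. The following is a sketch of one such variation for the REPL, assuming scala-xml is on the classpath; StudentRecord, recordToXML and parseStudent are hypothetical names, not part of the example above.

```scala
import scala.util.Try
import scala.xml.Node

// Hypothetical immutable counterpart of the Student class
final case class StudentRecord(name: String, id: Int, marks: Int)

def recordToXML(s: StudentRecord): Node =
  <student><name>{s.name}</name><id>{s.id}</id><marks>{s.marks}</marks></student>

// Returns None when a numeric field is missing or not a number
def parseStudent(node: Node): Option[StudentRecord] =
  Try(
    StudentRecord(
      (node \ "name").text,
      (node \ "id").text.toInt,
      (node \ "marks").text.toInt
    )
  ).toOption

val original = StudentRecord("Rob", 12, 90)
val restored = parseStudent(recordToXML(original))

// The id is not numeric and marks is missing, so parsing fails
val broken = parseStudent(<student><name>Rob</name><id>twelve</id></student>)
```

A case class gives structural equality, which makes it easy to confirm that a value survives the round trip through XML.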
Scala XML Saving into file and Loading from file
The XML.save command writes a node to a file as bytes. The first argument is the name of the file to which the node is saved, the second is the node, the third is the character encoding, the fourth is whether to write an XML declaration at the top that includes the character encoding, and the fifth is the document type.
For example;
    scala> scala.xml.XML.save("stud.xml", st, "UTF-8", true, null)
We are using the st node created above in the de-serialization process.
Now open the stud.xml file, which stores the following contents:
    <?xml version='1.0' encoding='UTF-8'?>
    <student>
    <name>Rob</name>
    <id>12</id>
    <marks>90</marks>
    </student>
Now for loading the file we can use the load method as;
    scala> val s1 = xml.XML.load("stud.xml")
    s1: scala.xml.Elem =
    <student>
    <name>Rob</name>
    <id>12</id>
    <marks>90</marks>
    </student>
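Saving and loading can be checked together in one round trip. The sketch below, for the REPL with scala-xml on the classpath, writes to a temporary file rather than stud.xml so it leaves nothing behind, and reads it back with XML.loadFile, a sibling of load that takes a file name; the temporary file prefix is arbitrary.

```scala
import java.nio.file.Files
import scala.xml.XML

val node = <student><name>Rob</name><id>12</id><marks>90</marks></student>

// Temporary file so the example does not touch the working directory
val path = Files.createTempFile("stud", ".xml")

// Arguments: file name, node, encoding, write XML declaration, doctype
XML.save(path.toString, node, "UTF-8", true, null)

// Read the file back into an Elem
val loaded = XML.loadFile(path.toString)

// The first line of the file should be the XML declaration
val firstLine = Files.readAllLines(path).get(0)

Files.delete(path)
```

Comparing individual fields of the loaded element against the original is a simple way to confirm that nothing was lost on disk.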
That’s all for XML processing in Scala. We will look into more Scala features in coming posts.