Python XML parser provides us an easy way to read the XML file and extract useful data. Today we will look into python ElementTree XML API and learn how to use it to parse XML file as well as modify and create XML documents.
Python XML Parser – Python ElementTree
Python ElementTree is one of the most efficient APIs to extract, parse and transform XML data using Python programming language. In this post, we will have good look at how to create, read, parse and update XML data in files and programmatically.
Let’s get started with Python XML parser examples using ElementTree.
Python ElementTree examples
We will start with a very simple example to create an XML file programmatically and then, we will move to more complex files.
Creating XML file
In this example, we will create a new XML file with an element and a sub-element. Let’s get started straight away:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
import xml.etree.ElementTree as xml def createXML(filename): # Start with the root element root = xml.Element("users") children1 = xml.Element("user") root.append(children1) tree = xml.ElementTree(root) with open(filename, "wb") as fh: tree.write(fh) if __name__ == "__main__": createXML("test.xml") |
Once we run this script, a new file will be created in the same directory with file named as test.xml
with following contents:
1 2 3 |
<users><user /></users> |
There are two things to notice here:
- While writing the file, we used
wb
mode instead ofw
as we need to write the file in binary mode. - The child user tag is a self-closing tag as we haven’t put any sub-elements in it.
Adding values to XML elements
Let’s improve the program by adding values to the XML elements:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
import xml.etree.ElementTree as xml def createXML(filename): # Start with the root element root = xml.Element("users") children1 = xml.Element("user") root.append(children1) userId1 = xml.SubElement(children1, "id") userId1.text = "123" userName1 = xml.SubElement(children1, "name") userName1.text = "Shubham" tree = xml.ElementTree(root) with open(filename, "wb") as fh: tree.write(fh) if __name__ == "__main__": createXML("test.xml") |
Once we run this script, we will see that new elements are added added with values. Here are the content of the file:
1 2 3 4 5 6 7 8 |
<users> <user> <id>123</id> <name>Shubham</name> </user> </users> |
This is perfectly valid XML and all tags are closed. Please note that I formatted the XML myself as the API writes the complete XML in a single fine which is kind a bit, incompleteness!
Now, let’s start with editing files.
Editing XML data
We will use the same XML file we showed above. We just added some more data into it as:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
<users> <user> <id>123</id> <name>Shubham</name> <salary>0</salary> </user> <user> <id>234</id> <name>Pankaj</name> <salary>0</salary> </user> <user> <id>345</id> <name>JournalDev</name> <salary>0</salary> </user> </users> |
Let’s try and update salaries of each user:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
import xml.etree.ElementTree as xml def updateXML(filename): # Start with the root element tree = xml.ElementTree(file=filename) root = tree.getroot() for salary in root.iter("salary"): salary.text="1000" tree = xml.ElementTree(root) with open("updated_test.xml", "wb") as fh: tree.write(fh) if __name__ == "__main__": updateXML("test.xml") |
It is worth noticing that if you try to update an elements value to an integer, it won’t work. You will have to assign a String, like:
1 2 3 |
salary.text="1000" |
instead of doing:
1 2 3 |
salary.text = 1000 |
Python XML Parser Example
This time, let’s try to parse the XML data present in the file and print the data:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
import xml.etree.cElementTree as xml def parseXML(file_name): # Parse XML with ElementTree tree = xml.ElementTree(file=file_name) print(tree.getroot()) root = tree.getroot() print("tag=%s, attrib=%s" % (root.tag, root.attrib)) # get the information via the children! print("-" * 40) print("Iterating using getchildren()") print("-" * 40) users = root.getchildren() for user in users: user_children = user.getchildren() for user_child in user_children: print("%s=%s" % (user_child.tag, user_child.text)) if __name__ == "__main__": parseXML("test.xml") |
When we run above script, below image shows the output produced.
In this post, we studied how to extract, parse, and transform XML files. ElementTree
is one of the most efficient APIs to do these tasks. I would suggest you try some more examples of XML parsing and modifying different values in XML files.
Reference: API Doc