Python XML Processing

Share on facebook
Share on twitter
Share on linkedin
Share on twitter
Share on tumblr

In this Python tutorial, we will discuss Python XML processing with program and examples.

Introduction of XML Processing in Python

The Extensible Markup Language (XML) is a markup language that is recommended by the World Wide Web Consortium and available as an open standard. XML is extremely helpful for keeping track of small to medium amounts of data without requiring a SQL-based backbone.

The architecture of XML Parser and APIs

The Python standard library presents a minimal but useful set of interfaces to work with XML.

  • Simple API for XML (SAX)
  • Document Object Model (DOM) API

Example code

<movie title="Transformers">
   <type>Anime, Science Fiction</type>
   <format>DVD</format>
   <year>1989</year>
   <rating>R</rating>
   <stars>8</stars>
   <description>A schientific fiction</description>
</movie>
   <movie title="Trigun">
   <type>Anime, Action</type>
   <format>DVD</format>
   <episodes>4</episodes>
   <rating>PG</rating>
   <stars>10</stars>
   <description>Vash the Stampede!</description>
</movie>
<movie title="Ishtar">
   <type>Comedy</type>
   <format>VHS</format>
   <rating>PG</rating>
   <stars>2</stars>
   <description>Viewable boredom</description>
</movie>
</collection>

Important Methods

make_parser

xml.sax.make_parser( [parser_list] )

parse

xml.sax.parse( xmlfile, contenthandler[, errorhandler])

parseString

xml.sax.parseString(xmlstring, contenthandler[, errorhandler])

Example

#!/usr/bin/python
import xml.sax
class MovieHandler( xml.sax.ContentHandler ):
  def __init__(self):
     self.CurrentData = ""
     self.type = ""
     self.format = ""
     self.year = ""
     self.rating = ""
     self.stars = ""
     self.description = ""
  # Call when an element starts
  def startElement(self, tag, attributes):
     self.CurrentData = tag
     if tag == "movie":
        print "*****Movie*****"
        title = attributes["title"]
        print "Title:", title
  # Call when an elements ends
  def endElement(self, tag):
     if self.CurrentData == "type":
        print "Type:", self.type
     elif self.CurrentData == "format":
        print "Format:", self.format
     elif self.CurrentData == "year":
        print "Year:", self.year
     elif self.CurrentData == "rating":
        print "Rating:", self.rating
     elif self.CurrentData == "stars":
        print "Stars:", self.stars
     elif self.CurrentData == "description":
        print "Description:", self.description
     self.CurrentData = ""
  # Call when a character is read
  def characters(self, content):
     if self.CurrentData == "type":
        self.type = content
     elif self.CurrentData == "format":
        self.format = content
     elif self.CurrentData == "year":
        self.year = content
     elif self.CurrentData == "rating":
        self.rating = content
     elif self.CurrentData == "stars":
        self.stars = content
     elif self.CurrentData == "description":
        self.description = content
if ( __name__ == "__main__"):
  
  # create an XMLReader
  parser = xml.sax.make_parser()
  # turn off namepsaces
  parser.setFeature(xml.sax.handler.feature_namespaces, 0)
  # override the default ContextHandler
  Handler = MovieHandler()
  parser.setContentHandler( Handler )
     parser.parse("movies.xml")

Output

Title: Enemy Behind
Type: War, Thriller
Format: DVD
Year: 2003
Rating: PG
Stars: 10
Description: Talk about a US-Japan war
*****Movie*****
Title: Transformers
Type: Anime, Science Fiction
Format: DVD
Year: 1989
Rating: R
Stars: 8
Description: A scientific fiction
*****Movie*****
Title: Trigun
Type: Anime, Action
Format: DVD
Rating: PG
Stars: 10
Description: Vash the Stampede!
*****Movie*****
Title: Ishtar
Type: Comedy
Format: VHS
Rating: PG
Stars: 2
Description: Viewable boredom

Leave a Comment

Your email address will not be published. Required fields are marked *