Tuesday, June 9, 2015

Convert XML to XSD and generate Python Data Structure

1)
Install generateDS

a)
Create a virtualenv and activate it.
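
If you prefer to script this step, here is a minimal sketch. It assumes Python 3 and uses the standard-library venv module in place of the virtualenv tool; the function name create_env and the directory name "gds-env" are only examples.

```python
import os
import venv


def create_env(env_dir="gds-env", with_pip=True):
    """Create a virtual environment in env_dir and return the path to its activate script."""
    venv.create(env_dir, with_pip=with_pip)
    # The activate script lives in Scripts\ on Windows and in bin/ elsewhere.
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    return os.path.join(env_dir, bin_dir, "activate")
```

Calling create_env() returns the path of the activate script; activation itself still happens in the shell, for example with "source gds-env/bin/activate".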

b)
Download, unpack and install generateDS.

#wget https://pypi.python.org/packages/source/g/generateDS/generateDS-2.16a0.tar.gz#md5=bc110d5987da661274c2f2532e673488
#tar -xzf generateDS-2.16a0.tar.gz
#cd generateDS-2.16a0
#python setup.py install

c)
Install lxml.

#pip install lxml

2)
Convert XML file to XSD file


You can use the following site to convert the XML file "myfile.xml" to an XSD file. Save the result as "convertedfile.xsd".
http://xmlgrid.net/xml2xsd.html
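
Before relying on the generated schema, it can help to see which elements and attributes the XML actually contains, so you can check that "convertedfile.xsd" declares all of them. A rough sketch using only the standard library; the helper name summarize_xml is made up for this post.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET


def summarize_xml(source):
    """Map each element tag in the document to the sorted attribute names seen on it."""
    seen = defaultdict(set)
    for elem in ET.parse(source).iter():
        # elem.attrib is a dict; update() adds its keys (the attribute names)
        seen[elem.tag].update(elem.attrib)
    return {tag: sorted(attrs) for tag, attrs in seen.items()}
```

For example, loop over summarize_xml("myfile.xml").items() and compare each tag with the element declarations in the XSD. Tags come back in the "{namespace}name" form.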

3)
Create Python Data Structure from "convertedfile.xsd" and save it as "pydatastruct.py".

#generateDS.py -f -o pydatastruct.py convertedfile.xsd
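
The same command can be driven from a Python script, which is handy when the schema is regenerated often. A sketch, assuming generateDS.py is on the PATH of the active virtualenv; both function names are hypothetical. Here -o names the output module and -f overwrites an existing file without asking.

```python
import subprocess


def generateds_command(xsd_path, out_path):
    """Return the argument list for the generateDS.py call shown above."""
    return ["generateDS.py", "-f", "-o", out_path, xsd_path]


def run_generateds(xsd_path, out_path):
    """Run generateDS.py and raise CalledProcessError if it exits with an error."""
    subprocess.check_call(generateds_command(xsd_path, out_path))
```

Calling run_generateds("convertedfile.xsd", "pydatastruct.py") should be equivalent to the shell command in this step.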

4)
Print all classes defined in "pydatastruct.py"

#grep "^class " pydatastruct.py
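
If grep is not available, a rough Python equivalent is sketched below. It reads the file with the standard-library ast module instead of importing it, so it needs the generated file to be valid syntax for the interpreter running it. The function name list_classes is made up for this post. Like the grep pattern, it reports only top-level classes.

```python
import ast


def list_classes(py_path):
    """Return the names of the top-level classes defined in a Python source file."""
    with open(py_path) as f:
        module = ast.parse(f.read(), filename=py_path)
    return [node.name for node in module.body if isinstance(node, ast.ClassDef)]
```

For example, print each name returned by list_classes("pydatastruct.py").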

5)
Parse "myfile.xml" and build an element object using "pydatastruct.py"

import pydatastruct
from lxml import etree

# Parse "myfile.xml" into generateDS objects (parse() expects an XML instance document, not the XSD)
a = pydatastruct.parse("myfile.xml")

# Parse the same file with lxml and find the element whose tag matches "{lcn-lcn_ctrl_d}flavours"
tree = etree.parse("myfile.xml")
flavour_el = None
for x in tree.getroot()[0]:  # children of the root's first child
    if x.tag == '{lcn-lcn_ctrl_d}flavours':
        flavour_el = x

# Build a flavoursType object from the "flavours" element, including its attributes and children
flavour_el_obj = pydatastruct.flavoursType()
flavour_el_obj.build(flavour_el)  # build() fills flavour_el_obj in place
print(dir(flavour_el_obj))

# Print values read from the "flavours" element
print(flavour_el_obj.get_flavour_id())
print(flavour_el_obj.vdus.get_memory().get_total_memory_gb())
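
The search for the "flavours" element can also be written with iter(), which walks the whole tree and so does not depend on the element being a grandchild of the root. A self-contained sketch using the standard-library ElementTree (lxml's etree offers the same iter() call). The sample XML and the helper name find_first are invented for illustration; only the namespace and the "flavours" tag come from the code above.

```python
import xml.etree.ElementTree as ET

# Invented sample document; only the namespace and the "flavours" tag come from the post.
SAMPLE = """<lcn xmlns="lcn-lcn_ctrl_d">
  <config>
    <flavours><flavour_id>small</flavour_id></flavours>
  </config>
</lcn>"""

NS = "lcn-lcn_ctrl_d"


def find_first(root, local_name, namespace=NS):
    """Return the first element with the given namespaced tag, or None if absent."""
    return next(root.iter("{%s}%s" % (namespace, local_name)), None)


root = ET.fromstring(SAMPLE)
flavours = find_first(root, "flavours")
flavour_id = flavours.find("{%s}flavour_id" % NS).text
print(flavour_id)
```

Returning None for a missing tag makes it easy to check the result before handing the element to build().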

6)
Building instances with generateDS:
http://www.davekuhlman.org/generateDS.html#building-instances

7)
Further reading:

http://lxml.de/tutorial.html <=== lxml tutorial
http://www.davekuhlman.org/generateds_tutorial.html <=== XSD to Python classes
http://www.xml.com/pub/a/2003/06/11/py-xml.html
http://xmlgrid.net/xml2xsd.html <=== XML to XSD
http://www.davekuhlman.org/generateDS.html#how-to-build-and-install-it <=== generateDS build and install
http://infohost.nmt.edu/tcc/help/pubs/pylxml/web/etree-Element.html <=== lxml Element reference
http://infohost.nmt.edu/tcc/help/pubs/pylxml/web/etree-parse.html <=== lxml etree.parse reference

8)
Example code:

a)
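from xml.etree.ElementTree import iterparse

# Print every start/end event, indented by nesting depth
depth = 0
for (event, node) in iterparse('myfile.xml', ['start', 'end', 'start-ns', 'end-ns']):
    if event == 'end':
        depth -= 1
    # 'start-ns' yields a (prefix, uri) tuple and 'end-ns' yields None; only print real elements.
    # A plain "if node:" would also skip elements that have no children.
    if node is not None and not isinstance(node, tuple):
        print("." * depth * 2, (event, node.tag))
    if event == 'start':
        depth += 1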
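
The loop above asks for 'start-ns' events but does nothing with them. As a sketch of what they carry, the following collects the prefix-to-URI mapping of a document. The function name collect_namespaces and the sample XML are invented for illustration.

```python
import io
from xml.etree.ElementTree import iterparse


def collect_namespaces(source):
    """Return a {prefix: uri} dict built from the 'start-ns' events of a document."""
    namespaces = {}
    for event, node in iterparse(source, events=["start-ns"]):
        prefix, uri = node  # for 'start-ns', node is a (prefix, uri) tuple
        namespaces[prefix] = uri
    return namespaces


# Invented sample; the default namespace is reported with an empty prefix.
sample = io.BytesIO(b'<a xmlns="urn:default" xmlns:x="urn:x"><x:b/></a>')
print(collect_namespaces(sample))
```

The resulting URIs are the parts that appear in braces in tags such as '{lcn-lcn_ctrl_d}flavours'.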

b)
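from lxml import etree

tree = etree.parse("myfile.xml")
print(dir(tree))
root = tree.getroot()
print(dir(root))
children = list(root)  # same result as the deprecated root.getchildren()

for e in children:
    print((e.tag, e.text, e.attrib))

# Serialize the whole tree, the root element, and the first child
print(etree.tostring(tree))
print(etree.tostring(root))
if children:
    print(etree.tostring(children[0]))

print(root.tag.title())
print(root.attrib)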
