2 is not a problem: you can easily wrap the script call in a bit of bash.
Code:
mkdir newdir
for f in *.xml; do ./scriptname.py "$f" > "newdir/$f"; done
But 1... this might stink, because I suspect that parsing and rebuilding from scratch drops comments in the process. Care to give a small example of input and output that accounts for these comments?
Which version of Python do you have?
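To make the worry about comments concrete, here is a minimal sketch (Python 3 syntax; the input string is made up for illustration, not taken from your files) showing that a plain ElementTree round trip loses comments:

```python
import xml.etree.ElementTree as ET

# Hypothetical input, made up for this illustration.
src = '<root><!-- keep me --><child id="1" /></root>'

# Default parser and default tree builder: comments are not stored in the tree.
root = ET.fromstring(src)

# Serialize the tree back to text.
out = ET.tostring(root, encoding="unicode")
print(out)

# The comment text is gone after the round trip.
comment_survived = "keep me" in out
print(comment_survived)
```

So any script built on the default parser needs something extra to carry the comments through.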
edit:
Code:
#!/usr/bin/env python
# Python 2 script: strips every attribute except those listed in `preserve`.
import sys
import xml.etree.ElementTree as ET
# doc: http://docs.python.org/2/library/xml.etree.elementtree.html

# preserving comments, from http://bugs.python.org/issue8277
class CommentedTreeBuilder( ET.XMLTreeBuilder ):
    def __init__( self, html = 0, target = None ):
        ET.XMLTreeBuilder.__init__( self, html, target )
        self._parser.CommentHandler = self.handle_comment

    def handle_comment( self, data ):
        self._target.start( ET.Comment, {} )
        self._target.data( data )
        self._target.end( ET.Comment )

# attributes to keep; everything else is removed
preserve = ( 'name', 'label', 'title' )

def strip_attrs( e ):
    # copy the keys first so the dict is not modified while iterating
    for a in list( e.attrib.keys() ):
        if a not in preserve:
            del e.attrib[a]

f = open( sys.argv[1] )
# assumes the first line of the file is the XML declaration; keep it as written
xmlheader = f.readline().strip()
tree = ET.parse( f, parser=CommentedTreeBuilder() )
xmlroot = tree.getroot()
# '.' is the root itself, './/*' is every descendant
for e in xmlroot.findall('.')+xmlroot.findall('.//*'):
    strip_attrs( e )
#tree.write( 'out.xml', xml_declaration=True ) # python 2.7+
print xmlheader
print ET.tostring( xmlroot )
f.close()
Output after inserting a comment into the input file:
Code:
$ ./xml_strip_attrs.py in.xml
<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<floater name="rlvLocks" title="Active RLV Locks">
    <!-- wut wut wut -->
    <scroll_list name="lock_list">
        <column label="Lock Type" name="lock_type" />
        <column label="Add / Rem" name="lock_addrem" />
        <column label="Lock Target" name="lock_target" />
        <column label="Held By" name="lock_origin" />
    </scroll_list>
</floater>
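If you turn out to be on Python 3, the script above will not run as is (print statements, and as far as I know XMLTreeBuilder is gone from recent releases). A rough sketch of the same idea for Python 3.8+, where TreeBuilder accepts insert_comments=True, could look like this. The function name strip_attrs_text, the sample string and its width attributes are my own assumptions for illustration, not your real data:

```python
import xml.etree.ElementTree as ET

# Same whitelist as the script above.
PRESERVE = ("name", "label", "title")


def strip_attrs_text(xml_text, preserve=PRESERVE):
    """Return xml_text with every attribute not in `preserve` removed."""
    # insert_comments=True (Python 3.8+) makes the builder keep comments in the tree.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_text, parser=parser)
    # iter() yields the root and all descendants, comment nodes included.
    for elem in root.iter():
        # Copy the attribute names so the dict is not modified while iterating.
        for attr in list(elem.attrib):
            if attr not in preserve:
                del elem.attrib[attr]
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample; the width attributes are invented to have something to strip.
sample = (
    '<floater name="rlvLocks" title="Active RLV Locks" width="300">'
    '<!-- wut wut wut -->'
    '<column label="Lock Type" name="lock_type" width="80" />'
    '</floater>'
)
result = strip_attrs_text(sample)
print(result)
```

This version returns a string without the XML declaration, so you would still have to write the header line yourself, as the Python 2 script does.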