If you are looking for something to keep yourself occupied, try this:
You transform almost all nodes into nodes named "entry". An entry carries either a name attribute or a ref attribute. The name attribute is either the original value of a particular attribute of type ID (you get to choose that attribute's name, but there may be only one such attribute type in any source document) or, alternatively, the local-name() of the original node that was transformed into this entry. The ref attribute is an IDREF in DTD lingo. An entry contains either a subtree of entries or a value node. The value node holds either a plain string, a specially formatted string containing plain strings and text nodes (a text node refers to a constant that may have been re-used; the key point is that these text nodes complicate matters further in the real task), or, in turn, an entry node with a ref attribute. Inside an entry with a ref attribute the rules change slightly: every entry descending from it carries a ref attribute, with the same contents, instead of a name attribute wherever applicable.
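The rules are phrased in XPath terms (local-name()), but the basic entry/value mapping can be sketched in Python's standard xml.etree.ElementTree to make it concrete. This is a minimal sketch, not a full solution: it assumes the ID attribute is called `id` (the puzzle lets you pick), strips whitespace from leaf text, ignores imports, and does not model the formatted strings or text constants, whose source syntax is not given. The names `ID_ATTR`, `local_name` and `to_entry` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the ID-typed attribute is called "id" (the puzzle lets you choose).
ID_ATTR = "id"

def local_name(tag):
    """Equivalent of XPath local-name(): drop any '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]

def to_entry(node, key="name"):
    """Map a source element to <entry>.

    `key` is "name" normally and "ref" below a referencing entry, where every
    descendant carries a ref attribute instead of a name attribute."""
    entry = ET.Element("entry", {key: node.get(ID_ATTR, local_name(node.tag))})
    children = list(node)
    if children:
        # Elements with element children become a subtree of entries.
        for child in children:
            entry.append(to_entry(child, key))
    else:
        # Leaf elements become a plain-string <value> (empty if no text).
        value = ET.SubElement(entry, "value")
        text = (node.text or "").strip()
        if text:
            value.text = text
    return entry
```

For example, `ET.tostring(to_entry(ET.fromstring("<foo id='bar'><node3>hi</node3></foo>")), encoding="unicode")` serializes the mapped tree so you can inspect it.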
Suppose you have an XML document in which an element can cross-reference other elements through an attribute, essentially importing their contents into itself. (The elements that can be imported carry a different attribute that can be counted upon to satisfy the ID constraints, as if a DTD were available.)
Elements that import other elements may specify "overrides"; that is, they may redefine certain sub-elements their own way. This means an imported element must *not* be copied if an element of the same name already exists among the direct children of the importing element (a name test will suffice).
Furthermore, the imported elements may themselves import other elements. You must take care of this by cascading the import. Make sure that this does not inadvertently undo any "overrides" in the final importing element!
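The override and cascading rules together amount to walking the import chain while remembering which names are already taken. A minimal Python sketch, assuming IDs live in an attribute called `id` and that an element imports at most one other element through an attribute called `imports` (as in the example below); `resolve_imports` and `index_ids` are hypothetical names. Names defined closer to the importing element always win, so a deeper import can never undo an override.

```python
import xml.etree.ElementTree as ET

# Assumptions: ids live in "id"; an element imports at most one other
# element, whose id is given in "imports".
ID_ATTR, IMPORT_ATTR = "id", "imports"

def resolve_imports(element, by_id):
    """Return (source_id, child) pairs for every top-level node that
    `element` inherits through its chain of imports, nearest source first.

    A name already defined by `element` itself, or by a nearer link in the
    chain, is skipped, so a cascaded import can never undo an override.
    A dangling reference raises KeyError."""
    taken = {child.tag for child in element}   # the importer's own overrides
    seen = {element.get(ID_ATTR)}              # guards against import cycles
    inherited = []
    target = element.get(IMPORT_ATTR)
    while target is not None and target not in seen:
        seen.add(target)
        source = by_id[target]
        # Plain name test against everything defined closer to `element`.
        fresh = [child for child in source if child.tag not in taken]
        inherited.extend((target, child) for child in fresh)
        taken.update(child.tag for child in fresh)
        target = source.get(IMPORT_ATTR)
    return inherited

def index_ids(root):
    """Map every ID value in the document to its element."""
    return {e.get(ID_ATTR): e for e in root.iter() if e.get(ID_ATTR) is not None}
```

Only the children of each imported element are returned, never their descendants, which also matches the "top-most nodes only" rule.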
Finally, due to a small oddity in the way these XML documents work, you must not copy entire imported trees, only the top-most nodes of whatever tree is imported. Consider:
Code:
<root>
<foo id='example' imports='bar' />
<foo id='bar'><node1><child /></node1><node2 /><node3>hello world!</node3></foo>
</root>
The resulting tree would be:
Code:
<entry name="root">
<entry name="example">
<value ref="bar">
<entry ref="node1" />
<entry ref="node2" />
<entry ref="node3" />
</value>
</entry>
<entry name="bar">
<entry name="node1">
<entry name="child">
<value />
</entry>
</entry>
<entry name="node2">
<value />
</entry>
<entry name="node3">
<value>hello world!</value>
</entry>
</entry>
</entry>
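Putting the rules together, a Python sketch that reproduces the tree above from the example input. Assumptions beyond what the post states: attributes are named `id` and `imports`, no namespaces are involved, an importer's own (overriding) children are emitted as entries before its `<value ref="...">` (the example never shows an importer with overrides), and the `ref` on `<value>` is the directly imported id even when some nodes come from further down the chain. `inherited`, `transform` and `convert` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Assumptions: ids live in "id", a single id reference lives in "imports",
# and no namespaces are involved.
ID_ATTR, IMPORT_ATTR = "id", "imports"

def inherited(element, by_id):
    """Top-level nodes reachable through the import chain, nearest first,
    skipping names already defined closer to `element` (overrides win)."""
    taken = {child.tag for child in element}
    seen = {element.get(ID_ATTR)}          # guards against import cycles
    result = []
    target = element.get(IMPORT_ATTR)
    while target is not None and target not in seen:
        seen.add(target)
        source = by_id[target]
        fresh = [child for child in source if child.tag not in taken]
        result.extend(fresh)
        taken.update(child.tag for child in fresh)
        target = source.get(IMPORT_ATTR)
    return result

def transform(node, by_id):
    entry = ET.Element("entry", name=node.get(ID_ATTR, node.tag))
    for child in node:                     # own children, including overrides
        entry.append(transform(child, by_id))
    target = node.get(IMPORT_ATTR)
    if target is not None:
        # Only the top-most imported nodes are referenced, never their subtrees.
        value = ET.SubElement(entry, "value", ref=target)
        for child in inherited(node, by_id):
            ET.SubElement(value, "entry", ref=child.tag)
    elif len(node) == 0:
        value = ET.SubElement(entry, "value")   # leaf: plain string
        text = (node.text or "").strip()
        if text:
            value.text = text
    return entry

def convert(root):
    by_id = {e.get(ID_ATTR): e for e in root.iter() if e.get(ID_ATTR) is not None}
    return transform(root, by_id)
```

Calling `convert(ET.fromstring(...))` on the example input yields an element tree with the same structure as the expected output; `ET.tostring(..., encoding="unicode")` serializes it (without indentation).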
EDIT: I forgot to mention that when an imported node would itself transform into a plain string or a formatted string, you must make sure this information is not lost. (There is only one true solution, which is to copy these elements over.)
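One reading of this EDIT, sketched in Python: when building the `<entry ref="...">` for an imported top-level node, copy its value along if the node is a leaf. The assumption (not stated in the post) is that a node "transforms to a plain string" exactly when it has no element children; formatted strings are not modelled because their source syntax is not given. `ref_entry` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

def ref_entry(imported_child):
    """Build the <entry ref="..."> for one imported top-level node.

    Assumption: a node "transforms to a plain string" when it has no element
    children; its <value> is then copied so the string is not lost. Nodes
    with element children still contribute only a bare reference."""
    entry = ET.Element("entry", ref=imported_child.tag)
    if len(imported_child) == 0:
        value = ET.SubElement(entry, "value")
        text = (imported_child.text or "").strip()
        if text:
            value.text = text
    return entry
```

Under this reading, the example's `<entry ref="node2" />` and `<entry ref="node3" />` would each gain a `<value>` child (empty and "hello world!" respectively), which the example output above does not show.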