XML-XXPATH
Overview, Motivation
xml-xxpath is an (incomplete) XPath interpreter that is currently bundled with xml-mapping. It is built on top of REXML. xml-mapping uses xml-xxpath extensively for implementing its node types; see the README file and the reference documentation (and the source code) for details. xml-xxpath, however, does not depend on xml-mapping at all and is useful in its own right. Maybe I'll later distribute it as a separate library instead of bundling it. For the time being, if you want to use this XPath implementation stand-alone, you can just take the files lib/xml/xxpath.rb, lib/xml/xxpath/steps.rb, and lib/xml/xxpath_methods.rb out of the xml-mapping distribution and use them on their own (they do not depend on anything else).
xml-xxpath's XPath support is vastly incomplete (see below), but, in addition to the normal reading/matching functionality found in other XPath implementations (i.e. “find all elements in a given XML document matching a given XPath expression”), xml-xxpath supports write access. For example, when writing the XPath expression /foo/bar[3]/baz[@key='hiho'] to the XML document

  <foo>
    <bar>
      <baz key='ab'>hello</baz>
      <baz key='xy'>goodbye</baz>
    </bar>
  </foo>

you'll get:

  <foo>
    <bar>
      <baz key='ab'>hello</baz>
      <baz key='xy'>goodbye</baz>
    </bar>
    <bar/>
    <bar><baz key='hiho'/></bar>
  </foo>

This feature is used by xml-mapping when writing (marshalling) Ruby objects to XML, and is actually the reason why I couldn't just use any of the existing XPath implementations, e.g. the one that comes with REXML. Also, the whole xml-xxpath implementation is just 300 lines of Ruby code, it is quite fast (paths are precompiled), and xml-xxpath returns matched elements in the order they appeared in the source document – I've heard REXML::XPath doesn't do that :)
Some basic knowledge of XPath is helpful for reading this document.
At the moment, xml-xxpath understands XPath expressions of the form [/]pathelement/pathelement/…, where each pathelement must be one of these:

- a simple element name, name, e.g. signature
- an attribute name, @attrname, e.g. @key
- a combination of an element name and an attribute name and value, in the form elt_name[@attr_name='attr_value']
- an element name and an index, elt_name[index]
- the "match-all" path element, *
- .
- name1|name2|...
- .[@key='xy'] / self::*[@key='xy']
- child::*[@key='xy']
- text()

Xml-xxpath only supports relative paths at this time, i.e. XPath expressions beginning with “/” or “//” will still only find nodes below the node the expression is applied to (as if you had written “./” or “.//”, respectively).
Usage
Xml-xxpath defines the class XML::XXPath. An instance of that class wraps an XPath expression, the string representation of which must be supplied when constructing the instance. You then call instance methods like first, all or create_new on the instance, supplying the REXML Element the XPath expression should be applied to, and get the results, or, in the case of write access, the element is updated in-place.
Read Access
  require 'xml/xxpath'

  d=REXML::Document.new <<EOS
    <foo>
      <bar>
        <baz key="work">Java</baz>
        <baz key="play">Ruby</baz>
      </bar>
      <bar>
        <baz key="ab">hello</baz>
        <baz key="play">scrabble</baz>
        <baz key="xy">goodbye</baz>
      </bar>
      <more>
        <baz key="play">poker</baz>
      </more>
    </foo>
  EOS

  ####read access
  path=XML::XXPath.new("/foo/bar[2]/baz")

  ## path.all(document) gives all elements matching path in document
  path.all(d)
  => [<baz key='ab'> ... </>, <baz key='play'> ... </>, <baz key='xy'> ... </>]

  ## loop over them
  path.each(d){|elt| puts elt.text}
  hello
  scrabble
  goodbye
  => [<baz key='ab'> ... </>, <baz key='play'> ... </>, <baz key='xy'> ... </>]

  ## the first of those
  path.first(d)
  => <baz key='ab'> ... </>

  ## no match here (only three "baz" elements)
  path2=XML::XXPath.new("/foo/bar[2]/baz[4]")
  path2.all(d)
  => []

  ## "first" raises XML::XXPathError in such cases...
  path2.first(d)
  XML::XXPathError: path not found: /foo/bar[2]/baz[4]
          from /Users/oklischat/xml-mapping/lib/xml/xxpath.rb:75:in `first'

  ##...unless we allow nil returns
  path2.first(d,:allow_nil=>true)
  => nil

  ##attribute nodes can also be returned
  keysPath=XML::XXPath.new("/foo/*/*/@key")

  keysPath.all(d).map{|attr|attr.text}
  => ["work", "play", "ab", "play", "xy", "play"]

The objects supplied to the all(), first(), and each() calls must be REXML element nodes, i.e. they must support messages like elements, attributes etc. (instances of REXML::Element and its subclasses do this).
The calls return the found elements as instances of REXML::Element or XML::XXPath::Accessors::Attribute. The latter is a wrapper around attribute nodes that is largely call-compatible with REXML::Element. This is so you can write things like path.each(elt){|node| puts node.text} without having to special-case anything even if the path matches attributes, not just elements.
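To illustrate why such a call-compatible wrapper is convenient, here is a minimal sketch in plain REXML. The class name AttrNode and its two methods are hypothetical; this is not the actual XML::XXPath::Accessors::Attribute implementation, only an assumption about the general idea of giving an attribute a text accessor like an element has.

```ruby
require 'rexml/document'

# Hypothetical stand-in for an attribute wrapper (not xml-xxpath's class):
# it remembers the owning element and the attribute name, and exposes the
# attribute value through the same "text" accessor that elements have.
class AttrNode
  def initialize(parent, name)
    @parent = parent
    @name = name
  end

  # Read the attribute value from the owning element.
  def text
    @parent.attributes[@name]
  end

  # Write the attribute value on the owning element.
  def text=(value)
    @parent.attributes[@name] = value
  end
end

elt = REXML::Document.new("<baz key='work'>Java</baz>").root

# A mixed list of an element and a wrapped attribute can be handled uniformly.
nodes = [elt, AttrNode.new(elt, "key")]
texts = nodes.map { |node| node.text }
puts texts.inspect   # ["Java", "work"]
```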
As you can see, you can re-use path objects, applying them to different XML elements at will. You should do this because the XPath pattern is stored inside the XPath object in a pre-compiled form, which makes it more efficient.
The path elements of the XPath pattern are applied to the .elements collection of the passed XML element and its sub-elements, starting with the first one. This is shown by the following code:
  require 'xml/xxpath'

  d=REXML::Document.new <<EOS
    <foo>
      <bar x="hello">
        <first>
          <second>pingpong</second>
        </first>
      </bar>
      <bar x="goodbye"/>
    </foo>
  EOS

  XML::XXPath.new("/foo/bar").all(d)
  => [<bar x='hello'> ... </>, <bar x='goodbye'/>]

  XML::XXPath.new("/bar").all(d)
  => []

  XML::XXPath.new("/foo/bar").all(d.root)
  => []

  XML::XXPath.new("/bar").all(d.root)
  => [<bar x='hello'> ... </>, <bar x='goodbye'/>]

  firstelt = XML::XXPath.new("/foo/bar/first").first(d)
  => <first> ... </>

  XML::XXPath.new("/first/second").all(firstelt)
  => []

  XML::XXPath.new("/second").all(firstelt)
  => [<second> ... </>]

A REXML Document object is a REXML Element object whose elements collection consists only of a single member: the document's root node. The first path element of the XPath (“foo” in the example) is matched against that. That is why the path “/bar” in the example doesn't match anything when matched against the document d itself.
An ordinary REXML Element object that represents a node somewhere inside an XML tree has an elements collection that consists of all the element's direct sub-elements. That is why XPath patterns matched against the firstelt element in the example must not start with “/first” (unless there is a child node that is also named “first”).
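The difference between the two elements collections can be checked with plain REXML, without xml-xxpath. The following is a small sketch; the variable names are made up for illustration.

```ruby
require 'rexml/document'

doc = REXML::Document.new(<<XML)
<foo>
  <bar x="hello"><first><second>pingpong</second></first></bar>
  <bar x="goodbye"/>
</foo>
XML

# The document's elements collection holds only the root element ...
doc_children = doc.elements.to_a.map { |e| e.name }

# ... while the root's elements collection holds its direct sub-elements.
root_children = doc.root.elements.to_a.map { |e| e.name }

puts doc_children.inspect    # ["foo"]
puts root_children.inspect   # ["bar", "bar"]
```

So a path whose first element is “foo” can only match when applied to the document, and a path whose first element is “bar” can only match when applied to the root element.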
Write Access
You may pass an :ensure_created=>true option argument to path.first(elt) / path.all(elt) calls to make sure that path exists inside the passed XML element elt. If it existed before, nothing changes, and the call behaves just as it would without the option argument. If the path didn't exist before, the XML element is modified such that

- the path exists afterwards
- all paths that existed before still exist afterwards
- the modification is as small as possible (i.e. as few elements as possible are added, additional attributes are added to existing elements if possible etc.)

The matching elements, whether newly created or previously existing, are returned.
Examples:
  require 'xml/xxpath'

  d=REXML::Document.new <<EOS
    <foo>
      <bar>
        <baz key="work">Java</baz>
        <baz key="play">Ruby</baz>
      </bar>
    </foo>
  EOS

  rootelt=d.root

  #### ensuring that a specific path exists inside the document

  XML::XXPath.new("/bar/baz[@key='work']").first(rootelt,:ensure_created=>true)
  => <baz key='work'> ... </>
  d.write($stdout,2)
  <foo>
    <bar>
      <baz key='work'>
        Java
      </baz>
      <baz key='play'>
        Ruby
      </baz>
    </bar>
  </foo>
  ### no change (path existed before)

  XML::XXPath.new("/bar/baz[@key='42']").first(rootelt,:ensure_created=>true)
  => <baz key='42'/>
  d.write($stdout,2)
  <foo>
    <bar>
      <baz key='work'>
        Java
      </baz>
      <baz key='play'>
        Ruby
      </baz>
      <baz key='42'/>
    </bar>
  </foo>
  ### path was added

  XML::XXPath.new("/bar/baz[@key='42']").first(rootelt,:ensure_created=>true)
  => <baz key='42'/>
  d.write($stdout,2)
  <foo>
    <bar>
      <baz key='work'>
        Java
      </baz>
      <baz key='play'>
        Ruby
      </baz>
      <baz key='42'/>
    </bar>
  </foo>
  ### no change this time

  XML::XXPath.new("/bar/baz[@key2='hello']").first(rootelt,:ensure_created=>true)
  => <baz key='work' key2='hello'> ... </>
  d.write($stdout,2)
  <foo>
    <bar>
      <baz key='work' key2='hello'>
        Java
      </baz>
      <baz key='play'>
        Ruby
      </baz>
      <baz key='42'/>
    </bar>
  </foo>
  ### this fit in the 1st "baz" element since
  ### there was no "key2" attribute there before.

  XML::XXPath.new("/bar/baz[2]").first(rootelt,:ensure_created=>true)
  => <baz key='play'> ... </>
  d.write($stdout,2)
  <foo>
    <bar>
      <baz key='work' key2='hello'>
        Java
      </baz>
      <baz key='play'>
        Ruby
      </baz>
      <baz key='42'/>
    </bar>
  </foo>
  ### no change

  XML::XXPath.new("/bar/baz[6]/@haha").first(rootelt,:ensure_created=>true)
  => #<XML::XXPath::Accessors::Attribute:0x007ff64a13ed48 @parent=<baz haha='[unset]'/>, @name="haha">
  d.write($stdout,2)
  <foo>
    <bar>
      <baz key='work' key2='hello'>
        Java
      </baz>
      <baz key='play'>
        Ruby
      </baz>
      <baz key='42'/>
      <baz/>
      <baz/>
      <baz haha='[unset]'/>
    </bar>
  </foo>
  ### for there to be a 6th "baz" element, there must be 1st..5th "baz" elements

  XML::XXPath.new("/bar/baz[6]/@haha").first(rootelt,:ensure_created=>true)
  => #<XML::XXPath::Accessors::Attribute:0x007ff64a12e240 @parent=<baz haha='[unset]'/>, @name="haha">
  d.write($stdout,2)
  <foo>
    <bar>
      <baz key='work' key2='hello'>
        Java
      </baz>
      <baz key='play'>
        Ruby
      </baz>
      <baz key='42'/>
      <baz/>
      <baz/>
      <baz haha='[unset]'/>
    </bar>
  </foo>
  ### no change this time

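The "create only what is missing" behaviour shown in these examples can be approximated in plain REXML. The sketch below is not how xml-xxpath implements :ensure_created; ensure_path is a hypothetical helper that only handles steps of the form name or name[index], and it is meant purely to illustrate the idea of reusing existing elements and padding with empty siblings until the requested index exists.

```ruby
require 'rexml/document'

# Hypothetical helper (not part of xml-xxpath): make sure a path built from
# steps like "bar" or "baz[3]" exists below elt, adding as little as possible.
# Returns the element matching the last step.
def ensure_path(elt, path)
  path.split('/').reject(&:empty?).inject(elt) do |current, step|
    m = step.match(/\A([\w-]+)(?:\[(\d+)\])?\z/) or
      raise ArgumentError, "unsupported path element: #{step}"
    name, index = m.captures
    index = (index || 1).to_i
    same_named = current.get_elements(name)
    # Append empty siblings until the index-th element of that name exists.
    same_named << current.add_element(name) while same_named.size < index
    same_named[index - 1]
  end
end

doc = REXML::Document.new("<foo><bar><baz/></bar></foo>")

existing = ensure_path(doc.root, "/bar/baz")     # already there, nothing added
created  = ensure_path(doc.root, "/bar/baz[3]")  # adds a 2nd and a 3rd "baz"

puts doc.root.get_elements("bar/baz").size   # 3
```

Calling ensure_path with the same path a second time leaves the document unchanged, which mirrors the "no change this time" cases above.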
Alternatively, you may pass a :create_new=>true option argument or call create_new (path.create_new(elt) is equivalent to path.first(elt,:create_new=>true)). In that case, a new node is created in elt for each path element of path (or an exception is raised if that wasn't possible for any path element).
Examples:
  require 'xml/xxpath'

  d=REXML::Document.new <<EOS
    <foo>
      <bar>
        <baz key="work">Java</baz>
        <baz key="play">Ruby</baz>
      </bar>
    </foo>
  EOS

  rootelt=d.root

  path1=XML::XXPath.new("/bar/baz[@key='work']")

  path1.create_new(rootelt)
  => <baz key='work'/>
  d.write($stdout,2)
  <foo>
    <bar>
      <baz key='work'>
        Java
      </baz>
      <baz key='play'>
        Ruby
      </baz>
    </bar>
    <bar>
      <baz key='work'/>
    </bar>
  </foo>
  ### a new element is created for *each* path element, regardless of
  ### what existed before. So a new "bar" element was added, with a new
  ### "baz" element inside it

  ### same call again...
  path1.create_new(rootelt)
  => <baz key='work'/>
  d.write($stdout,2)
  <foo>
    <bar>
      <baz key='work'>
        Java
      </baz>
      <baz key='play'>
        Ruby
      </baz>
    </bar>
    <bar>
      <baz key='work'/>
    </bar>
    <bar>
      <baz key='work'/>
    </bar>
  </foo>
  ### same procedure -- new elements added for each path element

  ## get reference to 1st "baz" element
  firstbazelt=XML::XXPath.new("/bar/baz").first(rootelt)
  => <baz key='work'> ... </>

  path2=XML::XXPath.new("@key2")

  path2.create_new(firstbazelt)
  => #<XML::XXPath::Accessors::Attribute:0x007ff649a6bc00 @parent=<baz key='work' key2='[unset]'> ... </>, @name="key2">
  d.write($stdout,2)
  <foo>
    <bar>
      <baz key='work' key2='[unset]'>
        Java
      </baz>
      <baz key='play'>
        Ruby
      </baz>
    </bar>
    <bar>
      <baz key='work'/>
    </bar>
    <bar>
      <baz key='work'/>
    </bar>
  </foo>
  ### ok, new attribute node added

  ### same call again...
  path2.create_new(firstbazelt)
  XML::XXPathError: XPath (@key2): create_new and attribute already exists
          from /Users/oklischat/xml-mapping/lib/xml/xxpath/steps.rb:215:in `create_on'
          from /Users/oklischat/xml-mapping/lib/xml/xxpath/steps.rb:80:in `block in creator'
          from /Users/oklischat/xml-mapping/lib/xml/xxpath.rb:91:in `call'
          from /Users/oklischat/xml-mapping/lib/xml/xxpath.rb:91:in `all'
          from /Users/oklischat/xml-mapping/lib/xml/xxpath.rb:70:in `first'
          from /Users/oklischat/xml-mapping/lib/xml/xxpath.rb:112:in `create_new'
  ### can't create that path anew again -- an element can't have more
  ### than one attribute with the same name

  ### the document hasn't changed
  d.write($stdout,2)
  <foo>
    <bar>
      <baz key='work' key2='[unset]'>
        Java
      </baz>
      <baz key='play'>
        Ruby
      </baz>
    </bar>
    <bar>
      <baz key='work'/>
    </bar>
    <bar>
      <baz key='work'/>
    </bar>
  </foo>

  ### create_new the same path as in the ensure_created example
  baz6elt=XML::XXPath.new("/bar/baz[6]").create_new(rootelt)
  => <baz/>
  d.write($stdout,2)
  <foo>
    <bar>
      <baz key='work' key2='[unset]'>
        Java
      </baz>
      <baz key='play'>
        Ruby
      </baz>
    </bar>
    <bar>
      <baz key='work'/>
    </bar>
    <bar>
      <baz key='work'/>
    </bar>
    <bar>
      <baz/>
      <baz/>
      <baz/>
      <baz/>
      <baz/>
      <baz/>
    </bar>
  </foo>
  ### ok, new "bar" element and 6th "baz" element inside it created

  XML::XXPath.new("baz[6]").create_new(baz6elt.parent)
  XML::XXPathError: XPath: baz[6]: create_new and element already exists
          from /Users/oklischat/xml-mapping/lib/xml/xxpath/steps.rb:167:in `create_on'
          from /Users/oklischat/xml-mapping/lib/xml/xxpath/steps.rb:80:in `block in creator'
          from /Users/oklischat/xml-mapping/lib/xml/xxpath.rb:91:in `call'
          from /Users/oklischat/xml-mapping/lib/xml/xxpath.rb:91:in `all'
          from /Users/oklischat/xml-mapping/lib/xml/xxpath.rb:70:in `first'
          from /Users/oklischat/xml-mapping/lib/xml/xxpath.rb:112:in `create_new'
  ### yep, baz[6] already existed and thus couldn't be created once
  ### again

  ### but of course...
  XML::XXPath.new("/bar/baz[6]").create_new(rootelt)
  => <baz/>
  d.write($stdout,2)
  <foo>
    <bar>
      <baz key='work' key2='[unset]'>
        Java
      </baz>
      <baz key='play'>
        Ruby
      </baz>
    </bar>
    <bar>
      <baz key='work'/>
    </bar>
    <bar>
      <baz key='work'/>
    </bar>
    <bar>
      <baz/>
      <baz/>
      <baz/>
      <baz/>
      <baz/>
      <baz/>
    </bar>
    <bar>
      <baz/>
      <baz/>
      <baz/>
      <baz/>
      <baz/>
      <baz/>
    </bar>
  </foo>
  ### this works because *all* path elements are newly created

This feature is used in xml-mapping by node types like XML::Mapping::ArrayNode, which must create a new instance of the “per-array element path” for each element of the array to be stored in an XML tree.
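For contrast with :ensure_created, the core idea of create_new for simple name-only paths can be sketched in plain REXML as follows. create_path is a hypothetical helper, not part of xml-xxpath, and it ignores attributes, indices, and the error cases shown above; it only demonstrates that a fresh element is appended for every path element, whatever existed before.

```ruby
require 'rexml/document'

# Hypothetical helper (not part of xml-xxpath): append a brand-new element
# for every step of a simple name-only path and return the innermost one.
def create_path(elt, path)
  path.split('/').reject(&:empty?).inject(elt) do |current, name|
    current.add_element(name)
  end
end

doc = REXML::Document.new("<foo><bar><baz/></bar></foo>")

new_baz = create_path(doc.root, "/bar/baz")

# A second "bar" was appended even though one existed already,
# and the new "baz" lives inside that new "bar".
bars = doc.root.get_elements("bar")
puts bars.size                           # 2
puts new_baz.parent.equal?(bars.last)    # true
```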
Pathological Cases
What is created when the path “*” is to be created inside an empty XML element? The name of the element to be created isn't known, but still some element must be created. The answer is that xml-xxpath creates a special “unspecified” element whose name must be set by the caller afterwards:
  require 'xml/xxpath'

  d=REXML::Document.new <<EOS
    <foo>
      <bar/>
      <bar/>
    </foo>
  EOS

  rootelt=d.root

  XML::XXPath.new("*").all(rootelt)
  => [<bar/>, <bar/>]
  ### ok

  XML::XXPath.new("bar/*").first(rootelt, :allow_nil=>true)
  => nil
  ### ok, nothing there

  ### the same call with :ensure_created=>true
  newelt = XML::XXPath.new("bar/*").first(rootelt, :ensure_created=>true)
  => </>

  d.write($stdout,2)
  <foo>
    <bar>
      </>
    </bar>
    <bar/>
  </foo>

  ### a new "unspecified" element was created
  newelt.unspecified?
  => true

  ### we must modify it to "specify" it
  newelt.name="new-one"
  newelt.text="hello!"
  newelt.unspecified?
  => false

  d.write($stdout,2)
  <foo>
    <bar>
      <new-one>
        hello!
      </new-one>
    </bar>
    <bar/>
  </foo>

  ### you could also set unspecified to false explicitly, as in:
  newelt.unspecified=false

The “newelt” object in the last example is an ordinary REXML::Element. xml-xxpath mixes the “unspecified” attribute into that class, as well as into the XML::XXPath::Accessors::Attribute class mentioned above.
Implementation notes
doc/xpath_impl_notes.txt contains some documentation on the implementation of xml-xxpath.
License
Ruby's.