Linq and XML
Written by Mike James   
Tuesday, 28 October 2014

XML, which is all about tree structured data, and Linq, which is all about querying collections, might not seem to fit together, but they work together just fine.

Linq isn’t just about SQL and it isn’t even just about databases. It is a mechanism for querying collections with a wide range of structures.

After looking in some detail at the basic idea behind Linq, it is instructive to examine probably its second most common application - working with XML.

The basics of working with the new XML facilities are covered in XML in C# and for this article it is assumed that you know all about the basic XML facilities.



Linq is a very simple idea with a fairly simple implementation - see The LINQ principle for a more in-depth discussion.

Linq queries are provided by extension methods applied to objects that implement the generic IEnumerable interface.

In the case of XML the main objects don’t implement IEnumerable but some of their methods return objects that do.

This is a slight expansion of the basic Linq idea, but a fairly obvious one.

For example, the Elements method returns an IEnumerable<XElement> that enumerates all of the child elements of the object.

This means you can write a foreach loop to step through each of the child elements:

foreach (XElement ele in root.Elements()) {
 textBox1.Text += ele.ToString();
}
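For readers who want to try the loop outside a form, here is a minimal console sketch. The sample tree, its root name, the class name and the use of Console.WriteLine in place of textBox1 are all illustrative assumptions rather than part of the original example.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

public static class ChildElementsDemo
{
    // Hypothetical sample tree: a root with a Name child (holding First and
    // Second) and an Address child.
    public static XElement BuildTree() =>
        new XElement("Root",
            new XElement("Name",
                new XElement("First"),
                new XElement("Second")),
            new XElement("Address"));

    // Collects the names of the direct children only; grandchildren such as
    // First and Second are not visited by Elements().
    public static List<string> ChildNames(XElement root)
    {
        var names = new List<string>();
        foreach (XElement ele in root.Elements())
        {
            names.Add(ele.Name.LocalName);
        }
        return names;
    }

    public static void Main()
    {
        foreach (string name in ChildNames(BuildTree()))
        {
            Console.WriteLine(name);   // prints Name, then Address
        }
    }
}
```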

You can also make use of the usual Linq extension methods – although this isn’t the most common way of explaining how Linq to XML works.

For example, assuming we have an XML tree something like:

<Root>
 <Name>
  <First />
  <Second />
 </Name>
 <Address />
</Root>

you can use the Where method to filter the collection of child nodes:

var q = root.Elements().Where(
          E => E.Name == "Address");
foreach (XElement ele in q) {
 textBox1.Text += ele.ToString();
}

which, of course, selects only those child elements that are “Address” tags.

You can chain together a set of Linq extension methods to produce something more complicated and you can use the syntactic shortcuts introduced into C# to make it even easier.

For example the previous query can be written as:

var q = from E in root.Elements()
 where  E.Name == "Address"
  select E;

and the compiler translates it back into the method calls.
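To check that the two forms really are the same query, the following sketch runs both against a small tree and compares the results. The tree contents and the class and method names are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class QuerySyntaxDemo
{
    // Method-call form: Where takes a lambda that tests each child element.
    public static IEnumerable<XElement> ViaMethod(XElement root) =>
        root.Elements().Where(e => e.Name == "Address");

    // Query-expression form: the compiler rewrites this into a call to Where.
    public static IEnumerable<XElement> ViaQuery(XElement root) =>
        from e in root.Elements()
        where e.Name == "Address"
        select e;

    public static void Main()
    {
        // Hypothetical sample tree.
        XElement root = XElement.Parse(
            "<Root><Name><First /><Second /></Name><Address /></Root>");

        // Both queries yield the same XElement instances in the same order.
        Console.WriteLine(ViaMethod(root).SequenceEqual(ViaQuery(root)));   // True
    }
}
```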

If you understand the general workings of Linq then the only new element is using a method, i.e. Elements, that returns an IEnumerable collection rather than an object that implements IEnumerable.

This may appear to be a small difference but it does alter the “flavour” of using Linq ever so slightly.

The point is that the XML tree is quite a complicated data structure and there are lots of different ways that its nodes or attributes could be enumerated. This is the reason why it doesn’t just implement the IEnumerable interface in its own right and why it is preferable to delegate the enumeration to other methods – called, in the Linq to XML jargon, XML Axis methods.

This small difference gives us a lot of power but it can also be confusing because it often provides more than one way of doing things.

For example, most Linq to XML instructors would not demonstrate finding an XElement with a specific name using the Where method. The reason is simply that the Elements method comes with the ability to construct a collection of child nodes that are restricted to a single name.

For example, you can return a collection of elements named “Address” in one simple step:

var q=root.Elements("Address");

No need for Linq proper here as the axis method does the job of picking out the specific objects and returns them as a collection.

Notice, however, that this isn’t returned as a standard collection type. The axis method adheres to the “deferred” execution model of Linq by returning an iterator object, generated by the compiler from XContainer’s internal GetElements method and typed as IEnumerable<XElement>, which is only enumerated when the results are really needed.
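Deferred execution is easiest to see by changing the tree after the query has been defined. In this sketch (the tree, class name and method name are assumptions) a snapshot taken with ToList is fixed at one element, while the query itself picks up the element added later because it only walks the tree when it is enumerated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class DeferredDemo
{
    public static (int Snapshot, int Live) Run()
    {
        // Hypothetical tree with a single Address child.
        var root = new XElement("Root", new XElement("Address"));

        // Defining the query does not enumerate anything yet.
        IEnumerable<XElement> q = root.Elements("Address");

        // ToList forces enumeration now and copies the results.
        List<XElement> snapshot = q.ToList();

        // Change the tree after the query was defined.
        root.Add(new XElement("Address"));

        // Count() enumerates q again, against the tree as it is now.
        return (snapshot.Count, q.Count());
    }

    public static void Main()
    {
        var result = Run();
        Console.WriteLine($"{result.Snapshot} {result.Live}");   // 1 2
    }
}
```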

Another slightly confusing issue that is solved by Axis methods is determining which type of object needs to be returned.

For example:

var q = root.Attributes();

is a query that returns all of the attributes set on the root object. Once you have constructed the query you can step through it in the usual way using a foreach loop.

Most of the Axis methods allow the user to specify some simple filtering conditions that often mean that you don’t need to use a full Linq query at all.

Some Axis methods are so specific that they return a single element.

For example, the FirstNode and LastNode properties return the first and last child node respectively. Similarly, Element("name") returns the first matching child element, or null if there is none, which should be contrasted with Elements("name"), which returns all child elements that match.
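The difference between the single-result members and their sequence counterparts can be sketched as follows. The tree, including the id attributes used to tell the two Address elements apart, is a hypothetical example, as are the class and method names.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class SingleResultDemo
{
    // Hypothetical tree with two Address children so that Element and
    // Elements give visibly different results.
    public static XElement BuildTree() => XElement.Parse(
        "<Root><Name><First /><Second /></Name>" +
        "<Address id='1' /><Address id='2' /></Root>");

    public static void Main()
    {
        XElement root = BuildTree();

        // Element returns only the first matching child.
        Console.WriteLine(root.Element("Address").Attribute("id").Value);   // 1

        // Elements returns every matching child.
        Console.WriteLine(root.Elements("Address").Count());                // 2

        // Element returns null when nothing matches, so check before chaining.
        Console.WriteLine(root.Element("Phone") == null);                   // True

        // FirstNode and LastNode are properties typed as XNode.
        Console.WriteLine(((XElement)root.FirstNode).Name);                 // Name
        Console.WriteLine(((XElement)root.LastNode).Attribute("id").Value); // 2
    }
}
```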

As well as working with sequences of elements that go “down” the tree you can work back up to the topmost level using the “Ancestor” methods. For example:

var q = root.LastNode.Ancestors();

returns a collection of the elements that lie above the last child node, i.e. its parent, its parent’s parent and so on up to the top of the tree. In our example the last node is a direct child of root, so the collection contains just the root element.
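As a small sketch of how Ancestors walks upward, the following starts from an element nested two levels down and lists what it finds on the way to the top. The tree and the class and method names are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class AncestorsDemo
{
    public static List<string> AncestorNames()
    {
        // Hypothetical sample tree.
        XElement root = XElement.Parse(
            "<Root><Name><First /><Second /></Name><Address /></Root>");

        // Start two levels down, at the First element.
        XElement first = root.Element("Name").Element("First");

        // Ancestors yields the parent first, then that element's parent, and so on.
        return first.Ancestors().Select(e => e.Name.LocalName).ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", AncestorNames()));   // Name, Root
    }
}
```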





Now what about querying sub-trees?

This is very easy and almost doesn’t need any thought.

All you have to do is find the node that is the root of the sub-tree and use its Descendants method.

For example:

var q=root.Element("Name").Descendants();

This returns all of the elements below the Name XElement in the tree, i.e. First and Second in our earlier example.

Notice that Descendants is “recursive” in the sense that it returns not just the child elements of the starting node but also the child elements of each of those and so on down the tree. The order in which the elements are returned is described as “document” order, i.e. the order in which the opening tags appear when the XML is listed down a page.
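Document order can be seen by listing the names that Descendants produces. This is a sketch against a hypothetical tree; the class and helper names are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class DescendantsDemo
{
    // Helper that reduces a sequence of elements to their names.
    public static List<string> Names(IEnumerable<XElement> elements) =>
        elements.Select(e => e.Name.LocalName).ToList();

    public static void Main()
    {
        // Hypothetical sample tree.
        XElement root = XElement.Parse(
            "<Root><Name><First /><Second /></Name><Address /></Root>");

        // From the root: each element appears in the order its opening tag
        // occurs, so Name's children come before its sibling Address.
        Console.WriteLine(string.Join(", ", Names(root.Descendants())));
        // Name, First, Second, Address

        // From the Name element: only its sub-tree, and not Name itself.
        Console.WriteLine(string.Join(", ", Names(root.Element("Name").Descendants())));
        // First, Second
    }
}
```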

Notice that if you use a Linq query to return an element you automatically get its “deep” value – i.e. all of the child nodes it contains.

In this sense the query:

var q2 = from E in root.Elements()
  where E.Name == "Name"
   select E;

returns a sub-tree starting at the XElement “Name”. It is slightly different from the previous example because it includes the “Name” node and not just the sub-tree below it.

You can also chain Axis methods just as you can chain standard Linq methods.

For example:

var q=root.Element("Name").Attributes();

finds the first element that matches “Name” and then returns a sequence of its attributes, if any.
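A sketch of this chain in action is given below. The attributes on the Name element are invented for the example, as are the class and method names. Note that Element returns null when there is no match, so chaining directly from it assumes the element exists.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class AttributeChainDemo
{
    public static List<string> NameAttributes(XElement root)
    {
        // Element picks the first Name child; Attributes then enumerates
        // whatever attributes are set on it.
        return root.Element("Name")
                   .Attributes()
                   .Select(a => $"{a.Name}={a.Value}")
                   .ToList();
    }

    public static void Main()
    {
        // Hypothetical tree: the id and lang attributes are illustrative only.
        XElement root = XElement.Parse(
            "<Root><Name id='7' lang='en'><First /></Name><Address /></Root>");

        Console.WriteLine(string.Join(", ", NameAttributes(root)));   // id=7, lang=en
    }
}
```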

Some things are much easier to do with axis methods, which are designed to work with a tree structure, and some are easier using standard Linq queries, which are designed to work with flat collections.

Sometimes a combination of the two works even better. For example consider:

var q = from E in root.Elements()
 where (E.Element("First")!=null)
  select E;

This selects all of the child elements of root that have at least one “First” child element. Again as a deep value is returned, you actually get the subtree below any node that has a “First” child node.
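To see the combined query at work, the sketch below runs it against a tree in which two of the three children have a First child. The Contact element and the class and method names are assumptions added for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class CombinedQueryDemo
{
    // Linq query over the children, using the Element axis method as the test.
    public static IEnumerable<XElement> WithFirstChild(XElement root) =>
        from e in root.Elements()
        where e.Element("First") != null
        select e;

    public static void Main()
    {
        // Hypothetical tree: Name and Contact each contain a First element,
        // Address does not.
        XElement root = XElement.Parse(
            "<Root><Name><First /><Second /></Name>" +
            "<Address /><Contact><First /></Contact></Root>");

        foreach (XElement e in WithFirstChild(root))
        {
            // Each result is the whole sub-tree, not just the matching tag.
            Console.WriteLine(e.Name);   // prints Name, then Contact
        }
    }
}
```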

Once you start to follow the relentless logic of IEnumerable and its Linq methods it becomes almost fun to try and work out the most “interesting” way of obtaining a result.

Not necessarily good programming practice but a good way to master the techniques.









Last Updated ( Tuesday, 20 January 2015 )