Monday 20 October 2008

Express yourself

I thought I'd start by having a look at expressions - in particular System.Linq.Expressions.Expression.

Expression is an important part of the glue that makes LINQ work, but it has been almost completely hidden away by the spectacular job that C# 3.0 does with lambda expressions and LINQ.

For example, consider the following C# 3.0 query:

    var qry = from foo in source
              where foo.Bar == "abc"
              select foo;

Because the compiler translates query syntax into ordinary method calls (dropping the trivial "select foo"), this is identical to the following:

    var qry = source.Where(foo => foo.Bar == "abc"); // Sample A

But what does this mean? Well, first it depends on what our "source" is, or more accurately, whether our Where() method accepts a delegate or an expression tree. For LINQ-to-Objects, Where accepts a delegate (such as Func<T, bool>), so (keeping the extension method syntax) this is identical to:

    var qry = source.Where(delegate(Foo foo) {
        return foo.Bar == "abc";
    });

However, for most other LINQ providers, it is likely that the Where method accepts an expression tree instead (such as Expression<Func<T,bool>>). So what is an expression tree? The short and imprecise answer is that it is an object representation of code - similar in some (very limited) ways to things like CodeDom - but aimed squarely at providing LINQ support: all the operations you might need for everyday queries, plus the ability to point to framework methods for things that can't be expressed purely in expressions.
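To make the distinction concrete, here is a small sketch. It assumes a minimal Foo class with a string Bar property, and LambdaForms is a hypothetical holder class. The compiler turns the same lambda text into either a delegate or an expression tree depending only on the declared type, and the tree can then be inspected as data.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Assumed for illustration: a minimal Foo with a string Bar property.
public class Foo
{
    public string Bar { get; set; }
}

public static class LambdaForms
{
    // The same lambda text, converted by the compiler into a delegate...
    public static readonly Func<Foo, bool> AsDelegate = foo => foo.Bar == "abc";

    // ...or into an expression tree that describes the code as data.
    public static readonly Expression<Func<Foo, bool>> AsTree = foo => foo.Bar == "abc";

    // Foo[] is not IQueryable, so this binds to Enumerable.Where(Func<Foo, bool>).
    public static int CountViaDelegate(Foo[] items)
    {
        return items.Where(AsDelegate).Count();
    }

    // AsQueryable() gives an IQueryable<Foo>, so this binds to
    // Queryable.Where(Expression<Func<Foo, bool>>).
    public static int CountViaTree(Foo[] items)
    {
        return items.AsQueryable().Where(AsTree).Count();
    }
}
```

Because the tree is just objects, you can poke at it: LambdaForms.AsTree.Body.NodeType is ExpressionType.Equal, and AsTree.Parameters[0].Name is "foo".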

Doing It The Hard Way

So let's dissect our example: looking at the delegate version, you can see that the function (the argument to Where) accepts an argument called "foo" (of type Foo), performs member-access (.Bar), and performs an equality test with the string constant "abc". So let's try writing that (as an expression) ourselves:

    using System;
    using System.Linq;
    using System.Linq.Expressions;

    ParameterExpression foo = Expression.Parameter(typeof(Foo), "foo");
    MemberExpression bar = Expression.PropertyOrField(foo, "Bar");
    ConstantExpression abc = Expression.Constant("abc", typeof(string));
    BinaryExpression test = Expression.Equal(bar, abc);

    Expression<Func<Foo, bool>> lambda =
        Expression.Lambda<Func<Foo, bool>>(test, foo);
    var qry = source.Where(lambda);

Yikes! That looks like a lot of work! And it is... but take it step-by-step and it makes sense:

  • We declare a parameter "foo" of type Foo
  • We perform member-access to get foo.Bar
  • We define a string constant "abc"
  • We test the value of .Bar against this constant
  • Finally, we wrap the whole thing up in a lambda; this is just our way of baking the expression into something usable (and note that we need to explicitly re-state the parameter we expect).
[update] Note that the expression is typed using Expression<T>, where T is a delegate-type that reflects what we want. In this case we want to use the lambda as a predicate - i.e. to accept a Foo and return a bool - so we use Expression<Func<Foo,bool>>.
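The delegate type really is checked against the tree. This sketch is illustrative: Foo and TypedLambdas are assumed names. It reuses the hand-built body from above. Expression.Lambda accepts it as a Func<Foo, bool>, but throws an ArgumentException if we claim the lambda returns a string, because the body is a bool.

```csharp
using System;
using System.Linq.Expressions;

// Assumed for illustration: a minimal Foo with a string Bar property.
public class Foo
{
    public string Bar { get; set; }
}

public static class TypedLambdas
{
    // The same hand-built body as above: foo.Bar == "abc", of type bool.
    private static BinaryExpression BuildBody(ParameterExpression foo)
    {
        return Expression.Equal(
            Expression.PropertyOrField(foo, "Bar"),
            Expression.Constant("abc", typeof(string)));
    }

    // Func<Foo, bool> matches: one Foo parameter, bool result.
    public static Expression<Func<Foo, bool>> AsPredicate()
    {
        ParameterExpression foo = Expression.Parameter(typeof(Foo), "foo");
        return Expression.Lambda<Func<Foo, bool>>(BuildBody(foo), foo);
    }

    // Func<Foo, string> does not match the bool body, so Expression.Lambda rejects it.
    public static bool RejectsMismatchedDelegateType()
    {
        ParameterExpression foo = Expression.Parameter(typeof(Foo), "foo");
        try
        {
            Expression.Lambda<Func<Foo, string>>(BuildBody(foo), foo);
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }
}
```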

Doing It The Easy Way

The good news is that in most cases we simply don't need to do any of this: the compiler does all the hard work for us. In fact, it can use a number of tricks that can't be done conveniently in regular C#, such as pointing directly at the MethodInfo for the "get" accessor of .Bar. Here's what Reflector shows for our original "Sample A":

    ParameterExpression CS$0$0000;
    IQueryable<Foo> qry = source.Where<Foo>(Expression.Lambda<Func<Foo, bool>>(
        Expression.Equal(Expression.Property(CS$0$0000 = Expression.Parameter(typeof(Foo), "foo"), (MethodInfo) methodof(Foo.get_Bar)),
            Expression.Constant("abc", typeof(string)), false, (MethodInfo) methodof(string.op_Equality)),
        new ParameterExpression[] { CS$0$0000 }));

OK, that isn't the easiest code to read... but it is mainly the same as our previous example (just with explicit generic type arguments, and an explicit array for the "params" argument). The parameter usage (CS$0$0000) looks tricky, but is still the same - it is just assigned and used within the same statement.

The two big differences are in how ".Bar" and "==" are performed; note the use of "methodof(Foo.get_Bar)" and "methodof(string.op_Equality)". No, this isn't a C# keyword you didn't know about - it is simply Reflector doing the best job it can of displaying something that isn't valid C#, but is valid IL: loading a MethodInfo directly from its metadata token (via ldtoken) instead of looking it up by name through reflection. In our hand-cranked version, the Expression API gets the same answer, but has to jump through a few more hoops first (finding the "Bar" member by name, and searching for a suitable == operator).
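For comparison, here is a hedged sketch of building the tree the way the compiler does. Foo and CompilerStyle are illustrative names. It resolves the getter and operator MethodInfos up front, then passes them to the Expression.Property(Expression, MethodInfo) and Expression.Equal(left, right, liftToNull, method) overloads. The second method shows that the plain two-argument Expression.Equal finds string.op_Equality by itself.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

// Assumed for illustration: a minimal Foo with a string Bar property.
public class Foo
{
    public string Bar { get; set; }
}

public static class CompilerStyle
{
    public static Expression<Func<Foo, bool>> Build()
    {
        // Resolve the members up front, roughly what methodof(...) gives the compiler.
        MethodInfo getBar = typeof(Foo).GetProperty("Bar").GetGetMethod();
        MethodInfo stringEquals = typeof(string).GetMethod(
            "op_Equality", new[] { typeof(string), typeof(string) });

        ParameterExpression foo = Expression.Parameter(typeof(Foo), "foo");
        return Expression.Lambda<Func<Foo, bool>>(
            Expression.Equal(
                Expression.Property(foo, getBar),           // foo.Bar via the getter
                Expression.Constant("abc", typeof(string)),
                false,                                      // liftToNull
                stringEquals),                              // explicit == implementation
            foo);
    }

    // Without an explicit MethodInfo, Expression.Equal searches for the operator itself.
    public static MethodInfo ImplicitEqualityMethod()
    {
        return Expression.Equal(Expression.Constant("x"), Expression.Constant("y")).Method;
    }
}
```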

So Why Bother?

It would be valid, at this point, to ask: "If the compiler can write expressions for me, and do it better, why should I care?". And in most day-to-day coding you would be right: please don't start re-writing your LINQ queries with hand-cranked expressions! In particular, they are pretty much a nightmare to use (manually) with anonymous types. However, they do have some interesting possibilities. For regular LINQ (even LINQ-to-Objects via .AsQueryable()), they can be useful for things like:

  • Dynamic filters / sorts / etc. based on properties that cannot be known at compile-time
  • General expression-tree manipulation: merging trees (at InvocationExpression nodes), re-forming trees, etc.
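As an example of the first point, here is a minimal sketch of a dynamic filter. DynamicFilter.PropertyEquals is a hypothetical helper, and Foo is assumed. It builds x => x.<name> == value for a property name that is only known at runtime, then hands the tree to Queryable.Where.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Assumed for illustration: a minimal Foo with a string Bar property.
public class Foo
{
    public string Bar { get; set; }
}

public static class DynamicFilter
{
    // Hypothetical helper: builds x => x.<propertyName> == value.
    public static Expression<Func<T, bool>> PropertyEquals<T>(string propertyName, object value)
    {
        ParameterExpression x = Expression.Parameter(typeof(T), "x");
        MemberExpression member = Expression.PropertyOrField(x, propertyName);
        // Type the constant as the member's type so both sides of == agree
        // (Expression.Constant throws ArgumentException if value doesn't fit that type).
        ConstantExpression constant = Expression.Constant(value, member.Type);
        return Expression.Lambda<Func<T, bool>>(Expression.Equal(member, constant), x);
    }
}
```

Usage would look like source.AsQueryable().Where(DynamicFilter.PropertyEquals<Foo>("Bar", "abc")), where "Bar" could just as easily come from a UI or a config file.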

But for me, a more commonplace usage is in using expressions as the basis of a micro-compiler: a hidden gem of Expression<T> is its Compile() method, which turns our lambda into a typed delegate, ready for use. And this isn't using reflection when executed: it emits regular IL at runtime, so the resulting delegate can be invoked very efficiently. This is as simple as:

    Func<Foo, bool> predicate = lambda.Compile();
    Foo obj = new Foo { Bar = "abc" };
    bool isAbc = predicate(obj);

Now, obviously there is some cost to this:

  • we need to build an expression tree (with the reflection lookups etc)
  • the library code needs to parse the expression tree (including validation), and emit IL

So this isn't something we'd want to do every time we call the method - but stick the delegate creation in some start-up code (perhaps in a static constructor / type initializer, storing it in a static readonly field) and you have a powerful tool.
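Here is a sketch of that start-up pattern. Foo and FooChecks are assumed names. The expression is built and compiled exactly once, in a static readonly field initializer that runs as part of the type initializer. After that, every call is just a delegate invocation.

```csharp
using System;
using System.Linq.Expressions;

// Assumed for illustration: a minimal Foo with a string Bar property.
public class Foo
{
    public string Bar { get; set; }
}

public static class FooChecks
{
    // Built and compiled once, when the type initializer runs;
    // later calls reuse the cached delegate.
    public static readonly Func<Foo, bool> IsAbc = BuildIsAbc();

    private static Func<Foo, bool> BuildIsAbc()
    {
        ParameterExpression foo = Expression.Parameter(typeof(Foo), "foo");
        BinaryExpression test = Expression.Equal(
            Expression.PropertyOrField(foo, "Bar"),
            Expression.Constant("abc", typeof(string)));
        return Expression.Lambda<Func<Foo, bool>>(test, foo).Compile();
    }
}
```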

So What Can We Do?

So far, I've only given a very small glimpse of the expression capabilities available, but the list is quite comprehensive, with no fewer than 60 different methods. This covers most of the common things you want to do, including:

  • all common operators
  • branching (conditional)
  • object construction
  • collection / array construction
  • member access
  • invocation (to both regular .NET methods and to pre-existing lambda expressions)
  • casting / conversion
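Several of these can be combined in one small tree. This sketch uses Foo and Capabilities as illustrative names. It builds the equivalent of n => new Foo { Bar = ((long)n > 0L) ? "positive" : "non-positive" }, using a conversion, a conditional, and object construction with a member initializer.

```csharp
using System;
using System.Linq.Expressions;

// Assumed for illustration: a minimal Foo with a string Bar property.
public class Foo
{
    public string Bar { get; set; }
}

public static class Capabilities
{
    public static Func<int, Foo> BuildDescriber()
    {
        ParameterExpression n = Expression.Parameter(typeof(int), "n");

        // casting / conversion: (long)n
        UnaryExpression asLong = Expression.Convert(n, typeof(long));

        // branching: test ? ifTrue : ifFalse
        ConditionalExpression label = Expression.Condition(
            Expression.GreaterThan(asLong, Expression.Constant(0L)),
            Expression.Constant("positive"),
            Expression.Constant("non-positive"));

        // object construction with a member initializer: new Foo { Bar = label }
        MemberInitExpression create = Expression.MemberInit(
            Expression.New(typeof(Foo)),
            Expression.Bind(typeof(Foo).GetProperty("Bar"), label));

        return Expression.Lambda<Func<int, Foo>>(create, n).Compile();
    }
}
```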

The biggest thing to note is that expression trees are designed to encapsulate a single functional expression; in particular:

  • it doesn't lend itself to mutating existing instances
  • it doesn't lend itself to writing a series of statements (unless you can chain them via a fluent API or an operator such as +)
  • it should return something (i.e. not Action<T>, except where the whole body is a single void method call)

What I mean is that you can't write the following (either in C# or manually):

    Expression<Action<Foo>> action = foo =>
    {
        foo.Bar = "def";
        Console.WriteLine(foo);
    };

since it fails all 3 conditions.
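The functional alternative is to have the expression return a new object and leave any side-effects (such as the Console.WriteLine) to the caller. This sketch adds an Id property to Foo purely so the copy is meaningful, and FunctionalStyle is an illustrative name. The compiler happily turns this lambda into an expression tree, because it is a single expression that returns a value.

```csharp
using System;
using System.Linq.Expressions;

// Assumed shape: Foo with an extra Id property so the copy is meaningful.
public class Foo
{
    public int Id { get; set; }
    public string Bar { get; set; }
}

public static class FunctionalStyle
{
    // Allowed: a single expression returning a new Foo rather than mutating the input.
    public static readonly Expression<Func<Foo, Foo>> WithDef =
        foo => new Foo { Id = foo.Id, Bar = "def" };
}
```

The caller then does the imperative part itself, e.g. var updated = FunctionalStyle.WithDef.Compile()(original); followed by Console.WriteLine(updated);.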

Example

So far, I've just tried to give a flavor of what you can do with expression trees. I've covered more of this elsewhere, in places that might be interesting:

  • Generic operators [overview | usage] - i.e. +, -, * and / with a generic T; this is used in MiscUtil as part of the Push LINQ code
  • Object cloning [here] - efficient object cloning without the pain
  • LINQ object construction [here]
  • Partial LINQ projections [here]
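The core trick behind generic operators can be sketched in a few lines. This is a simplified illustration under hypothetical names (GenericAdd<T> and Summing), not MiscUtil's actual code. The idea is to build (a, b) => a + b once per T, compile it, and cache it in a static readonly field of a generic class, so each closed type gets its own delegate.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical sketch: one compiled "+" per closed type T.
public static class GenericAdd<T>
{
    public static readonly Func<T, T, T> Add = Build();

    private static Func<T, T, T> Build()
    {
        ParameterExpression a = Expression.Parameter(typeof(T), "a");
        ParameterExpression b = Expression.Parameter(typeof(T), "b");
        // Expression.Add throws InvalidOperationException if T has no + operator,
        // which surfaces as a TypeInitializationException on first use.
        return Expression.Lambda<Func<T, T, T>>(Expression.Add(a, b), a, b).Compile();
    }
}

public static class Summing
{
    // Generic sum over any T that supports +, starting from default(T).
    public static T Sum<T>(params T[] values)
    {
        T total = default(T);
        foreach (T value in values)
        {
            total = GenericAdd<T>.Add(total, value);
        }
        return total;
    }
}
```

Note that string is not a candidate here: C#'s string + compiles to String.Concat rather than an operator method, so Expression.Add has nothing to bind to.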
I hope that has given you a taste for expressions, and a few ideas for when to use them.