Monday, 6 February 2012

Abstractions are only a tool; you don’t need to be

Abstractions abstractions abstractions. IAbstraction and AbstractionFactory.

Ah, joy.

I get very very confused when I see people saying things like “but it must be done via EF” (substitute any other tool for EF), when the thing they are trying to do is clearly not suited to the abstraction.

Abstractions, as tools, are simply a way of reducing the overhead / complexity of doing something specific (managing the relationship between objects and a database in the case of EF). Sometimes they are fairly tight; sometimes they are leaky. But to use any abstraction, it is absolutely critical to be aware of two related things:

  • where the abstraction ends
  • how the underlying “thing” works

I get so weary of people using things like EF in the hope that they then “don’t need to learn SQL”. This is a false and destructive aim. You, as a professional developer, wouldn’t try to write a web-page without knowing at least the fundamentals of html and css? Would you? Really?

In the same way, occasionally a tool simply isn’t aimed at the job you want to do. If you want to avoid pain, you need to recognise this as soon as possible. I’ve seen some horrible horrible ugliness intended to strong-arm an existing tool into doing something that it didn’t want to do. In most cases, simply using a different tool for that piece (yes, you are allowed to use more than one), or writing the code directly, would have been simpler and less buggy, and wouldn’t have taken a great big core-dump all over your shiny code. Code matters; don’t contaminate it.

[end rant]

Thursday, 12 January 2012

Playing with your member

(and: introducing FastMember)

Toying with members. We all do it. Some do it slow, some do it fast.

I am of course talking about the type of flexible member access that you need regularly in data-binding, materialization, and serialization code – and various other utility code.

Background

Here’s standard member access:

Foo obj = GetStaticTypedFoo();
obj.Bar = "abc";

Not very exciting, is it? Traditional static-typed C# is very efficient here when everything is known at compile-time. With C# 4.0, we also get nice support for when the target is not known at compile time:

dynamic obj = GetDynamicFoo();
obj.Bar = "abc";

Looks much the same, eh? But what about when the member name is only known at runtime? What we can’t do is:

dynamic obj = GetStaticTypedFoo();
string propName = "Bar";
obj.propName = "abc"; // does not do what we intended!

So, we find ourselves in the realm of reflection. And as everyone knows, reflection is slooooooooow. Or at least, it is normally; if you don’t object to talking with Cthulhu you can get into the exciting realms of meta-programming with tools like Expression or ILGenerator – but most people like keeping hold of their sanity, so… what to do?
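For reference, the plain reflection route looks something like the sketch below. `Foo` and `Bar` are stand-ins matching the snippets above; `ReflectionDemo`, `SetByName` and `GetByName` are hypothetical names used only for illustration.

```csharp
using System;
using System.Reflection;

public class Foo
{
    public string Bar { get; set; }
}

public static class ReflectionDemo
{
    // Sets a public instance property whose name is only known at runtime.
    public static void SetByName(object target, string propName, object value)
    {
        PropertyInfo prop = target.GetType().GetProperty(propName);
        if (prop == null)
            throw new ArgumentException("No public property named " + propName, "propName");
        prop.SetValue(target, value, null);
    }

    // Reads a public instance property whose name is only known at runtime.
    public static object GetByName(object target, string propName)
    {
        PropertyInfo prop = target.GetType().GetProperty(propName);
        if (prop == null)
            throw new ArgumentException("No public property named " + propName, "propName");
        return prop.GetValue(target, null);
    }
}
```

Calling `ReflectionDemo.SetByName(obj, "Bar", "abc")` does what the `obj.propName` line above was hoping for, but every call pays for the name lookup and the late-bound invoke.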

Middle-ground

A few years ago, I threw together HyperDescriptor; this is a custom implementation of the System.ComponentModel representation of properties, but using some IL instead of reflection – significantly faster. It is a good tool – a worthy tool; but… I just can’t get excited about it now, for various reasons, but perhaps most importantly:

  • the weirdness that is System.ComponentModel is slowly fading away into obscurity
  • it does not really address the DLR

Additionally, I’ve seen a few bug reports since 4.0, and frankly I’m not sure it is quite the right tool now. Fixing it is sometimes a bad thing.

Having written tools like dapper-dot-net and protobuf-net, my joy of meta-programming has grown. Time to start afresh!

FastMember

So with gleaming eyes and a bottle of Chilean to keep the evil out, I whacked together a fresh library; FastMember – available on google-code and nuget. It isn’t very big, or very complex – it simply aims to solve two scenarios:

  • reading and writing properties and fields (known by name at runtime) on a set of homogeneous (i.e. groups of the same type) objects
  • reading and writing properties and fields (known by name at runtime) on an individual object, which might be a DLR object

Here’s some typical usage (EDITED - API changes):

var accessor = TypeAccessor.Create(type);
string propName = ...; // something known only at runtime
while ( /* some loop of data */ ) {
    accessor[obj, propName] = rowValue;
}

or:

// could be static or DLR
var wrapped = ObjectAccessor.Create(obj);
string propName = // something known only at runtime
Console.WriteLine(wrapped[propName]);

Nothing hugely exciting, but it comes up often enough (especially with the DLR aspect) to be worth putting somewhere reusable. It might also serve as a small but complete example for either meta-programming (ILGenerator etc), or manual DLR programming (CallSite etc).
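To give a flavour of the meta-programming side, here is a minimal sketch that builds a setter delegate once with Expression and then reuses it. This is not FastMember's implementation (which, as noted, goes the ILGenerator route); `CompiledAccess` and `BuildSetter` are hypothetical names, and the sketch assumes the target is a class with a public settable property.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

public class Foo
{
    public string Bar { get; set; }
}

public static class CompiledAccess
{
    // Builds: (object target, object value) => ((T)target).Prop = (TProp)value
    // Classes only; a boxed struct would need different handling.
    public static Action<object, object> BuildSetter(Type type, string propName)
    {
        PropertyInfo prop = type.GetProperty(propName);
        if (prop == null)
            throw new ArgumentException("No public property named " + propName, "propName");

        ParameterExpression target = Expression.Parameter(typeof(object), "target");
        ParameterExpression value = Expression.Parameter(typeof(object), "value");

        Expression assign = Expression.Assign(
            Expression.Property(Expression.Convert(target, type), prop),
            Expression.Convert(value, prop.PropertyType));

        // The reflection cost is paid once here; the compiled delegate is then cheap to call.
        return Expression.Lambda<Action<object, object>>(assign, target, value).Compile();
    }
}
```

The point of the design is to do the by-name lookup once per type/member and keep the delegate around for the loop, rather than asking reflection the same question a million times.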

Mary Mary quite contrary, how does your member perform?

So let’s roll some numbers; I’m bundling read and write together here for brevity, but - based on 1M reads and 1M writes of a class with an auto-implemented string property:

Static C#: 14ms
Dynamic C#: 268ms
PropertyInfo: 8879ms
PropertyDescriptor: 12847ms
TypeAccessor.Create: 73ms
ObjectAccessor.Create: 92ms

As you can see, it somewhat stomps on both reflection (PropertyInfo) and System.ComponentModel (PropertyDescriptor), and isn't very far from static-typed C#. Furthermore, both APIs work (as mentioned) with DLR types, which is cute - because frankly they are a pain to talk to manually. It also supports fields (vs. properties) and structs (vs. classes, although only for read operations).

That's all; I had some fun writing it; I hope some folks get some use out of it.

Saturday, 26 November 2011

The autistic elephant in the server-room

A bit of a serious blog-entry this time. Normally I blog about language lemmas, IL intricacies, serialization subtleties, and materialisation mechanisms; but occasionally bigger topics present.

This blog entry is probably more for my purposes than yours. But you might like it anyway. If not, just come back next time.

Woah, lots of words; why should I read this?


Because knowledge never hurts. You might start to understand a colleague better; maybe it might help you recognise autistic traits in a family-member; or maybe you'll just think for a second before "tutting" when you see a child apparently misbehaving in the supermarket - perhaps you don't know what is happening quite as much as you think.

Why am I putting this on my geek blog?


Because, while the reasons still aren't exactly clear [edit: alternative], it tends to present more commonly in the children (mainly sons) of people in geeky professions. Indeed, our profession has a pretty high rate of successful autistic or Asperger's members; some very notable, some doing the day job like everyone else.

More specifically, I care because my eldest son is autistic. Now, autism (or more correctly, ASD) is a pretty large spectrum. In the UK, there is a trend not to bother trying to diagnose more specific groupings (like Asperger's) because it doesn't actually help with treatment: everyone with ASD needs to be assessed as an individual to understand their needs.

Evan is at the (perhaps unfairly named) "high functioning" end of the spectrum, which means he is of pretty normal intelligence and largely independent, but experiences the world in a slightly different way.

If you want to very quickly get a feel for stereotypical ASD at the "high" end, then Sheldon Cooper is perhaps the easiest place to look, with two caveats:

- there is no automatic "brilliance"; normal intelligence (albeit focused) is more likely
- everybody truly is individual; it helps to remember that generally speaking, generalisations don't help

Also, AFAIK The Big Bang Theory has never explicitly labelled Sheldon with ASD or Asperger's.

(boo, someone deleted my Sheldon image... take your pick from here)

Typical observations would include (not exhaustive):

- attention / interest can be intensely focused on a narrow topic, to the exclusion of others
- sensory perception (sound, touch, taste, smell, sight, etc) can be extreme (both hyper- and hypo-); for example, simply the washing/size-label on a t-shirt can cause agitation
- cognitive processing can get focused on a particular detail rather than processing the wider context; in programming terms think "depth first" rather than "breadth first"
- language and interpretation can be very literal (for example, "laughing your head off" could be concerning) and repetitive
- routine and predictability are important (in particular, this reduces the stress of processing unfamiliar scenarios), to the point where an individual can seem inflexible and stubborn
- social interaction tends to be limited (people aren't predictable, and often aren't very interesting - even less so if they aren't talking about your preferred topic) and a little awkward (not least, brutal honesty doesn't always go down well - "that lady smells bad" etc)
- there can be a definite need for "quiet time", or just time to unwind in whatever way works. That might be "stimming", or it might be sitting under a table for half an hour
- recollection (in particular visual memory) can be more acute

Of course, at the other end of ASD are far more debilitating issues, which may mean dedicated life-long care, maybe institutionalisation. It is not always gentle. I can't speak of this from experience, so I won't try.

So... What is the point of this blog entry?


It is a serious issue that affects our industry, perhaps more than most. Also, our industry is well suited to ASD. Computers are predictable; IT has a use for massively-focused detail-oriented people (experts perhaps in a very narrow and specialised field), and people who can quickly spot some really minor and subtle differences.

Perhaps more importantly though: it shouldn't be something we don't talk about. ASD tends to be an invisible and private thing (you can see a wheel-chair or hearing-aid; you can't necessarily see that the person you are talking to is actually massively stressed to breaking point because a road was closed and they took a different route to work, and then the 2nd lift on the right - THE LIFT THEY USE - was out of order).

Ultimately, no matter where on the spectrum someone is, we should not feel anything like embarrassment. I'm not embarrassed by my son - he's awesome! It took us as a family a little time to properly understand him and his needs, but I think we have something that works well now. And his current school is really great (mainstream, but really understand ASD; his last two schools..... not so much).

If you have a child with ASD


Moving from denial through to acceptance can take time. There's only so many birthday parties you can take them to where opening and closing a door for an hour is more interesting than the hired entertainer - eventually you need to accept that they are simply different. Not better or worse; just different.

Yes, I know, it can be hard sometimes. You may feel like a chess grand-master constantly thinking 12 steps ahead to avoid some meltdown later in the day. Remembering to give advance notice of change, and implementing damage limitation when something truly unexpected happens. Talking openly about it can be hard (yet is also very liberating). There are often local groups of other parents with similar experiences that you may find helpful (assuming you can arrange cover - regular babysitters may not work out so well).

Laughing helps. Try reading #youmightbeanautismparentif. Or watching Mary and Max.

Also, don't buy into any of the snake-oil.

If you have ASD


Then you know ASD better than me. I've seen some folks with ASD get annoyed before because it is always "parents of..." doing the talking. Well, the world is your podium: I'd love to hear your thoughts. The "parents of" also have an awful lot to bring to the table, though.

And for Evan


He's doing great. He likes maths and his handwriting is almost as bad as mine. He can probably name a handful of the children in his class, but I'm sure he could tell me how many lights there are in each room in the school (and how many have broken/missing bulbs). We have good days, and we've had some really horrible days - but as we get better, together, at knowing what each of us needs, we get more of the former and fewer of the latter.

Final words




Dr. Sheldon Cooper

Monday, 24 October 2011

Assault by GC

TL;DR;

We had performance spikes, which we eased with some insane use of structs.

History

For a while now, we had been seeing a problem in the Stack Exchange engine where we would see regular and predictable stalls in performance. So much so that our sysadmins blogged about it back here. Annoyingly, we could only see this problem from the outside (i.e. haproxy logging, etc) – and to cut a very long story short, these stalls were due to garbage collection (GC).

We had a problem, you see, with some particularly large and long-lived sets of data (that we use for some crazy-complex performance-related code), that would hobble the server periodically.

Short aside on Server GC

The regular .NET framework has a generational GC – meaning: new objects are allocated in “generation 0” (GEN-0). When it chooses, the system scans GEN-0 and finds (by walking references from the so-called “roots”) which (if any) of the objects are still in use by your application. Most objects have very short lives, and it can very quickly and efficiently reclaim the space from GEN-0; any objects that survived move to GEN-1. GEN-1 works the same (more-or-less), but is swept less often – moving any survivors into GEN-2. GEN-2 is the final lurking place for all your long-lived data – it is swept least often, and is the most expensive to check (especially if it gets big).
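If you want to watch the promotion happen, something like this rough sketch will do; `GenerationDemo` and `Observe` are hypothetical names, and forcing collections like this is for illustration only, not something to do in real code. What you see depends on the runtime and GC mode, so treat the numbers as indicative.

```csharp
using System;

public static class GenerationDemo
{
    // Reports the generation of one small object before and after forced collections.
    public static int[] Observe()
    {
        object survivor = new object();

        int before = GC.GetGeneration(survivor);
        GC.Collect();
        int afterOne = GC.GetGeneration(survivor);
        GC.Collect();
        int afterTwo = GC.GetGeneration(survivor);

        // Keep the object rooted until we have finished looking at it.
        GC.KeepAlive(survivor);
        return new[] { before, afterOne, afterTwo };
    }
}
```

An object that stays reachable typically creeps upwards through the generations with each collection it survives, which is exactly how a big long-lived data set ends up parked in GEN-2.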

Until .NET 4.5 rolls into town with a background server GC, checking GEN-2 is a “stop the world” event – it (usually briefly) pauses your code, and does what it needs to. Now imagine you have a huge set of objects which will never be available to collect (because you will always be using them) all sat in GEN-2. What does that look like? Well, using StackExchange Data Explorer to analyse our haproxy logs, it looks a bit like this:

[image: response times from the haproxy logs, showing regular tall spikes]

I’ve omitted the numbers, as they don’t matter; but interpretation: normally the server is ticking along with nice fast response times, then WHAM! A big spike (which we have correlated with GC) that just hammers the response-times.

So what to do?

We take performance very seriously at Stack Exchange, so as you can imagine this got a bit of attention. The obvious answer of “don’t keep that data”, while a valid suggestion, would have hurt a lot of the overall performance, so we needed to find a way to remove or reduce this while keeping the data.

Our initial efforts focused on removing things like unnecessary copy/extend operations on the data, which helped some, but didn’t really make a step-change. Eventually, we concluded…

Break all the rules

Important: the following is merely a discussion of what has helped us. This is not a magic bullet, and should only be applied to some very limited scenarios after you have profiled and you know what you are doing and why. And it helps if you are just a little bit crazy.

First, a brief overview – imagine you are holding Customer objects in memory for an extended period; you have a lot of them in a List<Customer>, and occasionally add more (with suitable threading protection, etc). Further, you have some pre-calculated subsets of the data (perhaps by region) – so a number of smaller List<Customer>. That is entirely inaccurate, but sets the scene ;p

After exhausting alternatives, what we did was:

  • change Customer from a class to a struct (only within this crazy code)
  • change the main store from a List<Customer> to a Customer[]
  • change the subsets from List<Customer> to List<int>, specifically the offset into the main Customer[]

eek; so what does that do for us?

  • the main data is now a single object (the array) on the "large object heap", rather than lots of small objects
  • by using direct access into an array (rather than a list indexer) you can access the data in-situ, so we are not constantly copying Customer values on the stack
  • the int offsets for subsets are essential to make sure we don't have multiple separate values of each record, but on x64 using int offsets rather than references also means our subsets suddenly take half the memory

Note that for some of the more complex code (applying predicates etc), we also had to switch to by-ref passing, i.e.

void SomethingComplex(ref Customer customer) {...}
...
int custIndex = ...
SomethingComplex(ref customers[custIndex]);

This again is accessing the Customer value in-situ inside the array, rather than copying it.
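Pulling those pieces together, the shape is roughly as in the sketch below. The `Customer` fields, `CustomerStore` and its methods are all made up for illustration (the real data looks nothing like this), but the three moves are the ones listed above: a struct, one big array, and int offsets for the subsets.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record: value-typed fields only, so there are no references inside it.
public struct Customer
{
    public int Id;
    public int Region;
    public decimal Balance;
}

public static class CustomerStore
{
    // Main store: a single array of values rather than lots of small objects.
    public static Customer[] Build(int count)
    {
        var customers = new Customer[count];
        for (int i = 0; i < count; i++)
        {
            customers[i].Id = i;         // writes in-situ; no struct copy
            customers[i].Region = i % 3;
        }
        return customers;
    }

    // Subset: offsets into the main array, not copies of the records.
    public static List<int> IndexRegion(Customer[] customers, int region)
    {
        var offsets = new List<int>();
        for (int i = 0; i < customers.Length; i++)
        {
            if (customers[i].Region == region) offsets.Add(i);
        }
        return offsets;
    }

    // By-ref access works on the element inside the array, not on a copy.
    public static void Credit(ref Customer customer, decimal amount)
    {
        customer.Balance += amount;
    }
}
```

Walking a subset is then `foreach (int i in offsets) CustomerStore.Credit(ref customers[i], 10m);`. Had the main store been a List<Customer>, the indexer would hand back a copy and the update would be lost.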

To finish off the bad-practice sheet, we also had some crazy changes to replace some reference data inside the record with fixed-size buffers inside the value (avoiding an array on the heap per item), and some corresponding unsafe code to query it (including the rarely-used stackalloc), but that code is probably a bit complex to cover properly - essentially: we removed all the objects/references from this data. And after removing the objects, there is nothing for GC to look at.

Impact

It helped! Here’s the “after”, at the same scales:

[image: the same response-time chart after the changes]

As you can see, there are still a few vertical spikes (which still tie into GC), but they are much less dense, and less tall. Basically, the server is no longer tied up in knots. The code is a bit less OOP than we might usually do, but it is a: constrained to this very specific scenario (this is absolutely not a carte-blanche “make everything a struct”), and b: for understood reasons.

Plus, it was pretty interesting (oh dear; I really, really need to get out more). Enjoy.

Disclaimers

  • You need to be really careful messing with this – especially with updates etc, since your records are certainly over-size for atomic read/write, and you don’t want callers seeing torn data.
  • This is not about “struct is faster” – please don’t let that be what you take away; this is a very specific scenario.
  • You need to be really careful to avoid copying fat values on the stack.
  • This code really is well outside of the normal bell-curve. Indeed, it is pretty resonant of XNA-style C# (for games, where GC hurts the frame-rate).

Credit

A lot of input, data-analysis, scheming, etc here is also due to Sam Saffron and Kyle Brandt; any stupid/wrong bits are mine alone.

Friday, 7 October 2011

The trouble with tuples

A little while ago, I mentioned how new tuple handling in protobuf-net meant that likely-looking tuples could now be handled automatically, and even better this meant I could remove my hacky KeyValuePair<,> handling. All was well in the world – after all, what could go wrong?

Answer: Mono.

Actually, I’m not criticizing the Mono implementation really, but simply: Mono has a subtly different implementation of KeyValuePair<,> to Microsoft – nothing huge; simply there exist some set accessors (private ones, visible here). And the library was pretty fussy – if there was a set it wouldn’t be treated as a sure-thing tuple.

This is now fixed in r447, but: if you are using Mono and your dictionaries stopped serializing – then honestly and humbly: sorry about that.

And if you are a library author – watch out: sometimes the simplest most subtle differences between implementations can kill you.

Monday, 15 August 2011

Automatic serialization; what’s in a tuple?

I recently had a lot of reason (read: a serialization snafu) to think about tuples in the context of serialization. Historically, protobuf-net has focused on mutable types, which are convenient to manipulate while processing a protobuf stream (setting fields/properties on a per-field basis). However, if you think about it, tuples have an implicit obvious positional “contract”, so it makes a lot of sense to serialize them automatically. If I write:

var tuple = Tuple.Create(123, "abc");

then it doesn’t take a lot of initiative to think of tuple as an implicit contract with two fields:

  • field 1 is an integer with value 123
  • field 2 is a string with value “abc”

Since it is deeply immutable, at the moment we would need to either abuse reflection to mutate private fields, or write a mutable surrogate type for serialization, with conversion operators, and tell protobuf-net about the surrogate. Wouldn’t it be nice if protobuf-net could make this leap for us?

Well, after a Sunday-night hack it now (in the source code) does.

The rules are:

  • it must not already be marked as an explicit contract
  • only public fields / properties are considered
  • any public fields (spit) must be readonly
  • any public properties must have a get but not a set (on the public API, at least)
  • there must be exactly one interesting constructor, with parameters that are a case-insensitive match for each field/property in some order (i.e. there must be an obvious 1:1 mapping between members and constructor parameter names)

If all of the above conditions are met then it is now capable of behaving as you might hope and expect, deducing the contract and using the chosen constructor to rehydrate the objects. Which is nice! As a few side-benefits:

  • this completely removes the need for the existing KeyValuePairSurrogate<,>, which conveniently meets all of the above requirements
  • it also works for C# anonymous types if we want, since they too have an implicit positional contract (I am not convinced this is significant, but it may have uses)
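As a rough sketch of the member and constructor rules listed earlier (this is not protobuf-net's actual code; `TupleShape` and `FindTupleConstructor` are hypothetical names, and the "not already an explicit contract" check is left out), the detection could look something like:

```csharp
using System;
using System.Linq;
using System.Reflection;

public static class TupleShape
{
    // Returns the constructor to use if 'type' looks like an implicit tuple; otherwise null.
    public static ConstructorInfo FindTupleConstructor(Type type)
    {
        const BindingFlags flags = BindingFlags.Public | BindingFlags.Instance;
        PropertyInfo[] props = type.GetProperties(flags);
        FieldInfo[] fields = type.GetFields(flags);

        // Public fields must be readonly.
        if (fields.Any(f => !f.IsInitOnly)) return null;

        // Public properties need a public get and no public set; indexers rule the type out.
        if (props.Any(p => p.GetIndexParameters().Length != 0
                        || p.GetGetMethod() == null
                        || p.GetSetMethod() != null)) return null;

        string[] members = props.Select(p => p.Name)
            .Concat(fields.Select(f => f.Name)).ToArray();
        if (members.Length == 0) return null;

        ConstructorInfo match = null;
        foreach (ConstructorInfo ctor in type.GetConstructors(flags))
        {
            string[] names = ctor.GetParameters().Select(p => p.Name).ToArray();

            // Each member must pair with exactly one parameter name, ignoring case.
            bool oneToOne = names.Length == members.Length
                && members.All(m => names.Count(
                    n => string.Equals(n, m, StringComparison.OrdinalIgnoreCase)) == 1);
            if (!oneToOne) continue;

            if (match != null) return null; // more than one candidate: not obvious
            match = ctor;
        }
        return match;
    }
}
```

With this, `typeof(Tuple<int, string>)` qualifies (Item1/Item2 pair up with item1/item2), while anything with a public setter is rejected.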

This should make it into the next deploy, once I’m sure there are no obvious failures in my assumptions.

Just one more thing, sir…

While I’m on the subject of serialization (which, to be fair, I often am) – I have now also completed some changes to use RuntimeHelpers.GetHashCode() for reference-tracking (serialization). This lets me construct a reference-preserving hash-based lookup to minimise the cost of checking whether an object has already been seen (and if so, fetch the existing token). Wins all round.
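The idea can be sketched as a dictionary keyed on reference identity. `ReferenceComparer` and `ReferenceTracker` are hypothetical names for illustration rather than protobuf-net's own types; the only framework pieces relied on are RuntimeHelpers.GetHashCode and ReferenceEquals.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Compares by reference identity, ignoring any Equals/GetHashCode overrides.
public sealed class ReferenceComparer : IEqualityComparer<object>
{
    bool IEqualityComparer<object>.Equals(object x, object y)
    {
        return ReferenceEquals(x, y);
    }

    int IEqualityComparer<object>.GetHashCode(object obj)
    {
        return RuntimeHelpers.GetHashCode(obj);
    }
}

public sealed class ReferenceTracker
{
    private readonly Dictionary<object, int> tokens =
        new Dictionary<object, int>(new ReferenceComparer());

    // Returns the existing token for an instance already seen, or assigns the next one.
    public int GetToken(object obj, out bool isNew)
    {
        int token;
        if (tokens.TryGetValue(obj, out token))
        {
            isNew = false;
            return token;
        }
        token = tokens.Count;
        tokens.Add(obj, token);
        isNew = true;
        return token;
    }
}
```

Two distinct strings with the same contents get different tokens here, which is what you want when the aim is to preserve object identity rather than value equality.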

Friday, 12 August 2011

Shifting expectations: non-integer shift operators in C#

In C#, it is quite nice that the shift operators (<< and >>) have had their second argument constrained to int arguments, avoiding the oft-confusing piped C++ usage:

cout << "abc" << "def";

But wait a minute! The above line is actually C#, in an all-C# project. OK, I cheated a little (I always do…) but genuinely! This little nugget comes from a stackoverflow post that really piqued my curiosity. All credit here goes to vcsjones, but indeed the line in the documentation about restricting to int (and the return type, etc) is about declaring the operator – not consuming it – so it is seemingly valid for the C# dynamic implementation to use it quite happily.

In fact, the main cheat I used here was simply hiding the assignment of the result, since that is still required. Here’s the full evil:

using System;
using System.Dynamic;
using System.Linq.Expressions;

class Evil : DynamicObject {
    static void Main() {
        dynamic cout = new Evil();
        var hacketyHackHack =
            cout << "abc" << "def";
    }
    public override bool TryBinaryOperation(
        BinaryOperationBinder binder,
        object arg, out object result) {
        switch(binder.Operation) {
            case ExpressionType.LeftShift:
            case ExpressionType.LeftShiftAssign:
                // or whatever you want to do
                Console.WriteLine(arg);
                result = this;
                return true;
        }
        return base.TryBinaryOperation(
            binder, arg, out result);
    }
}

I've seen more evil things (subverting new() via ContextBoundObject is still my favorite evil), but quite dastardly, IMO!

Additional: grrr! I don't know why blogger hates me so much; here it is on pastie.org.