Monday, 15 March 2010

When is an int[] not an int[]?

I’ve spent my entire train journey trying to get to the bottom of this, so I thought I'd blog it for posterity. In my crazed Reflection.Emit frenzy, my unit tests were failing, with PEVerify complaining about illegal ldlen instructions:

[offset 0x....] Expected single-dimension zero-based array.

If you're doing meta-programming, tools like PEVerify and Reflector are your closest allies, but this one took some head-scratching. I even distilled the problem down to two seemingly identical snippets that each read and discard the length of an array variable initialized to null:

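[Images: two Reflector panes, one showing the local variable declarations and one showing the code that reads each array's length]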

The first pane declares “loc 0” and “loc 2” as local int[] variables; forget about “loc 1”, which is unrelated. The second pane initializes each array variable to a null reference, obtains its length (ldlen pushes this as a “native unsigned int”, which I immediately convert to Int32), and then discards the value.

So why the error? And why one error and not two? PEVerify is, after all, a chatty beast… Either I’ve gone crazy in my code, or somebody is lying to me! Actually, it turns out, both.

Pop quiz: what is the difference between these two Type instances representing a 1-dimension array of int:

Type explicitRank = typeof(int).MakeArrayType(1),
implicitRank = typeof(int).MakeArrayType();

The second is our friend, int[]. The first is something different, though: it is a 1-dimensional array of int, sure enough, but it isn’t a zero-based vector; in CLR terms it is a general array that just happens to have rank 1 (correction due: see comments). D’oh! It goes by the moniker int[*].
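A quick way to see the difference for yourself is a small console sketch like the one below (the class name ArrayTypeDemo is just for illustration). It compares both types against typeof(int[]) and prints their names; note that GetArrayRank() reports 1 for both, so rank alone won't tell them apart.

```csharp
using System;

static class ArrayTypeDemo
{
    static void Main()
    {
        // No-argument overload: the vector type, i.e. int[]
        Type implicitRank = typeof(int).MakeArrayType();
        // Explicit rank overload: a general array of rank 1, i.e. int[*]
        Type explicitRank = typeof(int).MakeArrayType(1);

        Console.WriteLine(implicitRank);                  // System.Int32[]
        Console.WriteLine(explicitRank);                  // System.Int32[*]
        Console.WriteLine(implicitRank == typeof(int[])); // True
        Console.WriteLine(explicitRank == typeof(int[])); // False

        // Both report rank 1, so GetArrayRank() cannot distinguish them
        Console.WriteLine(implicitRank.GetArrayRank());   // 1
        Console.WriteLine(explicitRank.GetArrayRank());   // 1
    }
}
```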

Simply put: you can’t use ldlen on an int[*], only on an int[]. What I don’t yet understand is why PEVerify didn’t complain about the upstream code (where the array was assigned “for real”), which stores an int[] value (from a standard “get” accessor) into an int[*] local variable. Presumably the PEVerify authors didn’t think anyone would be stupid enough to try ;-p
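For comparison, here is a minimal sketch of IL that does keep ldlen happy, using a DynamicMethod rather than a saved assembly (the names LdlenDemo, BuildGetLength and GetLength are hypothetical). The key is declaring the local with MakeArrayType() (no rank argument), so it is a vector. The round-trip through a local only mirrors the shape of the original code. Bear in mind that DynamicMethod bodies are not normally verified, so swapping in MakeArrayType(1) here won't necessarily reproduce the PEVerify error; PEVerify checks assemblies on disk.

```csharp
using System;
using System.Reflection.Emit;

static class LdlenDemo
{
    // Builds the equivalent of: static int GetLength(int[] arr) => arr.Length;
    public static Func<int[], int> BuildGetLength()
    {
        var method = new DynamicMethod("GetLength", typeof(int), new[] { typeof(int[]) });
        ILGenerator il = method.GetILGenerator();

        // Vector local (int[]). Declaring it as typeof(int).MakeArrayType(1)
        // would make it int[*], which ldlen does not accept.
        LocalBuilder local = il.DeclareLocal(typeof(int).MakeArrayType());

        il.Emit(OpCodes.Ldarg_0);        // load the int[] argument
        il.Emit(OpCodes.Stloc, local);   // store it in the vector local
        il.Emit(OpCodes.Ldloc, local);   // load it back
        il.Emit(OpCodes.Ldlen);          // length as native unsigned int
        il.Emit(OpCodes.Conv_I4);        // convert to Int32
        il.Emit(OpCodes.Ret);

        return (Func<int[], int>)method.CreateDelegate(typeof(Func<int[], int>));
    }

    static void Main()
    {
        Func<int[], int> getLength = BuildGetLength();
        Console.WriteLine(getLength(new int[5])); // 5
    }
}
```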

The moral here: sometimes it pays to be less explicit (and I don’t just mean the language I used when I found the problem). I’ve also left feedback with Red Gate to tweak how Reflector displays int[*], but to be honest the number of people this cosmetic glitch will affect is minimal.

6 comments:

Jon Skeet said...

In CLR terms, int[*] is an array, whereas int[] is a vector. Somewhat confusing, given that we'd normally talk about int[] as an array...

Note that in C#, an int[] *always* means a vector: if you try to cast a single-dimensional array with a non-zero base to int[], it will fail. Also note that int[*] doesn't implement IList<int>, unlike int[]. (I can't remember about IEnumerable<T> offhand.)
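Jon's points about casting and IList&lt;int&gt; can be checked with a small sketch like this one (the class name NonZeroBasedDemo is just for illustration). It uses Array.CreateInstance with a lower bound of 1 to get an actual int[*] instance.

```csharp
using System;
using System.Collections.Generic;

static class NonZeroBasedDemo
{
    static void Main()
    {
        // Rank-1 array of int: 3 elements, lower bound 1
        Array oneBased = Array.CreateInstance(typeof(int), new[] { 3 }, new[] { 1 });

        Console.WriteLine(oneBased.GetType());                                 // System.Int32[*]
        Console.WriteLine(oneBased.GetType() == typeof(int).MakeArrayType(1)); // True
        Console.WriteLine(oneBased is int[]);                                  // False
        Console.WriteLine(oneBased is IList<int>);                             // False

        // Casting the non-zero-based array to int[] fails at runtime
        try
        {
            int[] vector = (int[])oneBased;
            Console.WriteLine(vector.Length); // not reached
        }
        catch (InvalidCastException)
        {
            Console.WriteLine("InvalidCastException");
        }
    }
}
```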

Marc Gravell said...

Thanks for the extra context Jon - I've updated the body slightly to account for this.

charlie said...

Very interesting stuff. I'm working on a hobby compiler project with MSIL as the target language, so I've been learning more about it than I ever thought I would need to know (which is great).

Anyway I'm sure I'm just being dense, but I think I missed the punch line of the post. Was the problem that numArray and numArray2 are not actually both int[] and that Reflector was obscuring this fact? That seems to be what you're getting at, but I'm not positive.

Marc Gravell said...

@charlie - I guess the "punchline" is the double-whammy of Reflector showing int[] and int[*] the same, *coupled* with the fact that the Type.MakeArrayType(...) overloads behave so critically differently.

Really, I'm just pain-dumping here. It bit me, and Google didn't answer why "ldlen" and "Expected single-dimension zero-based array" might fight. Hopefully now it will.

Rob Smallshire said...

I'm also working on a hobby compiler, OWL BASIC, and just ran into this exact problem with the MakeArrayType overloads violating the principle of least surprise. I'm back on track now, thanks to your article.

Marc Gravell said...

Thanks Rob - I'm now very happy that I took the time to write up my findings.