A few years back I read Don Box’s Essential .NET, Volume 1: The Common Language Runtime, but at the time I didn’t have the grounding needed to fully grasp all the concepts and explanations. I recently re-read the book after acquiring a more solid understanding of the design and architecture of .Net from authors like Andrew Troelsen, Juval Lowy and Jeffrey Richter, whose work I drew on for a course on .Net fundamentals that I taught to a group of about 30 developers at Disney.
I have to say that Don’s book, written with the help of Chris Sells, stands alone in the depth to which it goes to explain precisely how the CLR works under the covers. Not only does Don drop down from C# to CIL (Common Intermediate Language), but he flies right past it down to the “bare metal” native assembly code generated by the Jitter (Just-in-Time Compiler)! Be warned, however, that Don’s book is only for those already comfortable with the fundamentals of C# and the .Net Framework. I would also recommend picking up a good book on CIL, such as Expert .NET 2.0 IL Assembler by Serge Lidin. But once you’re up to speed on the basics, Don’s book will help you not only understand how the CLR works, but more importantly, why it behaves the way it does.
At the risk of oversimplifying a complex topic, I’m going to take a stab at explaining here the concept of the 8-byte object header that every instance of a reference type carries and how it enables type casting and virtual method dispatch. I realize there’s no way I can do justice to the topic, but I hope to whet your appetite enough that you’ll run out and get Don’s book to see how it really works. In a future entry, I’ll show you how value types get by without the object header and what they can’t do as a result of not having it.
As I explained in an earlier blog entry, types in .Net basically fall into two categories: reference types and value types. Instances of reference types are allocated on the managed heap, where they can be garbage collected. Value types are allocated inline: on the stack when they are local variables, in which case their memory is reclaimed when the value falls out of scope, or inside the object that contains them when they are fields. Every instance of a reference type comes with an Object Header made up of two fields that are 4 bytes each (on a 32-bit system). The first field is called the Sync Block Index and is used for things like thread synchronization. The second field is called the Type Handle and is a pointer to a data structure which Don refers to as CORINFO_CLASS_STRUCT (I’ll just call it CORINFO for short).
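To make the layout concrete, here is a rough sketch in C++ of the header described above. It is a toy model, not the CLR’s actual declarations: the names ObjectHeader, PointObject and instance_size_with_header are my own inventions, and I use fixed 32-bit integers so the sizes match the 32-bit case regardless of the machine you compile on.

```cpp
#include <cstddef>
#include <cstdint>

// Toy model of the two header fields, sized for a 32-bit process.
// Field names are illustrative only.
struct ObjectHeader {
    std::uint32_t sync_block_index;  // 0 here means "no sync block assigned"
    std::uint32_t type_handle;       // stands in for the 32-bit pointer to CORINFO
};

static_assert(sizeof(ObjectHeader) == 8,
              "two 4-byte fields make an 8-byte header");
static_assert(offsetof(ObjectHeader, type_handle) == 4,
              "the type handle follows the sync block index");

// A hypothetical heap object with a single int field:
// the header comes first, the instance data follows it.
struct PointObject {
    ObjectHeader header;
    std::int32_t x;
};

// Bytes occupied by an instance whose fields take field_bytes bytes.
constexpr std::size_t instance_size_with_header(std::size_t field_bytes) {
    return sizeof(ObjectHeader) + field_bytes;
}
```

The point of the sketch is simply that every heap object pays the 8 bytes before any of its own fields are counted, so an object holding a single 4-byte int occupies 12 bytes in this model.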
CORINFO contains class-specific data (such as static fields), a Method Table, and a pointer to the type’s metadata, as well as a few other odds and ends. CORINFO does not itself reside on the garbage collected heap but is located in the CLR’s private memory space.
The first purpose of CORINFO (and the reason why each instance of a reference type has a pointer to it) is to support type casting. One of the items in CORINFO is a pointer to the CORINFO of the type’s base class. The CLR can test whether an object can be downcast (from a base to a derived class) simply by starting at the CORINFO of the object’s actual type and traversing the linked list of base-class CORINFO structures until it either finds the target type or runs out of base classes. CORINFO also supports sidecasting to an interface implemented by the type because it contains a pointer to an Interface Table.
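The cast check is easy to sketch. The C++ below is a simplified model under my own assumptions: TypeInfo stands in for CORINFO, the interfaces vector stands in for the Interface Table, and the sample types (kAnimal, kDog, kDisposable) are hypothetical. The real CLR is more optimized than a plain loop, so treat this as an illustration of the idea only.

```cpp
#include <vector>

// Stand-in for CORINFO: just enough to model casting.
struct TypeInfo {
    const char* name;
    const TypeInfo* base;                     // nullptr for the root type
    std::vector<const TypeInfo*> interfaces;  // toy interface table
};

// Stand-in for a heap object: only the type handle matters here.
struct Object {
    const TypeInfo* type_handle;
};

// Downcast check: walk from the object's actual type up the base chain.
bool is_instance_of(const Object& obj, const TypeInfo& target) {
    for (const TypeInfo* t = obj.type_handle; t != nullptr; t = t->base) {
        if (t == &target) return true;
    }
    return false;
}

// Sidecast check: look for the interface in each type's interface table.
bool implements(const Object& obj, const TypeInfo& iface) {
    for (const TypeInfo* t = obj.type_handle; t != nullptr; t = t->base) {
        for (const TypeInfo* i : t->interfaces) {
            if (i == &iface) return true;
        }
    }
    return false;
}

// Hypothetical sample hierarchy: Dog : Animal : Object, Dog implements IDisposable.
const TypeInfo kObject{"Object", nullptr, {}};
const TypeInfo kDisposable{"IDisposable", nullptr, {}};
const TypeInfo kAnimal{"Animal", &kObject, {}};
const TypeInfo kDog{"Dog", &kAnimal, {&kDisposable}};
```

With these definitions, an object whose type handle points at kDog passes the check against kAnimal, while an object whose type handle points at kAnimal fails the check against kDog, which mirrors a cast that succeeds versus one that would throw InvalidCastException in C#.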
The second purpose of CORINFO is to support virtual method dispatch. It does this by placing virtual methods at the top of the Method Table, together with a flag indicating whether the derived type is overriding the virtual method from the base class (as well as whether the method is abstract or sealed). Because CORINFO structures for derived and base types are linked together, it’s easy for the CLR to know whether to execute the base or derived class’s version of the method.
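Here is a small C++ sketch of slot-based dispatch. Note the assumption I am making: rather than modeling the override flags, I use the conventional vtable technique in which a derived type’s Method Table starts as a copy of the base type’s slots and replaces the entries it overrides. All names (TypeInfo, call_virtual, kSpeakSlot and the Animal/Dog methods) are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Object;
using Method = std::string (*)(const Object&);

// Stand-in for CORINFO: a base link plus a method table.
struct TypeInfo {
    const TypeInfo* base;
    std::vector<Method> method_table;  // virtual slots come first
};

struct Object {
    const TypeInfo* type_handle;
};

std::string animal_speak(const Object&) { return "..."; }
std::string animal_name(const Object&)  { return "animal"; }
std::string dog_speak(const Object&)    { return "Woof"; }

// Slot numbers are fixed by the base type and shared by every derived type.
constexpr std::size_t kSpeakSlot = 0;
constexpr std::size_t kNameSlot  = 1;

const TypeInfo kAnimal{nullptr, {animal_speak, animal_name}};
// Dog overrides slot 0 and inherits slot 1 unchanged.
const TypeInfo kDog{&kAnimal, {dog_speak, animal_name}};

// Virtual call: follow the type handle, index the table, invoke the slot.
std::string call_virtual(const Object& obj, std::size_t slot) {
    return obj.type_handle->method_table[slot](obj);
}
```

The caller never needs to know the object’s concrete type. It only needs the slot number, and the type handle in the object header takes care of picking the base or derived implementation.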
Third, CORINFO supports obtaining an object’s type by calling the GetType method of the object, which all types inherit from System.Object. As mentioned earlier, CORINFO contains a pointer to the type’s metadata, which can be used to create a new instance of System.Type containing the metadata of the type in question. That metadata is also stored in the CLR’s private memory space (in a structure called EEClass), but it is separate from CORINFO, which is optimized for the fastest performance possible.
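The path from an object to its type information can be sketched the same way. In the C++ below, Metadata is a stand-in for the separately stored metadata, TypeDescription is a stand-in for System.Type, and get_type plays the role of GetType. All of these names and the sample type Zoo.Dog are made up for illustration and say nothing about how the CLR actually lays out its structures.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for the type's metadata, kept apart from the fast-path structure.
struct Metadata {
    std::string full_name;
    std::vector<std::string> method_names;
};

// Stand-in for CORINFO: it only holds a pointer to the metadata.
struct TypeInfo {
    const Metadata* metadata;
};

struct Object {
    const TypeInfo* type_handle;
};

// Stand-in for System.Type: a fresh description built from the metadata.
struct TypeDescription {
    std::string full_name;
    std::size_t method_count;
};

// Object -> type handle -> metadata -> new description.
TypeDescription get_type(const Object& obj) {
    const Metadata& md = *obj.type_handle->metadata;
    return TypeDescription{md.full_name, md.method_names.size()};
}

// Hypothetical sample type.
const Metadata kDogMetadata{"Zoo.Dog", {"Speak", "ToString", "GetType"}};
const TypeInfo kDogType{&kDogMetadata};
```

The two hops are the interesting part: the object itself stores nothing about its type except the handle, and the handle’s target stores nothing descriptive except the pointer to the metadata.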
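Lastly, CORINFO can also have a header (called GCDesc) which the garbage collector uses when tracing objects: it describes which fields of an instance hold references to other objects, so the collector knows where to look when it follows references from one object to the next. Because CORINFO is shared by every instance of a type, it cannot record whether one particular object is still alive or has been pinned; the garbage collector tracks that state per object.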
So there you go, a crash course on the mechanism used by the CLR to manage some of the characteristics of reference types. Next, I’ll take a look at the effect on value types of not having the object header with a type handle pointing to CORINFO. Stay tuned … same bat time, same bat channel.