Last time I told you about how the CLR treats reference types, and I gave a brief description of how each instance of a reference type on the garbage-collected heap carries an 8-byte object header, which includes a type handle pointing to an undocumented, opaque data structure called CORINFO_CLASS_STRUCT. The main purpose of the type handle is to enable fast type casting and virtual method dispatch. This time, I’m going to take a close look at how the CLR treats value types and how they’re designed to be as lightweight as possible by sacrificing some of the features sported by reference types.
First I’d like to recap the main differences between value types and reference types. Value types, as you may recall, are stored inline wherever they are declared: a local variable of a value type typically resides on the call stack of the currently executing thread, while a value-type field lives inside the object that contains it. They start out life with all their bits set to zero. Examples of value types include bools, numeric types such as ints and doubles, and other types such as DateTime and Guid. Structs and enums are user-defined value types. Each value type implicitly descends from System.ValueType and is implicitly sealed, so you cannot derive another type from it. Unlike a reference type, a value type cannot be set to null. Assigning one variable to another copies the value over, and a local’s memory is reclaimed when the method containing it finishes executing. Value types still inherit from System.Object (by way of System.ValueType), they can override virtual methods from System.Object, and they can implement interfaces.
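To make the copy-on-assignment and zero-initialization points concrete, here is a minimal sketch. The types PointValue and PointRef and the CopyDemo class are hypothetical names made up for illustration; they are not part of the framework.

```csharp
using System;

// Hypothetical types for illustration only.
struct PointValue { public int X; public int Y; }   // value type
class PointRef { public int X; public int Y; }      // reference type

static class CopyDemo
{
    // Assigning a struct copies its bits, so changing the copy
    // leaves the original untouched.
    public static int MutateValueCopy()
    {
        PointValue a = new PointValue { X = 1, Y = 2 };
        PointValue b = a;   // b is an independent copy of a
        b.X = 99;
        return a.X;         // still 1
    }

    // Assigning a class instance copies only the reference, so both
    // variables see the same object on the heap.
    public static int MutateReferenceCopy()
    {
        PointRef a = new PointRef { X = 1, Y = 2 };
        PointRef b = a;     // b refers to the same object as a
        b.X = 99;
        return a.X;         // now 99
    }

    // A value type starts out with all of its bits set to zero.
    public static int DefaultX()
    {
        return default(PointValue).X;
    }

    static void Main()
    {
        Console.WriteLine(MutateValueCopy());      // 1
        Console.WriteLine(MutateReferenceCopy());  // 99
        Console.WriteLine(DefaultX());             // 0
    }
}
```

The only difference between the first two methods is the struct versus class keyword on the type being copied, which is what decides whether the assignment duplicates the data or just the reference.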
I’ve spent quite a bit of time lately pondering how the CLR treats value types as opposed to reference types and specifically how they can do what they do without the benefit of carrying an object header.
The bottom line is that the CLR creates a CORINFO_CLASS_STRUCT for value types the same way it creates one for reference types, that is, by scanning the type metadata contained in the type’s assembly. CORINFO contains a method table and also a table of interfaces supported by the type. The difference is that value types don’t need to carry a type handle pointing to the CORINFO structure, because the CLR does not have to quickly traverse a linked list of CORINFO structures in order to perform downcasting. The reason is that value types are sealed, which means their CORINFO structure is only one level deep.
Some of you may exclaim, “Wait a minute! I thought you said value types inherit from System.Object. Wouldn’t the CORINFO for a value type at least be linked to the CORINFO for System.Object?” Well, it turns out that, because each and every object in the .NET universe descends either directly or indirectly from System.Object, the CLR already factors that into the layout of each type.
The primary reason why reference types need an object header with a type handle pointing to a chain of CORINFO structures, one for each type in the class hierarchy, is to support polymorphism. For example, let’s say type Dog descends from Animal. When you declare a variable of type Animal but assign it a Dog instance, the only way for the CLR to figure out the Animal variable contains a Dog (as opposed to a Cat) is to use the type handle attached to the instance on the heap in order to traverse the linked list of CORINFO structures to see if the base class for Dog happens to be an Animal.
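The Dog and Animal scenario can be sketched in a few lines. The Speak method and the PolymorphismDemo class are assumptions added for illustration. The sketch shows only the behavior you can observe from C# (a runtime type check and a virtual call), not how the CLR implements either one internally.

```csharp
using System;

class Animal
{
    public virtual string Speak() { return "..."; }
}

class Dog : Animal
{
    public override string Speak() { return "Woof"; }
}

class Cat : Animal
{
    public override string Speak() { return "Meow"; }
}

static class PolymorphismDemo
{
    // The static type of the parameter is Animal, but both the 'is'
    // checks and the virtual call are resolved against the runtime
    // type of the instance on the heap.
    public static string Describe(Animal a)
    {
        string kind = a is Dog ? "Dog" : a is Cat ? "Cat" : "Animal";
        return kind + " says " + a.Speak();
    }

    static void Main()
    {
        Animal pet = new Dog();                      // declared Animal, holds a Dog
        Console.WriteLine(Describe(pet));            // Dog says Woof
        Console.WriteLine(Describe(new Cat()));      // Cat says Meow
        Console.WriteLine(Describe(new Animal()));   // Animal says ...
    }
}
```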
Some of you may yet exclaim, “Wait a minute! Can’t a value type override one of the virtual methods of System.Object, like Equals, GetHashCode and ToString? Doesn’t that require virtual method dispatch?” Well, it turns out that, because every type descends from System.Object, the virtual methods of System.Object, including Equals, GetHashCode and ToString, occupy the first slots in the method table of every type. And because a value type is sealed, the exact type of a value is always known at compile time. That means the CLR can invoke those methods non-virtually: it can call the value type’s own implementation directly if the type overrides the method, or fall back to the implementation inherited from System.ValueType or System.Object if it does not (which may require boxing the value first).
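Here is a small sketch of the two cases: a struct that overrides ToString and one that does not. The types Celsius and Plain and the OverrideDemo class are hypothetical names used only for illustration.

```csharp
using System;

// Hypothetical value type that overrides a virtual method of System.Object.
struct Celsius
{
    public int Degrees;

    public override string ToString()
    {
        return Degrees + " C";
    }
}

// Hypothetical value type that overrides nothing.
struct Plain
{
    public int Value;
}

static class OverrideDemo
{
    // Calls the struct's own ToString. The compiler knows the exact
    // type here because structs are sealed.
    public static string Overridden()
    {
        Celsius c = new Celsius { Degrees = 21 };
        return c.ToString();
    }

    // Falls back to the implementation inherited from System.ValueType,
    // which reports the name of the type.
    public static string Inherited()
    {
        Plain p = new Plain { Value = 5 };
        return p.ToString();
    }

    static void Main()
    {
        Console.WriteLine(Overridden());  // 21 C
        Console.WriteLine(Inherited());   // the type's name
    }
}
```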
Some of you may exclaim again, “Wait a minute! What if I have a struct that implements an interface? If I cast my struct, which is a value type, to the interface, won’t the CLR need the type handle to track down the relation?” Well, it also turns out that casting from a value type to an interface (which is called sidecasting) results in a boxing operation, which takes the value and places it in an object on the heap, where it gets an object header with a type handle. The same thing happens when you invoke GetType on a value type instance, because CORINFO contains a link to the type’s metadata, which you can use at runtime via reflection.
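You can observe the boxing from C# because the box holds a copy of the value. In the sketch below, ICounter, Counter and BoxingDemo are hypothetical names invented for illustration.

```csharp
using System;

// Hypothetical interface and struct for illustration only.
interface ICounter
{
    void Increment();
    int Count { get; }
}

struct Counter : ICounter
{
    private int count;

    public void Increment() { count++; }
    public int Count { get { return count; } }
}

static class BoxingDemo
{
    // Casting the struct to its interface boxes a copy of it, so
    // incrementing through the interface does not touch the original.
    public static int OriginalCountAfterBoxedIncrement()
    {
        Counter c = new Counter();
        ICounter boxed = c;     // boxing: the value is copied to the heap
        boxed.Increment();
        return c.Count;         // still 0
    }

    // The boxed copy keeps its own state.
    public static int BoxedCountAfterIncrement()
    {
        Counter c = new Counter();
        ICounter boxed = c;
        boxed.Increment();
        return boxed.Count;     // 1
    }

    // Once boxed, the value can report its exact type at runtime.
    public static bool BoxedKnowsItsType()
    {
        object o = new Counter();   // boxing
        return o.GetType() == typeof(Counter);
    }

    static void Main()
    {
        Console.WriteLine(OriginalCountAfterBoxedIncrement());  // 0
        Console.WriteLine(BoxedCountAfterIncrement());          // 1
        Console.WriteLine(BoxedKnowsItsType());                 // True
    }
}
```

This is also why mutable structs behind interfaces are a common source of bugs: a change made through the interface lands on the boxed copy, not on the variable you started with.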
So there you go. I realize my cursory overview does not do justice to the subtleties and nuances of the discussion and may leave you with more questions than answers. So I highly recommend you read the parts of Don Box’s CLR book dealing with this topic to get a clear idea about what’s going on with CORINFO and how the CLR uses it. Of course, feel free as always to post your questions here, and I’ll be glad to answer them!