
Unbox IL instruction question

Author
26 Jul 2006 3:16 PM
Ben R.
Hi,

I'm referring to:
http://msdn2.microsoft.com/en-us/library/system.reflection.emit.opcodes.unbox.aspx

There's a lot of stuff here that doesn't make sense to me. This page states:

A value type has two separate representations within the Common Language
Infrastructure (CLI):

A 'raw' form used when a value type is embedded within another object.

A 'boxed' form, where the data in the value type is wrapped (boxed) into an
object so it can exist as an independent entity.

The unbox instruction converts the object reference (type O), the boxed
representation of a value type, to a value type pointer (a managed pointer,
type &), its unboxed form. The supplied value type (valType) is a metadata
token indicating the type of value type contained within the boxed object.

Unlike Box, which is required to make a copy of a value type for use in the
object, unbox is not required to copy the value type from the object.
Typically it simply computes the address of the value type that is already
present inside of the boxed object.

________________

Now a few things: It mentions an "object reference" and a "value type
pointer." What is the difference here? Aren't they both pointers? An object
reference is an address where an object lives, correct? A value type pointer
is what then? A pointer? A pointer is also an address where something
(perhaps an object) lives, right? What do type O and type & refer to?

Another question:
"unbox is not required to copy the value type from the object. Typically it
simply computes the address of the value type that is already present inside
of the boxed object."
It seems then that unbox which is supposed to return a value type is simply
returning a pointer to what is inside the box. I thought value types were
very simple: they contain only the actual data you're using. If you push the
integer 6 on the stack, you don't deal with addresses at all, you simply put
a 6 on the stack, right? So why is this unbox instruction able to avoid
copying from the box and just return a pointer to what's in the box? Isn't
that the exact same situation we were in when it was a boxed object?

Thanks...

-Ben

Author
26 Jul 2006 8:10 PM
Ben R.
Hi all, I moved this to

microsoft.public.dotnet.framework.clr

as I thought it fit better there...

-Ben

Author
26 Jul 2006 10:29 PM
Barry Kelly
Ben R. <benr@newsgroup.nospam> wrote:

> http://msdn2.microsoft.com/en-us/library/system.reflection.emit.opcodes.unbox.aspx

> Now a few things: It mentions an "object reference" and a "value type
> pointer."

An object, traditionally, looks somewhat like this:

  <vtable pointer>
  <object data>
  <object data...>

The equivalent in the CLR is:

  <MethodTable pointer>
  <object data>
  <object data...>

So, if you have a boxed int, you've got this:

  <System.Int32 MethodTable pointer>
  <int32 m_value>

(You can view this 'm_value' field if you use reflection to look into a
boxed Int32.)

So, to 'unbox' a boxed instance, all the CLR needs to do is advance the
pointer past the MethodTable pointer. A boxed instance goes from this
(the arrow shows where the pointer points):

-> <System.Int32 MethodTable pointer>
   <int32 m_value>

... to this:

   <System.Int32 MethodTable pointer>
-> <int32 m_value>

et voilà - a managed pointer to the value!
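
For anyone who wants to poke at the 'm_value' field mentioned above, here
is a minimal sketch using reflection. It assumes the runtime's System.Int32
really does keep its payload in a private instance field called "m_value";
that is an implementation detail, not a documented API, so the code falls
back to an ordinary cast if the field is not there. The class and method
names (BoxedInt32Peek, FindValueField, ReadPayload) are made up for the
illustration.

```csharp
using System;
using System.Reflection;

public static class BoxedInt32Peek
{
    // Looks for the private instance field "m_value" on System.Int32.
    // Returns null if this runtime names or lays out Int32 differently.
    public static FieldInfo FindValueField()
    {
        return typeof(int).GetField(
            "m_value", BindingFlags.Instance | BindingFlags.NonPublic);
    }

    // Reads the payload of a boxed int through reflection.
    // Falls back to a normal unboxing cast if the field was not found.
    public static int ReadPayload(object boxed)
    {
        FieldInfo field = FindValueField();
        return field != null ? (int)field.GetValue(boxed) : (int)boxed;
    }

    public static void Main()
    {
        object boxed = 6;
        FieldInfo field = FindValueField();
        Console.WriteLine(field == null ? "(no m_value field found)" : field.ToString());
        Console.WriteLine(ReadPayload(boxed));
    }
}
```

Reflection only shows that the field exists and what it holds; it says
nothing about the MethodTable pointer in front of it, which is not visible
from managed code this way.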

> What to type O and type & refer to?

Type O is an object reference: it is either null or refers to a whole,
fully-fledged object on the heap. A managed pointer, denoted by '&', may
point to any typed location: a local, an argument, a field inside an
object, or an array element. For example, 'ref' and 'out' parameters are
implemented using '&', in which case the pointers might point to locals
on the stack.
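
A small sketch of the difference, seen from C#. The 'ref int' parameter
below is a managed pointer, and the same method accepts a pointer to a
local, to a field in the middle of a heap object, or to an array element,
whereas an object reference can only ever refer to the whole object. All
the names here (ManagedPointerDemo, Holder, AddOne) are invented for the
example.

```csharp
using System;

public static class ManagedPointerDemo
{
    public sealed class Holder { public int Field; }

    // 'ref int' is a managed pointer parameter (int32& in IL).
    public static void AddOne(ref int target) { target += 1; }

    public static int LocalAfterAddOne(int start)
    {
        int local = start;       // lives in this method's frame
        AddOne(ref local);       // managed pointer to a local
        return local;
    }

    public static int FieldAfterAddOne(int start)
    {
        Holder h = new Holder(); // h is an object reference (type O)
        h.Field = start;
        AddOne(ref h.Field);     // managed pointer into the heap object
        return h.Field;
    }

    public static int ElementAfterAddOne(int start)
    {
        int[] items = new int[] { start };
        AddOne(ref items[0]);    // managed pointer to an array element
        return items[0];
    }

    public static void Main()
    {
        Console.WriteLine(LocalAfterAddOne(1));
        Console.WriteLine(FieldAfterAddOne(10));
        Console.WriteLine(ElementAfterAddOne(100));
    }
}
```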

> Another question:
> "unbox is not required to copy the value type from the object. Typically it
> simply computes the address of the value type that is already present inside
> of the boxed object."

Hopefully this is clear from the adjustment above - typically, the CLR
simply needs to add IntPtr.Size to the pointer (4 on 32-bit, 8 on
64-bit).

> It seems then that unbox which is supposed to return a value type is simply
> returning a pointer to what is inside the box.

It loads a managed pointer onto the stack. You can then use ldobj to
dereference the managed pointer and load a copy of the value itself
onto the stack. There is an instruction in 2.0, unbox.any, which
combines the two steps; applied to a reference type it behaves like
castclass instead, which is pretty essential for generics to work on
both value types and reference types.
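
To make the unbox.any case concrete, here is a rough sketch. As far as I
know the C# compiler emits unbox.any for a cast from object to an
unconstrained type parameter, which is why the single Cast<T> method below
can be used with both a value type and a reference type. The second method
shows that an ordinary unboxing cast yields a copy, so changing the copy
leaves the box alone. The names (UnboxAnyDemo, Point, Cast) are
hypothetical.

```csharp
using System;

public static class UnboxAnyDemo
{
    public struct Point { public int X; }

    // One generic cast that serves value types and reference types alike.
    public static T Cast<T>(object o) { return (T)o; }

    // Unboxes into a local, changes the local, then reads the box again.
    public static int BoxAfterMutatingCopy()
    {
        Point p = new Point();
        p.X = 1;
        object box = p;            // boxing copies p into a heap object
        Point copy = (Point)box;   // unboxing cast: the value is copied out
        copy.X = 99;               // only the local copy changes
        return ((Point)box).X;     // the box still holds the old value
    }

    public static void Main()
    {
        Console.WriteLine(Cast<int>(6));
        Console.WriteLine(Cast<string>("six"));
        Console.WriteLine(BoxAfterMutatingCopy());
    }
}
```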

> I thought value types were
> very simple: they contain only the actual data you're using. If you push the
> integer 6 on the stack, you don't deal with addresses at all, you simply put
> a 6 on the stack. right?

Yes.

> so why is this unbox instruction able to avoid
> copying from the box and just return a pointer to what's in the box?

You may be calling an instance method on the value type. In that case,
argument 0 - i.e. the 'this' argument - needs to be a pointer.

Also, I hasten to point out that the IL instructions are simply a
description of required semantics. I think it's wise not to think of the
CLI's abstract machine as an actual CPU with x cost for this instruction
and y cost for that instruction, but rather as input to a compiler and
optimizer which can figure things out properly. When considering how to
generate code, look at the output of the C# or C++/CLI compiler for an
example expression / statement. The C++/CLI compiler in particular seems
to sometimes generate slightly faster IL.

-- Barry

--
http://barrkel.blogspot.com/

Author
27 Jul 2006 2:32 AM
Ben R.
Hi Barry,

Thanks, this info is really useful. Your thoroughness is much appreciated. 
You've inspired me to read more about methodtables.

There was one thing you mentioned, though, that I have trouble understanding:

> > so why is this unbox instruction able to avoid
> > copying from the box and just return a pointer to what's in the box?
>
> You may be calling an instance method on the value type. In that case,
> argument 0 - i.e. the 'this' argument - needs to be a pointer.

Would an example of this be if you define a function within a struct? I'm
also not sure how this ties into the "this" pointer. Could you elaborate on
this?

Thanks...

-Ben

Author
27 Jul 2006 8:54 AM
Barry Kelly
Ben R. <benr@newsgroup.nospam> wrote:

> Thanks, this info is really useful. Your thoroughness is much appreciated. 
> You've inspired me to read more about methodtables.
>
> There was one thing you mentioned, though, that I have trouble understanding:
>
> > > so why is this unbox instruction able to avoid
> > > copying from the box and just return a pointer to what's in the box?
> >
> > You may be calling an instance method on the value type. In that case,
> > argument 0 - i.e. the 'this' argument - needs to be a pointer.
>
> Would an example of this be if you define a function within a struct?

Yes, if you define a method on the struct.

> I'm
> also not sure how this ties into the "this" pointer. Could you elaborate on
> this?

Consider this struct:

---8<---
using System;

class Program
{
    struct S
    {
        int _value;
        public S(int value) { _value = value; }
        public void PrintValue() { Console.WriteLine(_value); }
    }
    static void Main()
    {
        object o = new S(42);
        ((S) o).PrintValue();
    }
}
--->8---

When you compile this and disassemble it (with ildasm, for example), the
relevant instructions in Main() to do the call look like this:

---8<---
    IL_000e:  ldloc.0
    IL_000f:  unbox.any  Program/S
    IL_0014:  stloc.1
    IL_0015:  ldloca.s   V_1
    IL_0017:  call       instance void Program/S::PrintValue()
--->8---

These could be rewritten like this:

---8<---
    ldloc.0
    unbox Program/S
    call instance void Program/S::PrintValue()
--->8---

If you disassemble the above C# program and replace the given lines with
these new lines, and then reassemble, you'll find that it works just the
same.
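
If you would rather not round-trip through the disassembler, the same
three-instruction sequence can be tried with Reflection.Emit. This is only
a sketch under a few assumptions: the runtime allows dynamic code
generation, and Counter is a made-up mutable struct used so that the effect
is visible. Because unbox hands the method a pointer into the box, the
increment should show up in the boxed object itself.

```csharp
using System;
using System.Reflection.Emit;

public static class UnboxEmitDemo
{
    public struct Counter
    {
        public int Value;
        public void Increment() { Value++; }
    }

    // Emits: ldarg.0 / unbox Counter / call instance void Counter::Increment() / ret
    public static Action<object> BuildIncrementInBox()
    {
        DynamicMethod dm = new DynamicMethod(
            "IncrementInBox", typeof(void), new Type[] { typeof(object) },
            typeof(UnboxEmitDemo).Module, true);
        ILGenerator il = dm.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);                 // the boxed Counter (type O)
        il.Emit(OpCodes.Unbox, typeof(Counter));  // managed pointer into the box
        il.Emit(OpCodes.Call, typeof(Counter).GetMethod("Increment"));
        il.Emit(OpCodes.Ret);
        return (Action<object>)dm.CreateDelegate(typeof(Action<object>));
    }

    // Boxes a fresh Counter, runs the emitted method on it, reads the box back.
    public static int ValueAfterIncrementInBox()
    {
        object box = new Counter();
        BuildIncrementInBox()(box);
        return ((Counter)box).Value;
    }

    public static void Main()
    {
        Console.WriteLine(ValueAfterIncrementInBox());
    }
}
```

Mutating a box in place like this is not something C# lets you write
directly, so treat it as a demonstration of what the pointer refers to
rather than a recommended technique.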

Basically, when you call an instance method on something, whether it's
an object or a struct, it needs a 'this' pointer. In the case of value
types, the 'this' pointer is a managed pointer, and for classes it's an
object reference. The 'this' pointer is always the first (i.e. slot 0)
argument to the instance method. That means it needs to be loaded onto
the stack before any other arguments to the method.
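
A short C# sketch of what that means in practice, using a made-up mutable
struct (Counter) so the effect of the 'this' pointer is observable. When
the method is called on a local, 'this' points at the local and the change
sticks. When it is called on the result of a cast from object, C# first
copies the value into a temporary, so the box is untouched.

```csharp
using System;

public static class ThisPointerDemo
{
    public struct Counter
    {
        public int Value;
        // Inside a struct method, 'this' is a managed pointer to a Counter.
        public void Increment() { Value++; }
    }

    public static int IncrementLocal()
    {
        Counter c = new Counter();
        c.Increment();               // the address of the local is passed as 'this'
        return c.Value;
    }

    public static int IncrementThroughCast()
    {
        object box = new Counter();
        ((Counter)box).Increment();  // acts on a temporary copy of the boxed value
        return ((Counter)box).Value; // the box itself was never modified
    }

    public static void Main()
    {
        Console.WriteLine(IncrementLocal());
        Console.WriteLine(IncrementThroughCast());
    }
}
```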

-- Barry

Author
27 Jul 2006 5:54 PM
Ben R.
Thanks Barry,

After staring at it for a bit, I think this makes sense to me.

PS: I've added your blog to my RSS feeds!

Best,

-Ben
