Fatal Execution Engine Error using unmanaged obj from 2 appdomains

I originally posted this in dotnet.languages.vc, and it was suggested that I post it in here as well...so here goes

I believe I've found a bug in either the compiler or in the runtime....for some reason when accessing a specific unmanaged object from 2 appdomains it causes a fatal execution engine error. It took me over a week to narrow it down, but below I've attached a nice simple snippet of code that showcases this issue:

Anyone have any idea what exactly is the root cause of this or how to fix it properly?

It seems that if I change the code much (such as remove the 'virtual' from either of the methods, or change the return types, or change the dictionary to something else) the problem goes away, but I'm more worried about where else in my code this affects and I'm hoping for a more viable solution than "search through 500K lines of code and look for that pattern"....not realistic

If anyone wants more detail I've put a rar up which contains the source, a debug build with symbols, the eventlog entry, and an adplus dump with corresponding reports:
http://www.virgeweb.com/redec/crap/RuntimeFailureTest.rar

<snippet>
#include <msclr/appdomain.h>

using namespace System;
using namespace System::Collections::Generic;
using namespace msclr;

ref class MyManagedClass
{
};

class MyUnmanagedClass
{
public:
    virtual MyManagedClass^ Foo()
    {
        return nullptr;
    }

    virtual Object^ CrashyCrashy()
    {
        Dictionary<String^, MyManagedClass^> ^bar = gcnew Dictionary<String^, MyManagedClass^>();
        bar->Add("", nullptr);
        return nullptr;
    }
};

void Test(MyUnmanagedClass *foo)
{
    foo->CrashyCrashy();
}

int WINAPI WinMain(HINSTANCE, HINSTANCE, LPSTR, int)
{
    MyUnmanagedClass foo;
    AppDomain ^domain1 = AppDomain::CreateDomain("TestDomainOfGoodness");
    call_in_appdomain<MyUnmanagedClass*>(domain1->Id, &Test, &foo);
    return 0;
}
</snippet>

On 2007-11-07 09:52:02 -0800, redec <re***@discussions.microsoft.com> said:
> I originally posted this in dotnet.languages.vc, and it was suggested that I
> post it in here as well...so here goes
>
> I believe I've found a bug in either the compiler or in the runtime....for
> some reason when accessing a specific unmanaged object from 2 appdomains it
> causes a fatal execution engine error. It took me over a week to narrow it
> down, but below I've attached a nice simple snippet of code that showcases
> this issue:
>
> Anyone have any idea what exactly is the root cause of this or how to fix it
> properly?

I don't. I admit, I know very little about this sort of thing.

However, figuring I might learn something I tried your code example. One thing I noticed: if you call foo.CrashyCrashy() in the default app domain (that is, just call it directly), then the subsequent attempt to call it in the other app domain succeeds.

That suggests to me that there is some sort of deferred initialization that happens, and which isn't happening if the first call on the object happens in the other app domain. I don't know enough about app domains to know why this would be, and you may be right that it's a bug in the CLR. Or it could be some defined behavior for app domains. I don't really know.

I didn't bother trying to look at the vtable for the object, but I'd guess that the state of the vtable is somehow related to this. Especially since I was unable to catch any exception or have the debugger show me an exception: on attempting to execute the call, the process simply exits without any notification or opportunity to look at it in the debugger. :(

I suppose a temporary work-around might be to create a dummy virtual function that you call in the default app domain first, and then hopefully the other calls would work. I admit, that's not what I'd call a "good" or "robust" solution. But it might be useful for now.

Pete

On 2007-11-07 11:14:34 -0800, Peter Duniho <NpOeStPe***@NnOwSlPiAnMk.com> said:
> I don't. I admit, I know very little about this sort of thing.
> However, figuring I might learn something I tried your code example.
> One thing I noticed: if you call foo.CrashyCrashy() in the default app
> domain (that is, just call it directly), then the subsequent attempt to
> call it in the other app domain succeeds.

Another observation:

If the "MyUnmanagedClass" instance is initialized in the other app domain, the call also succeeds.

So, obviously there is some initialization of the object that is specific to the app domain in which the object itself is created. Accessing the object in a different app domain before the creating app domain has an opportunity to fully initialize the object causes problems.

Again, I don't have enough knowledge to say whether this is "by design" or an oversight within the CLR. However, I wouldn't be surprised if it's "by design", or at least a "we can't fix this" sort of thing. The initialization we're talking about is specific to the C++ features of the compiler, and there may be some reason that the compiler doesn't take into account app domains when managing that initialization.

As a completely uninformed hypothesis: perhaps the vtable is initialized in a static constructor of the class, so until the first actual use of some instance of the class the vtable hasn't been initialized. Further, perhaps this "first actual use" for the class is tracked on a per-app-domain basis, so an object created in one app domain must be first used in that same app domain in order for the static constructor to be called.

No, I have no idea if this is actually what's going on. But it would fit the symptoms. :)

Pete

hehehe....yeah, I'm sure it has something to do with deferred initialization...I've inspected the vtables and they look fine. and yeah I know changing pretty much anything in that code snippet will make it work properly....I'm not terribly worried about getting a work-around to fix this specific case, I'm more worried about the other, harder to find, instances of this in my existing code...

this snippet of code, as it existed in the original code, spanned 5 or 6 different objects, across 3 different assemblies (1 C++/CLI and 2 C#)...I was able to narrow it down only because it happened to be frequently executed, and I was able to reproduce it reliably....I'm worried about the infrequently executed code paths where this problem may also crop up...I'd rather not rely on QA to find them all, if you know what I mean :)

On 2007-11-07 13:10:01 -0800, redec <re***@discussions.microsoft.com> said:
> hehehe....yeah, I'm sure it has something to do with deferred
> initialization...I've inspected the vtables and they look fine. and yeah I
> know changing pretty much anything in that code snippet will make it work
> properly....I'm not terribly worried about getting a work-around to fix this
> specific case, I'm more worried about the other, harder to find, instances of
> this in my existing code...

Well, it seems that by narrowing it down, you've identified the fundamental issue: calling a virtual function on an object passed across an app domain boundary.

For what it's worth (and maybe that's not much), I thought I'd try to read up a little more on app domains, why they exist, what they do, etc. I found the docs actually kind of sparse, considering the potential complexity it seems like an app domain would introduce. But one thing they do discuss is that one main reason for having an app domain is to be able to introduce a process-boundary-like separation between executing code, without all of the overhead of a process.

In particular, they make it pretty clear that data _isn't_ supposed to be able to easily get from one app domain to another. It's either copied or proxied as near as I can tell, without allowing code executing in one app domain to directly access data from another app domain.

How this applies specifically to your scenario I'm not entirely sure. Taken as simplistically as I've described it above, it's hard to see how your code would _ever_ work, assuming that the data referenced by the "&foo" is simply copied. Since it does work most of the time, that suggests that "call_in_appdomain" is supposed to handle either copying or proxying the object correctly to allow this cross-app-domain execution to take place, and that the case where it doesn't is in fact a bug.

If it is a bug, you may have some luck by filing a support request with Microsoft. I have found them moderately responsive to that sort of thing. They don't always solve my problem, but they at least do generally wind up confirming that the behavior is in fact a bug (good for one's sanity, if nothing else :) ).

There's clearly a lot I don't understand about app domains still, but you may be able to make more sense of the documents I did run across. Though you may well already be familiar with them, just in case I will offer those links here:

http://msdn2.microsoft.com/en-us/library/2bh4z9hs.aspx
http://msdn2.microsoft.com/en-us/library/system.marshalbyrefobject.aspx
http://msdn2.microsoft.com/en-us/library/x0w2664k(VS.80).aspx

I'm a little surprised you haven't gotten a reply from someone more knowledgeable. Obviously such people exist :), and hopefully they'll see this thread and offer their own insight.

Pete

Yeah....I've read a lot about them....and I *think* I understand them quite well. You're mostly correct re: your description of appdomains, however the one thing it seems you don't quite understand is that appdomains are a managed-only concept....they only affect managed objects. Unmanaged objects are (supposed to be) 100% appdomain neutral/ignorant....unaffected by appdomains.

Now the docs for call_in_appdomain say that the parameters/return type "must not be clr types". I took that to mean that they should be unmanaged types....which seems to be the correct assumption 99% of the time....but this specific situation makes one think that maybe said unmanaged types can't even reference any managed types (or reference any other unmanaged types which reference managed types)....I really can't see this being the case cuz it seems like a HUGE restriction, and you'd think it would be mentioned somewhere.....but I'm no expert so I don't know

On 2007-11-08 08:42:02 -0800, redec <re***@discussions.microsoft.com> said:
> Yeah....I've read a lot about them....and I *think* I understand them quite
> well. You're mostly correct re: your description of appdomains, however the
> one thing it seems you don't quite understand is that appdomains are a
> managed-only concept....they only affect managed objects. Unmanaged objects
> are (supposed to be) 100% appdomain neutral/ignorant....unaffected by
> appdomains.

I didn't find anything that made that clear. You may be right about that, but if so it seems like there's some inconsistency here. In particular:

> Now the docs for call_in_appdomain say that the
> parameters/return type "must not be clr types". I took that to mean that
> they should be unmanaged types....
> which seems to be the correct assumption 99%
> of the time....but this specific situation makes one think that maybe said
> unmanaged types can't even reference any managed types (or reference any
> other unmanaged types which reference managed types)....I really can't see
> this being the case cuz it seems like a HUGE restriction, and you'd think it
> would be mentioned somewhere.....but I'm no expert so I don't know

First, in your example code the MyUnmanagedClass doesn't really reference any managed types per se, until one of the methods actually gets a chance to execute. I suppose the mere fact that the return type of the method is a managed type is sufficient to violate the rule, but like you say, you'd think they'd be more clear about that.

Secondly (and maybe more germane), if when using call_in_appdomain you're not supposed to use managed types in any way, and if it's also true that for unmanaged data app domains are irrelevant, then I'm at a loss as to why the call_in_appdomain API exists at all. To me, the latter seems more likely to be true, and being in conflict with the former it suggests that the former is what's not true.

Which is a long way of saying that I not only agree that it seems like a big restriction that you can't use managed types when using call_in_appdomain, such a restriction would be logically inconsistent with the behavior of app domains generally.

I guess in the end, I'm left thinking that you may have simply found a bug in the call_in_appdomain API, and that whatever is accomplished by calling a virtual method on your unmanaged class prior to using call_in_appdomain is something that call_in_appdomain _ought_ to be doing for itself.

.NET bugs, at least those that affect typical use of the framework, are fairly rare but they definitely aren't impossible. I hope you can get some confirmation from Microsoft via the usual support channels that this is in fact a bug, or at least some explanation for what's going on.

Pete
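
For reference, the stopgap discussed in the thread (calling foo.CrashyCrashy() directly in the default app domain before the cross-domain call) might look roughly like the sketch below. It is the original repro with one extra call added. Using a plain main() instead of WinMain is an assumption made here for brevity. The sketch is untested, needs MSVC with /clr, and rests only on the behavior reported above, so it illustrates the idea and is not a confirmed fix.

```cpp
// Sketch only (C++/CLI, compile with MSVC /clr). Based on the observation in
// the thread that a direct call in the default app domain lets the later
// cross-domain call succeed. Not a confirmed or robust fix.
#include <msclr/appdomain.h>

using namespace System;
using namespace System::Collections::Generic;
using namespace msclr;

ref class MyManagedClass
{
};

class MyUnmanagedClass
{
public:
    virtual MyManagedClass^ Foo()
    {
        return nullptr;
    }

    virtual Object^ CrashyCrashy()
    {
        Dictionary<String^, MyManagedClass^>^ bar =
            gcnew Dictionary<String^, MyManagedClass^>();
        bar->Add("", nullptr);
        return nullptr;
    }
};

void Test(MyUnmanagedClass* foo)
{
    foo->CrashyCrashy();
}

// Assumption: main() is used here in place of the original WinMain.
int main()
{
    MyUnmanagedClass foo;

    // Stopgap: call the method once in the creating (default) app domain
    // before handing the object to another app domain.
    foo.CrashyCrashy();

    AppDomain^ domain1 = AppDomain::CreateDomain("TestDomainOfGoodness");
    call_in_appdomain<MyUnmanagedClass*>(domain1->Id, &Test, &foo);
    return 0;
}
```

As noted in the thread, this does nothing for other, harder to find occurrences of the same pattern elsewhere in a large code base.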