Friday, 15 October 2010

App Pool stopping in IIS7 - error 0xc0000374, fatal communication error with the Windows Process Activation Service

It's my first blog post, people!  But don't get too excited: I'm only starting this as somewhere to record all those really evil issues you come across when troubleshooting that wind up being a pig to google for.

So, I'll dive right in.  We had an issue with a web service falling over on a live server.  It wasn't happening in dev and we hadn't really seen it happen in test, but come go-live, there it was.  Intermittently but still quite often, our ASP.NET 2.0 App Pool would go down with the following entries in the event log:

Error:
Faulting application w3wp.exe, version 7.0.6001.18000, time stamp 0x47919413, faulting module ntdll.dll, version 6.0.6001.18000, time stamp 0x4791a7a6, exception code 0xc0000374, fault offset 0x000b015d, process id 0x1e78, application start time 0x01cb6b79e518d927.
Warning:
A process serving application pool 'ASP.NET 2.0 App Pool' suffered a fatal communication error with the Windows Process Activation Service. The process id was 'xxxx'. The data field contains the error number.

It turns out that this error indicates heap corruption: exception code 0xc0000374 is STATUS_HEAP_CORRUPTION, reported here against ntdll.dll inside the w3wp.exe worker process.
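If you'd rather not decode those hex fields by eye, here's a rough Python sketch that pulls them out of the "Faulting application" message.  The function name and the little lookup table are my own for illustration (they aren't part of any Windows tooling), and I'm assuming the "application start time" field is a Windows FILETIME, i.e. 100-nanosecond ticks since 1601-01-01 UTC.

```python
import re
from datetime import datetime, timedelta, timezone

# A couple of NTSTATUS codes; 0xC0000374 is STATUS_HEAP_CORRUPTION.
NTSTATUS_NAMES = {
    0xC0000374: "STATUS_HEAP_CORRUPTION",
    0xC0000005: "STATUS_ACCESS_VIOLATION",
}

# Assumption: the start time is a FILETIME (100 ns ticks since 1601-01-01 UTC).
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def decode_fault_entry(message):
    """Extract the hex fields from a 'Faulting application ...' event message."""

    def hex_field(label):
        # Find "<label> 0x..." and return the value as an int, or None if absent.
        match = re.search(label + r" (0x[0-9a-fA-F]+)", message)
        return int(match.group(1), 16) if match else None

    code = hex_field("exception code")
    ticks = hex_field("application start time")

    if code is None:
        exception = None
    else:
        exception = NTSTATUS_NAMES.get(code, hex(code))

    if ticks is None:
        started = None
    else:
        # 10 ticks per microsecond
        started = FILETIME_EPOCH + timedelta(microseconds=ticks // 10)

    return {"exception": exception, "pid": hex_field("process id"), "started": started}
```

Fed the Error entry above, this gives STATUS_HEAP_CORRUPTION, process id 7800 and a start time on 14 October 2010 (UTC), which is the day before this post.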

After much googling I found a potential solution that suggested we disable the DynamicIPRestrictionModule.  Well, I couldn't find a dynamic IP restriction module in our IIS install, but I went ahead and disabled the IpRestrictionModule anyway.
We did that by opening the IIS config file, %windir%\System32\inetsrv\config\applicationHost.config, and commenting out the following 2 lines:
<!-- <add name="IpRestrictionModule" image="%windir%\System32\inetsrv\iprestr.dll" /> -->
<!-- <add name="IpRestrictionModule" lockItem="true" /> -->
As far as I'm aware, this module is used to block access from specific IP addresses or domains, which isn't quite the same as the dynamic IP restriction module, but we weren't using it and we figured it was better to try than not.  That said, I don't think it did anything to help; it was just something to try while we were waiting for the crash to happen again.
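If you want to double check that both entries really are commented out, something like the following Python sketch would do it.  The function name is made up for this post, and I'm assuming the module entries are plain <add name="..."> elements like the two lines above.  It leans on the fact that Python's standard ElementTree parser throws XML comments away, so anything still returned is a live entry.

```python
import xml.etree.ElementTree as ET


def active_module_entries(config_xml, module_name="IpRestrictionModule"):
    """Return the <add> elements for module_name that are not commented out.

    ElementTree's default parser discards XML comments, so a commented-out
    <add> never shows up when walking the tree.
    """
    root = ET.fromstring(config_xml)
    return [el for el in root.iter("add") if el.get("name") == module_name]
```

Pass it the contents of applicationHost.config; an empty list means no live IpRestrictionModule entries are left.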

The other thing we did was to install the Debug Diagnostic Tool (DebugDiag) on the server and watch for the crash.  This gave us the following error message in the logs to work from:
Script Error
Error Code - 0x80004005
Error Source [Unavailable]
Error Description [Could not obtain System ID for this thread]

This led us to another potential cause (I can't find the source of this now): a piece of code disposing of an object that had already been disposed.  We had just one destructor in the project, so we took that out and we're hoping it takes the problem away.  I'll update this post once I know which solution did what.
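To show the kind of bug I mean, here's a small Python analogue.  It is not our actual code (ours was .NET, and the class and field names here are invented): an object that owns a resource, with a release step that can be reached both by an explicit dispose call and again from the destructor.  Without the guard flag the resource would be released twice, which with real unmanaged memory is exactly the sort of thing that can corrupt the heap.

```python
class NativeBuffer:
    """Hypothetical stand-in for an object that owns an unmanaged resource."""

    def __init__(self):
        self.release_count = 0  # how many times the resource was really freed
        self._released = False

    def dispose(self):
        # The guard makes dispose idempotent: a second call does nothing.
        if self._released:
            return
        self._released = True
        self.release_count += 1  # real code would free the handle here

    def __del__(self):
        # Destructor path: harmless even if dispose() has already run.
        self.dispose()
```

However many times dispose gets called, and whether or not the destructor runs afterwards, release_count never goes above 1.  If the destructor adds nothing beyond what dispose already does, removing it (as we did) is the simpler fix.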

UPDATE:
It's now significantly after the fact and we haven't seen this again since, so I believe the problem was in fact a destructor that didn't need to be there.  Check your libraries, check your DLLs, people!  Make sure there aren't any unnecessary destructors in there! (Sorry Tim, I don't know how to reply to you directly.)