Original question: http://stackoverflow.com/questions/15412761/
Notice: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
NetTcpActivator service (Net.Tcp Listener Adapter) stops responding occasionally
Asked by Sergey Popov
In my current project we (I mean "project team") use WCF services hosted on IIS.
Here are some technical details which may be important:
- We use .NET 3.5 for WCF services
- We use the net.tcp communication protocol
- We use both IIS 7 and IIS 7.5 to host these services
- We use multiple IIS worker processes on each server
So the problem is that the WCF services sometimes become unavailable. When we try to reach them, we get a timeout error, and the only way to restore them is to restart the NetTcpActivator (Net.Tcp Listener Adapter) Windows service.
According to my colleague's theory, this error may be related to the problems described in this KB article:
FIX: Smsvchost.exe for the WCF service stops responding when you run a .NET Framework 4-based WCF service http://support.microsoft.com/kb/2536618
According to this article, SMSvcHost (the container service that hosts NetTcpActivator and the Port Sharing Service) hangs if it can't route a request to w3wp (the IIS worker process) within 60 seconds (a non-configurable timeout). Unfortunately, we have been unable to find a way to reproduce this error. For example, we limited SMSvcHost to 1 CPU core and 1 thread, raised the pending connections limit to 1M, and pushed it to 100% CPU load in user mode. And it didn't hang!
Sometimes our load tests lead to strange errors, but when we stop them, all services automatically recover to their normal state. Yet sometimes even a light load can hang NetTcpActivator!
In addition, I would like to say that this is not a new problem. My colleagues already ran into it 3 years ago (see this thread for additional information: http://forums.iis.net/t/1167668.aspx/1/10). Unfortunately, they never got an answer. The problem just disappeared after some configuration changes! And now it has come back on the new server.
I would really appreciate all your thoughts and ideas!
Answered by Nelson Rothermel
Alright, after lots of research I tracked down the cause of our issue. There may be other scenarios where this occurs, but hopefully this will help some people. Microsoft is in the process of reproducing it in their labs and should have a fix eventually.
In our case, all the planets had to align. We had one .NET 4 integrated app pool for client and server (on developer machine). The service was using an external config file for bindings (<bindings configSource="serviceModel.bindings.config" />) which was linked from another project and copied at build time with a custom build task added to the service's .csproj.
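For readers unfamiliar with configSource, here is a minimal sketch of what such a setup might look like. Only the file name serviceModel.bindings.config and the bindings element come from the answer; the service name, contract name, and binding name (WcfService.Service1, WcfService.IService1, tcpBinding) are hypothetical placeholders.

```xml
<!-- Web.config of the service (sketch; service/contract/binding names are assumed) -->
<configuration>
  <system.serviceModel>
    <!-- The bindings section is pulled in from a separate file -->
    <bindings configSource="serviceModel.bindings.config" />
    <services>
      <service name="WcfService.Service1">
        <endpoint address=""
                  binding="netTcpBinding"
                  bindingConfiguration="tcpBinding"
                  contract="WcfService.IService1" />
      </service>
    </services>
  </system.serviceModel>
</configuration>
```

```xml
<!-- serviceModel.bindings.config (sketch): the root element is the section itself -->
<bindings>
  <netTcpBinding>
    <binding name="tcpBinding" />
  </netTcpBinding>
</bindings>
```

The external file holds only the bindings section, and the element that references it through configSource carries no other attributes or child elements.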
To reproduce the issue:
1. Stop all SMSvcHost services that are running (Net.Tcp*, Net.Pipe, Net.Msmq). A restart won't work, since the SMSvcHost process doesn't go away.
2. From Visual Studio, run a Clean for WcfService
3. From Windows Explorer, delete serviceModel.bindings.config in WcfService
4. Run iisreset (gets rid of w3wp and starts the SMSvcHost services -- press F5 in the services list to see that)
5. Build WcfService (copies the linked config file)
6. Browse to the WcfClient page and submit twice. If you get an error each time, you probably have the issue. In our main application it was giving a timeout; in the test app it was a CommunicationObjectFaultedException instead, but either is fine.
7. Stop the SMSvcHost services. If the issue occurred, Event ID 8 for SMSvcHost is logged to the System event log.
I don't know yet if w3wp or SMSvcHost is the culprit. Step #3 is critical, though I can't explain why yet. If you don't delete the file, then all is fine. If you modify the file (created date stays the same), all is fine. If you move the config XML into the main Web.config file, all is fine. When the build task copies the file the created date is updated, so I am guessing it's cached some way and one of the processes detects the date change.
If you restart the SMSvcHost services (full stop, full start) once or twice, the client request will go through, and from then on you're fine.
So my guess for now is that this could be an issue right after a deployment, but if you make sure everything is running (and restart services as needed) then you should be fine. You can also avoid using external/linked config files.
Once Microsoft tracks down the issue I will hopefully have more insight.
Final Update: I forgot to come back to this earlier. Microsoft essentially admitted they probably had a bug, but since there was a workaround and they had spent enough time on the ticket, they were closing it and not researching further. There appears to be some type of race condition when SMSvcHost starts up with the following setup (similar to what I posted earlier):
- Host WCF in IIS
- Use a non-HTTP binding so that SMSvcHost comes into play
- Use an external config file for bindings using configSource
Linking the external config had nothing to do with it. The workaround was to not use configSource, which is what we are doing now.
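As an illustration of that workaround, the bindings can be declared inline in Web.config instead of in a separate file. This is only a sketch: the binding name tcpBinding is a hypothetical placeholder, and the actual binding settings are not given in the answer.

```xml
<!-- Web.config (sketch): bindings declared inline, no configSource attribute -->
<configuration>
  <system.serviceModel>
    <bindings>
      <netTcpBinding>
        <!-- "tcpBinding" is an assumed name; use whatever your endpoints reference -->
        <binding name="tcpBinding" />
      </netTcpBinding>
    </bindings>
  </system.serviceModel>
</configuration>
```

Endpoints keep referring to the binding through bindingConfiguration as before; only the location of the bindings section changes.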

