Best practices for deploying Java webapps with minimal downtime?
Warning: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors (not me): Stack Overflow
Original: http://stackoverflow.com/questions/1640333/
Asked by knorv
When deploying a large Java webapp (>100 MB .war) I currently use the following deployment process:
- The application .war file is expanded locally on the development machine.
- The expanded application is rsync:ed from the development machine to the live environment.
- The app server in the live environment is restarted after the rsync. This step is not strictly needed, but I've found that restarting the application server on deployment avoids "java.lang.OutOfMemoryError: PermGen space" due to frequent class loading.
Good things about this approach:
- The rsync minimizes the amount of data sent from the development machine to the live environment. Uploading the entire .war file takes over ten minutes, whereas an rsync takes a couple of seconds.
Bad things about this approach:
- While the rsync is running the application context is restarted since the files are updated. Ideally the restart should happen after the rsync is complete, not when it is still running.
- The app server restart causes roughly two minutes of downtime.
I'd like to find a deployment process with the following properties:
- Minimal downtime during deployment process.
- Minimal time spent uploading the data.
- If the deployment process is app server specific, then the app server must be open-source.
Question:
- Given the stated requirements, what is the optimal deployment process?
Accepted answer by Stephen C
It has been noted that rsync does not work well when pushing changes to a WAR file. The reason for this is that WAR files are essentially ZIP files, and by default are created with compressed member files. Small changes to the member files (before compression) result in large scale differences in the ZIP file, rendering rsync's delta-transfer algorithm ineffective.
One possible solution is to use jar -0 ... to create the original WAR file. The -0 option tells the jar command to not compress the member files when creating the WAR file. Then, when rsync compares the old and new versions of the WAR file, the delta-transfer algorithm should be able to create small diffs. Then arrange that rsync sends the diffs (or original files) in compressed form; e.g. use rsync -z ... or a compressed data stream / transport underneath.
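To illustrate why uncompressed members help, here is a minimal sketch using Python's standard zipfile module, whose ZIP_STORED mode also writes members without compression. It is not the jar tool itself, and the helper name build_archive and the sample member are made up for the illustration.

```python
import io
import zipfile


def build_archive(members, compression):
    """Return the bytes of a ZIP archive (a WAR is a ZIP) built in memory.

    `members` maps archive paths to their byte contents.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=compression) as archive:
        for name, data in sorted(members.items()):
            archive.writestr(name, data)
    return buffer.getvalue()


# Hypothetical member file: repetitive text, so it compresses well.
payload = b"<servlet-name>example</servlet-name>\n" * 500
members = {"WEB-INF/web.xml": payload}

stored = build_archive(members, zipfile.ZIP_STORED)      # members not compressed
deflated = build_archive(members, zipfile.ZIP_DEFLATED)  # members compressed

# Stored members appear byte-for-byte inside the archive, so an unchanged
# region of a member is also an unchanged region of the archive. That is
# what a delta-transfer algorithm needs in order to skip it.
print("payload found verbatim in stored archive:  ", payload in stored)
print("payload found verbatim in deflated archive:", payload in deflated)
print("stored size:", len(stored), "deflated size:", len(deflated))
```

The size difference printed at the end is the trade-off mentioned in the answer: the uncompressed archive is larger on disk, which is why compression is moved to the transport instead.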
EDIT: Depending on how the WAR file is structured, it may also be necessary to use jar -0 ... to create component JAR files. This would apply to JAR files that are frequently subject to change (or that are simply rebuilt), rather than to stable 3rd party JAR files.
In theory, this procedure should give a significant improvement over sending regular WAR files. In practice I have not tried this, so I cannot promise that it will work.
The downside is that the deployed WAR file will be significantly bigger. This may result in longer webapp startup times, though I suspect that the effect would be marginal.
A different approach entirely would be to look at your WAR file to see if you can identify library JARs that are likely to (almost) never change. Take these JARs out of the WAR file, and deploy them separately into the Tomcat server's common/lib directory; e.g. using rsync.
Answer by Asaph
Update:
Since this answer was first written, a better way to deploy war files to tomcat with zero downtime has emerged. In recent versions of tomcat you can include version numbers in your war filenames. So for example, you can deploy the files ROOT##001.war and ROOT##002.war to the same context simultaneously. Everything after the ## is interpreted as a version number by tomcat and not part of the context path. Tomcat will keep all versions of your app running and serve new requests and sessions to the newest version that is fully up while gracefully completing old requests and sessions on the version they started with. Specifying version numbers can also be done via the tomcat manager and even the catalina ant tasks. More info here.
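A deployment script has to pick the next file name in this scheme. The following is a small sketch of that step only, assuming purely numeric, zero-padded versions like the ROOT##001.war example above; the function name next_versioned_war and the sample listing are hypothetical, and a real script would read the names from the webapps directory.

```python
import re

# Matches names such as "ROOT##001.war"; the part after "##" is the version.
VERSIONED_WAR = re.compile(r"^(?P<context>.+)##(?P<version>\d+)\.war$")


def next_versioned_war(existing_names, context="ROOT", width=3):
    """Return the file name for the next version of `context`."""
    versions = [0]
    for name in existing_names:
        match = VERSIONED_WAR.match(name)
        if match and match.group("context") == context:
            versions.append(int(match.group("version")))
    # Keep the zero padding so every version has the same number of digits.
    return f"{context}##{max(versions) + 1:0{width}d}.war"


deployed = ["ROOT##001.war", "ROOT##002.war", "docs.war"]
print(next_versioned_war(deployed))  # ROOT##003.war
```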
Original Answer:
Rsync tends to be ineffective on compressed files since its delta-transfer algorithm looks for changes in files, and a small change to an uncompressed file can drastically alter the resultant compressed version. For this reason, it might make good sense to rsync an uncompressed war file rather than a compressed version, if network bandwidth proves to be a bottleneck.
What's wrong with using the Tomcat manager application to do your deployments? If you don't want to upload the entire war file directly to the Tomcat manager app from a remote location, you could rsync it (uncompressed for reasons mentioned above) to a placeholder location on the production box, repackage it to a war, and then hand it to the manager locally. There exists a nice ant task that ships with Tomcat allowing you to script deployments using the Tomcat manager app.
There is an additional flaw in your approach that you haven't mentioned: While your application is partially deployed (during an rsync operation), your application could be in an inconsistent state where changed interfaces may be out of sync, new/updated dependencies may be unavailable, etc. Also, depending on how long your rsync job takes, your application may actually restart multiple times. Are you aware that you can and should turn off the listening-for-changed-files-and-restarting behavior in Tomcat? It is actually not recommended for production systems. You can always do a manual or ant scripted restart of your application using the Tomcat manager app.
Your application will be unavailable to users during a restart, of course. But if you're so concerned about availability, you surely have redundant web servers behind a load balancer. When deploying an updated war file, you could temporarily have the load balancer send all requests to other web servers until the deployment is over. Rinse and repeat for your other web servers.
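The "rinse and repeat" loop can be sketched as follows. This is only an outline under stated assumptions: the disable, deploy and enable callables are placeholders for whatever your load balancer and app server actually offer, and the server names are invented.

```python
def rolling_deploy(servers, disable, deploy, enable):
    """Update servers one at a time so the others keep serving traffic.

    `disable`, `deploy` and `enable` are callables supplied by the caller.
    If `deploy` raises, the rollout stops and that server stays out of
    rotation so a broken deployment is never put back in front of users.
    """
    if len(servers) < 2:
        raise ValueError("need at least two servers to avoid downtime")
    for server in servers:
        disable(server)  # load balancer stops routing to this server
        deploy(server)   # e.g. copy the new war and restart the app server
        enable(server)   # put the updated server back into rotation


# In-memory stand-ins for a load balancer, for demonstration only.
in_rotation = {"web1", "web2", "web3"}
log = []


def disable(server):
    in_rotation.discard(server)


def deploy(server):
    # Record which servers were still serving while this one was updated.
    log.append((server, sorted(in_rotation)))


def enable(server):
    in_rotation.add(server)


rolling_deploy(["web1", "web2", "web3"], disable, deploy, enable)
for server, serving in log:
    print(f"updated {server} while {serving} kept serving")
```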
Answer by jarnbjo
Can't you make a local copy of the current web application on the web server, rsync to that directory, and then, perhaps even using symbolic links, point Tomcat to the new deployment in one "go" without much downtime?
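The symbolic-link switch could look roughly like this on a POSIX system. It is a sketch, not a tested Tomcat recipe: the directory names are hypothetical, and whether the container follows the link and notices the change depends on its configuration.

```python
import os
import tempfile


def switch_symlink(link_path, new_target):
    """Point `link_path` at `new_target` by renaming a temporary symlink over it."""
    temp_link = link_path + ".tmp"
    if os.path.lexists(temp_link):
        os.remove(temp_link)
    os.symlink(new_target, temp_link)
    # On POSIX systems the rename replaces the old link in a single step,
    # so readers see either the old target or the new one.
    os.replace(temp_link, link_path)


# Demonstration in a scratch directory; all paths here are hypothetical.
base = tempfile.mkdtemp()
old_release = os.path.join(base, "release-1")
new_release = os.path.join(base, "release-2")
os.mkdir(old_release)
os.mkdir(new_release)
current = os.path.join(base, "current")

switch_symlink(current, old_release)  # initial deployment
switch_symlink(current, new_release)  # later: rsync into release-2, then flip
print(os.path.realpath(current) == os.path.realpath(new_release))
```

Keeping the previous release directory around also gives a cheap rollback: point the link back at it.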
Answer by Kent Lai
I'm not sure if this answers your question, but I'll share the deployment process I have used or encountered in the few projects I've done.
Similar to you, I do not ever recall making a full war redeployment or update. Most of the time, my updates are restricted to a few jsp files, maybe a library, some class files. I am able to manage and determine which artifacts are affected, and usually we package those updates in a zip file, along with an update script. I then run the update script. The script does the following:
- Backup the files that will be overwritten, maybe to a folder with today's date and time.
- Unpackage my files
- Stop the application server
- Move the files over
- Start the application server
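The backup and unpack steps of such a script might be sketched like this. It is a simplified illustration, not the answerer's actual script: the function name apply_update and all paths are invented, the update is extracted straight into the application directory, and stopping and starting the server is left as comments because that part is specific to each installation.

```python
import os
import shutil
import tempfile
import zipfile
from datetime import datetime


def apply_update(update_zip, app_dir, backup_root):
    """Back up the files an update would overwrite, then extract the update."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    backup_dir = os.path.join(backup_root, stamp)
    with zipfile.ZipFile(update_zip) as update:
        for name in update.namelist():
            target = os.path.join(app_dir, name)
            if os.path.isfile(target):
                saved = os.path.join(backup_dir, name)
                os.makedirs(os.path.dirname(saved), exist_ok=True)
                shutil.copy2(target, saved)
        # A real script would stop the application server here ...
        update.extractall(app_dir)
        # ... and start it again here.
    return backup_dir


def read_text(path):
    with open(path) as handle:
        return handle.read()


# Demonstration with throwaway files.
work = tempfile.mkdtemp()
app_dir = os.path.join(work, "webapp")
os.makedirs(os.path.join(app_dir, "WEB-INF"))
with open(os.path.join(app_dir, "index.jsp"), "w") as handle:
    handle.write("old page")

update_zip = os.path.join(work, "update.zip")
with zipfile.ZipFile(update_zip, "w") as archive:
    archive.writestr("index.jsp", "new page")
    archive.writestr("WEB-INF/new.properties", "key=value")

backup_dir = apply_update(update_zip, app_dir, os.path.join(work, "backups"))
print(read_text(os.path.join(app_dir, "index.jsp")))     # new page
print(read_text(os.path.join(backup_dir, "index.jsp")))  # old page
```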
If downtime is a concern, and it usually is, my projects are usually HA, even if they do not share state and instead use a router that provides sticky session routing.
Another thing I am curious about: why the need to rsync? You should be able to know what the required changes are by determining them in your staging/development environment, not by performing delta checks against live. In most cases you would have to tune your rsync to ignore files anyway, such as property files that define resources a production server uses (database connection, smtp server, etc.).
I hope this is helpful.
Answer by cetnar
My advice is to use rsync with exploded versions but deploy a war file.
- Create a temporary folder in the live environment to hold the exploded version of the webapp.
- Rsync the exploded version into it.
- After a successful rsync, create a war file in the temporary folder on the live environment machine.
- Replace the old war in the server deploy directory with the new one from the temporary folder.
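The last two steps above (build the war on the live machine, then replace the old one) can be sketched as below. The function name build_and_swap_war and all directory names are hypothetical, and the sketch assumes the temporary folder and the deploy directory sit on the same filesystem, since that is what makes the final rename atomic on POSIX systems.

```python
import os
import tempfile
import zipfile


def build_and_swap_war(exploded_dir, staging_dir, deploy_dir, war_name):
    """Pack `exploded_dir` into a war in `staging_dir`, then move it into place."""
    staged_path = os.path.join(staging_dir, war_name)
    with zipfile.ZipFile(staged_path, "w", zipfile.ZIP_DEFLATED) as war:
        for root, _dirs, files in os.walk(exploded_dir):
            for filename in sorted(files):
                full_path = os.path.join(root, filename)
                arcname = os.path.relpath(full_path, exploded_dir)
                war.write(full_path, arcname.replace(os.sep, "/"))
    final_path = os.path.join(deploy_dir, war_name)
    # The deployer only ever sees the complete old war or the complete new one.
    # os.replace raises OSError if the two directories are on different
    # filesystems, so that assumption fails loudly rather than silently.
    os.replace(staged_path, final_path)
    return final_path


# Demonstration with throwaway directories.
work = tempfile.mkdtemp()
exploded = os.path.join(work, "exploded")
staging = os.path.join(work, "staging")
deploy = os.path.join(work, "deploy")
os.makedirs(os.path.join(exploded, "WEB-INF"))
os.mkdir(staging)
os.mkdir(deploy)
with open(os.path.join(exploded, "index.jsp"), "w") as handle:
    handle.write("hello")
with open(os.path.join(exploded, "WEB-INF", "web.xml"), "w") as handle:
    handle.write("<web-app/>")
with open(os.path.join(deploy, "app.war"), "w") as handle:
    handle.write("old war placeholder")

war_path = build_and_swap_war(exploded, staging, deploy, "app.war")
with zipfile.ZipFile(war_path) as war:
    names = sorted(war.namelist())
print(names)
```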
Replacing the old war with the new one is recommended in the JBoss container (which is based on Tomcat) because it is an atomic and fast operation, and it ensures that when the deployer starts, the entire application will be in a deployed state.
Answer by elhoim
Hot Deploy a Java EAR to Minimize or Eliminate Downtime of an Application on a Server or How to “hot” deploy war dependency in Jboss using Jboss Tools Eclipse plugin might have some options for you.
Deploying to a cluster with no downtime is interesting too.
JavaRebel has hot-code deployment too.
Answer by mhaller
Your approach of rsyncing the extracted war is pretty good, and so is the restart, since I believe that a production server should not have hot-deployment enabled. So the only downside is the downtime when you need to restart the server, right?
I assume all state of your application is held in the database, so you have no problem with some users working on one app server instance while other users are on another app server instance. If so:
Run two app servers: Start up the second app server (which listens on other TCP ports) and deploy your application there. After deployment, update the Apache httpd's configuration (mod_jk or mod_proxy) to point to the second app server, then gracefully restart the Apache httpd process. This way you will have no downtime, and new users and requests are automatically redirected to the new app server.
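Before pointing httpd at the second app server, a script would normally wait until that server actually answers. The following sketch shows one way to do that; it is an assumption layered on top of the answer, and the function names, the attempt counts and the example URL in the comment are all hypothetical.

```python
import time
import urllib.error
import urllib.request


def http_probe(url, timeout=2.0):
    """Return True if `url` answers with HTTP 200, False on any error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(probe, attempts=30, delay=1.0, sleep=time.sleep):
    """Call `probe` until it returns True.

    Returns the number of the successful attempt, or None if every
    attempt failed (in which case the switch should not happen).
    """
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        sleep(delay)
    return None


# Real use would look something like (URL is a made-up example):
#   wait_until_ready(lambda: http_probe("http://localhost:8081/"))
# Here a fake probe that succeeds on the third call stands in for the server.
answers = iter([False, False, True])
tries = wait_until_ready(lambda: next(answers), attempts=5, sleep=lambda seconds: None)
print(tries)  # 3
```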
If you can make use of the app server's clustering and session replication support, it will be even smoother for users who are currently logged in, as the second app server will resync as soon as it starts. Then, when there are no more accesses to the first server, shut it down.
Answer by Pascal Thivent
If static files are a big part of your big WAR (100 MB is pretty big), then putting them outside the WAR and deploying them on a web server (e.g. Apache) in front of your application server might speed things up. On top of that, Apache usually does a better job at serving static files than a servlet engine does (even if most of them have made significant progress in that area).
So, instead of producing a big fat WAR, put it on a diet and produce:
- a big fat ZIP with static files for Apache
- a less fat WAR for the servlet engine.
Optionally, go further in the process of making the WAR thinner: if possible, deploy Grails and other JARs that don't change frequently (which is likely the case of most of them) at the application server level.
If you succeed in producing a lighter WAR, I wouldn't bother rsyncing directories rather than archives.
Strengths of this approach:
- The static files can be hot "deployed" on Apache (e.g. use a symbolic link pointing on the current directory, unzip the new files, update the symlink and voilà).
- The WAR will be thinner and it will take less time to deploy it.
Weaknesses of this approach:
- There is one more server (the web server), so this adds (a bit) more complexity.
- You'll need to change the build scripts (not a big deal IMO).
- You'll need to change the rsync logic.
Answer by ideasculptor
In any environment where downtime is a consideration, you are surely running some sort of cluster of servers to increase reliability via redundancy. I'd take a host out of the cluster, update it, and then throw it back into the cluster. If you have an update that cannot run in a mixed environment (incompatible schema change required on the db, for example), you are going to have to take the whole site down, at least for a moment. The trick is to bring up replacement processes before dropping the originals.
Using tomcat as an example - you can use CATALINA_BASE to define a directory where all of tomcat's working directories will be found, separate from the executable code. Every time I deploy software, I deploy to a new base directory so that I can have new code resident on disk next to old code. I can then start up another instance of tomcat which points to the new base directory, get everything started up and running, then swap the old process (port number) with the new one in the load balancer.
If I am concerned about preserving session data across the switch, I can set up my system such that every host has a partner to which it replicates session data. I can drop one of those hosts, update it, bring it back up so that it picks the session data back up, and then switch the two hosts. If I've got multiple pairs in the cluster, I can drop half of all pairs, then do a mass switch, or I can do them a pair at a time, depending upon the requirements of the release, requirements of the enterprise, etc. Personally, however, I prefer to just allow end-users to suffer the very occasional loss of an active session rather than deal with trying to upgrade with sessions intact.
It's all a tradeoff between IT infrastructure, release process complexity, and developer effort. If your cluster is big enough and your desire strong enough, it is easy enough to design a system that can be swapped out with no downtime at all for most updates. Large schema changes often force actual downtime, since updated software usually cannot accommodate the old schema, and you probably cannot get away with copying the data to a new db instance, doing the schema update, and then switching the servers to the new db, since you will have missed any data written to the old after the new db was cloned from it. Of course, if you have resources, you can task developers with modifying the new app to use new table names for all tables that are updated, and you can put triggers in place on the live db which will correctly update the new tables with data as it is written to the old tables by the prior version (or maybe use views to emulate one schema from the other). Bring up your new app servers and swap them into the cluster. There are a ton of games you can play in order to minimize downtime if you have the development resources to build them.
Perhaps the most useful mechanism for reducing downtime during software upgrades is to make sure that your app can function in a read-only mode. That will deliver some necessary functionality to your users but leave you with the ability to make system-wide changes that require database modifications and such. Place your app into read-only mode, then clone the data, update schema, bring up new app servers against new db, then switch the load balancer to use the new app servers. Your only downtime is the time required to switch into read-only mode and the time required to modify the config of your load balancer (most of which can handle it without any downtime whatsoever).
Answer by Jé Queue
What is your PermGen space set to? I would expect to see it grow as well, but it should go down after collection of the old classes. (Or does the ClassLoader still sit around?)
Thinking out loud, you could rsync to a separate version- or date-named directory. If the container supports symbolic links, could you SIGSTOP the root process, switch over the context's filesystem root via symbolic link, and then SIGCONT?