Java: What design patterns are most leveraged in creating high availability applications?

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/814742/

Time: 2020-10-29 13:54:27  Source: igfitidea

What design patterns are most leveraged in creating high availability applications?

Tags: java, design-patterns, database-design, high-availability

Asked by McGovernTheory

Likewise, are there design patterns that should be avoided?

Accepted answer by nso1

I assume you are writing a server-type application (let's leave web apps aside for a while; there are some good off-the-shelf solutions that can help there, so let's look at the "I've got this great new type of server I have to write, but I want it to be HA" problem).

In a server implementation, the requests from clients are usually (in some form or another) converted to some event or command type pattern, and are then executed on one or more queues.

So, first problem: you need to store events/commands in a manner that will survive in the cluster (i.e. when a new node takes over as master, it looks at the next command that needs executing and begins).

Let's start with a single-threaded server implementation (the easiest; the concepts still apply to multi-threaded servers, but those have their own set of issues). When a command is being processed, you need some sort of transaction processing.

Another concern is managing side effects: how do you handle failure of the current command? Where possible, handle side effects in a transactional manner, so that they are all or nothing. I.e. if the command changes state variables but crashes halfway through execution, being able to return to the "previous" state is great. This allows the new master node to resume the crashed command by simply re-running it. Another good approach is breaking the side effects into smaller tasks that can each be run on any node, i.e. store the main request's start and end tasks, with lots of little tasks that each handle, say, only one side effect.
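
A minimal sketch of the all-or-nothing idea, assuming the state fits in an in-memory map. `TransactionalExecutor` and its methods are hypothetical names, not from the answer. The command runs against a scratch copy, and the copy is published only if the command completes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch: applies a command to a scratch copy of the state and
// publishes the copy only if the command finishes without throwing.
class TransactionalExecutor {
    private Map<String, Integer> state = new HashMap<>();

    // Runs the command all-or-nothing; returns true if it committed.
    boolean execute(Consumer<Map<String, Integer>> command) {
        Map<String, Integer> working = new HashMap<>(state); // scratch copy
        try {
            command.accept(working);
        } catch (RuntimeException e) {
            return false; // the "previous" state is untouched, so a re-run is safe
        }
        state = working; // commit: swap in the fully updated copy
        return true;
    }

    // Returns a defensive copy of the committed state.
    Map<String, Integer> snapshot() {
        return new HashMap<>(state);
    }
}
```

Because a failed command leaves no partial changes behind, a node taking over can simply run the same command again. Side effects outside the map (network calls, files) are not covered by this sketch.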

This also introduces other issues which will affect your design. Those state variables are not necessarily database updates. They could be shared state (say a finite state machine for an internal component) that also needs to be distributed in the cluster. So you need a pattern for managing changes such that the master code sees a consistent version of the state it needs and then commits that state across the cluster. Using some form of immutable (at least from the point of view of the master thread doing the update) data storage is useful, i.e. all updates are effectively done on new copies that must go through some sort of mediator or facade, which only updates the local in-memory copies after the update has been applied across the cluster (or to the minimum number of members required for data consistency).
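
The mediator idea could be sketched roughly as follows. `Replica` and `StateMediator` are hypothetical names, and the replication call is a stand-in for whatever cluster transport you actually use. The local reference is swapped only after enough members acknowledge the new copy.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a remote cluster member.
interface Replica {
    boolean store(Map<String, String> newState); // true if acknowledged
}

// Mediator: local readers only ever see a state that a quorum has accepted.
class StateMediator {
    private final List<Replica> replicas;
    private final int quorum;
    private volatile Map<String, String> local = Collections.emptyMap();

    StateMediator(List<Replica> replicas, int quorum) {
        this.replicas = new ArrayList<>(replicas);
        this.quorum = quorum;
    }

    // Returns the current unmodifiable state, safe to share between threads.
    Map<String, String> current() {
        return local;
    }

    // Builds a new copy, replicates it, then swaps the local reference.
    boolean update(String key, String value) {
        Map<String, String> copy = new HashMap<>(local);
        copy.put(key, value);
        Map<String, String> next = Collections.unmodifiableMap(copy);
        int acks = 0;
        for (Replica r : replicas) {
            if (r.store(next)) acks++;
        }
        if (acks < quorum) return false; // local copy stays on the old version
        local = next;
        return true;
    }
}
```

This sketch does not undo the copies already stored on replicas when the quorum is missed; a real cluster needs a rollback or consensus step for that case.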

Some of these issues are also present in master/worker systems.

You also need good error management, as the number of things that can go wrong on a state update increases (now that the network is involved).

I use the state pattern a lot. Instead of one-line updates, for side effects you want to send requests/responses and use conversation-specific FSMs to track the progress.
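
As an illustration, a conversation-specific FSM might look like the sketch below. It is a compact enum-based state machine rather than the full State pattern with one class per state, and the state and event names are made up for the example.

```java
// Hypothetical conversation states for one request/response side effect.
enum ConversationState {
    NEW, SENT, ACKED, FAILED;

    // Returns the next state for an event, or throws on an illegal transition.
    ConversationState on(String event) {
        switch (this) {
            case NEW:
                if (event.equals("send")) return SENT;
                break;
            case SENT:
                if (event.equals("ack")) return ACKED;
                if (event.equals("timeout")) return FAILED;
                break;
            case FAILED:
                if (event.equals("retry")) return SENT;
                break;
            default:
                break; // ACKED is terminal
        }
        throw new IllegalStateException(this + " cannot handle " + event);
    }
}

// Tracks the progress of one conversation, identified by a hypothetical id.
class Conversation {
    final String id;
    private ConversationState state = ConversationState.NEW;

    Conversation(String id) { this.id = id; }

    void handle(String event) { state = state.on(event); }

    ConversationState state() { return state; }
}
```

Storing the conversation id and its current state in the cluster would let another node see how far each side effect got.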

Another issue is the representation of endpoints, i.e. does a client connected to the master node need to be able to reconnect to the new master and then listen for results? Or do you simply cancel all pending results and let the clients resubmit? If you allow pending requests to be processed, you need a nice way to identify endpoints (clients), i.e. some sort of client ID in a lookup.

You also need cleanup code etc. (i.e. you don't want data that is waiting for a client to reconnect to wait forever).
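
One way to combine the client ID lookup with cleanup is a map of parked results with an expiry time. This is only a sketch: `PendingResults`, the TTL, and the explicit clock parameter are assumptions made for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical store of results waiting for a client to reconnect.
class PendingResults {
    private static final class Entry {
        final String result;
        final long expiresAtMillis;
        Entry(String result, long expiresAtMillis) {
            this.result = result;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> byClientId = new ConcurrentHashMap<>();
    private final long ttlMillis;

    PendingResults(long ttlMillis) { this.ttlMillis = ttlMillis; }

    // Parks a result under the client id; the clock is passed in for testability.
    void park(String clientId, String result, long nowMillis) {
        byClientId.put(clientId, new Entry(result, nowMillis + ttlMillis));
    }

    // Hands the result to a reconnecting client, or null if absent or expired.
    String claim(String clientId, long nowMillis) {
        Entry e = byClientId.remove(clientId);
        return (e == null || e.expiresAtMillis <= nowMillis) ? null : e.result;
    }

    // Cleanup pass: drops results nobody came back for; returns how many.
    int purgeExpired(long nowMillis) {
        int before = byClientId.size();
        byClientId.values().removeIf(e -> e.expiresAtMillis <= nowMillis);
        return before - byClientId.size();
    }
}
```

A scheduled task could call `purgeExpired` periodically so abandoned results do not accumulate.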

Lots of queues are used. A lot of people will therefore use some message bus (say JMS for Java) to push events in a transactional manner.

Terracotta (again for Java) solves a lot of this for you: just update the memory, and Terracotta is your facade/mediator here. They just inject the aspects for you.

Terracotta (I don't work for them) introduces the concept of "super static", so you get these cluster-wide singletons that are cool, but you need to be aware of how this will affect testing and development workflow, i.e. use lots of composition instead of inheritance of concrete implementations for good reuse.

For web apps, a good app server can help with session variable replication, and a good load balancer works. In some ways, using this via REST (or your web service method of choice) is an easy way to write a multi-threaded service. But it will have performance implications. Again, it depends on your problem domain.

Message servers (say JMS) are often used to introduce loose coupling between different services. With a decent message server, you can do a lot of message routing (again, Apache Camel or similar does a great job), e.g. a sticky consumer against a cluster of JMS producers, which can also allow for good failover. JMS queues etc. can provide a simple way to distribute commands in the cluster, independent of master/slave. (Again, it depends on whether you are doing LOB work or writing a server/product from scratch.)

(If I get time later I will tidy up, maybe put in some more detail, fix spelling, grammar etc.)

Answer by Esko Luontola

One approach to creating reliable software is crash-only software:

Crash-only software is software that crashes safely and recovers quickly. The only way to stop it is to crash it, and the only way to start it is to recover. A crash-only system is composed of crash-only components which communicate with retryable requests; faults are handled by crashing and restarting the faulty component and retrying any requests which have timed out. The resulting system is often more robust and reliable because crash recovery is a first-class citizen in the development process, rather than an afterthought, and you no longer need the extra code (and associated interfaces and bugs) for explicit shutdown. All software ought to be able to crash safely and recover quickly, but crash-only software must have these qualities, or their lack becomes quickly evident.
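
The "retryable requests" part can be sketched as a small helper. This is an assumption-laden illustration, not part of the quoted definition: `Retry.call` is a hypothetical name, it treats any exception as a failed attempt, and it assumes the request is idempotent. A real client would also apply a timeout per attempt and back off between attempts.

```java
import java.util.concurrent.Callable;

// Hypothetical helper that retries a request against a component that may be restarting.
class Retry {
    // Calls the request until it succeeds or maxAttempts is reached.
    // Assumes the request is idempotent, so repeating it is harmless.
    static <T> T call(Callable<T> request, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return request.call();
            } catch (Exception e) {
                last = e; // the component may have crashed and be restarting
            }
        }
        throw last; // every attempt failed; surface the last error
    }
}
```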

Answer by Pierce Hickey

I'd recommend having a read of Release It! by Michael Nygard. He outlines a number of anti-patterns that impact production systems, and patterns to help prevent one errant component from taking the whole system down. The book covers three major areas: Stability, Capacity and General Design (covering Networking, Security, Availability and Administration).

My previous workplace was bitten (at one time or another) by pretty much every single failure scenario Nygard outlines (with loss of revenue for each resulting outage). Implementing some of the techniques and patterns he suggests resulted in significantly more stable and predictable systems (and yes, the book is a little Java centric, but the principles are applicable in many contexts).

Answer by David Schlosnagle

Designing high availability (HA) systems is an active research and development area. If you look at ACM or IEEE, there are a ton of research papers on qualities of service (availability, reliability, scalability, etc.) and how to achieve them (loose coupling, adaptation, etc.). If you're looking more for practical applications, take a look at fault tolerant systems and middleware that is built to allow clustering, grid, or cloud like functionality.

Replication and load balancing (a.k.a. reverse proxy) are some of the most common patterns of achieving HA systems, and can often be done without making code changes to the underlying software assuming it is not too tightly coupled. Even a lot of the recent cloud offerings are achieved essentially through replication and load balancing, although they tend to build in elasticity to handle wide ranges of system demand.

Making software components stateless eases the burden of replication, as the state itself doesn't need to be replicated along with the software components. Statelessness is one of the major reasons that HTTP scales so well, but it often requires applications to add on their own state (e.g. sessions) which then needs to be replicated.

Therefore, it is easier to make loosely coupled systems highly available than tightly coupled systems. Since the reliability of the system's components determines the overall system reliability, components that are unreliable may need to be replaced (hardware failures, software bugs, etc.). Allowing for dynamic adaptation at runtime lets these failed components be replaced without affecting the availability of the overall system. Loose coupling is another reason for the use of reliable messaging systems, where the sender and receiver do not have to be available at the same time but the system itself is still available.

Answer by Paweł Polewicz

Wrong:

...and there will be a storage server

Good:

...and there will be a farm of (multiple) storage servers with (multiple) load balancers in front of them

  • Put load balancers in front of everything. For now you may have 4 backends, but in the future you may have 400 of them, so it's wise to manage that only on the LB, not in all the apps that use the backend.

  • Use multiple levels of cache.

  • Look for popular solutions for speeding things up (memcached, for example).

  • If you are going to renew a system, do it part by part, in multiple small steps. If you do it in one big step (turn off the old one, turn on the new one and pray it will work), it will most probably fail.

  • Use DNS names for stuff, e.g. storage-lb.servicename resolves to the addresses of all storage load balancers. If you want to add one, just modify the DNS and all the services will start using it automatically.

  • Keep it simple. The more systems you depend on, the more your service will suffer from it.

Answer by Ravindra babu

I am interpreting "High Availability" as "Zero Downtime", which can be implemented as per this other SE question:

Zero downtime deployment for Java apps

  1. A/B switch: (rolling upgrade + fallback mechanism)
  2. Parallel deployment – Apache Tomcat: (for web applications only)
  3. Delayed port binding
  4. Advanced port binding

I will use some of those concepts to come up with design patterns for a high availability system from a software perspective, which complement the above approaches.

Patterns to use:

Proxy/Factory:

Have a proxy object, and the proxy will decide where to redirect the requests. Assume that you have Version 1 and Version 2 of the software. If clients connect with the old protocol, redirect them to the Version 1 software. New clients can connect to Version 2 directly. The proxy can use either Factory Method or Abstract Factory to render the new version of the software.
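
A rough sketch of such a routing proxy, with a factory method choosing the backend. All type names and the integer protocol version are hypothetical; a real proxy would usually read the version from the wire protocol or a request header.

```java
// Hypothetical service contract shared by both software versions.
interface Service {
    String handle(String request);
}

class ServiceV1 implements Service {
    public String handle(String request) { return "v1:" + request; }
}

class ServiceV2 implements Service {
    public String handle(String request) { return "v2:" + request; }
}

// Proxy that picks the backend from the client's protocol version.
class VersionRoutingProxy {
    private final Service v1 = new ServiceV1();
    private final Service v2 = new ServiceV2();

    // Factory method: the only place that knows which version serves whom.
    Service backendFor(int protocolVersion) {
        return protocolVersion <= 1 ? v1 : v2;
    }

    String handle(int protocolVersion, String request) {
        return backendFor(protocolVersion).handle(request);
    }
}
```

Keeping the version decision in one method means Version 1 can be retired later by changing only `backendFor`.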

Strategy:

You can change the algorithm at run time by selecting one algorithm from a family of algorithms. If you take an airline example, you can switch between DiscountFare and NormalFare algorithms during non-peak and peak traffic months.
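
The fare example might be sketched like this. `FareStrategy` and `Booking` are hypothetical names, and the 20% discount is an arbitrary value chosen for illustration.

```java
// Hypothetical fare strategies; the 20% discount is an arbitrary example value.
interface FareStrategy {
    long fareInCents(long baseFareInCents);
}

class NormalFare implements FareStrategy {
    public long fareInCents(long baseFareInCents) { return baseFareInCents; }
}

class DiscountFare implements FareStrategy {
    public long fareInCents(long baseFareInCents) { return baseFareInCents * 80 / 100; }
}

class Booking {
    private volatile FareStrategy strategy = new NormalFare();

    // Swaps the algorithm at run time without touching callers.
    void setStrategy(FareStrategy strategy) { this.strategy = strategy; }

    long quote(long baseFareInCents) { return strategy.fareInCents(baseFareInCents); }
}
```

Because the strategy is swapped behind a stable `quote` method, the pricing rule can change without restarting or redeploying the callers.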

Decorator:

You can change the behaviour of an object at run time. Add a new class that decorates the object with additional responsibility.
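
A minimal decorator sketch, with hypothetical names: `AuditingHandler` wraps any `Handler` and adds audit logging without changing the wrapped class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical request handler contract.
interface Handler {
    String handle(String request);
}

class EchoHandler implements Handler {
    public String handle(String request) { return "echo:" + request; }
}

// Decorator: same interface, wraps another Handler and adds audit logging.
class AuditingHandler implements Handler {
    private final Handler delegate;
    final List<String> auditLog = new ArrayList<>();

    AuditingHandler(Handler delegate) { this.delegate = delegate; }

    public String handle(String request) {
        auditLog.add(request);           // added responsibility
        return delegate.handle(request); // original behaviour is unchanged
    }
}
```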

Adapter:

Useful when you change the interface or contract between version 1 and version 2. The adapter will respond to both old and new client requests appropriately.
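
For instance, an adapter could keep an old contract alive on top of the new implementation. The two order contracts below are invented for the example; only the adapter shape matters.

```java
// Hypothetical version 1 contract: one comma-separated request string.
interface OrderServiceV1 {
    String placeOrder(String csv);
}

// Hypothetical version 2 contract: explicit parameters.
interface OrderServiceV2 {
    String placeOrder(String item, int quantity);
}

class OrderServiceV2Impl implements OrderServiceV2 {
    public String placeOrder(String item, int quantity) {
        return quantity + "x" + item;
    }
}

// Adapter: lets old clients keep calling the V1 contract against the V2 code.
class V1ToV2Adapter implements OrderServiceV1 {
    private final OrderServiceV2 target;

    V1ToV2Adapter(OrderServiceV2 target) { this.target = target; }

    public String placeOrder(String csv) {
        String[] parts = csv.split(",");
        return target.placeOrder(parts[0].trim(), Integer.parseInt(parts[1].trim()));
    }
}
```

New clients call `OrderServiceV2` directly, while old clients go through the adapter, so both can be served during the transition.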

General guidelines:

  1. Loose coupling between objects
  2. Follow S.O.L.I.D principles in your application

Refer to the sourcemaking website articles on the above patterns for better understanding.

What not to use:

Apart from design patterns, you have to take some precautions to achieve zero downtime for your application.

  1. Don't introduce single points of failure in your system.
  2. Use distributed caches (e.g. Terracotta) and locks sparingly.
  3. Remove tight coupling between services by using a messaging bus or frameworks (JMS, ActiveMQ etc.).

Answer by Emil H

As I understand it, you're looking for specific patterns to use in Java applications that are part of an HA architecture. Of course there are numerous patterns and best practices that can be used, but these aren't really "HA patterns". Rather, they're good ideas that can be utilized in many contexts.

I guess what I'm trying to say is this: a high availability architecture is composed of numerous small parts. If we pick one of these small parts and examine it, we'll probably find that there are no magical HA attributes to this small component. If we examine all the other components, we'll find the same thing. It's when they're combined in an intelligent manner that they become an HA application.

An HA application is an application where you plan for the worst from the beginning. If you ever think in terms of "This component is so stable that we don't need additional redundancy for it", it's probably not an HA architecture. After all, it's easy to handle the problem scenarios that you foresee. It's the ones that surprise you that bring down the system.

Despite all this, there are patterns that are especially useful in HA contexts. Many of them are documented in the classic book "Patterns of Enterprise Application Architecture" by Martin Fowler.

Answer by mylesmg

High availability is more about hardware availability and redundancy than about coding conventions. There are a couple of patterns that I would use in almost every HA case: I would choose the singleton pattern for my database object and use the factory pattern to create the singleton. The factory can then have the logic to handle availability issues with the database (which is where most availability problems happen). For instance, if the Master is down, then connect to a second Master for both reads and writes until the Master is back. I don't know if these are the most leveraged patterns, but they are the most leveraged in my code.

Of course this logic could be handled in a __construct method, but a factory pattern will allow you to better control your code and the decision-making logic of how to handle database connectivity issues. A factory will also allow you to better handle the singleton pattern.
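
The factory-plus-singleton idea could look roughly like the sketch below. `DbConnection`, `DbConnectionFactory`, and the health-check predicate are hypothetical; a real version would wrap a JDBC connection or pool. The factory always prefers the first healthy host in the list, so it fails over to the second master and returns to the first once it is back.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical connection handle; a real one would wrap a JDBC Connection.
class DbConnection {
    final String host;
    DbConnection(String host) { this.host = host; }
}

class DbConnectionFactory {
    private static DbConnection instance; // the singleton

    // Returns the shared connection for the first host that passes the health
    // check (primary first, then the fallbacks in list order).
    static synchronized DbConnection get(List<String> hosts, Predicate<String> isUp) {
        for (String host : hosts) {
            if (isUp.test(host)) {
                if (instance == null || !instance.host.equals(host)) {
                    instance = new DbConnection(host); // reconnect on failover or failback
                }
                return instance;
            }
        }
        throw new IllegalStateException("no database host available");
    }
}
```

Callers never see which master they are talking to, which keeps the failover decision in one place.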

I would absolutely avoid the decorator pattern and the observer pattern. They both create complexity in your code that makes it difficult to maintain. There are cases where these are the best choice for your needs, but most of the time they are not.
