Java: How to implement caching for a web application

Question

Asked by Rajat Gupta

What are the different ways to cache data in a web application developed using Java and a NoSQL database? Databases also provide caching; are they the only, and always the best, option for caching?

How else can I cache my users' data in the application? The application contains very user-specific data, as in a social network. Are there some simple rules of thumb for what kinds of things should be cached?

Can I also cache my data on the application server using Java?

Answer 1

Answered by Tom Anderson

If you want a rule of thumb, here's what Michael Jackson (not that Michael Jackson) said:

The First Rule of Program Optimization: Don't do it.
The Second Rule of Program Optimization (for experts only!): Don't do it yet.

The ancient tradition is that you don't optimise until you've profiled - that is, until you have hard evidence as to what actually needs to be optimised. Caching is a kind of optimisation; it is very likely to be important for your app, but until you are able to put your app under load and look at which objects are taking a long time to obtain (loading from the database or whatever), you won't know what needs caching. It really doesn't matter how smart you are, or what advice you get here - until you do that, you will not know what needs to be cached.

As for things you can cache, it's anything, but I suppose you can classify it into three groups:

Things that have come fresh from the database. These are easy to cache, because at the point at which you go to the database, you have the identifying information you'd need for a cache key (primary key, query parameters, etc). By caching them, you save the time taken to get them from the database - this involves IO, so the saving is likely to be quite large.
Things that have been produced by computation in the domain model (news feeds in a social app, perhaps). These may be trickier to cache, because more contextual information goes into producing them; you might have to refactor your code to create a single point where the required information is all to hand, so you can apply caching to it. Or you might find that this exists already. Caching these will save all the database access needed to obtain the information that goes into making them, as well as all the computation; the time taken for computation may or may not be a significant addition to the time taken for IO. Invalidating cached things of this kind is likely to be much harder than invalidating pure database objects.
Things that are being sent to the browser - pages, or fragments of pages. These can be quite easy to cache, because in a properly-designed application, they're uniquely identified by either the URL, or the combination of URL and user. Caching these will save all the computation in your app; it can even avoid servicing requests, because it can be done by a reverse proxy sitting in front of your app server. Two problems. Firstly, it uses a huge amount of memory: the page rendered from a few kilobytes of objects could be tens or hundreds of kilobytes in size (my Facebook homepage is 50 kB). That means you have to save a vast amount of computation to make it a better deal than caching at the database or domain model layers, and there just isn't that much computation between the domain model and the HTML in a sensibly-designed application. Secondly, invalidation is even harder than in the domain model, and is likely to happen prohibitively often - anything which changes the page or the fragment needs to invalidate the cache.
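
For the first group, a minimal sketch of caching by primary key might look like the following. The class name `ReadThroughCache` and the `loader` function are hypothetical; the loader stands in for whatever call actually fetches the object from the database. The map here is unbounded, so this only illustrates the keying and invalidation idea, not a production cache.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical read-through cache; the names are illustrative, not from any library.
final class ReadThroughCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // stands in for the real database lookup

    ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Returns the cached value, calling the loader only on a miss.
    V get(K key) {
        return entries.computeIfAbsent(key, loader);
    }

    // Drops an entry so that the next get() reloads it; call this after a write.
    void invalidate(K key) {
        entries.remove(key);
    }
}
```

With something like `new ReadThroughCache<Integer, String>(id -> loadUserName(id))`, where `loadUserName` is an assumed database call, repeated `get(7)` calls would reach the database only once until `invalidate(7)` is called.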

Finally, the actual mechanism: start with something simple and in-process, like a map with limited size and a least-recently-used eviction policy. That's simple but effective. Something out-of-process like EHCache is more complicated, but has two advantages: you can share caches between multiple processes (helpful if you have a cluster, which you probably will at some point), and you can store data where the garbage collector won't see it, which might save some CPU time (might - this is too big a subject to get into here).
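
As a rough illustration of the simple in-process option, a size-limited map with least-recently-used eviction can be built on the JDK's `LinkedHashMap`. The class name `LruCache` and the `maxEntries` parameter are assumptions made for this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a bounded LRU map; LruCache and maxEntries are illustrative names.
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        // The third argument (true) orders entries by access rather than insertion.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each insertion; returning true evicts the
    // least recently used entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

`LinkedHashMap` is not thread-safe, and in access order even `get` modifies the map, so in a web application an instance would need to be wrapped with `Collections.synchronizedMap` or guarded by a lock.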

But I reiterate my first point: don't cache until you know what needs to be cached, and once you do, be mindful of the limitations on the benefits of caching, and try to keep your caching strategy as simple as possible (but no simpler, of course).

Answer 2

Answered by orangepips

I'll assume you're building a relatively typical web application that:

has a single server used for persistence
has multiple web servers
ties authenticated users to a single server via sticky sessions through a load balancer

Now, with that stated, to answer some of your questions. Most persistence layers, whether a relational database or NoSQL, likely have some sort of caching built in, such that if you execute the same simple query repeatedly (e.g. retrieval by primary key) the result can be served from cache. However, the more complex the query, the less likely the persistence layer can cache it. In addition, if there's only one server for persistence (i.e. no sharding, or write master/read slaves) it quickly becomes the bottleneck. So the application-level caching you want to do should usually occur on the web servers, to reduce load on the database.

As far as what should be cached, the heuristic is items frequently accessed and/or expensive to generate (in terms of database/web server processing/memory). Typical candidates are the home page and any other landing page of a site - often the best approach for these is generating a static file and serving that. The next pieces depend on your application, but typically the most effective strategy is caching as close to the final result as possible - often the HTML being served. For your social network this might be a list of featured updates or some such.

As far as user sessions are concerned, these are definitely a good candidate for caching. In this case you can probably get a lot of mileage out of judicious use of the web server's session scope (assuming a JSP server). This data lives in memory and is a good place to keep user-specific information that is shown on every page once a user authenticates (e.g. first and last name).

Now the final thing to consider is dealing with cache invalidation, which really is the hard part of all this (naming stuff is the other hard thing in computer science). In this case using something like memcached or ehcache, as others have mentioned, is the right approach. ehcache can easily run in process with your Java application and does a good job of expiring things, with policies for least recently used and least frequently used, and it allows you to use both memory and disk for caching. What you'll need to think about is the situations where you need to expire something from the cache ahead of this schedule because the data has changed. In this case you need to work through those dependencies in your application's architecture so that it reads from and writes to the cache as appropriate.
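
To make the two kinds of expiry concrete, here is a small sketch in plain Java rather than the ehcache or memcached API. It combines a time-to-live schedule with an explicit `invalidate` call for the case where data changes ahead of schedule. The class name `ExpiringCache`, the `ttlMillis` parameter, and the injected `clock` are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Illustrative only: a real application would more likely rely on a caching library.
final class ExpiringCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAtMillis;

        Entry(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis

    ExpiringCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    void put(K key, V value) {
        entries.put(key, new Entry<>(value, clock.getAsLong() + ttlMillis));
    }

    // Returns null on a miss or when the entry has passed its scheduled expiry.
    V get(K key) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (clock.getAsLong() >= entry.expiresAtMillis) {
            entries.remove(key, entry); // remove only if it is still the stale entry
            return null;
        }
        return entry.value;
    }

    // Expires an entry ahead of schedule; call this wherever the underlying data is written.
    void invalidate(K key) {
        entries.remove(key);
    }
}
```

The hard part described above is not this class but finding every write path that must call `invalidate` for every cached item that depends on the changed data.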
