Java Servlet as an HTTP Proxy

Notice: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/12073147/

Time: 2020-10-31 07:28:59  Source: igfitidea

Tags: java, servlets, proxy, http-headers

Asked by Yves030

I have read hundreds of SO posts and studied several of the available Java HTTP proxy sources... but I could not find a solution for my problem.

I wrote a WebApp that proxies HTTP requests. The WebApp is working, but links and referrers become broken because the "root" of the proxied page points to the root of my server and not to the path of my proxy servlet.

To make it more clear:

  1. My ProxyServlet gets a Request "http://myserver.com/proxy/ProxyServlet?foo=bar"

  2. The ProxyServlet now fetches the page content from ServerX (e.g. "http://original.com/test.html")

  3. The content of the page is delivered to the browser by just reading and writing from one stream to the other and copying the headers.

  4. The browser displays the page. The URL shown in the browser is the original request ("http://myserver.com/proxy/ProxyServlet?foo=bar"), but all relative links now point to "http://myserver.com/XXX.html" instead of "http://myserver.com/proxy/ProxyServlet/XXX.html"
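
The behaviour in step 4 can be reproduced outside a browser. The sketch below is only an illustration (the class name `RelativeLinkDemo` and the helper `resolve` are made up for this example): it uses `java.net.URI` to resolve a link against the URL the browser displays, which is roughly what a browser does with each relative reference.

```java
import java.net.URI;

class RelativeLinkDemo {
    // Resolves href against the URL of the displayed page,
    // the way a browser resolves a relative reference.
    static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String pageUrl = "http://myserver.com/proxy/ProxyServlet?foo=bar";

        // A link starting with "/" is resolved against the server root.
        System.out.println(resolve(pageUrl, "/XXX.html"));
        // prints http://myserver.com/XXX.html

        // A link without a leading "/" replaces the last path segment.
        System.out.println(resolve(pageUrl, "XXX.html"));
        // prints http://myserver.com/proxy/XXX.html
    }
}
```

Neither form ends up at "http://myserver.com/proxy/ProxyServlet/XXX.html", because the last path segment ("ProxyServlet") is dropped during resolution and the query string is ignored.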

Is there a response-header where I can change the "path" so that relative links correctly point to my ProxyServlet?

(Rewriting the page-content and replacing links would be too difficult, because the page contains relatively addressed elements such as javascript code and other active content...)

(Changing the mapping for my Servlet to "/*" is also not possible... it must be accessed via this path...)

Answered by Szocske

You are inventing a "reverse proxy" and missing the "URL rewriting" feature... From the top of my search results, here's an open source proxy servlet that does this: http://j2ep.sourceforge.net/docs/rewrite.html
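
To give a rough idea of what "URL rewriting" means here, the following is a minimal sketch and not how j2ep implements it. It assumes the proxy should serve everything under a prefix such as "/proxy/ProxyServlet", and it only touches `href` and `src` attributes that start with a single "/". The class name `RootRelativeRewriter` and the method `rewrite` are hypothetical. A regular expression like this will miss URLs built in JavaScript or CSS, so a real reverse proxy needs more than this.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RootRelativeRewriter {
    // Matches href="/..." or src="/..." (single or double quotes),
    // but skips protocol-relative values such as src="//cdn.example.com/a.js".
    private static final Pattern ROOT_RELATIVE = Pattern.compile(
            "(\\b(?:href|src)\\s*=\\s*[\"'])/(?!/)", Pattern.CASE_INSENSITIVE);

    // Inserts proxyPrefix in front of every root-relative href/src value.
    static String rewrite(String html, String proxyPrefix) {
        Matcher m = ROOT_RELATIVE.matcher(html);
        return m.replaceAll("$1" + Matcher.quoteReplacement(proxyPrefix) + "/");
    }

    public static void main(String[] args) {
        String html = "<a href=\"/XXX.html\">x</a><img src=\"pic.png\">";
        System.out.println(rewrite(html, "/proxy/ProxyServlet"));
        // prints <a href="/proxy/ProxyServlet/XXX.html">x</a><img src="pic.png">
    }
}
```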

Also, you should know that there is probably something wrong with the system architecture if you have to do this. Dropping in a standalone proxy like Apache, nginx, or Varnish should always be an option, as you will HAVE to add one (or more!) once you start scaling.

Answered by dimo414

It sounds like the page you're proxying is using absolute links, e.g. <a href="/XXX.html">, which means "no matter where this link is found, look for it relative to the document root". If you have control of it, the best thing is for the proxy target to be more lenient in its linking and use <a href="XXX.html"> instead. If you can't do that, then you need to rewrite these URLs. Some example code, using JSoup:

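import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// rawBody is the proxied HTML; getDisplayUrl() supplies the base URI used to resolve links
Document doc = Jsoup.parse(rawBody, getDisplayUrl());

// Replace stylesheet and anchor hrefs with their absolute URLs
for (Element cssALink : doc.select("link[rel=stylesheet],a[href]"))
{
    cssALink.attr("href", cssALink.absUrl("href"));
}
// Replace script and image sources with their absolute URLs
for (Element imgJsLink : doc.select("script[src],img[src]"))
{
    imgJsLink.attr("src", imgJsLink.absUrl("src"));
}
return doc.toString();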
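
Note that `absUrl` resolves each link against the base URI passed to `Jsoup.parse`. If that base is the address of the original server, the rewritten links will point straight at that server and bypass the proxy. One possible way to keep traffic going through the proxy is to wrap each absolute URL in a proxy URL. The sketch below assumes the servlet accepts the target address in a query parameter named `url`. That parameter, the class name `ProxyUrlBuilder`, and the method `toProxyUrl` are assumptions for illustration and do not come from the question or the answer.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class ProxyUrlBuilder {
    // Wraps an absolute target URL in a proxy URL.
    // The "url" parameter name is a hypothetical convention.
    static String toProxyUrl(String proxyServletUrl, String absoluteTargetUrl) {
        String encoded = URLEncoder.encode(absoluteTargetUrl, StandardCharsets.UTF_8);
        return proxyServletUrl + "?url=" + encoded;
    }

    public static void main(String[] args) {
        System.out.println(toProxyUrl(
                "http://myserver.com/proxy/ProxyServlet",
                "http://original.com/XXX.html"));
        // prints http://myserver.com/proxy/ProxyServlet?url=http%3A%2F%2Foriginal.com%2FXXX.html
    }
}
```

With something like this, the loops above could set each attribute to `toProxyUrl(..., element.absUrl("href"))` instead of the bare absolute URL.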