Java: JSoup over VPN/proxy
Original question: http://stackoverflow.com/questions/13288471/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
JSoup over VPN/proxy
Asked by Peck3277
I'm trying to use JSoup to scrape some pages that are on a staging server. To view the pages on the staging server with a browser I need to be connected to a VPN.
I am connected to the VPN, but when I use JSoup to scrape the page it keeps timing out. How can I make my program use the VPN connection? Or is there something else here that I'm not thinking of?
Note: I also use HttpClient in another part of my program. Is there a way to make my program connect to the VPN/proxy once when it initialises, so that both JSoup and HttpClient use it?
Thanks
Answered by ollo
You can set the Java system properties for the proxy:
// for https:// URLs, also set https.proxyHost and https.proxyPort
System.setProperty("http.proxyHost", "<proxyip>"); // set proxy server
System.setProperty("http.proxyPort", "<proxyport>"); // set proxy port
Document doc = Jsoup.connect("http://your.url.here").get(); // Jsoup now connects via proxy
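Since the question asks for a one-time setup, the properties above could be applied once at startup. The following is only a minimal sketch: the helper class `ProxySettings` and its `apply` method are hypothetical names (not part of Jsoup or the answer), and the host and port are placeholders you would supply.

```java
/** Hypothetical helper (not part of Jsoup): applies JVM-wide proxy properties. */
final class ProxySettings {
    private ProxySettings() {}

    /** Sets the standard java.net proxy properties for both http:// and https:// URLs. */
    static void apply(String host, int port) {
        String portText = Integer.toString(port);
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", portText);
        System.setProperty("https.proxyHost", host);
        System.setProperty("https.proxyPort", portText);
    }
}
```

Call `ProxySettings.apply("<proxyip>", 8080)` before the first request is made. Other HTTP libraries may need to be told explicitly to honor these system properties, so check their documentation.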
Alternatively, download the page into a string and then parse it:
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

final URL website = new URL("http://your.url.here"); // the page you want to fetch
// -- Set up the connection through the proxy
Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress("<proxyserver>", 1234)); // proxy server and port
HttpURLConnection httpUrlConnection = (HttpURLConnection) website.openConnection(proxy);
httpUrlConnection.connect();
// -- Download the page into a buffer (the reader is closed automatically)
StringBuilder buffer = new StringBuilder();
try (BufferedReader br = new BufferedReader(new InputStreamReader(httpUrlConnection.getInputStream()))) {
    String str;
    while ((str = br.readLine()) != null) {
        buffer.append(str).append('\n'); // readLine() strips line breaks, so add them back
    }
}
// -- Parse the buffer with Jsoup
Document doc = Jsoup.parse(buffer.toString());
You can use HttpClient for this solution as well.
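The answer does not say which HttpClient is meant, and gives no code for it. Purely as an illustration, here is a sketch using the JDK's own `java.net.http.HttpClient` (Java 11+), which is a different library from the one the asker may be using. The class name `ProxiedHttpClient` is hypothetical, and the proxy host and port are placeholders.

```java
import java.net.InetSocketAddress;
import java.net.ProxySelector;
import java.net.http.HttpClient;

/** Hypothetical helper: builds a JDK HttpClient that routes requests through one proxy. */
final class ProxiedHttpClient {
    private ProxiedHttpClient() {}

    static HttpClient create(String proxyHost, int proxyPort) {
        // ProxySelector.of(...) returns a selector that sends http/https requests to this proxy
        InetSocketAddress address = new InetSocketAddress(proxyHost, proxyPort);
        return HttpClient.newBuilder()
                .proxy(ProxySelector.of(address))
                .build();
    }
}
```

The response body fetched by such a client can then be handed to `Jsoup.parse(String)`, as in the snippet above.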
Answered by Kees de Kooter
As of version 1.9, you can set the proxy on the connection itself: https://jsoup.org/apidocs/org/jsoup/Connection.html#proxy-java.net.Proxy-
Jsoup.connect("http://your.url.here").proxy("<proxy-host>", <proxy-port>).get();
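The linked API page documents an overload that takes a `java.net.Proxy`, so the proxy object can be built once and reused per connection instead of being set JVM-wide. A minimal sketch, assuming a hypothetical factory class `ProxyFactory` and placeholder host and port values:

```java
import java.net.InetSocketAddress;
import java.net.Proxy;

/** Hypothetical helper: creates a reusable HTTP proxy description. */
final class ProxyFactory {
    private ProxyFactory() {}

    static Proxy httpProxy(String host, int port) {
        return new Proxy(Proxy.Type.HTTP, new InetSocketAddress(host, port));
    }
}

// With Jsoup on the classpath, the proxy could then be passed per connection:
// Document doc = Jsoup.connect("http://your.url.here")
//         .proxy(ProxyFactory.httpProxy("<proxy-host>", 8080))
//         .get();
```

Only connections that are given the proxy use it, which can be useful when just the staging server has to be reached through it.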
Answered by rtyusolf
To add to ollo's answer: if your proxy requires username/password authentication, also register a default Authenticator.
final String authUser = "<username>";
final String authPassword = "<password>";
// java.net asks this Authenticator for credentials when the proxy requires them
Authenticator.setDefault(
    new Authenticator() {
        @Override
        public PasswordAuthentication getPasswordAuthentication() {
            return new PasswordAuthentication(authUser, authPassword.toCharArray());
        }
    }
);
System.setProperty("http.proxyHost", "<yourproxyhost>");
System.setProperty("http.proxyPort", "<yourproxyport>");
System.setProperty("http.proxyUser", authUser);
System.setProperty("http.proxyPassword", authPassword);
Document doc = Jsoup.connect("http://your.url.here").get();
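The Authenticator above returns the same credentials for every challenge, including ones from web servers. One possible refinement, sketched here under the assumption that only the proxy should ever receive them, is to check the requestor type first. The class name `ProxyAuth` is hypothetical.

```java
import java.net.Authenticator;
import java.net.PasswordAuthentication;

/** Hypothetical helper: answers only proxy authentication challenges. */
final class ProxyAuth {
    private ProxyAuth() {}

    static void install(final String user, final String password) {
        Authenticator.setDefault(new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                // Respond only when the challenge comes from a proxy
                if (getRequestorType() == Authenticator.RequestorType.PROXY) {
                    return new PasswordAuthentication(user, password.toCharArray());
                }
                return null; // no credentials for origin servers
            }
        });
    }
}
```

Call `ProxyAuth.install("<username>", "<password>")` once at startup, before the first request.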