Using sockets to fetch a webpage with Java
Source: http://stackoverflow.com/questions/7500342/
Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors on Stack Overflow.
Asked by vdegenne
I'd like to fetch a webpage: just the raw data returned after an HTTP request, without parsing or rendering anything.
I'm trying to do this using the high-level Socket class from the Java runtime library.
I wonder whether this is possible, because I'm not comfortable with the layer underneath this two-point communication, and I can't tell whether the trouble comes from my own system.
Here's what my code is doing:
1) Setting up the socket.
this.socket = new Socket( "www.example.com", 80 );
2) Setting up the streams used for this communication.
this.out = new PrintWriter( socket.getOutputStream(), true);
this.in = new BufferedReader( new InputStreamReader( socket.getInputStream() ) );
3) Requesting the page (this is the part I'm not sure I'm doing correctly).
String query = "";
query += "GET / HTTP/1.1\r\n";
query += "Host: www.example.com\r\n";
...
query += "\r\n";
this.out.print(query);
4) Reading the result (nothing in my case).
System.out.print( this.in.readLine() );
5) Closing the socket and streams.
Answer by FloppyDisk
If you're on a *nix system, look into cURL, which lets you retrieve data from the internet on the command line. It is more lightweight than a Java socket connection.
If you want to use Java, and are just retrieving information from a webpage, check out the Java URL library (java.net.URL). Some sample Java code:
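// Needs java.net.URL, java.net.URLConnection, java.io.InputStream, java.util.Scanner
URL url = new URL("http://www.google.com"); // the protocol is required, or MalformedURLException is thrown
URLConnection conn = url.openConnection();
InputStream is = conn.getInputStream();
// "\\A" matches the start of input, so next() returns the whole stream as one token
String foo = new Scanner(is).useDelimiter("\\A").next();
System.out.println(foo);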
That'll fetch the specified URL, grab the data (HTML in this case) and print it to the console. You might have to tweak the delimiter a bit, but this will work with most network endpoints that send data.
Answer by Karthik Ramachandran
Your code looks pretty close. Your GET request is probably malformed in some way. Try this: open up a telnet client and connect to a web server. Paste in the GET request as you believe it should work and see if that returns anything. If it doesn't, there is a problem with the GET request. The easiest thing to do at that point would be to write a program that listens on a socket (more or less the inverse of what you're doing), point a web browser at localhost:[correct port], and see what the web browser sends you. Use that as your template for the GET request.
Alternatively, you could try to piece it together from the HTTP specification.
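As a rough illustration of what a well-formed request can look like, here is a minimal sketch of a raw-socket GET. The class name RawHttpGet and the helpers buildRequest and fetch are made up for this example, and the Connection: close header is an addition that is not in the question's code. One thing worth checking in the question: a PrintWriter created with autoflush only flushes on println, printf or format, so a request sent with print() may sit in the buffer until flush() is called. That is one possible cause of the empty result, not a confirmed diagnosis.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class RawHttpGet {

    // Builds a minimal HTTP/1.1 GET request (hypothetical helper).
    // Every line ends with CRLF and a blank line ends the header block.
    static String buildRequest(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n" // assumption: ask the server to close when done
             + "\r\n";
    }

    // Sends the request and returns everything the server sends back
    // (status line, headers and body) as one string.
    static String fetch(String host, int port, String path) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(host, path).getBytes(StandardCharsets.US_ASCII));
            out.flush(); // make sure the request actually leaves the buffer

            BufferedReader in = new BufferedReader(new InputStreamReader(
                    socket.getInputStream(), StandardCharsets.ISO_8859_1));
            StringBuilder response = new StringBuilder();
            String line;
            // readLine() returns null once the server closes the connection
            while ((line = in.readLine()) != null) {
                response.append(line).append('\n');
            }
            return response.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java RawHttpGet <host>");
            return;
        }
        System.out.print(fetch(args[0], 80, "/"));
    }
}
```

Running it as java RawHttpGet www.example.com should print the raw response, headers included. If the server replies with chunked transfer encoding, the chunk sizes show up in the output too, since nothing here decodes the body.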
Answer by BumpBitcoin
I had to put the full URL in the GET request line to make it work, although I see you can also specify the Host header if you want.
Socket socket = new Socket("youtube.com", 80);
PrintWriter out = new PrintWriter(new BufferedWriter(
        new OutputStreamWriter(socket.getOutputStream())));
out.println("GET http://www.youtube.com/yts/img/favicon_48-vflVjB_Qk.png HTTP/1.0");
out.println();
out.flush();
Answer by Joshua
Yes, it is possible. You just need to figure out the protocol. You are close.
I would create a simple server socket that prints out whatever it receives. You can then use your browser to connect to the socket using a URL like http://localhost:8080. Then have your client socket mimic the HTTP request the browser sent.
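A minimal sketch of such a server socket might look like the following. The class name RequestDump and the helper captureOneRequest are invented for this example, and the empty 200 reply is an assumption added so the browser does not hang waiting for a response.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class RequestDump {

    // Accepts one connection and collects the request line and headers,
    // i.e. everything up to the first blank line (hypothetical helper).
    static List<String> captureOneRequest(ServerSocket server) throws IOException {
        List<String> lines = new ArrayList<>();
        try (Socket client = server.accept()) {
            BufferedReader in = new BufferedReader(new InputStreamReader(
                    client.getInputStream(), StandardCharsets.ISO_8859_1));
            String line;
            while ((line = in.readLine()) != null && !line.isEmpty()) {
                lines.add(line);
            }
            // Assumption: send an empty reply so the browser stops waiting.
            OutputStream out = client.getOutputStream();
            out.write("HTTP/1.1 200 OK\r\nContent-Length: 0\r\nConnection: close\r\n\r\n"
                    .getBytes(StandardCharsets.US_ASCII));
            out.flush();
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(8080)) {
            System.out.println("Listening; open http://localhost:8080 in a browser");
            for (String line : captureOneRequest(server)) {
                System.out.println(line);
            }
        }
    }
}
```

The printed lines show the request line and the headers the browser sends, which can serve as a template for the client's own request.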
Answer by atrain
Not sure why you're going lower-level than URLConnection; it's designed to do what you want to do: http://download.oracle.com/javase/tutorial/networking/urls/readingWriting.html.
The Java Tutorial on Sockets even says: "URLs and URLConnections provide a relatively high-level mechanism for accessing resources on the Internet. Sometimes your programs require lower-level network communication, for example, when you want to write a client-server application." Since you're not going lower than HTTP, I'm not sure what the point of using a Socket is.