How to get the HTML source of a page from a HTML link in Android?
Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/2423498/
Asked by Praveen
I'm working on an application that needs to get the source of a web page from a link, and then parse the html from that page.
Could you give me some examples, or point me to a good place to start when writing such an app?
Answered by Mark B
You can use HttpClient to perform an HTTP GET and retrieve the HTML response, something like this:
HttpClient client = new DefaultHttpClient();
HttpGet request = new HttpGet(url);
HttpResponse response = client.execute(request);

// Read the response body line by line into a single string.
InputStream in = response.getEntity().getContent();
BufferedReader reader = new BufferedReader(new InputStreamReader(in));
StringBuilder str = new StringBuilder();
String line;
while ((line = reader.readLine()) != null) {
    // readLine() strips the line terminator, so add it back.
    str.append(line).append('\n');
}
in.close();
String html = str.toString();
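The snippet above builds its InputStreamReader without a charset argument, so the bytes are decoded with the platform default encoding. As a rough sketch only, the reading step could be pulled into a small helper that takes the charset explicitly and copies characters in chunks, which keeps the page's own line breaks. The class name HtmlStreams and the method readAll are hypothetical, not part of any library or of the answer above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

// Hypothetical helper class, for illustration only.
final class HtmlStreams {
    private HtmlStreams() {}

    // Reads the whole stream as text, decoding it with the given charset.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder out = new StringBuilder();
        // try-with-resources closes the reader (and the wrapped stream) even on failure.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            char[] buffer = new char[4096];
            int n;
            // Copy in chunks so the original line terminators are preserved.
            while ((n = reader.read(buffer)) != -1) {
                out.append(buffer, 0, n);
            }
        }
        return out.toString();
    }
}
```

With the code above, this might be called as HtmlStreams.readAll(response.getEntity().getContent(), StandardCharsets.UTF_8), assuming the page is actually served as UTF-8.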
Answered by Paul Spiesberger
I would suggest jsoup.
According to their website:
Fetch the Wikipedia homepage, parse it to a DOM, and select the headlines from the In the news section into a list of Elements (online sample):
Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
Elements newsHeadlines = doc.select("#mp-itn b a");
Getting started:
Answered by Colin White
This question is a bit old, but I figured I should post my answer now that DefaultHttpClient, HttpGet, etc. are deprecated. This function should get and return the HTML, given a URL.
public static String getHtml(String url) throws IOException {
    // Build the request and set timeout values (in milliseconds).
    URLConnection connection = (new URL(url)).openConnection();
    connection.setConnectTimeout(5000);
    connection.setReadTimeout(5000);
    connection.connect();

    // Read the result line by line, then return the entire string.
    StringBuilder html = new StringBuilder();
    // try-with-resources closes the reader and the underlying stream.
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()))) {
        for (String line; (line = reader.readLine()) != null; ) {
            // readLine() strips the line terminator, so add it back.
            html.append(line).append('\n');
        }
    }
    return html.toString();
}
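getHtml above does not look at the HTTP status code and decodes the body with the default charset. One possible variation, sketched under assumptions, checks the status and picks the charset from the Content-Type header, falling back to UTF-8. The names HtmlFetcher and charsetOf are hypothetical, the UTF-8 fallback is an assumption rather than a rule, and the cast assumes an http:// or https:// URL.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class HtmlFetcher {
    private HtmlFetcher() {}

    // Extracts the charset parameter from a Content-Type value such as
    // "text/html; charset=ISO-8859-1"; falls back to UTF-8 (an assumption).
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = p.substring(8).trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        // Unknown or malformed charset name: use the fallback below.
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    static String getHtml(String url) throws IOException {
        // Assumes an http:// or https:// URL; other schemes would fail this cast.
        HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
        connection.setConnectTimeout(5000);
        connection.setReadTimeout(5000);
        try {
            int status = connection.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status + " for " + url);
            }
            Charset charset = charsetOf(connection.getContentType());
            StringBuilder html = new StringBuilder();
            try (Reader reader = new InputStreamReader(connection.getInputStream(), charset)) {
                char[] buffer = new char[4096];
                int n;
                while ((n = reader.read(buffer)) != -1) {
                    html.append(buffer, 0, n);
                }
            }
            return html.toString();
        } finally {
            connection.disconnect();
        }
    }
}
```

As with any network call on Android, this would need to run off the main thread.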
Answered by Julian
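import android.os.AsyncTask;
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import org.apache.http.HttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;

public class RetrieveSiteData extends AsyncTask<String, Void, String> {
    @Override
    protected String doInBackground(String... urls) {
        // Runs on a background thread; downloads each URL and concatenates the bodies.
        StringBuilder builder = new StringBuilder(100000);
        for (String url : urls) {
            DefaultHttpClient client = new DefaultHttpClient();
            HttpGet httpGet = new HttpGet(url);
            try {
                HttpResponse execute = client.execute(httpGet);
                InputStream content = execute.getEntity().getContent();
                BufferedReader buffer = new BufferedReader(new InputStreamReader(content));
                String s;
                while ((s = buffer.readLine()) != null) {
                    // readLine() strips the line terminator, so add it back.
                    builder.append(s).append('\n');
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        return builder.toString();
    }

    @Override
    protected void onPostExecute(String result) {
        // Runs on the UI thread; result holds the downloaded HTML.
    }
}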
Answered by Jawad Zeb
Call it like this:
new RetrieveFeedTask(new OnTaskFinished() {
    @Override
    public void onFeedRetrieved(String feeds) {
        // Do whatever you want to do with the feeds.
    }
}).execute("http://enterurlhere.com");
RetrieveFeedTask.class
import android.os.AsyncTask;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;

class RetrieveFeedTask extends AsyncTask<String, Void, String> {
    private final OnTaskFinished onOurTaskFinished;

    public RetrieveFeedTask(OnTaskFinished onTaskFinished) {
        onOurTaskFinished = onTaskFinished;
    }

    @Override
    protected String doInBackground(String... urls) {
        // A StringBuilder avoids the repeated copying that += on a String causes in a loop.
        StringBuilder htmlResponse = new StringBuilder();
        try {
            URL url = new URL(urls[0]); // the URL to download
            URLConnection conn = url.openConnection();
            // Open the stream and wrap it in a BufferedReader.
            BufferedReader br = new BufferedReader(new InputStreamReader(conn.getInputStream()));
            String inputLine;
            while ((inputLine = br.readLine()) != null) {
                // readLine() strips the line terminator, so add it back.
                htmlResponse.append(inputLine).append('\n');
            }
            br.close();
            System.out.println("Done");
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return htmlResponse.toString();
    }

    @Override
    protected void onPostExecute(String feed) {
        // Runs on the UI thread; hand the result to the caller's callback.
        onOurTaskFinished.onFeedRetrieved(feed);
    }
}
OnTaskFinished.java
public interface OnTaskFinished {
    void onFeedRetrieved(String feeds);
}
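AsyncTask has been deprecated since Android API level 30, so the same "download in the background, then call back" pattern is sometimes written with a plain ExecutorService instead. The following is only a sketch of that idea in plain Java: BackgroundFetcher and its Callback interface are hypothetical names, and the actual download is passed in as a function so the sketch stays independent of any particular HTTP code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical class, for illustration only.
final class BackgroundFetcher {
    // Hypothetical callback, playing the role of OnTaskFinished above.
    interface Callback {
        void onSuccess(String html);
        void onError(Exception error);
    }

    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final Function<String, String> fetcher;

    // fetcher maps a URL to its HTML; any blocking download routine can be plugged in.
    BackgroundFetcher(Function<String, String> fetcher) {
        this.fetcher = fetcher;
    }

    // Runs the download on the executor's thread and reports the outcome.
    void fetch(String url, Callback callback) {
        executor.execute(() -> {
            String html;
            try {
                html = fetcher.apply(url);
            } catch (Exception e) {
                callback.onError(e);
                return;
            }
            // Note: this is invoked on the background thread, not the UI thread.
            callback.onSuccess(html);
        });
    }

    // Stops accepting new work; already submitted downloads still finish.
    void shutdown() {
        executor.shutdown();
    }
}
```

Unlike onPostExecute, the callback here runs on the background thread, so on Android the result would still have to be handed over to the main thread (for example with Activity.runOnUiThread) before touching any views.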
Answered by Anupam Rajanish
An answer to another SO post helped me. This approach doesn't read line by line, which helps in case the HTML file has a null line in between. As a prerequisite, add the dependency "com.koushikdutta.ion:ion:2.2.1" in your project settings, then implement this code in an AsyncTask. If you want the returned -something- to be available on the UI thread, pass it to a shared interface.
Ion.with(getApplicationContext())
    .load("https://google.com/hashbrowns")
    .asString()
    .setCallback(new FutureCallback<String>() {
        @Override
        public void onCompleted(Exception e, String result) {
            // int s = result.lastIndexOf("user_id") + 9;
            // String st = result.substring(s, s + 5);
            // Log.e("USERID", st);
            // something
        }
    });
Answered by Ashique Hira Manzil
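import android.os.AsyncTask;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import javax.net.ssl.HttpsURLConnection;

public class DownloadTask extends AsyncTask<String, Void, String> {
    @Override
    protected String doInBackground(String... urls) {
        // A StringBuilder avoids the repeated copying that += on a String causes in a loop.
        StringBuilder result = new StringBuilder();
        HttpsURLConnection urlConnection = null;
        try {
            URL url = new URL(urls[0]);
            // The cast assumes an https:// URL.
            urlConnection = (HttpsURLConnection) url.openConnection();
            BufferedReader br = new BufferedReader(new InputStreamReader(urlConnection.getInputStream()));
            String inputLine;
            while ((inputLine = br.readLine()) != null) {
                // readLine() strips the line terminator, so add it back.
                result.append(inputLine).append('\n');
            }
            br.close();
            return result.toString();
        } catch (Exception e) {
            e.printStackTrace();
            return "failed";
        } finally {
            if (urlConnection != null) {
                urlConnection.disconnect();
            }
        }
    }
}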
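// In the Activity that starts the download:
@Override
protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    setContentView(R.layout.activity_main);
    DownloadTask task = new DownloadTask();
    String result = null;
    try {
        // get() blocks the calling (UI) thread until the download finishes.
        result = task.execute("https://www.example.com").get();
    } catch (Exception e) {
        e.printStackTrace();
    }
    // String.valueOf avoids passing a null message to Log.i if get() failed.
    Log.i("Result", String.valueOf(result));
}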