Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverFlow original: http://stackoverflow.com/questions/2423498/

Date: 2020-08-20 05:50:08  Source: igfitidea

How to get the HTML source of a page from a HTML link in Android?

html android android-emulator

Asked by Praveen

I'm working on an application that needs to get the source of a web page from a link, and then parse the html from that page.

Could you give me some examples, or starting points where to look to start writing such an app?

Answered by Mark B

You can use HttpClient to perform an HTTP GET and retrieve the HTML response, something like this:

HttpClient client = new DefaultHttpClient();
HttpGet request = new HttpGet(url);
HttpResponse response = client.execute(request);

String html = "";
InputStream in = response.getEntity().getContent();
BufferedReader reader = new BufferedReader(new InputStreamReader(in));
StringBuilder str = new StringBuilder();
String line = null;
while((line = reader.readLine()) != null)
{
    str.append(line);
}
in.close();
html = str.toString();

Answered by Paul Spiesberger

I would suggest jsoup.

According to their website:

Fetch the Wikipedia homepage, parse it to a DOM, and select the headlines from the In the news section into a list of Elements (online sample):

Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
Elements newsHeadlines = doc.select("#mp-itn b a");

Getting started:

  1. Download the jsoup jar core library
  2. Read the cookbook introduction

Answered by Colin White

This question is a bit old, but I figured I should post my answer now that DefaultHttpClient, HttpGet, etc. are deprecated. This function should get and return HTML, given a URL.

public static String getHtml(String url) throws IOException {
    // Build and set timeout values for the request.
    URLConnection connection = (new URL(url)).openConnection();
    connection.setConnectTimeout(5000);
    connection.setReadTimeout(5000);
    connection.connect();

    // Read and store the result line by line then return the entire string.
    InputStream in = connection.getInputStream();
    BufferedReader reader = new BufferedReader(new InputStreamReader(in));
    StringBuilder html = new StringBuilder();
    for (String line; (line = reader.readLine()) != null; ) {
        html.append(line);
    }
    in.close();

    return html.toString();
}
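
The line-by-line loops in the answers above drop the line terminators and decode the bytes with whatever the default charset happens to be. As a minimal sketch (the class name HtmlStreams and the method readAll are hypothetical, not part of any answer here), the stream could instead be read in chunks with an explicit charset. This assumes you know, or are willing to guess, the page's encoding:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

// Hypothetical helper for illustration only.
final class HtmlStreams {
    private HtmlStreams() {}

    /** Reads the whole stream as text, keeping line breaks, and closes it. */
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder html = new StringBuilder();
        // try-with-resources closes the reader (and the wrapped stream) even on failure.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                html.append(buffer, 0, n); // copies characters verbatim, newlines included
            }
        }
        return html.toString();
    }
}
```

With a function like getHtml above, the call would look something like HtmlStreams.readAll(connection.getInputStream(), StandardCharsets.UTF_8), in place of the manual loop and in.close().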

Answered by Julian

public class RetrieveSiteData extends AsyncTask<String, Void, String> {

    @Override
    protected String doInBackground(String... urls) {
        // Accumulates the HTML of every requested URL into one string.
        StringBuilder builder = new StringBuilder(100000);

        for (String url : urls) {
            DefaultHttpClient client = new DefaultHttpClient();
            HttpGet httpGet = new HttpGet(url);
            try {
                HttpResponse execute = client.execute(httpGet);
                InputStream content = execute.getEntity().getContent();

                BufferedReader buffer = new BufferedReader(new InputStreamReader(content));
                String s;
                while ((s = buffer.readLine()) != null) {
                    builder.append(s);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        return builder.toString();
    }

    @Override
    protected void onPostExecute(String result) {
        // Runs on the UI thread; use the downloaded HTML here.
    }
}

Answered by Jawad Zeb

Call it like this:

new RetrieveFeedTask(new OnTaskFinished()
        {
            @Override
            public void onFeedRetrieved(String feeds)
            {
                //do whatever you want to do with the feeds
            }
        }).execute("http://enterurlhere.com");

RetrieveFeedTask.class

class RetrieveFeedTask extends AsyncTask<String, Void, String>
{
    String HTML_response= "";

    OnTaskFinished onOurTaskFinished;


    public RetrieveFeedTask(OnTaskFinished onTaskFinished)
    {
        onOurTaskFinished = onTaskFinished;
    }
    @Override
    protected void onPreExecute()
    {
        super.onPreExecute();
    }

    @Override
    protected String doInBackground(String... urls)
    {
        try
        {
            URL url = new URL(urls[0]); // enter your url here which to download

            URLConnection conn = url.openConnection();

            // open the stream and put it into BufferedReader
            BufferedReader br = new BufferedReader(new InputStreamReader(conn.getInputStream()));

            String inputLine;

            while ((inputLine = br.readLine()) != null)
            {
                // System.out.println(inputLine);
                HTML_response += inputLine;
            }
            br.close();

            System.out.println("Done");

        }
        catch (MalformedURLException e)
        {
            e.printStackTrace();
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }
        return HTML_response;
    }

    @Override
    protected void onPostExecute(String feed)
    {
        onOurTaskFinished.onFeedRetrieved(feed);
    }
}

OnTaskFinished.java

public interface OnTaskFinished
{
    public void onFeedRetrieved(String feeds);
}
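
AsyncTask was deprecated in Android API level 30, so the same "work in the background, then hand the result to a callback" pattern is sketched below with a plain ExecutorService. This is only an illustration under stated assumptions: BackgroundFetcher, Callback and fetchAsync are hypothetical names, and the callback here runs on the worker thread. On Android you would typically post the result back to the main thread (for example through a Handler) before touching any views.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper for illustration only.
final class BackgroundFetcher {

    /** Receives either the downloaded HTML or the failure. */
    interface Callback {
        void onResult(String html);
        void onError(Exception e);
    }

    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    /** Runs the given fetch on a background thread and reports the outcome. */
    Future<?> fetchAsync(final Callable<String> fetch, final Callback callback) {
        return executor.submit(new Runnable() {
            @Override
            public void run() {
                try {
                    callback.onResult(fetch.call());
                } catch (Exception e) {
                    // Unlike printStackTrace(), the caller gets to see the failure.
                    callback.onError(e);
                }
            }
        });
    }

    /** Stops accepting new work; already submitted tasks still finish. */
    void shutdown() {
        executor.shutdown();
    }
}
```

The Callable passed in would wrap whatever download code you prefer, such as the URLConnection loop from RetrieveFeedTask above.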

Answered by Anupam Rajanish

An answer to another SO post helped me. This approach doesn't read line by line, which matters if, say, the HTML file has a null line in between. As a prerequisite, add the dependency "com.koushikdutta.ion:ion:2.2.1" in the project settings, then implement this code in an AsyncTask. If you want the returned -something- to be available on the UI thread, pass it to a mutual interface.

Ion.with(getApplicationContext())
        .load("https://google.com/hashbrowns")
        .asString()
        .setCallback(new FutureCallback<String>() {
            @Override
            public void onCompleted(Exception e, String result) {
                // int s = result.lastIndexOf("user_id") + 9;
                // String st = result.substring(s, s + 5);
                // Log.e("USERID", st); // something
            }
        });

Answered by Ashique Hira Manzil

public class DownloadTask extends AsyncTask<String, Void, String> {

        @Override
        protected String doInBackground(String... urls) {

            String result = "";
            URL url;
            HttpsURLConnection urlConnection = null;

            try {
                url = new URL(urls[0]);

                urlConnection = (HttpsURLConnection) url.openConnection();

                BufferedReader br = new BufferedReader(new InputStreamReader(urlConnection.getInputStream()));


                String inputLine;

                while ((inputLine = br.readLine()) != null)
                {
                    // System.out.println(inputLine);
                    result += inputLine;
                }
                br.close();
                return result;
            } catch (Exception e) {
                e.printStackTrace();
                return "failed";
            }
        }
    }

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);

        DownloadTask task = new DownloadTask();

        String result = null;

        try {
            result = task.execute("https://www.example.com").get();
        }catch (Exception e){

            e.printStackTrace();
        }
        Log.i("Result", result);

    }

Answered by Sephy

If you have a look here or here, you will see that you can't do that directly with the Android API; you need an external library...

You can choose between the two linked above if you need an external library.
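
For very simple extraction tasks, a rough sketch using only java.util.regex may be enough before reaching for a library. The class LinkExtractor and its method extractLinks are hypothetical names, and the pattern is deliberately naive: it assumes quoted href attributes and will misbehave on malformed or unusual markup, which is exactly where a real HTML parser is the better choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not a real HTML parser.
final class LinkExtractor {
    private LinkExtractor() {}

    // Matches href="..." or href='...' inside an <a ...> tag, case-insensitively.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the raw href values of the anchor tags, in document order. */
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<String>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(2)); // group 2 is the text between the quotes
        }
        return links;
    }
}
```

The returned values are not resolved against the page URL, so relative links such as "/b" come back unchanged.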

如果您需要外部图书馆,您可以在上面的 2 种之间进行选择。