How to get a website title from C#

Question

Asked by Morten Christiansen

I'm revisiting some old code of mine and have stumbled upon a method for getting the title of a website based on its URL. It's not what you would call a stable method: it often fails to produce a result and sometimes even produces incorrect results. It also sometimes fails to show some of the characters in the title because they use a different encoding.

Does anyone have suggestions for improvements over this old version?

public static string SuggestTitle(string url, int timeout)
{
    WebResponse response = null;
    string line = string.Empty;

    try
    {
        WebRequest request = WebRequest.Create(url);
        request.Timeout = timeout;

        response = request.GetResponse();
        Stream streamReceive = response.GetResponseStream();
        Encoding encoding = System.Text.Encoding.GetEncoding("utf-8");
        StreamReader streamRead = new System.IO.StreamReader(streamReceive, encoding);

        while(streamRead.EndOfStream != true)
        {
            line = streamRead.ReadLine();
            if (line.Contains("<title>"))
            {
                line = line.Split(new char[] { '<', '>' })[2];
                break;
            }
        }
    }
    catch (Exception) { }
    finally
    {
        if (response != null)
        {
            response.Close();
        }
    }

    return line;
}

One final note: I would like the code to run faster as well, since it blocks until the page has been fetched. If I could get only the site header and not the entire page, that would be great.

Answer 1

Accepted answer by Timothy Khouri

A simpler way to get the content:

WebClient x = new WebClient();
string source = x.DownloadString("http://www.singingeels.com/");

A simpler, more reliable way to get the title:

string title = Regex.Match(source, @"\<title\b[^>]*\>\s*(?<Title>[\s\S]*?)\</title\>",
    RegexOptions.IgnoreCase).Groups["Title"].Value;
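
The regex above can be wrapped in a small helper that also deals with the character problems mentioned in the question. The following is a minimal sketch, not part of the original answer: the names TitleParser and ExtractTitle are hypothetical, and it assumes that decoding HTML entities with WebUtility.HtmlDecode and collapsing whitespace is the cleanup you want.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class TitleParser
{
    // Same idea as the pattern above; built once and reused for every call.
    private static readonly Regex TitleRegex = new Regex(
        @"<title\b[^>]*>\s*(?<Title>[\s\S]*?)</title>",
        RegexOptions.IgnoreCase);

    // Returns the decoded, whitespace-normalized title,
    // or null when the markup contains no <title> element.
    public static string ExtractTitle(string html)
    {
        if (string.IsNullOrEmpty(html)) return null;

        Match match = TitleRegex.Match(html);
        if (!match.Success) return null;

        // Turn entities such as &amp; back into characters, then collapse
        // line breaks and runs of whitespace into single spaces.
        string decoded = WebUtility.HtmlDecode(match.Groups["Title"].Value);
        return Regex.Replace(decoded, @"\s+", " ").Trim();
    }
}
```

With this in place, the snippet above becomes string title = TitleParser.ExtractTitle(source); and a missing title shows up as null instead of an empty match.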

Answer 2

Answer by Nick Berardi

In order to accomplish this, you are going to need to do a couple of things.

Make your app threaded, so that you can process multiple requests at a time and maximize the number of HTTP requests being made.
During the async request, download only the amount of data you actually need. You could probably parse the data as it comes back, looking for the title tag.
You will probably want to use a regex to pull out the title.

I have done this before with SEO bots, and I have been able to handle almost 10,000 requests at once. You just need to make sure that each web request is self-contained in its own thread.
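
The points above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code used for the SEO bots: it swaps explicit threads for async/await with HttpClient, the names PartialTitleReader, ReadTitleAsync and FetchTitleAsync are hypothetical, the 64K character cap is an arbitrary choice, and it falls back to UTF-8 when the server does not declare a usable charset.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

public static class PartialTitleReader
{
    private static readonly HttpClient Client = new HttpClient();
    private static readonly Regex TitleRegex = new Regex(
        @"<title\b[^>]*>\s*(?<Title>[\s\S]*?)</title>", RegexOptions.IgnoreCase);

    // Reads the stream chunk by chunk and stops at the first complete <title>
    // element, or once roughly maxChars characters have been seen.
    public static async Task<string> ReadTitleAsync(
        Stream stream, Encoding encoding, int maxChars = 64 * 1024)
    {
        var buffer = new char[4096];
        var seen = new StringBuilder();
        using (var reader = new StreamReader(stream, encoding))
        {
            int read;
            while (seen.Length < maxChars &&
                   (read = await reader.ReadAsync(buffer, 0, buffer.Length)) > 0)
            {
                seen.Append(buffer, 0, read);
                // Re-scanning the accumulated text is simple; the cap keeps it cheap.
                Match match = TitleRegex.Match(seen.ToString());
                if (match.Success) return match.Groups["Title"].Value.Trim();
            }
        }
        return null; // no title within the first maxChars characters
    }

    public static async Task<string> FetchTitleAsync(string url)
    {
        // ResponseHeadersRead completes once the headers arrive, so the body
        // can be streamed and abandoned early instead of being fully buffered.
        using (HttpResponseMessage response =
            await Client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead))
        {
            response.EnsureSuccessStatusCode();

            // Assumption: trust the Content-Type charset if present, else UTF-8.
            Encoding encoding = Encoding.UTF8;
            string charset = response.Content.Headers.ContentType?.CharSet;
            if (!string.IsNullOrEmpty(charset))
            {
                try { encoding = Encoding.GetEncoding(charset.Trim('"')); }
                catch (ArgumentException) { /* unknown charset: keep UTF-8 */ }
            }

            using (Stream body = await response.Content.ReadAsStreamAsync())
            {
                return await ReadTitleAsync(body, encoding);
            }
        }
    }
}
```

Several URLs can then be processed concurrently by starting one FetchTitleAsync task per URL and awaiting them together with Task.WhenAll. Pages that declare their encoding only in a meta tag are not handled by this sketch.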

Answer 3

Answer by Roberto B

Perhaps this suggestion opens up a new world for you. I had the same question and arrived at this:

Download "Html Agility Pack" from http://html-agility-pack.net/?z=codeplex

Or go to NuGet: https://www.nuget.org/packages/HtmlAgilityPack/ and add it as a reference.

Add the following using directive to the code file:

using HtmlAgilityPack;

Write the following code in your method:

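var webGet = new HtmlWeb();
var document = webGet.Load(url);
// SelectSingleNode returns null when the page has no <title> element.
var titleNode = document.DocumentNode.SelectSingleNode("html/head/title");
var title = titleNode != null ? titleNode.InnerText : string.Empty;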

Sources:

https://codeshare.co.uk/blog/how-to-scrape-meta-data-from-a-url-using-htmlagilitypack-in-c/ (HtmlAgilityPack: obtain title and meta)
