HTML: How to find broken links on a website

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same license and attribute it to the original authors (not the translator). Original question: http://stackoverflow.com/questions/65515/

How to find broken links on a website

html

Asked by Ian Nelson

What techniques or tools are recommended for finding broken links on a website?

I have access to the logfiles, so could conceivably parse these looking for 404 errors, but would like something automated which will follow (or attempt to follow) all links on a site.

Accepted answer by jrudolph

For Chrome, there is the Hexometer extension.

See LinkChecker for Firefox.

For Mac OS there is a tool called Integrity which can check URLs for broken links.

For Windows there is Xenu's Link Sleuth.

Answered by wjbrown

Just found a wget script that does what you are asking for.

wget --spider  -o wget.log  -e robots=off --wait 1 -r -p http://www.example.com

Credit for this goes to this page.

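To pull the failures back out of the log afterwards, grepping for the 404 responses (with a couple of lines of context, since wget logs the requested URL just above the response line) is usually enough. A rough sketch - the exact log layout varies between wget versions, so the context count and match string may need adjusting:

# list 404 responses from the spider run, with the preceding lines
# that contain the requested URL
grep -B 2 '404 Not Found' wget.log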

Answered by Paul Reiners

I like the W3C Link Checker.

Answered by Paul Reiners

See the linkchecker tool:

LinkChecker is a free, GPL licensed website validator. LinkChecker checks links in web documents or full websites.

Answered by Peter Hilton

Either use a tool that parses your log files and gives you a 'broken links' report (e.g. Analog or Google Webmaster Tools), or run a tool that spiders your web site and reports broken links (e.g. W3C Link Checker).

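If you just want a quick look at the logs before reaching for one of those tools, a shell one-liner can surface the most frequently requested missing pages. A rough sketch, assuming a common/combined log format (status code in field 9, request path in field 7) and a log file called access.log - adjust the field numbers to your own log layout:

# count 404 responses per requested path, most frequent first
awk '$9 == 404 {print $7}' access.log | sort | uniq -c | sort -rn | head -20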

Answered by Ian Mercer

In a .NET application you can set IIS to pass all requests to ASP.NET and then in your global error handler you can catch and log 404 errors. This is something you'd do in addition to spidering your site to check for internal missing links. Doing this can help find broken links from OTHER sites and you can then fix them with 301 redirects to the correct page.

To help test your site internally there's also the Microsoft SEO toolkit.

Of course the best technique is to avoid the problem at compile time! In ASP.NET you can get close to this by requiring that all links be generated from static methods on each page so there's only ever one location where any given URL is generated. e.g. http://www.codeproject.com/KB/aspnet/StronglyTypedPages.aspx

If you want a complete C# crawler, there's one here:- http://blog.abodit.com/2010/03/a-simple-web-crawler-in-c-using-htmlagilitypack/

Answered by Jonathan

Our commercial product DeepTrawl does this and can be used on both Windows / Mac.

Disclosure: I'm the lead developer behind DeepTrawl.

Answered by akauppi

LinkTiger seems like a very polished (though non-free) service for doing this. I'm not using it; I just wanted to add it because it was not yet mentioned.

Answered by ConroyP

Your best bet is to knock together your own spider in your scripting language of choice; it could be done recursively along the lines of:

<?php
// Recursively spider a site and log broken links centrally.
// Runnable PHP sketch: it only follows <a href> links and does not
// resolve relative URLs or restrict itself to the starting domain.
$visited = array();

function check_links($page)
{
    global $visited;

    // Skip pages already checked (avoids looping on circular links)
    if (isset($visited[$page])) return;
    $visited[$page] = true;

    $html = @file_get_contents($page);
    if(!$html)
    {
        // Log page to failures log (file name is just an example)
        error_log("Broken link: $page\n", 3, 'broken-links.log');
    }
    else
    {
        // Find all <a href> links on the page
        $doc = new DOMDocument();
        @$doc->loadHTML($html);
        foreach($doc->getElementsByTagName('a') as $anchor)
        {
            $link = $anchor->getAttribute('href');
            if($link !== '') check_links($link);
        }
    }
}

check_links('http://www.example.com/');

Once your site has gotten a certain level of attention from Google, their webmaster tools are invaluable in showing broken links that users may come across, but this is quite reactive - the dead links may be around for several weeks before Google indexes them and logs the 404 in your webmaster panel.

Writing your own script like the one above will show you all possible broken links, without having to wait for Google (Webmaster Tools) or your users (404s in the access logs) to stumble across them.

Answered by scunliffe

There's a Windows app called CheckWeb. It's no longer developed, but it works well, and the code is open (C++, I believe).

You just give it a URL and it will crawl your site (and external links if you choose), reporting any errors, image / page "weight", etc.

http://www.algonet.se/~hubbabub/how-to/checkweben.html
