How can I retrieve the page title of a webpage using Python?

Date: 2020-03-05 18:49:50  Source: igfitidea

How can I retrieve the page title of a webpage (the HTML title tag) using Python?

Solution

Answer

I always use lxml for this kind of task. You could also use BeautifulSoup.

import lxml.html

url = "http://www.google.com/"  # address of the page to read (example value)
t = lxml.html.parse(url)
print(t.find(".//title").text)

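Answer

This is probably overkill for such a simple task, but if you plan to do more than that, it is saner to start with these tools (mechanize, BeautifulSoup), because they are much easier to use than the alternatives (urllib to fetch the content, plus regular expressions or some other parser to parse the HTML).

Links:
BeautifulSoup
mechanize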

#!/usr/bin/env python
# coding: utf-8

from BeautifulSoup import BeautifulSoup
from mechanize import Browser

# Retrieve the webpage content
br = Browser()
res = br.open("https://www.google.com/")
data = res.get_data()

# Parse the content
soup = BeautifulSoup(data)
title = soup.find('title')

# Output the title text
print(title.renderContents())
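
For comparison, here is a rough sketch of the standard-library route this answer mentions (urllib to fetch, a parser instead of regular expressions). The names TitleParser, extract_title and fetch_title are made up for illustration, and decoding the response as UTF-8 is an assumption that will not hold for every page.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title_done = False
        self.title_parts = []

    def handle_starttag(self, tag, attrs):
        # Only the first <title> is of interest
        if tag == "title" and not self.title_done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.title_done = True

    def handle_data(self, data):
        if self.in_title:
            self.title_parts.append(data)


def extract_title(html):
    """Return the stripped title text, or None if there is no usable title."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    text = "".join(parser.title_parts).strip()
    return text or None


def fetch_title(url):
    """Download a page and return its title (performs a network request)."""
    with urlopen(url) as response:
        # Assumption: the page is UTF-8; undecodable bytes are replaced
        html = response.read().decode("utf-8", errors="replace")
    return extract_title(html)


sample = "<html><head><title> Example Page </title></head><body><p>Hi</p></body></html>"
print(extract_title(sample))
```

Calling fetch_title("https://www.google.com/") would do the same for a live page. This avoids third-party packages, but as the answer says, the dedicated tools are more convenient once you need more than the title.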

Answer

The mechanize Browser object has a title() method, so the code from the answer above can be rewritten as:

from mechanize import Browser

br = Browser()
br.open("http://www.google.com/")
print(br.title())

Answer

Here is a simplified version of @Vinko Vrsalovic's answer:

import urllib2  # Python 2; in Python 3 use urllib.request instead
from BeautifulSoup import BeautifulSoup

soup = BeautifulSoup(urllib2.urlopen("https://www.google.com"))
print(soup.title.string)

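Notes:

soup.title finds the first title element anywhere in the HTML document.
title.string assumes the element has exactly one child node, and that this child node is a string.

For BeautifulSoup 4.x, use a different import: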

from bs4 import BeautifulSoup
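
The notes above matter when a page has no title element, or an empty one. In the first case soup.title is None, so soup.title.string raises AttributeError; in the second, title.string is None because the element has no child. A minimal defensive sketch with BeautifulSoup 4 follows; page_title is a hypothetical helper name, and passing "html.parser" as the parser is a choice made here, not something the answer specifies.

```python
from bs4 import BeautifulSoup


def page_title(html):
    """Return the page title as a stripped string, or None if absent or empty."""
    soup = BeautifulSoup(html, "html.parser")
    if soup.title is None:
        # No <title> element anywhere in the document
        return None
    # get_text() joins all text inside the element, so it does not depend
    # on the element having exactly one string child
    text = soup.title.get_text(strip=True)
    return text or None


print(page_title("<html><head><title> Example Domain </title></head></html>"))
print(page_title("<html><body><p>No title here</p></body></html>"))
```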
