Python: get href using a CSS selector with Scrapy

Question

Asked by Marco Dinatsoli

I want to get the href value:

<span class="title">
  <a href="https://www.example.com"></a>
</span>

I tried this:

Link = Link1.css('span[class=title] a::text').extract()[0]

But I only get the text inside the <a>. How can I get the URL in the href attribute?

Answer 1

Accepted answer by paul trmbrth

What you're looking for is:

Link = Link1.css('span[class=title] a::attr(href)').extract()[0]

Since you're also matching on the span's class attribute, you can write it more concisely with a class selector:

Link = Link1.css('span.title a::attr(href)').extract()[0]

Please note that the ::text pseudo-element and the ::attr(attributename) functional pseudo-element are NOT standard CSS3 selectors. They are extensions to CSS selectors added in Scrapy 0.20.

Edit (2017-07-20): starting from Scrapy 1.0, you can use .extract_first() instead of .extract()[0]:

Link = Link1.css('span[class=title] a::attr(href)').extract_first()
Link = Link1.css('span.title a::attr(href)').extract_first()

Answer 2

Answer by Eddy

Link = Link1.css('span.title a::attr(href)').extract_first()

You can get more information from this

Answer 3

Answer by Jorgesys

This will do the job:

Link = Link1.css('span.title a::attr(href)').extract()

Link will be a list containing the value: ['https://www.example.com']
