Python: Get href using a CSS selector with Scrapy

Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): Stack Overflow. Original question: http://stackoverflow.com/questions/21181628/

Time: 2020-08-18 22:13:45  Source: igfitidea

Tags: python, python-2.7, scrapy

Asked by Marco Dinatsoli

I want to get the href value:

<span class="title">
  <a href="https://www.example.com"></a>
</span>

I tried this:

Link = Link1.css('span[class=title] a::text').extract()[0]

But I just get the text inside the <a>. How can I get the link inside the href?

Accepted answer by paul trmbrth

What you're looking for is:

Link = Link1.css('span[class=title] a::attr(href)').extract()[0]

Since you're also matching on the span's "class" attribute, you can even write:

Link = Link1.css('span.title a::attr(href)').extract()[0]

Please note that the ::text pseudo-element and the ::attr(attributename) functional pseudo-element are NOT standard CSS3 selectors. They're extensions to CSS selectors in Scrapy 0.20.

Edit (2017-07-20): starting from Scrapy 1.0, you can use .extract_first() instead of .extract()[0]:

Link = Link1.css('span[class=title] a::attr(href)').extract_first()
Link = Link1.css('span.title a::attr(href)').extract_first()

Answer by Eddy

Link = Link1.css('span.title a::attr(href)').extract_first()

You can get more information from this.

Answer by Jorgesys

This will do the job:

Link = Link1.css('span.title a::attr(href)').extract()

Link will be a list containing the value: https://www.example.com
