Python: getting an href with a CSS selector in Scrapy
Original question: http://stackoverflow.com/questions/21181628/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Get href using css selector with Scrapy
Asked by Marco Dinatsoli
I want to get the href value:
<span class="title">
<a href="https://www.example.com"></a>
</span>
I tried this:
Link = Link1.css('span[class=title] a::text').extract()[0]
But that only returns the text inside the <a> element. How can I get the link stored in the href attribute?
Accepted answer by paul trmbrth
What you're looking for is:
Link = Link1.css('span[class=title] a::attr(href)').extract()[0]
Since you're also matching on the span's class attribute, you can even write:
Link = Link1.css('span.title a::attr(href)').extract()[0]
Please note that the ::text pseudo-element and the ::attr(attributename) functional pseudo-element are NOT standard CSS3 selectors. They're extensions to CSS selectors in Scrapy 0.20.
Edit (2017-07-20): starting from Scrapy 1.0, you can use .extract_first() instead of .extract()[0]:
Link = Link1.css('span[class=title] a::attr(href)').extract_first()
Link = Link1.css('span.title a::attr(href)').extract_first()
Answer by Eddy
Link = Link1.css('span.title a::attr(href)').extract_first()
Answer by Jorgesys
This will do the job:
Link = Link1.css('span.title a::attr(href)').extract()
Link will hold the value https://www.example.com (as a one-element list, since .extract() returns a list of all matches).

