Get the text content of an HTML element using XPath?

Question

Asked by Genghis Khan

Consider this HTML:

<div>
    <p>
    <span class="abc">Monitor</span> <b>$300</b>
    </p>
    <a href="/add">Add to cart</a>
</div>
<div>
    <p>
    <span class="abc">Keyboard</span> <b>$20</b>
    </p>
    <a href="/add">Add to cart</a>
</div>

Using XPath, I want to extract Monitor $300 and Keyboard $20. I use this XPath:

 //div[a[contains(., "Add to cart")]]/p/text()

But the result still contains the markup around Monitor $300. I don't want the tags. How do I get only the text?

Answer 1

Answered by Martijn Pieters

You want to select all descendant text nodes, not just the child text nodes:

//div[a[contains(., "Add to cart")]]/p//text()

Note the double slash between p and text() there.
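
To make the difference concrete, here is a minimal sketch that runs both forms against a cut-down fragment. The one-product markup and the variable names are assumptions made for illustration; only the p/text() versus p//text() distinction comes from the answer.

```python
import lxml.etree as ET

# Hypothetical one-product fragment modelled on the question's markup.
doc = ET.fromstring(
    '<div><p><span class="abc">Monitor</span> <b>$300</b></p>'
    '<a href="/add">Add to cart</a></div>'
)

# Single slash: only text nodes that are direct children of <p>.
child_text = doc.xpath('/div/p/text()')

# Double slash: text nodes at any depth below <p>.
descendant_text = doc.xpath('/div/p//text()')

print(child_text)       # [' ']
print(descendant_text)  # ['Monitor', ' ', '$300']
```

With a single slash, only the space between the two inline elements comes back, because that is the only text node sitting directly under p.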

The //text() query may also pick up a lot of inter-tag whitespace, though, so you'll need to clean that up. Example using lxml:

>>> import lxml.etree as ET
>>> tree = ET.fromstring('''<div>
... <div>
...     <p>
...     <span class="abc">Monitor</span> <b>$300</b>
...     </p>
...     <a href="/add">Add to cart</a>
... </div>
... <div>
...     <p>
...     <span class="abc">Keyboard</span> <b>$20</b>
...     </p>
...     <a href="/add">Add to cart</a>
... </div>
... </div>''')
>>> tree.xpath('//div[a[contains(., "Add to cart")]]/p//text()')
['\n    ', 'Monitor', ' ', '$300', '\n    ', '\n    ', 'Keyboard', ' ', '$20', '\n    ']
>>> res = _
>>> # Strip each text node and drop the ones that were whitespace only
>>> [txt for txt in (txt.strip() for txt in res) if txt]
['Monitor', '$300', 'Keyboard', '$20']
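
If the goal is one string per product, as the question asks, one possible variation is to select the p elements themselves and let XPath's normalize-space() collapse the whitespace. This is a sketch, not part of the original answer; it assumes the prices from the question ($300 and $20) sit inside b elements, and the names paragraphs and labels are made up for illustration.

```python
import lxml.etree as ET

# Same markup as above, assuming each price sits in a <b> element.
tree = ET.fromstring('''<div>
<div>
    <p>
    <span class="abc">Monitor</span> <b>$300</b>
    </p>
    <a href="/add">Add to cart</a>
</div>
<div>
    <p>
    <span class="abc">Keyboard</span> <b>$20</b>
    </p>
    <a href="/add">Add to cart</a>
</div>
</div>''')

# Select the <p> elements rather than their individual text nodes.
paragraphs = tree.xpath('//div[a[contains(., "Add to cart")]]/p')

# normalize-space(.) takes the full string value of each <p>, trims the
# ends, and collapses internal runs of whitespace to a single space.
labels = [p.xpath('normalize-space(.)') for p in paragraphs]

print(labels)  # ['Monitor $300', 'Keyboard $20']
```

Evaluating normalize-space() once per p keeps each product's name and price together, whereas a flat //text() result loses the boundary between products.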
