Java: Get the visible text of a page
Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/18336956/
Get visible text of page
Asked by David Michael Gang
How do I get the visible text portion of a web page with Selenium WebDriver, without the HTML tags?
I need something equivalent to the function HtmlPage.asText() from HtmlUnit.
It is not enough to fetch the source with WebDriver.getPageSource() and parse it with jsoup, because the page may contain elements hidden by external CSS, and I am not interested in their text.
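To illustrate the problem described above, here is a minimal Java sketch. It uses naive regex tag stripping as a stand-in for a real parser such as jsoup, and the sample markup and its `promo` class (assumed to be hidden by an external stylesheet) are hypothetical. Because the stylesheet is not part of the page source, the hidden paragraph's text survives the stripping.

```java
public class SourceTextDemo {
    // Naive tag stripping: a stand-in for parsing the page source with an HTML parser.
    // It only sees markup, so it cannot know which elements external CSS hides.
    public static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", " ")  // replace every tag with a space
                   .replaceAll("\\s+", " ")     // collapse runs of whitespace
                   .trim();
    }

    public static void main(String[] args) {
        // Hypothetical page source; assume an external stylesheet sets .promo { display: none }
        String source = "<html><body><p>Shown</p>"
                + "<p class=\"promo\">Hidden by external CSS</p></body></html>";
        System.out.println(stripTags(source));
    }
}
```

The output still contains the text of the `promo` paragraph, which a browser applying the stylesheet would not display.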
Accepted answer by Nathan Merrill
Select the top element with By.tagName("body") (or some other selector that matches the top element), then call getText() on that element. It returns all of the visible text.
Answer by Brantley Blanchard
I'm not sure what language you're using, but in C# the IWebElement object has a .Text property. It returns all of the text that is displayed between the element's opening and closing tags.
I would create an IWebElement using XPath to grab the entire page. In other words, you're grabbing the body element and looking at the text in it.
string pageText = driver.FindElement(By.XPath("//html/body")).Text;
If the above code does not work with your Selenium bindings (for example, in Java), use this:
String yourText = driver.findElement(By.tagName("body")).getText();
Answer by Anuraj S.L
I can help you with C# Selenium.
By using this you can select all the text on that particular page and save it to a text file at your preferred location.
Make sure you have these using directives:
using System;
using System.IO;
using System.Text;
using OpenQA.Selenium;
using OpenQA.Selenium.Support.UI;
After navigating to the page, use this code:
// Locate the body element once and read its visible text
IWebElement body = driver.FindElement(By.TagName("body"));
var result = body.Text;
// Folder location
var dir = @"C:Textfile" + DateTime.Now.ToShortDateString();
// If the folder doesn't exist, create it
if (!Directory.Exists(dir))
    Directory.CreateDirectory(dir);
// Appends the page text to Copiedtext.txt, creating the file if it doesn't exist
File.AppendAllText(Path.Combine(dir, "Copiedtext.txt"), result);
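Since the question is about Java, here is a rough Java sketch of the same save-to-file step using only the standard library. The class name `PageTextSaver`, the method `appendPageText`, and the temp-directory base folder are hypothetical choices for illustration; in a real test the text would come from driver.findElement(By.tagName("body")).getText(). Unlike the C# snippet, this sketch names the folder with an ISO date (yyyy-MM-dd), which is an assumption made to avoid slashes in the folder name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.time.LocalDate;

public class PageTextSaver {
    // Appends pageText to <baseDir>/<yyyy-MM-dd>/Copiedtext.txt and returns the file path.
    public static Path appendPageText(Path baseDir, String pageText, LocalDate date)
            throws IOException {
        Path dir = baseDir.resolve(date.toString()); // LocalDate.toString() is ISO yyyy-MM-dd
        Files.createDirectories(dir);                // no error if the folder already exists
        Path file = dir.resolve("Copiedtext.txt");
        // CREATE + APPEND: create the file if missing, otherwise append to it
        Files.write(file, pageText.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        return file;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder text; with Selenium this would be the body element's getText() result.
        String pageText = "Example visible text" + System.lineSeparator();
        Path base = Paths.get(System.getProperty("java.io.tmpdir"), "Textfile");
        Path file = appendPageText(base, pageText, LocalDate.now());
        System.out.println("Saved to " + file);
    }
}
```

Passing the date in as a parameter keeps the helper deterministic, which makes it easier to check from a test.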