Java: Get the visible text of a page

Question

Asked by David Michael Gang

How do I get the visible text of a web page with Selenium WebDriver, without the HTML tags?

I need something equivalent to HtmlPage.asText() from HtmlUnit.

Fetching the source with WebDriver.getPageSource() and parsing it with jsoup is not enough, because the page may contain elements hidden by external CSS, and I am not interested in those.

Answer 1

Accepted answer by Nathan Merrill

Locate the body element with By.tagName("body") (or use some other selector that picks the top-level element), then call getText() on that element. It returns all of the visible text.

Answer 2

Answered by Brantley Blanchard

I'm not sure what language you're using, but in C# the IWebElement interface has a Text property. It returns the visible text between the element's opening and closing tags.

I would use XPath to locate an IWebElement that covers the entire page. In other words, grab the body element and read its text.

string pageText = driver.FindElement(By.XPath("/html/body")).Text;

If the above code does not work with your Selenium bindings (for example, in Java), use this:

String yourtext = driver.findElement(By.tagName("body")).getText();

Answer 3

Answered by Anuraj S.L

I can help you with C# Selenium.

With this you can grab all the text on a particular page and save it to a text file at a location of your choice.

Make sure you have these using directives:

using System;
using System.IO;
using System.Text;
using OpenQA.Selenium;
using OpenQA.Selenium.Support.UI;

After navigating to the page, try this code:

// Read the visible text of the whole page from the body element
IWebElement body = driver.FindElement(By.TagName("body"));
var result = body.Text;

// Folder location; a fixed date format avoids the "/" that ToShortDateString() can emit
var dir = @"C:Textfile" + DateTime.Now.ToString("yyyy-MM-dd");

// If the folder doesn't exist, create it
if (!Directory.Exists(dir))
    Directory.CreateDirectory(dir);

// Append the page text to Copiedtext.txt, creating the file if needed
File.AppendAllText(Path.Combine(dir, "Copiedtext.txt"), result);
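
Since the question is about Java, the file-saving step above could be sketched in Java roughly as follows. This is only a sketch under stated assumptions and not part of the original answer: the class name PageTextSaver and its save method are hypothetical, the base folder is the system temp directory rather than a fixed drive path, and pageText stands in for the string returned by getText() on the body element.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.time.LocalDate;

// Hypothetical helper, named for illustration only.
class PageTextSaver {

    // Appends pageText to <baseDir>/Textfile<yyyy-MM-dd>/Copiedtext.txt
    // and returns the path of that file.
    static Path save(Path baseDir, String pageText) {
        // LocalDate.toString() is ISO formatted, so it contains no path separators.
        Path dir = baseDir.resolve("Textfile" + LocalDate.now());
        Path file = dir.resolve("Copiedtext.txt");
        try {
            // Creates the folder (and any missing parents) if it does not exist yet.
            Files.createDirectories(dir);
            // Creates the file on first use, then appends on later calls.
            Files.write(file, pageText.getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return file;
    }

    public static void main(String[] args) {
        // Assumption: in a real run this string would come from
        // driver.findElement(By.tagName("body")).getText().
        String pageText = "Example visible text";
        Path base = Paths.get(System.getProperty("java.io.tmpdir"));
        System.out.println("Saved to " + save(base, pageText));
    }
}
```

Like the C# version, repeated calls on the same day append to the same file rather than overwriting it.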
