Java: How to use a pattern matcher in Java?

Question

Asked by hao

Let's say the string is <title>xyz</title>. I want to extract the xyz out of the string. I used:

Pattern titlePattern = Pattern.compile("&lttitle&gt\\s*(.+?)\\s*&lt/title&gt");
Matcher titleMatcher = titlePattern.matcher(line);
String title = titleMatcher.group(1);

But I am getting an error for titlePattern.matcher(line);

Answer 1

Answered by Fabian Steeg

You say your error occurs earlier (what is the actual error? It runs without an error for me), but after solving that you will need to call find() on the matcher once to actually search for the pattern:

if(titleMatcher.find()){
  String title = titleMatcher.group(1);
}

Note that if you are really matching against a string with non-escaped HTML entities like

<title>xyz</title>

Then your regular expression will have to use these, not the escaped entities:

"<title>\\s*(.+?)\\s*</title>"

Also, you should be careful about how far you try to get with this, as you can't really parse HTML or XML with regular expressions. If you are working with XML, it's much easier to use an XML parser, e.g. JDOM.
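
For the simple single-line case from the question, the pieces above can be put together as a minimal sketch. It assumes the input contains literal <title> tags rather than escaped entities; the class name TitleExtractor and the helper extractTitle are hypothetical names chosen for illustration. Calling group(1) without a successful find() throws an IllegalStateException, which is why the helper checks the result of find() first.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question or answers.
public class TitleExtractor {
    // Unescaped tags; backslashes are doubled because this is a Java string literal.
    private static final Pattern TITLE_PATTERN =
            Pattern.compile("<title>\\s*(.+?)\\s*</title>");

    /** Returns the title text, or an empty Optional if the line has no title element. */
    static Optional<String> extractTitle(String line) {
        Matcher matcher = TITLE_PATTERN.matcher(line);
        // group(1) is only valid after find() has returned true.
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extractTitle("<title> xyz </title>").orElse("(no title)")); // prints "xyz"
        System.out.println(extractTitle("no markup here").orElse("(no title)"));       // prints "(no title)"
    }
}
```

Returning an Optional instead of a bare String keeps the "no match" case explicit for the caller; this is a design choice of the sketch, not something the answer requires.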

Answer 2

Answered by Pace

Not technically an answer but you shouldn't be using regular expressions to parse HTML. You can try and you can get away with it for simple tasks but HTML can get ugly. There are a number of Java libraries that can parse HTML/XML just fine. If you're going to be working a lot with HTML/XML it would be worth your time to learn them.

Answer 3

Answered by hao

As others have suggested, it's probably not a good idea to parse HTML/XML with regex. You can parse XML documents with the standard Java API, but I don't recommend it. As Fabian Steeg already answered, it's probably better to use JDOM or a similar open source library for parsing XML.

With javax.xml.parsers you can do the following:

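import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

String xml = "<title>abc</title>";

DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();

// Parse the string, then read the text of the first <title> element
Document doc = docBuilder.parse(new InputSource(new StringReader(xml)));
NodeList nodeList = doc.getElementsByTagName("title");
String title = nodeList.item(0).getTextContent();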

This parses your XML string into a Document object which you can use for further lookups. The API is kinda horrible though.

Another way is to use XPath for the lookup:

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;

XPathFactory xpathFactory = XPathFactory.newInstance();
XPath xPath = xpathFactory.newXPath();
String titleByXpath = xPath.evaluate("/title/text()", new InputSource(new StringReader(xml)));
// or use the Document for lookup
String titleFromDomByXpath = xPath.evaluate("/title/text()", doc);
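
The snippets above throw checked exceptions and assume a title element exists, so here is a minimal runnable sketch that wraps both lookups in one class. The class name XmlTitleLookup and its method names are hypothetical, and declaring throws Exception is a shortcut for brevity rather than a recommendation.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical wrapper class around the two lookups shown in the answer.
public class XmlTitleLookup {
    /** Parses an XML string into a DOM Document. */
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    /** DOM lookup: text of the first title element, or null if there is none. */
    static String titleByDom(Document doc) {
        NodeList nodes = doc.getElementsByTagName("title");
        return nodes.getLength() > 0 ? nodes.item(0).getTextContent() : null;
    }

    /** XPath lookup: evaluates to an empty string when nothing matches. */
    static String titleByXPath(Document doc) throws Exception {
        XPath xPath = XPathFactory.newInstance().newXPath();
        return xPath.evaluate("/title/text()", doc);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<title>abc</title>");
        System.out.println(titleByDom(doc));   // prints "abc"
        System.out.println(titleByXPath(doc)); // prints "abc"
    }
}
```

Note the different "not found" behavior: the DOM helper returns null by choice of this sketch, while the string form of XPath.evaluate yields an empty string.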
