Java: How to use Pattern and Matcher in Java?

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/1986031/

Time: 2020-10-29 18:49:21  Source: igfitidea

How to use Pattern matcher in Java?

java regex

Asked by hao

Let's say the string is <title>xyz</title>. I want to extract the xyz out of the string. I used:

Pattern titlePattern = Pattern.compile("&lt;title&gt;\\s*(.+?)\\s*&lt;/title&gt;");
Matcher titleMatcher = titlePattern.matcher(line);
String title = titleMatcher.group(1);

but I am getting an error for titlePattern.matcher(line);

Answered by Fabian Steeg

You say your error occurs earlier (what is the actual error? It runs without an error for me), but after solving that you will need to call find() on the matcher once to actually search for the pattern:

if(titleMatcher.find()){
  String title = titleMatcher.group(1);
}

Note that if you really match against a string with non-escaped HTML entities like

<title>xyz</title>

Then your regular expression will have to use these, not the escaped entities:

"<title>\\s*(.+?)\\s*</title>"

Also, you should be careful about how far you try to get with this, as you can't really parse HTML or XML with regular expressions. If you are working with XML, it's much easier to use an XML parser, e.g. JDOM.
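To tie the points of this answer together, here is a minimal sketch of the regex approach for the simple one-line case. The class name TitleRegexDemo and the helper extractTitle are hypothetical names chosen for illustration, and the sketch assumes the input contains literal (non-escaped) tags. It is not a general HTML parser.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name is an assumption, not part of the question.
public class TitleRegexDemo {
    // Non-escaped tags; backslashes are doubled because this is a Java string literal.
    private static final Pattern TITLE_PATTERN =
            Pattern.compile("<title>\\s*(.+?)\\s*</title>");

    // Returns the text between the title tags, or empty if the line does not match.
    static Optional<String> extractTitle(String line) {
        Matcher matcher = TITLE_PATTERN.matcher(line);
        // find() must succeed before group() is called;
        // calling group() first throws IllegalStateException.
        if (matcher.find()) {
            return Optional.of(matcher.group(1));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extractTitle("<title>xyz</title>").orElse("(no title)"));
        System.out.println(extractTitle("no tags here").orElse("(no title)"));
    }
}
```

Returning an Optional is one possible design choice here: it makes the "no match" case explicit instead of leaving the title variable unset.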

Answered by Pace

Not technically an answer but you shouldn't be using regular expressions to parse HTML. You can try and you can get away with it for simple tasks but HTML can get ugly. There are a number of Java libraries that can parse HTML/XML just fine. If you're going to be working a lot with HTML/XML it would be worth your time to learn them.

Answered by hao

As others have suggested, it's probably not a good idea to parse HTML/XML with regex. You can parse XML documents with the standard Java API, but I don't recommend it. As Fabian Steeg already answered, it's probably better to use JDOM or a similar open source library for parsing XML.

With javax.xml.parsers you can do the following:

String xml = "<title>abc</title>";

DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();

Document doc = docBuilder.parse(new InputSource(new StringReader(xml)));
NodeList nodeList = doc.getElementsByTagName("title");
String title = nodeList.item(0).getTextContent();

This parses your XML string into a Document object, which you can use for further lookups. The API is kinda horrible, though.
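For reference, the DOM snippet above could be wrapped into a self-contained class roughly as follows. This is only a sketch: the class name TitleDomDemo and the method titleOf are hypothetical. Wrapping the checked exceptions in an IllegalArgumentException is one possible choice, not something the snippet above prescribes.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class; names are assumptions made for illustration.
public class TitleDomDemo {
    // Returns the text of the first <title> element, or null if there is none.
    static String titleOf(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList titles = doc.getElementsByTagName("title");
            // item(0) is null when no element matched, so check the length first.
            return titles.getLength() > 0 ? titles.item(0).getTextContent() : null;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Assumption: treat malformed input as an illegal argument.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(titleOf("<title>abc</title>"));
    }
}
```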

Another way is to use XPath for the lookup:

XPathFactory xpathFactory = XPathFactory.newInstance();
XPath xPath = xpathFactory.newXPath();
String titleByXpath = xPath.evaluate("/title/text()", new InputSource(new StringReader(xml)));
// or use the Document for lookup
String titleFromDomByXpath = xPath.evaluate("/title/text()", doc);
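The XPath lookup also needs a few imports and throws a checked XPathExpressionException. A self-contained sketch might look like this, where TitleXPathDemo and titleOf are hypothetical names and the exception wrapping is an assumption:

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class; names are assumptions made for illustration.
public class TitleXPathDemo {
    // Evaluates /title/text() against the XML string and returns the result as a String.
    static String titleOf(String xml) {
        XPath xPath = XPathFactory.newInstance().newXPath();
        try {
            return xPath.evaluate("/title/text()",
                    new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            // Assumption: surface evaluation and parse failures as an unchecked exception.
            throw new IllegalArgumentException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(titleOf("<title>abc</title>"));
    }
}
```

When the expression selects nothing, this String-returning overload of evaluate yields an empty string rather than null, so callers may want to check for that.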