Java: How to get only elements that have values with StAX
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must comply with the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/3293841/
Asked by ant
I'm trying to get only the elements that have text. Example XML:
<root>
<Item>
<ItemID>4504216603</ItemID>
<ListingDetails>
<StartTime>10:00:10.000Z</StartTime>
<EndTime>10:00:30.000Z</EndTime>
<ViewItemURL>http://url</ViewItemURL>
....
</item>
It should print:
Element Local Name:ItemID
Text:4504216603
Element Local Name:StartTime
Text:10:00:10.000Z
Element Local Name:EndTime
Text:10:00:30.000Z
Element Local Name:ViewItemURL
Text:http://url
The code below also prints root, Item, etc. Is this even possible? It must be; I just can't find it on Google.
XMLInputFactory inputFactory = XMLInputFactory.newInstance();
InputStream input = new FileInputStream(new File("src/main/resources/file.xml"));
XMLStreamReader xmlStreamReader = inputFactory.createXMLStreamReader(input);
while (xmlStreamReader.hasNext()) {
    int event = xmlStreamReader.next();
    if (event == XMLStreamConstants.START_ELEMENT) {
        System.out.println("Element Local Name:" + xmlStreamReader.getLocalName());
    }
    if (event == XMLStreamConstants.CHARACTERS) {
        if (!xmlStreamReader.getText().trim().equals("")) {
            System.out.println("Text:" + xmlStreamReader.getText().trim());
        }
    }
}
Edit: the incorrect behaviour (current output):
Element Local Name:root
Element Local Name:item
Element Local Name:ItemID
Text:4504216603
Element Local Name:ListingDetails
Element Local Name:StartTime
Text:10:00:10.000Z
Element Local Name:EndTime
Text:10:00:30.000Z
Element Local Name:ViewItemURL
Text:http://url
I don't want root and the other nodes that have no text to be printed, just the output I wrote above. Thank you.
Accepted answer by Georgy Bolyuba
Try this:
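while (xmlStreamReader.hasNext()) {
    int event = xmlStreamReader.next();
    if (event == XMLStreamConstants.START_ELEMENT) {
        try {
            // getElementText() reads up to the matching end tag, so afterwards
            // getLocalName() still returns this element's name.
            String text = xmlStreamReader.getElementText();
            System.out.println("Element Local Name:" + xmlStreamReader.getLocalName());
            System.out.println("Text:" + text);
        } catch (XMLStreamException e) {
            // Thrown when the element contains child elements (root, Item, ...).
            // Caveat: by then the reader has typically already moved onto the first
            // child's start tag, so the loop's next() call steps past that child.
        }
    }
}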
SAX-based solution (works):
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class Test extends DefaultHandler {
    // Name of the most recently opened element.
    private String currentName;

    public static void main(String[] args) throws ParserConfigurationException, IOException, SAXException {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new File("src/file.xml"), new Test());
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        currentName = qName;
    }

    // Note: a parser may deliver one element's text in several characters() calls.
    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        String string = new String(ch, start, length);
        if (hasText(string)) {
            System.out.println(currentName);
            System.out.println(string);
        }
    }

    private boolean hasText(String string) {
        return string.trim().length() > 0;
    }
}
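For comparison, the same idea as the SAX handler (remember the last opened element, print it only when non-blank text follows) can be sketched with the StAX cursor API the question started from. This is only a sketch under assumptions: the class name LeafTextReader and the method collect are made up for illustration, text is buffered until the end tag, and text that appears before a child element in mixed content is deliberately discarded.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration; not taken from any of the answers.
final class LeafTextReader {

    // Returns "Element Local Name:..." / "Text:..." pairs for elements that
    // hold non-blank text and no child elements.
    static List<String> collect(InputStream in) {
        List<String> lines = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
            String pendingName = null;               // element opened last, no child seen yet
            StringBuilder text = new StringBuilder();
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT: {
                        // A new element starts: whatever was buffered belonged to a parent.
                        pendingName = reader.getLocalName();
                        text.setLength(0);
                        break;
                    }
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA: {
                        if (pendingName != null) {
                            text.append(reader.getText());
                        }
                        break;
                    }
                    case XMLStreamConstants.END_ELEMENT: {
                        String value = text.toString().trim();
                        if (pendingName != null && !value.isEmpty()) {
                            lines.add("Element Local Name:" + pendingName);
                            lines.add("Text:" + value);
                        }
                        // Text after this end tag belongs to the parent; ignore it.
                        pendingName = null;
                        text.setLength(0);
                        break;
                    }
                    default:
                        break;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            // Wrapped so callers of this sketch need no checked-exception handling.
            throw new IllegalStateException("could not parse XML", e);
        }
        return lines;
    }

    public static void main(String[] args) {
        String xml = "<root><Item><ItemID>4504216603</ItemID><ListingDetails>"
                + "<StartTime>10:00:10.000Z</StartTime></ListingDetails></Item></root>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        collect(in).forEach(System.out::println);
    }
}
```

Buffering until END_ELEMENT avoids depending on whether the parser reports an element's text as one CHARACTERS event or several.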
Answer by ant
StAX solution:
Parse the document:
public void parseXML(InputStream xml) {
    try {
        DOMResult result = new DOMResult();
        XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
        XMLEventReader reader = xmlInputFactory.createXMLEventReader(new StreamSource(xml));
        TransformerFactory transFactory = TransformerFactory.newInstance();
        Transformer transformer = transFactory.newTransformer();
        transformer.transform(new StAXSource(reader), result);
        Document document = (Document) result.getNode();
        NodeList startlist = document.getChildNodes();
        processNodeList(startlist);
    } catch (Exception e) {
        System.err.println("Something went wrong, this might help :\n" + e.getMessage());
    }
}
Now all the nodes from the document are in a NodeList, so process it next:
private void processNodeList(NodeList nodelist) {
    for (int i = 0; i < nodelist.getLength(); i++) {
        if (nodelist.item(i).getNodeType() == Node.ELEMENT_NODE
                && (hasValidAttributes(nodelist.item(i)) || hasValidText(nodelist.item(i)))) {
            getNodeNamesAndValues(nodelist.item(i));
        }
        processNodeList(nodelist.item(i).getChildNodes());
    }
}
Then, for each element node with valid text, get the name and value:
public void getNodeNamesAndValues(Node n) {
    String nodeValue = null;
    String nodeName = null;
    if (hasValidText(n)) {
        while (n != null && isWhiteSpace(n.getTextContent()) && StringUtils.isWhitespace(n.getTextContent())
                && n.getNodeType() != Node.ELEMENT_NODE) {
            n = n.getFirstChild();
        }
        nodeValue = StringUtils.strip(n.getTextContent());
        nodeName = n.getLocalName();
        System.out.println(nodeName + " " + nodeValue);
    }
}
A few helper methods for checking nodes:
private static boolean hasValidAttributes(Node node) {
    // Only called for element nodes, so getAttributes() is not null here.
    return node.getAttributes().getLength() > 0;
}

private boolean hasValidText(Node node) {
    String textValue = node.getTextContent();
    // isEmpty() instead of != "", which compared references rather than contents.
    return textValue != null && !textValue.isEmpty() && !isWhiteSpace(textValue)
            && !StringUtils.isWhitespace(textValue) && node.hasChildNodes();
}

// True when the text starts with a whitespace character.
private boolean isWhiteSpace(String nodeText) {
    return nodeText.startsWith("\r") || nodeText.startsWith("\t")
            || nodeText.startsWith("\n") || nodeText.startsWith(" ");
}
I also used StringUtils. If you're using Maven, you can get it by adding this to your pom.xml:
<dependency>
    <groupId>commons-lang</groupId>
    <artifactId>commons-lang</artifactId>
    <version>2.5</version>
</dependency>
This is inefficient if you're reading huge files, though less so if you split them first. This is what I came up with (with Google's help). There are better solutions; this one is mine, and I'm an amateur (for now).
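A more compact variant of the DOM walk in this answer can be sketched without commons-lang. The sketch below is an assumption-laden illustration, not the answerer's code: the class DomLeafText and its methods are hypothetical names, the document is parsed with a plain DocumentBuilder instead of a StAX-to-DOM transform, and an element counts as "having text" only when it has no child elements and its trimmed text is non-empty (attributes are ignored).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only.
final class DomLeafText {

    // Depth-first walk; adds "name value" for every text-only element with non-blank text.
    static List<String> collect(Node node, List<String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE && isTextOnly(node)) {
            String value = node.getTextContent().trim();
            if (!value.isEmpty()) {
                out.add(node.getNodeName() + " " + value);
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), out);
        }
        return out;
    }

    // True when none of the element's direct children is itself an element.
    static boolean isTextOnly(Node element) {
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                return false;
            }
        }
        return true;
    }

    // Parses an XML string into a DOM document; checked exceptions are wrapped for brevity.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><Item><ItemID>4504216603</ItemID>"
                + "<ViewItemURL>http://url</ViewItemURL></Item></root>";
        collect(parse(xml), new ArrayList<>()).forEach(System.out::println);
    }
}
```

Like the original, this loads the whole document into memory, so the same caveat about huge files applies.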