Original question: http://stackoverflow.com/questions/19489882/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
Stack Overflow
Convert HTML to XML using Java
Asked by suresh
Can anyone suggest the best approach for converting HTML to XML using Java? Is there an API available for that? The HTML might also contain JavaScript code.
I have tried the code below:
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import org.jdom.Document;
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;
import org.jdom.output.Format;
import org.jdom.output.XMLOutputter;
class HTML2XML {
    public static void main(String[] args) throws JDOMException, IOException {
        // Backslashes in string literals must be escaped ("D:\S" is an illegal escape sequence).
        String htmlPath = "D:\\Second.html";
        String xmlPath = "D:\\Second.xml";
        String encoding = "iso-8859-2";
        // To read from a web page instead, the URL needs a protocol:
        // new java.net.URL("http://www.climb.co.jp").openStream()
        // Use TagSoup as the SAX driver so that HTML which is not well-formed XML can still be
        // parsed (the TagSoup jar must be on the classpath); false = do not validate.
        SAXBuilder saxBuilder = new SAXBuilder("org.ccil.cowan.tagsoup.Parser", false);
        XMLOutputter outputter = new XMLOutputter();
        Format newFormat = outputter.getFormat();
        newFormat.setEncoding(encoding);
        outputter.setFormat(newFormat);
        // try-with-resources flushes and closes both streams, even if parsing fails.
        // The writer uses the same charset that the XML declaration announces.
        try (BufferedReader brInHtml = new BufferedReader(new FileReader(htmlPath));
             Writer bwOutXml = new BufferedWriter(
                     new OutputStreamWriter(new FileOutputStream(xmlPath), encoding))) {
            Document jdomDocument = saxBuilder.build(brInHtml);
            outputter.output(jdomDocument, System.out);
            outputter.output(jdomDocument, bwOutXml);
        }
        System.out.flush();
    }
}
But it's not working as expected.
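The question notes that the HTML may contain JavaScript. In XML, `<` and `&` are markup characters, so a script body such as `if (a < b && c)` must either be escaped or wrapped in a CDATA section before the result is well-formed. A minimal sketch using only the JDK's DOM and `javax.xml.transform` APIs (the class name `ScriptEscapeDemo` and the sample script are illustrative assumptions, not part of the question):

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo: how JavaScript text ends up inside an XML <script> element.
class ScriptEscapeDemo {

    // Serializes a DOM document to a string, without the <?xml ...?> declaration.
    static String serialize(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Builds <script>js</script> with the script either as escaped text or as a CDATA section.
    static String scriptToXml(String js, boolean useCdata) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element script = doc.createElement("script");
        script.appendChild(useCdata ? doc.createCDATASection(js) : doc.createTextNode(js));
        doc.appendChild(script);
        return serialize(doc);
    }

    public static void main(String[] args) throws Exception {
        String js = "if (a < b && c) { run(); }";
        System.out.println(scriptToXml(js, false)); // text node: < and & are escaped
        System.out.println(scriptToXml(js, true));  // CDATA section: script kept verbatim
    }
}
```

Either form parses back to the same script text; CDATA keeps the script readable, while escaping works everywhere text is allowed.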
Accepted answer by Clyde Lobo
Answer by Ahsan Shah
HTML is not the same as XML unless it is conforming XHTML or HTML5 in XML mode.
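One way to see this difference is to feed markup to the JDK's built-in XML parser: ordinary HTML with an unclosed `<br>` or an HTML-only entity such as `&nbsp;` is rejected, while the XHTML form of the same markup is accepted. A minimal sketch; the class and method names are illustrative:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: checks whether markup is well-formed XML.
class WellFormedCheck {

    static boolean isWellFormedXml(String markup) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors but, unlike the default handler,
        // does not print "[Fatal Error]" messages to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // HTML: <br> is never closed, so this is not XML.
        System.out.println(isWellFormedXml("<html><body>Line 1<br>Line 2</body></html>"));
        // &nbsp; is predefined in HTML but undeclared in plain XML.
        System.out.println(isWellFormedXml("<p>a&nbsp;b</p>"));
        // XHTML form of the first example: well-formed XML.
        System.out.println(isWellFormedXml("<html><body>Line 1<br/>Line 2</body></html>"));
    }
}
```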
I suggest using an HTML parser to read the HTML and then either transform it to XML or process it directly.
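The answer does not name a specific parser. As one stdlib-only sketch (a swapped-in choice, not the answer's own method), the JDK's bundled HTML 3.2 parser `javax.swing.text.html.parser.ParserDelegator` can read tag soup and report balanced start/end callbacks, which are turned into a DOM tree and then serialized as XML. The class `HtmlToXml` is hypothetical, and because the parser only knows the HTML 3.2 elements, unknown tags (for example HTML5-only ones) are skipped while their text is kept; for real-world pages, a dedicated library such as TagSoup (used in the question) or jsoup is likely a better fit.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayDeque;
import java.util.Enumeration;
import javax.swing.text.AttributeSet;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical converter: JDK HTML 3.2 parser callbacks -> DOM tree -> XML text.
class HtmlToXml {
    static String convert(String html) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("html"); // the single root element of the XML output
        doc.appendChild(root);
        ArrayDeque<Element> open = new ArrayDeque<>(); // stack of currently open elements
        open.push(root);
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                if (t == HTML.Tag.HTML) return; // reuse our own root element
                Element e = createElement(doc, t, a);
                open.peek().appendChild(e);
                open.push(e);
            }
            @Override
            public void handleEndTag(HTML.Tag t, int pos) {
                String name = t.toString();
                boolean isOpen = false;
                for (Element e : open) {
                    if (e != root && e.getTagName().equals(name)) { isOpen = true; break; }
                }
                if (!isOpen) return; // stray end tag (or </html>): ignore it
                // Close every element down to and including the matching one.
                while (!open.pop().getTagName().equals(name)) { }
            }
            @Override
            public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                // Tags unknown to the HTML 3.2 DTD are skipped; their text content is kept.
                if (t instanceof HTML.UnknownTag) return;
                open.peek().appendChild(createElement(doc, t, a));
            }
            @Override
            public void handleText(char[] data, int pos) {
                // Text nodes are escaped (&lt;, &amp;) automatically when serialized.
                open.peek().appendChild(doc.createTextNode(new String(data)));
            }
            @Override
            public void handleComment(char[] data, int pos) {
                // Comment callbacks inside <script> are kept as text; other comments are dropped.
                if (open.peek().getTagName().equals("script")) {
                    open.peek().appendChild(doc.createTextNode(new String(data)));
                }
            }
        };
        // true = ignore charset declarations in <meta> tags (the input is already a String).
        new ParserDelegator().parse(new StringReader(html), callback, true);
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Copies the tag name and attributes; attribute names that are not legal XML names are skipped.
    static Element createElement(Document doc, HTML.Tag t, AttributeSet a) {
        Element e = doc.createElement(t.toString());
        for (Enumeration<?> names = a.getAttributeNames(); names.hasMoreElements(); ) {
            Object name = names.nextElement();
            if (HTMLEditorKit.ParserCallback.IMPLIED.equals(name)) continue; // parser-internal marker
            try {
                e.setAttribute(name.toString(), String.valueOf(a.getAttribute(name)));
            } catch (DOMException ex) {
                // not a legal XML attribute name: skip it
            }
        }
        return e;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(convert("<p>Hello<br>World<p class=note>Second"));
    }
}
```

The unclosed `<p>` and `<br>` tags in the sample input come out as properly nested and self-closed XML elements, so the result can be read by any XML parser.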
Answer by Rajj
If you want to parse HTML, you can use an HTML parser rather than converting the HTML to XML:
http://www.mkyong.com/java/jsoup-html-parser-hello-world-examples/
http://htmlparser.sourceforge.net/javadoc/doc-files/using.html
I hope it helps.
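The links cover jsoup and the HTML Parser library, where, for example, `Jsoup.parse(html).select("a[href]")` finds all links. To stay within the JDK, the same "process it directly" idea can be sketched with the bundled `javax.swing.text.html.parser.ParserDelegator`: it tolerates unclosed tags and reports each tag through a callback, so no XML conversion is needed. The class `LinkExtractor` and the sample HTML are illustrative assumptions:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical example: extract links straight from HTML, without converting it to XML.
class LinkExtractor {

    // Returns the href value of every <a> element, in document order.
    static List<String> extractLinks(String html) throws IOException {
        List<String> links = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                if (t == HTML.Tag.A) {
                    Object href = a.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        links.add(href.toString());
                    }
                }
            }
        };
        // true = ignore charset declarations in <meta> tags (the input is already a String).
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return links;
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body><p>See <a href=\"http://example.com/a\">A</a>"
                + " and <a href=\"/b\">B</a><br>no closing tags";
        System.out.println(extractLinks(html));
    }
}
```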