JTidy java API toConvert HTML to XHTML
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/15063870/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
JTidy java API toConvert HTML to XHTML
提问by mohammad
I am using JTidy to convert from HTML to XHTML but I found in my XHTML file this tag
.
Can i prevent it ?
this is my code
我正在使用 JTidy 从 HTML 转换为 XHTML,但我在我的 XHTML 文件中找到了这个标签
。我可以预防吗?
这是我的代码
//from html to xhtml
try
{
fis = new FileInputStream(htmlFileName);
}
catch (java.io.FileNotFoundException e)
{
System.out.println("File not found: " + htmlFileName);
}
Tidy tidy = new Tidy();
tidy.setShowWarnings(false);
tidy.setXmlTags(false);
tidy.setInputEncoding("UTF-8");
tidy.setOutputEncoding("UTF-8");
tidy.setXHTML(true);//
tidy.setMakeClean(true);
Document xmlDoc = tidy.parseDOM(fis, null);
try
{
tidy.pprint(xmlDoc,new FileOutputStream("c.xhtml"));
}
catch(Exception e)
{
}
采纳答案by mohammad
i created a function that parse the the xhtml code and remove the unwelcome tags and to add a link to the css File "tableStyle.css"
我创建了一个函数来解析 xhtml 代码并删除不受欢迎的标签并添加到 css 文件“tableStyle.css”的链接
public static String xhtmlparser(){
String Cleanline="";
try {
// the file url
FileInputStream fstream = new FileInputStream("c.xhtml");
// Use DataInputStream to read binary NOT text.
BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
String strLine = null;
int linescounter=0;
while ((strLine = br.readLine()) != null) {// read every line in the file
String m=strLine.replaceAll(" ", "");
linescounter++;
if(linescounter==5)
m=m+"\n"+ "<link rel="+ "\"stylesheet\" "+"type="+ "\"text/css\" "+"href= " +"\"tableStyle.css\""+ "/>";
Cleanline+=m+"\n";
}
}
catch(IOException e){}
return Cleanline;
}
but as a performance issue is it good?
但作为一个性能问题,它好吗?
by the way it works will
顺便说一下,它的工作原理将
回答by Christian
I had only success, when the input is treated as XML as well. So either set xmltags to true
当输入也被视为 XML 时,我只取得了成功。所以要么将 xmltags 设置为 true
tidy.setXmlTags(true);
and live with the errors and warnings or do the conversion twice. First conversion to sanitize the html (html to xhtml) and a second conversion from xhtml to xhtml with set xmltags, thus no errors and warnings occur.
并忍受错误和警告,或者进行两次转换。第一次转换以清理 html(html 到 xhtml),然后使用设置的 xmltags 从 xhtml 到 xhtml 的第二次转换,因此不会发生错误和警告。
String htmlFileName = "test.html";
try( InputStream in = Thread.currentThread().getContextClassLoader().getResourceAsStream(htmlFileName);
FileOutputStream fos = new FileOutputStream("tmp.xhtml");) {
Tidy tidy = new Tidy();
tidy.setShowWarnings(true);
tidy.setInputEncoding("UTF-8");
tidy.setOutputEncoding("UTF-8");
tidy.setXHTML(true);
tidy.setMakeClean(true);
Document xmlDoc = tidy.parseDOM(in, fos);
} catch (Exception e) {
e.printStackTrace();
}
try( InputStream in = new FileInputStream("tmp.xhtml");
FileOutputStream fos = new FileOutputStream("c.xhtml");) {
Tidy tidy = new Tidy();
tidy.setShowWarnings(true);
tidy.setXmlTags(true);
tidy.setInputEncoding("UTF-8");
tidy.setOutputEncoding("UTF-8");
tidy.setXHTML(true);
tidy.setMakeClean(true);
Document xmlDoc = tidy.parseDOM(in, null);
tidy.pprint(xmlDoc, fos);
} catch (Exception e) {
e.printStackTrace();
}
I used the latest jtidy version 938.
我使用了最新的 jtidy 版本 938。
回答by Tanmay kumar shaw
You can use the following method to get xhtml from html
您可以使用以下方法从 html 中获取 xhtml
public static String getXHTMLFromHTML(String inputFile,
String outputFile) throws Exception {
File file = new File(inputFile);
FileOutputStream fos = null;
InputStream is = null;
try {
fos = new FileOutputStream(outputFile);
is = new FileInputStream(file);
Tidy tidy = new Tidy();
tidy.setXHTML(true);
tidy.parse(is, fos);
} catch (FileNotFoundException e) {
e.printStackTrace();
}finally{
if(fos != null){
try {
fos.close();
} catch (IOException e) {
fos = null;
}
fos = null;
}
if(is != null){
try {
is.close();
} catch (IOException e) {
is = null;
}
is = null;
}
}
return outputFile;
}