JTidy Java API to convert HTML to XHTML

Question

Asked by mohammad

I am using JTidy to convert HTML to XHTML, but the resulting XHTML file contains the &nbsp; entity. Can I prevent it?
This is my code:

    // From HTML to XHTML
    FileInputStream fis = null;
    try {
        fis = new FileInputStream(htmlFileName);
    } catch (java.io.FileNotFoundException e) {
        System.out.println("File not found: " + htmlFileName);
    }
    Tidy tidy = new Tidy();
    tidy.setShowWarnings(false);
    tidy.setXmlTags(false);           // treat the input as HTML, not XML
    tidy.setInputEncoding("UTF-8");
    tidy.setOutputEncoding("UTF-8");
    tidy.setXHTML(true);              // emit XHTML
    tidy.setMakeClean(true);
    Document xmlDoc = tidy.parseDOM(fis, null);
    try {
        tidy.pprint(xmlDoc, new FileOutputStream("c.xhtml"));
    } catch (Exception e) {
        e.printStackTrace();          // do not swallow write errors silently
    }

Answer 1

Accepted answer by mohammad

I created a function that parses the XHTML output, removes the unwanted &nbsp; entities, and adds a link to the CSS file "tableStyle.css":

    // Requires: java.io.*, java.nio.charset.StandardCharsets
    public static String xhtmlparser() {
        StringBuilder cleanText = new StringBuilder();

        // Read the JTidy output as UTF-8 text, line by line.
        try (BufferedReader br = new BufferedReader(new InputStreamReader(
                new FileInputStream("c.xhtml"), StandardCharsets.UTF_8))) {
            String strLine;
            int lineCounter = 0;
            while ((strLine = br.readLine()) != null) {
                // Drop every &nbsp; entity from the line.
                String m = strLine.replace("&nbsp;", "");
                lineCounter++;
                // After the fifth line, add the stylesheet link.
                if (lineCounter == 5) {
                    m = m + "\n" + "<link rel=\"stylesheet\" type=\"text/css\" href=\"tableStyle.css\" />";
                }
                cleanText.append(m).append("\n");
            }
        } catch (IOException e) {
            e.printStackTrace();
        }

        return cleanText.toString();
    }

But is this approach good from a performance standpoint?

By the way, it works well.
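
On the performance question, the line-by-line approach above can be sketched as a single in-memory pass. This is only a minimal sketch, assuming the whole document fits in memory; the class name XhtmlPostProcessor and the method clean are hypothetical, not part of JTidy. It also swaps the "fifth line" rule for inserting the link just before the closing head tag, which does not depend on how JTidy wraps its output.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of JTidy): post-processes XHTML text in memory.
final class XhtmlPostProcessor {

    // Matches the closing head tag regardless of letter case.
    private static final Pattern HEAD_CLOSE =
            Pattern.compile("</head>", Pattern.CASE_INSENSITIVE);

    // Removes every &nbsp; entity and inserts a stylesheet link before </head>.
    // If no </head> is found, only the entity removal is applied.
    static String clean(String xhtml, String cssHref) {
        String withoutNbsp = xhtml.replace("&nbsp;", "");
        String link = "<link rel=\"stylesheet\" type=\"text/css\" href=\""
                + cssHref + "\" />";
        Matcher m = HEAD_CLOSE.matcher(withoutNbsp);
        if (!m.find()) {
            return withoutNbsp;
        }
        return withoutNbsp.substring(0, m.start())
                + link + "\n"
                + withoutNbsp.substring(m.start());
    }

    public static void main(String[] args) {
        String in = "<html><head><title>t</title></head>"
                + "<body><p>a&nbsp;b</p></body></html>";
        System.out.println(clean(in, "tableStyle.css"));
    }
}
```

String.replace treats "&nbsp;" as a literal rather than a regular expression, and the result is assembled once instead of growing a String inside a loop. Note that deleting the entity also deletes the space it stood for; replacing it with the numeric reference &#160; would keep the space while remaining valid XML.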

Answer 2

Answer by Christian

I only had success when the input was treated as XML as well. So either set xmltags to true

    tidy.setXmlTags(true);

and live with the errors and warnings, or do the conversion twice: a first conversion to sanitize the HTML (HTML to XHTML), then a second conversion from XHTML to XHTML with xmltags set, so that no errors or warnings occur.

    String htmlFileName = "test.html";

    // First pass: sanitize the HTML and write it out as XHTML.
    try (InputStream in = Thread.currentThread().getContextClassLoader().getResourceAsStream(htmlFileName);
         FileOutputStream fos = new FileOutputStream("tmp.xhtml")) {
        Tidy tidy = new Tidy();
        tidy.setShowWarnings(true);
        tidy.setInputEncoding("UTF-8");
        tidy.setOutputEncoding("UTF-8");
        tidy.setXHTML(true);
        tidy.setMakeClean(true);
        Document xmlDoc = tidy.parseDOM(in, fos);
    } catch (Exception e) {
        e.printStackTrace();
    }

    // Second pass: read the XHTML back as XML and pretty-print it.
    try (InputStream in = new FileInputStream("tmp.xhtml");
         FileOutputStream fos = new FileOutputStream("c.xhtml")) {
        Tidy tidy = new Tidy();
        tidy.setShowWarnings(true);
        tidy.setXmlTags(true);
        tidy.setInputEncoding("UTF-8");
        tidy.setOutputEncoding("UTF-8");
        tidy.setXHTML(true);
        tidy.setMakeClean(true);
        Document xmlDoc = tidy.parseDOM(in, null);
        tidy.pprint(xmlDoc, fos);
    } catch (Exception e) {
        e.printStackTrace();
    }

I used the latest JTidy version, 938.

Answer 3

Answer by Tanmay kumar shaw

You can use the following method to get XHTML from HTML:

    public static String getXHTMLFromHTML(String inputFile,
            String outputFile) throws Exception {

        // try-with-resources closes both streams, even if parsing fails.
        try (InputStream is = new FileInputStream(new File(inputFile));
             FileOutputStream fos = new FileOutputStream(outputFile)) {
            Tidy tidy = new Tidy();
            tidy.setXHTML(true);    // emit XHTML instead of HTML
            tidy.parse(is, fos);    // parse the input and write the result to fos
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }

        return outputFile;
    }
