JTidy java API toConvert HTML to XHTML

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/15063870/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-10-31 18:22:09  来源:igfitidea点击:

JTidy java API toConvert HTML to XHTML

javahtmlxhtmljtidy

提问by mohammad

I am using JTidy to convert from HTML to XHTML but I found in my XHTML file this tag  . Can i prevent it ?
this is my code

我正在使用 JTidy 从 HTML 转换为 XHTML,但我在我的 XHTML 文件中找到了这个标签 。我可以预防吗?
这是我的代码

    //from html to xhtml
   try   
    {  
        fis = new FileInputStream(htmlFileName);  
    }  
    catch (java.io.FileNotFoundException e)   
    {  
        System.out.println("File not found: " + htmlFileName);  
    }  
        Tidy tidy = new Tidy(); 
        tidy.setShowWarnings(false);
        tidy.setXmlTags(false);
        tidy.setInputEncoding("UTF-8");
        tidy.setOutputEncoding("UTF-8");
        tidy.setXHTML(true);// 
        tidy.setMakeClean(true);
        Document xmlDoc = tidy.parseDOM(fis, null);  
    try  
    {  
        tidy.pprint(xmlDoc,new FileOutputStream("c.xhtml"));  
    }  
    catch(Exception e)  
    {  
    }

采纳答案by mohammad

i created a function that parse the the xhtml code and remove the unwelcome tags and to add a link to the css File "tableStyle.css"

我创建了一个函数来解析 xhtml 代码并删除不受欢迎的标签并添加到 css 文件“tableStyle.css”的链接

    public static  String xhtmlparser(){ 
    String Cleanline="";

    try { 
        // the file url
        FileInputStream fstream = new FileInputStream("c.xhtml");
        // Use DataInputStream to read binary NOT text.
        BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
        String strLine = null;
        int linescounter=0;
        while ((strLine = br.readLine()) != null)   {// read every line in the file             
            String m=strLine.replaceAll(" ", "");
            linescounter++;
            if(linescounter==5)
                m=m+"\n"+ "<link rel="+ "\"stylesheet\" "+"type="+ "\"text/css\" "+"href= " +"\"tableStyle.css\""+ "/>";
            Cleanline+=m+"\n";
        }

    }
    catch(IOException e){}

    return Cleanline;
}

but as a performance issue is it good?

但作为一个性能问题,它好吗?

by the way it works will

顺便说一下,它的工作原理将

回答by Christian

I had only success, when the input is treated as XML as well. So either set xmltags to true

当输入也被视为 XML 时,我只取得了成功。所以要么将 xmltags 设置为 true

 tidy.setXmlTags(true);

and live with the errors and warnings or do the conversion twice. First conversion to sanitize the html (html to xhtml) and a second conversion from xhtml to xhtml with set xmltags, thus no errors and warnings occur.

并忍受错误和警告,或者进行两次转换。第一次转换以清理 html(html 到 xhtml),然后使用设置的 xmltags 从 xhtml 到 xhtml 的第二次转换,因此不会发生错误和警告。

        String htmlFileName = "test.html";
    try( InputStream in = Thread.currentThread().getContextClassLoader().getResourceAsStream(htmlFileName);
         FileOutputStream fos = new FileOutputStream("tmp.xhtml");) {
        Tidy tidy = new Tidy();
        tidy.setShowWarnings(true);
        tidy.setInputEncoding("UTF-8");
        tidy.setOutputEncoding("UTF-8");
        tidy.setXHTML(true);
        tidy.setMakeClean(true);
        Document xmlDoc = tidy.parseDOM(in, fos);
    } catch (Exception e) {
        e.printStackTrace();
    }

    try( InputStream in = new FileInputStream("tmp.xhtml");
         FileOutputStream fos = new FileOutputStream("c.xhtml");) {
        Tidy tidy = new Tidy();
        tidy.setShowWarnings(true);
        tidy.setXmlTags(true);
        tidy.setInputEncoding("UTF-8");
        tidy.setOutputEncoding("UTF-8");
        tidy.setXHTML(true);
        tidy.setMakeClean(true);
        Document xmlDoc = tidy.parseDOM(in, null);
        tidy.pprint(xmlDoc, fos);
    } catch (Exception e) {
        e.printStackTrace();
    }

I used the latest jtidy version 938.

我使用了最新的 jtidy 版本 938。

回答by Tanmay kumar shaw

You can use the following method to get xhtml from html

您可以使用以下方法从 html 中获取 xhtml

public static String getXHTMLFromHTML(String inputFile,
            String outputFile) throws Exception {

        File file = new File(inputFile);
        FileOutputStream fos = null;
        InputStream is = null;
        try {
            fos = new FileOutputStream(outputFile);
            is = new FileInputStream(file);
            Tidy tidy = new Tidy(); 
            tidy.setXHTML(true); 
            tidy.parse(is, fos);
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }finally{
            if(fos != null){
                try {
                    fos.close();
                } catch (IOException e) {
                    fos = null;
                }
                fos = null;
            }
            if(is != null){
                try {
                    is.close();
                } catch (IOException e) {
                    is = null;
                }
                is = null;
            }
        }

        return outputFile;
    }