Original source: http://stackoverflow.com/questions/3132257/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Convert breaks and paragraph breaks into new lines in Java
Asked by user91954
Basically I have an HTML fragment with <br> and <p></p> inside. I was able to remove all the HTML tags, but doing so leaves the text badly formatted.
I want something like nl2br() in PHP, except with the input and output reversed and with <p> tags taken into account. Is there a library for this in Java?
Answered by BalusC
You basically need to replace each <br> with \n and each <p> with \n\n. So, at the points where you succeed in removing them, you need to insert \n and \n\n respectively.
Here's a kickoff example using the Jsoup HTML parser (the HTML example is intentionally written so that it is hard, if not nearly impossible, to use regex for this).
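    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;

    public class Br2NlDemo { // class name is arbitrary; the original shows only the methods

        public static void main(String[] args) throws Exception {
            String originalHtml = "<p>p1l1<br/><!--</p>-->p1l2<br><!--<p>--></br><p id=p>p2l1<br class=b>p2l2</p>";
            String text = br2nl(originalHtml);
            String newHtml = nl2br(text);

            System.out.println(originalHtml); // the output below starts with the input HTML
            System.out.println("-------------");
            System.out.println(text);
            System.out.println("-------------");
            System.out.println(newHtml);
        }

        public static String br2nl(String html) {
            Document document = Jsoup.parse(html);
            // Insert literal "\n" markers (backslash + n); real newlines would be
            // normalized to spaces by text().
            document.select("br").append("\\n");
            document.select("p").prepend("\\n\\n");
            // Strip the tags, then turn the markers into real newlines.
            return document.text().replace("\\n", "\n");
        }

        public static String nl2br(String text) {
            // Handle paragraph breaks first so "\n\n" is not consumed as two <br>.
            return text.replace("\n\n", "<p>").replace("\n", "<br>");
        }
    }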
(Note: replaceAll() is unnecessary here, as we just want a simple charsequence-by-charsequence replacement, not a regex-pattern-by-charsequence replacement.)
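The difference between the two methods shows up as soon as the search string contains a character that is special in a regex, such as a backslash. A minimal sketch, assuming a marker made of a literal backslash followed by n (the class and method names are made up for illustration):

```java
class ReplaceVsReplaceAll {
    // Literal replacement: the two characters '\' and 'n' become one newline.
    static String literal(String s) {
        return s.replace("\\n", "\n");
    }

    // Regex replacement: the pattern \n matches a real newline character,
    // so a literal backslash followed by 'n' is left untouched.
    static String regex(String s) {
        return s.replaceAll("\\n", "\n");
    }

    public static void main(String[] args) {
        String marked = "line1\\nline2"; // backslash + 'n', no real newline
        System.out.println(literal(marked).equals("line1\nline2")); // true
        System.out.println(regex(marked).equals(marked));           // true
    }
}
```

With replaceAll() the marker would have to be written as "\\\\n" to be matched, which is easy to get wrong; replace() avoids the extra escaping.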
Output:
<p>p1l1<br/><!--</p>-->p1l2<br><!--<p>--></br><p id=p>p2l1<br class=b>p2l2</p>
-------------
p1l1
p1l2
p2l1
p2l2
-------------
<p>p1l1 <br>p1l2 <br> <br> <p>p2l1 <br>p2l2
A bit hacky, but it works.
Answered by Andreas Dolk
br2nl and p2nl are not too complicated. Give this a try:
String plain = htmlText.replaceAll("<br>","\n").replaceAll("<p>","\n\n").replaceAll("</p>","");
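Note that this one-liner matches only the exact strings <br>, <p> and </p>, so variants such as <br/>, <BR> or <p id=p> pass through unchanged. A slightly more tolerant sketch using only java.util.regex might look like the following; the class and method names are hypothetical, and it still does not understand HTML comments or malformed markup, which is where a real parser is the safer choice.

```java
import java.util.regex.Pattern;

class TagsToNewlines {
    // <br>, <br/>, <br class=b>, in any letter case.
    private static final Pattern BR =
            Pattern.compile("<br\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    // <p> with optional attributes; \b keeps <pre> from matching.
    private static final Pattern P_OPEN =
            Pattern.compile("<p\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    private static final Pattern P_CLOSE =
            Pattern.compile("</p\\s*>", Pattern.CASE_INSENSITIVE);

    static String toPlain(String html) {
        String s = BR.matcher(html).replaceAll("\n");   // each break -> one newline
        s = P_OPEN.matcher(s).replaceAll("\n\n");       // each paragraph start -> blank line
        return P_CLOSE.matcher(s).replaceAll("");       // closing tags are dropped
    }

    public static void main(String[] args) {
        System.out.println(toPlain("<p>first<br/>second<BR class=x>third</p>"));
    }
}
```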
Answered by Joelio
You should be able to use replaceAll. See http://www.rgagnon.com/javadetails/java-0454.html for an example. You need just two of those, one for p and one for br. The example goes the other way, but you can turn it around to replace the HTML tags with \n.

