Java: split a text file into Strings on empty lines
Original question: http://stackoverflow.com/questions/10065885/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverFlow
Split text file into Strings on empty line
Asked by Sunny
I want to read a local txt file and then split its whole text into Strings, as in the example below.
Example: let's say the file contains
abcdef
ghijkl

aededd
ededed

ededfe
efefeef
efefeff
......
......
I want to split this text into Strings:
s1 = "abcdef" + "\n" + "ghijkl";
s2 = "aededd" + "\n" + "ededed";
s3 = "ededfe" + "\n" + "efefeef" + "\n" + "efefeff";
........................
I mean I want to split the text on empty lines.
I do know how to read a file. I want help with splitting the text into strings.
Answered by Kevin
You can split a string into an array with
String.split();
If you want to split on empty lines, that is, on two consecutive newlines, it will be
String.split("\n\n");
UPDATE
If I understand what you are saying, then your code will essentially be
BufferedReader in
    = new BufferedReader(new FileReader("foo.txt"));
List<String> allStrings = new ArrayList<String>();
String str = "";
while (true)
{
    String tmp = in.readLine();
    if (tmp.isEmpty())
    {
        // empty line: store the block collected so far
        if (!str.isEmpty())
        {
            allStrings.add(str);
        }
        str = "";
    }
    else if (tmp == null)
    {
        break;
    }
    else
    {
        // non-empty line: append it to the current block
        if (str.isEmpty())
        {
            str = tmp;
        }
        else
        {
            str += "\n" + tmp;
        }
    }
}
This might be what you are trying to parse, where allStrings is a list of all of your strings.
Answered by Pushpak Dagade
The code below works even if there are several consecutive empty lines between blocks of useful data.
import java.util.regex.*;
// read your file and store it in a string named str_file_data
// if your text file uses \r\n as the newline sequence, use Pattern.compile("\r\n[\r\n]+") instead
Pattern p = Pattern.compile("\n[\n]+");
String[] result = p.split(str_file_data);
(I did not test the code, so there could be typos.)
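As a minimal sketch of the regex approach above, the following wraps the split in a small helper. It assumes the file fits in memory, is UTF-8 encoded, and uses "\n" line endings. The class name BlankLineSplitter and the method splitText are illustrative names, not part of the original answer.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
final class BlankLineSplitter {
    // One newline followed by one or more newlines, i.e. at least one empty line.
    private static final Pattern BLANK_LINES = Pattern.compile("\n[\n]+");

    // Splits already-loaded text into blocks separated by empty lines.
    static String[] splitText(String text) {
        return BLANK_LINES.split(text);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: BlankLineSplitter <file>");
            return;
        }
        // Assumption: the file named by the first argument is UTF-8 text.
        String text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        for (String block : splitText(text)) {
            System.out.println("[" + block + "]");
        }
    }
}
```

A single trailing newline at the end of the file is not removed by this pattern, so the last block may still end with "\n".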
Answered by grayswander
I would suggest a more general regexp:
text.split("(?m)^\\s*$");
In this case it works with any end-of-line convention, and it also treats empty lines and whitespace-only lines the same way.
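One practical detail with this pattern: the match covers only the content of the blank line, not the line breaks around it, so the resulting pieces keep a leading or trailing newline, and several blank lines in a row can produce empty pieces. The sketch below is one way to tidy that up by trimming each piece and dropping empty ones. The class and method names are hypothetical, and trim() also removes leading and trailing spaces of each block, which is assumed to be acceptable here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; names are illustrative, not from the original answer.
final class BlankOrWhitespaceSplitter {
    // Splits on lines that are empty or contain only whitespace,
    // then trims each piece and discards pieces that end up empty.
    static List<String> splitBlocks(String text) {
        List<String> blocks = new ArrayList<>();
        for (String piece : text.split("(?m)^\\s*$")) {
            String block = piece.trim();
            if (!block.isEmpty()) {
                blocks.add(block);
            }
        }
        return blocks;
    }

    public static void main(String[] args) {
        // The second separator line below contains a single space.
        for (String block : splitBlocks("abcdef\nghijkl\n\naededd\n \nededfe")) {
            System.out.println("[" + block + "]");
        }
    }
}
```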
Answered by Godwin
It may depend on how the file is encoded, so I would likely do the following:
String.split("(\n\r|\n|\r){2}");
Some text files encode newlines as "\n\r" while others may use simply "\n". Two newlines in a row mean you have an empty line.
Answered by Brian
Godwin was on the right track, but I think we can make this work a bit better. Using '[ ]' in a regex is an "or", so in his example a single \r\n, which is just one line break and not an empty line, would also match: the regular expression would split it on both the \r and the \n. I believe in the example we are looking for an empty line, which requires either \n\r\n\r, \r\n\r\n, \n\r\r\n, \r\n\n\r, \n\n, or \r\r.
So first we want to look for either \n\r or \r\n twice, with any combination of the two being possible:
String.split("((\\n\\r)|(\\r\\n)){2}");
Next we need to look for \r on its own, without a \n after it, twice in a row:
String.split("(\\r){2}");
Lastly, let's do the same for \n:
String.split("(\\n){2}");
All together, that should be
String.split("((\\n\\r)|(\\r\\n)){2}|(\\r){2}|(\\n){2}");
Note that this works only for the very specific case of newlines and carriage returns. In Ruby you can do the following, which would encompass more cases. I don't know if there is an equivalent in Java.
.match($^$)
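One way to avoid enumerating every combination of \r and \n in Java is to normalize the line endings first and then split on runs of "\n". This is only a sketch under stated assumptions, not a direct equivalent of the Ruby snippet: it assumes the input uses the "\r\n", "\r", or "\n" conventions, and that rewriting the line endings inside each block to "\n" is acceptable. The names NormalizingSplitter and splitOnEmptyLines are hypothetical.

```java
// Hypothetical helper; names are illustrative only.
final class NormalizingSplitter {
    // Converts "\r\n" and lone "\r" to "\n", then splits wherever
    // two or more newlines appear in a row (one or more empty lines).
    static String[] splitOnEmptyLines(String text) {
        String normalized = text.replace("\r\n", "\n").replace('\r', '\n');
        return normalized.split("\n{2,}");
    }

    public static void main(String[] args) {
        // Windows-style input: a single "\r\n" stays inside a block,
        // while "\r\n\r\n" separates two blocks.
        for (String block : splitOnEmptyLines("abcdef\r\nghijkl\r\n\r\naededd")) {
            System.out.println("[" + block + "]");
        }
    }
}
```

Because "\r\n" is replaced as a unit before anything else, a single Windows line break is never mistaken for an empty line. A "\n\r" sequence, by contrast, would be treated as two line breaks by this sketch.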
Answered by dna
@Kevin's code works fine once corrected. As he mentioned, the code was not tested, so here are the 3 changes required:
1. The if check for (tmp==null) should come first; otherwise tmp.isEmpty() is called on null at the end of the file and throws a NullPointerException.
2. This code leaves out the last set of lines, which never gets added to the ArrayList. To make sure the last one is added, we have to include this code after the while loop: if(!str.isEmpty()) { allStrings.add(str); }
3. The line str += "\n" + tmp; should use \n instead of \\n. The entire code with these changes follows, in case it helps:
import java.io.*;
import java.util.*;
BufferedReader in
    = new BufferedReader(new FileReader("foo.txt"));
List<String> allStrings = new ArrayList<String>();
String str = "";
while (true)
{
    String tmp = in.readLine();
    if (tmp == null)
    {
        // end of file
        break;
    }
    else if (tmp.isEmpty())
    {
        // empty line: store the block collected so far
        if (!str.isEmpty())
        {
            allStrings.add(str);
        }
        str = "";
    }
    else
    {
        // non-empty line: append it to the current block
        if (str.isEmpty())
        {
            str = tmp;
        }
        else
        {
            str += "\n" + tmp;
        }
    }
}
// store the last block, which is not followed by an empty line
if (!str.isEmpty())
{
    allStrings.add(str);
}
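The same line-by-line grouping can be packaged as a reusable method, sketched here under the assumption that the lines have already been read into a List (for example with Files.readAllLines). LineGrouper and groupLines are hypothetical names, and a StringBuilder replaces the repeated string concatenation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; names are illustrative only.
final class LineGrouper {
    // Joins consecutive non-empty lines with "\n" and starts a new block
    // whenever one or more empty lines are encountered.
    static List<String> groupLines(List<String> lines) {
        List<String> blocks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : lines) {
            if (line.isEmpty()) {
                if (current.length() > 0) {
                    blocks.add(current.toString());
                    current.setLength(0);
                }
            } else {
                if (current.length() > 0) {
                    current.append('\n');
                }
                current.append(line);
            }
        }
        // The last block is not followed by an empty line, so add it here.
        if (current.length() > 0) {
            blocks.add(current.toString());
        }
        return blocks;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "abcdef", "ghijkl", "", "aededd", "ededed", "", "", "ededfe", "efefeef", "efefeff");
        System.out.println(groupLines(lines).size()); // prints 3
    }
}
```

With a real file, the call could look like LineGrouper.groupLines(Files.readAllLines(Paths.get("foo.txt"))), where foo.txt is the placeholder file name used in the answers above.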