Java: detecting tab space and next-line markup symbols in text files

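Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/5051194/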

Date: 2020-10-30 09:10:47  Source: igfitidea

Tags: java, parsing

Question by aneuryzm

I need to parse a raw text file that has one item per line, with tab-delimited fields.

How can I detect tab characters and line breaks ("next-line" symbols) in a plain text document? I was thinking of using the Java APIs for it, but if you know a faster, easy-to-use language for text parsing, please let me know.

Thanks

Answer by Jigar Joshi

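public class TabNewlineDemo {
    public static void main(String[] args) {
        String str = "Hello\tworld\nHello Universe";
        System.out.println(str);
        System.out.println(str.contains("\t"));  // is there a tab?
        System.out.println(str.indexOf("\t"));   // index of the first tab
        System.out.println(str.contains("\n"));  // is there a newline?
        System.out.println(str.indexOf("\n"));   // index of the first newline
    }
}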

Output:

Hello        world
Hello Universe
true
5
true
11
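String.contains and String.indexOf only report whether, and where first, a "\n" occurs. Files written on Windows end lines with "\r\n" (and very old Mac files with a lone "\r"), so a minimal sketch that counts every tab and every line break, treating each of those terminators as a single break, might look like this. It is an extension of the answer, not part of it; the class and method names are illustrative.

```java
public class LineBreakScanner {

    /**
     * Returns {tabCount, lineBreakCount} for the given text.
     * "\n" (Unix), "\r\n" (Windows) and a lone "\r" (old Mac) each count as one line break.
     */
    static int[] countTabsAndLineBreaks(String text) {
        int tabs = 0;
        int breaks = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\t') {
                tabs++;
            } else if (c == '\n') {
                breaks++;
            } else if (c == '\r') {
                breaks++;
                // Skip the '\n' of a "\r\n" pair so it is not counted twice
                if (i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;
                }
            }
        }
        return new int[] {tabs, breaks};
    }

    public static void main(String[] args) {
        int[] counts = countTabsAndLineBreaks("Hello\tworld\r\nHello\tUniverse\n");
        System.out.println("tabs=" + counts[0] + ", line breaks=" + counts[1]);
    }
}
```

Running main prints tabs=2, line breaks=2: the "\r\n" pair and the final "\n" are one break each.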

Answer by Dead Programmer

You can try this

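import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Scanner;

// file1 is the File (or path String) of the text file to parse
try (BufferedReader br = new BufferedReader(new FileReader(file1)))
{
    String strLine;
    // Read each line exactly once; readLine() returns null at end of file
    while ((strLine = br.readLine()) != null)
    {
        Scanner str = new Scanner(strLine);
        str.useDelimiter("\t");         // fields are separated by tab characters
        while (str.hasNext())
        {
            String field = str.next();  // one tab-delimited field
            // ... process field ...
        }
    }
} catch (IOException e)
{
    e.printStackTrace();
}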
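As an alternative to creating a Scanner for every line, each line can be split with String.split("\t", -1); the negative limit keeps trailing empty fields, which split("\t") without a limit would discard. This is a minimal sketch: reading from a java.io.Reader instead of a file name is an assumption made so it can run on an in-memory StringReader, and the class name is illustrative.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class TabDelimitedParser {

    /** Reads one record per line from the source and splits each line on tab characters. */
    static List<String[]> parse(Reader source) {
        List<String[]> records = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            // Call readLine() exactly once per iteration; it returns null at end of input
            while ((line = br.readLine()) != null) {
                // A negative limit keeps trailing empty fields, e.g. "2\tBob\t" -> ["2", "Bob", ""]
                records.add(line.split("\t", -1));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        String data = "id\tname\tcity\n1\tAlice\tParis\n2\tBob\t\n";
        for (String[] fields : parse(new StringReader(data))) {
            System.out.println(fields.length + " fields: " + String.join(" | ", fields));
        }
    }
}
```

For a real file you could pass new FileReader(file1), or Files.newBufferedReader(path, StandardCharsets.UTF_8) if you want to state the encoding explicitly.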

Answer by Daniel

You can use Google's Guava library.
Have a look at CharMatcher and at Guava's slides.

Here is an example:

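import static org.hamcrest.CoreMatchers.is;
import static org.junit.Assert.assertThat;

import com.google.common.base.CharMatcher;
import org.junit.Test;

public class GuavaMatcherTest {

    @Test
    public void testGuavaMatcher() {
        String str = "Hello\tworld\nHello Universe";

        CharMatcher tabMatcher = CharMatcher.is('\t');
        CharMatcher newLineMatcher = CharMatcher.is('\n');

        // indexIn returns the index of the first matching character, or -1 if there is none
        assertThat(tabMatcher.indexIn(str), is(5));
        assertThat(tabMatcher.matchesAnyOf(str), is(true));
        assertThat(newLineMatcher.indexIn(str), is(11));
        assertThat(newLineMatcher.matchesAnyOf(str), is(true));

        // or() matches either character; removeFrom deletes every match
        CharMatcher tabAndNewLineMatcher = tabMatcher.or(newLineMatcher);

        assertThat(tabAndNewLineMatcher.removeFrom(str), is("HelloworldHello Universe"));
    }
}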

You can also have a look at the CharMatcher.BREAKING_WHITESPACE constant (newer Guava versions provide the CharMatcher.breakingWhitespace() method instead).

Answer by Andrew Thompson

Text files do not have 'markup' as such. Get each line using BufferedReader.readLine(), which returns the line without its line terminator. Tabs can be found by searching each line for "\t".
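The approach described above can be sketched as follows: read line by line with readLine(), then call indexOf('\t') repeatedly to find every tab. Reporting positions as 1-based "line:column" strings, reading from a String rather than a file, and the class name are all illustrative assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class TabFinder {

    /** Returns "line:column" (both 1-based) for every tab character in the text. */
    static List<String> findTabs(String text) {
        List<String> positions = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            String line;
            int lineNo = 0;
            while ((line = br.readLine()) != null) {
                lineNo++;
                // indexOf(ch, from) returns -1 once no further tab exists on this line
                for (int col = line.indexOf('\t'); col >= 0; col = line.indexOf('\t', col + 1)) {
                    positions.add(lineNo + ":" + (col + 1));
                }
            }
        } catch (IOException e) {
            // StringReader does no real I/O, but readLine() and close() still declare IOException
            throw new UncheckedIOException(e);
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(findTabs("Hello\tworld\nHello Universe\na\tb\tc"));
    }
}
```

For the sample string this prints [1:6, 3:2, 3:4]; the second line has no tabs, so it contributes nothing.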
