Java: detecting tab spaces and next-line markup symbols in a text file

Question

Asked by aneuryzm

I need to parse a raw text file with one item per line and tab-delimited fields.

How can I detect tab characters and line breaks in a plain text document? I was thinking of using the Java APIs for this, but if you know of a faster, easy-to-use language for text parsing, please let me know.

thanks

Answer 1

Answered by Jigar Joshi

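public class TabNewlineDemo {
    public static void main(String[] args) {
        String str = "Hello\tworld\nHello Universe";
        System.out.println(str);
        System.out.println(str.contains("\t")); // true: the string contains a tab
        System.out.println(str.indexOf("\t"));  // 5: position of the first tab
        System.out.println(str.contains("\n")); // true: the string contains a newline
        System.out.println(str.indexOf("\n"));  // 11: position of the first newline
    }
}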

Output:

Hello        world
Hello Universe
true
5
true
11
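
Detecting the characters is usually only the first step; for the tab-delimited format in the question, each line also has to be cut into fields. Below is a minimal sketch of that step (the class and method names are illustrative, not from the answer) using String.split with a limit of -1, so that empty fields between adjacent tabs, or at the end of a line, are kept instead of being silently dropped:

```java
import java.util.Arrays;
import java.util.List;

public class TabFieldSplitter {

    // Splits one line of a tab-delimited file into its fields.
    // The negative limit keeps trailing empty strings, so "a\tb\t" yields 3 fields.
    static List<String> splitFields(String line) {
        return Arrays.asList(line.split("\t", -1));
    }

    public static void main(String[] args) {
        String line = "id42\tWidget\t\t9.99";
        List<String> fields = splitFields(line);
        System.out.println(fields.size()); // 4: the empty third field is preserved
        System.out.println(fields);        // [id42, Widget, , 9.99]
    }
}
```

Without the -1, "a\tb\t".split("\t") returns only two elements, because split(regex) discards trailing empty strings by default.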

Answer 2

Answered by Dead Programmer

You can try this

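import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Scanner;

public class TabFileParser {
    static void parse(File file1) {
        // try-with-resources closes the reader even if an exception is thrown
        try (BufferedReader br = new BufferedReader(new FileReader(file1))) {
            String strLine;
            // Call readLine() once per iteration; it returns null at end of file
            while ((strLine = br.readLine()) != null) {
                Scanner str = new Scanner(strLine);
                str.useDelimiter("\t");
                while (str.hasNext()) {
                    String field = str.next(); // one tab-delimited field
                    // process the field here
                }
                str.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}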
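
As a variation on the loop above, here is a sketch that collects every row's fields into a list, using String.split in place of Scanner. It assumes the file is UTF-8 encoded; TsvReader and readRows are hypothetical names chosen for illustration:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TsvReader {

    // Reads a tab-delimited file (assumed UTF-8) into one List<String> per line.
    static List<List<String>> readRows(Path file) throws IOException {
        List<List<String>> rows = new ArrayList<>();
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            // readLine() strips "\n", "\r" or "\r\n", so line endings need no extra handling
            while ((line = br.readLine()) != null) {
                rows.add(Arrays.asList(line.split("\t", -1)));
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Write a small sample file so the example is self-contained
        Path tmp = Files.createTempFile("sample", ".tsv");
        Files.write(tmp, "a\tb\tc\r\nd\t\tf\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(readRows(tmp)); // [[a, b, c], [d, , f]]
        Files.delete(tmp);
    }
}
```

Reading the line into a variable inside the while condition means each line is consumed exactly once, and both Unix and Windows line endings are handled by readLine() itself.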

Answer 3

Answered by Daniel

You can use the Guava library from Google.
Have a look at the CharMatcher class and Guava's slides.

Here is an example:

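import static org.hamcrest.CoreMatchers.is;
import static org.junit.Assert.assertThat;

import com.google.common.base.CharMatcher;
import org.junit.Test;

public class GuavaMatcherTest {

    @Test
    public void testGuavaMatcher() {
        String str = "Hello\tworld\nHello Universe";

        CharMatcher tabMatcher = CharMatcher.is('\t');
        CharMatcher newLineMatcher = CharMatcher.is('\n');

        // indexIn gives the position of the first match, matchesAnyOf whether one exists
        assertThat(tabMatcher.indexIn(str), is(5));
        assertThat(tabMatcher.matchesAnyOf(str), is(true));
        assertThat(newLineMatcher.indexIn(str), is(11));
        assertThat(newLineMatcher.matchesAnyOf(str), is(true));

        // or() combines both matchers; removeFrom deletes every matching character
        CharMatcher tabAndNewLineMatcher = tabMatcher.or(newLineMatcher);

        assertThat(tabAndNewLineMatcher.removeFrom(str), is("HelloworldHello Universe"));
    }
}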

You can also have a look at the CharMatcher.BREAKING_WHITESPACE constant (replaced by the CharMatcher.breakingWhitespace() method in newer Guava releases).
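
If adding Guava as a dependency is not an option, the checks from the test above can be approximated with the JDK alone. This is a sketch, not a Guava API: String.indexOf stands in for indexIn, and a regex character class stands in for tabMatcher.or(newLineMatcher).removeFrom(...). The class and method names are made up for illustration:

```java
public class JdkTabNewlineMatcher {

    // Rough JDK counterpart of tabMatcher.or(newLineMatcher).removeFrom(str):
    // deletes every tab and line feed character.
    static String removeTabsAndNewlines(String str) {
        return str.replaceAll("[\t\n]", "");
    }

    public static void main(String[] args) {
        String str = "Hello\tworld\nHello Universe";
        System.out.println(str.indexOf('\t'));           // 5, like tabMatcher.indexIn(str)
        System.out.println(str.indexOf('\n') >= 0);      // true, like newLineMatcher.matchesAnyOf(str)
        System.out.println(removeTabsAndNewlines(str));  // HelloworldHello Universe
    }
}
```

Like the Guava matchers in the test, this only targets '\t' and '\n'; a carriage return from a Windows-style line ending would survive.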

Answer 4

Answered by Andrew Thompson

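Text files do not have 'markup' as such. Get each line using BufferedReader.readLine(), which returns the line without its terminator, so the newlines are already handled by splitting the input into lines. Tabs can be found by searching each line for "\t".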
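
A short sketch of this approach, reading from a StringReader so it runs without a file (a FileReader or Files.newBufferedReader would take its place for real input). TabLocator and findTabs are illustrative names; the sketch records every tab as a line:column pair, with 1-based line numbers and 0-based columns:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class TabLocator {

    // Returns "line:column" for every tab; lines are 1-based, columns 0-based.
    static List<String> findTabs(BufferedReader reader) throws IOException {
        List<String> positions = new ArrayList<>();
        String line;
        int lineNo = 0;
        // readLine() returns the line without its terminator, or null at end of input
        while ((line = reader.readLine()) != null) {
            lineNo++;
            int col = line.indexOf('\t');
            while (col >= 0) {
                positions.add(lineNo + ":" + col);
                col = line.indexOf('\t', col + 1); // continue searching after this tab
            }
        }
        return positions;
    }

    public static void main(String[] args) throws IOException {
        String text = "name\tage\nbob\t42\nno tabs here\n";
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            System.out.println(findTabs(br)); // [1:4, 2:3]
        }
    }
}
```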
