Java: extract words starting with a particular character from a string

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/29429074/

Time: 2020-11-02 15:15:18  Source: igfitidea

Extract words starting with a particular character from a string

java string extraction

Asked by Devendra Singh

I got the following string:

 String line = "#food was testy. #drink lots of. #night was fab. #three #four";

I want to extract #food, #drink, #night, #three and #four from it.

I tried this code:

    String[] words = line.split("#");
    for (String word: words) {
        System.out.println(word);
    }

But it gives food was testy, drink lots of, night was fab, three and four.

Answered by Orace

split only cuts the whole string wherever it finds a #. That explains your current result.

You could extract the first word of every piece of the string, but the right tool for this task is a regular expression (RegEx).
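
For comparison, the split-based idea can be made to work. The following is only a minimal sketch, not code from the original answer: the class name SplitFirstWords and the method firstWords are hypothetical. It assumes that everything before the first # should be ignored and that a tag ends at the next whitespace.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration; not part of the original answer.
class SplitFirstWords {

    // Splits on '#' and keeps the first whitespace-delimited word of each piece.
    static List<String> firstWords(String line) {
        List<String> tags = new ArrayList<>();
        String[] pieces = line.split("#");
        // pieces[0] is the text before the first '#', so start at index 1.
        for (int i = 1; i < pieces.length; i++) {
            // Limit 2 guarantees at least one element, even for blank pieces.
            String first = pieces[i].split("\\s+", 2)[0];
            if (!first.isEmpty()) {
                tags.add("#" + first);
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        String line = "#food was testy. #drink lots of. #night was fab. #three #four";
        System.out.println(firstWords(line));
    }
}
```

Unlike a regular expression based on word characters, this sketch keeps trailing punctuation, so an input such as "#food." would yield "#food." with the dot included.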

Here is how you can achieve it:

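// Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
String line = "#food was testy. #drink lots of. #night was fab. #three #four";

// '#' followed by one or more word characters; the backslash is doubled in a Java string literal
Pattern pattern = Pattern.compile("#\\w+");

Matcher matcher = pattern.matcher(line);
// find() advances to the next match; group() returns the text of that match
while (matcher.find())
{
    System.out.println(matcher.group());
}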

Output is:

#food
#drink
#night
#three
#four

The magic happens in "#\\w+".

So we search for anything starting with # followed by one or more letters, digits or underscores.

We write '\\' for '\' because the backslash is itself an escape character in Java string literals.

You can play with it here.

find and group are explained here:

  • The find method scans the input sequence looking for the next subsequence that matches the pattern.
  • group() returns the input subsequence matched by the previous match.
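
The interplay of find and group can be sketched as follows. This is an illustrative example rather than part of the original answer: the class name FindGroupDemo and the method hashtags are hypothetical. It also calls Matcher.start() and Matcher.end(), which report where the previous match sits in the input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for illustration; not part of the original answer.
class FindGroupDemo {

    // Compiled once and reused: '#' followed by one or more word characters.
    private static final Pattern HASHTAG = Pattern.compile("#\\w+");

    // Collects every match into a list instead of printing it.
    static List<String> hashtags(String line) {
        List<String> result = new ArrayList<>();
        Matcher matcher = HASHTAG.matcher(line);
        while (matcher.find()) {          // moves to the next match, if any
            result.add(matcher.group());  // text of the match just found
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "#food was testy. #drink lots of. #night was fab. #three #four";
        Matcher matcher = HASHTAG.matcher(line);
        while (matcher.find()) {
            // start() is inclusive, end() is exclusive
            System.out.println(matcher.group() + " at [" + matcher.start() + ", " + matcher.end() + ")");
        }
        System.out.println(hashtags(line));
    }
}
```

Returning a list rather than printing inside the loop is a design choice that makes the result easier to reuse and to test.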

[edit]

Using \w can be an issue if you need to detect accented or non-Latin characters.

For example in:

"Bonjour mon #bébé #chat."

The matches will be:

  • #b
  • #chat

It depends on what you accept as a valid hashtag. But that is another question, and multiple discussions exist about it.

For example, if you want letters from any language, #\p{L}+ looks good, but the underscore is not in it (and neither are digits)...
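
One way to cover letters from any language together with digits and the underscore is a custom character class. The sketch below is an assumption about what a hashtag should contain, not a rule from the original answer; the class name UnicodeHashtags and the method hashtags are hypothetical. It also assumes accented letters are stored as single precomposed characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for illustration; not part of the original answer.
class UnicodeHashtags {

    // '#' followed by Unicode letters (\p{L}), Unicode numbers (\p{N}) or '_'.
    private static final Pattern HASHTAG = Pattern.compile("#[\\p{L}\\p{N}_]+");

    static List<String> hashtags(String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = HASHTAG.matcher(text);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // \u00e9 is the precomposed letter e with an acute accent.
        String text = "Bonjour mon #b\u00e9b\u00e9 #chat.";
        System.out.println(hashtags(text));
    }
}
```

With this class the accented tag is matched in full instead of stopping at the first accented letter. Which characters to allow remains a judgment call that depends on the hashtags you want to accept.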

Answered by Jitesh Upadhyay

Please follow this procedure:

    String candidate = "#food was testy. #drink lots of. #night was fab. #three #four";

    // The backslash must be doubled inside a Java string literal
    String regex = "#\\w+";
    Pattern p = Pattern.compile(regex);

    Matcher m = p.matcher(candidate);
    String val = null;

    System.out.println("INPUT: " + candidate);

    System.out.println("REGEX: " + regex + "\r\n");

    // Print every match; val stays null if nothing was found
    while (m.find()) {
        val = m.group();
        System.out.println("MATCH: " + val);
    }
    if (val == null) {
        System.out.println("NO MATCHES: ");
    }

This gives the following output. I ran and tested the program in the NetBeans IDE.

INPUT: #food was testy. #drink lots of. #night was fab. #three #four
REGEX: #\w+

MATCH: #food
MATCH: #drink
MATCH: #night
MATCH: #three
MATCH: #four

You will need the following imports:

import java.util.regex.Matcher;
import java.util.regex.Pattern;