Java: extract words that start with a specific character from a string

Question

Asked by Devendra Singh

I have the following string:

 String line = "#food was testy. #drink lots of. #night was fab. #three #four";

I want to extract #food, #drink, #night, #three and #four from it.

I tried this code:

    String[] words = line.split("#");
    for (String word: words) {
        System.out.println(word);
    }

But it gives food was testy, drink lots of, night was fab, three and four.

Answer 1

Answered by Orace

split only cuts the whole string wherever it finds a #. That explains your current result.
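To make that behaviour visible, here is a minimal sketch that prints each piece returned by split between brackets. The class name SplitDemo and the helper pieces are made up for this illustration; only String.split comes from the original code.

```java
class SplitDemo {
    // Returns the raw pieces produced by splitting on '#', unchanged.
    static String[] pieces(String line) {
        return line.split("#");
    }

    public static void main(String[] args) {
        String line = "#food was testy. #drink lots of. #night was fab. #three #four";
        // Brackets make empty pieces and trailing spaces visible.
        for (String piece : pieces(line)) {
            System.out.println("[" + piece + "]");
        }
    }
}
```

Because the line starts with #, the first piece is an empty string, and every other piece keeps all the text up to the next #, not just the tag.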

You could extract the first word of each piece, but the right tool for this task is a regular expression (regex).
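For comparison, the split-based idea could be sketched roughly as follows. This is only an illustration, not part of the original answer: the class FirstWordPerPiece and the method tags are hypothetical names, and "word" is assumed here to mean everything up to the first whitespace.

```java
import java.util.ArrayList;
import java.util.List;

class FirstWordPerPiece {
    // Splits on '#', then keeps the first whitespace-delimited word of each piece.
    static List<String> tags(String line) {
        List<String> result = new ArrayList<>();
        String[] pieces = line.split("#");
        // pieces[0] is the text before the first '#', so it is never a tag.
        for (int i = 1; i < pieces.length; i++) {
            // Limit 2: only the first word needs to be separated from the rest.
            String first = pieces[i].split("\\s+", 2)[0];
            if (!first.isEmpty()) {
                result.add("#" + first);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "#food was testy. #drink lots of. #night was fab. #three #four";
        System.out.println(tags(line));
    }
}
```

Unlike a regex on word characters, this sketch keeps punctuation glued to a tag (for example "#night." would be returned with the dot), which is one reason the regex approach is usually simpler.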

Here is how you can achieve it:

String line = "#food was testy. #drink lots of. #night was fab. #three #four";

Pattern pattern = Pattern.compile("#\\w+"); // "\\w" in Java source is the regex \w

Matcher matcher = pattern.matcher(line);
while (matcher.find())
{
    System.out.println(matcher.group());
}

Output is:

#food
#drink
#night
#three
#four

The magic happens in "#\\w+".

# the pattern starts with a literal #
\w matches any letter (a-z, A-Z), digit (0-9), or underscore.
+ matches one or more consecutive \w characters.

So we search for text that starts with # followed by one or more letters, digits or underscores.

In the Java string literal we write '\\' to get a single '\', because the backslash introduces an escape sequence.
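A small sketch of what the doubled backslash does, assuming nothing beyond the standard String API (the class name EscapeDemo and the helper regexSource are invented for the example). Writing "#\w+" with a single backslash would not compile, because \w is not a valid Java escape sequence.

```java
class EscapeDemo {
    // The Java literal "#\\w+" holds four characters: '#', '\', 'w', '+'.
    static String regexSource() {
        return "#\\w+";
    }

    public static void main(String[] args) {
        String regex = regexSource();
        System.out.println(regex);          // prints #\w+
        System.out.println(regex.length()); // prints 4
    }
}
```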

You can play with it here.

find and group are explained here:

The find method scans the input sequence looking for the next subsequence that matches the pattern.
group() returns the input subsequence matched by the previous match.
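As a self-contained sketch of that find/group loop, the class below collects the matches into a list and also prints where each match sits in the input using Matcher.start() and Matcher.end(). The class name HashtagFinder and the method findTags are hypothetical; the pattern is the one from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HashtagFinder {
    // Compiled once and reused; Pattern instances are immutable.
    private static final Pattern TAG = Pattern.compile("#\\w+");

    // Collects every match of the pattern, in order of appearance.
    static List<String> findTags(String line) {
        List<String> tags = new ArrayList<>();
        Matcher matcher = TAG.matcher(line);
        while (matcher.find()) {          // advance to the next match, if any
            tags.add(matcher.group());    // text of the match just found
        }
        return tags;
    }

    public static void main(String[] args) {
        String line = "#food was testy. #drink lots of. #night was fab. #three #four";
        Matcher matcher = TAG.matcher(line);
        while (matcher.find()) {
            // start() is inclusive, end() is exclusive.
            System.out.println(matcher.group() + " at [" + matcher.start() + ", " + matcher.end() + ")");
        }
        System.out.println(findTags(line));
    }
}
```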

[edit]

Using \w can be an issue if you need to detect accented or non-Latin characters.

For example, in:

"Bonjour mon #bébé #chat."

The matches will be:

#b
#chat

It depends on what you accept as a valid hashtag. But that is another question, and multiple discussions exist about it.

For example, if you want letters from any language, #\p{L}+ looks good, but the underscore is not in it...
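The sketch below compares three options on the French example. The first is the original ASCII-only pattern. The other two are possibilities rather than the answer's recommendation: a character class that adds digits (\p{N}) and the underscore to \p{L}, and the standard Pattern.UNICODE_CHARACTER_CLASS flag, which makes \w Unicode-aware. The class name UnicodeHashtags and the helper findAll are invented for the example, and the accented letters are written as \u00e9 escapes so the source does not depend on file encoding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeHashtags {
    // Original pattern: \w covers only a-z, A-Z, 0-9 and '_' by default.
    static final Pattern ASCII_WORD = Pattern.compile("#\\w+");
    // Any Unicode letter or number, plus the underscore.
    static final Pattern LETTERS_DIGITS_UNDERSCORE = Pattern.compile("#[\\p{L}\\p{N}_]+");
    // Same regex as the original, but with Unicode-aware \w.
    static final Pattern UNICODE_WORD =
            Pattern.compile("#\\w+", Pattern.UNICODE_CHARACTER_CLASS);

    // Returns all matches of the given pattern in the text.
    static List<String> findAll(Pattern pattern, String text) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "Bonjour mon #b\u00e9b\u00e9 #chat.";
        System.out.println(findAll(ASCII_WORD, text));
        System.out.println(findAll(LETTERS_DIGITS_UNDERSCORE, text));
        System.out.println(findAll(UNICODE_WORD, text));
    }
}
```

With the ASCII pattern the first tag is cut after "#b"; the other two patterns return the whole tag. Which characters should count as part of a hashtag remains a design decision.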

Answer 2

Answered by Jitesh Upadhyay

Proceed as follows:

        String candidate = "#food was testy. #drink lots of. #night was fab. #three #four";

        String regex = "#\\w+"; // "\\w" in Java source is the regex \w
        Pattern p = Pattern.compile(regex);

        Matcher m = p.matcher(candidate);
        String val = null;

        System.out.println("INPUT: " + candidate);

        System.out.println("REGEX: " + regex + "\r\n");

        while (m.find()) {
          val = m.group();
          System.out.println("MATCH: " + val);
        }
        if (val == null) {
          System.out.println("NO MATCHES: ");
        }

This gives the following output (I ran and tested the program in the NetBeans IDE):

INPUT: #food was testy. #drink lots of. #night was fab. #three #four

REGEX: #\w+

MATCH: #food

MATCH: #drink

MATCH: #night

MATCH: #three

MATCH: #four

You will need the following imports:

import java.util.regex.Matcher;
import java.util.regex.Pattern;
