Java: extract words starting with a particular character from a string
Original URL: http://stackoverflow.com/questions/29429074/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverFlow
Extract words starting with a particular character from a string
Question by Devendra Singh
I have the following string:
String line = "#food was testy. #drink lots of. #night was fab. #three #four";
I want to extract #food, #drink, #night, #three and #four from it.
I tried this code:
String[] words = line.split("#");
for (String word : words) {
    System.out.println(word);
}
But it gives "food was testy", "drink lots of", "night was fab", "three" and "four".
Answer by Orace
split only cuts the whole string wherever it finds a # (and drops the # itself). That explains your current result.
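To make this concrete, here is a small sketch that prints the array split("#") returns for the question's input (the class and method names are illustrative, not from the question). Note the leading empty string: the input starts with the delimiter, so the text before the first # is "".

```java
import java.util.Arrays;

// Illustrative class: shows what String.split("#") returns for the question's input.
class SplitDemo {
    static String[] pieces(String line) {
        // split treats "#" as a delimiter: it cuts at every '#'
        // and the '#' characters themselves are not kept.
        return line.split("#");
    }

    public static void main(String[] args) {
        String line = "#food was testy. #drink lots of. #night was fab. #three #four";
        // Arrays.toString makes the leading "" and the trailing ". " text visible
        System.out.println(Arrays.toString(pieces(line)));
        // [, food was testy. , drink lots of. , night was fab. , three , four]
    }
}
```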
You could instead extract the first word of each piece, but a better tool for this task is a regular expression (RegEx).
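For comparison, the "first word of each piece" idea could be sketched as follows. This assumes a word ends at whitespace or a period; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: keep split("#"), then take only the first word of each piece.
class FirstWordExtractor {
    static List<String> extract(String line) {
        List<String> tags = new ArrayList<>();
        String[] pieces = line.split("#");
        // Skip index 0: it is the text before the first '#'
        // (an empty string when the line starts with '#').
        for (int i = 1; i < pieces.length; i++) {
            // Assumption: words are separated by whitespace and/or periods.
            String[] words = pieces[i].split("[\\s.]+");
            if (words.length > 0 && !words[0].isEmpty()) {
                tags.add("#" + words[0]); // put the '#' back in front
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        String line = "#food was testy. #drink lots of. #night was fab. #three #four";
        System.out.println(extract(line)); // [#food, #drink, #night, #three, #four]
    }
}
```

This works for the sample input, but every extra separator (commas, '!', and so on) needs another tweak to the inner split pattern, which is why the regex approach is more robust.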
Here is how you can do it with a regular expression:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String line = "#food was testy. #drink lots of. #night was fab. #three #four";
Pattern pattern = Pattern.compile("#\\w+");
Matcher matcher = pattern.matcher(line);
while (matcher.find())
{
    System.out.println(matcher.group());
}
Output is:
#food
#drink
#night
#three
#four
The magic happens in the regex #\w+:
- # : the pattern starts with a #.
- \w : matches any letter (a-z, A-Z), digit (0-9), or underscore.
- + : matches one or more consecutive \w characters.
So we search for anything starting with # followed by one or more letters, digits, or underscores.
In Java source we write '\\' for '\' because of escape sequences, so the regex #\w+ is written "#\\w+" in the string literal.
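A small sketch that ties the escaping and the pattern semantics together: it checks the length of the literal and tests some made-up strings with Matcher.matches(), which (unlike find()) requires the whole input to match. The class name and sample strings are illustrative.

```java
import java.util.regex.Pattern;

// Illustrative class: which whole strings does #\w+ accept?
class PatternCheck {
    // The literal "#\\w+" holds the 4 characters # \ w +, i.e. the regex #\w+.
    // Writing a single backslash before w in the literal does not compile
    // ("illegal escape character").
    static final String REGEX = "#\\w+";
    private static final Pattern HASHTAG = Pattern.compile(REGEX);

    static boolean isHashtag(String s) {
        // matches() requires the ENTIRE string to match the pattern
        return HASHTAG.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(REGEX + " has length " + REGEX.length()); // #\w+ has length 4
        String[] samples = {"#food", "#my_tag2", "#", "food", "#night."};
        for (String s : samples) {
            // prints true, true, false, false, false in that order
            System.out.println(s + " -> " + isHashtag(s));
        }
    }
}
```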
You can play with it here.
find and group are explained here:
- The find method scans the input sequence looking for the next subsequence that matches the pattern.
- group() returns the input subsequence matched by the previous match.
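To see the find/group loop step by step, here is a sketch that also records start() and end(), two other Matcher methods that report where each match lies (end is exclusive). The input is shortened for readability and the class name is illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative class: how repeated find() calls walk through the input.
class FindGroupDemo {
    static List<String> describeMatches(String input) {
        Matcher m = Pattern.compile("#\\w+").matcher(input);
        List<String> out = new ArrayList<>();
        // Each find() resumes scanning after the previous match and
        // returns false once there is no further match.
        while (m.find()) {
            // group() is the matched text; start()/end() give its index range.
            out.add(m.group() + " [" + m.start() + ", " + m.end() + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        describeMatches("#three #four").forEach(System.out::println);
        // #three [0, 6)
        // #four [7, 12)
    }
}
```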
[edit]
Using \w can be an issue if you need to match accented or non-Latin characters, because by default \w in Java only covers [a-zA-Z_0-9].
For example, in:
"Bonjour mon #bébé #chat."
The matches will be:
- #b
- #chat
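The claim above can be checked with a short sketch (the class and helper names are illustrative). The accented letter U+00E9 is written as an escape so the source file stays ASCII-only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative class: default \w stops at accented letters.
class AccentDemo {
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // "Bonjour mon #bebe #chat." with both e's accented (U+00E9)
        String text = "Bonjour mon #b\u00e9b\u00e9 #chat.";
        // \w is ASCII-only by default, so the first match ends before the accented e
        System.out.println(findAll("#\\w+", text)); // [#b, #chat]
    }
}
```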
It depends on what you accept as a valid hashtag. But that is another question, and multiple discussions exist about it.
For example, if you want letters from any language, #\p{L}+ looks good, but it does not include the underscore (or digits)...
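A sketch comparing #\p{L}+ with one possible fix for the missing underscore, the character class [\p{L}_]. That second pattern is an assumption for illustration, not something from the answer; the sample text and names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative class: Unicode-letter hashtags, with and without '_'.
class UnicodeHashtags {
    // \p{L} matches any Unicode letter (but not digits or '_')
    static final Pattern LETTERS_ONLY = Pattern.compile("#\\p{L}+");
    // Hypothetical variant: Unicode letters plus the underscore
    static final Pattern LETTERS_OR_UNDERSCORE = Pattern.compile("#[\\p{L}_]+");

    static List<String> findAll(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // U+00E9 written as an escape to keep the source ASCII-only
        String text = "Bonjour mon #b\u00e9b\u00e9 #chat #mon_chat.";
        // First: accented tag, #chat, #mon (stops at '_')
        System.out.println(findAll(LETTERS_ONLY, text));
        // Second: accented tag, #chat, #mon_chat
        System.out.println(findAll(LETTERS_OR_UNDERSCORE, text));
    }
}
```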
Answer by Jitesh Upadhyay
You can do it like this:
String candidate = "#food was testy. #drink lots of. #night was fab. #three #four";
String regex = "#\\w+";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(candidate);
String val = null;
System.out.println("INPUT: " + candidate);
System.out.println("REGEX: " + regex + "\r\n");
while (m.find()) {
    val = m.group();
    System.out.println("MATCH: " + val);
}
if (val == null) {
    System.out.println("NO MATCHES");
}
I solved the problem in my NetBeans IDE and tested the program; it gives the following output:
INPUT: #food was testy. #drink lots of. #night was fab. #three #four
REGEX: #\w+
MATCH: #food
MATCH: #drink
MATCH: #night
MATCH: #three
MATCH: #four
You will need the following imports:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
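On Java 9 or later, the same loop can also be written with Matcher.results(), which returns a stream of MatchResult objects, one per successful find(). This is a sketch under that version assumption; the class and method names are illustrative.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Illustrative class (Java 9+): collect all hashtags without an explicit loop.
class StreamHashtags {
    private static final Pattern HASHTAG = Pattern.compile("#\\w+");

    static List<String> extract(String text) {
        return HASHTAG.matcher(text)
                .results()                  // Stream<MatchResult>
                .map(MatchResult::group)    // matched text, e.g. "#food"
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String candidate = "#food was testy. #drink lots of. #night was fab. #three #four";
        System.out.println(extract(candidate)); // [#food, #drink, #night, #three, #four]
    }
}
```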