Note: this page is a translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original address and author information, and attribute it to the original authors (not me): StackOverFlow
Original question: http://stackoverflow.com/questions/22399118/
Java: Finding all words beginning with a letter
Asked by Yahya Uddin
I am trying to get all words that begin with a letter from a long string. How would you do this in Java? I don't want to loop through every letter or do something similarly inefficient.
EDIT: I also can't use any built-in data structures (except arrays, of course); it's for a CS class. I can, however, make my own data structures (I have already created several).
Answered by Ankit Rustagi
Answered by mig
Scanner scan = new Scanner(text); // text being the string you are looking in
char test = 'x'; // whatever letter you are looking for
while (scan.hasNext()) {
    String wordFound = scan.next();
    if (wordFound.charAt(0) == test) {
        // do something with wordFound
    }
}
This will do what you are looking for; inside the if statement, do whatever you want with the word.
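A minimal sketch of how the Scanner loop above might collect its matches, assuming the class restriction allows plain arrays only. The class name ScannerWordFinder and the method wordsStartingWith are hypothetical and not part of the original answer. It makes two passes over the text: one to count the matches, one to fill an exactly sized array.

```java
import java.util.Scanner;

// Hypothetical helper, not from the original answer.
class ScannerWordFinder {
    // Returns every whitespace-separated token of text whose first
    // character equals letter (case-sensitive, like the loop above).
    static String[] wordsStartingWith(String text, char letter) {
        // First pass: count the matches so the array can be sized exactly.
        int count = 0;
        Scanner counter = new Scanner(text);
        while (counter.hasNext()) {
            if (counter.next().charAt(0) == letter) {
                count++;
            }
        }
        counter.close();

        // Second pass: copy the matching tokens into the array.
        String[] result = new String[count];
        Scanner scan = new Scanner(text);
        int i = 0;
        while (scan.hasNext()) {
            String word = scan.next();
            if (word.charAt(0) == letter) {
                result[i++] = word;
            }
        }
        scan.close();
        return result;
    }

    public static void main(String[] args) {
        for (String w : wordsStartingWith("my very long string to test", 't')) {
            System.out.println(w);
        }
    }
}
```

Scanning twice trades a little speed for not needing a growable list; a hand-written list class would avoid the second pass.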
Answered by Stathis Andronikos
You can use the split() method. Here is an example:
String string = "your string";
String[] parts = string.split(" C");
for (int i = 0; i < parts.length; i++) {
    String[] word = parts[i].split(" ");
    if (i > 0) {
        // The split consumed the leading "C", so put it back on the first word;
        // the remaining words in this part are ignored because they do not start with "C"
        System.out.println("C" + word[0]);
    } else { // Check the first part explicitly
        for (int j = 0; j < word.length; j++) {
            if (word[j].startsWith("c") || word[j].startsWith("C"))
                System.out.println(word[j]);
        }
    }
}
where "C" is your letter. Then just loop over the array. For parts[0] you have to check whether each word starts with "C". It was my mistake to start looping from i=1; the loop should start from 0.
Answered by Levenal
You could try obtaining an array from your String and then iterating through it:
String s = "my very long string to test";
for (String st : s.split(" ")) {
    if (st.startsWith("t")) {
        System.out.println(st);
    }
}
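A possible variation on the split approach, sketched under two assumptions that the answer does not state: that upper and lower case should both match, and that words may be separated by more than one space. The names SplitWordFinder and printWordsStartingWith are made up for illustration.

```java
// Hypothetical helper, not from the original answer.
class SplitWordFinder {
    // Prints each word of text that starts with letter (ignoring case)
    // and returns how many words were printed.
    static int printWordsStartingWith(String text, char letter) {
        char wanted = Character.toLowerCase(letter);
        int found = 0;
        // "\\s+" treats any run of whitespace as a single separator.
        for (String word : text.trim().split("\\s+")) {
            // An empty or all-whitespace text yields one empty token; skip it.
            if (word.isEmpty()) {
                continue;
            }
            if (Character.toLowerCase(word.charAt(0)) == wanted) {
                System.out.println(word);
                found++;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        printWordsStartingWith("The quick  tiger took a Taxi", 't');
    }
}
```

The empty-token check matters because charAt(0) on an empty string throws a StringIndexOutOfBoundsException.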
Answered by Gaurav Gupta
You can get the first character of each word and use an API method to check whether it is a letter.
String input = "jkk ds 32";
String[] array = input.split(" ");
for (String word : array) {
    char[] arr = word.toCharArray();
    char c = arr[0];
    if (Character.isLetter(c)) {
        System.out.println(word + "\t isLetter");
    } else {
        System.out.println(word + "\t not Letter");
    }
}
Here is some sample output:
jkk isLetter
ds isLetter
32 not Letter
Answered by Andrés Oviedo
Regexp way:
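import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void main(String[] args) {
    String text = "my very long string to test";
    // Backslashes must be doubled inside a Java string literal
    Matcher m = Pattern.compile("(^|\\W)(\\w*)").matcher(text);
    while (m.find()) {
        System.out.println("Found: " + m.group(2));
    }
}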
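The pattern above finds every word rather than only those starting with a chosen letter. One way it might be narrowed is sketched below, assuming a case-insensitive match is wanted. The names RegexWordFinder and printWordsStartingWith are hypothetical, and the pattern uses a word boundary (\b), which is a swapped-in technique rather than the answer's own.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not from the original answer.
class RegexWordFinder {
    // Prints each run of word characters that begins with letter
    // (ignoring case) and returns how many runs were found.
    static int printWordsStartingWith(String text, char letter) {
        // \b anchors the match at a word boundary; Pattern.quote keeps
        // the letter literal even if it is a regex metacharacter.
        String regex = "\\b" + Pattern.quote(String.valueOf(letter)) + "\\w*";
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text);
        int found = 0;
        while (m.find()) {
            System.out.println(m.group());
            found++;
        }
        return found;
    }

    public static void main(String[] args) {
        printWordsStartingWith("my very long string to Test", 't');
    }
}
```

Note that \b also treats an apostrophe as a boundary, so with the letter t the text "don't" would yield the match "t"; whether that counts as a word depends on the definition in use.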
Answered by stema
You need to be clear about some things. What is a "word"? You want to find only "words" starting with a letter, so I assume that words can contain other characters too. But which characters are allowed? What defines the start of such a word? Whitespace, any non-letter, any non-letter/non-digit, ...?
e.g.:
String TestInput = "test séntènce ?where I'm want,to üfind 1words starting $with le11ers.";
// Backslashes must be doubled inside a Java string literal
String regex = "(?<=^|\\s)\\pL\\w*";
Pattern p = Pattern.compile(regex, Pattern.UNICODE_CHARACTER_CLASS);
Matcher matcher = p.matcher(TestInput);
while (matcher.find()) {
    System.out.println(matcher.group());
}
The regex (?<=^|\s)\pL\w* will find sequences that start with a letter (\pL is a Unicode property for letters), followed by 0 or more "word" characters (Unicode letters and numbers, because of the modifier Pattern.UNICODE_CHARACTER_CLASS).
The lookbehind assertion (?<=^|\s) ensures that the sequence is preceded by the start of the string or by whitespace.
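To illustrate what the modifier changes, here is a small sketch that runs the same regex with and without Pattern.UNICODE_CHARACTER_CLASS. The class name UnicodeFlagDemo and the helper firstMatch are hypothetical; the accented letters are written as Unicode escapes so the source file's encoding does not matter.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo, not from the original answer.
class UnicodeFlagDemo {
    // Returns the first match of the answer's regex in input, compiled
    // with the given flags, or null if nothing matches.
    static String firstMatch(String input, int flags) {
        Matcher m = Pattern.compile("(?<=^|\\s)\\pL\\w*", flags).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "s\u00e9nt\u00e8nce here"; // "séntènce here"
        // Without the flag, \w covers only ASCII letters, digits and '_',
        // so the match stops before the first accented letter.
        System.out.println(firstMatch(input, 0));
        // With the flag, \w also covers Unicode letters and digits.
        System.out.println(firstMatch(input, Pattern.UNICODE_CHARACTER_CLASS));
    }
}
```

Without the flag the first match is just "s"; with it, the whole accented word is returned.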
So my code will print:
test
séntènce ==> contains non ASCII letters
?where ==> starts with a non ASCII letter
I ==> 'm is missing, because `'` is not in `\w`
want
üfind ==> starts with a non ASCII letter
starting
le11ers ==> contains digits
Missing words:
,to ==> starting with a ","
1words ==> starting with a digit
$with ==> starting with a "$"