Java: How to use a regular expression to match everything before a word of a particular type

Question

Asked by John Daly

I am new to regular expressions.

Is it possible to match everything before a word that meets a certain criterion?

E.g.

THIS IS A TEST - - +++ This is a test

I would like it to find a word that begins with an uppercase letter whose next character is lowercase. This constitutes a proper word. I would then like to delete everything before that word.

The example above should produce: This is a test

I only want to do this processing until it finds the proper word, and then stop.

Any help would be appreciated.

Thanks

Answer 1

Accepted answer by Tomalak

Replace

^.*?(?=[A-Z][a-z])

with the empty string. This works for ASCII input. For non-ASCII input (Unicode, other languages), different strategies apply.

Explanation

^      Start of the string
.*?    Everything (as few characters as possible), until
(?=    followed by
[A-Z]  one of A .. Z and
[a-z]  one of a .. z
)

The Java Unicode-enabled variant would be this:

^.*?(?=\p{Lu}\p{Ll})
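
A minimal sketch of how the two patterns above could be applied in Java. The class and method names (`StripLeadingDemo`, `stripAscii`, `stripUnicode`) are illustrative assumptions, not part of the original answer. `replaceFirst` is used here because the pattern is anchored at the start of the input and so can match at most once.

```java
import java.util.regex.Pattern;

public class StripLeadingDemo {
    // ASCII pattern from the answer: lazily consume text up to an
    // uppercase letter that is followed by a lowercase letter.
    private static final Pattern ASCII = Pattern.compile("^.*?(?=[A-Z][a-z])");

    // Unicode-aware variant from the answer, using the Lu/Ll categories.
    private static final Pattern UNICODE = Pattern.compile("^.*?(?=\\p{Lu}\\p{Ll})");

    public static String stripAscii(String input) {
        return ASCII.matcher(input).replaceFirst("");
    }

    public static String stripUnicode(String input) {
        return UNICODE.matcher(input).replaceFirst("");
    }

    public static void main(String[] args) {
        // prints "This is a test"
        System.out.println(stripAscii("THIS IS A TEST - - +++ This is a test"));

        // Unicode escapes spell the input: A-umlaut, O-umlaut, U-umlaut,
        // then " --- ", then U-umlaut followed by "ber uns".
        String german = "\u00C4\u00D6\u00DC --- \u00DCber uns";
        System.out.println(stripUnicode(german)); // keeps only the last two words
    }
}
```

If the input contains no proper word, the lookahead never succeeds and the string comes back unchanged.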

Answer 2

Answered by hhafez

Then you can do something like this:

'.*([A-Z][a-z].*)\s*'

.* matches anything
( [A-Z] #followed by an upper case char
  [a-z] #followed by a lower case char
  .*)   #followed by anything
  \s*   #followed by zero or more white space

Which is what you are looking for, I think.
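
One thing worth checking with this pattern is the greedy leading `.*`. The sketch below, under the assumption that the whole input is matched with `Matcher.matches()` and group 1 is read back, compares the greedy prefix with a lazy `.*?` prefix. The class name `GreedyVsLazyDemo`, the helper `capture`, and the sample input are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsLazyDemo {
    // Returns group 1 if the whole input matches the regex, otherwise null.
    public static String capture(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "FOO Bar baz Qux";

        // Greedy prefix: backtracks from the end, so the LAST proper word wins.
        System.out.println(capture(".*([A-Z][a-z].*)\\s*", input));  // prints "Qux"

        // Lazy prefix: stops at the FIRST proper word.
        System.out.println(capture(".*?([A-Z][a-z].*)\\s*", input)); // prints "Bar baz Qux"
    }
}
```

On the asker's sample line both forms give the same result, because it contains only one uppercase-then-lowercase pair; they differ as soon as a second proper word appears later in the text.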

Answer 3

Answered by Jon Skeet

Having woken up a bit, you don't need to delete anything, or even create a sub-group - just find the pattern expressed elsewhere in answers. Here's a complete example:

import java.util.regex.*;

public class Test
{
    public static void main(String args[])
    {
        Pattern pattern = Pattern.compile("[A-Z][a-z].*");

        String original = "THIS IS A TEST - - +++ This is a test";
        Matcher match = pattern.matcher(original);
        if (match.find())
        {
            System.out.println(match.group());
        }
        else
        {
            System.out.println("No match");
        }        
    }
}

EDIT: Original answer

This looks like it's doing the right thing:

import java.util.regex.*;

public class Test
{
    public static void main(String args[])
    {
        Pattern pattern = Pattern.compile("^.*?([A-Z][a-z].*)$");

        String original = "THIS IS A TEST - - +++ This is a test";
        // Replace the whole match with capture group 1 (the proper word onwards).
        String replaced = pattern.matcher(original).replaceAll("$1");

        System.out.println(replaced);
    }
}

Basically the trick is not to ignore everything before the proper word - it's to group everything from the proper word onwards, and replace the whole text with that group.

The above would fail with "*** FOO *** I am fond of peanuts" because the "I" wouldn't be considered a proper word. If you want to fix that, change the [a-z] to [a-z\s], which will allow for whitespace instead of a letter.
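
A short sketch that tries this change on the peanuts example. The class name `SingleLetterWordDemo` and the helper `firstMatch` are hypothetical, and the third pattern, with a leading `\b` word boundary, is an assumption added here rather than something from the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SingleLetterWordDemo {
    // Returns the first match of the regex in the input, or null if there is none.
    public static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "*** FOO *** I am fond of peanuts";

        // Original pattern: "I" is followed by a space, so nothing matches.
        System.out.println(firstMatch("[A-Z][a-z].*", input));       // prints "null"

        // With [a-z\s]: the final "O" of "FOO" is also followed by a space,
        // so the match starts there rather than at "I".
        System.out.println(firstMatch("[A-Z][a-z\\s].*", input));    // prints "O *** I am fond of peanuts"

        // Assumed refinement: require the uppercase letter to start a word.
        System.out.println(firstMatch("\\b[A-Z][a-z\\s].*", input)); // prints "I am fond of peanuts"
    }
}
```

Even with the word boundary, a one-letter uppercase word inside an all-caps run still counts: on the asker's sample line the third pattern starts at "A TEST", so whether single-letter words should qualify depends on the data.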

Answer 4

Answered by Maiku Mori

([A-Z][a-z].+)

would match:

This is a text

Answer 5

Answered by Bill K

I know my opinion on this really isn't that popular, so you guys can down-vote me into oblivion if you want, but I have to rant a little (and this contains a solution, just not in the way the poster asked for).

I really don't get why people go to regular expressions so quickly.

I've done a lot of string parsing (I used to screen-scrape VT100 menu screens) and I've never found a single case where regular expressions would have been much easier than just writing code. (Maybe a couple would have been a little easier, but not much.)

I kind of understand they are supposed to be easier once you know them--but you see someone ask a question like this and realize they aren't easy for every programmer to just get by glancing at it. If it costs 1 programmer somewhere down the line 10 minutes of thought, it has a huge net loss over just coding it, even if you took 5 minutes to write 5 lines.

So it's going to need documentation--and if someone who is at that same level comes across it, he won't be able to modify it without knowledge outside his domain, even with documentation.

I mean if the poster had to ask on a trivial case--then there just isn't such a thing as a trivial case.

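public String getRealText(String scanMe) {
    for (int i = 0; i < scanMe.length() - 1; i++)  // stop early so charAt(i + 1) stays in bounds
        if (Character.isUpperCase(scanMe.charAt(i)) && Character.isLowerCase(scanMe.charAt(i + 1)))
            return scanMe.substring(i);
    return null;  // no proper word found
}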

I mean it's 5 lines, but it's simple, readable, and faster than most (all?) RE parsers. Once you've wrapped a regular expression in a method and commented it, the difference in size isn't measurable. The difference in time--well for the poster it would have obviously been a LOT less time--as it might be for the next guy that comes across his code.

And this string operation is one of the ones that are even easier in C with pointers--and it would be even quicker since the testing functions are macros in C.

By the way, make sure you also look for a space in the second slot, not just a lowercase character, otherwise you'll miss any lines starting with the words A or I.
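
A sketch of the loop with that second-slot check added. The class name `ProperWordScanner` is hypothetical, and the extra test that the uppercase letter starts a word is an assumption added here: without it, the last letter of any all-caps word followed by a space (such as the "S" in "THIS ") would also qualify.

```java
public class ProperWordScanner {
    // Returns the text from the first proper word onward, or null if none is found.
    // Here a proper word is an uppercase letter at the start of a word, followed
    // by either a lowercase letter or whitespace (so "A" and "I" count).
    public static String getRealText(String scanMe) {
        for (int i = 0; i < scanMe.length() - 1; i++) {
            // Assumed rule: a word starts at index 0 or after a non-alphanumeric character.
            boolean atWordStart = i == 0 || !Character.isLetterOrDigit(scanMe.charAt(i - 1));
            char next = scanMe.charAt(i + 1);
            if (atWordStart
                    && Character.isUpperCase(scanMe.charAt(i))
                    && (Character.isLowerCase(next) || Character.isWhitespace(next))) {
                return scanMe.substring(i);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(getRealText("*** FOO *** I am fond of peanuts")); // prints "I am fond of peanuts"
        System.out.println(getRealText("+++ This is a test"));               // prints "This is a test"
        System.out.println(getRealText("NO MATCH HERE"));                    // prints "null"
    }
}
```

Accepting a space in the second slot has a cost: on the asker's sample line the standalone "A" in "THIS IS A TEST" now qualifies, so the method returns the text from "A TEST" onward. Whether that trade-off is acceptable depends on the input.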
