Java 中匹配 URL 的正则表达式
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/163360/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Regular expression to match URLs in Java
提问by Sergio del Amo
I use RegexBuddy while working with regular expressions. From its library I copied the regular expression to match URLs. I tested successfully within RegexBuddy. However, when I copied it as Java String
flavor and pasted it into Java code, it does not work. The following class prints false
:
我在使用正则表达式时使用 RegexBuddy。我从它的库中复制了正则表达式以匹配 URL。我在 RegexBuddy 中测试成功。但是,当我将其复制为 JavaString
风格并将其粘贴到 Java 代码中时,它不起作用。以下类打印false
:
public class RegexFoo {
public static void main(String[] args) {
String regex = "\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|]";
String text = "http://google.com";
System.out.println(IsMatch(text,regex));
}
private static boolean IsMatch(String s, String pattern) {
try {
Pattern patt = Pattern.compile(pattern);
Matcher matcher = patt.matcher(s);
return matcher.matches();
} catch (RuntimeException e) {
return false;
}
}
}
Does anyone know what I am doing wrong?
有谁知道我做错了什么?
采纳答案by TomC
Try the following regex string instead. Your test was probably done in a case-sensitive manner. I have added the lowercase alphas as well as a proper string beginning placeholder.
请尝试使用以下正则表达式字符串。您的测试可能以区分大小写的方式完成。我添加了小写字母以及适当的字符串开头占位符。
String regex = "^(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]";
This works too:
这也有效:
String regex = "\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]";
Note:
笔记:
String regex = "<\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]>"; // matches <http://google.com>
String regex = "<^(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]>"; // does not match <http://google.com>
回答by billjamesdev
I'll try a standard "Why are you doing it this way?" answer... Do you know about java.net.URL
?
我会尝试一个标准的“你为什么这样做?” 回答... 你知道java.net.URL
吗?
URL url = new URL(stringURL);
The above will throw a MalformedURLException
if it can't parse the URL.
MalformedURLException
如果无法解析 URL ,上面将抛出一个。
回答by Sergio del Amo
This works too:
这也有效:
String regex = "\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]";
Note:
笔记:
String regex = "<\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]>"; // matches <http://google.com>
String regex = "<^(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]>"; // does not match <http://google.com>
So probably the first one is more useful for general use.
所以可能第一个对一般用途更有用。
回答by Jan Goyvaerts
When using regular expressions from RegexBuddy's library, make sure to use the same matching modes in your own code as the regex from the library. If you generate a source code snippet on the Use tab, RegexBuddy will automatically set the correct matching options in the source code snippet. If you copy/paste the regex, you have to do that yourself.
使用 RegexBuddy 库中的正则表达式时,请确保在您自己的代码中使用与库中的正则表达式相同的匹配模式。如果您在“使用”选项卡上生成源代码片段,RegexBuddy 将自动在源代码片段中设置正确的匹配选项。如果您复制/粘贴正则表达式,则必须自己完成。
In this case, as others pointed out, you missed the case insensitivity option.
在这种情况下,正如其他人指出的那样,您错过了不区分大小写的选项。
回答by Fuad Efendi
The problem with all suggested approaches: all RegEx is validating
所有建议方法的问题:所有 RegEx 都在验证
All RegEx -based code is over-engineered: it will find only valid URLs! As a sample, it will ignore anything starting with "http://" and having non-ASCII characters inside.
所有基于 RegEx 的代码都是过度设计的:它只会找到有效的 URL!作为示例,它会忽略任何以“http://”开头并包含非 ASCII 字符的内容。
Even more: I have encountered 1-2-seconds processing times (single-threaded, dedicated) with Java RegEx package (filtering Email addresses from text) for very small and simple sentences, nothing specific; possibly bug in Java 6 RegEx...
甚至更多:我遇到过使用 Java RegEx 包(从文本过滤电子邮件地址)处理非常小的简单句子的 1-2 秒处理时间(单线程、专用),没有什么具体的;可能是 Java 6 RegEx 中的错误...
Simplest/Fastest solution would be to use StringTokenizer to split text into tokens, to remove tokens starting with "http://" etc., and to concatenate tokens into text again.
最简单/最快的解决方案是使用 StringTokenizer 将文本拆分为标记,删除以“http://”等开头的标记,并再次将标记连接为文本。
If you want to filter Emails from text (because later on you will do NLP staff etc) - just remove all tokens containing "@" inside.
如果你想从文本中过滤电子邮件(因为稍后你会做 NLP 人员等) - 只需删除所有包含“@”的标记。
This is simple text where RegEx of Java 6 fails. Try it in divverent variants of Java. It takes about 1000 milliseconds per RegEx call, in a long running single threaded test application:
这是 Java 6 的 RegEx 失败的简单文本。在 Java 的不同变体中尝试一下。在长时间运行的单线程测试应用程序中,每次 RegEx 调用大约需要 1000 毫秒:
pattern = Pattern.compile("[A-Za-z0-9](([_\.\-]?[a-zA-Z0-9]+)*)@([A-Za-z0-9]+)(([\.\-]?[a-zA-Z0-9]+)*)\.([A-Za-z]{2,})", Pattern.CASE_INSENSITIVE);
"Avalanna is such a sweet little girl! It would b heartbreaking if cancer won. She's so precious! #BeliebersPrayForAvalanna");
"@AndySamuels31 Hahahahahahahahahhaha lol, you don't look like a girl hahahahhaahaha, you are... sexy.";
Do not rely on regular expressions if you only need to filter words with "@", "http://", "ftp://", "mailto:"; it is huge engineering overhead.
如果您只需要过滤带有“@”、“http://”、“ftp://”、“mailto:”的单词,请不要依赖正则表达式;这是巨大的工程开销。
If you really want to use RegEx with Java, try Automaton
如果您真的想在 Java 中使用 RegEx,请尝试Automaton
回答by Kamil Lelonek
The best way to do it now is:
现在最好的方法是:
android.util.Patterns.WEB_URL.matcher(linkUrl).matches();
EDIT: Code of Patterns
from https://github.com/android/platform_frameworks_base/blob/master/core/java/android/util/Patterns.java:
编辑:Patterns
来自https://github.com/android/platform_frameworks_base/blob/master/core/java/android/util/Patterns.java 的代码:
/*
* Copyright (C) 2007 The Android Open Source Project
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package android.util;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
/**
* Commonly used regular expression patterns.
*/
public class Patterns {
/**
* Regular expression to match all IANA top-level domains.
* List accurate as of 2011/07/18. List taken from:
* http://data.iana.org/TLD/tlds-alpha-by-domain.txt
* This pattern is auto-generated by frameworks/ex/common/tools/make-iana-tld-pattern.py
*
* @deprecated Due to the recent profileration of gTLDs, this API is
* expected to become out-of-date very quickly. Therefore it is now
* deprecated.
*/
@Deprecated
public static final String TOP_LEVEL_DOMAIN_STR =
"((aero|arpa|asia|a[cdefgilmnoqrstuwxz])"
+ "|(biz|b[abdefghijmnorstvwyz])"
+ "|(cat|com|coop|c[acdfghiklmnoruvxyz])"
+ "|d[ejkmoz]"
+ "|(edu|e[cegrstu])"
+ "|f[ijkmor]"
+ "|(gov|g[abdefghilmnpqrstuwy])"
+ "|h[kmnrtu]"
+ "|(info|int|i[delmnoqrst])"
+ "|(jobs|j[emop])"
+ "|k[eghimnprwyz]"
+ "|l[abcikrstuvy]"
+ "|(mil|mobi|museum|m[acdeghklmnopqrstuvwxyz])"
+ "|(name|net|n[acefgilopruz])"
+ "|(org|om)"
+ "|(pro|p[aefghklmnrstwy])"
+ "|qa"
+ "|r[eosuw]"
+ "|s[abcdeghijklmnortuvyz]"
+ "|(tel|travel|t[cdfghjklmnoprtvwz])"
+ "|u[agksyz]"
+ "|v[aceginu]"
+ "|w[fs]"
+ "|(\u03b4\u03bf\u03ba\u03b9\u03bc\u03ae|\u0438\u0441\u043f\u044b\u0442\u0430\u043d\u0438\u0435|\u0440\u0444|\u0441\u0440\u0431|\u05d8\u05e2\u05e1\u05d8|\u0622\u0632\u0645\u0627\u06cc\u0634\u06cc|\u0625\u062e\u062a\u0628\u0627\u0631|\u0627\u0644\u0627\u0631\u062f\u0646|\u0627\u0644\u062c\u0632\u0627\u0626\u0631|\u0627\u0644\u0633\u0639\u0648\u062f\u064a\u0629|\u0627\u0644\u0645\u063a\u0631\u0628|\u0627\u0645\u0627\u0631\u0627\u062a|\u0628\u06be\u0627\u0631\u062a|\u062a\u0648\u0646\u0633|\u0633\u0648\u0631\u064a\u0629|\u0641\u0644\u0633\u0637\u064a\u0646|\u0642\u0637\u0631|\u0645\u0635\u0631|\u092a\u0930\u0940\u0915\u094d\u0937\u093e|\u092d\u093e\u0930\u0924|\u09ad\u09be\u09b0\u09a4|\u0a2d\u0a3e\u0a30\u0a24|\u0aad\u0abe\u0ab0\u0aa4|\u0b87\u0ba8\u0bcd\u0ba4\u0bbf\u0baf\u0bbe|\u0b87\u0bb2\u0b99\u0bcd\u0b95\u0bc8|\u0b9a\u0bbf\u0b99\u0bcd\u0b95\u0baa\u0bcd\u0baa\u0bc2\u0bb0\u0bcd|\u0baa\u0bb0\u0bbf\u0b9f\u0bcd\u0b9a\u0bc8|\u0c2d\u0c3e\u0c30\u0c24\u0c4d|\u0dbd\u0d82\u0d9a\u0dcf|\u0e44\u0e17\u0e22|\u30c6\u30b9\u30c8|\u4e2d\u56fd|\u4e2d\u570b|\u53f0\u6e7e|\u53f0\u7063|\u65b0\u52a0\u5761|\u6d4b\u8bd5|\u6e2c\u8a66|\u9999\u6e2f|\ud14c\uc2a4\ud2b8|\ud55c\uad6d|xn\-\-0zwm56d|xn\-\-11b5bs3a9aj6g|xn\-\-3e0b707e|xn\-\-45brj9c|xn\-\-80akhbyknj4f|xn\-\-90a3ac|xn\-\-9t4b11yi5a|xn\-\-clchc0ea0b2g2a9gcd|xn\-\-deba0ad|xn\-\-fiqs8s|xn\-\-fiqz9s|xn\-\-fpcrj9c3d|xn\-\-fzc2c9e2c|xn\-\-g6w251d|xn\-\-gecrj9c|xn\-\-h2brj9c|xn\-\-hgbk6aj7f53bba|xn\-\-hlcj6aya9esc7a|xn\-\-j6w193g|xn\-\-jxalpdlp|xn\-\-kgbechtv|xn\-\-kprw13d|xn\-\-kpry57d|xn\-\-lgbbat1ad8j|xn\-\-mgbaam7a8h|xn\-\-mgbayh7gpa|xn\-\-mgbbh1a71e|xn\-\-mgbc0a9azcg|xn\-\-mgberp4a5d4ar|xn\-\-o3cw4h|xn\-\-ogbpf8fl|xn\-\-p1ai|xn\-\-pgbs0dh|xn\-\-s9brj9c|xn\-\-wgbh1c|xn\-\-wgbl6a|xn\-\-xkc2al3hye2a|xn\-\-xkc2dl3a5ee0h|xn\-\-yfro4i67o|xn\-\-ygbi2ammx|xn\-\-zckzah|xxx)"
+ "|y[et]"
+ "|z[amw])";
/**
* Regular expression pattern to match all IANA top-level domains.
* @deprecated This API is deprecated. See {@link #TOP_LEVEL_DOMAIN_STR}.
*/
@Deprecated
public static final Pattern TOP_LEVEL_DOMAIN =
Pattern.compile(TOP_LEVEL_DOMAIN_STR);
/**
* Regular expression to match all IANA top-level domains for WEB_URL.
* List accurate as of 2011/07/18. List taken from:
* http://data.iana.org/TLD/tlds-alpha-by-domain.txt
* This pattern is auto-generated by frameworks/ex/common/tools/make-iana-tld-pattern.py
*
* @deprecated This API is deprecated. See {@link #TOP_LEVEL_DOMAIN_STR}.
*/
@Deprecated
public static final String TOP_LEVEL_DOMAIN_STR_FOR_WEB_URL =
"(?:"
+ "(?:aero|arpa|asia|a[cdefgilmnoqrstuwxz])"
+ "|(?:biz|b[abdefghijmnorstvwyz])"
+ "|(?:cat|com|coop|c[acdfghiklmnoruvxyz])"
+ "|d[ejkmoz]"
+ "|(?:edu|e[cegrstu])"
+ "|f[ijkmor]"
+ "|(?:gov|g[abdefghilmnpqrstuwy])"
+ "|h[kmnrtu]"
+ "|(?:info|int|i[delmnoqrst])"
+ "|(?:jobs|j[emop])"
+ "|k[eghimnprwyz]"
+ "|l[abcikrstuvy]"
+ "|(?:mil|mobi|museum|m[acdeghklmnopqrstuvwxyz])"
+ "|(?:name|net|n[acefgilopruz])"
+ "|(?:org|om)"
+ "|(?:pro|p[aefghklmnrstwy])"
+ "|qa"
+ "|r[eosuw]"
+ "|s[abcdeghijklmnortuvyz]"
+ "|(?:tel|travel|t[cdfghjklmnoprtvwz])"
+ "|u[agksyz]"
+ "|v[aceginu]"
+ "|w[fs]"
+ "|(?:\u03b4\u03bf\u03ba\u03b9\u03bc\u03ae|\u0438\u0441\u043f\u044b\u0442\u0430\u043d\u0438\u0435|\u0440\u0444|\u0441\u0440\u0431|\u05d8\u05e2\u05e1\u05d8|\u0622\u0632\u0645\u0627\u06cc\u0634\u06cc|\u0625\u062e\u062a\u0628\u0627\u0631|\u0627\u0644\u0627\u0631\u062f\u0646|\u0627\u0644\u062c\u0632\u0627\u0626\u0631|\u0627\u0644\u0633\u0639\u0648\u062f\u064a\u0629|\u0627\u0644\u0645\u063a\u0631\u0628|\u0627\u0645\u0627\u0631\u0627\u062a|\u0628\u06be\u0627\u0631\u062a|\u062a\u0648\u0646\u0633|\u0633\u0648\u0631\u064a\u0629|\u0641\u0644\u0633\u0637\u064a\u0646|\u0642\u0637\u0631|\u0645\u0635\u0631|\u092a\u0930\u0940\u0915\u094d\u0937\u093e|\u092d\u093e\u0930\u0924|\u09ad\u09be\u09b0\u09a4|\u0a2d\u0a3e\u0a30\u0a24|\u0aad\u0abe\u0ab0\u0aa4|\u0b87\u0ba8\u0bcd\u0ba4\u0bbf\u0baf\u0bbe|\u0b87\u0bb2\u0b99\u0bcd\u0b95\u0bc8|\u0b9a\u0bbf\u0b99\u0bcd\u0b95\u0baa\u0bcd\u0baa\u0bc2\u0bb0\u0bcd|\u0baa\u0bb0\u0bbf\u0b9f\u0bcd\u0b9a\u0bc8|\u0c2d\u0c3e\u0c30\u0c24\u0c4d|\u0dbd\u0d82\u0d9a\u0dcf|\u0e44\u0e17\u0e22|\u30c6\u30b9\u30c8|\u4e2d\u56fd|\u4e2d\u570b|\u53f0\u6e7e|\u53f0\u7063|\u65b0\u52a0\u5761|\u6d4b\u8bd5|\u6e2c\u8a66|\u9999\u6e2f|\ud14c\uc2a4\ud2b8|\ud55c\uad6d|xn\-\-0zwm56d|xn\-\-11b5bs3a9aj6g|xn\-\-3e0b707e|xn\-\-45brj9c|xn\-\-80akhbyknj4f|xn\-\-90a3ac|xn\-\-9t4b11yi5a|xn\-\-clchc0ea0b2g2a9gcd|xn\-\-deba0ad|xn\-\-fiqs8s|xn\-\-fiqz9s|xn\-\-fpcrj9c3d|xn\-\-fzc2c9e2c|xn\-\-g6w251d|xn\-\-gecrj9c|xn\-\-h2brj9c|xn\-\-hgbk6aj7f53bba|xn\-\-hlcj6aya9esc7a|xn\-\-j6w193g|xn\-\-jxalpdlp|xn\-\-kgbechtv|xn\-\-kprw13d|xn\-\-kpry57d|xn\-\-lgbbat1ad8j|xn\-\-mgbaam7a8h|xn\-\-mgbayh7gpa|xn\-\-mgbbh1a71e|xn\-\-mgbc0a9azcg|xn\-\-mgberp4a5d4ar|xn\-\-o3cw4h|xn\-\-ogbpf8fl|xn\-\-p1ai|xn\-\-pgbs0dh|xn\-\-s9brj9c|xn\-\-wgbh1c|xn\-\-wgbl6a|xn\-\-xkc2al3hye2a|xn\-\-xkc2dl3a5ee0h|xn\-\-yfro4i67o|xn\-\-ygbi2ammx|xn\-\-zckzah|xxx)"
+ "|y[et]"
+ "|z[amw]))";
/**
* Good characters for Internationalized Resource Identifiers (IRI).
* This comprises most common used Unicode characters allowed in IRI
* as detailed in RFC 3987.
* Specifically, those two byte Unicode characters are not included.
*/
public static final String GOOD_IRI_CHAR =
"a-zA-Z0-9\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF";
public static final Pattern IP_ADDRESS
= Pattern.compile(
"((25[0-5]|2[0-4][0-9]|[0-1][0-9]{2}|[1-9][0-9]|[1-9])\.(25[0-5]|2[0-4]"
+ "[0-9]|[0-1][0-9]{2}|[1-9][0-9]|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]"
+ "[0-9]{2}|[1-9][0-9]|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1][0-9]{2}"
+ "|[1-9][0-9]|[0-9]))");
/**
* RFC 1035 Section 2.3.4 limits the labels to a maximum 63 octets.
*/
private static final String IRI
= "[" + GOOD_IRI_CHAR + "]([" + GOOD_IRI_CHAR + "\-]{0,61}[" + GOOD_IRI_CHAR + "]){0,1}";
private static final String GOOD_GTLD_CHAR =
"a-zA-Z\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF";
private static final String GTLD = "[" + GOOD_GTLD_CHAR + "]{2,63}";
private static final String HOST_NAME = "(" + IRI + "\.)+" + GTLD;
public static final Pattern DOMAIN_NAME
= Pattern.compile("(" + HOST_NAME + "|" + IP_ADDRESS + ")");
/**
* Regular expression pattern to match most part of RFC 3987
* Internationalized URLs, aka IRIs. Commonly used Unicode characters are
* added.
*/
public static final Pattern WEB_URL = Pattern.compile(
"((?:(http|https|Http|Https|rtsp|Rtsp):\/\/(?:(?:[a-zA-Z0-9\$\-\_\.\+\!\*\'\(\)"
+ "\,\;\?\&\=]|(?:\%[a-fA-F0-9]{2})){1,64}(?:\:(?:[a-zA-Z0-9\$\-\_"
+ "\.\+\!\*\'\(\)\,\;\?\&\=]|(?:\%[a-fA-F0-9]{2})){1,25})?\@)?)?"
+ "(?:" + DOMAIN_NAME + ")"
+ "(?:\:\d{1,5})?)" // plus option port number
+ "(\/(?:(?:[" + GOOD_IRI_CHAR + "\;\/\?\:\@\&\=\#\~" // plus option query params
+ "\-\.\+\!\*\'\(\)\,\_])|(?:\%[a-fA-F0-9]{2}))*)?"
+ "(?:\b|$)"); // and finally, a word boundary or end of
// input. This is to stop foo.sure from
// matching as foo.su
public static final Pattern EMAIL_ADDRESS
= Pattern.compile(
"[a-zA-Z0-9\+\.\_\%\-\+]{1,256}" +
"\@" +
"[a-zA-Z0-9][a-zA-Z0-9\-]{0,64}" +
"(" +
"\." +
"[a-zA-Z0-9][a-zA-Z0-9\-]{0,25}" +
")+"
);
/**
* This pattern is intended for searching for things that look like they
* might be phone numbers in arbitrary text, not for validating whether
* something is in fact a phone number. It will miss many things that
* are legitimate phone numbers.
*
* <p> The pattern matches the following:
* <ul>
* <li>Optionally, a + sign followed immediately by one or more digits. Spaces, dots, or dashes
* may follow.
* <li>Optionally, sets of digits in parentheses, separated by spaces, dots, or dashes.
* <li>A string starting and ending with a digit, containing digits, spaces, dots, and/or dashes.
* </ul>
*/
public static final Pattern PHONE
= Pattern.compile( // sdd = space, dot, or dash
"(\+[0-9]+[\- \.]*)?" // +<digits><sdd>*
+ "(\([0-9]+\)[\- \.]*)?" // (<digits>)<sdd>*
+ "([0-9][0-9\- \.]+[0-9])"); // <digit><digit|sdd>+<digit>
/**
* Convenience method to take all of the non-null matching groups in a
* regex Matcher and return them as a concatenated string.
*
* @param matcher The Matcher object from which grouped text will
* be extracted
*
* @return A String comprising all of the non-null matched
* groups concatenated together
*/
public static final String concatGroups(Matcher matcher) {
StringBuilder b = new StringBuilder();
final int numGroups = matcher.groupCount();
for (int i = 1; i <= numGroups; i++) {
String s = matcher.group(i);
if (s != null) {
b.append(s);
}
}
return b.toString();
}
/**
* Convenience method to return only the digits and plus signs
* in the matching string.
*
* @param matcher The Matcher object from which digits and plus will
* be extracted
*
* @return A String comprising all of the digits and plus in
* the match
*/
public static final String digitsAndPlusOnly(Matcher matcher) {
StringBuilder buffer = new StringBuilder();
String matchingRegion = matcher.group();
for (int i = 0, size = matchingRegion.length(); i < size; i++) {
char character = matchingRegion.charAt(i);
if (character == '+' || Character.isDigit(character)) {
buffer.append(character);
}
}
return buffer.toString();
}
/**
* Do not create this static utility class.
*/
private Patterns() {}
}
回答by Cavaleiro
In line with billjamesdev answer, here is another approach to validate an URL without using a RegEx:
与 billjamesdev 的回答一致,这是另一种不使用 RegEx 验证 URL 的方法:
From Apache Commons Validatorlib, look at class UrlValidator. Some example code:
从Apache Commons Validator库中,查看类UrlValidator。一些示例代码:
Construct a UrlValidator with valid schemes of "http", and "https".
使用有效的“http”和“https”方案构造一个 UrlValidator。
String[] schemes = {"http","https"}.
UrlValidator urlValidator = new UrlValidator(schemes);
if (urlValidator.isValid("ftp://foo.bar.com/")) {
System.out.println("url is valid");
} else {
System.out.println("url is invalid");
}
prints "url is invalid"
If instead the default constructor is used.
如果改为使用默认构造函数。
UrlValidator urlValidator = new UrlValidator();
if (urlValidator.isValid("ftp://foo.bar.com/")) {
System.out.println("url is valid");
} else {
System.out.println("url is invalid");
}
prints out "url is valid"
打印出“网址有效”
回答by Abhiraj
((http?|https|ftp|file)://)?((W|w){3}.)?[a-zA-Z0-9]+\.[a-zA-Z]+
check here:- https://www.freeformatter.com/java-regex-tester.html#ad-output
在这里查看:- https://www.freeformatter.com/java-regex-tester.html#ad-output
It sorts out theses entries correctly
它正确地整理了这些条目
- google.com
- www.google.com
- wwwgooglecom
- ft.
- Www.google.com
- .ft
- https://www.google.com
- https://
- https://www.
- https://google.com
- google.com
- www.google.com
- wwwgooglecom
- 英尺
- www.google.com
- .ft
- https://www.google.com
- https://
- https://www。
- https://google.com