Original URL: http://stackoverflow.com/questions/10576686/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
C# regex pattern to extract URLs from a given string - not full HTML URLs but bare links as well
Asked by MonsterMMORPG
I need a regex that does the following:
Extract all strings that start with http://
Extract all strings that start with www.
So I need to extract both kinds of strings.
For example, take this string:
house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue
From the string above, I want to get:
www.monstermmorpg.com
http://www.monstermmorpg.com
http://www.monstermmorpg.commerged
I'm looking for a regex or any other way to do this. Thank you.
C# 4.0
Accepted answer by Jason Larke
You can handle this with a fairly simple regular expression, or with the more traditional approach of splitting the string and filtering the pieces with LINQ.
Regex
using System.Text.RegularExpressions;
using System.Windows.Forms; // MessageBox; assumes a WinForms project

var linkParser = new Regex(@"\b(?:https?://|www\.)\S+\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);
var rawString = "house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue";
foreach (Match m in linkParser.Matches(rawString))
    MessageBox.Show(m.Value);
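Outside a WinForms app (a console program or a unit test), the MessageBox loop is awkward. Below is a minimal sketch, assuming you want the matches back as a list; the names LinkExtractor, ExtractLinks and PrintLinks are hypothetical, and only the pattern and options come from the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class; only the pattern and options come from the answer.
public static class LinkExtractor
{
    // Built once and reused, so the RegexOptions.Compiled start-up cost is paid only once.
    private static readonly Regex LinkParser = new Regex(
        @"\b(?:https?://|www\.)\S+\b",
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

    // Returns every link-like substring, in the order it appears in the text.
    public static List<string> ExtractLinks(string text)
    {
        var links = new List<string>();
        foreach (Match m in LinkParser.Matches(text))
        {
            links.Add(m.Value);
        }
        return links;
    }

    // Console replacement for the MessageBox loop.
    public static void PrintLinks(string text)
    {
        foreach (string link in ExtractLinks(text))
        {
            Console.WriteLine(link);
        }
    }
}
```

On the question's sample string, ExtractLinks should return the three links the asker listed, in the order they appear, and PrintLinks writes them one per line.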
Explanation of the pattern:
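\b - a zero-width word-boundary assertion: it matches the position between a word character (letter, digit or underscore) and a non-word character such as a space or period, or the start/end of the string, without consuming any characters.
(?: - begins a group; the ?: makes it non-capturing, so the text inside is not stored as a separate group.
https?:// - matches http:// or https:// (the ? after the "s" makes it optional).
| - OR
www\. - matches the literal string www. (\. means a literal "." rather than "any character").
) - ends the group.
\S+ - matches one or more non-whitespace characters.
\b - the closing word boundary. Because \S+ backtracks until this assertion succeeds, the match always ends on a word character, so trailing punctuation such as a period or closing parenthesis is left out.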
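To see what the individual pieces contribute, here is a small sketch that probes the same pattern string with a few inputs, with and without RegexOptions.IgnoreCase. The PatternPieces class and its methods are hypothetical helpers for illustration only.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class: probes the answer's pattern to show what each piece does.
public static class PatternPieces
{
    public const string Pattern = @"\b(?:https?://|www\.)\S+\b";

    // Same options as the answer (minus Compiled, which does not affect matching).
    public static bool MatchesIgnoreCase(string s)
    {
        return Regex.IsMatch(s, Pattern, RegexOptions.IgnoreCase);
    }

    // Without IgnoreCase, upper-case prefixes such as "WWW." no longer match.
    public static bool MatchesCaseSensitive(string s)
    {
        return Regex.IsMatch(s, Pattern);
    }

    // (?: ... ) captures nothing, so a successful match has only group 0 (the whole match).
    public static int GroupCount(string s)
    {
        return Regex.Match(s, Pattern).Groups.Count;
    }
}
```

Under these assumptions, both http://a.com and https://a.com match, WWW.Example.com matches only with IgnoreCase, ftp://a.com never matches, and GroupCount returns 1 for any input containing a link.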
Basically, the pattern looks for strings that start with http://, https:// or www. (the (?:https?://|www\.) part) and then matches all the characters up to the next whitespace, minus any trailing non-word characters that the final \b excludes.
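The closing \b means a match does not always run all the way to the next whitespace. The sketch below compares the answer's pattern with a hypothetical variant that drops the final \b; the TrailingBoundaryDemo class and its method names are illustrative only.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo: the answer's pattern versus the same pattern without the closing \b.
public static class TrailingBoundaryDemo
{
    private static readonly Regex WithBoundary =
        new Regex(@"\b(?:https?://|www\.)\S+\b", RegexOptions.IgnoreCase);

    private static readonly Regex WithoutBoundary =
        new Regex(@"\b(?:https?://|www\.)\S+", RegexOptions.IgnoreCase);

    // First match using the answer's pattern (trailing non-word characters trimmed).
    public static string FirstWith(string text)
    {
        return WithBoundary.Match(text).Value;
    }

    // First match without the closing \b (runs to the next whitespace).
    public static string FirstWithout(string text)
    {
        return WithoutBoundary.Match(text).Value;
    }
}
```

For "See www.example.com." the first method returns www.example.com and the second www.example.com. (with the period); for "(http://example.com/path/)" they return http://example.com/path and http://example.com/path/) respectively. The trade-off is that a legitimate trailing "/" is trimmed too.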
Traditional String Options
using System;
using System.Linq;
using System.Windows.Forms; // MessageBox; assumes a WinForms project
var rawString = "house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue";
var links = rawString.Split("\t\r\n ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries) // split on whitespace
    .Where(s => s.StartsWith("http://") || s.StartsWith("https://") || s.StartsWith("www."));
foreach (string s in links)
    MessageBox.Show(s);
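The splitting approach can be wrapped the same way. This is a sketch with a few assumptions rather than the answer's exact code: SplitLinkExtractor is a hypothetical name, it passes a null separator so String.Split splits on every whitespace character, and it compares prefixes with StringComparison.OrdinalIgnoreCase to mirror the regex's IgnoreCase option.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper wrapping the split + LINQ approach.
public static class SplitLinkExtractor
{
    private static readonly string[] Prefixes = { "http://", "https://", "www." };

    public static List<string> ExtractLinks(string text)
    {
        // A null separator array tells String.Split to split on all whitespace characters.
        return text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            // Keep tokens that begin with one of the prefixes, ignoring case.
            .Where(token => Prefixes.Any(p =>
                token.StartsWith(p, StringComparison.OrdinalIgnoreCase)))
            .ToList();
    }
}
```

On the question's sample this yields the same three links as the regex. Unlike the regex, it keeps trailing punctuation (www.example.com. stays as-is) and skips links glued to a leading character, such as (http://a.com), because that token does not start with one of the prefixes.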

