C# regex pattern to extract URLs from a given string - not full HTML URLs, but bare links

Question

Asked by MonsterMMORPG

I need a regex that will do the following:

Extract all strings that start with http://
Extract all strings that start with www.

So I need to extract both of these.

For example, take the string below:

house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue

From the string above I would like to get:

    www.monstermmorpg.com
    http://www.monstermmorpg.com
    http://www.monstermmorpg.commerged

Looking for regex or another way. Thank you.

C# 4.0

Answer 1

Accepted answer by Jason Larke

You can handle this with a fairly simple regular expression, or with a more traditional approach of string splitting plus LINQ.

Regex

using System;
using System.Text.RegularExpressions;

var linkParser = new Regex(@"\b(?:https?://|www\.)\S+\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);
var rawString = "house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue";
foreach (Match m in linkParser.Matches(rawString))
    Console.WriteLine(m.Value); // or MessageBox.Show(m.Value) in a WinForms app
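If the links are needed as data rather than shown one at a time, the same pattern can be wrapped in a small reusable method. This is only a sketch: the class name LinkExtractor and the method ExtractLinks are hypothetical, and it assumes a plain list of the matched strings is what you want back. Keeping a single static Regex instance means the RegexOptions.Compiled cost is paid once rather than on every call.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: wraps the answer's pattern so it can be reused.
public static class LinkExtractor
{
    // One shared instance, so RegexOptions.Compiled pays its compilation cost only once.
    private static readonly Regex LinkParser =
        new Regex(@"\b(?:https?://|www\.)\S+\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);

    // Returns every substring that starts with http://, https:// or www., in order of appearance.
    public static List<string> ExtractLinks(string text)
    {
        var links = new List<string>();
        foreach (Match m in LinkParser.Matches(text))
            links.Add(m.Value);
        return links;
    }
}
```

Called on the question's sample string, it returns the three links listed in the question, in the same order.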

Pattern explanation:

\b          - a word boundary: a zero-width position between a word character and a non-word character (or the start/end of the string); it consumes no characters
(?:         - start a non-capturing group (the ?: tells the engine not to capture the text matched inside it)
https?://   - match http:// or https:// (the ? after the "s" makes it optional)
|           - OR
www\.       - match the literal text www. (the \. means a literal ".")
)           - end of group
\S+         - match one or more non-whitespace characters
\b          - require a closing word boundary, so trailing punctuation such as "." or ")" is left out of the match

Basically, the pattern looks for text that starts with http://, https:// or www. (the (?:https?://|www\.) group) and then matches all the characters up to the next whitespace, backing off to the last word boundary before it.
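The closing \b has a side effect worth seeing in practice: \S+ first runs all the way to the next whitespace, then backtracks until it ends on a word boundary, so trailing punctuation is trimmed, and so is a trailing slash. The sketch below reuses the answer's pattern on a few inputs; the class name BoundaryDemo and the method FirstLink are hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class: shows what the trailing \b does to the end of a match.
public static class BoundaryDemo
{
    // Same pattern as the answer above.
    private static readonly Regex LinkParser =
        new Regex(@"\b(?:https?://|www\.)\S+\b", RegexOptions.IgnoreCase);

    // Returns the first link found in the text, or null when there is none.
    public static string FirstLink(string text)
    {
        Match m = LinkParser.Match(text);
        return m.Success ? m.Value : null;
    }
}
```

For example, FirstLink("Visit www.example.com.") returns "www.example.com", and FirstLink("(see http://example.com/path/)") returns "http://example.com/path", with the trailing "/" dropped. Whether that trimming is desirable depends on your input.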

Traditional String Options

using System;
using System.Linq;
var rawString = "house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue";
// Split on whitespace, then keep only the tokens that begin with a link prefix.
var links = rawString.Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
                     .Where(s => s.StartsWith("http://") || s.StartsWith("https://") || s.StartsWith("www."));
foreach (string s in links)
    Console.WriteLine(s); // or MessageBox.Show(s) in a WinForms app
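The split-based version can be packaged the same way. This sketch uses the hypothetical names SplitLinkExtractor and ExtractLinks, and it compares prefixes with StringComparison.OrdinalIgnoreCase so that it is case-insensitive like the regex above (an assumption about what you want). Unlike the regex, it returns whole whitespace-separated tokens, so trailing punctuation is kept.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for the split + LINQ approach.
public static class SplitLinkExtractor
{
    private static readonly char[] Whitespace = { ' ', '\t', '\r', '\n' };
    private static readonly string[] Prefixes = { "http://", "https://", "www." };

    // Splits on whitespace and keeps the tokens that begin with one of the prefixes (case-insensitive).
    public static List<string> ExtractLinks(string text)
    {
        return text.Split(Whitespace, StringSplitOptions.RemoveEmptyEntries)
                   .Where(token => Prefixes.Any(p => token.StartsWith(p, StringComparison.OrdinalIgnoreCase)))
                   .ToList();
    }
}
```

For example, ExtractLinks("Visit www.example.com.") returns a single item, "www.example.com.", with the period still attached, whereas the regex version would return "www.example.com".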
