Notice: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/18615040/

Date: 2020-09-17 14:55:35  Source: igfitidea

How to find all occurrences of specific string in long text

Tags: vb.net, string, loops, search

Asked by jenik2205

I have some long text (e.g. information about many books) in a single string, on a single line.

I want to extract just the ISBNs (only the numbers; each number is preceded by the characters "ISBN"). I found code that extracts the first number. The problem is how to loop over the whole text. Could I use a StreamReader for this? Thank you for your answers.

Example:

Sub Main()
    Dim getLiteratura As String = "'Author 1. Name of book 1. ISBN 978-80-251-2025-5.', 'Author 2. Name of Book 2. ISBN 80-01-01346.', 'Author 3. Name of book. ISBN 80-85849-83.'"
    Dim test As Integer = getLiteratura.IndexOf("ISBN")
    Dim getISBN As String = getLiteratura.Substring(test + 5, getLiteratura.IndexOf(".", test + 1) - test - 5)

    Console.Write(getISBN)
    Console.ReadKey()
End Sub

Accepted answer by Steven Doggart

Since you can pass the start position into the IndexOf method, you can loop through the string by starting each search where the last iteration left off. For instance:

Dim getLiteratura As String = "'Author 1. Name of book 1. ISBN 978-80-251-2025-5.', 'Author 2. Name of Book 2. ISBN 80-01-01346.', 'Author 3. Name of book. ISBN 80-85849-83.'"
Dim isbns As New List(Of String)()
Dim position As Integer = 0
While position <> -1
    ' Find the next "ISBN" at or after the current position (-1 when there are no more)
    position = getLiteratura.IndexOf("ISBN", position)
    If position <> -1 Then
        ' The number ends at the first period after the marker
        Dim endPosition As Integer = getLiteratura.IndexOf(".", position + 1)
        If endPosition <> -1 Then
            ' Skip the 5 characters of "ISBN " and take everything up to the period
            isbns.Add(getLiteratura.Substring(position + 5, endPosition - position - 5))
        End If
        ' Resume the search from the period (or stop if none was found)
        position = endPosition
    End If
End While
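
As a sketch of the same IndexOf loop packaged for reuse, the module below uses an Iterator function so the ISBNs are produced lazily, one per loop step. The names IsbnIndexOfSketch and EnumerateIsbns are hypothetical, and the sketch assumes every number is written as "ISBN " followed by the number and a closing period, as in the question's sample string.

```vbnet
Imports System
Imports System.Collections.Generic

Module IsbnIndexOfSketch
    ' Marker assumed to precede every number in the input.
    Private Const Marker As String = "ISBN "

    ' Lazily yields the text between each "ISBN " marker and the next period.
    ' Each search resumes just after the period that ended the previous number.
    Public Iterator Function EnumerateIsbns(text As String) As IEnumerable(Of String)
        Dim position As Integer = text.IndexOf(Marker, StringComparison.Ordinal)
        While position <> -1
            Dim start As Integer = position + Marker.Length
            Dim endPosition As Integer = text.IndexOf("."c, start)
            ' No closing period: the last number is incomplete, so stop.
            If endPosition = -1 Then Exit While
            Yield text.Substring(start, endPosition - start)
            position = text.IndexOf(Marker, endPosition + 1, StringComparison.Ordinal)
        End While
    End Function
End Module
```

For the question's string, For Each isbn As String In EnumerateIsbns(getLiteratura) would visit 978-80-251-2025-5, 80-01-01346 and 80-85849-83 in turn. Iterator functions require Visual Basic 2012 or later; with an older compiler, fill and return a List(Of String) instead.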

That IndexOf loop is about as efficient a method as you are likely to find, if the data is already all loaded into a string. However, it is not very readable or flexible. If those things concern you more than raw efficiency, you may want to consider using RegEx:

' Requires Imports System.Text.RegularExpressions at the top of the file
For Each i As Match In Regex.Matches(getLiteratura, "ISBN (?<isbn>.*?)\.")
    isbns.Add(i.Groups("isbn").Value)
Next

As you can see, not only is it much easier to read, it is also configurable. You could store the pattern externally in a resource, configuration file, database, etc.
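
To make the "configurable" point concrete, here is a minimal sketch in which the pattern is simply a parameter, so it could be loaded from a resource, a configuration file or a database. The module name, FindIsbns and DefaultPattern are hypothetical, and the sketch assumes any replacement pattern still defines a named group called isbn.

```vbnet
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module IsbnPatternSketch
    ' The pattern from the answer: lazily captures everything between "ISBN "
    ' and the first period that follows it.
    Public Const DefaultPattern As String = "ISBN (?<isbn>.*?)\."

    ' Hypothetical helper: the pattern is passed in rather than hard-coded.
    ' If the pattern has no group named "isbn", Groups("isbn").Value is "".
    Public Function FindIsbns(text As String, pattern As String) As List(Of String)
        Dim isbns As New List(Of String)()
        For Each m As Match In Regex.Matches(text, pattern)
            isbns.Add(m.Groups("isbn").Value)
        Next
        Return isbns
    End Function
End Module
```

FindIsbns(getLiteratura, DefaultPattern) behaves like the loop above. A stricter pattern such as "ISBN (?<isbn>[0-9X-]+)" accepts only digits, hyphens and the X check character, so a stray "ISBN abc." would be skipped instead of captured, and that change needs no recompilation if the pattern lives outside the code.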

If the data isn't already all loaded into a string, and efficiency is of utmost concern, you may want to look into using a StreamReader so that you only load a small subset of the data into memory at once. That logic would be a bit more complicated, but still not overly difficult.

Here's a simple example of how you could do it via a StreamReader:

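' Requires Imports System.IO, System.Text and System.Text.RegularExpressions.
' "stream" is whatever Stream holds the data (for example, a FileStream).
Dim isbns As New List(Of String)()
Using reader As StreamReader = New StreamReader(stream)
    ' Buffer holding the characters read since the last match
    Dim builder As New StringBuilder()
    Dim isbnRegEx As New Regex("ISBN (?<isbn>.*?)\.")
    While Not reader.EndOfStream
        ' Read one character at a time (-1 means end of stream)
        Dim charValue As Integer = reader.Read()
        If charValue <> -1 Then
            builder.Append(Convert.ToChar(charValue))
            ' A match means a complete "ISBN <number>." has been read
            Dim matches As MatchCollection = isbnRegEx.Matches(builder.ToString())
            If matches.Count <> 0 Then
                For Each i As Match In matches
                    isbns.Add(i.Groups("isbn").Value)
                Next
                ' Discard the consumed text so memory use stays small
                builder.Clear()
            End If
        End If
    End While
End Using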

As you can see, in that example, as soon as a match is found, it is added to the list and then the builder, which is being used as a buffer, is cleared. That way, the amount of data held in memory at one time is never more than the size of one "record".
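
One cost of the StreamReader example is that the regular expression runs over the whole buffer after every single character. Because this particular pattern can only complete when a period is read, a variant like the following runs it only at periods. It is a sketch: the module and function names are hypothetical, it takes a TextReader so that a StreamReader works for real files and a StringReader can be used for trying it out, and, like the original, it assumes the "ISBN <number>." layout. The buffer still grows without limit if the text never contains a match.

```vbnet
Imports System.Collections.Generic
Imports System.IO
Imports System.Text
Imports System.Text.RegularExpressions

Module IsbnStreamSketch
    Private ReadOnly IsbnRegex As New Regex("ISBN (?<isbn>.*?)\.")

    ' Reads one character at a time from any TextReader (StreamReader, StringReader, ...).
    ' The regex is only tried when a "." arrives, since a match must end with one.
    Public Function ReadIsbns(reader As TextReader) As List(Of String)
        Dim isbns As New List(Of String)()
        Dim buffer As New StringBuilder()
        Dim charValue As Integer = reader.Read()
        While charValue <> -1
            Dim c As Char = ChrW(charValue)
            buffer.Append(c)
            If c = "."c Then
                Dim m As Match = IsbnRegex.Match(buffer.ToString())
                If m.Success Then
                    isbns.Add(m.Groups("isbn").Value)
                    ' Drop the consumed record; periods without an ISBN keep accumulating.
                    buffer.Clear()
                End If
            End If
            charValue = reader.Read()
        End While
        Return isbns
    End Function
End Module
```

Periods that are not part of an ISBN (such as "Author 1.") simply trigger a failed match on a short buffer, so the text before each ISBN is kept until that ISBN's closing period arrives.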

UPDATE

Since, based on your comments, you're having trouble getting it to work properly, here is a full working sample that outputs just the ISBN numbers, without any of the surrounding characters. Just create a new VB.NET console application and paste in the following code:

Imports System.Text.RegularExpressions

Module Module1
    Public Sub Main()
        Dim data As String = "'Author 1. Name of book 1. ISBN 978-80-251-2025-5.', 'Author 2. Name of Book 2. ISBN 80-01-01346.', 'Author 3. Name of book. ISBN 80-85849-83.'"
        For Each i As String In GetIsbns(data)
            Console.WriteLine(i)
        Next
        Console.ReadKey()
    End Sub

    Public Function GetIsbns(data As String) As List(Of String)
        Dim isbns As New List(Of String)()
        For Each i As Match In Regex.Matches(data, "ISBN (?<isbn>.*?)\.")
            isbns.Add(i.Groups("isbn").Value)
        Next
        Return isbns
    End Function
End Module

Answer by John Bustos

When dealing with a large amount of data, I would suggest regular expressions.

Try something like this:

    Dim getLiteratura As String = "'Author 1. Name of book 1. ISBN 978-80-251-2025-5.', 'Author 2. Name of Book 2. ISBN 80-01-01346.', 'Author 3. Name of book. ISBN 80-85849-83.'"
    Dim Pattern As String = "ISBN (.*?)\."
    Dim ReturnedMatches As MatchCollection = Regex.Matches(getLiteratura, Pattern)
    For Each ReturnedMatch As Match In ReturnedMatches
        MsgBox(ReturnedMatch.Groups(1).ToString)
    Next

Also, at the top of your module, include the line Imports System.Text.RegularExpressions

Hope this points you in the right direction...
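
If you prefer LINQ to an explicit loop, the numbered-group pattern from John Bustos's answer can be collected into a list in one expression. This is a sketch; the module and function names are hypothetical. Cast(Of Match)() is used because on .NET Framework MatchCollection only implements the non-generic IEnumerable; newer .NET versions also implement IEnumerable(Of Match), where the cast is harmless.

```vbnet
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

Module IsbnLinqSketch
    ' Group 1 is the unnamed capture (.*?) between "ISBN " and the next period.
    Public Function GetIsbns(text As String) As List(Of String)
        Return Regex.Matches(text, "ISBN (.*?)\.") _
            .Cast(Of Match)() _
            .Select(Function(m) m.Groups(1).Value) _
            .ToList()
    End Function
End Module
```

The explicit line continuations ( _) keep the chain valid on older Visual Basic compilers as well as newer ones.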

Answer by Derek Meyer

Here is my solution

Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
    Dim outputtext As String = ""
    Dim getLiteratura As String = "'Author 1. Name of book 1. ISBN 978-80-251-2025-5.', 'Author 2. Name of Book 2. ISBN 80-01-01346.', 'Author 3. Name of book. ISBN 80-85849-83.'"
    ' Position of the first "ISBN" (-1 if the text contains none)
    Dim test As Integer = getLiteratura.IndexOf("ISBN")
    While test <> -1
        ' The number ends at the first period after "ISBN "
        Dim endPosition As Integer = getLiteratura.IndexOf(".", test + 1)
        If endPosition = -1 Then Exit While
        ' Skip the 5 characters of "ISBN " and append the number plus a separator
        outputtext = outputtext & getLiteratura.Substring(test + 5, endPosition - test - 5) & " : "
        ' Look for the next "ISBN" after the period that ended this one
        test = getLiteratura.IndexOf("ISBN", endPosition)
    End While

    Label1.Text = outputtext
End Sub