Notice: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/2183503/

Time: 2020-10-22 01:53:44  Source: igfitidea

Substitute values in a string with placeholders in Scala

scala functional-programming

Asked by Gavin

I have just started using Scala and wish to better understand the functional approach to problem solving. I have pairs of strings: the first has placeholders for parameters, and its pair has the values to substitute, e.g. "select col1 from tab1 where id > $1 and name like $2" and "parameters: $1 = '250', $2 = 'some%'"

There may be many more than 2 parameters.

I can build the correct string by stepping through and using regex.findAllIn(line) on each line and then going through the iterators to construct the substitution but this seems fairly inelegant and procedurally driven.

Could anyone point me towards a functional approach that will be neater and less error prone?

Answered by Daniel C. Sobral

Speaking strictly to the replacement problem, my preferred solution is one enabled by a feature that should probably be available in the upcoming Scala 2.8, which is the ability to replace regex patterns using a function. Using it, the problem can be reduced to this:

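import scala.util.matching.Regex

def replaceRegex(input: String, values: IndexedSeq[String]) =
  // "$" is escaped so it matches a literal dollar sign; $N is used as-is as the index into values
  """\$(\d+)""".r.replaceAllIn(input, m =>
    Regex.quoteReplacement(values(m.group(1).toInt)))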

Which reduces the problem to what you actually intend to do: replace all $N patterns by the corresponding Nth value of a list.
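
As a usage sketch, here is how function-based replacement might look on the query from the question, using Regex.replaceAllIn with a Match => String function as found in current Scala versions. The object name RegexReplaceDemo, the 1-based lookup ($1 maps to the first value) and the sample values are assumptions made for illustration.

```scala
import scala.util.matching.Regex

object RegexReplaceDemo {
  private val placeholder: Regex = """\$(\d+)""".r

  // Assumption: placeholders are 1-based, so $1 refers to values(0).
  def replace(input: String, values: IndexedSeq[String]): String =
    placeholder.replaceAllIn(input, m =>
      // quoteReplacement stops "$" or "\" inside a value from being read as a group reference
      Regex.quoteReplacement(values(m.group(1).toInt - 1)))

  def main(args: Array[String]): Unit = {
    val sql = "select col1 from tab1 where id > $1 and name like $2"
    println(replace(sql, Vector("'250'", "'some%'")))
  }
}
```

Quoting the replacement matters here because the values themselves may contain a dollar sign.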

Or, if you can actually set the standards for your input string, you could do it like this:

"select col1 from tab1 where id > %1$s and name like %2$s" format ("one", "two")
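
A small sketch of why the explicit %N$s indices can be useful: they let an argument be reused or reordered in the template without touching the argument list. FormatDemo and the sample strings are illustrative names only.

```scala
object FormatDemo {
  // %1$s and %2$s pick arguments by position, so their order in the template is free
  val template = "%2$s comes before %1$s, and %2$s appears twice"

  def render(first: String, second: String): String =
    template.format(first, second)

  def main(args: Array[String]): Unit =
    println(render("one", "two"))
}
```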

If that's all you want, you can stop here. If, however, you are interested in how to go about solving such problems in a functional way, absent clever library functions, please do continue reading.

Thinking functionally about it means thinking of the function. You have a string, some values, and you want a string back. In a statically typed functional language, that means you want something like this:

(String, List[String]) => String

If one considers that those values may be used in any order, we may ask for a type better suited for that:

(String, IndexedSeq[String]) => String

That should be good enough for our function. Now, how do we break down the work? There are a few standard ways of doing it: recursion, comprehension, folding.

RECURSION

Let's start with recursion. Recursion means to divide the problem into a first step, and then repeating it over the remaining data. To me, the most obvious division here would be the following:

  1. Replace the first placeholder
  2. Repeat with the remaining placeholders

That is actually pretty straight-forward to do, so let's get into further details. How do I replace the first placeholder? One thing that can't be avoided is that I need to know what that placeholder is, because I need to get the index into my values from it. So I need to find it:

(String, Pattern) => String

Once found, I can replace it on the string and repeat:

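import java.util.regex.Matcher

val stringPattern = """\$(\d+)""" // triple quotes keep the backslashes intact
val regexPattern = stringPattern.r
def replaceRecursive(input: String, values: IndexedSeq[String]): String = regexPattern findFirstIn input match {
  case Some(regexPattern(index)) =>
    // quoteReplacement keeps "$" and "\" in the value from being read as group references
    replaceRecursive(input.replaceFirst(stringPattern, Matcher.quoteReplacement(values(index.toInt))), values)
  case _ => input // no placeholder found, finished
}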

That is inefficient, because it repeatedly produces new strings, instead of just concatenating each part. Let's try to be more clever about it.

To efficiently build a string through concatenation, we need to use StringBuilder. We also want to avoid creating new strings. StringBuilder can accept a CharSequence, which we can get from String. I'm not sure if a new string is actually created or not. If it is, we could roll our own CharSequence in a way that acts as a view into String, instead of creating a new String. Assured that we can easily change this if required, I'll proceed on the assumption that it is not.

So, let's consider what functions we need. Naturally, we'll want a function that returns the index into the first placeholder:

String => Int

But we also want to skip any part of the string we have already looked at. That means we also want a starting index:

(String, Int) => Int

There's one small detail, though. What if there's no further placeholder? Then there wouldn't be any index to return. Java reuses the index to signal that exceptional case (indexOf returns -1). When doing functional programming, however, it is always best to return what you mean. And what we mean is that we may return an index, or we may not. The signature for that is this:

(String, Int) => Option[Int]
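
To illustrate the difference between the two signatures, the sketch below wraps Java's indexOf, which signals "not found" with -1, into a function that returns Option[Int]. The names OptionIndexDemo and indexOfOption are hypothetical.

```scala
object OptionIndexDemo {
  // Wrap the -1 sentinel of String.indexOf in an Option
  def indexOfOption(input: String, ch: Char, start: Int): Option[Int] =
    input.indexOf(ch.toInt, start) match {
      case -1    => None
      case index => Some(index)
    }

  def main(args: Array[String]): Unit = {
    println(indexOfOption("id > $1", '$', 0))        // Some(5)
    println(indexOfOption("no placeholder", '$', 0)) // None
  }
}
```

The caller can no longer mistake the sentinel for a real position, because the type forces the missing case to be handled.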

Let's build this function:

def indexOfPlaceholder(input: String, start: Int): Option[Int] = if (start < input.length) {
  input indexOf ("$", start) match {
    case -1 => None
    case index => 
      if (index + 1 < input.length && input(index + 1).isDigit)
        Some(index)
      else
        indexOfPlaceholder(input, index + 1)
  }
} else {
  None
}

That's rather complex, mostly to deal with boundary conditions, such as index being out of range, or false positives when looking for placeholders.
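
The boundary cases mentioned here can be checked directly. The sketch repeats the function so that it runs on its own; the object name and the sample inputs are made up for illustration.

```scala
object PlaceholderIndexDemo {
  def indexOfPlaceholder(input: String, start: Int): Option[Int] =
    if (start < input.length) {
      input.indexOf("$", start) match {
        case -1 => None
        case index =>
          if (index + 1 < input.length && input(index + 1).isDigit) Some(index)
          else indexOfPlaceholder(input, index + 1) // "$" without a digit: keep looking
      }
    } else None

  def main(args: Array[String]): Unit = {
    println(indexOfPlaceholder("costs $ 5, id = $1", 0)) // skips the lone "$"
    println(indexOfPlaceholder("trailing $", 0))         // "$" is the last character
    println(indexOfPlaceholder("abc", 10))               // start beyond the end
  }
}
```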

To skip the placeholder, we'll also need to know its length, with signature (String, Int) => Int:

def placeholderLength(input: String, start: Int): Int = {
  def recurse(pos: Int): Int = if (pos < input.length && input(pos).isDigit)
    recurse(pos + 1)
  else
    pos
  recurse(start + 1) - start  // start + 1 skips the "$" sign
}

Next, we also want to know what, exactly, the index of the value the placeholder is standing for. The signature for this is a bit ambiguous:

(String, Int) => Int

The first Int is an index into the input, while the second is an index into the values. We could do something about that, but not that easily or efficiently, so let's ignore it. Here's an implementation for it:

def indexOfValue(input: String, start: Int): Int = {
  def recurse(pos: Int, acc: Int): Int = if (pos < input.length && input(pos).isDigit)
    recurse(pos + 1, acc * 10 + input(pos).asDigit)
  else
    acc
  recurse(start + 1, 0) // start + 1 skips "$"
}
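
Both helpers can be tried on a small input. The sketch below restates them in a self-contained object (PlaceholderHelpersDemo is an illustrative name) and adds @tailrec, an addition not in the original, so the compiler verifies that the recursion runs as a loop.

```scala
object PlaceholderHelpersDemo {
  // Length of the placeholder starting at `start`, including the "$"
  def placeholderLength(input: String, start: Int): Int = {
    @annotation.tailrec
    def recurse(pos: Int): Int =
      if (pos < input.length && input(pos).isDigit) recurse(pos + 1) else pos
    recurse(start + 1) - start
  }

  // Numeric value of the digits that follow the "$" at `start`
  def indexOfValue(input: String, start: Int): Int = {
    @annotation.tailrec
    def recurse(pos: Int, acc: Int): Int =
      if (pos < input.length && input(pos).isDigit) recurse(pos + 1, acc * 10 + input(pos).asDigit)
      else acc
    recurse(start + 1, 0)
  }

  def main(args: Array[String]): Unit = {
    val input = "id > $12 and x"
    println(placeholderLength(input, 5)) // 3: "$", "1", "2"
    println(indexOfValue(input, 5))      // 12
  }
}
```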

We could have used the length too, and achieve a simpler implementation:

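// start is the position of the "$" and length includes it, so recursion stops before reaching the "$"
def indexOfValue2(input: String, start: Int, length: Int): Int = if (length > 1) {
  input(start + length - 1).asDigit + 10 * indexOfValue2(input, start, length - 1)
} else {
  0
}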

As a note, using curly brackets around simple expressions, such as above, is frowned upon by conventional Scala style, but I use them here so the code can be easily pasted into the REPL.

So, we can get the index to the next placeholder, its length, and the index of the value. That's pretty much everything needed for a more efficient version of replaceRecursive:

def replaceRecursive2(input: String, values: IndexedSeq[String]): String = {
  val sb = new StringBuilder(input.length)
  def recurse(start: Int): String = if (start < input.length) {
    indexOfPlaceholder(input, start) match {
      case Some(placeholderIndex) =>
        // named "length" so it does not shadow the placeholderLength function
        val length = placeholderLength(input, placeholderIndex)
        sb.append(input.subSequence(start, placeholderIndex))
        sb.append(values(indexOfValue(input, placeholderIndex)))
        recurse(placeholderIndex + length) // placeholderIndex is already an absolute position
      case None =>
        sb.append(input.subSequence(start, input.length)) // copy the text after the last placeholder
        sb.toString
    }
  } else {
    sb.toString
  }
  recurse(0)
}

Much more efficient, and as functional as one can be using StringBuilder.
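
For a version that runs on its own, here is a compact sketch of the same idea. It is not the code above: it swaps in java.lang.StringBuilder so that append(CharSequence, start, end) can copy ranges without intermediate strings, and the helper names nextPlaceholder and endOfDigits are hypothetical.

```scala
import scala.annotation.tailrec

object BuilderReplaceDemo {
  // Assumption: $N is used directly as a 0-based index into values, as in the text above.
  def replace(input: String, values: IndexedSeq[String]): String = {
    val sb = new java.lang.StringBuilder(input.length)

    // Position of the next "$" that is followed by a digit, if any
    @tailrec
    def nextPlaceholder(from: Int): Option[Int] = {
      val i = input.indexOf("$", from)
      if (i < 0) None
      else if (i + 1 < input.length && input(i + 1).isDigit) Some(i)
      else nextPlaceholder(i + 1)
    }

    // First position after the run of digits that starts at `pos`
    @tailrec
    def endOfDigits(pos: Int): Int =
      if (pos < input.length && input(pos).isDigit) endOfDigits(pos + 1) else pos

    @tailrec
    def recurse(start: Int): String = nextPlaceholder(start) match {
      case Some(p) =>
        val end = endOfDigits(p + 1)
        sb.append(input, start, p) // text before the placeholder
        sb.append(values(input.substring(p + 1, end).toInt))
        recurse(end) // continue right after the placeholder
      case None =>
        sb.append(input, start, input.length) // remaining tail
        sb.toString
    }
    recurse(0)
  }

  def main(args: Array[String]): Unit =
    println(replace("id > $0 and name like $1", Vector("250", "'some%'")))
}
```

The mutable builder stays local to replace, so callers still see a pure function from inputs to a String.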

COMPREHENSION

Scala comprehensions, at their most basic level, mean transforming T[A] into T[B] given a function A => B, something known as a functor. This is easily understood when it comes to collections. For instance, I may transform a List[String] of names into a List[Int] of name lengths through a function String => Int which returns the length of a string. That's a list comprehension.

There are other operations that can be done through comprehensions, given functions with signatures A => T[B], which is related to monads, or A => Boolean.
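
The three kinds of function correspond to map, flatMap and a filtering guard in a for comprehension. A minimal sketch, with made-up names and data:

```scala
object ComprehensionDemo {
  val names = List("Ada", "Grace", "Linus")

  // A => B: map (the functor-style transformation)
  val lengths: List[Int] = for (name <- names) yield name.length

  // A => T[B]: flatMap, each name expands into a list of characters
  val letters: List[Char] = for (name <- names; ch <- name.toList) yield ch

  // A => Boolean: a guard, which the compiler rewrites to withFilter
  val longNames: List[String] = for (name <- names if name.length > 3) yield name

  def main(args: Array[String]): Unit = {
    println(lengths)   // List(3, 5, 5)
    println(letters.mkString)
    println(longNames) // List(Grace, Linus)
  }
}
```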

That means we need to see the input string as a T[A]. We can't use Array[Char] as input because we want to replace the whole placeholder, which is larger than a single char. Let's propose, therefore, this type signature:

(List[String], String => String) => String

Since the input we receive is a String, we first need a function String => List[String], which will divide our input into placeholders and non-placeholders. I propose this:

// "$" is escaped outside the character class so that it matches a literal dollar sign
val regexPattern2 = """((?:[^$]+|\$(?!\d))+)|(\$\d+)""".r
def tokenize(input: String): List[String] = regexPattern2.findAllIn(input).toList
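
To see what the tokenizer produces, here is a self-contained sketch with the dollar signs escaped so that they match literally. TokenizeDemo and the sample input are illustrative; note that a "$" not followed by a digit stays part of the surrounding text.

```scala
object TokenizeDemo {
  // Group 1: text without placeholders (a "$" not followed by a digit counts as text)
  // Group 2: a placeholder such as $1
  private val token = """((?:[^$]+|\$(?!\d))+)|(\$\d+)""".r

  def tokenize(input: String): List[String] = token.findAllIn(input).toList

  def main(args: Array[String]): Unit =
    tokenize("id > $1 and price < $ 5 or name like $2").foreach(t => println(s"[$t]"))
}
```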

Another problem we have is that we have an IndexedSeq[String], but we need a String => String. There are many ways around that, but let's settle on this:

def valuesMatcher(values: IndexedSeq[String]): String => String = (input: String) => values(input.substring(1).toInt - 1)

We also need a function List[String] => String, but List's mkString does that already. So there's little left to do aside from composing all this stuff:

def comprehension(input: List[String], matcher: String => String) = 
  for (token <- input) yield (token: @unchecked) match {
    case regexPattern2(_, placeholder: String) => matcher(placeholder)
    case regexPattern2(other: String, _) => other
  }

I use @unchecked because there shouldn't be any pattern other than the two above, if my regex pattern was built correctly. The compiler doesn't know that, however, so I use that annotation to silence the warning it would produce. If an exception is thrown, there's a bug in the regex pattern.

The final function, then, unifies all that:

def replaceComprehension(input: String, values: IndexedSeq[String]) =
  comprehension(tokenize(input), valuesMatcher(values)).mkString

One problem with this solution is that I apply the regex pattern twice: once to break up the string, and once more to identify the placeholders. Another problem is that the List of tokens is an unnecessary intermediate result. We can solve that with these changes:

def tokenize2(input: String): Iterator[List[String]] = regexPattern2.findAllIn(input).matchData.map(_.subgroups)

def comprehension2(input: Iterator[List[String]], matcher: String => String) = 
  for (token <- input) yield (token: @unchecked) match {
    case List(_, placeholder: String) => matcher(placeholder)
    case List(other: String, _) => other
  }

def replaceComprehension2(input: String, values: IndexedSeq[String]) =
  comprehension2(tokenize2(input), valuesMatcher(values)).mkString

FOLDING

Folding is a bit similar to both recursion and comprehension. With folding, we take a T[A] input that can be comprehended, a B "seed", and a function (B, A) => B. We comprehend the list using the function, always taking the B that resulted from the last element processed (the first element takes the seed). Finally, we return the result of the last comprehended element.

I'll admit I could hardly explain it in a less obscure way. That's what happens when you try to keep it abstract. I explained it that way so that the type signatures involved become clear. But let's just see a trivial example of folding to understand its usage:

def factorial(n: Int) = {
  val input = 2 to n
  val seed = 1
  val function = (b: Int, a: Int) => b * a
  input.foldLeft(seed)(function)
}

Or, as a one-liner:

def factorial2(n: Int) = (2 to n).foldLeft(1)(_ * _)
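
One way to watch the B value travel from element to element is scanLeft, which returns every intermediate accumulator instead of only the last one. A small sketch (FoldTraceDemo and factorialSteps are illustrative names):

```scala
object FoldTraceDemo {
  // scanLeft keeps every intermediate accumulator that foldLeft would pass along
  def factorialSteps(n: Int): List[Int] = (2 to n).scanLeft(1)(_ * _).toList

  def factorial(n: Int): Int = (2 to n).foldLeft(1)(_ * _)

  def main(args: Array[String]): Unit = {
    println(factorialSteps(5)) // List(1, 2, 6, 24, 120)
    println(factorial(5))      // 120
  }
}
```

The first element of the trace is the seed, and the last is exactly what foldLeft returns.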

Ok, so how would we go about solving the problem with folding? The result, of course, should be the string we want to produce. Therefore, the seed should be an empty string. Let's use the result from tokenize2 as the comprehensible input, and do this:

def replaceFolding(input: String, values: IndexedSeq[String]) = {
  val seed = new StringBuilder(input.length)
  val matcher = valuesMatcher(values)
  val foldingFunction = (sb: StringBuilder, token: List[String]) => {
    token match {          
      case List(_, placeholder: String) => sb.append(matcher(placeholder))
      case List(other: String, _) => sb.append(other)
    }
    sb
  }
  tokenize2(input).foldLeft(seed)(foldingFunction).toString
}

And, with that, I finish showing the most usual ways one would go about this in a functional manner. I have resorted to StringBuilder because concatenation of String is slow. If that wasn't the case, I could easily replace StringBuilder in the functions above with String. I could also convert the Iterator into a Stream, and completely do away with mutability.

This is Scala, though, and Scala is about balancing needs and means, not about purist solutions. Though, of course, you are free to go purist. :-)
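
For readers who do want to go purist, a minimal sketch of the fold with an immutable String seed could look like this. It assumes 1-based placeholders, as valuesMatcher does, and the object name is hypothetical; it trades speed for the absence of any mutable state.

```scala
object PureFoldDemo {
  private val token = """((?:[^$]+|\$(?!\d))+)|(\$\d+)""".r

  // Assumption: placeholders are 1-based, matching valuesMatcher in the text
  def replace(input: String, values: IndexedSeq[String]): String =
    token.findAllMatchIn(input).foldLeft("") { (acc, m) =>
      // group(2) is null when the match is plain text rather than a placeholder
      Option(m.group(2)) match {
        case Some(placeholder) => acc + values(placeholder.substring(1).toInt - 1)
        case None              => acc + m.group(1)
      }
    }

  def main(args: Array[String]): Unit =
    println(replace("id > $1 and name like $2", Vector("250", "'some%'")))
}
```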

Answered by oxbow_lakes

You can use the standard Java String.format style with a twist:

"My name is %s and I am %d years of age".format("Oxbow", 34)

In Java of course this would have looked like:

String.format("My name is %s and I am %d years of age", "Oxbow", 34)

The primary difference between these two styles (I much prefer Scala's) is that conceptually this means that every String can be considered a format string in Scala (i.e. the format method appears to be an instance method on the String class). Whilst this might be argued to be conceptually wrong, it leads to more intuitive and readable code.
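
As a sketch of the formatting control this style gives, the example below formats a floating-point number and a zero-padded integer. It uses formatLocal with Locale.ROOT, an assumption made here so that the decimal separator does not depend on the machine's default locale; the names and values are invented.

```scala
import java.util.Locale

object NumberFormatDemo {
  // formatLocal pins the locale so the decimal separator is predictable
  def describe(name: String, height: Double, age: Int): String =
    "%s is %.2f m tall and %03d years old".formatLocal(Locale.ROOT, name, height, age)

  def main(args: Array[String]): Unit =
    println(describe("Oxbow", 1.8349, 34))
}
```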

This formatting style allows you to format floating-point numbers as you wish, dates etc. The main issue with it is that the "binding" between the placeholders in the format string and the arguments is purely order based, not related to names in any way (like "My name is ${name}") although I fail to see how...

interpolate("My name is ${name} and I am ${age} years of age", 
               Map("name" -> "Oxbow", "age" -> "34"))

...is any more readable embedded in my code. This sort of thing is much more useful for text replacement where the source text is embedded in separate files (in i18n for example) where you would want something like:

"name.age.intro".text.replacing("name" as "Oxbow").replacing("age" as "34").text

Or:

"My name is ${name} and I am ${age} years of age"
     .replacing("name" as "Oxbow").replacing("age" as "34").text

I would think that this would be pretty easy to use and take just a few minutes to write (I can't seem to get Daniel's interpolate to compile with the Scala 2.8 version I have):

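import java.io.FileInputStream
import scala.language.implicitConversions

object TextBinder {
  val p = new java.util.Properties
  p.load(new FileInputStream("C:/mytext.properties"))

  class Replacer(val text: String) {
    def replacing(repl: Replacement) = new Replacer(interpolate(text, repl.map))
  }

  class Replacement(from: String, to: String) {
    def map = Map(from -> to)
  }

  // A named class avoids the reflective calls that an anonymous "new { ... }" would need
  class TextOps(from: String) {
    def as(to: String) = new Replacement(from, to)
    def text = p.getProperty(from)
    // apply the first replacement as well, instead of discarding it
    def replacing(repl: Replacement) = new Replacer(interpolate(from, repl.map))
  }
  implicit def stringToReplacementStr(from: String): TextOps = new TextOps(from)

  def interpolate(text: String, vars: Map[String, String]) =
    vars.foldLeft(text) { (t, kv) => t.replace("${" + kv._1 + "}", kv._2) }
}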

I am a sucker for fluent APIs by the way! No matter how unperformant they are!
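
A trimmed-down sketch of the same idea that runs without a properties file: named ${key} placeholders are filled from a Map, with a small fluent wrapper on top. Template and NamedPlaceholderDemo are hypothetical names, and plain tuples stand in for the "as" syntax.

```scala
object NamedPlaceholderDemo {
  // Replace each ${key} with its value; unknown placeholders are left untouched
  def interpolate(text: String, vars: Map[String, String]): String =
    vars.foldLeft(text) { case (t, (key, value)) => t.replace("${" + key + "}", value) }

  // Hypothetical fluent wrapper, similar in spirit to the Replacer above
  final case class Template(text: String) {
    def replacing(binding: (String, String)): Template =
      Template(interpolate(text, Map(binding)))
  }

  def main(args: Array[String]): Unit = {
    val result = Template("My name is ${name} and I am ${age} years of age")
      .replacing("name" -> "Oxbow")
      .replacing("age" -> "34")
    println(result.text)
  }
}
```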

Answered by Eric

This is not a direct answer to your question but more of a Scala trick. You can interpolate strings in Scala by using xml:

val id = 250
val value = "some%"
<s>select col1 from tab1 where id > {id} and name like {value}</s>.text
// res1: String = select col1 from tab1 where id > 250 and name like some%
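
For comparison, Scala 2.10 and later ship string interpolation in the standard library, which covers the same case without XML literals. This sketch is added for context and is not part of the answer above; the object and method names are illustrative.

```scala
object InterpolatorDemo {
  // The s prefix substitutes $id and $value with the values in scope
  def query(id: Int, value: String): String =
    s"select col1 from tab1 where id > $id and name like $value"

  def main(args: Array[String]): Unit =
    println(query(250, "some%"))
}
```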

Eric.

Answered by Mitch Blevins

You can use the little known "QP brackets" to delimit Scala expressions within strings. This has an advantage over other methods in that you can use any Scala expression, not just simple vals/vars. Just use an opening "+ and closing +" bracket delimiters.

Example:

  val name = "Joe Schmoe"
  val age = 32
  val str = "My name is "+name+" and my age is "+age+"."