Understanding the tilde in Scala's parser combinators

Question

Asked by Jano

I'm fairly new to Scala, and while reading about parser combinators (The Magic Behind Parser Combinators, Domain-Specific Languages in Scala) I came across method definitions like this:

def classPrefix = "class" ~ ID ~ "(" ~ formals ~ ")"

I've been reading through the API doc of scala.util.parsing.combinator.Parsers, which defines a method named ~ (tilde), but I still don't really understand its usage in the example above. In that example ~ is a method called on java.lang.String, which doesn't have that method, and that causes the compiler to fail. I know that ~ is also defined as

case class ~ [+a, +b] (_1: a, _2: b)

but how does this help in the example above?


I'd be happy if someone could give me a hint to understand what's going on here. Thank you very much in advance!


Jan


Answer 1

Answered by Rex Kerr

The structure here is a little bit tricky. First, notice that you always define these things inside a subclass of some parser, e.g. class MyParser extends RegexParsers. Now, you may note two implicit definitions inside RegexParsers:

implicit def literal (s: String): Parser[String]
implicit def regex (r: Regex): Parser[String]

These take any string or regex and convert it into a parser that matches that string or regex as a token. They're implicit, so they'll be applied any time they're needed (e.g. if you call a method of Parser[String] that String (or Regex) does not have).
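The mechanism can be tried without the library at all. The following is a minimal sketch, assuming a made-up Toy parser type (it is not the library's Parser, and it does not skip whitespace); only the name literal mirrors the real implicit.

```scala
import scala.language.implicitConversions

object ImplicitSketch {
  // Hypothetical toy parser, not the library's Parser: it returns the
  // unconsumed rest of the input on success and None on failure.
  final case class Toy(run: String => Option[String]) {
    // Sequencing: run this parser, then `next` on whatever is left over.
    def ~(next: => Toy): Toy =
      Toy(in => run(in).flatMap(rest => next.run(rest)))
  }

  // Plays the role of RegexParsers.literal: any String can become a Toy.
  implicit def literal(s: String): Toy =
    Toy(in => if (in.startsWith(s)) Some(in.drop(s.length)) else None)

  // String has no ~ method, so the compiler rewrites this line to
  // literal("hello").~(literal("world")).
  val greeting: Toy = "hello" ~ "world"
}
```

With this in scope, ImplicitSketch.greeting.run("helloworld") yields Some("") and ImplicitSketch.greeting.run("hellothere") yields None.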

But what is this Parser thing? It's an inner class defined inside Parsers, the supertrait of RegexParsers:

class Parser[+T] extends (Input) => ParseResult[T]

Looks like it's a function that takes input and maps it to a result. Well, that makes sense! You can see the documentation for it in the Scaladoc for Parsers.Parser.

Now we can just look up the ~ method:

def ~[U](q: => Parser[U]): Parser[~[T, U]]
  A parser combinator for sequential composition.
  `p ~ q` succeeds if `p` succeeds and `q` succeeds on the input left over by `p`.

So, if we see something like


def seaFacts = "fish" ~ "swim"

what happens is this: first, "fish" does not have the ~ method, so it's implicitly converted to a Parser[String], which does. The ~ method then wants an argument of type Parser[U], so "swim" is implicitly converted into a Parser[String] as well (i.e. U == String). Now we have something that will match an input "fish", and whatever is left in the input should match "swim"; if both are the case, then seaFacts will succeed in its match.
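To see this run against the real library, here is a small sketch. It assumes the scala-parser-combinators module is on the classpath; the object name SeaParser is made up for the example.

```scala
import scala.util.parsing.combinator.RegexParsers

object SeaParser extends RegexParsers {
  // Both string literals are turned into Parser[String] by the implicit
  // `literal` inherited from RegexParsers; ~ then sequences them.
  def seaFacts: Parser[String ~ String] = "fish" ~ "swim"
}
```

SeaParser.parseAll(SeaParser.seaFacts, "fish swim") succeeds, because RegexParsers skips whitespace between tokens by default, while the input "fish fly" fails at the second token.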

Answer 2

Answered by Didier Dupont

The ~ method on Parser combines two parsers into one that applies the two original parsers successively and returns both results. It could simply have been declared (in Parser[T]) as

def ~[U](q: => Parser[U]): Parser[(T, U)]

If you never combined more than two parsers, that would be OK. However, if you chain three of them, p1, p2, p3, with return types T1, T2, T3, then p1 ~ p2 ~ p3, which means p1.~(p2).~(p3), is of type Parser[((T1, T2), T3)]. And if you combine five of them as in your example, that would be Parser[((((T1, T2), T3), T4), T5)]. Then when you pattern match on the result, you would have all those parentheses too:

case ((((_, id), _), formals), _) => ...

This is quite uncomfortable.


Then comes a clever syntactic trick. When a case class has two parameters, it can appear in infix rather than prefix position in a pattern. That is, if you have case class X(a: A, b: B), you can pattern match with case X(a, b), but also with case a X b. (That is what happens with the pattern x :: xs that matches a non-empty List; :: is a case class.) When you write case a ~ b ~ c, it means case ~(~(a, b), c), but it is much more pleasant to read, and more pleasant than case ((a, b), c) too, which is tricky to get right.
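The infix trick does not depend on parsers, so it can be checked in isolation. The sketch below declares its own stand-in ~ case class (an assumption made so it compiles without the library) and matches the same nested value both ways.

```scala
object InfixPatternSketch {
  // Stand-in for the library's ~ case class, declared here only so the
  // example is self-contained.
  final case class ~[+A, +B](_1: A, _2: B)

  // Shaped like the result of p1 ~ p2 ~ p3: the nesting leans to the left.
  val nested: String ~ Int ~ Boolean = new ~(new ~("id", 42), true)

  // Infix pattern: reads like the grammar rule it came from.
  def infix(v: String ~ Int ~ Boolean): String = v match {
    case name ~ number ~ flag => s"$name/$number/$flag"
  }

  // The equivalent prefix pattern, with the nesting spelled out.
  def prefix(v: String ~ Int ~ Boolean): String = v match {
    case ~(~(name, number), flag) => s"$name/$number/$flag"
  }
}
```

Both InfixPatternSketch.infix(InfixPatternSketch.nested) and the prefix variant produce "id/42/true".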

So the ~ method in Parser returns a Parser[~[T, U]] instead of a Parser[(T, U)], so that you can pattern match easily on the result of multiple ~. Besides that, ~[T, U] and (T, U) are pretty much the same thing, as isomorphic as you can get.

The same name is chosen for the combining method in parser and for the result type, because the resulting code is natural to read. One sees immediately how each part in the result processing relates to the items of the grammar rule.


parser1 ~ parser2 ~ parser3 ^^ {case part1 ~ part2 ~ part3 => ...}
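Applied to the rule from the question, this could look as follows. It is only a sketch: it assumes the scala-parser-combinators module is on the classpath, and the ID regex and the formals rule (comma-separated identifiers) are hypothetical stand-ins, since the question does not show them.

```scala
import scala.util.parsing.combinator.RegexParsers

object ClassPrefixFull extends RegexParsers {
  // Hypothetical identifier rule.
  val ID: Parser[String] = """[a-zA-Z_][a-zA-Z0-9_]*""".r
  // Hypothetical stand-in for `formals`: comma-separated names.
  def formals: Parser[List[String]] = repsep(ID, ",")

  // Five parsers chained with ~; the pattern has one slot per parser,
  // in the same order, and _ ignores the constant tokens.
  def classPrefix: Parser[(String, List[String])] =
    "class" ~ ID ~ "(" ~ formals ~ ")" ^^ {
      case _ ~ name ~ _ ~ params ~ _ => (name, params)
    }
}
```

Parsing "class Point(x, y)" with ClassPrefixFull.parseAll(ClassPrefixFull.classPrefix, ...) yields ("Point", List("x", "y")).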

Tilde is chosen because its precedence (it binds tightly) plays nicely with the other operators on Parser.

One last point: there are auxiliary operators ~> and <~ which discard the result of one of the operands, typically the constant parts of the rule, which carry no useful data. So one would rather write

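("class" ~> ID <~ "(") ~ formals <~ ")"

and get only the values of ID and formals in the result. The parentheses are needed because <~ has lower precedence than ~ and ~>; without them, "(" ~ formals would be grouped first and then discarded.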
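A runnable sketch of the discarding operators, again assuming the scala-parser-combinators module is on the classpath and using hypothetical ID and formals rules. The rule is written with explicit parentheses around the first group because <~ binds less tightly than ~ and ~>.

```scala
import scala.util.parsing.combinator.RegexParsers

object ClassPrefixShort extends RegexParsers {
  // Hypothetical identifier rule.
  val ID: Parser[String] = """[a-zA-Z_][a-zA-Z0-9_]*""".r
  // Hypothetical stand-in for `formals`: comma-separated names.
  def formals: Parser[List[String]] = repsep(ID, ",")

  // ~> keeps the right result, <~ keeps the left one, so the keyword and
  // both parentheses never reach the result type.
  def classPrefix: Parser[String ~ List[String]] =
    ("class" ~> ID <~ "(") ~ formals <~ ")"
}
```

The result of parsing "class Point(x, y)" now matches the two-slot pattern case name ~ params, with name bound to "Point" and params to List("x", "y").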

Answer 3

Answered by Eugene Yokota

You should check out Parsers.Parser. Scala sometimes defines a method and a case class with the same name to aid pattern matching etc., and it's a little confusing if you're reading the Scaladoc.

In particular, "class" ~ ID is the same as "class".~(ID). ~ is a method that combines the parser with another parser sequentially.

There's an implicit conversion defined in RegexParsers that automatically creates a parser from a String value. So "class" automatically becomes an instance of Parser[String].

val ID = """[a-zA-Z]([a-zA-Z0-9]|_[a-zA-Z0-9])*""".r

RegexParsers also defines another implicit conversion that automatically creates a parser from a Regex value. So ID automatically becomes an instance of Parser[String] too.

By combining two parsers, "class" ~ ID returns a Parser[~[String, String]] that matches the literal "class" followed by the regular expression ID, in that order. There are other methods like | and |||. For more info, read Programming in Scala.
