Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/26753626/

Time: 2020-10-22 06:40:32  Source: igfitidea

Removing empty strings from maps in scala

scala apache-spark

Asked by Siva

val lines: RDD[String] = sc.textFile("/tmp/inputs/*")
val tokenizedLines = lines.map(Tokenizer.tokenize)

In the code snippet above, the tokenize function may return empty strings. How do I skip adding them to the map in that case? Or how do I remove the empty entries after they have been added?

Answered by axmrnv

tokenizedLines.filter(_.nonEmpty)

Answered by Daniel C. Sobral

The currently accepted answer, using filter and nonEmpty, incurs some performance penalty because nonEmpty is not a method on String; it is added through an implicit conversion. With value classes being used, I expect the difference to be almost imperceptible, but on versions of Scala where that is not the case, it is a substantial hit.

Instead, one could use this, which is assured to be faster:

tokenizedLines.filterNot(_.isEmpty)
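The difference between the two predicates can be sketched on a plain Scala collection. This is only an illustration under stated assumptions: a local Seq stands in for the RDD, and `tokenize` below is a hypothetical stand-in for Tokenizer.tokenize that returns one String per line. One caveat, as far as I know: Spark's RDD provides filter but not filterNot, so on an RDD the same idea would be written by negating the predicate inside filter, as the last variant does.

```scala
object FilterEmptyDemo {
  // Hypothetical stand-in for Tokenizer.tokenize: trims a line,
  // so blank or whitespace-only lines become "".
  def tokenize(line: String): String = line.trim

  // Assumption: a local Seq replaces the RDD[String] from the question.
  val lines: Seq[String] = Seq("hello world", "   ", "", "spark")
  val tokenizedLines: Seq[String] = lines.map(tokenize)

  // nonEmpty is not defined on java.lang.String; it is reached
  // through the implicit StringOps wrapper.
  val viaNonEmpty: Seq[String] = tokenizedLines.filter(_.nonEmpty)

  // isEmpty is a method on java.lang.String itself, so no wrapper is needed.
  val viaFilterNot: Seq[String] = tokenizedLines.filterNot(_.isEmpty)

  // Same predicate using only filter, the form that also fits an RDD.
  val viaNegatedFilter: Seq[String] = tokenizedLines.filter(s => !s.isEmpty)

  def main(args: Array[String]): Unit = {
    println(viaNonEmpty)
    println(viaFilterNot)
    println(viaNegatedFilter)
  }
}
```

All three variants keep the same elements; they differ only in how the emptiness check is resolved.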

Answered by crak

You could use flatMap with Option.

Something like this:

lines.flatMap{
     case "" => None 
     case s => Some(s)
}
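A self-contained version of the same idea, as a minimal sketch: a local Seq is assumed in place of the RDD, and the object and value names are made up for illustration. Each element is mapped to an Option, and flatMap flattens the result so that every None disappears.

```scala
object FlatMapOptionDemo {
  // Assumption: a local Seq replaces the RDD[String] so the sketch runs without Spark.
  val lines: Seq[String] = Seq("alpha", "", "beta", "")

  // Map each element to an Option, then let flatMap drop the None cases.
  val nonEmptyLines: Seq[String] = lines.flatMap {
    case "" => None    // empty string: contributes nothing to the result
    case s  => Some(s) // anything else: kept as a single element
  }

  def main(args: Array[String]): Unit =
    println(nonEmptyLines)
}
```

This form is most useful when the tokenizing step and the emptiness check can be merged into one function that returns an Option; for a pure emptiness check, a plain filter is shorter.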

Answered by user1989252

val tokenizedLines = (lines.map(Tokenizer.tokenize)).filter(_.nonEmpty)