Original source: http://stackoverflow.com/questions/26753626/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverFlow
Removing empty strings from maps in scala
Asked by Siva
val lines: RDD[String] = sc.textFile("/tmp/inputs/*")
val tokenizedLines = lines.map(Tokenizer.tokenize)
In the above code snippet, the tokenize function may return empty strings. How do I skip adding them to the map in that case? Or how do I remove the empty entries after they have been added to the map?
Answered by axmrnv
tokenizedLines.filter(_.nonEmpty)
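As a minimal sketch of this answer, the same filter can be tried on a plain Scala List, which stands in here for the RDD[String] so that no Spark context is needed. The Tokenizer object below is a hypothetical placeholder (the real one is not shown in the question) and is assumed to return a single String per line.

```scala
// Hypothetical stand-in for Tokenizer.tokenize: trims the line, so a
// blank line becomes "". The real tokenizer is not shown in the question.
object Tokenizer {
  def tokenize(line: String): String = line.trim
}

// A local List stands in for the RDD[String] returned by sc.textFile.
val lines: List[String] = List("foo bar", "   ", "baz", "")

val tokenizedLines: List[String] = lines.map(Tokenizer.tokenize)

// Keep only the non-empty results.
val nonEmptyLines: List[String] = tokenizedLines.filter(_.nonEmpty)
```

If tokenize returned a collection of tokens instead of a String, the same filter(_.nonEmpty) call would drop empty collections, since nonEmpty is available on both.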
Answered by Daniel C. Sobral
The currently accepted answer, using filter and nonEmpty, incurs some performance penalty because nonEmpty is not a method on String; instead, it is added through an implicit conversion. With value objects being used, I expect the difference to be almost imperceptible, but on versions of Scala where that is not the case, it is a substantial hit.
Instead, one could use this, which is assured to be faster:
tokenizedLines.filterNot(_.isEmpty)
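The following sketch, again using a local List in place of the RDD, only illustrates that the two forms select the same elements. It does not measure the performance difference described above, and the sample data is made up for illustration.

```scala
// Sample data, assumed for illustration only.
val tokenizedLines: List[String] = List("a", "", "b c", "")

// isEmpty is a method defined directly on java.lang.String.
val viaIsEmpty: List[String] = tokenizedLines.filterNot(_.isEmpty)

// nonEmpty is not a String method; the Scala standard library supplies it
// through an implicit conversion.
val viaNonEmpty: List[String] = tokenizedLines.filter(_.nonEmpty)
```

Both values hold the same elements, so the choice between them is about cost and readability rather than behavior.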
Answered by crak
You could use flatMap with Option.
Something like this:
lines.flatMap {
  case "" => None
  case s  => Some(s)
}
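One possible way to apply this idea to the original question is to tokenize and drop empty results in a single flatMap pass. This is a sketch on a local List rather than an RDD, and the tokenize function is a hypothetical stand-in assumed to return one String per line.

```scala
// Hypothetical stand-in for Tokenizer.tokenize (not shown in the question).
def tokenize(line: String): String = line.trim

// A local List stands in for the RDD[String].
val lines: List[String] = List("foo", "  ", "bar", "")

// Tokenize each line; None is dropped by flatMap, Some(s) contributes s.
val tokenizedLines: List[String] = lines.flatMap { line =>
  tokenize(line) match {
    case "" => None
    case s  => Some(s)
  }
}
```

Compared with map followed by filter, this avoids building the intermediate collection that still contains the empty strings.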
Answered by user1989252
val tokenizedLines = (lines.map(Tokenizer.tokenize)).filter(_.nonEmpty)

