Scala Spark Key/Value Filter Function
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same license and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/30577850/
Spark Key/Value filter Function
Asked by theMadKing
I have data in key/value pairs. I am trying to apply a filter function to the data that looks like:
def filterNum(x: Int): Boolean = {
  if (decimalArr.contains(x)) return true
  else return false
}
My Spark code has:
val numRDD = columnRDD.filter(x => filterNum(x(0)))
but that won't work, and when I send in:
val numRDD = columnRDD.filter(x => filterNum(x))
I get the error:
<console>:23: error: type mismatch;
found : (Int, String)
required: Int
val numRDD = columnRDD.filter(x => filterNum(x))
I have also tried other things, like changing the inputs to the function.
Answered by Justin Pihony
This is because RDD.filter is passing in the key/value tuple, (Int, String), and filterNum is expecting an Int, which is why the first attempt works: tuple(index) pulls out the value at that index of the tuple.
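As a minimal sketch of what the predicate receives (assuming a live SparkContext sc; decimalArr and the sample pairs below are made up for illustration):

val decimalArr = Seq(1, 3, 5)
val columnRDD = sc.parallelize(Seq((1, "a"), (2, "b"), (3, "c")))   // RDD[(Int, String)]
// filter hands each element to the predicate as the whole pair,
// so x has type (Int, String); x._1 is the Int key and x._2 the String value.
val numRDD = columnRDD.filter(x => decimalArr.contains(x._1))
numRDD.collect()   // Array((1,a), (3,c))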
You could change your filter function to be
def filterNum(x: (Int, String)): Boolean = {
  if (decimalArr.contains(x._1)) return true
  else return false
}
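With that signature the original call compiles unchanged; as a quick usage sketch (reusing the hypothetical columnRDD from above):

val numRDD = columnRDD.filter(x => filterNum(x))
// or pass the method directly, since the types now line up:
val numRDD2 = columnRDD.filter(filterNum)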
Although, I would personally do a more terse version, as the true/false result is already baked into contains and you can just use the expression directly:
columnRDD.filter(decimalArr.contains(_._1))
Or, if you don't like the underscore syntax:
columnRDD.filter(x => decimalArr.contains(x._1))
Also, do not use return in Scala; the last evaluated expression is automatically the return value.
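For illustration, the modified filter function without return could be written like this (a sketch, with decimalArr still a hypothetical Seq[Int]):

def filterNum(x: (Int, String)): Boolean =
  decimalArr.contains(x._1)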

