scala Apache Spark: get elements of Row by name
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/30674376/
Apache Spark: get elements of Row by name
Asked by Ken Williams
In a DataFrame object in Apache Spark (I'm using the Scala interface), if I'm iterating over its Row objects, is there any way to extract values by name? I can see how to do some really awkward stuff:
def foo(r: Row) = {
val ix = (0 until r.schema.length).map( i => r.schema(i).name -> i).toMap
val field1 = r.getString(ix("field1"))
val field2 = r.getLong(ix("field2"))
...
}
dataframe.map(foo)
I figure there must be a better way - this is pretty verbose, it requires creating this extra structure, and it also requires knowing the types explicitly, which if incorrect, will produce a runtime exception rather than a compile-time error.
Accepted answer by Justin Pihony
This is not supported at this time in the Scala API. The closest you have is this JIRA titled "Support converting DataFrames to typed RDDs"
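Since Spark 1.6, the Dataset API covers much of what that JIRA asked for: a DataFrame can be converted to a typed Dataset with as[T], after which fields are ordinary case-class members checked at compile time. A minimal spark-shell / scala-cli style sketch (the Record class and column names are illustrative, not from the original answer):

```scala
import org.apache.spark.sql.SparkSession

// Illustrative case class; its field names must match the DataFrame's columns.
case class Record(field1: String, field2: Long)

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("typed-access")
  .getOrCreate()
import spark.implicits._

val dataframe = Seq(("a", 1L), ("b", 2L)).toDF("field1", "field2")

// as[Record] converts the untyped DataFrame into a Dataset[Record];
// a mismatched field name or type fails when as[Record] is analyzed,
// rather than deep inside a map over Rows.
val ds = dataframe.as[Record]
val field1s = ds.map(_.field1).collect().toSeq

println(field1s.mkString(","))  // a,b
spark.stop()
```

This requires spark-sql on the classpath, and the case class must be visible to the implicit Encoder derivation pulled in by `import spark.implicits._`.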
Answered by Kexin Nie
You can use getAs from org.apache.spark.sql.Row:
r.getAs[String]("field1")
r.getAs[Long]("field2")
Read more about getAs(java.lang.String fieldName) in the Row Scaladoc.
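Note that getAs[T] looks the column index up by name via the row's schema and then casts the value to T, so a wrong type parameter still fails at runtime with a ClassCastException. A minimal self-contained sketch of the question's foo rewritten with getAs (GenericRowWithSchema is a Spark-internal class, used here only to build a schema-carrying Row without a SparkSession; rows produced by a real DataFrame already carry their schema):

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, StringType, LongType}
import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema

def foo(r: Row): (String, Long) =
  // getAs[T] resolves the field position by name from r.schema,
  // then casts the stored value to T.
  (r.getAs[String]("field1"), r.getAs[Long]("field2"))

// Attach a schema by hand so the example runs without a SparkSession.
val schema = StructType(Seq(
  StructField("field1", StringType),
  StructField("field2", LongType)))
val row: Row = new GenericRowWithSchema(Array("hello", 42L), schema)

println(foo(row))  // (hello,42)
```

Calling getAs by name on a Row without an attached schema (e.g. one built with Row(...)) throws, since there is no name-to-index mapping to consult.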

