pass RDD as parameter and return dataframe to a function - scala

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must comply with the same license and attribute it to the original authors (not me), citing the original source: http://stackoverflow.com/questions/37932879/

Date: 2020-10-22 08:24:00 · Source: igfitidea


Tags: scala, apache-spark, spark-dataframe

Asked by user1122

I am trying to create a function which takes a string or an RDD as an argument but returns a DataFrame.

Code:

def udf1(input: String) = {
  val file = sc.textFile(input)
  file.map(p => Person(
    p.substring(1, 15),
    p.substring(16, 20))).toDF()
}

def main() {
  case class Person(id: String, name: String)
  val df1 = udf1("hdfs:\\")
}

But it always returns an RDD. Any suggestions?
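For context, the .toDF() call only resolves when the SQLContext implicits are in scope and the case class is visible at top level; declaring Person inside main, as above, is a common reason the conversion fails. Below is a minimal self-contained sketch under those assumptions (the Spark 1.x style names such as sqlContext and the local path people.txt are illustrative, not from the original post):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}

// Case class at top level: declaring it inside main() breaks the
// implicit TypeTag resolution that toDF() depends on.
case class Person(id: String, name: String)

object Example {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("example").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._ // brings toDF() into scope for RDDs of case classes

    def udf1(input: String): DataFrame = {
      val file = sc.textFile(input)
      file.map(p => Person(p.substring(1, 15), p.substring(16, 20))).toDF()
    }

    udf1("people.txt").show() // illustrative path
  }
}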

Answered by evan.oman

Not sure exactly why your code isn't working, but good Scala form would include specifying return types:

scala> case class Person(id: Int)
defined class Person

scala> def udf1(fName: String): DataFrame = {
     | val file = sc.textFile(fName)
     | file.map(p => Person(p.toInt)).toDF()
     | }
udf1: (fName: String)org.apache.spark.sql.DataFrame

scala> val df = udf1("file.txt")
df: org.apache.spark.sql.DataFrame = [id: int]
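The question title also asks about passing the RDD itself as the parameter. Under the same assumptions (a top-level case class and the toDF implicits in scope), an overload taking an RDD[String] is a straightforward variation; this is a sketch, not code from the original answer:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SQLContext}

case class Person(id: String, name: String) // top level, as in the question

// Hypothetical overload: accept an already-built RDD instead of a path,
// so the same parsing logic can be reused across input sources.
def udf1(lines: RDD[String], sqlContext: SQLContext): DataFrame = {
  import sqlContext.implicits._ // brings toDF() into scope (Spark 1.x style)
  lines.map(p => Person(p.substring(1, 15), p.substring(16, 20))).toDF()
}

// Usage in the shell, where sc and sqlContext already exist:
// val df = udf1(sc.textFile("people.txt"), sqlContext)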