Pass RDD as parameter and return dataframe to a function - scala
Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow.
Original URL: http://stackoverflow.com/questions/37932879/
pass RDD as parameter and return dataframe to a function - scala
Asked by user1122
I am trying to create a function which takes a string or RDD as an argument but returns a dataframe.
Code:
def udf1(input: String) = {
  val file = sc.textFile(input)
  file.map(p => Person(
    p.substring(1, 15),
    p.substring(16, 20))).toDF()
}

def main() {
  case class Person(id: String, name: String)
  val df1 = udf1("hdfs:\")
}
But it always returns an RDD. Any suggestions?
Answered by evan.oman
Not sure exactly why your code isn't working, but good Scala form would include specifying return types:
scala> case class Person(id: Int)
defined class Person
scala> def udf1(fName: String): DataFrame = {
| val file = sc.textFile(fName)
| file.map(p => Person(p.toInt)).toDF()
| }
udf1: (fName: String)org.apache.spark.sql.DataFrame
scala> val df = udf1("file.txt")
df: org.apache.spark.sql.DataFrame = [id: int]
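
Applying the same pattern to the question's two-column Person record could look like the sketch below. This is only a sketch, assuming a Spark 1.x spark-shell session where sc is predefined and sqlContext.implicits._ (which provides toDF() on an RDD of case classes) is already imported; the case class is declared outside any method, the return type is explicit, and the HDFS path and printSchema call are placeholders for illustration.

import org.apache.spark.sql.DataFrame

// Case class declared at the top level (not inside main), so that
// toDF() can derive a schema for it.
case class Person(id: String, name: String)

// Explicit return type documents that the function yields a DataFrame.
def udf1(input: String): DataFrame = {
  val file = sc.textFile(input)  // RDD[String]
  file.map(p =>
    Person(
      p.substring(1, 15),        // fixed-width id field
      p.substring(16, 20))       // fixed-width name field
  ).toDF()
}

val df1 = udf1("hdfs:///path/to/people.txt")  // placeholder path
df1.printSchema()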

