Original question: http://stackoverflow.com/questions/44970829/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
Stack Overflow
Passing two columns to a udf in scala?
Asked by Rohan Oswal
I have a dataframe containing two columns: one is the data, and the other is the character count of that data field.
Data Count
Hello 5
How 3
World 5
I want to change the value of the Data column based on the value in the Count column. How can this be achieved? I tried this using a udf:
invalidrecords.withColumn("value",appendDelimiterError(invalidrecords("value"),invalidrecords("a_cnt")))
This seems to fail. Is this the correct way to do it?
Answered by Ramesh Maharjan
Here's an easy way of doing it.
First, create a dataframe:
import sqlContext.implicits._
val invalidrecords = Seq(
("Hello", 5),
("How", 3),
("World", 5)
).toDF("Data", "Count")
You should have:
+-----+-----+
|Data |Count|
+-----+-----+
|Hello|5    |
|How  |3    |
|World|5    |
+-----+-----+
Then define the udf function as:
import org.apache.spark.sql.functions._
def appendDelimiterError = udf((data: String, count: Int) => "value with error" )
And call it using withColumn as:
invalidrecords.withColumn("value",appendDelimiterError(invalidrecords("Data"),invalidrecords("Count"))).show(false)
You should get this output:
+-----+-----+----------------+
|Data |Count|value           |
+-----+-----+----------------+
|Hello|5    |value with error|
|How  |3    |value with error|
|World|5    |value with error|
+-----+-----+----------------+
You can write your own logic in the udf function instead of returning a fixed string.
Edited
To meet the requirement in your comment below, change the udf function and the withColumn call as follows:
def appendDelimiterError = udf((data: String, count: Int) => {
  if (count < 5) s"convert value to ${data} - error"
  else data
})
invalidrecords.withColumn("Data",appendDelimiterError(invalidrecords("Data"),invalidrecords("Count"))).show(false)
You should get this output:
+----------------------------+-----+
|Data                        |Count|
+----------------------------+-----+
|Hello                       |5    |
|convert value to How - error|3    |
|World                       |5    |
+----------------------------+-----+
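For simple conditions like this one, the same rewrite can probably be done without a udf, using Spark's built-in column functions. The sketch below is not part of the original answer. It assumes a local SparkSession (the name spark, the app name, and the local[*] master are illustrative choices) and rebuilds the same sample dataframe. It should yield the same Data values as the udf version above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, concat, lit, when}

// Assumption: a local session for illustration; in spark-shell, getOrCreate reuses the existing one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("two-column-example")
  .getOrCreate()
import spark.implicits._

// Same sample data as in the answer above.
val invalidrecords = Seq(
  ("Hello", 5),
  ("How", 3),
  ("World", 5)
).toDF("Data", "Count")

// when/otherwise plays the role of the if/else inside the udf:
// rows with Count < 5 get the rewritten text, all other rows keep Data unchanged.
val result = invalidrecords.withColumn(
  "Data",
  when(col("Count") < 5, concat(lit("convert value to "), col("Data"), lit(" - error")))
    .otherwise(col("Data"))
)

result.show(false)
```

Built-in expressions stay visible to Spark's optimizer, whereas a udf is opaque to it, so this form is often preferred when the logic is simple enough. A udf remains a reasonable choice when the logic cannot easily be written with column functions.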

