Note: this page reproduces a popular StackOverflow question and its answer, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original StackOverflow post: http://stackoverflow.com/questions/44970829/

Passing two columns to a udf in scala?

Tags: scala, apache-spark, user-defined-functions

Asked by Rohan Oswal

I have a dataframe containing two columns: one holds the data and the other holds the character count of that data field.

Data    Count
Hello   5
How     3
World   5

I want to change the value of the Data column based on the value in the Count column. How can this be achieved? I tried this using a udf:

invalidrecords.withColumn("value",appendDelimiterError(invalidrecords("value"),invalidrecords("a_cnt")))

This seems to fail. Is this the correct way to do it?

Answered by Ramesh Maharjan

Here's an easy way of doing it.

First, create a dataframe:

import sqlContext.implicits._
val invalidrecords = Seq(
  ("Hello", 5),
  ("How", 3),
  ("World", 5)
).toDF("Data", "Count")

You should have:

+-----+-----+
|Data |Count|
+-----+-----+
|Hello|5    |
|How  |3    |
|World|5    |
+-----+-----+

Then define the udf function as:

import org.apache.spark.sql.functions._
def appendDelimiterError = udf((data: String, count: Int) => "value with error" )

And call it using withColumn as:

invalidrecords.withColumn("value",appendDelimiterError(invalidrecords("Data"),invalidrecords("Count"))).show(false)

You should get the following output:

+-----+-----+----------------+
|Data |Count|value           |
+-----+-----+----------------+
|Hello|5    |value with error|
|How  |3    |value with error|
|World|5    |value with error|
+-----+-----+----------------+

You can write your own logic inside the udf function instead of returning a fixed string.
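
As one hypothetical illustration of such logic, the sketch below flags rows whose Count does not match the actual length of Data. The function name flagCountMismatch and the message text are assumptions, not part of the original answer. It is written as a plain Scala function so it can be checked without a Spark session; in Spark it could then be wrapped with udf(flagCountMismatch _).

```scala
// Hypothetical logic: keep the value when the stored count matches its length,
// otherwise append a marker. The name and message are illustrative only.
def flagCountMismatch(data: String, count: Int): String =
  if (data != null && data.length == count) data // guard against null strings
  else s"$data - count mismatch"

// Plain-Scala check on two sample rows (no Spark needed)
val sample = Seq(("Hello", 5), ("How", 4))
val flagged = sample.map { case (d, c) => flagCountMismatch(d, c) }
flagged.foreach(println)
```

Keeping the logic in a named function, separate from the udf wrapper, makes it easier to test before running it on a dataframe.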

Edited

To meet the requirements in your comment below, change the udf function and the withColumn call as follows:

def appendDelimiterError = udf((data: String, count: Int) => {
  if(count < 5) s"convert value to ${data} - error"
  else data
} )

invalidrecords.withColumn("Data",appendDelimiterError(invalidrecords("Data"),invalidrecords("Count"))).show(false)

You should get the following output:

+----------------------------+-----+
|Data                        |Count|
+----------------------------+-----+
|Hello                       |5    |
|convert value to How - error|3    |
|World                       |5    |
+----------------------------+-----+
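
The branching in the edited udf can also be exercised outside Spark. The sketch below pulls the same condition into a plain function (the name convertIfShort is an assumption made for illustration) and applies it to the sample rows. Wrapping it with udf(convertIfShort _) should behave like the udf defined above.

```scala
// Same condition as the edited udf, as a plain function (name is hypothetical)
def convertIfShort(data: String, count: Int): String =
  if (count < 5) s"convert value to $data - error" else data

// Sample rows mirroring the dataframe in the answer
val rows = Seq(("Hello", 5), ("How", 3), ("World", 5))

// Apply the logic to the first element of each pair, keeping the count
val converted = rows.map { case (data, count) => (convertIfShort(data, count), count) }
converted.foreach(println)
```

Only the row with a count below 5 is rewritten; the other values pass through unchanged, matching the table above.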