Scala: applying a function to a Spark DataFrame column
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license, link to the original question, and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/35227568/
Applying function to Spark Dataframe Column
Asked by Michael Discenza
Coming from R, I am used to easily doing operations on columns. Is there any easy way to take this function that I've written in Scala
def round_tenths_place( un_rounded:Double ) : Double = {
val rounded = BigDecimal(un_rounded).setScale(1, BigDecimal.RoundingMode.HALF_UP).toDouble
return rounded
}
And apply it to one column of a dataframe - kind of what I hoped this would do:
bid_results.withColumn("bid_price_bucket", round_tenths_place(bid_results("bid_price")) )
I haven't found any easy way and am struggling to figure out how to do this. There's got to be an easier way than converting the dataframe to an RDD, selecting the right field from the RDD of rows, and mapping the function across all of the values, yeah? And also something more succinct than creating a SQL table and then doing this with a SparkSQL UDF?
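For reference, here is a rough sketch of the two more verbose routes mentioned in the question. The bid_results DataFrame and its bid_price column come from the question; the sqlContext handle and the temp-table name are placeholders, and a Spark 1.x-style API is assumed:

// Route 1: drop to the RDD of Rows, pull the field out by name and map the function
val rounded_rdd = bid_results.rdd
  .map(row => round_tenths_place(row.getAs[Double]("bid_price")))

// Route 2: register a temp table and a SQL UDF, then call it from SQL
// (sqlContext is assumed to be the active SQLContext)
bid_results.registerTempTable("bid_results")
sqlContext.udf.register("round_tenths_place", round_tenths_place _)
val bucketed = sqlContext.sql(
  "SELECT *, round_tenths_place(bid_price) AS bid_price_bucket FROM bid_results")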
Answered by zero323
You can define a UDF as follows:
import org.apache.spark.sql.functions.udf

val round_tenths_place_udf = udf(round_tenths_place _)
bid_results.withColumn(
  "bid_price_bucket", round_tenths_place_udf($"bid_price"))
That said, the built-in round expression uses exactly the same logic as your function and should be more than enough, not to mention much more efficient:
import org.apache.spark.sql.functions.round
bid_results.withColumn("bid_price_bucket", round($"bid_price", 1))
See also:

