How to use regexp_replace in Spark (Scala)

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/40080609/


how to use Regexp_replace in spark

Tags: scala, apache-spark, apache-spark-sql, regexp-replace

Asked by user3420819

I am pretty new to Spark and would like to perform an operation on a column of a dataframe so as to replace all the "," in the column with ".".

Assume there is a dataframe x with a column x4:

x4
1,3435
1,6566
-0,34435

I want the output to be:

x4
1.3435
1.6566
-0.34435

The code I am using is:

import org.apache.spark.sql.Column
def replace = regexp_replace((x.x4,1,6566:String,1.6566:String)x.x4)

But I get the following error:

import org.apache.spark.sql.Column
<console>:1: error: ')' expected but '.' found.
       def replace = regexp_replace((train_df.x37,0,160430299:String,0.160430299:String)train_df.x37)

Any help on the syntax, logic, or any other suitable way would be much appreciated.

Answered by mtoto

Here's a reproducible example, assuming x4 is a string column.

import org.apache.spark.sql.functions.regexp_replace

val df = spark.createDataFrame(Seq(
  (1, "1,3435"),
  (2, "1,6566"),
  (3, "-0,34435"))).toDF("Id", "x4")

The syntax is regexp_replace(str, pattern, replacement), which translates to:

df.withColumn("x4New", regexp_replace(df("x4"), "\\,", ".")).show
+---+--------+--------+
| Id|      x4|   x4New|
+---+--------+--------+
|  1|  1,3435|  1.3435|
|  2|  1,6566|  1.6566|
|  3|-0,34435|-0.34435|
+---+--------+--------+
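
A possible follow-up, not covered in the original answer: after the replacement, x4New is still a string column. If the goal is to do arithmetic on the values, the replaced column can be cast to a numeric type. A minimal sketch (the column name "x4Double" is illustrative, and `df` is the dataframe built above):

```scala
import org.apache.spark.sql.functions.regexp_replace
import org.apache.spark.sql.types.DoubleType

// Replace the decimal comma with a period, then cast the string to Double.
// The comma is escaped with "\\," in the regex, matching the answer above,
// although a bare "," would also work since comma is not a regex metacharacter.
val dfNumeric = df.withColumn(
  "x4Double",
  regexp_replace(df("x4"), "\\,", ".").cast(DoubleType))

dfNumeric.printSchema
// x4Double should now appear as a double column, usable in numeric expressions.
```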