Scala: how to use regexp_replace in Spark
Original source: http://stackoverflow.com/questions/40080609/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
How to use regexp_replace in Spark
Asked by user3420819
I am pretty new to Spark and would like to perform an operation on a column of a DataFrame so that every comma (,) in the column is replaced with a period (.).
Assume there is a DataFrame x with a column x4:
x4
1,3435
1,6566
-0,34435
I want the output to be:
x4
1.3435
1.6566
-0.34435
The code I am using is:
import org.apache.spark.sql.Column
def replace = regexp_replace((x.x4,1,6566:String,1.6566:String)x.x4)
But I get the following error:
import org.apache.spark.sql.Column
<console>:1: error: ')' expected but '.' found.
def replace = regexp_replace((train_df.x37,0,160430299:String,0.160430299:String)train_df.x37)
Any help with the syntax, the logic, or any other suitable approach would be much appreciated.
Answered by mtoto
Here's a reproducible example, assuming x4 is a string column.
import org.apache.spark.sql.functions.regexp_replace
val df = spark.createDataFrame(Seq(
(1, "1,3435"),
(2, "1,6566"),
(3, "-0,34435"))).toDF("Id", "x4")
The syntax is regexp_replace(str, pattern, replacement), which translates to:
df.withColumn("x4New", regexp_replace(df("x4"), "\\,", ".")).show
+---+--------+--------+
| Id|      x4|   x4New|
+---+--------+--------+
|  1|  1,3435|  1.3435|
|  2|  1,6566|  1.6566|
|  3|-0,34435|-0.34435|
+---+--------+--------+
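A note on the pattern argument: regexp_replace treats its pattern as a Java regular expression, so the matching behaviour can be tried outside Spark with String.replaceAll, which uses the same regex engine. The following is only a minimal plain-Scala sketch; the object name CommaToDot and the helpers commaToDot and dotToComma are made up for illustration and are not part of Spark or of the original answer.

```scala
// Plain-Scala sketch (no Spark required). All names here are hypothetical.
object CommaToDot {
  // replaceAll takes a Java regular expression as its first argument.
  // A comma is not a regex metacharacter, so escaping it is optional:
  // "," and "\\," match the same text.
  def commaToDot(s: String): String = s.replaceAll(",", ".")

  // In the other direction, "." IS a metacharacter (it matches any character),
  // so it must be escaped as "\\." to match a literal period.
  def dotToComma(s: String): String = s.replaceAll("\\.", ",")

  def main(args: Array[String]): Unit = {
    // Sample values taken from the question.
    Seq("1,3435", "1,6566", "-0,34435").map(commaToDot).foreach(println)
  }
}
```

If the direction were reversed (period to comma), an unescaped "." as the pattern would replace every character, which is why the escape matters there even though it is optional for the comma.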

