Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license and attribute it to the original authors (not me), citing the original StackOverflow URL: http://stackoverflow.com/questions/39308928/

Time: 2020-10-22 08:36:00  Source: igfitidea

Replace all ":" with "_" in Spark dataframe

Tags: scala, apache-spark, user-defined-functions, spark-dataframe

Asked by Feynman27

I'm trying to replace every ":" with "_" in a single column of a Spark dataframe. I'm trying to do this with:

import org.apache.spark.sql.functions.udf

val url_cleaner = (s: String) => {
  s.replaceAll(":", "_")
}
val url_cleaner_udf = udf(url_cleaner)
val df = old_df.withColumn("newCol", url_cleaner_udf(old_df("oldCol")))

But I keep getting the error:

 SparkException: Job aborted due to stage failure: Task 0 in stage 25.0 failed 4 times, most recent failure: Lost task 0.3 in stage 25.0 (TID 692, ip-10-81-194-29.ec2.internal): java.lang.NullPointerException

Where am I going wrong in the udf?

Answered by T. Gawęda

You probably have some nulls in this column. Calling replaceAll on a null String throws the NullPointerException.

Try:

// Pass nulls through unchanged instead of calling replaceAll on them
val urlCleaner = (s: String) => {
  if (s == null) null else s.replaceAll(":", "_")
}
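
The snippet above only defines the function. A minimal sketch of wrapping it in a UDF and applying it could look like the following. The local SparkSession, the sample rows, and the names spark, oldDf and urlCleanerUdf are assumptions made for illustration; oldCol and newCol come from the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

// Assumption: a local session and toy data stand in for the real old_df.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("url-cleaner-udf")
  .getOrCreate()
import spark.implicits._

// One ordinary value and one null, to exercise the null branch.
val oldDf = Seq("http://host:8080/path", null).toDF("oldCol")

// Null-safe cleaner from the answer: nulls pass through untouched.
val urlCleaner = (s: String) => if (s == null) null else s.replaceAll(":", "_")
val urlCleanerUdf = udf(urlCleaner)

// Add the cleaned column next to the original one.
val df = oldDf.withColumn("newCol", urlCleanerUdf(col("oldCol")))
df.show(false)
```

Because the pattern here is a plain character rather than a regular expression, s.replace(":", "_") would presumably work just as well inside the function.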

You can also use the built-in regexp_replace(col("oldCol"), ":", "_") instead of your own function.
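
As a rough sketch of the built-in alternative, the following applies regexp_replace to oldCol, the source column from the question, and writes the result to newCol. The local SparkSession, the sample rows, and the names spark and oldDf are assumptions for illustration only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, regexp_replace}

// Assumption: a local session and toy data stand in for the real old_df.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("regexp-replace-example")
  .getOrCreate()
import spark.implicits._

val oldDf = Seq("http://host:8080/path", null).toDF("oldCol")

// regexp_replace is a built-in column function: a null input gives a null
// output, so no explicit null check is needed.
val df = oldDf.withColumn("newCol", regexp_replace(col("oldCol"), ":", "_"))
df.show(false)
```

Built-in functions are generally preferable to UDFs where one exists, since Spark can optimize them and they handle nulls without extra code.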
