scala: Replace all ":" with "_" in Spark dataframe
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/39308928/
Asked by Feynman27
I'm trying to replace every ":" with "_" in a single column of a Spark dataframe. I'm trying to do this with:
import org.apache.spark.sql.functions.udf
val url_cleaner = (s: String) => {
  s.replaceAll(":", "_")
}
val url_cleaner_udf = udf(url_cleaner)
val df = old_df.withColumn("newCol", url_cleaner_udf(old_df("oldCol")))
But I keep getting the error:
SparkException: Job aborted due to stage failure: Task 0 in stage 25.0 failed 4 times, most recent failure: Lost task 0.3 in stage 25.0 (TID 692, ip-10-81-194-29.ec2.internal): java.lang.NullPointerException
Where am I going wrong in the udf?
Answered by T. Gawęda
You probably have some nulls in this column: calling replaceAll on a null String throws the NullPointerException.
Try:
val urlCleaner = (s: String) => {
  // Pass nulls through unchanged instead of calling replaceAll on them
  if (s == null) null else s.replaceAll(":", "_")
}
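For completeness, here is one way the null-safe function might be wired back into the dataframe. This is only a sketch: it assumes Spark 2.x or later on the classpath, and the local session, the sample rows, the `id` column and the object name `NullSafeUdfSketch` are made up for illustration. It uses `Option(s)` in place of the explicit null check, which should behave the same way.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object NullSafeUdfSketch {
  // Option(s) turns a null into None, so replaceAll only runs on real strings;
  // orNull converts None back to null so the column stays nullable.
  val urlCleaner: String => String =
    (s: String) => Option(s).map(_.replaceAll(":", "_")).orNull

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("null-safe-udf-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: row 2 holds the null that breaks the original UDF.
    val oldDf = Seq[(Int, String)](
      (1, "http://a.b:80"),
      (2, null),
      (3, "no-colon")
    ).toDF("id", "oldCol")

    val urlCleanerUdf = udf(urlCleaner)
    val df = oldDf.withColumn("newCol", urlCleanerUdf(col("oldCol")))
    df.show(false)

    spark.stop()
  }
}
```

Keeping the cleaning logic as a plain function outside the UDF wrapper makes it easy to check the null case without starting Spark.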
You can also use the built-in regexp_replace(col("newCol"), ":", "_") instead of your own function.
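A minimal sketch of the built-in approach follows, under the same assumptions as before (Spark 2.x or later on the classpath; the local session, sample rows, `id` column, and the names `RegexpReplaceSketch` and `replaceColons` are hypothetical). Note that it reads from "oldCol", the source column in the question, and writes the result to "newCol"; the snippet in the answer names "newCol" as the input, which would only work if that column already existed.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, regexp_replace}

object RegexpReplaceSketch {
  // Adds "newCol" with every ":" in "oldCol" replaced by "_".
  // regexp_replace returns null for a null input, so no explicit null check is needed.
  def replaceColons(df: DataFrame): DataFrame =
    df.withColumn("newCol", regexp_replace(col("oldCol"), ":", "_"))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("regexp-replace-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data, including a null row.
    val oldDf = Seq[(Int, String)](
      (1, "http://a.b:80"),
      (2, null),
      (3, "no-colon")
    ).toDF("id", "oldCol")

    replaceColons(oldDf).show(false)

    spark.stop()
  }
}
```

Built-in column functions are generally preferable to a UDF where one exists, since Spark can optimize them and they avoid the serialization overhead of calling user code per row.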

