
Disclaimer: the content below is taken from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same license and attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/42909092/


Spark Scala How to use replace function in RDD

scala, apache-spark

Asked by Ravinder Karra

I have a tweet file:

396124436845178880,"When's 12.4k gonna roll around",Matty_T_03
396124437168537600,"I really wish I didn't give up everything I did for you.     I'm so mad at my self for even letting it get as far as it did.",savava143
396124436958412800,"I really need to double check who I'm sending my     snapchats to before sending it ",juliannpham
396124437218885632,"@Darrin_myers30 I feel you man, gotta stay prayed up.     Year is important",Ful_of_Ambition
396124437558611968,"tell me what I did in my life to deserve this.",_ItsNotBragging
396124437499502592,"Too many fine men out here...see me drooling",LolaofLife
396124437722198016,"@jaiclynclausen will do",I_harley99

I am trying to replace all special characters after reading the file into an RDD:

    val fileReadRdd = sc.textFile(fileInput)
    val fileReadRdd2 = fileReadRdd.map(x => x.map(_.replace(","," ")))
    val fileFlat = fileReadRdd.flatMap(rec => rec.split(" "))

I am getting the following error:

Error:(41, 57) value replace is not a member of Char
    val fileReadRdd2 = fileReadRdd.map(x => x.map(_.replace(",","")))

Accepted answer by Brian Agnew

I suspect:


x => x.map(_.replace(",",""))

is treating your string as a sequence of characters, and you actually want


x => x.replace(",", "")

(i.e. you don't need to map over the 'sequence' of chars)

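Applying that fix to the snippet from the question gives a minimal sketch of the corrected pipeline (assuming the same sc and fileInput as above; note it also chains fileReadRdd2 into the flatMap, which the original snippet skipped):

    val fileReadRdd = sc.textFile(fileInput)
    // String.replace is a plain (non-regex) substitution on the whole line;
    // no inner map over the characters is needed
    val fileReadRdd2 = fileReadRdd.map(x => x.replace(",", " "))
    val fileFlat = fileReadRdd2.flatMap(rec => rec.split(" "))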

Answer by Yordan Georgiev

The Perl one-liner perl -pi -e 's/\s+//' $file on a regular file system would look as follows in Spark Scala on any Spark-supported file system (feel free to adjust the regex):

import org.apache.spark.rdd.RDD

// assumes an active SparkSession named `spark` and an input path `uri`
// read the file into an RDD of strings
val rdd: RDD[String] = spark.sparkContext.textFile(uri)

// for each line, strip leading whitespace, then save to a new path
rdd
  .map(line => line.replaceAll("^\\s+", ""))
  .saveAsTextFile(uri + ".tmp")
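
One operational note: saveAsTextFile will fail if the output directory (uri + ".tmp" here) already exists, so pick a fresh path. For a quick sanity check before writing anything, you can preview a few cleaned lines in the spark-shell:

// preview the first few cleaned lines instead of writing them out
rdd.map(_.replaceAll("^\\s+", "")).take(3).foreach(println)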