Scala Spark dataframe filter

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/42951905/

Date: 2020-10-22 09:09:08 · Source: igfitidea

Spark dataframe filter

Tags: scala, apache-spark, apache-spark-sql

Asked by Ramesh

val df = sc.parallelize(Seq((1,"Emailab"), (2,"Phoneab"), (3, "Faxab"),(4,"Mail"),(5,"Other"),(6,"MSL12"),(7,"MSL"),(8,"HCP"),(9,"HCP12"))).toDF("c1","c2")

+---+-------+
| c1|     c2|
+---+-------+
|  1|Emailab|
|  2|Phoneab|
|  3|  Faxab|
|  4|   Mail|
|  5|  Other|
|  6|  MSL12|
|  7|    MSL|
|  8|    HCP|
|  9|  HCP12|
+---+-------+

I want to filter out the records where the first 3 characters of column 'c2' are either 'MSL' or 'HCP'.

So the output should look like the following.

+---+-------+
| c1|     c2|
+---+-------+
|  1|Emailab|
|  2|Phoneab|
|  3|  Faxab|
|  4|   Mail|
|  5|  Other|
+---+-------+

Can anyone please help with this?

I know that df.filter($"c2".rlike("MSL")) selects the matching records, but how do I exclude them?

Version: Spark 1.6.2, Scala 2.10

Accepted answer by pasha701

import org.apache.spark.sql.functions.{col, not, substring}

// substring is 1-based in Spark SQL (a pos of 0 behaves like 1), so this takes the first three characters
df.filter(not(substring(col("c2"), 0, 3).isin("MSL", "HCP")))
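For reference, an equivalent filter (a minimal sketch of an alternative, not part of the original answer) uses Column.startsWith together with the unary negation operator; both are available on Spark 1.6:

import org.apache.spark.sql.functions.col

// Keep only the rows whose c2 does not start with either prefix
df.filter(!(col("c2").startsWith("MSL") || col("c2").startsWith("HCP")))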

Answer by Jegan

This works too. It is concise and very similar to SQL.

df.filter("c2 not like 'MSL%' and c2 not like 'HCP%'").show
+---+-------+
| c1|     c2|
+---+-------+
|  1|Emailab|
|  2|Phoneab|
|  3|  Faxab|
|  4|   Mail|
|  5|  Other|
+---+-------+
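The same condition can also be run as a plain SQL query. A minimal sketch assuming the Spark 1.6 API from the question, with sqlContext already in scope (as in spark-shell); on Spark 2.x, createOrReplaceTempView replaces registerTempTable:

// Register the dataframe as a temporary table, then filter with ordinary SQL
df.registerTempTable("records")
sqlContext.sql("SELECT c1, c2 FROM records WHERE c2 NOT LIKE 'MSL%' AND c2 NOT LIKE 'HCP%'").show()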

Answer by Priyanshu Singh

I used the code below to filter rows from a dataframe, and it worked for me (Spark 2.2):

val spark = new org.apache.spark.sql.SQLContext(sc)
val data = spark.read.format("csv").
  option("header", "true").
  option("delimiter", "|").
  option("inferSchema", "true").
  load("D:\\test.csv")   // backslashes must be escaped in Scala string literals

import spark.implicits._
val filter = data.filter($"dept" === "IT")

OR


val filter = data.filter($"dept" =!= "IT")
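Note that the =!= operator was only introduced in Spark 2.0. On Spark 1.6 (the asker's version), the equivalent inequality test is the since-deprecated !== operator, sketched here:

// Spark 1.6 equivalent of =!=
val filter16 = data.filter($"dept" !== "IT")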

Answer by Ramesh

import org.apache.spark.sql.functions.not

val df1 = df.filter(not(df("c2").rlike("MSL")) && not(df("c2").rlike("HCP")))

This worked.

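A caveat on this approach (my note, not part of the original answer): rlike matches the regular expression anywhere in the string, so rlike("MSL") would also match a value such as "XMSL". Anchoring the pattern with ^ restricts the match to the start of the string, which is what the question asks for:

// Anchored patterns: match MSL/HCP only as a prefix
val df1 = df.filter(not(df("c2").rlike("^MSL")) && not(df("c2").rlike("^HCP")))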