
Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA license, cite the original address, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/35759099/

Date: 2020-10-22 08:03:23  Source: igfitidea

Filter spark DataFrame on string contains

Tags: scala, apache-spark, dataframe, apache-spark-sql

Asked by Knows Not Much

I am using Spark 1.3.0 and Spark Avro 1.0.0. I am working from the example on the repository page. The following code works well:


val df = sqlContext.read.avro("src/test/resources/episodes.avro")
df.filter("doctor > 5").write.avro("/tmp/output")

But what if I need to check whether the doctor string contains a substring, given that we are writing our expression inside a string? How do I express a "contains"?


Answered by zero323

You can use contains (this works with an arbitrary sequence):


df.filter($"foo".contains("bar"))

like (SQL LIKE with SQL's simple patterns, where _ matches an arbitrary character and % matches an arbitrary sequence):


df.filter($"foo".like("bar"))

or rlike (like, but with Java regular expressions):


df.filter($"foo".rlike("bar"))

depending on your requirements. LIKE and RLIKE should work with SQL expressions as well.

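To make the difference between the three predicates concrete, here is a hedged plain-Scala sketch (it does not call Spark; the assumption is that emulating the per-row string matching is enough to show the semantics): contains is a bare substring test, like is a full match with _ and % wildcards, and rlike looks for a Java regex anywhere in the value.

```scala
import java.util.regex.Pattern

// Illustrative sketch only, not the Spark API itself: Spark evaluates these
// predicates per row; here we emulate the three matching modes on plain strings.
object MatchModes {
  // contains: a plain substring test, no pattern syntax at all
  def containsMatch(value: String, sub: String): Boolean =
    value.contains(sub)

  // like: SQL LIKE, where _ matches one character and % matches any sequence.
  // We translate the LIKE pattern into a Java regex, quoting everything else,
  // and require a full match (LIKE matches the whole value).
  def likeMatch(value: String, pattern: String): Boolean = {
    val regex = pattern.map {
      case '_' => "."
      case '%' => ".*"
      case c   => Pattern.quote(c.toString)
    }.mkString
    value.matches(regex)
  }

  // rlike: a Java regular expression, matched anywhere in the string
  // (Spark's RLIKE is a "find", not a full match)
  def rlikeMatch(value: String, regex: String): Boolean =
    Pattern.compile(regex).matcher(value).find()
}
```

Note the full-match vs. find distinction: likeMatch("foobar", "bar") is false (LIKE needs %bar), while rlikeMatch("foobar", "bar") is true.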

Answered by Jay1991

In pyspark, the Spark SQL syntax:


where column_n like 'xyz%'

might not work.


Use:


where column_n RLIKE '^xyz' 

This works perfectly fine.

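Both forms express a prefix match, so where LIKE is supported they should accept the same strings. A small plain-Scala sketch (emulating the pattern semantics rather than calling Spark; Spark applies these per row to the column value) of why `like 'xyz%'` and `RLIKE '^xyz'` agree:

```scala
import java.util.regex.Pattern

// Sketch of the two patterns' semantics (assumed equivalence for prefixes).
object PrefixPatterns {
  // LIKE 'xyz%': a full match, with % standing for any trailing sequence
  def likeXyzPercent(s: String): Boolean = s.matches("xyz.*")

  // RLIKE '^xyz': a Java regex found anywhere, anchored to the start by ^
  def rlikeCaretXyz(s: String): Boolean =
    Pattern.compile("^xyz").matcher(s).find()
}
```

Both return true for "xyzabc" and false for "axyz": the ^ anchor in the regex plays the same role as LIKE's implicit match-from-the-start.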