原文地址: http://stackoverflow.com/questions/35759099/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me): StackOverflow
Filter spark DataFrame on string contains
Asked by Knows Not Much
I am using Spark 1.3.0 and Spark Avro 1.0.0. I am working from the example on the repository page. The following code works well:
val df = sqlContext.read.avro("src/test/resources/episodes.avro")
df.filter("doctor > 5").write.avro("/tmp/output")
But what if I need to check whether the doctor string contains a substring? Since we are writing our expression inside a string, how do I do a "contains"?
Answered by zero323
You can use contains (this works with an arbitrary sequence):
df.filter($"foo".contains("bar"))
like (SQL LIKE with SQL simple regular expressions, with _ matching an arbitrary character and % matching an arbitrary sequence):
df.filter($"foo".like("bar"))
or rlike (like, but with Java regular expressions):
df.filter($"foo".rlike("bar"))
depending on your requirements. LIKE and RLIKE should work with SQL expressions as well.
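Since the three predicates use three different pattern languages (plain substrings, SQL simple patterns, and Java regexes), their matching semantics can be sketched in plain Scala without a Spark cluster. The likeToRegex helper below is hypothetical, written only to illustrate how _ and % translate to regex; it is not a Spark API:

```scala
val s = "doctor who"

// contains: an arbitrary substring anywhere in the value.
assert(s.contains("who"))

// like: a SQL simple pattern matched against the WHOLE string.
// Hypothetical translation to a Java regex: _ -> . and % -> .*,
// escaping regex metacharacters in the literal parts.
def likeToRegex(p: String): String = "^" + p.flatMap {
  case '_' => "."
  case '%' => ".*"
  case c if "\\^$.|?*+()[]{}".contains(c) => "\\" + c
  case c => c.toString
} + "$"
assert(s.matches(likeToRegex("doc%")))      // 'doc%' ~ starts with "doc"
assert(!s.matches(likeToRegex("doc")))      // no wildcard: whole-string match fails

// rlike: a Java regex matched ANYWHERE in the value (find semantics).
assert(java.util.regex.Pattern.compile("who$").matcher(s).find())
```

The key asymmetry to remember: like must match the whole string unless you add % wildcards, while rlike succeeds on a partial match unless you add ^ and $ anchors.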
Answered by Jay1991
In PySpark, the SparkSQL syntax:
where column_n like 'xyz%'
might not work.
Use:
where column_n RLIKE '^xyz'
This works perfectly fine.
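The reason RLIKE '^xyz' can stand in for like 'xyz%' is that RLIKE performs unanchored regex matching (true if the pattern matches anywhere in the value), so the ^ anchor is what pins it to the start of the string. A minimal plain-Scala illustration of that anchoring, with rlikeMatch as a hypothetical stand-in for the RLIKE predicate:

```scala
import java.util.regex.Pattern

// RLIKE-style matching: true if the regex matches anywhere in the string,
// so '^xyz' only accepts strings that start with "xyz".
def rlikeMatch(s: String, regex: String): Boolean =
  Pattern.compile(regex).matcher(s).find()

println(rlikeMatch("xyz123", "^xyz")) // true: "xyz" at the start
println(rlikeMatch("abcxyz", "^xyz")) // false: rejected by the ^ anchor
println(rlikeMatch("abcxyz", "xyz"))  // true: without ^, any occurrence matches
```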

