
Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA license, cite the original address, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/35759099/

Date: 2020-10-22 08:03:23  Source: igfitidea

Filter spark DataFrame on string contains

Tags: scala, apache-spark, dataframe, apache-spark-sql

Asked by Knows Not Much

I am using Spark 1.3.0 and Spark Avro 1.0.0. I am working from the example on the repository page. The following code works well:


val df = sqlContext.read.avro("src/test/resources/episodes.avro")
df.filter("doctor > 5").write.avro("/tmp/output")

But what if I need to check whether the doctor string contains a substring, given that we are writing our expression inside a string? How do I express a "contains"?


Answered by zero323

You can use contains (this works with an arbitrary sequence):


df.filter($"foo".contains("bar"))

like (SQL LIKE with SQL's simple patterns, where _ matches an arbitrary character and % matches an arbitrary sequence):


df.filter($"foo".like("bar"))

or rlike (like, but with Java regular expressions):


df.filter($"foo".rlike("bar"))

depending on your requirements. LIKE and RLIKE should work with SQL expressions as well.

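To make the difference between the three predicates concrete, here is a hedged plain-Scala sketch (it does not call Spark; the assumption is that emulating the per-row string matching is enough to show the semantics): contains is a bare substring test, like is a full match with _ and % wildcards, and rlike looks for a Java regex anywhere in the value.

```scala
import java.util.regex.Pattern

// Illustrative sketch only, not the Spark API itself: Spark evaluates these
// predicates per row; here we emulate the three matching modes on plain strings.
object MatchModes {
  // contains: a plain substring test, no pattern syntax at all
  def containsMatch(value: String, sub: String): Boolean =
    value.contains(sub)

  // like: SQL LIKE, where _ matches one character and % matches any sequence.
  // We translate the LIKE pattern into a Java regex, quoting everything else,
  // and require a full match (LIKE matches the whole value).
  def likeMatch(value: String, pattern: String): Boolean = {
    val regex = pattern.map {
      case '_' => "."
      case '%' => ".*"
      case c   => Pattern.quote(c.toString)
    }.mkString
    value.matches(regex)
  }

  // rlike: a Java regular expression, matched anywhere in the string
  // (Spark's RLIKE is a "find", not a full match)
  def rlikeMatch(value: String, regex: String): Boolean =
    Pattern.compile(regex).matcher(value).find()
}
```

Note the full-match vs. find distinction: likeMatch("foobar", "bar") is false (LIKE needs %bar), while rlikeMatch("foobar", "bar") is true.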

Answered by Jay1991

In pyspark, the Spark SQL syntax:


where column_n like 'xyz%'

might not work.


Use:


where column_n RLIKE '^xyz' 

This works perfectly fine.

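Both forms express a prefix match, so where LIKE is supported they should accept the same strings. A small plain-Scala sketch (emulating the pattern semantics rather than calling Spark; Spark applies these per row to the column value) of why `like 'xyz%'` and `RLIKE '^xyz'` agree:

```scala
import java.util.regex.Pattern

// Sketch of the two patterns' semantics (assumed equivalence for prefixes).
object PrefixPatterns {
  // LIKE 'xyz%': a full match, with % standing for any trailing sequence
  def likeXyzPercent(s: String): Boolean = s.matches("xyz.*")

  // RLIKE '^xyz': a Java regex found anywhere, anchored to the start by ^
  def rlikeCaretXyz(s: String): Boolean =
    Pattern.compile("^xyz").matcher(s).find()
}
```

Both return true for "xyzabc" and false for "axyz": the ^ anchor in the regex plays the same role as LIKE's implicit match-from-the-start.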