Python Pyspark RDD .filter() with wildcard

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same license and attribute it to the original authors (not me): StackOverFlow. Original question: http://stackoverflow.com/questions/39256520/

Date: 2020-08-19 22:03:00  Source: igfitidea

Pyspark RDD .filter() with wildcard

python apache-spark rdd

Asked by Lucas Mattos

I have a Pyspark RDD with a text column that I want to use as a filter, so I have the following code:


table2 = table1.filter(lambda x: x[12] == "*TEXT*")

The problem is... As you can see, I'm using the * to try to tell it to interpret that as a wildcard, but with no success. Can anyone help with that?


Answered by David

The lambda function is pure Python, so something like the following would work:


table2 = table1.filter(lambda x: "TEXT" in x[12])