Python Pyspark RDD .filter() with wildcard

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same license and attribute it to the original authors (not me): StackOverFlow. Original question: http://stackoverflow.com/questions/39256520/

Date: 2020-08-19 22:03:00  Source: igfitidea

Pyspark RDD .filter() with wildcard

python apache-spark rdd

Asked by Lucas Mattos

I have a Pyspark RDD with a text column that I want to use as a filter, so I have the following code:


table2 = table1.filter(lambda x: x[12] == "*TEXT*")

The problem is... As you can see, I'm using the * to try to tell it to interpret that as a wildcard, but with no success. Can anyone help with that?


Answered by David

The lambda function is pure Python, so something like the following would work:


table2 = table1.filter(lambda x: "TEXT" in x[12])