Pyspark RDD .filter() with wildcard
Note: this page is an English-Chinese translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA license and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/39256520/
Asked by Lucas Mattos
I have a Pyspark RDD with a text column that I want to use as a filter, so I have the following code:
table2 = table1.filter(lambda x: x[12] == "*TEXT*")
The problem is... As you can see, I'm using the *
to try to tell it to interpret that as a wildcard, but with no success.
Does anyone have any help on that?
Answered by David
The lambda function is pure Python, so something like the below would work:
table2 = table1.filter(lambda x: "TEXT" in x[12])
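Because the filter predicate is plain Python, the same logic can be checked without a Spark cluster. A minimal sketch below, using a hypothetical list of strings standing in for the RDD's text column (index 12 in the question): the `in` test from the answer, plus `fnmatch` for genuine glob-style `*TEXT*` wildcards and `re.search` for regex patterns, either of which could be dropped into the `lambda` passed to `.filter()`.

```python
import re
from fnmatch import fnmatch

# Hypothetical sample values standing in for the text column x[12]
rows = ["SOME TEXT HERE", "no match", "TEXT at start"]

# Plain-Python equivalent of the answer's substring test
substring_matches = [r for r in rows if "TEXT" in r]

# fnmatch interprets * and ? as glob-style wildcards
glob_matches = [r for r in rows if fnmatch(r, "*TEXT*")]

# Regular expressions cover more complex patterns
regex_matches = [r for r in rows if re.search(r"TEXT", r)]
```

All three produce the same result for this pattern; `fnmatch` or `re` only become necessary when the pattern is more complex than a single substring.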