Python: Grep on elements of a list

Note: This page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/12845288/

Date: 2020-08-18 11:59:22  Source: igfitidea

Grep on elements of a list

Tags: python, list, grep

Asked by mmann1123

I have a list of file names:

names = ['aet2000','ppt2000', 'aet2001', 'ppt2001']

While I have found some functions that can work to grep character strings, I haven't figured out how to grep all elements of a list.

For instance, I would like to call:

grep(names,'aet')

and get:

['aet2000','aet2001']

I'm sure it's not too hard, but I am new to Python.



Update: The question above apparently wasn't accurate enough. All the answers below work for the example but not for my actual data. Here is the code I use to build the list of file names:

years = range(2000,2011)
months = ["jan","feb","mar","apr","may","jun","jul","aug","sep","oct","nov","dec"]
variables = ["cwd","ppt","aet","pet","tmn","tmx"]     #  *variable name*  with wildcards   
tifnames = list(range(0,(len(years)*len(months)*len(variables)+1)  ))
i = 0
for variable in variables:
   for year in years:
      for month in months:
         fullname = str(variable)+str(year)+str(month)+".tif"
         tifnames[i] = fullname
         i = i+1 

Running filter(lambda x:'aet' in x,tifnames) or any of the other answers returns:

Traceback (most recent call last):
  File "<pyshell#89>", line 1, in <module>
    func(tifnames,'aet')
  File "<pyshell#88>", line 2, in func
    return [i for i in l if s in i]
TypeError: argument of type 'int' is not iterable

Despite the fact that tifnames is a list of character strings:

type(tifnames[1])
<type 'str'>

Do you guys see what's going on here? Thanks again!

Accepted answer by Ashwini Chaudhary

Use filter(), which returns a list in Python 2:

>>> names = ['aet2000','ppt2000', 'aet2001', 'ppt2001']
>>> filter(lambda x:'aet' in x, names)
['aet2000', 'aet2001']

With a regex:

>>> import re
>>> filter(lambda x: re.search(r'aet', x), names)
['aet2000', 'aet2001']


In Python 3, filter() returns an iterator, so call list() on it to get a list:

>>> list(filter(lambda x:'aet' in x, names))
['aet2000', 'aet2001']

Otherwise, use a list comprehension (it works in both Python 2 and 3):

>>> [name for name in names if 'aet' in name]
['aet2000', 'aet2001']
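
The difference between the two approaches can be seen in a short sketch, assuming Python 3. It shows that the object returned by filter() can only be consumed once, whereas a list comprehension produces an ordinary list. The variable names first_pass, second_pass and comprehension are made up for illustration.

```python
names = ['aet2000', 'ppt2000', 'aet2001', 'ppt2001']

# In Python 3, filter() returns a lazy iterator rather than a list.
matches = filter(lambda name: 'aet' in name, names)

first_pass = list(matches)   # consumes the iterator
second_pass = list(matches)  # the iterator is already exhausted, so this is empty

# A list comprehension builds the list directly and can be reused freely.
comprehension = [name for name in names if 'aet' in name]

print(first_pass)
print(second_pass)
print(comprehension)
```

If the result needs to be iterated more than once, wrapping filter() in list() or using the comprehension avoids the empty second pass.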

Answer by root

>>> names = ['aet2000', 'ppt2000', 'aet2001', 'ppt2001']
>>> def grep(l, s):
...     return [i for i in l if s in i]
... 
>>> grep(names, 'aet')
['aet2000', 'aet2001']

Regex version, closer to grep, although not needed in this case:

>>> import re
>>> def func(l, s):
...     return [i for i in l if re.search(s, i)]
... 
>>> func(names, r'aet')
['aet2000', 'aet2001']

Answer by Florin Stingaciu

You should look into the Python module called re. Below is a grep function implementation in Python that uses re. It will help you understand how re works (of course, only after you have read about re).

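import re

def grep(pattern,word_list):
    expr = re.compile(pattern)
    # match() only matches at the start of each string;
    # use expr.search() instead to match anywhere, as grep does.
    return [elem for elem in word_list if expr.match(elem)]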
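
One detail worth checking with this implementation is that re's match() is anchored at the start of the string, while search() scans the whole string. The sketch below illustrates the difference; the name 'monthly_aet2001' is a made-up example that does not appear in the question.

```python
import re

# 'monthly_aet2001' is a hypothetical file name used only to show the difference.
names = ['aet2000', 'ppt2000', 'monthly_aet2001', 'ppt2001']
expr = re.compile(r'aet')

# match() only succeeds when the pattern matches at the start of the string.
anchored = [name for name in names if expr.match(name)]

# search() scans the whole string, which is closer to what grep does.
anywhere = [name for name in names if expr.search(name)]

print(anchored)
print(anywhere)
```

For the file names in the question both calls give the same result, because 'aet' always appears at the beginning of the name.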

Answer by mrchampe

Try this out. It may not be the "shortest" of all the code shown, but for someone trying to learn Python, I think it teaches more:

names = ['aet2000','ppt2000', 'aet2001', 'ppt2001']
found = []
for name in names:
    # Keep only the names that contain the substring 'aet'.
    if 'aet' in name:
        found.append(name)
print(found)

Output

['aet2000', 'aet2001']

Edit: changed to produce a list.

See also:

How to use Python to find out the words begin with vowels in a list?

Answer by mrchampe

You do not need to preallocate the list tifnames or use a counter to insert elements. Just append each name to the list as it is generated, or use a list comprehension.

That is, just do this:

import re

years = ['2000','2011']
months = ["jan","feb","mar","apr","may","jun","jul","aug","sep","oct","nov","dec"]
variables = ["cwd","ppt","aet","pet","tmn","tmx"]     #  *variable name*  with wildcards   
tifnames = []
for variable in variables:
   for year in years:
      for month in months:
         fullname = variable+year+month+".tif"
         tifnames.append(fullname)

print(tifnames)
print('===')
# list() makes the result printable in Python 3, where filter() returns an iterator.
print(list(filter(lambda x: re.search(r'aet',x),tifnames)))

Prints:

['cwd2000jan.tif', 'cwd2000feb.tif', 'cwd2000mar.tif', 'cwd2000apr.tif', 'cwd2000may.tif', 'cwd2000jun.tif', 'cwd2000jul.tif', 'cwd2000aug.tif', 'cwd2000sep.tif', 'cwd2000oct.tif', 'cwd2000nov.tif', 'cwd2000dec.tif', 'cwd2011jan.tif', 'cwd2011feb.tif', 'cwd2011mar.tif', 'cwd2011apr.tif', 'cwd2011may.tif', 'cwd2011jun.tif', 'cwd2011jul.tif', 'cwd2011aug.tif', 'cwd2011sep.tif', 'cwd2011oct.tif', 'cwd2011nov.tif', 'cwd2011dec.tif', 'ppt2000jan.tif', 'ppt2000feb.tif', 'ppt2000mar.tif', 'ppt2000apr.tif', 'ppt2000may.tif', 'ppt2000jun.tif', 'ppt2000jul.tif', 'ppt2000aug.tif', 'ppt2000sep.tif', 'ppt2000oct.tif', 'ppt2000nov.tif', 'ppt2000dec.tif', 'ppt2011jan.tif', 'ppt2011feb.tif', 'ppt2011mar.tif', 'ppt2011apr.tif', 'ppt2011may.tif', 'ppt2011jun.tif', 'ppt2011jul.tif', 'ppt2011aug.tif', 'ppt2011sep.tif', 'ppt2011oct.tif', 'ppt2011nov.tif', 'ppt2011dec.tif', 'aet2000jan.tif', 'aet2000feb.tif', 'aet2000mar.tif', 'aet2000apr.tif', 'aet2000may.tif', 'aet2000jun.tif', 'aet2000jul.tif', 'aet2000aug.tif', 'aet2000sep.tif', 'aet2000oct.tif', 'aet2000nov.tif', 'aet2000dec.tif', 'aet2011jan.tif', 'aet2011feb.tif', 'aet2011mar.tif', 'aet2011apr.tif', 'aet2011may.tif', 'aet2011jun.tif', 'aet2011jul.tif', 'aet2011aug.tif', 'aet2011sep.tif', 'aet2011oct.tif', 'aet2011nov.tif', 'aet2011dec.tif', 'pet2000jan.tif', 'pet2000feb.tif', 'pet2000mar.tif', 'pet2000apr.tif', 'pet2000may.tif', 'pet2000jun.tif', 'pet2000jul.tif', 'pet2000aug.tif', 'pet2000sep.tif', 'pet2000oct.tif', 'pet2000nov.tif', 'pet2000dec.tif', 'pet2011jan.tif', 'pet2011feb.tif', 'pet2011mar.tif', 'pet2011apr.tif', 'pet2011may.tif', 'pet2011jun.tif', 'pet2011jul.tif', 'pet2011aug.tif', 'pet2011sep.tif', 'pet2011oct.tif', 'pet2011nov.tif', 'pet2011dec.tif', 'tmn2000jan.tif', 'tmn2000feb.tif', 'tmn2000mar.tif', 'tmn2000apr.tif', 'tmn2000may.tif', 'tmn2000jun.tif', 'tmn2000jul.tif', 'tmn2000aug.tif', 'tmn2000sep.tif', 'tmn2000oct.tif', 'tmn2000nov.tif', 'tmn2000dec.tif', 'tmn2011jan.tif', 'tmn2011feb.tif', 'tmn2011mar.tif', 
'tmn2011apr.tif', 'tmn2011may.tif', 'tmn2011jun.tif', 'tmn2011jul.tif', 'tmn2011aug.tif', 'tmn2011sep.tif', 'tmn2011oct.tif', 'tmn2011nov.tif', 'tmn2011dec.tif', 'tmx2000jan.tif', 'tmx2000feb.tif', 'tmx2000mar.tif', 'tmx2000apr.tif', 'tmx2000may.tif', 'tmx2000jun.tif', 'tmx2000jul.tif', 'tmx2000aug.tif', 'tmx2000sep.tif', 'tmx2000oct.tif', 'tmx2000nov.tif', 'tmx2000dec.tif', 'tmx2011jan.tif', 'tmx2011feb.tif', 'tmx2011mar.tif', 'tmx2011apr.tif', 'tmx2011may.tif', 'tmx2011jun.tif', 'tmx2011jul.tif', 'tmx2011aug.tif', 'tmx2011sep.tif', 'tmx2011oct.tif', 'tmx2011nov.tif', 'tmx2011dec.tif']
===
['aet2000jan.tif', 'aet2000feb.tif', 'aet2000mar.tif', 'aet2000apr.tif', 'aet2000may.tif', 'aet2000jun.tif', 'aet2000jul.tif', 'aet2000aug.tif', 'aet2000sep.tif', 'aet2000oct.tif', 'aet2000nov.tif', 'aet2000dec.tif', 'aet2011jan.tif', 'aet2011feb.tif', 'aet2011mar.tif', 'aet2011apr.tif', 'aet2011may.tif', 'aet2011jun.tif', 'aet2011jul.tif', 'aet2011aug.tif', 'aet2011sep.tif', 'aet2011oct.tif', 'aet2011nov.tif', 'aet2011dec.tif']

And, if you find it more readable, a list comprehension is the more idiomatic Python:

import re

years = ['2000','2011']
months = ["jan","feb","mar","apr","may","jun","jul","aug","sep","oct","nov","dec"]
variables = ["cwd","ppt","aet","pet","tmn","tmx"]
# Loop order (variable, year, month) matches the nested loops above, so the output is identical.
tifnames = [v+y+m+".tif" for v in variables for y in years for m in months]
print(tifnames)
print('===')
print([e for e in tifnames if re.search(r'aet',e)])

...same output
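
For context on why the preallocated list in the question raises a TypeError, here is a sketch that reproduces the setup, assuming the code runs exactly as posted. The list is created with one slot more than the loops fill, so the last element stays an integer even though tifnames[1] is a string. The names expected, leftovers and matches are introduced here for illustration only.

```python
years = range(2000, 2011)
months = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]
variables = ["cwd", "ppt", "aet", "pet", "tmn", "tmx"]

expected = len(years) * len(months) * len(variables)  # number of names the loops generate
tifnames = list(range(expected + 1))                  # one slot too many, as in the question

i = 0
for variable in variables:
    for year in years:
        for month in months:
            tifnames[i] = variable + str(year) + month + ".tif"
            i += 1

# The loops never reach the final slot, so it still holds its original integer.
leftovers = [item for item in tifnames if not isinstance(item, str)]
print(leftovers)

# Testing 'aet' in an integer is what raises the TypeError; checking the type
# first (a hypothetical workaround) sidesteps it, though appending is cleaner.
matches = [item for item in tifnames if isinstance(item, str) and 'aet' in item]
print(len(matches))
```

Building the list with append() or a comprehension, as in the answer above, removes the leftover integer entirely.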
