It's probably simpler in awk, but how can I say this in Python?

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/3677116/

Date: 2020-08-03 23:16:13  Source: igfitidea

python nlp nltk

Asked by magnetar

I have:

Rutsch is for rutterman ramping his roe

which is a phrase from Finnegans Wake. The epic riddle book is full of leitmotifs like this, such as 'take off that white hat,' and 'tip,' all of which get mutated into similar-sounding words depending on where you are in the book. All I want is a way to find obvious occurrences of this particular leitmotif, i.e.

[word1] is for [word2] [word-part1]ing his [word3]

Accepted answer by imgx64

import re
# read the book into a variable 'text'
matches = re.findall(r'\w+ is for \w+ \w+ing his \w+', text)
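
To try the pattern without loading the whole book, here is a minimal sketch. The `sample` string is made up for illustration: only its first sentence comes from the question, and the other two are invented.

```python
import re

# Hypothetical sample text standing in for the book; only the first
# sentence is from the question, the rest is invented for the demo.
sample = (
    "Rutsch is for rutterman ramping his roe. "
    "Nothing to see here. "
    "Batch is for boatman bumping his bow."
)

# Same pattern as above: four free word slots around the fixed frame.
pattern = re.compile(r'\w+ is for \w+ \w+ing his \w+')
matches = pattern.findall(sample)

for m in matches:
    print(m)
```

Because the pattern has no capturing groups, `findall` returns each matched phrase as a whole string.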

Answer by nmichaels

You can do this with regular expressions in Python:

import re
pattern = re.compile(r'(?P<word>.*) is for (?P=word) (?P=word)ing his (?P=word)')
words = pattern.findall(text)

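That won't match your example, but it will match [word] is for [word] [word-part]ing his [word] when every slot repeats the same text, because (?P=word) refers back to whatever the first group captured. Add seasoning to taste. You can find more details in the re module docs.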
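
A small sketch of how the named backreference behaves may help here. The `repeated_phrase` string is invented for the demo; the pattern is the one from this answer.

```python
import re

# Pattern from this answer: (?P=word) must repeat the text captured by (?P<word>...).
pattern = re.compile(r'(?P<word>.*) is for (?P=word) (?P=word)ing his (?P=word)')

question_phrase = "Rutsch is for rutterman ramping his roe"
repeated_phrase = "tip is for tip tiping his tip"  # invented test string

no_match = pattern.search(question_phrase)  # the slots hold different words
hit = pattern.search(repeated_phrase)       # every slot is the same word

print(no_match)
print(hit.group('word'))

# With a single capturing group, findall returns the group, not the phrase.
words = pattern.findall(repeated_phrase)
print(words)
```

Note that `findall` hands back only the captured word here, so the whole phrase would need an extra outer group or `finditer` with `group(0)`.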

Answer by alexis

This solution is for your example, not for your description: only the first letter of each word has to match (alliteration):

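import re
# fin holds the text of the book; \2 refers back to the first letter captured by (.)
pairs = re.findall(r'((.)\w* is for \2\w* \2\w*ing his \2\w*)', fin, re.IGNORECASE)
matches = [ p[0] for p in pairs ]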

To search for cases matching your description, just replace (.) with (\w+), and remove all instances of \w*.
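
The two variants can be compared side by side in a short sketch. It assumes the pattern refers back to the captured group with `\2` (group 1 being the whole phrase), and the `fin` string is an invented stand-in for the book, with only its first sentence taken from the question.

```python
import re

# Invented sample text; only the first sentence is from the question.
fin = (
    "Rutsch is for rutterman ramping his roe. "
    "Tip is for tip tiping his tip."
)

# Alliterative form: group 2 is the first letter, \2 repeats it in each slot.
alliterative = re.compile(
    r'((.)\w* is for \2\w* \2\w*ing his \2\w*)', re.IGNORECASE
)
# Description form: (.) becomes (\w+) and every \w* is dropped.
same_word = re.compile(r'((\w+) is for \2 \2ing his \2)', re.IGNORECASE)

# findall yields (whole phrase, captured part) tuples; keep the phrase.
alliterative_matches = [p[0] for p in alliterative.findall(fin)]
same_word_matches = [p[0] for p in same_word.findall(fin)]

print(alliterative_matches)
print(same_word_matches)
```

The alliterative form picks up both sentences, while the stricter form keeps only the one where the same word fills every slot.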
