Python and text manipulation
Original question: http://stackoverflow.com/questions/676253/
Note: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.
Asked by ardsrk
I want to learn a text manipulation language, and I have zeroed in on Python. Apart from text manipulation, Python is also used for numerical applications, machine learning, AI, etc.
My question is: how should I approach learning Python so that I can quickly write sophisticated text manipulation utilities? Apart from regular expressions, which language features matter most for text manipulation, which modules are useful, and so on?
Answered by Van Gale
Beyond regular expressions, here are some important features:
- Generators: see Generator Tricks for Systems Programmers by David Beazley for many great examples of pipelining unlimited amounts of text through generators.
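As a rough illustration of the generator pipelining idea mentioned above, here is a minimal sketch. It is not taken from Beazley's material; the function names (read_lines, grep, field) and the sample log lines are hypothetical, chosen only for this example. Each stage consumes one line at a time, so the same pipeline would work on a very large file without loading it into memory.

```python
import re
from typing import Iterable, Iterator


def read_lines(source: Iterable[str]) -> Iterator[str]:
    """Yield lines one at a time with the trailing newline removed."""
    for line in source:
        yield line.rstrip("\n")


def grep(pattern: str, lines: Iterable[str]) -> Iterator[str]:
    """Yield only the lines that match the regular expression."""
    regex = re.compile(pattern)
    for line in lines:
        if regex.search(line):
            yield line


def field(lines: Iterable[str], index: int, sep: str = " ") -> Iterator[str]:
    """Yield one separator-delimited column from each line."""
    for line in lines:
        parts = line.split(sep)
        if index < len(parts):
            yield parts[index]


# Hypothetical sample data; any iterable of lines (such as an open file) works.
sample_log = [
    "GET /index.html 200\n",
    "GET /missing 404\n",
    "POST /login 200\n",
    "GET /old-page 404\n",
]

# Building the pipeline does no work yet; list() pulls items through it.
not_found = grep(r" 404$", read_lines(sample_log))
paths = list(field(not_found, 1))
print(paths)
```

Replacing sample_log with an open file object should give the same behaviour, since a file is also an iterable of lines.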
For tools, I recommend looking at the following:
- Whoosh, a pure Python search engine that will give you some nice real-life examples of parsing text using pyparsing and of text processing in Python in general.
- Ned Batchelder's nice reviews of various Python parsing tools.
- Docutils source code for more advanced text processing in Python, including a sophisticated state machine.
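To give a feel for what a state machine means in text processing, here is a deliberately small sketch. It is not how Docutils is implemented; the function name extract_indented_blocks, the two state names, and the four-space indentation rule are all assumptions made for this example.

```python
from typing import Iterable, List


def extract_indented_blocks(lines: Iterable[str]) -> List[str]:
    """Collect runs of four-space-indented lines using a two-state machine.

    States (hypothetical names, chosen for this sketch):
      "text"  - reading ordinary, unindented lines
      "block" - reading lines indented by four spaces
    """
    state = "text"
    current: List[str] = []
    blocks: List[str] = []

    for line in lines:
        indented = line.startswith("    ") and line.strip() != ""
        if state == "text":
            if indented:
                # Transition: an indented line opens a new block.
                state = "block"
                current = [line[4:]]
        else:  # state == "block"
            if indented:
                current.append(line[4:])
            else:
                # Transition: the first unindented line closes the block.
                blocks.append("\n".join(current))
                state = "text"

    # Close a block that runs to the end of the input.
    if state == "block":
        blocks.append("\n".join(current))
    return blocks


document = [
    "Intro paragraph.",
    "    x = 1",
    "    y = 2",
    "More prose.",
    "    print(x + y)",
]
blocks = extract_indented_blocks(document)
print(blocks)
```

The point of the pattern is that each line is interpreted according to the current state, which tends to stay readable as more states and transitions are added.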
Edit: A good link specific to text processing in Python:
- Text Processing in Python by David Mertz. I think the book is still available, although it's probably a bit dated now.
Answered by Eugene Morozov
There's a book, Text Processing in Python. I haven't read it myself yet, but I've read other articles by this author, and they are generally good stuff.
Answered by RedBlueThing
I found the obj.__doc__ attribute and the dir(obj) function incredibly useful in learning the language.
e.g.
a = "test,test,test"
What can I do with a? dir(a). Seems I can split a.
vec = a.split(",")
What is vec? vec.__doc__:
"new list initialized from sequence's items"
What can I do with vec? dir(vec).
vec.sort()
etc ...
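The exploration described above can be collected into one runnable sketch. The variable names string_methods and list_methods are introduced only for this example, and the exact docstring text printed will vary between Python versions, so no expected output is shown for it.

```python
a = "test,test,test"

# dir() lists the attribute and method names available on an object.
# Names starting with "_" are filtered out to keep the list readable.
string_methods = [name for name in dir(a) if not name.startswith("_")]
print(string_methods)

# __doc__ holds the docstring; its wording depends on the Python version.
print(a.split.__doc__)

# split() turns the comma-separated string into a list of strings.
vec = a.split(",")
print(type(vec).__name__, vec)

# The same approach works on the result: what can a list do?
list_methods = [name for name in dir(vec) if not name.startswith("_")]
print(list_methods)

# sort() sorts the list in place and returns None.
vec.sort()
print(vec)
```

In an interactive session, help(obj) is another option; it shows the same docstrings in a formatted form.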
Answered by claws
Although I haven't read it, Python for Data Analysis by Wes McKinney (1st edition, October 8, 2012) looks promising.