Original question: http://stackoverflow.com/questions/967652/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Binary file IO in python, where to start?
Asked by DrBloodmoney
As a self-taught python hobbyist, how would I go about learning to import and export binary files using standard formats?
I'd like to implement a script that takes ePub ebooks (XHTML + CSS in a zip) and converts them to mobipocket (Palmdoc) format in order to allow the Amazon Kindle to read them (as part of a larger project that I'm working on).
There is already an awesome open-source project for managing ebook libraries: Calibre. I wanted to try implementing this on my own as a learning/self-teaching exercise. I started looking at their Python source code and realized that I have no idea what is going on. Of course, the big danger in being self-taught at anything is not knowing what you don't know.
In this case, I know that I don't know much about these binary files and how to work with them in Python code (struct?). But I think I'm probably missing a lot of knowledge about binary files in general and I'd like some help understanding how to work with them. Here is a detailed overview of the mobi/palmdoc headers. Thanks!
Edit: No question, good point! Do you have any tips on how to gain a basic knowledge of working with binary files? Python-specific would be helpful but other approaches could also be useful.
TOM: Edited as question, added intro / better title
Accepted answer by tom10
You should probably start with the struct module, as you pointed to in your question, and of course, open the file as a binary.
Basically you just start at the beginning of the file and pick it apart piece by piece. It's a hassle, but not a huge problem. If the files are compressed or encrypted, things can get more difficult. It's helpful if you start with a file that you know the contents of so you're not guessing all the time.
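To make "start at the beginning and pick it apart piece by piece" concrete, here is a minimal sketch using only the standard library. The file layout is a made-up toy format, not the MOBI/PalmDoc layout: the `DEMO` magic bytes, the field order, and the helper names `write_demo` and `read_demo` are all assumptions for illustration. It writes a small file with a known header and then reads it back, which also follows the advice to begin with a file whose contents you already know.

```python
import os
import struct
import tempfile

# Hypothetical toy header (NOT the MOBI/PalmDoc layout), big-endian, no padding:
#   4s  magic bytes
#   H   version (unsigned 16-bit)
#   I   record count (unsigned 32-bit)
HEADER = struct.Struct(">4sHI")


def write_demo(path, values):
    """Write the header, then one unsigned 16-bit integer per value."""
    with open(path, "wb") as f:  # "b" selects binary mode: bytes in, bytes out
        f.write(HEADER.pack(b"DEMO", 1, len(values)))
        for value in values:
            f.write(struct.pack(">H", value))


def read_demo(path):
    """Read the header first, then use its record count to read the body."""
    with open(path, "rb") as f:
        magic, version, count = HEADER.unpack(f.read(HEADER.size))
        if magic != b"DEMO":
            raise ValueError("not a demo file")
        body = f.read(2 * count)  # each record is 2 bytes wide
    values = list(struct.unpack(f">{count}H", body))
    return version, values


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.bin")
    write_demo(path, [10, 20, 300])
    version, values = read_demo(path)
    size_on_disk = os.path.getsize(path)

print(version, values, size_on_disk)  # prints: 1 [10, 20, 300] 16
```

The `>` prefix in the format strings fixes the byte order and disables alignment padding, so the header is exactly 10 bytes on any machine. A real format would need its own format strings taken from its documentation.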
Try it a bit, and maybe you'll evolve more specific questions.
Answer by Scott Griffiths
If you want to construct and analyse binary files the struct module will give you the basic tools, but it isn't very friendly, especially if you want to look at things that aren't a whole number of bytes.
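As a rough illustration of why fields that are not a whole number of bytes are awkward with struct alone: struct can only read or write whole bytes, so sub-byte fields have to be split out by hand with shifts and masks. The 16-bit layout below (3-bit flags, 5-bit kind, 8-bit length) and the function names are hypothetical, chosen only for this sketch.

```python
import struct

# Hypothetical 16-bit big-endian word split into three bit fields:
#   bits 15-13: flags  (3 bits)
#   bits 12-8:  kind   (5 bits)
#   bits 7-0:   length (8 bits)


def pack_fields(flags, kind, length):
    """Combine the three fields into one 16-bit word and serialize it."""
    if not (0 <= flags < 8 and 0 <= kind < 32 and 0 <= length < 256):
        raise ValueError("field out of range")
    word = (flags << 13) | (kind << 8) | length
    return struct.pack(">H", word)


def unpack_fields(data):
    """Read one 16-bit word and split it back into its bit fields."""
    (word,) = struct.unpack(">H", data)
    flags = word >> 13            # top 3 bits
    kind = (word >> 8) & 0b11111  # next 5 bits
    length = word & 0xFF          # low 8 bits
    return flags, kind, length


raw = pack_fields(5, 17, 200)
print(raw.hex(), unpack_fields(raw))  # prints: b1c8 (5, 17, 200)
```

This manual shifting and masking is the bookkeeping that bit-oriented libraries aim to hide.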
There are a few modules that can help, such as BitVector, bitarray and bitstring. (I favour bitstring, but I wrote it and so may be biased).
For parsing binary formats the hachoir module is very good, but I suspect it's too high-level for your current needs.
Answer by Scott Griffiths
For teaching yourself Python tools that work with binary files, this will get you going. Fun too. Exercises with binaries, zips, images... lots more.