How do I calculate the MD5 checksum of a file in Python?

Note: this page is a translation of a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/16874598/

Time: 2020-08-18 23:54:42 Source: igfitidea

Tags: python, python-3.x, md5, md5sum

Asked by user2344996

I have written a Python script that computes the MD5 of a file and makes sure it matches the MD5 of the original. Here is what I have so far:

#Defines filename
filename = "file.exe"

#Gets MD5 from file 
def getmd5(filename):
    return m.hexdigest()

md5 = dict()

for fname in filename:
    md5[fname] = getmd5(fname)

#If statement for alerting the user whether the checksum passed or failed

if md5 == '>md5 will go here<': 
    print("MD5 Checksum passed. You may now close this window")
    input ("press enter")
else:
    print("MD5 Checksum failed. Incorrect MD5 in file 'filename'. Please download a    new copy")
    input("press enter") 
exit

But whenever I run the code, I get the following:

Traceback (most recent call last):
  File "C:\Users\Username\md5check.py", line 13, in <module>
    md5[fname] = getmd5(fname)
  File "C:\Users\Username\md5check.py", line 9, in getmd5
    return m.hexdigest()
NameError: global name 'm' is not defined

Is there anything I am missing in my code?

Accepted answer by PSS

Regarding your error and what is missing in your code: m is a name that is not defined inside the getmd5() function. No offence, I know you are a beginner, but your code is all over the place. Let's look at your issues one by one :) First off, you are not using the hashlib.md5.hexdigest() method correctly. Please see the explanation of the hashlib functions in the Python documentation library. The correct way to return the MD5 of a given string is something like this:

>>> import hashlib
>>> # md5() takes bytes; on Python 3 a plain str raises TypeError
>>> hashlib.md5(b"filename.exe").hexdigest()
'2a53375ff139d9837e93a38a279d63e5'

However, you have a bigger problem here. You are calculating the MD5 of the file name string, whereas a file's MD5 checksum is calculated from its contents. You need to read the file contents and pass them through MD5. My next example is not very efficient, since it reads the whole file into memory at once, but something like this:

>>> import hashlib
>>> hashlib.md5(open('filename.exe','rb').read()).hexdigest()
'd41d8cd98f00b204e9800998ecf8427e'

As you can clearly see, the second MD5 hash is totally different from the first one. The reason is that we are hashing the contents of the file, not just the file name. A simple solution could look something like this:

# Import hashlib library (md5 method is part of it)
import hashlib

# File to check
file_name = 'filename.exe'

# Correct original md5 goes here
original_md5 = '5d41402abc4b2a76b9719d911017c592'

# Open the file in binary mode, read it, and calculate MD5 on its contents
# (the with block closes the file automatically)
with open(file_name, 'rb') as file_to_check:
    # read contents of the file
    data = file_to_check.read()
    # pipe contents of the file through
    md5_returned = hashlib.md5(data).hexdigest()

# Finally compare original MD5 with freshly calculated
if original_md5 == md5_returned:
    print("MD5 verified.")
else:
    print("MD5 verification failed!")
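
The question's loop has one more issue worth pointing out: filename is a single string, so "for fname in filename" iterates over its characters rather than over files. Below is a rough sketch of how the original structure could look with that fixed. It assumes a hypothetical dict, expected, that maps each file name to its known MD5; the names getmd5, check_files and expected are illustrative and not part of the answer above.

```python
import hashlib


def getmd5(filename):
    """Return the hex MD5 digest of the file's contents."""
    # Create the hash object inside the function so the name is defined
    # (the NameError in the question came from using an undefined m).
    m = hashlib.md5()
    with open(filename, "rb") as f:  # binary mode: hash the raw bytes
        m.update(f.read())
    return m.hexdigest()


def check_files(expected):
    """Map each file name in `expected` to True if its MD5 matches."""
    return {fname: getmd5(fname) == md5 for fname, md5 in expected.items()}
```

Calling check_files with a dict of file names and known checksums returns a dict of the same file names mapped to True or False, which can then drive the pass/fail messages.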

Please look at the post "Python: Generating a MD5 checksum of a file". It explains in detail a couple of ways to do this efficiently.
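
One common way to do this efficiently is to read the file in fixed-size chunks and feed each chunk to the hash object, so memory use stays constant regardless of file size. The following is a sketch under that assumption and is not taken from the linked post; the function name md5_of_file and the 8192-byte chunk size are arbitrary choices.

```python
import hashlib


def md5_of_file(path, chunk_size=8192):
    """Return the hex MD5 digest of a file, reading it chunk by chunk."""
    file_hash = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel calls f.read(chunk_size) repeatedly
        # until it returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            file_hash.update(chunk)
    return file_hash.hexdigest()
```

The result is the same digest as hashing the whole file in one call, since update() on successive chunks is equivalent to hashing their concatenation.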

Best of luck.

Answer by Boris

In Python 3.8+ you can do:

import hashlib

with open("your_filename.png", "rb") as f:
    file_hash = hashlib.md5()
    while chunk := f.read(8192):
        file_hash.update(chunk)

print(file_hash.digest())
print(file_hash.hexdigest())  # to get a printable str instead of bytes
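
On Python 3.11 and later, the standard library also provides hashlib.file_digest(), which does the chunked reading itself. Here is a small sketch that uses it when available and otherwise falls back to a manual chunked loop; the helper name file_md5 is made up for this example.

```python
import hashlib


def file_md5(path):
    """Return the hex MD5 digest of the file at `path`."""
    with open(path, "rb") as f:
        if hasattr(hashlib, "file_digest"):  # available on Python 3.11+
            return hashlib.file_digest(f, "md5").hexdigest()
        # Fallback for older versions: read in chunks manually.
        file_hash = hashlib.md5()
        for chunk in iter(lambda: f.read(8192), b""):
            file_hash.update(chunk)
        return file_hash.hexdigest()
```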


Consider using hashlib.blake2b instead of md5 (just replace md5 with blake2b in the above snippet). It's cryptographically secure and faster than MD5.
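
To make that swap a one-argument change, the algorithm name can be passed as a parameter. This is a sketch using hashlib.new(); the function name hash_file is hypothetical. Keep in mind that a BLAKE2b digest can only be compared against a checksum that was itself published as BLAKE2b.

```python
import hashlib


def hash_file(path, algorithm="md5"):
    """Return the hex digest of a file using the named hashlib algorithm."""
    file_hash = hashlib.new(algorithm)  # e.g. "md5", "blake2b", "sha256"
    with open(path, "rb") as f:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(8192), b""):
            file_hash.update(chunk)
    return file_hash.hexdigest()
```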