This page reproduces a Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/14219038/

Python multiprocessing: TypeError: expected string or Unicode object, NoneType found

Tags: python, python-2.7, python-multithreading

Asked by Mike Furlender

I am attempting to download a whole ftp directory in parallel.

#!/usr/bin/python
import sys
import datetime
import os
from multiprocessing import Process, Pool
from ftplib import FTP
curYear=""
remotePath =""
localPath = ""

def downloadFiles (remotePath,localPath):
        splitted = remotePath.split('/');
        host= splitted[2]
        path='/'+'/'.join(splitted[3:])
        ftp = FTP(host)
        ftp.login()
        ftp.cwd(path)
        filenames =  ftp.nlst()
        total=len(filenames)
        i=0
        pool = Pool()
        for filename in filenames:
                local_filename = os.path.join(localPath,filename)
                pool.apply_async(downloadFile, (filename,local_filename,ftp))
                #downloadFile(filename,local_filename,ftp);
                i=i+1

        pool.close()
        pool.join()
        ftp.close()

def downloadFile(filename,local_filename,ftp):
        file = open(local_filename, 'wb')
        ftp.retrbinary('RETR '+ filename, file.write)
        file.close()

def getYearFromArgs():
        if len(sys.argv) >= 2 and sys.argv[1] == "Y":
                year = sys.argv[2]
                del sys.argv[1:2]
        else:
                year = str(datetime.datetime.now().year)
        return year

def assignGlobals():
        global p
        global remotePath
        global localPath
        global URL
        global host
        global user
        global password
        global sqldb
        remotePath = 'ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/isd-lite/%s/' % (curYear)
        localPath = '/home/isd-lite/%s/' % (curYear)

def main():
        global curYear
        curYear=getYearFromArgs()
        assignGlobals()
        downloadFiles(remotePath,localPath)

if __name__ == "__main__":
        main()

But I get this exception:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib64/python2.6/threading.py", line 532, in __bootstrap_inner
    self.run()
  File "/usr/lib64/python2.6/threading.py", line 484, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib64/python2.6/multiprocessing/pool.py", line 225, in _handle_tasks
    put(task)
TypeError: expected string or Unicode object, NoneType found

If I comment out this line:

pool.apply_async(downloadFile, (filename,local_filename,ftp))

and remove the comment on this line:

downloadFile(filename,local_filename,ftp);

Then it works just fine, but it is slow because the files are downloaded one at a time instead of in parallel.

Answered by ATOzTOA

Have you tried:

pool.apply_async(downloadFile, args=(filename,local_filename,ftp))

The prototype is:

apply_async(func, args=(), kwds={}, callback=None)
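
As far as I can tell from that prototype, passing the tuple positionally and passing it as args= are the same call, so this change alone may not make the error go away. Here is a minimal sketch to compare the two forms; add and run_demo are made-up names for illustration. Calling get() on the returned result also re-raises any exception raised inside the worker function, which helps when debugging.

```python
from multiprocessing import Pool


def add(a, b):
    """Trivial worker function; its arguments are plain ints, which pickle fine."""
    return a + b


def run_demo():
    """Submit the same task with positional and keyword argument tuples."""
    pool = Pool(2)
    try:
        positional = pool.apply_async(add, (1, 2))
        keyword = pool.apply_async(add, args=(1, 2))
        # get() blocks until the task finishes and re-raises worker exceptions.
        return positional.get(timeout=10), keyword.get(timeout=10)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    print(run_demo())
```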

Answered by Multimedia Mike

Update, May 9, 2014:

I have determined the precise limitation. It is possible to send objects across process boundaries to worker processes as long as the objects can be pickled by Python's pickle facility. The problem which I described in my original answer occurred because I was trying to send a file handle to the workers. A quick experiment demonstrates why this doesn't work:

>>> f = open("/dev/null")
>>> import pickle
>>> pickle.dumps(f)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/pickle.py", line 1374, in dumps
    Pickler(file, protocol).dump(obj)
  File "/usr/lib/python2.7/pickle.py", line 224, in dump
    self.save(obj)
  File "/usr/lib/python2.7/pickle.py", line 306, in save
    rv = reduce(self.proto)
  File "/usr/lib/python2.7/copy_reg.py", line 70, in _reduce_ex
    raise TypeError, "can't pickle %s objects" % base.__name__
TypeError: can't pickle file objects

Thus, if you're encountering the Python error which led you to find this Stack Overflow question, make sure all the things you're sending across process boundaries can be pickled.
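
One way to check this before handing arguments to a pool is to try pickling them yourself. The helper below is only a sketch written for Python 3; is_picklable is a hypothetical name, not part of the pickle module.

```python
import os
import pickle


def is_picklable(obj):
    """Return True if pickle.dumps accepts obj, False if it refuses."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True


if __name__ == "__main__":
    # Plain strings in a tuple can cross process boundaries.
    print(is_picklable(("name.gz", "/tmp/name.gz")))
    # An open file handle cannot.
    with open(os.devnull) as handle:
        print(is_picklable(handle))
```

Running each argument through a check like this should point at the one that cannot cross the process boundary.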

Original answer:

I'm a bit late to answering. However, I ran into the same error message as the original poster while trying to use Python's multiprocessing module. I'll record my findings so that anyone else who stumbles upon this thread has something to try.

In my case, the error occurred because of what I was trying to send to the pool of workers: I was trying to pass an array of file objects for the pool workers to chew on. That's apparently too much to send across process boundaries in Python. I solved the problem by sending the pool workers dictionaries which specified input and output filename strings.

So it seems that the iterable that you supply to a function such as apply_async (I used map() and imap_unordered()) can contain a list of numbers or strings, or even a detailed dictionary data structure (as long as the values aren't objects that cannot be pickled, such as file handles).
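
To make that concrete, here is a rough sketch of the dictionary approach. The names process_job and run_jobs, the "input"/"output" keys, and the upper-casing step are all invented for the example; the only point is that filename strings are sent and each worker opens the files itself.

```python
import os
import tempfile
from multiprocessing import Pool


def process_job(job):
    """Open both files inside the worker; only path strings were sent to it."""
    with open(job["input"], "r") as src, open(job["output"], "w") as dst:
        dst.write(src.read().upper())
    return job["output"]


def run_jobs(jobs, processes=2):
    """Send the job dictionaries to a pool and collect the output paths."""
    pool = Pool(processes)
    try:
        return sorted(pool.imap_unordered(process_job, jobs))
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    jobs = []
    for name in ("a", "b"):
        src_path = os.path.join(workdir, name + ".in")
        with open(src_path, "w") as f:
            f.write("hello from " + name)
        jobs.append({"input": src_path, "output": os.path.join(workdir, name + ".out")})
    print(run_jobs(jobs))
```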

In your case:

pool.apply_async(downloadFile, (filename,local_filename,ftp))

ftp is an object, which might be causing the problem. As a workaround, I would recommend sending the parameters to the worker (looks like host and path in this case) and letting the worker instantiate the object and deal with the cleanup.
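
A sketch of that workaround, written for Python 3 and untested against the actual server: each task is a tuple of strings, and every worker opens and closes its own FTP connection. The helper names (split_ftp_url, build_tasks, list_remote_files, download_file, download_all) and the default of 4 processes are my own assumptions, not something from the question.

```python
import os
from ftplib import FTP
from multiprocessing import Pool
from urllib.parse import urlparse  # on Python 2: from urlparse import urlparse


def split_ftp_url(url):
    """Split 'ftp://host/some/dir/' into ('host', '/some/dir/')."""
    parts = urlparse(url)
    return parts.netloc, parts.path


def build_tasks(host, remote_dir, filenames, local_dir):
    """Describe each download with plain strings only, so the tasks can be pickled."""
    return [(host, remote_dir, name, os.path.join(local_dir, name))
            for name in filenames]


def list_remote_files(host, remote_dir):
    """List the remote directory once, in the parent process."""
    ftp = FTP(host)
    try:
        ftp.login()
        ftp.cwd(remote_dir)
        return ftp.nlst()
    finally:
        ftp.close()


def download_file(task):
    """Runs in a worker: open a private FTP connection, fetch one file, clean up."""
    host, remote_dir, filename, local_filename = task
    ftp = FTP(host)
    try:
        ftp.login()
        ftp.cwd(remote_dir)
        with open(local_filename, "wb") as out:
            ftp.retrbinary("RETR " + filename, out.write)
    finally:
        ftp.close()
    return local_filename


def download_all(remote_url, local_dir, processes=4):
    """Fan the per-file tasks out to a pool of worker processes."""
    host, remote_dir = split_ftp_url(remote_url)
    filenames = list_remote_files(host, remote_dir)
    tasks = build_tasks(host, remote_dir, filenames, local_dir)
    pool = Pool(processes)
    try:
        return pool.map(download_file, tasks)
    finally:
        pool.close()
        pool.join()
```

You would call download_all(remotePath, localPath) from under the if __name__ == "__main__": guard. Opening one connection per file is wasteful, and some servers limit simultaneous connections, so the process count may need tuning.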
