Problem description
Why does the code below work only with multiprocessing.dummy, but not with plain multiprocessing?
import urllib.request
#from multiprocessing.dummy import Pool  # this works
from multiprocessing import Pool

urls = ['http://www.python.org', 'http://www.yahoo.com', 'http://www.scala.org', 'http://www.google.com']

if __name__ == '__main__':
    with Pool(5) as p:
        results = p.map(urllib.request.urlopen, urls)
Error:
Traceback (most recent call last):
  File "urlthreads.py", line 31, in <module>
    results = p.map(urllib.request.urlopen, urls)
  File "C:\Users\patri\Anaconda3\lib\multiprocessing\pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\Users\patri\Anaconda3\lib\multiprocessing\pool.py", line 657, in get
    raise self._value
multiprocessing.pool.MaybeEncodingError: Error sending result: '[<http.client.HTTPResponse object at 0x0000016AEF204198>]'. Reason: 'TypeError("cannot serialize '_io.BufferedReader' object")'
What's missing so that it works without "dummy"?
Recommended answer
The http.client.HTTPResponse object you get back from urlopen() has an _io.BufferedReader object attached, and this object cannot be pickled.
import pickle
import urllib.request

pickle.dumps(urllib.request.urlopen('http://www.python.org').fp)
Traceback (most recent call last):
  ...
    pickle.dumps(urllib.request.urlopen('http://www.python.org').fp)
TypeError: cannot serialize '_io.BufferedReader' object
multiprocessing.Pool needs to pickle (serialize) the results to send them back to the parent process, and this fails here. Since multiprocessing.dummy uses threads instead of processes, no pickling takes place, because threads in the same process naturally share their memory.
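The difference can be reproduced without any network access. The following is only a minimal sketch: make_reader() and is_picklable() are hypothetical helpers, and a bare io.BufferedReader stands in for the buffer attached to the HTTP response. The thread-backed pool hands the objects back untouched, while pickling them (which a process pool would have to do) fails.

```python
import io
import pickle
from multiprocessing.dummy import Pool as ThreadPool  # thread-backed Pool


def make_reader(payload):
    """Hypothetical stand-in for urlopen(): returns an unpicklable reader."""
    return io.BufferedReader(io.BytesIO(payload))


def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


def readers_picklable(payloads):
    # Threads share the parent's memory, so map() returns the reader
    # objects directly and never needs to pickle them.
    with ThreadPool(2) as pool:
        readers = pool.map(make_reader, payloads)
    # A process pool would have to pickle each of these results.
    return [is_picklable(reader) for reader in readers]
```

Calling readers_picklable([b"a", b"b"]) completes without error under threads, yet reports every returned reader as not picklable.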
A general solution to this TypeError is:
- read out the buffer and save the content (if needed)
- remove the reference to the '_io.BufferedReader' from the object you are trying to pickle
In your case, calling .read() on the http.client.HTTPResponse will empty and remove the buffer, so a function for converting the response into something pickleable could simply do this:
def read_buffer(response):
    response.text = response.read()
    return response
Example:
r = urllib.request.urlopen('http://www.python.org')
r = read_buffer(r)
pickle.dumps(r)
# Out: b'\x80\x03chttp.client\nHTTPResponse...
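If you do stay with multiprocessing, one possible variation is to do the reading inside the worker and return only plain data, so nothing but built-in types has to cross the process boundary. This is a sketch rather than the only way to do it: fetch() and fetch_all() are hypothetical names, and the 10-second timeout and pool size of 4 are arbitrary assumptions.

```python
import urllib.request
from multiprocessing import Pool


def fetch(url):
    """Download one URL and return only plain, picklable data."""
    # timeout=10 is an arbitrary value chosen for this sketch.
    with urllib.request.urlopen(url, timeout=10) as response:
        # The body is read inside the worker, so only a str and a bytes
        # object are sent back to the parent process.
        return url, response.read()


def fetch_all(urls, processes=4):
    """Run fetch() in worker processes; returns a list of (url, body) tuples."""
    with Pool(processes) as pool:
        return pool.map(fetch, urls)
```

As in the question's code, fetch_all(urls) should be called under the if __name__ == '__main__': guard.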
Before you consider this approach, make sure you really want to use multiprocessing instead of multithreading. For I/O-bound tasks like the one you have here, multithreading would be sufficient, since most of the time is spent waiting for the response anyway (which needs no CPU time). Multiprocessing and the IPC involved also introduce substantial overhead.
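A thread-based version might look like the sketch below. It uses concurrent.futures.ThreadPoolExecutor from the standard library instead of multiprocessing.dummy, which is a substitution on my part; fetch_body() and fetch_all_threaded() are hypothetical names, and the timeout and worker count are arbitrary.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_body(url):
    """Download one URL and return its body as bytes."""
    # timeout=10 is an arbitrary value chosen for this sketch.
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()


def fetch_all_threaded(urls, max_workers=5):
    """Fetch all URLs concurrently in threads, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the same order as urls.
        return list(executor.map(fetch_body, urls))
```

Since the threads run in one process, nothing is pickled here; returning the bytes rather than the response objects simply means each connection is closed as soon as its body has been read.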