Problem Description
I opened up a question for this problem and did not get a thorough enough answer to solve the issue (most likely due to a lack of rigor in explaining my issues which is what I am attempting to correct): Zombie process in python multiprocessing daemon
I am trying to implement a python daemon that uses a pool of workers to execute commands using Popen. I have borrowed the basic daemon from http://www.jejik.com/articles/2007/02/a_simple_unix_linux_daemon_in_python/
I have only changed the init, daemonize (or equally the start) and stop methods. Here are the changes to the init method:
def __init__(self, pidfile):
    #, stdin='/dev/null', stdout='STDOUT', stderr='STDOUT'):
    #self.stdin = stdin
    #self.stdout = stdout
    #self.stderr = stderr
    self.pidfile = pidfile
    self.pool = Pool(processes=4)
I am not setting stdin, stdout and stderr so that I can debug the code with print statements. Also, I have tried moving this pool around to a few places but this is the only place that does not produce exceptions.
Here are the changes to the daemonize method:
def daemonize(self):
    ...
    # redirect standard file descriptors
    #sys.stdout.flush()
    #sys.stderr.flush()
    #si = open(self.stdin, 'r')
    #so = open(self.stdout, 'a+')
    #se = open(self.stderr, 'a+', 0)
    #os.dup2(si.fileno(), sys.stdin.fileno())
    #os.dup2(so.fileno(), sys.stdout.fileno())
    #os.dup2(se.fileno(), sys.stderr.fileno())
    print self.pool
    ...
Same thing, I am not redirecting io so that I can debug. The print here is used so that I can check the pool's location.
And the changes to the stop method:
def stop(self):
    ...
    # Try killing the daemon process
    try:
        print self.pool
        print "closing pool"
        self.pool.close()
        print "joining pool"
        self.pool.join()
        print "set pool to None"
        self.pool = None
        while 1:
            print "kill process"
            os.kill(pid, SIGTERM)
    ...
Here the idea is that I not only need to kill the process but also clean up the pool. The self.pool = None is just a random attempt to solve the issue, which didn't work. At first I thought this was a problem with zombie children, which was occurring when I had the self.pool.close() and self.pool.join() inside the while loop with the os.kill(pid, SIGTERM). This was before I decided to start looking at the pool location via the print self.pool. After doing this, I believe the pools are not the same when the daemon starts and when it stops. Here is some output:
me@pc:~/pyCode/jobQueue$ sudo ./jobQueue.py start
<multiprocessing.pool.Pool object at 0x1c543d0>
me@pc:~/pyCode/jobQueue$ sudo ./jobQueue.py stop
<multiprocessing.pool.Pool object at 0x1fb7450>
closing pool
joining pool
set pool to None
kill process
kill process
... [ stuck in infinite loop]
The different locations of the objects suggest to me that they are not the same pool and that one of them is probably the zombie?
After CTRL+C, here is what I get from ps aux|grep jobQueue:
root 21161 0.0 0.0 50384 5220 ? Ss 22:59 0:00 /usr/bin/python ./jobQueue.py start
root 21162 0.0 0.0 0 0 ? Z 22:59 0:00 [jobQueue.py] <defunct>
me 21320 0.0 0.0 7624 940 pts/0 S+ 23:00 0:00 grep --color=auto jobQueue
I have tried moving the self.pool = Pool(processes=4) to a number of different places. If it is moved to the start() or daemonize() methods, print self.pool will throw an exception saying that it is NoneType. In addition, the location seems to change the number of zombie processes that will pop up.
Currently, I have not added the functionality to run anything via the workers. My problem seems completely related to setting up the pool of workers correctly. I would appreciate any information that leads to solving this issue, or advice about creating a daemon service that uses a pool of workers to execute a series of commands using Popen. Since I haven't gotten that far, I do not know what challenges I face ahead. I am thinking I might just need to write my own pool, but if there is a nice trick to make the pool work here, it would be amazing.
Answer
The solution is to put the self.pool = Pool(processes=4) as the last line of the daemonize method. Otherwise the pool ends up getting lost somewhere (perhaps in the forks). The pool can then be accessed inside the run method, which is overloaded by the application you wish to daemonize. However, the pool cannot be accessed in the stop method; trying to do so leads to NoneType exceptions. I believe there is a more elegant solution, but this works and it is all I have for now. If I want stop to fail when the pool is still in action, I will have to add additional functionality to run and some form of messaging, but I am not currently concerned with this.