        Python -> multiprocessing module
                  Problem description

                  Here's what I am trying to accomplish -

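                  1. I have about a million files which I need to parse & append the parsed content to a single file.
                  2. Since a single process takes ages, that option is out.
                  3. Not using threads in Python, as that essentially comes down to running a single process (due to the GIL).
                  4. Hence using the multiprocessing module, i.e. spawning 4 sub-processes to utilize all that raw core power :)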

                  So far so good. Now I need a shared object which all the sub-processes have access to, so I am using Queues from the multiprocessing module. Also, all the sub-processes need to write their output to a single file, which is a potential place to use Locks, I guess. When I run this setup I do not get any error (so the parent process seems fine); it just stalls. When I press ctrl-C I see a traceback (one for each sub-process). Also, no output is written to the output file. Here's the code (note that everything runs fine in single-process mode) -

                  import os
                  import glob
                  from multiprocessing import Process, Queue, Pool
                  
                  data_file  = open('out.txt', 'w+')
                  
                  def worker(task_queue):
                      for file in iter(task_queue.get, 'STOP'):
                          data = mine_imdb_page(os.path.join(DATA_DIR, file))
                          if data:
                              data_file.write(repr(data)+'\n')
                      return
                  
                  def main():
                      task_queue = Queue()
                      for file in glob.glob('*.csv'):
                          task_queue.put(file)
                      task_queue.put('STOP') # so that worker processes know when to stop
                  
                      # this is the block of code that needs correction.
                      if multi_process:
                          # One way to spawn 4 processes
                          # pool = Pool(processes=4) #Start worker processes
                          # res  = pool.apply_async(worker, [task_queue, data_file])
                  
                          # But I chose to do it like this for now.
                          for i in range(4):
                              proc = Process(target=worker, args=[task_queue])
                              proc.start()
                      else: # single process mode is working fine!
                          worker(task_queue)
                      data_file.close()
                      return
                  

                  What am I doing wrong? I also tried passing the open file_object to each of the processes at the time of spawning, e.g. Process(target=worker, args=[task_queue, data_file]), but this did not change anything. I feel the subprocesses are not able to write to the file for some reason. Either the instance of the file_object is not getting replicated (at the time of spawn) or there is some other quirk... Anybody got an idea?
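
One likely cause of the stall is visible in the code above: only one 'STOP' sentinel is queued, and `iter(task_queue.get, 'STOP')` consumes it in whichever worker reads it first, so the remaining workers block on `get()` forever. The parent also closes `data_file` without waiting for the workers, and each child writes through its own buffered copy of the file object, which may never be flushed. The sketch below is one possible restructuring, not the only fix: it queues one sentinel per worker, sends results back on a second queue, and lets only the parent write the file. `parse_file` is a hypothetical stand-in for `mine_imdb_page`, and `run` is an illustrative name.

```python
import multiprocessing as mp
import os
import tempfile

STOP = 'STOP'

def parse_file(name):
    # Hypothetical stand-in for mine_imdb_page(); returns something to record.
    return {'file': name, 'length': len(name)}

def worker(task_queue, result_queue):
    # iter(get, STOP) ends this loop when the worker reads its own sentinel.
    for name in iter(task_queue.get, STOP):
        data = parse_file(name)
        if data:
            result_queue.put(data)
    result_queue.put(STOP)  # tell the parent this worker is done

def run(names, out_path, n_workers=4):
    task_queue = mp.Queue()
    result_queue = mp.Queue()
    for name in names:
        task_queue.put(name)
    for _ in range(n_workers):
        task_queue.put(STOP)  # one sentinel per worker

    procs = [mp.Process(target=worker, args=(task_queue, result_queue))
             for _ in range(n_workers)]
    for p in procs:
        p.start()

    written = 0
    finished = 0
    # Only the parent touches the output file, so no lock is needed.
    with open(out_path, 'w') as out:
        while finished < n_workers:
            item = result_queue.get()
            if item == STOP:
                finished += 1
            else:
                out.write(repr(item) + '\n')
                written += 1
    # Join after draining the result queue to avoid blocking on unread items.
    for p in procs:
        p.join()
    return written

if __name__ == '__main__':
    out_path = os.path.join(tempfile.mkdtemp(), 'out.txt')
    print(run(['a.csv', 'b.csv', 'c.csv'], out_path))
```

Draining the result queue before calling `join()` matters: a process that has put items on a `multiprocessing.Queue` may not exit until those items have been consumed.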

                  EXTRA: Also, is there any way to keep a persistent mysql_connection open & pass it across to the sub_processes? That is, I open a mysql connection in my parent process & the open connection should be accessible to all my sub-processes. Basically this is the equivalent of shared_memory in python. Any ideas here?
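
On the connection question: an open database connection generally cannot be handed to another process, since it wraps a socket and driver state that are usually not picklable, and several processes talking over one inherited connection would interleave their traffic. A common alternative is to open one connection per worker when the worker starts. The sketch below assumes that approach and uses `sqlite3` from the standard library purely as a stand-in for a MySQL driver; `init_worker`, `store`, `run`, and the `pages` table are hypothetical names.

```python
import multiprocessing as mp
import os
import sqlite3
import tempfile

_conn = None  # one connection per worker process, set by init_worker()

def init_worker(db_path):
    # Runs once in each worker process when the pool starts it.
    global _conn
    _conn = sqlite3.connect(db_path, timeout=30)

def store(name):
    # Uses this worker's own connection; nothing is shared with the parent.
    with _conn:  # commits on success, rolls back on an exception
        _conn.execute('INSERT INTO pages (name, pid) VALUES (?, ?)',
                      (name, os.getpid()))
    return name

def run(names, db_path, n_workers=4):
    # The parent creates the table, then closes its own connection
    # before any worker process exists.
    conn = sqlite3.connect(db_path)
    conn.execute('CREATE TABLE IF NOT EXISTS pages (name TEXT, pid INTEGER)')
    conn.commit()
    conn.close()
    with mp.Pool(n_workers, initializer=init_worker,
                 initargs=(db_path,)) as pool:
        return pool.map(store, names)

if __name__ == '__main__':
    db_path = os.path.join(tempfile.mkdtemp(), 'pages.db')
    print(run(['a.csv', 'b.csv', 'c.csv'], db_path))
```

With a real MySQL driver the same shape should apply: the `initializer` would call that driver's connect function instead of `sqlite3.connect`.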

                  Recommended answer

                  Although the discussion with Eric was fruitful, later on I found a better way of doing this. Within the multiprocessing module there is a class called 'Pool' which is perfect for my needs.

                  It sizes itself to the number of cores my system has, i.e. by default it spawns only as many worker processes as there are cores. Of course this is customizable. So here's the code. Might help someone later -

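                  import glob
                  import os
                  from multiprocessing import Pool
                  
                  # DATA_DIR and file_ptr (the open output file) are assumed to be defined elsewhere.
                  
                  def main():
                      po = Pool()  # defaults to one worker process per CPU
                      for file in glob.glob('*.csv'):
                          filepath = os.path.join(DATA_DIR, file)
                          # mine_page runs in a worker; its return value is passed to save_data
                          po.apply_async(mine_page, (filepath,), callback=save_data)
                      po.close()  # no more tasks will be submitted
                      po.join()   # wait for all submitted tasks to finish
                      file_ptr.close()
                  
                  def mine_page(filepath):
                      # do whatever it is that you want to do in a separate process.
                      return data
                  
                  def save_data(data):
                      # data is an object. Store it in a file, mysql or...
                      return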
                  

                  Still going through this huge module. I am not sure whether save_data() is executed by the parent process or by the spawned child processes. If it's the children that do the saving, it might lead to concurrency issues in some situations. If anyone has more experience using this module, I would appreciate more knowledge here...
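
On the open question about save_data(): in CPython's `multiprocessing.Pool`, the `callback` passed to `apply_async` is invoked in the process that created the pool, on an internal result-handling thread, and not in the workers. So the callback can write to a single file handle without cross-process locking, though it should return quickly because it holds up delivery of further results. The sketch below checks this by comparing process IDs; `mine_page` here is a trivial placeholder and `run` is an illustrative name.

```python
import multiprocessing as mp
import os

def mine_page(filepath):
    # Placeholder for the real parsing; runs in a worker process.
    return (filepath, os.getpid())

def run(paths):
    parent_pid = os.getpid()
    records = []

    def save_data(result):
        # Record which process ran the callback next to the worker's pid.
        filepath, worker_pid = result
        records.append((filepath, worker_pid, os.getpid()))

    pool = mp.Pool(processes=2)
    for path in paths:
        pool.apply_async(mine_page, (path,), callback=save_data)
    pool.close()
    pool.join()  # all callbacks have run once join() returns
    return parent_pid, records

if __name__ == '__main__':
    parent_pid, records = run(['a.csv', 'b.csv', 'c.csv'])
    for filepath, worker_pid, callback_pid in records:
        # Second column: work ran outside the parent.
        # Third column: callback ran inside the parent.
        print(filepath, worker_pid != parent_pid, callback_pid == parent_pid)
```

The callback can be a closure because it never leaves the parent process; only `mine_page` and its arguments are pickled and sent to the workers.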
