
      Multithreaded file copy is far slower than a single thread on a multicore CPU


                Problem description

                I am trying to write a multithreaded program in Python to speed up copying a batch of .csv files (fewer than 1000). The multithreaded code runs even slower than the sequential approach. I timed the code with profile.py. I am sure I must be doing something wrong, but I'm not sure what.

                Environment:

                • Quad-core CPU.
                • 2 hard drives: one holds the source files, the other is the destination.
                • 1000 csv files, ranging in size from a few KB to 10 MB.

                Approach:

                I put all the file paths in a Queue and create 4-8 worker threads that pull file paths from the queue and copy the designated files. In no case is the multithreaded code faster:

                • Sequential copy takes 150-160 seconds
                • Threaded copy takes over 230 seconds

                I assume this is an I/O-bound task, so multithreading should speed up the operation.

                The code:

                    import Queue
                    import threading
                    import cStringIO 
                    import os
                    import shutil
                    import timeit  # unused here; can time code with gc disabled
                    import glob    # wildcard file listing, e.g. glob.glob('*.py')
                    import profile # profiler used to time main()
                
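                    fileQueue = Queue.Queue() # global work queue shared by all workers
                    srcPath  = r'C:\temp'     # raw strings, so '\t' is not read as a tab
                    destPath = r'D:\temp'
                    tcnt = 0                  # files copied so far (guarded by tcntLock)
                    ttotal = 0
                    tcntLock = threading.Lock()

                    def CopyWorker():
                        global tcnt
                        while True:
                            fileName = fileQueue.get()
                            try:
                                shutil.copy(fileName, destPath)
                                with tcntLock:
                                    tcnt += 1
                                    print 'copied: ', tcnt, ' of ', ttotal
                            finally:
                                # signal completion only after the copy, so join() waits for it
                                fileQueue.task_done()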
                
                    def threadWorkerCopy(fileNameList):
                        global ttotal  # update the module-level total that CopyWorker prints
                        print 'threadWorkerCopy: ', len(fileNameList)
                        ttotal = len(fileNameList)
                        for i in range(4):
                            t = threading.Thread(target=CopyWorker)
                            t.daemon = True
                            t.start()
                        for fileName in fileNameList:
                            fileQueue.put(fileName)
                        fileQueue.join()
                
                    def sequentialCopy(fileNameList):
                        #around 160.446 seconds, 152 seconds
                        print 'sequentialCopy: ', len(fileNameList)
                        cnt = 0
                        ctotal = len(fileNameList)
                        for fileName in fileNameList:
                            shutil.copy(fileName, destPath)
                            cnt += 1
                            print 'copied: ', cnt, ' of ', ctotal
                
                    def main():
                        print 'this is main method'
                        fileCount = 0
                        fileList = glob.glob(os.path.join(srcPath, '*.csv'))
                        #sequentialCopy(fileList)
                        threadWorkerCopy(fileList)
                
                    if __name__ == '__main__':
                        profile.run('main()')
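
For readers who want to reproduce the comparison on Python 3, the following is a minimal sketch of the same two strategies, not the asker's original program. The function names (`sequential_copy`, `threaded_copy`, `timed`, `make_sample_files`) and the temporary sample files are assumptions made for illustration. The queue item is marked done only after the copy finishes, so `join()` waits for the data to be written.

```python
import queue
import shutil
import tempfile
import threading
import time
from pathlib import Path


def sequential_copy(files, dest):
    """Copy each file in turn on the calling thread."""
    for path in files:
        shutil.copy(path, dest)


def threaded_copy(files, dest, workers=4):
    """Copy files with a fixed pool of worker threads fed from a queue.

    Returns a list of (path, exception) pairs for copies that failed.
    """
    work = queue.Queue()
    errors = []

    def worker():
        while True:
            path = work.get()
            try:
                if path is None:  # sentinel: no more work for this thread
                    return
                shutil.copy(path, dest)
            except OSError as exc:
                errors.append((path, exc))
            finally:
                # Mark the item done only after the copy attempt has ended.
                work.task_done()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for path in files:
        work.put(path)
    for _ in threads:
        work.put(None)  # one sentinel per worker
    work.join()
    for t in threads:
        t.join()
    return errors


def timed(copy_fn, files, dest):
    """Return the wall-clock seconds taken by copy_fn(files, dest)."""
    start = time.perf_counter()
    copy_fn(files, dest)
    return time.perf_counter() - start


def make_sample_files(directory, count=20, size=64 * 1024):
    """Create small placeholder .csv files (hypothetical test data)."""
    paths = []
    for i in range(count):
        p = Path(directory) / f"sample_{i}.csv"
        p.write_bytes(b"x" * size)
        paths.append(p)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src, \
         tempfile.TemporaryDirectory() as dst_a, \
         tempfile.TemporaryDirectory() as dst_b:
        files = make_sample_files(src)
        print("sequential:", timed(sequential_copy, files, dst_a))
        print("threaded:  ", timed(threaded_copy, files, dst_b))
```

The timings depend heavily on the storage device and on the operating system's file cache, so small temporary files like these will not necessarily show the slowdown reported above. Pointing the sketch at real source and destination drives gives a more meaningful comparison.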
                

                Recommended answer

                Of course it's slower. The hard drives have to seek between the files constantly. Your belief that multithreading would make this task faster is completely unjustified. The limiting factor is how fast you can read data from or write data to the disk, and every seek from one file to another is time lost that could have been spent transferring data.
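
The argument above can be put into a rough back-of-the-envelope model: copy time is approximately transfer time plus the time spent seeking. The sketch below is only an illustration. The function name `estimated_copy_seconds` and every number in it (average file size, throughput, seek time, number of seeks) are assumptions, not measurements from the question.

```python
def estimated_copy_seconds(total_bytes, seeks, throughput_bps, seek_seconds):
    """Rough model: time spent transferring data plus time spent seeking."""
    return total_bytes / throughput_bps + seeks * seek_seconds


if __name__ == "__main__":
    # All values below are assumed for illustration only.
    total_bytes = 1000 * 5_000_000   # 1000 files averaging 5 MB each
    throughput_bps = 100_000_000     # 100 MB/s sustained transfer rate
    seek_seconds = 0.010             # 10 ms per seek

    # One file at a time: roughly one seek per file.
    one_at_a_time = estimated_copy_seconds(
        total_bytes, 1000, throughput_bps, seek_seconds)

    # Several threads interleaving their reads and writes: the head moves
    # between files many more times (20000 seeks is an arbitrary figure).
    interleaved = estimated_copy_seconds(
        total_bytes, 20000, throughput_bps, seek_seconds)

    print("one file at a time:", one_at_a_time, "seconds")
    print("interleaved access:", interleaved, "seconds")
```

In this model the transfer term is identical in both cases, so any extra seeks caused by interleaved access can only add time. That is the answer's point: threads do not raise the disk's throughput, they only change how often the head has to move.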
