
        How to check if a task is already in a Python Queue?


                  Problem description

                  I'm writing a simple crawler in Python using the threading and Queue modules. I fetch a page, extract its links and put them into a queue; when a thread has finished processing a page, it grabs the next one from the queue. I keep an array of the pages I've already visited and use it to filter the links I add to the queue, but if there is more than one thread and they find the same link on different pages, they put duplicate links into the queue. So how can I find out whether a URL is already in the queue, to avoid putting it there again?

                  Recommended answer

                  If you don't care about the order in which items are processed, I'd try a subclass of Queue that uses a set internally:

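                  from queue import Queue  # Python 2: from Queue import Queue

                  class SetQueue(Queue):
                      """Queue whose pending items live in a set, so order is arbitrary."""

                      def _init(self, maxsize):
                          self.maxsize = maxsize
                          self.queue = set()  # replaces the default deque

                      def _put(self, item):
                          # Adding an item that is already pending is a no-op.
                          self.queue.add(item)

                      def _get(self):
                          # set.pop() removes and returns an arbitrary element.
                          return self.queue.pop()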

                  As Paul McGuire pointed out, this would allow a duplicate item to be added after it has been removed from the "to-be-processed" set but before it has been added to the "processed" set. To solve this, you could store both sets in the Queue instance. But since the larger set is the one you check to see whether an item has already been seen, you can just as well go back to the regular queue storage, which also keeps requests in order:

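                  from queue import Queue  # Python 2: from Queue import Queue

                  class SetQueue(Queue):
                      """FIFO queue that silently drops any item it has ever accepted."""

                      def _init(self, maxsize):
                          Queue._init(self, maxsize)  # keep the default FIFO storage
                          self.all_items = set()      # every item accepted so far

                      def _put(self, item):
                          # Enqueue only items that have never been seen before.
                          if item not in self.all_items:
                              Queue._put(self, item)
                              self.all_items.add(item)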

                  The advantage of this, as opposed to keeping a separate set, is that Queue's methods are thread-safe: put() calls _put() while holding the queue's internal lock, so you don't need additional locking to check the other set.
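To illustrate the thread-safety point, here is a minimal sketch in which several threads push overlapping batches of URLs into one SetQueue with no extra lock. The `produce` helper, the batch layout and the URL pattern are assumptions made up for the example.

```python
import threading
from queue import Queue


class SetQueue(Queue):
    """FIFO queue that ignores any item it has accepted before."""

    def _init(self, maxsize):
        Queue._init(self, maxsize)
        self.all_items = set()

    def _put(self, item):
        if item not in self.all_items:
            Queue._put(self, item)
            self.all_items.add(item)


def produce(q, urls):
    """Hypothetical producer: pushes every URL, relying on the queue to dedupe."""
    for url in urls:
        q.put(url)


q = SetQueue()
# Three overlapping batches covering pages 0..99 (each neighbor pair shares 25 pages).
batches = [[f"/page/{i}" for i in range(start, start + 50)] for start in (0, 25, 50)]
threads = [threading.Thread(target=produce, args=(q, batch)) for batch in batches]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All producers are done, so draining with get_nowait() is safe here.
queued = []
while not q.empty():
    queued.append(q.get_nowait())

print(len(queued), len(set(queued)))
```

One caveat, as far as I can tell from CPython's implementation: put() still counts a dropped duplicate as an unfinished task, so a crawler that relies on q.join() and task_done() may wait forever. The sketch sidesteps this by joining the producer threads instead.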
