
        How to open multiple hrefs within a webtable to scrape through Selenium


                  Question

                  I'm trying to scrape this website using Python and Selenium. However, not all the information I need is on the main page. How would I click the links in the 'Application number' column one by one, go to each linked page, scrape the information, and then return to the original page?

                  I've tried:

                  def getData():
                    data = []
                    select = Select(driver.find_elements_by_xpath('//*[@id="node-41"]/div/div/div/div/div/div[1]/table/tbody/tr/td/a/@href'))
                    list_options = select.options
                    for item in range(len(list_options)):
                      item.click()
                    driver.get(url)
                  

                  URL: http://www.scilly.gov.uk/planning-development/planning-applications


                  Solution

                  To open multiple hrefs within a webtable and scrape each linked page through Selenium, you can collect all the href values first and then open them one by one in a new tab:

                  • Code Block:

                      from selenium import webdriver
                      from selenium.webdriver.chrome.options import Options
                      from selenium.webdriver.chrome.service import Service
                      from selenium.webdriver.support.ui import WebDriverWait
                      from selenium.webdriver.common.by import By
                      from selenium.webdriver.support import expected_conditions as EC

                      hrefs = []
                      options = Options()
                      options.add_argument("start-maximized")
                      options.add_argument("disable-infobars")
                      options.add_argument("--disable-extensions")
                      options.add_argument("--disable-gpu")
                      options.add_argument("--no-sandbox")
                      # Selenium 4 takes the driver path through a Service object;
                      # the old chrome_options/executable_path keywords have been removed
                      service = Service(r'C:\WebDrivers\ChromeDriver\chromedriver_win32\chromedriver.exe')
                      driver = webdriver.Chrome(service=service, options=options)
                      driver.get('http://www.scilly.gov.uk/planning-development/planning-applications')
                      windows_before = driver.current_window_handle  # Store the parent window handle for later use
                      elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "td.views-field.views-field-title>a")))  # Wait until the desired links are visible
                      for element in elements:
                          hrefs.append(element.get_attribute("href"))  # Collect the href attributes as plain strings in a list
                      for href in hrefs:
                          driver.execute_script("window.open(arguments[0]);", href)  # Open each href in a new tab
                          WebDriverWait(driver, 10).until(EC.number_of_windows_to_be(2))  # Wait until exactly two windows are open
                          windows_after = driver.window_handles
                          new_window = [x for x in windows_after if x != windows_before][0]  # Identify the newly opened window
                          driver.switch_to.window(new_window)  # Switch to the new tab (switch_to_window() is deprecated)
                          # perform your web scraping here
                          print(driver.title)  # Print the page title, or perform your web scraping
                          driver.close()  # Close the tab
                          driver.switch_to.window(windows_before)  # Switch back to the parent window
                      driver.quit()  # Quit the browser session
                    

                  • Console Output:

                      Planning application: P/18/064 | Council of the ISLES OF SCILLY
                      Planning application: P/18/063 | Council of the ISLES OF SCILLY
                      Planning application: P/18/062 | Council of the ISLES OF SCILLY
                      Planning application: P/18/061 | Council of the ISLES OF SCILLY
                      Planning application: p/18/059 | Council of the ISLES OF SCILLY
                      Planning application: P/18/058 | Council of the ISLES OF SCILLY
                      Planning application: P/18/057 | Council of the ISLES OF SCILLY
                      Planning application: P/18/056 | Council of the ISLES OF SCILLY
                      Planning application: P/18/055 | Council of the ISLES OF SCILLY
                      Planning application: P/18/054 | Council of the ISLES OF SCILLY
                    
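The answer collects every href as a plain string before opening any tab, because WebElement references can go stale once the browser navigates or the DOM changes. That collection step can be sketched without a browser using Python's standard `html.parser`, as a rough stand-in for the CSS selector `td.views-field.views-field-title>a`. This is a swapped-in technique for illustration only: `TitleLinkCollector`, `collect_hrefs`, and the sample markup and paths are hypothetical, not the real site's HTML.

```python
from html.parser import HTMLParser


class TitleLinkCollector(HTMLParser):
    """Hypothetical helper: collect href values of <a> tags inside
    <td class="views-field views-field-title"> cells.

    Approximation of "td.views-field.views-field-title>a": it matches any
    <a> inside the cell, not only direct children.
    """

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self._in_title_cell = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        if tag == "td":
            # Remember whether the current cell carries both required classes
            self._in_title_cell = {"views-field", "views-field-title"} <= classes
        elif tag == "a" and self._in_title_cell and attrs.get("href"):
            self.hrefs.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_title_cell = False


def collect_hrefs(html):
    """Return the href strings found in title cells, in document order."""
    parser = TitleLinkCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


# Hypothetical sample markup, not copied from the real page
SAMPLE = """
<table><tbody>
<tr><td class="views-field views-field-title"><a href="/planning-application/p18064">P/18/064</a></td>
<td class="views-field views-field-body">Description</td></tr>
<tr><td class="views-field views-field-title"><a href="/planning-application/p18063">P/18/063</a></td></tr>
</tbody></table>
"""

print(collect_hrefs(SAMPLE))  # ['/planning-application/p18064', '/planning-application/p18063']
```

Unlike Selenium's `get_attribute("href")`, which typically returns the resolved absolute URL, this parser returns the raw attribute value, so relative paths would still need `urllib.parse.urljoin` against the page URL.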

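The window bookkeeping inside the loop can be isolated into a small, testable function. A minimal sketch, assuming a hypothetical helper named `find_new_handle`: it performs the same filtering as `[x for x in windows_after if x != windows_before][0]`, but raises an error if there is not exactly one non-parent window instead of silently picking the first one. The handle strings in the usage line are made up; real handles are opaque strings returned by `driver.window_handles`.

```python
def find_new_handle(parent_handle, handles):
    """Return the only window handle that differs from parent_handle.

    Hypothetical helper; mirrors the answer's list comprehension but
    raises instead of guessing when the window count is unexpected.
    """
    others = [h for h in handles if h != parent_handle]
    if len(others) != 1:
        # The answer waits for exactly two windows (parent + new tab)
        raise RuntimeError(f"expected exactly one new window, found {len(others)}")
    return others[0]


# Usage with made-up handle strings
print(find_new_handle("parent-handle", ["parent-handle", "tab-handle"]))  # prints tab-handle
```

Inside the loop it would replace the list comprehension: `new_window = find_new_handle(windows_before, driver.window_handles)`. Failing loudly is useful if an earlier tab was not closed, since `[0]` would otherwise switch to the wrong window.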

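Each detail page's title follows the pattern shown in the console output, so the application number can be extracted from `driver.title` with a regular expression. A minimal sketch: the pattern is an assumption inferred only from the titles printed above (note the lowercase `p/18/059`), and `application_number` is a hypothetical helper name.

```python
import re

# Assumed title format, inferred from the console output, e.g.
# "Planning application: P/18/064 | Council of the ISLES OF SCILLY"
APP_NUMBER = re.compile(r"Planning application:\s*([A-Za-z]/\d+/\d+)")


def application_number(title):
    """Return the application number in a page title, or None if absent."""
    match = APP_NUMBER.search(title)
    return match.group(1) if match else None


print(application_number("Planning application: P/18/064 | Council of the ISLES OF SCILLY"))  # prints P/18/064
print(application_number("Planning application: p/18/059 | Council of the ISLES OF SCILLY"))  # prints p/18/059
```

In the answer's loop, `print(application_number(driver.title))` could stand in for `print(driver.title)`. The match keeps the original letter case rather than normalizing it, so the scraped value matches what the site shows.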
                  References

                  You can find a couple of relevant detailed discussions in:

                  • WebScraping JavaScript-Rendered Content using Selenium in Python
                  • StaleElementReferenceException even after adding the wait while collecting the data from the wikipedia using web-scraping
                  • Unable to access the remaining elements by xpaths in a loop after accessing the first element- Webscraping Selenium Python
                  • How to open each product within a website in a new tab for scraping using Selenium through Python
                  • How to open multiple hrefs within a webtable to scrape through selenium
