
        Parsing a script tag with dicts in BeautifulSoup

                  Question

                  Working on a partial answer to this question, I came across a bs4.element.Tag whose content is a mess of nested dicts and lists (s, below).

                  Is there a way to return a list of the URLs contained in s without using re.findall? Other comments on the structure of this tag would be helpful too.

                  from bs4 import BeautifulSoup
                  import requests
                  
                  link = 'https://stackoverflow.com/jobs?med=site-ui&ref=jobs-tab&sort=p'
                  r = requests.get(link)
                  soup = BeautifulSoup(r.text, 'html.parser')
                  
                  s = soup.find('script', type='application/ld+json')
                  
                  ## the first bit of s:
                  # s
                  # Out[116]: 
                  # <script type="application/ld+json">
                  # {"@context":"http://schema.org","@type":"ItemList","numberOfItems":50,
                  

                  What I've tried:

                  • Randomly perusing methods with tab completion on s.
                  • Picking through the docs.

                  My problem is that s has only one attribute (type) and doesn't seem to have any child tags.

                  Recommended answer

                  You can use s.text to get the content of the script. It's JSON, so you can parse it with json.loads. From there, it's simple dictionary access:

                  import json
                  
                  from bs4 import BeautifulSoup
                  import requests
                  
                  link = 'https://stackoverflow.com/jobs?med=site-ui&ref=jobs-tab&sort=p'
                  r = requests.get(link)
                  
                  soup = BeautifulSoup(r.text, 'html.parser')
                  
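                  # Locate the JSON-LD script tag; its text content is a JSON document.
                  s = soup.find('script', type='application/ld+json')
                  
                  # Parse the script text as JSON, then read the URL of each list item.
                  urls = [el['url'] for el in json.loads(s.text)['itemListElement']]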
                  
                  print(urls)
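
If the exact layout of the JSON is not known in advance, a recursive walk over the parsed data is one way to gather every URL without regular expressions. The sketch below is only an illustration: raw stands in for s.text, the example.com URLs and the "position" fields are made up, and collect_values is a hypothetical helper name, not part of BeautifulSoup or the json module.

```python
import json

# Stand-in for s.text; the field values here are made up for illustration.
raw = """
{"@context": "http://schema.org", "@type": "ItemList", "numberOfItems": 2,
 "itemListElement": [
   {"@type": "ListItem", "position": 1, "url": "https://example.com/jobs/1"},
   {"@type": "ListItem", "position": 2, "url": "https://example.com/jobs/2"}
 ]}
"""


def collect_values(node, key="url"):
    """Recursively yield every value stored under `key` in nested dicts/lists."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v  # found a match; do not descend into its value
            else:
                yield from collect_values(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from collect_values(item, key)
    # Strings, numbers, booleans and None carry no nested keys, so they are skipped.


data = json.loads(raw)
urls = list(collect_values(data))
print(urls)
```

Compared with indexing ['itemListElement'] directly, this trades precision for robustness: it keeps working if the URLs sit at a different depth, but it also returns any other value that happens to be stored under a "url" key.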
                  
