<noframes id='R9d2L'>

How to get the opening and closing tags in Beautiful Soup from an HTML string?
<noframes id='0QB07'>

                  Problem description

                  I am writing a Python script using Beautiful Soup, in which I have to get an opening tag from a string containing some HTML code.

                  Here is my string:

                  string = "<p>...</p>"
                  

                  I want to get <p> in a variable called opening_tag and </p> in a variable called closing_tag. I have searched the documentation but can't seem to find a solution. Can anyone advise me on that?

                  Recommended answer

                  There is a way to do this with BeautifulSoup and a simple regular expression:

                  • Put the paragraph in a BeautifulSoup object, e.g., soupParagraph.

                  • Move the contents between the opening (<p>) and closing (</p>) tags to another BeautifulSoup object, e.g., soupInnerParagraph. (Moving the contents does not delete them.) soupParagraph is then left with just the opening and closing tags.

                  • Convert soupParagraph to HTML text and store it in a string variable.

                  • To get the opening tag, use a regular expression to remove the closing tag from that string.
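The steps above can be sketched as follows for the string from the question. This is a minimal sketch, assuming the `bs4` package is installed and the built-in `html.parser` is used; it grabs the `<p>` Tag inside the soup so the rendered text contains only that element, and the variable `paragraph` is our own illustrative name.

```python
import re
from bs4 import BeautifulSoup

string = "<p>...</p>"

# Step 1: put the paragraph in a BeautifulSoup object and grab the <p> Tag.
soupParagraph = BeautifulSoup(string, "html.parser")
paragraph = soupParagraph.p

# Step 2: move everything between <p> and </p> into a second object.
# append() detaches each child from <p>, so iterate over a copy of the list.
soupInnerParagraph = BeautifulSoup("", "html.parser")
for child in list(paragraph.contents):
    soupInnerParagraph.append(child)

# Step 3: render the now-empty <p> element as HTML text ("<p></p>").
tags = str(paragraph)

# Step 4: the closing tag is what matches at the end; the rest is the opening tag.
match = re.search(r"\s*</p\s*>\s*$", tags)
opening_tag = tags[:match.start()]
closing_tag = match.group(0).strip()

print(opening_tag)  # <p>
print(closing_tag)  # </p>
```

The inner content ("...") is not lost: it now lives in `soupInnerParagraph`.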

                  In general, parsing HTML with regular expressions is problematic and usually best avoided. However, it may be reasonable here.
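A short illustration of why a regular expression is fragile for opening tags. The pattern below is a naive one of our own, not taken from the answer: it stops at the first `>`, but `>` may legally appear inside a quoted attribute value, so the match is cut short.

```python
import re

# Naive pattern for an opening <p> tag: "<p", then anything up to the first ">".
naive_opening = re.compile(r"<p\b[^>]*>")

html = '<p title="a > b">text</p>'
match = naive_opening.search(html)

# The match ends at the ">" inside the quoted attribute value.
print(match.group(0))  # <p title="a >
```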

                  A closing tag is simple: it has no attributes defined for it, and a comment is not allowed inside it. See:

                  Can I have attributes on closing tags?

                  HTML comments within an element's start tag
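Because a closing tag carries no attributes or comments, a pattern anchored at the end of the rendered string only has to allow for whitespace. The sketch below uses the same pattern shape as the answer's code, parameterized by tag name; the helper `strip_closing_tag` is a hypothetical name for illustration. Note that it is unaffected by a `>` inside the opening tag's attributes.

```python
import re

def strip_closing_tag(markup, name):
    """Remove a trailing </name> (with optional whitespace) from markup.

    Hypothetical helper: returns (new_string, number_of_substitutions),
    exactly as re.subn does.
    """
    pattern = r"\s*</" + re.escape(name) + r"\s*>\s*$"
    return re.subn(pattern, "", markup)

print(strip_closing_tag('<p class="x"></p>', "p"))          # ('<p class="x">', 1)
print(strip_closing_tag('<p title="a > b"></p >\n', "p"))   # ('<p title="a > b">', 1)
print(strip_closing_tag("<pre></pre>", "p"))                # ('<pre></pre>', 0)
```

The last call shows that `</pre>` is not mistaken for `</p>`, because the pattern requires `>` (after optional whitespace) right after the tag name.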


                  This code gets the opening tag from a <body...> ... </body> section. The code has been tested.

                  import re
                  from bs4 import BeautifulSoup

                  # The variable "body" is a BeautifulSoup Tag for the <body> element.
                  bodyInnerHtml = BeautifulSoup("", 'html.parser')
                  # .append() moves each child out of body.contents, so the live list
                  # shrinks as we go; iterate over a copy of it.
                  for child in list(body.contents):
                      bodyInnerHtml.append(child)

                  # Convert the <body> opening and closing tags to HTML text format
                  bodyTags = body.decode(formatter='html')
                  # Extract the opening tag by removing the closing tag
                  regex = r"(\s*</body\s*>\s*$)"
                  substitution = ""
                  bodyOpeningTag, substitutionCount = re.subn(regex, substitution, bodyTags, flags=re.M)
                  if substitutionCount != 1:
                      print("")
                      print("ERROR.  The expected HTML </body> tag was not found.")
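The code above leaves `body` empty, because its children are moved into `bodyInnerHtml`. If the original tree must stay intact, one option is to work on a copy. This is a sketch under stated assumptions: `split_tags` is a hypothetical helper name, it relies on Beautiful Soup's support for `copy.copy()` on a Tag and on `Tag.clear()`, and it generalizes the answer's `</body>` pattern to any tag name. It also returns the closing tag, which the question asks for.

```python
import copy
import re
from bs4 import BeautifulSoup

def split_tags(tag):
    """Return (opening_tag, closing_tag) for a bs4 Tag without modifying it."""
    shell = copy.copy(tag)   # detached copy of the element and its children
    shell.clear()            # drop the children, keep the tag name and attributes
    text = shell.decode(formatter="html")
    # Closing tags carry no attributes, so a pattern anchored at the end suffices.
    match = re.search(r"\s*</" + re.escape(shell.name) + r"\s*>\s*$", text)
    if match is None:
        raise ValueError(f"no closing </{shell.name}> tag found in {text!r}")
    return text[:match.start()], match.group(0).strip()

soup = BeautifulSoup('<body class="main"><p>Hello</p></body>', "html.parser")
opening_tag, closing_tag = split_tags(soup.body)
print(opening_tag)   # <body class="main">
print(closing_tag)   # </body>
print(soup.body.p)   # <p>Hello</p>  (the original tree is untouched)
```

Void elements such as `<br>` have no closing tag, so for them the sketch raises `ValueError` instead of guessing.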
                  

<noframes id='rB6eY'>