        Stripping inline tags with Python's lxml


                  Question

                  I have to deal with two types of inline tags in XML documents. The first type encloses text that I want to keep. I can handle this type with lxml's

                  etree.tostring(element, method="text", encoding='utf-8')
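To make the problem concrete, here is a minimal sketch of what that call returns. The sample document is hypothetical: it assumes `<z>` is the kind of tag whose text should be kept and `<y>` the kind whose text should be dropped.

```python
from lxml import etree

# Hypothetical sample: <z> wraps text to keep, <y> wraps text to drop.
element = etree.fromstring(
    "<x>hello, <z>keep me</z> and <y>ignore me</y> text</x>"
)

# method="text" discards every tag but keeps all of the text,
# including the text inside <y> that should have gone away.
flat = etree.tostring(element, method="text", encoding="utf-8")
print(flat)
```

With an explicit byte encoding such as `'utf-8'`, the result is a `bytes` object, and the `<y>` text is still in it. That is why the second type of tag needs separate handling.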
                  

                  The second type of tag encloses text that I don't want to keep. How can I get rid of these tags together with their text? I would prefer not to use regular expressions, if possible.

                  Thanks

                  Recommended answer

                  I think that strip_tags and strip_elements are what you want, one for each case. For example, this script:

                  from lxml import etree

                  text = "<x>hello, <z>keep me</z> and <y>ignore me</y>, and here's some <y>more</y> text</x>"

                  tree = etree.fromstring(text)

                  # encoding="unicode" makes tostring() return str rather than bytes on Python 3
                  print(etree.tostring(tree, pretty_print=True, encoding="unicode"))

                  # Remove the <z> tags, but keep their contents:
                  etree.strip_tags(tree, 'z')

                  print('-' * 72)
                  print(etree.tostring(tree, pretty_print=True, encoding="unicode"))

                  # Remove all the <y> elements including their contents;
                  # with_tail=False keeps the text that follows each one:
                  etree.strip_elements(tree, 'y', with_tail=False)

                  print('-' * 72)
                  print(etree.tostring(tree, pretty_print=True, encoding="unicode"))
                  

                  ... produces the following output:

                  <x>hello, <z>keep me</z> and <y>ignore me</y>, and here's some <y>more</y> text</x>
                  
                  ------------------------------------------------------------------------
                  <x>hello, keep me and <y>ignore me</y>, and here's some <y>more</y> text</x>
                  
                  ------------------------------------------------------------------------
                  <x>hello, keep me and , and here's some  text</x>
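The `with_tail=False` argument matters here. In lxml, the text that follows an element's closing tag is stored as that element's tail, and `strip_elements` removes it along with the element unless told otherwise. The sketch below compares both settings on the same input. `drop_elements` is a hypothetical helper written for this illustration, not part of lxml.

```python
from lxml import etree

SOURCE = (
    "<x>hello, <z>keep me</z> and <y>ignore me</y>, "
    "and here's some <y>more</y> text</x>"
)


def drop_elements(xml, tag, with_tail):
    """Hypothetical helper: remove every <tag> element, then flatten to text."""
    tree = etree.fromstring(xml)
    etree.strip_elements(tree, tag, with_tail=with_tail)
    # method="text" drops the remaining tags (here <z>) but keeps their text.
    return etree.tostring(tree, method="text", encoding="unicode")


# Tail text after each <y> survives.
kept_tail = drop_elements(SOURCE, "y", with_tail=False)
# Tail text after each <y> is removed together with the element.
lost_tail = drop_elements(SOURCE, "y", with_tail=True)

print(repr(kept_tail))
print(repr(lost_tail))
```

With `with_tail=True` (the default), everything after the first `<y>` disappears, because all of the remaining text in this document is the tail of one `<y>` or the other. For inline tags embedded in running text, `with_tail=False` is usually the setting you want.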
                  
