    <noframes id='jDTvN'>
      Overlapping matches with finditer() in Python

      <noframes id='SX1ZU'>
              This article explains how to handle overlapping matches with finditer() in Python. We hope it serves as a useful reference; follow along with the editors of 跟版网 below!

                Problem description

                I'm using a regex to match Bible verse references in a text. The current regex is

                import re

                REF_REGEX = re.compile(r'''
                  (?<!\w)                       # Not preceded by a word character
                  (?P<quote>q(?:uote)?\s+)?     # Match optional 'q' or 'quote' followed by one or more spaces
                  (?P<book>
                    (?:(?:[1-3]|I{1,3})\s*)?    # Match an optional arabic or roman number between 1 and 3
                    [A-Za-z]+                   # Match any alphabetics
                  )\.?                          # Followed by an optional dot
                  (?:
                    \s*(?P<chapter>\d+)         # Match the chapter number
                    (?:
                      [:.](?P<startverse>\d+)   # Match the starting verse number, preceded by ':' or '.'
                        (?:-(?P<endverse>\d+))? # Match the optional ending verse number, preceded by '-'
                    )?                          # Verse numbers are optional
                  )
                  (?:
                    \s+(?:                      # Here be spaces
                      (?:from\s+)|(?:in\s+)|(?P<lbrace>\())   # Match 'from[:space:]', 'in[:space:]' or '('
                      \s*(?P<version>\w+)       # Match a word preceded by optional spaces
                      (?(lbrace)\))             # Close the '(' if found earlier
                  )?                            # The whole 'in|from|()' is optional
                  ''', re.IGNORECASE | re.VERBOSE | re.UNICODE)
                

                This matches the following expressions fine:

                "jn 3:16":                           (None, 'jn', '3', '16', None, None, None),
                "matt. 18:21-22":                    (None, 'matt', '18', '21', '22', None, None),
                "q matt. 18:21-22":                  ('q ', 'matt', '18', '21', '22', None, None),
                "QuOTe jn 3:16":                     ('QuOTe ', 'jn', '3', '16', None, None, None),
                "q 1co13:1":                         ('q ', '1co', '13', '1', None, None, None), 
                "q 1 co 13:1":                       ('q ', '1 co', '13', '1', None, None, None),
                "quote 1 co 13:1":                   ('quote ', '1 co', '13', '1', None, None, None),
                "quote 1co13:1":                     ('quote ', '1co', '13', '1', None, None, None),
                "jean 3:18 (PDV)":                   (None, 'jean', '3', '18', None, '(', 'PDV'),
                "quote malachie 1.1-2 fRom Colombe": ('quote ', 'malachie', '1', '1', '2', None, 'Colombe'),
                "quote malachie 1.1-2 In Colombe":   ('quote ', 'malachie', '1', '1', '2', None, 'Colombe'),
                "cinq jn 3:16 (test)":               (None, 'jn', '3', '16', None, '(', 'test'),
                "Q   IIKings5.13-58   from   wolof": ('Q     ', 'IIKings', '5', '13', '58', None, 'wolof'),
                "This text is about lv5.4-6 in KJV only": (None, 'lv', '5', '4', '6', None, 'KJV'),
                

                but it fails to parse:

                "Found in 2 Cor. 5:18-21 ( Ministers":                    (None, '2 Cor', '5', '18', '21', None, None),
                

                because the first match it returns is (None, 'in', '2', None, None, None, None) instead.

                Is there a way to get finditer() to return all matches, even if they overlap, or is there a way to improve my regex so it matches this last bit properly?

                Thanks.

                Solution

                Once a character has been consumed by a match, it stays consumed; you should not expect the regex engine to go back and reuse it.
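To make this concrete, the sketch below runs finditer() on the failing string. It assumes Python 3 and uses a condensed copy of the question's pattern with the backslashes the page lost restored. The first match, "in 2", consumes the "2", so scanning resumes after it and the next match is "Cor. 5:18-21" rather than "2 Cor. 5:18-21".

```python
import re

# Condensed copy of the question's pattern, with the backslashes restored.
REF_REGEX = re.compile(r'''
    (?<!\w)                                   # not preceded by a word character
    (?P<quote>q(?:uote)?\s+)?                 # optional 'q' / 'quote' prefix
    (?P<book>(?:(?:[1-3]|I{1,3})\s*)?[A-Za-z]+)\.?
    (?:\s*(?P<chapter>\d+)
       (?:[:.](?P<startverse>\d+)(?:-(?P<endverse>\d+))?)?)   # verse is optional
    (?:\s+(?:(?:from\s+)|(?:in\s+)|(?P<lbrace>\())
       \s*(?P<version>\w+)(?(lbrace)\)))?     # optional version suffix
''', re.IGNORECASE | re.VERBOSE | re.UNICODE)

text = "Found in 2 Cor. 5:18-21 ( Ministers"

# finditer() resumes scanning where the previous match ended, so the "2"
# consumed by "in 2" can never start a second match "2 Cor. 5:18-21".
matches = [(m.start(), m.group(), m.groups()) for m in REF_REGEX.finditer(text)]
for start, matched, groups in matches:
    print(start, repr(matched), groups)
# 6 'in 2' (None, 'in', '2', None, None, None, None)
# 11 'Cor. 5:18-21' (None, 'Cor', '5', '18', '21', None, None)
```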

                From your examples, the verse part (e.g. :1) does not seem to be optional. Making it required, by removing the ? after the verse group, lets the regex match the last example correctly.

                import re

                # Case-insensitivity is passed as re.I: Python 3.11+ rejects a global (?i)
                # flag that is not at the start of the pattern.
                ref_regex = re.compile(r'''
                (?<!\w)                        # Not preceded by a word character
                (q(?:uote)?\s+)?               # Match 'q' or 'quote' followed by spaces
                (
                    (?:(?:[1-3]|I{1,3})\s*)?   # Match an arabic or roman number between 1 and 3
                    [A-Za-z]+                  # Match one or more letters
                )\.?                           # Followed by an optional dot
                (?:
                    \s*(\d+)                   # Match the chapter number
                    (?:
                        [:.](\d+)              # Match the verse number
                        (?:-(\d+))?            # Match the ending verse number
                    )                          # <-- no '?' here
                )
                (?:
                    \s+
                    (?:
                        (?:from\s+)|           # Match the keyword 'from' or 'in'
                        (?:in\s+)|
                        (?P<lbrace>\()         # or stuff between (...)
                    )\s*(\w+)
                    (?(lbrace)\))
                )?
                ''', re.I | re.X | re.U)
                

                (If you're going to write a regex this large, please use the verbose flag: /x in Perl-style syntax, re.X or re.VERBOSE in Python.)
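One way to check the corrected pattern is to run it over the question's examples, including the one that used to fail. This is a sketch assuming Python 3: the pattern is a condensed copy of the answer's, with the backslashes the page lost restored and re.IGNORECASE passed as a flag instead of inline (?i). The expected tuples come from the question; EXAMPLES and results are illustrative names.

```python
import re

# Condensed copy of the answer's pattern. Case-insensitivity comes from
# re.IGNORECASE because Python 3.11+ rejects a global (?i) flag that is
# not at the start of the pattern.
ref_regex = re.compile(r'''
    (?<!\w)                                    # not preceded by a word character
    (q(?:uote)?\s+)?                           # optional 'q' / 'quote' prefix
    ((?:(?:[1-3]|I{1,3})\s*)?[A-Za-z]+)\.?     # book name, optional dot
    \s*(\d+)                                   # chapter
    [:.](\d+)(?:-(\d+))?                       # verse (now required), optional end verse
    # optional version: "from X", "in X" or "(X)"
    (?:\s+(?:(?:from\s+)|(?:in\s+)|(?P<lbrace>\())\s*(\w+)(?(lbrace)\)))?
''', re.IGNORECASE | re.VERBOSE | re.UNICODE)

# Expected groups, copied from the question.
EXAMPLES = {
    "jn 3:16": (None, 'jn', '3', '16', None, None, None),
    "matt. 18:21-22": (None, 'matt', '18', '21', '22', None, None),
    "QuOTe jn 3:16": ('QuOTe ', 'jn', '3', '16', None, None, None),
    "q 1 co 13:1": ('q ', '1 co', '13', '1', None, None, None),
    "jean 3:18 (PDV)": (None, 'jean', '3', '18', None, '(', 'PDV'),
    "quote malachie 1.1-2 fRom Colombe": ('quote ', 'malachie', '1', '1', '2', None, 'Colombe'),
    "cinq jn 3:16 (test)": (None, 'jn', '3', '16', None, '(', 'test'),
    "This text is about lv5.4-6 in KJV only": (None, 'lv', '5', '4', '6', None, 'KJV'),
    # The case that used to fail:
    "Found in 2 Cor. 5:18-21 ( Ministers": (None, '2 Cor', '5', '18', '21', None, None),
}

results = {}
for text, expected in EXAMPLES.items():
    m = ref_regex.search(text)  # first match only, as in the question
    results[text] = m.groups() if m else None
    print("ok  " if results[text] == expected else "FAIL", repr(text), results[text])
```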


                If you really need overlapping matches, you can use a lookahead: it tests what follows without consuming it, so those characters remain available to the next match. A simple example:

                >>> import re
                >>> rx = re.compile('(.)(?=(.))')
                >>> x = rx.finditer("abcdefgh")
                >>> [y.groups() for y in x]
                [('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', 'e'), ('e', 'f'), ('f', 'g'), ('g', 'h')]
                

                You can extend this idea to your own regex by putting the part that should be allowed to overlap inside a lookahead.
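A minimal sketch of that extension, assuming Python 3: a trimmed copy of the question's pattern (verse numbers optional, the from/in/(...) suffix omitted for brevity) is wrapped in (?=(?P<ref>...)). Because every match is then zero-width, m.group() is always empty, so the text is read from the ref group instead. The helpers overlapping_refs and best_refs, and the rule of keeping the longest of several overlapping candidates, are hypothetical additions, not something the answer prescribes.

```python
import re

# Trimmed copy of the question's pattern (verses optional, as in the
# question); the optional "from/in/(...)" version suffix is left out.
# CORE ends with a newline, so its last verbose-mode comment cannot
# swallow the closing "))" appended below.
CORE = r'''
    (?<!\w)                                     # not preceded by a word character
    (?P<quote>q(?:uote)?\s+)?                   # optional 'q' / 'quote' prefix
    (?P<book>(?:(?:[1-3]|I{1,3})\s*)?[A-Za-z]+)\.?
    \s*(?P<chapter>\d+)                         # chapter
    (?:[:.](?P<startverse>\d+)(?:-(?P<endverse>\d+))?)?   # optional verses
'''

# Wrapping the whole pattern in a lookahead makes every match zero-width,
# so finditer() tries again at the next character and can report candidates
# that overlap. The "ref" group still captures the referenced text.
OVERLAPPING = re.compile(r'(?=(?P<ref>' + CORE + r'))', re.IGNORECASE | re.VERBOSE)


def overlapping_refs(text):
    """Return (start, text) for every position where a reference can start."""
    return [(m.start(), m.group('ref')) for m in OVERLAPPING.finditer(text)]


def best_refs(text):
    """Hypothetical selection rule: among overlapping candidates keep the longest."""
    chosen = []
    for start, ref in sorted(overlapping_refs(text), key=lambda c: -len(c[1])):
        end = start + len(ref)
        # Keep the candidate only if it does not overlap anything already chosen.
        if all(end <= s or start >= s + len(r) for s, r in chosen):
            chosen.append((start, ref))
    return sorted(chosen)


text = "Found in 2 Cor. 5:18-21 ( Ministers"
print(overlapping_refs(text))  # [(6, 'in 2'), (9, '2 Cor. 5:18-21'), (11, 'Cor. 5:18-21')]
print(best_refs(text))         # [(9, '2 Cor. 5:18-21')]
```

Here the lookahead only collects candidates; plain Python code, not the regex, decides which overlapping candidate to keep.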

                That concludes this article on overlapping matches with finditer() in Python. We hope the recommended answer helps, and thank you for your continued support of 跟版网!

                          <noframes id='bAFYB'>