Python Regular Expressions, Part 4: The re Module

kdpz2605, 8 years ago
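<p>The most basic usage is re.search. It has already appeared many times in the previous three articles, so it is not covered again here.</p>
<h3>re.sub</h3>
<p>re.sub performs search-and-replace with a regular expression.</p>
<p>For example, the following normalizes the commas between columns by removing any whitespace that follows a comma:</p>
<pre><code class="language-python">>>> import re
>>> row = "column 1,column 2, column 3"
>>> re.sub(r',\s*', ',', row)
'column 1,column 2,column 3'
</code></pre>
<p>The next example is a little more complex. Combined with capture groups and backreferences, re.sub makes it easy to convert a date format:</p>
<pre><code class="language-python">>>> sentence = "from 12/22/1629 to 11/14/1643"
>>> re.sub(r'(\d{2})/(\d{2})/(\d{4})', r'\3-\1-\2', sentence)
'from 1629-12-22 to 1643-11-14'
</code></pre>
<h3>re.split</h3>
<p>re.split can be thought of as a regular-expression-powered version of str.split.</p>
<p>For example, given comma-separated columns with inconsistent spacing, re.split extracts the correct column contents and is much more concise than re.findall, which also returns a trailing empty string here:</p>
<pre><code class="language-python">>>> re.findall(r'([^,]*)(?:,\s*|$)', 'column1, column2,column3')
['column1', 'column2', 'column3', '']
>>> re.split(r',\s*', 'column1, column2,column3')
['column1', 'column2', 'column3']
</code></pre>
<h3>re.compile</h3>
<p>If a regular expression is used many times, it is best to create a pattern object in advance with re.compile. This avoids looking the pattern up again on every call and gives it a reusable name. The re module also caches recently used patterns, so the speedup is usually modest.</p>
<pre><code class="language-python">>>> COMMA_RE = re.compile(r',\s*')
>>> COMMA_RE.split('column1, column2,column3')
['column1', 'column2', 'column3']
</code></pre>
<h3>re.IGNORECASE and re.VERBOSE</h3>
<p>re.IGNORECASE is simple: it makes matching case-insensitive, so no separate example is given.</p>
<p>re.VERBOSE mainly addresses the poor readability of complex regular expressions. With re.VERBOSE, whitespace such as spaces and newlines can be used inside the pattern string to separate its parts, and # starts a comment. Whitespace inside a character class is still significant. For example, the following regular expression matches a UUID string:</p>
<pre><code class="language-python">import re

def is_valid_uuid(uuid):
    # No spaces inside the class: re.VERBOSE does not ignore whitespace there.
    hex_re = r'[a-f\d]'
    uuid_re = r'''
        ^               # beginning of string
        {hex}{{8}}      # 8 hexadecimal digits
        -               # dash character
        {hex}{{4}}      # 4 hexadecimal digits
        -               # dash character
        {hex}{{4}}      # 4 hexadecimal digits
        -               # dash character
        {hex}{{4}}      # 4 hexadecimal digits
        -               # dash character
        {hex}{{12}}     # 12 hexadecimal digits
        $               # end of string
    '''.format(hex=hex_re)
    return bool(re.search(uuid_re, uuid, re.IGNORECASE | re.VERBOSE))
</code></pre>
<p>The doubled braces in {{8}} are needed because {} has a special meaning in str.format (it marks a placeholder), so literal braces have to be escaped.</p>
<p>This concludes the introduction to Python regular expressions. For more details, the official Python documentation is the first place to look.</p>
<p>When working with regular expressions, it is common to write a pattern that feels great at the time and is baffling to read later. The culprit is poor readability. re.VERBOSE and comments ease the problem somewhat, but the result is still not ideal.</p>
<p>A while ago, while reading the skynet source code, I noticed that Yun Feng (云风) uses a Lua library called lpeg for string pattern matching when parsing the skynet config file. Compared with bare regular expressions, lpeg lets you split a complex pattern into several named parts and then combine those parts much like concatenating strings, which is very readable. It has also already been ported to Python; interested readers can try it out <a href="/misc/goto?guid=4959740645129408180" rel="nofollow,noindex">here</a>.</p>
<p>Source: http://blog.guoyb.com/2017/03/06/python-regex-4/</p>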
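<p>The closing remark about splitting a pattern into named parts can be approximated with the standard re module alone. The sketch below is not lpeg or its Python port. It only shows the same idea applied to the UUID example, using ordinary string composition, re.compile and re.IGNORECASE. The names HEX, GROUP_LENGTHS, hex_run and UUID_RE are made up for illustration and do not come from the article.</p>

```python
import re

# Hypothetical building blocks, named so that each part can be read on its own.
HEX = r'[0-9a-f]'                    # one hexadecimal digit
GROUP_LENGTHS = (8, 4, 4, 4, 12)     # the 8-4-4-4-12 layout from the UUID example


def hex_run(n):
    """Return a pattern fragment matching exactly n hexadecimal digits."""
    # Plain concatenation, so no brace escaping for str.format is needed here.
    return HEX + '{' + str(n) + '}'


# Combine the named parts like ordinary strings, joined by literal dashes.
UUID_PATTERN = '-'.join(hex_run(n) for n in GROUP_LENGTHS)

# Compile once and reuse; re.IGNORECASE also accepts upper-case hex digits.
UUID_RE = re.compile(UUID_PATTERN, re.IGNORECASE)


def is_valid_uuid(text):
    """Return True if text is exactly one UUID in 8-4-4-4-12 form."""
    # fullmatch anchors the pattern at both ends, so ^ and $ are not needed.
    return UUID_RE.fullmatch(text) is not None
```

<p>One design difference from the re.VERBOSE version: re.fullmatch (Python 3.4+) rejects a string with a trailing newline, whereas a pattern ending in $ accepts it.</p>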