《Python标准库》-迷你书


华章专业开发者丛书
Python 标准库
The Python Standard Library by Example
(美) Doug Hellmann 著　刘炽 等译

本书由资深 Python 专家亲自执笔，Python 语言的核心开发人员作序推荐，权威性毋庸置疑。对于程序员而言，标准库与语言本身同样重要，它好比一个百宝箱，能为各种常见的任务提供完美的解决方案，所以本书是所有 Python 程序员都必备的工具书！本书以案例驱动的方式讲解了标准库中一百多个模块的使用方法（如何工作）和工作原理（为什么要这样工作），比标准库的官方文档更容易理解（一个简单的示例比一份手册文档更有帮助），为 Python 程序员熟练掌握和使用这些模块提供了绝佳指导。

全书一共 19 章，系统而全面地对 Python 标准库中的一百多个模块进行了生动的讲解。这些模块主要包括：文本处理工具模块、与数据结构相关的模块、与算法有关的模块、管理日期和时间值的模块、用于数学计算的模块、管理文件系统的模块、用于数据存储与交换的模块、用于数据压缩与归档的模块、用于加密的模块、与进程和线程相关的模块、与网络通信和 Email 相关的模块、应用构建模块、支持处理多种自然语言和文化设置的模块、开发工具模块、与运行时特性相关的模块，等等。

Authorized translation from the English language edition, entitled The Python Standard Library by Example, 9780321767349 by Doug Hellmann, published by Pearson Education, Inc., Copyright © 2011 Pearson Education, Inc. All rights reserved. No part of this book may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording or by any information storage retrieval system, without permission from Pearson Education, Inc. CHINESE SIMPLIFIED language edition published by PEARSON EDUCATION ASIA LTD., and CHINA MACHINE PRESS Copyright © 2012.

本书封底贴有 Pearson Education（培生教育出版集团）激光防伪标签，无标签者不得销售。封底无防伪标签均为盗版。
版权所有，侵权必究
本书法律顾问：北京市展达律师事务所
本书版权登记号：图字：01-2011-4808

图书在版编目（CIP）数据
Python 标准库 /（美）荷尔曼（Hellmann, D.）著；刘炽等译 . —北京：机械工业出版社，2012.5
书名原文：The Python Standard Library by Example
ISBN 978-7-111-37810-5
Ⅰ . P…　Ⅱ . ①荷… ②刘…　Ⅲ . 软件工具 – 程序设计　Ⅳ . TP311.56
中国版本图书馆 CIP 数据核字（2012）第 053254 号

机械工业出版社（北京市西城区百万庄大街 22 号　邮政编码 100037）
责任编辑：关 敏
2012 年 5 月第 1 版第 1 次印刷
186mm×240mm • 65 印张
标准书号：ISBN 978-7-111-37810-5
定价：139.00 元

凡购本书，如有缺页、倒页、脱页，由本社发行部调换
客服热线：(010) 88378991；88361066
购书热线：(010) 68326294；88379649；68995259
投稿热线：(010) 88379604
读者信箱：hzjsj@hzbook.com

译 者 序

Python 的设计哲学是"优雅"、"明确"、"简单"。因此，Python 开发者的哲学是"用一种方法，最好是只有一种方法来做一件事"。不仅如此，Python 设计为一种可扩展的语言，并非所有的特性和功能都集成到语言核心。随 Python 一起安装的还有 Python 标准库，其中包含大量极其有用的模块。熟悉 Python 标准库十分重要，因为如果熟悉这些库中的模块，那么大多数问题都可以简单快捷地使用它们来解决。

标准库包含数百个模块，为常见任务提供了丰富的工具，可以用来作为应用开发的起点。也许有人发现一个模块正是他想要的，但是不知道如何来使用；也许有人为他的任务挑选了不合适的模块，也可能已经厌倦了一切从头开始。大多时候，一个简短的例子要比一份手册文档更有帮助。这正是本书的出发点。本书会提供一些精选的例子，展示如何使用这些模块中最常用的一些特性。相信你在使用 Python 时遇到的问题都能在本书中得到解答。

本书作者从 1.4 版开始就一直在从事 Python 编程方面的工作，他正是享有盛誉的博客系列"Python Module of the Week"的博主。在这个博客中，他全面研究了标准库的众多模块，利用实际例子来介绍各个模块如何工作。相信很多人已经从他的博客中受益。为满足人们的迫切需求，他把这些博客文章进一步整理完善，形成了你手上的这本书。

本书沿袭了博客的叙事风格，Doug Hellmann 通过轻松的方式，让你从具体的例子、具体的实践中了解技术细节，在知道"怎样做"的同时还能理解"为什么这样做"。相信你已经迫不及待地想要翻开下一页了，那么，进入 Doug 的 Python 世界吧！
本书由刘炽、苏金国、李璜、杨健康等主译，乔会东、仝磊、王少轩、程芳、宋旭民、黄小钰等分别对全书各章进行审阅，另外姚曜、程龙、吴忠望、张练达、陈峰、江健、姚勇、卢鋆、张莹参与了全书的修改整理，林琪、刘亮、刘跃邦、高强和王志淋统一了全书术语，并完善了关键部分的翻译。由于水平有限，译文肯定有不当之处，敬请读者批评指正。

刘炽

序

今天是 2010 年的感恩节。不论人们是否身在美国，在这个节日里，大家都一边品尝丰盛的食物，一边欣赏橄榄球比赛，有些人可能会出门逛逛。

对我（以及其他很多人）来说，会借此机会回顾一下过去的岁月，想想那些让我们的生活充满色彩的人和事，向他们致以感谢。当然，我们每天都该这么做，不过专门有一天来表达谢意有时会让我们想得更深远一些。

现在我坐在这里为本书写序，非常感谢能有机会做这件事，不过我想到的不只是本书的内容，也不只是作者本人（一个无比热情的社区成员），我所想的是这个主题本身—Python，具体来讲，还有它的标准库。

当前发布的每版 Python 都包含数百个模块，它们是多年来多位开发人员针对多个主题和多个任务共同完成的。这些模块涵盖一切，从发送和接收 Email，到 GUI 开发，再到内置的 HTTP 服务器都一应俱全。就其本身而言，标准库的开发和维护是一项极其庞大的工作。如果没有多年来一直维护它的人们，没有数以千计的人提交补丁、文档和提供反馈，它绝不会成为今天的模样。

这是一个惊人的成就，在 Python 日益普及的今天（不论是作为语言还是一种生态系统），标准库已经成为其中不可或缺的重要组成部分。如果没有标准库，没有核心团队和其他人员的"内含动力"（batteries included）口号，Python 绝对不可能走这么远。它已经被成千上万的人和公司下载，并已安装在数百万服务器、台式机和其他设备上。

即使没有标准库，Python 仍是一种很不错的语言，在教学、学习和可读性方面有扎实的基础。基于这些优点，它本身也能发展得足够好。不过标准库把它从一种有趣的体验变成为一个强大而有效的工具。

每一天，全世界的开发人员都在构建工具和完整的应用，他们所基于的只是核心语言和标准库。你不仅要能够明确概念，描述汽车是什么（语言），还要得到足够的部件和工具来自行组装一辆基本的汽车。它可能并不完善，不过可以使你从无到有，这将是一个很好的奖励，会赋予你巨大的动力。我曾反复对那些骄傲地看着我的人说："看看我构建的应用，除了 Python 提供的，其他的工具通通都没有用到！"

不过，标准库也不是完美无缺的，它也有自己的缺陷。由于标准库的规模、广度和它的"年龄"，毫无疑问有些模块有不同层次的质量、API 简洁性和覆盖性。有些模块存在"特性蔓延"的问题，或者无法跟进其覆盖领域中的最新进展。通过众多不计酬劳的志愿者的帮助和辛勤工作，Python 还在继续发展、壮大和改进。

不过，有些人对 Python 还有争议，不仅由于它的缺点，而且因为标准库并不一定构成其模块涵盖领域中"最顶尖的"解决方案（毕竟，"最佳"是一个不断改变和调整的目标），因而认为应当将它完全舍弃，尽管它还在不断改善。这些人遗漏了一个事实：标准库不仅是促使 Python 不断成功的一个重要组成部分，而且尽管存在瑕疵，它还是一个绝妙的资源。

不过我有意忽略了一个巨大的领域：文档。标准库的文档很不错，还在继续改进和发展。由于标准库的庞大规模和广度，相应的文档规模也很惊人。在数以千计的开发人员和用户的努力下，我们有成百上千页文档，实在让人佩服。每天都有数万人在使用这些文档创建应用—可能简单到只是一页的脚本，也可能很复杂，比如控制大型机械手的软件。

正是由于文档，我们才会看到本书。所有好的文档和代码都有一个起点—关于目标"是什么"以及"将是什么"要有一个核心概念。从这个内核出发，才有了角色（API）和故事情节（模块）。谈到代码，有时代码会从一个简单的想法开始："我想解析一个字符串，查找一个日期。"不过等结束时，你可能已经查看了数百个单元测试、函数以及你编写的其他代码，你会坐下来，发现自己构建的东西远远超出了原先的设想。文档也是如此，特别是代码文档。

在我看来，代码文档中最重要的部分就属于这种例子。要写有关一个 API 中某一部分的描述，你可能会写上几本书，可以用华丽的文字和经过深思熟虑的用例来描述松耦合的接口。不过，如果第一次查看这个描述的用户无法将这些华丽的文字、仔细考量的用例和 API 签名结合在一起，构建出有意义的应用并解决他们的问题，这一切就完全是徒劳。

人们建立重要连接所用的网关也属于这种例子，这些逻辑会从一个抽象概念跳转到具体的事物。"了解"思想和 API 是一回事；知道它如何使用则是另外一回事。如果你不仅想要学到东西，还希望改善现状，这会很有帮助。

这就把我们重新引回 Python。本书作者 Doug Hellmann 在 2007 年创建了一个名为"Python Module of the Week"的博客。在这个博客中，他全面研究了标准库的众多模块，采用一种"示例为先"的方式介绍各个模块如何工作以及为什么。从读到它的第一天起，它就成为我除了核心 Python 文档之外的又一个必访之地。他的作品已经成为我以及 Python 社区其他人不可缺少的必备资源。

Doug 的文章填补了当前我所看到的 Python 文档的一个重大空白：对例子的迫切需求。用一种有效而简单的方式展示如何做以及为什么这么做，这绝非易事。我们已经看到，这也是一项很重要、很有价值的工作，对人们每一天的工作都有帮助。人们频繁地给我发邮件，告诉我："你看过 Doug 的这个帖子吗？实在太棒了！"或者"为什么这个不能放在核心文档里呢？
它能帮助我了解到底是怎么做的!” 当我听说 Doug 准备花些时间进一步完善他现有的工作,把它变成一本书,让我能把它放 在桌子上反复翻阅,以备急用,我真是太兴奋了。Doug 是一位非凡的技术作者,而且能敏锐地 捕捉到细节。有一整本书专门讲解实际例子,介绍标准库中一百多个模块是如何工作的,而且 是他来写这本书,实在让我欣喜若狂。 所以,我要感谢 Python,感谢标准库(包括它的瑕疵),感谢我拥有的这个活力充沛有时 也存在问题的庞大 Python 社区。我要感谢核心开发小组的辛勤工作,包括过去、现在,还有将 VI 来。我还要感谢这么多社区成员提供的资源、投入的时间和做出的努力,其中 Doug Hellmann 更是卓越的代表,正是这些让这个社区和生态系统如此生机勃勃。 最后,我要感谢本书。继续向它的作者表示敬意,这本书会在未来几年得到充分利用。 Jesse Noller Python 核心开发人员 PSF Board 成员 Nasuni 公司首席工程师 前  言 随每个 Python 版本的发布会同时发布标准库,标准库包含数百个模块,为操作系统、解释 器和互联网之间的交互提供了丰富的工具。所有这些模块都得到充分测试,可以用来作为应用 开发的起点。本书会提供一些精选的例子,展示如何使用这些模块中最常用的一些特性,正是 这些特性使 Python 有了“内含动力”的口号。这些例子均取自流行的“Python Module of the Week (PyMOTW)”博客系列。 本书读者对象 本书的读者应该是中等水平的 Python 程序员,所以尽管书中对所有源代码都做了讨论,但 只有一部分会逐行给出解释。每节会通过源代码和完全独立的示例程序的输出来重点介绍一个 模块的特性。我会尽可能简洁地介绍各个特性,使读者能够把重点放在所展示的模块或函数 上,而不会因支持代码而分心。 熟悉其他语言的有经验的程序员可以从本书了解 Python,不过本书并不是这种语言的入门 读物。研究这些例子时,如果之前对编写 Python 程序有些经验会很有帮助。 有些章节(比如 6.7.9 节和 9.2 节)还需要一些领域特定的知识。这里会提供解释这些例子 所需的基本信息,不过由于标准库中模块涵盖的主题如此宽泛,因此不可能在一本书中全面地 介绍每一个主题。在每个模块的讨论之后,还提供一个推荐资源列表,可以进一步阅读这些资 源,从中了解更多信息。其中包括在线资源、RFC 标准文档以及相关图书。 尽管目前向 Python 3 的过渡正在进行当中,Python 2 仍可能是这几年生产环境中使用的主 要 Python 版本,这是因为存在大量遗留 Python 2 源代码,另外向 Python 3 过渡的速度相当迟 缓。本书中所有例子的源代码都由原来的在线版本做了更新,并用 Python 2.7(这是 Python 2.x 系列的最后一个版本)进行了测试。很多示例程序完全可以在 Python 3 下工作,不过有些例子 涉及的模块已经改名或者已经废弃。 本书组织结构 模块均分组在不同章中介绍,以便查找单个模块来加以引用,并且可以按主题浏览做更深 层次的探讨。相对于 http://docs.python.org 上全面的参考指南,本书可以作为补充,提供完全 可用的示例程序来展示参考指南中介绍的特性。 VIII 下载示例代码 原来的博客文章、本书勘误以及示例代码都可以从作者的网站(http://www.doughellmann. com/books/byexample)下载。 致谢 如果没有大家的贡献和支持,本书绝无可能出现。 1997 年 Dick Wall 让我第一次接触到 Python,那时我们正在 ERDAS 一起合作开发 GIS 软 件。我记得,当我发现这样一门如此简便易用的新工具语言时,立刻就喜欢上了它,还对公司 不让我们用它来完成“实际工作”颇有不满。在接下来的所有工作中我都大量使用了 Python, 我要感谢 Dick,正是因为他,给我以后的软件开发带来许多快乐时光。 Python 核心开发小组创建了一个由语言、工具和库共同构建的健壮的生态系统,这些库在 日益普及,还在不断发现新的应用领域。如果没有他们付出的宝贵时间,没有他们提供的丰富 资源,我们都还得花时间一次又一次地一切从头开始。 正如前言中所说的,本书中的材料最初是一系列博客文章。每篇文章都得到了 Python 社区 成员的审阅和评论,有纠正,有建议,也有问题,这些评论促使我做出修改,这才有了读者手 上这本书。感谢大家每周都花时间来阅读我的博客,谢谢大家的关注。 本书的技术审校人员— Matt Culbreth、Katie Cunningham、Jeff McNeil 和 Keyton Weissinger— 花了大量时间查找示例代码和相关解释中存在的问题。最终的作品远比我靠一人之力得到的结 果强得多。我还得到了 Jesse Noller 对于 multiprocessing 模块以及 Brett Cannon 关于创建定制 导入工具提出的很多建议。 还要特别感谢 Pearson 的编辑和制作人员,感谢大家辛苦的工作和一贯的支持,帮助我明 确本书的目标。 最后,我要感谢我的妻子 Theresa Flynn,她总是能提出最棒的写作建议,在完成本书的整 个过程中,她都一如既往地鼓励我、支持我。当她告诉我,“要知道,某些情况下你要坐下来 把它写出来”,我真的很佩服她能如此洞察一切。下面轮到你了。 关于作者 Doug Hellmann 目前是 Racemi 公司的一位高级开发人员,也是 Python Software Foundation 的信息交流主管。从 1.4 版开始他就一直在做 Python 编程,曾在大量 UNIX 和非 UNIX 平台 上参与项目开发,涉及领域包括地图、医疗新闻播报、金融和数据中心自动化。为《Python Magazine》做了一年普通专栏作家后,他在 2008—2009 年成为这家杂志的主编。自 2007 年以 来,Doug 在他的博客上发表了颇受关注的“Python Module of the Week”系列。他居住在乔治 亚州的 Athens。 目  录 译者序 序 前言 第 1 章 文本 ········································· 1 1.1 string—文本常量和模板 ····················1 1.1.1 函数 ·················································1 1.1.2 模板 ·················································2 1.1.3 高级模板 ··········································4 1.2 textwrap—格式化文本段落 ················6 1.2.1 示例数据 ··········································6 1.2.2 填充段落 ··········································6 1.2.3 去除现有缩进 ···································7 1.2.4 结合 dedent 和 fill ····························7 1.2.5 悬挂缩进 ··········································8 1.3 re—正则表达式 ·································· 9 1.3.1 查找文本中的模式 ···························9 1.3.2 编译表达式 ····································10 1.3.3 多重匹配 ········································11 1.3.4 模式语法 ········································12 1.3.5 限制搜索 ········································22 1.3.6 用组解析匹配 ·································23 1.3.7 搜索选项 
········································28 1.3.8 前向或后向 ····································36 1.3.9 自引用表达式 ·································40 1.3.10 用模式修改字符串 ·······················44 1.3.11 利用模式拆分 ·······························46 1.4 difflib—比较序列 ·····························49 1.4.1 比较文本体 ····································49 1.4.2 无用数据 ········································51 1.4.3 比较任意类型 ·································53 第 2 章 数据结构 ·································55 2.1 collections—容器数据类型 ··············56 2.1.1 Counter···········································56 2.1.2 defaultdict ······································59 2.1.3 deque ··············································59 2.1.4 namedtuple ·····································63 2.1.5 OrderedDict ····································65 2.2 array—固定类型数据序列 ················66 2.2.1 初始化 ············································67 2.2.2 处理数组 ········································67 2.2.3 数组与文件 ····································68 2.2.4 候选字节顺序 ·································68 2.3 heapq—堆排序算法··························69 2.3.1 示例数据 ········································70 2.3.2 创建堆 ············································70 2.3.3 访问堆的内容 ·································72 2.3.4 堆的数据极值 ·································73 2.4 bisect—维护有序列表 ······················74 2.4.1 有序插入 ········································74 2.4.2 处理重复 ········································75 2.5 Queue—线程安全的 FIFO 实现 ·······76 2.5.1 基本 FIFO 队列 ······························77 2.5.2 LIFO 队列 ······································77 2.5.3 优先队列 ········································78 X 2.5.4 构建一个多线程播客客户程序 ·······79 2.6 struct—二进制数据结构 ···················81 2.6.1 函数与 Struct 类 ·····························81 2.6.2 打包和解包 ····································81 2.6.3 字节序 ············································82 2.6.4 缓冲区 ············································84 2.7 weakref—对象的非永久引用 ···········85 2.7.1 引用 ···············································85 2.7.2 引用回调 ········································86 2.7.3 代理 ···············································87 2.7.4 循环引用 ········································87 2.7.5 缓存对象 ········································92 2.8 copy—复制对象 ·······························94 2.8.1 浅副本 ············································94 2.8.2 深副本 ············································95 2.8.3 定制复制行为 ·································96 2.8.4 深副本中的递归 ·····························96 2.9 pprint—美观打印数据结构 ··············98 2.9.1 打印 ···············································99 2.9.2 格式化 ············································99 2.9.3 任意类 ··········································100 2.9.4 递归 ·············································101 2.9.5 限制嵌套输出 ·······························101 2.9.6 控制输出宽度 ·······························101 第 3 章 算法 ······································103 3.1 functools—管理函数的工具 ···········103 3.1.1 修饰符 ··········································103 3.1.2 比较 ·············································111 3.2 itertools—迭代器函数 ····················114 3.2.1 合并和分解迭代器 ·······················114 3.2.2 转换输入 ······································116 3.2.3 生成新值 ······································117 3.2.4 过滤 ·············································119 3.2.5 数据分组 ······································121 3.3  operator—内置操作符的函数 接口 ···················································123 3.3.1 逻辑操作 ······································123 3.3.2 比较操作符 
··································124 3.3.3 算术操作符 ··································124 3.3.4 序列操作符 ··································126 3.3.5 原地操作符 ··································127 3.3.6 属性和元素“获取方法” ··············128 3.3.7 结合操作符和定制类 ····················129 3.3.8 类型检查 ······································130 3.4 contextlib—上下文管理器工具 ······131 3.4.1 上下文管理器 API ························131 3.4.2 从生成器到上下文管理器 ············134 3.4.3 嵌套上下文 ··································135 3.4.4 关闭打开的句柄 ···························136 第 4 章 日期和时间 ···························138 4.1 time—时钟时间 ·····························138 4.1.1 壁挂钟时间 ··································138 4.1.2 处理器时钟时间 ···························139 4.1.3 时间组成 ······································140 4.1.4 处理时区 ······································141 4.1.5 解析和格式化时间 ·······················143 4.2 datetime—日期和时间值管理 ········144 4.2.1 时间 ·············································144 4.2.2 日期 ·············································145 4.2.3 timedelta·······································147 4.2.4 日期算术运算 ·······························148 4.2.5 比较值 ··········································149 4.2.6 结合日期和时间 ···························150 4.2.7 格式化和解析 ·······························151 XI 4.2.8 时区 ·············································151 4.3 calendar—处理日期 ·······················152 4.3.1 格式化示例 ··································152 4.3.2 计算日期 ······································155 第 5 章 数学计算 ·······························157 5.1  decimal—定点数和浮点数的数学 运算 ···················································157 5.1.1 Decimal ········································157 5.1.2 算术运算 ······································158 5.1.3 特殊值 ··········································160 5.1.4 上下文 ··········································160 5.2 fractions—有理数 ··························165 5.2.1 创建 Fraction 实例 ·······················165 5.2.2 算术运算 ······································167 5.2.3 近似值 ··········································168 5.3 random—伪随机数生成器 ··············168 5.3.1 生成随机数 ··································168 5.3.2 指定种子 ······································169 5.3.3 保存状态 ······································170 5.3.4 随机整数 ······································171 5.3.5 选择随机元素 ·······························172 5.3.6 排列 ·············································172 5.3.7 采样 ·············································174 5.3.8 多个并发生成器 ···························175 5.3.9 SystemRandom ·····························176 5.3.10 非均匀分布·································177 5.4 math—数学函数 ·····························178 5.4.1 特殊常量 ······································178 5.4.2 测试异常值 ··································179 5.4.3 转换为整数 ··································180 5.4.4 其他表示 ······································181 5.4.5 正号和负号 ··································183 5.4.6 常用计算 ······································184 5.4.7 指数和对数 ··································186 5.4.8 角 ·················································190 5.4.9 三角函数 ······································191 5.4.10 双曲函数 ····································194 5.4.11 特殊函数 ····································195 第 6 章 文件系统 ·······························197 6.1 os.path—平台独立的文件名管理 ···198 6.1.1 解析路径 ······································198 6.1.2 建立路径 ······································200 6.1.3 规范化路径 ··································201 6.1.4 文件时间 ······································202 6.1.5 测试文件 ······································203 6.1.6 
遍历一个目录树 ···························204 6.2 glob—文件名模式匹配 ··················205 6.2.1 示例数据 ······································205 6.2.2 通配符 ··········································206 6.2.3 单字符通配符 ·······························207 6.2.4 字符区间 ······································207 6.3 linecache—高效读取文本文件 ·······208 6.3.1 测试数据 ······································208 6.3.2 读取特定行 ··································209 6.3.3 处理空行 ······································209 6.3.4 错误处理 ······································210 6.3.5 读取 Python 源文件 ······················210 6.4 tempfile—临时文件系统对象 ·········211 6.4.1 临时文件 ······································211 6.4.2 命名文件 ······································213 6.4.3 临时目录 ······································214 6.4.4 预测名 ··········································214 6.4.5 临时文件位置 ·······························215 6.5 shutil—高级文件操作 ····················216 XII 6.5.1 复制文件 ······································216 6.5.2 复制文件元数据 ···························218 6.5.3 处理目录树 ··································220 6.6 mmap—内存映射文件 ····················222 6.6.1 读文件 ··········································223 6.6.2 写文件 ··········································223 6.6.3 正则表达式 ··································225 6.7 codecs—字符串编码和解码 ···········226 6.7.1 Unicode 入门 ································226 6.7.2 处理文件 ······································228 6.7.3 字节序 ··········································230 6.7.4 错误处理 ······································232 6.7.5 标准输入和输出流 ·······················235 6.7.6 编码转换 ······································238 6.7.7 非 Unicode 编码 ···························239 6.7.8 增量编码 ······································240 6.7.9 Unicode 数据和网络通信 ·············242 6.7.10 定义定制编码 ·····························245 6.8  StringIO—提供类文件 API 的文本 缓冲区 ···············································251 6.9 fnmatch—UNIX 式 glob 模式匹配 ···252 6.9.1 简单匹配 ······································252 6.9.2 过滤 ·············································253 6.9.3 转换模式 ······································254 6.10 dircache—缓存目录列表 ··············254 6.10.1 列出目录内容 ·····························255 6.10.2 标注列表 ····································256 6.11 filecmp—比较文件 ·······················257 6.11.1 示例数据 ····································258 6.11.2 比较文件 ···································· 260 6.11.3 比较目录 ····································261 6.11.4 程序中使用差异 ·························262 第 7 章 数据持久存储与交换 ·············267 7.1 pickle—对象串行化 ·······················268 7.1.1 导入 ·············································268 7.1.2 编码和解码字符串数据 ················268 7.1.3 处理流 ··········································269 7.1.4 重构对象的问题 ···························271 7.1.5 不可 pickle 的对象 ·······················272 7.1.6 循环引用 ······································273 7.2 shelve—对象持久存储 ···················275 7.2.1  创建一个新 shelf ·························275 7.2.2 写回 ·············································276 7.2.3 特定 shelf 类型 ·····························277 7.3 anydbm—DBM 数据库 ··················278 7.3.1 数据库类型 ··································278 7.3.2 创建一个新数据库 ·······················279 7.3.3 打开一个现有数据库 ····················279 7.3.4 错误情况 ······································280 7.4 whichdb—识别 DBM 数据库格式 ···281 7.5 sqlite3—嵌入式关系数据库 ···········281 7.5.1 创建数据库 ··································282 7.5.2 获取数据 ······································285 7.5.3 查询元数据 ··································286 7.5.4 行对象 ··········································287 7.5.5 
查询中使用变量 ···························288 7.5.6 批量加载 ······································290 7.5.7 定义新列类型 ·······························291 7.5.8 确定列类型 ··································294 7.5.9 事务 ·············································296 7.5.10 隔离级别 ····································298 7.5.11 内存中数据库 ·····························302 7.5.12 导出数据库内容 ·························302 7.5.13 SQL 中使用 Python 函数 ············304 XIII 7.5.14 定制聚集 ····································306 7.5.15 定制排序 ····································307 7.5.16 线程和连接共享 ·························308 7.5.17 限制对数据的访问 ·····················309 7.6  xml.etree.ElementTree—XML 操纵 API ····················································311 7.6.1 解析 XML 文档 ····························312 7.6.2 遍历解析树 ··································313 7.6.3 查找文档中的节点 ·······················314 7.6.4 解析节点属性 ·······························315 7.6.5 解析时监视事件 ···························317 7.6.6 创建一个定制树构造器 ················319 7.6.7 解析串 ··········································321 7.6.8 用元素节点构造文档 ····················322 7.6.9 美观打印 XML ·····························323 7.6.10 设置元素属性 ·····························325 7.6.11 由节点列表构造树 ······················327 7.6.12 将 XML 串行化至一个流 ···········329 7.7 csv—逗号分隔值文件 ····················331 7.7.1 读文件 ··········································332 7.7.2 写文件 ··········································332 7.7.3 方言 ·············································334 7.7.4 使用字段名 ··································338 第 8 章 数据压缩与归档 ····················340 8.1 zlib—GNU zlib 压缩 ······················340 8.1.1 处理内存中数据 ···························340 8.1.2 增量压缩与解压缩 ·······················341 8.1.3 混合内容流 ··································342 8.1.4 校验和 ··········································343 8.1.5 压缩网络数据 ·······························343 8.2 gzip—读写 GNU Zip 文件 ··············347 8.2.1 写压缩文件 ··································348 8.2.2 读压缩数据 ··································349 8.2.3 处理流 ··········································350 8.3 bz2—bzip2 压缩 ·····························352 8.3.1 内存中一次性操作 ·······················352 8.3.2 增量压缩和解压缩 ·······················354 8.3.3 混合内容流 ··································354 8.3.4 写压缩文件 ··································355 8.3.5 读压缩文件 ··································357 8.3.6 压缩网络数据 ·······························358 8.4 tarfile—Tar 归档访问 ·····················362 8.4.1 测试 Tar 文件 ·······························362 8.4.2 从归档文件读取元数据 ················362 8.4.3 从归档抽取文件 ···························364 8.4.4 创建新归档 ··································365 8.4.5 使用候选归档成员名 ····················366 8.4.6 从非文件源写数据 ·······················366 8.4.7 追加到归档 ··································367 8.4.8 处理压缩归档 ·······························368 8.5 zipfile—ZIP 归档访问 ····················369 8.5.1 测试 ZIP 文件 ·······························369 8.5.2 从归档读取元数据 ·······················369 8.5.3 从归档抽取归档文件 ····················371 8.5.4 创建新归档 ··································371 8.5.5 使用候选归档成员名 ····················373 8.5.6 从非文件源写数据 ·······················373 8.5.7 利用 ZipInfo 实例写 ·····················374 8.5.8 追加到文件 ··································375 8.5.9 Python ZIP 归档 ···························376 8.5.10 限制············································377 第 9 章 加密 ······································378 9.1 hashlib—密码散列 ·························378 9.1.1 示例数据 ······································378 XIV 9.1.2 MD5 示例 ·····································379 9.1.3 SHA1 示例 
···································379 9.1.4 按名创建散列 ·······························379 9.1.5 增量更新 ······································380 9.2 hmac—密码消息签名与验证 ··········381 9.2.1 消息签名 ······································381 9.2.2 SHA 与 MD5 ································382 9.2.3 二进制摘要 ··································383 9.2.4 消息签名的应用 ···························383 第 10 章 进程与线程··························387 10.1 subprocess—创建附加进程 ··········387 10.1.1 运行外部命令 ·····························388 10.1.2 直接处理管道 ·····························391 10.1.3 连接管道段·································393 10.1.4 与其他命令交互 ·························394 10.1.5 进程间传递信号 ·························396 10.2 signal—异步系统事件 ··················400 10.2.1 接收信号 ····································400 10.2.2 获取注册的处理程序 ··················401 10.2.3 发送信号 ····································402 10.2.4 闹铃············································403 10.2.5 忽略信号 ····································403 10.2.6 信号和线程·································404 10.3 threading—管理并发操作 ·············406 10.3.1 Thread 对象 ································406 10.3.2 确定当前线程 ·····························407 10.3.3 守护与非守护线程 ·····················409 10.3.4 列举所有线程 ·····························411 10.3.5 派生线程 ····································412 10.3.6 定时器线程·································414 10.3.7 线程间传送信号 ·························415 10.3.8 控制资源访问 ·····························416 10.3.9 同步线程 ····································421 10.3.10 限制资源的并发访问 ················422 10.3.11 线程特定数据 ···························423 10.4  multiprocessing—像线程一样 管理进程 ··········································425 10.4.1 multiprocessing 基础 ··················426 10.4.2 可导入的目标函数 ·····················427 10.4.3 确定当前进程 ·····························428 10.4.4 守护进程 ····································428 10.4.5 等待进程 ····································430 10.4.6 终止进程 ····································431 10.4.7 进程退出状态 ·····························432 10.4.8 日志············································434 10.4.9 派生进程 ····································435 10.4.10 向进程传递消息 ·······················435 10.4.11 进程间信号传输 ·······················438 10.4.12 控制资源访问 ···························439 10.4.13 同步操作 ··································440 10.4.14 控制资源的并发访问 ················441 10.4.15 管理共享状态 ···························443 10.4.16 共享命名空间 ···························444 10.4.17 进程池 ······································445 10.4.18 实现 MapReduce·······················447 第 11 章 网络通信 ·····························452 11.1 socket—网络通信 ·························452 11.1.1 寻址、协议簇和套接字类型 ·······452 11.1.2 TCP/IP 客户和服务器 ·················460 11.1.3 用户数据报客户和服务器 ···········467 11.1.4 UNIX 域套接字 ··························469 11.1.5 组播 ············································473 11.1.6 发送二进制数据 ·························476 11.1.7 非阻塞通信和超时 ······················478 XV 11.2 select—高效等待 I/O ···················479 11.2.1 使用 select() ·······························479 11.2.2 有超时的非阻塞 I/O ···················484 11.2.3 使用 poll() ··································486 11.2.4 平台特定选项 ·····························490 11.3 SocketServer—创建网络服务器 ···491 11.3.1 服务器类型 ·································491 11.3.2 服务器对象 ·································491 11.3.3 实现服务器 ·································491 11.3.4 请求处理器 ·································492 11.3.5 回应示例 ····································492 11.3.6 线程和进程 ·································497 11.4 asyncore—异步 I/O ······················499 11.4.1 服务器 
········································500 11.4.2 客户 ············································501 11.4.3 事件循环 ····································503 11.4.4 处理其他事件循环 ······················505 11.4.5 处理文件 ····································507 11.5 asynchat—异步协议处理器 ··········508 11.5.1 消息终止符 ·································508 11.5.2 服务器和处理器 ·························508 11.5.3 客户 ············································511 11.5.4 集成 ············································512 第 12 章 Internet ································514 12.1 urlparse—分解 URL ·····················514 12.1.1 解析············································515 12.1.2 反解析 ········································517 12.1.3 连接············································518 12.2  BaseHTTPServer—实现 Web 服务器的基类 ··································519 12.2.1 HTTP GET ·································519 12.2.2 HTTP POST ································521 12.2.3 线程与进程·································522 12.2.4 处理错误 ····································523 12.2.5 设置首部 ····································524 12.3 urllib—网络资源访问 ···················525 12.3.1 利用缓存实现简单获取 ··············526 12.3.2 参数编码 ····································527 12.3.3 路径与 URL ································529 12.4 urllib2—网络资源访问 ·················530 12.4.1 HTTP GET ·································530 12.4.2 参数编码 ····································532 12.4.3 HTTP POST ································533 12.4.4 增加发出首部 ·····························534 12.4.5 从请求提交表单数据 ··················535 12.4.6 上传文件 ····································536 12.4.7 创建定制协议处理器 ··················539 12.5  Base64—用 ASCII 编码二进制 数据 ·················································541 12.5.1 Base64 编码 ·······························541 12.5.2 Base64 解码 ·······························542 12.5.3 URL 安全的变种 ························543 12.5.4 其他编码 ····································543 12.6  robotparser—网络蜘蛛访问 控制 ·················································544 12.6.1 robots.txt ····································545 12.6.2 测试访问权限 ·····························545 12.6.3 长久蜘蛛 ····································546 12.7 Cookie—HTTP Cookie ·················547 12.7.1 创建和设置 Cookie ·····················547 12.7.2 Morsel ········································548 12.7.3 编码值 ········································550 12.7.4 接收和解析 Cookie 首部 ············550 12.7.5 候选输出格式 ·····························551 12.7.6 废弃的类 ····································552 XVI 12.8 uuid—全局惟一标识符 ·················552 12.8.1  UUID 1—IEEE 802 MAC 地址 ············································552 12.8.2 UUID 3 和 5—基于名字的值 ···554 12.8.3 UUID 4—随机值 ····················556 12.8.4 处理 UUID 对象 ·························556 12.9 json—JavaScript 对象记法 ···········557 12.9.1 编码和解码简单数据类型 ··········557 12.9.2 优质输出和紧凑输出 ··················558 12.9.3 编码字典 ····································560 12.9.4 处理定制类型 ·····························561 12.9.5 编码器和解码器类 ·····················563 12.9.6 处理流和文件 ·····························565 12.9.7 混合数据流·································566 12.10  xmlrpclib—XML-RPC 的客户 端库 ···············································567 12.10.1 连接服务器 ·······························568 12.10.2 数据类型 ··································570 12.10.3 传递对象 ··································573 12.10.4 二进制数据 ·······························573 12.10.5 异常处理 ··································575 12.10.6 将调用结合在一个消息中·········575 12.11  SimpleXMLRPCServer—一个 XML-RPC 服务器 ··························577 12.11.1 
一个简单的服务器 ····················577 12.11.2 备用 API 名 ······························578 12.11.3 加点的 API 名 ··························579 12.11.4 任意 API 名 ······························580 12.11.5 公布对象的方法 ·······················581 12.11.6 分派调用 ··································583 12.11.7 自省 API ···································584 第 13 章 Email ···································587 13.1 smtplib—简单邮件传输协议客户 ···587 13.1.1 发送 Email 消息 ·························587 13.1.2 认证和加密·································589 13.1.3 验证 Email 地址 ·························592 13.2 smtpd—示例邮件服务器 ··············593 13.2.1 邮件服务器基类 ·························593 13.2.2 调试服务器·································595 13.2.3 代理服务器·································596 13.3 imaplib—IMAP4 客户库 ··············596 13.3.1 变种············································597 13.3.2 连接到服务器 ·····························597 13.3.3 示例配置 ····································598 13.3.4 列出邮箱 ····································599 13.3.5 邮箱状态 ····································601 13.3.6 选择邮箱 ····································602 13.3.7 搜索消息 ····································603 13.3.8 搜索规则 ····································604 13.3.9 获取消息 ····································605 13.3.10 完整消息 ··································608 13.3.11 上传消息 ··································609 13.3.12 移动和复制消息 ·······················611 13.3.13 删除消息 ··································612 13.4 mailbox—管理邮件归档 ···············614 13.4.1 mbox ··········································614 13.4.2 Maildir ·······································616 13.4.3 其他格式 ····································622 第 14 章 应用构建模块 ······················623 14.1 getopt—命令行选项解析 ··············624 14.1.1 函数参数 ····································624 XVII 14.1.2 短格式选项·································624 14.1.3 长格式选项·································625 14.1.4 一个完整的例子 ·························625 14.1.5 缩写长格式选项 ·························627 14.1.6 GNU 选项解析 ···························627 14.1.7 结束参数处理 ·····························629 14.2 optparse—命令行选项解析器 ·······629 14.2.1 创建 OptionParser·······················629 14.2.2 短格式和长格式选项 ··················630 14.2.3 用 getopt 比较·····························631 14.2.4 选项值 ········································632 14.2.5 选项动作 ····································635 14.2.6 帮助消息 ····································639 14.3  argparse—命令行选项和参数 解析 ·················································644 14.3.1 与 optparse 比较 ·························644 14.3.2 建立解析器·································644 14.3.3 定义参数 ····································644 14.3.4 解析命令行·································645 14.3.5 简单示例 ····································645 14.3.6 自动生成的选项 ·························652 14.3.7 解析器组织·································653 14.3.8 高级参数处理 ·····························659 14.4 readline—GNU Readline 库 ··········666 14.4.1 配置············································667 14.4.2 完成文本 ····································668 14.4.3 访问完成缓冲区 ·························670 14.4.4 输入历史 ····································674 14.4.5 hook ···········································676 14.5 getpass—安全密码提示 ················677 14.5.1 示例············································677 14.5.2 无终端使用 getpass ····················678 14.6 cmd—面向行的命令处理器··········679 14.6.1 处理命令 ····································680 14.6.2 命令参数 ····································681 14.6.3 现场帮助 ····································682 14.6.4 自动完成 
····································683 14.6.5 覆盖基类方法 ·····························684 14.6.6 通过属性配置 Cmd ····················686 14.6.7 运行 shell 命令 ···························687 14.6.8 候选输入 ····································688 14.6.9 sys.argv 的命令 ··························689 14.7 shlex—解析 shell 语法 ·················690 14.7.1 加引号的字符串 ·························691 14.7.2 嵌入注释 ····································692 14.7.3 分解············································693 14.7.4 包含其他 Token 源 ·····················693 14.7.5 控制解析器·································694 14.7.6 错误处理 ····································696 14.7.7 POSIX 与非 POSIX 解析 ············697 14.8 ConfigParser—处理配置文件 ·······698 14.8.1 配置文件格式 ·····························699 14.8.2 读取配置文件 ·····························699 14.8.3 访问配置设置 ·····························701 14.8.4 修改设置 ····································705 14.8.5 保存配置文件 ·····························706 14.8.6 选项搜索路径 ·····························707 14.8.7 用接合合并值 ·····························709 14.9  日志—报告状态、错误和信息 消息 ·················································712 14.9.1 应用与库中的日志记录 ··············712 14.9.2 记入文件 ····································712 14.9.3 旋转日志文件 ·····························713 14.9.4 详细级别 ····································714 14.9.5 命名日志记录器实例 ··················715 14.10 fileinput—命令行过滤器框架 ·····716 XVIII 14.10.1 M3U 文件转换为 RSS ··············716 14.10.2 进度元数据 ·······························718 14.10.3 原地过滤 ··································719 14.11 atexit—程序关闭回调 ·················721 14.11.1 示例 ··········································721 14.11.2  什么情况下不调用 atexit 函数 ··········································722 14.11.3 处理异常 ··································724 14.12 sched—定时事件调度器 ·············725 14.12.1 有延迟地运行事件 ····················725 14.12.2 重叠事件 ··································726 14.12.3 事件优先级 ·······························727 14.12.4 取消事件 ··································727 第 15 章 国际化和本地化 ··················729 15.1 gettext—消息编目 ························729 15.1.1 转换工作流概述 ·························729 15.1.2 由源代码创建消息编目 ··············730 15.1.3 运行时查找消息编目 ··················732 15.1.4 复数值 ········································733 15.1.5 应用与模块本地化 ·····················735 15.1.6 切换转换 ····································736 15.2 locale—文化本地化 API ···············736 15.2.1 探查当前本地化环境 ··················737 15.2.2 货币············································742 15.2.3 格式化数字·································742 15.2.4 解析数字 ····································743 15.2.5 日期和时间·································744 第 16 章 开发工具 ·····························745 16.1 pydoc—模块的联机帮助 ··············746 16.1.1 纯文本帮助·································746 16.1.2 HTML 帮助 ································746 16.1.3 交互式帮助·································746 16.2 doctest—通过文档完成测试 ·········747 16.2.1 开始············································747 16.2.2 处理不可预测的输出 ··················748 16.2.3 Traceback ···································752 16.2.4 避开空白符·································753 16.2.5 测试位置 ····································758 16.2.6 外部文档 ····································761 16.2.7 运行测试 ····································763 16.2.8 测试上下文·································766 16.3 unittest—自动测试框架 ················769 16.3.1 基本测试结构 ·····························769 16.3.2 运行测试 ····································770 16.3.3 测试结果 ····································770 16.3.4 断言真值 ····································772 16.3.5 
测试相等性·································773 16.3.6 近似相等 ····································774 16.3.7 测试异常 ····································775 16.3.8 测试固件 ····································775 16.3.9 测试套件 ····································776 16.4 traceback—异常和栈轨迹 ············777 16.4.1 支持函数 ····································777 16.4.2 处理异常 ····································777 16.4.3 处理栈 ········································780 16.5 cgitb—详细的 traceback 报告 ·······783 16.5.1 标准 traceback 转储 ····················783 16.5.2 启用详细 traceback ·····················783 16.5.3 traceback 中的局部变量 ·············785 16.5.4 异常属性 ····································787 16.5.5 HTML 输出 ································788 16.5.6 记录 traceback ····························789 16.6 pdb—交互式调试工具 ··················791 XIX 16.6.1 启动调试工具 ·····························791 16.6.2 控制调试工具 ·····························794 16.6.3 断点············································803 16.6.4 改变执行流·································813 16.6.5 用别名定制调试工具 ··················819 16.6.6 保存配置设置 ·····························821 16.7 trace—执行程序流 ·······················822 16.7.1 示例程序 ····································822 16.7.2 跟踪执行 ····································822 16.7.3 代码覆盖 ····································823 16.7.4 调用关系 ····································825 16.7.5 编程接口 ····································826 16.7.6 保存结果数据 ·····························828 16.7.7 选项············································829 16.8 profile 和 pstats—性能分析 ··········830 16.8.1 运行性能分析工具 ·····················830 16.8.2 在上下文中运行 ·························832 16.8.3 pstats :保存和处理统计信息 ·····833 16.8.4 限制报告内容 ·····························835 16.8.5 调用图 ········································836 16.9  timeit—测量小段 Python 代码的 执行时间 ··········································837 16.9.1 模块内容 ····································837 16.9.2 基本示例 ····································837 16.9.3 值存储在字典中 ·························838 16.9.4 从命令行执行 ·····························840 16.10 compileall—字节编译源文件 ·····841 16.10.1 编译一个目录 ···························842 16.10.2 编译 sys.path ····························842 16.10.3 从命令行执行 ···························843 16.11 pyclbr—类浏览器 ·······················843 16.11.1 扫描类 ······································845 16.11.2 扫描函数 ··································846 第 17 章 运行时特性··························847 17.1 site—全站点配置 ·························847 17.1.1 导入路径 ····································847 17.1.2 用户目录 ····································849 17.1.3 路径配置文件 ·····························850 17.1.4 定制站点配置 ·····························852 17.1.5 定制用户配置 ·····························853 17.1.6 禁用 site 模块 ·····························854 17.2 sys—系统特定的配置 ··················854 17.2.1 解释器设置·································855 17.2.2 运行时环境·································860 17.2.3 内存管理和限制 ·························862 17.2.4 异常处理 ····································867 17.2.5 底层线程支持 ·····························869 17.2.6 模块和导入·································875 17.2.7 跟踪程序运行情况 ·····················892 17.3  os—可移植访问操作系统特定 特性 ·················································898 17.3.1 进程所有者·································898 17.3.2 进程环境 ····································900 17.3.3 进程工作目录 ·····························901 17.3.4 管道············································901 17.3.5 文件描述符·································905 17.3.6 文件系统权限 ·····························905 17.3.7 
目录············································906 17.3.8 符号链接 ····································907 17.3.9 遍历目录树·································907 17.3.10 运行外部命令 ···························909 17.3.11 用 os.fork() 创建进程 ················910 17.3.12 等待子进程 ·······························911 17.3.13 Spawn ·······································913 17.3.14 文件系统权限 ···························913 XX 17.4 platform—系统版本信息 ··············914 17.4.1 解释器 ········································915 17.4.2 平台············································916 17.4.3 操作系统和硬件信息 ··················916 17.4.4 可执行程序体系结构 ··················918 17.5 resource—系统资源管理 ··············918 17.5.1 当前使用情况 ·····························919 17.5.2 资源限制 ····································919 17.6 gc—垃圾回收器 ···························922 17.6.1 跟踪引用 ····································922 17.6.2 强制垃圾回收 ·····························925 17.6.3 查找无法收集的对象引用 ··········928 17.6.4 回收阈限和代 ·····························931 17.6.5 调试············································933 17.7 sysconfig—解释器编译时配置 ·····940 17.7.1 配置变量 ····································940 17.7.2 安装路径 ····································942 17.7.3 Python 版本和平台 ·····················945 第 18 章 语言工具 ·····························947 18.1 warnings—非致命警告 ·················947 18.1.1 分类和过滤·································948 18.1.2 生成警告 ····································948 18.1.3 用模式过滤·································949 18.1.4 重复的警告·································951 18.1.5 候选消息传送函数 ·····················951 18.1.6 格式化 ········································952 18.1.7 警告中的栈层次 ·························952 18.2 abc—抽象基类 ·····························953 18.2.1 为什么使用抽象基类 ··················953 18.2.2 抽象基类如何工作 ·····················954 18.2.3 注册一个具体类 ·························954 18.2.4 通过派生实现 ·····························955 18.2.5 abc 中的具体方法 ·······················956 18.2.6 抽象属性 ····································957 18.3  dis—Python 字节码反汇编 工具 ·················································960 18.3.1 基本反汇编·································961 18.3.2 反汇编函数·································961 18.3.3 类 ···············································963 18.3.4 使用反汇编进行调试 ··················963 18.3.5 循环的性能分析 ························· 965 18.3.6 编译器优化································· 970 18.4 inspect—检查现场对象 ················972 18.4.1 示例模块 ····································972 18.4.2 模块信息 ····································973 18.4.3 检查模块 ····································974 18.4.4 检查类 ········································975 18.4.5 文档串 ·······································976 18.4.6 获取源代码·································977 18.4.7 方法和函数参数 ·························979 18.4.8 类层次结构·································980 18.4.9 方法解析顺序 ·····························981 18.4.10 栈与帧 ······································982 18.5 exceptions—内置异常类 ··············984 18.5.1 基类············································985 18.5.2 产生的异常·································985 18.5.3 警告类型 ····································998 第 19 章 模块与包 ·····························999 19.1 imp—Python 的导入机制 ·············999 19.1.1 示例包 ········································999 19.1.2 模块类型 ····································999 19.1.3 查找模块 ··································1000 19.1.4 加载模块 ··································1001 19.2  zipimport—从 ZIP 归档加载 XXI Python 代码 ····································1003 19.2.1 示例··········································1003 19.2.2 查找模块 
··································1004 19.2.3 访问代码 ··································1004 19.2.4 源代码 ······································1005 19.2.5 包 ·············································1006 19.2.6 数据··········································1006 19.3 pkgutil—包工具 ·························1008 19.3.1 包导入路径·······························1008 19.3.2 包的开发版本 ···························1010 19.3.3 用 PKG 文件管理路径 ··············1011 19.3.4 嵌套包 ······································1013 19.3.5 包数据 ······································1014

第 1 章 文  本

对 Python 程序员来说，最显而易见的文本处理工具就是 string 类，不过除此以外，标准库中还提供了大量其他工具，可以帮你轻松地完成高级文本处理。

用 Python 2.0 之前版本编写的老代码使用的是 string 模块的函数，而不是 string 对象的方法。这个模块中的每一个函数都有一个等价的方法与之对应，新代码中已经不再使用那些函数。

使用 Python 2.4 或以后版本的程序可能会使用 string.Template 作为一个简便方法，除了具备 string 或 unicode 类的特性，还可以对字符串实现参数化。与很多 Web 框架定义的模板或 Python Package Index 提供的扩展模块相比，尽管 string.Template 没有那么丰富的特性，但作为用户可修改的模板，即需要在静态文本中插入动态值，它确实很好地做到了二者兼顾。

textwrap 模块包括一些工具，可以对从段落中抽取的文本进行格式化，如限制输出的宽度、增加缩进，以及插入换行符从而能一致地自动换行。

除了 string 对象支持的内置相等性和排序比较之外，标准库还包括两个与比较文本值有关的模块。re 提供了一个完整的正则表达式库，出于速度原因这个库使用 C 实现。正则表达式非常适合在较大的数据集中查找子串，能够根据比固定字符串更为复杂的模式比较字符串，还可以完成一定程度的解析。

另一方面，difflib 则会根据添加、删除或修改的部分来计算不同文本序列之间的具体差别。difflib 中比较函数的输出可以用来为用户提供更详细的反馈，指出两个输入中出现变化的地方、一个文档随时间有哪些改变，等等。

1.1 string—文本常量和模板

作用：包含处理文本的常量和类。
Python 版本：1.4 及以后版本

string 模块可以追溯到最早的 Python 版本。到了 2.0 版本，原先仅在这个模块中实现的很多函数被移植为 str 和 unicode 对象的方法。遗留代码中仍然在使用这些函数，不过如今这些函数已经废弃，将在 Python 3.0 中完全去除。string 模块保留了很多有用的常量和类，用来处理 string 和 unicode 对象，这里就重点讨论这些常量和类。

1.1.1 函数

还有两个函数未从 string 模块移出：capwords() 和 maketrans()。capwords() 的作用是将一个字符串中所有单词的首字母大写。

import string

s = 'The quick brown fox jumped over the lazy dog.'

print s
print string.capwords(s)

其结果等同于先调用 split()，再将结果列表中各个单词的首字母大写，然后调用 join() 合并结果。

$ python string_capwords.py

The quick brown fox jumped over the lazy dog.
The Quick Brown Fox Jumped Over The Lazy Dog.
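下面给出一段简单的对照代码（并非原书示例，仅用来说明上述等价关系）：

import string

s = 'The quick brown fox jumped over the lazy dog.'

# 手工模拟 string.capwords() 的处理过程：先 split()，
# 再对每个单词调用 capitalize()，最后用空格 join() 起来
manual = ' '.join(word.capitalize() for word in s.split())

print manual
print manual == string.capwords(s)   # 打印 True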
maketrans() 函数将创建转换表，可以结合 translate() 方法将一组字符修改为另一组字符，这种做法比反复调用 replace() 更为高效。

import string

leet = string.maketrans('abegiloprstz', '463611092572')

s = 'The quick brown fox jumped over the lazy dog.'

print s
print s.translate(leet)

在这个例子中，一些字母被替换为相应的"火星文"数字。

$ python string_maketrans.py

The quick brown fox jumped over the lazy dog.
Th3 qu1ck 620wn f0x jum93d 0v32 7h3 142y d06.

译者注：Leet（L337、3L337、31337、leetspeak、eleet、Leetors、L3370rz 或 1337），火星文，又称黑客语，是指一种发源于欧美地区的 BBS、线上游戏和黑客社群所使用的文字书写方式。通常是把拉丁字母转变成数字或是特殊符号，例如 E 写成 3、A 写成 @ 等。或将单字写成同音的字母或数字，如 to 写成 2、for 写成 4 等。
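作为补充，下面这段示意代码（并非原书示例）说明了"转换表 + translate()"与反复调用 replace() 的关系：两种写法得到相同的结果，但前者只需对字符串扫描一次。

import string

leet = string.maketrans('abegiloprstz', '463611092572')
s = 'The quick brown fox jumped over the lazy dog.'

# 用一连串 replace() 调用模拟同样的字符替换
result = s
for src, dst in zip('abegiloprstz', '463611092572'):
    result = result.replace(src, dst)

print result == s.translate(leet)   # 打印 True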
1.1.2 模板

字符串模板已经作为 PEP 292 的一部分增加到 Python 2.4 中，用意是作为内置拼接（interpolation）语法的一种替代方法。使用 string.Template 拼接时，可以在变量名前面加上前缀 $（如 $var）来标识变量；如果需要与两侧的文本相区分，还可以用大括号将变量括起（如 ${var}）。

译者注：interpolation 也称为"连接"、"插补"或"替换"，是指在文本部分插入变量或表达式的值来拼接字符串。

下面的例子对一个简单的模板和一个使用 % 操作符的类似字符串拼接进行了比较。

import string

values = { 'var':'foo' }

t = string.Template("""
Variable        : $var
Escape          : $$
Variable in text: ${var}iable
""")

print 'TEMPLATE:', t.substitute(values)

s = """
Variable        : %(var)s
Escape          : %%
Variable in text: %(var)siable
"""

print 'INTERPOLATION:', s % values

在这两种情况下，触发器字符（$ 或 %）都要写两次来完成转义。

$ python string_template.py

TEMPLATE:
Variable        : foo
Escape          : $
Variable in text: fooiable

INTERPOLATION:
Variable        : foo
Escape          : %
Variable in text: fooiable
模板与标准字符串拼接有一个重要区别，即模板不考虑参数类型。值会转换为字符串，再将字符串插入到结果中。这里没有提供格式化选项。例如，没有办法控制使用几位有效数字来表示一个浮点数值。

不过，这也有一个好处：通过使用 safe_substitute() 方法，可以避免未能提供模板所需全部参数值时可能产生的异常。

import string

values = { 'var':'foo' }

t = string.Template("$var is here but $missing is not provided")

try:
    print 'substitute()     :', t.substitute(values)
except KeyError, err:
    print 'ERROR:', str(err)

print 'safe_substitute():', t.safe_substitute(values)

由于 values 字典中没有对应 missing 的值，因此 substitute() 会产生一个 KeyError。不过，safe_substitute() 不会抛出这个错误，它将捕获这个异常，并在文本中保留变量表达式。

$ python string_template_missing.py

substitute()     : ERROR: 'missing'
safe_substitute(): foo is here but $missing is not provided
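前面提到模板不考虑参数类型，也不提供格式化选项，例如无法控制浮点数保留几位小数；下面这段示意代码（并非原书示例）对比了 % 拼接和 string.Template 在这一点上的差别：

import math
import string

# % 拼接可以指定格式，例如只保留两位小数
print 'interpolation: %.2f' % math.pi

# string.Template 只会把值按默认的字符串形式插入，无法指定位数
t = string.Template('template     : $value')
print t.substitute(value=math.pi)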
$ python string_template_missing.py

substitute()     : ERROR: 'missing'
safe_substitute(): foo is here but $missing is not provided

1.1.3 Advanced Templates

The default syntax for string.Template can be changed by adjusting the regular expression patterns it uses to find the variable names in the template body. A simple way to do that is to change the delimiter and idpattern class attributes.

import string

template_text = '''
  Delimiter : %%
  Replaced  : %with_underscore
  Ignored   : %notunderscored
'''

d = { 'with_underscore':'replaced',
      'notunderscored':'not replaced',
      }

class MyTemplate(string.Template):
    delimiter = '%'
    idpattern = '[a-z]+_[a-z]+'

t = MyTemplate(template_text)
print 'Modified ID pattern:'
print t.safe_substitute(d)

In this example, the substitution rules are changed so that the delimiter is % instead of $ and variable names must include an underscore. The pattern %notunderscored is not replaced by anything because it does not include an underscore character.

$ python string_template_advanced.py

Modified ID pattern:

  Delimiter : %
  Replaced  : replaced
  Ignored   : %notunderscored
For more complex changes, override the pattern attribute and define an entirely new regular expression. The pattern provided must contain four named groups for capturing the escaped delimiter, the named variable, a braced version of the variable name, and any invalid delimiter patterns.

import string

t = string.Template('$var')
print t.pattern.pattern

The value of t.pattern is a compiled regular expression, but the original string is available via its pattern attribute.

    \$(?:
      (?P<escaped>\$) |               # two delimiters
      (?P<named>[_a-z][_a-z0-9]*)   | # identifier
      {(?P<braced>[_a-z][_a-z0-9]*)}| # braced identifier
      (?P<invalid>)                   # ill-formed delimiter exprs
    )

This example defines a new pattern to create a new type of template using {{var}} as the variable syntax.

import re
import string

class MyTemplate(string.Template):
    delimiter = '{{'
    pattern = r'''
    \{\{(?:
    (?P<escaped>\{\{)|
    (?P<named>[_a-z][_a-z0-9]*)\}\}|
    (?P<braced>[_a-z][_a-z0-9]*)\}\}|
    (?P<invalid>)
    )
    '''

t = MyTemplate('''
{{{{
{{var}}
''')
print 'MATCHES:', t.pattern.findall(t.template)
print 'SUBSTITUTED:', t.safe_substitute(var='replacement')

Both the named and braced patterns must be provided separately, even though they are the same. Running the sample program generates:

$ python string_template_newsyntax.py

MATCHES: [('{{', '', '', ''), ('', 'var', '', '')]
SUBSTITUTED:
{{
replacement

See Also:
string (http://docs.python.org/lib/module-string.html) Standard library documentation for this module.
String Methods (http://docs.python.org/lib/string-methods.html#string-methods) Methods of str objects that replace the deprecated functions in string.
PEP 292 (www.python.org/dev/peps/pep-0292) A proposal for a simpler string substitution syntax.
l33t (http://en.wikipedia.org/wiki/Leet) "Leetspeak" alternative alphabet.

1.2 textwrap—Formatting Text Paragraphs

Purpose: Formatting text by adjusting where line breaks occur in a paragraph.
Python Version: 2.5 and later

The textwrap module can be used to format text for output when pretty-printing is desired. It offers programmatic functionality similar to the paragraph wrapping or filling features found in many text editors and word processors.

1.2.1 Example Data

The examples in this section use the module textwrap_example.py, which contains a string sample_text.

sample_text = '''
    The textwrap module can be used to format text for output in
    situations where pretty-printing is desired.  It offers
    programmatic functionality similar to the paragraph wrapping
    or filling features found in many text editors.
    '''
1.2.2 Filling Paragraphs

The fill() function takes text as input and produces formatted text as output.

import textwrap
from textwrap_example import sample_text

print 'No dedent:\n'
print textwrap.fill(sample_text, width=50)

The results are something less than desirable. The text is now left justified, but the first line retains its indent and the spaces from the front of each subsequent line are embedded in the paragraph.

$ python textwrap_fill.py

No dedent:

     The textwrap module can be used to format
text for output in situations where pretty-
printing is desired.  It offers programmatic
functionality similar to the paragraph wrapping
or filling features found in many text editors.

1.2.3 Removing Existing Indentation

The previous example has embedded tabs and extra spaces mixed into the output, so it is not formatted very cleanly.
Removing the common whitespace prefix from all lines in the sample text produces better results and allows the use of docstrings or embedded multiline strings straight from Python code while removing the code formatting itself. The sample string has an artificial indent level introduced for illustrating this feature.

import textwrap
from textwrap_example import sample_text

dedented_text = textwrap.dedent(sample_text)
print 'Dedented:'
print dedented_text
The results are starting to look better:

$ python textwrap_dedent.py

Dedented:

The textwrap module can be used to format text for output in
situations where pretty-printing is desired.  It offers
programmatic functionality similar to the paragraph wrapping
or filling features found in many text editors.

Since "dedent" is the opposite of "indent," the result is a block of text with the common initial whitespace from each line removed. If one line is already indented more than another, some of the whitespace will not be removed. Input like

 Line one.
   Line two.
 Line three.

becomes

Line one.
  Line two.
Line three.
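That behavior can be checked directly. The snippet below is not from the book; it is a minimal sketch, in the same Python 2 style as the other examples, showing that dedent() removes only the whitespace shared by every line.

import textwrap

uneven = ' Line one.\n   Line two.\n Line three.\n'

# Only the single leading space common to all three lines is removed;
# "Line two." keeps the extra two spaces it had beyond the shared prefix.
print textwrap.dedent(uneven)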
1.2.4 Combining Dedent and Fill

Next, the dedented text can be passed through fill() with a few different width values.

import textwrap
from textwrap_example import sample_text

dedented_text = textwrap.dedent(sample_text).strip()

for width in [ 45, 70 ]:
    print '%d Columns:\n' % width
    print textwrap.fill(dedented_text, width=width)
    print
This produces outputs in the specified widths.

$ python textwrap_fill_width.py

45 Columns:

The textwrap module can be used to format
text for output in situations where pretty-
printing is desired.  It offers programmatic
functionality similar to the paragraph
wrapping or filling features found in many
text editors.

70 Columns:

The textwrap module can be used to format text for output in
situations where pretty-printing is desired.  It offers programmatic
functionality similar to the paragraph wrapping or filling features
found in many text editors.

1.2.5 Hanging Indents

Just as the width of the output can be set, the indent of the first line can be controlled independently of subsequent lines.

import textwrap
from textwrap_example import sample_text

dedented_text = textwrap.dedent(sample_text).strip()

print textwrap.fill(dedented_text,
                    initial_indent='',
                    subsequent_indent=' ' * 4,
                    width=50,
                    )

This makes it possible to produce a hanging indent, where the first line is indented less than the other lines.

$ python textwrap_hanging_indent.py

The textwrap module can be used to format text for
    output in situations where pretty-printing is
    desired.  It offers programmatic functionality
    similar to the paragraph wrapping or filling
    features found in many text editors.

The indent values can include nonwhitespace characters, too. The hanging indent can be prefixed with * to produce bullet points, etc.
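For example, a hypothetical sketch (not from the book, using the same Python 2 style and the same textwrap_example helper module as the examples above) that turns the paragraph into a bullet item might look like this:

import textwrap
from textwrap_example import sample_text

dedented_text = textwrap.dedent(sample_text).strip()

# A nonwhitespace initial_indent produces the bullet; the subsequent
# lines are indented with spaces so they line up under the text.
print textwrap.fill(dedented_text,
                    initial_indent='* ',
                    subsequent_indent='  ',
                    width=50,
                    )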
See Also:
textwrap (http://docs.python.org/lib/module-textwrap.html) Standard library documentation for this module.

1.3 re—Regular Expressions

Purpose: Searching within and changing text using formal patterns.
Python Version: 1.5 and later

Regular expressions are text-matching patterns described with a formal syntax. The patterns are interpreted as a set of instructions, which are then executed with a string as input to produce a matching subset or modified version of the original. The term "regular expressions" is frequently shortened to "regex" or "regexp" in conversation. Expressions can include literal text matching, repetition, pattern composition, branching, and other sophisticated rules. Many parsing problems are easier to solve using a regular expression than by creating a special-purpose lexer and parser.

Regular expressions are typically used in applications that involve a lot of text processing. For example, they are commonly used as search patterns in text-editing programs used by developers, including vi, emacs, and modern IDEs. They are also an integral part of UNIX command line utilities, such as sed, grep, and awk. Many programming languages include support for regular expressions in the language syntax (Perl, Ruby, Awk, and Tcl). Other languages, such as C, C++, and Python, support regular expressions through extension libraries.

There are multiple open source implementations of regular expressions, each sharing a common core syntax but having different extensions or modifications to their advanced features.
The syntax used in Python's re module is based on the syntax used for regular expressions in Perl, with a few Python-specific enhancements.

Note: Although the formal definition of "regular expression" is limited to expressions that describe regular languages, some of the extensions supported by re go beyond describing regular languages. The term "regular expression" is used here in a more general sense to mean any expression that can be evaluated by Python's re module.

1.3.1 Finding Patterns in Text

The most common use for re is to search for patterns in text. The search() function takes the pattern and text to scan, and returns a Match object when the pattern is found. If the pattern is not found, search() returns None.

Each Match object holds information about the nature of the match, including the original input string, the regular expression used, and the location within the original string where the pattern occurs.

import re

pattern = 'this'
text = 'Does this text match the pattern?'

match = re.search(pattern, text)

s = match.start()
e = match.end()

print 'Found "%s"\nin "%s"\nfrom %d to %d ("%s")' % \
    (match.re.pattern, match.string, s, e, text[s:e])

The start() and end() methods give the indexes into the string showing where the text matched by the pattern occurs.

$ python re_simple_match.py

Found "this"
in "Does this text match the pattern?"
from 5 to 9 ("this")
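The example above assumes the pattern is found. A minimal sketch (not from the book, in the same Python 2 style) of handling the None return value for an unmatched pattern might look like this:

import re

match = re.search('missing', 'Does this text match the pattern?')

# search() returns None when the pattern does not occur in the text,
# so test the result before treating it as a Match object.
if match is None:
    print 'No match found'
else:
    print 'Found', match.group(0)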
1.3.2 Compiling Expressions

re includes module-level functions for working with regular expressions as text strings, but it is more efficient to compile the expressions a program uses frequently. The compile() function converts an expression string into a RegexObject.
import re

# Precompile the patterns
regexes = [ re.compile(p)
            for p in [ 'this', 'that' ]
            ]
text = 'Does this text match the pattern?'

print 'Text: %r\n' % text

for regex in regexes:
    print 'Seeking "%s" ->' % regex.pattern,
    if regex.search(text):
        print 'match!'
    else:
        print 'no match'

The module-level functions maintain a cache of compiled expressions. However, the size of the cache is limited, and using compiled expressions directly avoids the cache lookup overhead. Another advantage of using compiled expressions is that by precompiling all expressions when the module is loaded, the compilation work is shifted to application start time, instead of to a point when the program may be responding to a user action.

$ python re_simple_compiled.py

Text: 'Does this text match the pattern?'

Seeking "this" -> match!
Seeking "that" -> no match
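The overhead being avoided is small but measurable. The following is a hypothetical sketch (not from the book) that uses timeit to compare the module-level call with the precompiled object; the absolute numbers will vary from machine to machine.

import timeit

setup = """
import re
regex = re.compile('that')
text = 'Does this text match the pattern?' * 10
"""

# Each module-level call has to look the pattern up in re's internal
# cache of compiled expressions; the precompiled object skips that step.
print 'module-level:', timeit.timeit("re.search('that', text)",
                                     setup=setup, number=100000)
print 'precompiled :', timeit.timeit("regex.search(text)",
                                     setup=setup, number=100000)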
1.3.3 Multiple Matches

So far, the example patterns have all used search() to look for single instances of literal text strings. The findall() function returns all substrings of the input that match the pattern without overlapping.

import re

text = 'abbaaabbbbaaaaa'

pattern = 'ab'

for match in re.findall(pattern, text):
    print 'Found "%s"' % match

There are two instances of ab in the input string.

$ python re_findall.py

Found "ab"
Found "ab"

finditer() returns an iterator that produces Match instances instead of the strings returned by findall().

import re

text = 'abbaaabbbbaaaaa'

pattern = 'ab'

for match in re.finditer(pattern, text):
    s = match.start()
    e = match.end()
    print 'Found "%s" at %d:%d' % (text[s:e], s, e)

This example finds the same two occurrences of ab, and the Match instance shows where they are in the original input.

$ python re_finditer.py

Found "ab" at 0:2
Found "ab" at 5:7

1.3.4 Pattern Syntax

Regular expressions support more powerful patterns than simple literal text strings. Patterns can repeat, can be anchored to different logical locations within the input, and can be expressed in compact forms that do not require every literal character to be present in the pattern. All of these features are used by combining literal text values with metacharacters that are part of the regular expression pattern syntax implemented by re.
import re

def test_patterns(text, patterns=[]):
    """Given source text and a list of patterns, look for
    matches for each pattern within the text and print
    them to stdout.
    """
    # Look for each pattern in the text and print the results
    for pattern, desc in patterns:
        print 'Pattern %r (%s)\n' % (pattern, desc)
        print '  %r' % text
        for match in re.finditer(pattern, text):
            s = match.start()
            e = match.end()
            substr = text[s:e]
            n_backslashes = text[:s].count('\\')
            prefix = '.' * (s + n_backslashes)
            print '  %s%r' % (prefix, substr)
        print
    return

if __name__ == '__main__':
    test_patterns('abbaaabbbbaaaaa',
                  [('ab', "'a' followed by 'b'"),
                   ])

The following examples will use test_patterns() to explore how variations in patterns change the way they match the same input text. The output shows the input text and the substring range from each portion of the input that matches the pattern.

$ python re_test_patterns.py

Pattern 'ab' ('a' followed by 'b')

  'abbaaabbbbaaaaa'
  'ab'
  .....'ab'
Repetition

There are five ways to express repetition in a pattern. A pattern followed by the metacharacter * is repeated zero or more times. (Allowing a pattern to repeat zero times means it does not need to appear at all to match.) Replace the * with + and the pattern must appear at least once. Using ? means the pattern appears zero times or one time. For a specific number of occurrences, use {m} after the pattern, where m is the number of times the pattern should repeat. And, finally, to allow a variable but limited number of repetitions, use {m,n}, where m is the minimum number of repetitions and n is the maximum. Leaving out n ({m,}) means the value appears at least m times, with no maximum.

from re_test_patterns import test_patterns

test_patterns(
    'abbaabbba',
    [ ('ab*',     'a followed by zero or more b'),
      ('ab+',     'a followed by one or more b'),
      ('ab?',     'a followed by zero or one b'),
      ('ab{3}',   'a followed by three b'),
      ('ab{2,3}', 'a followed by two to three b'),
      ])

There are more matches for ab* and ab? than ab+.

$ python re_repetition.py

Pattern 'ab*' (a followed by zero or more b)

  'abbaabbba'
  'abb'
  ...'a'
  ....'abbb'
  ........'a'

Pattern 'ab+' (a followed by one or more b)

  'abbaabbba'
  'abb'
  ....'abbb'

Pattern 'ab?' (a followed by zero or one b)

  'abbaabbba'
  'ab'
  ...'a'
  ....'ab'
  ........'a'

Pattern 'ab{3}' (a followed by three b)

  'abbaabbba'
  ....'abbb'

Pattern 'ab{2,3}' (a followed by two to three b)

  'abbaabbba'
  'abb'
  ....'abbb'
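The open-ended {m,} form is not shown in the example above. A hypothetical addition (not from the book) using the same test_patterns() helper might be:

from re_test_patterns import test_patterns

# At least two b characters, with no upper bound on the repeat count.
test_patterns(
    'abbaabbba',
    [ ('ab{2,}', 'a followed by two or more b'),
      ])

With the same sample text, this matches both 'abb' and 'abbb', since there is no limit on how many b characters may follow the a.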
Normally, when processing a repetition instruction, re will consume as much of the input as possible while matching the pattern. This so-called greedy behavior may result in fewer individual matches, or the matches may include more of the input text than intended. Greediness can be turned off by following the repetition instruction with ?.

from re_test_patterns import test_patterns

test_patterns(
    'abbaabbba',
    [ ('ab*?',     'a followed by zero or more b'),
      ('ab+?',     'a followed by one or more b'),
      ('ab??',     'a followed by zero or one b'),
      ('ab{3}?',   'a followed by three b'),
      ('ab{2,3}?', 'a followed by two to three b'),
      ])

Disabling greedy consumption of the input for any patterns where zero occurrences of b are allowed means the matched substring does not include any b characters.

$ python re_repetition_non_greedy.py

Pattern 'ab*?' (a followed by zero or more b)

  'abbaabbba'
  'a'
  ...'a'
  ....'a'
  ........'a'

Pattern 'ab+?' (a followed by one or more b)

  'abbaabbba'
  'ab'
  ....'ab'

Pattern 'ab??' (a followed by zero or one b)

  'abbaabbba'
  'a'
  ...'a'
  ....'a'
  ........'a'

Pattern 'ab{3}?' (a followed by three b)

  'abbaabbba'
  ....'abbb'

Pattern 'ab{2,3}?' (a followed by two to three b)

  'abbaabbba'
  'abb'
  ....'abb'
Character Sets

A character set is a group of characters, any one of which can match at that point in the pattern. For example, [ab] would match either a or b.

from re_test_patterns import test_patterns

test_patterns(
    'abbaabbba',
    [ ('[ab]',    'either a or b'),
      ('a[ab]+',  'a followed by 1 or more a or b'),
      ('a[ab]+?', 'a followed by 1 or more a or b, not greedy'),
      ])

The greedy form of the expression (a[ab]+) consumes the entire string because the first letter is a and every subsequent character is either a or b.
$ python re_charset.py

Pattern '[ab]' (either a or b)

  'abbaabbba'
  'a'
  .'b'
  ..'b'
  ...'a'
  ....'a'
  .....'b'
  ......'b'
  .......'b'
  ........'a'

Pattern 'a[ab]+' (a followed by 1 or more a or b)

  'abbaabbba'
  'abbaabbba'

Pattern 'a[ab]+?' (a followed by 1 or more a or b, not greedy)

  'abbaabbba'
  'ab'
  ...'aa'

A character set can also be used to exclude specific characters. The caret (^) means to look for characters not in the set following.

from re_test_patterns import test_patterns

test_patterns(
    'This is some text -- with punctuation.',
    [ ('[^-. ]+', 'sequences without -, ., or space'),
      ])

This pattern finds all the substrings that do not contain the characters -, ., or a space.

$ python re_charset_exclude.py

Pattern '[^-. ]+' (sequences without -, ., or space)

  'This is some text -- with punctuation.'
  'This'
  .....'is'
  ........'some'
  .............'text'
  .....................'with'
  ..........................'punctuation'
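Note that the caret negates a set only when it appears immediately after the opening bracket; anywhere else inside the brackets it is an ordinary literal character. The following minimal sketch is not one of the book's example files and uses a made-up input string just to show the difference.

import re

# '^' right after '[' negates the set ...
print re.findall(r'[^ab]', 'a^b c')   # ['^', ' ', 'c']

# ... but anywhere else in the set it matches a literal caret.
print re.findall(r'[a^b]', 'a^b c')   # ['a', '^', 'b']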
As character sets grow larger, typing every character that should (or should not) match becomes tedious. A more compact format using character ranges can be used to define a character set to include all contiguous characters between a start point and a stop point.

from re_test_patterns import test_patterns

test_patterns(
    'This is some text -- with punctuation.',
    [ ('[a-z]+', 'sequences of lowercase letters'),
      ('[A-Z]+', 'sequences of uppercase letters'),
      ('[a-zA-Z]+', 'sequences of lowercase or uppercase letters'),
      ('[A-Z][a-z]+', 'one uppercase followed by lowercase'),
      ])

Here the range a-z includes the lowercase ASCII letters, and the range A-Z includes the uppercase ASCII letters. The ranges can also be combined into a single character set.

$ python re_charset_ranges.py

Pattern '[a-z]+' (sequences of lowercase letters)

  'This is some text -- with punctuation.'
  .'his'
  .....'is'
  ........'some'
  .............'text'
  .....................'with'
  ..........................'punctuation'

Pattern '[A-Z]+' (sequences of uppercase letters)

  'This is some text -- with punctuation.'
  'T'

Pattern '[a-zA-Z]+' (sequences of lowercase or uppercase letters)

  'This is some text -- with punctuation.'
  'This'
  .....'is'
  ........'some'
  .............'text'
  .....................'with'
  ..........................'punctuation'

Pattern '[A-Z][a-z]+' (one uppercase followed by lowercase)

  'This is some text -- with punctuation.'
  'This'
As a special case of a character set, the metacharacter dot, or period (.), indicates that the pattern should match any single character in that position.

from re_test_patterns import test_patterns

test_patterns(
    'abbaabbba',
    [ ('a.', 'a followed by any one character'),
      ('b.', 'b followed by any one character'),
      ('a.*b', 'a followed by anything, ending in b'),
      ('a.*?b', 'a followed by anything, ending in b'),
      ])

Combining a dot with repetition can result in very long matches, unless the nongreedy form is used.

$ python re_charset_dot.py

Pattern 'a.' (a followed by any one character)

  'abbaabbba'
  'ab'
  ...'aa'

Pattern 'b.' (b followed by any one character)

  'abbaabbba'
  .'bb'
  .....'bb'
  .......'ba'

Pattern 'a.*b' (a followed by anything, ending in b)

  'abbaabbba'
  'abbaabbb'

Pattern 'a.*?b' (a followed by anything, ending in b)

  'abbaabbba'
  'ab'
  ...'aab'
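The practical impact of that warning shows up whenever the input contains more than one delimited field. The short sketch below is not one of the book's listings; the quoted-string input is an invented example used only to contrast the two forms.

import re

text = 'say "one" and "two"'

# Greedy: .* runs from the first quote all the way to the last one.
print re.findall(r'".*"', text)    # ['"one" and "two"']

# Nongreedy: .*? stops at the nearest closing quote.
print re.findall(r'".*?"', text)   # ['"one"', '"two"']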
Escape Codes

An even more compact representation uses escape codes for several predefined character sets. The escape codes recognized by re are listed in Table 1.1.

Table 1.1. Regular Expression Escape Codes

  Code   Meaning
  \d     A digit
  \D     A nondigit
  \s     Whitespace (tab, space, newline, etc.)
  \S     Nonwhitespace
  \w     Alphanumeric
  \W     Nonalphanumeric

Note: Escapes are indicated by prefixing the character with a backslash (\). Unfortunately, a backslash must itself be escaped in normal Python strings, and that results in expressions that are difficult to read. Using raw strings, created by prefixing the literal value with r, eliminates this problem and maintains readability.
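As a quick illustration of that note (a minimal sketch, not one of the book's example files), the escaped and raw forms below spell the same pattern; the raw string is simply easier to read and to keep correct as patterns grow.

import re

# Both literals describe the same two-character pattern: backslash-d plus '+'.
print '\\d+' == r'\d+'               # True

print re.findall('\\d+', 'a 12 b')   # ['12']
print re.findall(r'\d+', 'a 12 b')   # ['12']

The next listing from the book applies these escape codes together with repetition.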
from re_test_patterns import test_patterns

test_patterns(
    'A prime #1 example!',
    [ (r'\d+', 'sequence of digits'),
      (r'\D+', 'sequence of nondigits'),
      (r'\s+', 'sequence of whitespace'),
      (r'\S+', 'sequence of nonwhitespace'),
      (r'\w+', 'alphanumeric characters'),
      (r'\W+', 'nonalphanumeric'),
      ])

These sample expressions combine escape codes with repetition to find sequences of like characters in the input string.

$ python re_escape_codes.py

Pattern '\\d+' (sequence of digits)

  'A prime #1 example!'
  .........'1'

Pattern '\\D+' (sequence of nondigits)

  'A prime #1 example!'
  'A prime #'
  ..........' example!'

Pattern '\\s+' (sequence of whitespace)

  'A prime #1 example!'
  .' '
  .......' '
  ..........' '

Pattern '\\S+' (sequence of nonwhitespace)

  'A prime #1 example!'
  'A'
  ..'prime'
  ........'#1'
  ...........'example!'

Pattern '\\w+' (alphanumeric characters)

  'A prime #1 example!'
  'A'
  ..'prime'
  .........'1'
  ...........'example'

Pattern '\\W+' (nonalphanumeric)

  'A prime #1 example!'
  .' '
  .......' #'
  ..........' '
  ..................'!'

To match the characters that are part of the regular expression syntax, escape the characters in the search pattern.

from re_test_patterns import test_patterns

test_patterns(
    r'\d+ \D+ \s+',
    [ (r'\\.\+', 'escape code'),
      ])

The pattern in this example escapes the backslash and plus characters, since, as metacharacters, both have special meaning in a regular expression.
$ python re_escape_escapes.py

Pattern '\\\\.\\+' (escape code)

  '\\d+ \\D+ \\s+'
  '\\d+'
  .....'\\D+'
  ..........'\\s+'

Anchoring

In addition to describing the content of a pattern to match, the relative location can be specified in the input text where the pattern should appear by using anchoring instructions. Table 1.2 lists valid anchoring codes.

Table 1.2. Regular Expression Anchoring Codes

  Code   Meaning
  ^      Start of string, or line
  $      End of string, or line
  \A     Start of string
  \Z     End of string
  \b     Empty string at the beginning or end of a word
  \B     Empty string not at the beginning or end of a word
from re_test_patterns import test_patterns

test_patterns(
    'This is some text -- with punctuation.',
    [ (r'^\w+', 'word at start of string'),
      (r'\A\w+', 'word at start of string'),
      (r'\w+\S*$', 'word near end of string, skip punctuation'),
      (r'\w+\S*\Z', 'word near end of string, skip punctuation'),
      (r'\w*t\w*', 'word containing t'),
      (r'\bt\w+', 't at start of word'),
      (r'\w+t\b', 't at end of word'),
      (r'\Bt\B', 't, not start or end of word'),
      ])

The patterns in the example for matching words at the beginning and end of the string are different because the word at the end of the string is followed by punctuation to terminate the sentence. The pattern \w+$ would not match, since . is not considered an alphanumeric character.

$ python re_anchoring.py

Pattern '^\\w+' (word at start of string)

  'This is some text -- with punctuation.'
  'This'

Pattern '\\A\\w+' (word at start of string)

  'This is some text -- with punctuation.'
  'This'

Pattern '\\w+\\S*$' (word near end of string, skip punctuation)

  'This is some text -- with punctuation.'
  ..........................'punctuation.'

Pattern '\\w+\\S*\\Z' (word near end of string, skip punctuation)

  'This is some text -- with punctuation.'
  ..........................'punctuation.'

Pattern '\\w*t\\w*' (word containing t)

  'This is some text -- with punctuation.'
  .............'text'
  .....................'with'
  ..........................'punctuation'

Pattern '\\bt\\w+' (t at start of word)

  'This is some text -- with punctuation.'
  .............'text'

Pattern '\\w+t\\b' (t at end of word)

  'This is some text -- with punctuation.'
  .............'text'

Pattern '\\Bt\\B' (t, not start or end of word)

  'This is some text -- with punctuation.'
  .......................'t'
  ..............................'t'
  .................................'t'
1.3.5 Constraining the Search

If it is known in advance that only a subset of the full input should be searched, the regular expression match can be further constrained by telling re to limit the search range. For example, if the pattern must appear at the front of the input, then using match() instead of search() will anchor the search without having to explicitly include an anchor in the search pattern.

import re

text = 'This is some text -- with punctuation.'
pattern = 'is'

print 'Text   :', text
print 'Pattern:', pattern

m = re.match(pattern, text)
print 'Match  :', m
s = re.search(pattern, text)
print 'Search :', s

Since the literal text is does not appear at the start of the input text, it is not found using match(). The sequence appears two other times in the text, though, so search() finds it.

$ python re_match.py

Text   : This is some text -- with punctuation.
Pattern: is
Match  : None
Search : <_sre.SRE_Match object at 0x100d2bed0>
The search() method of a compiled regular expression accepts optional start and end position parameters to limit the search to a substring of the input.

import re

text = 'This is some text -- with punctuation.'

pattern = re.compile(r'\b\w*is\w*\b')

print 'Text:', text
print

pos = 0
while True:
    match = pattern.search(text, pos)
    if not match:
        break
    s = match.start()
    e = match.end()
    print '  %2d : %2d = "%s"' % \
        (s, e-1, text[s:e])
    # Move forward in text for the next search
    pos = e

This example implements a less efficient form of finditer(). Each time a match is found, the end position of that match is used for the next search.

$ python re_search_substring.py

Text: This is some text -- with punctuation.

   0 :  3 = "This"
   5 :  6 = "is"
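Since the loop above is described as a less efficient version of finditer(), a minimal sketch of the finditer() form is shown here for comparison. It is not one of the book's listings, but it reuses the same text and pattern and produces the same positions as the output above.

import re

text = 'This is some text -- with punctuation.'
pattern = re.compile(r'\b\w*is\w*\b')

# finditer() walks the input once and yields a match object for each hit,
# so no manual position bookkeeping is needed.
for match in pattern.finditer(text):
    s = match.start()
    e = match.end()
    print '  %2d : %2d = "%s"' % (s, e - 1, text[s:e])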
1.3.6 Dissecting Matches with Groups

Searching for pattern matches is the basis of the powerful capabilities provided by regular expressions. Adding groups to a pattern isolates parts of the matching text, expanding those capabilities to create a parser. Groups are defined by enclosing patterns in parentheses (( and )).

from re_test_patterns import test_patterns

test_patterns(
    'abbaaabbbbaaaaa',
    [ ('a(ab)', 'a followed by literal ab'),
      ('a(a*b*)', 'a followed by 0-n a and 0-n b'),
      ('a(ab)*', 'a followed by 0-n ab'),
      ('a(ab)+', 'a followed by 1-n ab'),
      ])

Any complete regular expression can be converted to a group and nested within a larger expression. All repetition modifiers can be applied to a group as a whole, requiring the entire group pattern to repeat.

$ python re_groups.py

Pattern 'a(ab)' (a followed by literal ab)

  'abbaaabbbbaaaaa'
  ....'aab'

Pattern 'a(a*b*)' (a followed by 0-n a and 0-n b)

  'abbaaabbbbaaaaa'
  'abb'
  ...'aaabbbb'
  ..........'aaaaa'

Pattern 'a(ab)*' (a followed by 0-n ab)

  'abbaaabbbbaaaaa'
  'a'
  ...'a'
  ....'aab'
  ..........'a'
  ...........'a'
  ............'a'
  .............'a'
  ..............'a'

Pattern 'a(ab)+' (a followed by 1-n ab)

  'abbaaabbbbaaaaa'
  ....'aab'
To access the substrings matched by the individual groups within a pattern, use the groups() method of the Match object.

import re

text = 'This is some text -- with punctuation.'

print text
print

patterns = [
    (r'^(\w+)', 'word at start of string'),
    (r'(\w+)\S*$', 'word at end, with optional punctuation'),
    (r'(\bt\w+)\W+(\w+)', 'word starting with t, another word'),
    (r'(\w+t)\b', 'word ending with t'),
    ]

for pattern, desc in patterns:
    regex = re.compile(pattern)
    match = regex.search(text)
    print 'Pattern %r (%s)\n' % (pattern, desc)
    print '  ', match.groups()
    print

Match.groups() returns a sequence of strings in the order of the groups within the expression that matches the string.

$ python re_groups_match.py

This is some text -- with punctuation.

Pattern '^(\\w+)' (word at start of string)

   ('This',)

Pattern '(\\w+)\\S*$' (word at end, with optional punctuation)

   ('punctuation',)

Pattern '(\\bt\\w+)\\W+(\\w+)' (word starting with t, another word)

   ('text', 'with')

Pattern '(\\w+t)\\b' (word ending with t)

   ('text',)
Ask for the match of a single group with group(). This is useful when grouping is being used to find parts of the string, but some parts matched by groups are not needed in the results.

import re

text = 'This is some text -- with punctuation.'

print 'Input text            :', text

# word starting with 't' then another word
regex = re.compile(r'(\bt\w+)\W+(\w+)')
print 'Pattern               :', regex.pattern

match = regex.search(text)
print 'Entire match          :', match.group(0)
print 'Word starting with "t":', match.group(1)
print 'Word after "t" word   :', match.group(2)
Group 0 represents the string matched by the entire expression, and subgroups are numbered starting with 1 in the order their left parenthesis appears in the expression.

$ python re_groups_individual.py

Input text            : This is some text -- with punctuation.
Pattern               : (\bt\w+)\W+(\w+)
Entire match          : text -- with
Word starting with "t": text
Word after "t" word   : with

Python extends the basic grouping syntax to add named groups. Using names to refer to groups makes it easier to modify the pattern over time, without having to also modify the code using the match results. To set the name of a group, use the syntax (?P<name>pattern).

import re

text = 'This is some text -- with punctuation.'

print text
print

for pattern in [ r'^(?P<first_word>\w+)',
                 r'(?P<last_word>\w+)\S*$',
                 r'(?P<t_word>\bt\w+)\W+(?P<other_word>\w+)',
                 r'(?P<ends_with_t>\w+t)\b',
                 ]:
    regex = re.compile(pattern)
    match = regex.search(text)
    print 'Matching "%s"' % pattern
    print '  ', match.groups()
    print '  ', match.groupdict()
    print
Use groupdict() to retrieve the dictionary that maps group names to substrings from the match. Named patterns also are included in the ordered sequence returned by groups().

$ python re_groups_named.py

This is some text -- with punctuation.

Matching "^(?P<first_word>\w+)"

   ('This',)
   {'first_word': 'This'}

Matching "(?P<last_word>\w+)\S*$"

   ('punctuation',)
   {'last_word': 'punctuation'}

Matching "(?P<t_word>\bt\w+)\W+(?P<other_word>\w+)"

   ('text', 'with')
   {'other_word': 'with', 't_word': 'text'}

Matching "(?P<ends_with_t>\w+t)\b"

   ('text',)
   {'ends_with_t': 'text'}
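Besides groupdict(), an individual named group can also be read by passing its name to group(). The short sketch below is not one of the book's listings; it simply reuses one of the named patterns from the example above.

import re

text = 'This is some text -- with punctuation.'

match = re.search(r'(?P<t_word>\bt\w+)\W+(?P<other_word>\w+)', text)

# A named group can be retrieved by name or by its number.
print match.group('t_word')       # text
print match.group('other_word')   # with
print match.group(1), match.group(2)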
An updated version of test_patterns() that shows the numbered and named groups matched by a pattern will make the following examples easier to follow.

import re

def test_patterns(text, patterns=[]):
    """Given source text and a list of patterns, look for
    matches for each pattern within the text and print
    them to stdout.
    """
    # Look for each pattern in the text and print the results
    for pattern, desc in patterns:
        print 'Pattern %r (%s)\n' % (pattern, desc)
        print '  %r' % text
        for match in re.finditer(pattern, text):
            s = match.start()
            e = match.end()
            prefix = ' ' * (s)
            print '  %s%r%s ' % (prefix, text[s:e], ' '*(len(text)-e)),
            print match.groups()
            if match.groupdict():
                print '%s%s' % (' ' * (len(text)-s), match.groupdict())
        print
    return

Since a group is itself a complete regular expression, groups can be nested within other groups to build even more complicated expressions.

from re_test_patterns_groups import test_patterns

test_patterns(
    'abbaabbba',
    [ (r'a((a*)(b*))', 'a followed by 0-n a and 0-n b'),
      ])

In this case, the group (a*) matches an empty string, so the return value from groups() includes that empty string as the matched value.

$ python re_groups_nested.py

Pattern 'a((a*)(b*))' (a followed by 0-n a and 0-n b)

  'abbaabbba'
  'abb'           ('bb', '', 'bb')
     'aabbb'      ('abbb', 'a', 'bbb')
          'a'     ('', '', '')
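To make the numbering of those nested groups explicit (a minimal sketch, not one of the book's listings), the same pattern can be applied with plain re.search() and each group read by its index, which follows the order of the opening parentheses.

import re

match = re.search(r'a((a*)(b*))', 'abbaabbba')

# Group numbers follow the opening parentheses:
# 1 = ((a*)(b*)), 2 = (a*), 3 = (b*)
print match.group(0)   # 'abb'  (the entire match)
print match.group(1)   # 'bb'
print match.group(2)   # ''     (the empty match for (a*))
print match.group(3)   # 'bb'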
Groups are also useful for specifying alternative patterns. Use the pipe symbol (|) to indicate that one pattern or another should match. Consider the placement of the pipe carefully, though. The first expression in this example matches a sequence of a followed by a sequence consisting entirely of a single letter, a or b. The second pattern matches a followed by a sequence that may include either a or b. The patterns are similar, but the resulting matches are completely different.

from re_test_patterns_groups import test_patterns

test_patterns(
    'abbaabbba',
    [ (r'a((a+)|(b+))', 'a then seq. of a or seq. of b'),
      (r'a((a|b)+)', 'a then seq. of [ab]'),
      ])

When an alternative group is not matched but the entire pattern does match, the return value of groups() includes a None value at the point in the sequence where the alternative group should appear.

$ python re_groups_alternative.py

Pattern 'a((a+)|(b+))' (a then seq. of a or seq. of b)

  'abbaabbba'
  'abb'           ('bb', None, 'bb')
     'aa'         ('a', 'a', None)

Pattern 'a((a|b)+)' (a then seq. of [ab])

  'abbaabbba'
  'abbaabbba'     ('bbaabbba', 'a')
Defining a group containing a subpattern is also useful when the string matching the subpattern is not part of what should be extracted from the full text. These groups are called noncapturing. Noncapturing groups can be used to describe repetition patterns or alternatives, without isolating the matching portion of the string in the value returned. To create a noncapturing group, use the syntax (?:pattern).

from re_test_patterns_groups import test_patterns

test_patterns(
    'abbaabbba',
    [ (r'a((a+)|(b+))', 'capturing form'),
      (r'a((?:a+)|(?:b+))', 'noncapturing'),
      ])

Compare the groups returned for the capturing and noncapturing forms of a pattern that match the same results.

$ python re_groups_noncapturing.py

Pattern 'a((a+)|(b+))' (capturing form)

  'abbaabbba'
  'abb'           ('bb', None, 'bb')
     'aa'         ('a', 'a', None)

Pattern 'a((?:a+)|(?:b+))' (noncapturing)

  'abbaabbba'
  'abb'           ('bb',)
     'aa'         ('a',)
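One practical consequence, shown in the minimal sketch below (not one of the book's listings): findall() returns the contents of capturing groups rather than the full match, so a noncapturing group is the right choice when the parentheses are only there for repetition or alternation.

import re

text = 'abbaabbba'

# With a capturing group, findall() returns what the group matched.
print re.findall(r'a((?:a|b)+)', text)   # ['bbaabbba']

# With only a noncapturing group, findall() returns the full matches.
print re.findall(r'a(?:a|b)+', text)     # ['abbaabbba']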
1.3.7 Search Options

The way the matching engine processes an expression can be changed using option flags. The flags can be combined using a bitwise OR operation, then passed to compile(), search(), match(), and other functions that accept a pattern for searching.

Case-Insensitive Matching

IGNORECASE causes literal characters and character ranges in the pattern to match both uppercase and lowercase characters.

import re

text = 'This is some text -- with punctuation.'
pattern = r'\bT\w+'
with_case = re.compile(pattern)
without_case = re.compile(pattern, re.IGNORECASE)

print 'Text:\n %r' % text
print 'Pattern:\n %s' % pattern
print 'Case-sensitive:'
for match in with_case.findall(text):
    print ' %r' % match
print 'Case-insensitive:'
for match in without_case.findall(text):
    print ' %r' % match

Since the pattern includes the literal T, without setting IGNORECASE, the only match is the word This. When case is ignored, text also matches.

$ python re_flags_ignorecase.py

Text:
 'This is some text -- with punctuation.'
Pattern:
 \bT\w+
Case-sensitive:
 'This'
Case-insensitive:
 'This'
 'text'
Input with Multiple Lines

Two flags affect how searching in multiline input works: MULTILINE and DOTALL. The MULTILINE flag controls how the pattern-matching code processes anchoring instructions for text containing newline characters. When multiline mode is turned on, the anchor rules for ^ and $ apply at the beginning and end of each line, in addition to the entire string.

import re

text = 'This is some text -- with punctuation.\nA second line.'
pattern = r'(^\w+)|(\w+\S*$)'
single_line = re.compile(pattern)
multiline = re.compile(pattern, re.MULTILINE)

print 'Text:\n %r' % text
print 'Pattern:\n %s' % pattern
print 'Single Line :'
for match in single_line.findall(text):
    print ' %r' % (match,)
print 'Multiline :'
for match in multiline.findall(text):
    print ' %r' % (match,)

The pattern in the example matches the first or last word of the input. It matches line. at the end of the string, even though there is no newline.

$ python re_flags_multiline.py

Text:
 'This is some text -- with punctuation.\nA second line.'
Pattern:
 (^\w+)|(\w+\S*$)
Single Line :
 ('This', '')
 ('', 'line.')
Multiline :
 ('This', '')
 ('', 'punctuation.')
 ('A', '')
 ('', 'line.')
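When both of these behaviors are wanted at once, the flags are combined with the bitwise OR operation described at the start of this section. A minimal sketch, with a made-up pattern and input:

import re

# IGNORECASE and MULTILINE are OR-ed together before being passed to
# compile(), so the anchored pattern matches at the start of each line,
# regardless of case.
pattern = re.compile(r'^t\w+', re.IGNORECASE | re.MULTILINE)
print pattern.findall('This line\nthat line')

The result is ['This', 'that'].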
DOTALL is the other flag related to multiline text. Normally, the dot character (.) matches everything in the input text except a newline character. The flag allows dot to match newlines as well.

import re

text = 'This is some text -- with punctuation.\nA second line.'
pattern = r'.+'
no_newlines = re.compile(pattern)
dotall = re.compile(pattern, re.DOTALL)

print 'Text:\n %r' % text
print 'Pattern:\n %s' % pattern
print 'No newlines :'
for match in no_newlines.findall(text):
    print ' %r' % match
print 'Dotall :'
for match in dotall.findall(text):
    print ' %r' % match

Without the flag, each line of the input text matches the pattern separately. Adding the flag causes the entire string to be consumed.

$ python re_flags_dotall.py

Text:
 'This is some text -- with punctuation.\nA second line.'
Pattern:
 .+
No newlines :
 'This is some text -- with punctuation.'
 'A second line.'
Dotall :
 'This is some text -- with punctuation.\nA second line.'
Unicode

Under Python 2, str objects use the ASCII character set, and regular expression processing assumes that the pattern and input text are both ASCII. The escape codes described earlier are defined in terms of ASCII by default. Those assumptions mean that the pattern \w+ will match the word "French" but not the word "Français," since the ç is not part of the ASCII character set. To enable Unicode matching in Python 2, add the UNICODE flag when compiling the pattern or when calling the module-level functions search() and match().

import re
import codecs
import sys

# Set standard output encoding to UTF-8.
sys.stdout = codecs.getwriter('UTF-8')(sys.stdout)

text = u'Français złoty Österreich'
pattern = ur'\w+'
ascii_pattern = re.compile(pattern)
unicode_pattern = re.compile(pattern, re.UNICODE)

print 'Text :', text
print 'Pattern :', pattern
print 'ASCII :', u', '.join(ascii_pattern.findall(text))
print 'Unicode :', u', '.join(unicode_pattern.findall(text))
The other escape sequences (\W, \b, \B, \d, \D, \s, and \S) are also processed differently for Unicode text. Instead of assuming what members of the character set are identified by the escape sequence, the regular expression engine consults the Unicode database to find the properties of each character.

$ python re_flags_unicode.py

Text : Français złoty Österreich
Pattern : \w+
ASCII : Fran, ais, z, oty, sterreich
Unicode : Français, złoty, Österreich

Note: Python 3 uses Unicode for all strings by default, so the flag is not necessary.
Verbose Expression Syntax

The compact format of regular expression syntax can become a hindrance as expressions grow more complicated. As the number of groups in an expression increases, it will be more work to keep track of why each element is needed and how exactly the parts of the expression interact. Using named groups helps mitigate these issues, but a better solution is to use verbose mode expressions, which allow comments and extra whitespace to be embedded in the pattern.

A pattern to validate email addresses will illustrate how verbose mode makes working with regular expressions easier. The first version recognizes addresses that end in one of three top-level domains: .com, .org, and .edu.

import re

address = re.compile('[\w\d.+-]+@([\w\d.]+\.)+(com|org|edu)', re.UNICODE)

candidates = [
    u'first.last@example.com',
    u'first.last+category@gmail.com',
    u'valid-address@mail.example.com',
    u'not-valid@example.foo',
    ]

for candidate in candidates:
    match = address.search(candidate)
    print '%-30s %s' % (candidate, 'Matches' if match else 'No match')

This expression is already complex. There are several character classes, groups, and repetition expressions.

$ python re_email_compact.py

first.last@example.com         Matches
first.last+category@gmail.com  Matches
valid-address@mail.example.com Matches
not-valid@example.foo          No match
Converting the expression to a more verbose format will make it easier to extend.

import re

address = re.compile(
    '''
    [\w\d.+-]+       # username
    @
    ([\w\d.]+\.)+    # domain name prefix
    (com|org|edu)    # TODO: support more top-level domains
    ''',
    re.UNICODE | re.VERBOSE)

candidates = [
    u'first.last@example.com',
    u'first.last+category@gmail.com',
    u'valid-address@mail.example.com',
    u'not-valid@example.foo',
    ]

for candidate in candidates:
    match = address.search(candidate)
    print '%-30s %s' % (candidate, 'Matches' if match else 'No match')

The expression matches the same inputs, but in this extended format, it is easier to read. The comments also help identify different parts of the pattern so that it can be expanded to match more inputs.

$ python re_email_verbose.py

first.last@example.com         Matches
first.last+category@gmail.com  Matches
valid-address@mail.example.com Matches
not-valid@example.foo          No match
This expanded version parses inputs that include a person's name and email address, as might appear in an email header. The name comes first and stands on its own, and the email address follows surrounded by angle brackets (< and >).

import re

address = re.compile(
    '''
    # A name is made up of letters, and may include "."
    # for title abbreviations and middle initials.
    ((?P<name>
       ([\w.,]+\s+)*[\w.,]+)
     \s*
     # Email addresses are wrapped in angle
     # brackets: < > but only if a name is
     # found, so keep the start bracket in this
     # group.
     <
    )? # the entire name is optional

    # The address itself: username@domain.tld
    (?P<email>
      [\w\d.+-]+       # username
      @
      ([\w\d.]+\.)+    # domain name prefix
      (com|org|edu)    # limit the allowed top-level domains
    )

    >? # optional closing angle bracket
    ''',
    re.UNICODE | re.VERBOSE)

candidates = [
    u'first.last@example.com',
    u'first.last+category@gmail.com',
    u'valid-address@mail.example.com',
    u'not-valid@example.foo',
    u'First Last <first.last@example.com>',
    u'No Brackets first.last@example.com',
    u'First Last',
    u'First Middle Last <first.last@example.com>',
    u'First M. Last <first.last@example.com>',
    u'<first.last@example.com>',
    ]

for candidate in candidates:
    print 'Candidate:', candidate
    match = address.search(candidate)
    if match:
        print ' Name :', match.groupdict()['name']
        print ' Email:', match.groupdict()['email']
    else:
        print ' No match'
As with other programming languages, the ability to insert comments into verbose regular expressions helps with their maintainability. This final version includes implementation notes to future maintainers and whitespace to separate the groups from each other and highlight their nesting level.

$ python re_email_with_name.py

Candidate: first.last@example.com
 Name : None
 Email: first.last@example.com
Candidate: first.last+category@gmail.com
 Name : None
 Email: first.last+category@gmail.com
Candidate: valid-address@mail.example.com
 Name : None
 Email: valid-address@mail.example.com
Candidate: not-valid@example.foo
 No match
Candidate: First Last <first.last@example.com>
 Name : First Last
 Email: first.last@example.com
Candidate: No Brackets first.last@example.com
 Name : None
 Email: first.last@example.com
Candidate: First Last
 No match
Candidate: First Middle Last <first.last@example.com>
 Name : First Middle Last
 Email: first.last@example.com
Candidate: First M. Last <first.last@example.com>
 Name : First M. Last
 Email: first.last@example.com
Candidate: <first.last@example.com>
 Name : None
 Email: first.last@example.com
Last ’, u’’, ] for candidate in candidates: print ’Candidate:’, candidate match = address.search(candidate) if match: print ’ Name :’, match.groupdict()[’name’] print ’ Email:’, match.groupdict()[’email’] else: print ’ No match’ As with other programming languages, the ability to insert comments into ver- bose regular expressions helps with their maintainability. This final version includes  34  Python 标准库  ptg 1.3. re—Regular Expressions 43 # brackets: < > but only if a name is # found, so keep the start bracket in this # group. < )? # the entire name is optional # The address itself: username@domain.tld (?P [\w\d.+-]+ # username @ ([\w\d.]+\.)+ # domain name prefix (com|org|edu) # limit the allowed top-level domains ) >? # optional closing angle bracket ’’’, re.UNICODE | re.VERBOSE) candidates = [ u’first.last@example.com’, u’first.last+category@gmail.com’, u’valid-address@mail.example.com’, u’not-valid@example.foo’, u’First Last ’, u’No Brackets first.last@example.com’, u’First Last’, u’First Middle Last ’, u’First M. Last ’, u’’, ] for candidate in candidates: print ’Candidate:’, candidate match = address.search(candidate) if match: print ’ Name :’, match.groupdict()[’name’] print ’ Email:’, match.groupdict()[’email’] else: print ’ No match’ As with other programming languages, the ability to insert comments into ver- bose regular expressions helps with their maintainability. This final version includes 类似于其他编程语言,能够在详细正则表达式中插入注释将有助于增强其可维护性。最后 这个版本包含为将来的维护人员提供的实现说明,另外还包括一些空白符将各个组分开,并突 出其嵌套层次。 ptg 44 Text implementation notes to future maintainers and whitespace to separate the groups from each other and highlight their nesting level. $ python re_email_with_name.py Candidate: first.last@example.com Name : None Email: first.last@example.com Candidate: first.last+category@gmail.com Name : None Email: first.last+category@gmail.com Candidate: valid-address@mail.example.com Name : None Email: valid-address@mail.example.com Candidate: not-valid@example.foo No match Candidate: First Last Name : First Last Email: first.last@example.com Candidate: No Brackets first.last@example.com Name : None Email: first.last@example.com Candidate: First Last No match Candidate: First Middle Last Name : First Middle Last Email: first.last@example.com Candidate: First M. Last Name : First M. Last Email: first.last@example.com Candidate: Name : None Email: first.last@example.com Embedding Flags in Patterns If flags cannot be added when compiling an expression, such as when a pattern is passed as an argument to a library function that will compile it later, the flags can be embedded inside the expression string itself. For example, to turn case-insensitive matching on, add (?i) to the beginning of the expression. 第 1 章 文  本 35  ptg 44 Text implementation notes to future maintainers and whitespace to separate the groups from each other and highlight their nesting level. $ python re_email_with_name.py Candidate: first.last@example.com Name : None Email: first.last@example.com Candidate: first.last+category@gmail.com Name : None Email: first.last+category@gmail.com Candidate: valid-address@mail.example.com Name : None Email: valid-address@mail.example.com Candidate: not-valid@example.foo No match Candidate: First Last Name : First Last Email: first.last@example.com Candidate: No Brackets first.last@example.com Name : None Email: first.last@example.com Candidate: First Last No match Candidate: First Middle Last Name : First Middle Last Email: first.last@example.com Candidate: First M. Last Name : First M. 
1.3.8 Looking Ahead or Behind

In many cases, it is useful to match a part of a pattern only if some other part will also match. For example, in the email parsing expression, the angle brackets were each marked as optional. Really, though, the brackets should be paired, and the expression should only match if both are present or neither is. This modified version of the expression uses a positive look-ahead assertion to match the pair. The look-ahead assertion syntax is (?=pattern).

import re

address = re.compile(
    '''
    # A name is made up of letters, and may include "."
    # for title abbreviations and middle initials.
    ((?P<name>
       ([\w.,]+\s+)*[\w.,]+
     )
     \s+
    ) # name is no longer optional

    # LOOKAHEAD
    # Email addresses are wrapped in angle brackets, but only
    # if they are both present or neither is.
    (?= (<.*>$)       # remainder wrapped in angle brackets
        |
        ([^<].*[^>]$) # remainder *not* wrapped in angle brackets
      )

    <? # optional opening angle bracket

    # The address itself: username@domain.tld
    (?P<email>
      [\w\d.+-]+       # username
      @
      ([\w\d.]+\.)+    # domain name prefix
      (com|org|edu)    # limit the allowed top-level domains
    )

    >? # optional closing angle bracket
    ''',
    re.UNICODE | re.VERBOSE)

candidates = [
    u'First Last <first.last@example.com>',
    u'No Brackets first.last@example.com',
    u'Open Bracket <first.last@example.com',
    ]

for candidate in candidates:
    print 'Candidate:', candidate
    match = address.search(candidate)
    if match:
        print ' Name :', match.groupdict()['name']
        print ' Email:', match.groupdict()['email']
    else:
        print ' No match'
Several important changes occur in this version of the expression. First, the name portion is no longer optional. That means stand-alone addresses do not match, but it also prevents improperly formatted name/address combinations from matching. The positive look-ahead rule after the "name" group asserts that the remainder of the string is either wrapped with a pair of angle brackets or there is not a mismatched bracket; the brackets are either both present or neither is. The look-ahead is expressed as a group, but the match for a look-ahead group does not consume any of the input text. The rest of the pattern picks up from the same spot after the look-ahead matches.

$ python re_look_ahead.py

Candidate: First Last <first.last@example.com>
 Name : First Last
 Email: first.last@example.com
Candidate: No Brackets first.last@example.com
 Name : No Brackets
 Email: first.last@example.com
Candidate: Open Bracket <first.last@example.com
 No match

A negative look-ahead assertion ((?!pattern)) says that the pattern does not match the text following the current point. For example, the email recognition pattern could be modified to ignore noreply mailing addresses automated systems commonly use.

import re

address = re.compile(
    '''
    ^

    # An address: username@domain.tld

    # Ignore noreply addresses
    (?!noreply@.*$)

    [\w\d.+-]+       # username
    @
    ([\w\d.]+\.)+    # domain name prefix
    (com|org|edu)    # limit the allowed top-level domains

    $
    ''',
    re.UNICODE | re.VERBOSE)

candidates = [
    u'first.last@example.com',
    u'noreply@example.com',
    ]

for candidate in candidates:
    print 'Candidate:', candidate
    match = address.search(candidate)
    if match:
        print ' Match:', candidate[match.start():match.end()]
    else:
        print ' No match'

The address starting with noreply does not match the pattern, since the look-ahead assertion fails.

$ python re_negative_look_ahead.py

Candidate: first.last@example.com
 Match: first.last@example.com
Candidate: noreply@example.com
 No match

Instead of looking ahead for noreply in the username portion of the email address, the pattern can also be written using a negative look-behind assertion after the username is matched, using the syntax (?<!pattern).
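One way such a look-behind version might be written is sketched below; the listing is illustrative rather than definitive, and it reuses the candidates from the previous example.

import re

address = re.compile(
    '''
    ^

    # An address: username@domain.tld

    [\w\d.+-]+       # username

    # The assertion runs after the username has been consumed and
    # rejects the match if the text just matched ends with "noreply".
    (?<!noreply)

    @
    ([\w\d.]+\.)+    # domain name prefix
    (com|org|edu)    # limit the allowed top-level domains

    $
    ''',
    re.UNICODE | re.VERBOSE)

for candidate in [u'first.last@example.com', u'noreply@example.com']:
    print 'Candidate:', candidate
    match = address.search(candidate)
    print ' Match' if match else ' No match'

As with the negative look-ahead, the noreply address fails to match because the assertion rejects it after the username has been consumed.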
1.3.9 Self-Referencing Expressions

Matched values can be used in later parts of an expression. The simplest way to refer back to an earlier group is by its numeric id, using the back-reference syntax \num. In this version of the email pattern, the address portion must reuse the first and last names matched at the start of the expression.

import re

address = re.compile(
    '''

    (\w+)            # first name
    \s+
    (([\w.]+)\s+)?   # optional middle name or initial
    (\w+)            # last name

    \s+
    <

    # The address: first_name.last_name@domain.tld
    (?P<email>
      \1             # first name
      \.
      \4             # last name
      @
      ([\w\d.]+\.)+  # domain name prefix
      (com|org|edu)  # limit the allowed top-level domains
    )

    >
    ''',
    re.UNICODE | re.VERBOSE | re.IGNORECASE)

candidates = [
    u'First Last <first.last@example.com>',
    u'Different Name <first.last@example.com>',
    u'First Middle Last <first.last@example.com>',
    u'First M. Last <first.last@example.com>',
    ]

for candidate in candidates:
    print 'Candidate:', candidate
    match = address.search(candidate)
    if match:
        print ' Match name :', match.group(1), match.group(4)
        print ' Match email:', match.group(5)
    else:
        print ' No match'
Although the syntax is simple, creating back-references by numerical id has a couple of disadvantages. From a practical standpoint, as the expression changes, the groups must be counted again and every reference may need to be updated. The other disadvantage is that only 99 references can be made this way, because if the id number is three digits long, it will be interpreted as an octal character value instead of a group reference. On the other hand, if an expression has more than 99 groups, more serious maintenance challenges will arise than not being able to refer to some groups in the expression.

$ python re_refer_to_group.py

Candidate: First Last <first.last@example.com>
 Match name : First Last
 Match email: first.last@example.com
Candidate: Different Name <first.last@example.com>
 No match
Candidate: First Middle Last <first.last@example.com>
 Match name : First Last
 Match email: first.last@example.com
Candidate: First M. Last <first.last@example.com>
 Match name : First Last
 Match email: first.last@example.com

Python's expression parser includes an extension that uses (?P=name) to refer to the value of a named group matched earlier in the expression.
import re

address = re.compile(
    '''

    # The regular name
    (?P<first_name>\w+)
    \s+
    (([\w.]+)\s+)?      # optional middle name or initial
    (?P<last_name>\w+)

    \s+
    <

    # The address: first_name.last_name@domain.tld
    (?P<email>
      (?P=first_name)
      \.
      (?P=last_name)
      @
      ([\w\d.]+\.)+     # domain name prefix
      (com|org|edu)     # limit the allowed top-level domains
    )

    >
    ''',
    re.UNICODE | re.VERBOSE | re.IGNORECASE)

candidates = [
    u'First Last <first.last@example.com>',
    u'Different Name <first.last@example.com>',
    u'First Middle Last <first.last@example.com>',
    u'First M. Last <first.last@example.com>',
    ]

for candidate in candidates:
    print 'Candidate:', candidate
    match = address.search(candidate)
    if match:
        print ' Match name :', match.groupdict()['first_name'],
        print match.groupdict()['last_name']
        print ' Match email:', match.groupdict()['email']
    else:
        print ' No match'

The address expression is compiled with the IGNORECASE flag on, since proper names are normally capitalized but email addresses are not.

$ python re_refer_to_named_group.py

Candidate: First Last <first.last@example.com>
 Match name : First Last
 Match email: first.last@example.com
Candidate: Different Name <first.last@example.com>
 No match
Candidate: First Middle Last <first.last@example.com>
 Match name : First Last
 Match email: first.last@example.com
Candidate: First M. Last <first.last@example.com>
 Match name : First Last
 Match email: first.last@example.com
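The (?P=name) reference is useful in much smaller expressions, too. This short sketch (not taken from the email examples) finds a doubled word by requiring the second occurrence to repeat the first:

import re

# (?P=word) must match exactly the same text that the group named
# 'word' matched earlier in the expression.
match = re.search(r'\b(?P<word>\w+)\s+(?P=word)\b', 'the the quick fix')
print match.group()

The match covers 'the the'.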
The other mechanism for using back-references in expressions chooses a different pattern based on whether a previous group matched. The email pattern can be corrected so that the angle brackets are required if a name is present, but not if the email address is by itself. The syntax for testing to see if a group has matched is (?(id)yes-expression|no-expression), where id is the group name or number, yes-expression is the pattern to use if the group has a value, and no-expression is the pattern to use otherwise.

import re

address = re.compile(
    '''
    ^

    # A name is made up of letters, and may include "."
    # for title abbreviations and middle initials.
    (?P<name>
       ([\w.]+\s+)*[\w.]+
     )?
    \s*

    # Email addresses are wrapped in angle brackets, but
    # only if a name is found.
    (?(name)
      # remainder wrapped in angle brackets because
      # there is a name
      (?P<brackets>(?=(<.*>$)))
      |
      # remainder does not include angle brackets without name
      (?=([^<].*[^>]$))
     )

    # Only look for a bracket if the look-ahead assertion
    # found both of them.
    (?(brackets)<|\s*)

    # The address itself: username@domain.tld
    (?P<email>
      [\w\d.+-]+       # username
      @
      ([\w\d.]+\.)+    # domain name prefix
      (com|org|edu)    # limit the allowed top-level domains
    )

    # Only look for a bracket if the look-ahead assertion
    # found both of them.
    (?(brackets)>|\s*)

    $
    ''',
    re.UNICODE | re.VERBOSE)

candidates = [
    u'First Last <first.last@example.com>',
    u'No Brackets first.last@example.com',
    u'Open Bracket <first.last@example.com',
    u'Close Bracket first.last@example.com>',
    u'no.brackets@example.com',
    ]

for candidate in candidates:
    print 'Candidate:', candidate
    match = address.search(candidate)
    if match:
        print ' Match name :', match.groupdict()['name']
        print ' Match email:', match.groupdict()['email']
    else:
        print ' No match'

This version of the email address parser uses two tests. If the name group matches, then the look-ahead assertion requires both angle brackets and sets up the brackets group. If name is not matched, the assertion requires that the rest of the text not have angle brackets around it. Later, if the brackets group is set, the actual pattern-matching code consumes the brackets in the input using literal patterns; otherwise, it consumes any blank space.

$ python re_id.py

Candidate: First Last <first.last@example.com>
 Match name : First Last
 Match email: first.last@example.com
Candidate: No Brackets first.last@example.com
 No match
Candidate: Open Bracket <first.last@example.com
 No match
Candidate: Close Bracket first.last@example.com>
 No match
Candidate: no.brackets@example.com
 Match name : None
 Match email: no.brackets@example.com
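The same conditional construct works in much smaller patterns. As a rough sketch with made-up input, this version requires a closing quote only when an opening quote was matched:

import re

# Group 1 captures an optional opening quote; (?(1)") then requires a
# matching closing quote only when group 1 actually matched.
pattern = re.compile(r'^(")?\w+(?(1)")$')
for s in ['"word"', 'word', '"word']:
    print '%-8s %s' % (s, 'match' if pattern.search(s) else 'no match')

The first two inputs match, while the unbalanced '"word' does not.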
1.3.10 Modifying Strings with Patterns

In addition to searching through text, re also supports modifying text using regular expressions as the search mechanism, and the replacements can reference groups matched in the regex as part of the substitution text. Use sub() to replace all occurrences of a pattern with another string.

import re

bold = re.compile(r'\*{2}(.*?)\*{2}')

text = 'Make this **bold**. This **too**.'

print 'Text:', text
print 'Bold:', bold.sub(r'<b>\1</b>', text)
1.3.10 Modifying Strings with Patterns

In addition to searching through text, re also supports modifying text using regular expressions as the search mechanism, and the replacements can reference groups matched in the regex as part of the substitution text. Use sub() to replace all occurrences of a pattern with another string.

import re

bold = re.compile(r'\*{2}(.*?)\*{2}')

text = 'Make this **bold**. This **too**.'

print 'Text:', text
print 'Bold:', bold.sub(r'<b>\1</b>', text)

References to the text matched by the pattern can be inserted using the \num syntax used for back-references.

$ python re_sub.py

Text: Make this **bold**. This **too**.
Bold: Make this <b>bold</b>. This <b>too</b>.

To use named groups in the substitution, use the syntax \g<name>.

import re

bold = re.compile(r'\*{2}(?P<bold_text>.*?)\*{2}', re.UNICODE)

text = 'Make this **bold**. This **too**.'

print 'Text:', text
print 'Bold:', bold.sub(r'<b>\g<bold_text></b>', text)

The \g<name> syntax also works with numbered references, and using it eliminates any ambiguity between group numbers and surrounding literal digits.

$ python re_sub_named_groups.py

Text: Make this **bold**. This **too**.
Bold: Make this <b>bold</b>. This <b>too</b>.

Pass a value to count to limit the number of substitutions performed.

import re

bold = re.compile(r'\*{2}(.*?)\*{2}', re.UNICODE)

text = 'Make this **bold**. This **too**.'

print 'Text:', text
print 'Bold:', bold.sub(r'<b>\1</b>', text, count=1)

Only the first substitution is made because count is 1.

$ python re_sub_count.py

Text: Make this **bold**. This **too**.
Bold: Make this <b>bold</b>. This **too**.

subn() works just like sub(), except that it returns both the modified string and the count of substitutions made.

import re

bold = re.compile(r'\*{2}(.*?)\*{2}', re.UNICODE)

text = 'Make this **bold**. This **too**.'

print 'Text:', text
print 'Bold:', bold.subn(r'<b>\1</b>', text)

The search pattern matches twice in the example.

$ python re_subn.py

Text: Make this **bold**. This **too**.
Bold: ('Make this <b>bold</b>. This <b>too</b>.', 2)
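sub() also accepts a callable as the replacement, which is useful when the replacement text has to be computed from the match rather than written as a fixed template. The short sketch below is an addition, not from the original text; it simply upper-cases whatever the pattern captured.

import re

bold = re.compile(r'\*{2}(.*?)\*{2}')

def make_upper(match):
    # The callable receives the MatchObject for each occurrence and
    # returns the text to substitute in its place.
    return match.group(1).upper()

print bold.sub(make_upper, 'Make this **bold**. This **too**.')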
1.3.11 Splitting with Patterns

str.split() is one of the most frequently used methods for breaking apart strings to parse them. It only supports using literal values as separators, though, and sometimes a regular expression is necessary if the input is not consistently formatted. For example, many plain-text markup languages define paragraph separators as two or more newline (\n) characters. In this case, str.split() cannot be used because of the "or more" part of the definition.

A strategy for identifying paragraphs using findall() would use a pattern like (.+?)\n{2,}.

import re

text = '''Paragraph one
on two lines.

Paragraph two.


Paragraph three.'''

for num, para in enumerate(re.findall(r'(.+?)\n{2,}',
                                      text,
                                      flags=re.DOTALL)
                           ):
    print num, repr(para)
    print

That pattern fails for paragraphs at the end of the input text, as illustrated by the fact that "Paragraph three." is not part of the output.

$ python re_paragraphs_findall.py

0 'Paragraph one\non two lines.'

1 'Paragraph two.'

Extending the pattern to say that a paragraph ends with two or more newlines or the end of input fixes the problem, but makes the pattern more complicated. Converting to re.split() instead of re.findall() handles the boundary condition automatically and keeps the pattern simpler.

import re

text = '''Paragraph one
on two lines.

Paragraph two.


Paragraph three.'''

print 'With findall:'
for num, para in enumerate(re.findall(r'(.+?)(\n{2,}|$)',
                                      text,
                                      flags=re.DOTALL)):
    print num, repr(para)
    print

print
print 'With split:'
for num, para in enumerate(re.split(r'\n{2,}', text)):
    print num, repr(para)
    print

The pattern argument to split() expresses the markup specification more precisely: two or more newline characters mark a separator point between paragraphs in the input string.

$ python re_split.py

With findall:
0 ('Paragraph one\non two lines.', '\n\n')

1 ('Paragraph two.', '\n\n\n')

2 ('Paragraph three.', '')


With split:
0 'Paragraph one\non two lines.'

1 'Paragraph two.'

2 'Paragraph three.'

Enclosing the expression in parentheses to define a group causes split() to work more like str.partition(), so it returns the separator values as well as the other parts of the string.

import re

text = '''Paragraph one
on two lines.

Paragraph two.


Paragraph three.'''

print 'With split:'
for num, para in enumerate(re.split(r'(\n{2,})', text)):
    print num, repr(para)
    print

The output now includes each paragraph, as well as the sequence of newlines separating them.

$ python re_split_groups.py

With split:
0 'Paragraph one\non two lines.'

1 '\n\n'

2 'Paragraph two.'

3 '\n\n\n'

4 'Paragraph three.'
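Like str.split(), re.split() also accepts a limit on the number of splits performed. The sketch below is not from the original text; it uses the maxsplit argument so that everything after the second separator is returned untouched as the final element.

import re

text = 'one1two2three3four'

# maxsplit limits the number of splits, analogous to the second
# argument of str.split(); the unsplit remainder stays intact.
print re.split(r'\d', text, maxsplit=2)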
See Also:
re (http://docs.python.org/library/re.html) The standard library documentation for this module.
Regular Expression HOWTO (http://docs.python.org/howto/regex.html) Andrew Kuchling's introduction to regular expressions for Python developers.
Kodos (http://kodos.sourceforge.net/) An interactive tool for testing regular expressions, created by Phil Schwartz.
Python Regular Expression Testing Tool (http://www.pythonregex.com/) A web-based tool for testing regular expressions, created by David Naffziger at BrandVerity.com and inspired by Kodos.
Regular expression (http://en.wikipedia.org/wiki/Regular_expressions) Wikipedia article that provides a general introduction to regular expression concepts and techniques.
locale (Section 15.2) Use the locale module to set the language configuration when working with Unicode text.
unicodedata (docs.python.org/library/unicodedata.html) Programmatic access to the Unicode character property database.

1.4 difflib—Compare Sequences

Purpose: Compare sequences, especially lines of text.
Python Version: 2.1 and later

The difflib module contains tools for computing and working with differences between sequences. It is especially useful for comparing text and includes functions that produce reports using several common difference formats.

The examples in this section will all use this common test data in the difflib_data.py module:
text1 = """Lorem ipsum dolor sit amet, consectetuer adipiscing
elit. Integer eu lacus accumsan arcu fermentum euismod. Donec
pulvinar porttitor tellus. Aliquam venenatis. Donec facilisis
pharetra tortor.  In nec mauris eget magna consequat
convallis. Nam sed sem vitae odio pellentesque interdum. Sed
consequat viverra nisl. Suspendisse arcu metus, blandit quis,
rhoncus ac, pharetra eget, velit. Mauris urna. Morbi nonummy
molestie orci. Praesent nisi elit, fringilla ac, suscipit non,
tristique vel, mauris. Curabitur vel lorem id nisl porta
adipiscing. Suspendisse eu lectus. In nunc. Duis vulputate
tristique enim. Donec quis lectus a justo imperdiet tempus."""

text1_lines = text1.splitlines()

text2 = """Lorem ipsum dolor sit amet, consectetuer adipiscing
elit. Integer eu lacus accumsan arcu fermentum euismod. Donec
pulvinar, porttitor tellus. Aliquam venenatis. Donec facilisis
pharetra tortor. In nec mauris eget magna consequat
convallis. Nam cras vitae mi vitae odio pellentesque interdum. Sed
consequat viverra nisl. Suspendisse arcu metus, blandit quis,
rhoncus ac, pharetra eget, velit. Mauris urna. Morbi nonummy
molestie orci. Praesent nisi elit, fringilla ac, suscipit non,
tristique vel, mauris. Curabitur vel lorem id nisl porta
adipiscing. Duis vulputate tristique enim. Donec quis lectus a
justo imperdiet tempus. Suspendisse eu lectus. In nunc."""

text2_lines = text2.splitlines()

1.4.1 Comparing Bodies of Text

The Differ class works on sequences of text lines and produces human-readable deltas, or change instructions, including differences within individual lines. The default output produced by Differ is similar to the diff command-line tool under UNIX. It includes the original input values from both lists, including common values, and markup data to indicate what changes were made.

• Lines prefixed with - indicate that they were in the first sequence, but not the second.
• Lines prefixed with + were in the second sequence, but not the first.
• If a line has an incremental difference between versions, an extra line prefixed with ? is used to highlight the change within the new version.
• If a line has not changed, it is printed with an extra blank space on the left column so that it is aligned with the other output, which may have differences.

Breaking up the text into a sequence of individual lines before passing it to compare() produces more readable output than passing it in large strings.

import difflib
from difflib_data import *

d = difflib.Differ()
diff = d.compare(text1_lines, text2_lines)
print '\n'.join(diff)

The beginning of both text segments in the sample data is the same, so the first line prints without any extra annotation.

  Lorem ipsum dolor sit amet, consectetuer adipiscing
  elit. Integer eu lacus accumsan arcu fermentum euismod. Donec

The third line of the data changes to include a comma in the modified text. Both versions of the line print, with the extra information on line five showing the column where the text is modified, including the fact that the , character is added.

- pulvinar porttitor tellus. Aliquam venenatis. Donec facilisis
+ pulvinar, porttitor tellus. Aliquam venenatis. Donec facilisis
?         +

The next few lines of the output show that an extra space is removed.

- pharetra tortor.  In nec mauris eget magna consequat
?                 -
+ pharetra tortor. In nec mauris eget magna consequat
Next, a more complex change is made, replacing several words in a phrase.

- convallis. Nam sed sem vitae odio pellentesque interdum. Sed
?                - --
+ convallis. Nam cras vitae mi vitae odio pellentesque interdum. Sed
?                +++ +++++   +

The last sentence in the paragraph is changed significantly, so the difference is represented by removing the old version and adding the new.

  consequat viverra nisl. Suspendisse arcu metus, blandit quis,
  rhoncus ac, pharetra eget, velit. Mauris urna. Morbi nonummy
  molestie orci. Praesent nisi elit, fringilla ac, suscipit non,
  tristique vel, mauris. Curabitur vel lorem id nisl porta
- adipiscing. Suspendisse eu lectus. In nunc. Duis vulputate
- tristique enim. Donec quis lectus a justo imperdiet tempus.
+ adipiscing. Duis vulputate tristique enim. Donec quis lectus a
+ justo imperdiet tempus. Suspendisse eu lectus. In nunc.

The ndiff() function produces essentially the same output. The processing is specifically tailored for working with text data and eliminating noise in the input.

Other Output Formats

While the Differ class shows all input lines, a unified diff includes only modified lines and a bit of context. In Python 2.3, the unified_diff() function was added to produce this sort of output.

import difflib
from difflib_data import *

diff = difflib.unified_diff(text1_lines,
                            text2_lines,
                            lineterm='',
                            )
print '\n'.join(list(diff))

The lineterm argument is used to tell unified_diff() to skip appending newlines to the control lines that it returns, because the input lines do not include them. Newlines are added to all lines when they are printed. The output should look familiar to users of subversion or other version control tools.

$ python difflib_unified.py

---
+++
@@ -1,11 +1,11 @@
 Lorem ipsum dolor sit amet, consectetuer adipiscing
 elit. Integer eu lacus accumsan arcu fermentum euismod. Donec
-pulvinar porttitor tellus. Aliquam venenatis. Donec facilisis
-pharetra tortor.  In nec mauris eget magna consequat
-convallis. Nam sed sem vitae odio pellentesque interdum. Sed
+pulvinar, porttitor tellus. Aliquam venenatis. Donec facilisis
+pharetra tortor. In nec mauris eget magna consequat
+convallis. Nam cras vitae mi vitae odio pellentesque interdum. Sed
 consequat viverra nisl. Suspendisse arcu metus, blandit quis,
 rhoncus ac, pharetra eget, velit. Mauris urna. Morbi nonummy
 molestie orci. Praesent nisi elit, fringilla ac, suscipit non,
 tristique vel, mauris. Curabitur vel lorem id nisl porta
-adipiscing. Suspendisse eu lectus. In nunc. Duis vulputate
-tristique enim. Donec quis lectus a justo imperdiet tempus.
+adipiscing. Duis vulputate tristique enim. Donec quis lectus a
+justo imperdiet tempus. Suspendisse eu lectus. In nunc.

Using context_diff() produces similar readable output.
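The original text only mentions context_diff() in passing. As a rough sketch (not shown in the original), it can be called exactly like unified_diff(), since the two functions share the same arguments; it renders the older "context" format with *** and --- section markers instead of a single merged hunk.

import difflib
from difflib_data import *

# context_diff() accepts the same arguments as unified_diff().
diff = difflib.context_diff(text1_lines, text2_lines, lineterm='')
print '\n'.join(list(diff))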
1.4.2 Junk Data

All of the functions that produce difference sequences accept arguments to indicate which lines should be ignored and which characters within a line should be ignored. These parameters can be used to skip over markup or whitespace changes in two versions of a file, for example.

# This example is adapted from the source for difflib.py.

from difflib import SequenceMatcher

def show_results(s):
    i, j, k = s.find_longest_match(0, 5, 0, 9)
    print '  i = %d' % i
    print '  j = %d' % j
    print '  k = %d' % k
    print '  A[i:i+k] = %r' % A[i:i+k]
    print '  B[j:j+k] = %r' % B[j:j+k]

A = " abcd"
B = "abcd abcd"

print 'A = %r' % A
print 'B = %r' % B

print '\nWithout junk detection:'
show_results(SequenceMatcher(None, A, B))

print '\nTreat spaces as junk:'
show_results(SequenceMatcher(lambda x: x==" ", A, B))

The default for Differ is to not ignore any lines or characters explicitly, but to rely on the ability of SequenceMatcher to detect noise. The default for ndiff() is to ignore space and tab characters.

$ python difflib_junk.py

A = ' abcd'
B = 'abcd abcd'

Without junk detection:
  i = 0
  j = 4
  k = 5
  A[i:i+k] = ' abcd'
  B[j:j+k] = ' abcd'

Treat spaces as junk:
  i = 1
  j = 0
  k = 4
  A[i:i+k] = 'abcd'
  B[j:j+k] = 'abcd'
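Beyond locating matching blocks, SequenceMatcher can also condense a comparison into a single similarity score with ratio(). The snippet below is an addition, not from the original text, reusing the same two strings as the junk-detection example.

import difflib

# ratio() returns a float between 0.0 (completely different) and
# 1.0 (identical sequences).
matcher = difflib.SequenceMatcher(None, ' abcd', 'abcd abcd')
print 'Similarity: %.3f' % matcher.ratio()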
1.4.3 Comparing Arbitrary Types

The SequenceMatcher class compares two sequences of any type, as long as the values are hashable. It uses an algorithm to identify the longest contiguous matching blocks from the sequences, eliminating junk values that do not contribute to the real data.

import difflib
from difflib_data import *

s1 = [ 1, 2, 3, 5, 6, 4 ]
s2 = [ 2, 3, 5, 4, 6, 1 ]

print 'Initial data:'
print 's1 =', s1
print 's2 =', s2
print 's1 == s2:', s1==s2
print

matcher = difflib.SequenceMatcher(None, s1, s2)
for tag, i1, i2, j1, j2 in reversed(matcher.get_opcodes()):

    if tag == 'delete':
        print 'Remove %s from positions [%d:%d]' % \
            (s1[i1:i2], i1, i2)
        del s1[i1:i2]

    elif tag == 'equal':
        print 's1[%d:%d] and s2[%d:%d] are the same' % \
            (i1, i2, j1, j2)

    elif tag == 'insert':
        print 'Insert %s from s2[%d:%d] into s1 at %d' % \
            (s2[j1:j2], j1, j2, i1)
        s1[i1:i2] = s2[j1:j2]

    elif tag == 'replace':
        print 'Replace %s from s1[%d:%d] with %s from s2[%d:%d]' % (
            s1[i1:i2], i1, i2, s2[j1:j2], j1, j2)
        s1[i1:i2] = s2[j1:j2]

    print '  s1 =', s1

print 's1 == s2:', s1==s2

This example compares two lists of integers and uses get_opcodes() to derive the instructions for converting the original list into the newer version. The modifications are applied in reverse order so that the list indexes remain accurate after items are added and removed.

$ python difflib_seq.py

Initial data:
s1 = [1, 2, 3, 5, 6, 4]
s2 = [2, 3, 5, 4, 6, 1]
s1 == s2: False

Replace [4] from s1[5:6] with [1] from s2[5:6]
  s1 = [1, 2, 3, 5, 6, 1]
s1[4:5] and s2[4:5] are the same
  s1 = [1, 2, 3, 5, 6, 1]
Insert [4] from s2[3:4] into s1 at 4
  s1 = [1, 2, 3, 5, 4, 6, 1]
s1[1:4] and s2[0:3] are the same
  s1 = [1, 2, 3, 5, 4, 6, 1]
Remove [1] from positions [0:1]
  s1 = [2, 3, 5, 4, 6, 1]
s1 == s2: True

SequenceMatcher works with custom classes, as well as built-in types, as long as they are hashable.
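difflib also layers a convenience helper on top of SequenceMatcher for picking the best matches out of a list of candidates. The call below is a sketch added here for illustration and is not part of the original text.

import difflib

# get_close_matches() ranks the possibilities by similarity to the
# first argument and returns the best matches above a cutoff.
print difflib.get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'])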
See Also:
difflib (http://docs.python.org/library/difflib.html) The standard library documentation for this module.
Pattern Matching: The Gestalt Approach (http://www.ddj.com/documents/s=1103/ddj8807c/) Discussion of a similar algorithm by John W. Ratcliff and D. E. Metzener, published in Dr. Dobb's Journal in July 1988.

Chapter 2  Data Structures

Python includes several standard programming data structures, such as list, tuple, dict, and set, as part of its built-in types. Many applications do not require other structures, but when they do, the standard library provides powerful and well-tested versions that are ready to be used.

The collections module includes implementations of several data structures that extend those found in other modules. For example, Deque is a double-ended queue that allows items to be added or removed from either end. defaultdict is a dictionary that responds with a default value if a key is missing, while OrderedDict remembers the sequence in which items are added. And namedtuple extends the normal tuple to give each member item an attribute name in addition to a numeric index.

For large amounts of data, an array may make more efficient use of memory than a list. Since the array is limited to a single data type, it can use a more compact memory representation than a general-purpose list. At the same time, arrays can be manipulated using many of the same methods as a list, so it may be possible to replace a list with an array in an application without a lot of other changes.

Sorting the items in a sequence is a fundamental aspect of data manipulation. Python's list includes a sort() method, but sometimes it is more efficient to maintain a list in sorted order without resorting it each time its contents change. The functions in heapq modify the contents of a list while maintaining the sort order of the list with low overhead.

Another option for building sorted lists or arrays is bisect. It uses a binary search to find the insertion point for new items and is an alternative to repeatedly sorting a list that changes frequently.

Although the built-in list can simulate a queue using the insert() and pop() methods, it is not thread-safe. For true ordered communication between threads, use the Queue module. multiprocessing includes a version of Queue that works between processes, making it easier to convert a multithreaded program to use processes instead of threads.

struct is useful for decoding data from another application, perhaps coming from a binary file or stream of data, into Python's native types for easier manipulation.

This chapter covers two modules related to memory management. For highly interconnected data structures, such as graphs and trees, use weakref to maintain references while still allowing the garbage collector to clean up objects after they are no longer needed. The functions in copy are used for duplicating data structures and their contents, including making recursive copies with deepcopy().

Debugging data structures can be time consuming, especially when wading through printed output of large sequences or dictionaries. Use pprint to create easy-to-read representations that can be printed to the console or written to a log file for easier debugging.

And, finally, if the available types do not meet the requirements, subclass one of the native types and customize it, or build a new container type using one of the abstract base classes defined in collections as a starting point.

2.1 collections—Container Data Types

Purpose: Container data types.
Python Version: 2.4 and later

The collections module includes container data types beyond the built-in types list, dict, and tuple.

2.1.1 Counter

A Counter is a container that keeps track of how many times equivalent values are added. It can be used to implement the same algorithms for which bag or multiset data structures are commonly used in other languages.

Initializing

Counter supports three forms of initialization. Its constructor can be called with a sequence of items, a dictionary containing keys and counts, or using keyword arguments mapping string names to counts.

import collections

print collections.Counter(['a', 'b', 'c', 'a', 'b', 'b'])
print collections.Counter({'a':2, 'b':3, 'c':1})
print collections.Counter(a=2, b=3, c=1)

The results of all three forms of initialization are the same.

$ python collections_counter_init.py

Counter({'b': 3, 'a': 2, 'c': 1})
Counter({'b': 3, 'a': 2, 'c': 1})
Counter({'b': 3, 'a': 2, 'c': 1})

An empty Counter can be constructed with no arguments and populated via the update() method.

import collections

c = collections.Counter()
print 'Initial :', c

c.update('abcdaab')
print 'Sequence:', c

c.update({'a':1, 'd':5})
print 'Dict    :', c

The count values are increased based on the new data, rather than replaced. In this example, the count for a goes from 3 to 4.

$ python collections_counter_update.py

Initial : Counter()
Sequence: Counter({'a': 3, 'b': 2, 'c': 1, 'd': 1})
Dict    : Counter({'d': 6, 'a': 4, 'b': 2, 'c': 1})
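update() also accepts another Counter (or any mapping), in which case the counts are added together rather than replaced. The short sketch below is an addition and is not part of the original text.

import collections

c = collections.Counter('abcdaab')

# Passing another Counter to update() accumulates counts, just as
# the dict argument did in the example above.
c.update(collections.Counter('aabbb'))
print c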
Accessing Counts

Once a Counter is populated, its values can be retrieved using the dictionary API.

import collections

c = collections.Counter('abcdaab')

for letter in 'abcde':
    print '%s : %d' % (letter, c[letter])

Counter does not raise KeyError for unknown items. If a value has not been seen in the input (as with e in this example), its count is 0.

$ python collections_counter_get_values.py

a : 3
b : 2
c : 1
d : 1
e : 0

The elements() method returns an iterator that produces all items known to the Counter.

import collections

c = collections.Counter('extremely')
c['z'] = 0
print c
print list(c.elements())

The order of elements is not guaranteed, and items with counts less than or equal to zero are not included.

$ python collections_counter_elements.py

Counter({'e': 3, 'm': 1, 'l': 1, 'r': 1, 't': 1, 'y': 1, 'x': 1, 'z': 0})
['e', 'e', 'e', 'm', 'l', 'r', 't', 'y', 'x']

Use most_common() to produce a sequence of the n most frequently encountered input values and their respective counts.

import collections

c = collections.Counter()
with open('/usr/share/dict/words', 'rt') as f:
    for line in f:
        c.update(line.rstrip().lower())

print 'Most common:'
for letter, count in c.most_common(3):
    print '%s: %7d' % (letter, count)

This example counts the letters appearing in all words in the system dictionary to produce a frequency distribution, and then prints the three most common letters. Leaving out the argument to most_common() produces a list of all the items, in order of frequency.

$ python collections_counter_most_common.py

Most common:
e:  234803
i:  200613
a:  198938
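As a quick illustration of the no-argument form just mentioned, the snippet below (an addition, not from the original text) returns every element, ordered from most to least common.

import collections

c = collections.Counter('abracadabra')

# With no argument, most_common() lists every element; elements
# with equal counts appear in an arbitrary relative order.
print c.most_common()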
This example counts the letters appearing in all words in the system dictionary to produce a frequency distribution, and then prints the three most common letters. Leaving out the argument to most_common() produces a list of all the items, in order of frequency.

$ python collections_counter_most_common.py

Most common:
e:  234803
i:  200613
a:  198938

Arithmetic

Counter instances support arithmetic and set operations for aggregating results.

import collections

c1 = collections.Counter(['a', 'b', 'c', 'a', 'b', 'b'])
c2 = collections.Counter('alphabet')

print 'C1:', c1
print 'C2:', c2

print '\nCombined counts:'
print c1 + c2

print '\nSubtraction:'
print c1 - c2

print '\nIntersection (taking positive minimums):'
print c1 & c2

print '\nUnion (taking maximums):'
print c1 | c2
Each time a new Counter is produced through an operation, any items with zero or negative counts are discarded. The count for a is the same in c1 and c2, so subtraction leaves it at zero.

$ python collections_counter_arithmetic.py

C1: Counter({'b': 3, 'a': 2, 'c': 1})
C2: Counter({'a': 2, 'b': 1, 'e': 1, 'h': 1, 'l': 1, 'p': 1, 't': 1})

Combined counts:
Counter({'a': 4, 'b': 4, 'c': 1, 'e': 1, 'h': 1, 'l': 1, 'p': 1, 't': 1})

Subtraction:
Counter({'b': 2, 'c': 1})

Intersection (taking positive minimums):
Counter({'a': 2, 'b': 1})

Union (taking maximums):
Counter({'b': 3, 'a': 2, 'c': 1, 'e': 1, 'h': 1, 'l': 1, 'p': 1, 't': 1})
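As one more quick illustration of this kind of aggregation, the following minimal sketch (not one of the book's example files; the sample sentence is made up for the illustration) tallies whitespace-separated words and reports the most common ones.

import collections

# Tally whitespace-separated words, then report the two most common.
phrase = 'the quick brown fox jumps over the lazy dog and the cat'
word_counts = collections.Counter(phrase.split())

for word, count in word_counts.most_common(2):
    print '%-5s : %d' % (word, count)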
2.1.2 defaultdict

The standard dictionary includes the method setdefault() for retrieving a value and establishing a default if the value does not exist. By contrast, defaultdict lets the caller specify the default up front, when the container is initialized.

import collections

def default_factory():
    return 'default value'

d = collections.defaultdict(default_factory, foo='bar')
print 'd:', d
print 'foo =>', d['foo']
print 'bar =>', d['bar']

This method works well, as long as it is appropriate for all keys to have the same default. It can be especially useful if the default is a type used for aggregating or accumulating values, such as a list, set, or even int. The standard library documentation includes several examples of using defaultdict this way.

$ python collections_defaultdict.py

d: defaultdict(<function default_factory at 0x...>, {'foo': 'bar'})
foo => bar
bar => default value
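To make the aggregation pattern mentioned above concrete, here is a minimal sketch (not one of the book's example files; the word list is made up) that uses list as the default factory to group words by their first letter.

import collections

# Group words by first letter; a missing key starts out as an empty list.
by_letter = collections.defaultdict(list)
for word in ['apple', 'bat', 'bar', 'atom', 'book']:
    by_letter[word[0]].append(word)

print dict(by_letter)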
See Also:
defaultdict examples (http://docs.python.org/lib/defaultdict-examples.html) Examples of using defaultdict from the standard library documentation.
Evolution of Default Dictionaries in Python (http://jtauber.com/blog/2008/02/27/evolution_of_default_dictionaries_in_python/) Discussion from James Tauber of how defaultdict relates to other means of initializing dictionaries.

2.1.3 Deque

A double-ended queue, or deque, supports adding and removing elements from either end. The more commonly used structures, stacks and queues, are degenerate forms of deques, where the inputs and outputs are restricted to a single end.

import collections

d = collections.deque('abcdefg')
print 'Deque:', d
print 'Length:', len(d)
print 'Left end:', d[0]
print 'Right end:', d[-1]

d.remove('c')
print 'remove(c):', d

Since deques are a type of sequence container, they support some of the same operations as list, such as examining the contents with __getitem__(), determining length, and removing elements from the middle by matching identity.
$ python collections_deque.py

Deque: deque(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
Length: 7
Left end: a
Right end: g
remove(c): deque(['a', 'b', 'd', 'e', 'f', 'g'])

Populating

A deque can be populated from either end, termed "left" and "right" in the Python implementation.

import collections

# Add to the right
d1 = collections.deque()
d1.extend('abcdefg')
print 'extend    :', d1
d1.append('h')
print 'append    :', d1

# Add to the left
d2 = collections.deque()
d2.extendleft(xrange(6))
print 'extendleft:', d2
d2.appendleft(6)
print 'appendleft:', d2

The extendleft() function iterates over its input and performs the equivalent of an appendleft() for each item. The end result is that the deque contains the input sequence in reverse order.

$ python collections_deque_populating.py

extend    : deque(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
append    : deque(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
extendleft: deque([5, 4, 3, 2, 1, 0])
appendleft: deque([6, 5, 4, 3, 2, 1, 0])
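A related option, noted here as an aside rather than as one of the book's examples: deque() also accepts an optional maxlen argument (available in Python 2.6 and later). Once a bounded deque is full, appending to one end silently discards items from the opposite end. A minimal sketch:

import collections

# A bounded deque holds only the three most recently added items.
recent = collections.deque(maxlen=3)
for i in xrange(5):
    recent.append(i)
    print i, ':', recent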
Consuming

Similarly, the elements of the deque can be consumed from both ends or from either end, depending on the algorithm being applied.

import collections

print 'From the right:'
d = collections.deque('abcdefg')
while True:
    try:
        print d.pop(),
    except IndexError:
        break
print

print '\nFrom the left:'
d = collections.deque(xrange(6))
while True:
    try:
        print d.popleft(),
    except IndexError:
        break
print

Use pop() to remove an item from the right end of the deque and popleft() to take an item from the left end.

$ python collections_deque_consuming.py

From the right:
g f e d c b a

From the left:
0 1 2 3 4 5

Since deques are thread-safe, the contents can even be consumed from both ends at the same time from separate threads.

import collections
import threading
import time

candle = collections.deque(xrange(5))

def burn(direction, nextSource):
    while True:
        try:
            next = nextSource()
        except IndexError:
            break
        else:
            print '%8s: %s' % (direction, next)
            time.sleep(0.1)
    print '%8s done' % direction
    return

left = threading.Thread(target=burn, args=('Left', candle.popleft))
right = threading.Thread(target=burn, args=('Right', candle.pop))

left.start()
right.start()

left.join()
right.join()
The threads in this example alternate between each end, removing items until the deque is empty.

$ python collections_deque_both_ends.py

    Left: 0
   Right: 4
   Right: 3
    Left: 1
   Right: 2
    Left done
   Right done

Rotating

Another useful capability of the deque is to rotate it in either direction, so as to skip over some items.

import collections

d = collections.deque(xrange(10))
print 'Normal        :', d

d = collections.deque(xrange(10))
d.rotate(2)
print 'Right rotation:', d

d = collections.deque(xrange(10))
d.rotate(-2)
print 'Left rotation :', d

Rotating the deque to the right (using a positive rotation) takes items from the right end and moves them to the left end. Rotating to the left (with a negative value) takes items from the left end and moves them to the right end. It may help to visualize the items in the deque as being engraved along the edge of a dial.
$ python collections_deque_rotate.py

Normal        : deque([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Right rotation: deque([8, 9, 0, 1, 2, 3, 4, 5, 6, 7])
Left rotation : deque([2, 3, 4, 5, 6, 7, 8, 9, 0, 1])

See Also:
Deque (http://en.wikipedia.org/wiki/Deque) Wikipedia article that provides a discussion of the deque data structure.
Deque Recipes (http://docs.python.org/lib/deque-recipes.html) Examples of using deques in algorithms from the standard library documentation.

2.1.4 namedtuple

The standard tuple uses numerical indexes to access its members.

bob = ('Bob', 30, 'male')
print 'Representation:', bob

jane = ('Jane', 29, 'female')
print '\nField by index:', jane[0]

print '\nFields by index:'
for p in [ bob, jane ]:
    print '%s is a %d year old %s' % p

This makes tuples convenient containers for simple uses.

$ python collections_tuple.py

Representation: ('Bob', 30, 'male')

Field by index: Jane

Fields by index:
Bob is a 30 year old male
Jane is a 29 year old female
On the other hand, remembering which index should be used for each value can lead to errors, especially if the tuple has a lot of fields and is constructed far from where it is used. A namedtuple assigns names, as well as the numerical index, to each member.

Defining

namedtuple instances are just as memory efficient as regular tuples because they do not have per-instance dictionaries. Each kind of namedtuple is represented by its own class, created by using the namedtuple() factory function. The arguments are the name of the new class and a string containing the names of the elements.

import collections

Person = collections.namedtuple('Person', 'name age gender')

print 'Type of Person:', type(Person)

bob = Person(name='Bob', age=30, gender='male')
print '\nRepresentation:', bob

jane = Person(name='Jane', age=29, gender='female')
print '\nField by name:', jane.name

print '\nFields by index:'
for p in [ bob, jane ]:
    print '%s is a %d year old %s' % p
As the example illustrates, it is possible to access the fields of the namedtuple by name using dotted notation (obj.attr), as well as by using the positional indexes of standard tuples.

$ python collections_namedtuple_person.py

Type of Person: <type 'type'>

Representation: Person(name='Bob', age=30, gender='male')

Field by name: Jane

Fields by index:
Bob is a 30 year old male
Jane is a 29 year old female

Invalid Field Names

Field names are invalid if they are repeated or if they conflict with Python keywords.

import collections

try:
    collections.namedtuple('Person', 'name class age gender')
except ValueError, err:
    print err

try:
    collections.namedtuple('Person', 'name age gender age')
except ValueError, err:
    print err

As the field names are parsed, invalid values cause ValueError exceptions.

$ python collections_namedtuple_bad_fields.py

Type names and field names cannot be a keyword: 'class'
Encountered duplicate field name: 'age'
If a namedtuple is being created based on values outside the control of the program (such as when representing the rows returned by a database query, where the schema is not known in advance), set the rename option to True so the invalid fields are renamed.

import collections

with_class = collections.namedtuple(
    'Person', 'name class age gender',
    rename=True)
print with_class._fields

two_ages = collections.namedtuple(
    'Person', 'name age gender age',
    rename=True)
print two_ages._fields
The new names for renamed fields depend on their index in the tuple, so the field with the name class becomes _1 and the duplicate age field is changed to _3.

$ python collections_namedtuple_rename.py

('name', '_1', 'age', 'gender')
('name', 'age', 'gender', '_3')
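namedtuple classes also provide a few helper methods beyond normal tuple behavior, among them _asdict() and _replace() (available in Python 2.6 and later). The following brief sketch, not one of the book's example files, shows both: _asdict() returns the field values as a dictionary keyed by field name (an OrderedDict in Python 2.7), and _replace() builds a new instance with selected fields changed.

import collections

Person = collections.namedtuple('Person', 'name age gender')
bob = Person(name='Bob', age=30, gender='male')

# Field values keyed by field name.
print bob._asdict()

# A new instance with the age field changed; the original is untouched.
older_bob = bob._replace(age=31)
print older_bob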
2.1.5 OrderedDict

An OrderedDict is a dictionary subclass that remembers the order in which its contents are added.

import collections

print 'Regular dictionary:'
d = {}
d['a'] = 'A'
d['b'] = 'B'
d['c'] = 'C'

for k, v in d.items():
    print k, v

print '\nOrderedDict:'
d = collections.OrderedDict()
d['a'] = 'A'
d['b'] = 'B'
d['c'] = 'C'

for k, v in d.items():
    print k, v

A regular dict does not track the insertion order, and iterating over it produces the values in an order based on how the keys are stored in the hash table. In an OrderedDict, by contrast, the order in which the items are inserted is remembered and used when creating an iterator.

$ python collections_ordereddict_iter.py

Regular dictionary:
a A
c C
b B

OrderedDict:
a A
b B
c C

Equality

A regular dict looks at its contents when testing for equality. An OrderedDict also considers the order in which the items were added.

import collections

print 'dict       :',
d1 = {}
d1['a'] = 'A'
d1['b'] = 'B'
d1['c'] = 'C'

d2 = {}
d2['c'] = 'C'
d2['b'] = 'B'
d2['a'] = 'A'

print d1 == d2

print 'OrderedDict:',

d1 = collections.OrderedDict()
d1['a'] = 'A'
d1['b'] = 'B'
d1['c'] = 'C'

d2 = collections.OrderedDict()
d2['c'] = 'C'
d2['b'] = 'B'
d2['a'] = 'A'

print d1 == d2
In this case, since the two ordered dictionaries are created from values in a different order, they are considered to be different.

$ python collections_ordereddict_equality.py

dict       : True
OrderedDict: False

See Also:
collections (http://docs.python.org/library/collections.html) The standard library documentation for this module.

2.2 array—Sequence of Fixed-Type Data

Purpose: Manage sequences of fixed-type numerical data efficiently.
Python Version: 1.4 and later

The array module defines a sequence data structure that looks very much like a list, except that all members have to be of the same primitive type. Refer to the standard library documentation for array for a complete list of the supported types.

2.2.1 Initialization

An array is instantiated with an argument describing the type of data to be allowed, and possibly an initial sequence of data to store in the array.

import array
import binascii

s = 'This is the array.'
a = array.array('c', s)

print 'As string:', s
print 'As array :', a
print 'As hex   :', binascii.hexlify(a)

In this example, the array is configured to hold a sequence of bytes and is initialized with a simple string.

$ python array_string.py

As string: This is the array.
As array : array('c', 'This is the array.')
As hex   : 54686973206973207468652061727261792e
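The type code is not limited to byte strings; any of the codes listed in the standard library documentation can be used. As a minimal sketch (not one of the book's example files), the following creates an array of double-precision floats from a list of numbers and reports its storage characteristics.

import array

# 'd' selects double-precision floating point items.
a = array.array('d', [1.0, 2.5, 3.25])

print 'Type code:', a.typecode
print 'Item size:', a.itemsize, 'bytes'
print 'As list  :', a.tolist()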
2.2.2 Manipulating Arrays

An array can be extended and otherwise manipulated in the same ways as other Python sequences.

import array
import pprint

a = array.array('i', xrange(3))
print 'Initial :', a

a.extend(xrange(3))
print 'Extended:', a

print 'Slice   :', a[2:5]

print 'Iterator:'
print list(enumerate(a))

The supported operations include slicing, iterating, and adding elements to the end.

$ python array_sequence.py

Initial : array('i', [0, 1, 2])
Extended: array('i', [0, 1, 2, 0, 1, 2])
Slice   : array('i', [2, 0, 1])
Iterator:
[(0, 0), (1, 1), (2, 2), (3, 0), (4, 1), (5, 2)]
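Beyond slicing and extending, array also supports several other list-like methods, such as insert(), index(), and reverse(). A minimal sketch (not one of the book's example files):

import array

a = array.array('i', [0, 1, 2, 3, 4])

a.insert(1, 10)          # insert 10 before position 1
print 'insert :', a

print 'index  :', a.index(10)   # position of the first occurrence of 10

a.reverse()              # reverse the array in place
print 'reverse:', a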
2.2.3 Arrays and Files

The contents of an array can be written to, and read from, files using built-in methods coded efficiently for that purpose.

import array
import binascii
import tempfile

a = array.array('i', xrange(5))
print 'A1:', a

# Write the array of numbers to a temporary file
output = tempfile.NamedTemporaryFile()
a.tofile(output.file)  # must pass an *actual* file
output.flush()

# Read the raw data
with open(output.name, 'rb') as input:
    raw_data = input.read()
    print 'Raw Contents:', binascii.hexlify(raw_data)

    # Read the data into an array
    input.seek(0)
    a2 = array.array('i')
    a2.fromfile(input, len(a))
    print 'A2:', a2

This example illustrates reading the data "raw," directly from the binary file, versus reading it into a new array and converting the bytes to the appropriate types.

$ python array_file.py

A1: array('i', [0, 1, 2, 3, 4])
Raw Contents: 0000000001000000020000000300000004000000
A2: array('i', [0, 1, 2, 3, 4])

2.2.4 Alternate Byte Ordering

If the data in the array is not in the native byte order, or if it needs to be swapped before being sent to a system with a different byte order (or over the network), it is possible to convert the entire array without iterating over the elements from Python.
import array
import binascii

def to_hex(a):
    chars_per_item = a.itemsize * 2  # 2 hex digits
    hex_version = binascii.hexlify(a)
    num_chunks = len(hex_version) / chars_per_item
    for i in xrange(num_chunks):
        start = i * chars_per_item
        end = start + chars_per_item
        yield hex_version[start:end]

a1 = array.array('i', xrange(5))
a2 = array.array('i', xrange(5))
a2.byteswap()

fmt = '%10s %10s %10s %10s'
print fmt % ('A1 hex', 'A1', 'A2 hex', 'A2')
print fmt % (('-' * 10,) * 4)
for values in zip(to_hex(a1), a1, to_hex(a2), a2):
    print fmt % values

The byteswap() method switches the byte order of the items in the array from within C, so it is much more efficient than looping over the data in Python.
2.3 heapq—堆排序算法

作用:heapq 模块实现了一个适用于 Python 列表的最小堆排序算法。
Python 版本:2.3 版本中新增,并在 2.5 版本中做了补充

堆(heap)是一个树形数据结构,其中子节点与父节点是一种有序关系。二叉堆(binary heap)可以用一个按如下方式组织的列表或数组表示:元素 N 的子元素位于 2*N+1 和 2*N+2(索引从 0 开始)。这种布局允许原地重新组织堆,从而不必在增加或删除元素时重新分配大量内存。

最大堆(max-heap)确保父节点大于或等于其两个子节点。最小堆(min-heap)要求父节点小于或等于其子节点。Python 的 heapq 模块实现了一个最小堆。
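为了说明上述布局,可以用一小段代码验证元素 N 的子元素确实位于 2*N+1 和 2*N+2。下面的示意并非原书示例,仅用来演示索引关系。

import heapq

data = [19, 9, 4, 10, 11]
heapq.heapify(data)

# With zero-based indexes, the children of element N
# are found at positions 2*N+1 and 2*N+2.
for n in range(len(data)):
    children = [data[i]
                for i in (2 * n + 1, 2 * n + 2)
                if i < len(data)]
    print 'parent %2d -> children %s' % (data[n], children)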
2.3.1 示例数据

本节中的示例将使用 heapq_heapdata.py 中的数据。

# This data was generated with the random module.

data = [19, 9, 4, 10, 11]

堆输出使用 heapq_showtree.py 打印。

import math
from cStringIO import StringIO

def show_tree(tree, total_width=36, fill=' '):
    """Pretty-print a tree."""
    output = StringIO()
    last_row = -1
    for i, n in enumerate(tree):
        if i:
            row = int(math.floor(math.log(i+1, 2)))
        else:
            row = 0
        if row != last_row:
            output.write('\n')
        columns = 2**row
        col_width = int(math.floor((total_width * 1.0) / columns))
        output.write(str(n).center(col_width, fill))
        last_row = row
    print output.getvalue()
    print '-' * total_width
    print
    return

2.3.2 创建堆

创建堆有两种基本方式:heappush() 和 heapify()。

import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data

heap = []
print 'random :', data
print

for n in data:
    print 'add %3d:' % n
    heapq.heappush(heap, n)
    show_tree(heap)

使用 heappush() 时,每从数据源增加一个新元素,都会保持元素的堆排序顺序。

$ python heapq_heappush.py

random : [19, 9, 4, 10, 11]

add  19:

                 19
------------------------------------

add   9:

                 9
        19
------------------------------------

add   4:

                 4
        19                9
------------------------------------

add  10:

                 4
        10                9
   19
------------------------------------

add  11:

                 4
        10                9
   19       11
------------------------------------
如果数据已经在内存中,使用 heapify() 原地重新组织列表中的元素会更高效。

import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data

print 'random :', data
heapq.heapify(data)
print 'heapified :'
show_tree(data)

如果按堆顺序一次一个元素地构建列表,其结果与构建一个无序列表之后再调用 heapify() 是一样的。

$ python heapq_heapify.py

random : [19, 9, 4, 10, 11]
heapified :

                 4
        9                 19
   10       11
------------------------------------

2.3.3 访问堆的内容

一旦堆已正确组织,就可以使用 heappop() 删除有最小值的元素。

import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data

print 'random :', data
heapq.heapify(data)
print 'heapified :'
show_tree(data)
print

for i in xrange(2):
    smallest = heapq.heappop(data)
    print 'pop %3d:' % smallest
    show_tree(data)
这个例子改编自标准库文档,其中使用 heapify() 和 heappop() 对一个数字列表排序。

$ python heapq_heappop.py

random : [19, 9, 4, 10, 11]
heapified :

                 4
        9                 19
   10       11
------------------------------------

pop   4:

                 9
        10                19
   11
------------------------------------

pop   9:

                 10
        11                19
------------------------------------

如果希望在一个操作中删除现有元素并替换为新值,可以使用 heapreplace()。

import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data

heapq.heapify(data)
print 'start:'
show_tree(data)

for n in [0, 13]:
    smallest = heapq.heapreplace(data, n)
    print 'replace %2d with %2d:' % (smallest, n)
    show_tree(data)
通过原地替换元素,可以维护一个固定大小的堆,例如按优先级排序的作业队列。

$ python heapq_heapreplace.py

start:

                 4
        9                 19
   10       11
------------------------------------

replace  4 with  0:

                 0
        9                 19
   10       11
------------------------------------

replace  0 with 13:

                 9
        10                19
   13       11
------------------------------------
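下面给出一个利用这种原地替换思路维护固定大小堆的简单示意:从一个数据流中只保留最大的 3 个值。这并非原书示例,数据也是虚构的,仅用来说明用法。

import heapq

stream = [19, 9, 4, 10, 11, 8, 2, 1, 23, 7]

# Keep the 3 largest values seen so far in a min-heap:
# the smallest of the current "top 3" sits at top[0] and is
# replaced in place whenever a larger value arrives.
top = stream[:3]
heapq.heapify(top)
for n in stream[3:]:
    if n > top[0]:
        heapq.heapreplace(top, n)

print 'top 3:', sorted(top, reverse=True)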
2.3.4 堆的数据极值

heapq 还包括两个检查可迭代对象的函数,用来找出其中最大或最小的一组值。

import heapq
from heapq_heapdata import data

print 'all :', data
print '3 largest :', heapq.nlargest(3, data)
print 'from sort :', list(reversed(sorted(data)[-3:]))
print '3 smallest:', heapq.nsmallest(3, data)
print 'from sort :', sorted(data)[:3]

只有当 n 值(n > 1)相对较小时,使用 nlargest() 和 nsmallest() 才算高效,不过有些情况下这两个函数仍然很方便。

$ python heapq_extremes.py

all : [19, 9, 4, 10, 11]
3 largest : [19, 11, 10]
from sort : [19, 11, 10]
3 smallest: [4, 9, 10]
from sort : [4, 9, 10]
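从 Python 2.5 起,nlargest() 和 nsmallest() 还接受一个 key 参数,可以按记录的某个字段取极值。下面是一个简单示意(数据为虚构,并非原书示例)。

import heapq

jobs = [('backup', 3), ('index', 10), ('report', 1), ('clean', 7)]

# Pick the two records with the largest numeric field.
print heapq.nlargest(2, jobs, key=lambda job: job[1])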
参见:
heapq (http://docs.python.org/library/heapq.html) 这个模块的标准库文档。
Heap (data structure) (http://en.wikipedia.org/wiki/Heap_(data_structure)) 维基百科文章,提供了对堆数据结构的一般描述。
2.5.3 节 基于标准库中 Queue(2.5 节)的一个优先队列实现。

2.4 bisect—维护有序列表

作用:维护有序列表,而不必在每次向列表增加一个元素时都调用 sort 排序。
Python 版本:1.4 及以后版本

bisect 模块实现了一个算法,用于向列表中插入元素,同时仍保持列表有序。有些情况下,这比反复对一个列表排序更高效,也比构建一个大列表之后再显式对其排序更为高效。

2.4.1 有序插入

下面给出一个简单的例子,这里使用 insort() 按有序顺序向一个列表中插入元素。

import bisect
import random

# Use a constant seed to ensure that
# the same pseudo-random numbers
# are used each time the loop is run.
random.seed(1)

print 'New Pos Contents'
print '--- --- --------'

# Generate random numbers and
# insert them into a list in sorted
# order.
l = []
for i in range(1, 15):
    r = random.randint(1, 100)
    position = bisect.bisect(l, r)
    bisect.insort(l, r)
    print '%3d %3d' % (r, position), l

输出的第一列显示了新生成的随机数,第二列显示了这个数将插入到列表的哪个位置,每一行余下的部分则是当前的有序列表。

$ python bisect_example.py

New Pos Contents
--- --- --------
 14   0 [14]
 85   1 [14, 85]
 77   1 [14, 77, 85]
 26   1 [14, 26, 77, 85]
 50   2 [14, 26, 50, 77, 85]
 45   2 [14, 26, 45, 50, 77, 85]
 66   4 [14, 26, 45, 50, 66, 77, 85]
 79   6 [14, 26, 45, 50, 66, 77, 79, 85]
 10   0 [10, 14, 26, 45, 50, 66, 77, 79, 85]
  3   0 [3, 10, 14, 26, 45, 50, 66, 77, 79, 85]
 84   9 [3, 10, 14, 26, 45, 50, 66, 77, 79, 84, 85]
 44   4 [3, 10, 14, 26, 44, 45, 50, 66, 77, 79, 84, 85]
 77   9 [3, 10, 14, 26, 44, 45, 50, 66, 77, 77, 79, 84, 85]
  1   0 [1, 3, 10, 14, 26, 44, 45, 50, 66, 77, 77, 79, 84, 85]
这是一个很简单的例子,就此例处理的数据量而言,直接构建列表然后完成一次排序可能速度更快。不过对于长列表而言,使用类似这样的插入排序算法可以大大节省时间和内存。

2.4.2 处理重复

之前显示的结果集中包括一个重复的值 77。bisect 模块提供了两种方法来处理重复:新值可以插入到现有值的左边,也可以插入到右边。insort() 函数实际上是 insort_right() 的别名,这个函数会在现有值之后插入新值;相应的函数 insort_left() 则在现有值之前插入新值。

import bisect
import random

# Reset the seed
random.seed(1)

print 'New Pos Contents'
print '--- --- --------'

# Use bisect_left and insort_left.
l = []
for i in range(1, 15):
    r = random.randint(1, 100)
    position = bisect.bisect_left(l, r)
    bisect.insort_left(l, r)
    print '%3d %3d' % (r, position), l

使用 bisect_left() 和 insort_left() 处理同样的数据时,结果会得到相同的有序列表,不过重复值插入的位置有所不同。

$ python bisect_example2.py

New Pos Contents
--- --- --------
 14   0 [14]
 85   1 [14, 85]
 77   1 [14, 77, 85]
 26   1 [14, 26, 77, 85]
 50   2 [14, 26, 50, 77, 85]
 45   2 [14, 26, 45, 50, 77, 85]
 66   4 [14, 26, 45, 50, 66, 77, 85]
 79   6 [14, 26, 45, 50, 66, 77, 79, 85]
 10   0 [10, 14, 26, 45, 50, 66, 77, 79, 85]
  3   0 [3, 10, 14, 26, 45, 50, 66, 77, 79, 85]
 84   9 [3, 10, 14, 26, 45, 50, 66, 77, 79, 84, 85]
 44   4 [3, 10, 14, 26, 44, 45, 50, 66, 77, 79, 84, 85]
 77   8 [3, 10, 14, 26, 44, 45, 50, 66, 77, 77, 79, 84, 85]
  1   0 [1, 3, 10, 14, 26, 44, 45, 50, 66, 77, 77, 79, 84, 85]

除了 Python 实现外,bisect 还有一个速度更快的 C 实现。如果存在 C 版本,导入 bisect 时,这个实现会自动覆盖纯 Python 实现。
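bisect_left() 与 bisect_right()(bisect() 是 bisect_right() 的别名)除了用于确定插入位置,也可以用来在有序列表中定位一段重复值的边界。下面是一个简单示意(并非原书示例),使用的是上面输出中的最终列表。

import bisect

l = [1, 3, 10, 14, 26, 44, 45, 50, 66, 77, 77, 79, 84, 85]

# The two indexes bracket every occurrence of 77 in the sorted list.
left = bisect.bisect_left(l, 77)
right = bisect.bisect_right(l, 77)

print 'count of 77:', right - left
print 'slice      :', l[left:right]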
参见:
bisect (http://docs.python.org/library/bisect.html) 这个模块的标准库文档。
Insertion Sort (http://en.wikipedia.org/wiki/Insertion_sort) 维基百科文章,提供了对插入排序算法的描述。

2.5 Queue—线程安全的 FIFO 实现

作用:提供一个线程安全的 FIFO 实现。
Python 版本:至少 1.4

Queue 模块提供了一个适用于多线程编程的先进先出(first-in, first-out,FIFO)数据结构,可以用来在生产者和消费者线程之间安全地传递消息或其他数据。它会为调用者处理锁定,使多个线程可以安全地处理同一个 Queue 实例。Queue 的大小(其中包含的元素个数)可以加以限制,以约束内存使用或处理量。

注意:这里的讨论假设你已经了解队列的一般性质。如果你还不太清楚,在学习下面的内容之前可能需要先读一些有关的参考资料。

2.5.1 基本 FIFO 队列

Queue 类实现了一个基本的先进先出容器。使用 put() 将元素增加到序列一端,使用 get() 从另一端删除。

import Queue

q = Queue.Queue()

for i in range(5):
    q.put(i)

while not q.empty():
    print q.get(),
print

这个例子使用单个线程来展示:从队列中删除元素的顺序与插入元素的顺序相同。

$ python Queue_fifo.py

0 1 2 3 4
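前面提到 Queue 的大小可以加以限制。下面的简单示意(并非原书示例)展示了如何用 maxsize 参数限制队列长度,并用 put() 的非阻塞形式检测队列已满。

import Queue

q = Queue.Queue(maxsize=3)

for i in range(5):
    try:
        # With block=False, put() raises Queue.Full instead of waiting.
        q.put(i, block=False)
    except Queue.Full:
        print 'queue full, dropping', i

while not q.empty():
    print q.get(),
print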
2.5.2 LIFO 队列

与 Queue 的标准 FIFO 实现相反,LifoQueue 使用了后进先出(last-in, first-out,LIFO)顺序(通常与栈数据结构关联)。

import Queue

q = Queue.LifoQueue()

for i in range(5):
    q.put(i)

while not q.empty():
    print q.get(),
print

get() 将删除最近用 put() 插入队列的元素。

$ python Queue_lifo.py

4 3 2 1 0

2.5.3 优先队列

有些情况下,队列中元素的处理顺序需要根据这些元素的特性来决定,而不只是取决于元素创建或插入队列的顺序。例如,财务部门的打印作业可能要优先于一个开发人员打印的代码清单。PriorityQueue 使用队列内容的有序顺序来决定获取哪一个元素。

import Queue
import threading

class Job(object):
    def __init__(self, priority, description):
        self.priority = priority
        self.description = description
        print 'New job:', description
        return
    def __cmp__(self, other):
        return cmp(self.priority, other.priority)

q = Queue.PriorityQueue()

q.put( Job(3, 'Mid-level job') )
q.put( Job(10, 'Low-level job') )
q.put( Job(1, 'Important job') )

def process_job(q):
    while True:
        next_job = q.get()
        print 'Processing job:', next_job.description
        q.task_done()

workers = [ threading.Thread(target=process_job, args=(q,)),
            threading.Thread(target=process_job, args=(q,)),
            ]
for w in workers:
    w.setDaemon(True)
    w.start()

q.join()

这个例子用多个线程来消费作业,处理顺序取决于调用 get() 时队列中元素的优先级。对于消费者线程运行期间才加入队列的元素,其处理顺序还取决于线程上下文切换。

$ python Queue_priority.py

New job: Mid-level job
New job: Low-level job
New job: Important job
Processing job: Important job
Processing job: Mid-level job
Processing job: Low-level job
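如果不想像上例那样为作业类定义 __cmp__(),也可以直接把 (优先级, 数据) 形式的元组放入 PriorityQueue,元组会按第一个元素比较大小。下面是一个单线程的简单示意(并非原书示例)。

import Queue

q = Queue.PriorityQueue()

# Tuples compare element by element, so the first item
# acts as the priority.
q.put((3, 'Mid-level job'))
q.put((10, 'Low-level job'))
q.put((1, 'Important job'))

while not q.empty():
    priority, description = q.get()
    print priority, description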
2.5.4 构建一个多线程播客客户程序

本节将构建一个播客客户程序,其源代码展示了如何结合多个线程使用 Queue 类。这个程序会读入一个或多个 RSS 提要,把最新 5 集节目的附件(enclosure)放入下载队列,并使用线程并行地处理多个下载。这里没有提供足够的错误处理,所以不能直接在生产环境中使用,不过这个骨架实现可以作为一个很好的例子来说明如何使用 Queue 模块。

首先要建立一些操作参数。正常情况下,这些参数来自用户输入(首选项、数据库,等等)。不过在这个例子中,线程数和要获取的 URL 列表都使用了硬编码值。

from Queue import Queue
from threading import Thread
import time
import urllib
import urlparse

import feedparser

# Set up some global variables
num_fetch_threads = 2
enclosure_queue = Queue()

# A real app wouldn't use hard-coded data...
feed_urls = [ 'http://advocacy.python.org/podcasts/littlebit.rss',
              ]

函数 downloadEnclosures() 在工作线程中运行,使用 urllib 来处理下载。

def downloadEnclosures(i, q):
    """This is the worker thread function.
    It processes items in the queue one after
    another.  These daemon threads go into an
    infinite loop, and only exit when
    the main thread ends.
    """
    while True:
        print '%s: Looking for the next enclosure' % i
        url = q.get()
        parsed_url = urlparse.urlparse(url)
        print '%s: Downloading:' % i, parsed_url.path
        response = urllib.urlopen(url)
        data = response.read()

        # Save the downloaded file to the current directory
        outfile_name = url.rpartition('/')[-1]
        with open(outfile_name, 'wb') as outfile:
            outfile.write(data)
        q.task_done()
一旦定义了线程的目标函数,接下来就可以启动工作线程。downloadEnclosures() 处理语句 url = q.get() 时,会阻塞等待,直到队列中有内容可以返回。这说明,即使队列中没有任何内容,也可以安全地启动线程。

# Set up some threads to fetch the enclosures
for i in range(num_fetch_threads):
    worker = Thread(target=downloadEnclosures,
                    args=(i, enclosure_queue,))
    worker.setDaemon(True)
    worker.start()
下一步使用 Mark Pilgrim 的 feedparser 模块(www.feedparser.org)获取提要内容,并将附件的 URL 入队。一旦第一个 URL 增加到队列,就会有某个工作线程提取这个 URL 并开始下载。这个循环会不断增加元素,直到提要中的内容处理完毕,工作线程会轮流将 URL 出队并下载这些文件。

# Download the feed(s) and put the enclosure URLs into
# the queue.
for url in feed_urls:
    response = feedparser.parse(url, agent='fetch_podcasts.py')
    for entry in response['entries'][-5:]:
        for enclosure in entry.get('enclosures', []):
            parsed_url = urlparse.urlparse(enclosure['url'])
            print 'Queuing:', parsed_url.path
            enclosure_queue.put(enclosure['url'])

剩下的唯一一件事,就是使用 join() 等待队列再次腾空。

# Now wait for the queue to be empty, indicating that we have
# processed all the downloads.
print '*** Main thread waiting'
enclosure_queue.join()
print '*** Done'
运行这个示例脚本可以生成以下结果。

$ python fetch_podcasts.py

0: Looking for the next enclosure
1: Looking for the next enclosure
Queuing: /podcasts/littlebit/2010-04-18.mp3
Queuing: /podcasts/littlebit/2010-05-22.mp3
Queuing: /podcasts/littlebit/2010-06-06.mp3
Queuing: /podcasts/littlebit/2010-07-26.mp3
Queuing: /podcasts/littlebit/2010-11-25.mp3
*** Main thread waiting
0: Downloading: /podcasts/littlebit/2010-04-18.mp3
0: Looking for the next enclosure
0: Downloading: /podcasts/littlebit/2010-05-22.mp3
0: Looking for the next enclosure
0: Downloading: /podcasts/littlebit/2010-06-06.mp3
0: Looking for the next enclosure
0: Downloading: /podcasts/littlebit/2010-07-26.mp3
0: Looking for the next enclosure
0: Downloading: /podcasts/littlebit/2010-11-25.mp3
0: Looking for the next enclosure
*** Done
具体的输出取决于所使用的 RSS 提要的内容。

参见:
Queue (http://docs.python.org/lib/module-Queue.html) 这个模块的标准库文档。
deque(2.1 节) collections 模块包含一个 deque(双端队列)类。
Queue data structures (http://en.wikipedia.org/wiki/Queue_(data_structure)) 解释队列的一篇维基百科文章。
FIFO (http://en.wikipedia.org/wiki/FIFO) 维基百科文章,解释了先进先出数据结构。

2.6 struct—二进制数据结构

作用:在字符串和二进制数据之间转换。
Python 版本:1.4 及以后版本

struct 模块包括一些在字节串与内置 Python 数据类型(如数字和字符串)之间完成转换的函数。

2.6.1 函数与 Struct 类

struct 提供了一组处理结构值的模块级函数,另外还有一个 Struct 类。格式指示符会由字符串形式转换为一种编译表示,这与处理正则表达式的方式类似。这个转换需要耗费一定资源,所以通常更高效的做法是在创建 Struct 实例时完成一次转换,之后在这个实例上调用方法,而不是反复使用模块级函数。下面的例子都会使用 Struct 类。
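为了直观感受这种差别,可以用 timeit 粗略对比模块级 pack() 与预编译的 Struct 实例。下面的示意并非原书示例,具体数字取决于运行环境,仅供参考。

import timeit

setup = "import struct; s = struct.Struct('I 2s f')"

module_level = timeit.timeit(
    "struct.pack('I 2s f', 1, 'ab', 2.7)",
    setup=setup, number=100000)
precompiled = timeit.timeit(
    "s.pack(1, 'ab', 2.7)",
    setup=setup, number=100000)

# The precompiled Struct avoids re-parsing the format string
# on every call, so it is usually the faster of the two.
print 'module-level functions:', module_level
print 'precompiled Struct    :', precompiled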
2.6.2 打包和解包

Struct 支持使用格式指示符将数据打包(packing)为字符串,以及从字符串解包(unpacking)数据。格式指示符由表示数据类型的字符以及可选的数量和字节序(endianness)指示符构成。要全面了解目前支持的格式指示符,请参考标准库文档。

在下面的例子中,指示符要求有一个整数或 long 值、一个包含两个字符的串,以及一个浮点数。格式指示符中包含的空格用来分隔类型指示符,在编译格式时会被忽略。

import struct
import binascii

values = (1, 'ab', 2.7)
s = struct.Struct('I 2s f')
packed_data = s.pack(*values)

print 'Original values:', values
print 'Format string :', s.format
print 'Uses :', s.size, 'bytes'
print 'Packed Value :', binascii.hexlify(packed_data)

由于打包结果中有些字节是 null,这个例子使用 binascii.hexlify() 把打包的值转换为一个十六进制字节序列之后再打印。

$ python struct_pack.py

Original values: (1, 'ab', 2.7)
Format string : I 2s f
Uses : 12 bytes
Packed Value : 0100000061620000cdcc2c40
使用 unpack() 可以从打包表示中抽取数据。

import struct
import binascii

packed_data = binascii.unhexlify('0100000061620000cdcc2c40')

s = struct.Struct('I 2s f')
unpacked_data = s.unpack(packed_data)
print 'Unpacked Values:', unpacked_data

将打包值传入 unpack(),基本上会得到相同的值(注意浮点值中的差别)。

$ python struct_unpack.py

Unpacked Values: (1, 'ab', 2.700000047683716)

2.6.3 字节序

默认情况下,值会使用本地 C 库的原生字节序(endianness)来编码。只需在格式串中提供一个显式的字节序指令,就可以很容易地覆盖这个默认选择。
2.6.3 Endianness

By default, values are encoded using the native C library notion of endianness. It is easy to override that choice by providing an explicit endianness directive in the format string.

import struct
import binascii

values = (1, 'ab', 2.7)
print 'Original values:', values

endianness = [
    ('@', 'native, native'),
    ('=', 'native, standard'),
    ('<', 'little-endian'),
    ('>', 'big-endian'),
    ('!', 'network'),
    ]

for code, name in endianness:
    s = struct.Struct(code + ' I 2s f')
    packed_data = s.pack(*values)
    print
    print 'Format string  :', s.format, 'for', name
    print 'Uses           :', s.size, 'bytes'
    print 'Packed Value   :', binascii.hexlify(packed_data)
    print 'Unpacked Value :', s.unpack(packed_data)

Table 2.1 lists the byte order specifiers used by Struct.

Table 2.1. Byte Order Specifiers for struct

Code  Meaning
@     Native order
=     Native standard
<     Little-endian
>     Big-endian
!     Network order

$ python struct_endianness.py

Original values: (1, 'ab', 2.7)

Format string  : @ I 2s f for native, native
Uses           : 12 bytes
Packed Value   : 0100000061620000cdcc2c40
Unpacked Value : (1, 'ab', 2.700000047683716)

Format string  : = I 2s f for native, standard
Uses           : 10 bytes
Packed Value   : 010000006162cdcc2c40
Unpacked Value : (1, 'ab', 2.700000047683716)

Format string  : < I 2s f for little-endian
Uses           : 10 bytes
Packed Value   : 010000006162cdcc2c40
Unpacked Value : (1, 'ab', 2.700000047683716)

Format string  : > I 2s f for big-endian
Uses           : 10 bytes
Packed Value   : 000000016162402ccccd
Unpacked Value : (1, 'ab', 2.700000047683716)

Format string  : ! I 2s f for network
Uses           : 10 bytes
Packed Value   : 000000016162402ccccd
Unpacked Value : (1, 'ab', 2.700000047683716)
2.6.4 Buffers

Working with binary packed data is typically reserved for performance-sensitive situations or for passing data into and out of extension modules. These cases can be optimized by avoiding the overhead of allocating a new buffer for each packed structure. The pack_into() and unpack_from() methods support writing to preallocated buffers directly.

import struct
import binascii

s = struct.Struct('I 2s f')
values = (1, 'ab', 2.7)
print 'Original:', values

print
print 'ctypes string buffer'

import ctypes
b = ctypes.create_string_buffer(s.size)
print 'Before  :', binascii.hexlify(b.raw)
s.pack_into(b, 0, *values)
print 'After   :', binascii.hexlify(b.raw)
print 'Unpacked:', s.unpack_from(b, 0)

print
print 'array'

import array
a = array.array('c', '\0' * s.size)
print 'Before  :', binascii.hexlify(a)
s.pack_into(a, 0, *values)
print 'After   :', binascii.hexlify(a)
print 'Unpacked:', s.unpack_from(a, 0)
The size attribute of the Struct tells us how big the buffer needs to be.

$ python struct_buffers.py

Original: (1, 'ab', 2.7)

ctypes string buffer
Before  : 000000000000000000000000
After   : 0100000061620000cdcc2c40
Unpacked: (1, 'ab', 2.700000047683716)

array
Before  : 000000000000000000000000
After   : 0100000061620000cdcc2c40
Unpacked: (1, 'ab', 2.700000047683716)

See Also:
struct (http://docs.python.org/library/struct.html) The standard library documentation for this module.
array (section 2.2) The array module, for working with sequences of fixed-type values.
binascii (http://docs.python.org/library/binascii.html) The binascii module, for producing ASCII representations of binary data.
Endianness (http://en.wikipedia.org/wiki/Endianness) Wikipedia article that provides an explanation of byte order and endianness in encoding.

2.7 weakref—Impermanent References to Objects

Purpose: Refer to an "expensive" object, but allow its memory to be reclaimed by the garbage collector if there are no other, nonweak references.
Python Version: 2.1 and later
The weakref module supports weak references to objects. A normal reference increments the reference count on the object and prevents it from being garbage collected. This is not always desirable, either when a circular reference might be present or when building a cache of objects that should be deleted when memory is needed. A weak reference is a handle to an object that does not keep it from being cleaned up automatically.

2.7.1 References

Weak references to objects are managed through the ref class. To retrieve the original object, call the reference object.

import weakref

class ExpensiveObject(object):
    def __del__(self):
        print '(Deleting %s)' % self

obj = ExpensiveObject()
r = weakref.ref(obj)

print 'obj:', obj
print 'ref:', r
print 'r():', r()

print 'deleting obj'
del obj
print 'r():', r()

In this case, since obj is deleted before the second call to the reference, the ref returns None.

$ python weakref_ref.py

obj: <__main__.ExpensiveObject object at 0x100da5750>
ref:
r(): <__main__.ExpensiveObject object at 0x100da5750>
deleting obj
(Deleting <__main__.ExpensiveObject object at 0x100da5750>)
r(): None
2.7.2 Reference Callbacks

The ref constructor accepts an optional callback function to invoke when the referenced object is deleted.

import weakref

class ExpensiveObject(object):
    def __del__(self):
        print '(Deleting %s)' % self

def callback(reference):
    """Invoked when referenced object is deleted"""
    print 'callback(', reference, ')'

obj = ExpensiveObject()
r = weakref.ref(obj, callback)

print 'obj:', obj
print 'ref:', r
print 'r():', r()

print 'deleting obj'
del obj
print 'r():', r()

The callback receives the reference object as an argument after the reference is "dead" and no longer refers to the original object. One use for this feature is to remove the weak reference object from a cache.

$ python weakref_ref_callback.py

obj: <__main__.ExpensiveObject object at 0x100da1950>
ref:
r(): <__main__.ExpensiveObject object at 0x100da1950>
deleting obj
callback( )
(Deleting <__main__.ExpensiveObject object at 0x100da1950>)
r(): None
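As a concrete sketch of that idea (the helper add_to_cache() and its nested cleanup() callback are invented here for illustration and are not part of weakref), each entry's callback deletes its own, now stale, slot from a plain dictionary when the referent goes away.

import weakref

class ExpensiveObject(object):
    def __init__(self, name):
        self.name = name

cache = {}

def add_to_cache(key, obj):
    def cleanup(reference):
        # Called once obj has been garbage collected;
        # drop the stale weak reference from the cache.
        print '(removing %r from cache)' % key
        del cache[key]
    cache[key] = weakref.ref(obj, cleanup)

obj = ExpensiveObject('data')
add_to_cache('data', obj)

print 'cached keys before:', cache.keys()
del obj  # triggers cleanup()
print 'cached keys after :', cache.keys()

The WeakValueDictionary described later in this section packages up the same bookkeeping.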
2.7.3 Proxies

It is sometimes more convenient to use a proxy, rather than a weak reference. Proxies can be used as though they were the original object and do not need to be called before the object is accessible. That means they can be passed to a library that does not know it is receiving a reference instead of the real object.

import weakref

class ExpensiveObject(object):
    def __init__(self, name):
        self.name = name
    def __del__(self):
        print '(Deleting %s)' % self

obj = ExpensiveObject('My Object')
r = weakref.ref(obj)
p = weakref.proxy(obj)

print 'via obj:', obj.name
print 'via ref:', r().name
print 'via proxy:', p.name
del obj
print 'via proxy:', p.name

If the proxy is accessed after the referent object is removed, a ReferenceError exception is raised.

$ python weakref_proxy.py

via obj: My Object
via ref: My Object
via proxy: My Object
(Deleting <__main__.ExpensiveObject object at 0x100da27d0>)
via proxy:
Traceback (most recent call last):
  File "weakref_proxy.py", line 26, in <module>
    print 'via proxy:', p.name
ReferenceError: weakly-referenced object no longer exists
2.7.4 Cyclic References

One use for weak references is to allow cyclic references without preventing garbage collection. This example illustrates the difference between using regular objects and proxies when a graph includes a cycle.

The Graph class in weakref_graph.py accepts any object given to it as the "next" node in the sequence. For the sake of brevity, this implementation supports a single outgoing reference from each node, which is of limited use generally but makes it easy to create cycles for these examples. The function demo() is a utility function to exercise the Graph class by creating a cycle and then removing various references.

import gc
from pprint import pprint
import weakref

class Graph(object):
    def __init__(self, name):
        self.name = name
        self.other = None
    def set_next(self, other):
        print '%s.set_next(%r)' % (self.name, other)
        self.other = other
    def all_nodes(self):
        "Generate the nodes in the graph sequence."
        yield self
        n = self.other
        while n and n.name != self.name:
            yield n
            n = n.other
        if n is self:
            yield n
        return
    def __str__(self):
        return '->'.join(n.name for n in self.all_nodes())
    def __repr__(self):
        return '<%s at 0x%x name=%s>' % (self.__class__.__name__,
                                         id(self), self.name)
    def __del__(self):
        print '(Deleting %s)' % self.name
        self.set_next(None)

def collect_and_show_garbage():
    "Show what garbage is present."
    print 'Collecting...'
    n = gc.collect()
    print 'Unreachable objects:', n
    print 'Garbage:',
    pprint(gc.garbage)

def demo(graph_factory):
    print 'Set up graph:'
    one = graph_factory('one')
    two = graph_factory('two')
    three = graph_factory('three')
    one.set_next(two)
    two.set_next(three)
    three.set_next(one)

    print
    print 'Graph:'
    print str(one)
    collect_and_show_garbage()

    print
    three = None
    two = None

    print 'After 2 references removed:'
    print str(one)
    collect_and_show_garbage()

    print
    print 'Removing last reference:'
    one = None
    collect_and_show_garbage()

This example uses the gc module to help debug the leak. The DEBUG_LEAK flag causes gc to print information about objects that cannot be seen other than through the reference the garbage collector has to them.
import gc
from pprint import pprint
import weakref

from weakref_graph import Graph, demo, collect_and_show_garbage

gc.set_debug(gc.DEBUG_LEAK)

print 'Setting up the cycle'
print
demo(Graph)

print
print 'Breaking the cycle and cleaning up garbage'
print
gc.garbage[0].set_next(None)
while gc.garbage:
    del gc.garbage[0]
print
collect_and_show_garbage()

Even after deleting the local references to the Graph instances in demo(), the graphs all show up in the garbage list and cannot be collected. Several dictionaries are also found in the garbage list. They are the __dict__ values from the Graph instances and contain the attributes for those objects. The graphs can be forcibly deleted, since the program knows what they are. Enabling unbuffered I/O by passing the -u option to the interpreter ensures that the output from the print statements in this example program (written to standard output) and the debug output from gc (written to standard error) are interleaved correctly.

$ python -u weakref_cycle.py

Setting up the cycle

Set up graph:
one.set_next()
two.set_next()
three.set_next()

Graph:
one->two->three->one
Collecting...
Unreachable objects: 0
Garbage:[]

After 2 references removed:
one->two->three->one
Collecting...
Unreachable objects: 0
Garbage:[]

Removing last reference:
Collecting...
gc: uncollectable
gc: uncollectable
gc: uncollectable
gc: uncollectable
gc: uncollectable
gc: uncollectable
Unreachable objects: 6
Garbage:[, , ,
 {'name': 'one', 'other': },
 {'name': 'two', 'other': },
 {'name': 'three', 'other': }]

Breaking the cycle and cleaning up garbage

one.set_next(None)
(Deleting two)
two.set_next(None)
(Deleting three)
three.set_next(None)
(Deleting one)
one.set_next(None)

Collecting...
Unreachable objects: 0
Garbage:[]
The next step is to create a more intelligent WeakGraph class that knows how to avoid creating cycles with regular references, by using weak references when a cycle is detected.

import gc
from pprint import pprint
import weakref

from weakref_graph import Graph, demo

class WeakGraph(Graph):
    def set_next(self, other):
        if other is not None:
            # See if we should replace the reference
            # to other with a weakref.
            if self in other.all_nodes():
                other = weakref.proxy(other)
        super(WeakGraph, self).set_next(other)
        return

demo(WeakGraph)

Since the WeakGraph instances use proxies to refer to objects that have already been seen, as demo() removes all local references to the objects, the cycle is broken and the garbage collector can delete the objects.

$ python weakref_weakgraph.py

Set up graph:
one.set_next()
two.set_next()
three.set_next( )

Graph:
one->two->three
Collecting...
Unreachable objects: 0
Garbage:[]

After 2 references removed:
one->two->three
Collecting...
Unreachable objects: 0
Garbage:[]

Removing last reference:
(Deleting one)
one.set_next(None)
(Deleting two)
two.set_next(None)
(Deleting three)
three.set_next(None)
Collecting...
Unreachable objects: 0
Garbage:[]
2.7.5 Caching Objects

The ref and proxy classes are considered "low level." While they are useful for maintaining weak references to individual objects and allowing cycles to be garbage collected, the WeakKeyDictionary and WeakValueDictionary provide a more appropriate API for creating a cache of several objects.

The WeakValueDictionary uses weak references to the values it holds, allowing them to be garbage collected when other code is not actually using them. Using explicit calls to the garbage collector illustrates the difference between memory handling with a regular dictionary and a WeakValueDictionary.

import gc
from pprint import pprint
import weakref

gc.set_debug(gc.DEBUG_LEAK)

class ExpensiveObject(object):
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return 'ExpensiveObject(%s)' % self.name
    def __del__(self):
        print '    (Deleting %s)' % self

def demo(cache_factory):
    # hold objects so any weak references
    # are not removed immediately
    all_refs = {}
    # create the cache using the factory
    print 'CACHE TYPE:', cache_factory
    cache = cache_factory()
    for name in [ 'one', 'two', 'three' ]:
        o = ExpensiveObject(name)
        cache[name] = o
        all_refs[name] = o
        del o  # decref

    print '  all_refs =',
    pprint(all_refs)
    print '\n  Before, cache contains:', cache.keys()
    for name, value in cache.items():
        print '    %s = %s' % (name, value)
        del value  # decref

    # Remove all references to the objects except the cache
    print '\n  Cleanup:'
    del all_refs
    gc.collect()

    print '\n  After, cache contains:', cache.keys()
    for name, value in cache.items():
        print '    %s = %s' % (name, value)
    print '  demo returning'
    return

demo(dict)
print

demo(weakref.WeakValueDictionary)
Any loop variables that refer to the values being cached must be cleared explicitly, so the reference count of the object is decremented. Otherwise, the garbage collector would not remove the objects, and they would remain in the cache. Similarly, the all_refs variable is used to hold references to prevent them from being garbage collected prematurely.

$ python weakref_valuedict.py

CACHE TYPE:
  all_refs ={'one': ExpensiveObject(one),
 'three': ExpensiveObject(three),
 'two': ExpensiveObject(two)}

  Before, cache contains: ['three', 'two', 'one']
    three = ExpensiveObject(three)
    two = ExpensiveObject(two)
    one = ExpensiveObject(one)

  Cleanup:

  After, cache contains: ['three', 'two', 'one']
    three = ExpensiveObject(three)
    two = ExpensiveObject(two)
    one = ExpensiveObject(one)
  demo returning
    (Deleting ExpensiveObject(three))
    (Deleting ExpensiveObject(two))
    (Deleting ExpensiveObject(one))
CACHE TYPE: weakref.WeakValueDictionary
  all_refs ={'one': ExpensiveObject(one),
 'three': ExpensiveObject(three),
 'two': ExpensiveObject(two)}

  Before, cache contains: ['three', 'two', 'one']
    three = ExpensiveObject(three)
    two = ExpensiveObject(two)
    one = ExpensiveObject(one)

  Cleanup:
    (Deleting ExpensiveObject(three))
    (Deleting ExpensiveObject(two))
    (Deleting ExpensiveObject(one))

  After, cache contains: []
  demo returning

The WeakKeyDictionary works similarly, but it uses weak references for the keys instead of the values in the dictionary.
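A small sketch of that behavior, with a made-up Owner class and metadata mapping: data attached to an object through a WeakKeyDictionary disappears along with the object itself.

import gc
import weakref

class Owner(object):
    def __repr__(self):
        return 'Owner()'

# Keys are held weakly; when the key object goes away,
# its entry is removed from the dictionary as well.
metadata = weakref.WeakKeyDictionary()

o = Owner()
metadata[o] = {'color': 'blue'}
print 'before:', metadata.items()

del o
gc.collect()  # for implementations without reference counting
print 'after :', metadata.items()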
Warning: The library documentation for weakref contains this warning:

Caution: Because a WeakValueDictionary is built on top of a Python dictionary, it must not change size when iterating over it. This can be difficult to ensure for a WeakValueDictionary, because actions performed by the program during iteration may cause items in the dictionary to vanish "by magic" (as a side effect of garbage collection).
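One way to stay clear of that problem, sketched here under Python 2 semantics, is to loop over a snapshot: items() on a WeakValueDictionary returns a plain list of (key, value) pairs, so the loop below works on stable data and the pairs hold strong references to the values while it runs.

import weakref

class ExpensiveObject(object):
    def __init__(self, name):
        self.name = name

cache = weakref.WeakValueDictionary()
strong_refs = [ExpensiveObject(n) for n in ['one', 'two', 'three']]
for o in strong_refs:
    cache[o.name] = o

# Iterate over a snapshot list, not over the live dictionary,
# so entries cannot vanish out from under the loop.
for name, value in cache.items():
    print name, '->', value.name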
See Also:
weakref (http://docs.python.org/lib/module-weakref.html) Standard library documentation for this module.
gc (section 17.6) The gc module is the interface to the interpreter's garbage collector.

2.8 copy—Duplicate Objects

Purpose: Provides functions for duplicating objects using shallow or deep copy semantics.
Python Version: 1.4 and later

The copy module includes two functions, copy() and deepcopy(), for duplicating existing objects.

2.8.1 Shallow Copies

The shallow copy created by copy() is a new container populated with references to the contents of the original object. When making a shallow copy of a list object, a new list is constructed and the elements of the original object are appended to it.

import copy

class MyClass:
    def __init__(self, name):
        self.name = name
    def __cmp__(self, other):
        return cmp(self.name, other.name)

a = MyClass('a')
my_list = [ a ]
dup = copy.copy(my_list)

print '             my_list:', my_list
print '                 dup:', dup
print '      dup is my_list:', (dup is my_list)
print '      dup == my_list:', (dup == my_list)
print 'dup[0] is my_list[0]:', (dup[0] is my_list[0])
print 'dup[0] == my_list[0]:', (dup[0] == my_list[0])

For a shallow copy, the MyClass instance is not duplicated, so the reference in the dup list points to the same object that is in my_list.

$ python copy_shallow.py

             my_list: [<__main__.MyClass instance at 0x100dadc68>]
                 dup: [<__main__.MyClass instance at 0x100dadc68>]
      dup is my_list: False
      dup == my_list: True
dup[0] is my_list[0]: True
dup[0] == my_list[0]: True
2.8.2 Deep Copies

The deep copy created by deepcopy() is a new container populated with copies of the contents of the original object. To make a deep copy of a list, a new list is constructed, the elements of the original list are copied, and then those copies are appended to the new list.

Replacing the call to copy() with deepcopy() makes the difference in the output apparent.

dup = copy.deepcopy(my_list)

The first element of the list is no longer the same object reference, but when the two objects are compared, they still evaluate as being equal.

$ python copy_deep.py

             my_list: [<__main__.MyClass instance at 0x100dadc68>]
                 dup: [<__main__.MyClass instance at 0x100dadc20>]
      dup is my_list: False
      dup == my_list: True
dup[0] is my_list[0]: False
dup[0] == my_list[0]: True
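The practical difference shows up as soon as a nested, mutable value is modified; the config dictionary below is a made-up example used only to illustrate the contrast.

import copy

config = {'servers': ['alpha', 'beta'], 'retries': 3}

shallow = copy.copy(config)
deep = copy.deepcopy(config)

# The shallow copy shares the nested list; the deep copy has its own.
config['servers'].append('gamma')

print 'original:', config['servers']
print 'shallow :', shallow['servers']
print 'deep    :', deep['servers']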
2.8.3 Customizing Copy Behavior

It is possible to control how copies are made by using the __copy__() and __deepcopy__() special methods.

• __copy__() is called without any arguments and should return a shallow copy of the object.
• __deepcopy__() is called with a memo dictionary and should return a deep copy of the object. Any member attributes that need to be deep-copied should be passed to copy.deepcopy(), along with the memo dictionary, to control recursion. (The memo dictionary is explained in more detail later.)

This example illustrates how the methods are called.

import copy

class MyClass:
    def __init__(self, name):
        self.name = name
    def __cmp__(self, other):
        return cmp(self.name, other.name)
    def __copy__(self):
        print '__copy__()'
        return MyClass(self.name)
    def __deepcopy__(self, memo):
        print '__deepcopy__(%s)' % str(memo)
        return MyClass(copy.deepcopy(self.name, memo))

a = MyClass('a')

sc = copy.copy(a)
dc = copy.deepcopy(a)

The memo dictionary is used to keep track of the values that have been copied already, to avoid infinite recursion.

$ python copy_hooks.py

__copy__()
__deepcopy__({})
2.8.4 Recursion in Deep Copy

To avoid problems with duplicating recursive data structures, deepcopy() uses a dictionary to track objects that have already been copied. This dictionary is passed to the __deepcopy__() method so it can be examined there as well.

This example shows how an interconnected data structure, such as a directed graph, can help protect against recursion by implementing a __deepcopy__() method.

import copy
import pprint

class Graph:
    def __init__(self, name, connections):
        self.name = name
        self.connections = connections
    def add_connection(self, other):
        self.connections.append(other)
    def __repr__(self):
        return 'Graph(name=%s, id=%s)' % (self.name, id(self))
    def __deepcopy__(self, memo):
        print '\nCalling __deepcopy__ for %r' % self
        if self in memo:
            existing = memo.get(self)
            print '  Already copied to %r' % existing
            return existing
        print '  Memo dictionary:'
        pprint.pprint(memo, indent=4, width=40)
        dup = Graph(copy.deepcopy(self.name, memo), [])
        print '  Copying to new object %s' % dup
        memo[self] = dup
        for c in self.connections:
            dup.add_connection(copy.deepcopy(c, memo))
        return dup

root = Graph('root', [])
a = Graph('a', [root])
b = Graph('b', [a, root])
root.add_connection(a)
root.add_connection(b)

dup = copy.deepcopy(root)

The Graph class includes a few basic directed-graph methods. An instance can be initialized with a name and a list of existing nodes to which it is connected. The add_connection() method is used to set up bidirectional connections. It is also used by the deepcopy operator.

The __deepcopy__() method prints messages to show how it is called and manages the memo dictionary contents as needed. Instead of copying the connection list wholesale, it creates a new list and appends copies of the individual connections to it. That ensures that the memo dictionary is updated as each new node is duplicated, and it avoids recursion issues or extra copies of nodes. As before, it returns the copied object when it is done.

There are several cycles in the graph shown in Figure 2.1, but handling the recursion with the memo dictionary prevents the traversal from causing a stack-overflow error. When the root node is copied, the output is as follows.

Figure 2.1. Deepcopy for an object graph with cycles (nodes root, a, and b)
$ python copy_recursion.py

Calling __deepcopy__ for Graph(name=root, id=4309347072)
  Memo dictionary:
{ }
  Copying to new object Graph(name=root, id=4309347360)

Calling __deepcopy__ for Graph(name=a, id=4309347144)
  Memo dictionary:
{   Graph(name=root, id=4309347072): Graph(name=root, id=4309347360),
    4307936896: ['root'],
    4309253504: 'root'}
  Copying to new object Graph(name=a, id=4309347504)

Calling __deepcopy__ for Graph(name=root, id=4309347072)
  Already copied to Graph(name=root, id=4309347360)

Calling __deepcopy__ for Graph(name=b, id=4309347216)
  Memo dictionary:
{   Graph(name=root, id=4309347072): Graph(name=root, id=4309347360),
    Graph(name=a, id=4309347144): Graph(name=a, id=4309347504),
    4307936896: [   'root',
                    'a',
                    Graph(name=root, id=4309347072),
                    Graph(name=a, id=4309347144)],
    4308678136: 'a',
    4309253504: 'root',
    4309347072: Graph(name=root, id=4309347360),
    4309347144: Graph(name=a, id=4309347504)}
  Copying to new object Graph(name=b, id=4309347864)

The second time the root node is encountered, while the a node is being copied, __deepcopy__() detects the recursion and reuses the existing value from the memo dictionary instead of creating a new object.
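As a quick check on the result (an illustrative addition, not part of the original example), the copied graph can be probed to confirm that the cycle was reproduced between the new nodes rather than pointing back into the original graph.

# dup is the deep copy of root produced above.
dup_a = dup.connections[0]          # the copy of node 'a'
print dup_a.connections[0] is dup   # True: the copied 'a' points at the copied root
print dup_a.connections[0] is root  # False: the original graph is not referenced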
See Also:
copy (http://docs.python.org/library/copy.html) The standard library documentation for this module.

2.9 pprint—Pretty-Print Data Structures

Purpose: Pretty-print data structures.
Python Version: 1.4 and later

pprint contains a "pretty printer" for producing aesthetically pleasing views of data structures. The formatter produces representations of data structures that can be parsed correctly by the interpreter and are also easy for a human to read. The output is kept on a single line, if possible, and indented when split across multiple lines.

The examples in this section all depend on pprint_data.py, which contains the following.

data = [ (1, { 'a':'A', 'b':'B', 'c':'C', 'd':'D' }),
         (2, { 'e':'E', 'f':'F', 'g':'G', 'h':'H',
               'i':'I', 'j':'J', 'k':'K', 'l':'L',
               }),
         ]

2.9.1 Printing

The simplest way to use the module is through the pprint() function.
from pprint import pprint
from pprint_data import data

print 'PRINT:'
print data
print
print 'PPRINT:'
pprint(data)

pprint() formats an object and writes it to the data stream passed as argument (or sys.stdout by default).

$ python pprint_pprint.py

PRINT:
[(1, {'a': 'A', 'c': 'C', 'b': 'B', 'd': 'D'}), (2, {'e': 'E', 'g': 'G', 'f': 'F', 'i': 'I', 'h': 'H', 'k': 'K', 'j': 'J', 'l': 'L'})]

PPRINT:
[(1, {'a': 'A', 'b': 'B', 'c': 'C', 'd': 'D'}),
 (2,
  {'e': 'E',
   'f': 'F',
   'g': 'G',
   'h': 'H',
   'i': 'I',
   'j': 'J',
   'k': 'K',
   'l': 'L'})]

2.9.2 Formatting

To format a data structure without writing it directly to a stream (i.e., for logging), use pformat() to build a string representation.
import logging
from pprint import pformat
from pprint_data import data

logging.basicConfig(level=logging.DEBUG,
                    format='%(levelname)-8s %(message)s',
                    )

logging.debug('Logging pformatted data')
formatted = pformat(data)
for line in formatted.splitlines():
    logging.debug(line.rstrip())

The formatted string can then be printed or logged independently.

$ python pprint_pformat.py

DEBUG    Logging pformatted data
DEBUG    [(1, {'a': 'A', 'b': 'B', 'c': 'C', 'd': 'D'}),
DEBUG     (2,
DEBUG      {'e': 'E',
DEBUG       'f': 'F',
DEBUG       'g': 'G',
DEBUG       'h': 'H',
DEBUG       'i': 'I',
DEBUG       'j': 'J',
DEBUG       'k': 'K',
DEBUG       'l': 'L'})]
2.9.3 Arbitrary Classes

The PrettyPrinter class used by pprint() can also work with custom classes, if they define a __repr__() method.

from pprint import pprint

class node(object):
    def __init__(self, name, contents=[]):
        self.name = name
        self.contents = contents[:]
    def __repr__(self):
        return ( 'node(' + repr(self.name) + ', ' +
                 repr(self.contents) + ')'
                 )

trees = [ node('node-1'),
          node('node-2', [ node('node-2-1')]),
          node('node-3', [ node('node-3-1')]),
          ]

pprint(trees)

The representations of the nested objects are combined by the PrettyPrinter to return the full string representation.

$ python pprint_arbitrary_object.py

[node('node-1', []),
 node('node-2', [node('node-2-1', [])]),
 node('node-3', [node('node-3-1', [])])]

2.9.4 Recursion

Recursive data structures are represented with a reference to the original source of the data, with the form <Recursion on typename with id=number>.

from pprint import pprint

local_data = [ 'a', 'b', 1, 2 ]
local_data.append(local_data)

print 'id(local_data) =>', id(local_data)
pprint(local_data)
In this example, the list local_data is added to itself, creating a recursive reference.

$ python pprint_recursion.py

id(local_data) => 4309215280
['a', 'b', 1, 2, <Recursion on list with id=4309215280>]

2.9.5 Limiting Nested Output

For very deep data structures, it may not be desirable for the output to include all details. The data may not format properly, the formatted text might be too large to manage, or some of the data may be extraneous.

from pprint import pprint
from pprint_data import data

pprint(data, depth=1)

Use the depth argument to control how far down into the nested data structure the pretty printer recurses. Levels not included in the output are represented by an ellipsis.

$ python pprint_depth.py

[(...), (...)]
2.9.6 Controlling Output Width

The default output width for the formatted text is 80 columns. To adjust that width, use the width argument to pprint().

from pprint import pprint
from pprint_data import data

for width in [ 80, 5 ]:
    print 'WIDTH =', width
    pprint(data, width=width)
    print

When the width is too low to accommodate the formatted data structure, the lines are not truncated or wrapped if that would introduce invalid syntax.

$ python pprint_width.py

WIDTH = 80
[(1, {'a': 'A', 'b': 'B', 'c': 'C', 'd': 'D'}),
 (2,
  {'e': 'E',
   'f': 'F',
   'g': 'G',
   'h': 'H',
   'i': 'I',
   'j': 'J',
   'k': 'K',
   'l': 'L'})]

WIDTH = 5
[(1, {'a': 'A', 'b': 'B', 'c': 'C', 'd': 'D'}),
 (2,
  {'e': 'E',
   'f': 'F',
   'g': 'G',
   'h': 'H',
   'i': 'I',
   'j': 'J',
   'k': 'K',
   'l': 'L'})]
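When the same formatting options are needed repeatedly, the PrettyPrinter class can be instantiated directly instead of passing arguments to pprint() on every call. This is a brief sketch of that pattern, not one of the book's example files.

from pprint import PrettyPrinter
from pprint_data import data

# Bundle the preferred settings into one reusable printer object.
printer = PrettyPrinter(indent=2, width=40, depth=2)
printer.pprint(data)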
See Also:
pprint (http://docs.python.org/lib/module-pprint.html) Standard library documentation for this module.

Chapter 3. Algorithms

Python includes several modules for implementing algorithms elegantly and concisely, using whatever style is most appropriate for the task. It supports purely procedural, object-oriented, and functional styles, and all three styles are frequently mixed within different parts of the same program.

functools includes functions for creating function decorators, enabling aspect-oriented programming and code reuse beyond what a traditional object-oriented approach supports. It also provides a class decorator for implementing all the rich comparison APIs using a shortcut, as well as partial objects for creating references to functions together with their arguments.

The itertools module includes functions for creating and working with iterators and generators used in functional programming. The operator module eliminates the need for many trivial lambda functions when using a functional programming style, by providing function-based interfaces to built-in operations such as arithmetic or item lookup.

contextlib makes resource management easier, more reliable, and more concise for all programming styles. Combining context managers and the with statement reduces the number of try:finally blocks and indentation levels needed, while ensuring that files, sockets, database transactions, and other resources are closed and released at the right time.
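For readers who have not used context managers before, the following sketch (not drawn from the book's example files) shows the try:finally pattern that the with statement replaces.

# Explicit cleanup written by hand.
f = open('example.txt', 'w')
try:
    f.write('managed with try:finally\n')
finally:
    f.close()

# The same guarantee with a context manager: close() is called
# automatically, even if the block raises an exception.
with open('example.txt', 'w') as f:
    f.write('managed with a context manager\n')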
3.1 functools—Tools for Manipulating Functions

Purpose: Functions that operate on other functions.
Python Version: 2.5 and later

The functools module provides tools for adapting or extending functions and other callable objects without completely rewriting them.

3.1.1 Decorators

The primary tool supplied by the functools module is the class partial, which can be used to "wrap" a callable object with default arguments. The resulting object is itself callable and can be treated as though it is the original function. It takes all the same arguments as the original, and it can be invoked with extra positional or named arguments as well. A partial can be used instead of a lambda to provide default arguments to a function, while leaving some arguments unspecified.

Partial Objects

This example shows two simple partial objects for the function myfunc(). The output of show_details() includes the func, args, and keywords attributes of the partial object.

import functools

def myfunc(a, b=2):
    """Docstring for myfunc()."""
    print '  called myfunc with:', (a, b)
    return

def show_details(name, f, is_partial=False):
    """Show details of a callable object."""
    print '%s:' % name
    print '  object:', f
    if not is_partial:
        print '  __name__:', f.__name__
    if is_partial:
        print '  func:', f.func
        print '  args:', f.args
        print '  keywords:', f.keywords
    return

show_details('myfunc', myfunc)
myfunc('a', 3)
print

# Set a different default value for 'b', but require
# the caller to provide 'a'.
p1 = functools.partial(myfunc, b=4)
show_details('partial with named default', p1, True)
p1('passing a')
p1('override b', b=5)
print

# Set default values for both 'a' and 'b'.
p2 = functools.partial(myfunc, 'default a', b=99)
show_details('partial with defaults', p2, True)
p2()
p2(b='override b')
print

print 'Insufficient arguments:'
p1()

At the end of the example, the first partial created is invoked without passing a value for a, causing an exception.

$ python functools_partial.py

myfunc:
  object:
  __name__: myfunc
  called myfunc with: ('a', 3)

partial with named default:
  object:
  func:
  args: ()
  keywords: {'b': 4}
  called myfunc with: ('passing a', 4)
  called myfunc with: ('override b', 5)

partial with defaults:
  object:
  func:
  args: ('default a',)
  keywords: {'b': 99}
  called myfunc with: ('default a', 99)
  called myfunc with: ('default a', 'override b')

Insufficient arguments:
Traceback (most recent call last):
  File "functools_partial.py", line 51, in <module>
    p1()
TypeError: myfunc() takes at least 1 argument (1 given)
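As noted above, a lambda could provide the same default arguments. The comparison below is an illustrative sketch reusing myfunc() and p1 from the example (it is not one of the book's example files); the partial additionally keeps references to the wrapped function and its preset arguments, which the lambda form does not expose.

# Equivalent behavior written as a lambda.
p1_lambda = lambda a: myfunc(a, b=4)
p1_lambda('passing a')

# The partial records what it wraps; the lambda does not.
print p1.func, p1.keywords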
Acquiring Function Properties

The partial object does not have __name__ or __doc__ attributes by default, and without those attributes, decorated functions are more difficult to debug. Using update_wrapper() copies or adds attributes from the original function to the partial object.

import functools

def myfunc(a, b=2):
    """Docstring for myfunc()."""
    print '  called myfunc with:', (a, b)
    return

def show_details(name, f):
    """Show details of a callable object."""
    print '%s:' % name
    print '  object:', f
    print '  __name__:',
    try:
        print f.__name__
    except AttributeError:
        print '(no __name__)'
    print '  __doc__', repr(f.__doc__)
    print
    return

show_details('myfunc', myfunc)

p1 = functools.partial(myfunc, b=4)
show_details('raw wrapper', p1)

print 'Updating wrapper:'
print '  assign:', functools.WRAPPER_ASSIGNMENTS
print '  update:', functools.WRAPPER_UPDATES
print

functools.update_wrapper(p1, myfunc)
show_details('updated wrapper', p1)

The attributes added to the wrapper are defined in WRAPPER_ASSIGNMENTS, while WRAPPER_UPDATES lists values to be modified.
$ python functools_update_wrapper.py

myfunc:
  object:
  __name__: myfunc
  __doc__ 'Docstring for myfunc().'

raw wrapper:
  object:
  __name__: (no __name__)
  __doc__ 'partial(func, *args, **keywords) - new function with partial application\n    of the given arguments and keywords.\n'

Updating wrapper:
  assign: ('__module__', '__name__', '__doc__')
  update: ('__dict__',)

updated wrapper:
  object:
  __name__: myfunc
  __doc__ 'Docstring for myfunc().'

Other Callables

Partials work with any callable object, not just with stand-alone functions.
import functools

class MyClass(object):
    """Demonstration class for functools"""

    def method1(self, a, b=2):
        """Docstring for method1()."""
        print '  called method1 with:', (self, a, b)
        return

    def method2(self, c, d=5):
        """Docstring for method2"""
        print '  called method2 with:', (self, c, d)
        return
    wrapped_method2 = functools.partial(method2, 'wrapped c')
    functools.update_wrapper(wrapped_method2, method2)

    def __call__(self, e, f=6):
        """Docstring for MyClass.__call__"""
        print '  called object with:', (self, e, f)
        return

def show_details(name, f):
    """Show details of a callable object."""
    print '%s:' % name
    print '  object:', f
    print '  __name__:',
    try:
        print f.__name__
    except AttributeError:
        print '(no __name__)'
    print '  __doc__', repr(f.__doc__)
    return

o = MyClass()

show_details('method1 straight', o.method1)
o.method1('no default for a', b=3)
print

p1 = functools.partial(o.method1, b=4)
functools.update_wrapper(p1, o.method1)
show_details('method1 wrapper', p1)
p1('a goes here')
print

show_details('method2', o.method2)
o.method2('no default for c', d=6)
print

show_details('wrapped method2', o.wrapped_method2)
o.wrapped_method2('no default for c', d=6)
print

show_details('instance', o)
o('no default for e')
print

p2 = functools.partial(o, f=7)
show_details('instance wrapper', p2)
p2('e goes here')

This example creates partials from an instance and methods of an instance.
$ python functools_method.py

method1 straight:
  object:
  __name__: method1
  __doc__ 'Docstring for method1().'
  called method1 with: (<__main__.MyClass object at 0x100da3550>, 'no default for a', 3)

method1 wrapper:
  object:
  __name__: method1
  __doc__ 'Docstring for method1().'
  called method1 with: (<__main__.MyClass object at 0x100da3550>, 'a goes here', 4)

method2:
  object:
  __name__: method2
  __doc__ 'Docstring for method2'
  called method2 with: (<__main__.MyClass object at 0x100da3550>, 'no default for c', 6)

wrapped method2:
  object:
  __name__: method2
  __doc__ 'Docstring for method2'
  called method2 with: ('wrapped c', 'no default for c', 6)

instance:
  object: <__main__.MyClass object at 0x100da3550>
  __name__: (no __name__)
  __doc__ 'Demonstration class for functools'
  called object with: (<__main__.MyClass object at 0x100da3550>, 'no default for e', 6)

instance wrapper:
  object:
  __name__: (no __name__)
  __doc__ 'partial(func, *args, **keywords) - new function with partial application\n    of the given arguments and keywords.\n'
  called object with: (<__main__.MyClass object at 0x100da3550>, 'e goes here', 7)

Acquiring Function Properties for Decorators

Updating the properties of a wrapped callable is especially useful when used in a decorator, since the transformed function ends up with properties of the original "bare" function.
import functools

def show_details(name, f):
    """Show details of a callable object."""
    print '%s:' % name
    print '  object:', f
    print '  __name__:',
    try:
        print f.__name__
    except AttributeError:
        print '(no __name__)'
    print '  __doc__', repr(f.__doc__)
    print
    return

def simple_decorator(f):
    @functools.wraps(f)
    def decorated(a='decorated defaults', b=1):
        print '  decorated:', (a, b)
        print '  ',
        f(a, b=b)
        return
    return decorated

def myfunc(a, b=2):
    "myfunc() is not complicated"
    print '  myfunc:', (a, b)
    return

# The raw function
show_details('myfunc', myfunc)
myfunc('unwrapped, default b')
myfunc('unwrapped, passing b', 3)
print

# Wrap explicitly
wrapped_myfunc = simple_decorator(myfunc)
show_details('wrapped_myfunc', wrapped_myfunc)
wrapped_myfunc()
wrapped_myfunc('args to wrapped', 4)
print

# Wrap with decorator syntax
@simple_decorator
def decorated_myfunc(a, b):
    myfunc(a, b)
    return

show_details('decorated_myfunc', decorated_myfunc)
decorated_myfunc()
decorated_myfunc('args to decorated', 4)

functools provides a decorator, wraps(), that applies update_wrapper() to the decorated function.
$ python functools_wraps.py

myfunc:
  object:
  __name__: myfunc
  __doc__ 'myfunc() is not complicated'

  myfunc: ('unwrapped, default b', 2)
  myfunc: ('unwrapped, passing b', 3)

wrapped_myfunc:
  object:
  __name__: myfunc
  __doc__ 'myfunc() is not complicated'

  decorated: ('decorated defaults', 1)
     myfunc: ('decorated defaults', 1)
  decorated: ('args to wrapped', 4)
     myfunc: ('args to wrapped', 4)

decorated_myfunc:
  object:
  __name__: decorated_myfunc
  __doc__ None

  decorated: ('decorated defaults', 1)
     myfunc: ('decorated defaults', 1)
  decorated: ('args to decorated', 4)
     myfunc: ('args to decorated', 4)

3.1.2 Comparison

Under Python 2, classes can define a __cmp__() method that returns -1, 0, or 1 based on whether the object is less than, equal to, or greater than the item being compared. Python 2.1 introduces the rich comparison methods API (__lt__(), __le__(), __eq__(), __ne__(), __gt__(), and __ge__()), which perform a single comparison operation and return a Boolean value. Python 3 deprecated __cmp__() in favor of these new methods, so functools provides tools to make it easier to write Python 2 classes that comply with the new comparison requirements in Python 3.
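To make the difference concrete, here is a small sketch (not taken from the book's example files) showing the same ordering written both ways under Python 2.

# Old style: one __cmp__() method returning -1, 0, or 1.
class OldStyle(object):
    def __init__(self, val):
        self.val = val
    def __cmp__(self, other):
        return cmp(self.val, other.val)

# Rich comparison: each operator gets its own method returning a Boolean.
class RichStyle(object):
    def __init__(self, val):
        self.val = val
    def __lt__(self, other):
        return self.val < other.val
    def __eq__(self, other):
        return self.val == other.val

print OldStyle(1) < OldStyle(2)    # True, via __cmp__()
print RichStyle(1) < RichStyle(2)  # True, via __lt__()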
Rich Comparison

The rich comparison API is designed to allow classes with complex comparisons to implement each test in the most efficient way possible. However, for classes where comparison is relatively simple, there is no point in manually creating each of the rich comparison methods. The total_ordering() class decorator takes a class that provides some of the methods and adds the rest of them.

import functools
import inspect
from pprint import pprint

@functools.total_ordering
class MyObject(object):
    def __init__(self, val):
        self.val = val
    def __eq__(self, other):
        print '  testing __eq__(%s, %s)' % (self.val, other.val)
        return self.val == other.val
    def __gt__(self, other):
        print '  testing __gt__(%s, %s)' % (self.val, other.val)
        return self.val > other.val

print 'Methods:\n'
pprint(inspect.getmembers(MyObject, inspect.ismethod))

a = MyObject(1)
b = MyObject(2)

print '\nComparisons:'
for expr in [ 'a < b', 'a <= b', 'a == b', 'a >= b', 'a > b' ]:
    print '\n%-6s:' % expr
    result = eval(expr)
    print '  result of %s: %s' % (expr, result)

The class must provide implementation of __eq__() and one other rich comparison method. The decorator adds implementations of the rest of the methods that work by using the comparisons provided.
$ python functools_total_ordering.py

Methods:

[('__eq__', <unbound method MyObject.__eq__>),
 ('__ge__', <unbound method MyObject.__ge__>),
 ('__gt__', <unbound method MyObject.__gt__>),
 ('__init__', <unbound method MyObject.__init__>),
 ('__le__', <unbound method MyObject.__le__>),
 ('__lt__', <unbound method MyObject.__lt__>)]

Comparisons:

a < b :
  testing __gt__(2, 1)
  result of a < b: True

a <= b:
  testing __gt__(1, 2)
  result of a <= b: True

a == b:
  testing __eq__(1, 2)
  result of a == b: False

a >= b:
  testing __gt__(2, 1)
  result of a >= b: False

a > b :
  testing __gt__(1, 2)
  result of a > b: False

Collation Order

Since old-style comparison functions are deprecated in Python 3, the cmp argument to functions like sort() is also no longer supported. Python 2 programs that use comparison functions can use cmp_to_key() to convert them to a function that returns a collation key, which is used to determine the position in the final sequence.
""" new_key = get_key(o) print ’key_wrapper(%s) -> %s’ % (o, new_key) return new_key objs = [ MyObject(x) for x in xrange(5, 0, -1) ] for o in sorted(objs, key=get_key_wrapper): print o 正常情况下,可以直接使用 cmp_to_key(),不过这个例子中引入了一个额外的包装器函数, 从而在调用 key 函数时可以输出更多的信息。 如输出所示,sorted() 首先对序列中的每一个元素调用 get_key_wrapper() 来生成一个键。 cmp_to_key() 返回的键是 functools 中定义的一个类的实例,这个类使用所传入的老式比较函数 来实现富比较 API。所有键都创建之后,将通过比较这些键对序列排序。 ptg 3.2. itertools—Iterator Functions 141 Normally, cmp_to_key() would be used directly, but in this example, an extra wrapper function is introduced to print out more information as the key function is being called. The output shows that sorted() starts by calling get_key_wrapper() for each item in the sequence to produce a key. The keys returned by cmp_to_key() are instances of a class defined in functools that implements the rich comparison API using the old-style comparison function passed in. After all keys are created, the se- quence is sorted by comparing the keys. $ python functools_cmp_to_key.py key_wrapper(MyObject(5)) -> key_wrapper(MyObject(4)) -> key_wrapper(MyObject(3)) -> key_wrapper(MyObject(2)) -> key_wrapper(MyObject(1)) -> comparing MyObject(4) and MyObject(5) comparing MyObject(3) and MyObject(4) comparing MyObject(2) and MyObject(3) comparing MyObject(1) and MyObject(2) MyObject(1) MyObject(2) MyObject(3) MyObject(4) MyObject(5) See Also: functools (http://docs.python.org/library/functools.html) The standard library doc- umentation for this module. Rich comparison methods (http://docs.python.org/reference/datamodel.html# object.__lt__) Description of the rich comparison methods from the Python Reference Guide. inspect (page 1200) Introspection API for live objects. 3.2 itertools—Iterator Functions Purpose The itertools module includes a set of functions for working with sequence data sets. Python Version 2.3 and later 参见: functools (http://docs.python.org/library/functools.html) 这个模块的标准库文档。 Rich comparison methods (http://docs.python.org/reference/datamodel.html#object.__lt__)  Python 参考指南中对富比较方法的描述。 inspect (18.4 节 ) 活动对象的自省 API。  114  Python 标准库  3.2 itertools—迭代器函数 作用: itertools 模块包含一组函数用于处理序列数据集。 Python 版本:2.3 及以后版本 itertools 提供的函数是受函数式编程语言(如 Clojure 和 Haskell)中类似特性的启发。其目 的是保证快速,并且高效地使用内存,而且可以联结在一起表述更为复杂的基于迭代的算法。 与使用列表的代码相比,基于迭代器的算法可以提供更好的内存使用特性。在真正需要数 据之前,并不从迭代器生成数据,由于这个原因,不需要将所有数据都同时存储在内存中。这 种“懒”处理模型可以减少内存使用,相应地还可以减少交换以及大数据集的其他副作用,从 而改善性能。 3.2.1 合并和分解迭代器 chain() 函数取多个迭代器作为参数,最后返回一个迭代器,它能生成所有输入迭代器的内 容,就好像这些内容来自一个迭代器一样。 ptg 142 Algorithms The functions provided by itertools are inspired by similar features of functional programming languages such as Clojure and Haskell. They are intended to be fast and use memory efficiently, and also to be hooked together to express more complicated iteration-based algorithms. Iterator-based code offers better memory consumption characteristics than code that uses lists. Since data is not produced from the iterator until it is needed, all data does not need to be stored in memory at the same time. This “lazy” processing model uses less memory, which can reduce swapping and other side effects of large data sets, improving performance. 3.2.1 Merging and Splitting Iterators The chain() function takes several iterators as arguments and returns a single iterator that produces the contents of all of them as though they came from a single iterator. from itertools import * for i in chain([1, 2, 3], [’a’, ’b’, ’c’]): print i, print chain() makes it easy to process several sequences without constructing one large list. 
See Also:
functools (http://docs.python.org/library/functools.html) The standard library documentation for this module.
Rich comparison methods (http://docs.python.org/reference/datamodel.html#object.__lt__) Description of the rich comparison methods from the Python Reference Guide.
inspect (page 1200) Introspection API for live objects.

3.2 itertools—Iterator Functions

Purpose  The itertools module includes a set of functions for working with sequence data sets.
Python Version  2.3 and later

The functions provided by itertools are inspired by similar features of functional programming languages such as Clojure and Haskell. They are intended to be fast and use memory efficiently, and also to be hooked together to express more complicated iteration-based algorithms.

Iterator-based code offers better memory consumption characteristics than code that uses lists. Since data is not produced from the iterator until it is needed, all data does not need to be stored in memory at the same time. This "lazy" processing model uses less memory, which can reduce swapping and other side effects of large data sets, improving performance.

3.2.1 Merging and Splitting Iterators

The chain() function takes several iterators as arguments and returns a single iterator that produces the contents of all of them as though they came from a single iterator.

from itertools import *

for i in chain([1, 2, 3], ['a', 'b', 'c']):
    print i,
print

chain() makes it easy to process several sequences without constructing one large list.

$ python itertools_chain.py

1 2 3 a b c
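When the sequences to be combined are themselves produced lazily, they can be supplied as a single iterable instead of as separate arguments. This supplementary sketch, not one of the book's numbered examples, uses chain.from_iterable() (available since Python 2.6); the batches() generator is invented for the illustration.

from itertools import chain

def batches():
    """Stand-in for lists that arrive one at a time, e.g. from a cursor."""
    yield [1, 2, 3]
    yield ['a', 'b', 'c']

# chain.from_iterable() consumes the outer iterable lazily, so the
# combined sequence is never built as one large list in memory.
for item in chain.from_iterable(batches()):
    print item,
print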
izip() returns an iterator that combines the elements of several iterators into tuples.

from itertools import *

for i in izip([1, 2, 3], ['a', 'b', 'c']):
    print i

It works like the built-in function zip(), except that it returns an iterator instead of a list.

$ python itertools_izip.py

(1, 'a')
(2, 'b')
(3, 'c')
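Like zip(), izip() stops as soon as its shortest input is exhausted. When the leftover values of the longer inputs still matter, izip_longest() (added in Python 2.6) can substitute a fill value instead. A brief supplementary sketch of the difference:

from itertools import izip, izip_longest

numbers = [1, 2, 3]
letters = ['a', 'b']

# izip() truncates to the shorter input...
print list(izip(numbers, letters))

# ...while izip_longest() pads the shorter input with fillvalue.
print list(izip_longest(numbers, letters, fillvalue='-'))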
The islice() function returns an iterator that returns selected items from the input iterator, by index.

from itertools import *

print 'Stop at 5:'
for i in islice(count(), 5):
    print i,
print '\n'

print 'Start at 5, Stop at 10:'
for i in islice(count(), 5, 10):
    print i,
print '\n'

print 'By tens to 100:'
for i in islice(count(), 0, 100, 10):
    print i,
print '\n'

islice() takes the same arguments as the slice operator for lists: start, stop, and step. The start and step arguments are optional.

$ python itertools_islice.py

Stop at 5:
0 1 2 3 4

Start at 5, Stop at 10:
5 6 7 8 9

By tens to 100:
0 10 20 30 40 50 60 70 80 90
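Because islice() works on any iterator, it is also a convenient way to skip over or page through data that never exists as a list, such as records read from a large file or network stream. The sketch below is illustrative only; the records() generator is made up for the example.

from itertools import islice

def records():
    """Stand-in for a large, lazily produced stream of values."""
    for n in xrange(1000):
        yield 'record %d' % n

# Skip the first ten records, then take the next five (indexes 10-14),
# without ever materializing the whole stream.
for line in islice(records(), 10, 15):
    print line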
The tee() function returns several independent iterators (defaults to 2) based on a single original input.

from itertools import *

r = islice(count(), 5)
i1, i2 = tee(r)

print 'i1:', list(i1)
print 'i2:', list(i2)

tee() has semantics similar to the UNIX tee utility, which repeats the values it reads from its input and writes them to a named file and standard output. The iterators returned by tee() can be used to feed the same set of data into multiple algorithms to be processed in parallel.

$ python itertools_tee.py

i1: [0, 1, 2, 3, 4]
i2: [0, 1, 2, 3, 4]

The new iterators created by tee() share their input, so the original iterator should not be used once the new ones are created.

from itertools import *

r = islice(count(), 5)
i1, i2 = tee(r)

print 'r:',
for i in r:
    print i,
    if i > 1:
        break
print

print 'i1:', list(i1)
print 'i2:', list(i2)

If values are consumed from the original input, the new iterators will not produce those values:

$ python itertools_tee_error.py

r: 0 1 2
i1: [3, 4]
i2: [3, 4]
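One practical use of this behavior is to peek at the front of a stream while still being able to process it from the beginning: tee() off a second copy, inspect a few items from one copy with islice(), and hand the untouched copy to the rest of the program. A minimal sketch, not from the original text; the stream variable is invented for the illustration.

from itertools import islice, tee

stream = iter(xrange(100))
preview, rest = tee(stream)

# Inspect the first three values without losing them for later processing.
print 'preview:', list(islice(preview, 3))

# The second copy still starts at the beginning of the stream.
print 'rest starts with:', list(islice(rest, 5))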
3.2.2 Converting Inputs

The imap() function returns an iterator that calls a function on the values in the input iterators and returns the results. It works like the built-in map(), except that it stops when any input iterator is exhausted (instead of inserting None values to completely consume all inputs).

from itertools import *

print 'Doubles:'
for i in imap(lambda x: 2 * x, xrange(5)):
    print i

print 'Multiples:'
for i in imap(lambda x, y: (x, y, x * y), xrange(5), xrange(5, 10)):
    print '%d * %d = %d' % i

In the first example, the lambda function multiplies the input values by 2. In the second example, the lambda function multiplies two arguments, taken from separate iterators, and returns a tuple with the original arguments and the computed value.

$ python itertools_imap.py

Doubles:
0
2
4
6
8
Multiples:
0 * 5 = 0
1 * 6 = 6
2 * 7 = 14
3 * 8 = 24
4 * 9 = 36
The starmap() function is similar to imap(), but instead of constructing a tuple from multiple iterators, it splits up the items in a single iterator as arguments to the mapping function using the * syntax.

from itertools import *

values = [(0, 5), (1, 6), (2, 7), (3, 8), (4, 9)]

for i in starmap(lambda x, y: (x, y, x * y), values):
    print '%d * %d = %d' % i

Where the mapping function to imap() is called f(i1, i2), the mapping function passed to starmap() is called f(*i).

$ python itertools_starmap.py

0 * 5 = 0
1 * 6 = 6
2 * 7 = 14
3 * 8 = 24
4 * 9 = 36
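The two calling conventions are interchangeable: a two-iterator imap() call can be rewritten as starmap() over the izip() of the same inputs. A short sketch of that equivalence, added here only for illustration:

from itertools import imap, izip, starmap

multiply = lambda x, y: x * y

# imap() takes one value from each of several iterators as separate arguments...
print list(imap(multiply, xrange(5), xrange(5, 10)))

# ...while starmap() unpacks already-paired tuples into those same arguments.
print list(starmap(multiply, izip(xrange(5), xrange(5, 10))))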
3.2.3 Producing New Values

The count() function returns an iterator that produces consecutive integers, indefinitely. The first number can be passed as an argument (the default is zero). There is no upper bound argument [see the built-in xrange() for more control over the result set].

from itertools import *

for i in izip(count(1), ['a', 'b', 'c']):
    print i

This example stops because the list argument is consumed.

$ python itertools_count.py

(1, 'a')
(2, 'b')
(3, 'c')

The cycle() function returns an iterator that indefinitely repeats the contents of the arguments it is given. Since it has to remember the entire contents of the input iterator, it may consume quite a bit of memory if the iterator is long.

from itertools import *

for i, item in izip(xrange(7), cycle(['a', 'b', 'c'])):
    print (i, item)

A counter variable is used to break out of the loop after a few cycles in this example.
$ python itertools_cycle.py

(0, 'a')
(1, 'b')
(2, 'c')
(3, 'a')
(4, 'b')
(5, 'c')
(6, 'a')

The repeat() function returns an iterator that produces the same value each time it is accessed.

from itertools import *

for i in repeat('over-and-over', 5):
    print i

The iterator returned by repeat() keeps returning data forever, unless the optional times argument is provided to limit it.

$ python itertools_repeat.py

over-and-over
over-and-over
over-and-over
over-and-over
over-and-over

It is useful to combine repeat() with izip() or imap() when invariant values need to be included with the values from the other iterators.

from itertools import *

for i, s in izip(count(), repeat('over-and-over', 5)):
    print i, s

A counter value is combined with the constant returned by repeat() in this example.

$ python itertools_repeat_izip.py

0 over-and-over
1 over-and-over
2 over-and-over
3 over-and-over
4 over-and-over

This example uses imap() to multiply the numbers in the range 0 through 4 by 2.

from itertools import *

for i in imap(lambda x, y: (x, y, x * y), repeat(2), xrange(5)):
    print '%d * %d = %d' % i

The repeat() iterator does not need to be explicitly limited, since imap() stops processing when any of its inputs ends, and the xrange() returns only five elements.

$ python itertools_repeat_imap.py

2 * 0 = 0
2 * 1 = 2
2 * 2 = 4
2 * 3 = 6
2 * 4 = 8
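These infinite generators are often combined in the same way: cycling over a small fixed set with cycle() and pairing it against a finite sequence with izip() gives a simple round-robin assignment. The worker and task names below are invented for the illustration; only the pattern matters.

from itertools import cycle, izip

workers = ['w1', 'w2', 'w3']
tasks = ['task-%d' % n for n in xrange(7)]

# izip() stops when the finite task list is exhausted,
# so iterating over the infinite cycle() is safe here.
for worker, task in izip(cycle(workers), tasks):
    print worker, 'gets', task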
3.2.4 Filtering

The dropwhile() function returns an iterator that produces elements of the input iterator after a condition becomes false for the first time.

from itertools import *

def should_drop(x):
    print 'Testing:', x
    return (x < 1)

for i in dropwhile(should_drop, [-1, 0, 1, 2, -2]):
    print 'Yielding:', i

dropwhile() does not filter every item of the input; after the condition is false the first time, all remaining items in the input are returned.
$ python itertools_dropwhile.py

Testing: -1
Testing: 0
Testing: 1
Yielding: 1
Yielding: 2
Yielding: -2

The opposite of dropwhile() is takewhile(). It returns an iterator that returns items from the input iterator, as long as the test function returns true.

from itertools import *

def should_take(x):
    print 'Testing:', x
    return (x < 2)

for i in takewhile(should_take, [-1, 0, 1, 2, -2]):
    print 'Yielding:', i

As soon as should_take() returns False, takewhile() stops processing the input.

$ python itertools_takewhile.py

Testing: -1
Yielding: -1
Testing: 0
Yielding: 0
Testing: 1
Yielding: 1
Testing: 2
ifilter() returns an iterator that works like the built-in filter() does for lists, including only items for which the test function returns true.

from itertools import *

def check_item(x):
    print 'Testing:', x
    return (x < 1)

for i in ifilter(check_item, [-1, 0, 1, 2, -2]):
    print 'Yielding:', i

ifilter() is different from dropwhile() in that every item is tested before it is returned.

$ python itertools_ifilter.py

Testing: -1
Yielding: -1
Testing: 0
Yielding: 0
Testing: 1
Testing: 2
Testing: -2
Yielding: -2

ifilterfalse() returns an iterator that includes only items where the test function returns false.

from itertools import *

def check_item(x):
    print 'Testing:', x
    return (x < 1)

for i in ifilterfalse(check_item, [-1, 0, 1, 2, -2]):
    print 'Yielding:', i

The test expression in check_item() is the same, so the results in this example with ifilterfalse() are the opposite of the results from the previous example.
$ python itertools_ifilterfalse.py

Testing: -1
Testing: 0
Testing: 1
Yielding: 1
Testing: 2
Yielding: 2
Testing: -2
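Because ifilter() and ifilterfalse() apply the same test with opposite results, combining them with tee() gives a simple way to partition one data set into "matching" and "non-matching" halves. A minimal sketch of that pattern, not one of the book's examples; the is_negative() helper and the sample values are invented for the illustration.

from itertools import ifilter, ifilterfalse, tee

def is_negative(x):
    return x < 0

values = [-1, 0, 1, 2, -2]

# tee() supplies an independent copy of the data to each filter.
matches, rejects = tee(values)

print 'negative:', list(ifilter(is_negative, matches))
print 'other   :', list(ifilterfalse(is_negative, rejects))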
3.2.5 Grouping Data

The groupby() function returns an iterator that produces sets of values organized by a common key. This example illustrates grouping related values based on an attribute.

from itertools import *
import operator
import pprint

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __repr__(self):
        return '(%s, %s)' % (self.x, self.y)
    def __cmp__(self, other):
        return cmp((self.x, self.y), (other.x, other.y))

# Create a dataset of Point instances
data = list(imap(Point,
                 cycle(islice(count(), 3)),
                 islice(count(), 10),
                 )
            )
print 'Data:'
pprint.pprint(data, width=69)
print

# Try to group the unsorted data based on X values
print 'Grouped, unsorted:'
for k, g in groupby(data, operator.attrgetter('x')):
    print k, list(g)
print

# Sort the data
data.sort()
print 'Sorted:'
pprint.pprint(data, width=69)
print

# Group the sorted data based on X values
print 'Grouped, sorted:'
for k, g in groupby(data, operator.attrgetter('x')):
    print k, list(g)
print

The input sequence needs to be sorted on the key value in order for the groupings to work out as expected.

$ python itertools_groupby_seq.py

Data:
[(0, 0), (1, 1), (2, 2), (0, 3), (1, 4), (2, 5), (0, 6), (1, 7), (2, 8), (0, 9)]

Grouped, unsorted:
0 [(0, 0)]
1 [(1, 1)]
2 [(2, 2)]
0 [(0, 3)]
1 [(1, 4)]
2 [(2, 5)]
0 [(0, 6)]
1 [(1, 7)]
2 [(2, 8)]
0 [(0, 9)]

Sorted:
[(0, 0), (0, 3), (0, 6), (0, 9), (1, 1), (1, 4), (1, 7), (2, 2), (2, 5), (2, 8)]

Grouped, sorted:
0 [(0, 0), (0, 3), (0, 6), (0, 9)]
1 [(1, 1), (1, 4), (1, 7)]
2 [(2, 2), (2, 5), (2, 8)]
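The same pattern works with any key function, not just an attribute. As a supplementary illustration (not one of the numbered examples), this sketch groups an already-sorted list of words by their first letter; the word list is invented for the example.

from itertools import groupby

words = sorted(['apple', 'avocado', 'banana', 'blueberry', 'cherry'])

# The input is sorted on the same key used for grouping,
# so each first letter produces exactly one group.
for letter, group in groupby(words, key=lambda word: word[0]):
    print letter, list(group)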
See Also:
itertools (http://docs.python.org/library/itertools.html) The standard library documentation for this module.
The Standard ML Basis Library (www.standardml.org/Basis/) The library for SML.
Definition of Haskell and the Standard Libraries (www.haskell.org/definition/) Standard library specification for the functional language Haskell.
Clojure (http://clojure.org/) Clojure is a dynamic functional language that runs on the Java Virtual Machine.
tee (http://unixhelp.ed.ac.uk/CGI/man-cgi?tee) UNIX command line tool for splitting one input into multiple identical output streams.

3.3 operator—Functional Interface to Built-in Operators

Purpose  Functional interface to built-in operators.
Python Version  1.4 and later

Programming with iterators occasionally requires creating small functions for simple expressions. Sometimes, these can be implemented as lambda functions, but for some operations, new functions are not needed at all. The operator module defines functions that correspond to built-in operations for arithmetic and comparison.

3.3.1 Logical Operations

There are functions for determining the Boolean equivalent for a value, negating it to create the opposite Boolean value, and comparing objects to see if they are identical.
from operator import *

a = -1
b = 5

print 'a =', a
print 'b =', b
print

print 'not_(a)     :', not_(a)
print 'truth(a)    :', truth(a)
print 'is_(a, b)   :', is_(a, b)
print 'is_not(a, b):', is_not(a, b)

not_() includes the trailing underscore because not is a Python keyword. truth() applies the same logic used when testing an expression in an if statement. is_() implements the same check used by the is keyword, and is_not() does the same test and returns the opposite answer.

$ python operator_boolean.py

a = -1
b = 5

not_(a)     : False
truth(a)    : True
is_(a, b)   : False
is_not(a, b): True

3.3.2 Comparison Operators

All rich comparison operators are supported.

from operator import *

a = 1
b = 5.0

print 'a =', a
print 'b =', b
for func in (lt, le, eq, ne, ge, gt):
    print '%s(a, b):' % func.__name__, func(a, b)

The functions are equivalent to the expression syntax using <, <=, ==, !=, >=, and >.
$ python operator_comparisons.py

a = 1
b = 5.0
lt(a, b): True
le(a, b): True
eq(a, b): False
ne(a, b): True
ge(a, b): False
gt(a, b): False

3.3.3 Arithmetic Operators

The arithmetic operators for manipulating numerical values are also supported.

from operator import *

a = -1
b = 5.0
c = 2
d = 6

print 'a =', a
print 'b =', b
print 'c =', c
print 'd =', d

print '\nPositive/Negative:'
print 'abs(a):', abs(a)
print 'neg(a):', neg(a)
print 'neg(b):', neg(b)
print 'pos(a):', pos(a)
print 'pos(b):', pos(b)

print '\nArithmetic:'
print 'add(a, b)     :', add(a, b)
print 'div(a, b)     :', div(a, b)
print 'div(d, c)     :', div(d, c)
print 'floordiv(a, b):', floordiv(a, b)
print 'floordiv(d, c):', floordiv(d, c)
print 'mod(a, b)     :', mod(a, b)
print 'mul(a, b)     :', mul(a, b)
print 'pow(c, d)     :', pow(c, d)
print 'sub(b, a)     :', sub(b, a)
print 'truediv(a, b) :', truediv(a, b)
print 'truediv(d, c) :', truediv(d, c)

print '\nBitwise:'
print 'and_(c, d)  :', and_(c, d)
print 'invert(c)   :', invert(c)
print 'lshift(c, d):', lshift(c, d)
print 'or_(c, d)   :', or_(c, d)
print 'rshift(d, c):', rshift(d, c)
print 'xor(c, d)   :', xor(c, d)

There are two separate division operators: floordiv() (integer division as implemented in Python before version 3.0) and truediv() (floating-point division).

$ python operator_math.py

a = -1
b = 5.0
c = 2
d = 6

Positive/Negative:
abs(a): 1
neg(a): 1
neg(b): -5.0
pos(a): -1
pos(b): 5.0

Arithmetic:
add(a, b)     : 4.0
div(a, b)     : -0.2
div(d, c)     : 3
floordiv(a, b): -1.0
floordiv(d, c): 3
mod(a, b)     : 4.0
mul(a, b)     : -5.0
pow(c, d)     : 64
sub(b, a)     : 6.0
truediv(a, b) : -0.2
truediv(d, c) : 3.0

Bitwise:
and_(c, d)  : 2
invert(c)   : -3
lshift(c, d): 128
or_(c, d)   : 6
rshift(d, c): 1
xor(c, d)   : 4
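Because each of these functions takes ordinary positional arguments, they slot directly into higher-order functions such as reduce() without needing a lambda. A small illustrative sketch, not part of the original example set:

from operator import add, mul

numbers = [1, 2, 3, 4, 5]

# The operator functions serve as the combining step for reduce(),
# replacing helpers written as "lambda x, y: x + y".
print reduce(add, numbers)   # 15
print reduce(mul, numbers)   # 120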
3.3.4 Sequence Operators

The operators for working with sequences can be divided into four groups: building up sequences, searching for items, accessing contents, and removing items from sequences.

from operator import *

a = [ 1, 2, 3 ]
b = [ 'a', 'b', 'c' ]

print 'a =', a
print 'b =', b

print '\nConstructive:'
print '  concat(a, b):', concat(a, b)
print '  repeat(a, 3):', repeat(a, 3)

print '\nSearching:'
print '  contains(a, 1)  :', contains(a, 1)
print '  contains(b, "d"):', contains(b, "d")
print '  countOf(a, 1)   :', countOf(a, 1)
print '  countOf(b, "d") :', countOf(b, "d")
print '  indexOf(a, 1)   :', indexOf(a, 1)

print '\nAccess Items:'
print '  getitem(b, 1)            :', getitem(b, 1)
print '  getslice(a, 1, 3)        :', getslice(a, 1, 3)
print '  setitem(b, 1, "d")       :', setitem(b, 1, "d"),
print ', after b =', b
print '  setslice(a, 1, 3, [4, 5]):', setslice(a, 1, 3, [4, 5]),
print ', after a =', a

print '\nDestructive:'
print '  delitem(b, 1)    :', delitem(b, 1), ', after b =', b
print '  delslice(a, 1, 3):', delslice(a, 1, 3), ', after a =', a

Some of these operations, such as setitem() and delitem(), modify the sequence in place and do not return a value.

$ python operator_sequences.py

a = [1, 2, 3]
b = ['a', 'b', 'c']

Constructive:
  concat(a, b): [1, 2, 3, 'a', 'b', 'c']
  repeat(a, 3): [1, 2, 3, 1, 2, 3, 1, 2, 3]

Searching:
  contains(a, 1)  : True
  contains(b, "d"): False
  countOf(a, 1)   : 1
  countOf(b, "d") : 0
  indexOf(a, 1)   : 0

Access Items:
  getitem(b, 1)            : b
  getslice(a, 1, 3)        : [2, 3]
  setitem(b, 1, "d")       : None , after b = ['a', 'd', 'c']
  setslice(a, 1, 3, [4, 5]): None , after a = [1, 4, 5]

Destructive:
  delitem(b, 1)    : None , after b = ['a', 'c']
  delslice(a, 1, 3): None , after a = [1]
3.3.4 Sequence Operators
The operators for working with sequences can be divided into four groups: building up sequences, searching for items, accessing contents, and removing items from sequences.

from operator import *

a = [ 1, 2, 3 ]
b = [ 'a', 'b', 'c' ]

print 'a =', a
print 'b =', b

print '\nConstructive:'
print '  concat(a, b):', concat(a, b)
print '  repeat(a, 3):', repeat(a, 3)

print '\nSearching:'
print '  contains(a, 1)  :', contains(a, 1)
print '  contains(b, "d"):', contains(b, "d")
print '  countOf(a, 1)   :', countOf(a, 1)
print '  countOf(b, "d") :', countOf(b, "d")
print '  indexOf(a, 1)   :', indexOf(a, 1)

print '\nAccess Items:'
print '  getitem(b, 1)            :', getitem(b, 1)
print '  getslice(a, 1, 3)        :', getslice(a, 1, 3)
print '  setitem(b, 1, "d")       :', setitem(b, 1, "d"),
print ', after b =', b
print '  setslice(a, 1, 3, [4, 5]):', setslice(a, 1, 3, [4, 5]),
print ', after a =', a

print '\nDestructive:'
print '  delitem(b, 1)    :', delitem(b, 1), ', after b =', b
print '  delslice(a, 1, 3):', delslice(a, 1, 3), ', after a =', a

Some of these operations, such as setitem() and delitem(), modify the sequence in place and do not return a value.

$ python operator_sequences.py

a = [1, 2, 3]
b = ['a', 'b', 'c']

Constructive:
  concat(a, b): [1, 2, 3, 'a', 'b', 'c']
  repeat(a, 3): [1, 2, 3, 1, 2, 3, 1, 2, 3]

Searching:
  contains(a, 1)  : True
  contains(b, "d"): False
  countOf(a, 1)   : 1
  countOf(b, "d") : 0
  indexOf(a, 1)   : 0

Access Items:
  getitem(b, 1)            : b
  getslice(a, 1, 3)        : [2, 3]
  setitem(b, 1, "d")       : None , after b = ['a', 'd', 'c']
  setslice(a, 1, 3, [4, 5]): None , after a = [1, 4, 5]

Destructive:
  delitem(b, 1)    : None , after b = ['a', 'c']
  delslice(a, 1, 3): None , after a = [1]
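Note that getslice(), setslice(), and delslice() exist only in Python 2. As an aside not covered by the example above, the same operations can be expressed by passing slice objects to getitem(), setitem(), and delitem(); a minimal sketch:

from operator import getitem, setitem, delitem

a = [1, 2, 3, 4]
print getitem(a, slice(1, 3))     # [2, 3], equivalent to getslice(a, 1, 3)

setitem(a, slice(1, 3), [9, 9])   # equivalent to setslice(a, 1, 3, [9, 9])
print a                           # [1, 9, 9, 4]

delitem(a, slice(1, 3))           # equivalent to delslice(a, 1, 3)
print a                           # [1, 4]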
3.3.5 In-Place Operators
In addition to the standard operators, many types of objects support "in-place" modification through special operators such as +=. There are equivalent functions for in-place modifications, too.

from operator import *

a = -1
b = 5.0
c = [ 1, 2, 3 ]
d = [ 'a', 'b', 'c' ]
print 'a =', a
print 'b =', b
print 'c =', c
print 'd =', d
print

a = iadd(a, b)
print 'a = iadd(a, b) =>', a
print

c = iconcat(c, d)
print 'c = iconcat(c, d) =>', c

These examples demonstrate only a few of the functions. Refer to the standard library documentation for complete details.

$ python operator_inplace.py

a = -1
b = 5.0
c = [1, 2, 3]
d = ['a', 'b', 'c']

a = iadd(a, b) => 4.0

c = iconcat(c, d) => [1, 2, 3, 'a', 'b', 'c']
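As the example shows, the return value still has to be captured for immutable operands such as numbers, because a new object is produced. For mutable sequences the operand itself is changed. A short illustrative sketch, not part of the original example:

from operator import iadd

x = 1
print iadd(x, 2)     # 3 -- a new integer is returned
print x              # 1 -- the original is unchanged

y = [1]
print iadd(y, [2])   # [1, 2] -- the same list object, extended in place
print y              # [1, 2]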
3.3.6 Attribute and Item "Getters"
One of the most unusual features of the operator module is the concept of getters. These are callable objects constructed at runtime to retrieve attributes of objects or contents from sequences. Getters are especially useful when working with iterators or generator sequences, where they are intended to incur less overhead than a lambda or Python function.

from operator import *

class MyObj(object):
    """example class for attrgetter"""
    def __init__(self, arg):
        super(MyObj, self).__init__()
        self.arg = arg
    def __repr__(self):
        return 'MyObj(%s)' % self.arg

l = [ MyObj(i) for i in xrange(5) ]
print 'objects   :', l

# Extract the 'arg' value from each object
g = attrgetter('arg')
vals = [ g(i) for i in l ]
print 'arg values:', vals

# Sort using arg
l.reverse()
print 'reversed  :', l
print 'sorted    :', sorted(l, key=g)

Attribute getters work like lambda x, n='attrname': getattr(x, n):

$ python operator_attrgetter.py

objects   : [MyObj(0), MyObj(1), MyObj(2), MyObj(3), MyObj(4)]
arg values: [0, 1, 2, 3, 4]
reversed  : [MyObj(4), MyObj(3), MyObj(2), MyObj(1), MyObj(0)]
sorted    : [MyObj(0), MyObj(1), MyObj(2), MyObj(3), MyObj(4)]
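attrgetter() also accepts several attribute names at once, in which case the getter returns a tuple of values, and (in Python 2.6 and later) dotted names that follow nested attributes. The class and data below are made up for illustration; this is not one of the book's example files:

from operator import attrgetter

class Point(object):
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __repr__(self):
        return 'Point(%s, %s)' % (self.x, self.y)

points = [Point(1, 5), Point(1, 2), Point(0, 9)]

# Sort by x, then by y, without writing a lambda
print sorted(points, key=attrgetter('x', 'y'))
# [Point(0, 9), Point(1, 2), Point(1, 5)]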
Item getters work like lambda x, y=5: x[y]:

from operator import *

l = [ dict(val=-1 * i) for i in xrange(4) ]
print 'Dictionaries:', l
g = itemgetter('val')
vals = [ g(i) for i in l ]
print '      values:', vals
print '      sorted:', sorted(l, key=g)

print
l = [ (i, i*-2) for i in xrange(4) ]
print 'Tuples      :', l
g = itemgetter(1)
vals = [ g(i) for i in l ]
print '      values:', vals
print '      sorted:', sorted(l, key=g)

Item getters work with mappings as well as sequences.

$ python operator_itemgetter.py

Dictionaries: [{'val': 0}, {'val': -1}, {'val': -2}, {'val': -3}]
      values: [0, -1, -2, -3]
      sorted: [{'val': -3}, {'val': -2}, {'val': -1}, {'val': 0}]

Tuples      : [(0, 0), (1, -2), (2, -4), (3, -6)]
      values: [0, -2, -4, -6]
      sorted: [(3, -6), (2, -4), (1, -2), (0, 0)]
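Like attrgetter(), itemgetter() can be given more than one key or index; the getter then returns a tuple, which makes multi-field sorting concise. An illustrative sketch with made-up data:

from operator import itemgetter

rows = [('alice', 3), ('bob', 1), ('alice', 1), ('bob', 2)]

# Sort by the name field first, then by the number
print sorted(rows, key=itemgetter(0, 1))
# [('alice', 1), ('alice', 3), ('bob', 1), ('bob', 2)]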
3.3.7 Combining Operators and Custom Classes
The functions in the operator module work via the standard Python interfaces for their operations, so they work with user-defined classes as well as the built-in types.

from operator import *

class MyObj(object):
    """Example for operator overloading"""

    def __init__(self, val):
        super(MyObj, self).__init__()
        self.val = val
        return

    def __str__(self):
        return 'MyObj(%s)' % self.val

    def __lt__(self, other):
        """compare for less-than"""
        print 'Testing %s < %s' % (self, other)
        return self.val < other.val

    def __add__(self, other):
        """add values"""
        print 'Adding %s + %s' % (self, other)
        return MyObj(self.val + other.val)

a = MyObj(1)
b = MyObj(2)

print 'Comparison:'
print lt(a, b)

print '\nArithmetic:'
print add(a, b)

Refer to the Python reference guide for a complete list of the special methods each operator uses.

$ python operator_classes.py

Comparison:
Testing MyObj(1) < MyObj(2)
True

Arithmetic:
Adding MyObj(1) + MyObj(2)
MyObj(3)
3.3.8 Type Checking
The operator module also includes functions for testing API compliance for mapping, number, and sequence types.

from operator import *

class NoType(object):
    """Supports none of the type APIs"""

class MultiType(object):
    """Supports multiple type APIs"""
    def __len__(self):
        return 0
    def __getitem__(self, name):
        return 'mapping'
    def __int__(self):
        return 0

o = NoType()
t = MultiType()

for func in (isMappingType, isNumberType, isSequenceType):
    print '%s(o):' % func.__name__, func(o)
    print '%s(t):' % func.__name__, func(t)

The tests are not perfect, since the interfaces are not strictly defined, but they do provide some idea of what is supported.

$ python operator_typechecking.py

isMappingType(o): False
isMappingType(t): True
isNumberType(o): False
isNumberType(t): True
isSequenceType(o): False
isSequenceType(t): True
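These helper functions were removed in Python 3. Starting with Python 2.6, a stricter alternative is to test against the abstract base classes in the collections module (see the abc cross-reference below). The class in the following sketch is made up for illustration and is not one of the book's examples:

import collections

class MyMapping(collections.Mapping):
    """Minimal read-only mapping implementing the Mapping ABC."""
    def __init__(self, data):
        self._data = dict(data)
    def __getitem__(self, key):
        return self._data[key]
    def __iter__(self):
        return iter(self._data)
    def __len__(self):
        return len(self._data)

m = MyMapping({'a': 1})
print isinstance(m, collections.Mapping)   # True
print isinstance({}, collections.Mapping)  # True -- dict is registered as a Mapping
print isinstance([], collections.Mapping)  # False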
See Also:
operator (http://docs.python.org/lib/module-operator.html) Standard library documentation for this module.
functools (Section 3.1) Functional programming tools, including the total_ordering() decorator for adding rich comparison methods to a class.
itertools (Section 3.2) Iterator operations.
abc (Section 18.2) The abc module includes abstract base classes that define the APIs for collection types.

3.4 contextlib—Context Manager Utilities
Purpose: Utilities for creating and working with context managers.
Python Version: 2.5 and later

The contextlib module contains utilities for working with context managers and the with statement.

Note: Context managers are tied to the with statement. Since with is officially part of Python 2.6, import it from __future__ before using contextlib in Python 2.5.

3.4.1 Context Manager API
A context manager is responsible for a resource within a code block, possibly creating it when the block is entered and then cleaning it up after the block is exited. For example, files support the context manager API to make it easy to ensure they are closed after all reading or writing is done.

with open('/tmp/pymotw.txt', 'wt') as f:
    f.write('contents go here')
# file is automatically closed

A context manager is enabled by the with statement, and the API involves two methods. The __enter__() method is run when execution flow enters the code block inside the with. It returns an object to be used within the context. When execution flow leaves the with block, the __exit__() method of the context manager is called to clean up any resources being used.

class Context(object):

    def __init__(self):
        print '__init__()'

    def __enter__(self):
        print '__enter__()'
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        print '__exit__()'

with Context():
    print 'Doing work in the context'

Combining a context manager and the with statement is a more compact way of writing a try:finally block, since the context manager's __exit__() method is always called, even if an exception is raised.
$ python contextlib_api.py

__init__()
__enter__()
Doing work in the context
__exit__()
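To make the try:finally comparison concrete, the with statement above behaves roughly like the following hand-written sketch, which reuses the Context class defined earlier (the real expansion, defined by PEP 343, also passes the exception details to __exit__()):

context = Context()
value = context.__enter__()   # what an "as" clause would receive
try:
    print 'Doing work in the context'
finally:
    # __exit__() runs whether or not the block raised an exception
    context.__exit__(None, None, None)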
The __enter__() method can return any object to be associated with a name specified in the as clause of the with statement. In this example, the Context returns an object that uses the open context.

class WithinContext(object):

    def __init__(self, context):
        print 'WithinContext.__init__(%s)' % context

    def do_something(self):
        print 'WithinContext.do_something()'

    def __del__(self):
        print 'WithinContext.__del__'

class Context(object):

    def __init__(self):
        print 'Context.__init__()'

    def __enter__(self):
        print 'Context.__enter__()'
        return WithinContext(self)

    def __exit__(self, exc_type, exc_val, exc_tb):
        print 'Context.__exit__()'

with Context() as c:
    c.do_something()

The value associated with the variable c is the object returned by __enter__(), which is not necessarily the Context instance created in the with statement.

$ python contextlib_api_other_object.py

Context.__init__()
Context.__enter__()
WithinContext.__init__(<__main__.Context object at 0x100d98a10>)
WithinContext.do_something()
Context.__exit__()
WithinContext.__del__

The __exit__() method receives arguments containing details of any exception raised in the with block.

class Context(object):

    def __init__(self, handle_error):
        print '__init__(%s)' % handle_error
        self.handle_error = handle_error

    def __enter__(self):
        print '__enter__()'
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        print '__exit__()'
        print '  exc_type =', exc_type
        print '  exc_val  =', exc_val
        print '  exc_tb   =', exc_tb
        return self.handle_error

with Context(True):
    raise RuntimeError('error message handled')

print

with Context(False):
    raise RuntimeError('error message propagated')

If the context manager can handle the exception, __exit__() should return a true value to indicate that the exception does not need to be propagated. Returning false causes the exception to be reraised after __exit__() returns.
$ python contextlib_api_error.py

__init__(True)
__enter__()
__exit__()
  exc_type = <type 'exceptions.RuntimeError'>
  exc_val  = error message handled
  exc_tb   = <traceback object at 0x...>

__init__(False)
__enter__()
__exit__()
  exc_type = <type 'exceptions.RuntimeError'>
  exc_val  = error message propagated
  exc_tb   = <traceback object at 0x...>
Traceback (most recent call last):
  File "contextlib_api_error.py", line 33, in <module>
    raise RuntimeError('error message propagated')
RuntimeError: error message propagated
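Because __exit__() is told which exception type occurred, a context manager can also choose to suppress only specific errors and let everything else propagate. The class below is a made-up illustration, not one of the book's examples:

class IgnoreKeyError(object):
    """Suppress KeyError raised inside the block; re-raise anything else."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Return a true value only for KeyError; other exceptions propagate
        return exc_type is not None and issubclass(exc_type, KeyError)

d = {}
with IgnoreKeyError():
    print d['missing']    # the KeyError is swallowed here
print 'still running'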
3.4.2 From Generator to Context Manager
Creating context managers the traditional way, by writing a class with __enter__() and __exit__() methods, is not difficult. But sometimes, writing everything out fully is extra overhead for a trivial bit of context. In those sorts of situations, use the contextmanager() decorator to convert a generator function into a context manager.

import contextlib

@contextlib.contextmanager
def make_context():
    print '  entering'
    try:
        yield {}
    except RuntimeError, err:
        print '  ERROR:', err
    finally:
        print '  exiting'

print 'Normal:'
with make_context() as value:
    print '  inside with statement:', value

print '\nHandled error:'
with make_context() as value:
    raise RuntimeError('showing example of handling an error')

print '\nUnhandled error:'
with make_context() as value:
    raise ValueError('this exception is not handled')

The generator should initialize the context, yield exactly one time, and then clean up the context. The value yielded, if any, is bound to the variable in the as clause of the with statement. Exceptions from within the with block are reraised inside the generator, so they can be handled there.

$ python contextlib_contextmanager.py

Normal:
  entering
  inside with statement: {}
  exiting

Handled error:
  entering
  ERROR: showing example of handling an error
  exiting

Unhandled error:
  entering
  exiting
Traceback (most recent call last):
  File "contextlib_contextmanager.py", line 34, in <module>
    raise ValueError('this exception is not handled')
ValueError: this exception is not handled
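A typical use for contextmanager() is to wrap a small piece of setup and teardown around a block, such as temporarily changing the working directory. The function below is an illustrative sketch of that pattern (assuming the target directory exists), not one of the book's example files:

import contextlib
import os

@contextlib.contextmanager
def working_directory(path):
    """Make path the current directory for the block, then restore."""
    original = os.getcwd()
    os.chdir(path)
    try:
        yield path
    finally:
        os.chdir(original)

with working_directory('/tmp') as where:
    print 'inside:', where, os.getcwd()
print 'after :', os.getcwd()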
3.4.3 Nesting Contexts
At times, it is necessary to manage multiple contexts simultaneously (such as when copying data between input and output file handles, for example). It is possible to nest with statements one inside another, but if the outer contexts do not need their own separate block, this adds to the indention level without giving any real benefit. Using nested() nests the contexts using a single with statement.

import contextlib

@contextlib.contextmanager
def make_context(name):
    print 'entering:', name
    yield name
    print 'exiting :', name

with contextlib.nested(make_context('A'), make_context('B')) as (A, B):
    print 'inside with statement:', A, B

Program execution leaves the contexts in the reverse order in which they are entered.

$ python contextlib_nested.py

entering: A
entering: B
inside with statement: A B
exiting : B
exiting : A

In Python 2.7 and later, nested() is deprecated because the with statement supports nesting directly.
import contextlib

@contextlib.contextmanager
def make_context(name):
    print 'entering:', name
    yield name
    print 'exiting :', name

with make_context('A') as A, make_context('B') as B:
    print 'inside with statement:', A, B

Each context manager and optional as clause are separated by a comma (,). The effect is similar to using nested(), but avoids some of the edge-cases around error handling that nested() could not implement correctly.

$ python contextlib_nested_with.py

entering: A
entering: B
inside with statement: A B
exiting : B
exiting : A
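The multi-context form suits the file-copying case mentioned at the start of this section, since both handles are closed as soon as the block ends. A minimal sketch, assuming the input file exists (the file names are placeholders):

with open('source.txt', 'r') as infile, open('dest.txt', 'w') as outfile:
    # Both files are closed automatically when the block exits,
    # even if the loop raises an exception.
    for line in infile:
        outfile.write(line)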
3.4.4 Closing Open Handles
The file class supports the context manager API directly, but some other objects that represent open handles do not. The example given in the standard library documentation for contextlib is the object returned from urllib.urlopen(). There are other legacy classes that use a close() method but do not support the context manager API. To ensure that a handle is closed, use closing() to create a context manager for it.

import contextlib

class Door(object):
    def __init__(self):
        print '  __init__()'
    def close(self):
        print '  close()'

print 'Normal Example:'
with contextlib.closing(Door()) as door:
    print '  inside with statement'

print '\nError handling example:'
try:
    with contextlib.closing(Door()) as door:
        print '  raising from inside with statement'
        raise RuntimeError('error message')
except Exception, err:
    print '  Had an error:', err

The handle is closed whether there is an error in the with block or not.

$ python contextlib_closing.py

Normal Example:
  __init__()
  inside with statement
  close()

Error handling example:
  __init__()
  raising from inside with statement
  close()
  Had an error: error message

See Also:
contextlib (http://docs.python.org/library/contextlib.html) The standard library documentation for this module.
PEP 343 (http://www.python.org/dev/peps/pep-0343) The with statement.
Context Manager Types (http://docs.python.org/library/stdtypes.html#typecontextmanager) Description of the context manager API from the standard library documentation.
With Statement Context Managers (http://docs.python.org/reference/datamodel.html#context-managers) Description of the context manager API from the Python Reference Guide.