Ruby Cookbook (1st Edition)


Ruby Cookbook™

Lucas Carlson and Leonard Richardson

Beijing • Cambridge • Farnham • Köln • Paris • Sebastopol • Taipei • Tokyo

Ruby Cookbook
by Lucas Carlson and Leonard Richardson

Copyright © 2006 O’Reilly Media, Inc. All rights reserved.
Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (safari.oreilly.com). For more information, contact our corporate/institutional sales department: (800) 998-9938 or corporate@oreilly.com.
Editor: Mike Loukides
Production Editor: Colleen Gorman
Proofreader: Colleen Gorman
Indexer: Johnna VanHoose Dinse
Cover Designer: Karen Montgomery
Interior Designer: David Futato
Illustrators: Robert Romano and Jessamyn Read

Printing History:
    July 2006: First Edition.

Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of O’Reilly Media, Inc. The Cookbook series designations, Ruby Cookbook, the image of a side-striped jackal, and related trade dress are trademarks of O’Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O’Reilly Media, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

ISBN: 0-596-52369-6
[M]

For Tess, who sat by me the whole time.
For John and Rael, the best programmers I know.
—Lucas Carlson

For Sumana.
—Leonard Richardson

Table of Contents

Preface
1. Strings
    1.1 Building a String from Parts
    1.2 Substituting Variables into Strings
    1.3 Substituting Variables into an Existing String
    1.4 Reversing a String by Words or Characters
    1.5 Representing Unprintable Characters
    1.6 Converting Between Characters and Values
    1.7 Converting Between Strings and Symbols
    1.8 Processing a String One Character at a Time
    1.9 Processing a String One Word at a Time
    1.10 Changing the Case of a String
    1.11 Managing Whitespace
    1.12 Testing Whether an Object Is String-Like
    1.13 Getting the Parts of a String You Want
    1.14 Handling International Encodings
    1.15 Word-Wrapping Lines of Text
    1.16 Generating a Succession of Strings
    1.17 Matching Strings with Regular Expressions
    1.18 Replacing Multiple Patterns in a Single Pass
    1.19 Validating an Email Address
    1.20 Classifying Text with a Bayesian Analyzer
2. Numbers
    2.1 Parsing a Number from a String
    2.2 Comparing Floating-Point Numbers
    2.3 Representing Numbers to Arbitrary Precision
    2.4 Representing Rational Numbers
    2.5 Generating Random Numbers
    2.6 Converting Between Numeric Bases
    2.7 Taking Logarithms
    2.8 Finding Mean, Median, and Mode
    2.9 Converting Between Degrees and Radians
    2.10 Multiplying Matrices
    2.11 Solving a System of Linear Equations
    2.12 Using Complex Numbers
    2.13 Simulating a Subclass of Fixnum
    2.14 Doing Math with Roman Numbers
    2.15 Generating a Sequence of Numbers
    2.16 Generating Prime Numbers
    2.17 Checking a Credit Card Checksum
3. Date and Time
    3.1 Finding Today’s Date
    3.2 Parsing Dates, Precisely or Fuzzily
    3.3 Printing a Date
    3.4 Iterating Over Dates
    3.5 Doing Date Arithmetic
    3.6 Counting the Days Since an Arbitrary Date
    3.7 Converting Between Time Zones
    3.8 Checking Whether Daylight Saving Time Is in Effect
    3.9 Converting Between Time and DateTime Objects
    3.10 Finding the Day of the Week
    3.11 Handling Commercial Dates
    3.12 Running a Code Block Periodically
    3.13 Waiting a Certain Amount of Time
    3.14 Adding a Timeout to a Long-Running Operation
4. Arrays
    4.1 Iterating Over an Array
    4.2 Rearranging Values Without Using Temporary Variables
    4.3 Stripping Duplicate Elements from an Array
    4.4 Reversing an Array
    4.5 Sorting an Array
    4.6 Ignoring Case When Sorting Strings
    4.7 Making Sure a Sorted Array Stays Sorted
    4.8 Summing the Items of an Array
    4.9 Sorting an Array by Frequency of Appearance
    4.10 Shuffling an Array
    4.11 Getting the N Smallest Items of an Array
    4.12 Building Up a Hash Using Injection
    4.13 Extracting Portions of Arrays
    4.14 Computing Set Operations on Arrays
    4.15 Partitioning or Classifying a Set
5. Hashes
    5.1 Using Symbols as Hash Keys
    5.2 Creating a Hash with a Default Value
    5.3 Adding Elements to a Hash
    5.4 Removing Elements from a Hash
    5.5 Using an Array or Other Modifiable Object as a Hash Key
    5.6 Keeping Multiple Values for the Same Hash Key
    5.7 Iterating Over a Hash
    5.8 Iterating Over a Hash in Insertion Order
    5.9 Printing a Hash
    5.10 Inverting a Hash
    5.11 Choosing Randomly from a Weighted List
    5.12 Building a Histogram
    5.13 Remapping the Keys and Values of a Hash
    5.14 Extracting Portions of Hashes
    5.15 Searching a Hash with Regular Expressions
6. Files and Directories
    6.1 Checking to See If a File Exists
    6.2 Checking Your Access to a File
    6.3 Changing the Permissions on a File
    6.4 Seeing When a File Was Last Used
    6.5 Listing a Directory
    6.6 Reading the Contents of a File
    6.7 Writing to a File
    6.8 Writing to a Temporary File
    6.9 Picking a Random Line from a File
    6.10 Comparing Two Files
    6.11 Performing Random Access on “Read-Once” Input Streams
    6.12 Walking a Directory Tree
    6.13 Locking a File
    6.14 Backing Up to Versioned Filenames
    6.15 Pretending a String Is a File
    6.16 Redirecting Standard Input or Output
    6.17 Processing a Binary File
    6.18 Deleting a File
    6.19 Truncating a File
    6.20 Finding the Files You Want
    6.21 Finding and Changing the Current Working Directory
7. Code Blocks and Iteration
    7.1 Creating and Invoking a Block
    7.2 Writing a Method That Accepts a Block
    7.3 Binding a Block Argument to a Variable
    7.4 Blocks as Closures: Using Outside Variables Within a Code Block
    7.5 Writing an Iterator Over a Data Structure
    7.6 Changing the Way an Object Iterates
    7.7 Writing Block Methods That Classify or Collect
    7.8 Stopping an Iteration
    7.9 Looping Through Multiple Iterables in Parallel
    7.10 Hiding Setup and Cleanup in a Block Method
    7.11 Coupling Systems Loosely with Callbacks
8. Objects and Classes
    8.1 Managing Instance Data
    8.2 Managing Class Data
    8.3 Checking Class or Module Membership
    8.4 Writing an Inherited Class
    8.5 Overloading Methods
    8.6 Validating and Modifying Attribute Values
    8.7 Defining a Virtual Attribute
    8.8 Delegating Method Calls to Another Object
    8.9 Converting and Coercing Objects to Different Types
    8.10 Getting a Human-Readable Printout of Any Object
    8.11 Accepting or Passing a Variable Number of Arguments
    8.12 Simulating Keyword Arguments
    8.13 Calling a Superclass’s Method
    8.14 Creating an Abstract Method
    8.15 Freezing an Object to Prevent Changes
    8.16 Making a Copy of an Object
    8.17 Declaring Constants
    8.18 Implementing Class and Singleton Methods
    8.19 Controlling Access by Making Methods Private
9. Modules and Namespaces
    9.1 Simulating Multiple Inheritance with Mixins
    9.2 Extending Specific Objects with Modules
    9.3 Mixing in Class Methods
    9.4 Implementing Enumerable: Write One Method, Get 22 Free
    9.5 Avoiding Naming Collisions with Namespaces
    9.6 Automatically Loading Libraries as Needed
    9.7 Including Namespaces
    9.8 Initializing Instance Variables Defined by a Module
    9.9 Automatically Initializing Mixed-In Modules
10. Reflection and Metaprogramming
    10.1 Finding an Object’s Class and Superclass
    10.2 Listing an Object’s Methods
    10.3 Listing Methods Unique to an Object
    10.4 Getting a Reference to a Method
    10.5 Fixing Bugs in Someone Else’s Class
    10.6 Listening for Changes to a Class
    10.7 Checking Whether an Object Has Necessary Attributes
    10.8 Responding to Calls to Undefined Methods
    10.9 Automatically Initializing Instance Variables
    10.10 Avoiding Boilerplate Code with Metaprogramming
    10.11 Metaprogramming with String Evaluations
    10.12 Evaluating Code in an Earlier Context
    10.13 Undefining a Method
    10.14 Aliasing Methods
    10.15 Doing Aspect-Oriented Programming
    10.16 Enforcing Software Contracts
11. XML and HTML
    11.1 Checking XML Well-Formedness
    11.2 Extracting Data from a Document’s Tree Structure
    11.3 Extracting Data While Parsing a Document
    11.4 Navigating a Document with XPath
    11.5 Parsing Invalid Markup
    11.6 Converting an XML Document into a Hash
    11.7 Validating an XML Document
    11.8 Substituting XML Entities
    11.9 Creating and Modifying XML Documents
    11.10 Compressing Whitespace in an XML Document
    11.11 Guessing a Document’s Encoding
    11.12 Converting from One Encoding to Another
    11.13 Extracting All the URLs from an HTML Document
    11.14 Transforming Plain Text to HTML
    11.15 Converting HTML Documents from the Web into Text
    11.16 A Simple Feed Aggregator
12. Graphics and Other File Formats
    12.1 Thumbnailing Images
    12.2 Adding Text to an Image
    12.3 Converting One Image Format to Another
    12.4 Graphing Data
    12.5 Adding Graphical Context with Sparklines
    12.6 Strongly Encrypting Data
    12.7 Parsing Comma-Separated Data
    12.8 Parsing Not-Quite-Comma-Separated Data
    12.9 Generating and Parsing Excel Spreadsheets
    12.10 Compressing and Archiving Files with Gzip and Tar
    12.11 Reading and Writing ZIP Files
    12.12 Reading and Writing Configuration Files
    12.13 Generating PDF Files
    12.14 Representing Data as MIDI Music
13. Databases and Persistence
    13.1 Serializing Data with YAML
    13.2 Serializing Data with Marshal
    13.3 Persisting Objects with Madeleine
    13.4 Indexing Unstructured Text with SimpleSearch
    13.5 Indexing Structured Text with Ferret
    13.6 Using Berkeley DB Databases
    13.7 Controlling MySQL on Unix
    13.8 Finding the Number of Rows Returned by a Query
    13.9 Talking Directly to a MySQL Database
    13.10 Talking Directly to a PostgreSQL Database
    13.11 Using Object Relational Mapping with ActiveRecord
    13.12 Using Object Relational Mapping with Og
    13.13 Building Queries Programmatically
    13.14 Validating Data with ActiveRecord
    13.15 Preventing SQL Injection Attacks
    13.16 Using Transactions in ActiveRecord
    13.17 Adding Hooks to Table Events
    13.18 Adding Taggability with a Database Mixin
14. Internet Services
    14.1 Grabbing the Contents of a Web Page
    14.2 Making an HTTPS Web Request
    14.3 Customizing HTTP Request Headers
    14.4 Performing DNS Queries
    14.5 Sending Mail
    14.6 Reading Mail with IMAP
    14.7 Reading Mail with POP3
    14.8 Being an FTP Client
    14.9 Being a Telnet Client
    14.10 Being an SSH Client
    14.11 Copying a File to Another Machine
    14.12 Being a BitTorrent Client
    14.13 Pinging a Machine
    14.14 Writing an Internet Server
    14.15 Parsing URLs
    14.16 Writing a CGI Script
    14.17 Setting Cookies and Other HTTP Response Headers
    14.18 Handling File Uploads via CGI
    14.19 Running Servlets with WEBrick
    14.20 A Real-World HTTP Client
15. Web Development: Ruby on Rails
    15.1 Writing a Simple Rails Application to Show System Status
    15.2 Passing Data from the Controller to the View
    15.3 Creating a Layout for Your Header and Footer
    15.4 Redirecting to a Different Location
    15.5 Displaying Templates with Render
    15.6 Integrating a Database with Your Rails Application
    15.7 Understanding Pluralization Rules
    15.8 Creating a Login System
    15.9 Storing Hashed User Passwords in the Database
    15.10 Escaping HTML and JavaScript for Display
    15.11 Setting and Retrieving Session Information
    15.12 Setting and Retrieving Cookies
    15.13 Extracting Code into Helper Functions
    15.14 Refactoring the View into Partial Snippets of Views
    15.15 Adding DHTML Effects with script.aculo.us
    15.16 Generating Forms for Manipulating Model Objects
    15.17 Creating an Ajax Form
    15.18 Exposing Web Services on Your Web Site
    15.19 Sending Mail with Rails
    15.20 Automatically Sending Error Messages to Your Email
    15.21 Documenting Your Web Site
    15.22 Unit Testing Your Web Site
    15.23 Using breakpoint in Your Web Application
16. Web Services and Distributed Programming
    16.1 Searching for Books on Amazon
    16.2 Finding Photos on Flickr
    16.3 Writing an XML-RPC Client
    16.4 Writing a SOAP Client
    16.5 Writing a SOAP Server
    16.6 Searching the Web with Google’s SOAP Service
    16.7 Using a WSDL File to Make SOAP Calls Easier
    16.8 Charging a Credit Card
    16.9 Finding the Cost to Ship Packages via UPS or FedEx
    16.10 Sharing a Hash Between Any Number of Computers
    16.11 Implementing a Distributed Queue
    16.12 Creating a Shared “Whiteboard”
    16.13 Securing DRb Services with Access Control Lists
    16.14 Automatically Discovering DRb Services with Rinda
    16.15 Proxying Objects That Can’t Be Distributed
    16.16 Storing Data on Distributed RAM with MemCached
    16.17 Caching Expensive Results with MemCached
    16.18 A Remote-Controlled Jukebox
17. Testing, Debugging, Optimizing, and Documenting
    17.1 Running Code Only in Debug Mode
    17.2 Raising an Exception
    17.3 Handling an Exception
    17.4 Rerunning After an Exception
    17.5 Adding Logging to Your Application
    17.6 Creating and Understanding Tracebacks
    17.7 Writing Unit Tests
    17.8 Running Unit Tests
    17.9 Testing Code That Uses External Resources
    17.10 Using breakpoint to Inspect and Change the State of Your Application
    17.11 Documenting Your Application
    17.12 Profiling Your Application
    17.13 Benchmarking Competing Solutions
    17.14 Running Multiple Analysis Tools at Once
    17.15 Who’s Calling That Method? A Call Graph Analyzer
18. Packaging and Distributing Software
    18.1 Finding Libraries by Querying Gem Repositories
    18.2 Installing and Using a Gem
    18.3 Requiring a Specific Version of a Gem
    18.4 Uninstalling a Gem
    18.5 Reading Documentation for Installed Gems
    18.6 Packaging Your Code as a Gem
    18.7 Distributing Your Gems
    18.8 Installing and Creating Standalone Packages with setup.rb
19. Automating Tasks with Rake
    19.1 Automatically Running Unit Tests
    19.2 Automatically Generating Documentation
    19.3 Cleaning Up Generated Files
    19.4 Automatically Building a Gem
    19.5 Gathering Statistics About Your Code
    19.6 Publishing Your Documentation
    19.7 Running Multiple Tasks in Parallel
    19.8 A Generic Project Rakefile
20. Multitasking and Multithreading
    20.1 Running a Daemon Process on Unix
    20.2 Creating a Windows Service
    20.3 Doing Two Things at Once with Threads
    20.4 Synchronizing Access to an Object
    20.5 Terminating a Thread
    20.6 Running a Code Block on Many Objects Simultaneously
    20.7 Limiting Multithreading with a Thread Pool
    20.8 Driving an External Process with popen
    20.9 Capturing the Output and Error Streams from a Unix Shell Command
    20.10 Controlling a Process on Another Machine
    20.11 Avoiding Deadlock
21. User Interface
    21.1 Getting Input One Line at a Time
    21.2 Getting Input One Character at a Time
    21.3 Parsing Command-Line Arguments
    21.4 Testing Whether a Program Is Running Interactively
    21.5 Setting Up and Tearing Down a Curses Program
    21.6 Clearing the Screen
    21.7 Determining Terminal Size
    21.8 Changing Text Color
    21.9 Reading a Password
    21.10 Allowing Input Editing with Readline
    21.11 Making Your Keyboard Lights Blink
    21.12 Creating a GUI Application with Tk
    21.13 Creating a GUI Application with wxRuby
    21.14 Creating a GUI Application with Ruby/GTK
    21.15 Creating a Mac OS X Application with RubyCocoa
    21.16 Using AppleScript to Get User Input
22. Extending Ruby with Other Languages
    22.1 Writing a C Extension for Ruby
    22.2 Using a C Library from Ruby
    22.3 Calling a C Library Through SWIG
    22.4 Writing Inline C in Your Ruby Code
    22.5 Using Java Libraries with JRuby
23. System Administration
    23.1 Scripting an External Program
    23.2 Managing Windows Services
    23.3 Running Code as Another User
    23.4 Running Periodic Tasks Without cron or at
    23.5 Deleting Files That Match a Regular Expression
    23.6 Renaming Files in Bulk
    23.7 Finding Duplicate Files
    23.8 Automating Backups
    23.9 Normalizing Ownership and Permissions in User Directories
    23.10 Killing All Processes for a Given User
Index

Preface

Life Is Short

This is a book of recipes: solutions to common problems, copy-and-paste code snippets, explanations, examples, and short tutorials.

This book is meant to save you time. Time, as they say, is money, but a span of time is also a piece of your life.
Our lives are better spent creating new things than fighting our own errors, or trying to solve problems that have already been solved. We present this book in the hope that the time it saves, distributed across all its readers, will greatly outweigh the time we spent creating it.

The Ruby programming language is itself a wonderful time-saving tool. It makes you more productive than other programming languages because you spend more time making the computer do what you want, and less wrestling with the language. But there are many ways for a Ruby programmer to spend time without accomplishing anything, and we’ve encountered them all:

• Time spent writing Ruby implementations of common algorithms.
• Time spent debugging Ruby implementations of common algorithms.
• Time spent discovering and working around Ruby-specific pitfalls.
• Time spent on repetitive tasks (including repetitive programming tasks!) that could be automated.
• Time spent duplicating work that someone else has already made publicly available.
• Time spent searching for a library that does X.
• Time spent evaluating and deciding between the many libraries that do X.
• Time spent learning how to use a library because of poor or outdated documentation.
• Time lost staying away from a useful technology because it seems intimidating.

We, and the many contributors to this book, recall vividly our own wasted hours and days. We’ve distilled our experiences into this book so that you don’t waste your time—or at least so you enjoyably waste it on more interesting problems.

Our other goal is to expand your interests. If you come to this book wanting to generate algorithmic music with Ruby then, yes, Recipe 12.14 will save you time over starting from scratch. It’s more likely that you’d never considered the possibility until now. Every recipe in this book was developed and written with these two goals in mind: to save you time, and to keep your brain active with new ideas.

Audience

This cookbook is aimed at people who know at least a little bit of Ruby, or who know a fair amount about programming in general. This isn’t a Ruby tutorial (see the Resources section below for some real tutorials), but if you’re already familiar with a few other programming languages, you should be able to pick up Ruby by reading through the first 10 chapters of this book and typing in the code listings as you go.

We’ve included recipes suitable for all skill levels, from those who are just starting out with Ruby, to experts who need an occasional reference. We focus mainly on generic programming techniques, but we also cover specific application frameworks (like Ruby on Rails and GUI libraries) and best practices (like unit testing).

Even if you just plan to use this book as a reference, we recommend that you skim through it once to get a picture of the problems we solve. This is a big book but it doesn’t solve every problem. If you pick it up and you can’t find a solution to your problem, or one that nudges you in the right direction, then you’ve lost time.

If you skim through this book once beforehand, you’ll get a fair idea of the problems we cover in this book, and you’ll get a better hit rate. You’ll know when this book can help you; and when you should consult other books, do a web search, ask a friend, or get help some other way.

The Structure of This Book

Each of this book’s 23 chapters focuses on a kind of programming or a particular data type. This overview of the chapters should give you a picture of how we divided up the recipes. Each chapter also has its own, somewhat lengthier introduction, which gives a more detailed view of its recipes. At the very least, we recommend you skim the chapter introductions and the table of contents.

We start with six chapters covering Ruby’s built-in data structures.

• Chapter 1, Strings, contains recipes for building, processing, and manipulating strings of text.
  We devote a few recipes specifically to regular expressions (Recipes 1.17 through 1.19), but our focus is on Ruby-specific issues, and regular expressions are a very general tool. If you haven’t encountered them yet, or just find them intimidating, we recommend you go through an online tutorial or Mastering Regular Expressions by Jeffrey Friedl (O’Reilly).
• Chapter 2, Numbers, covers the representation of different types of numbers: real numbers, complex numbers, arbitrary-precision decimals, and so on. It also includes Ruby implementations of common mathematical and statistical algorithms, and explains some Ruby quirks you’ll run into if you create your own numeric types (Recipes 2.13 and 2.14).
• Chapter 3, Date and Time, covers Ruby’s two interfaces for dealing with time: the one based on the C time library, which may be familiar to you from other programming languages, and the one implemented in pure Ruby, which is more idiomatic.
• Chapter 4, Arrays, introduces the array, Ruby’s simplest compound data type. Many of an array’s methods are actually methods of the Enumerable mixin; this means you can apply many of these recipes to hashes and other data types. Some features of Enumerable are covered in this chapter (Recipes 4.4 and 4.6), and some are covered in Chapter 7.
• Chapter 5, Hashes, covers the hash, Ruby’s other basic compound data type. Hashes make it easy to associate objects with names and find them later (hashes are sometimes called “lookup tables” or “dictionaries,” two telling names). It’s easy to use hashes along with arrays to build deep and complex data structures.
• Chapter 6, Files and Directories, covers techniques for reading, writing, and manipulating files. Ruby’s file access interface is based on the standard C file libraries, so it may look familiar to you. This chapter also covers Ruby’s standard libraries for searching and manipulating the filesystem; many of these recipes show up again in Chapter 23.
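
The remark about the Enumerable mixin in the Chapter 4 summary can be seen in a few lines of irb. This is only a small sketch, not one of the book's recipes: the sample data and the variable names are made up for illustration, and it uses only methods that Enumerable itself provides.

```ruby
numbers = [3, 1, 2]
ages = { 'alice' => 30, 'bob' => 25 }            # hypothetical sample data

# Both classes mix in Enumerable.
Array.include?(Enumerable)                       # => true
Hash.include?(Enumerable)                        # => true

# sort_by and inject come from Enumerable, so they work on both.
sorted_numbers = numbers.sort_by { |n| -n }      # => [3, 2, 1]
sorted_pairs = ages.sort_by { |name, age| age }
# => [["bob", 25], ["alice", 30]]

total = numbers.inject(0) { |sum, n| sum + n }   # => 6
total_age = ages.inject(0) { |sum, (name, age)| sum + age }
# => 55
```

A hash yields each entry as a key-value pair, which is why the blocks for `ages` take two names where the blocks for `numbers` take one.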

The first six chapters deal with specific algorithmic problems. The next four are more abstract: they’re about Ruby idiom and philosophy. If you can’t get the Ruby language itself to do what you want, or you’re having trouble writing Ruby code that looks the way Ruby “should” look, the recipes in these chapters may help.

• Chapter 7, Code Blocks and Iteration, contains recipes that explore the possibilities of Ruby’s code blocks (also known as closures).
• Chapter 8, Objects and Classes, covers Ruby’s take on object-oriented programming. It contains recipes for writing different types of classes and methods, and a few recipes that demonstrate capabilities of all Ruby objects (such as freezing and cloning).
• Chapter 9, Modules and Namespaces, covers Ruby’s modules. These constructs are used to “mix” new behavior into existing classes and to segregate functionality into different namespaces.
• Chapter 10, Reflection and Metaprogramming, covers techniques for programmatically exploring and modifying Ruby class definitions.

Chapter 6 covers basic file access, but doesn’t touch much on specific file formats. We devote three chapters to popular ways of storing data.

• Chapter 11, XML and HTML, shows how to handle the most popular data interchange formats. The chapter deals mostly with parsing other people’s XML documents and web pages (but see Recipe 11.9).
• Chapter 12, Graphics and Other File Formats, covers data interchange formats other than XML and HTML, with a special focus on generating and manipulating graphics.
• Chapter 13, Databases and Persistence, covers the best Ruby interfaces to data storage formats, whether you’re serializing Ruby objects to disk, or storing structured data in a database.
  This chapter demonstrates everything from different ways of serializing data and indexing text, to the Ruby client libraries for popular SQL databases, to full-blown abstraction layers like ActiveRecord that save you from having to write SQL at all.

Currently the most popular use of Ruby is in network applications (mostly through Ruby on Rails). We devote three chapters to different types of applications:

• Chapter 14, Internet Services, kicks off our networking coverage by illustrating a wide variety of clients and servers written with Ruby libraries.
• Chapter 15, Web Development: Ruby on Rails, covers the web application framework that’s been driving so much of Ruby’s recent popularity.
• Chapter 16, Web Services and Distributed Programming, covers two techniques for sharing information between computers during a Ruby program. In order to use a web service, you make an HTTP request of a program on some other computer, usually one you don’t control. Ruby’s DRb library lets you share Ruby data structures between programs running on a set of computers, all of which you control.

We then have three chapters on the auxiliary tasks that surround the main programming work of a project.

• Chapter 17, Testing, Debugging, Optimizing, and Documenting, focuses mainly on handling exception conditions and creating unit tests for your code. There are also several recipes on the processes of debugging and optimization.
• Chapter 18, Packaging and Distributing Software, mainly deals with Ruby’s Gem packaging system and the RubyForge server that hosts many gem files. Many recipes in other chapters require that you install a particular gem, so if you’re not familiar with gems, we recommend you read Recipe 18.2 in particular. The chapter also shows you how to create and distribute gems for your own projects.
• Chapter 19, Automating Tasks with Rake, covers the most popular Ruby build tool.
  With Rake, you can script common tasks like running unit tests or packaging your code as a gem. Though it’s usually used in Ruby projects, it’s a general-purpose build language that you can use wherever you might use Make.

We close the book with four chapters on miscellaneous topics.

• Chapter 20, Multitasking and Multithreading, shows how to use threads to do more than one thing at once, and how to use Unix subprocesses to run external commands.
• Chapter 21, User Interface, covers user interfaces (apart from the web interface, which was covered in Chapter 15). We discuss the command-line interface, character-based GUIs with Curses and HighLine, GUI toolkits for various platforms, and more obscure kinds of user interface (Recipe 21.11).
• Chapter 22, Extending Ruby with Other Languages, focuses on hooking up Ruby to other languages, either for performance or to get access to more libraries. Most of the chapter focuses on getting access to C libraries, but there is one recipe about JRuby, the Ruby implementation that runs on the Java Virtual Machine (Recipe 22.5).
• Chapter 23, System Administration, is full of self-contained programs for doing administrative tasks, usually using techniques from other chapters. The recipes have a heavy focus on Unix administration, but there are some resources for Windows users (including Recipe 23.2), and some cross-platform scripts.

How the Code Listings Work

Learning from a cookbook means performing the recipes. Some of our recipes define big chunks of Ruby code that you can simply plop into your program and use without really understanding them (Recipe 19.8 is a good example). But most of the recipes demonstrate techniques, and the best way to learn a technique is to practice it.

We wrote the recipes, and their code listings, with this in mind. Most of our listings act like unit tests for the concepts described in the recipe: they poke at objects and show you the results.
Now, a Ruby installation comes with an interactive interpreter called irb. Within an irb session, you can type in lines of Ruby code and see the output immediately. You don’t have to create a Ruby program file and run it through the interpreter.

Most of our recipes are presented in a form that you can type or copy/paste directly into an irb session. To study a recipe in depth, we recommend that you start an irb session and run through the code listings as you read it. You’ll have a deeper understanding of the concept if you do it yourself than if you just read about it. Once you’re done, you can experiment further with the objects you defined while running the code listings.

Sometimes we want to draw your attention to the expected result of a Ruby expression. We do this with a Ruby comment containing an ASCII arrow that points to the expected value of the expression. This is the same arrow irb uses to tell you the value of every expression you type.

We also use textual comments to explain some pieces of code. Here’s a fragment of Ruby code that I’ve formatted with comments as I would in a recipe:

    1 + 2 # => 3

    # On a long line, the expected value goes on a new line:
    Math.sqrt(1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10)
    # => 7.41619848709566

To display the expected output of a Ruby expression, we use a comment that has no ASCII arrow, and that always goes on a new line:

    puts "This string is self-referential."
    # This string is self-referential.

If you type these two snippets of code into irb, ignoring the comments, you can check back against the text and verify that you got the same results we did:

    $ irb
    irb(main):001:0> 1 + 2
    => 3
    irb(main):002:0> Math.sqrt(1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10)
    => 7.41619848709566
    irb(main):003:0> puts "This string is self-referential."
    This string is self-referential.
    => nil

If you’re reading this book in electronic form, you can copy and paste the code fragments into irb.
The Ruby interpreter will ignore the comments, but you can use them to make sure your answers match ours, without having to look back at the text. (But you should know that typing in the code yourself, at least the first time, is better for comprehension.)

    $ irb
    irb(main):001:0> 1 + 2 # => 3
    => 3
    irb(main):002:0>
    irb(main):003:0* # On a long line, the expected value goes on a new line:
    irb(main):004:0* Math.sqrt(1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10)
    => 7.41619848709566
    irb(main):005:0> # => 7.41619848709566
    irb(main):006:0*
    irb(main):007:0* puts "This string is self-referential."
    This string is self-referential.
    => nil
    irb(main):008:0> # This string is self-referential.

We don’t cut corners. Most of our recipes demonstrate a complete irb session from start to finish, and they include any imports or initialization necessary to illustrate the point we’re trying to make. If you run the code exactly as it is in the recipe, you should get the same results we did.* This fits in with our philosophy that code samples should be unit tests for the underlying concepts. In fact, we tested our code samples like unit tests, with a Ruby script that parses recipe texts and runs the code listings.

* When a program’s behavior depends on the current time, the random number generator, or the presence of certain files on disk, you might not get the exact same results we did, but it should be similar.

The irb session technique doesn’t always work. Rails recipes have to run within Rails. Curses recipes take over the screen and don’t play well with irb. So sometimes we show you standalone files. We present them in the following format:

    #!/usr/bin/ruby -w
    # sample_ruby_file.rb: A sample file

    1 + 2
    Math.sqrt(1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10)
    puts "This string is self-referential."
Whenever possible, we’ll also show what you’ll get when you run this program: maybe a screenshot of a GUI program, or a record of the program’s output when run from the Unix command line:

    $ ruby sample_ruby_file.rb
    This string is self-referential.

Note that the output of sample_ruby_file.rb looks different from the same code entered into irb. Here, there’s no trace of the addition and the square root operations, because they produce no output.

Installing the Software

Ruby comes preinstalled on Mac OS X and most Linux installations. Windows doesn’t come with Ruby, but it’s easy to get it with the One-Click Installer: see http://rubyforge.org/projects/rubyinstaller/.

If you’re on a Unix/Linux system and you don’t have Ruby installed (or you want to upgrade), your distribution’s package system may make a Ruby package available. On Debian GNU/Linux, it’s available as the package ruby-[version]: for instance, ruby-1.8 or ruby-1.9. Red Hat Linux calls it ruby; so does the DarwinPorts system on Mac OS X.

If all else fails, download the Ruby source code and compile it yourself. You can get the Ruby source code through FTP or HTTP by visiting http://www.ruby-lang.org/.

Many of the recipes in this book require that you install third-party libraries in the form of Ruby gems. In general, we prefer standalone solutions (using only the Ruby standard library) to solutions that use gems, and gem-based solutions to ones that require other kinds of third-party software.

If you’re not familiar with gems, consult Chapter 18 as needed. To get started, all you need to know is that you first download the Rubygems library from http://rubyforge.org/projects/rubygems/ (choose the latest release from that page). Unpack the tarball or ZIP file, change into the rubygems-[version] directory, and run this command as the superuser:

    $ ruby setup.rb

The Rubygems library is included in the Windows One-Click Installer, so you don’t have to worry about this step on Windows.
Once you’ve got the Rubygems library installed, it’s easy to install many other pieces of Ruby code. When a recipe says something like “Ruby on Rails is available as the rails gem,” you can issue the following command from the command line (again, as the superuser):

    $ gem install rails --include-dependencies

The RubyGems library will download the rails gem (and any other gems on which it depends) and automatically install them. You should then be able to run the code in the recipe, exactly as it appears.

The three most useful gems for new Ruby installations are rails (if you intend to create Rails applications) and the two gems provided by the Ruby Facets project: facets_core and facets_more. The Facets Core library extends the classes of the Ruby standard library with generally useful methods. The Facets More library adds entirely new classes and modules. The Ruby Facets homepage (http://facets.rubyforge.org/) has a complete reference.

Some Ruby libraries (especially older ones) are not packaged as gems. Most of the nongem libraries mentioned in this book have entries in the Ruby Application Archive (http://raa.ruby-lang.org/), a directory of Ruby programs and libraries. In most cases you can download a tarball or ZIP file from the RAA, and install it with the technique described in Recipe 18.8.

Platform Differences, Version Differences, and Other Headaches

Except where noted, the recipes describe cross-platform concepts, and the code itself should run the same way on Windows, Linux, and Mac OS X. Most of the platform differences and platform-specific recipes show up in the final chapters: Chapter 20, Chapter 21, and Chapter 23 (but see the introduction to Chapter 6 for a note about Windows filenames).

We wrote and tested the recipes using Ruby version 1.8.4 and Rails version 1.1.2, the latest stable versions as of the time of writing.
In a couple of places we mention code changes you should make if you’re running Ruby 1.9 (the latest unstable version as of the time of writing) or 2.0.

Despite our best efforts, this book may contain unflagged platform-specific code, not to mention plain old bugs. We apologize for these in advance of their discovery. If you have problems with a recipe, check out the errata for this book (see below).

In several recipes in this book, we modify standard Ruby classes like Array to add new methods (see, for instance, Recipe 1.10, which defines a new method called String#capitalize_first_letter). These methods are then available to every instance of that class in your program. This is a fairly common technique in Ruby: both Rails and the Facets Core library mentioned above do it. It’s somewhat controversial, though, and it can cause problems (see Recipe 8.4 for an in-depth discussion), so we felt we should mention it here in the Preface, even though it might be too technical for people who are new to Ruby.

If you don’t want to modify the standard classes, you can put the methods we demonstrate into a subclass, or define them in the Kernel namespace: that is, define capitalize_first_letter_of_string instead of reopening String and defining capitalize_first_letter inside it.

Other Resources

If you need to learn Ruby, the standard reference is Programming Ruby: The Pragmatic Programmer’s Guide by Dave Thomas, Chad Fowler, and Andy Hunt (Pragmatic Programmers). The first edition is available online in HTML format (http://www.rubycentral.com/book/), but it’s out of date. The second edition is much better and is available as a printed book or as PDF (http://www.pragmaticprogrammer.com/titles/ruby/). It’s a much better idea to buy the second edition and use the first edition as a handy reference than to try to read the first edition.

“Why’s (Poignant) Guide to Ruby,” by “why the lucky stiff,” teaches Ruby with stories, like an English primer.
Excellent for creative beginners (http://poignantguide.net/ruby/).

For Rails, the standard book is Agile Web Development with Rails by Dave Thomas, David Hansson, Leon Breedt, and Mike Clark (Pragmatic Programmers). There are also two books like this one that focus exclusively on Rails: Rails Cookbook by Rob Orsini (O’Reilly) and Rails Recipes by Chad Fowler (Pragmatic Programmers).

Some common Ruby pitfalls are explained in the Ruby FAQ (http://www.rubycentral.com/faq/, starting in Section 4) and in “Things That Newcomers to Ruby Should Know” (http://www.glue.umd.edu/~billtj/ruby.html).

Many people come to Ruby already knowing one or more programming languages. You might find it frustrating to learn Ruby with a big book that thinks it has to teach you programming and Ruby. For such people, we recommend Ruby creator Yukihiro Matsumoto’s “Ruby User’s Guide” (http://www.ruby-doc.org/docs/UsersGuide/rg/). It’s a short read, and it focuses on what makes Ruby different from other programming languages. Its terminology is a little out of date, and it presents its code samples through the obsolete eval.rb program (use irb instead), but it’s the best short introduction we know of.

There are a few articles especially for Java programmers who want to learn Ruby: Jim Weirich’s “10 Things Every Java Programmer Should Know About Ruby” (http://onestepback.org/articles/10things/), Francis Hwang’s blog entry “Coming to Ruby from Java” (http://fhwang.net/blog/40.html), and Chris Williams’s collection of links, “From Java to Ruby (With Love)” (http://cwilliams.textdriven.com/pages/java_to_ruby). Despite the names, C++ programmers will also benefit from much of what’s in these pieces.

The Ruby Bookshelf (http://books.rubyveil.com/books/Bookshelf/Introduction/Bookshelf) has produced a number of free books on Ruby, including many of the ones mentioned above, in an easy-to-read HTML format.
Finally, Ruby’s built-in modules, classes, and methods come with excellent documentation (much of it originally written for Programming Ruby). You can read this documentation online at http://www.ruby-doc.org/core/ and http://www.ruby-doc.org/stdlib/. You can also look it up on your own Ruby installation by using the ri command. Pass in the name of a class or method, and ri will give you the corresponding documentation. Here are a few examples:

    $ ri Array # A class
    $ ri Array.new # A class method
    $ ri Array#compact # An instance method

Conventions Used in This Book

The following typographical conventions are used in this book:

Plain text
  Indicates menu titles, menu options, menu buttons, and keyboard accelerators (such as Alt and Ctrl).

Italic
  Indicates new terms, URLs, email addresses, and Unix utilities.

Constant width
  Indicates commands, options, switches, variables, attributes, keys, functions, types, classes, namespaces, methods, modules, properties, parameters, values, objects, events, event handlers, XML tags, HTML tags, macros, programs, libraries, filenames, pathnames, directories, the contents of files, or the output from commands.

Constant width bold
  Shows commands or other text that should be typed literally by the user.

Constant width italic
  Shows text that should be replaced with user-supplied values.

Using Code Examples

This book is here to help you get your job done. In general, you may use the code in this book in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission.
Incorporating a significant amount of example code from this book into your product’s documentation does require permission.

We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “Ruby Cookbook, by Lucas Carlson and Leonard Richardson. Copyright 2006 O’Reilly Media, Inc., 0-596-52369-6.”

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com.

Comments and Questions

Please address comments and questions concerning this book to the publisher:

    O’Reilly Media, Inc.
    1005 Gravenstein Highway North
    Sebastopol, CA 95472
    800-998-9938 (in the United States or Canada)
    707-829-0515 (international or local)
    707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at:

    http://www.oreilly.com/catalog/rubyckbk

To comment or ask technical questions about this book, send email to:

    bookquestions@oreilly.com

For more information about our books, conferences, Resource Centers, and the O’Reilly Network, see our web site at:

    http://www.oreilly.com

Acknowledgments

First we’d like to thank our editor, Michael Loukides, for his help and for acquiescing to our use of his name in recipe code samples, even when we turned him into a talking frog. The production editor, Colleen Gorman, was also very helpful.

This book would have taken longer to write and been less interesting without our contributing authors, who, collectively, wrote over 60 of these recipes. The roll of names includes: Steve Arniel, Ben Bleything, Antonio Cangiano, Mauro Cicio, Maurice Codik, Thomas Enebo, Pat Eyler, Bill Froelich, Rod Gaither, Ben Giddings, Michael Granger, James Edward Gray II, Stefan Lang, Kevin Marshall, Matthew Palmer, Chetan Patil, Alun ap Rhisiart, Garrett Rooney, John-Mason Shackelford, Phil Tomson, and John Wells.
They saved us time by lending their knowledge of various Ruby topics, and they enriched the book with their ideas.

This book would be of appallingly low quality were it not for our technical reviewers, who spotted dozens of bugs, platform-specific problems, and conceptual errors: John N. Alegre, Dave Burt, Bill Dolinar, Simen Edvardsen, Shane Emmons, Edward Faulkner, Dan Fitzpatrick, Bill Guindon, Stephen Hildrey, Meador Inge, Eric Jacoboni, Julian I. Kamil, Randy Kramer, Alex LeDonne, Steven Lumos, Keith Rosenblatt, Gene Tani, and R Vrajmohan.

Finally, to the programmers and writers of the Ruby community; from the celebrities like Yukihiro Matsumoto, Dave Thomas, Chad Fowler, and “why”, to the hundreds of unsung heroes whose work went into the libraries we demonstrate throughout the book, and whose skill and patience bring more people into the Ruby community all the time.

CHAPTER 1
Strings

Ruby is a programmer-friendly language. If you are already familiar with object-oriented programming, Ruby should quickly become second nature. If you’ve struggled with learning object-oriented programming or are not familiar with it, Ruby should make more sense to you than other object-oriented languages because Ruby’s methods are consistently named, concise, and generally act the way you expect.

Throughout this book, we demonstrate concepts through interactive Ruby sessions. Strings are a good place to start because not only are they a useful data type, they’re easy to create and use. They provide a simple introduction to Ruby, a point of comparison between Ruby and other languages you might know, and an approachable way to introduce important Ruby concepts like duck typing (see Recipe 1.12), open classes (demonstrated in Recipe 1.10), symbols (Recipe 1.7), and even Ruby gems (Recipe 1.20).

If you use Mac OS X or a Unix environment with Ruby installed, go to your command line right now and type irb.
If you’re using Windows, you can download and install the One-Click Installer from http://rubyforge.org/projects/rubyinstaller/, and do the same from a command prompt (you can also run the fxri program, if that’s more comfortable for you). You’ve now entered an interactive Ruby shell, and you can follow along with the code samples in most of this book’s recipes.

Strings in Ruby are much like strings in other dynamic languages like Perl, Python and PHP. They’re not too much different from strings in Java and C. Ruby strings are dynamic, mutable, and flexible. Get started with strings by typing this line into your interactive Ruby session:

    string = "My first string"

You should see some output that looks like this:

    => "My first string"

You typed in a Ruby expression that created a string “My first string”, and assigned it to the variable string. The value of that expression is just the new value of string, which is what your interactive Ruby session printed out on the right side of the arrow. Throughout this book, we’ll represent this kind of interaction in the following form:*

    string = "My first string" # => "My first string"

In Ruby, everything that can be assigned to a variable is an object. Here, the variable string points to an object of class String. That class defines over a hundred built-in methods: named pieces of code that examine and manipulate the string. We’ll explore some of these throughout the chapter, and indeed the entire book. Let’s try out one now: String#length, which returns the number of bytes in a string. Here’s a Ruby method call:

    string.length # => 15

Many programming languages make you put parentheses after a method call:

    string.length( ) # => 15

In Ruby, parentheses are almost always optional. They’re especially optional in this case, since we’re not passing any arguments into String#length.
If you’re passing arguments into a method, it’s often more readable to enclose the argument list in parentheses:

    string.count 'i' # => 2 # "i" occurs twice.
    string.count('i') # => 2

The return value of a method call is itself an object. In the case of String#length, the return value is the number 15, an instance of the Fixnum class. We can call a method on this object as well:

    string.length.next # => 16

Let’s take a more complicated case: a string that contains non-ASCII characters. This string contains the French phrase “il était une fois,” encoded as UTF-8:†

    french_string = "il \xc3\xa9tait une fois" # => "il \303\251tait une fois"

Many programming languages (notably Java) treat a string as a series of characters. Ruby treats a string as a series of bytes. The French string contains 14 letters and 3 spaces, so you might think Ruby would say the length of the string is 17. But one of the letters (the e with acute accent) is represented as two bytes, and that’s what Ruby counts:

    french_string.length # => 18

For more on handling different encodings, see Recipe 1.14 and Recipe 11.12. For more on this specific problem, see Recipe 1.8.

* Yes, this was covered in the Preface, but not everyone reads the Preface.
† “\xc3\xa9” is a Ruby string representation of the UTF-8 encoding of the Unicode character é.

You can represent special characters in strings (like the binary data in the French string) with string escaping. Ruby does different types of string escaping depending on how you create the string.
When you enclose a string in double quotes, you can encode binary data into the string (as in the French example above), and you can encode newlines with the code “\n”, as in other programming languages:

    puts "This string\ncontains a newline"
    # This string
    # contains a newline

When you enclose a string in single quotes, the only special codes you can use are “\'” to get a literal single quote, and “\\” to get a literal backslash:

    puts 'it may look like this string contains a newline\nbut it doesn\'t'
    # it may look like this string contains a newline\nbut it doesn't

    puts 'Here is a backslash: \\'
    # Here is a backslash: \

This is covered in more detail in Recipe 1.5. Also see Recipes 1.2 and 1.3 for more examples of the more spectacular substitutions double-quoted strings can do.

Another useful way to initialize strings is with the “here documents” style:

    long_string = <<EOF
    Here is a long string
    With many paragraphs
    EOF
    # => "Here is a long string\nWith many paragraphs\n"

    puts long_string
    # Here is a long string
    # With many paragraphs

Like most of Ruby’s built-in classes, Ruby’s strings define the same functionality in several different ways, so that you can use the idiom you prefer. Say you want to get a substring of a larger string (as in Recipe 1.13). If you’re an object-oriented programming purist, you can use the String#slice method:

    string # => "My first string"
    string.slice(3, 5) # => "first"

But if you’re coming from C, and you think of a string as an array of bytes, Ruby can accommodate you. Selecting a single byte from a string returns that byte as a number.

    string[3].chr + string[4].chr + string[5].chr + string[6].chr + string[7].chr
    # => "first"

And if you come from Python, and you like that language’s slice notation, you can just as easily chop up the string that way:

    string[3, 5] # => "first"

Unlike in most programming languages, Ruby strings are mutable: you can change them after they are declared.
Below we see the difference between the methods String#upcase and String#upcase!:

    string.upcase # => "MY FIRST STRING"
    string # => "My first string"
    string.upcase! # => "MY FIRST STRING"
    string # => "MY FIRST STRING"

This is one of Ruby’s syntactical conventions. “Dangerous” methods (generally those that modify their object in place) usually have an exclamation mark at the end of their name. Another syntactical convention is that predicates, methods that return a true/false value, have a question mark at the end of their name (as in some varieties of Lisp):

    string.empty? # => false
    string.include? 'MY' # => true

This use of English punctuation to provide the programmer with information is an example of Matz’s design philosophy: that Ruby is a language primarily for humans to read and write, and secondarily for computers to interpret.

An interactive Ruby session is an indispensable tool for learning and experimenting with these methods. Again, we encourage you to type the sample code shown in these recipes into an irb or fxri session, and try to build upon the examples as your knowledge of Ruby grows.

Here are some extra resources for using strings in Ruby:

• You can get information about any built-in Ruby method with the ri command; for instance, to see more about the String#upcase! method, issue the command ri "String#upcase!" from the command line.
• “why the lucky stiff” has written an excellent introduction to installing Ruby, and using irb and ri: http://poignantguide.net/ruby/expansion-pak-1.html
• For more information about the design philosophy behind Ruby, read an interview with Yukihiro “Matz” Matsumoto, creator of Ruby: http://www.artima.com/intv/ruby.html

1.1 Building a String from Parts

Problem

You want to iterate over a data structure, building a string from it as you do.

Solution

There are two efficient solutions.
The simplest solution is to start with an empty string, and repeatedly append substrings onto it with the << operator:

    hash = { "key1" => "val1", "key2" => "val2" }
    string = ""
    hash.each { |k,v| string << "#{k} is #{v}\n" }
    puts string
    # key1 is val1
    # key2 is val2

This variant of the simple solution is slightly more efficient, but harder to read:

    string = ""
    hash.each { |k,v| string << k << " is " << v << "\n" }

If your data structure is an array, or easily transformed into an array, it’s usually more efficient to use Array#join:

    puts hash.keys.join("\n") + "\n"
    # key1
    # key2

Discussion

In languages like Python and Java, it’s very inefficient to build a string by starting with an empty string and adding each substring onto the end. In those languages, strings are immutable, so adding one string to another builds an entirely new string. Doing this multiple times creates a huge number of intermediary strings, each of which is only used as a stepping stone to the next string. This wastes time and memory.

In those languages, the most efficient way to build a string is always to put the substrings into an array or another mutable data structure, one that expands dynamically rather than by implicitly creating entirely new objects. Once you’re done processing the substrings, you get a single string with the equivalent of Ruby’s Array#join. In Java, this is the purpose of the StringBuffer class.

In Ruby, though, strings are just as mutable as arrays. Just like arrays, they can expand as needed, without using much time or memory. The fastest solution to this problem in Ruby is usually to forgo a holding array and tack the substrings directly onto a base string. Sometimes using Array#join is faster, but it’s usually pretty close, and the << construction is generally easier to understand.

If efficiency is important to you, don’t build a new string when you can append items onto an existing string.
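As a small sketch of that advice, the two forms below produce the same text; the difference is only in whether a throwaway string is built along the way. The names var1, var2, with_temporary, and chained are hypothetical, chosen for illustration, and String.new is used simply to get an empty, modifiable string.

```ruby
# Hypothetical values, chosen only for illustration.
var1 = "key1"
var2 = "val1"

# The interpolation builds a temporary "key1 val1" string first,
# and that temporary is then copied onto the base string.
with_temporary = String.new
with_temporary << "#{var1} #{var2}"

# Each piece is appended directly; << returns its receiver, so the calls chain.
chained = String.new
chained << var1 << " " << var2

puts with_temporary   # key1 val1
puts chained          # key1 val1
```

Only the base strings are modified here; var1 and var2 are left untouched.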
Constructs like str << 'a' + 'b' or str << "#{var1} #{var2}" create new strings that are immediately subsumed into the larger string. This is exactly what you’re trying to avoid. Use str << var1 << ' ' << var2 instead.

On the other hand, you shouldn’t modify strings that aren’t yours. Sometimes safety requires that you create a new string. When you define a method that takes a string as an argument, you shouldn’t modify that string by appending other strings onto it, unless that’s really the point of the method (and unless the method’s name ends in an exclamation point, so that callers know it modifies objects in place).

Another caveat: Array#join does not work precisely the same way as repeated appends to a string. Array#join accepts a separator string that it inserts between every two elements of the array. Unlike a simple string-building iteration over an array, it will not insert the separator string after the last element in the array. This example illustrates the difference:

    data = ['1', '2', '3']
    s = ''
    data.each { |x| s << x << ' and a '}
    s # => "1 and a 2 and a 3 and a "
    data.join(' and a ') # => "1 and a 2 and a 3"

To simulate the behavior of Array#join across an iteration, you can use Enumerable#each_with_index and omit the separator on the last index. This only works if you know how long the Enumerable is going to be:

    s = ""
    data.each_with_index { |x, i| s << x; s << "|" if i < data.length-1 }
    s # => "1|2|3"

1.2 Substituting Variables into Strings

Problem

You want to create a string that contains a representation of a Ruby variable or expression.

Solution

Within the string, enclose the variable or expression in curly brackets and prefix it with a hash character.

    number = 5
    "The number is #{number}." # => "The number is 5."
    "The number is #{5}." # => "The number is 5."
    "The number after #{number} is #{number.next}."
    # => "The number after 5 is 6."
    "The number prior to #{number} is #{number-1}."
    # => "The number prior to 5 is 4."
"We're ##{number}!" # => "We're #5!" Discussion When you define a string by putting it in double quotes, Ruby scans it for special substitution codes. The most common case, so common that you might not even think about it, is that Rubysubstitutes a single newline character everytime a string contains slash followed by the letter n (“\n”). Rubysupports more complex string substitutions as well. Anytext kept within the brackets of the special marker #{} (that is, #{text in here}) is interpreted as a Ruby expression. The result of that expression is substituted into the string that gets cre- ated. If the result of the expression is not a string, Rubycalls its to_s method and uses that instead. Once such a string is created, it is indistinguishable from a string created without using the string interpolation feature: "#{number}" == '5' # => true 1.2 Substituting Variables into Strings | 7 You can use string interpolation to run even large chunks of Rubycode inside a string. This extreme example defines a class within a string; its result is the return value of a method defined in the class. You should never have anyreason to do this, but it shows the power of this feature. %{Here is #{class InstantClass def bar "some text" end end InstantClass.new.bar }.} # => "Here is some text." The code run in string interpolations runs in the same context as anyother Rubycode in the same location. To take the example above, the InstantClass class has now been defined like any other class, and can be used outside the string that defines it. If a string interpolation calls a method that has side effects, the side effects are trig- gered. If a string definition sets a variable, that variable is accessible afterwards. It’s bad form to rely on this behavior, but you should be aware of it: "I've set x to #{x = 5; x += 1}." # => "I've set x to 6." x # => 6 To avoid triggering string interpolation, escape the hash characters or put the string in single quotes. 
"\#{foo}" # => "\#{foo}" '#{foo}' # => "\#{foo}" The “here document” construct is an alternative to the %{} construct, which is some- times more readable. It lets you define a multiline string that only ends when the Ruby parser encounters a certain string on a line by iteself: name = "Mr. Lorum" email = < "There once was a man from Peru\nWhose limericks stopped on line two\n" 8 | Chapter 1: Strings See Also • You can use the technique described in Recipe 1.3, “Substituting Variables into an Existing String,” to define a template string or object, and substitute in vari- ables later 1.3 Substituting Variables into an Existing String Problem You want to create a string that contains Rubyexpressions or variable substitutions, without actuallyperforming the substitutions. You plan to substitute values into the string later, possibly multiple times with different values each time. Solution There are two good solutions: printf-style strings, and ERB templates. Rubysupports a printf-style string format like C’s and Python’s. Put printf direc- tives into a string and it becomes a template. You can interpolate values into it later using the modulus operator: template = 'Oceania has always been at war with %s.' template % 'Eurasia' # => "Oceania has always been at war with Eurasia." template % 'Eastasia' # => "Oceania has always been at war with Eastasia." 'To 2 decimal places: %.2f' % Math::PI # => "To 2 decimal places: 3.14" 'Zero-padded: %.5d' % Math::PI # => "Zero-padded: 00003" An ERB template looks something like JSP or PHP code. Most of it is treated as a normal string, but certain control sequences are executed as Rubycode. The control sequence is replaced with either the output of the Rubycode, or the value of its last expression: require 'erb' template = ERB.new %q{Chunky <%= food %>!} food = "bacon" template.result(binding) # => "Chunky bacon!" food = "peanut butter" template.result(binding) # => "Chunky peanut butter!" 
You can omit the call to Kernel#binding if you’re not in an irb session:

    puts template.result
    # Chunky peanut butter!

You may recognize this format from the .rhtml files used by Rails views: they use ERB behind the scenes.

Discussion

An ERB template can reference variables like food before they’re defined. When you call ERB#result, or ERB#run, the template is executed according to the current values of those variables.

Like JSP and PHP code, ERB templates can contain loops and conditionals. Here’s a more sophisticated template:

    template = %q{
    <% if problems.empty? %>
      Looks like your code is clean!
    <% else %>
      I found the following possible problems with your code:
      <% problems.each do |problem, line| %>
        * <%= problem %> on line <%= line %>
      <% end %>
    <% end %>}.gsub(/^\s+/, '')
    template = ERB.new(template, nil, '<>')

    problems = [["Use of is_a? instead of duck typing", 23],
                ["eval( ) is usually dangerous", 44]]
    template.run(binding)
    # I found the following possible problems with your code:
    # * Use of is_a? instead of duck typing on line 23
    # * eval( ) is usually dangerous on line 44

    problems = []
    template.run(binding)
    # Looks like your code is clean!

ERB is sophisticated, but neither it nor the printf-style strings look like the simple Ruby string substitutions described in Recipe 1.2. There’s an alternative. If you use single quotes instead of double quotes to define a string with substitutions, the substitutions won’t be activated. You can then use this string as a template with eval:

    class String
      def substitute(binding=TOPLEVEL_BINDING)
        eval(%{"#{self}"}, binding)
      end
    end

    template = %q{Chunky #{food}!}  # => "Chunky \#{food}!"

    food = 'bacon'
    template.substitute(binding)  # => "Chunky bacon!"
    food = 'peanut butter'
    template.substitute(binding)  # => "Chunky peanut butter!"
You must be very careful when using eval: if you use a variable in the wrong way, you could give an attacker the ability to run arbitrary Ruby code in your eval statement. That won’t happen in this example since any possible value of food gets stuck into a string definition before it’s interpolated:

    food = '#{system("dir")}'
    puts template.substitute(binding)
    # Chunky #{system("dir")}!

See Also

• This recipe gives basic examples of ERB templates; for more complex examples, see the documentation of the ERB class (http://www.ruby-doc.org/stdlib/libdoc/erb/rdoc/classes/ERB.html)
• Recipe 1.2, “Substituting Variables into Strings”
• Recipe 10.12, “Evaluating Code in an Earlier Context,” has more about Binding objects

1.4 Reversing a String by Words or Characters

Problem

The letters (or words) of your string are in the wrong order.

Solution

To create a new string that contains a reversed version of your original string, use the reverse method. To reverse a string in place, use the reverse! method.

    s = ".sdrawkcab si gnirts sihT"
    s.reverse  # => "This string is backwards."
    s  # => ".sdrawkcab si gnirts sihT"

    s.reverse!  # => "This string is backwards."
    s  # => "This string is backwards."

To reverse the order of the words in a string, split the string into a list of whitespace-separated words, reverse the list, then join it back into a string.

    s = "order. wrong the in are words These"
    s.split(/(\s+)/).reverse!.join('')  # => "These words are in the wrong order."
    s.split(/\b/).reverse!.join('')  # => "These words are in the wrong. order"

Discussion

The String#split method takes a regular expression to use as a separator. Each time the separator matches part of the string, the portion of the string before the separator goes into a list. split then resumes scanning the rest of the string. The result is a list of strings found between instances of the separator.
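To make the behavior of split concrete, here is a minimal sketch that compares splitting with and without a capture group, and then reverses the words of a string. The helper name reverse_words is made up for this illustration; it is not a built-in String method.

```ruby
# Hypothetical helper (not part of String): reverse the order of the
# words in a string while keeping each run of whitespace intact.
def reverse_words(text)
  # The parentheses in the pattern make split keep the separators.
  text.split(/(\s+)/).reverse.join
end

pieces   = "one  two three".split(/\s+/)    # separators are dropped
kept     = "one  two three".split(/(\s+)/)  # separators are kept
reversed = reverse_words("order. wrong  the in")

p pieces
p kept
p reversed
```

Because the separators survive the split, the double space in the input is still a double space after the words are reversed.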
The regular expression /(\s+)/ matches one or more whitespace characters; this splits the string on word boundaries, which works for us because we want to reverse the order of the words.

The regular expression \b matches a word boundary. This is not the same as matching whitespace, because it also matches the boundary between a word and adjacent punctuation. Note the difference in punctuation between the two final examples in the Solution.

Because the regular expression /(\s+)/ includes a set of parentheses, the separator strings themselves are included in the returned list. Therefore, when we join the strings back together, we’ve preserved whitespace. This example shows the difference between including the parentheses and omitting them:

    "Three little words".split(/\s+/)  # => ["Three", "little", "words"]
    "Three little words".split(/(\s+)/)
    # => ["Three", " ", "little", " ", "words"]

See Also

• Recipe 1.9, “Processing a String One Word at a Time,” has some regular expressions for alternative definitions of “word”
• Recipe 1.11, “Managing Whitespace”
• Recipe 1.17, “Matching Strings with Regular Expressions”

1.5 Representing Unprintable Characters

Problem

You need to make reference to a control character, a strange UTF-8 character, or some other character that’s not on your keyboard.

Solution

Ruby gives you a number of escaping mechanisms to refer to unprintable characters. By using one of these mechanisms within a double-quoted string, you can put any binary character into the string.

You can reference any binary character by encoding its octal representation into the format “\000”, or its hexadecimal representation into the format “\x00”.

    octal = "\000\001\010\020"
    octal.each_byte { |x| puts x }
    # 0
    # 1
    # 8
    # 16

    hexadecimal = "\x00\x01\x10\x20"
    hexadecimal.each_byte { |x| puts x }
    # 0
    # 1
    # 16
    # 32

This makes it possible to represent UTF-8 characters even when you can’t type them or display them in your terminal.
Try running this program, and then opening the generated file smiley.html in your web browser:

    open('smiley.html', 'wb') do |f|
      f << ''
      f << "\xe2\x98\xBA"
    end

The most common unprintable characters (such as newline) have special mnemonic aliases consisting of a backslash and a letter.

    "\a" == "\x07"  # => true  # ASCII 0x07 = BEL (Sound system bell)
    "\b" == "\x08"  # => true  # ASCII 0x08 = BS (Backspace)
    "\e" == "\x1b"  # => true  # ASCII 0x1B = ESC (Escape)
    "\f" == "\x0c"  # => true  # ASCII 0x0C = FF (Form feed)
    "\n" == "\x0a"  # => true  # ASCII 0x0A = LF (Newline/line feed)
    "\r" == "\x0d"  # => true  # ASCII 0x0D = CR (Carriage return)
    "\t" == "\x09"  # => true  # ASCII 0x09 = HT (Tab/horizontal tab)
    "\v" == "\x0b"  # => true  # ASCII 0x0B = VT (Vertical tab)

Discussion

Ruby stores a string as a sequence of bytes. It makes no difference whether those bytes are printable ASCII characters, binary characters, or a mix of the two.

When Ruby prints out a human-readable string representation of a binary character, it uses the character’s \xxx octal representation. Characters with special backslash-letter mnemonics are printed as the mnemonic. Printable characters are output as their printable representation, even if another representation was used to create the string.

    "\x10\x11\xfe\xff"  # => "\020\021\376\377"
    "\x48\145\x6c\x6c\157\x0a"  # => "Hello\n"

To avoid confusion with the mnemonic characters, a literal backslash in a string is represented by two backslashes. For instance, the two-character string consisting of a backslash and the 14th letter of the alphabet is represented as “\\n”.

    "\\".size  # => 1
    "\\" == "\x5c"  # => true
    "\\n"[0] == ?\\  # => true
    "\\n"[1] == ?n  # => true
    "\\n" =~ /\n/  # => nil

Ruby also provides special shortcuts for representing keyboard sequences like Control-C.
"\C-_x_" represents the sequence you get by holding down the control keyand hitting the x key, and "\M-_x_" represents the sequence you get by holding down the Alt (or Meta) key and hitting the x key: "\C-a\C-b\C-c" # => "\001\002\003" "\M-a\M-b\M-c" # => "\341\342\343" Shorthand representations of binarycharacters can be used whenever Rubyexpects a character. For instance, you can get the decimal byte number of a special character 1.5 Representing Unprintable Characters | 13 byprefixing it with ?, and you can use shorthand representations in regular expres- sion character ranges. ?\C-a # => 1 ?\M-z # => 250 contains_control_chars = /[\C-a-\C-^]/ 'Foobar' =~ contains_control_chars # => nil "Foo\C-zbar" =~ contains_control_chars # => 3 contains_upper_chars = /[\x80-\xff]/ 'Foobar' =~ contains_upper_chars # => nil "Foo\212bar" =~ contains_upper_chars # => 3 Here’s a sinister application that scans logged keystrokes for special characters: def snoop_on_keylog(input) input.each_byte do |b| case b when ?\C-c; puts 'Control-C: stopped a process?' when ?\C-z; puts 'Control-Z: suspended a process?' when ?\n; puts 'Newline.' when ?\M-x; puts 'Meta-x: using Emacs?' end end end snoop_on_keylog("ls -ltR\003emacsHello\012\370rot13-other-window\012\032") # Control-C: stopped a process? # Newline. # Meta-x: using Emacs? # Newline. # Control-Z: suspended a process? Special characters are onlyinterpreted in strings delimited bydouble quotes, or strings created with %{} or %Q{}. Theyare not interpreted in strings delimited bysin- gle quotes, or strings created with %q{}. You can take advantage of this feature when you need to display special characters to the end-user, or create a string containing a lot of backslashes. 
    puts "foo\tbar"
    # foo bar
    puts %{foo\tbar}
    # foo bar
    puts %Q{foo\tbar}
    # foo bar

    puts 'foo\tbar'
    # foo\tbar
    puts %q{foo\tbar}
    # foo\tbar

If you come to Ruby from Python, this feature can take advantage of you, making you wonder why the special characters in your single-quoted strings aren’t treated as special. If you need to create a string with special characters and a lot of embedded double quotes, use the %{} construct.

1.6 Converting Between Characters and Values

Problem

You want to see the ASCII code for a character, or transform an ASCII code into a string.

Solution

To see the ASCII code for a specific character as an integer, use the ? operator:

    ?a  # => 97
    ?!  # => 33
    ?\n  # => 10

To see the integer value of a particular character in a string, access it as though it were an element of an array:

    'a'[0]  # => 97
    'bad sound'[1]  # => 97

To see the ASCII character corresponding to a given number, call its #chr method. This returns a string containing only one character:

    97.chr  # => "a"
    33.chr  # => "!"
    10.chr  # => "\n"
    0.chr  # => "\000"
    256.chr  # RangeError: 256 out of char range

Discussion

Though not technically an array, a string acts a lot like an array of Fixnum objects: one Fixnum for each byte in the string. Accessing a single element of the “array” yields a Fixnum for the corresponding byte: for textual strings, this is an ASCII code. Calling String#each_byte lets you iterate over the Fixnum objects that make up a string.

See Also

• Recipe 1.8, “Processing a String One Character at a Time”

1.7 Converting Between Strings and Symbols

Problem

You want to get a string containing the label of a Ruby symbol, or get the Ruby symbol that corresponds to a given string.

Solution

To turn a symbol into a string, use Symbol#to_s, or Symbol#id2name, for which to_s is an alias.

    :a_symbol.to_s  # => "a_symbol"
    :AnotherSymbol.id2name  # => "AnotherSymbol"
    :"Yet another symbol!".to_s  # => "Yet another symbol!"
You usually reference a symbol by just typing its name. If you’re given a string in code and need to get the corresponding symbol, you can use String#intern:

    :dodecahedron.object_id  # => 4565262
    symbol_name = "dodecahedron"
    symbol_name.intern  # => :dodecahedron
    symbol_name.intern.object_id  # => 4565262

Discussion

A Symbol is about the most basic Ruby object you can create. It’s just a name and an internal ID. Symbols are useful because a given symbol name refers to the same object throughout a Ruby program.

Symbols are often more efficient than strings. Two strings with the same contents are two different objects (one of the strings might be modified later on, and become different), but for any given name there is only one Symbol object. This can save both time and memory.

    "string".object_id  # => 1503030
    "string".object_id  # => 1500330
    :symbol.object_id  # => 4569358
    :symbol.object_id  # => 4569358

If you have n references to a name, you can keep all those references with only one symbol, using only one object’s worth of memory. With strings, the same code would use n different objects, all containing the same data. It’s also faster to compare two symbols than to compare two strings, because Ruby only has to check the object IDs.

    "string1" == "string2"  # => false
    :symbol1 == :symbol2  # => false

Finally, to quote Ruby hacker Jim Weirich on when to use a string versus a symbol:

• If the contents (the sequence of characters) of the object are important, use a string.
• If the identity of the object is important, use a symbol.
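The identity-versus-contents distinction can be checked directly. The following is a small sketch rather than anything from the recipe itself: it uses equal? (object identity) and == (contents), and builds the strings with String.new so that they are guaranteed to be separate objects regardless of interpreter settings.

```ruby
# A given symbol name always refers to the same object.
same_symbol = :dodecahedron.equal?("dodecahedron".intern)

# to_sym does the same job as intern; to_s goes the other way.
round_trip = "dodecahedron".to_sym.to_s

# Two separately built strings with the same contents are
# equal by value but are distinct objects.
a = String.new("dodecahedron")
b = String.new("dodecahedron")
equal_contents = (a == b)
same_object    = a.equal?(b)

p same_symbol, round_trip, equal_contents, same_object
```

The actual object_id values vary between runs and interpreters, so the sketch compares objects instead of printing IDs.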
See Also

• See Recipe 5.1, “Using Symbols as Hash Keys” for one use of symbols
• Recipe 8.12, “Simulating Keyword Arguments,” has another
• Chapter 10, especially Recipe 10.4, “Getting a Reference to a Method” and Recipe 10.10, “Avoiding Boilerplate Code with Metaprogramming”
• See http://glu.ttono.us/articles/2005/08/19/understanding-ruby-symbols for a symbol primer

1.8 Processing a String One Character at a Time

Problem

You want to process each character of a string individually.

Solution

If you’re processing an ASCII document, then each byte corresponds to one character. Use String#each_byte to yield each byte of a string as a number, which you can turn into a one-character string:

    'foobar'.each_byte { |x| puts "#{x} = #{x.chr}" }
    # 102 = f
    # 111 = o
    # 111 = o
    # 98 = b
    # 97 = a
    # 114 = r

Use String#scan to yield each character of a string as a new one-character string:

    'foobar'.scan( /./ ) { |c| puts c }
    # f
    # o
    # o
    # b
    # a
    # r

Discussion

Since a string is a sequence of bytes, you might think that the String#each method would iterate over the sequence, the way Array#each does. But String#each is actually used to split a string on a given record separator (by default, the newline):

    "foo\nbar".each { |x| puts x }
    # foo
    # bar

The string equivalent of Array#each is actually each_byte. A string stores its characters as a sequence of Fixnum objects, and each_byte yields that sequence.

String#each_byte is faster than String#scan, so if you’re processing an ASCII file, you might want to use String#each_byte and convert to a string every number passed into the code block (as seen in the Solution).

String#scan works by applying a given regular expression to a string, and yielding each match to the code block you provide. The regular expression /./ matches every character in the string, in turn.
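As a side-by-side check of the two approaches, this sketch collects the results of each_byte and scan into arrays instead of printing them. It assumes plain ASCII input, as the Solution does.

```ruby
text = 'foobar'

# Collect every byte as a number with each_byte.
bytes = []
text.each_byte { |b| bytes << b }

# Turn each byte back into a one-character string.
from_bytes = bytes.map { |b| b.chr }

# scan with /./ yields one-character strings directly.
from_scan = []
text.scan(/./) { |c| from_scan << c }

p bytes
p from_bytes
p from_scan
```

One caveat worth knowing: without the /m flag, /./ does not match a newline, so the scan approach silently skips newline characters while each_byte yields them.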
If you have the $KCODE variable set correctly, then the scan technique will work on UTF-8 strings as well. This is the simplest way to sneak a notion of “character” into Ruby’s byte-based strings.

Here’s a Ruby string containing the UTF-8 encoding of the French phrase “ça va”:

    french = "\xc3\xa7a va"

Even if your terminal can’t properly display the character “ç”, you can see how the behavior of String#scan changes when you make the regular expression Unicode-aware, or set $KCODE so that Ruby handles all strings as UTF-8:

    french.scan(/./) { |c| puts c }
    #
    #
    # a
    #
    # v
    # a

    french.scan(/./u) { |c| puts c }
    # ç
    # a
    #
    # v
    # a

    $KCODE = 'u'
    french.scan(/./) { |c| puts c }
    # ç
    # a
    #
    # v
    # a

Once Ruby knows to treat strings as UTF-8 instead of ASCII, it starts treating the two bytes representing the “ç” as a single character. Even if you can’t see UTF-8, you can write programs that handle it correctly.

See Also

• Recipe 11.12, “Converting from One Encoding to Another”

1.9 Processing a String One Word at a Time

Problem

You want to split a piece of text into words, and operate on each word.

Solution

First decide what you mean by “word.” What separates one word from another? Only whitespace? Whitespace or punctuation? Is “johnny-come-lately” one word or three? Build a regular expression that matches a single word according to whatever definition you need (there are some samples in the Discussion).

Then pass that regular expression into String#scan. Every word it finds, it will yield to a code block. The word_count method defined below takes a piece of text and creates a histogram of word frequencies. Its regular expression considers a “word” to be a string of Ruby identifier characters: letters, numbers, and underscores.
    class String
      def word_count
        frequencies = Hash.new(0)
        downcase.scan(/\w+/) { |word| frequencies[word] += 1 }
        return frequencies
      end
    end

    %{Dogs dogs dog dog dogs.}.word_count
    # => {"dogs"=>3, "dog"=>2}
    %{"I have no shame," I said.}.word_count
    # => {"no"=>1, "shame"=>1, "have"=>1, "said"=>1, "i"=>2}

Discussion

The regular expression /\w+/ is nice and simple, but you can probably do better for your application’s definition of “word.” You probably don’t consider two words separated by an underscore to be a single word. Some English words, like “pan-fried” and “fo’c’sle”, contain embedded punctuation. Here are a few more definitions of “word” in regular expression form:

    # Just like /\w+/, but doesn't consider underscore part of a word.
    /[0-9A-Za-z]+/

    # Anything that's not whitespace is a word.
    /\S+/

    # Accept dashes and apostrophes as parts of words.
    /[-'\w]+/

    # A pretty good heuristic for matching English words.
    /(\w+([-'.]\w+)*)/

The last one deserves some explanation. It matches embedded punctuation within a word, but not at the edges. “Work-in-progress” is recognized as a single word, and “--never--” is recognized as the word “never” surrounded by punctuation. This regular expression can even pick out abbreviations and acronyms such as “Ph.D” and “U.N.C.L.E.”, though it can’t distinguish between the final period of an acronym and the period that ends a sentence. This means that “E.F.F.” will be recognized as the word “E.F.F” and then a nonword period.

Let’s rewrite our word_count method to use that regular expression. We can’t use the original implementation, because its code block takes only one argument. String#scan passes its code block one argument for each match group in the regular expression, and our improved regular expression has two match groups. The first match group is the one that actually contains the word.
So we must rewrite word_count so that its code block takes two arguments, and ignores the second one:

    class String
      def word_count
        frequencies = Hash.new(0)
        downcase.scan(/(\w+([-'.]\w+)*)/) { |word, ignore| frequencies[word] += 1 }
        return frequencies
      end
    end

    %{"That F.B.I. fella--he's quite the man-about-town."}.word_count
    # => {"quite"=>1, "f.b.i"=>1, "the"=>1, "fella"=>1, "that"=>1,
    #     "man-about-town"=>1, "he's"=>1}

Note that the “\w” character set matches different things depending on the value of $KCODE. By default, “\w” matches only characters that are part of ASCII words:

    french = "il \xc3\xa9tait une fois"
    french.word_count
    # => {"fois"=>1, "une"=>1, "tait"=>1, "il"=>1}

If you turn on Ruby’s UTF-8 support, the “\w” character set matches more characters:

    $KCODE='u'
    french.word_count
    # => {"fois"=>1, "une"=>1, "était"=>1, "il"=>1}

The regular expression group \b matches a word boundary: that is, the last part of a word before a piece of whitespace or punctuation. This is useful for String#split (see Recipe 1.4), but not so useful for String#scan.

See Also

• Recipe 1.4, “Reversing a String by Words or Characters”
• The Facets core library defines a String#each_word method, using the regular expression /([-'\w]+)/

1.10 Changing the Case of a String

Problem

Your string is in the wrong case, or no particular case at all.

Solution

The String class provides a variety of case-shifting methods:

    s = 'HELLO, I am not here. I WENT to tHe MaRKEt.'
    s.upcase  # => "HELLO, I AM NOT HERE. I WENT TO THE MARKET."
    s.downcase  # => "hello, i am not here. i went to the market."
    s.swapcase  # => "hello, i AM NOT HERE. i went TO ThE mArkeT."
    s.capitalize  # => "Hello, i am not here. i went to the market."

Discussion

The upcase and downcase methods force all letters in the string to upper- or lowercase, respectively. The swapcase method transforms uppercase letters into lowercase letters and vice versa.
The capitalize method makes the first character of the string uppercase, if it’s a letter, and makes all other letters in the string lowercase.

All four methods have corresponding methods that modify a string in place rather than creating a new one: upcase!, downcase!, swapcase!, and capitalize!. Assuming you don’t need the original string, these methods will save memory, especially if the string is large.

    un_banged = 'Hello world.'
    un_banged.upcase  # => "HELLO WORLD."
    un_banged  # => "Hello world."

    banged = 'Hello world.'
    banged.upcase!  # => "HELLO WORLD."
    banged  # => "HELLO WORLD."

To capitalize a string without lowercasing the rest of the string (for instance, because the string contains proper nouns), you can modify the first character of the string in place. This corresponds to the capitalize! method. If you want something more like capitalize, you can create a new string out of the old one.

    class String
      def capitalize_first_letter
        self[0].chr.capitalize + self[1, size]
      end

      def capitalize_first_letter!
        unless self[0] == (c = self[0,1].upcase[0])
          self[0] = c
          self
        end
        # Return nil if no change was made, like upcase! et al.
      end
    end

    s = 'i told Alice. She remembers now.'
    s.capitalize_first_letter  # => "I told Alice. She remembers now."
    s  # => "i told Alice. She remembers now."
    s.capitalize_first_letter!
    s  # => "I told Alice. She remembers now."

To change the case of specific letters while leaving the rest alone, you can use the tr or tr! methods, which translate one character into another:

    'LOWERCASE ALL VOWELS'.tr('AEIOU', 'aeiou')
    # => "LoWeRCaSe aLL VoWeLS"

    'Swap case of ALL VOWELS'.tr('AEIOUaeiou', 'aeiouAEIOU')
    # => "SwAp cAsE Of aLL VoWeLS"

See Also

• Recipe 1.18, “Replacing Multiple Patterns in a Single Pass”
• The Facets Core library adds a String#camelcase method; it also defines the case predicates String#lowercase? and String#uppercase?
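The difference between capitalize and changing only the first letter, discussed in this recipe, can be sketched as follows. The helper upcase_first is a hypothetical stand-in written as a plain method (rather than reopening String) and is meant as an illustration, assuming a Ruby version where slicing with [0, 1] returns a one-character string.

```ruby
# Hypothetical helper: upcase only the first character of a string
# and leave the rest untouched. Returns a new string.
def upcase_first(text)
  return text.dup if text.empty?
  text[0, 1].upcase + text[1..-1]
end

sentence = 'i told Alice. She remembers now.'
fixed    = upcase_first(sentence)   # proper nouns keep their capitals
builtin  = sentence.capitalize      # everything after the first letter is lowercased
vowels   = 'LOWERCASE ALL VOWELS'.tr('AEIOU', 'aeiou')

puts fixed
puts builtin
puts vowels
```

The original string is never modified here, which mirrors the non-bang variants described in the Discussion.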
1.11 Managing Whitespace

Problem

Your string contains too much whitespace, not enough whitespace, or the wrong kind of whitespace.

Solution

Use strip to remove whitespace from the beginning and end of a string:

    " \tWhitespace at beginning and end. \t\n\n".strip
    # => "Whitespace at beginning and end."

Add whitespace to one or both ends of a string with ljust, rjust, and center:

    s = "Some text."
    s.center(15)  # => "  Some text.   "
    s.ljust(15)  # => "Some text.     "
    s.rjust(15)  # => "     Some text."

Use the gsub method with a string or regular expression to make more complex changes, such as to replace one type of whitespace with another.

    # Normalize Ruby source code by replacing tabs with spaces
    rubyCode.gsub("\t", " ")

    # Transform Windows-style newlines to Unix-style newlines
    "Line one\r\nLine two\r\n".gsub("\r\n", "\n")
    # => "Line one\nLine two\n"

    # Transform all runs of whitespace into a single space character
    "\n\rThis string\t\t\tuses\n all\tsorts\nof whitespace.".gsub(/\s+/," ")
    # => " This string uses all sorts of whitespace."

Discussion

What counts as whitespace? Any of these five characters: space, tab (\t), newline (\n), carriage return (\r), and form feed (\f). The regular expression /\s/ matches any one character from that set. The strip method strips any combination of those characters from the beginning or end of a string.

In rare cases you may need to handle oddball “space” characters like backspace (\b or \010) and vertical tab (\v or \013). These are not part of the \s character group in a regular expression, so use a custom character group to catch these characters.

    " \bIt's whitespace, Jim,\vbut not as we know it.\n".gsub(/[\s\b\v]+/, " ")
    # => " It's whitespace, Jim, but not as we know it. "

To remove whitespace from only one end of a string, use the lstrip or rstrip method:

    s = " Whitespace madness! "
    s.lstrip  # => "Whitespace madness! "
    s.rstrip  # => " Whitespace madness!"
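The trimming and normalizing techniques above combine naturally. Here is a minimal sketch on one made-up input string (the variable names are only for illustration) that trims the ends, converts Windows-style line endings, and collapses runs of whitespace.

```ruby
messy = " \tLine one\r\nLine   two\t\tend \n"

trimmed    = messy.strip                    # remove whitespace at both ends
unix_lines = messy.gsub("\r\n", "\n")       # Windows newlines to Unix newlines
squeezed   = messy.strip.gsub(/\s+/, ' ')   # collapse each run into one space

p trimmed
p unix_lines
p squeezed
```

Stripping before collapsing avoids the leading and trailing single space that gsub(/\s+/, ' ') would otherwise leave behind.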
The methods for adding whitespace to a string (center, ljust, and rjust) take a single argument: the total length of the string they should return, counting the original string and any added whitespace. If center can’t center a string perfectly, it’ll put one extra space on the right:

    "four".center(5)  # => "four "
    "four".center(6)  # => " four "

Like most string-modifying methods, strip, gsub, lstrip, and rstrip have counterparts strip!, gsub!, lstrip!, and rstrip!, which modify the string in place.

1.12 Testing Whether an Object Is String-Like

Problem

You want to see whether you can treat an object as a string.

Solution

Check whether the object defines the to_str method.

    'A string'.respond_to? :to_str  # => true
    Exception.new.respond_to? :to_str  # => true
    4.respond_to? :to_str  # => false

More generally, check whether the object defines the specific method of String you’re thinking about calling. If the object defines that method, the right thing to do is usually to go ahead and call the method. This will make your code work in more places:

    def join_to_successor(s)
      raise ArgumentError, 'No successor method!' unless s.respond_to? :succ
      return "#{s}#{s.succ}"
    end

    join_to_successor('a')  # => "ab"
    join_to_successor(4)  # => "45"
    join_to_successor(4.01)
    # ArgumentError: No successor method!

If I’d checked s.is_a? String instead of s.respond_to? :succ, then I wouldn’t have been able to call join_to_successor on an integer.

Discussion

This is the simplest example of Ruby’s philosophy of “duck typing”: if an object quacks like a duck (or acts like a string), just go ahead and treat it as a duck (or a string). Whenever possible, you should treat objects according to the methods they define rather than the classes from which they inherit or the modules they include.

Calling obj.is_a?
String will tell you whether an object derives from the String class, but it will overlook objects that, though intended to be used as strings, don’t inherit from String.

Exceptions, for instance, are essentially strings that have extra information associated with them. But they don’t subclass String. Code that uses is_a? String to check for stringness will overlook the essential stringness of Exceptions. Many add-on Ruby modules define other classes that can act as strings: code that calls is_a? String will break when given an instance of one of those classes.

The idea to take to heart here is the general rule of duck typing: to see whether provided data implements a certain method, use respond_to? instead of checking the class. This lets a future user (possibly yourself!) create new classes that offer the same capability, without being tied down to the preexisting class structure. All you have to do is make the method names match up.

See Also

• Chapter 8, especially the chapter introduction and Recipe 8.3, “Checking Class or Module Membership”

1.13 Getting the Parts of a String You Want

Problem

You want only certain pieces of a string.

Solution

To get a substring of a string, call its slice method, or use the array index operator (that is, call the [] method). Either method accepts a Range describing which characters to retrieve, or two Fixnum arguments: the index at which to start, and the length of the substring to be extracted.

    s = 'My kingdom for a string!'
    s.slice(3,7)  # => "kingdom"
    s[3,7]  # => "kingdom"
    s[0,3]  # => "My "
    s[11, 5]  # => "for a"
    s[11, 17]  # => "for a string!"

To get the first portion of a string that matches a regular expression, pass the regular expression into slice or []:

    s[/.ing/]  # => "king"
    s[/str.*/]  # => "string!"

Discussion

To access a specific byte of a string as a Fixnum, pass only one argument (the zero-based index of the character) into the String#slice or [] method.
To access a specific byte as a single-character string, pass in its index and the number 1.

    s.slice(3)  # => 107
    s[3]  # => 107
    107.chr  # => "k"
    s.slice(3,1)  # => "k"
    s[3,1]  # => "k"

To count from the end of the string instead of the beginning, use negative indexes:

    s.slice(-7,3)  # => "str"
    s[-7,6]  # => "string"

If the length of your proposed substring exceeds the length of the string, slice or [] will return the entire string after that point. This leads to a simple shortcut for getting the rightmost portion of a string:

    s[15...s.length]  # => "a string!"

See Also

• Recipe 1.9, “Processing a String One Word at a Time”
• Recipe 1.17, “Matching Strings with Regular Expressions”

1.14 Handling International Encodings

Problem

You need to handle strings that contain non-ASCII characters: probably Unicode characters encoded in UTF-8.

Solution

To use Unicode in Ruby, simply add the following to the beginning of your code.

    $KCODE='u'
    require 'jcode'

You can also invoke the Ruby interpreter with arguments that do the same thing:

    $ ruby -Ku -rjcode

If you use a Unix environment, you can add the arguments to the shebang line of your Ruby application:

    #!/usr/bin/ruby -Ku -rjcode

The jcode library overrides most of the methods of String and makes them capable of handling multibyte text. The exceptions are String#length, String#count, and String#size, which are not overridden. Instead jcode defines three new methods: String#jlength, String#jcount, and String#jsize.

Discussion

Consider a UTF-8 string that encodes six Unicode characters: efbca1 (A), efbca2 (B), and so on up to UTF-8 efbca6 (F):

    string = "\xef\xbc\xa1" + "\xef\xbc\xa2" + "\xef\xbc\xa3" +
             "\xef\xbc\xa4" + "\xef\xbc\xa5" + "\xef\xbc\xa6"

The string contains 18 bytes that encode 6 characters:

    string.size  # => 18
    string.jsize  # => 6

String#count is a method that takes a string of bytes, and counts how many times those bytes occur in the string.
String#jcount takes a string of characters and counts how many times those characters occur in the string:

    string.count "\xef\xbc\xa2"  # => 13
    string.jcount "\xef\xbc\xa2"  # => 1

String#count treats "\xef\xbc\xa2" as three separate bytes, and counts the number of times each of those bytes shows up in the string. String#jcount treats the same string as a single character, and looks for that character in the string, finding it only once.

    "\xef\xbc\xa2".length  # => 3
    "\xef\xbc\xa2".jlength  # => 1

Apart from these differences, Ruby handles most Unicode behind the scenes. Once you have your data in UTF-8 format, you really don’t have to worry. Given that Ruby’s creator Yukihiro Matsumoto is Japanese, it is no wonder that Ruby handles Unicode so elegantly.

See Also

• If you have text in some other encoding and need to convert it to UTF-8, use the iconv library, as described in Recipe 11.12, “Converting from One Encoding to Another”
• There are several online search engines for Unicode characters; two good ones are at http://isthisthingon.org/unicode/ and http://www.fileformat.info/info/unicode/char/search.htm

1.15 Word-Wrapping Lines of Text

Problem

You want to turn a string full of miscellaneous whitespace into a string formatted with linebreaks at appropriate intervals, so that the text can be displayed in a window or sent as an email.

Solution

The simplest way to add newlines to a piece of text is to use a regular expression like the following.

    def wrap(s, width=78)
      s.gsub(/(.{1,#{width}})(\s+|\Z)/, "\\1\n")
    end

    wrap("This text is too short to be wrapped.")
    # => "This text is too short to be wrapped.\n"

    puts wrap("This text is not too short to be wrapped.", 20)
    # This text is not too
    # short to be wrapped.

    puts wrap("These ten-character columns are stifling my creativity!", 10)
    # These
    # ten-character
    # columns
    # are
    # stifling
    # my
    # creativity!
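To see what the wrapping pattern is doing, it can help to spell it out with the /x (extended) flag, which allows whitespace and comments inside the regular expression. The method name wrap_verbose is made up for this sketch; the pattern itself is the same one used in wrap.

```ruby
# Same idea as wrap, with the pattern spelled out using the x flag.
# wrap_verbose is a hypothetical name used only for this illustration.
def wrap_verbose(text, width = 78)
  pattern = /
    (.{1,#{width}})   # capture up to width characters, as many as possible
    (\s+|\Z)          # which must be followed by whitespace or the end of the string
  /x
  # Keep the captured text and replace the trailing whitespace with a newline.
  text.gsub(pattern, "\\1\n")
end

wrapped = wrap_verbose("This text is not too short to be wrapped.", 20)
puts wrapped
```

Because the first group is greedy, each match takes the longest run of characters that still ends at whitespace, which is why a word longer than the width (such as “ten-character” in the Solution) ends up on a line of its own instead of being split.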
Discussion

The code given in the Solution preserves the original formatting of the string, inserting additional line breaks where necessary. This works well when you want to preserve the existing formatting while squishing everything into a smaller space:

    poetry = %q{It is an ancient Mariner,
    And he stoppeth one of three.
    "By thy long beard and glittering eye,
    Now wherefore stopp'st thou me?}

    puts wrap(poetry, 20)
    # It is an ancient
    # Mariner,
    # And he stoppeth one
    # of three.
    # "By thy long beard
    # and glittering eye,
    # Now wherefore
    # stopp'st thou me?

But sometimes the existing whitespace isn’t important, and preserving it makes the result look bad:

    prose = %q{I find myself alone these days, more often than not,
    watching the rain run down nearby windows. How long has it been
    raining? The newspapers now print the total, but no one reads them
    anymore.}

    puts wrap(prose, 60)
    # I find myself alone these days, more often than not,
    # watching the rain run down nearby windows. How long has it
    # been
    # raining? The newspapers now print the total, but no one
    # reads them
    # anymore.

Looks pretty ragged. In this case, we want to replace the original newlines with new ones. The simplest way to do this is to preprocess the string with another regular expression:

    def reformat_wrapped(s, width=78)
      s.gsub(/\s+/, " ").gsub(/(.{1,#{width}})( |\Z)/, "\\1\n")
    end

But regular expressions are relatively slow; it’s much more efficient to tear the string apart into words and rebuild it:

    def reformat_wrapped(s, width=78)
      lines = []
      line = ""
      s.split(/\s+/).each do |word|
        if line.size + word.size >= width
          lines << line
          line = word
        elsif line.empty?
          line = word
        else
          line << " " << word
        end
      end
      lines << line if line
      return lines.join "\n"
    end

    puts reformat_wrapped(prose, 60)
    # I find myself alone these days, more often than not,
    # watching the rain run down nearby windows. How long has it
    # been raining? The newspapers now print the total, but no one
    # reads them anymore.

See Also

• The Facets Core library defines String#word_wrap and String#word_wrap! methods

1.16 Generating a Succession of Strings

Problem

You want to iterate over a series of alphabetically-increasing strings as you would over a series of numbers.

Solution

If you know both the start and end points of your succession, you can simply create a range and use Range#each, as you would for numbers:

    ('aa'..'ag').each { |x| puts x }
    # aa
    # ab
    # ac
    # ad
    # ae
    # af
    # ag

The method that generates the successor of a given string is String#succ. If you don’t know the end point of your succession, you can define a generator that uses succ, and break from the generator when you’re done.

    def endless_string_succession(start)
      while true
        yield start
        start = start.succ
      end
    end

This code iterates over an endless succession of strings, stopping when the last two letters are the same:

    endless_string_succession('fol') do |x|
      puts x
      break if x[-1] == x[-2]
    end
    # fol
    # fom
    # fon
    # foo

Discussion

Imagine a string as an odometer. Each character position of the string has a separate dial, and the current odometer reading is your string. Each dial always shows the same kind of character. A dial that starts out showing a number will always show a number. A dial that starts out showing an uppercase letter will always show an uppercase letter.

The string succession operation increments the odometer. It moves the rightmost dial forward one space. This might make the rightmost dial wrap around to the beginning: if that happens, the dial directly to its left is also moved forward one space. This might make that dial wrap around to the beginning, and so on:

    '89999'.succ  # => "90000"
    'nzzzz'.succ  # => "oaaaa"

When the leftmost dial wraps around, a new dial is added to the left of the odometer. The new dial is always of the same type as the old leftmost dial.
If the old leftmost dial showed capital letters, then so will the new leftmost dial:

    'Zzz'.succ                       # => "AAaa"

Lowercase letters wrap around from “z” to “a”. If the first character is a lowercase letter, then when it wraps around, an “a” is added on to the beginning of the string:

    'z'.succ                         # => "aa"
    'aa'.succ                        # => "ab"
    'zz'.succ                        # => "aaa"

Uppercase letters work in the same way: “Z” becomes “A”. Lowercase and uppercase letters never mix.

    'AA'.succ                        # => "AB"
    'AZ'.succ                        # => "BA"
    'ZZ'.succ                        # => "AAA"
    'aZ'.succ                        # => "bA"
    'Zz'.succ                        # => "AAa"

Digits in a string are treated as numbers, and wrap around from 9 to 0, just like a car odometer.

    'foo19'.succ                     # => "foo20"
    'foo99'.succ                     # => "fop00"
    '99'.succ                        # => "100"
    '9Z99'.succ                      # => "10A00"

Characters other than alphanumerics are not incremented unless they are the only characters in the string. They are simply ignored when calculating the succession, and reproduced in the same positions in the new string. This lets you build formatting into the strings you want to increment.

    '10-99'.succ                     # => "11-00"
    'a-a'.succ                       # => "a-b"
    'z-z'.succ                       # => "aa-a"
    'Hello!'.succ                    # => "Hellp!"
    %q{'zz'}.succ                    # => "'aaa'"
    %q{z'zz'}.succ                   # => "aa'aa'"

When nonalphanumerics are the only characters in the string, they are incremented according to ASCII order. Eventually an alphanumeric will show up, and the rules for strings containing alphanumerics will take over.

    '$$$$'.succ                      # => "$$$%"

    s = '!@-'
    13.times { puts s = s.succ }
    # !@.
    # !@/
    # !@0
    # !@1
    # !@2
    # ...
    # !@8
    # !@9
    # !@10

There’s no reverse version of String#succ. Matz, and the community as a whole, think there’s not enough demand for such a method to justify the work necessary to handle all the edge cases.
If you need to iterate over a succession of strings in reverse, your best bet is to transform the range into an array and iterate over that in reverse:

    ("a".."e").to_a.reverse_each { |x| puts x }
    # e
    # d
    # c
    # b
    # a

See Also
• Recipe 2.15, “Generating a Sequence of Numbers”
• Recipe 3.4, “Iterating Over Dates”

1.17 Matching Strings with Regular Expressions

Problem
You want to know whether or not a string matches a certain pattern.

Solution
You can usually describe the pattern as a regular expression. The =~ operator tests a string against a regular expression:

    string = 'This is a 30-character string.'

    if string =~ /([0-9]+)-character/ and $1.to_i == string.length
      "Yes, there are #$1 characters in that string."
    end
    # => "Yes, there are 30 characters in that string."

You can also use Regexp#match:

    match = Regexp.compile('([0-9]+)-character').match(string)
    if match && match[1].to_i == string.length
      "Yes, there are #{match[1]} characters in that string."
    end
    # => "Yes, there are 30 characters in that string."

You can check a string against a series of regular expressions with a case statement:

    string = "123"

    case string
    when /^[a-zA-Z]+$/
      "Letters"
    when /^[0-9]+$/
      "Numbers"
    else
      "Mixed"
    end
    # => "Numbers"

Discussion
Regular expressions are a cryptic but powerful minilanguage for string matching and substring extraction. They’ve been around for a long time in Unix utilities like sed, but Perl was the first general-purpose programming language to include them. Now almost all modern languages have support for Perl-style regular expressions.

Ruby provides several ways of initializing regular expressions. The following are all equivalent and create equivalent Regexp objects:

    /something/
    Regexp.new("something")
    Regexp.compile("something")
    %r{something}

The following modifiers are also of note.
Regexp::IGNORECASE (i): Makes matches case-insensitive.
Regexp::MULTILINE (m): Normally, a regexp matches against a single line of a string. This will cause a regexp to treat line breaks like any other character.
Regexp::EXTENDED (x): This modifier lets you space out your regular expressions with whitespace and comments, making them more legible.

Here’s how to use these modifiers to create regular expressions:

    /something/mxi
    Regexp.new('something',
               Regexp::EXTENDED + Regexp::IGNORECASE + Regexp::MULTILINE)
    %r{something}mxi

Here’s how the modifiers work:

    case_insensitive = /mangy/i
    case_insensitive =~ "I'm mangy!"                      # => 4
    case_insensitive =~ "Mangy Jones, at your service."   # => 0

    multiline = /a.b/m
    multiline =~ "banana\nbanana"                         # => 5
    /a.b/ =~ "banana\nbanana"                             # => nil
    # But note:
    /a\nb/ =~ "banana\nbanana"                            # => 5

    extended = %r{ \ was  # Match " was"
                   \s     # Match one whitespace character
                   a      # Match "a" }xi
    extended =~ "What was Alfred doing here?"             # => 4
    extended =~ "My, that was a yummy mango."             # => 8
    extended =~ "It was\n\n\na fool's errand"             # => nil

See Also
• Mastering Regular Expressions by Jeffrey Friedl (O’Reilly) gives a concise introduction to regular expressions, with many real-world examples
• RegExLib.com provides a searchable database of regular expressions (http://regexlib.com/default.aspx)
• A Ruby-centric regular expression tutorial (http://www.regular-expressions.info/ruby.html)
• ri Regexp
• Recipe 1.19, “Validating an Email Address”

1.18 Replacing Multiple Patterns in a Single Pass

Problem
You want to perform multiple, simultaneous search-and-replace operations on a string.

Solution
Use the Regexp.union method to aggregate the regular expressions you want to match into one big regular expression that matches any of them. Pass the big regular expression into String#gsub, along with a code block that receives the text of each match.
You can detect which of your search terms actually triggered the regexp match, and choose the appropriate replacement term:

    class String
      def mgsub(key_value_pairs=[].freeze)
        regexp_fragments = key_value_pairs.collect { |k,v| k }
        gsub(Regexp.union(*regexp_fragments)) do |match|
          key_value_pairs.detect{|k,v| k =~ match}[1]
        end
      end
    end

Here’s a simple example:

    "GO HOME!".mgsub([[/.*GO/i, 'Home'], [/home/i, 'is where the heart is']])
    # => "Home is where the heart is!"

This example replaces all letters with pound signs, and all pound signs with the letter P:

    "Here is number #123".mgsub([[/[a-z]/i, '#'], [/#/, 'P']])
    # => "#### ## ###### P123"

Discussion
The naive solution is to simply string together multiple gsub calls. The following examples, copied from the solution, show why this is often a bad idea:

    "GO HOME!".gsub(/.*GO/i, 'Home').gsub(/home/i, 'is where the heart is')
    # => "is where the heart is is where the heart is!"

    "Here is number #123".gsub(/[a-z]/i, "#").gsub(/#/, "P")
    # => "PPPP PP PPPPPP P123"

In both cases, our replacement strings turned out to match the search term of a later gsub call. Our replacement strings were themselves subject to search-and-replace. In the first example, the conflict can be fixed by reversing the order of the substitutions. The second example shows a case where reversing the order won’t help. You need to do all your replacements in a single pass over the string.

The mgsub method will take a hash, but it’s safer to pass in an array of key-value pairs. This is because, in Ruby 1.8, elements in a hash come out in no particular order, so you can’t control the order of substitution (Ruby 1.9 and later preserve insertion order). Here’s a demonstration of the problem:

    "between".mgsub(/ee/ => 'AA', /e/ => 'E')     # Bad code
    # => "bEtwEEn"

    "between".mgsub([[/ee/, 'AA'], [/e/, 'E']])   # Good code
    # => "bEtwAAn"

In the second example, the first substitution runs first.
In the first example, it runs second (and doesn’t find anything to replace) because of a quirk of Ruby’s Hash implementation.

If performance is important, you may want to rethink how you implement mgsub. The more search and replace terms you add to the array of key-value pairs, the longer it will take, because the detect method performs a set of regular expression checks for every match found in the string.

See Also
• Recipe 1.17, “Matching Strings with Regular Expressions”
• Confused by the *regexp_fragments syntax in the call to Regexp.union? Take a look at Recipe 8.11, “Accepting or Passing a Variable Number of Arguments”

1.19 Validating an Email Address

Problem
You need to see whether an email address is valid.

Solution
Here’s a sampling of valid email addresses you might encounter:

    test_addresses = [ #The following are valid addresses according to RFC822.
                       'joe@example.com', 'joe.bloggs@mail.example.com',
                       'joe+ruby-mail@example.com', 'joe(and-mary)@example.museum',
                       'joe@localhost',

Here are some invalid email addresses you might encounter:

                       # Complete the list with some invalid addresses
                       'joe', 'joe@', '@example.com',
                       'joe@example@example.com',
                       'joe and mary@example.com' ]

And here are some regular expressions that do an okay job of filtering out bad email addresses. The first one does very basic checking for ill-formed addresses:

    valid = '[^ @]+' # Exclude characters always invalid in email addresses
    username_and_machine = /^#{valid}@#{valid}$/

    test_addresses.collect { |i| i =~ username_and_machine }
    # => [0, 0, 0, 0, 0, nil, nil, nil, nil, nil]

The second one prohibits the use of local-network addresses like “joe@localhost”. Most applications should prohibit such addresses.

    username_and_machine_with_tld = /^#{valid}@#{valid}\.#{valid}$/

    test_addresses.collect { |i| i =~ username_and_machine_with_tld }
    # => [0, 0, 0, 0, nil, nil, nil, nil, nil, nil]

However, the odds are good that you’re solving the wrong problem.
Discussion
Most email address validation is done with naive regular expressions like the ones given above. Unfortunately, these regular expressions are usually written too strictly, and reject many email addresses. This is a common source of frustration for people with unusual email addresses like joe(and-mary)@example.museum, or people taking advantage of special features of email, as in joe+ruby-mail@example.com. The regular expressions given above err on the opposite side: they’ll accept some syntactically invalid email addresses, but they won’t reject valid addresses.

Why not give a simple regular expression that always works? Because there’s no such thing. The definition of the syntax is anything but simple. Perl hacker Paul Warren defined a 6343-character regular expression for Perl’s Mail::RFC822::Address module, and even it needs some preprocessing to accept absolutely every allowable email address. Warren’s regular expression will work unaltered in Ruby, but if you really want it, you should go online and find it, because it would be foolish to try to type it in.

Check validity, not correctness
Even given a regular expression or other tool that infallibly separates the RFC822-compliant email addresses from the others, you can’t check the validity of an email address just by looking at it; you can only check its syntactic correctness.

It’s easy to mistype your username or domain name, giving out a perfectly valid email address that belongs to someone else. It’s trivial for a malicious user to make up a valid email address that doesn’t work at all; I did it earlier with the joe@example.com nonsense. !@! is a valid email address according to the first regexp test, but no one in this universe uses it. You can’t even compare the top-level domain of an address against a static list, because new top-level domains are always being added.
Syntactic validation of email addresses is an enormous amount of work that only solves a small portion of the problem. The only way to be certain that an email address is valid is to successfully send email to it. The only way to be certain that an email address is the right one is to send email to it and get the recipient to respond. You need to weigh this additional work (yours and the user’s) against the real value of a verified email address.

It used to be that a user’s email address was closely associated with their online identity: most people had only the email address their ISP gave them. Thanks to today’s free web-based email, that’s no longer true. Email verification no longer works to prevent duplicate accounts or to stop antisocial behavior online, if it ever did.

This is not to say that it’s never useful to have a user’s working email address, or that there’s no problem if people mistype their email addresses. To improve the quality of the addresses your users enter, without rejecting valid addresses, you can do three things beyond verifying with the permissive regular expressions given above:

1. Use a second naive regular expression, more restrictive than the ones given above, but don’t prohibit addresses that don’t match. Only use the second regular expression to advise the user that they may have mistyped their email address. This is not as useful as it seems, because most typos involve changing one letter for another, rather than introducing nonalphanumerics where they don’t belong.

    def probably_valid?(email)
      valid = '[A-Za-z\d.+-]+' #Commonly encountered email address characters
      (email =~ /#{valid}@#{valid}\.#{valid}/) == 0
    end

    #These give the correct result.
    probably_valid? 'joe@example.com'                # => true
    probably_valid? 'joe+ruby-mail@example.com'      # => true
    probably_valid? 'joe.bloggs@mail.example.com'    # => true
    probably_valid? 'joe@examplecom'                 # => false
    probably_valid? 'joe@localhost'                  # => false

    # This address is valid, but probably_valid thinks it's not.
    probably_valid? 'joe(and-mary)@example.museum'   # => false

    # This address is valid, but certainly wrong.
    probably_valid? 'joe@example.cpm'                # => true

2. Extract from the alleged email address the hostname (the “example.com” of joe@example.com), and do a DNS lookup to see if that hostname accepts email. A hostname that has an MX DNS record is set up to receive mail. The following code will catch most domain name misspellings, but it won’t catch any username misspellings. It’s also not guaranteed to parse the hostname correctly, again because of the complexity of RFC822.

    require 'resolv'
    def valid_email_host?(email)
      hostname = email[(email =~ /@/)+1..email.length]
      valid = true
      begin
        Resolv::DNS.new.getresource(hostname, Resolv::DNS::Resource::IN::MX)
      rescue Resolv::ResolvError
        valid = false
      end
      return valid
    end

    #example.com is a real domain, but it won't accept mail
    valid_email_host?('joe@example.com')             # => false

    #lcqkxjvoem.mil is not a real domain.
    valid_email_host?('joe@lcqkxjvoem.mil')          # => false

    #oreilly.com exists and accepts mail, though there might not be a 'joe' there.
    valid_email_host?('joe@oreilly.com')             # => true

3. Send email to the address the user input, and ask the user to verify receipt. For instance, the email might contain a verification URL for the user to click on. This is the only way to guarantee that the user entered a valid email address that they control. See Recipes 14.5 and 15.19 for this.

This is overkill much of the time. It requires that you add special workflow to your application, it significantly raises the barriers to use of your application, and it won’t always work. Some users have spam filters that will treat your test mail as junk, or whitelist email systems that reject all email from unknown sources.
Unless you really need a user’s working email address for your application to work, very simple email validation should suffice.

See Also
• Recipe 14.5, “Sending Mail”
• Recipe 15.19, “Sending Mail with Rails”
• See the amazing colossal regular expression for email addresses at http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html

1.20 Classifying Text with a Bayesian Analyzer

Problem
You want to classify chunks of text by example: an email message is either spam or not spam, a joke is either funny or not funny, and so on.

Solution
Use Lucas Carlson’s Classifier library, available as the classifier gem. It provides a naive Bayesian classifier, and one that implements Latent Semantic Indexing, a more advanced technique.

The interface for the naive Bayesian classifier is very straightforward. You create a Classifier::Bayes object with some classifications, and train it on text chunks whose classification is known:

    require 'rubygems'
    require 'classifier'

    classifier = Classifier::Bayes.new('Spam', 'Not spam')

    classifier.train_spam 'are you in the market for viagra? we sell viagra'
    classifier.train_not_spam 'hi there, are we still on for lunch?'

You can then feed the classifier text chunks whose classification is unknown, and have it guess:

    classifier.classify "we sell the cheapest viagra on the market"
    # => "Spam"
    classifier.classify "lunch sounds great"
    # => "Not spam"

Discussion
Bayesian analysis is based on probabilities. When you train the classifier, you are giving it a set of words and the classifier keeps track of how often words show up in each category. In the simple spam filter built in the Solution, the frequency hash looks like the @categories variable below:

    classifier
    # => #<Classifier::Bayes:0x...
    #    @categories={ :"Not spam"=>
    #                    { :lunch=>1, :for=>1, :there=>1,
    #                      :"?"=>1, :still=>1, :","=>1 },
    #                  :Spam=>
    #                    { :market=>1, :for=>1, :viagra=>2, :"?"=>1, :sell=>1 }
    #                },
    #    @total_words=12>

These hashes are used to build probability calculations.
Note that since we mentioned the word “viagra” twice in spam messages, there is a 2 in the “Spam” frequency hash for that word. That makes it more spam-like than other words like “for” (which also shows up in nonspam) or “sell” (which only shows up once in spam). The classifier can apply these probabilities to previously unseen text and guess at a classification for it.

The more text you use to train the classifier, the better it becomes at guessing. If you can verify the classifier’s guesses (for instance, by asking the user whether a message really was spam), you should use that information to train the classifier with new data as it comes in.

To save the state of the classifier for later use, you can use Madeleine persistence (Recipe 13.3), which writes the state of your classifier to your hard drive.

A few more notes about this type of classifier. A Bayesian classifier supports as many categories as you want. “Spam” and “Not spam” are the most common, but you are not limited to two. You can also use the generic train method instead of calling train_[category_name]. Here’s a classifier that has three categories and uses the generic train method:

    classifier = Classifier::Bayes.new('Interesting', 'Funny', 'Dramatic')

    classifier.train 'Interesting', "Leaving reminds us of what we can part with and what we can't, then offers us something new to look forward to, to dream about."
    classifier.train 'Funny', "Knock knock. Who's there? Boo boo. Boo boo who? Don't cry, it is only a joke."
    classifier.train 'Dramatic', 'I love you! I hate you! Get out right now.'

    classifier.classify 'what!'                      # => "Dramatic"
    classifier.classify "who's on first?"            # => "Funny"
    classifier.classify 'perchance to dream'         # => "Interesting"

It’s also possible to “untrain” a category if you make a mistake or change your mind later.
    classifier.untrain_funny "boo"
    classifier.untrain "Dramatic", "out"

See Also
• Recipe 13.3, “Persisting Objects with Madeleine”
• The README file for the Classifier library has an example of an LSI classifier
• Bishop (http://bishop.rubyforge.org/) is another Bayesian classifier, a port of Python’s Reverend; it’s available as the bishop gem
• http://en.wikipedia.org/wiki/Naive_Bayes_classifier
• http://en.wikipedia.org/wiki/Latent_Semantic_Analysis

CHAPTER 2
Numbers

Numbers are as fundamental to computing as breath is to human life. Even programs that have nothing to do with math need to count the items in a data structure, display average running times, or use numbers as a source of randomness. Ruby makes it easy to represent numbers, letting you breathe easy and tackle the harder problems of programming.

An issue that comes up when you’re programming with numbers is that there are several different implementations of “number,” optimized for different purposes: 32-bit integers, floating-point numbers, and so on. Ruby tries to hide these details from you, but it’s important to know about them because they often manifest as mysteriously incorrect calculations.*

The first distinction is between small numbers and large ones. If you’ve used other programming languages, you probably know that you must use different data types to hold small numbers and large numbers (assuming that the language supports large numbers at all). Ruby has different classes for small numbers (Fixnum) and large numbers (Bignum), but you don’t usually have to worry about the difference. When you type in a number, Ruby sees how big it is and creates an object of the appropriate class.

    1000.class                        # => Fixnum
    10000000000.class                 # => Bignum
    (2**30 - 1).class                 # => Fixnum
    (2**30).class                     # => Bignum

When you perform arithmetic, Ruby automatically does any needed conversions.
You don’t have to worry about the difference between small and large numbers:†

    small = 1000
    big = small ** 5                  # => 1000000000000000
    big.class                         # => Bignum
    smaller = big / big               # => 1
    smaller.class                     # => Fixnum

* See, for instance, the Discussion section of Recipe 2.11, where it’s revealed that Matrix#inverse doesn’t work correctly on a matrix full of integers. This is because Matrix#inverse uses division, and integer division works differently from floating-point division.
† Python also has this feature.

The other major distinction is between whole numbers (integers) and fractional numbers. Like all modern programming languages, Ruby implements the IEEE floating-point standard for representing fractional numbers. If you type a number that includes a decimal point, Ruby creates a Float object instead of a Fixnum or Bignum:

    0.01.class                        # => Float
    1.0.class                         # => Float
    10000000000.00000000001.class     # => Float

But floating-point numbers are imprecise (see Recipe 2.2), and they have their own size limits, so Ruby also provides a class that can represent any number with a finite decimal expansion (Recipe 2.3). There’s also a class for numbers like two-thirds, which have an infinite decimal expansion (Recipe 2.4), and a class for complex (or “imaginary”) numbers (Recipe 2.12).

Every kind of number in Ruby has its own class (Integer, Bignum, Complex, and so on), which inherits from the Numeric class. All these classes implement the basic arithmetic operations, and in most cases you can mix and match numbers of different types (see Recipe 8.9 for more on how this works). You can reopen these classes to add new capabilities to numbers (see, for instance, Recipe 2.17), but you can’t usefully subclass them.

Ruby provides simple ways of generating random numbers (Recipe 2.5) and sequences of numbers (Recipe 2.15). This chapter also covers some simple mathematical algorithms (Recipes 2.7 and 2.11) and statistics (Recipe 2.8).
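The distinctions drawn in this introduction can be checked directly. The following is a small sketch, not taken from the recipes, and its variable names are only illustrative. It tests against Integer rather than Fixnum or Bignum because both of those classes descend from Integer, and later Ruby versions (2.4 onward) report Integer for small and large whole numbers alike, so the check should hold on either kind of interpreter.

```ruby
small = 1000
big   = small ** 5              # 10**15, too large for a 32-bit machine word

# Fixnum and Bignum both descend from Integer, so is_a?(Integer) holds
# whichever concrete class the interpreter reports.
small_is_integer = small.is_a?(Integer)
big_is_integer   = big.is_a?(Integer)

# Integer division truncates; one Float operand switches to
# floating-point division.
int_quotient   = 7 / 2
float_quotient = 7 / 2.0

# Mixing an integer with a Float produces a Float.
mixed = big + 0.5

puts small_is_integer           # true
puts big_is_integer             # true
puts int_quotient               # 3
puts float_quotient             # 3.5
puts mixed.class                # Float
```

The difference between the two quotients is the integer-versus-float division behavior that the footnote about Matrix#inverse refers to.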
2.1 Parsing a Number from a String

Problem
Given a string that contains some representation of a number, you want to get the corresponding integer or floating-point value.

Solution
Use String#to_i to turn a string into an integer. Use String#to_f to turn a string into a floating-point number.

    '400'.to_i                        # => 400
    '3.14'.to_f                       # => 3.14
    '1.602e-19'.to_f                  # => 1.602e-19

Discussion
Unlike Perl and PHP, Ruby does not automatically make a number out of a string that contains a number. You must explicitly call a conversion method that tells Ruby how you want the string to be converted.

Along with to_i and to_f, there are other ways to convert strings into numbers. If you have a string that represents a hex or octal number, you can call String#hex or String#oct to get the decimal equivalent. This is the same as passing the base of the number into to_i:

    '405'.oct                         # => 261
    '405'.to_i(8)                     # => 261
    '405'.hex                         # => 1029
    '405'.to_i(16)                    # => 1029
    'fed'.hex                         # => 4077
    'fed'.to_i(16)                    # => 4077

If to_i, to_f, hex, or oct find a character that can’t be part of the kind of number they’re looking for, they stop processing the string at that character and return the number so far. If the string’s first character is unusable, the result is zero.

    "13: a baker's dozen".to_i                         # => 13
    '1001 Nights'.to_i                                 # => 1001
    'The 1000 Nights and a Night'.to_i                 # => 0
    '60.50 Misc. Agricultural Equipment'.to_f          # => 60.5
    '$60.50'.to_f                                      # => 0.0
    'Feed the monster!'.hex                            # => 65261
    'I fed the monster at Canoga Park Waterslides'.hex # => 0
    '0xA2Z'.hex                                        # => 162
    '-10'.oct                                          # => -8
    '-109'.oct                                         # => -8
    '3.14'.to_i                                        # => 3

Note especially that last example: the decimal point is just one more character that stops processing of a string representing an integer.
If you want an exception when a string can’t be completely parsed as a number, use Integer() or Float():

    Integer('1001')                   # => 1001
    Integer('1001 nights')
    # ArgumentError: invalid value for Integer: "1001 nights"

    Float('99.44')                    # => 99.44
    Float('99.44% pure')
    # ArgumentError: invalid value for Float(): "99.44% pure"

To extract a number from within a larger string, use a regular expression. The NumberParser class below contains regular expressions for extracting floating-point strings, as well as decimal, octal, and hexadecimal numbers. Its extract method uses String#scan to find all the numbers of a certain type in a string.

    class NumberParser
      @@number_regexps = {
        :to_i => /([+-]?[0-9]+)/,
        :to_f => /([+-]?([0-9]*\.)?[0-9]+(e[+-]?[0-9]+)?)/i,
        :oct => /([+-]?[0-7]+)/,
        :hex => /\b([+-]?(0x)?[0-9a-f]+)\b/i
        #The \b characters prevent every letter A-F in a word from being
        #considered a hexadecimal number.
      }

      def NumberParser.re(parsing_method=:to_i)
        re = @@number_regexps[parsing_method]
        raise ArgumentError, "No regexp for #{parsing_method.inspect}!" unless re
        return re
      end

      def extract(s, parsing_method=:to_i)
        numbers = []
        s.scan(NumberParser.re(parsing_method)) do |match|
          numbers << match[0].send(parsing_method)
        end
        numbers
      end
    end

Here it is in action:

    p = NumberParser.new

    pw = "Today's numbers are 104 and 391."
    NumberParser.re(:to_i).match(pw).captures                 # => ["104"]
    p.extract(pw, :to_i)                                      # => [104, 391]

    p.extract('The 1000 nights and a night')                  # => [1000]
    p.extract('$60.50', :to_f)                                # => [60.5]
    p.extract('I fed the monster at Canoga Park Waterslides', :hex)
    # => [4077]
    p.extract('In octal, fifteen is 017.', :oct)              # => [15]

    p.extract('From 0 to 10e60 in -2.4 seconds', :to_f)
    # => [0.0, 1.0e+61, -2.4]
    p.extract('From 0 to 10e60 in -2.4 seconds')
    # => [0, 10, 60, -2, 4]

If you want to extract more than one kind of number from a string, the most reliable strategy is to stop using regular expressions and start using the scanf module, a free third-party module that provides a parser similar to C’s scanf function.

    require 'scanf'
    s = '0x10 4.44 10'.scanf('%x %f %d')                      # => [16, 4.44, 10]

See Also
• Recipe 2.6, “Converting Between Numeric Bases”
• Recipe 8.9, “Converting and Coercing Objects to Different Types”
• The scanf module (http://www.rubyhacker.com/code/scanf/)

2.2 Comparing Floating-Point Numbers

Problem
Floating-point numbers are not suitable for exact comparison. Often, two numbers that should be equal are actually slightly different. The Ruby interpreter can make seemingly nonsensical assertions when floating-point numbers are involved:

    1.8 + 0.1                         # => 1.9
    1.8 + 0.1 == 1.9                  # => false
    1.8 + 0.1 > 1.9                   # => true

You want to do comparison operations approximately, so that floating-point numbers infinitesimally close together can be treated equally.

Solution
You can avoid this problem altogether by using BigDecimal numbers instead of floats (see Recipe 2.3). BigDecimal numbers are completely precise, and work as well as floats for representing numbers that are relatively small and have few decimal places: everyday numbers like the prices of fruits. But math on BigDecimal numbers is much slower than math on floats. Databases have native support for floating-point numbers, but not for BigDecimals.
And floating-point numbers are simpler to create (simply type 10.2 in an interactive Ruby shell to get a Float object). BigDecimals can’t totally replace floats, and when you use floats it would be nice not to have to worry about tiny differences between numbers when doing comparisons.

But how tiny is “tiny”? How large can the difference be between two numbers before they should stop being considered equal? As numbers get larger, so does the range of floating-point values that can reasonably be expected to model that number. 1.1 is probably not “approximately equal” to 1.2, but 10^20 + 0.1 is probably “approximately equal” to 10^20 + 0.2.

The best solution is probably to compare the relative magnitudes of large numbers, and the absolute magnitudes of small numbers. The following code accepts two thresholds: a relative threshold and an absolute threshold. Both default to Float::EPSILON, the difference between 1.0 and the next larger representable Float. Two floats are considered approximately equal if they are within epsilon of each other, or if the difference between them is at most relative_epsilon times the magnitude of the larger one.

    class Float
      def approx(other, relative_epsilon=Float::EPSILON, epsilon=Float::EPSILON)
        difference = other - self
        return true if difference.abs <= epsilon
        relative_error = (difference / (self > other ? self : other)).abs
        return relative_error <= relative_epsilon
      end
    end

    100.2.approx(100.1 + 0.1)         # => true
    10e10.approx(10e10+1e-5)          # => true
    100.0.approx(100+1e-5)            # => false

Discussion
Floating-point math is very precise but, due to the underlying storage mechanism for Float objects, not very accurate. Many real numbers (such as 1.9) can’t be represented by the floating-point standard. Any attempt to represent such a number will end up using one of the nearby numbers that does have a floating-point representation.

You don’t normally see the difference between 1.9 and 1.8 + 0.1, because Float#to_s rounds them both off to “1.9”.
You can see the difference by using Kernel#printf to display the two expressions to many decimal places:

    printf("%.55f", 1.9)
    # 1.8999999999999999111821580299874767661094665527343750000
    printf("%.55f", 1.8 + 0.1)
    # 1.9000000000000001332267629550187848508358001708984375000

Both numbers straddle 1.9 from opposite ends, unable to accurately represent the number they should both equal. Note that the difference between the two numbers is precisely Float::EPSILON:

    Float::EPSILON                    # => 2.22044604925031e-16
    (1.8 + 0.1) - 1.9                 # => 2.22044604925031e-16

This EPSILON’s worth of inaccuracy is often too small to matter, but it does when you’re doing comparisons. 1.9+Float::EPSILON is not equal to 1.9-Float::EPSILON, even if (in this case) both are attempts to represent the same number. This is why most floating-point numbers are compared in relative terms.

The most efficient way to do an approximate comparison is to see whether the two numbers differ by more than a specified error range, using code like this:

    class Float
      def absolute_approx(other, epsilon=Float::EPSILON)
        return (other-self).abs <= epsilon
      end
    end

    (1.8 + 0.1).absolute_approx(1.9)  # => true
    10e10.absolute_approx(10e10+1e-5) # => false

The default value of epsilon works well for numbers close to 0, but for larger numbers the default value of epsilon will be too small. Any other value of epsilon you might specify will only work well within a certain range.

Thus, Float#approx, the recommended solution, compares both absolute and relative magnitude. As numbers get bigger, so does the allowable margin of error for two numbers to be considered “equal.” Its default relative_epsilon allows numbers between 2 and 3 to differ by roughly twice the value of Float::EPSILON. Numbers between 3 and 4 can differ by roughly three times the value of Float::EPSILON, and so on.
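The two-threshold idea can also be sketched as a standalone method, which makes the absolute and relative tests easier to see side by side. The name approx_equal? and its parameter names are hypothetical; they are not part of Ruby or of the recipe. As a design choice of this sketch (not the book's method), the relative test scales by the larger absolute value, so that negative numbers are treated the same way as positive ones.

```ruby
# approx_equal? is a hypothetical helper, not part of Ruby or of the recipe.
def approx_equal?(a, b, relative_epsilon = Float::EPSILON,
                  absolute_epsilon = Float::EPSILON)
  difference = (a - b).abs
  # Absolute test: catches numbers near zero.
  return true if difference <= absolute_epsilon
  # Relative test: the tolerance grows with the larger magnitude.
  difference <= [a.abs, b.abs].max * relative_epsilon
end

puts approx_equal?(1.8 + 0.1, 1.9)            # true
puts approx_equal?(100.0, 100 + 1e-5)         # false
puts approx_equal?(98.6, 98.66)               # false
puts approx_equal?(98.6, 98.66, 0.001)        # true
puts approx_equal?(-98.6, -98.66, 0.001)      # true
```

The first call succeeds on the absolute test alone, because 1.8 + 0.1 and 1.9 differ by exactly Float::EPSILON. The last two succeed only because the caller loosened the relative threshold.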
A very small value of relative_epsilon is good for mathematical operations, but if your data comes from a real-world source like a scientific instrument, you can increase it. For instance, a Ruby script may track changes in temperature read from a thermometer that’s only 99.9% accurate. In this case, relative_epsilon can be set to 0.001, and everything beyond that point discarded as noise.

    98.6.approx(98.66)                    # => false
    98.6.approx(98.66, 0.001)             # => true

See Also

• Recipe 2.3, “Representing Numbers to Arbitrary Precision,” has more information on BigDecimal numbers
• If you need to represent a fraction with an infinite decimal expansion, use a Rational number (see Recipe 2.4, “Representing Rational Numbers”)
• “Comparing floating point numbers” by Bruce Dawson has an excellent (albeit C-centric) overview of the tradeoffs involved in different ways of doing floating-point comparisons (http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm)

2.3 Representing Numbers to Arbitrary Precision

Problem

You’re doing high-precision arithmetic, and floating-point numbers are not precise enough.

Solution

A BigDecimal number can represent a real number to an arbitrary number of decimal places.

    require 'bigdecimal'

    BigDecimal("10").to_s                 # => "0.1E2"
    BigDecimal("1000").to_s               # => "0.1E4"
    BigDecimal("1000").to_s("F")          # => "1000.0"

    BigDecimal("0.123456789").to_s        # => "0.123456789E0"

Compare how Float and BigDecimal store the same high-precision number:

    nm = "0.123456789012345678901234567890123456789"
    nm.to_f                               # => 0.123456789012346
    BigDecimal(nm).to_s
    # => "0.123456789012345678901234567890123456789E0"

Discussion

BigDecimal numbers store numbers in scientific notation format. A BigDecimal consists of a sign (positive or negative), an arbitrarily large decimal fraction, and an arbitrarily large exponent.
This is similar to the way floating-point numbers are stored, but a double-precision floating-point implementation like Ruby’s cannot represent an exponent less than Float::MIN_EXP (-1021) or greater than Float::MAX_EXP (1024). Float objects also can’t represent numbers at a greater precision than Float::EPSILON, or about 2.2*10**-16.

You can use BigDecimal#split to split a BigDecimal object into the parts of its scientific-notation representation. It returns an array of four numbers: the sign (1 for positive numbers, -1 for negative numbers), the fraction (as a string), the base of the exponent (which is always 10), and the exponent itself.

    BigDecimal("105000").split
    # => [1, "105", 10, 6]
    # That is, 0.105*(10**6)

    BigDecimal("-0.005").split
    # => [-1, "5", 10, -2]
    # That is, -1 * (0.5*(10**-2))

A good way to test different precision settings is to create an infinitely repeating decimal like 2/3, and see how much of it gets stored. By default, BigDecimals give 16 digits of precision, roughly comparable to what a Float can give.

    (BigDecimal("2") / BigDecimal("3")).to_s   # => "0.6666666666666667E0"

    2.0/3                                      # => 0.666666666666667

You can store additional significant digits by passing in a second argument n to the BigDecimal constructor. BigDecimal precision is allocated in chunks of four decimal digits. Values of n from 1 to 4 make a BigDecimal use the default precision of 16 digits. Values from 5 to 8 give 20 digits of precision, values from 9 to 12 give 24 digits, and so on:

    def two_thirds(precision)
      (BigDecimal("2", precision) / BigDecimal("3")).to_s
    end

    two_thirds(1)                         # => "0.6666666666666667E0"
    two_thirds(4)                         # => "0.6666666666666667E0"
    two_thirds(5)                         # => "0.66666666666666666667E0"
    two_thirds(9)                         # => "0.666666666666666666666667E0"
    two_thirds(13)                        # => "0.6666666666666666666666666667E0"

Not all of a number’s significant digits may be used.
For instance, Ruby considers BigDecimal("2") and BigDecimal("2.000000000000") to be equal, even though the second one has many more significant digits.

You can inspect the precision of a number with BigDecimal#precs. This method returns an array of two elements: the number of significant digits actually being used, and the total number of significant digits. Again, since significant digits are allocated in blocks of four, both of these numbers will be multiples of four.

    BigDecimal("2").precs                 # => [4, 8]
    BigDecimal("2.000000000000").precs    # => [4, 20]
    BigDecimal("2.000000000001").precs    # => [16, 20]

If you use the standard arithmetic operators on BigDecimals, the result is a BigDecimal accurate to the largest possible number of digits. Dividing or multiplying one BigDecimal by another yields a BigDecimal with more digits of precision than either of its parents, just as would happen on a pocket calculator.

    (a = BigDecimal("2.01")).precs        # => [8, 8]
    (b = BigDecimal("3.01")).precs        # => [8, 8]

    (product = a * b).to_s("F")           # => "6.0501"
    product.precs                         # => [8, 24]

To specify the number of significant digits that should be retained in an arithmetic operation, you can use the methods add, sub, mul, and div instead of the arithmetic operators.

    two_thirds = (BigDecimal("2", 13) / 3)
    two_thirds.to_s           # => "0.666666666666666666666666666666666667E0"

    (two_thirds + 1).to_s     # => "0.1666666666666666666666666666666666667E1"

    two_thirds.add(1, 1).to_s # => "0.2E1"
    two_thirds.add(1, 4).to_s # => "0.1667E1"

Either way, BigDecimal math is significantly slower than floating-point math. Not only are BigDecimals allowed to have more significant digits than floats, but BigDecimals are stored as an array of decimal digits, while floats are stored in a binary encoding and manipulated with binary arithmetic.

The BigMath module in the Ruby standard library defines methods for performing arbitrary-precision mathematical operations on BigDecimal objects.
It defines power-related methods like sqrt, log, and exp, and trigonometric methods like sin, cos, and atan.

All of these methods take as an argument a number prec indicating how many digits of precision to retain. They may return a BigDecimal with more than prec significant digits, but only prec of those digits are guaranteed to be accurate.

    require 'bigdecimal/math'
    include BigMath
    two = BigDecimal("2")
    BigMath::sqrt(two, 10).to_s("F")      # => "1.4142135623730950488016883515"

That code gives 28 decimal places, but only 10 are guaranteed accurate (because we passed in a prec of 10), and only 24 are actually accurate. The square root of 2 to 28 decimal places is actually 1.4142135623730950488016887242. We can get rid of the inaccurate digits with BigDecimal#round:

    BigMath::sqrt(two, 10).round(10).to_s("F")   # => "1.4142135624"

We can also get a more precise number by increasing prec:

    BigMath::sqrt(two, 28).round(28).to_s("F")   # => "1.4142135623730950488016887242"

BigMath also annotates BigDecimal with class methods BigDecimal.PI and BigDecimal.E. These methods construct BigDecimals of those transcendental numbers at any level of precision.
    Math::PI                              # => 3.14159265358979
    Math::PI.class                        # => Float
    BigDecimal.PI(1).to_s                 # => "0.31415926535897932364198143965603E1"
    BigDecimal.PI(20).to_s
    # => "0.3141592653589793238462643383279502883919859293521427E1"

See Also

• At the time of writing, BigMath::log was very slow for BigDecimals larger than about 10; see Recipe 2.7, “Taking Logarithms,” for a much faster implementation
• See Recipe 2.4, “Representing Rational Numbers,” if you need to exactly represent a rational number with an infinite decimal expansion, like 2/3
• The BigDecimal library reference is extremely useful; if you look at the generated RDoc for the Ruby standard library, BigDecimal looks almost undocumented, but it actually has a comprehensive reference file (in English and Japanese): it’s just not in RDoc format, so it doesn’t get picked up; this document is available in the Ruby source package, or do a web search for “BigDecimal: An extension library for Ruby”

2.4 Representing Rational Numbers

Problem

You want to precisely represent a rational number like 2/3, one that has no finite decimal expansion.

Solution

Use a Rational object; it represents a rational number as an integer numerator and denominator.

    float = 2.0/3.0                       # => 0.666666666666667
    float * 100                           # => 66.6666666666667
    float * 100 / 42                      # => 1.58730158730159

    require 'rational'
    rational = Rational(2, 3)             # => Rational(2, 3)
    rational.to_f                         # => 0.666666666666667
    rational * 100                        # => Rational(200, 3)
    rational * 100 / 42                   # => Rational(100, 63)

Discussion

Rational objects can store numbers that can’t be represented in any other form, and arithmetic on Rational objects is completely precise.

Since the numerator and denominator of a Rational can be Bignums, a Rational object can also represent numbers larger and smaller than those you can represent in floating-point. But math on BigDecimal objects is faster than on Rationals.
BigDecimal objects are also usually easier to work with than Rationals, because most of us think of numbers in terms of their decimal expansions.

You should only use Rational objects when you need to represent rational numbers with perfect accuracy. When you do, be sure to use only Rationals, Fixnums, and Bignums in your calculations. Don’t use any BigDecimals or floating-point numbers: arithmetic operations between a Rational and those types will return floating-point numbers, and you’ll have lost precision forever.

    10 + Rational(2,3)                    # => Rational(32, 3)
    require 'bigdecimal'
    BigDecimal('10') + Rational(2,3)      # => 10.6666666666667

The methods in Ruby’s Math module implement operations like square root, which usually give irrational results. When you pass a Rational number into one of the methods in the Math module, you get a floating-point number back:

    Math::sqrt(Rational(2,3))             # => 0.816496580927726
    Math::sqrt(Rational(25,1))            # => 5.0
    Math::log10(Rational(100, 1))         # => 2.0

The mathn library adds miscellaneous functionality to Ruby’s math functions. Among other things, it modifies the Math::sqrt method so that if you pass in a square number, you get a Fixnum back instead of a Float.
This preserves precision whenever possible:

    require 'mathn'
    Math::sqrt(Rational(2,3))             # => 0.816496580927726
    Math::sqrt(Rational(25,1))            # => 5
    Math::sqrt(25)                        # => 5
    Math::sqrt(25.0)                      # => 5.0

See Also

• The rfloat third-party library lets you use a Float-like interface that’s actually backed by Rational (http://blade.nagaokaut.ac.jp/~sinara/ruby/rfloat/)
• RCR 320 proposes better interoperability between Rationals and floating-point numbers, including a Rational#approximate method that will let you convert the floating-point number 0.1 into Rational(1, 10) (http://www.rcrchive.net/rcr/show/320)

2.5 Generating Random Numbers

Problem

You want to generate pseudorandom numbers, select items from a data structure at random, or repeatedly generate the same “random” numbers for testing purposes.

Solution

Use the Kernel#rand function with no arguments to select a pseudorandom floating-point number from a uniform distribution between 0 and 1.

    rand                                  # => 0.517297883846589
    rand                                  # => 0.946962603814814

Pass in a single integer argument n to Kernel#rand, and it returns an integer between 0 and n-1:

    rand(5)                               # => 0
    rand(5)                               # => 4
    rand(5)                               # => 3
    rand(1000)                            # => 39

Discussion

You can use the single-argument form of Kernel#rand to build many common tasks based on randomness. For instance, this code selects a random item from an array.

    a = ['item1', 'item2', 'item3']
    a[rand(a.size)]                       # => "item3"

To select a random key or value from a hash, turn the keys or values into an array and select one at random.
    m = { :key1 => 'value1',
          :key2 => 'value2',
          :key3 => 'value3' }
    values = m.values
    values[rand(values.size)]             # => "value1"

This code generates pronounceable nonsense words:

    def random_word
      letters = { ?v => 'aeiou',
                  ?c => 'bcdfghjklmnprstvwyz' }
      word = ''
      'cvcvcvc'.each_byte do |x|
        source = letters[x]
        word << source[rand(source.length)].chr
      end
      return word
    end

    random_word                           # => "josuyip"
    random_word                           # => "haramic"

The Ruby interpreter initializes its random number generator on startup, using a seed derived from the current time and the process number. To reliably generate the same random numbers over and over again, you can set the random number seed manually by calling the Kernel#srand function with the integer argument of your choice. This is useful when you’re writing automated tests of “random” functionality:

    #Some random numbers based on process number and current time
    rand(1000)                            # => 187
    rand(1000)                            # => 551
    rand(1000)                            # => 911

    #Start the seed with the number 1
    srand 1
    rand(1000)                            # => 37
    rand(1000)                            # => 235
    rand(1000)                            # => 908

    #Reset the seed to its previous state
    srand 1
    rand(1000)                            # => 37
    rand(1000)                            # => 235
    rand(1000)                            # => 908

See Also

• Recipe 4.10, “Shuffling an Array”
• Recipe 5.11, “Choosing Randomly from a Weighted List”
• Recipe 6.9, “Picking a Random Line from a File”
• The Facets library implements many methods for making random selections from data structures: Array#pick, Array#rand_subset, Hash#rand_pair, and so on; it also defines String.random for generating random strings
• Christian Neukirchen’s rand.rb also implements many random selection methods (http://chneukirchen.org/blog/static/projects/rand.html)

2.6 Converting Between Numeric Bases

Problem

You want to convert numbers from one base to another.
Solution

You can convert specific binary, octal, or hexadecimal numbers to decimal by representing them with the 0b, 0o, or 0x prefixes:

    0b100                                 # => 4
    0o100                                 # => 64
    0x100                                 # => 256

You can also convert between decimal numbers and string representations of those numbers in any base from 2 to 36. Simply pass the base into String#to_i or Integer#to_s.

Here are some conversions between string representations of numbers in various bases, and the corresponding decimal numbers:

    "1045".to_i(10)                       # => 1045
    "-1001001".to_i(2)                    # => -73
    "abc".to_i(16)                        # => 2748
    "abc".to_i(20)                        # => 4232
    "number".to_i(36)                     # => 1442151747
    "zz1z".to_i(36)                       # => 1678391
    "abcdef".to_i(16)                     # => 11259375
    "AbCdEf".to_i(16)                     # => 11259375

Here are some reverse conversions of decimal numbers to the strings that represent those numbers in various bases:

    42.to_s(10)                           # => "42"
    -100.to_s(2)                          # => "-1100100"
    255.to_s(16)                          # => "ff"
    1442151747.to_s(36)                   # => "number"

Some invalid conversions:

    "6".to_i(2)                           # => 0
    "0".to_i(1)                           # ArgumentError: illegal radix 1
    40.to_s(37)                           # ArgumentError: illegal radix 37

Discussion

String#to_i can parse and Integer#to_s can create a string representation in every common integer base: from binary (the familiar base 2, which uses only the digits 0 and 1) to hexatridecimal (base 36). Hexatridecimal uses the digits 0–9 and the letters a–z; it’s sometimes used to generate alphanumeric mnemonics for long numbers.

The only commonly used counting systems with bases higher than 36 are the variants of base-64 encoding used in applications like MIME mail attachments. These usually encode strings, not numbers; to encode a string in MIME-style base-64, use the base64 library.

See Also

• Recipe 12.5, “Adding Graphical Context with Sparklines,” and Recipe 14.5, “Sending Mail,” show how to use the base64 library

2.7 Taking Logarithms

Problem

You want to take the logarithm of a number, possibly a huge one.
Solution

Math.log calculates the natural log of a number: that is, the log base e.

    Math.log(1)                           # => 0.0
    Math.log(Math::E)                     # => 1.0
    Math.log(10)                          # => 2.30258509299405
    Math::E ** Math.log(25)               # => 25.0

Math.log10 calculates the log base 10 of a number:

    Math.log10(1)                         # => 0.0
    Math.log10(10)                        # => 1.0
    Math.log10(10.1)                      # => 1.00432137378264
    Math.log10(1000)                      # => 3.0
    10 ** Math.log10(25)                  # => 25.0

To calculate a logarithm in some other base, use the fact that, for any bases b1 and b2, log_b1(x) = log_b2(x) / log_b2(b1).

    module Math
      def Math.logb(num, base)
        log(num) / log(base)
      end
    end

Discussion

A logarithm function inverts an exponentiation function. The log base k of x, or log_k(x), is the exponent to which k must be raised to give x. That is, Math.log10(1000)==3.0 because 10 cubed is 1000. Math.log(Math::E)==1 because e to the first power is e.

The logarithm functions for all numeric bases are related (you can get from one base to another by dividing by a constant factor), but they’re used for different purposes.

Scientific applications often use the natural log: this is the fastest log implementation in Ruby. The log base 10 is often used to visualize datasets that span many orders of magnitude, such as the pH scale for acidity and the Richter scale for earthquake intensity. Analyses of algorithms often use the log base 2, or binary logarithm.

If you intend to take a lot of logarithms in a base that Ruby doesn’t support natively, you can speed up the calculation by precalculating the divisor:

    divisor = Math.log(2)
    (1..6).collect { |x| Math.log(x) / divisor }
    # => [0.0, 1.0, 1.58496250072116, 2.0, 2.32192809488736, 2.58496250072116]

The logarithm functions in Math will only accept integers or floating-point numbers, not BigDecimal or Bignum objects. This is inconvenient since logarithms are often used to make extremely large numbers manageable. The BigMath module has a function to take the natural logarithm of a BigDecimal number, but it’s very slow.
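One way to see why huge numbers need not be a problem is to split a number into a small mantissa and a power of ten, then use log(m * 10**k) == log(m) + k * log(10). The following is a minimal sketch for plain integers only. The helper name log_of_big_integer is hypothetical (it is not part of Ruby or of this recipe), and the result has ordinary Float precision because only the leading digits are kept.

```ruby
# Hypothetical helper: natural log of a positive Integer of any size, using
# log(m * 10**k) == log(m) + k * log(10). Only the leading 17 digits are
# kept, so the result is limited to Float precision.
def log_of_big_integer(n)
  unless n.is_a?(Integer) && n > 0
    raise ArgumentError, "expected a positive Integer"
  end
  digits = n.to_s
  mantissa = "0.#{digits[0, 17]}".to_f   # a Float in the range [0.1, 1.0)
  Math.log(mantissa) + digits.length * Math.log(10)
end

puts log_of_big_integer(12345)      # close to Math.log(12345)
puts log_of_big_integer(10**1000)   # close to 1000 * Math.log(10)
```

The second call works on a number far too large to convert into a Float, because only its leading digits and its digit count are ever used.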
Here’s a fast drop-in replacement for BigMath::log that exploits the logarithmic identity log(x*y) == log(x) + log(y). It decomposes a BigDecimal into three much smaller numbers, and operates on those numbers. This avoids the cases that give BigMath::log such poor performance.

    require 'bigdecimal'
    require 'bigdecimal/math'
    require 'bigdecimal/util'

    module BigMath
      alias :log_slow :log
      def log(x, prec)
        if x <= 0 || prec <= 0
          raise ArgumentError, "Zero or negative argument for log"
        end
        return x if x.infinite? || x.nan?
        sign, fraction, power, exponent = x.split
        fraction = BigDecimal(".#{fraction}")
        power = power.to_s.to_d
        log_slow(fraction, prec) + (log_slow(power, prec) * exponent)
      end
    end

Like BigMath::log, this implementation returns a BigDecimal accurate to at least prec digits, but containing some additional digits which might not be accurate. To avoid giving the impression that the result is more accurate than it is, you can round the number to prec digits with BigDecimal#round.

    include BigMath

    number = BigDecimal("1234.5678")
    Math.log(number)                      # => 7.11847622829779

    prec = 50
    BigMath.log_slow(number, prec).round(prec).to_s("F")
    # => "7.11847622829778629250879253638708184134073214145175"

    BigMath.log(number, prec).round(prec).to_s("F")
    # => "7.11847622829778629250879253638708184134073214145175"

    BigMath.log(number ** 1000, prec).round(prec).to_s("F")
    # => "7118.47622829778629250879253638708184134073214145175161"

As before, calculate a log other than the natural log by dividing by BigMath.log(base) or BigMath.log_slow(base).

    huge_number = BigDecimal("1000") ** 1000
    base = BigDecimal("10")
    (BigMath.log(huge_number, 100) / BigMath.log(base, 100)).to_f
    # => 3000.0

How does it work? The internal representation of a BigDecimal is as a number in scientific notation: fraction * 10**power. Because log(x*y) = log(x) + log(y), the log of such a number is log(fraction) + log(10**power).
10**power is just 10 multiplied by itself power times (that is, 10*10*10*...*10). Again, log(x*y) = log(x) + log(y), so log(10*10*10*...*10) = log(10) + log(10) + log(10) + ... + log(10), or log(10)*power. This means we can take the logarithm of a huge BigDecimal by taking the logarithm of its (very small) fractional portion and the logarithm of 10.

See Also

• Mathematicians used to spend years constructing tables of logarithms for scientific and engineering applications; so if you find yourself doing a boring job, be glad you don’t have to do that (see http://en.wikipedia.org/wiki/Logarithm#Tables_of_logarithms)

2.8 Finding Mean, Median, and Mode

Problem

You want to find the average of an array of numbers: its mean, median, or mode.

Solution

Usually when people speak of the “average” of a set of numbers they’re referring to its mean, or arithmetic mean. The mean is the sum of the elements divided by the number of elements.

    def mean(array)
      array.inject(0) { |sum, x| sum += x } / array.size.to_f
    end

    mean([1,2,3,4])                       # => 2.5
    mean([100,100,100,100.1])             # => 100.025
    mean([-100, 100])                     # => 0.0
    mean([3,3,3,3])                       # => 3.0

The median is the item x such that half the items in the array are greater than x and the other half are less than x. Consider a sorted array: if it contains an odd number of elements, the median is the one in the middle. If the array contains an even number of elements, the median is defined as the mean of the two middle elements.

    def median(array, already_sorted=false)
      return nil if array.empty?
      array = array.sort unless already_sorted
      m_pos = array.size / 2
      return array.size % 2 == 1 ? array[m_pos] : mean(array[m_pos-1..m_pos])
    end

    median([1,2,3,4,5])                   # => 3
    median([5,3,2,1,4])                   # => 3
    median([1,2,3,4])                     # => 2.5
    median([1,1,2,3,4])                   # => 2
    median([2,3,-100,100])                # => 2.5
    median([1, 1, 10, 100, 1000])         # => 10

The mode is the single most popular item in the array. If a list contains no repeated items, it is not considered to have a mode.
If an array contains multiple items at the maximum frequency, it is “multimodal.” Depending on your application, you might handle each mode separately, or you might just pick one arbitrarily.

    def modes(array, find_all=true)
      histogram = array.inject(Hash.new(0)) { |h, n| h[n] += 1; h }
      modes = nil
      histogram.each_pair do |item, times|
        modes << item if modes && times == modes[0] and find_all
        modes = [times, item] if (!modes && times>1) or (modes && times>modes[0])
      end
      return modes ? modes[1...modes.size] : modes
    end

    modes([1,2,3,4])                      # => nil
    modes([1,1,2,3,4])                    # => [1]
    modes([1,1,2,2,3,4])                  # => [1, 2]
    modes([1,1,2,2,3,4,4])                # => [1, 2, 4]
    modes([1,1,2,2,3,4,4], false)         # => [1]
    modes([1,1,2,2,3,4,4,4,4,4])          # => [4]

Discussion

The mean is the most popular type of average. It’s simple to calculate and to understand. The implementation of mean given above always returns a floating-point number object. It’s a good general-purpose implementation because it lets you pass in an array of Fixnums and get a fractional average, instead of one truncated to an integer. If you want to find the mean of an array of BigDecimal or Rational objects, you should use an implementation of mean that omits the final to_f call:

    def mean_without_float_conversion(array)
      array.inject(0) { |sum, x| sum += x } / array.size
    end

    require 'rational'
    numbers = [Rational(2,3), Rational(3,4), Rational(6,7)]
    mean(numbers)
    # => 0.757936507936508
    mean_without_float_conversion(numbers)
    # => Rational(191, 252)

The median is mainly useful when a small proportion of outliers in the dataset would make the mean misleading. For instance, government statistics usually show “median household income” instead of “mean household income.” Otherwise, a few super-wealthy households would make everyone else look much richer than they are. The example below demonstrates how the mean can be skewed by a few very high or very low outliers.
    mean([1, 100, 100000])                # => 33367.0
    median([1, 100, 100000])              # => 100

    mean([1, 100, -1000000])              # => -333299.666666667
    median([1, 100, -1000000])            # => 1

The mode is the only definition of “average” that can be applied to arrays of arbitrary objects. Since the mean is calculated using arithmetic, an array can only be said to have a mean if all of its members are numeric. The median involves only comparisons, except when the array contains an even number of elements: then, calculating the median requires that you calculate the mean.

If you defined some other way to take the median of an array with an even number of elements, you could take the median of Arrays of strings:

    median(["a", "z", "b", "l", "m", "j", "b"])
    # => "j"
    median(["a", "b", "c", "d"])
    # TypeError: String can't be coerced into Fixnum

The standard deviation

A concept related to the mean is the standard deviation, a quantity that measures how close the dataset as a whole is to the mean. When a mean is distorted by high or low outliers, the corresponding standard deviation is high. When the numbers in a dataset cluster closely around the mean, the standard deviation is low. You won’t be fooled by a misleading mean if you also look at the standard deviation.

    def mean_and_standard_deviation(array)
      m = mean(array)
      variance = array.inject(0) { |variance, x| variance += (x - m) ** 2 }
      return m, Math.sqrt(variance/(array.size-1))
    end

    #All the items in the list are close to the mean, so the standard
    #deviation is low.
    mean_and_standard_deviation([1,2,3,1,1,2,1])
    # => [1.57142857142857, 0.786795792469443]

    #The outlier increases the mean, but also increases the standard deviation.
    mean_and_standard_deviation([1,2,3,1,1,2,1000])
    # => [144.285714285714, 377.33526837801]

A good rule of thumb, for data that is roughly normally distributed, is that two-thirds (about 68 percent) of the items in a dataset are within one standard deviation of the mean, and almost all (about 95 percent) of the items are within two standard deviations of the mean.
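The rule of thumb can be checked empirically. The sketch below is self-contained and assumes a Ruby recent enough to provide Array#sum. The helper names standard_deviation and fraction_within are hypothetical, and the bell-shaped test data (each value is the sum of 12 uniform random numbers) is only an approximation of a normal distribution chosen for illustration.

```ruby
def mean(values)
  values.sum / values.size.to_f
end

# Sample standard deviation (divides by n - 1), as in the recipe above.
def standard_deviation(values)
  m = mean(values)
  Math.sqrt(values.sum { |x| (x - m)**2 } / (values.size - 1))
end

# Fraction of the values lying within k standard deviations of the mean.
def fraction_within(values, k)
  m = mean(values)
  limit = k * standard_deviation(values)
  values.count { |x| (x - m).abs <= limit } / values.size.to_f
end

# Roughly bell-shaped data: each value is the sum of 12 uniform random numbers.
rng = Random.new(42)
bell = Array.new(10_000) { Array.new(12) { rng.rand }.sum }
puts fraction_within(bell, 1)   # roughly 0.68
puts fraction_within(bell, 2)   # roughly 0.95

# Skewed data with a single outlier does not follow the rule of thumb:
# six of the seven items fall within one standard deviation.
puts fraction_within([1, 2, 3, 1, 1, 2, 1000], 1)
```

The last line shows why the rule is only a rule of thumb: a single large outlier inflates the standard deviation so much that every other item falls inside one deviation of the mean.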
See Also

• “Programmers Need to Learn Statistics or I Will Kill Them All,” by Zed Shaw (http://www.zedshaw.com/blog/programming/programmer_stats.html)
• More Ruby implementations of simple statistical measures (http://dada.perl.it/shootout/moments.ruby.html)
• To do more complex statistical analysis in Ruby, try the Ruby bindings to the GNU Scientific Library (http://ruby-gsl.sourceforge.net/)
• The Stats class in the Mongrel web server (http://mongrel.rubyforge.org) implements other algorithms for calculating mean and standard deviation, which are faster if you need to repeatedly calculate the mean of a growing series

2.9 Converting Between Degrees and Radians

Problem

The trigonometry functions in Ruby’s Math library take input in radians (2π radians in a circle). Most real-world applications measure angles in degrees (360 degrees in a circle). You want an easy way to do trigonometry with degrees.

Solution

The simplest solution is to define a conversion method in Numeric that will convert a number of degrees into radians.

    class Numeric
      def degrees
        self * Math::PI / 180
      end
    end

You can then treat any numeric object as a number of degrees and convert it into the corresponding number of radians, by calling its degrees method. Trigonometry on the result will work as you’d expect:

    90.degrees                            # => 1.5707963267949
    Math::tan(45.degrees)                 # => 1.0
    Math::cos(90.degrees)                 # => 6.12303176911189e-17
    Math::sin(90.degrees)                 # => 1.0
    Math::sin(89.9.degrees)               # => 0.999998476913288

    Math::sin(45.degrees)                 # => 0.707106781186547
    Math::cos(45.degrees)                 # => 0.707106781186548

Discussion

I named the conversion method degrees by analogy to the methods like hours defined by Rails. This makes the code easy to read, but if you look at the actual numbers, it’s not obvious why 45.degrees should equal the floating-point number 0.785398163397448.

If this troubles you, you could name the method something like degrees_to_radians.
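A minimal sketch of that more explicit naming follows. The AngleConversion module and its radians_to_degrees counterpart are hypothetical names introduced here for illustration, and using module functions instead of reopening Numeric is a design choice of this sketch rather than something the recipe prescribes.

```ruby
# Hypothetical module: explicit conversions that leave Numeric untouched.
module AngleConversion
  module_function

  # Converts an angle in degrees to radians.
  def degrees_to_radians(degrees)
    degrees * Math::PI / 180
  end

  # Converts an angle in radians to degrees.
  def radians_to_degrees(radians)
    radians * 180 / Math::PI
  end
end

puts AngleConversion.degrees_to_radians(45)
puts Math.sin(AngleConversion.degrees_to_radians(90))
puts AngleConversion.radians_to_degrees(Math::PI / 4)
```

The call sites are longer than 45.degrees, but each one states which unit goes in and which comes out.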
Or you could use Lucas Carlson’s units gem, which lets you define customized unit conversions, and tracks which unit is being used for a particular number.

    require 'rubygems'
    require 'units/base'

    class Numeric
      remove_method(:degrees) # Remove the implementation given in the Solution
      add_unit_conversions(:angle => { :radians => 1, :degrees => Math::PI/180 })
      add_unit_aliases(:angle => { :degrees => [:degree], :radians => [:radian] })
    end

    90.degrees                            # => 90.0
    90.degrees.unit                       # => :degrees
    90.degrees.to_radians                 # => 1.5707963267949
    90.degrees.to_radians.unit            # => :radians

    1.degree.to_radians                   # => 0.0174532925199433
    1.radian.to_degrees                   # => 57.2957795130823

The units you define with the units gem do nothing but make your code more readable. The trigonometry methods don’t understand the units you’ve defined, so you’ll still have to give them numbers in radians.

    # Don't do this:
    Math::sin(90.degrees)                 # => 0.893996663600558

    # Do this:
    Math::sin(90.degrees.to_radians)      # => 1.0

Of course, you could also change the trigonometry methods to be aware of units:

    class << Math
      alias old_sin sin
      def sin(x)
        old_sin(x.unit == :degrees ? x.to_radians : x)
      end
    end

    90.degrees                            # => 90.0
    Math::sin(90.degrees)                 # => 1.0
    Math::sin(Math::PI/2.radians)         # => 1.0
    Math::sin(Math::PI/2)                 # => 1.0

That’s probably overkill, though.

See Also

• Recipe 8.9, “Converting and Coercing Objects to Different Types”
• The Facets More library (available as the facets_more gem) also has a Units module

2.10 Multiplying Matrices

Problem

You want to turn arrays of arrays of numbers into mathematical matrices, and multiply the matrices together.
Solution

You can create Matrix objects from arrays of arrays, and multiply them together with the * operator:

    require 'matrix'
    require 'mathn'

    a1 = [[1, 1, 0, 1],
          [2, 0, 1, 2],
          [3, 1, 1, 2]]
    m1 = Matrix[*a1]
    # => Matrix[[1, 1, 0, 1], [2, 0, 1, 2], [3, 1, 1, 2]]

    a2 = [[1, 0],
          [3, 1],
          [1, 0],
          [2, 2.5]]
    m2 = Matrix[*a2]
    # => Matrix[[1, 0], [3, 1], [1, 0], [2, 2.5]]

    m1 * m2
    # => Matrix[[6, 3.5], [7, 5.0], [11, 6.0]]

Note the unusual syntax for creating a Matrix object: you pass the rows of the matrix into the array indexing operator, not into Matrix.new (which is private).

Discussion

Ruby’s Matrix class overloads the arithmetic operators to support all the basic matrix arithmetic operations, including multiplication, between matrices of compatible dimension. If you perform an arithmetic operation on incompatible matrices, you’ll get an ExceptionForMatrix::ErrDimensionMismatch.

Multiplying one matrix by another is simple enough, but multiplying a chain of matrices together can be faster or slower depending on the order in which you do the multiplications. This follows from the fact that multiplying a matrix with dimensions K × M by a matrix with dimensions M × N requires K * M * N operations and gives a matrix with dimensions K × N. If K is large for some matrix, you can save time by waiting until the end before doing multiplications involving that matrix.

Consider three matrices A, B, and C, which you want to multiply together. A has 100 rows and 20 columns. B has 20 rows and 10 columns. C has 10 rows and one column. Since matrix multiplication is associative, you’ll get the same results whether you multiply A by B and then the result by C, or multiply B by C and then the result by A. But multiplying A by B requires 20,000 operations (100 * 20 * 10), and multiplying (AB) by C requires another 1,000 (100 * 10 * 1). Multiplying B by C only requires 200 operations (20 * 10 * 1), and multiplying the result by A requires 2,000 more (100 * 20 * 1).
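The operation counts above can be tallied with a few lines of arithmetic. This is only a sketch of the cost model described in the text (K * M * N operations per multiplication); the method names are hypothetical and no actual matrices are involved.

```ruby
# Cost model from the text: multiplying a K x M matrix by an M x N matrix
# takes K * M * N operations.
def multiplication_cost(k, m, n)
  k * m * n
end

# dims lists the chain's dimensions: A is dims[0] x dims[1], B is
# dims[1] x dims[2], and C is dims[2] x dims[3].
def cost_ab_then_c(dims)
  a_rows, a_cols, b_cols, c_cols = dims
  multiplication_cost(a_rows, a_cols, b_cols) +   # A * B
    multiplication_cost(a_rows, b_cols, c_cols)   # (AB) * C
end

def cost_bc_then_a(dims)
  a_rows, a_cols, b_cols, c_cols = dims
  multiplication_cost(a_cols, b_cols, c_cols) +   # B * C
    multiplication_cost(a_rows, a_cols, c_cols)   # A * (BC)
end

dims = [100, 20, 10, 1]
puts cost_ab_then_c(dims)   # 20,000 + 1,000
puts cost_bc_then_a(dims)   # 200 + 2,000
```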
It’s almost 10 times faster to multiply A(BC) instead of the naive order of (AB)C. That kind of potential savings justifies doing some up-front work to find the best order for the multiplication. Here is a method that recursively figures out the most efficient multiplication order for a list of Matrix objects, and another method that actually carries out the multiplications. They share an array containing information about where to divide up the list of matrices: where to place the parentheses, if you will.

    class Matrix
      def self.multiply(*matrices)
        cache = []
        matrices.size.times { cache << [nil] * matrices.size }
        best_split(cache, 0, matrices.size-1, *matrices)
        multiply_following_cache(cache, 0, matrices.size-1, *matrices)
      end

Because the methods that do the actual work pass around recursion arguments that the end user doesn’t care about, I’ve created Matrix.multiply, a wrapper method for the methods that do the real work. These methods are defined below (Matrix.best_split and Matrix.multiply_following_cache). Matrix.multiply_following_cache assumes that the optimal way to multiply that list of Matrix objects has already been found and encoded in a variable cache. It recursively performs the matrix multiplications in the optimal order, as determined by the cache.

      # The remaining class methods are internal helpers. (A bare "private"
      # does not hide methods defined with "def self."; use
      # private_class_method if you want to enforce that.)
      def self.multiply_following_cache(cache, chunk_start, chunk_end, *matrices)
        if chunk_end == chunk_start
          # There's only one matrix in the list; no need to multiply.
          return matrices[chunk_start]
        elsif chunk_end-chunk_start == 1
          # There are only two matrices in the list; just multiply them together.
          lhs, rhs = matrices[chunk_start..chunk_end]
        else
          # There are more than two matrices in the list. Look in the
          # cache to see where the optimal split is located. Multiply
          # together all matrices to the left of the split (recursively,
          # in the optimal order) to get our equation's left-hand
          # side. Similarly for all matrices to the right of the split, to
          # get our right-hand side.
          split_after = cache[chunk_start][chunk_end][1]
          lhs = multiply_following_cache(cache, chunk_start, split_after, *matrices)
          rhs = multiply_following_cache(cache, split_after+1, chunk_end, *matrices)
        end

        # Begin debug code: this illustrates the order of multiplication,
        # showing the matrices in terms of their dimensions rather than their
        # (possibly enormous) contents.
        if $DEBUG
          lhs_dim = "#{lhs.row_size}x#{lhs.column_size}"
          rhs_dim = "#{rhs.row_size}x#{rhs.column_size}"
          cost = lhs.row_size * lhs.column_size * rhs.column_size
          puts "Multiplying #{lhs_dim} by #{rhs_dim}: cost #{cost}"
        end

        # Do a matrix multiplication of the two matrices, whether they are
        # the only two matrices in the list or whether they were obtained
        # through two recursive calls.
        return lhs * rhs
      end

Finally, here’s the method that actually figures out the best way of splitting up the multiplications. It builds the cache used by the multiply_following_cache method defined above. It also uses the cache as it builds it, so that it doesn’t solve the same subproblems over and over again.

      def self.best_split(cache, chunk_start, chunk_end, *matrices)
        if chunk_end == chunk_start
          cache[chunk_start][chunk_end] = [0, nil]
        end
        return cache[chunk_start][chunk_end] if cache[chunk_start][chunk_end]

        # Try splitting the chunk at each possible location and find the
        # minimum cost of doing the split there. Then pick the smallest of
        # the minimum costs: that's where the split should actually happen.
        minimum_costs = []
        chunk_start.upto(chunk_end-1) do |split_after|
          lhs_cost = best_split(cache, chunk_start, split_after, *matrices)[0]
          rhs_cost = best_split(cache, split_after+1, chunk_end, *matrices)[0]

          lhs_rows = matrices[chunk_start].row_size
          rhs_rows = matrices[split_after+1].row_size
          rhs_cols = matrices[chunk_end].column_size
          merge_cost = lhs_rows * rhs_rows * rhs_cols
          cost = lhs_cost + rhs_cost + merge_cost
          minimum_costs << cost
        end
        minimum = minimum_costs.min
        minimum_index = chunk_start + minimum_costs.index(minimum)
        return cache[chunk_start][chunk_end] = [minimum, minimum_index]
      end
    end

A simple test confirms the example set of matrices spelled out earlier. Remember that we had a 100 × 20 matrix (A), a 20 × 10 matrix (B), and a 10 × 1 matrix (C). Our method should be able to figure out that it’s faster to multiply A(BC) than the naive multiplication (AB)C. Since we don’t care about the contents of the matrices, just the dimensions, we’ll first define some helper methods that make it easy to generate matrices with specific dimensions but random contents.

    class Matrix
      # Creates a randomly populated matrix with the given dimensions.
      def self.with_dimensions(rows, cols)
        a = []
        rows.times { a << []; cols.times { a[-1] << rand(10) } }
        return Matrix[*a]
      end

      # Creates an array of matrices that can be multiplied together
      def self.multipliable_chain(*rows)
        matrices = []
        0.upto(rows.size-2) do |i|
          matrices << Matrix.with_dimensions(rows[i], rows[i+1])
        end
        return matrices
      end
    end

After all that, the test is kind of anticlimactic:

    # Create an array of matrices 100x20, 20x10, 10x1.
    chain = Matrix.multipliable_chain(100, 20, 10, 1)

    # Multiply those matrices two different ways, giving the same result.
    Matrix.multiply(*chain) == (chain[0] * chain[1] * chain[2])
    # Multiplying 20x10 by 10x1: cost 200
    # Multiplying 100x20 by 20x1: cost 2000
    # => true

We can use the Benchmark library to verify that matrix multiplication goes much faster when we do the multiplications in the right order:

    # We'll generate the dimensions and contents of the matrices randomly,
    # so no one can accuse us of cheating.
    dimensions = []
    10.times { dimensions << rand(90)+10 }
    chain = Matrix.multipliable_chain(*dimensions)

    require 'benchmark'
    result_1 = nil
    result_2 = nil
    Benchmark.bm(11) do |b|
      b.report("Unoptimized") do
        result_1 = chain[0]
        chain[1..chain.size].each { |c| result_1 *= c }
      end
      b.report("Optimized") { result_2 = Matrix.multiply(*chain) }
    end
    #                  user     system      total        real
    # Unoptimized  4.350000   0.400000   4.750000 ( 11.104857)
    # Optimized    1.410000   0.110000   1.520000 (  3.559470)

    # Both multiplications give the same result.
    result_1 == result_2                                   # => true

See Also

• Recipe 2.11, “Solving a System of Linear Equations,” uses matrices to solve linear equations
• For more on benchmarking, see Recipe 17.13, “Benchmarking Competing Solutions”

2.11 Solving a System of Linear Equations

Problem

You have a number of linear equations (that is, equations that look like “2x + 10y + 8z = 54”), and you want to figure out the solution: the values of x, y, and z. You have as many equations as you have variables, so (provided the equations are independent) you can be certain of a unique solution.

Solution

Create two Matrix objects. The first Matrix should contain the coefficients of your equations (the 2, 10, and 8 of “2x + 10y + 8z = 54”), and the second should contain the constant results (the 54 of the same equation). The numbers in both matrices should be represented as floating-point numbers, rational numbers, or BigDecimal objects: anything other than plain Ruby integers.

Then invert the coefficient matrix with Matrix#inverse, and multiply the result by the matrix full of constants.
The result will be a third Matrix containing the solutions to your equations.

For instance, consider these three linear equations in three variables:

    2x + 10y + 8z = 54
          7y + 4z = 30
    5x +  5y + 5z = 35

To solve these equations, create the two matrices:

    require 'matrix'
    require 'rational'
    coefficients = [[2, 10, 8], [0, 7, 4], [5, 5, 5]].collect! do |row|
      row.collect! { |x| Rational(x) }
    end
    coefficients = Matrix[*coefficients]
    # => Matrix[[Rational(2, 1), Rational(10, 1), Rational(8, 1)],
    # =>        [Rational(0, 1), Rational(7, 1), Rational(4, 1)],
    # =>        [Rational(5, 1), Rational(5, 1), Rational(5, 1)]]

    constants = Matrix[[Rational(54)], [Rational(30)], [Rational(35)]]

Take the inverse of the coefficient matrix, and multiply it by the matrix of constants. The result will be a matrix containing the values for your variables.

    solutions = coefficients.inverse * constants
    # => Matrix[[Rational(1, 1)], [Rational(2, 1)], [Rational(4, 1)]]

This means that, in terms of the original equations, x=1, y=2, and z=4.

Discussion

This may seem like magic, but it’s analogous to how you might use algebra to solve a single equation in a single variable. Such an equation looks something like Ax = B: for instance, 6x = 18. To solve for x, you divide both sides by the coefficient:

    6x / 6 = 18 / 6

The sixes on the left side of the equation cancel out, and you can show that x is 18/6, or 3.

In that case there’s only one coefficient and one constant. With n equations in n variables, you have n² coefficients and n constants, but by packing them into matrices you can solve the problem in the same way.

Here’s a side-by-side comparison of the set of equations from the Solution, and the corresponding matrices created in order to solve the system of equations.

    2x + 10y + 8z = 54 | [ 2 10 8] [x]   [54]
          7y + 4z = 30 | [ 0  7 4] [y] = [30]
    5x +  5y + 5z = 35 | [ 5  5 5] [z]   [35]

If you think of each matrix as a single value, this looks exactly like an equation in a single variable. It’s Ax = B, only this time A, x, and B are matrices. Again you can solve the problem by dividing both sides by A: x = B/A. This time, you’ll use matrix division instead of scalar division, and your result will be a matrix of solutions instead of a single solution.

For numbers, dividing B by A is equivalent to multiplying B by the inverse of A. For instance, 9/3 equals 9 * 1/3. The same is true of matrices. To divide a matrix B by another matrix A, you multiply B by the inverse of A.

The Matrix class overloads the division operator to do multiplication by the inverse, so you might wonder why we don’t just use that. The problem is that Matrix#/ calculates B/A as B * A.inverse, and what we want is A.inverse * B. Matrix multiplication isn’t commutative, and so neither is division. The developers of the Matrix class had to pick an order to do the multiplication, and they chose the one that won’t work for solving a system of equations.

For the most accurate results, you should use Rational or BigDecimal numbers to represent your coefficients and values. You should never use integers. Calling Matrix#inverse on a matrix full of integers will do the inversion using integer division. The result will be totally inaccurate, and you won’t get the right solutions to your equations.

Here’s a demonstration of the problem. Multiplying a matrix by its inverse should get you an identity matrix, full of zeros but with ones going down the main diagonal. This is analogous to the way multiplying 3 by 1/3 gets you 1.
When the matrix is full of rational numbers, this works fine:

    matrix = Matrix[[Rational(1), Rational(2)], [Rational(2), Rational(1)]]
    matrix.inverse
    # => Matrix[[Rational(-1, 3), Rational(2, 3)],
    # =>        [Rational(2, 3), Rational(-1, 3)]]

    matrix * matrix.inverse
    # => Matrix[[Rational(1, 1), Rational(0, 1)],
    # =>        [Rational(0, 1), Rational(1, 1)]]

But if the matrix is full of integers, multiplying it by its inverse will give you a matrix that looks nothing like an identity matrix.

    matrix = Matrix[[1, 2], [2, 1]]
    matrix.inverse
    # => Matrix[[-1, 1],
    # =>        [0, -1]]

    matrix * matrix.inverse
    # => Matrix[[-1, -1],
    # =>        [-2, 1]]

Inverting a matrix that contains floating-point numbers is a lesser mistake: Matrix#inverse tends to magnify the inevitable floating-point rounding errors. Multiplying a matrix full of floating-point numbers by its inverse will get you a matrix that’s almost, but not quite, an identity matrix.

    float_matrix = Matrix[[1.0, 2.0], [2.0, 1.0]]
    float_matrix.inverse
    # => Matrix[[-0.333333333333333, 0.666666666666667],
    # =>        [0.666666666666667, -0.333333333333333]]

    float_matrix * float_matrix.inverse
    # => Matrix[[1.0, 0.0],
    # =>        [1.11022302462516e-16, 1.0]]

See Also

• Recipe 2.10, “Multiplying Matrices”
• Another way of solving systems of linear equations is with Gauss-Jordan elimination; Shin-ichiro Hara has written an algebra library for Ruby, which includes a module for doing Gaussian elimination, along with lots of other linear algebra libraries (http://blade.nagaokaut.ac.jp/~sinara/ruby/math/algebra/)
• There is also a package, called linalg, which provides Ruby bindings to the C/Fortran LAPACK library for linear algebra (http://rubyforge.org/projects/linalg/)

2.12 Using Complex Numbers

Problem

You want to represent complex (“imaginary”) numbers and perform math on them.

Solution

Use the Complex class, defined in the complex library. All mathematical and trigonometric operations are supported.
    require 'complex'

    Complex::I                   # => Complex(0, 1)

    a = Complex(1, 4)            # => Complex(1, 4)
    a.real                       # => 1
    a.image                      # => 4

    b = Complex(1.5, 4.25)       # => Complex(1.5, 4.25)
    b + 1.5                      # => Complex(3.0, 4.25)
    b + 1.5*Complex::I           # => Complex(1.5, 5.75)

    a - b                        # => Complex(-0.5, -0.25)
    a * b                        # => Complex(-15.5, 10.25)
    b.conjugate                  # => Complex(1.5, -4.25)
    Math::sin(b)                 # => Complex(34.9720129257216, 2.47902583958724)

Discussion

You can use two floating-point numbers to keep track of the real and imaginary parts of a complex number, but that makes it complicated to do mathematical operations such as multiplication. If you were to write functions to do these operations, you’d have more or less reimplemented the Complex class. Complex simply keeps two instances of Numeric, and implements the basic math operations on them, keeping them together as a complex number. It also implements the complex-specific mathematical operation Complex#conjugate.

Complex numbers have many uses in scientific applications, but probably their coolest application is in drawing certain kinds of fractals. Here’s a class that uses complex numbers to calculate and draw a character-based representation of the Mandelbrot set, scaled to whatever size your screen can handle.

    class Mandelbrot

      # Set up the Mandelbrot generator with the basic parameters for
      # deciding whether or not a point is in the set.
      def initialize(bailout=10, iterations=100)
        @bailout, @iterations = bailout, iterations
      end

A point (x,y) on the complex plane is in the Mandelbrot set unless a certain iterative calculation tends to infinity. We can’t calculate “tends towards infinity” exactly, but we can iterate the calculation a certain number of times waiting for the result to exceed some “bail-out” value.

If the result ever exceeds the bail-out value, Mandelbrot assumes the calculation goes all the way to infinity, which takes it out of the Mandelbrot set.
Otherwise, the iteration will run through without exceeding the bail-out value. If that happens, Mandelbrot makes the opposite assumption: the calculation for that point will never go to infinity, which puts it in the Mandelbrot set.

The default values for bailout and iterations are precise enough for small, chunky ASCII renderings. If you want to make big posters of the Mandelbrot set, you should increase these numbers.

Next, let’s define a method that uses bailout and iterations to guess whether a specific point on the complex plane belongs to the Mandelbrot set. The variable x is a position on the real axis of the complex plane, and y is a position on the imaginary axis.

      # Performs the Mandelbrot operation @iterations times. If the
      # result exceeds @bailout, assume this point goes to infinity and
      # is not in the set. Otherwise, assume it is in the set.
      def mandelbrot(x, y)
        c = Complex(x, y)
        z = 0
        @iterations.times do |i|
          z = z**2 + c                 # This is the Mandelbrot operation.
          return false if z > @bailout
        end
        return true
      end

The most interesting part of the Mandelbrot set lives between -2 and 1 on the real axis of the complex plane, and between -1 and 1 on the imaginary axis. The final method in Mandelbrot produces an ASCII map of that portion of the complex plane. It maps each point on an ASCII grid to a point on or near the Mandelbrot set. If Mandelbrot estimates that point to be in the Mandelbrot set, it puts an asterisk in that part of the grid. Otherwise, it puts a space there. The larger the grid, the more points are sampled and the more precise the map.

      def render(x_size=80, y_size=24, inside_set="*", outside_set=" ")
        0.upto(y_size) do |y|
          0.upto(x_size) do |x|
            scaled_x = -2 + (3 * x / x_size.to_f)
            scaled_y = 1 + (-2 * y / y_size.to_f)
            print mandelbrot(scaled_x, scaled_y) ? inside_set : outside_set
          end
          puts
        end
      end
    end

Even at very small scales, the distinctive shape of the Mandelbrot set is visible.
    Mandelbrot.new.render(25, 10)
    #                **
    #               ****
    #             ********
    #        *** *********
    # *******************
    #        *** *********
    #             ********
    #               ****
    #                **

See Also

• The scaling equation, used to map the complex plane onto the terminal screen, is similar to the equations used to scale data in Recipe 12.5, “Adding Graphical Context with Sparklines,” and Recipe 12.14, “Representing Data as MIDI Music”

2.13 Simulating a Subclass of Fixnum

Problem

You want to create a class that acts like a subclass of Fixnum, Float, or one of Ruby’s other built-in numeric classes. This wondrous class can be used in arithmetic along with real Integer or Float objects, and it will usually act like one of those objects, but it will have a different representation or implement extra functionality.

Solution

Let’s take a concrete example and consider the possibilities. Suppose you wanted to create a class that acts just like Integer, except its string representation is a hexadecimal string beginning with “0x”. Where a Fixnum’s string representation might be “208”, this class would represent 208 as “0xd0”.

You could modify Integer#to_s to output a hexadecimal string. This would probably drive you insane because it would change the behavior for all Integer objects. From that point on, nearly all the numbers you use would have hexadecimal string representations. You probably want hexadecimal string representations only for a few of your numbers.

This is a job for a subclass, but you can’t usefully subclass Fixnum (the Discussion explains why this is so). The only alternative is delegation. You need to create a class that contains an instance of Fixnum, and almost always delegates method calls to that instance. The only method calls it doesn’t delegate should be the ones that it wants to override.

The simplest way to do this is to create a custom delegator class with the delegate library.
A class created with DelegateClass accepts another object in its constructor, and delegates all methods to the corresponding methods of that object.

    require 'delegate'
    class HexNumber < DelegateClass(Fixnum)
      # The string representations of this class are hexadecimal numbers.
      def to_s
        sign = self < 0 ? "-" : ""
        hex = abs.to_s(16)
        "#{sign}0x#{hex}"
      end

      def inspect
        to_s
      end
    end

    HexNumber.new(10)                        # => 0xa
    HexNumber.new(-10)                       # => -0xa
    HexNumber.new(1000000)                   # => 0xf4240
    HexNumber.new(1024 ** 10)                # => 0x10000000000000000000000000

    HexNumber.new(10).succ                   # => 11
    HexNumber.new(10) * 2                    # => 20

Discussion

Some object-oriented languages won’t let you subclass the “basic” data types like integers. Other languages implement those data types as classes, so you can subclass them, no questions asked. Ruby implements numbers as classes (Integer, with its concrete subclasses Fixnum and Bignum), and you can subclass those classes. If you try, though, you’ll quickly discover that your subclasses are useless: they don’t have constructors.

Ruby jealously guards the creation of new Integer objects. This way it ensures that, for instance, there can be only one Fixnum instance for a given number:

    100.object_id                            # => 201
    (10 * 10).object_id                      # => 201
    Fixnum.new(100)
    # NoMethodError: undefined method `new' for Fixnum:Class

You can have more than one Bignum object for a given number, but you can only create them by exceeding the bounds of Fixnum. There’s no Bignum constructor, either. The same is true for Float.

    (10 ** 20).object_id                     # => -606073730
    ((10 ** 19) * 10).object_id              # => -606079360
    Bignum.new(10 ** 20)
    # NoMethodError: undefined method `new' for Bignum:Class

If you subclass Integer or one of its subclasses, you won’t be able to create any instances of your class, not because those classes aren’t “real” classes, but because they don’t really have constructors. You might as well not bother.
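The missing constructor is easy to check for yourself. The following is a minimal sketch rather than code from this recipe: MyInteger is a hypothetical class name, and the sketch subclasses Integer because, on Ruby 2.4 and later, Fixnum and Bignum have been unified into Integer.

```ruby
# MyInteger is a hypothetical name used only for this illustration.
# Defining the subclass succeeds; creating an instance does not.
class MyInteger < Integer
end

created = true
begin
  MyInteger.new(5)
rescue NoMethodError => e
  # Integer has no public constructor, and subclasses inherit that gap.
  created = false
  puts "Could not instantiate: #{e.class}"
end

puts "Instance created? #{created}"
```

The class definition itself raises no error, which is why the problem usually shows up only when you first try to build an object.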
So how can you create a custom number-like class without redefining all the methods of Fixnum? You can’t, really. The good news is that in Ruby, there’s nothing painful about redefining all the methods of Fixnum. The delegate library takes care of it for you. You can use this library to generate a class that responds to all the same method calls as Fixnum. It does this by delegating all those method calls to a Fixnum object it holds as a member. You can then override those methods at your leisure, customizing behavior.

Since most methods are delegated to the member Fixnum, you can perform math on HexNumber objects, use succ and upto, create ranges, and do almost anything else you can do with a Fixnum. Calling HexNumber#is_a?(Fixnum) will return false, but you can change even that by manually overriding is_a?.

Alas, the illusion is spoiled somewhat by the fact that when you perform math on HexNumber objects, you get Fixnum objects back.

    HexNumber.new(10) * 2                    # => 20
    HexNumber.new(10) + HexNumber.new(200)   # => 210

Is there a way to do math with HexNumber objects and get HexNumber objects as results? There is, but it requires moving a little bit beyond the comfort of the delegate library. Instead of simply delegating all our method calls to an Integer object, we want to delegate the method calls, then intercept and modify the return values. If a method call on the underlying Integer object returns an Integer or a collection of Integers, we want to convert it into a HexNumber object or a collection of HexNumbers.

The easiest way to delegate all methods is to create a class that’s nearly empty and define a method_missing method. Here’s a second HexNumber class that silently converts the results of mathematical operations (and any other Integer result from a method of Integer) into HexNumber objects. It uses the BasicObject class from the Facets More library (available as the facets-more gem): a class that defines almost no methods at all.
This lets us delegate almost everything to Integer.

    require 'rubygems'
    require 'facet/basicobject'

    class BetterHexNumber < BasicObject

      def initialize(integer)
        @value = integer
      end

      # Delegate all methods to the stored integer value. If the result is a
      # Integer, transform it into a BetterHexNumber object. If it's an
      # enumerable containing Integers, transform it into an enumerable
      # containing BetterHexNumber objects.
      def method_missing(m, *args)
        super unless @value.respond_to?(m)
        hex_args = args.collect do |arg|
          arg.kind_of?(BetterHexNumber) ? arg.to_int : arg
        end
        result = @value.send(m, *hex_args)
        return result if m == :coerce
        case result
        when Integer
          BetterHexNumber.new(result)
        when Array
          result.collect do |element|
            element.kind_of?(Integer) ? BetterHexNumber.new(element) : element
          end
        else
          result
        end
      end

      # We don't actually define any of the Fixnum methods in this class,
      # but from the perspective of an outside object we do respond to
      # them. What outside objects don't know won't hurt them, so we'll
      # claim that we actually implement the same methods as our delegate
      # object. Unless this method is defined, features like ranges won't
      # work.
      def respond_to?(method_name)
        super or @value.respond_to? method_name
      end

      # Convert the number to a hex string, ignoring any other base
      # that might have been passed in.
      def to_s(*args)
        hex = @value.abs.to_s(16)
        sign = self < 0 ? "-" : ""
        "#{sign}0x#{hex}"
      end

      def inspect
        to_s
      end
    end

Now we can do arithmetic with BetterHexNumber objects, and get BetterHexNumber objects back:

    hundred = BetterHexNumber.new(100)       # => 0x64
    hundred + 5                              # => 0x69
    hundred + BetterHexNumber.new(5)         # => 0x69
    hundred.succ                             # => 0x65
    hundred / 5                              # => 0x14
    hundred * -10                            # => -0x3e8
    hundred.divmod(3)                        # => [0x21, 0x1]
    (hundred...hundred+3).collect            # => [0x64, 0x65, 0x66]

A BetterHexNumber even claims to be a Fixnum, and to respond to all the methods of Fixnum!
The only way to know it’s not is to call is_a?.

    hundred.class                            # => Fixnum
    hundred.respond_to? :succ                # => true
    hundred.is_a? Fixnum                     # => false

See Also

• Recipe 2.6, “Converting Between Numeric Bases”
• Recipe 2.14, “Doing Math with Roman Numbers”
• Recipe 8.8, “Delegating Method Calls to Another Object”
• Recipe 10.8, “Responding to Calls to Undefined Methods”

2.14 Doing Math with Roman Numbers

Problem

You want to convert between Arabic and Roman numbers, or do arithmetic with Roman numbers and get Roman numbers as your result.

Solution

The simplest way to define a Roman class that acts like Fixnum is to have its instances delegate most of their method calls to a real Fixnum (as seen in the previous recipe, Recipe 2.13). First we’ll implement a container for the Fixnum delegate, and methods to convert between Roman and Arabic numbers:

    class Roman
      # These arrays map all distinct substrings of Roman numbers
      # to their Arabic equivalents, and vice versa.
      @@roman_to_arabic = [['M', 1000], ['CM', 900], ['D', 500], ['CD', 400],
        ['C', 100], ['XC', 90], ['L', 50], ['XL', 40], ['X', 10], ['IX', 9],
        ['V', 5], ['IV', 4], ['I', 1]]
      @@arabic_to_roman = @@roman_to_arabic.collect { |x| x.reverse }.reverse

      # The Roman symbol for 5000 (a V with a bar over it) is not in
      # ASCII nor Unicode, so we won't represent numbers larger than 3999.
      MAX = 3999

      def initialize(number)
        if number.respond_to? :to_str
          @value = Roman.to_arabic(number)
        else
          Roman.assert_within_range(number)
          @value = number
        end
      end

      # Raise an exception if a number is too large or small to be represented
      # as a Roman number.
      def Roman.assert_within_range(number)
        unless number.between?(1, MAX)
          msg = "#{number} can't be represented as a Roman number."
          raise RangeError.new(msg)
        end
      end

      # Find the Fixnum value of a string containing a Roman number.
      def Roman.to_arabic(s)
        value = s
        if s.respond_to? :to_str
          c = s.dup
          value = 0
          invalid = ArgumentError.new("Invalid Roman number: #{s}")
          value_of_previous_number = MAX+1
          value_from_previous_number = 0
          @@roman_to_arabic.each_with_index do |(roman, arabic), i|
            value_from_this_number = 0
            while c.index(roman) == 0
              value_from_this_number += arabic
              if value_from_this_number >= value_of_previous_number
                raise invalid
              end
              c = c[roman.size..s.size]
            end

            # This one's a little tricky. We reject numbers like "IVI" and
            # "IXV", because they use the subtractive notation and then
            # tack on a number that makes the total overshoot the number
            # they'd have gotten without using the subtractive
            # notation. Those numbers should be V and XIV, respectively.
            if i > 2 and @@roman_to_arabic[i-1][0].size > 1 and
                value_from_this_number + value_from_previous_number >=
                @@roman_to_arabic[i-2][1]
              raise invalid
            end

            value += value_from_this_number
            value_from_previous_number = value_from_this_number
            value_of_previous_number = arabic
            break if c.size == 0
          end
          raise invalid if c.size > 0
        end
        return value
      end

      def to_arabic
        @value
      end

      # Render a Fixnum as a string depiction of a Roman number
      def to_roman
        value = to_arabic
        Roman.assert_within_range(value)
        repr = ""
        @@arabic_to_roman.reverse_each do |arabic, roman|
          num, value = value.divmod(arabic)
          repr << roman * num
        end
        repr
      end

Next, we’ll make the class respond to all of Fixnum’s methods by implementing a method_missing that delegates to our internal Fixnum object. This is substantially the same method_missing as in Recipe 2.13. Whenever possible, we’ll transform the results of a delegated method into Roman objects, so that operations on Roman objects will yield other Roman objects.

      # Delegate all methods to the stored integer value. If the result is
      # a Integer, transform it into a Roman object. If it's an array
      # containing Integers, transform it into an array containing Roman
      # objects.
      def method_missing(m, *args)
        super unless @value.respond_to?(m)
        hex_args = args.collect do |arg|
          arg.kind_of?(Roman) ? arg.to_int : arg
        end
        result = @value.send(m, *hex_args)
        return result if m == :coerce
        begin
          case result
          when Integer
            Roman.new(result)
          when Array
            result.collect do |element|
              element.kind_of?(Integer) ? Roman.new(element) : element
            end
          else
            result
          end
        rescue RangeError
          # Too big or small to fit in a Roman number. Use the original number
          result
        end
      end

The only methods that won’t trigger method_missing are methods like to_s, which we’re going to override with our own implementations:

      def respond_to?(method_name)
        super or @value.respond_to? method_name
      end

      def to_s
        to_roman
      end

      def inspect
        to_s
      end
    end

We’ll also add methods to Fixnum and String that make it easy to create Roman objects:

    class Fixnum
      def to_roman
        Roman.new(self)
      end
    end

    class String
      def to_roman
        Roman.new(self)
      end
    end

Now we’re ready to put the Roman class through its paces:

    72.to_roman                               # => LXXII
    444.to_roman                              # => CDXLIV
    1979.to_roman                             # => MCMLXXIX
    'MCMXLVIII'.to_roman                      # => MCMXLVIII

    Roman.to_arabic('MCMLXXIX')               # => 1979
    'MMI'.to_roman.to_arabic                  # => 2001

    'MMI'.to_roman + 3                        # => MMIV
    612.to_roman * 3.to_roman                 # => MDCCCXXXVI
    (612.to_roman * 3).divmod('VII'.to_roman) # => [CCLXII, II]
    612.to_roman * 10000                      # => 6120000    # Too big
    612.to_roman * 0                          # => 0          # Too small

    'MCMXCIX'.to_roman.succ                   # => MM

    ('I'.to_roman..'X'.to_roman).collect
    # => [I, II, III, IV, V, VI, VII, VIII, IX, X]

Here are some invalid Roman numbers that the Roman class rejects:

    'IIII'.to_roman
    # ArgumentError: Invalid Roman number: IIII
    'IVI'.to_roman
    # ArgumentError: Invalid Roman number: IVI
    'IXV'.to_roman
    # ArgumentError: Invalid Roman number: IXV
    'MCMM'.to_roman
    # ArgumentError: Invalid Roman number: MCMM
    'CIVVM'.to_roman
    # ArgumentError: Invalid Roman number: CIVVM
    -10.to_roman
    # RangeError: -10 can't be represented as a Roman number.
    50000.to_roman
    # RangeError: 50000 can't be represented as a Roman number.

Discussion

The rules for constructing Roman numbers are more complex than those for constructing positional numbers such as the Arabic numbers we use. An algorithm for parsing an Arabic number can scan from the left, looking at each character in isolation. If you were to scan a Roman number from the left one character at a time, you’d often find yourself having to backtrack, because what you thought was “XI” (11) would frequently turn out to be “XIV” (14).

The simplest way to parse a Roman number is to adapt the algorithm so that (for instance) “IV” is treated as its own “character,” distinct from “I” and “V”. If you have a list of all these “characters” and their Arabic values, you can scan a Roman number from left to right with a greedy algorithm that keeps a running total. Since there are few of these “characters” (only 13 of them, for numbers up to 3,999), and none of them are longer than 2 letters, this algorithm is workable. To generate a Roman number from an Arabic number, you can reverse the process.

The Roman class given in the Solution works like Fixnum, thanks to the method_missing strategy first explained in Recipe 2.13. This lets you do math entirely in Roman numbers, except when a result is out of the supported range of the Roman class.

Since this Roman implementation only supports 3,999 distinct numbers, you could make the implementation more efficient by pregenerating all of them and retrieving them from a cache as needed. The given implementation lets you extend the implementation to handle larger numbers: you just need to decide on a representation for the larger Roman characters that will work for your encoding.
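Stripped of delegation and validation, the greedy scan described above fits in a few lines. This is a simplified sketch, not the Solution’s code: ROMAN_TOKENS, arabic_to_roman, and roman_to_arabic are hypothetical standalone names, and unlike Roman.to_arabic the parser here makes no attempt to reject malformed numbers such as “IIII”.

```ruby
# Hypothetical standalone helpers. The 13-entry table mirrors the one in
# the Solution, ordered from the largest value to the smallest.
ROMAN_TOKENS = [['M', 1000], ['CM', 900], ['D', 500], ['CD', 400],
                ['C', 100], ['XC', 90], ['L', 50], ['XL', 40],
                ['X', 10], ['IX', 9], ['V', 5], ['IV', 4], ['I', 1]]

# Arabic to Roman: for each token, emit it as many times as it fits
# into what remains of the number.
def arabic_to_roman(number)
  ROMAN_TOKENS.inject("") do |repr, (roman, arabic)|
    count, number = number.divmod(arabic)
    repr + roman * count
  end
end

# Roman to Arabic: consume tokens from the left, keeping a running total.
# Only leftover characters are treated as an error; no other validation.
def roman_to_arabic(string)
  rest = string.dup
  total = 0
  ROMAN_TOKENS.each do |roman, arabic|
    while rest.start_with?(roman)
      total += arabic
      rest = rest[roman.size..-1]
    end
  end
  raise ArgumentError, "Invalid Roman number: #{string}" unless rest.empty?
  total
end

arabic_to_roman(1979)          # => "MCMLXXIX"
roman_to_arabic("MCMLXXIX")    # => 1979
```

Treating “CM”, “IV”, and the other two-letter tokens as single entries is what removes the need to backtrack: each token is tried once, in descending order of value.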
The Roman numeral for 5,000 (a V with a bar over it) isn’t present in ASCII, but there are Unicode characters U+2181 (the Roman numeral 5,000) and U+2182 (the Roman numeral 10,000), so that’s the obvious choice for representing Roman numbers up to 39,999. If you’re outputting to HTML, you can use a CSS style to put a bar above “V”, “X”, and so on. If you’re stuck with ASCII, you might choose “_V” to represent 5,000, “_X” to represent 10,000, and so on. Whatever you choose, you’d add the appropriate “characters” to the roman_to_arabic array (remembering to add “M_V” and “_V_X” as well as “_V” and “_X”), increase MAX, and suddenly be able to instantiate Roman objects for large numbers.

The Roman#to_arabic method implements the “new” rules for Roman numbers: that is, the ones standardized in the Middle Ages. It rejects certain number representations, like IIII, used by the Romans themselves.

Roman numbers are common as toy or contest problems, but it’s rare that a programmer will have to treat a Roman number as a number, as opposed to a funny-looking string. In parts of Europe, centuries and the month section of dates are written using Roman numbers. Apart from that, outline generation is probably the only real-world application where a programmer needs to treat a Roman number as a number. Outlines need several visually distinct ways to represent the counting numbers, and Roman numbers (upper- and lowercase) provide two of them.

If you’re generating an outline in plain text, you can use Roman#succ to generate a succession of Roman numbers. If your outline is in HTML format, though, you don’t need to know anything about Roman numbers at all. Just give an
<ol> tag a CSS style of list-style-type:lower-roman or list-style-type:upper-roman. Output the elements of your outline as <li> tags inside the <ol> tag. All modern browsers will do the right thing:

    <ol style="list-style-type:lower-roman">
      <li>Primus</li>
      <li>Secundis</li>
      <li>Tertius</li>
    </ol>
See Also

• Recipe 2.13, “Simulating a Subclass of Fixnum”
• An episode of the Ruby Quiz focused on algorithms for converting between Roman and Arabic numbers; one solution uses an elegant technique to make it easier to create Roman numbers from within Ruby: it overrides Object#const_missing to convert any undefined constant into a Roman number; this lets you issue a statement like XI + IX, and get XX as the result (http://www.rubyquiz.com/quiz22.html)

2.15 Generating a Sequence of Numbers

Problem

You want to iterate over a (possibly infinite) sequence of numbers the way you can iterate over an array or a range.

Solution

Write a generator function that yields each number in the sequence.

    def fibonacci(limit = nil)
      seed1 = 0
      seed2 = 1
      while not limit or seed2 <= limit
        yield seed2
        seed1, seed2 = seed2, seed1 + seed2
      end
    end

    fibonacci(3) { |x| puts x }
    # 1
    # 1
    # 2
    # 3

    fibonacci(1) { |x| puts x }
    # 1
    # 1

    fibonacci { |x| break if x > 20; puts x }
    # 1
    # 1
    # 2
    # 3
    # 5
    # 8
    # 13

Discussion

A generator for a sequence of numbers works just like one that iterates over an array or other data structure. The main difference is that iterations over a data structure usually have a natural stopping point, whereas most common number sequences are infinite.

One strategy is to implement a method called each that yields the entire sequence. This works especially well if the sequence is finite. If not, it’s the responsibility of the code block that consumes the sequence to stop the iteration with the break keyword.

Range#each is an example of an iterator over a finite sequence, while Prime#each enumerates the infinite set of prime numbers. Range#each is implemented in C, but here’s a (much slower) pure Ruby implementation for study. This code uses self.begin and self.end to call Range#begin and Range#end, because begin and end are reserved words in Ruby.

    class Range
      def each_slow
        x = self.begin
        while x <= self.end
          yield x
          x = x.succ
        end
      end
    end

    (1..3).each_slow {|x| puts x}
    # 1
    # 2
    # 3

The other kind of sequence generator iterates over a finite portion of an infinite sequence. These are methods like Fixnum#upto and Fixnum#step: they take a start and/or an end point as input, and generate a finite sequence within those boundaries.

    class Fixnum
      def double_upto(stop)
        x = self
        until x > stop
          yield x
          x = x * 2
        end
      end
    end

    10.double_upto(50) { |x| puts x }
    # 10
    # 20
    # 40

Most sequences move monotonically up or down, but it doesn’t have to be that way:

    def oscillator
      x = 1
      while true
        yield x
        x *= -2
      end
    end

    oscillator { |x| puts x; break if x.abs > 50; }
    # 1
    # -2
    # 4
    # -8
    # 16
    # -32
    # 64

Though integer sequences are the most common, any type of number can be used in a sequence. For instance, Float#step works just like Integer#step:

    1.5.step(2.0, 0.25) { |x| puts x }
    # => 1.5
    # => 1.75
    # => 2.0

Float objects don’t have the resolution to represent every real number. Very small differences between numbers are lost. This means that some Float sequences you might think would go on forever will eventually end:

    def zeno(start, stop)
      distance = stop - start
      travelled = start
      while travelled < stop and distance > 0
        yield travelled
        distance = distance / 2.0
        travelled += distance
      end
    end

    steps = 0
    zeno(0, 1) { steps += 1 }
    steps                                # => 54

See Also
• Recipe 1.16, “Generating a Succession of Strings”
• Recipe 2.16, “Generating Prime Numbers,” shows optimizations for generating a very well-studied number sequence
• Recipe 4.1, “Iterating Over an Array”
• Chapter 7 has more on this kind of generator method

2.16 Generating Prime Numbers

Problem
You want to generate a sequence of prime numbers, or find all prime numbers below a certain threshold.

Solution
Instantiate the Prime class to create a prime number generator. Call Prime#succ to get the next prime number in the sequence.

    require 'mathn'
    primes = Prime.new
    primes.succ                          # => 2
    primes.succ                          # => 3

Use Prime#each to iterate over the prime numbers:

    primes.each { |x| puts x; break if x > 15; }
    # 5
    # 7
    # 11
    # 13
    # 17
    primes.succ                          # => 19

Discussion
Because prime numbers are both mathematically interesting and useful in cryptographic applications, a lot of study has been lavished on them. Many algorithms have been devised for generating prime numbers and determining whether a number is prime. The code in this recipe walks a line between efficiency and ease of implementation.

The best-known prime number algorithm is the Sieve of Eratosthenes, which finds all primes in a certain range by iterating over that range multiple times. On the first pass, it eliminates every even number greater than 2, on the second pass every third number after 3, on the third pass every fifth number after 5, and so on. This implementation of the Sieve is based on a sample program packaged with the Ruby distribution:

    def sieve(max=100)
      sieve = []
      (2..max).each { |i| sieve[i] = i }
      (2..Math.sqrt(max)).each do |i|
        (i*i).step(max, i) { |j| sieve[j] = nil } if sieve[i]
      end
      sieve.compact
    end

    sieve(10)
    # => [2, 3, 5, 7]
    sieve(100000).size
    # => 9592

The Sieve is a fast way to find the primes smaller than a certain number, but it’s memory-inefficient and it’s not suitable for generating an infinite sequence of prime numbers. It’s also not very compatible with the Ruby idiom of generator methods. This is where the Prime class comes in.

A Prime object stores the current state of one iteration over the set of primes. It contains all information necessary to calculate the next prime number in the sequence. Prime#each repeatedly calls Prime#succ and yields it up to whatever code block was passed in.

Ruby 1.9 has an efficient implementation of Prime#each, but Ruby 1.8 has a very slow implementation.
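To make the idea of a stateful generator concrete before any optimizations come into play, here is a minimal sketch of an object with the same succ/each interface. The class name PrimeSequence and its instance variables are hypothetical (they are not part of mathn or any Ruby library), and the primality check is plain trial division against the primes already found, so treat it as an illustration of the interface rather than an efficient implementation.

```ruby
# Hypothetical class, for illustration only: a stateful prime generator
# that remembers where it left off between calls.
class PrimeSequence
  def initialize
    @found = []      # primes this object has discovered so far
    @candidate = 1   # the last number that was examined
  end

  # Return the next prime in the sequence.
  def succ
    loop do
      @candidate += 1
      # Only primes no larger than the square root need to be tried.
      small_primes = @found.take_while { |prime| prime * prime <= @candidate }
      next if small_primes.any? { |prime| (@candidate % prime).zero? }
      @found << @candidate
      return @candidate
    end
  end

  # The sequence is infinite, so the block is responsible for calling break.
  def each
    loop { yield succ }
  end
end

primes = PrimeSequence.new
first_two = [primes.succ, primes.succ]   # => [2, 3]
rest = []
primes.each { |x| rest << x; break if x > 15 }
rest                                     # => [5, 7, 11, 13, 17]
```

Because the state lives in the object, each picks up after the primes already returned by succ, which is the same behavior the Solution demonstrates.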
The following code is based on the 1.9 implementation, and it illustrates many of the simple tricks that drastically speed up algorithms that find or use primes. You can use this code, or just paste the code from Ruby 1.9’s mathn.rb into your 1.8 program.

The first trick is to share a single list of primes between all Prime objects by making it a class variable. This makes it much faster to iterate over multiple Prime instances, but it also uses more memory because the list of primes will never be garbage-collected.

We initialize the list with the first few prime numbers. This helps early performance a little bit, but it’s mainly to get rid of edge cases. The class variable @@check_next tracks the next number we think might be prime.

    require 'mathn'

    class Prime
      @@primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59,
                  61, 67, 71, 73, 79, 83, 89, 97, 101]
      @@check_next = 103
    end

A number is prime if it has no factors: more precisely, if it has no prime factors between 2 and its square root. This code uses the list of prime numbers not only as a cache, but as a data structure to help find larger prime numbers. Instead of checking all the possible factors of a number, we only need to check some of the prime factors.

To avoid calculating square roots, we have @@limit track the index (within @@primes) of the largest prime number less than the square root of @@check_next. We can decide when to increment it by calculating squares instead of square roots:

    class Prime
      # @@primes[3] < sqrt(@@check_next) < @@primes[4]
      @@limit = 3

      # sqrt(121) == @@primes[4]
      @@increment_limit_at = 121
    end

Now we need a new implementation of Prime#succ. Starting from @@check_next, the new implementation iterates over numbers until it finds one that’s prime, then returns the prime number. But it doesn’t iterate over the numbers one at a time: we can do better than that. It skips even numbers and numbers divisible by three, which are obviously not prime.

    class Prime
      def succ
        @index += 1
        while @index >= @@primes.length
          if @@check_next + 4 > @@increment_limit_at
            @@limit += 1
            @@increment_limit_at = @@primes[@@limit + 1] ** 2
          end
          add_if_prime
          @@check_next += 4
          add_if_prime
          @@check_next += 2
        end
        return @@primes[@index]
      end
    end

How does it do this? Well, consider a more formal definition of “even” and “divisible by three.” If x is congruent to 0, 2, or 4, mod 6 (that is, if x % 6 is 0, 2, or 4), then x is even and not prime. If x is congruent to 3, mod 6, then x is divisible by 3 and not prime. If x is congruent to 1 or 5, mod 6, then x might be prime.

Our starting point is @@check_next, which starts out at 103. 103 is congruent to 1, mod 6, so it might be prime. Adding 4 gives us 107, a number congruent to 5, mod 6. We skipped two even numbers (104 and 106) and a number divisible by 3 (105). Adding 2 to 107 skips another even number and gives us 109. Like 103, 109 is congruent to 1, mod 6. We can add 4 and 2 again to get two more numbers that might be prime. By continually adding 4 and then 2 to @@check_next, we can skip over the numbers that are obviously not prime.

Although all Prime objects share a list of primes, each object should start yielding primes from the beginning of the list:

    class Prime
      def initialize
        @index = -1
      end
    end

Finally, here’s the method that actually checks @@check_next for primality, by looking for a prime factor of that number between 5 and @@limit. We don’t have to check 2 and 3 because succ skips numbers divisible by 2 and 3. If no prime factor is found, the number is prime: we add it to the class-wide list of primes, where it can be returned by succ or yielded to a code block by each.

    class Prime
      private
      def add_if_prime
        factor = @@primes[2..@@limit].find { |prime| @@check_next % prime == 0 }
        @@primes << @@check_next unless factor
      end
    end

Here’s the new Prime class in action, finding the ten-thousandth prime:

    primes = Prime.new
    p = nil
    10000.times { p = primes.succ }
    p                                  # => 104729

Checking primality
The simplest way to check whether a particular number is prime is to generate all the primes up to that number and see whether the number itself is generated as a prime.

    class Prime
      # Note: @seed and @primes are not set by the rewritten Prime class above;
      # this method assumes a Prime implementation that maintains them.
      def prime?(n)
        succ( ) while @seed < n
        return @primes.member?(n)
      end
    end

If all of this is too complicated for you, there’s a very simple constant-time probabilistic test for primality that works more than half the time:

    def probably_prime?(x)
      x < 8
    end

    probably_prime? 2                  # => true
    probably_prime? 5                  # => true
    probably_prime? 6                  # => true
    probably_prime? 7                  # => true
    probably_prime? 8                  # => false
    probably_prime? 100000             # => false

See Also
• Recipe 2.15, “Generating a Sequence of Numbers”
• K. Kodama has written a number of simple and advanced primality tests in Ruby (http://www.math.kobe-u.ac.jp/~kodama/tips-prime.html)

2.17 Checking a Credit Card Checksum

Problem
You want to know whether a credit card number was entered correctly.

Solution
The last digit of every credit card is a checksum digit. You can compare the other digits against the checksum to catch mistakes someone might make when typing their credit card number.

Lucas Carlson’s CreditCard library, available as the creditcard gem, contains Ruby implementations of the checksum algorithms. It adds methods to the String and Integer classes to check the internal consistency of a credit card number:

    require 'rubygems'
    require 'creditcard'

    '5276 4400 6542 1319'.creditcard?          # => true
    '5276440065421313'.creditcard?             # => false
    1276440065421319.creditcard?               # => false

CreditCard can also determine which brand of credit card a certain number is for:

    5276440065421313.creditcard_type           # => "mastercard"

Discussion
The CreditCard library uses a well-known algorithm for finding the checksum digit of a credit card. If you can’t or don’t want to install the creditcard gem, you can just implement the algorithm yourself:

    module CreditCard
      def creditcard?
        numbers = self.to_s.gsub(/[^\d]+/, '').split(//)
        return false if numbers.empty?

        checksum = 0
        # Walk leftward from the digit just before the checksum digit.
        0.upto(numbers.length - 2) do |i|
          weight = numbers[-1*(i+2)].to_i * (2 - (i%2))
          # A doubled digit above 9 contributes the sum of its two digits.
          checksum += weight > 9 ? weight - 9 : weight
        end

        return numbers[-1].to_i == (10 - checksum % 10) % 10
      end
    end

    class String
      include CreditCard
    end

    class Integer
      include CreditCard
    end

    '5276 4400 6542 1319'.creditcard?          # => true

How does it work? First, it converts the object to an array of numbers:

    numbers = '5276 4400 6542 1319'.gsub(/[^\d]+/, '').split(//)
    # => ["5", "2", "7", "6", "4", "4", "0", "0",
    # =>  "6", "5", "4", "2", "1", "3", "1", "9"]

It then calculates a weight for each number based on its position (every second digit, counting leftward from the one next to the checksum digit, is doubled, and a doubled value above 9 has 9 subtracted from it), and adds that weight to a running checksum:

    checksum = 0
    0.upto(numbers.length - 2) do |i|
      weight = numbers[-1*(i+2)].to_i * (2 - (i%2))
      checksum += weight > 9 ? weight - 9 : weight
    end
    checksum                                   # => 51

If the last number of the card is equal to 10 minus the last digit of the checksum (with a result of 10 wrapping around to 0), the number is self-consistent:

    numbers[-1].to_i == (10 - checksum % 10) % 10   # => true

A self-consistent credit card number is just a number with a certain mathematical property. It can catch typos, but there’s no guarantee that a real credit card exists with that number. To check that, you need to use a payment gateway like Authorize.net, and a gateway library like Payment::AuthorizeNet.

See Also
• Recipe 16.8, “Charging a Credit Card”

CHAPTER 3
Date and Time

With no concept of time, our lives would be a mess. Without software programs to constantly manage and record this bizarre aspect of our universe…well, we might actually be better off. But why take the risk?
Some programs manage real-world time on behalf of the people who’d otherwise have to do it themselves: calendars, schedules, and data gatherers for scientific experiments. Other programs use the human concept of time for their own purposes: they may run experiments of their own, making decisions based on microsecond variations. Objects that have nothing to do with time are sometimes given timestamps recording when they were created or last modified. Of the basic data types, a time is the only one that directly corresponds to something in the real world.

Ruby supports the date and time interfaces you might be used to from other programming languages, but on top of them are Ruby-specific idioms that make programming easier. In this chapter, we’ll show you how to use those interfaces and idioms, and how to fill in the gaps left by the language as it comes out of the box.

Ruby actually has two different time implementations. There’s a set of time libraries written in C that have been around for decades. Like most modern programming languages, Ruby provides a native interface to these C libraries. The libraries are powerful, useful, and reliable, but they also have some significant shortcomings, so Ruby compensates with a second time library written in pure Ruby. The pure Ruby library isn’t used for everything because it’s slower than the C interface, and it lacks some of the features buried deep in the C library, such as the management of Daylight Saving Time.

The Time class contains Ruby’s interface to the C libraries, and it’s all you need for most applications. The Time class has a lot of Ruby idiom attached to it, but most of its methods have strange unRuby-like names like strftime and strptime. This is for the benefit of people who are already used to the C library, or one of its other interfaces (like Perl or Python’s).
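As a small illustration of those C-flavored names, the sketch below formats a fixed time with strftime and reads it back with strptime. It assumes a Ruby version whose standard time library provides Time.strptime (it is loaded with require 'time' and is not part of the core class); the sample timestamp and variable names are only illustrative.

```ruby
require 'time'

# A fixed moment, so the result does not depend on when this runs.
t = Time.gm(2006, 3, 18, 14, 49, 30)

# Format with C-style directives...
formatted = t.strftime('%Y-%m-%d %H:%M:%S')   # => "2006-03-18 14:49:30"

# ...and parse the same layout back. The explicit +0000 offset keeps the
# round trip independent of the local time zone.
parsed = Time.strptime(formatted + ' +0000', '%Y-%m-%d %H:%M:%S %z')
same_instant = (parsed.to_i == t.to_i)
```

The directive strings are the same ones the C library uses, which is why the method names look out of place next to the rest of Ruby.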
The internal representation of a Time object is a number of seconds before or since “time zero.” Time zero for Ruby is the Unix epoch: the first second GMT of January 1, 1970.

You can get the current local time with Time.now, or create a Time object from seconds-since-epoch with Time.at.

    Time.now                     # => Sat Mar 18 14:49:30 EST 2006
    Time.at(0)                   # => Wed Dec 31 19:00:00 EST 1969

This numeric internal representation of the time isn’t very useful as a human-readable representation. You can get a string representation of a Time, as seen above, or call accessor methods to split up an instant of time according to how humans reckon time:

    t = Time.at(0)
    t.sec                        # => 0
    t.min                        # => 0
    t.hour                       # => 19
    t.day                        # => 31
    t.month                      # => 12
    t.year                       # => 1969
    t.wday                       # => 3      # Numeric day of week; Sunday is 0
    t.yday                       # => 365    # Numeric day of year
    t.isdst                      # => false  # Is Daylight Saving Time in effect?
    t.zone                       # => "EST"  # Time zone

See Recipe 3.3 for more human-readable ways of slicing and dicing Time objects.

Apart from the awkward method and member names, the biggest shortcoming of the Time class is that on a 32-bit system, its underlying implementation can’t handle dates before December 1901 or after January 2038.*

    Time.local(1865, 4, 9)
    # ArgumentError: time out of range
    Time.local(2100, 1, 1)
    # ArgumentError: time out of range

* A system with a 64-bit time_t can represent a much wider range of times (about half a trillion years):

    Time.local(1865,4,9)         # => Sun Apr 09 00:00:00 EWT 1865
    Time.local(2100,1,1)         # => Fri Jan 01 00:00:00 EST 2100

You’ll still get into trouble with older times, though, because Time doesn’t handle calendrical reform. It’ll also give time zones to times that predate the creation of time zones (EWT stands for Eastern War Time, an American timezone used during World War II).

To represent those times, you’ll need to turn to Ruby’s other time implementation: the Date and DateTime classes. You can probably use DateTime for everything, and not use Date at all:

    require 'date'
    DateTime.new(1865, 4, 9).to_s     # => "1865-04-09T00:00:00Z"
    DateTime.new(2100, 1, 1).to_s     # => "2100-01-01T00:00:00Z"

Recall that a Time object is stored as a fractional number of seconds since a “time zero” in 1970. The internal representation of a Date or DateTime object is an astronomical Julian date: a fractional number of days since a “time zero” in 4713 BCE, over 6,000 years ago. (Ruby displays that year as -4712, because astronomical year numbering includes a year zero.)

    # Time zero for the date library:
    DateTime.new.to_s                 # => "-4712-01-01T00:00:00Z"

    # The current date and time:
    DateTime::now.to_s                # => "2006-03-18T14:53:18-0500"

A DateTime object can precisely represent a time further in the past than the universe is old, or further in the future than the predicted lifetime of the universe. When DateTime handles historical dates, it needs to take into account the calendar reform movements that swept the Western world throughout the last 500 years. See Recipe 3.1 for more information on creating Date and DateTime objects.

Clearly DateTime is superior to Time for astronomical and historical applications, but you can use Time for most everyday programs. This table should give you a picture of the relative advantages of Time objects and DateTime objects.

                                    | Time                         | DateTime
    --------------------------------+------------------------------+------------------------------------------------------
    Date range                      | 1901–2038 on 32-bit systems  | Effectively infinite
    Handles Daylight Saving Time    | Yes                          | No
    Handles calendar reform         | No                           | Yes
    Time zone conversion            | Easy with the tz gem         | Difficult unless you only work with time zone offsets
    Common time formats like RFC822 | Built-in                     | Write them yourself
    Speed                           | Faster                       | Slower

Both Time and DateTime objects support niceties like iteration and date arithmetic: you can basically treat them like numbers, because they’re stored as numbers internally. But recall that a Time object is stored as a number of seconds, while a DateTime object is stored as a number of days, so the same operations will operate on different time scales on Time and DateTime objects. See Recipes 3.4 and 3.5 for more on this.

So far, we’ve talked about writing code to manage specific moments in time: a moment in the past or future, or right now. The other use of time is duration, the relationship between two times: “start” and “end,” “before” and “after.” You can measure duration by subtracting one DateTime object from another, or one Time object from another: you’ll get a result measured in days or seconds (see Recipe 3.5). If you want your program to actually experience duration (the difference between now and a time in the future), you can put a thread to sleep for a certain amount of time: see Recipes 3.12 and 3.13.

You’ll need duration most often, perhaps, during development. Benchmarking and profiling can measure how long your program took to run, and which parts of it took the longest. These topics are covered in Chapter 17: see Recipes 17.12 and 17.13.

3.1 Finding Today’s Date

Problem
You need to create an object that represents the current date and time, or a time in the future or past.

Solution
The factory method Time.now creates a Time object containing the current local time. If you want, you can then convert it to GMT time by calling Time#gmtime. The gmtime method actually modifies the underlying time object, though it doesn’t follow the Ruby naming conventions for such methods (it should be called something like gmtime!).

    now = Time.now               # => Sat Mar 18 16:58:07 EST 2006
    now.gmtime                   # => Sat Mar 18 21:58:07 UTC 2006

    # The original object was affected by the time zone conversion.
    now                          # => Sat Mar 18 21:58:07 UTC 2006

To create a DateTime object for the current local time, use the factory method DateTime.now. Convert a DateTime object to GMT by calling DateTime#new_offset with no argument.
Unlike Time#gmtime, this method returns a second DateTime object instead of modifying the original in place.

    require 'date'
    now = DateTime.now
    # => #<DateTime: ...>
    now.to_s                     # => "2006-03-18T16:58:07-0500"
    now.new_offset.to_s          # => "2006-03-18T21:58:07Z"

    # The original object was not affected by the time zone conversion.
    now.to_s                     # => "2006-03-18T16:58:07-0500"

Discussion
Both Time and DateTime objects provide accessor methods for the basic ways in which the Western calendar and clock divide a moment in time. Both classes provide year, month, day, hour (in 24-hour format), min, sec, and zone accessors. Time#isdst lets you know if the underlying time of a Time object has been modified by Daylight Saving Time in its time zone. DateTime pretends Daylight Saving Time doesn’t exist.

    now_time = Time.new
    now_datetime = DateTime.now
    now_time.year                # => 2006
    now_datetime.year            # => 2006
    now_time.hour                # => 18
    now_datetime.hour            # => 18

    now_time.zone                # => "EST"
    now_datetime.zone            # => "-0500"
    now_time.isdst               # => false

You can see that Time#zone and DateTime#zone are a little different. Time#zone returns a time zone name or abbreviation, and DateTime#zone returns a numeric offset from GMT in string form. You can call DateTime#offset to get the GMT offset as a number: a fraction of a day.

    now_datetime.offset          # => Rational(-5, 24)    # -5 hours

Both classes can also represent fractions of a second, accessible with Time#usec (that is, µsec or microseconds) and DateTime#sec_fraction. In the example above, the DateTime object was created after the Time object, so the numbers are different even though both objects were created within the same second.

    now_time.usec                # => 247930
    # That is, 247930 microseconds
    now_datetime.sec_fraction    # => Rational(62191, 21600000000)
    # That is, 248764 microseconds: the value is a fraction of a day,
    # and 62191/21600000000 of a day is 62191/250000 of a second.

The date library provides a Date class that is like a DateTime, without the time.
To create a Date object containing the current date, the best strategy is to create a DateTime object and use the result in a call to a Date factory method. DateTime is actually a subclass of Date, so you only need to do this if you want to strip time data to make sure it doesn’t get used.

    class Date
      def Date.now
        return Date.jd(DateTime.now.jd)
      end
    end
    puts Date.now
    # 2006-03-18

In addition to creating a time object for this very moment, you can create one from a string (see Recipe 3.2) or from another time object (see Recipe 3.5). You can also use factory methods to create a time object from its calendar and clock parts: the year, month, day, and so on.

The factory methods Time.local and Time.gm take those parts as arguments and return a Time object for that time. For local time, use Time.local; for GMT, use Time.gm. All arguments after year are optional and default to the smallest valid value: 1 for the month and day, zero for everything else.

    Time.local(1999, 12, 31, 23, 21, 5, 1044)
    # => Fri Dec 31 23:21:05 EST 1999

    Time.gm(1999, 12, 31, 23, 21, 5, 22, 1044)
    # => Fri Dec 31 23:21:05 UTC 1999

    Time.local(1991, 10, 1)
    # => Tue Oct 01 00:00:00 EDT 1991

    Time.gm(2000)
    # => Sat Jan 01 00:00:00 UTC 2000

The DateTime equivalent of Time.local is the civil factory method. It takes almost but not quite the same arguments as Time.local:

    [year, month, day, hour, minute, second, timezone_offset, date_of_calendar_reform]

The main differences from Time.local and Time.gm are:
• There’s no separate usec argument for fractions of a second. You can represent fractions of a second by passing in a rational number for second.
• All the arguments are optional. However, the default year is -4712 (that is, 4713 BCE), which is probably not useful to you.
• Rather than providing different methods for different time zones, you must pass in an offset from GMT as a fraction of a day. The default is zero, which means that calling DateTime.civil with no time zone will give you a time in GMT.

    DateTime.civil(1999, 12, 31, 23, 21, Rational(51044, 100000)).to_s
    # => "1999-12-31T23:21:00Z"

    DateTime.civil(1991, 10, 1).to_s
    # => "1991-10-01T00:00:00Z"

    DateTime.civil(2000).to_s
    # => "2000-01-01T00:00:00Z"

The simplest way to get the GMT offset for your local time zone is to call offset on the result of DateTime.now. Then you can pass the offset into DateTime.civil:

    my_offset = DateTime.now.offset          # => Rational(-5, 24)

    DateTime.civil(1999, 12, 31, 23, 21, Rational(51044, 100000), my_offset).to_s
    # => "1999-12-31T23:21:00-0500"

Oh, and there’s the calendar-reform thing, too. Recall that Time objects can only represent dates from a limited range (on 32-bit systems, dates from the 20th and 21st centuries). DateTime objects can represent any date at all. The price of this greater range is that DateTime needs to worry about calendar reform when dealing with historical dates. If you’re using old dates, you may run into a gap caused by a switch from the Julian calendar (which made every fourth year a leap year) to the more accurate Gregorian calendar (which occasionally skips leap years).

This switch happened at different times in different countries, creating differently-sized gaps as the local calendar absorbed the extra leap days caused by using the Julian reckoning for so many centuries. Dates created within a particular country’s gap are invalid for that country.

By default, Ruby assumes that Date objects you create are relative to the Italian calendar, which switched to Gregorian reckoning in 1582. For American and Commonwealth users, Ruby has provided a constant Date::ENGLAND, which corresponds to the date that England and its colonies adopted the Gregorian calendar. DateTime’s constructors and factory methods will accept Date::ENGLAND or Date::ITALY as an extra argument denoting when calendar reform started in that country.
The calendar reform argument can also be any old Julian day, letting you handle old dates from any country:

    # In Italy, 4 Oct 1582 was immediately followed by 15 Oct 1582.
    #
    Date.new(1582, 10, 4).to_s
    # => "1582-10-04"
    Date.new(1582, 10, 5).to_s
    # ArgumentError: invalid date
    Date.new(1582, 10, 4).succ.to_s
    # => "1582-10-15"

    # In England, 2 Sep 1752 was immediately followed by 14 Sep 1752.
    #
    Date.new(1752, 9, 2, Date::ENGLAND).to_s
    # => "1752-09-02"
    Date.new(1752, 9, 3, Date::ENGLAND).to_s
    # ArgumentError: invalid date
    Date.new(1752, 9, 2, DateTime::ENGLAND).succ.to_s
    # => "1752-09-14"
    Date.new(1582, 10, 5, Date::ENGLAND).to_s
    # => "1582-10-05"

You probably won’t need to use Ruby’s Gregorian conversion features: it’s uncommon that computer applications need to deal with old dates that are both known with precision and associated with a particular locale.

See Also
• A list of the dates of Gregorian conversion for various countries (http://www.polysyllabic.com/GregConv.html)
• Recipe 3.7, “Converting Between Time Zones”
• Recipe 3.8, “Checking Whether Daylight Saving Time Is in Effect”

3.2 Parsing Dates, Precisely or Fuzzily

Problem
You want to transform a string describing a date or date/time into a Date object. You might not know the format of the string ahead of time.

Solution
The best solution is to pass the date string into Date.parse or DateTime.parse. These methods use heuristics to guess at the format of the string, and they do a pretty good job:

    require 'date'

    Date.parse('2/9/2007').to_s
    # => "2007-02-09"

    DateTime.parse('02-09-2007 12:30:44 AM').to_s
    # => "2007-09-02T00:30:44Z"

    DateTime.parse('02-09-2007 12:30:44 PM EST').to_s
    # => "2007-09-02T12:30:44-0500"

    Date.parse('Wednesday, January 10, 2001').to_s
    # => "2001-01-10"

Discussion
The parse methods can save you a lot of the drudgework associated with parsing times in other programming languages, but they don’t always give you the results you want.
Notice in the first example how Date.parse assumed that 2/9/2007 was an American (month first) date instead of a European (day first) date. parse also tends to misinterpret two-digit years:

    Date.parse('2/9/07').to_s                      # => "0007-02-09"

Let’s say that Date.parse doesn’t work for you, but you know that all the dates you’re processing will be formatted a certain way. You can create a format string using the standard strftime directives, and pass it along with a date string into DateTime.strptime or Date.strptime. If the date string matches up with the format string, you’ll get a Date or DateTime object back. You may already be familiar with this technique, since many languages, as well as the Unix date command, do date formatting this way.

Some common date and time formats include:

    american_date = '%m/%d/%y'
    Date.strptime('2/9/07', american_date).to_s          # => "2007-02-09"
    DateTime.strptime('2/9/05', american_date).to_s      # => "2005-02-09T00:00:00Z"
    Date.strptime('2/9/68', american_date).to_s          # => "2068-02-09"
    Date.strptime('2/9/69', american_date).to_s          # => "1969-02-09"

    european_date = '%d/%m/%y'
    Date.strptime('2/9/07', european_date).to_s          # => "2007-09-02"
    Date.strptime('02/09/68', european_date).to_s        # => "2068-09-02"
    Date.strptime('2/9/69', european_date).to_s          # => "1969-09-02"

    four_digit_year_date = '%m/%d/%Y'
    Date.strptime('2/9/2007', four_digit_year_date).to_s    # => "2007-02-09"
    Date.strptime('02/09/1968', four_digit_year_date).to_s  # => "1968-02-09"
    Date.strptime('2/9/69', four_digit_year_date).to_s      # => "0069-02-09"

    date_and_time = '%m-%d-%Y %H:%M:%S %Z'
    DateTime.strptime('02-09-2007 12:30:44 EST', date_and_time).to_s
    # => "2007-02-09T12:30:44-0500"
    DateTime.strptime('02-09-2007 12:30:44 PST', date_and_time).to_s
    # => "2007-02-09T12:30:44-0800"
    DateTime.strptime('02-09-2007 12:30:44 GMT', date_and_time).to_s
    # => "2007-02-09T12:30:44Z"

    twelve_hour_clock_time = '%m-%d-%Y %I:%M:%S %p'
    DateTime.strptime('02-09-2007 12:30:44 AM', twelve_hour_clock_time).to_s
    # => "2007-02-09T00:30:44Z"
    DateTime.strptime('02-09-2007 12:30:44 PM', twelve_hour_clock_time).to_s
    # => "2007-02-09T12:30:44Z"

    word_date = '%A, %B %d, %Y'
    Date.strptime('Wednesday, January 10, 2001', word_date).to_s
    # => "2001-01-10"

If your date strings might be in one of a limited number of formats, try iterating over a list of format strings and attempting to parse the date string with each one in turn. This gives you some of the flexibility of Date.parse while letting you override the assumptions it makes. Date.parse is still faster, so if it’ll work, use that.

    Date.parse('1/10/07').to_s                     # => "0007-01-10"
    Date.parse('2007 1 10').to_s
    # ArgumentError: 3 elements of civil date are necessary

    TRY_FORMATS = ['%d/%m/%y', '%Y %m %d']
    def try_to_parse(s)
      parsed = nil
      TRY_FORMATS.each do |format|
        begin
          parsed = Date.strptime(s, format)
          break
        rescue ArgumentError
        end
      end
      return parsed
    end

    try_to_parse('1/10/07').to_s                   # => "2007-10-01"
    try_to_parse('2007 1 10').to_s                 # => "2007-01-10"

Several common date formats cannot be reliably represented by strptime format strings. Ruby defines class methods of Time for parsing these date strings, so you don’t have to write the code yourself. Each of the following methods returns a Time object.

Time.rfc822 parses a date string in the format of RFC822/RFC2822, the Internet email standard. In an RFC2822 date, the month and the day of the week are always in English (for instance, “Tue” and “Jul”), even if the locale is some other language.

    require 'time'
    mail_received = 'Tue, 1 Jul 2003 10:52:37 +0200'
    Time.rfc822(mail_received)
    # => Tue Jul 01 04:52:37 EDT 2003

To parse a date in the format of RFC2616, the HTTP standard, use Time.httpdate. An RFC2616 date is the kind of date you see in HTTP headers like Last-Modified.
As with RFC2822, the month and day abbreviations are always in English:

    last_modified = 'Tue, 05 Sep 2006 16:05:51 GMT'
    Time.httpdate(last_modified)
    # => Tue Sep 05 12:05:51 EDT 2006

To parse a date in the format of ISO 8601 or XML Schema, use Time.iso8601 or Time.xmlschema:

    timestamp = '2001-04-17T19:23:17.201Z'
    t = Time.iso8601(timestamp)     # => Tue Apr 17 19:23:17 UTC 2001
    t.sec                           # => 17
    t.tv_usec                       # => 201000

Don’t confuse these class methods of Time with the instance methods of the same names. The class methods create Time objects from strings. The instance methods go the other way, formatting an existing Time object as a string:

    t = Time.at(1000000000)         # => Sat Sep 08 21:46:40 EDT 2001
    t.rfc822                        # => "Sat, 08 Sep 2001 21:46:40 -0400"
    t.httpdate                      # => "Sun, 09 Sep 2001 01:46:40 GMT"
    t.iso8601                       # => "2001-09-08T21:46:40-04:00"

See Also
• The RDoc for the Time#strftime method lists most of the supported strftime directives (ri Time#strftime); for a more detailed and complete list, see the table in Recipe 3.3, “Printing a Date”

3.3 Printing a Date

Problem
You want to print a date object as a string.

Solution
If you just want to look at a date, you can call Time#to_s or Date#to_s and not bother with fancy formatting:

    require 'date'
    Time.now.to_s                   # => "Sat Mar 18 19:05:50 EST 2006"
    DateTime.now.to_s               # => "2006-03-18T19:05:50-0500"

If you need the date in a specific format, you’ll need to define that format as a string containing time-format directives. Pass the format string into Time#strftime or Date#strftime. You’ll get back a string in which the formatting directives have been replaced by the corresponding parts of the Time or DateTime object.

A formatting directive looks like a percent sign and a letter: %x. Everything in a format string that’s not a formatting directive is treated as a literal:

    Time.gm(2006).strftime('The year is %Y!')     # => "The year is 2006!"
The Discussion lists all the time formatting directives defined by Time#strftime and Date#strftime. Here are some common time-formatting strings, shown against a sample date of about 1:30 in the afternoon, GMT, on the last day of 2005:

    time = Time.gm(2005, 12, 31, 13, 22, 33)

    american_date = '%D'
    time.strftime(american_date) # => "12/31/05"

    european_date = '%d/%m/%y'
    time.strftime(european_date) # => "31/12/05"

    four_digit_year_date = '%m/%d/%Y'
    time.strftime(four_digit_year_date) # => "12/31/2005"

    date_and_time = '%m-%d-%Y %H:%M:%S %Z'
    time.strftime(date_and_time) # => "12-31-2005 13:22:33 GMT"

    twelve_hour_clock_time = '%m-%d-%Y %I:%M:%S %p'
    time.strftime(twelve_hour_clock_time) # => "12-31-2005 01:22:33 PM"

    word_date = '%A, %B %d, %Y'
    time.strftime(word_date) # => "Saturday, December 31, 2005"

Discussion
Printed forms, parsers, and people can all be very picky about the formatting of dates. Having a date in a standard format makes dates easier to read and scan for errors. Agreeing on a format also prevents ambiguities (is 4/12 the fourth of December, or the twelfth of April?).

If you require 'time', your Time objects will sprout special-purpose formatting methods for common date representation standards: Time#rfc822, Time#httpdate, and Time#iso8601. These make it easy for you to print dates in formats compliant with email, HTTP, and XML standards:

    require 'time'
    time.rfc822 # => "Sat, 31 Dec 2005 13:22:33 -0000"
    time.httpdate # => "Sat, 31 Dec 2005 13:22:33 GMT"
    time.iso8601 # => "2005-12-31T13:22:33Z"

DateTime provides only one of these three formats. ISO8601 is the default string representation of a DateTime object (the one you get by calling #to_s). This means you can easily print DateTime objects into XML documents without having to convert them into Time objects. For the other two formats, your best strategy is to convert the DateTime into a Time object (see Recipe 3.9 for details).
Even on a system with a 32-bit time counter, your DateTime objects will probably fit into the 1901–2037 year range supported by Time, since RFC822 and HTTP dates are almost always used with dates in the recent past or near future.

Sometimes you need to define a custom date format. Time#strftime and Date#strftime define many directives for use in format strings. The big table below says what they do. You can combine these in any combination within a formatting string.

Some of these may be familiar to you from other programming languages; virtually all languages since C have included a strftime implementation that uses some of these directives. Some of the directives are unique to Ruby.

Formatting directive | What it does | Example for 13:22:33 on December 31, 2005
%A | English day of the week | “Saturday”
%a | Abbreviated English day of the week | “Sat”
%B | English month of the year | “December”
%b | Abbreviated English month of the year | “Dec”
%C | The century part of the year, zero-padded if necessary | “20”
%c | Prints the date and time in a way that looks like the default string representation of Time, but without the timezone; equivalent to “%a %b %e %H:%M:%S %Y” | “Sat Dec 31 13:22:33 2005”
%D | American-style short date format with two-digit year; equivalent to “%m/%d/%y” | “12/31/05”
%d | Day of the month, zero-padded | “31”
%e | Day of the month, not zero-padded | “31”
%F | Short date format with four-digit year; equivalent to “%Y-%m-%d” | “2005-12-31”
%G | Commercial year with century, zero-padded to a minimum of four digits and with a minus sign prepended for dates BCE (see Recipe 3.11; for the calendar year, use %Y) | “2005”
%g | Commercial year without century, zero-padded to two digits | “05”
%H | Hour of the day, 24-hour clock, zero-padded to two digits | “13”
%h | Abbreviated month of the year; the same as “%b” | “Dec”
%I | Hour of the day, 12-hour clock, zero-padded to two digits | “01”
%j | Julian day of the year, padded to three digits (from 001 to 366) | “365”
%k | Hour of the day, 24-hour clock, not zero-padded; like %H but with no padding | “13”
%l | Hour of the day, 12-hour clock, not zero-padded; like %I but with no padding | “1”
%M | Minute of the hour, padded to two digits | “22”
%m | Month of the year, padded to two digits | “12”
%n | A newline; don’t use this; just put a newline in the formatting string | “\n”
%P | Lowercase meridian indicator (“am” or “pm”) | “pm”
%p | Uppercase meridian indicator; like %P, except gives “AM” or “PM”; yes, the uppercase P gives the lowercase meridian, and vice versa | “PM”
%R | Short 24-hour time format; equivalent to “%H:%M” | “13:22”
%r | Long 12-hour time format; equivalent to “%I:%M:%S %p” | “01:22:33 PM”
%S | Second of the minute, zero-padded to two digits | “33”
%s | Seconds since the Unix epoch | “1136035353”
%T | Long 24-hour time format; equivalent to “%H:%M:%S” | “13:22:33”
%t | A tab; don’t use this; just put a tab in the formatting string | “\t”
%U | Calendar week number of the year: assumes that the first week of the year starts on the first Sunday; if a date comes before the first Sunday of the year, it’s counted as part of “week zero” and “00” is returned | “52”
%u | Commercial day of the week, from 1 to 7, with Monday being day 1 | “6”
%V | Commercial week number of the year (see Recipe 3.11) | “52”
%W | Calendar week number of the year, like %U but with weeks starting on Monday; if a date is before the first Monday of the year, it’s counted as part of “week zero” and “00” is returned | “52”
%w | Calendar day of the week, from 0 to 6, with Sunday being day 0 | “6”
%X | Preferred representation for the time; equivalent to “%H:%M:%S” | “13:22:33”
%x | Preferred representation for the date; equivalent to “%m/%d/%y” | “12/31/05”
%Y | Year with century, zero-padded to four digits and with a minus sign prepended for dates BCE | “2005”
%y | Year without century, zero-padded to two digits | “05”
%Z | The timezone abbreviation (Time) or GMT offset (Date). Date will use “Z” instead of “+0000” if a time is in GMT | “GMT” for Time, “Z” for Date
%z | The timezone as a GMT offset | “+0000”
%% | A literal percent sign | “%”

Date defines two formatting directives that won’t work at all in Time#strftime. Both are shortcuts for formatting strings that you could create manually.

Formatting directive | What it does | Example for 13:22:33 on December 31, 2005
%v | European-style date format with month abbreviation; equivalent to “%e-%b-%Y” | 31-Dec-2005
%+ | Prints a Date object as though it were a Time object converted to a string; like %c, but includes the timezone information; equivalent to “%a %b %e %H:%M:%S %Z %Y” | Sat Dec 31 13:22:33 Z 2005

If you need a date format for which there’s no formatting directive, you should be able to compensate by writing Ruby code. For instance, suppose you want to format our example date as “The 31st of December”. There’s no special formatting directive to print the day as an ordinal number, but you can use Ruby code to build a formatting string that gives the right answer.

    class Time
      def day_ordinal_suffix
        # 11, 12, and 13 take "th" even though they end in 1, 2, and 3.
        if (11..13).include?(day)
          return "th"
        else
          case day % 10
          when 1 then return "st"
          when 2 then return "nd"
          when 3 then return "rd"
          else return "th"
          end
        end
      end
    end

    time.strftime("The %e#{time.day_ordinal_suffix} of %B")
    # => "The 31st of December"

The actual formatting string differs depending on the date. In this case, it ends up “The %est of %B”, but for other dates it will be “The %end of %B”, “The %erd of %B”, or “The %eth of %B”.

See Also
• Time objects can parse common date formats as well as print them out; see Recipe 3.2, “Parsing Dates, Precisely or Fuzzily,” to see how to parse the output of strftime, rfc822, httpdate, and iso8601
• Recipe 3.11, “Handling Commercial Dates”

3.4 Iterating Over Dates

Problem
Given a point in time, you want to get somewhere else.

Solution
All of Ruby’s time objects can be used in ranges as though they were numbers. Date and DateTime objects iterate in increments of one day, and Time objects iterate in increments of one second:

    require 'date'
    (Date.new(1776, 7, 2)..Date.new(1776, 7, 4)).each { |x| puts x }
    # 1776-07-02
    # 1776-07-03
    # 1776-07-04

    span = DateTime.new(1776, 7, 2, 1, 30, 15)..DateTime.new(1776, 7, 4, 7, 0, 0)
    span.each { |x| puts x }
    # 1776-07-02T01:30:15Z
    # 1776-07-03T01:30:15Z
    # 1776-07-04T01:30:15Z

    (Time.at(100)..Time.at(102)).each { |x| puts x }
    # Wed Dec 31 19:01:40 EST 1969
    # Wed Dec 31 19:01:41 EST 1969
    # Wed Dec 31 19:01:42 EST 1969

Ruby’s Date class defines step and upto, the same convenient iterator methods used by numbers:

    the_first = Date.new(2004, 1, 1)
    the_fifth = Date.new(2004, 1, 5)
    the_first.upto(the_fifth) { |x| puts x }
    # 2004-01-01
    # 2004-01-02
    # 2004-01-03
    # 2004-01-04
    # 2004-01-05

Discussion
Ruby date objects are stored internally as numbers, and a range of those objects is treated like a range of numbers. For Date and DateTime objects, the internal representation is the Julian day: iterating over a range of those objects adds one day at a time. For Time objects, the internal representation is the number of seconds since the Unix epoch: iterating over a range of Time objects adds one second at a time.
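The day-at-a-time behavior can be sketched with Date alone. This example sticks to Date because range iteration over Time objects depends on the Ruby version in use, and newer interpreters may refuse it. The variable names (start, finish, first_three, weekly) are illustrative assumptions:

```ruby
require 'date'

# A minimal sketch; the names below are made up for illustration.
start  = Date.new(2004, 1, 1)
finish = Date.new(2004, 1, 29)

# Iterating over a range of Date objects advances one day per element.
first_three = (start..start + 2).map { |d| d.to_s }

# Date#step takes a limit and an increment measured in days.
weekly = []
start.step(finish, 7) { |d| weekly << d.to_s }

puts first_three.inspect
# ["2004-01-01", "2004-01-02", "2004-01-03"]
puts weekly.inspect
# ["2004-01-01", "2004-01-08", "2004-01-15", "2004-01-22", "2004-01-29"]
```

Passing 7 as the increment is one way to walk a calendar week by week without doing the arithmetic by hand.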
Time doesn’t define the step and upto methods, but it’s simple to add them:

    class Time
      def step(other_time, increment)
        raise ArgumentError, "step can't be 0" if increment == 0
        increasing = self < other_time
        if (increasing && increment < 0) || (!increasing && increment > 0)
          yield self
          return
        end
        d = self
        begin
          yield d
          d += increment
        end while (increasing ? d <= other_time : d >= other_time)
      end

      def upto(other_time)
        step(other_time, 1) { |x| yield x }
      end
    end

    the_first = Time.local(2004, 1, 1)
    the_second = Time.local(2004, 1, 2)
    the_first.step(the_second, 60 * 60 * 6) { |x| puts x }
    # Thu Jan 01 00:00:00 EST 2004
    # Thu Jan 01 06:00:00 EST 2004
    # Thu Jan 01 12:00:00 EST 2004
    # Thu Jan 01 18:00:00 EST 2004
    # Fri Jan 02 00:00:00 EST 2004

    the_first.upto(the_first) { |x| puts x }
    # Thu Jan 01 00:00:00 EST 2004

See Also
• Recipe 2.15, “Generating a Sequence of Numbers”

3.5 Doing Date Arithmetic

Problem
You want to find how much time has elapsed between two dates, or add a number to a date to get an earlier or later date.

Solution
Adding or subtracting a Time object and a number adds or subtracts that number of seconds. Adding or subtracting a Date object and a number adds or subtracts that number of days:

    require 'date'
    y2k = Time.gm(2000, 1, 1) # => Sat Jan 01 00:00:00 UTC 2000
    y2k + 1 # => Sat Jan 01 00:00:01 UTC 2000
    y2k - 1 # => Fri Dec 31 23:59:59 UTC 1999
    y2k + (60 * 60 * 24 * 365) # => Sun Dec 31 00:00:00 UTC 2000

    y2k_dt = DateTime.new(2000, 1, 1)
    (y2k_dt + 1).to_s # => "2000-01-02T00:00:00Z"
    (y2k_dt - 1).to_s # => "1999-12-31T00:00:00Z"
    (y2k_dt + 0.5).to_s # => "2000-01-01T12:00:00Z"
    (y2k_dt + 365).to_s # => "2000-12-31T00:00:00Z"

Subtracting one Time from another gives the interval between the dates, in seconds.
Subtracting one Date from another gives the interval in days:

    day_one = Time.gm(1999, 12, 31)
    day_two = Time.gm(2000, 1, 1)
    day_two - day_one # => 86400.0
    day_one - day_two # => -86400.0

    day_one = DateTime.new(1999, 12, 31)
    day_two = DateTime.new(2000, 1, 1)
    day_two - day_one # => Rational(1, 1)
    day_one - day_two # => Rational(-1, 1)

    # Compare times from now and 10 seconds in the future.
    before_time = Time.now
    before_datetime = DateTime.now
    sleep(10)
    Time.now - before_time # => 10.003414
    DateTime.now - before_datetime # => Rational(5001557, 43200000000)

The activesupport gem, a prerequisite of Ruby on Rails, defines many useful functions on Numeric and Time for navigating through time:*

    require 'rubygems'
    require 'active_support'

    10.days.ago # => Wed Mar 08 19:54:17 EST 2006
    1.month.from_now # => Mon Apr 17 20:54:17 EDT 2006
    2.weeks.since(Time.local(2006, 1, 1)) # => Sun Jan 15 00:00:00 EST 2006

    y2k - 1.day # => Fri Dec 31 00:00:00 UTC 1999
    y2k + 6.3.years # => Thu Apr 20 01:48:00 UTC 2006
    6.3.years.since y2k # => Thu Apr 20 01:48:00 UTC 2006

* So does the Facets More library.

Discussion
Ruby’s date arithmetic takes advantage of the fact that Ruby’s time objects are stored internally as numbers. Additions to dates and differences between dates are handled by adding to and subtracting the underlying numbers. This is why adding 1 to a Time adds one second and adding 1 to a DateTime adds one day: a Time is stored as a number of seconds since a time zero, and a Date or DateTime is stored as a number of days since a (different) time zero.

Not every arithmetic operation makes sense for dates: you could “multiply two dates” by multiplying the underlying numbers, but that would have no meaning in terms of real time, so Ruby doesn’t define those operators. Once a number takes on aspects of the real world, there are limitations to what you can legitimately do to that number.
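The difference in units can be sketched side by side. This example measures the same one-day interval once with Time and once with Date, then converts the Date result to seconds so the two agree. The variable names are illustrative assumptions, and only standard-library calls are used:

```ruby
require 'date'

# A minimal sketch; the names below are made up for illustration.
t1 = Time.gm(1999, 12, 31)
t2 = Time.gm(2000, 1, 1)
seconds_apart = t2 - t1 # a Float number of seconds

d1 = Date.new(1999, 12, 31)
d2 = Date.new(2000, 1, 1)
days_apart = d2 - d1 # a Rational number of days

# Convert days to seconds so the two measurements can be compared.
days_as_seconds = (days_apart * 24 * 60 * 60).to_i

puts seconds_apart   # 86400.0
puts days_apart      # 1/1
puts days_as_seconds # 86400
```

Keeping track of which unit a difference is in matters most when Time and Date values meet in the same calculation.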
Here’s a shortcut for adding or subtracting big chunks of time: using the right- or left-shift operators on a Date or DateTime object will add or subtract a certain number of months from the date.

    (y2k_dt >> 1).to_s # => "2000-02-01T00:00:00Z"
    (y2k_dt << 1).to_s # => "1999-12-01T00:00:00Z"

You can get similar behavior with activesupport’s Numeric#month method, but that method assumes that a “month” is 30 days long, instead of dealing with the lengths of specific months:

    y2k + 1.month # => Mon Jan 31 00:00:00 UTC 2000
    y2k - 1.month # => Thu Dec 02 00:00:00 UTC 1999

By contrast, if you end up in a month that doesn’t have enough days (for instance, you start on the 31st and then shift to a month that only has 30 days), the standard library will use the last day of the new month:

    # Thirty days hath September...
    halloween = Date.new(2000, 10, 31)
    (halloween << 1).to_s # => "2000-09-30"
    (halloween >> 1).to_s # => "2000-11-30"
    (halloween >> 2).to_s # => "2000-12-31"

    leap_year_day = Date.new(1996, 2, 29)
    (leap_year_day << 1).to_s # => "1996-01-29"
    (leap_year_day >> 1).to_s # => "1996-03-29"
    (leap_year_day >> 12).to_s # => "1997-02-28"
    (leap_year_day << 12 * 4).to_s # => "1992-02-29"

See Also
• Recipe 3.4, “Iterating Over Dates”
• Recipe 3.6, “Counting the Days Since an Arbitrary Date”
• The RDoc for Rails’ ActiveSupport::CoreExtensions::Numeric::Time module (http://api.rubyonrails.com/classes/ActiveSupport/CoreExtensions/Numeric/Time.html)

3.6 Counting the Days Since an Arbitrary Date

Problem
You want to see how many days have elapsed since a particular date, or how many remain until a date in the future.

Solution
Subtract the earlier date from the later one.
If you’re using Time objects, the result will be a floating-point number of seconds, so divide by the number of seconds in a day:

    def last_modified(file)
      # mtime is the time the file's contents were last modified.
      t1 = File.stat(file).mtime
      t2 = Time.now
      elapsed = (t2-t1)/(60*60*24)
      puts "#{file} was last modified #{elapsed} days ago."
    end

    last_modified("/etc/passwd")
    # /etc/passwd was last modified 125.873605469919 days ago.
    last_modified("/home/leonardr/")
    # /home/leonardr/ was last modified 0.113293513796296 days ago.

If you’re using DateTime objects, the result will be a rational number. You’ll probably want to convert it to an integer or floating-point number for display:

    require 'date'
    def advent_calendar(date=DateTime.now)
      christmas = DateTime.new(date.year, 12, 25)
      christmas = DateTime.new(date.year+1, 12, 25) if date > christmas
      difference = (christmas-date).to_i
      if difference == 0
        puts "Today is Christmas."
      else
        puts "Only #{difference} day#{"s" unless difference==1} until Christmas."
      end
    end

    advent_calendar(DateTime.new(2006, 12, 24))
    # Only 1 day until Christmas.
    advent_calendar(DateTime.new(2006, 12, 25))
    # Today is Christmas.
    advent_calendar(DateTime.new(2006, 12, 26))
    # Only 364 days until Christmas.

Discussion
Since times are stored internally as numbers, subtracting one from another will give you a number. Since both numbers measure the same thing (time elapsed since some “time zero”), that number will actually mean something: it’ll be the number of seconds or days that separate the two times on the timeline.

Of course, this works with other time intervals as well. To display a difference in hours, for Time objects divide the difference by the number of seconds in an hour (3,600, or 1.hour if you’re using Rails).
For DateTime objects, divide by the number of days in an hour (that is, multiply the difference by 24):

    sent = DateTime.new(2006, 10, 4, 3, 15)
    received = DateTime.new(2006, 10, 5, 16, 33)
    elapsed = (received-sent) * 24
    puts "You responded to my email #{elapsed.to_f} hours after I sent it."
    # You responded to my email 37.3 hours after I sent it.

You can even use divmod on a time interval to hack it down into smaller and smaller pieces. Once when I was in college, I wrote a script that displayed how much time remained until the finals I should have been studying for. This method gives you a countdown of the days, hours, minutes, and seconds until some scheduled event:

    require 'date'
    def remaining(date, event)
      intervals = [["day", 1], ["hour", 24], ["minute", 60], ["second", 60]]
      elapsed = DateTime.now - date
      tense = elapsed > 0 ? "since" : "until"
      interval = 1.0
      parts = intervals.collect do |name, new_interval|
        interval /= new_interval
        number, elapsed = elapsed.abs.divmod(interval)
        "#{number.to_i} #{name}#{"s" unless number == 1}"
      end
      puts "#{parts.join(", ")} #{tense} #{event}."
    end

    remaining(DateTime.new(2006, 4, 15, 0, 0, 0, DateTime.now.offset),
              "the book deadline")
    # 27 days, 4 hours, 16 minutes, 9 seconds until the book deadline.
    remaining(DateTime.new(1999, 4, 23, 8, 0, 0, DateTime.now.offset),
              "the Math 114A final")
    # 2521 days, 11 hours, 43 minutes, 50 seconds since the Math 114A final.

See Also
• Recipe 3.5, “Doing Date Arithmetic”

3.7 Converting Between Time Zones

Problem
You want to change a time object so that it represents the same moment of time in some other time zone.

Solution
The most common time zone conversions are the conversion of system local time to UTC, and the conversion of UTC to local time. These conversions are easy for both Time and DateTime objects.

The Time#gmtime method modifies a Time object in place, converting it to UTC.
The Time#localtime method converts in the opposite direction:

    now = Time.now # => Sat Mar 18 20:15:58 EST 2006
    now = now.gmtime # => Sun Mar 19 01:15:58 UTC 2006
    now = now.localtime # => Sat Mar 18 20:15:58 EST 2006

The DateTime#new_offset method converts a DateTime object from one time zone to another. You must pass in the destination time zone’s offset from UTC; to convert local time to UTC, pass in zero. Since DateTime objects are immutable, this method creates a new object identical to the old DateTime object, except for the time zone offset:

    require 'date'
    local = DateTime.now
    local.to_s # => "2006-03-18T20:15:58-0500"
    utc = local.new_offset(0)
    utc.to_s # => "2006-03-19T01:15:58Z"

To convert a UTC DateTime object to local time, you’ll need to call DateTime#new_offset and pass in the numeric offset for your local time zone. The easiest way to get this offset is to call offset on a DateTime object known to be in local time. The offset will usually be a rational number with a denominator of 24:

    local = DateTime.now
    utc = local.new_offset
    local.offset # => Rational(-5, 24)
    local_from_utc = utc.new_offset(local.offset)
    local_from_utc.to_s # => "2006-03-18T20:15:58-0500"
    local == local_from_utc # => true

Discussion
Time objects created with Time.at, Time.local, Time.mktime, Time.new, and Time.now are created using the current system time zone. Time objects created with Time.gm and Time.utc are created using the UTC time zone. Time objects can represent any time zone, but it’s difficult to use a time zone with Time other than local time or UTC.

Suppose you need to convert local time to some time zone other than UTC.
If you know the UTC offset for the destination time zone, you can represent it as a fraction of a day and pass it into DateTime#new_offset:

    # Convert local (Eastern) time to Pacific time.
    # Both zones are on standard time on this date, so Pacific is UTC-8.
    eastern = DateTime.now
    eastern.to_s # => "2006-03-18T20:15:58-0500"

    pacific_offset = Rational(-8, 24)
    pacific = eastern.new_offset(pacific_offset)
    pacific.to_s # => "2006-03-18T17:15:58-0800"

DateTime#new_offset can convert between arbitrary time zone offsets, so for time zone conversions, it’s easiest to use DateTime objects and convert back to Time objects if necessary. But DateTime objects only understand time zones in terms of numeric UTC offsets. How can you convert a date and time to UTC when all you know is that the time zone is called “WET”, “Zulu”, or “Asia/Tashkent”?

On Unix systems, you can temporarily change the “system” time zone for the current process. The C library underlying the Time class knows about an enormous number of time zones (this “zoneinfo” database is usually located in /usr/share/zoneinfo/, if you want to look at the available time zones). You can tap this knowledge by setting the environment variable TZ to an appropriate value, forcing the Time class to act as though your computer were in some other time zone.
Here’s a method that uses this trick to convert a Time object to any time zone supported by the underlying C library:

    class Time
      def convert_zone(to_zone)
        original_zone = ENV["TZ"]
        utc_time = dup.gmtime
        ENV["TZ"] = to_zone
        to_zone_time = utc_time.localtime
        ENV["TZ"] = original_zone
        return to_zone_time
      end
    end

Let’s do a number of conversions of a local (Eastern) time to other time zones across the world:

    t = Time.at(1000000000) # => Sat Sep 08 21:46:40 EDT 2001

    t.convert_zone("US/Pacific") # => Sat Sep 08 18:46:40 PDT 2001
    t.convert_zone("US/Alaska") # => Sat Sep 08 17:46:40 AKDT 2001
    t.convert_zone("UTC") # => Sun Sep 09 01:46:40 UTC 2001
    t.convert_zone("Turkey") # => Sun Sep 09 04:46:40 EEST 2001

Note that some time zones, like India’s, are half an hour offset from most others:

    t.convert_zone("Asia/Calcutta") # => Sun Sep 09 07:16:40 IST 2001

By setting the TZ environment variable before creating a Time object, you can represent the time in any time zone. The following code converts Lagos time to Singapore time, regardless of the “real” underlying time zone.

    ENV["TZ"] = "Africa/Lagos"
    t = Time.at(1000000000) # => Sun Sep 09 02:46:40 WAT 2001
    ENV["TZ"] = nil

    t.convert_zone("Singapore") # => Sun Sep 09 09:46:40 SGT 2001

    # Just to prove it's the same time as before:
    t.convert_zone("US/Eastern") # => Sat Sep 08 21:46:40 EDT 2001

Since the TZ environment variable is global to a process, you’ll run into problems if you have multiple threads trying to convert time zones at once.

See Also
• Recipe 3.9, “Converting Between Time and DateTime Objects”
• Recipe 3.8, “Checking Whether Daylight Saving Time Is in Effect”
• Information on the “zoneinfo” database (http://www.twinsun.com/tz/tz-link.htm)

3.8 Checking Whether Daylight Saving Time Is in Effect

Problem
You want to see whether the current time in your locale is normal time or Daylight Saving/Summer Time.
Solution
Create a Time object and check its isdst method:

    Time.local(2006, 1, 1) # => Sun Jan 01 00:00:00 EST 2006
    Time.local(2006, 1, 1).isdst # => false
    Time.local(2006, 10, 1) # => Sun Oct 01 00:00:00 EDT 2006
    Time.local(2006, 10, 1).isdst # => true

Discussion
Time objects representing UTC times will always return false when isdst is called, because UTC is the same year-round. Other Time objects will consult the daylight saving time rules for the time locale used to create the Time object. This is usually the system locale on the computer you used to create it: see Recipe 3.7 for information on changing it. The following code demonstrates some of the rules pertaining to Daylight Saving Time across the United States:

    eastern = Time.local(2006, 10, 1) # => Sun Oct 01 00:00:00 EDT 2006
    eastern.isdst # => true

    ENV['TZ'] = 'US/Pacific'
    pacific = Time.local(2006, 10, 1) # => Sun Oct 01 00:00:00 PDT 2006
    pacific.isdst # => true

    # Except for the Navajo Nation, Arizona doesn't use Daylight Saving Time.
    ENV['TZ'] = 'America/Phoenix'
    arizona = Time.local(2006, 10, 1) # => Sun Oct 01 00:00:00 MST 2006
    arizona.isdst # => false

    # Finally, restore the original time zone.
    ENV['TZ'] = nil

The C library on which Ruby’s Time class is based handles the complex rules for Daylight Saving Time across the history of a particular time zone or locale. For instance, Daylight Saving Time was mandated across the U.S. in 1918, but abandoned in most locales shortly afterwards. The “zoneinfo” file used by the C library contains this information, along with many other rules:

    # Daylight saving first took effect on March 31, 1918.
    Time.local(1918, 3, 31).isdst # => false
    Time.local(1918, 4, 1).isdst # => true
    Time.local(1919, 4, 1).isdst # => true

    # The federal law was repealed later in 1919, but some places
    # continued to use Daylight Saving Time.
    ENV['TZ'] = 'US/Pacific'
    Time.local(1920, 4, 1) # => Thu Apr 01 00:00:00 PST 1920

    ENV['TZ'] = nil
    Time.local(1920, 4, 1) # => Thu Apr 01 00:00:00 EDT 1920

    # Daylight Saving Time was reintroduced during the Second World War.
    Time.local(1942,2,9) # => Mon Feb 09 00:00:00 EST 1942
    Time.local(1942,2,10) # => Tue Feb 10 00:00:00 EWT 1942
    # EWT stands for "Eastern War Time"

A U.S. law passed in 2005 expands Daylight Saving Time into March and November, beginning in 2007. Depending on how old your zoneinfo file is, Time objects you create for dates in 2007 and beyond might or might not reflect the new law.

    Time.local(2007, 3, 13) # => Tue Mar 13 00:00:00 EDT 2007
    # Your computer may incorrectly claim this time is EST.

This illustrates a general point. There’s nothing your elected officials love more than passing laws, so you shouldn’t rely on isdst to be accurate for any Time objects that represent times a year or more into the future. When that time actually comes around, Daylight Saving Time might obey different rules in your locale.

The Date class isn’t based on the C library, and knows nothing about time zones or locales, so it also knows nothing about Daylight Saving Time.

See Also
• Recipe 3.7, “Converting Between Time Zones”
• Information on the “zoneinfo” database (http://www.twinsun.com/tz/tz-link.htm)

3.9 Converting Between Time and DateTime Objects

Problem
You’re working with both DateTime and Time objects, created from Ruby’s two standard date/time libraries. You can’t mix these objects in comparisons, iterations, or date arithmetic because they’re incompatible. You want to convert all the objects into one form or another so that you can treat them all the same way.
Solution
To convert a Time object to a DateTime, you’ll need some code like this:

    require 'date'
    class Time
      def to_datetime
        # Convert seconds + microseconds into a fractional number of seconds
        seconds = sec + Rational(usec, 10**6)

        # Convert a UTC offset measured in seconds to one measured in a
        # fraction of a day.
        offset = Rational(utc_offset, 60 * 60 * 24)
        DateTime.new(year, month, day, hour, min, seconds, offset)
      end
    end

    time = Time.gm(2000, 6, 4, 10, 30, 22, 4010)
    # => Sun Jun 04 10:30:22 UTC 2000
    time.to_datetime.to_s
    # => "2000-06-04T10:30:22Z"

Converting a DateTime to a Time is similar; you just need to decide whether you want the Time object to use local time or GMT. This code adds the conversion method to Date, the superclass of DateTime, so it will work on both Date and DateTime objects.

    class Date
      def to_gm_time
        to_time(new_offset, :gm)
      end

      def to_local_time
        to_time(new_offset(DateTime.now.offset-offset), :local)
      end

      private
      def to_time(dest, method)
        # Convert a fraction of a day to a number of microseconds
        usec = (dest.sec_fraction * 60 * 60 * 24 * (10**6)).to_i
        Time.send(method, dest.year, dest.month, dest.day, dest.hour, dest.min,
                  dest.sec, usec)
      end
    end

    (datetime = DateTime.new(1990, 10, 1, 22, 16, Rational(41,2))).to_s
    # => "1990-10-01T22:16:20Z"
    datetime.to_gm_time
    # => Mon Oct 01 22:16:20 UTC 1990
    datetime.to_local_time
    # => Mon Oct 01 17:16:20 EDT 1990

Discussion
Ruby’s two ways of representing dates and times don’t coexist very well. But since neither can be a total substitute for the other, you’ll probably use them both during your Ruby career.
The conversion methods let you get around incompatibilities by simply converting one type to the other:

    time < datetime
    # ArgumentError: comparison of Time with DateTime failed
    time.to_datetime < datetime
    # => false
    time < datetime.to_gm_time
    # => false

    time - datetime
    # TypeError: can't convert DateTime into Float
    (time.to_datetime - datetime).to_f
    # => 3533.50973962975 # Measured in days
    time - datetime.to_gm_time
    # => 305295241.50401 # Measured in seconds

The methods defined above are reversible: you can convert back and forth between Time and DateTime objects without losing accuracy.

    time # => Sun Jun 04 10:30:22 UTC 2000
    time.usec # => 4010

    time.to_datetime.to_gm_time # => Sun Jun 04 10:30:22 UTC 2000
    time.to_datetime.to_gm_time.usec # => 4010

    datetime.to_s # => "1990-10-01T22:16:20Z"
    datetime.to_gm_time.to_datetime.to_s # => "1990-10-01T22:16:20Z"

Once you can convert between Time and DateTime objects, it’s simple to write code that normalizes a mixed array, so that all its elements end up being of the same type. This method tries to turn a mixed array into an array containing only Time objects. If it encounters a date that won’t fit within the constraints of the Time class, it starts over and converts the array into an array of DateTime objects instead (thus losing any information about Daylight Saving Time):

    def normalize_time_types(array)
      # Don't do anything if all the objects are already of the same type.
      first_class = array[0].class
      # Treat DateTime as its superclass, Date, so is_a? matches both.
      first_class = first_class.superclass if first_class == DateTime
      return unless array.detect { |x| !x.is_a?(first_class) }

      normalized = array.collect do |t|
        if t.is_a?(Date)
          begin
            t.to_local_time
          rescue ArgumentError # Time out of range; convert to DateTimes instead.
            convert_to = DateTime
            break
          end
        else
          t
        end
      end

      unless normalized
        normalized = array.collect { |t| t.is_a?(Time) ?
                                     t.to_datetime : t }
      end
      return normalized
    end

When all objects in a mixed array can be represented as either Time or DateTime objects, this method makes them all Time objects:

    mixed_array = [Time.now, DateTime.now]
    # => [Sat Mar 18 22:17:10 EST 2006,
    # #]
    normalize_time_types(mixed_array)
    # => [Sat Mar 18 22:17:10 EST 2006, Sun Mar 19 03:17:10 EST 2006]

If one of the DateTime objects can’t be represented as a Time, normalize_time_types turns all the objects into DateTime instances. This code is run on a system with a 32-bit time counter:

    mixed_array << DateTime.civil(1776, 7, 4)
    normalize_time_types(mixed_array).collect { |x| x.to_s }
    # => ["2006-03-18T22:17:10-0500", "2006-03-18T22:17:10-0500",
    # => "1776-07-04T00:00:00Z"]

See Also
• Recipe 3.1, “Finding Today’s Date”

3.10 Finding the Day of the Week

Problem
You want to find the day of the week for a certain date.

Solution
Use the wday method (supported by both Time and DateTime) to find the day of the week as a number between 0 and 6. Sunday is day zero.

The following code yields to a code block the date of every Sunday between two dates. It uses wday to find the first Sunday following the start date (keeping in mind that the first date may itself be a Sunday). Then it adds seven days at a time to get subsequent Sundays:

    def every_sunday(d1, d2)
      # You can use 1.day instead of 60*60*24 if you're using Rails.
      one_day = d1.is_a?(Time) ? 60*60*24 : 1
      sunday = d1 + ((7-d1.wday) % 7) * one_day
      while sunday < d2
        yield sunday
        sunday += one_day * 7
      end
    end

    def print_every_sunday(d1, d2)
      every_sunday(d1, d2) { |sunday| puts sunday.strftime("%x")}
    end

    print_every_sunday(Time.local(2006, 1, 1), Time.local(2006, 2, 4))
    # 01/01/06
    # 01/08/06
    # 01/15/06
    # 01/22/06
    # 01/29/06

Discussion
The most commonly used parts of a time are its calendar and clock readings: year, day, hour, and so on.
Time and DateTime let you access these, but they also give you access to a few other aspects of a time: the Julian day of the year (yday), and, more usefully, the day of the week (wday).

The every_sunday method will accept either two Time objects or two DateTime objects. The only difference is the number you need to add to an object to increment it by one day. If you’re only going to be using one kind of object, you can simplify the code a little.

To get the day of the week as an English string, use the strftime directives %A and %a:

    t = Time.local(2006, 1, 1)
    t.strftime("%A %A %A!")                # => "Sunday Sunday Sunday!"
    t.strftime("%a %a %a!")                # => "Sun Sun Sun!"

You can find the day of the week and the day of the year, but Ruby has no built-in method for finding the week of the year (there is a method to find the commercial week of the year; see Recipe 3.11). If you need such a method, it’s not hard to create one using the day of the year and the day of the week. This code defines a week method in a module, which it mixes in to both Date and Time:

    require 'date'

    module Week
      def week
        (yday + 7 - wday) / 7
      end
    end

    class Date
      include Week
    end

    class Time
      include Week
    end

    saturday = DateTime.new(2005, 1, 1)
    saturday.week                          # => 0
    (saturday+1).week                      # => 1   # Sunday, January 2
    (saturday-1).week                      # => 52  # Friday, December 31

See Also

• Recipe 3.3, “Printing a Date”
• Recipe 3.5, “Doing Date Arithmetic”
• Recipe 3.11, “Handling Commercial Dates”

3.11 Handling Commercial Dates

Problem

When writing a business or financial application, you need to deal with commercial dates instead of civil or calendar dates.

Solution

DateTime offers some methods for working with commercial dates. Date#cwday gives the commercial day of the week, Date#cweek gives the commercial week of the year, and Date#cwyear gives the commercial year.

Consider January 1, 2006.
This was the first day of calendar 2006, but since it was a Sunday, it was the last day of commercial 2005:

    require 'date'
    sunday = DateTime.new(2006, 1, 1)
    sunday.year                            # => 2006
    sunday.cwyear                          # => 2005
    sunday.cweek                           # => 52
    sunday.wday                            # => 0
    sunday.cwday                           # => 7

Commercial 2006 started on the first weekday in 2006:

    monday = sunday + 1
    monday.cwyear                          # => 2006
    monday.cweek                           # => 1

Discussion

Unless you’re writing an application that needs to use commercial dates, you probably don’t care about this, but it’s kind of interesting (if you think dates are interesting). The commercial week starts on Monday, not Sunday, because Sunday’s part of the weekend. DateTime#cwday is just like DateTime#wday, except it gives Sunday a value of seven instead of zero. This means that DateTime#cwday has a range from one to seven instead of from zero to six:

    (sunday...sunday+7).each do |d|
      puts "#{d.strftime("%a")} #{d.wday} #{d.cwday}"
    end
    # Sun 0 7
    # Mon 1 1
    # Tue 2 2
    # Wed 3 3
    # Thu 4 4
    # Fri 5 5
    # Sat 6 6

The cweek and cwyear methods have to do with the commercial year, which follows the ISO 8601 week-numbering rules: commercial week 1 is the Monday-through-Sunday week that contains the year’s first Thursday. Any days at the start of January that fall before that week’s Monday are considered part of the previous commercial year. The example given in the Solution demonstrates this: January 1, 2006 was a Sunday, so by the commercial reckoning it was part of the last week of 2005.

See Also

• See Recipe 3.3, “Printing a Date,” for the strftime directives used to print parts of commercial dates

3.12 Running a Code Block Periodically

Problem

You want to run some Ruby code (such as a call to a shell command) repeatedly at a certain interval.

Solution

Create a method that runs a code block, then sleeps until it’s time to run the block again:

    def every_n_seconds(n)
      loop do
        before = Time.now
        yield
        interval = n-(Time.now-before)
        sleep(interval) if interval > 0
      end
    end

    every_n_seconds(5) do
      puts "At the beep, the time will be #{Time.now.strftime("%X")}... beep!"
    end
    # At the beep, the time will be 12:21:28... beep!
    # At the beep, the time will be 12:21:33... beep!
    # At the beep, the time will be 12:21:38... beep!
    # ...

Discussion

There are two main times when you’d want to run some code periodically. The first is when you actually want something to happen at a particular interval: say you’re appending your status to a log file every 10 seconds. The other is when you would prefer for something to happen continuously, but putting it in a tight loop would be bad for system performance. In this case, you compromise by putting some slack time in the loop so that your code isn’t always running.

The implementation of every_n_seconds deducts from the sleep time the time spent running the code block. This ensures that calls to the code block are spaced evenly apart, as close to the desired interval as possible. If you tell every_n_seconds to call a code block every five seconds, but the code block takes four seconds to run, every_n_seconds only sleeps for one second. If the code block takes six seconds to run, every_n_seconds won’t sleep at all: it’ll come back from a call to the code block, and immediately yield to the block again.

If you always want to sleep for a certain interval, no matter how long the code block takes to run, you can simplify the code:

    def every_n_seconds(n)
      loop do
        yield
        sleep(n)
      end
    end

In most cases, you don’t want every_n_seconds to take over the main loop of your program. Here’s a version of every_n_seconds that spawns a separate thread to run your task.
If your code block stops the loop with the break keyword, the thread stops running:

    def every_n_seconds(n)
      thread = Thread.new do
        while true
          before = Time.now
          yield
          interval = n-(Time.now-before)
          sleep(interval) if interval > 0
        end
      end
      return thread
    end

In this snippet, I use every_n_seconds to spy on a file, waiting for people to modify it:

    def monitor_changes(file, resolution=1)
      last_change = Time.now
      every_n_seconds(resolution) do
        check = File.stat(file).ctime
        if check > last_change
          yield file
          last_change = check
        elsif Time.now - last_change > 60
          puts "Nothing's happened for a minute, I'm bored."
          break
        end
      end
    end

That example might give output like this, if someone on the system is working on the file /tmp/foo:

    thread = monitor_changes("/tmp/foo") { |file| puts "Someone changed #{file}!" }
    # Someone changed /tmp/foo!
    # Someone changed /tmp/foo!
    # Nothing's happened for a minute, I'm bored.
    thread.status                          # => false

See Also

• Recipe 3.13, “Waiting a Certain Amount of Time”
• Recipe 23.4, “Running Periodic Tasks Without cron or at”

3.13 Waiting a Certain Amount of Time

Problem

You want to pause your program, or a single thread of it, for a specific amount of time.

Solution

The Kernel#sleep method takes a floating-point number and puts the current thread to sleep for some (possibly fractional) number of seconds:

    3.downto(1) { |i| puts "#{i}..."; sleep(1) }; puts "Go!"
    # 3...
    # 2...
    # 1...
    # Go!

    Time.new                               # => Sat Mar 18 21:17:58 EST 2006
    sleep(10)
    Time.new                               # => Sat Mar 18 21:18:08 EST 2006
    sleep(1)
    Time.new                               # => Sat Mar 18 21:18:09 EST 2006

    # Sleep for less than a second.
    Time.new.usec                          # => 377185
    sleep(0.1)
    Time.new.usec                          # => 479230

Discussion

Timers are often used when a program needs to interact with a source much slower than a computer’s CPU: a network pipe, or human eyes and hands.
Rather than constantly poll for new data, a Ruby program can sleep for a fraction of a second between each poll, giving other programs on the CPU a chance to run. That’s not much time by human standards, but sleeping for a fraction of a second at a time can greatly improve a system’s overall performance.

You can pass any floating-point number to sleep, but that gives an exaggerated picture of how finely you can control a thread’s sleeping time. For instance, you can’t sleep for 10^-50 seconds, because it’s physically impossible (that’s less than the Planck time). You can’t sleep for Float::EPSILON seconds, because that’s almost certainly less than the resolution of your computer’s timer.

You probably can’t even reliably sleep for a microsecond, even though most modern computer clocks have microsecond precision. By the time your sleep command is processed by the Ruby interpreter and the thread actually starts waiting for its timer to go off, some small amount of time has already elapsed. At very small intervals, this time can be greater than the time you asked Ruby to sleep in the first place.

Here’s a simple benchmark that shows how long sleep on your system will actually make a thread sleep. It starts with a sleep interval of one second, which is fairly accurate.
It then sleeps for shorter and shorter intervals, with lessening accuracy each time:

    interval = 1.0
    10.times do |x|
      t1 = Time.new
      sleep(interval)
      actual = Time.new - t1

      difference = (actual-interval).abs
      percent_difference = difference / interval * 100
      printf("Expected: %.9f Actual: %.6f Difference: %.6f (%.2f%%)\n",
             interval, actual, difference, percent_difference)

      interval /= 10
    end
    # Expected: 1.000000000 Actual: 0.999420 Difference: 0.000580 (0.06%)
    # Expected: 0.100000000 Actual: 0.099824 Difference: 0.000176 (0.18%)
    # Expected: 0.010000000 Actual: 0.009912 Difference: 0.000088 (0.88%)
    # Expected: 0.001000000 Actual: 0.001026 Difference: 0.000026 (2.60%)
    # Expected: 0.000100000 Actual: 0.000913 Difference: 0.000813 (813.00%)
    # Expected: 0.000010000 Actual: 0.000971 Difference: 0.000961 (9610.00%)
    # Expected: 0.000001000 Actual: 0.000975 Difference: 0.000974 (97400.00%)
    # Expected: 0.000000100 Actual: 0.000015 Difference: 0.000015 (14900.00%)
    # Expected: 0.000000010 Actual: 0.000024 Difference: 0.000024 (239900.00%)
    # Expected: 0.000000001 Actual: 0.000016 Difference: 0.000016 (1599900.00%)

A small amount of the reported time comes from overhead, caused by creating the second Time object, but not enough to affect these results. On my system, if I tell Ruby to sleep for a tenth of a millisecond, the time spent running the sleep call greatly exceeds the time I wanted to sleep in the first place! According to this benchmark, the shortest length of time for which I can expect sleep to accurately sleep is about 1/100 of a second.

You might think to get better sleep resolution by putting the CPU into a tight loop with a certain number of repetitions. Apart from the obvious problems (this hurts system performance, and the same loop will run faster over time since computers are always getting faster), this isn’t even reliable.
The operating system doesn’t know you’re trying to run a timing loop: it just sees you using the CPU, and it can interrupt your loop at any time, for any length of time, to let some other process use the CPU. Unless you’re on an embedded operating system where you can control exactly what the CPU does, the only reliable way to wait for a specific period of time is with sleep.

Waking up early

The sleep method will end early if the thread that calls it has its run method called. If you want a thread to sleep until another thread wakes it up, use Thread.stop:

    alarm = Thread.new(self) { sleep(5); Thread.main.wakeup }
    puts "Going to sleep for 1000 seconds at #{Time.new}..."
    sleep(1000); puts "Woke up at #{Time.new}!"
    # Going to sleep for 1000 seconds at Thu Oct 27 14:45:14 PDT 2005...
    # Woke up at Thu Oct 27 14:45:19 PDT 2005!

    alarm = Thread.new(self) { sleep(5); Thread.main.wakeup }
    puts "Goodbye, cruel world!";
    Thread.stop;
    puts "I'm back; how'd that happen?"
    # Goodbye, cruel world!
    # I'm back; how'd that happen?

See Also

• Recipe 3.12, “Running a Code Block Periodically”
• Chapter 20
• The Morse Code example in Recipe 21.11, “Making Your Keyboard Lights Blink,” displays an interesting use of sleep

3.14 Adding a Timeout to a Long-Running Operation

Problem

You’re running some code that might take a long time to complete, or might never complete at all. You want to interrupt the code if it takes too long.

Solution

Use the built-in timeout library. The Timeout.timeout method takes a code block and a deadline (in seconds). If the code block finishes running in time, Timeout.timeout returns the block’s result. If the deadline passes and the code block is still running, Timeout.timeout terminates the code block and raises an exception.

The following code would never finish running were it not for the timeout call. But after five seconds, timeout raises a Timeout::Error and execution halts:

    # This code will sleep forever... OR WILL IT?
    require 'timeout'
    before = Time.now
    begin
      status = Timeout.timeout(5) { sleep }
    rescue Timeout::Error
      puts "I only slept for #{Time.now-before} seconds."
    end
    # I only slept for 5.035492 seconds.

Discussion

Sometimes you must make a network connection or take some other action that might be incredibly slow, or that might never complete at all. With a timeout, you can impose an upper limit on how long that operation can take. If it fails, you can try it again later, or forge ahead without the information you were trying to get. Even when you can’t recover, you can report your failure and gracefully exit the program, rather than sitting around forever waiting for the operation to complete.

By default, Timeout.timeout raises a Timeout::Error. You can pass in a custom exception class as the second argument to Timeout.timeout: this saves you from having to rescue the Timeout::Error just so you can raise some other error that your application knows how to handle.

If the code block had side effects, they will still be visible after the timeout kills the code block:

    def count_for_five_seconds
      $counter = 0
      begin
        Timeout::timeout(5) { loop { $counter += 1 } }
      rescue Timeout::Error
        puts "I can count to #{$counter} in 5 seconds."
      end
    end

    count_for_five_seconds
    # I can count to 2532825 in 5 seconds.
    $counter                               # => 2532825

This may mean that your dataset is now in an inconsistent state.

See Also

• ri Timeout
• Recipe 3.13, “Waiting a Certain Amount of Time”
• Recipe 14.1, “Grabbing the Contents of a Web Page”

Chapter 4: Arrays

Like all high-level languages, Ruby has built-in support for arrays, objects that contain ordered lists of other objects. You can use arrays (often in conjunction with hashes) to build and use complex data structures without having to define any custom classes.

An array in Ruby is an ordered list of elements. Each element is a reference to some object, the way a Ruby variable is a reference to some object.
For convenience, throughout this book we usually talk about arrays as though the array elements were the actual objects, not references to the objects. Since Ruby (unlike languages like C) gives no way of manipulating object references directly, the distinction rarely matters.

The simplest way to create a new array is to put a comma-separated list of object references between square brackets. The object references can be predefined variables (my_var), anonymous objects created on the spot ('my string', 4.7, or MyClass.new), or expressions (a+b, object.method). A single array can contain references to objects of many different types:

    a1 = []                                # => []
    a2 = [1, 2, 3]                         # => [1, 2, 3]
    a3 = [1, 2, 3, 'a', 'b', 'c', nil]     # => [1, 2, 3, "a", "b", "c", nil]

    n1 = 4
    n2 = 6
    sum_and_difference = [n1, n2, n1+n2, n1-n2]
    # => [4, 6, 10, -2]

If your array contains only strings, you may find it simpler to build your array by enclosing the strings in the %w{} syntax, separated by whitespace. This saves you from having to write all those quotes and commas:

    %w{1 2 3}                              # => ["1", "2", "3"]
    %w{The rat sat on the mat}
    # => ["The", "rat", "sat", "on", "the", "mat"]

The << operator is the simplest way to add a value to an array. Ruby dynamically resizes arrays as elements are added and removed.

    a = [1, 2, 3]                          # => [1, 2, 3]
    a << 4.0                               # => [1, 2, 3, 4.0]
    a << 'five'                            # => [1, 2, 3, 4.0, "five"]

An array element can be any object reference, including a reference to another array. An array can even contain a reference to itself, though this is usually a bad idea, since it can send your code into infinite loops.

    a = [1,2,3]                            # => [1, 2, 3]
    a << [4, 5, 6]                         # => [1, 2, 3, [4, 5, 6]]
    a << a                                 # => [1, 2, 3, [4, 5, 6], [...]]

As in most other programming languages, the elements of an array are numbered with indexes starting from zero. An array element can be looked up by passing its index into the array index operator [].
The first element of an array can be accessed with a[0], the second with a[1], and so on.

Negative indexes count from the end of the array: the last element of an array can be accessed with a[-1], the second-to-last with a[-2], and so on. See Recipe 4.13 for more ways of using the array indexing operator.

The size of an array is available through the Array#size method. Because the index numbering starts from zero, the index of the last element of an array is the size of the array, minus one.

    a = [1, 2, 3, [4, 5, 6]]
    a.size                                 # => 4
    a << a                                 # => [1, 2, 3, [4, 5, 6], [...]]
    a.size                                 # => 5

    a[0]                                   # => 1
    a[3]                                   # => [4, 5, 6]
    a[3][0]                                # => 4
    a[3].size                              # => 3

    a[-2]                                  # => [4, 5, 6]
    a[-1]                                  # => [1, 2, 3, [4, 5, 6], [...]]
    a[a.size-1]                            # => [1, 2, 3, [4, 5, 6], [...]]

    a[-1][-1]                              # => [1, 2, 3, [4, 5, 6], [...]]
    a[-1][-1][-1]                          # => [1, 2, 3, [4, 5, 6], [...]]

All languages with arrays have constructs for iterating over them (even if it’s just a for loop). Languages like Java and Python have general iterator methods similar to Ruby’s, but they’re usually used for iterating over arrays. In Ruby, iterators are the standard way of traversing all data structures: array iterators are just their simplest manifestation.

Ruby’s array iterators deserve special study because they’re Ruby’s simplest and most accessible iterator methods. If you come to Ruby from another language, you’ll probably start off thinking of iterator methods as letting you treat aspects of a data structure “like an array.” Recipe 4.1 covers the basic array iterator methods, including ones in the Enumerable module that you’ll encounter over and over again in different contexts.

The Set class, included in Ruby’s standard library, is a useful alternative to the Array class for many basic algorithms. A Ruby set models a mathematical set: sets are not ordered, and cannot contain more than one reference to the same object. For more about sets, see Recipes 4.14 and 4.15.
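
As a rough illustration of the difference between an array and a set, here is a minimal sketch that uses only the standard library's Set class. The variable names and sample values are made up for this example and do not come from the recipes that follow.

```ruby
require 'set'

# An array keeps every reference it is given, duplicates included.
array = [3, 1, 3, 2, 1]

# A set built from the same elements keeps at most one copy of each.
set = Set.new(array)

puts array.size          # 5
puts set.size            # 3
puts set.include?(2)     # true

# Adding an element the set already holds changes nothing.
set << 3
puts set.size            # 3
```

If you only ever ask "is this object in the collection?", a set is probably the better fit; if position or repetition matters, stay with an array.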
4.1 Iterating Over an Array

Problem

You want to perform some operation on each item in an array.

Solution

Iterate over the array with Array#each. Put into a block the code you want to execute for each item in the array.

    [1, 2, 3, 4].each { |x| puts x }
    # 1
    # 2
    # 3
    # 4

If you want to produce a new array based on a transformation of some other array, use Enumerable#collect along with a block that takes one element and transforms it:

    [1, 2, 3, 4].collect { |x| x ** 2 }    # => [1, 4, 9, 16]

Discussion

Ruby supports for loops and the other iteration constructs found in most modern programming languages, but its preferred idiom is a code block fed to a method like each or collect.

Methods like each and collect are called generators or iterators: they iterate over a data structure, yielding one element at a time to whatever code block you’ve attached. Once your code block completes, they continue the iteration and yield the next item in the data structure (according to whatever definition of “next” the generator supports). These methods are covered in detail in Chapter 7.

In a method like each, the return value of the code block, if any, is ignored. Methods like collect take a more active role. After they yield an element of a data structure to a code block, they use the return value in some way. The collect method uses the return value of its attached block as an element in a new array.

Although commonly used in arrays, the collect method is actually defined in the Enumerable module, which the Array class includes. Many other Ruby classes (Hash and Range are just two) include the Enumerable methods; it’s a sort of baseline for Ruby objects that provide iterators. Though Enumerable does not define the each method, it must be defined by any class that includes Enumerable, so you’ll see that method a lot, too. This is covered in Recipe 9.4.

If you need to have the array indexes along with the array elements, use Enumerable#each_with_index.

    ['a', 'b', 'c'].each_with_index do |item, index|
      puts "At position #{index}: #{item}"
    end
    # At position 0: a
    # At position 1: b
    # At position 2: c

Ruby’s Array class also defines several generators not seen in Enumerable. For instance, to iterate over a list in reverse order, use the reverse_each method:

    [1, 2, 3, 4].reverse_each { |x| puts x }
    # 4
    # 3
    # 2
    # 1

Enumerable#collect has a destructive equivalent: Array#collect!, also known as Array#map! (a helpful alias for Python programmers). This method acts just like collect, but instead of creating a new array to hold the return values of its calls to the code block, it replaces each item in the old array with the corresponding value from the code block. This saves memory and time, but it destroys the old array:

    array = ['a', 'b', 'c']
    array.collect! { |x| x.upcase }
    array                                  # => ["A", "B", "C"]
    array.map! { |x| x.downcase }
    array                                  # => ["a", "b", "c"]

If you need to skip certain elements of an array, you can use the iterator methods Range#step and Integer#upto instead of Array#each. These methods generate a sequence of numbers that you can use as successive indexes into an array.

    array = ['junk', 'junk', 'junk', 'val1', 'val2']
    3.upto(array.length-1) { |i| puts "Value #{array[i]}" }
    # Value val1
    # Value val2

    array = ['1', 'a', '2', 'b', '3', 'c']
    (0..array.length-1).step(2) do |i|
      puts "Letter #{array[i]} is #{array[i+1]}"
    end
    # Letter 1 is a
    # Letter 2 is b
    # Letter 3 is c

Like most other programming languages, Ruby lets you define for, while, and until loops, but you shouldn’t need them very often. The for construct is equivalent to each, whether it’s applied to an array or a range:

    for element in ['a', 'b', 'c']
      puts element
    end
    # a
    # b
    # c

    for element in (1..3)
      puts element
    end
    # 1
    # 2
    # 3

The while and until constructs take a boolean expression and execute the loop while the expression is true (while) or until it becomes true (until).
All three of the following code snippets generate the same output:

    array = ['cherry', 'strawberry', 'orange']

    for index in (0...array.length)
      puts "At position #{index}: #{array[index]}"
    end

    index = 0
    while index < array.length
      puts "At position #{index}: #{array[index]}"
      index += 1
    end

    index = 0
    until index == array.length
      puts "At position #{index}: #{array[index]}"
      index += 1
    end

    # At position 0: cherry
    # At position 1: strawberry
    # At position 2: orange

These constructs don’t make for very idiomatic Ruby. You should only need to use them when you’re iterating over a data structure in a way that doesn’t already have an iterator method (for instance, if you’re traversing a custom tree structure). Even then, it’s more idiomatic if you only use them to define your own iterator methods.

The following code is a hybrid of each and reverse_each. It switches back and forth between iterating from the beginning of an array and iterating from its end.

    array = [1,2,3,4,5]
    new_array = []
    front_index = 0
    back_index = array.length-1
    while front_index <= back_index
      new_array << array[front_index]
      front_index += 1
      if front_index <= back_index
        new_array << array[back_index]
        back_index -= 1
      end
    end
    new_array                              # => [1, 5, 2, 4, 3]

That code works, but it becomes reusable when defined as an iterator. Put it into the Array class, and it becomes a universally accessible way of doing iteration, the colleague of each and reverse_each:

    class Array
      def each_from_both_sides
        front_index = 0
        back_index = self.length-1
        while front_index <= back_index
          yield self[front_index]
          front_index += 1
          if front_index <= back_index
            yield self[back_index]
            back_index -= 1
          end
        end
      end
    end

    new_array = []
    [1,2,3,4,5].each_from_both_sides { |x| new_array << x }
    new_array                              # => [1, 5, 2, 4, 3]

This “burning the candle at both ends” behavior can also be defined as a collect-type method: one which constructs a new array out of multiple calls to the attached code block.
The implementation below delegates the actual iteration to the each_from_both_sides method defined above:

    class Array
      def collect_from_both_sides
        new_array = []
        each_from_both_sides { |x| new_array << yield(x) }
        return new_array
      end
    end

    ["ham", "eggs", "and"].collect_from_both_sides { |x| x.capitalize }
    # => ["Ham", "And", "Eggs"]

See Also

• Chapter 7, especially Recipe 7.5, “Writing an Iterator Over a Data Structure,” and Recipe 7.9, “Looping Through Multiple Iterables in Parallel”

4.2 Rearranging Values Without Using Temporary Variables

Problem

You want to rearrange a number of variables, or assign the elements of an array to individual variables.

Solution

Use a single assignment statement. Put the destination variables on the left-hand side, and line each one up with a variable (or expression) on the right side.

A simple swap:

    a = 1
    b = 2
    a, b = b, a
    a                                      # => 2
    b                                      # => 1

A more complex rearrangement:

    a, b, c = :red, :green, :blue
    c, a, b = a, b, c
    a                                      # => :green
    b                                      # => :blue
    c                                      # => :red

You can split out an array into its components:

    array = [:red, :green, :blue]
    c, a, b = array
    a                                      # => :green
    b                                      # => :blue
    c                                      # => :red

You can even use the splat operator to extract items from the front of the array:

    a, b, *c = [12, 14, 178, 89, 90]
    a                                      # => 12
    b                                      # => 14
    c                                      # => [178, 89, 90]

Discussion

Ruby assignment statements are very versatile. When you put a comma-separated list of variables on the left-hand side of an assignment statement, it’s equivalent to assigning each variable in the list the corresponding right-hand value. Not only does this make your code more compact and readable, it frees you from having to keep track of temporary variables when you swap variables.

Ruby works behind the scenes to allocate temporary storage space for variables that would otherwise be overwritten, so you don’t have to do it yourself.
You don’t have to write this kind of code in Ruby:

    a, b = 1, 2
    x = a
    a = b
    b = x

The right-hand side of the assignment statement can get almost arbitrarily complicated:

    a, b = 5, 10
    a, b = b/a, a-1                        # => [2, 4]

    a, b, c = 'A', 'B', 'C'
    a, b, c = [a, b], { b => c }, a
    a                                      # => ["A", "B"]
    b                                      # => {"B"=>"C"}
    c                                      # => "A"

If there are more variables on the left side of the equal sign than on the right side, the extra variables on the left side get assigned nil. This is usually an unwanted side effect.

    a, b = 1, 2
    a, b = b
    a                                      # => 2
    b                                      # => nil

One final nugget of code that is interesting enough to mention even though it has no legitimate use in Ruby: it doesn’t save enough memory to be useful, and it’s slower than doing a swap with an assignment. It’s possible to swap two integer variables using bitwise XOR, without using any additional storage space at all (not even implicitly):

    a, b = rand(1000), rand(1000)          # => [595, 742]
    a = a ^ b                              # => 181
    b = b ^ a                              # => 595
    a = a ^ b                              # => 742

    [a, b]                                 # => [742, 595]

In terms of the cookbook metaphor, this final snippet is a dessert: no nutritional value, but it sure is tasty.

4.3 Stripping Duplicate Elements from an Array

Problem

You want to strip all duplicate elements from an array, or prevent duplicate elements from being added in the first place.

Solution

Use Array#uniq to create a new array, based on an existing array but with no duplicate elements. Array#uniq! strips duplicate elements from an existing array.

    survey_results = [1, 2, 7, 1, 1, 5, 2, 5, 1]
    distinct_answers = survey_results.uniq # => [1, 2, 7, 5]
    survey_results.uniq!
    survey_results                         # => [1, 2, 7, 5]

To ensure that duplicate values never get into your list, use a Set instead of an array. If you try to add a duplicate element to a Set, nothing will happen.

    require 'set'
    survey_results = [1, 2, 7, 1, 1, 5, 2, 5, 1]
    distinct_answers = survey_results.to_set
    # => #<Set: {...}>   (a set holding 1, 2, 7, and 5)

    games = [["Alice", "Bob"], ["Carol", "Ted"],
             ["Alice", "Mallory"], ["Ted", "Bob"]]
    players = games.inject(Set.new) { |set, game| game.each { |p| set << p }; set }
    # => #<Set: {...}>   (a set holding the five distinct player names)

    players << "Ted"
    # => #<Set: {...}>   (unchanged: "Ted" was already a member)

Discussion

The common element between these two solutions is the hash (see Chapter 5). Array#uniq iterates over an array, using each element as a key in a hash that it always checks to see if it encountered an element earlier in the iteration. A Set keeps the same kind of hash from the beginning, and rejects elements already in the hash. You see something that acts like an array, but it won’t accept duplicates. In either case, because the elements are used as hash keys, two objects are considered “duplicates” if they have the same hash value and are equal according to eql?.

The return value of Array#uniq is itself an array, and nothing prevents you from adding duplicate elements to it later on. If you want to start enforcing uniqueness in perpetuity, you should turn the array into a Set instead of calling uniq. Requiring the set library will define a new method Enumerable#to_set, which does this.

Array#uniq preserves the original order of the array (that is, the first instance of an object remains in its original location), but a Set has no order, because its internal implementation is a hash. To get array-like order in a Set, combine this recipe with Recipe 5.8 and subclass Set to use an OrderedHash:

    class OrderedSet < Set
      def initialize
        @hash ||= OrderedHash.new
      end
    end

Needing to strip all instances of a particular value from an array is a problem that often comes up. Ruby provides Array#delete for this task, and Array#compact for the special case of removing nil values.

    a = [1, 2, nil, 3, 3, nil, nil, nil, 5]
    a.compact                              # => [1, 2, 3, 3, 5]

    a.delete(3)
    a                                      # => [1, 2, nil, nil, nil, nil, 5]

4.4 Reversing an Array

Problem

Your array is the wrong way around: the last item should be first and the first should be last.
Solution

Use reverse to create a new array with the items reversed. Internal subarrays will not themselves be reversed.

    [1,2,3].reverse                        # => [3, 2, 1]
    [1,[2,3,4],5].reverse                  # => [5, [2, 3, 4], 1]

Discussion

Like many operations on basic Ruby types, reverse has a corresponding method, reverse!, which reverses an array in place:

    a = [1,2,3]
    a.reverse!
    a                                      # => [3, 2, 1]

Don’t reverse an array if you just need to iterate over it backwards. Don’t use a for loop either; the reverse_each iterator is more idiomatic.

See Also

• Recipe 1.4, “Reversing a String by Words or Characters”
• Recipe 4.1, “Iterating Over an Array,” talks about using Array#reverse_each to iterate over an array in reverse order
• Recipe 4.2, “Rearranging Values Without Using Temporary Variables”

4.5 Sorting an Array

Problem

You want to sort an array of objects, possibly according to some custom notion of what “sorting” means.

Solution

Homogeneous arrays of common data types, like strings or numbers, can be sorted “naturally” by just calling Array#sort:

    [5.01, -5, 0, 5].sort                  # => [-5, 0, 5, 5.01]
    ["Utahraptor", "Ankylosaur", "Maiasaur"].sort
    # => ["Ankylosaur", "Maiasaur", "Utahraptor"]

To sort objects based on one of their data members, or by the results of a method call, use Array#sort_by. This code sorts an array of arrays by size, regardless of their contents:

    arrays = [[1,2,3], [100], [10,20]]
    arrays.sort_by { |x| x.size }          # => [[100], [10, 20], [1, 2, 3]]

To do a more general sort, create a code block that compares the relevant aspect of any two given objects. Pass this block into the sort method of the array you want to sort.

This code sorts an array of numbers in ascending numeric order, except that the number 42 will always be at the end of the list:

    [1, 100, 42, 23, 26, 10000].sort do |x, y|
      # Treat 42 as larger than anything, whichever side it appears on.
      if x == 42 then 1
      elsif y == 42 then -1
      else x <=> y
      end
    end
    # => [1, 23, 26, 100, 10000, 42]

Discussion

If there is one “canonical” way to sort a particular class of object, then you can have that class implement the <=> comparison operator. This is how Ruby automatically knows how to sort numbers in ascending order and strings in ascending ASCII order: Numeric and String both implement the comparison operator.

The sort_by method sorts an array using a Schwartzian transform (see Recipe 4.6 for an in-depth discussion). This is the most useful customized sort, because it’s fast and easy to define. In this example, we use sort_by to sort on any one of an object’s fields.

    class Animal
      attr_reader :name, :eyes, :appendages

      def initialize(name, eyes, appendages)
        @name, @eyes, @appendages = name, eyes, appendages
      end

      def inspect
        @name
      end
    end

    animals = [Animal.new("octopus", 2, 8),
               Animal.new("spider", 6, 8),
               Animal.new("bee", 5, 6),
               Animal.new("elephant", 2, 4),
               Animal.new("crab", 2, 10)]

    animals.sort_by { |x| x.eyes }
    # => [octopus, elephant, crab, bee, spider]

    animals.sort_by { |x| x.appendages }
    # => [elephant, bee, octopus, spider, crab]

If you pass a block into sort, Ruby calls the block to make comparisons instead of using the comparison operator. This is the most general possible sort, and it’s useful for cases where sort_by won’t work.

The comparison operator takes one argument: an object against which to compare self. A sort code block takes two arguments, and compares the first against the second in the same way. A call to <=> (or a sort code block) should return –1 if self (or the block’s first argument) is “less than” the given object (and should therefore show up before it in a sorted list). It should return 1 if self is “greater than” the given object (and should show up after it in a sorted list), and 0 if the objects are “equal” (and it doesn’t matter which one shows up first). You can usually avoid remembering this by delegating the return value to some other object’s <=> implementation.
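
The advice about delegating to another object's <=> can be sketched as follows. This is only an illustration under assumed data: the [name, score] pairs and the variable names are hypothetical. The block never spells out -1, 0, or 1 itself; it builds two small arrays and lets Array#<=>, which compares element by element, make the decision.

```ruby
# Hypothetical data: [name, score] pairs, not taken from the recipe.
scores = [["carol", 82], ["alice", 95], ["bob", 82], ["dave", 95]]

# Sort by score, highest first; break ties alphabetically by name.
# Negating the score turns "highest first" into an ascending comparison,
# and Array#<=> supplies the -1/0/1 result.
ranked = scores.sort do |a, b|
  [-a[1], a[0]] <=> [-b[1], b[0]]
end

ranked.each { |name, score| puts "#{name}: #{score}" }
# alice: 95
# dave: 95
# bob: 82
# carol: 82
```

When the sort key is cheap to compute, as it is here, the same ordering could also be obtained with sort_by and a block that returns the two-element key array.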

See Also
• Recipe 4.6, “Ignoring Case When Sorting Strings,” covers the workings of the Schwartzian Transform
• Recipe 4.7, “Making Sure a Sorted Array Stays Sorted”
• Recipe 4.10, “Shuffling an Array”
• If you need to find the minimum or maximum item in a list according to some criteria, don’t sort it just to save writing some code; see Recipe 4.11, “Getting the N Smallest Items of an Array,” for other options

4.6 Ignoring Case When Sorting Strings

Problem
When you sort a list of strings, the strings beginning with uppercase letters sort before the strings beginning with lowercase letters.

    list = ["Albania", "anteater", "zorilla", "Zaire"]
    list.sort
    # => ["Albania", "Zaire", "anteater", "zorilla"]

You want an alphabetical sort, regardless of case.

Solution
Use Array#sort_by. This is both the fastest and the shortest solution.

    list.sort_by { |x| x.downcase }
    # => ["Albania", "anteater", "Zaire", "zorilla"]

Discussion
The Array#sort_by method was introduced in Recipe 4.5, but it’s worth discussing in detail because it’s so useful. It uses a technique called a Schwartzian Transform. This common technique is like writing the following Ruby code (but it’s a lot faster, because it’s implemented in C):

    list.collect { |s| [s.downcase, s] }.sort.collect { |subarray| subarray[1] }

It works like this: Ruby creates a new array containing two-element subarrays. Each subarray contains a value of String#downcase, along with the original string. This new array is sorted, and then the original strings (now sorted by their values for String#downcase) are recovered from the subarrays. String#downcase is called only once for each string.

A sort is the most common occurrence of this pattern, but it shows up whenever an algorithm calls a particular method on the same objects over and over again.
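To make the three stages of the transform described above concrete, here is a small sketch that runs them one at a time and then counts how often the key block runs under sort_by. The variable names (decorated, sorted_pairs, by_hand, calls) are chosen only for this illustration.

```ruby
list = ["Albania", "anteater", "zorilla", "Zaire"]

# Step 1: decorate. Pair each string with its sort key.
decorated = list.collect { |s| [s.downcase, s] }

# Step 2: sort the pairs. Arrays compare element by element, so the
# lowercase key decides the order and the original string breaks ties.
sorted_pairs = decorated.sort

# Step 3: undecorate. Keep only the original strings.
by_hand = sorted_pairs.collect { |pair| pair[1] }

# sort_by does the same job; count how many times the key block runs.
calls = 0
built_in = list.sort_by { |s| calls += 1; s.downcase }

p by_hand    # => ["Albania", "anteater", "Zaire", "zorilla"]
p built_in   # => ["Albania", "anteater", "Zaire", "zorilla"]
p calls      # => 4
```

The count of 4 is one call per string, which is the saving the transform buys over a sort block that calls downcase on both arguments of every comparison.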
If you’re not sorting, you can’t use Ruby’s internal Schwartzian Transform, but you can save time by caching, or memoizing, the results of each distinct method call.

If you need to implement a Schwartzian Transform in Ruby, it’s faster to use a hash than an array:

    m = {}
    list.sort { |x,y| (m[x] ||= x.downcase) <=> (m[y] ||= y.downcase) }

This technique is especially important if the method you need to call has side effects. You certainly don’t want to call such methods more than once!

See Also
• The Ruby FAQ, question 9.15
• Recipe 4.5, “Sorting an Array”

4.7 Making Sure a Sorted Array Stays Sorted

Problem
You want to make sure an array stays sorted, even as you replace its elements or add new elements to it.

Solution
Subclass Array and override the methods that add items to the array. The new implementations add every new item to a position that maintains the sortedness of the array.

As you can see below, there are a lot of these methods. If you can guarantee that a particular method will never be called, you can get away with not overriding it.

    class SortedArray < Array
      def initialize(*args, &sort_by)
        @sort_by = sort_by || Proc.new { |x,y| x <=> y }
        super(*args)
        sort! &sort_by
      end

      def insert(i, v)
        # The next line could be further optimized to perform a
        # binary search.
        insert_before = index(find { |x| @sort_by.call(x, v) == 1 })
        super(insert_before ? insert_before : -1, v)
      end

      def <<(v)
        insert(0, v)
      end

      alias push <<
      alias unshift <<

Some methods, like collect!, can modify the items in an array, taking them out of sort order. Some methods, like flatten!, can add new elements to strange places in an array. Rather than figuring out a way to implement these methods in a way that preserves the sortedness of the array, we’ll just let them run and then re-sort the array.*

      ["collect!", "flatten!", "[]="].each do |method_name|
        module_eval %{
          def #{method_name}(*args)
            super
            sort! &@sort_by
          end
        }
      end

      def reverse!
        #Do nothing; reversing the array would disorder it.
      end
    end

* We can’t use define_method to define these methods because in Ruby 1.8 you can’t use define_method to create a method that takes a block argument. See Chapter 10 for more on this.

A SortedArray created from an unsorted array will end up sorted:

    a = SortedArray.new([3,2,1])      # => [1, 2, 3]

Discussion
Many methods of Array are much faster on sorted arrays, so it’s often useful to expend some overhead on keeping an array sorted over time. Removing items from a sorted array won’t unsort it, but adding or modifying items can. Keeping a sorted array sorted means intercepting and reimplementing every sneaky way of putting objects into the array.

The SortedArray constructor accepts any code block you can pass into Array#sort, and keeps the array sorted according to that code block. The default code block uses the comparison operator (<=>) used by sort.

    unsorted = ["b", "aa", "a", "cccc", "1", "zzzzz", "k", "z"]
    strings_by_alpha = SortedArray.new(unsorted)
    # => ["1", "a", "aa", "b", "cccc", "k", "z", "zzzzz"]

    strings_by_length = SortedArray.new(unsorted) do |x,y|
      x.length <=> y.length
    end
    # => ["b", "z", "a", "k", "1", "aa", "cccc", "zzzzz"]

The methods that add elements to an array specify where in the array they operate: push operates on the end of the array, and insert operates on a specified spot. SortedArray responds to these methods but it ignores the caller’s request to put elements in a certain place. Every new element is inserted into a position that keeps the array sorted.

    a << -1                           # => [-1, 1, 2, 3]
    a << 1.5                          # => [-1, 1, 1.5, 2, 3]
    a.push(2.5)                       # => [-1, 1, 1.5, 2, 2.5, 3]
    a.unshift(1.6)                    # => [-1, 1, 1.5, 1.6, 2, 2.5, 3]

For methods like collect!
and array assignment ([]=) that allow complex changes to an array, the simplest solution is to allow the changes to go through and then re-sort:

    a = SortedArray.new([10, 6, 4, -4, 200, 100])
    # => [-4, 4, 6, 10, 100, 200]
    a.collect! { |x| x * -1 }         # => [-200, -100, -10, -6, -4, 4]

    a[3] = 25
    a                                 # => [-200, -100, -10, -4, 4, 25]
    # That is, -6 has been replaced by 25 and the array has been re-sorted.

    a[1..2] = [6000, 10, 600, 6]
    a                                 # => [-200, -4, 4, 6, 10, 25, 600, 6000]
    # That is, -100 and -10 have been replaced by 6000, 10, 600, and 6,
    # and the array has been re-sorted.

But with a little more work, we can write a more efficient implementation of array assignment that gives the same behavior. What happens when you run a command like a[0] = 10 on a SortedArray? The first element in the SortedArray is replaced by 10, and the SortedArray is re-sorted. This is equivalent to removing the first element in the array, then adding the value 10 to a place in the array that keeps it sorted.

Array#[]= implements three different types of array assignment, but all three can be modeled as a series of removals followed by a series of insertions. We can use this fact to implement a more efficient version of SortedArray#[]=:

    class SortedArray
      def []=(*args)
        if args.size == 3
          #e.g. "a[6,3] = [1,2,3]"
          start, length, value = args
          slice! Range.new(start, start+length, true)
          (value.respond_to? :each) ? value.each { |x| self << x } : self << value
        elsif args.size == 2
          index, value = args
          if index.is_a? Numeric
            #e.g. "a[0] = 10" (the most common form of array assignment)
            delete_at(index)
            self << value
          elsif index.is_a? Range
            #e.g. "a[0..3] = [1,2,3]"
            slice! index
            (value.respond_to? :each) ? value.each { |x| self << x } : self << value
          else
            #Not supported. Delegate to superclass; will probably give an error.
            super
            sort!(&@sort_by)
          end
        else
          #Not supported. Delegate to superclass; will probably give an error.
          super
          sort!(&@sort_by)
        end
      end
    end

Just as before, the sort will be maintained even when you use array assignment to replace some of a SortedArray’s elements with other objects. But this implementation doesn’t have to re-sort the array every time.

    a = SortedArray.new([1,2,3,4,5,6])
    a[0] = 10
    a                                 # => [2, 3, 4, 5, 6, 10]

    a[0, 2] = [100, 200]
    a                                 # => [4, 5, 6, 10, 100, 200]

    a[1..2] = [-4, 6]
    a                                 # => [-4, 4, 6, 10, 100, 200]

It’s possible to subvert the sortedness of a SortedArray by modifying an object in place in a way that changes its sort order. Since the SortedArray never hears about the change to this object, it has no way of updating itself to move that object to its new sort position:*

    stripes = SortedArray.new(["aardwolf", "zebrafish"])
    stripes[1].upcase!
    stripes                           # => ["aardwolf", "ZEBRAFISH"]
    stripes.sort!                     # => ["ZEBRAFISH", "aardwolf"]

* One alternative is to modify SortedArray#[] so that when you look up an element of the array, you actually get a delegate object that intercepts all of the element’s method calls, and re-sorts the array whenever the user calls a method that modifies the element in place. This is probably overkill.

If this bothers you, you can make a SortedArray keep frozen copies of objects instead of the objects themselves. This solution hurts performance and uses more memory, but it will also prevent objects from being modified after being put into the SortedArray. This code adds a convenience method to Object that makes a frozen copy of the object:

    class Object
      def to_frozen
        f = self
        unless frozen?
          begin
            f = dup.freeze
          rescue TypeError
            #This object can't be duped (e.g. Fixnum); fortunately,
            #it usually can't be modified either
          end
        end
        return f
      end
    end

The FrozenCopySortedArray stores frozen copies of objects instead of the objects themselves:

    class FrozenCopySortedArray < SortedArray
      def insert(i, v)
        insert_before = index(find { |x| x > v })
        super(insert_before ?
              insert_before : -1, v.to_frozen)
      end

      ["initialize", "collect!", "flatten!"].each do |method_name|
        define_method(method_name) do
          super
          each_with_index { |x, i| self[i] = x.to_frozen }
          # No need to sort; by doing an assignment to every element
          # in the array, we've made #insert keep the array sorted.
        end
      end
    end

    stripes = FrozenCopySortedArray.new(["aardwolf", "zebrafish"])
    stripes[1].upcase!
    # TypeError: can't modify frozen string

Unlike a regular array, which can have elements of arbitrarily different data classes, all the elements of a SortedArray must be mutually comparable. For instance, you can mix integers and floating-point numbers within a SortedArray, but you can’t mix integers and strings. Any data set that would cause Array#sort to fail makes an invalid SortedArray:

    [1, "string"].sort
    # ArgumentError: comparison of Fixnum with String failed

    a = SortedArray.new([1])
    a << "string"
    # ArgumentError: comparison of Fixnum with String failed

One other pitfall: operations that create a new object, such as |=, +=, and to_a, will turn a SortedArray into a (possibly unsorted) array.

    a = SortedArray.new([3, 2, 1])    # => [1, 2, 3]
    a += [1, -10]                     # => [1, 2, 3, 1, -10]
    a.class                           # => Array

The simplest way to avoid this is to override these methods to transform the resulting array back into a SortedArray:

    class SortedArray
      def + (other_array)
        SortedArray.new(super)
      end
    end

See Also
• Recipe 4.11, “Getting the N Smallest Items of an Array,” uses a SortedArray
• If you’re going to do a lot of insertions and removals, a red-black tree may be faster than a SortedArray; you can choose from a pure Ruby implementation (http://www.germane-software.com/software/Utilities/RBTree/) and one that uses a C extension for speed (http://www.geocities.co.jp/SiliconValley-PaloAlto/3388/rbtree/README.html)

4.8 Summing the Items of an Array

Problem
You want to add together many objects in an array.

Solution
There are two good ways to accomplish this in Ruby.
Plain vanilla iteration is a simple way to approach the problem:

    collection = [1, 2, 3, 4, 5]
    sum = 0
    collection.each {|i| sum += i}
    sum                               # => 15

However, this is such a common action that Ruby has a special iterator method called inject, which saves a little code:

    collection = [1, 2, 3, 4, 5]
    collection.inject(0) {|sum, i| sum + i}    # => 15

Discussion
Notice that in the inject solution, we didn’t need to define the variable sum outside the scope of iteration. Instead, its scope moved into the iteration. In the example above, the initial value for sum is the first argument to inject. We changed the += to + because the block given to inject is evaluated on each value of the collection, and the sum variable is set to its output every time.

You can think of the inject example as equivalent to the following code:

    collection = [1, 2, 3, 4, 5]
    sum = 0
    sum = sum + 1
    sum = sum + 2
    sum = sum + 3
    sum = sum + 4
    sum = sum + 5

Although inject is the preferred way of summing over a collection, inject is generally a few times slower than each. The speed difference does not grow exponentially, so you don’t need to always be worrying about it as you write code. But after the fact, it’s a good idea to look for inject calls in crucial spots that you can change to use faster iteration methods like each.

Nothing stops you from using other kinds of operators in your inject code blocks. For example, you could multiply:

    collection = [1, 2, 3, 4, 5]
    collection.inject(1) {|total, i| total * i}    # => 120

Many of the other recipes in this book use inject to build data structures or run calculations on them.

See Also
• Recipe 2.8, “Finding Mean, Median, and Mode”
• Recipe 4.12, “Building Up a Hash Using Injection”
• Recipe 5.12, “Building a Histogram”

4.9 Sorting an Array by Frequency of Appearance

Problem
You want to sort an array so that its least-frequently-appearing items come first.

Solution
Build a histogram of the frequencies of the objects in the array, then use it as a lookup table in conjunction with the sort_by method. The following method puts the least-frequently-appearing objects first. Objects that have the same frequency are sorted normally, with the comparison operator.

    module Enumerable
      def sort_by_frequency
        histogram = inject(Hash.new(0)) { |hash, x| hash[x] += 1; hash}
        sort_by { |x| [histogram[x], x] }
      end
    end

    [1,2,3,4,1,2,4,8,1,4,9,16].sort_by_frequency
    # => [3, 8, 9, 16, 2, 2, 1, 1, 1, 4, 4, 4]

Discussion
The sort_by_frequency method uses sort_by, a method introduced in Recipe 4.5 and described in detail in Recipe 4.6. The technique here is a little different from other uses of sort_by, because it sorts by two different criteria. We want to first compare the relative frequencies of two items. If the relative frequencies are equal, we want to compare the items themselves. That way, all the instances of a given item will show up together in the sorted list.

The block you pass to Enumerable#sort_by can return only a single sort key for each object, but that sort key can be an array. Ruby compares two arrays by comparing their corresponding elements, one at a time. As soon as an element of one array is different from an element of another, the comparison stops, returning the comparison of the two different elements. If one of the arrays runs out of elements, the shorter one sorts first. Here are some quick examples:

    [1,2] <=> [0,2]                   # => 1
    [1,2] <=> [1,2]                   # => 0
    [1,2] <=> [2,2]                   # => -1
    [1,2] <=> [1,1]                   # => 1
    [1,2] <=> [1,3]                   # => -1
    [1,2] <=> [1]                     # => 1
    [1,2] <=> [3]                     # => -1
    [1,2] <=> [0,1,2]                 # => 1
    [1,2] <=> []                      # => 1

In our case, all the arrays contain two elements: the relative frequency of an object in the array, and the object itself. If two objects have different frequencies, the first elements of their arrays will differ, and the items will be sorted based on their frequencies.
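To see those two-element keys directly, the sketch below builds the histogram for the Solution's sample array and prints the [frequency, item] key for each distinct item, in sorted order. The variable names items and keys are only for illustration.

```ruby
items = [1, 2, 3, 4, 1, 2, 4, 8, 1, 4, 9, 16]

# Count how many times each item appears.
histogram = Hash.new(0)
items.each { |x| histogram[x] += 1 }

# The [frequency, item] key that sort_by compares for each distinct item.
keys = items.uniq.collect { |x| [histogram[x], x] }

p keys.sort
# => [[1, 3], [1, 8], [1, 9], [1, 16], [2, 2], [3, 1], [3, 4]]
```

Reading the sorted keys left to right gives the order of the distinct items in the sort_by_frequency result: the frequency decides first, and the item itself settles the rest.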
If two items have the same frequency, the first element of each array will be the same. The comparison method will move on to the second array element, which means the two objects will be sorted based on their values.

If you don’t mind elements with the same frequency showing up in an unsorted order, you can speed up the sort a little by comparing only the histogram frequencies:

    module Enumerable
      def sort_by_frequency_faster
        histogram = inject(Hash.new(0)) { |hash, x| hash[x] += 1; hash}
        sort_by { |x| histogram[x] }
      end
    end

    [1,2,3,4,1,2,4,8,1,4,9,16].sort_by_frequency_faster
    # => [16, 8, 3, 9, 2, 2, 4, 1, 1, 4, 4, 1]

To sort the list so that the most-frequently-appearing items show up first, either reverse the result of sort_by_frequency, or multiply the histogram values by -1 when passing them into sort_by:

    module Enumerable
      def sort_by_frequency_descending
        histogram = inject(Hash.new(0)) { |hash, x| hash[x] += 1; hash}
        sort_by { |x| [histogram[x] * -1, x]}
      end
    end

    [1,2,3,4,1,2,4,8,1,4,9,16].sort_by_frequency_descending
    # => [1, 1, 1, 4, 4, 4, 2, 2, 3, 8, 9, 16]

If you want to sort a list by the frequency of its elements, but not have repeated elements actually show up in the sorted list, you can run the list through Array#uniq after sorting it. However, since the keys of the histogram are just the distinct elements of the array, it’s more efficient to sort the keys of the histogram and return those:

    module Enumerable
      def sort_distinct_by_frequency
        histogram = inject(Hash.new(0)) { |hash, x| hash[x] += 1; hash }
        histogram.keys.sort_by { |x| [histogram[x], x] }
      end
    end

    [1,2,3,4,1,2,4,8,1,4,9,16].sort_distinct_by_frequency
    # => [3, 8, 9, 16, 2, 1, 4]

See Also
• Recipe 4.5, “Sorting an Array”
• Recipe 5.12, “Building a Histogram”

4.10 Shuffling an Array

Problem
You want to put the elements of an array in random order.

Solution
The simplest way to shuffle an array (in Ruby 1.8 and above) is to sort it randomly:

    [1,2,3].sort_by { rand }          # => [1, 3, 2]

This is not the fastest way, though.

Discussion
It’s hard to beat a random sort for brevity of code, but it does a lot of extra work. Like any general sort, a random sort will do about n log n variable swaps. But to shuffle a list, it suffices to put a randomly selected element in each position of the list. This can be done with only n variable swaps.

    class Array
      def shuffle!
        each_index do |i|
          j = rand(length-i) + i
          self[j], self[i] = self[i], self[j]
        end
      end

      def shuffle
        dup.shuffle!
      end
    end

If you’re shuffling a very large list, either Array#shuffle or Array#shuffle! will be significantly faster than a random sort. Here’s a real-world example of shuffling using Array#shuffle!:

    class Card
      def initialize(suit, rank)
        @suit = suit
        @rank = rank
      end

      def to_s
        "#{@rank} of #{@suit}"
      end
    end

    class Deck < Array
      attr_reader :cards
      @@suits = %w{Spades Hearts Clubs Diamonds}
      @@ranks = %w{Ace 2 3 4 5 6 7 8 9 10 Jack Queen King}

      def initialize
        @@suits.each { |suit| @@ranks.each { |rank| self << Card.new(suit, rank) } }
      end
    end

    deck = Deck.new
    deck.collect { |card| card.to_s }
    # => ["Ace of Spades", "2 of Spades", "3 of Spades", "4 of Spades", ... ]
    deck.shuffle!
    deck.collect { |card| card.to_s }
    # => ["6 of Clubs", "8 of Diamonds", "2 of Hearts", "5 of Clubs", ... ]

See Also
• Recipe 2.5, “Generating Random Numbers”
• The Facets Core library provides implementations of Array#shuffle and Array#shuffle!

4.11 Getting the N Smallest Items of an Array

Problem
You want to find the smallest few items in an array, or the largest, or the most extreme according to some other measure.

Solution
If you only need to find the single smallest item according to some measure, use Enumerable#min.
By default, it uses the <=> method to see whether one item is “smaller” than another, but you can override this by passing in a code block.

    [3, 5, 11, 16].min
    # => 3
    ["three", "five", "eleven", "sixteen"].min
    # => "eleven"
    ["three", "five", "eleven", "sixteen"].min { |x,y| x.size <=> y.size }
    # => "five"

Similarly, if you need to find the single largest item, use Enumerable#max.

    [3, 5, 11, 16].max
    # => 16
    ["three", "five", "eleven", "sixteen"].max
    # => "three"
    ["three", "five", "eleven", "sixteen"].max { |x,y| x.size <=> y.size }
    # => "sixteen"

By default, arrays are sorted by their natural order: numbers are sorted by value, strings by their position in the ASCII collating sequence (basically alphabetical order, but all uppercase characters precede all lowercase characters). Hence, in the previous examples, “three” is the largest string, and “eleven” the smallest.

It gets more complicated when you need to get a number of the smallest or largest elements according to some measurement: say, the top 5 or the bottom 10. The simplest solution is to sort the list and skim the items you want off of the top or bottom.

    l = [1, 60, 21, 100, -5, 20, 60, 22, 85, 91, 4, 66]
    sorted = l.sort

    #The top 5
    sorted[-5...sorted.size]
    # => [60, 66, 85, 91, 100]

    #The bottom 5
    sorted[0...5]
    # => [-5, 1, 4, 20, 21]

Despite the simplicity of this technique, it’s inefficient to sort the entire list unless the number of items you want to extract approaches the size of the list.

Discussion
The min and max methods work by picking the first element of the array as a “champion,” then iterating over the rest of the list trying to find an element that can beat the current champion on the appropriate metric. When it finds one, that element becomes the new champion. An element that can beat the old champion can also beat any of the other contenders seen up to that point, so one run through the list suffices to find the maximum or minimum.
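The champion idea can be sketched in a few lines of Ruby. This is only an illustration of the single pass described above, not how Enumerable#min is actually implemented (the built-in is written in C), and the name champion_min is invented for the example.

```ruby
# Hypothetical re-implementation of the "champion" pass, for illustration.
# Prefer the built-in Enumerable#min in real code.
def champion_min(items)
  champion = nil
  have_champion = false
  items.each do |x|
    if !have_champion
      # The first element starts out as the champion.
      champion, have_champion = x, true
    else
      # Use the caller's block if given, otherwise the natural order.
      comparison = block_given? ? yield(x, champion) : (x <=> champion)
      champion = x if comparison < 0
    end
  end
  champion
end

words = ["three", "five", "eleven", "sixteen"]
p champion_min([3, 5, 11, 16])                        # => 3
p champion_min(words)                                 # => "eleven"
p champion_min(words) { |x, y| x.size <=> y.size }    # => "five"
```

Each element is compared against the current champion exactly once, so the whole search takes a single pass over the list.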

The naive solution to finding more than one smallest item is to repeat this process multiple times. Iterate over the Array once to find the smallest item, then iterate over it again to find the next-smallest item, and so on. This is naive for the same reason a bubble sort is naive: you’re repeating many of your comparisons more times than necessary. Indeed, if you run this algorithm once for every item in the array (trying to find the n smallest items in an array of n items), you get a bubble sort.

Sorting the list beforehand is better when you need to find more than a small fraction of the items in the list, but it’s possible to do better. After all, you don’t really want to sort the whole list: you just want to sort the bottom of the list to find the smallest items. You don’t care if the other elements are unsorted because you’re not interested in those elements anyway.

To sort only the smallest elements, you can keep a sorted “stable” of champions, and kick the largest champion out of the stable whenever you find an element that’s smaller. If you encounter a number that’s too large to enter the stable, you can ignore it from that point on. This process rapidly cuts down on the number of elements you must consider, making this approach faster than doing a sort.

The SortedArray class from Recipe 4.7 is useful for this task. The min_n method below creates a SortedArray “stable” that keeps its elements sorted based on the same block being used to find the minimum. It keeps the stable at a certain size by kicking out the largest item in the stable whenever a smaller item is found. The max_n method works similarly, but the comparisons are reversed, and the smallest element in the stable is kicked out when a larger element is found.

    module Enumerable
      def min_n(n, &block)
        block = Proc.new { |x,y| x <=> y } if block == nil
        stable = SortedArray.new(&block)
        each do |x|
          stable << x if stable.size < n or block.call(x, stable[-1]) == -1
          stable.pop until stable.size <= n
        end
        return stable
      end

      def max_n(n, &block)
        block = Proc.new { |x,y| x <=> y } if block == nil
        stable = SortedArray.new(&block)
        each do |x|
          stable << x if stable.size < n or block.call(x, stable[0]) == 1
          stable.shift until stable.size <= n
        end
        return stable
      end
    end

    l = [1, 60, 21, 100, -5, 20, 60, 22, 85, 91, 4, 66]
    l.max_n(5)
    # => [60, 66, 85, 91, 100]
    l.min_n(5)
    # => [-5, 1, 4, 20, 21]

    l.min_n(5) { |x,y| x.abs <=> y.abs }
    # => [1, 4, -5, 20, 21]

See Also
• Recipe 4.7, “Making Sure a Sorted Array Stays Sorted”

4.12 Building Up a Hash Using Injection

Problem
You want to create a hash from the values in an array.

Solution
As seen in Recipe 4.8, the most straightforward way to solve this kind of problem is to use Enumerable#inject. The inject method takes one parameter (the object to build up, in this case a hash), and a block specifying the action to take on each item. The block takes two parameters: the object being built up (the hash), and one of the items from the array.

Here’s a straightforward use of inject to build a hash out of an array of key-value pairs:

    collection = [ [1, 'one'], [2, 'two'], [3, 'three'],
                   [4, 'four'], [5, 'five'] ]
    collection.inject({}) do |hash, value|
      hash[value.first] = value.last
      hash
    end
    # => {5=>"five", 1=>"one", 2=>"two", 3=>"three", 4=>"four"}

Discussion
Why is there that somewhat incongruous expression hash at the end of the inject block above? Because the next time it calls the block, inject uses the value it got from the block the last time it called the block. When you’re using inject to build a data structure, the last line of code in the block should evaluate to the object you’re building up: in this case, our hash.
This is probably the most common inject-related gotcha. Here’s some code that doesn’t work:

    collection.dup.inject({}) { |hash, value| hash[value.first] = value.last }
    # IndexError: index 3 out of string

Why doesn’t this work? Because hash assignment returns the assigned value, not the hash.

    Hash.new["key"] = "some value"    # => "some value"

In the broken example above, when inject calls the code block for the second and subsequent times, it does not pass the hash as the code block’s first argument. It passes in the last value to be assigned to the hash. In this case, that’s a string (maybe “one” or “four”). The hash has been lost forever, and the inject block crashes when it tries to treat a string as a hash.

Hash#update can be used like hash assignment, except it returns the hash instead of the assigned value (and it’s slower). So this code will work:

    collection.inject({}) do |hash, value|
      hash.update value.first => value.last
    end
    # => {5=>"five", 1=>"one", 2=>"two", 3=>"three", 4=>"four"}

Ryan Carver came up with a more sophisticated way of building a hash out of an array: define a general method for all arrays called to_h.

    class Array
      def to_h(default=nil)
        Hash[ *inject([]) { |a, value| a.push value, default || yield(value) } ]
      end
    end

The magic of this method is that you can provide a code block to customize how keys in the array are mapped to values.

    a = [1, 2, 3]

    a.to_h(true)
    # => {1=>true, 2=>true, 3=>true}

    a.to_h { |value| [value * -1, value * 2] }
    # => {1=>[-1, 2], 2=>[-2, 4], 3=>[-3, 6]}

See Also
• Recipe 5.3, “Adding Elements to a Hash”
• Recipe 5.12, “Building a Histogram”
• The original definition of Array#to_h (http://fivesevensix.com/posts/2005/05/20/array-to_h)

4.13 Extracting Portions of Arrays

Problem
Given an array, you want to retrieve the elements of the array that occupy certain positions or have certain properties.
You might want to do this in a way that removes the matching elements from the original array.

Solution
To gather a chunk of an array without modifying it, use the array retrieval operator Array#[], or its alias Array#slice.

The array retrieval operator has three forms, which are the same as the corresponding forms for substring accesses. The simplest and most common form is array[index]. It takes a number as input, treats it as an index into the array, and returns the element at that index. If the input is negative, it counts from the end of the array. If the array is smaller than the index, it returns nil. If performance is a big consideration for you, Array#at will do the same thing, and it’s a little faster than Array#[]:

    a = ("a".."h").to_a               # => ["a", "b", "c", "d", "e", "f", "g", "h"]

    a[0]                              # => "a"
    a[1]                              # => "b"

    a.at(1)                           # => "b"
    a.slice(1)                        # => "b"
    a[-1]                             # => "h"
    a[-2]                             # => "g"
    a[1000]                           # => nil
    a[-1000]                          # => nil

The second form is array[range]. This form retrieves every element identified by an index in the given range, and returns those elements as a new array.

A range in which both numbers are negative will retrieve elements counting from the end of the array. You can mix positive and negative indices where that makes sense:

    a[2..5]                           # => ["c", "d", "e", "f"]
    a[2...5]                          # => ["c", "d", "e"]
    a[0..0]                           # => ["a"]
    a[1..-4]                          # => ["b", "c", "d", "e"]
    a[5..1000]                        # => ["f", "g", "h"]

    a[2..0]                           # => []
    a[0...0]                          # => []

    a[-3..2]                          # => []

The third form is array[start_index, length]. For a nonnegative start_index, this is equivalent to array[start_index...start_index+length].

    a[2, 4]                           # => ["c", "d", "e", "f"]
    a[2, 3]                           # => ["c", "d", "e"]
    a[0, 1]                           # => ["a"]
    a[1, 2]                           # => ["b", "c"]
    a[-4, 2]                          # => ["e", "f"]
    a[5, 1000]                        # => ["f", "g", "h"]

To remove a slice from the array, use Array#slice!. This method takes the same arguments and returns the same results as Array#slice, but as a side effect, the objects it retrieves are removed from the array.

    a.slice!(2..5)                    # => ["c", "d", "e", "f"]
    a                                 # => ["a", "b", "g", "h"]

    a.slice!(0)                       # => "a"
    a                                 # => ["b", "g", "h"]

    a.slice!(1,2)                     # => ["g", "h"]
    a                                 # => ["b"]

Discussion
The Array methods [], slice, and slice! work well if you need to extract one particular element, or a set of adjacent elements. There are two other main possibilities: you might need to retrieve the elements at an arbitrary set of indexes, or (a catch-all) you might need to retrieve all elements with a certain property that can be determined with a code block.

To nondestructively gather the elements at particular indexes in an array, pass in any number of indices to Array#values_at. Results will be returned in a new array, in the same order they were requested.

    a = ("a".."h").to_a               # => ["a", "b", "c", "d", "e", "f", "g", "h"]
    a.values_at(0)                    # => ["a"]
    a.values_at(1, 0, -2)             # => ["b", "a", "g"]
    a.values_at(4, 6, 6, 7, 4, 0, 3)  # => ["e", "g", "g", "h", "e", "a", "d"]

Enumerable#find_all finds all elements in an array (or other class with Enumerable mixed in) for which the specified code block returns true. Enumerable#reject will find all elements for which the specified code block returns false.

    a.find_all { |x| x < "e" }        # => ["a", "b", "c", "d"]
    a.reject { |x| x < "e" }          # => ["e", "f", "g", "h"]

To find all elements in an array that match a regular expression, you can use Enumerable#grep instead of defining a block that does the regular expression match:

    a.grep /[aeiou]/                  # => ["a", "e"]
    a.grep /[^g]/                     # => ["a", "b", "c", "d", "e", "f", "h"]

It’s a little tricky to implement a destructive version of Array#values_at, because removing one element from an array changes the indexes of all subsequent elements. We can let Ruby do the work, though, by replacing each element we want to remove with a dummy object that we know cannot already be present in the array. We can then use the C-backed method Array#delete to remove all instances of the dummy object from the array.
This is much faster than using Array#slice! to remove elements one at a time, because each call to Array#slice! forces Ruby to rearrange the array to be contiguous.

If you know that your array contains no nil values, you can set your undesired values to nil, then use Array#compact! to remove them. The solution below is more general.

    class Array
      def strip_values_at!(*args)
        # For each mentioned index, replace its value with a dummy object.
        values = []
        dummy = Object.new
        args.each do |i|
          if i < size
            values << self[i]
            self[i] = dummy
          end
        end
        # Strip out the dummy object.
        delete(dummy)
        return values
      end
    end

    a = ("a".."h").to_a
    a.strip_values_at!(1, 0, -2)      # => ["b", "a", "g"]
    a                                 # => ["c", "d", "e", "f", "h"]
    a.strip_values_at!(1000)          # => []
    a                                 # => ["c", "d", "e", "f", "h"]

Array#reject! removes all items from an array that match a code block, but it doesn’t return the removed items, so it won’t do for a destructive equivalent of Enumerable#find_all. This implementation of a method called extract! picks up where Array#reject! leaves off:

    class Array
      def extract!
        ary = self.dup
        self.reject! { |x| yield x }
        ary - self
      end
    end

    a = ("a".."h").to_a
    a.extract! { |x| x < "e" && x != "b" }   # => ["a", "c", "d"]
    a                                        # => ["b", "e", "f", "g", "h"]

Finally, a convenience method called grep_extract! destructively approximates the behavior of Enumerable#grep.

    class Array
      def grep_extract!(re)
        extract! { |x| re.match(x) }
      end
    end

    a = ("a".."h").to_a
    a.grep_extract!(/[aeiou]/)        # => ["a", "e"]
    a                                 # => ["b", "c", "d", "f", "g", "h"]

See Also

• Strings support the array lookup operator, slice, slice!, and all the methods of Enumerable, so you can treat them like arrays in many respects; see Recipe 1.13, “Getting the Parts of a String You Want”

4.14 Computing Set Operations on Arrays

Problem

You want to find the union, intersection, difference, or Cartesian product of two arrays, or the complement of a single array with respect to some universe.

Solution

Array objects have overloaded arithmetic and logical operators to provide the three simplest set operations:

    # Union
    [1,2,3] | [1,4,5]                 # => [1, 2, 3, 4, 5]

    # Intersection
    [1,2,3] & [1,4,5]                 # => [1]

    # Difference
    [1,2,3] - [1,4,5]                 # => [2, 3]

Set objects overload the same operators, as well as the exclusive-or operator (^). If you already have Arrays, though, it’s more efficient to deconstruct the XOR operation into its three component operations.

    require 'set'
    a = [1,2,3]
    b = [3,4,5]
    a.to_set ^ b.to_set               # => #<Set: {1, 2, 4, 5}>
    (a | b) - (a & b)                 # => [1, 2, 4, 5]

Discussion

Set objects are intended to model mathematical sets: where arrays are ordered and can contain duplicate entries, Sets model an unordered collection of unique items. Set not only overrides operators for set operations, it provides English-language aliases for the three most common operators: Set#union, Set#intersection, and Set#difference. An array can only perform a set operation on another array, but a Set can perform a set operation on any Enumerable.

    array = [1,2,3]
    set = [3,4,5].to_set
    array & set                       # => TypeError: can't convert Set into Array
    set & array                       # => #<Set: {3}>

You might think that Set objects would be optimized for set operations, but they’re actually optimized for constant-time membership checks (internally, a Set is based on a hash).
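As a small illustration of that membership check, here is a minimal sketch. The variable names and sample words are made up for the example, and it makes no claim about timings:

```ruby
require 'set'

words = %w[apple banana cherry]   # hypothetical sample data
word_set = words.to_set

# Both objects answer the same question. The array scans its elements;
# the Set looks the value up in its internal hash.
words.include?('cherry')          # => true
word_set.include?('cherry')       # => true
word_set.include?('durian')       # => false
```

If membership tests dominate your code, building the Set once and querying it repeatedly is usually the point of converting.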
Set union is faster when the left-hand object is a Set object, but intersection and difference are significantly faster when both objects are arrays. It’s not worth it to convert arrays into Sets just so you can say you performed set operations on Set objects.

The union and intersection set operations remove duplicate entries from arrays. The difference operation does not remove duplicate entries from an array except as part of a subtraction.

    [3,3] & [3,3]                     # => [3]
    [3,3] | [3,3]                     # => [3]
    [1,2,3,3] - [1]                   # => [2, 3, 3]
    [1,2,3,3] - [3]                   # => [1, 2]
    [1,2,3,3] - [2,2,3]               # => [1]

Complement

If you want the complement of an array with respect to some small universe, create that universe and use the difference operation:

    u = [:red, :orange, :yellow, :green, :blue, :indigo, :violet]
    a = [:red, :blue]
    u - a                             # => [:orange, :yellow, :green, :indigo, :violet]

More often, the relevant universe is infinite (the set of natural numbers) or extremely large (the set of three-letter strings). The best strategy here is to define a generator and use it to iterate through the complement. Be sure to break when you’re done; you don’t want to iterate over an infinite set.

    def natural_numbers_except(exclude)
      exclude_map = {}
      exclude.each { |x| exclude_map[x] = true }
      x = 1
      while true
        yield x unless exclude_map[x]
        x = x.succ
      end
    end

    natural_numbers_except([2,3,6,7]) do |x|
      break if x > 10
      puts x
    end
    # 1
    # 4
    # 5
    # 8
    # 9
    # 10

Cartesian product

To get the Cartesian product of two arrays, write a nested iteration over both lists and append each pair of items to a new array. This code is attached to Enumerable so you can also use it with Sets or any other Enumerable.
    module Enumerable
      def cartesian(other)
        res = []
        each { |x| other.each { |y| res << [x, y] } }
        return res
      end
    end

    [1,2,3].cartesian(["a",5,6])
    # => [[1, "a"], [1, 5], [1, 6],
    #     [2, "a"], [2, 5], [2, 6],
    #     [3, "a"], [3, 5], [3, 6]]

This version uses Enumerable#inject to make the code more concise; however, the original version is more efficient.

    module Enumerable
      def cartesian(other)
        inject([]) { |res, x| other.inject(res) { |res, y| res << [x,y] } }
      end
    end

See Also

• See Recipe 2.5, “Generating Random Numbers,” for an example (constructing a deck of cards from suits and ranks) that could benefit from a function to calculate the Cartesian product
• Recipe 2.10, “Multiplying Matrices”

4.15 Partitioning or Classifying a Set

Problem

You want to partition a Set or array based on some attribute of its elements. All elements that go “together” in some code-specific sense should be grouped together in distinct data structures. For instance, if you’re partitioning by color, all the green objects in a Set should be grouped together, separate from the group of all the red objects in the Set.

Solution

Use Set#divide, passing in a code block that returns the partition of the object it’s passed. The result will be a new Set containing a number of partitioned subsets of your original Set.

The code block can accept either a single argument or two arguments.* The single-argument version examines each object to see which subset it should go into.

    require 'set'
    s = Set.new(1..10)
    # => #<Set: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}>

    # Divide the set into the "true" subset and the "false" subset: that
    # is, the "less than 5" subset and the "not less than 5" subset.
    s.divide { |x| x < 5 }
    # => #<Set: {#<Set: {1, 2, 3, 4}>, #<Set: {5, 6, 7, 8, 9, 10}>}>

    # Divide the set into the "0" subset and the "1" subset: that is, the
    # "even" subset and the "odd" subset.
    s.divide { |x| x % 2 }
    # => #<Set: {#<Set: {1, 3, 5, 7, 9}>, #<Set: {2, 4, 6, 8, 10}>}>

    s = Set.new([1, 2, 3, 'a', 'b', 'c', -1.0, -2.0, -3.0])
    # Divide the set into the "String" subset, the "Fixnum" subset, and the
    # "Float" subset.
    s.divide { |x| x.class }
    # => #<Set: {#<Set: {"a", "b", "c"}>,
    # =>         #<Set: {1, 2, 3}>,
    # =>         #<Set: {-1.0, -2.0, -3.0}>}>

* This is analogous to the one-argument code block passed into Enumerable#sort_by and the two-argument code block passed into Array#sort.

For the two-argument code block version of Set#divide, the code block should return true if both the arguments it has been passed should be put into the same subset.

    s = [1, 2, 3, -1, -2, -4].to_set

    # Divide the set into sets of numbers with the same absolute value.
    s.divide { |x,y| x.abs == y.abs }
    # => #<Set: {#<Set: {1, -1}>,
    # =>         #<Set: {2, -2}>,
    # =>         #<Set: {3}>,
    # =>         #<Set: {-4}>}>

    # Divide the set into sets of adjacent numbers
    s.divide { |x,y| (x-y).abs == 1 }
    # => #<Set: {#<Set: {1, 2, 3}>,
    # =>         #<Set: {-1, -2}>,
    # =>         #<Set: {-4}>}>

If you want to classify the subsets by the values they have in common, use Set#classify instead of Set#divide. It works like Set#divide, but it returns a hash that maps the names of the subsets to the subsets themselves.

    # Using the mixed Set of strings, integers, and floats from earlier:
    s = Set.new([1, 2, 3, 'a', 'b', 'c', -1.0, -2.0, -3.0])
    s.classify { |x| x.class }
    # => {String=>#<Set: {"a", "b", "c"}>,
    # =>  Fixnum=>#<Set: {1, 2, 3}>,
    # =>  Float=>#<Set: {-1.0, -2.0, -3.0}>}

Discussion

The version of Set#divide that takes a two-argument code block uses the tsort library to turn the Set into a directed graph. The nodes in the graph are the items in the Set. Two nodes x and y in the graph are connected with an edge (one-way arrow) if the code block returns true when passed |x,y|. For the Set and the two-argument code block given in the example above (the one that checks adjacency), the graph looks like Figure 4-1.

The Set partitions returned by Set#divide are the strongly connected components of this graph, obtained by iterating over TSort#each_strongly_connected_component. A strongly connected component is a set of nodes such that, starting from any node in the component, you can follow the one-way arrows and get to any other node in the component.

Figure 4-1.
The set {1, 2, 3, -1, -2, -4} graphed according to the code block that checks adjacency

Visually speaking, the strongly connected components are the “clumps” in the graph. 1 and 3 are in the same strongly connected component as 2, because starting from 3 you can follow one-way arrows through 2 and get to 1. Starting from 1, you can follow one-way arrows through 2 and get to 3. This makes 1, 2, and 3 part of the same Set partition, even though there are no direct connections between 1 and 3.

In most real-world scenarios (including all the examples above), the one-way arrows will be symmetrical: if the code returns true for |x,y|, it will also return true for |y,x|. Set#divide will work even if this isn’t true. Consider a Set and a divide code block like the following:

    connections = { 1 => 2, 2 => 3, 3 => 1, 4 => 1 }
    [1,2,3,4].to_set.divide { |x,y| connections[x] == y }
    # => #<Set: {#<Set: {1, 2, 3}>, #<Set: {4}>}>

The corresponding graph looks like Figure 4-2.

Figure 4-2. The set {1,2,3,4} graphed according to the connection hash

You can get to any other node from 4 by following one-way arrows, but you can’t get to 4 from any of the other nodes. This puts 4 in a strongly connected component (and a Set partition) all by itself. 1, 2, and 3 form a second strongly connected component (and a second Set partition) because you can get from any of them to any of them by following one-way arrows.

Implementation for arrays

If you’re starting with an array instead of a Set, it’s easy to simulate Set#classify (and the single-argument block form of Set#divide) with a hash. In fact, the code below is almost identical to the current Ruby implementation of Set#classify.

    class Array
      def classify
        require 'set'
        h = {}
        each do |i|
          x = yield(i)
          (h[x] ||= self.class.new) << i
        end
        h
      end

      def divide(&block)
        Set.new(classify(&block).values)
      end
    end

    [1,1,2,6,6,7,101].divide { |x| x % 2 }
    # => #<Set: {[1, 1, 7, 101], [2, 6, 6]}>

There’s no simple way to implement a version of Array#divide that takes a two-argument block. The TSort class is Set-like, in that it won’t create two different nodes for the same object. The simplest solution is to convert the array into a Set to remove any duplicate values, divide the Set normally, then convert the partitioned subsets into arrays, adding back the duplicate values as you go:

    class Array
      def divide(&block)
        if block.arity == 2
          counts = inject({}) { |h, x| h[x] ||= 0; h[x] += 1; h}
          to_set.divide(&block).inject([]) do |divided, set|
            divided << set.inject([]) do |partition, e|
              counts[e].times { partition << e }
              partition
            end
          end
        else
          Set.new(classify(&block).values)
        end
      end
    end

    [1,1,2,6,6,7,101].divide { |x,y| (x-y).abs == 1 }
    # => [[101], [1, 1, 2], [6, 6, 7]]

Is it worth it? You decide.

Chapter 5: Hashes

Hashes and arrays are the two basic “aggregate” data types supported by most modern programming languages. The basic interface of a hash is similar to that of an array. The difference is that while an array stores items according to a numeric index, the index of a hash can be any object at all.

Arrays and strings have been built into programming languages for decades, but built-in hashes are a relatively recent development. Now that they’re around, it’s hard to live without them: they’re at least as useful as arrays.

You can create a Hash by calling Hash.new or by using one of the special syntaxes Hash[] or {}. With the Hash[] syntax, you pass in the initial elements as comma-separated object references. With the {} syntax, you pass in the initial contents as comma-separated key-value pairs.
    empty = Hash.new                         # => {}
    empty = {}                               # => {}
    numbers = { 'two' => 2, 'eight' => 8}    # => {"two"=>2, "eight"=>8}
    numbers = Hash['two', 2, 'eight', 8]     # => {"two"=>2, "eight"=>8}

Once the hash is created, you can do hash lookups and element assignments using the same syntax you would use to view and modify array elements:

    numbers["two"]                           # => 2
    numbers["ten"] = 10                      # => 10
    numbers                                  # => {"two"=>2, "eight"=>8, "ten"=>10}

You can get an array containing the keys or values of a hash with Hash#keys or Hash#values. You can get the entire hash as an array with Hash#to_a:

    numbers.keys                             # => ["two", "eight", "ten"]
    numbers.values                           # => [2, 8, 10]
    numbers.to_a                             # => [["two", 2], ["eight", 8], ["ten", 10]]

Like an array, a hash contains references to objects, not copies of them. Modifications to the original objects will affect all references to them:

    motto = "Don't tread on me"
    flag = { :motto => motto,
             :picture => "rattlesnake.png"}
    motto.upcase!
    flag[:motto]                             # => "DON'T TREAD ON ME"

The defining feature of an array is its ordering. Each element of an array is assigned a Fixnum object as its key. The keys start from zero and there can never be gaps. In contrast, a hash has no natural ordering, since its keys can be any objects at all. This feature makes hashes useful for storing lightly structured data or key-value pairs.

Consider some simple data for a person in an address book. For a side-by-side comparison I’ll represent identical data as an array, then as a hash:

    a = ["Maury", "Momento", "123 Elm St.", "West Covina", "CA"]
    h = { :first_name => "Maury",
          :last_name => "Momento",
          :address => "123 Elm St.",
          :city => "West Covina",
          :state => "CA" }

The array version is more concise, and if you know the numeric index, you can retrieve any element from it in constant time. The problem is knowing the index, and knowing what it means. Other than inspecting the records, there’s no way to know whether the element at index 1 is a last name or a first name.
Worse, if the array format changes to add an apartment number between the street address and city, all code that uses a[3] or a[4] will need to have its index changed.

The hash version doesn’t have these problems. The last name will always be at :last_name, and it’s easy (for a human, anyway) to know what :last_name means. Most of the time, hash lookups take no longer than array lookups.

The main advantage of a hash is that it’s often easier to find what you’re looking for. Checking whether an array contains a certain value might require scanning the entire array. To see whether a hash contains a value for a certain key, you only need to look up that key. The set library (as seen in the previous chapter) exploits this behavior to implement a class that looks like an array, but has the performance characteristics of a hash.

The downside of using a hash is that since it has no natural ordering, it can’t be sorted except by turning it into an array first. There’s also no guarantee of order when you iterate over a hash. Here’s a contrasting case, in which an array is obviously the right choice:

    a = [1, 4, 9, 16]
    h = { :one_squared => 1, :two_squared => 4, :three_squared => 9,
          :four_squared => 16 }

In this case, there’s a numeric order to the entries, and giving them additional labels distracts more than it helps.

A hash in Ruby is actually implemented as an array. When you look up a key in a hash (either to see what’s associated with that key, or to associate a value with the key), Ruby calculates the hash code of the key by calling its hash method. The result is used as a numeric index in the array. Recipe 5.5 will help you with the most common problem related to hash codes.

The performance of a hash depends a lot on the fact that it’s very rare for two objects to have the same hash code. If all objects in a hash had the same hash code, a hash would be much slower than an array.
Code like this would be a very bad idea:

    class BadIdea
      def hash
        100
      end
    end

Except for strings and other built-in objects, most objects have a hash code equivalent to their internal object ID. As seen above, you can override Object#hash to change this, but the only time you should need to do this is if your class also overrides Object#==. If two objects are considered equal, they should also have the same hash code; otherwise, they will behave strangely when you put them into hashes. Hash lookups compare keys with eql? rather than ==, so define eql? to agree with == as well. Code like the fragment below is a very good idea:

    class StringHolder
      attr_reader :string
      def initialize(s)
        @string = s
      end

      def ==(other)
        @string == other.string
      end
      # Hash lookups compare keys with eql?, so keep it consistent with ==.
      alias eql? ==

      def hash
        @string.hash
      end
    end
    a = StringHolder.new("The same string.")
    b = StringHolder.new("The same string.")
    a == b                                   # => true
    a.hash                                   # => -1007666862
    b.hash                                   # => -1007666862

5.1 Using Symbols as Hash Keys

Credit: Ben Giddings

Problem

When using a hash, you want the slight optimization you can get by using symbols as keys instead of strings.

Solution

Whenever you would otherwise use a quoted string, use a symbol instead. A symbol can be created either by using a colon in front of a word, like :keyname, or by transforming a string to a symbol using String#intern.

    people = Hash.new
    people[:nickname] = 'Matz'
    people[:language] = 'Japanese'
    people['last name'.intern] = 'Matsumoto'
    people[:nickname]                        # => "Matz"
    people['nickname'.intern]                # => "Matz"

Discussion

While 'name' and 'name' appear exactly identical, they’re actually different. Each time you create a quoted string in Ruby, you create a unique object. You can see this by looking at the object_id method.

    'name'.object_id                         # => -605973716
    'name'.object_id                         # => -605976356
    'name'.object_id                         # => -605978996

By comparison, each instance of a symbol refers to a single object.

    :name.object_id                          # => 878862
    :name.object_id                          # => 878862
    'name'.intern.object_id                  # => 878862
    'name'.intern.object_id                  # => 878862

Using symbols instead of strings saves memory and time.
It saves memory because there’s only one symbol instance, instead of many string instances. If you have many hashes that contain the same keys, the memory savings add up.

Using symbols as hash keys is faster because the hash value of a symbol is simply its object ID. If you use strings in a hash, Ruby must calculate the hash value of a string each time it’s used as a hash key.

See Also

• Recipe 1.7, “Converting Between Strings and Symbols”

5.2 Creating a Hash with a Default Value

Credit: Ben Giddings

Problem

You’re using a hash, and you don’t want to get nil as a value when you look up a key that isn’t present in the hash. You want to get some more convenient value instead, possibly one calculated dynamically.

Solution

A normal hash has a default value of nil:

    h = Hash.new
    h[1]                                     # => nil
    h['do you have this string?']            # => nil

There are two ways of creating default values for hashes. If you want the default value to be the same object for every hash key, pass that value into the Hash constructor.

    h = Hash.new("nope")
    h[1]                                     # => "nope"
    h['do you have this string?']            # => "nope"

If you want the default value for a missing key to depend on the key or the current state of the hash, pass a code block into the hash constructor. The block will be called each time someone requests a missing key.

    h = Hash.new { |hash, key| (key.respond_to? :to_str) ? "nope" : nil }
    h[1]                                     # => nil
    h['do you have this string']             # => "nope"

Discussion

The first type of custom default value is most useful when you want a default value of zero. For example, this form can be used to calculate the frequency of certain words in a paragraph of text:

    text = 'The rain in Spain falls mainly in the plain.'
    word_count_hash = Hash.new 0             # => {}
    text.split(/\W+/).each { |word| word_count_hash[word.downcase] += 1 }
    word_count_hash
    # => {"rain"=>1, "plain"=>1, "in"=>2, "mainly"=>1, "falls"=>1,
    #     "the"=>2, "spain"=>1}

What if you wanted to make lists of the words starting with a given character? Your first attempt might look like this:

    first_letter_hash = Hash.new []
    text.split(/\W+/).each { |word| first_letter_hash[word[0,1].downcase] << word }
    first_letter_hash
    # => {}
    first_letter_hash["m"]
    # => ["The", "rain", "in", "Spain", "falls", "mainly", "in", "the", "plain"]

What’s going on here? All those words don’t start with “m”....

What happened is that the array you passed into the Hash constructor is being used for every default value. first_letter_hash["m"] is now a reference to that array, as is first_letter_hash["f"] and even first_letter_hash[1006].

This is a case where you need to pass in a block to the Hash constructor. The block is run every time the Hash can’t find a key. This way you can create a different array each time.

    first_letter_hash = Hash.new { |hash, key| hash[key] = [] }
    text.split(/\W+/).each { |word| first_letter_hash[word[0,1].downcase] << word }
    first_letter_hash
    # => {"m"=>["mainly"], "p"=>["plain"], "f"=>["falls"], "r"=>["rain"],
    #     "s"=>["Spain"], "i"=>["in", "in"], "t"=>["The", "the"]}
    first_letter_hash["m"]
    # => ["mainly"]

When a letter can’t be found in the hash, Ruby calls the block passed into the Hash constructor. That block puts a new array into the hash, using the missing letter as the key. Now the letter is bound to a unique array, and words can be added to that array normally.

Note that if you want to add the array to the hash so it can be used later, you must assign it within the block of the Hash constructor. Otherwise you’ll get a new, empty array every time you access first_letter_hash["m"]. The words you want to append to the array will be lost.
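The difference between returning a fresh array and storing it can be seen side by side in a minimal sketch. The names forgetful and remembering are made up for this illustration:

```ruby
# The block returns a fresh array but never stores it in the hash.
forgetful = Hash.new { |hash, key| [] }
forgetful["m"] << "mainly"        # appends to a throwaway array
forgetful["m"]                    # => []
forgetful.size                    # => 0

# The block stores the fresh array under the missing key, then returns it.
remembering = Hash.new { |hash, key| hash[key] = [] }
remembering["m"] << "mainly"
remembering["m"]                  # => ["mainly"]
remembering.size                  # => 1
```

In the first hash every lookup of a missing key runs the block again, so nothing appended to the returned array is ever seen by a later lookup.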
See Also

• This technique is used in recipes like Recipe 5.6, “Keeping Multiple Values for the Same Hash Key,” and Recipe 5.12, “Building a Histogram”

5.3 Adding Elements to a Hash

Problem

You have some items, loose or in some other data structure, which you want to put into an existing hash.

Solution

To add a single key-value pair, assign the value to the element lookup expression for the key: that is, call hash[key] = value. Assignment will override any previous value stored for that key.

    h = {}
    h["Greensleeves"] = "all my joy"
    h                                        # => {"Greensleeves"=>"all my joy"}
    h["Greensleeves"] = "my delight"
    h                                        # => {"Greensleeves"=>"my delight"}

Discussion

When you use a string as a hash key, the string is transparently copied and the copy is frozen. This is to avoid confusion should you modify the string in place, then try to use its original form to do a hash lookup:

    key = "Modify me if you can"
    h = { key => 1 }
    key.upcase!                              # => "MODIFY ME IF YOU CAN"
    h[key]                                   # => nil
    h["Modify me if you can"]                # => 1

    h.keys                                   # => ["Modify me if you can"]
    h.keys[0].upcase!
    # TypeError: can't modify frozen string

To add an array of key-value pairs to a hash, either iterate over the array with Array#each, or pass the hash into Array#inject. Using inject is slower but the code is more concise.
    squares = [[1,1], [2,4], [3,9]]

    results = {}
    squares.each { |k,v| results[k] = v }
    results                                  # => {1=>1, 2=>4, 3=>9}

    squares.inject({}) { |h, kv| h[kv[0]] = kv[1]; h }
    # => {1=>1, 2=>4, 3=>9}

To turn a flat array into the key-value pairs of a hash, iterate over the array elements two at a time:

    class Array
      def into_hash(h)
        unless size % 2 == 0
          raise StandardError, "Expected array with even number of elements"
        end
        0.step(size-1, 2) { |x| h[self[x]] = self[x+1] }
        h
      end
    end

    squares = [1,1,2,3,4,9]
    results = {}
    squares.into_hash(results)               # => {1=>1, 2=>3, 4=>9}

    [1,1,2].into_hash(results)
    # StandardError: Expected array with even number of elements

To insert into a hash every key-value pair from another hash, use Hash#merge!. If a key is present in both hashes when a.merge!(b) is called, the value in b takes precedence over the value in a.

    squares = { 1 => 1, 2 => 4, 3 => 9}
    cubes = { 3 => 27, 4 => 64, 5 => 125}
    squares.merge!(cubes)
    squares                                  # => {5=>125, 1=>1, 2=>4, 3=>27, 4=>64}
    cubes                                    # => {5=>125, 3=>27, 4=>64}

Hash#merge! also has a nondestructive version, Hash#merge, which creates a new Hash with elements from both parent hashes. Again, the hash passed in as an argument takes precedence.

To completely replace the entire contents of one hash with the contents of another, use Hash#replace.

    squares = { 1 => 1, 2 => 4, 3 => 9}
    cubes = { 1 => 1, 2 => 8, 3 => 27}
    squares.replace(cubes)
    squares                                  # => {1=>1, 2=>8, 3=>27}

This is different from simply assigning the cubes hash to the squares variable name, because cubes and squares are still separate hashes: they just happen to contain the same elements right now. Changing cubes won’t affect squares:

    cubes[4] = 64
    squares                                  # => {1=>1, 2=>8, 3=>27}

Hash#replace is useful for reverting a Hash to known default values.

    defaults = {:verbose => true, :help_level => :beginner }
    args = {}
    requests.each do |request|
      args.replace(defaults)
      request.process(args) #The process method might modify the args Hash.
    end

See Also

• Recipe 4.12, “Building Up a Hash Using Injection,” has more about the inject method
• Recipe 5.1, “Using Symbols as Hash Keys,” for a way to save memory when constructing certain types of hashes
• Recipe 5.5, “Using an Array or Other Modifiable Object as a Hash Key,” talks about how to avoid another common case of confusion when a hash key is modified

5.4 Removing Elements from a Hash

Problem

Certain elements of your hash have got to go!

Solution

Most of the time you want to remove a specific element of a hash. To do that, pass the key into Hash#delete.

    h = {}
    h[1] = 10
    h                                        # => {1=>10}
    h.delete(1)
    h                                        # => {}

Discussion

Don’t try to delete an element from a hash by mapping it to nil. It’s true that, by default, you get nil when you look up a key that’s not in the hash, but there’s a difference between a key that’s missing from the hash and a key that’s present but mapped to nil. Hash#has_key? will see a key mapped to nil, as will Hash#each and all other methods except for a simple fetch:

    h = {}
    h[5]                                     # => nil
    h[5] = 10
    h[5]                                     # => 10
    h[5] = nil
    h[5]                                     # => nil
    h.keys                                   # => [5]
    h.delete(5)
    h.keys                                   # => []

Hash#delete works well when you need to remove elements on an ad hoc basis, but sometimes you need to go through the whole hash looking for things to remove. Use the Hash#delete_if iterator to delete key-value pairs for which a certain code block returns true (Hash#reject works the same way, but it works on a copy of the Hash).
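As a minimal sketch of that distinction (the fruit inventory is made-up sample data), Hash#reject leaves the receiver alone while Hash#delete_if changes it in place:

```ruby
h = { 'apple' => 3, 'pear' => 0, 'plum' => 7 }   # hypothetical sample data

# Hash#reject returns a new hash and leaves h untouched.
in_stock = h.reject { |fruit, count| count == 0 }
in_stock.keys.sort                # => ["apple", "plum"]
h.size                            # => 3

# Hash#delete_if removes the matching pairs from h itself.
h.delete_if { |fruit, count| count == 0 }
h.keys.sort                       # => ["apple", "plum"]
```
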
The following code deletes all key-value pairs with a certain value:

    class Hash
      def delete_value(value)
        delete_if { |k,v| v == value }
      end
    end

    h = {'apple' => 'green', 'potato' => 'red', 'sun' => 'yellow',
         'katydid' => 'green' }
    h.delete_value('green')
    h                                        # => {"sun"=>"yellow", "potato"=>"red"}

This code implements the opposite of Hash#merge; it extracts one hash from another:

    class Hash
      def remove_hash(other_hash)
        delete_if { |k,v| other_hash[k] == v }
      end
    end

    squares = { 1 => 1, 2 => 4, 3 => 9 }
    doubles = { 1 => 2, 2 => 4, 3 => 6 }
    squares.remove_hash(doubles)
    squares                                  # => {1=>1, 3=>9}

Finally, to wipe out the entire contents of a Hash, use Hash#clear:

    h = {}
    1.upto(1000) { |x| h[x] = x }
    h.keys.size                              # => 1000
    h.clear
    h                                        # => {}

See Also

• Recipe 5.3, “Adding Elements to a Hash”
• Recipe 5.7, “Iterating Over a Hash”

5.5 Using an Array or Other Modifiable Object as a Hash Key

Problem

You want to use a modifiable built-in object (an array or a hash, but not a string) as a key in a hash, even while you modify the object in place. A naive solution tends to lose hash values once the keys are modified:

    coordinates = [10, 5]
    treasure_map = { coordinates => 'jewels' }
    treasure_map[coordinates]                # => "jewels"

    # Add a z-coordinate to indicate how deep the treasure is buried.
    coordinates << -5

    coordinates                              # => [10, 5, -5]
    treasure_map[coordinates]                # => nil
                                             # Oh no!

Solution

The easiest solution is to call the Hash#rehash method every time you modify one of the hash’s keys. Hash#rehash will repair the broken treasure map defined above:

    treasure_map.rehash
    treasure_map[coordinates]                # => "jewels"

If this is too much code, you might consider changing the definition of the object you use as a hash key, so that modifications don’t affect the way the hash treats it.

Suppose you want a reliably hashable Array class. If you want this behavior universally, you can reopen the Array class and redefine hash to give you the new behavior.
But it’s safer to define a subclass of Array that implements a reliable-hashing mixin, and to use that subclass only for the Arrays you want to use as hash keys:

    module ReliablyHashable
      def hash
        return object_id
      end
    end

    class ReliablyHashableArray < Array
      include ReliablyHashable
    end

It’s now possible to keep track of the jewels:

    coordinates = ReliablyHashableArray.new([10,5])
    treasure_map = { coordinates => 'jewels' }
    treasure_map[coordinates]                # => "jewels"

    # Add a z-coordinate to indicate how deep the treasure is buried.
    coordinates.push(-5)

    treasure_map[coordinates]                # => "jewels"

Discussion

Ruby performs hash lookups using not the key object itself but the object’s hash code (an integer obtained from the key by calling its hash method). The default implementation of hash, in Object, uses an object’s internal ID as its hash code. Array, Hash, and String override this method to provide different behavior.

In the initial example, the hash code of [10,5] is 41 and the hash code of [10,5,-5] is -83. The mapping of the coordinate list to 'jewels' is still present (it’ll still show up in an iteration over each_pair), but once you change the coordinate list, you can no longer use that variable as a key.

You may also run into this problem when you use a hash or a string as a hash key, and then modify the key in place. This happens because the hash implementations of many built-in classes try to make sure that two objects that are “the same” (for instance, two distinct arrays with the same contents, or two distinct but identical strings) get the same hash value. When coordinates is [10,5], it has a hash code of 41, like any other Array containing [10,5]. When coordinates is [10,5,-5] it has a hash code of -83, like any other Array with those contents.

Because of the potential for confusion, some languages don’t let you use arrays or hashes as hash keys at all.
Ruby lets you do it, but you have to face the consequences if the key changes. Fortunately, you can dodge the consequences by overriding hash to work the way you want.

Since an object’s internal ID never changes, the Object implementation is what you want to get reliable hashing. To get it back, you’ll have to override or subclass the hash method of Array or Hash (depending on what type of key you’re having trouble with).

The implementations of hash given in the solution violate the principle that different representations of the same data should have the same hash code. This means that two ReliablyHashableArray objects will have different hash codes even if they have the same contents. For instance:

    a = [1,2]
    b = a.clone
    a.hash                                   # => 11
    b.hash                                   # => 11

    a = ReliablyHashableArray.new([1,2])
    b = a.clone
    a.hash                                   # => -606031406
    b.hash                                   # => -606034266

If you want a particular value in a hash to be accessible by two different arrays with the same contents, then you must key it to a regular array instead of a ReliablyHashableArray. You can’t have it both ways. If an object is to have the same hash key as its earlier self, it can’t also have the same hash key as another representation of its current state.

Another solution is to freeze your hash keys. Any frozen object can be reliably used as a hash key, since you can’t do anything to a frozen object that would cause its hash code to change. Ruby uses this solution: when you use a string as a hash key, Ruby copies the string, freezes the copy, and uses that as the actual hash key.

See Also

• Recipe 8.15, “Freezing an Object to Prevent Changes”

5.6 Keeping Multiple Values for the Same Hash Key

Problem

You want to build a hash that might have duplicate values for some keys.

Solution

The simplest way is to create a hash that initializes missing values to empty arrays.
You can then append items onto the automatically created arrays:

    hash = Hash.new { |hash, key| hash[key] = [] }

    raw_data = [ [1, 'a'], [1, 'b'], [1, 'c'],
                 [2, 'a'], [2, ['b', 'c']],
                 [3, 'c'] ]
    raw_data.each { |x,y| hash[x] << y }
    hash
    # => {1=>["a", "b", "c"], 2=>["a", ["b", "c"]], 3=>["c"]}

Discussion

A hash maps any given key to only one value, but that value can be an array. This is a common phenomenon when reading data structures from the outside world. For instance, a list of tasks with associated priorities may contain multiple items with the same priority. Simply reading the tasks into a hash keyed on priority would create key collisions, and obliterate all but one task with any given priority.

It’s possible to subclass Hash to act like a normal hash until a key collision occurs, and then start keeping an array of values for the key that suffered the collision:

    class MultiValuedHash < Hash
      def []=(key, value)
        if has_key?(key)
          super(key, [value, self[key]].flatten)
        else
          super
        end
      end
    end

    hash = MultiValuedHash.new
    raw_data.each { |x,y| hash[x] = y }
    hash
    # => {1=>["c", "b", "a"], 2=>["b", "c", "a"], 3=>"c"}

This saves a little bit of memory, but it’s harder to write code for this class than for one that always keeps values in an array. There’s also no way of knowing whether a value [1,2,3] is a single array value or three numeric values.

See Also

• Recipe 5.2, “Creating a Hash with a Default Value,” explains the technique of the dynamic default value in more detail, and explains why you must initialize the empty list within a code block, never within the arguments to Hash.new

5.7 Iterating Over a Hash

Problem

You want to iterate over a hash’s key-value pairs as though it were an array.

Solution

Most likely, the iterator you want is Hash#each_pair or Hash#each.
These methods yield every key-value pair in the hash:

    hash = { 1 => 'one', [1,2] => 'two', 'three' => 'three' }

    hash.each_pair { |key, value| puts "#{key.inspect} maps to #{value}"}
    # [1, 2] maps to two
    # "three" maps to three
    # 1 maps to one

Note that each and each_pair return the key-value pairs in an apparently random order.

Discussion
Hash#each_pair and Hash#each let you iterate over a hash as though it were an array full of key-value pairs. Hash#each_pair is more commonly used and slightly more efficient, but Hash#each is more array-like. Hash also provides several other iteration methods that can be more efficient than each.

Use Hash#each_key if you only need the keys of a hash. In this example, a list has been stored as a hash to allow for quick lookups (this is how the Set class works). The values are irrelevant, but each_key can be used to iterate over the keys:

    active_toggles = { 'super' => true, 'meta' => true, 'hyper' => true }
    active_toggles.each_key { |active| puts active }
    # hyper
    # meta
    # super

Use Hash#each_value if you only need the values of a hash. In this example, each_value is used to summarize the results of a survey. Here it’s the keys that are irrelevant:

    favorite_colors = { 'Alice' => :red, 'Bob' => :violet, 'Mallory' => :blue,
                        'Carol' => :blue, 'Dave' => :violet }

    summary = Hash.new 0
    favorite_colors.each_value { |x| summary[x] += 1 }
    summary
    # => {:red=>1, :violet=>2, :blue=>2}

Don’t iterate over Hash#each_value looking for a particular value: it’s simpler and faster to use has_value? instead.

    hash = {}
    1.upto(10) { |x| hash[x] = x * x }
    hash.has_value? 49                   # => true
    hash.has_value? 81                   # => true
    hash.has_value? 50                   # => false

Removing unprocessed elements from a hash during an iteration prevents those items from being part of the iteration. However, adding elements to a hash during an iteration will not make them part of the iteration.
Don’t modify the keyset of a hash during an iteration, or you’ll get undefined results and possibly a RuntimeError:

    1.upto(100) { |x| hash[x] = true }
    hash.each_key { |k| hash[k * 2] = true }
    # RuntimeError: hash modified during iteration

Using an array as intermediary
An alternative to using the hash iterators is to get an array of the keys, values, or key-value pairs in the hash, and then work on the array. You can do this with the keys, values, and to_a methods, respectively:

    hash = {1 => 2, 2 => 2, 3 => 10}
    hash.keys                            # => [1, 2, 3]
    hash.values                          # => [2, 2, 10]
    hash.to_a                            # => [[1, 2], [2, 2], [3, 10]]

The most common use of keys and values is to iterate over a hash in a specific order. All of Hash’s iterators return items in a seemingly random order. If you want to iterate over a hash in a certain order, the best strategy is usually to create an array from some portion of the hash, sort the array, then iterate over it.

The most common case is to iterate over a hash according to some property of the keys. To do this, sort the result of Hash#keys. Use the original hash to look up the value for a key, if necessary.

    extensions = { 'Alice' => '104', 'Carol' => '210', 'Bob' => '110' }
    extensions.keys.sort.each do |k|
      puts "#{k} can be reached at extension ##{extensions[k]}"
    end
    # Alice can be reached at extension #104
    # Bob can be reached at extension #110
    # Carol can be reached at extension #210

Hash#values gives you the values of a hash, but that’s not useful for iterating because it’s so expensive to find the key for a corresponding value (and if you only wanted the values, you’d use each_value).

Hash#sort and Hash#sort_by turn a hash into an array of two-element subarrays (one for each key-value pair), then sort the array of arrays however you like. Your custom sort method can sort on the values, on the values and the keys, or on some relationship between key and value.
You can then iterate over the sorted array the same as you would with the Hash#each iterator.

This code sorts a to-do list by priority, then alphabetically:

    to_do = { 'Clean car' => 5, 'Take kangaroo to vet' => 3,
              'Realign plasma conduit' => 3 }
    to_do.sort_by { |task, priority| [priority, task] }.each { |k,v| puts k }
    # Realign plasma conduit
    # Take kangaroo to vet
    # Clean car

This code sorts a hash full of number pairs according to the magnitude of the difference between the key and the value:

    transform_results = { 4 => 8, 9 => 9, 10 => 6, 2 => 7, 6 => 5 }
    by_size_of_difference = transform_results.sort_by { |x, y| (x-y).abs }
    by_size_of_difference.each { |x, y| puts "f(#{x})=#{y}: difference #{y-x}" }
    # f(9)=9: difference 0
    # f(6)=5: difference -1
    # f(10)=6: difference -4
    # f(4)=8: difference 4
    # f(2)=7: difference 5

See Also
• See Recipe 5.8, “Iterating Over a Hash in Insertion Order,” for a more complex iterator
• Recipe 5.12, “Building a Histogram”
• Recipe 5.13, “Remapping the Keys and Values of a Hash”

5.8 Iterating Over a Hash in Insertion Order
Problem
Iterations over a hash happen in a seemingly random order. Sorting the keys or values only works if the keys or values are all mutually comparable. You’d like to iterate over a hash in the order in which the elements were added to the hash.

Solution
Use the orderedhash library (see below for how to get it). Its OrderedHash class acts like a hash, but it keeps the elements of the hash in insertion order.

    require 'orderedhash'
    h = OrderedHash.new
    h[1] = 1
    h["second"] = 2
    h[:third] = 3

    h.keys                               # => [1, "second", :third]
    h.values                             # => [1, 2, 3]
    h.each { |k,v| puts "The #{k} counting number is #{v}" }
    # The 1 counting number is 1
    # The second counting number is 2
    # The third counting number is 3

Discussion
OrderedHash is a subclass of Hash that also keeps an array of the keys in insertion order.
When you add a key-value pair to the hash, OrderedHash modifies both the underlying hash and the array. When you ask for a specific hash element, you’re using the hash. When you ask for the keys or the values, the data comes from the array, and you get it in insertion order.

Since OrderedHash is a real hash, it supports all the normal hash operations. But any operation that modifies an OrderedHash may also modify the internal array, so it’s slower than just using a hash. OrderedHash#delete is especially slow, since it must perform a linear search of the internal array to find the key being deleted. Hash#delete runs in constant time, but OrderedHash#delete takes time proportional to the size of the hash.

See Also
• You can get OrderedHash from the RAA at http://raa.ruby-lang.org/project/orderedhash/; it’s not available as a gem, and it has no setup.rb script, so you’ll need to distribute orderedhash.rb with your project, or copy it into your Ruby library path
• There is a queuehash gem that provides much the same functionality, but it has worse performance than OrderedHash

5.9 Printing a Hash
Credit: Ben Giddings
Problem
You want to print out the contents of a Hash, but Kernel#puts doesn’t give very useful results.

    h = {}
    h[:name] = "Robert"
    h[:nickname] = "Bob"
    h[:age] = 43
    h[:email_addresses] = {:home => "bob@example.com",
                           :work => "robert@example.com"}
    h
    # => {:email_addresses=>{:home=>"bob@example.com", :work=>"robert@example.com"},
    #     :nickname=>"Bob", :name=>"Robert", :age=>43}
    puts h
    # nicknameBobage43nameRobertemail_addresseshomebob@example.comworkrobert@example.com
    puts h[:email_addresses]
    # homebob@example.comworkrobert@example.com

Solution
In other recipes, we sometimes reformat the results or output of Ruby statements so they’ll look better on the printed page. In this recipe, you’ll see raw, unretouched output, so you can compare different ways of printing hashes.

The easiest way to print a hash is to use Kernel#p.
Kernel#p prints out the “inspected” version of its arguments: the string you get by calling inspect on the hash. The “inspected” version of an object often looks like Ruby source code for creating the object, so it’s usually readable:

    p h[:email_addresses]
    # {:home=>"bob@example.com", :work=>"robert@example.com"}

For small hashes intended for manual inspection, this may be all you need. However, there are two difficulties. One is that Kernel#p only prints to stdout. The second is that the printed version contains no newlines, making it difficult to read large hashes.

    p h
    # {:nickname=>"Bob", :age=>43, :name=>"Robert", :email_addresses=>{:home=>
    # "bob@example.com", :work=>"robert@example.com"}}

When the hash you’re trying to print is too large, the pp (“pretty-print”) module produces very readable results:

    require 'pp'
    pp h[:email_addresses]
    # {:home=>"bob@example.com", :work=>"robert@example.com"}

    pp h
    # {:email_addresses=>{:home=>"bob@example.com", :work=>"robert@example.com"},
    #  :nickname=>"Bob",
    #  :name=>"Robert",
    #  :age=>43}

Discussion
There are a number of ways of printing hash contents. The solution you choose depends on the complexity of the hash you’re trying to print, where you’re trying to print the hash, and your personal preferences. The best general-purpose solution is the pp library.

When a given hash element is too big to fit on one line, pp knows to put it on multiple lines. Not only that, but (as with Hash#inspect), the output is valid Ruby syntax for creating the hash: you can copy and paste it directly into a Ruby program to recreate the hash.

The pp library can also pretty-print to I/O streams besides standard output, and can print to shorter lines (the default line length is 79).
This example prints the hash to $stderr and wraps at column 50:

    PP::pp(h, $stderr, 50)
    # {:nickname=>"Bob",
    #  :email_addresses=>
    #   {:home=>"bob@example.com",
    #    :work=>"robert@example.com"},
    #  :age=>43,
    #  :name=>"Robert"}
    # => #<IO:...>

You can also print hashes by converting them into YAML with the yaml library. YAML is a human-readable markup language for describing data structures:

    require 'yaml'
    puts h.to_yaml
    # ---
    # :nickname: Bob
    # :age: 43
    # :name: Robert
    # :email_addresses:
    #   :home: bob@example.com
    #   :work: robert@example.com

If none of these is suitable, you can print the hash out yourself by using Hash#each_pair to iterate over the hash elements:

    h[:email_addresses].each_pair do |key, val|
      puts "#{key} => #{val}"
    end
    # home => bob@example.com
    # work => robert@example.com

See Also
• Recipe 8.10, “Getting a Human-Readable Printout of Any Object,” covers the general case of this problem
• Recipe 13.1, “Serializing Data with YAML”

5.10 Inverting a Hash
Problem
Given a hash, you want to switch the keys and values. That is, you want to create a new hash whose keys are the values of the old hash, and whose values are the keys of the old hash. If the old hash mapped “human” to “wolf,” you want the new hash to map “wolf” to “human.”

Solution
The simplest technique is to use the Hash#invert method:

    phone_directory = { 'Alice' => '555-1212',
                        'Bob' => '555-1313',
                        'Mallory' => '111-1111' }
    phone_directory.invert
    # => {"111-1111"=>"Mallory", "555-1212"=>"Alice", "555-1313"=>"Bob"}

Discussion
Hash#invert probably won’t do what you want if your hash maps more than one key to the same value.
Only one of the keys for that value will show up as a value in the inverted hash:

    phone_directory = { 'Alice' => '555-1212',
                        'Bob' => '555-1313',
                        'Carol' => '555-1313',
                        'Mallory' => '111-1111',
                        'Ted' => '555-1212' }
    phone_directory.invert
    # => {"111-1111"=>"Mallory", "555-1212"=>"Ted", "555-1313"=>"Bob"}

To preserve all the data from the original hash, borrow the idea behind Recipe 5.6, and write a version of invert that keeps an array of values for each key. The following is based on code by Tilo Sloboda:

    class Hash
      def safe_invert
        new_hash = {}
        self.each do |k,v|
          if v.is_a? Array
            v.each { |x| new_hash.add_or_append(x, k) }
          else
            new_hash.add_or_append(v, k)
          end
        end
        return new_hash
      end

The add_or_append method is a lot like the method MultiValuedHash#[]= defined in Recipe 5.6:

      def add_or_append(key, value)
        if has_key?(key)
          self[key] = [value, self[key]].flatten
        else
          self[key] = value
        end
      end
    end

Here’s safe_invert in action:

    phone_directory.safe_invert
    # => {"111-1111"=>"Mallory", "555-1212"=>["Ted", "Alice"],
    #     "555-1313"=>["Bob", "Carol"]}

    phone_directory.safe_invert.safe_invert
    # => {"Alice"=>"555-1212", "Mallory"=>"111-1111", "Ted"=>"555-1212",
    #     "Carol"=>"555-1313", "Bob"=>"555-1313"}

Ideally, if you called an inversion method twice you’d always get the same data you started with. The safe_invert method does better than invert on this score, but it’s not perfect. If your original hash used arrays as hash keys, safe_invert will act as if you’d individually mapped each element in the array to the same value. Call safe_invert twice, and the arrays will be gone.
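Part of the round-trip trouble comes from mixing bare values with arrays of values. As a minimal sketch of one alternative, assuming Ruby 1.9 or later (for each_with_object and insertion-ordered hashes), you could always collect the keys into an array, even when there is only one. The name invert_to_arrays is hypothetical; it is not part of this recipe or of any library mentioned here.

```ruby
# Hypothetical helper (not from the recipe): invert a hash so that every
# value in the result is an array of the original keys.
def invert_to_arrays(hash)
  hash.each_with_object({}) do |(key, value), inverted|
    (inverted[value] ||= []) << key
  end
end

phone_directory = { 'Alice' => '555-1212', 'Bob' => '555-1313',
                    'Carol' => '555-1313', 'Ted' => '555-1212' }
inverted = invert_to_arrays(phone_directory)
# => {"555-1212"=>["Alice", "Ted"], "555-1313"=>["Bob", "Carol"]}
```

Because single keys are wrapped in arrays too, calling code never has to check whether a value is a lone key or a list of keys. The cost is that the result no longer has the same shape as the original hash.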
See Also
• Recipe 5.5, “Using an Array or Other Modifiable Object as a Hash Key”
• “True Inversion of a Hash in Ruby,” by Tilo Sloboda (http://www.unixgods.org/~tilo/Ruby/invert_hash.html)
• The Facets library defines a Hash#inverse method much like safe_invert

5.11 Choosing Randomly from a Weighted List
Problem
You want to pick a random element from a collection, where each element in the collection has a different probability of being chosen.

Solution
Store the elements in a hash, mapped to their relative probabilities. The following code will work with a hash whose keys are mapped to relative integer probabilities:

    def choose_weighted(weighted)
      sum = weighted.inject(0) do |sum, item_and_weight|
        sum += item_and_weight[1]
      end
      target = rand(sum)
      weighted.each do |item, weight|
        return item if target < weight
        target -= weight
      end
    end

For instance, if all the keys in the hash map to 1, the keys will be chosen with equal probability. If all the keys map to 1, except for one which maps to 10, that key will be picked 10 times more often than any single other key. This algorithm lets you simulate those probability problems that begin like, “You have a box containing 51 black marbles and 17 white marbles…”:

    marbles = { :black => 51, :white => 17 }
    3.times { puts choose_weighted(marbles) }
    # black
    # white
    # black

I’ll use it to simulate a lottery in which the results have different probabilities of showing up:

    lottery_probabilities = { "You've wasted your money!" => 1000,
                              "You've won back the cost of your ticket!" => 50,
                              "You've won two shiny zorkmids!" => 20,
                              "You've won five zorkmids!" => 10,
                              "You've won ten zorkmids!" => 5,
                              "You've won a hundred zorkmids!" => 1 }

    # Let's buy some lottery tickets.
    5.times { puts choose_weighted(lottery_probabilities) }
    # You've wasted your money!
    # You've wasted your money!
    # You've wasted your money!
    # You've wasted your money!
    # You've won five zorkmids!
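A quick way to check a weighted chooser is to draw many samples and tally them. The sketch below assumes a hypothetical pick_weighted helper that follows the same subtract-and-compare approach; the fixed seed and the sample size are arbitrary choices made only so the run is repeatable.

```ruby
# Hypothetical helper that follows the subtract-and-compare approach.
def pick_weighted(weighted)
  total = weighted.values.inject(0) { |sum, weight| sum + weight }
  target = rand(total)            # an integer in 0...total
  weighted.each do |item, weight|
    return item if target < weight
    target -= weight
  end
end

srand(42)                         # arbitrary seed, for repeatable runs
marbles = { :black => 51, :white => 17 }
tally = Hash.new(0)
10_000.times { tally[pick_weighted(marbles)] += 1 }

# With weights 51 and 17, about three quarters of the draws should be black.
black_share = tally[:black] / 10_000.0
puts "black: #{tally[:black]}, white: #{tally[:white]}, share: #{black_share}"
```

The strict comparison (target < weight) means that exactly weight of the total possible targets select each item, so an item with weight 51 out of 68 is chosen with probability 51/68.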
Discussion
An extremely naive solution would put the elements in a list and choose one at random. This doesn’t solve the problem because it ignores weights altogether: low-weight elements will show up exactly as often as high-weight ones. A less naive solution would be to repeat each element in the list a number of times proportional to its weight. Under this implementation, our simulation of the marble box would contain :black 51 times and :white 17 times, just like a real marble box. This is a common quick-and-dirty solution, but it’s hard to maintain, and it uses lots of memory.

The algorithm given above actually works the same way as the less naive solution: the numeric weights stand in for multiple copies of the same object. Instead of picking one of the 68 marbles, we pick a number between 0 and 67 inclusive. Since we know there are 51 black marbles, we simply decide that the numbers from 0 to 50 will represent black marbles.

For the implementation given above to work, all the weights in the hash must be integers. This isn’t a big problem the first time you create a hash, but suppose that after the lottery has been running for a while, you decide to add a new jackpot that’s 10 times less common than the 100-zorkmid jackpot. You’d like to give this new possibility a weight of 0.1, but that won’t work with the choose_weighted implementation. You’ll need to give it a weight of 1, and multiply all the existing weights by 10.

There is an alternative, though: normalize the weights so that they add up to 1. You can then generate a random floating-point number between 0 and 1, and use a similar algorithm to the one above. This approach lets you weight the hash keys using any numeric objects you like, since normalization turns them all into small floating-point numbers anyway.
    def normalize!(weighted)
      sum = weighted.inject(0) do |sum, item_and_weight|
        sum += item_and_weight[1]
      end
      sum = sum.to_f
      weighted.each { |item, weight| weighted[item] = weight/sum }
    end

    lottery_probabilities["You've won five hundred zorkmids!"] = 0.1
    normalize!(lottery_probabilities)
    # => { "You've wasted your money!" => 0.920725531718995,
    #      "You've won back the cost of your ticket!" => 0.0460362765859497,
    #      "You've won two shiny zorkmids!" => 0.0184145106343799,
    #      "You've won five zorkmids!" => 0.00920725531718995,
    #      "You've won ten zorkmids!" => 0.00460362765859497,
    #      "You've won a hundred zorkmids!" => 0.000920725531718995,
    #      "You've won five hundred zorkmids!" => 9.20725531718995e-05 }

Once the weights have been normalized, we know that they sum to one (within the limits of floating-point arithmetic). This simplifies the code that picks an element at random, since we don’t have to sum up the weights every time:

    def choose_weighted_assuming_unity(weighted)
      target = rand
      weighted.each do |item, weight|
        return item if target <= weight
        target -= weight
      end
    end

    5.times { puts choose_weighted_assuming_unity(lottery_probabilities) }
    # You've wasted your money!
    # You've wasted your money!
    # You've wasted your money!
    # You've wasted your money!
    # You've won back the cost of your ticket!

See Also
• Recipe 2.5, “Generating Random Numbers”
• Recipe 6.9, “Picking a Random Line from a File”

5.12 Building a Histogram
Problem
You have an array that contains a lot of references to relatively few objects. You want to create a histogram, or frequency map: something you can use to see how often a given object shows up in the array.

Solution
Build the histogram in a hash, mapping each object found to the number of times it appears.
    module Enumerable
      def to_histogram
        inject(Hash.new(0)) { |h, x| h[x] += 1; h}
      end
    end

    [1, 2, 2, 2, 3, 3].to_histogram
    # => {1=>1, 2=>3, 3=>2}

    ["a", "b", nil, "c", "b", nil, "a"].to_histogram
    # => {"a"=>2, "b"=>2, "c"=>1, nil=>2}

    "Aye\nNay\nNay\nAbstaining\nAye\nNay\nNot Present\n".to_histogram
    # => {"Abstaining\n"=>1, "Nay\n"=>3, "Not Present\n"=>1, "Aye\n"=>2}

    survey_results = { "Alice" => :red, "Bob" => :green, "Carol" => :green,
                       "Mallory" => :blue }
    survey_results.values.to_histogram
    # => {:red=>1, :green=>2, :blue=>1}

Discussion
Making a histogram is an easy and fast (linear-time) way to summarize a dataset. Histograms expose the relative popularity of the items in a dataset, so they’re useful for visualizing optimization problems and dividing the “head” from the “long tail.”

Once you have a histogram, you can find the most or least common elements in the list, sort the list by frequency of appearance, or see whether the distribution of items matches your expectations. Many of the other recipes in this book build a histogram as a first step towards a more complex algorithm.

Here’s a quick way of visualizing a histogram as an ASCII chart. First, we convert the histogram keys to their string representations so they can be sorted and printed. We also store the histogram value for the key, since we can’t do a histogram lookup later based on the string value we’ll be using.

    def draw_graph(histogram, char="#")
      pairs = histogram.keys.collect { |x| [x.to_s, histogram[x]] }.sort

Then we find the key with the longest string representation. We’ll pad the rest of the histogram rows to this length, so that the graph bars will line up correctly.

      largest_key_size = pairs.max { |x,y| x[0].size <=> y[0].size }[0].size

Then we print each key-value pair, padding with spaces as necessary.
      pairs.inject("") do |s,kv|
        s << "#{kv[0].ljust(largest_key_size)} |#{char*kv[1]}\n"
      end
    end

Here’s a histogram of the color survey results from the Solution:

    puts draw_graph(survey_results.values.to_histogram)
    # blue  |#
    # green |##
    # red   |#

This code generates a bunch of random numbers, then graphs the random distribution:

    random = []
    100.times { random << rand(10) }
    puts draw_graph(random.to_histogram)
    # 0 |############
    # 1 |########
    # 2 |#######
    # 3 |#########
    # 4 |##########
    # 5 |#############
    # 6 |###############
    # 7 |########
    # 8 |#######
    # 9 |###########

See Also
• Recipe 2.8, “Finding Mean, Median, and Mode”
• Recipe 4.9, “Sorting an Array by Frequency of Appearance”

5.13 Remapping the Keys and Values of a Hash
Problem
You have two hashes with common keys but differing values. You want to create a new hash that maps the values of one hash to the values of another.

Solution

    class Hash
      def tied_with(hash)
        remap do |h,key,value|
          h[hash[key]] = value
        end.delete_if { |key,value| key.nil? || value.nil? }
      end

Here is the Hash#remap method:

      def remap(hash={})
        each { |k,v| yield hash, k, v }
        hash
      end
    end

Here’s how to use Hash#tied_with to merge two hashes:

    a = {1 => 2, 3 => 4}
    b = {1 => 'foo', 3 => 'bar'}
    a.tied_with(b)                       # => {"foo"=>2, "bar"=>4}
    b.tied_with(a)                       # => {2=>"foo", 4=>"bar"}

Discussion
This remap method can be handy when you want to make a similar change to every item in a hash. It is also a good example of using the yield method.

Hash#remap is conceptually similar to Hash#collect, but Hash#collect builds up a nested array of key-value pairs, not a new hash.

See Also
• The Facets library defines methods Hash#update_each and Hash#replace_each! for remapping the keys and values of a hash

5.14 Extracting Portions of Hashes
Problem
You have a hash that contains a lot of values, but only a few of them are interesting. You want to select the interesting values and ignore the rest.
Solution
You can use the Hash#select method to extract part of a hash that follows a certain rule. Suppose you had a hash where the keys were Time objects representing a certain date, and the values were the number of web site clicks for that given day. We’ll simulate such a hash with random data:

    require 'time'
    click_counts = {}
    1.upto(30) { |i| click_counts[Time.parse("2006-09-#{i}")] = 400 + rand(700) }
    p click_counts
    # {Sat Sep 23 00:00:00 EDT 2006=>803, Tue Sep 12 00:00:00 EDT 2006=>829,
    #  Fri Sep 01 00:00:00 EDT 2006=>995, Mon Sep 25 00:00:00 EDT 2006=>587,
    # ...

You might want to know the days when your click counts were low, to see if you could spot a trend. Hash#select can do that for you:

    low_click_days = click_counts.select {|key, value| value < 450 }
    # [[Thu Sep 14 00:00:00 EDT 2006, 449], [Mon Sep 11 00:00:00 EDT 2006, 406],
    #  [Sat Sep 02 00:00:00 EDT 2006, 440], [Mon Sep 04 00:00:00 EDT 2006, 431],
    # ...

Discussion
The array returned by Hash#select contains a number of key-value pairs as two-element arrays. The first element of one of these inner arrays is a key into the hash, and the second element is the corresponding value. This is similar to how Hash#each yields a succession of two-element arrays.

If you want another hash instead of an array of key-value pairs, you can use Hash#inject instead of Hash#select. In the code below, kv is a two-element array containing a key-value pair. kv[0] is a key from click_counts, and kv[1] is the corresponding value.
    low_click_days_hash = click_counts.inject({}) do |h, kv|
      k, v = kv
      h[k] = v if v < 450
      h
    end
    # => {Mon Sep 25 00:00:00 EDT 2006=>403,
    #     Wed Sep 06 00:00:00 EDT 2006=>443,
    #     Thu Sep 28 00:00:00 EDT 2006=>419}

You can also use the Hash.[] constructor to create a hash from the array result of Hash#select:

    low_click_days_hash = Hash[*low_click_days.flatten]
    # => {Thu Sep 14 00:00:00 EDT 2006=>449, Mon Sep 11 00:00:00 EDT 2006=>406,
    #     Sat Sep 02 00:00:00 EDT 2006=>440, Mon Sep 04 00:00:00 EDT 2006=>431,
    # ...

See Also
• Recipe 4.13, “Extracting Portions of Arrays”

5.15 Searching a Hash with Regular Expressions
Credit: Ben Giddings
Problem
You want to grep a hash: that is, find all keys and/or values in the hash that match a regular expression.

Solution
The fastest way to grep the keys of a hash is to get the keys as an array, and grep that:

    h = { "apple tree" => "plant", "ficus" => "plant",
          "shrew" => "animal", "plesiosaur" => "animal" }
    h.keys.grep /p/
    # => ["apple tree", "plesiosaur"]

The solution for grepping the values of a hash is similar (substitute Hash#values for Hash#keys), unless you need to map the values back to the keys of the hash. If that’s what you need, the fastest way is to use Hash#each to get key-value pairs, and match the regular expression against each value.
    h.inject([]) { |res, kv| res << kv if kv[1] =~ /p/; res }
    # => [["ficus", "plant"], ["apple tree", "plant"]]

The code is similar if you need to find key-value pairs where either the key or the value matches a regular expression:

    class Hash
      def grep(pattern)
        inject([]) do |res, kv|
          res << kv if kv[0] =~ pattern or kv[1] =~ pattern
          res
        end
      end
    end

    h.grep(/pl/)
    # => [["ficus", "plant"], ["apple tree", "plant"], ["plesiosaur", "animal"]]
    h.grep(/plant/)        # => [["ficus", "plant"], ["apple tree", "plant"]]
    h.grep(/i.*u/)         # => [["ficus", "plant"], ["plesiosaur", "animal"]]

Discussion
Hash defines its own grep method, but it will never give you any results. Hash#grep is inherited from Enumerable#grep, which tries to match the output of each against the given regular expression. Hash#each returns a series of two-item arrays containing key-value pairs, and an array will never match a regular expression. The Hash#grep implementation above is more useful.

Hash#keys.grep and Hash#values.grep are more efficient than matching a regular expression against each key or value in a Hash, but those methods create a new array containing all the keys in the Hash. If memory usage is your primary concern, iterate over each_key or each_value instead:

    res = []
    h.each_key { |k| res << k if k =~ /p/ }
    res                    # => ["apple tree", "plesiosaur"]

CHAPTER 6
Files and Directories

As programming languages increase in power, we programmers get further and further from the details of the underlying machine language. When it comes to the operating system, though, even the most modern programming languages live on a level of abstraction that looks a lot like the C and Unix libraries that have been around for decades.

We covered this kind of situation in Chapter 3 with Ruby’s Time objects, but the issue really shows up when you start to work with files.
Ruby provides an elegant object-oriented interface that lets you do basic file access, but the more advanced file libraries tend to look like the C libraries they’re based on. To lock a file, change its Unix permissions, or read its metadata, you’ll need to remember method names like mtime, and the meaning of obscure constants like File::LOCK_EX and 0644. This chapter will show you how to use the simple interfaces, and how to make the more obscure interfaces easier to use.

Looking at Ruby’s support for file and directory operations, you’ll see four distinct tiers of support. The most common operations tend to show up on the lower-numbered tiers:

1. File objects to read and write the contents of files, and Dir objects to list the contents of directories. For examples, see Recipes 6.5, 6.7, and 6.17. Also see Recipe 6.13 for a Ruby-idiomatic approach.
2. Class methods of File to manipulate files without opening them. For instance, to delete a file, examine its metadata, or change its permissions. For examples, see Recipes 6.1, 6.3, and 6.4.
3. Standard libraries, such as find to walk directory trees, and fileutils to perform common filesystem operations like copying files and creating directories. For examples, see Recipes 6.8, 6.12, and 6.20.
4. Gems like file-tail, lockfile, and rubyzip, which fill in the gaps left by the standard library. Most of the file-related gems covered in this book deal with specific file formats, and are covered in Chapter 12.

Kernel#open is the simplest way to open a file. It returns a File object that you can read from or write to, depending on the “mode” constant you pass in. I’ll introduce read mode and write mode here; there are several others, but I’ll talk about most of those as they come up in recipes.

To write data to a file, pass a mode of 'w' to open. You can then write lines to the file with File#puts, just like printing to standard output with Kernel#puts.
For more possibilities, see Recipe 6.7.

    open('beans.txt', "w") do |file|
      file.puts('lima beans')
      file.puts('pinto beans')
      file.puts('human beans')
    end

To read data from a file, open it for read access by specifying a mode of 'r', or just omitting the mode. You can slurp the entire contents into a string with File#read, or process the file line-by-line with File#each. For more details, see Recipe 6.6.

    open('beans.txt') do |file|
      file.each { |l| puts "A line from the file: #{l}" }
    end
    # A line from the file: lima beans
    # A line from the file: pinto beans
    # A line from the file: human beans

As seen in the examples above, the best way to use the open method is with a code block. The open method creates a new File object, passes it to your code block, and closes the file automatically after your code block runs, even if your code throws an exception. This saves you from having to remember to close the file after you’re done with it. You could rely on the Ruby interpreter’s garbage collection to close the file once it’s no longer being used, but Ruby makes it easy to do things the right way.

To find a file in the first place, you need to specify its disk path. You may specify an absolute path, or one relative to the current directory of your Ruby process (see Recipe 6.21). Relative paths are usually better, because they’re more portable across platforms. Relative paths like “beans.txt” or “subdir/beans.txt” will work on any platform, but absolute Unix paths look different from absolute Windows paths:

    # A stereotypical Unix path.
    open('/etc/passwd')

    # A stereotypical Windows path; note the drive letter.
    open('c:/windows/Documents and Settings/User1/My Documents/ruby.doc')

Windows paths in Ruby use forward slashes to separate the parts of a path, even though Windows itself uses backslashes.
Ruby will also accept backslashes in a Windows path, so long as you escape them:

    open('c:\\windows\\Documents and Settings\\User1\\My Documents\\ruby.doc')

Although this chapter focuses mainly on disk files, most of the methods of File are actually methods of its superclass, IO. You’ll encounter many other classes that are also subclasses of IO, or just respond to the same methods. This means that most of the tricks described in this chapter are applicable to classes like the Socket class for Internet sockets and the infinitely useful StringIO (see Recipe 6.15).

Your Ruby program’s standard input, output, and error ($stdin, $stdout, and $stderr) are also IO objects, which means you can treat them like files. This one-line program echoes its input to its output:

    $stdin.each { |l| puts l }

The Kernel#puts command just calls $stdout.puts, so that one-liner is equivalent to this one:

    $stdin.each { |l| $stdout.puts l }

Not all file-like objects support all the methods of IO. See Recipe 6.11 for ways to get around the most common problem with unsupported methods. Also see Recipe 6.16 for more on the default IO objects.

Several of the recipes in this chapter (such as Recipes 6.12 and 6.20) create specific directory structures to demonstrate different concepts. Rather than bore you by filling up recipes with the Ruby code to create a certain directory structure, I’ve written a method that takes a short description of a directory structure, and creates the appropriate files and subdirectories:

    # create_tree.rb
    def create_tree(directories, parent=".")
      directories.each_pair do |dir, files|
        path = File.join(parent, dir)
        Dir.mkdir path unless File.exists? path
        files.each do |filename, contents|
          if filename.respond_to? :each_pair # It's a subdirectory
            create_tree filename, path
          else # It's a file
            open(File.join(path, filename), 'w') { |f| f << (contents || "") }
          end
        end
      end
    end

Now I can present the directory structure as a data structure and you can create it with a single method call:

    require 'create_tree'
    create_tree 'test' =>
      [ 'An empty file',
        ['A file with contents', 'Contents of file'],
        { 'Subdirectory' => ['Empty file in subdirectory',
                             ['File in subdirectory', 'Contents of file'] ] },
        { 'Empty subdirectory' => [] }
      ]

    require 'find'
    Find.find('test') { |f| puts f }
    # test
    # test/Empty subdirectory
    # test/Subdirectory
    # test/Subdirectory/File in subdirectory
    # test/Subdirectory/Empty file in subdirectory
    # test/A file with contents
    # test/An empty file

    File.read('test/Subdirectory/File in subdirectory')
    # => "Contents of file"

6.1 Checking to See If a File Exists
Problem
Given a filename, you want to see whether the corresponding file exists and is the right kind for your purposes.

Solution
Most of the time you’ll use the File.file? predicate, which returns true only if the file is an existing regular file (that is, not a directory, a socket, or some other special file).

    filename = 'a_file.txt'
    File.file? filename                  # => false

    require 'fileutils'
    FileUtils.touch(filename)
    File.file? filename                  # => true

Use the File.exists? predicate instead if the file might legitimately be a directory or other special file, or if you plan to create a file by that name if it doesn’t exist. File.exists? will return true if a file of the given name exists, no matter what kind of file it is.

    directory_name = 'a_directory'
    FileUtils.mkdir(directory_name)
    File.file? directory_name            # => false
    File.exists? directory_name          # => true

Discussion
A true response from File.exists? means that the file is present on the filesystem, but says nothing about what type of file it is. If you open up a directory thinking it’s a regular file, you’re in for an unpleasant surprise.
This is why File.file? is usually more useful than File.exist?.

Ruby provides several other predicates for checking the type of a file: the other commonly useful one is File.directory?:

    File.directory? directory_name         # => true
    File.directory? filename               # => false

The rest of the predicates are designed to work on Unix systems. File.blockdev? tests for block-device files (such as hard-drive partitions), File.chardev? tests for character-device files (such as TTYs), File.socket? tests for socket files, and File.pipe? tests for named pipes:

    File.blockdev? '/dev/hda1'                     # => true
    File.chardev? '/dev/tty1'                      # => true
    File.socket? '/var/run/mysqld/mysqld.sock'     # => true
    system('mkfifo named_pipe')
    File.pipe? 'named_pipe'                        # => true

File.symlink? tests whether a file is a symbolic link to another file, but you only need to use it when you want to treat symlinks differently from other files. A symlink to a regular file will satisfy File.file?, and can be opened and used just like a regular file. In most cases, you don’t even have to know it’s a symlink. The same goes for symlinks to directories and to other types of files.

    new_filename = "#{filename}2"
    File.symlink(filename, new_filename)

    File.symlink? new_filename             # => true
    File.file? new_filename                # => true

All of Ruby’s file predicates return false if the file doesn’t exist at all. This means you can test “exists and is a directory” by just testing directory?; it’s the same for the other predicates.

See Also
• Recipe 6.8, “Writing to a Temporary File,” and Recipe 6.14, “Backing Up to Versioned Filenames,” deal with writing to files that don’t currently exist

6.2 Checking Your Access to a File

Problem
You want to see what you can do with a file: whether you have read, write, or (on Unix systems) execute permission on it.

Solution
Use the class methods File.readable?, File.writable?, and File.executable?.
    File.readable?('/bin/ls')              # => true
    File.readable?('/etc/passwd-')         # => false

    filename = 'test_file'
    File.open(filename, 'w') {}

    File.writable?(filename)               # => true
    File.writable?('/bin/ls')              # => false

    File.executable?('/bin/ls')            # => true
    File.executable?(filename)             # => false

Discussion
Ruby’s file permission tests are Unix-centric, but readable? and writable? work on any platform; the rest fail gracefully when the OS doesn’t support them. For instance, Windows doesn’t have the Unix notion of execute permission, so File.executable? always returns true on Windows.

The return value of a Unix permission test depends in part on whether your user owns the file in question, or whether you belong to the Unix group that owns it. Ruby provides convenience tests File.owned? and File.grpowned? to check this.

    File.owned? 'test_file'                # => true
    File.grpowned? 'test_file'             # => true
    File.owned? '/bin/ls'                  # => false

On Windows, File.owned? always returns true (even for a file that belongs to another user) and File.grpowned? always returns false.

The File methods described above should be enough to answer most permission questions about a file, but you can also see a file’s Unix permissions in their native form by looking at the file’s mode. The mode is a number, each bit of which has a different meaning within the Unix permission system.* You can view a file’s mode with File::Stat#mode.

The result of mode contains some extra bits describing things like the type of a file. You probably want to strip that information out by masking those bits. This example demonstrates that the file originally created in the solution has a Unix permission mask of 0644:

    File.lstat('test_file').mode & 0777    # Keep only the permission bits.
    # => 420                               # That is, 0644 octal.

Setuid and setgid scripts
readable?, writable?, and executable? return answers that depend on the effective user and group ID you are using to run the Ruby interpreter.
This may not be your actual user or group ID: the Ruby interpreter might be running setuid or setgid, or you might have changed the effective IDs with Process.euid= or Process.egid=.

Each of the permission checks has a corresponding method that returns answers from the perspective of the process’s real user and real group IDs: executable_real?, readable_real?, and writable_real?. If you’re running the Ruby interpreter setuid, then readable_real? (for instance) will give different answers from readable?. You can use this to disallow users from reading or modifying certain files unless they actually are the root user, not just taking on the root user’s privileges through setuid.

* If you’re not familiar with this, Recipe 6.3 describes the significance of the permission bits in a file’s mode.

For instance, consider the following code, which prints our real and effective user and group IDs, then checks to see what it can do to a system file:

    def what_can_i_do?
      sys = Process::Sys
      puts "UID=#{sys.getuid}, GID=#{sys.getgid}"
      puts "Effective UID=#{sys.geteuid}, Effective GID=#{sys.getegid}"

      file = '/bin/ls'
      can_do = [:readable?, :writable?, :executable?].inject([]) do |arr, method|
        arr << method if File.send(method, file); arr
      end
      puts "To you, #{file} is: #{can_do.join(', ')}"
    end

If you run this code as root, you can call this method and get one set of answers, then take on the guise of a less privileged user and get another set of answers:

    what_can_i_do?
    # UID=0, GID=0
    # Effective UID=0, Effective GID=0
    # To you, /bin/ls is: readable?, writable?, executable?

    Process.euid = 1000
    what_can_i_do?
    # UID=0, GID=0
    # Effective UID=1000, Effective GID=0
    # To you, /bin/ls is: readable?, executable?
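The difference between the effective-ID checks and their `_real?` counterparts can be sketched as follows. This is a minimal illustration, not part of the recipe: `permission_report` is a hypothetical helper name, and the scratch file exists only so that the example has something to inspect. In an ordinary (non-setuid) process the two columns should agree; they can differ only when the real and effective IDs differ.

```ruby
require 'tempfile'

# Hypothetical helper (not part of Ruby): pairs each effective-ID check
# with its real-ID counterpart for a single path.
def permission_report(path)
  [:readable?, :writable?, :executable?].map do |check|
    # :readable? becomes :readable_real?, and so on.
    real_check = check.to_s.sub('?', '_real?').to_sym
    [check, File.send(check, path), File.send(real_check, path)]
  end
end

# Inspect a scratch file created by the current user.
scratch = Tempfile.new('permission_report')
report = permission_report(scratch.path)
report.each do |check, effective, real|
  puts "#{check} effective=#{effective} real=#{real}"
end
scratch.close
scratch.unlink
```

A script that is meant to run setuid could compare the two columns and refuse to act when they disagree.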
See Also
• Recipe 6.3, “Changing the Permissions on a File”
• Recipe 23.3, “Running Code as Another User,” has more on setting the effective user ID

6.3 Changing the Permissions on a File

Problem
You want to control access to a file by modifying its Unix permissions. For instance, you want to make it so that everyone on your system can read a file, but only you can write to it.

Solution
Unless you’ve got a lot of Unix experience, it’s hard to remember the numeric codes for the nine Unix permission bits. Probably the first thing you should do is define constants for them. Here’s one constant for every one of the permission bits. If these names are too concise for you, you can name them USER_READ, GROUP_WRITE, OTHER_EXECUTE, and so on.

    class File
      U_R = 0400
      U_W = 0200
      U_X = 0100
      G_R = 0040
      G_W = 0020
      G_X = 0010
      O_R = 0004
      O_W = 0002
      O_X = 0001
    end

You might also want to define these three special constants, which you can use to set the user, group, and world permissions all at once:

    class File
      A_R = 0444
      A_W = 0222
      A_X = 0111
    end

Now you’re ready to actually change a file’s permissions. Every Unix file has a permission bitmap, or mode, which you can change (assuming you have the permissions!) by calling File.chmod. You can manipulate the constants defined above to get a new mode, then pass it in along with the filename to File.chmod.

The following three chmod calls are equivalent: for the file my_file, they give read-write access to the user who owns the file, and restrict everyone else to read-only access. This is equivalent to the permission bitmap 110100100, the octal number 0644, or the decimal number 420.
    open("my_file", "w") {}

    File.chmod(File::U_R | File::U_W | File::G_R | File::O_R, "my_file")
    File.chmod(File::A_R | File::U_W, "my_file")
    File.chmod(0644, "my_file")                      # Bitmap: 110100100

    File::U_R | File::U_W | File::G_R | File::O_R    # => 420
    File::A_R | File::U_W                            # => 420
    0644                                             # => 420
    File.lstat("my_file").mode & 0777                # => 420

Note how I build a full permission bitmap by combining the permission constants with the OR operator (|).

Discussion
A Unix file has nine associated permission bits that are consulted whenever anyone tries to access the file. They’re divided into three sets of three bits. There’s one set for the user who owns the file, one set for the user group who owns the file, and one set for everyone else.

Each set contains one bit for each of the three basic things you might do to a file in Unix: read it, write it, or execute it as a program. If the appropriate bit is set for you, you can carry out the operation; if not, you’re denied access.

When you put these nine bits side by side into a bitmap, they form a number that you can pass into File.chmod. These numbers are difficult to construct and read without a lot of practice, which is why I recommend you use the constants defined above. It’ll make your code less buggy and more readable.*

File.chmod completely overwrites the file’s current permission bitmap with a new one. Usually you just want to change one or two permissions: make sure the file isn’t world-writable, for instance. The simplest way to do this is to use File.lstat(...).mode to get the file’s current permission bitmap, then modify it with bit operators to add or remove permissions. You can pass the result into File.chmod.

Use the AND operator (&) with the complement (~) of a mask to remove permissions from a bitmap, and the OR operator, as seen above, to add permissions. (The XOR operator, ^, only toggles bits: it would grant a permission that wasn’t already set.)

    # Take away the world's read access.
    new_permission = File.lstat("my_file").mode & ~File::O_R
    File.chmod(new_permission, "my_file")
    File.lstat("my_file").mode & 0777                # => 416 # 0640 octal

    # Give everyone access to everything
    new_permission = File.lstat("my_file").mode | File::A_R | File::A_W | File::A_X
    File.chmod(new_permission, "my_file")
    File.lstat("my_file").mode & 0777                # => 511 # 0777 octal

    # Take away the world's write and execute access
    new_permission = File.lstat("my_file").mode & ~(File::O_W | File::O_X)
    File.chmod(new_permission, "my_file")
    File.lstat("my_file").mode & 0777                # => 508 # 0774 octal

* It’s true that it’s more macho to use the numbers, but if you really wanted to be macho you’d be writing a shell script, not a Ruby program.

If doing bitwise math with the permission constants is also too complicated for you, you can use code like this to parse a permission string like the one accepted by the Unix chmod command (only the + and - operators are handled here):

    class File
      def File.fancy_chmod(permission_string, file)
        mode = File.lstat(file).mode
        permission_string.scan(/[ugoa][+-][rwx]+/) do |setting|
          who = setting[0..0]
          setting[2..-1].each_byte do |perm|
            perm = perm.chr.upcase
            # Look up the matching constant, e.g. File::U_W for "u" and "w".
            mask = File.const_get("#{who.upcase}_#{perm}")
            if setting[1..1] == '+'
              mode |= mask      # Add the permission.
            else
              mode &= ~mask     # Remove the permission.
            end
          end
        end
        File.chmod(mode, file)
      end
    end

    # Give the owning user write access
    File.fancy_chmod("u+w", "my_file")
    File.lstat("my_file").mode & 0777                # => 508 # 0774 octal

    # Take away the owning group's execute access
    File.fancy_chmod("g-x", "my_file")
    File.lstat("my_file").mode & 0777                # => 500 # 0764 octal

    # Give everyone access to everything
    File.fancy_chmod("a+rwx", "my_file")
    File.lstat("my_file").mode & 0777                # => 511 # 0777 octal

    # Give the owning user access to everything. Then take away the
    # execute access for users who aren't the owning user and aren't in
    # the owning group.
    File.fancy_chmod("u+rwxo-x", "my_file")
    File.lstat("my_file").mode & 0777                # => 510 # 0776 octal

Unix-like systems such as Linux and Mac OS X support the full range of Unix permissions. On Windows systems, the only one of these operations that makes sense is adding or subtracting the U_W bit of a file (making a file read-only or not). You can use File.chmod on Windows, but the only bit you’ll be able to change is the user write bit.

See Also
• Recipe 6.2, “Checking Your Access to a File”
• Recipe 23.9, “Normalizing Ownership and Permissions in User Directories”

6.4 Seeing When a File Was Last Used

Problem
You want to see when a file was last accessed or modified.

Solution
The result of File.stat contains a treasure trove of metadata about a file. Perhaps the most useful of its methods are the two time methods mtime (the last time anyone wrote to the file), and atime (the last time anyone read from the file).

    open("output", "w") { |f| f << "Here's some output.\n" }
    stat = File.stat("output")
    stat.mtime                      # => Thu Mar 23 12:23:54 EST 2006
    stat.atime                      # => Thu Mar 23 12:23:54 EST 2006

    sleep(2)
    open("output", "a") { |f| f << "Here's some more output.\n" }
    stat = File.stat("output")
    stat.mtime                      # => Thu Mar 23 12:23:56 EST 2006
    stat.atime                      # => Thu Mar 23 12:23:54 EST 2006

    sleep(2)
    open("output") { |f| contents = f.read }
    stat = File.stat("output")
    stat.mtime                      # => Thu Mar 23 12:23:56 EST 2006
    stat.atime                      # => Thu Mar 23 12:23:58 EST 2006

Discussion
A file’s atime changes whenever data is read from the file, and its mtime changes whenever data is written to the file.

There’s also a ctime method, but it’s not as useful as the other two. Contrary to semi-popular belief, ctime does not track the creation time of the file (there’s no way to track this in Unix). A file’s ctime is basically a more inclusive version of its mtime.
The ctime changes not only when someone modifies the contents of a file, but when someone changes its permissions or its other metadata.

All three methods are useful for separating the files that actually get used from the ones that just sit there on disk. They can also be used in sanity checks.

Here’s code for the part of a game that saves and loads the game state to a file. As a deterrent against cheating, when the game loads a save file it performs a simple check against the file’s modification time. If it differs from the timestamp recorded inside the file, the game refuses to load the save file.

The save_game method is responsible for recording the timestamp:

    def save_game(file)
      score = 1000
      open(file, "w") do |f|
        f.puts(score)
        f.puts(Time.new.to_i)
      end
    end

The load_game method is responsible for comparing the timestamp within the file to the time the filesystem has associated with the file:

    def load_game(file)
      open(file) do |f|
        score = f.readline.to_i
        time = Time.at(f.readline.to_i)
        difference = (File.stat(file).mtime - time).abs
        raise "I suspect you of cheating." if difference > 1
        "Your saved score is #{score}."
      end
    end

This mechanism can detect simple forms of cheating:

    save_game("game.sav")
    sleep(2)
    load_game("game.sav")
    # => "Your saved score is 1000."

    # Now let's cheat by increasing our score to 9000
    open("game.sav", "r+b") { |f| f.write("9") }

    load_game("game.sav")
    # RuntimeError: I suspect you of cheating.

Since it’s possible to modify a file’s times with tools like the Unix touch command, you shouldn’t depend on these methods to defend you against a skilled attacker actively trying to fool your program.
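To see why file times make a weak defense, here is a minimal sketch showing that the same forgery is available from inside Ruby through File.utime, which sets a file’s access and modification times. The helper name `forge_times` and the scratch save file are assumptions made for illustration only; they are not part of the recipe.

```ruby
require 'tmpdir'

# Hypothetical helper: stamps both the atime and the mtime of path with
# the given time, then reports the mtime the filesystem now records.
def forge_times(path, time)
  File.utime(time, time, path)   # arguments are atime, mtime, then filenames
  File.stat(path).mtime
end

Dir.mktmpdir do |dir|
  save = File.join(dir, 'game.sav')
  File.open(save, 'w') { |f| f.puts 9000 }

  # Pretend the file was last written on an arbitrary date in the past.
  forged = forge_times(save, Time.at(1_000_000_000))
  puts "The filesystem now claims the file was modified at #{forged}"
end
```

A cheater who edits the save file and then resets its mtime to the recorded timestamp would pass a check based on mtime alone, so stronger tamper detection has to rest on something an attacker can’t simply rewrite.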
See Also
• An example in Recipe 3.12, “Running a Code Block Periodically,” monitors a file for changes by checking its mtime periodically
• Recipe 6.20, “Finding the Files You Want,” shows examples of filesystem searches that make comparisons between the file times

6.5 Listing a Directory

Problem
You want to list or process the files or subdirectories within a directory.

Solution
If you’re starting from a directory name, you can use Dir.entries to get an array of the items in the directory, or Dir.foreach to iterate over the items. Here’s an example of each run on a sample directory:

    # See the chapter intro to get the create_tree library
    require './create_tree'
    create_tree 'mydir' =>
      [ {'subdirectory' => [['file_in_subdirectory', 'Just a simple file.']] },
        '.hidden_file', 'ruby_script.rb', 'text_file' ]

    Dir.entries('mydir')
    # => [".", "..", ".hidden_file", "ruby_script.rb", "subdirectory",
    #     "text_file"]

    Dir.foreach('mydir') { |x| puts x if x != "." && x != ".."}
    # .hidden_file
    # ruby_script.rb
    # subdirectory
    # text_file

You can also use Dir[] to pick up all files matching a certain pattern, using a format similar to the bash shell’s glob format (and somewhat less similar to the wildcard format used by the Windows command-line shell):

    # Find all the "regular" files and subdirectories in mydir. This excludes
    # hidden files, and the special directories . and ..
    Dir["mydir/*"]
    # => ["mydir/ruby_script.rb", "mydir/subdirectory", "mydir/text_file"]

    # Find all the .rb files in mydir
    Dir["mydir/*.rb"]                      # => ["mydir/ruby_script.rb"]

You can also open a directory handle with Dir.open, and treat it like any other Enumerable. Methods like each, each_with_index, grep, and reject will all work (but see below if you want to call them more than once). As with File.open, you should do your directory processing in a code block so that the directory handle will get closed once you’re done with it.
    Dir.open('mydir') { |d| d.grep(/file/) }
    # => [".hidden_file", "text_file"]

    Dir.open('mydir') { |d| d.each { |x| puts x } }
    # .
    # ..
    # .hidden_file
    # ruby_script.rb
    # subdirectory
    # text_file

Discussion
Reading entries from a Dir object is more like reading data from a file than iterating over an array. If you call one of the Dir instance methods and then want to call another one on the same Dir object, you’ll need to call Dir#rewind first to go back to the beginning of the directory listing:

    # Get all contents other than ".", "..", and hidden files.
    d = Dir.open('mydir')
    d.reject { |f| f[0] == '.' }
    # => ["subdirectory", "ruby_script.rb", "text_file"]

    # Now the Dir object is useless until we call Dir#rewind.
    d.entries.size                         # => 0
    d.rewind
    d.entries.size                         # => 6

    # Get the names of all files in the directory.
    d.rewind
    d.reject { |f| !File.file? File.join(d.path, f) }
    # => [".hidden_file", "ruby_script.rb", "text_file"]

    d.close

Methods for listing directories and looking for files return string pathnames instead of File and Dir objects. This is partly for efficiency, and partly because creating a File or Dir actually opens up a filehandle on that file or directory.

Even so, it’s annoying to have to take the output of these methods and patch together real File or Dir objects on which you can operate. Here’s a simple method that will build a File or Dir, given a filename and the name or Dir of the parent directory:

    def File.from_dir(dir, name)
      dir = dir.path if dir.is_a? Dir
      path = File.join(dir, name)
      (File.directory?(path) ? Dir : File).open(path) { |f| yield f }
    end

As with File.open and Dir.open, the actual processing happens within a code block:

    File.from_dir("mydir", "subdirectory") do |subdir|
      File.from_dir(subdir, "file_in_subdirectory") do |file|
        puts %{My path is #{file.path} and my contents are "#{file.read}".}
      end
    end
    # My path is mydir/subdirectory/file_in_subdirectory and my contents are
    # "Just a simple file.".
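Because the listing methods hand back bare names rather than File objects, a name has to be joined to its parent directory before any File predicate can say something useful about it. The following sketch illustrates that step; `regular_files` is a hypothetical helper name, and the temporary directory merely stands in for the mydir tree used above.

```ruby
require 'tmpdir'

# Hypothetical helper: names of the regular files directly inside dir.
# Dir.entries returns bare names, so each one is joined to dir before
# File.file? is asked about it.
def regular_files(dir)
  Dir.entries(dir).select { |name| File.file?(File.join(dir, name)) }.sort
end

Dir.mktmpdir do |dir|
  File.open(File.join(dir, 'text_file'), 'w') { |f| f << 'Just a simple file.' }
  Dir.mkdir(File.join(dir, 'subdirectory'))
  p regular_files(dir)
end
```

Testing the bare name instead (File.file?(name)) would resolve it against the current working directory of the process, which is usually not the directory being listed.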
Globs make excellent shortcuts for finding files in a directory or a directory tree. Especially useful is the ** glob, which matches any number of directories. A glob is the easiest and fastest way to recursively process every file in a directory tree, although it loads all the filenames into an array in memory. For a less memory-intensive solution, see the find library, described in Recipe 6.12.

    Dir["mydir/**/*"]
    # => ["mydir/ruby_script.rb", "mydir/subdirectory", "mydir/text_file",
    #     "mydir/subdirectory/file_in_subdirectory"]

    Dir["mydir/**/*file*"]
    # => ["mydir/text_file", "mydir/subdirectory/file_in_subdirectory"]

A brief tour of the other features of globs:

    # Regex-style character classes
    Dir["mydir/[rs]*"]       # => ["mydir/ruby_script.rb", "mydir/subdirectory"]
    Dir["mydir/[^s]*"]       # => ["mydir/ruby_script.rb", "mydir/text_file"]

    # Match any of the given strings
    Dir["mydir/{text,ruby}*"] # => ["mydir/text_file", "mydir/ruby_script.rb"]

    # Single-character wildcards
    Dir["mydir/?ub*"]        # => ["mydir/ruby_script.rb", "mydir/subdirectory"]

Globs will not pick up files or directories whose names start with periods, unless you match them explicitly:

    Dir["mydir/.*"]          # => ["mydir/.", "mydir/..", "mydir/.hidden_file"]

See Also
• Recipe 6.12, “Walking a Directory Tree”
• Recipe 6.20, “Finding the Files You Want”

6.6 Reading the Contents of a File

Problem
You want to read some or all of a file into memory.

Solution
Open the file with Kernel#open, and pass in a code block that does the actual reading. To read the entire file into a single string, use IO#read:

    # Put some stuff into a file.
    open('sample_file', 'w') do |f|
      f.write("This is line one.\nThis is line two.")
    end

    # Then read it back out.
    open('sample_file') { |f| f.read }
    # => "This is line one.\nThis is line two."
To read the file as an array of lines, use IO#readlines:

    open('sample_file') { |f| f.readlines }
    # => ["This is line one.\n", "This is line two."]

To iterate over each line in the file, use IO#each. This technique loads only one line into memory at a time:

    open('sample_file') { |f| f.each { |x| p x } }
    # "This is line one.\n"
    # "This is line two."

Discussion
How much of the file do you want to read into memory at once? Reading the entire file in one gulp uses memory equal to the size of the file, but you end up with a string, and you can use any of Ruby’s string processing techniques on it.

The alternative is to process the file one chunk at a time. This uses only the memory needed to store one chunk, but it can be more difficult to work with, because any given chunk may be incomplete. To process a chunk, you may end up reading the next chunk, and the next. This code reads the first 50-byte chunk from a file, but it turns out not to be enough:

    puts open('conclusion') { |f| f.read(50) }
    # "I know who killed Mr. Lambert," said Joe. "It was

If a certain string always marks the end of a chunk, you can pass that string into IO#each to get one chunk at a time, as a series of strings. This lets you process each full chunk as a string, and it uses less memory than reading the entire file.

    # Create a file...
    open('end_separated_records', 'w') do |f|
      f << "This is record one.\nIt spans multiple lines.ENDThis is record two.END"
    end

    # And read it back in.
    open('end_separated_records') { |f| f.each('END') { |record| p record } }
    # "This is record one.\nIt spans multiple lines.END"
    # "This is record two.END"

You can also pass a delimiter string into IO#readlines to get the entire file split into an array by the delimiter string:

    # Create a file...
    open('pipe_separated_records', 'w') do |f|
      f << "This is record one.|This is record two.|This is record three."
    end

    # And read it back in.
    open('pipe_separated_records') { |f| f.readlines('|') }
    # => ["This is record one.|", "This is record two.|",
    #     "This is record three."]

The newline character usually makes a good delimiter (many scripts process a file one line at a time), so by default, IO#each and IO#readlines split the file by line:

    open('newline_separated_records', 'w') do |f|
      f.puts 'This is record one. It cannot span multiple lines.'
      f.puts 'This is record two.'
    end

    open('newline_separated_records') { |f| f.each { |x| p x } }
    # "This is record one. It cannot span multiple lines.\n"
    # "This is record two.\n"

The trouble with newlines is that different operating systems have different newline formats. Unix newlines look like “\n”, while Windows newlines look like “\r\n”, and the newlines for old (pre-OS X) Macintosh files look like “\r”. A file uploaded to a web application might come from any of those systems, but IO#each and IO#readlines split files into lines depending on the newline character of the OS that’s running the Ruby script (this is kept in the special variable $/). What to do?

By passing “\n” into IO#each or IO#readlines, you can handle the newlines of files created on any recent operating system. If you need to handle all three types of newlines, the easiest way is to read the entire file at once and then split it up with a regular expression.

    open('file_from_unknown_os') { |f| f.read.split(/\r?\n|\r(?!\n)/) }

IO#each and IO#readlines don’t strip the delimiter strings from the end of the lines. Assuming the delimiter strings aren’t useful to you, you’ll have to strip them manually.

To strip delimiter characters from the end of a line, use the String#chomp or String#chomp! methods. By default, these methods will remove the last character or set of characters that can be construed as a newline. However, they can be made to strip any other delimiter string from the end of a line.

    "This line has a Unix/Mac OS X newline.\n".chomp
    # => "This line has a Unix/Mac OS X newline."

    "This line has a Windows newline.\r\n".chomp
    # => "This line has a Windows newline."

    "This line has an old-style Macintosh newline.\r".chomp
    # => "This line has an old-style Macintosh newline."

    "This string contains two newlines.\n\n".chomp
    # => "This string contains two newlines.\n"

    'This is record two.END'.chomp('END')
    # => "This is record two."

    'This string contains no newline.'.chomp
    # => "This string contains no newline."

You can chomp the delimiters as IO#each yields each record, or you can chomp each line returned by IO#readlines:

    open('pipe_separated_records') do |f|
      f.each('|') { |l| puts l.chomp('|') }
    end
    # This is record one.
    # This is record two.
    # This is record three.

    lines = open('pipe_separated_records') { |f| f.readlines('|') }
    # => ["This is record one.|", "This is record two.|",
    #     "This is record three."]
    lines.each { |l| l.chomp!('|') }
    # => ["This is record one.", "This is record two.", "This is record three."]

You’ve got a problem if a file is too big to fit into memory, and there are no known delimiters, or if the records between the delimiters are themselves too big to fit in memory. You’ve got no choice but to read from the file in chunks of a certain number of bytes. This is also the best way to read binary files; see Recipe 6.17 for more.

Use IO#read to read a certain number of bytes, or IO#each_byte to iterate over the file one byte at a time. The following code uses IO#read to continuously read uniformly sized chunks until it reaches end-of-file:

    class File
      def each_chunk(chunk_size=1024)
        yield read(chunk_size) until eof?
      end
    end

    open("pipe_separated_records") do |f|
      f.each_chunk(15) { |chunk| puts chunk }
    end
    # This is record
    # one.|This is re
    # cord two.|This
    # is record three
    # .

All of these methods are made available by the IO class, the superclass of File.
You can use the same methods on Socket objects. You can also use each_line and each_byte on String objects, which in some cases can save you from having to create a StringIO object (see Recipe 6.15 for more on those beasts).

See Also
• Recipe 6.11, “Performing Random Access on “Read-Once” Input Streams”
• Recipe 6.17, “Processing a Binary File,” goes into more depth about reading files as chunks of bytes
• Recipe 6.15, “Pretending a String Is a File”

6.7 Writing to a File

Problem
You want to write some text or Ruby data structures to a file. The file might or might not exist. If it does exist, you might want to overwrite the old contents, or just append new data to the end of the file.

Solution
Open the file in write mode ('w'). The file will be created if it doesn’t exist, and truncated to zero bytes if it does exist. You can then use IO#write or the << operator to write strings to the file, as though the file itself were a string and you were appending to it.

You can also use IO#puts or IO#print to write to the file, the same way you can use Kernel#puts or Kernel#print to write to standard output.

Both of the following chunks of code destroy the previous contents of the file output, then write a new string to the file:

    open('output', 'w') { |f| f << "This file contains great truths.\n" }
    open('output', 'w') do |f|
      f.puts 'The great truths have been overwritten with an advertisement.'
    end

    open('output') { |f| f.read }
    # => "The great truths have been overwritten with an advertisement.\n"

To append to a file without overwriting its old contents, open the file in append mode ('a') instead of write mode:

    open('output', "a") { |f| f.puts 'Buy Ruby(TM) brand soy sauce!' }

    open('output') { |f| puts f.read }
    # The great truths have been overwritten with an advertisement.
    # Buy Ruby(TM) brand soy sauce!

Discussion
Sometimes you’ll only need to write a single (possibly very large) string to a file.
Usually, though, you’ll be getting your strings one at a time from a data structure or some other source, and you’ll call puts or the append operator within some kind of loop:

    open('output', 'w') do |f|
      [1,2,3].each { |i| f << i << ' and a ' }
    end
    open('output') { |f| f.read }          # => "1 and a 2 and a 3 and a "

Since the << operator returns the filehandle it wrote to, you can chain calls to it. As seen above, this feature lets you write multiple strings to a file in a single line of Ruby code.

Because opening a file in write mode destroys the file’s existing contents, you should only use it when you don’t care about the old contents, or after you’ve read them into memory for later use. Append mode is nondestructive, making it useful for files like log files, which need to be updated periodically without destroying their old contents.

Buffered I/O
There’s no guarantee that data will be written to your file as soon as you call << or puts. Since disk writes are expensive, Ruby lets changes to a file pile up in a buffer. It occasionally flushes the buffer, sending the data to the operating system so it can be written to disk.

You can manually flush Ruby’s buffer for a particular file by calling its IO#flush method. You can turn off Ruby’s buffering for a file altogether by setting its IO#sync attribute to true. However, your operating system probably does some disk buffering of its own, so doing these things won’t necessarily write your changes directly to disk.

    open('output', 'w') do |f|
      f << 'This is going into the Ruby buffer.'
      f.flush                    # Now it's going into the OS buffer.
    end

    open('output', 'w') do |f|
      f.sync = true              # Turn off Ruby's buffering for this file.
      f << 'This is going straight into the OS buffer.'
    end

See Also
• Recipe 1.1, “Building a String from Parts”
• Recipe 6.6, “Reading the Contents of a File”
• Recipe 6.19, “Truncating a File”

6.8 Writing to a Temporary File

Problem
You want to write data to a secure temporary file with a unique name.

Solution
Create a Tempfile object.
It has all the methods of a File object, and it will be in a location on disk guaranteed to be unique.

    require 'tempfile'
    out = Tempfile.new("tempfile")
    out.path    # => "/tmp/tempfile23786.0"

A Tempfile object is opened for read-write access (mode w+), so you can write to it and then read from it without having to close and reopen it:

    out << "Some text."
    out.rewind
    out.read    # => "Some text."
    out.close

Note that you can’t pass a code block into the Tempfile constructor: you have to assign the temp file to an object, and call Tempfile#close when you’re done.

Discussion
To avoid security problems, use the Tempfile class to generate temp file names, instead of writing the code yourself. The Tempfile class creates a file on disk guaranteed not to be in use by any other thread or process, and sets that file’s permissions so that only you can read or write to it. This eliminates any possibility that a hostile process might inject fake data into the temp file, or read what you write.*

The name of a temporary file incorporates the string you pass into the Tempfile constructor, the process ID of the current process ($$, or $PID if you’ve done a require 'English'), and a unique number. By default, temporary files are created in Dir::tmpdir (usually /tmp), but you can pass in a different directory name:

    out = Tempfile.new("myhome_tempfile", "/home/leonardr/temp/")

No matter where you create your temporary files, when your process exits, all of its temporary files are automatically destroyed. If you want the data you wrote to temporary files to live longer than your process, you should copy or move the temporary files to “real” files:

    require 'fileutils'
    FileUtils.mv(out.path, "/home/leonardr/old_tempfile")

The tempfile library assumes that the operating system can atomically open a file and get an exclusive lock on it. This doesn’t work on all filesystems.
Ara Howard’s lockfile library (available as a gem of the same name) uses linking, which is atomic everywhere.

* Unless the hostile process is running as you or as the root user, but then you’ve got bigger problems.

6.9 Picking a Random Line from a File

Problem
You want to choose a random line from a file, without loading the entire file into memory.

Solution
Iterate over the file, giving each line a chance to be the randomly selected one:

    module Enumerable
      def random_line
        selected = nil
        # each_with_index counts from zero, so the line at index lineno
        # replaces the current choice with probability 1/(lineno+1).
        each_with_index { |line, lineno| selected = line if rand < 1.0/(lineno+1) }
        return selected.chomp if selected
      end
    end

    #Create a file with 1000 lines
    open('random_line_test', 'w') do |f|
      1000.times { |i| f.puts "Line #{i}" }
    end

    #Pick random lines from the file.
    f = open('random_line_test')
    f.random_line    # => "Line 520"
    f.random_line    # => nil
    f.rewind
    f.random_line    # => "Line 727"

Discussion
The obvious solution reads the entire file into memory:

    File.open('random_line_test') do |f|
      l = f.readlines
      l[rand(l.size)].chomp
    end
    # => "Line 708"

The recommended solution is just as fast, and only reads one line at a time into memory. However, once it’s done, the file pointer has been set to the end of the file and you can’t access the file anymore without calling File#rewind. If you want to pick a lot of random lines from a file, reading the entire file into memory might be preferable to iterating over it multiple times.

This recipe makes for a good command-line tool. The following code uses the special variable $., which holds the number of the line most recently read from a file:

    $ ruby -e 'rand < 1.0/$. and line = $_ while gets; puts line.chomp if line'

The algorithm works because, although lines that come earlier in the file have a better chance of being selected initially, they also have more chances to be replaced by a later line. A proof by induction demonstrates that the algorithm gives equal weight to each line in the file.
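Before walking through the proof, the claim can also be checked empirically. The following is a minimal sketch, not part of the recipe: pick_random is a hypothetical standalone version of the selection technique (it assumes zero-based indexes, so the kth line, counting from 1, is kept with probability 1/k), and the seed, the five sample lines, and the trial count are arbitrary choices.

```ruby
# Hypothetical standalone version of the selection technique, for experiments.
def pick_random(lines, rng)
  selected = nil
  lines.each_with_index do |line, index|
    # The line at zero-based index replaces the current choice
    # with probability 1/(index+1).
    selected = line if rng.rand < 1.0 / (index + 1)
  end
  selected
end

rng = Random.new(42)        # fixed seed, an arbitrary choice
lines = %w{a b c d e}
counts = Hash.new(0)
10_000.times { counts[pick_random(lines, rng)] += 1 }

# With five lines and 10,000 trials, each count should be close to 2000.
counts.sort.each { |line, n| puts "#{line}: #{n}" }
```

If the selection were biased toward early or late lines, the counts would drift visibly away from 2000.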
The base case is a file of a single line, where it will obviously work: any value of Kernel#rand will be less than 1, so the first line will always be chosen.

Now for the inductive step. Assume that the algorithm works for a file of n lines: that is, each of the first n lines has a 1/n chance of being chosen. Then, add another line to the file and process the new line. The chance that line n+1 will become the randomly chosen line is 1/(n+1). The remaining probability, n/(n+1), is the chance that one of the other n lines is the randomly chosen one.

Our inductive assumption was that each of the n original lines had an equal chance of being chosen, so this remaining n/(n+1) probability must be distributed evenly across the n original lines. Given a line in the first n, what’s its chance of being the chosen one? It’s just n/(n+1) divided by n, or 1/(n+1). Line n+1 and all earlier lines have a 1/(n+1) chance of being chosen, so every line is equally likely to be selected.

See Also
• Recipe 2.5, “Generating Random Numbers”
• Recipe 4.10, “Shuffling an Array”
• Recipe 5.11, “Choosing Randomly from a Weighted List”

6.10 Comparing Two Files

Problem
You want to see if two files contain the same data. If they differ, you might want to represent the differences between them as a string: a patch from one to the other.

Solution
If two files differ, it’s likely that their sizes also differ, so you can often solve the problem quickly by comparing sizes. If both files are regular files with the same size, you’ll need to look at their contents.

This code does the cheap checks first:
1. If one file exists and the other does not, they’re not the same.
2. If neither file exists, say they’re the same.
3. If the files are the same file, they’re the same.
4. If the files are of different types or sizes, they’re not the same.
    class File
      def File.same_contents(p1, p2)
        return false if File.exists?(p1) != File.exists?(p2)
        return true if !File.exists?(p1)
        return true if File.expand_path(p1) == File.expand_path(p2)
        return false if File.ftype(p1) != File.ftype(p2) ||
          File.size(p1) != File.size(p2)

Otherwise, it compares the files’ contents, a block at a time:

        open(p1) do |f1|
          open(p2) do |f2|
            blocksize = f1.lstat.blksize
            same = true
            while same && !f1.eof? && !f2.eof?
              same = f1.read(blocksize) == f2.read(blocksize)
            end
            return same
          end
        end
      end
    end

To illustrate, I’ll create two identical files and compare them. I’ll then make them slightly different, and compare them again.

    1.upto(2) do |i|
      open("output#{i}", 'w') { |f| f << 'x' * 10000 }
    end
    File.same_contents('output1', 'output2')            # => true
    open("output1", 'a') { |f| f << 'x' }
    open("output2", 'a') { |f| f << 'y' }
    File.same_contents('output1', 'output2')            # => false
    File.same_contents('nosuchfile', 'output1')         # => false
    File.same_contents('nosuchfile1', 'nosuchfile2')    # => true

Discussion
The code in the Solution works well if you only need to determine whether two files are identical. If you need to see the differences between two files, the most useful tool is Austin Ziegler’s Diff::LCS library, available as the diff-lcs gem. It implements a sophisticated diff algorithm that can find the differences between any two enumerable objects, not just strings. You can use its LCS module to represent the differences between two nested arrays, or other complex data structures.

The downside of such flexibility is a poor interface when you just want to diff two files or strings. A diff is represented by an array of Change objects, and though you can traverse this array in helpful ways, there’s no simple way to just turn it into a string representation of the sort you might get by running the Unix command diff.

Fortunately, the diff-lcs gem comes with command-line diff programs ldiff and htmldiff.
If you need to perform a textual diff from within Ruby code, you can do one of the following:
1. Call out to one of those programs: assuming the gem is installed, this is more portable than relying on the Unix diff command.
2. Import the program’s underlying library, and fake a command-line call to it. You’ll have to modify your own program’s ARGV, at least temporarily.
3. Write Ruby code that copies one of the underlying implementations to do what you want.

Here’s some code, adapted from the ldiff command-line program, which builds a string representation of the differences between two strings. The result is something you might see by running ldiff, or the Unix command diff. The most common diff formats are :unified and :context.

    require 'rubygems'
    require 'diff/lcs/hunk'

    def diff_as_string(data_old, data_new, format=:unified, context_lines=3)

First we massage the data into shape for the diff algorithm:

      data_old = data_old.split(/\n/).map! { |e| e.chomp }
      data_new = data_new.split(/\n/).map! { |e| e.chomp }

Then we perform the diff, and transform each “hunk” of it into a string:

      output = ""
      diffs = Diff::LCS.diff(data_old, data_new)
      return output if diffs.empty?
      oldhunk = hunk = nil
      file_length_difference = 0
      diffs.each do |piece|
        begin
          hunk = Diff::LCS::Hunk.new(data_old, data_new, piece, context_lines,
                                     file_length_difference)
          file_length_difference = hunk.file_length_difference
          next unless oldhunk

          # Hunks may overlap, which is why we need to be careful when our
          # diff includes lines of context. Otherwise, we might print
          # redundant lines.
          if (context_lines > 0) and hunk.overlaps?(oldhunk)
            hunk.unshift(oldhunk)
          else
            output << oldhunk.diff(format)
          end
        ensure
          oldhunk = hunk
          output << "\n"
        end
      end

      #Handle the last remaining hunk
      output << oldhunk.diff(format) << "\n"
    end

Here it is in action:

    s1 = "This is line one.\nThis is line two.\nThis is line three.\n"
    s2 = "This is line 1.\nThis is line two.\nThis is line three.\n" +
         "This is line 4.\n"
    puts diff_as_string(s1, s2)
    # @@ -1,4 +1,5 @@
    # -This is line one.
    # +This is line 1.
    #  This is line two.
    #  This is line three.
    # +This is line 4.

With all that code, on a Unix system you could be forgiven for just calling out to the Unix diff program:

    open('old_file', 'w') { |f| f << s1 }
    open('new_file', 'w') { |f| f << s2 }
    puts %x{diff old_file new_file}
    # 1c1
    # < This is line one.
    # ---
    # > This is line 1.
    # 3a4
    # > This is line 4.

See Also
• The algorithm-diff gem is another implementation of a general diff algorithm; its API is a little simpler than diff-lcs, but it has the same basic structure; both gems are descended from Perl’s Algorithm::Diff module
• It’s not available as a gem, but the diff.rb package is a little easier to script from Ruby if you need to create a textual diff of two files; look at how the unixdiff.rb program creates a Diff object and manipulates it (http://users.cybercity.dk/~dsl8950/ruby/diff.html)
• The MD5 checksum is often used in file comparisons: I didn’t use it in this recipe because when you’re only comparing two files, it’s faster to compare their contents; in Recipe 23.7, “Finding Duplicate Files,” though, the MD5 checksum is used as a convenient shorthand for the contents of many files

6.11 Performing Random Access on “Read-Once” Input Streams

Problem
You have an IO object, probably a socket, that doesn’t support random-access methods like seek, pos=, and rewind. You want to treat this object like a file on disk, where you can jump around and reread parts of the file.
Solution
The simplest solution is to read the entire contents of the socket (or as much as you’re going to need) and put it into a StringIO object. You can then treat the StringIO object exactly like a file:

    require 'socket'
    require 'stringio'

    sock = TCPSocket.open("www.example.com", 80)
    sock.write("GET /\n")
    file = StringIO.new(sock.read)
    file.read(10) # => "\r\n "\r\n " this web page "

Discussion
A socket is supposed to work just like a file, but sometimes the illusion breaks down. Since the data is coming from another computer over which you have no control, you can’t just go back and reread data you’ve already read. That data has already been sent over the pipe, and the server doesn’t care if you lost it or need to process it again.

If you have enough memory to read the entire contents of a socket, it’s easy to put the results into a form that more closely simulates a file on disk. But you might not want to read the entire socket, or the socket may be one that keeps sending data until you close it. In that case you’ll need to buffer the data as you read it. Instead of using memory for the entire contents of the socket (which may be infinite), you’ll only use memory for the data you’ve actually read.

This code defines a BufferedIO class that adds data to an internal StringIO as it’s read from its source:

    class BufferedIO
      def initialize(io)
        @buff = StringIO.new
        @source = io
        @pos = 0
      end

      def read(x=nil)
        # Work out how many bytes beyond the buffered data are needed.
        to_read = x ? x+@buff.pos-@buff.size : nil
        _append(@source.read(to_read)) if !to_read or to_read > 0
        @buff.read(x)
      end

      def pos=(x)
        read(x-@buff.pos) if x > @buff.size
        @buff.pos = x
      end

      def seek(x, whence=IO::SEEK_SET)
        case whence
        when IO::SEEK_SET then self.pos=(x)
        when IO::SEEK_CUR then self.pos=(@buff.pos+x)
        when IO::SEEK_END then read; self.pos=(@buff.size+x)
        # Note: SEEK_END reads all the socket data. As with IO#seek, the
        # offset is relative to the end, so it is normally zero or negative.
        end
        pos
      end

      # Some methods can simply be delegated to the buffer.
      ["pos", "rewind", "tell"].each do |m|
        module_eval "def #{m}\n@buff.#{m}\nend"
      end

      private

      def _append(s)
        # StringIO#<< writes at the current position, so move to the end of
        # the buffer before appending, then restore the read position.
        oldpos = @buff.pos
        @buff.seek(0, IO::SEEK_END)
        @buff << s
        @buff.pos = oldpos
      end
    end

Now you can seek, rewind, and generally move around in an input socket as if it were a disk file. You only have to read as much data as you need:

    sock = TCPSocket.open("www.example.com", 80)
    sock.write("GET /\n")
    file = BufferedIO.new(sock)
    file.read(10) # => "\r\n 0 file.read(10) # => "\r\n 90 file.read(15) # => " this web page "
    file.seek(-10, IO::SEEK_CUR)    # => 95
    file.read(10)                   # => " web page "

BufferedIO doesn’t implement all the methods of IO, only the ones not implemented by socket-type IO objects. If you need the other methods, you should be able to implement the ones you need using the existing methods as guidelines. For instance, you could implement readline like this:

    class BufferedIO
      def readline
        oldpos = @buff.pos
        line = @buff.readline unless @buff.eof?
        if !line or line[-1] != ?\n
          _append(@source.readline)    # Finish the line
          @buff.pos = oldpos           # Go back to where we were
          line = @buff.readline        # Read the line again
        end
        line
      end
    end

    file.readline    # => "by typing "example.com",\r\n"

See Also
• Recipe 6.17, “Processing a Binary File,” for more information on IO#seek

6.12 Walking a Directory Tree

Problem
You want to recursively process every subdirectory and file within a certain directory.

Solution
Suppose that the directory tree you want to walk looks like this (see this chapter’s introduction section for the create_tree library that can build this directory tree automatically):

    require 'create_tree'
    create_tree './' =>
      [ 'file1',
        'file2',
        { 'subdir1/' => [ 'file1' ] },
        { 'subdir2/' => [ 'file1',
                          'file2',
                          { 'subsubdir/' => [ 'file1' ] }
                        ]
        }
      ]

The simplest solution is to load all the files and directories into memory with a big recursive file glob, and iterate over the resulting array.
This uses a lot of memory because all the filenames are loaded into memory at once:

    Dir['**/**']
    # => ["file1", "file2", "subdir1", "subdir2", "subdir1/file1",
    #     "subdir2/file1", "subdir2/file2", "subdir2/subsubdir",
    #     "subdir2/subsubdir/file1"]

A more elegant solution is to use the find method in the Find module. It performs a depth-first traversal of a directory tree, and calls the given code block on each directory and file. The code block should take as an argument the full path to a directory or file.

This snippet calls Find.find with a code block that simply prints out each path it receives. This demonstrates how Ruby performs the traversal:

    require 'find'
    Find.find('./') { |path| puts path }
    # ./
    # ./subdir2
    # ./subdir2/subsubdir
    # ./subdir2/subsubdir/file1
    # ./subdir2/file2
    # ./subdir2/file1
    # ./subdir1
    # ./subdir1/file1
    # ./file2
    # ./file1

Discussion
Even if you’re not a system administrator, the demands of keeping your own files organized will frequently call for you to process every file in a directory tree. You may want to back up, modify, or delete each file in the directory structure, or you may just want to see what’s there.

Normally you’ll want to at least look at every file in the tree, but sometimes you’ll want to skip certain directories. For instance, you might know that a certain directory is full of a lot of large files you don’t want to process. When your block is passed a path to a directory, you can prevent Find.find from recursing into a directory by calling Find.prune. In this example, I’ll prevent Find.find from processing the files in the subdir2 directory.

    Find.find('./') do |path|
      Find.prune if File.basename(path) == 'subdir2'
      puts path
    end
    # ./
    # ./subdir1
    # ./subdir1/file1
    # ./file2
    # ./file1

Calling Find.prune when your block has been passed a file will only prevent Find.find from processing that one file.
It won’t halt the processing of the rest of the files in that directory:

    Find.find('./') do |path|
      if File.basename(path) =~ /file2$/
        puts "PRUNED #{path}"
        Find.prune
      end
      puts path
    end
    # ./
    # ./subdir2
    # ./subdir2/subsubdir
    # ./subdir2/subsubdir/file1
    # PRUNED ./subdir2/file2
    # ./subdir2/file1
    # ./subdir1
    # ./subdir1/file1
    # PRUNED ./file2
    # ./file1

Find.find works by keeping a queue of files to process. When it finds a directory, it inserts that directory’s files at the beginning of the queue. This gives it the characteristics of a depth-first traversal. Note how all the files in the top-level directory are processed after the subdirectories. The alternative would be a breadth-first traversal, which would process the files in a directory before even touching the subdirectories.

If you want to process a directory’s files before its subdirectories, the simplest solution is to use a glob and sort the resulting array. In this example, where filenames happen to sort ahead of subdirectory names, the sorted pathnames list each directory’s files before its subdirectories, which approximates a breadth-first traversal:

    Dir["**/**"].sort.each { |x| puts x }
    # file1
    # file2
    # subdir1
    # subdir1/file1
    # subdir2
    # subdir2/file1
    # subdir2/file2
    # subdir2/subsubdir
    # subdir2/subsubdir/file1

See Also
• Recipe 6.20, “Finding the Files You Want”
• Recipe 23.7, “Finding Duplicate Files”

6.13 Locking a File

Problem
You want to prevent other threads or processes from modifying a file that you’re working on.

Solution
Open the file, then lock it with File#flock. There are two kinds of lock; pass in the File constant for the kind you want.
• File::LOCK_EX gives you an exclusive lock, or write lock. If your thread has an exclusive lock on a file, no other thread or process can get a lock on that file. Use this when you want to write to a file without anyone else being able to write to it.
• File::LOCK_SH will give you a shared lock, or read lock. Other threads and processes can get their own shared locks on the file, but no one can get an exclusive lock.
Use this when you want to read a file and know that it won’t change while you’re reading it.

Once you’re done using the file, you need to unlock it. Call File#flock again, and pass in File::LOCK_UN as the lock type. You can skip this step if you’re running on Windows.

The best way to handle all this is to enclose the locking and unlocking in a method that takes a block, the way open does:

    def flock(file, mode)
      success = file.flock(mode)
      if success
        begin
          yield file
        ensure
          file.flock(File::LOCK_UN)
        end
      end
      return success
    end

This makes it possible to lock a file without having to worry about unlocking it later. Even if your block raises an exception, the file will be unlocked and another thread can use it.

    open('output', 'w') do |f|
      flock(f, File::LOCK_EX) do |f|
        f << "Kiss me, I've got a write lock on a file!"
      end
    end

Discussion
Different operating systems support different ways of locking files. Ruby’s flock implementation tries to hide the differences behind a common interface that looks like Unix’s file locking interface. In general, you can use flock as though you were on Unix, and your scripts will work across platforms.

On Unix, both exclusive and shared locks work only if all threads and processes play by the rules. If one thread has an exclusive lock on a file, another thread can still open the file without locking it and wreak havoc by overwriting its contents. That’s why it’s important to get a lock on any file that might conceivably be used by another thread or another process on the system.

Ruby’s block-oriented coding style makes it easy to do the right thing with locking. The following shortcut method works with the flock method previously defined. It takes care of opening, locking, unlocking, and closing a file, letting you focus on whatever you want to do with the file’s contents.
    def open_lock(filename, openmode="r", lockmode=nil)
      if openmode == 'r' || openmode == 'rb'
        lockmode ||= File::LOCK_SH
      else
        lockmode ||= File::LOCK_EX
      end
      value = nil
      open(filename, openmode) do |f|
        flock(f, lockmode) do
          begin
            value = yield f
          ensure
            f.flock(File::LOCK_UN) # Comment this line out on Windows.
          end
        end
        return value
      end
    end

This code creates two threads, each of which wants to access the same file. Thanks to locks, we can guarantee that only one thread is accessing the file at a time (see Chapter 20 if you’re not comfortable with threads).

    t1 = Thread.new do
      puts 'Thread 1 is requesting a lock.'
      open_lock('output', 'w') do |f|
        puts 'Thread 1 has acquired a lock.'
        f << "At last we're alone!"
        sleep(5)
      end
      puts 'Thread 1 has released its lock.'
    end

    t2 = Thread.new do
      puts 'Thread 2 is requesting a lock.'
      open_lock('output', 'r') do |f|
        puts 'Thread 2 has acquired a lock.'
        puts "File contents: #{f.read}"
      end
      puts 'Thread 2 has released its lock.'
    end

    t1.join
    t2.join
    # Thread 1 is requesting a lock.
    # Thread 1 has acquired a lock.
    # Thread 2 is requesting a lock.
    # Thread 1 has released its lock.
    # Thread 2 has acquired a lock.
    # File contents: At last we're alone!
    # Thread 2 has released its lock.

Nonblocking locks
If you try to get an exclusive or shared lock on a file, your thread will block until Ruby can lock the file. But you might be left waiting a long time, perhaps forever. The code that has the file locked may be buggy and in an infinite loop; or it may itself be blocking, waiting to lock a file that you have locked.

You can avoid deadlock and similar problems by asking for a nonblocking lock. When you do, if Ruby can’t lock the file for you, File#flock returns false, rather than waiting (possibly forever) for another thread or process to release its lock. If you don’t get a lock, you can wait a while and try again, or you can raise an exception and let the user deal with it.
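The "wait a while and try again" strategy can be sketched as a small helper. This is only an illustration: with_lock_retry and its attempts and delay parameters are hypothetical names, and the helper relies on File::LOCK_NB, the nonblocking flag, being combined with the requested lock type.

```ruby
require 'tempfile'

# Hypothetical helper: try to lock +file+ without blocking, retrying a few
# times before giving up. Yields the locked file and returns the block's value.
def with_lock_retry(file, mode=File::LOCK_EX, attempts=3, delay=0.1)
  attempts.times do
    # With LOCK_NB set, flock returns false at once if the lock is taken.
    if file.flock(mode | File::LOCK_NB)
      begin
        return yield(file)
      ensure
        file.flock(File::LOCK_UN)
      end
    end
    sleep(delay)    # Wait a while, then try again.
  end
  raise "Couldn't lock #{file.path} after #{attempts} attempts"
end

# Usage: nobody else holds a lock on this temp file, so the first try succeeds.
demo = Tempfile.new('retry_demo')
with_lock_retry(demo) { |f| f << "Locked without blocking." }
demo.close
```

Raising after a fixed number of attempts is one possible policy; a caller that would rather keep waiting can pass a larger attempts value or rescue the exception and retry.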
To make a lock into a nonblocking lock, use the OR operator (|) to combine File::LOCK_NB with either File::LOCK_EX or File::LOCK_SH. The following code will print “I’ve got a lock!” if it can get an exclusive lock on the file “contested”; otherwise it will print “I couldn’t get a lock.” and continue:

    def try_lock
      puts "I couldn't get a lock." unless
        open_lock('contested', 'w', File::LOCK_EX | File::LOCK_NB) do
          puts "I've got a lock!"
          true
        end
    end

    try_lock
    # I've got a lock!

    open('contested', 'w').flock(File::LOCK_EX) # Get a lock, hold it forever.

    try_lock
    # I couldn't get a lock.

See Also
• Chapter 20, especially Recipe 20.11, “Avoiding Deadlock,” which covers other types of deadlock problems in a multithreaded environment

6.14 Backing Up to Versioned Filenames

Problem
You want to copy a file to a numbered backup before overwriting the original file. More generally: rather than overwriting an existing file, you want to use a new file whose name is based on the original filename.

Solution
Use String#succ to generate versioned suffixes for a filename until you find one that doesn’t already exist:

    class File
      def File.versioned_filename(base, first_suffix='.0')
        suffix = nil
        filename = base
        while File.exists?(filename)
          suffix = (suffix ? suffix.succ : first_suffix)
          filename = base + suffix
        end
        return filename
      end
    end

    5.times do |i|
      name = File.versioned_filename('filename.txt')
      open(name, 'w') { |f| f << "Contents for run #{i}" }
      puts "Created #{name}"
    end
    # Created filename.txt
    # Created filename.txt.0
    # Created filename.txt.1
    # Created filename.txt.2
    # Created filename.txt.3

If you want to copy or move the original file to the versioned filename as a prelude to writing to the original file, include the ftools library to add the class methods File.copy and File.move.
Then call versioned_filename and use File.copy or File.move to put the old file in its new place:

    require 'ftools'
    class File
      def File.to_backup(filename, move=false)
        new_filename = nil
        if File.exists? filename
          new_filename = File.versioned_filename(filename)
          File.send(move ? :move : :copy, filename, new_filename)
        end
        return new_filename
      end
    end

Let’s back up filename.txt a couple of times. Recall from earlier that the files filename.txt.[0-3] already exist.

    File.to_backup('filename.txt')    # => "filename.txt.4"
    File.to_backup('filename.txt')    # => "filename.txt.5"

Now let’s do a destructive backup:

    File.to_backup('filename.txt', true)    # => "filename.txt.6"
    File.exists? 'filename.txt'             # => false

You can’t back up what doesn’t exist:

    File.to_backup('filename.txt')    # => nil

Discussion
If you anticipate more than 10 versions of a file, you should add additional zeroes to the initial suffix. Otherwise, filename.txt.10 will sort before filename.txt.2 in a directory listing. A commonly used suffix is “.000”.

    200.times do |i|
      name = File.versioned_filename('many_versions.txt', '.000')
      open(name, 'w') { |f| f << "Contents for run #{i}" }
      puts "Created #{name}"
    end
    # Created many_versions.txt
    # Created many_versions.txt.000
    # Created many_versions.txt.001
    # ...
    # Created many_versions.txt.197
    # Created many_versions.txt.198

The result of versioned_filename won’t be trustworthy if other threads or processes on your machine might be trying to write the same file. If this is a concern for you, you shouldn’t be satisfied with a negative result from File.exists?. In the time it takes to open that file, some other process or thread might open it before you. Once you find a file that doesn’t exist, you must get an exclusive lock on the file before you can be totally certain it’s okay to use.

Here’s how such an implementation might look on a Unix system.
The versioned_filename method returns the name of a file, but this implementation needs to return the actual file, opened and locked. This is the only way to avoid a race condition between the time the method returns a filename, and the time you open and lock the file.

    class File
      def File.versioned_file(base, first_suffix='.0', access_mode='w')
        suffix = file = locked = nil
        filename = base
        begin
          suffix = (suffix ? suffix.succ : first_suffix)
          filename = base + suffix
          unless File.exists? filename
            file = open(filename, access_mode)
            locked = file.flock(File::LOCK_EX | File::LOCK_NB)
            file.close unless locked
          end
        end until locked
        return file
      end
    end

    File.versioned_file('contested_file') # => #
    File.versioned_file('contested_file') # => #
    File.versioned_file('contested_file') # => #

The construct begin...end until locked creates a loop that runs at least once, and continues to run until the variable locked becomes true, indicating that a file has been opened and successfully locked.

See Also
• Recipe 6.13, “Locking a File”

6.15 Pretending a String Is a File

Problem
You want to call code that expects to read from an open file object, but your source is a string in memory. Alternatively, you want to call code that writes its output to a file, but have it actually write to a string.

Solution
The StringIO class wraps a string in the interface of the IO class. You can treat it like a file, then get everything that’s been “written” to it by calling its string method.

Here’s a StringIO used as an input source:

    require 'stringio'
    s = StringIO.new %{I am the very model of a modern major general.
    I've information vegetable, animal, and mineral.}

    s.pos    # => 0
    s.each_line { |x| puts x }
    # I am the very model of a modern major general.
    # I've information vegetable, animal, and mineral.
    s.eof?
    # => true
    s.pos       # => 95
    s.rewind
    s.pos       # => 0
    s.grep /general/
    # => ["I am the very model of a modern major general.\n"]

Here are StringIO objects used as output sinks:

    s = StringIO.new
    s.write('Treat it like a file.')
    s.rewind
    s.write("Act like it's")
    s.string    # => "Act like it's a file."

    require 'yaml'
    s = StringIO.new
    YAML.dump(['A list of', 3, :items], s)
    puts s.string
    # ---
    # - A list of
    # - 3
    # - :items

Discussion
The Adapter is a common design pattern: to make an object acceptable as input to a method, it’s wrapped in another object that presents the appropriate interface. The StringIO class is an Adapter between String and File (or IO), designed for use with methods that work on File or IO instances. With a StringIO, you can disguise a string as a file and use those methods without them ever knowing they haven’t really been given a file.

For instance, if you want to write unit tests for a library that reads from a file, the simplest way is to pass in predefined StringIO objects that simulate files with various contents. If you need to modify the output of a method that writes to a file, a StringIO can capture the output, making it easy to modify and send on to its final destination.

StringIO-type functionality is less necessary in Ruby than in languages like Python, because in Ruby, strings and files implement a lot of the same methods to begin with. Often you can get away with simply using these common methods. For instance, if all you’re doing is writing to an output sink, you don’t need a StringIO object, because String#<< and File#<< work the same way:

    def make_more_interesting(io)
      io << "... OF DOOM!"
    end

    make_more_interesting("Cherry pie")    # => "Cherry pie... OF DOOM!"

    open('interesting_things', 'w') do |f|
      f.write("Nightstand")
      make_more_interesting(f)
    end
    open('interesting_things') { |f| f.read }    # => "Nightstand... OF DOOM!"
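The unit-testing and output-capturing uses mentioned above can be sketched as follows. Both count_words and write_report are hypothetical stand-ins for library methods that expect a file; neither comes from a real library.

```ruby
require 'stringio'

# Hypothetical library method: counts whitespace-separated words read from +io+.
def count_words(io)
  io.read.split.size
end

# Simulate files with various contents, without touching the disk.
empty_file    = StringIO.new("")
two_line_file = StringIO.new("The quick brown fox\njumps over the lazy dog\n")

count_words(empty_file)       # => 0
count_words(two_line_file)    # => 9

# Hypothetical method that writes its output to a file-like object.
def write_report(io)
  io.puts "status: ok"
end

# Capture the output in memory, then modify it before sending it on.
captured = StringIO.new
write_report(captured)
captured.string.upcase        # => "STATUS: OK\n"
```

Because neither method checks the class of its argument, the same code works unchanged when it is handed a real File.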
Similarly, File and String both include the Enumerable mixin, so in a lot of cases you can read from an object without caring what type it is. This is a good example of Ruby’s duck typing.

Here’s a string:

    poem = %{The boy stood on the burning deck
    Whence all but he had fled
    He'd stayed above to wash his neck
    Before he went to bed}

and a file containing that string:

    output = open("poem", "w")
    output.write(poem)
    output.close
    input = open("poem")

will give the same result when you call an Enumerable method:

    poem.grep /ed$/
    # => ["Whence all but he had fled\n", "Before he went to bed"]
    input.grep /ed$/
    # => ["Whence all but he had fled\n", "Before he went to bed"]

Just remember that, unlike a string, you can’t iterate over a file multiple times without calling rewind:

    input.grep /ed$/
    # => []
    input.rewind
    input.grep /ed$/
    # => ["Whence all but he had fled\n", "Before he went to bed"]

StringIO comes in when the Enumerable methods and << aren’t enough. If a method you’re writing needs to use methods specific to IO, you can accept a string as input and wrap it in a StringIO. The class also comes in handy when you need to call a method someone else wrote, not anticipating that anyone would ever need to call it with anything other than a file:

    def fifth_byte(file)
      file.seek(5)
      file.read(1)
    end

    fifth_byte("123456")
    # NoMethodError: undefined method `seek' for "123456":String
    fifth_byte(StringIO.new("123456"))    # => "6"

When you write a method that accepts a file as an argument, you can silently accommodate callers who pass in strings by wrapping in a StringIO any string that gets passed in:

    def file_operation(io)
      io = StringIO.new(io) if io.respond_to?(:to_str) && !io.is_a?(StringIO)
      #Do the file operation...
    end

A StringIO object is always open for both reading and writing:

    s = StringIO.new
    s << "A string"
    s.read    # => ""
    s << ", and more."
    s.rewind
    s.read    # => "A string, and more."
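The string-wrapping idea used by file_operation can be sketched as a complete method. first_bytes is a hypothetical example, not part of any library; it assumes that anything responding to to_str is a string and that anything else already behaves like an IO.

```ruby
require 'stringio'

# Hypothetical method: returns the first +n+ bytes of a file-like object,
# silently accepting a plain string as well.
def first_bytes(io, n)
  # Strings respond to to_str; File and StringIO objects don't.
  io = StringIO.new(io) if io.respond_to?(:to_str)
  io.rewind
  io.read(n)
end

first_bytes("123456", 3)                  # => "123"
first_bytes(StringIO.new("abcdef"), 2)    # => "ab"
```

Checking respond_to?(:to_str) rather than is_a?(String) keeps the method in the duck-typing spirit: any object that can stand in for a string gets wrapped.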
Memory access is faster than disk access, but for large amounts of data (more than about 10 kilobytes), StringIO objects are slower than disk files. If speed is your aim, your best bet is to write to and read from temp files using the tempfile module. Or you can do what the open-uri library does: start off by writing to a StringIO and, if it gets too big, switch to using a temp file.

See Also

• Recipe 6.8, “Writing to a Temporary File”
• Recipe 6.11, “Performing Random Access on “Read-Once” Input Streams”

6.16 Redirecting Standard Input or Output

Problem

You don’t want the standard input, output, or error of your process to go to the default IO objects set up by the Ruby interpreter. You want them to go to other file-type objects of your own choosing.

Solution

You can assign any IO object (a File, a Socket, or what have you) to the global variables $stdin, $stdout, or $stderr. You can then read from or write to those objects as though they were the originals.

This short Ruby program demonstrates how to redirect the Kernel methods that print to standard output. To avoid confusion, I’m presenting it as a standalone Ruby program rather than an interactive irb session.*

    #!/usr/bin/ruby -w
    # ./redirect_stdout.rb
    require 'stringio'
    new_stdout = StringIO.new

    $stdout = new_stdout
    puts "Hello, hello."
    puts "I'm writing to standard output."

    $stderr.puts "#{new_stdout.size} bytes written to standard output so far."
    $stderr.puts "You haven't seen anything on the screen yet, but you soon will:"
    $stderr.puts new_stdout.string

* irb prints the result of each Ruby expression to $stdout, which tends to clutter the results in this case.

Run this program and you’ll see the following:

    $ ruby redirect_stdout.rb
    46 bytes written to standard output so far.
    You haven't seen anything on the screen yet, but you soon will:
    Hello, hello.
    I'm writing to standard output.
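The same redirection can be packaged as a helper that always puts the previous stream back. The sketch below assumes a hypothetical method named capture_stdout; it is one possible arrangement, not part of Ruby’s standard library.

```ruby
require 'stringio'

# Hypothetical helper: runs the block with $stdout pointed at a StringIO,
# then restores the previous $stdout even if the block raises.
def capture_stdout
  original = $stdout
  $stdout = StringIO.new
  yield
  $stdout.string
ensure
  $stdout = original
end

captured = capture_stdout { puts "Hello, hello." }   # => "Hello, hello.\n"
```

The ensure clause is the point of the design: without it, an exception inside the block would leave every later puts writing into a StringIO nobody reads.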
Discussion

If you have any Unix experience, you know that when you run a Ruby script from the command line, you can make the shell redirect its standard input, output, and error streams to files or other programs. This technique lets you do the same thing from within a Ruby script.

You can use this as a quick and dirty way to write errors to a file, write output to a StringIO object (as seen above), or even read input from a socket. Within a script, you can programmatically decide where to send your output, or receive standard input from multiple sources. These things are generally not possible from the command line without a lot of fancy shell scripting.

The redirection technique is especially useful when you’ve written or inherited a script that prints text to standard output, and you need to make it capable of printing to any file-like object. Rather than changing almost every line of your code, you can just set $stdout at the start of your program, and let it run as is. This isn’t a perfect solution, but it’s often good enough.

The original input and output streams for a process are always available as the constants STDIN, STDOUT, and STDERR. If you want to temporarily swap one IO stream for another, change back to the “standard” standard output by setting $stdout = STDOUT. Keep in mind that since the $std objects are global variables, even a temporary change affects all threads in your script.

See Also

• Recipe 6.15, “Pretending a String Is a File,” has much more information on StringIO

6.17 Processing a Binary File

Problem

You want to read binary data from a file, or write it to one.

Solution

Since Ruby strings make no distinction between binary and text data, processing a binary file needn’t be any different than processing a text file. Just make sure you add “b” to your file mode when you open a binary file on Windows.
This code writes 11 bytes of binary data to a file, then reads it back:

    open('binary', 'wb') do |f|
      (0..100).step(10) { |b| f << b.chr }
    end

    s = open('binary', 'rb') { |f| f.read }
    # => "\000\n\024\036(2<FPZd"

    f = open('binary', 'rb')
    f.pos                                   # => 0
    f.read(1)                               # => "\000"
    f.pos                                   # => 1

You can also just set pos to jump to a specific byte in the file:

    f.pos = 4                               # => 4
    f.read(2)                               # => "(2"
    f.pos                                   # => 6

You can use IO#seek to move the cursor forward or backward relative to its current position (with File::SEEK_CUR), or to move to a certain distance from the end of a file (with File::SEEK_END). Unlike the iterator methods, which go through the entire file once, you can use seek or set pos to jump anywhere in the file, even to a byte you’ve already read.

    f.seek(8)
    f.pos                                   # => 8

    f.seek(-4, File::SEEK_CUR)
    f.pos                                   # => 4

    f.seek(2, File::SEEK_CUR)
    f.pos                                   # => 6

    # Move to the second-to-last byte of the file.
    f.seek(-2, File::SEEK_END)
    f.pos                                   # => 9

Attempting to read more bytes than there are in the file returns the rest of the bytes, and sets your file’s eof? flag to true:

    f.read(500)                             # => "Zd"
    f.pos                                   # => 11
    f.eof?                                  # => true
    f.close

Often you need to read from and write to a binary file simultaneously. You can open any file for simultaneous reading and writing using the “r+” mode (or, in this case, “rb+”):

    f = open('binary', 'rb+')
    f.read                                  # => "\000\n\024\036(2<FPZd"
    f.pos = 2
    f.write('Hello.')
    f.rewind
    f.read                                  # => "\000\nHello.PZd"
    f << 'Goodbye.'
    f.rewind
    f.read                                  # => "\000\nHello.PZdGoodbye."
    f.close

You can append new data to the end of a file you’ve opened for read-write access, and you can overwrite existing data byte for byte, but you can’t insert new data into the middle of a file. This makes the read-write technique useful for binary files, where exact byte offsets are often important, and less useful for text files, where it might make sense to add an extra line in the middle.

Why do you need to append “b” to the file mode when opening a binary file on Windows?
Because otherwise Windows will mangle any newline characters that show up in your binary file. The “b” tells Windows to leave the newlines alone, because they’re not really newlines: they’re binary data. Since it doesn’t hurt anything on Unix to put “b” in the file mode, you can make your code cross-platform by appending “b” to the mode whenever you open a file you plan to treat as binary. Note that “b” by itself is not a valid file mode: you probably want “rb”.

An MP3 example

Because every binary format is different, probably the best I can do to help you beyond this point is show you an example. Consider MP3 music files. Many MP3 files have a 128-byte data structure at the end called an ID3 tag. These 128 bytes are literally packed with information about the song: its name, the artist, which album it’s from, and so on. You can parse this data structure by opening an MP3 file and doing a series of reads from a pos near the end of the file.

According to the ID3 standard, if you start from the 128th-to-last byte of an MP3 file and read three bytes, you should get the string “TAG”. If you don’t, there’s no ID3 tag for this MP3 file, and nothing to do. If there is an ID3 tag present, then the 30 bytes after “TAG” contain the name of the song, the 30 bytes after that contain the name of the artist, and so on. Here’s some code that parses a file’s ID3 tag and puts the results into a hash:

    def parse_id3(mp3_file)
      fields_and_sizes = [[:track_name, 30], [:artist_name, 30],
                          [:album_name, 30], [:year, 4], [:comment, 30],
                          [:genre, 1]]
      tag = {}
      open(mp3_file) do |f|
        f.seek(-128, File::SEEK_END)
        if f.read(3) == 'TAG' # An ID3 tag is present
          fields_and_sizes.each do |field, size|
            # Read the field and strip off anything after the first null
            # character.
            data = f.read(size).gsub(/\000.*/, '')
            # Convert the genre string to a number.
            data = data[0] if field == :genre
            tag[field] = data
          end
        end
      end
      return tag
    end

    parse_id3('ID3.mp3')
    # => {:year=>"2005", :artist_name=>"The ID Three",
    #     :album_name=>"Binary Brain Death",
    #     :comment=>"http://www.example.com/id3/", :genre=>22,
    #     :track_name=>"ID 3"}

    parse_id3('Too Indie For ID3 Tags.mp3')   # => {}

Rather than specifying the genre of the music as a string, the :genre element of the hash is a single byte, an entry into a lookup table shared by all applications that use ID3. In this table, genre number 22 is “Death metal”.

It’s less code to specify the byte offsets for a binary file in the format recognized by String#unpack, which can parse the bytes of a string according to a given format. It returns an array containing the results of the parsing.

    # Returns [track, artist, album, year, comment, genre]
    def parse_id3(mp3_file)
      format = 'Z30Z30Z30Z4Z30C'
      open(mp3_file) do |f|
        f.seek(-128, File::SEEK_END)
        if f.read(3) == "TAG" # An ID3 tag is present
          return f.read(125).unpack(format)
        end
      end
      return nil
    end

    parse_id3('ID3.mp3')
    # => ["ID 3", "The ID Three", "Binary Brain Death", "2005",
    #     "http://www.example.com/id3/", 22]

As you can see, the unpack format is obscure but very concise. The string “Z30Z30Z30Z4Z30C” passed into String#unpack completely describes the elements of the ID3 format after the “TAG”:

• Three strings of 30 bytes, with null characters stripped (“Z30Z30Z30”)
• A string of 4 bytes, with null characters stripped (“Z4”)
• One more string of 30 bytes, with null characters stripped (“Z30”)
• A single character, represented as an unsigned integer (“C”)

It doesn’t describe what those elements are supposed to be used for, though.
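One way to recover the meaning of each element is to pair the unpacked array with a list of field names. The following is a minimal sketch: id3_bytes_to_hash is a hypothetical helper, and the sample tag is built in memory with Array#pack so that no real MP3 file is needed.

```ruby
# Hypothetical helper: turns the 125 bytes that follow "TAG" into a hash
# keyed by field name.
def id3_bytes_to_hash(bytes)
  # Field names in the order they appear in an ID3v1 tag.
  fields = [:track_name, :artist_name, :album_name, :year, :comment, :genre]
  Hash[fields.zip(bytes.unpack('Z30Z30Z30Z4Z30C'))]
end

# Build sample tag data in memory rather than reading a file.
sample = ["ID 3", "The ID Three", "Binary Brain Death", "2005",
          "http://www.example.com/id3/", 22].pack('Z30Z30Z30Z4Z30C')

tag = id3_bytes_to_hash(sample)
tag[:artist_name]   # => "The ID Three"
tag[:genre]         # => 22
```

This keeps the brevity of the unpack format while giving callers the same named fields as the longer hash-building version.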
When writing binary data to a file, you can use Array#pack, the opposite of String#unpack:

    id3 = ["ID 3", "The ID Three", "Binary Brain Death", "2005",
           "http://www.example.com/id3/", 22]
    id3.pack 'Z30Z30Z30Z4Z30C'
    # => "ID 3\000\000\000\000\000...http://www.example.com/id3/\000\000\000\026"

See Also

• The ID3 standard, described at http://en.wikipedia.org/wiki/ID3 along with the table of genres; the code in this recipe parses the original ID3v1 standard, which is much simpler than ID3v2
• ri String#unpack and ri Array#pack

6.18 Deleting a File

Problem

You want to delete a single file, or a whole directory tree.

Solution

Removing a file is simple, with File.delete:

    require 'fileutils'
    FileUtils.touch "doomed_file"
    File.exist? "doomed_file"               # => true
    File.delete "doomed_file"
    File.exist? "doomed_file"               # => false

Removing a directory tree is also fairly simple. The most confusing thing about it is the number of different methods Ruby provides to do it. The method you want is probably FileUtils.remove_dir, which recursively deletes the contents of a directory:

    Dir.mkdir "doomed_directory"
    File.exist? "doomed_directory"          # => true
    FileUtils.remove_dir "doomed_directory"
    File.exist? "doomed_directory"          # => false

Discussion

Ruby provides several methods for removing directories, but you really only need remove_dir. Dir.delete and FileUtils.rmdir will only work if the directory is already empty. The rm_r and rm_rf defined in FileUtils are similar to remove_dir, but if you’re a Unix user you may find their names more mnemonic.

You should also know about the :secure option to rm_rf, because the remove_dir method and all its variants are vulnerable to a race condition when you remove a world-writable directory. The risk is that a process owned by another user might create a symlink in that directory while you’re deleting it. This would make you delete the symlinked file along with the files you actually meant to delete.
Passing in the :secure option to rm_rf slows down deletions significantly (it has to change the permissions on the directory before deleting it), but it avoids the race condition. If you’re running Ruby 1.8, you’ll also need to hack the FileUtils module a little bit to work around a bug (the bug is fixed in Ruby 1.9):

    # A hack to make a method used by rm_rf actually available
    module FileUtils
      module_function :fu_world_writable?
    end

    Dir.mkdir "/tmp/doomed_directory"
    FileUtils.rm_rf("/tmp/doomed_directory", :secure=>true)
    File.exist? "/tmp/doomed_directory"     # => false

Why isn’t the :secure option the default for rm_rf? Because secure deletion isn’t thread-safe: it actually changes the current working directory of the process. You need to choose between thread safety and a possible security hole.

6.19 Truncating a File

Problem

You want to truncate a file to a certain length, probably zero bytes.

Solution

Usually, you want to destroy the old contents of a file and start over. Opening a file for write access will automatically truncate it to zero bytes, and let you write new contents to the file:

    filename = 'truncate.txt'
    open(filename, 'w') { |f| f << "All of this will be truncated." }
    File.size(filename)                     # => 30

    f = open(filename, 'w') {}
    File.size(filename)                     # => 0

If you just need to truncate the file to zero bytes, and not write any new contents to it, you can open it with an access mode of File::TRUNC.

    open(filename, 'w') { |f| f << "Here are some new contents." }
    File.size(filename)                     # => 27

    f = open(filename, File::TRUNC) {}
    File.size(filename)                     # => 0

You can’t actually do anything with a File whose access mode is File::TRUNC:

    open(filename, File::TRUNC) do |f|
      f << "At last, an empty file to write to!"
    end
    # IOError: not opened for writing

Discussion

Transient files are the most likely candidates for truncation.
Log files are often truncated, automatically or by hand, before they grow too large.

The most common type of truncation is truncating a file to zero bytes, but the File.truncate method can truncate a file to any number of bytes, not just zero. You can also use the instance method, File#truncate, to truncate a file you’ve opened for writing:

    f = open(filename, 'w') do |f|
      f << 'These words will remain intact after the file is truncated.'
    end
    File.size(filename)                     # => 59

    File.truncate(filename, 30)
    File.size(filename)                     # => 30
    open(filename) { |f| f.read }           # => "These words will remain intact"

These methods don’t always make a file smaller. If the file starts out smaller than the size you give, they append zero-bytes (\000) to the end of file until the file reaches the specified size.

    f = open(filename, "w") { |f| f << "Brevity is the soul of wit." }
    File.size(filename)                     # => 27
    File.truncate(filename, 30)
    File.size(filename)                     # => 30
    open(filename) { |f| f.read }
    # => "Brevity is the soul of wit.\000\000\000"

File.truncate and File#truncate act like the bed of Procrustes: they force a file to be a certain number of bytes long, whether that means stretching it or chopping off the end.

6.20 Finding the Files You Want

Problem

You want to locate all the files in a directory hierarchy that match some criteria. For instance, you might want to find all the empty files, all the MP3 files, or all the files named “README.”

Solution

Use the Find.find method to walk the directory structure and accumulate a list of matching files.

Pass in a block to the following method and it’ll walk a directory tree, testing each file against the code block you provide. It returns an array of all files for which the value of the block is true.
    require 'find'
    module Find
      def match(*paths)
        matched = []
        find(*paths) { |path| matched << path if yield path }
        return matched
      end
      module_function :match
    end

Here’s what Find.match might return if you used it on a typical disorganized home directory:

    Find.match("./") { |p| File.lstat(p).size == 0 }
    # => ["./Music/cancelled_download.MP3", "./tmp/empty2", "./tmp/empty1"]

    Find.match("./") { |p| ext = p[-4...p.size]; ext && ext.downcase == ".mp3" }
    # => ["./Music/The Snails - Red Rocket.mp3",
    # =>  "./Music/The Snails - Moonfall.mp3", "./Music/cancelled_download.MP3"]

    Find.match("./") { |p| File.split(p)[1] == "README" }
    # => ["./rubyprog-0.1/README", "./tmp/README"]

Discussion

This is an especially useful chunk of code for system administration tasks. It gives you functionality at least as powerful as the Unix find command, but you can write your search criteria in Ruby and you won’t have to remember the arcane syntax of find.

As with Find.find itself, you can stop Find.match from processing a directory by calling Find.prune:

    Find.match("./") do |p|
      Find.prune if p == "./tmp"
      File.split(p)[1] == "README"
    end
    # => ["./rubyprog-0.1/README"]

You can even look inside each file to see whether you want it:

    # Find all files that start with a particular phrase.
    must_start_with = "This Ruby program"
    Find.match("./") do |p|
      if File.file? p
        open(p) { |f| f.read(must_start_with.size) == must_start_with }
      else
        false
      end
    end
    # => ["./rubyprog-0.1/README"]

A few other useful things to search for using this function:

    # Finds files that were probably left behind by emacs sessions.
    def emacs_droppings(*paths)
      Find.match(*paths) do |p|
        (p[-1] == ?~ and p[0] != ?~) or (p[0] == ?# and p[-1] == ?#)
      end
    end

    # Finds all files that are larger than a certain threshold. Use this to find
    # the files hogging space on your filesystem.
    def bigger_than(bytes, *paths)
      Find.match(*paths) { |p| File.lstat(p).size > bytes }
    end

    # Finds all files modified more recently than a certain number of seconds ago.
    def modified_recently(seconds, *paths)
      time = Time.now - seconds
      Find.match(*paths) { |p| File.lstat(p).mtime > time }
    end

    # Finds all files that haven't been accessed since they were last modified.
    def possibly_abandoned(*paths)
      Find.match(*paths) { |p| f = File.lstat(p); f.mtime == f.atime }
    end

See Also

• Recipe 6.12, “Walking a Directory Tree”

6.21 Finding and Changing the Current Working Directory

Problem

You want to see which directory the Ruby process considers its current working directory, or change that directory.

Solution

To find the current working directory, use Dir.getwd:

    Dir.getwd                               # => "/home/leonardr"

To change the current working directory, use Dir.chdir:

    Dir.chdir("/bin")
    Dir.getwd                               # => "/bin"
    File.exist? "ls"                        # => true

Discussion

The current working directory of a Ruby process starts out as the directory you were in when you started the Ruby interpreter. When you refer to a file without providing an absolute pathname, Ruby assumes you want a file by that name in the current working directory. Ruby also checks the current working directory when you require a library that can’t be found anywhere else.

The current working directory is a useful default. If you’re writing a Ruby script that operates on a directory tree, you might start from the current working directory if the user doesn’t specify one.

However, you shouldn’t rely on the current working directory being set to any particular value: this makes scripts brittle, and prone to break when run from a different directory. If your Ruby script comes bundled with libraries, or needs to load additional files from subdirectories of the script directory, you should set the working directory in code.
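When a change of directory is only needed for one stretch of code, Dir.chdir can be given a block: it changes directory, runs the block, and then changes back. The sketch below uses a throwaway directory from Dir.mktmpdir (part of the standard tmpdir library) purely so it can run anywhere; the variable names are illustrative.

```ruby
require 'tmpdir'

start_dir = Dir.getwd
inside_dir = nil

Dir.mktmpdir do |tmp|
  # With a block, Dir.chdir changes directory only for the duration of
  # the block and then changes back.
  Dir.chdir(tmp) do
    inside_dir = Dir.getwd
  end
end

back_where_we_started = (Dir.getwd == start_dir)   # => true
```

The working directory is still global to the process while the block runs, so this form tidies up after itself but does not make the change safe for other threads.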
You can change the working directory as often as necessary, but it’s more reliable to use absolute pathnames, even though this can make your code less portable. This is especially true if you’re writing multithreaded code.

The current working directory is global to a process. If multiple threads are running code that changes the working directory to different values, you’ll never know for sure what the working directory is at any given moment.

See Also

• Recipe 6.18, “Deleting a File,” shows some problems created by a process-global working directory

CHAPTER 7
Code Blocks and Iteration

In Ruby, a code block (or just “block”) is an object that contains some Ruby code, and the context necessary to execute it. Code blocks are the most visually distinctive aspect of Ruby, and also one of the most confusing to newcomers from other languages. Essentially, a Ruby code block is a method that has no name.

Most other languages have something like a Ruby code block: C’s function pointers, C++’s function objects, Python’s lambdas and list comprehensions, Perl’s anonymous functions, Java’s anonymous inner classes. These features live mostly in the corners of those languages, shunned by novice programmers. Ruby can’t be written without code blocks. Of the major languages, only Lisp is more block-oriented.

Unlike most other languages, Ruby makes code blocks easy to create and imposes few restrictions on them. In every other chapter of this book, you’ll see blocks passed into methods like it’s no big deal (which it isn’t):

    [1,2,3].each { |i| puts i}
    # 1
    # 2
    # 3

In this chapter, we’ll show you how to write that kind of method, the kinds of method that are useful to write that way, and when and how to treat blocks as first-class objects.

Ruby provides two syntaxes for creating code blocks.
When the entire block will fit on one line, it’s most readable when enclosed in curly braces:

    [1,2,3].each { |i| puts i }
    # 1
    # 2
    # 3

When the block is longer than one line, it’s more readable to begin it with the do keyword and end it with the end keyword:

    [1,2,3].each do |i|
      if i % 2 == 0
        puts "#{i} is even."
      else
        puts "#{i} is odd."
      end
    end
    # 1 is odd.
    # 2 is even.
    # 3 is odd.

Some people use the curly-brace syntax when they’re interested in the return value of the block, and the do...end syntax when they’re interested in the block’s side effects.

Keep in mind that the curly-brace syntax has a higher precedence than the do...end syntax. Consider the following two snippets of code:

    1.upto 3 do |x|
      puts x
    end
    # 1
    # 2
    # 3

    1.upto 3 { |x| puts x }
    # SyntaxError: compile error

In the second example, the code block binds to the number 3, not to the function call 1.upto 3. A standalone variable can’t take a code block, so you get a compile error. When in doubt, use parentheses.

    1.upto(3) { |x| puts x }
    # 1
    # 2
    # 3

Usually the code blocks passed into methods are anonymous objects, created on the spot. But you can instantiate a code block as a Proc object by calling lambda. See Recipe 7.1 for more details.

    hello = lambda { "Hello" }
    hello.call
    # => "Hello"

    log = lambda { |str| puts "[LOG] #{str}" }
    log.call("A test log message.")
    # [LOG] A test log message.

Like any method, a block can accept arguments.
A block’s arguments are defined in a comma-separated list at the beginning of the block, enclosed in pipe characters:

    {1=>2, 2=>4}.each { |k,v| puts "Key #{k}, value #{v}" }
    # Key 1, value 2
    # Key 2, value 4

Arguments to blocks look almost like arguments to methods, but there are a few restrictions: you can’t set default values for block arguments, you can’t expand hashes or arrays inline, and a block cannot itself take a block argument.*

Since Proc objects are created like other objects, you can create factory methods whose return values are customized pieces of executable Ruby code. Here’s a simple factory method for code blocks that do multiplication:

    def times_n(n)
      lambda { |x| x * n }
    end

The following code uses the factory to create and use two customized methods:

    times_ten = times_n(10)
    times_ten.call(5)                       # => 50
    times_ten.call(1.25)                    # => 12.5

    circumference = times_n(2*Math::PI)
    circumference.call(10)                  # => 62.8318530717959
    circumference.call(3)                   # => 18.8495559215388
    [1, 2, 3].collect(&circumference)
    # => [6.28318530717959, 12.5663706143592, 18.8495559215388]

You may have heard people talking about Ruby’s “closures.” What is a closure, and how is it different from a block? In Ruby, there is no difference between closures and blocks. Every Ruby block is also a closure.†

So what makes a Ruby block a closure? Basically, a Ruby block carries around the context in which it was defined. A block can reference the variables that were in scope when it was defined, even if those variables later go out of scope. Here’s a simple example; see Recipe 7.4 for more.

    ceiling = 50
    # Which of these numbers are less than the target?
    [1, 10, 49, 50.1, 200].select { |x| x < ceiling }
    # => [1, 10, 49]

The variable ceiling is within scope when the block is defined, but it goes out of scope when the flow of execution enters the select method. Nonetheless, the block can access ceiling from within select, because it carries its context around with it.
That’s what makes it a closure.

We suspect that a lot of people who say “closures” when talking about Ruby blocks just do it to sound smart. Since we’ve already ruined any chance we might have had at sounding smart, we’ve decided to refer to Ruby closures as just plain “blocks” throughout this book. The only exceptions are in the rare places where we must discuss the context that makes Ruby’s code blocks real closures, rather than “dumb” blocks.

* In Ruby 1.9, a block can itself take a block argument: |arg1, arg2, &block|. This makes methods like Module#define_method more useful. In Ruby 2.0, you’ll be able to give default values to block arguments.

† Someone could argue that a block isn’t really a closure if it never actually uses any of the context it carries around: you could have done the same job with a “dumb” block, assuming Ruby supported those. For simplicity’s sake, we do not argue this.

7.1 Creating and Invoking a Block

Problem

You want to put some Ruby code into an object so you can pass it around and call it later.

Solution

By this time, you should be familiar with a block as some Ruby code enclosed in curly brackets. You might think it possible to define a block object as follows:

    aBlock = { |x| puts x }                 # WRONG
    # SyntaxError: compile error

That doesn’t work because a block is only valid Ruby syntax when it’s an argument to a method call. There are several equivalent methods that take a block and return it as an object. The most favored method is Kernel#lambda:*

    aBlock = lambda { |x| puts x }          # RIGHT

To call the block, use the call method:

    aBlock.call "Hello World!"
    # Hello World!

Discussion

The ability to assign a bit of Ruby code to a variable is very powerful. It lets you write general frameworks and plug in specific pieces of code at the crucial points.

As you’ll find out in Recipe 7.2, you can accept a block as an argument to a method by prepending & to the argument name.
This way, you can write your own trivial version of the lambda method:

    def my_lambda(&aBlock)
      aBlock
    end

    b = my_lambda { puts "Hello World My Way!" }
    b.call
    # Hello World My Way!

A newly defined block is actually a Proc object.

    b.class                                 # => Proc

You can also initialize blocks with the Proc constructor or the method Kernel#proc. The methods Kernel#lambda, Kernel#proc, and Proc.new all do basically the same thing. These three lines of code are nearly equivalent:

    aBlock = Proc.new { |x| puts x }
    aBlock = proc { |x| puts x }
    aBlock = lambda { |x| puts x }

What’s the difference? Kernel#lambda is the preferred way of creating block objects, because it gives you block objects that act more like Ruby methods. Consider what happens when you call a block with the wrong number of arguments:

    add_lambda = lambda { |x,y| x + y }

    add_lambda.call(4)
    # ArgumentError: wrong number of arguments (1 for 2)

    add_lambda.call(4,5,6)
    # ArgumentError: wrong number of arguments (3 for 2)

A block created with lambda acts like a Ruby method. If you don’t specify the right number of arguments, you can’t call the block. But a block created with Proc.new acts like the anonymous code block you pass into a method like Enumerable#each:

    add_procnew = Proc.new { |x,y| x + y }

    add_procnew.call(4)
    # TypeError: nil can't be coerced into Fixnum

    add_procnew.call(4,5,6)                 # => 9

If you don’t specify enough arguments when you call the block, the rest of the arguments are given nil. If you specify too many arguments, the extra arguments are ignored. Unless you want this kind of behavior, use lambda.

In Ruby 1.8, Kernel#proc acts like Kernel#lambda. In Ruby 1.9, Kernel#proc acts like Proc.new, as better befits its name.

* The name lambda comes from the lambda calculus (a mathematical formal system) via Lisp.
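The difference in argument handling is easier to see when the block simply returns its arguments. This sketch assumes Ruby 1.9 or later, where Proc#lambda? is available; the variable names are illustrative.

```ruby
pair_lambda = lambda   { |x, y| [x, y] }
pair_proc   = Proc.new { |x, y| [x, y] }

# A Proc.new block pads missing arguments with nil and drops extras.
too_few  = pair_proc.call(4)         # => [4, nil]
too_many = pair_proc.call(4, 5, 6)   # => [4, 5]

# A lambda insists on the declared number of arguments.
lambda_error = begin
  pair_lambda.call(4)
rescue ArgumentError => e
  e.class
end
# lambda_error is ArgumentError

pair_lambda.lambda?   # => true
pair_proc.lambda?     # => false
```

Both objects are instances of Proc; lambda? is the only way to tell from the outside which set of rules a given object follows.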
See Also

• Recipe 7.2, “Writing a Method That Accepts a Block”
• Recipe 10.4, “Getting a Reference to a Method”

7.2 Writing a Method That Accepts a Block

Problem

You want to write a method that can accept and call an attached code block: a method that works like Array#each, Fixnum#upto, and other built-in Ruby methods.

Solution

You don’t need to do anything special to make your method capable of accepting a block. Any method can use a block if the caller passes one in. At any time in your method, you can call the block with yield:

    def call_twice
      puts "I'm about to call your block."
      yield
      puts "I'm about to call your block again."
      yield
    end

    call_twice { puts "Hi, I'm a talking code block." }
    # I'm about to call your block.
    # Hi, I'm a talking code block.
    # I'm about to call your block again.
    # Hi, I'm a talking code block.

Another example:

    def repeat(n)
      if block_given?
        n.times { yield }
      else
        raise ArgumentError.new("I can't repeat a block you don't give me!")
      end
    end

    repeat(4) { puts "Hello." }
    # Hello.
    # Hello.
    # Hello.
    # Hello.

    repeat(4)
    # ArgumentError: I can't repeat a block you don't give me!

Discussion

Since Ruby focuses so heavily on iterator methods and other methods that accept code blocks, it’s important to know how to use code blocks in your own methods.

You don’t have to do anything special to make your method capable of taking a code block. A caller can pass a code block into any Ruby method; it’s just that there’s no point in doing that if the method never invokes yield.

    puts("Print this message.") { puts "And also run this code block!" }
    # Print this message.

The yield keyword acts like a special method, a stand-in for whatever code block was passed in. When you call it, it’s exactly as though the code block were a Proc object and you had invoked its call method.
This may seem mysterious if you’re unfamiliar with the practice of passing blocks around, but it is usually the preferred method of calling blocks in Ruby. If you feel more comfortable receiving a code block as a “real” argument to your method, see Recipe 7.3.

You can pass in arguments to yield (they’ll be passed to the block) and you can do things with the value of the yield statement (this is the value of the last statement in the block).

Here’s a method that passes arguments into its code block, and uses the value of the block:

    def call_twice
      puts "Calling your block."
      ret1 = yield("very first")
      puts "The value of your block: #{ret1}"

      puts "Calling your block again."
      ret2 = yield("second")
      puts "The value of your block: #{ret2}"
    end

    call_twice do |which_time|
      puts "I'm a code block, called for the #{which_time} time."
      which_time == "very first" ? 1 : 2
    end
    # Calling your block.
    # I'm a code block, called for the very first time.
    # The value of your block: 1
    # Calling your block again.
    # I'm a code block, called for the second time.
    # The value of your block: 2

Here’s a more realistic example. The method Hash#find takes a code block, passes each of a hash’s key-value pairs into the code block, and returns the first key-value pair for which the code block evaluates to true.

    squares = {0=>0, 1=>1, 2=>4, 3=>9}
    squares.find { |key, value| key > 1 }   # => [2, 4]

Suppose we want a method that works like Hash#find, but returns a new hash containing all the key-value pairs for which the code block evaluates to true. We can do this by passing arguments into the yield statement and using its result:

    class Hash
      def find_all
        new_hash = Hash.new
        each { |k,v| new_hash[k] = v if yield(k, v) }
        new_hash
      end
    end

    squares.find_all { |key, value| key > 1 }   # => {2=>4, 3=>9}

As it turns out, the Hash#delete_if method already does the inverse of what we want.
By negating the result of our code block, we can make Hash#delete_if do the job of Hash#find_all. We just need to work off of a duplicate of our hash, because delete_if is a destructive method:

    squares.dup.delete_if { |key, value| key > 1 }    # => {0=>0, 1=>1}
    squares.dup.delete_if { |key, value| key <= 1 }   # => {2=>4, 3=>9}

Hash#find_all turns out to be unnecessary, but it made for a good example.

You can write a method that takes an optional code block by calling Kernel#block_given? from within your method. That method returns true only if the caller of your method passed in a code block. If it returns false, you can raise an exception, or you can fall back to behavior that doesn’t need a block and never uses the yield keyword.

If your method calls yield and the caller didn’t pass in a code block, Ruby will throw an exception:

    [1, 2, 3].each
    # LocalJumpError: no block given

See Also

• Recipe 7.3, “Binding a Block Argument to a Variable”

7.3 Binding a Block Argument to a Variable

Problem

You’ve written a method that takes a code block, but it’s not enough for you to simply call the block with yield. You need to somehow bind the code block to a variable, so you can manipulate the block directly. Most likely, you need to pass it as the code block to another method.

Solution

Put the name of the block variable at the end of the list of your method’s arguments. Prefix it with an ampersand so that Ruby knows it’s a block argument, not a regular argument.

An incoming code block will be converted into a Proc object and bound to the block variable. You can pass it around to other methods, call it directly using call, or yield to it as though you’d never bound it to a variable at all. All three of the following methods do exactly the same thing:

    def repeat(n)
      n.times { yield } if block_given?
    end
    repeat(2) { puts "Hello." }
    # Hello.
    # Hello.

    def repeat(n, &block)
      n.times { block.call } if block
    end
    repeat(2) { puts "Hello." }
    # Hello.
    # Hello.

    def repeat(n, &block)
      n.times { yield } if block
    end
    repeat(2) { puts "Hello." }
    # Hello.
    # Hello.

Discussion

If &foo is the name of a method’s last argument, it means that the method accepts an optional block named foo. If the caller chooses to pass in a block, it will be made available as a Proc object bound to the variable foo. Since it is an optional argument, foo will be nil if no block is actually passed in. This frees you from having to call Kernel#block_given? to see whether or not you got a block.

When you call a method, you can pass in any Proc object as the code block by prefixing the appropriate variable name with an ampersand. You can even do this on a Proc object that was originally passed in as a code block to your method.

Many methods for collections, like each, select, and detect, accept code blocks. It’s easy to wrap such methods when your own methods can bind a block to a variable. Here, a method called biggest finds the largest element of a collection that gives a true result for the given block:

    def biggest(collection, &block)
      block ? collection.select(&block).max : collection.max
    end

    array = [1, 2, 3, 4, 5]
    biggest(array) {|i| i < 3}              # => 2
    biggest(array) {|i| i != 5 }            # => 4
    biggest(array)                          # => 5

This is also very useful when you need to write a frontend to a method that takes a block. Your wrapper method can bind an incoming code block to a variable, then pass it as a code block to the other method.

This code calls a code block limit times, each time passing in a random number between min and max:

    def pick_random_numbers(min, max, limit)
      limit.times { yield min + rand(max - min + 1) }
    end

This code is a wrapper method for pick_random_numbers.
It calls a code block 6 times, each time with a random number from 1 to 49:

    def lottery_style_numbers(&block)
      pick_random_numbers(1, 49, 6, &block)
    end

    lottery_style_numbers { |n| puts "Lucky number: #{n}" }
    # Lucky number: 20
    # Lucky number: 39
    # Lucky number: 41
    # Lucky number: 10
    # Lucky number: 41
    # Lucky number: 32

The code block argument must always be the very last argument defined for a method. This means that if your method takes a variable number of arguments, the code block argument goes after the container for the variable arguments:

    def invoke_on_each(*args, &block)
      args.each { |arg| yield arg }
    end

    invoke_on_each(1, 2, 3, 4) { |x| puts x ** 2 }
    # 1
    # 4
    # 9
    # 16

See Also

• Recipe 8.11, “Accepting or Passing a Variable Number of Arguments”
• Recall from the chapter introduction that in Ruby 1.8, a code block cannot itself take a block argument; this is fixed in Ruby 1.9

7.4 Blocks as Closures: Using Outside Variables Within a Code Block

Problem

You want to share variables between a method and a code block defined within it.

Solution

Just reference the variables, and Ruby will do the right thing. Here’s a method that adds a certain number to every element of an array:

    def add_to_all(array, number)
      array.collect { |x| x + number }
    end

    add_to_all([1, 2, 3], 10)       # => [11, 12, 13]

Enumerable#collect can’t access number directly, but it’s passed a block that can access it, since number was in scope when the block was defined.

Discussion

A Ruby block is a closure: it carries around the context in which it was defined. This is useful because it lets you define a block as though it were part of your normal code, then tear it off and send it to a predefined piece of code for processing.

A Ruby block contains references to the variable bindings, not copies of the values.
If the variable changes later, the block will have access to the new value:

    tax_percent = 6
    position = lambda do
      "I have always supported a #{tax_percent}% tax on imported limes."
    end
    position.call
    # => "I have always supported a 6% tax on imported limes."

    tax_percent = 7.25
    position.call
    # => "I have always supported a 7.25% tax on imported limes."

This works both ways: you can rebind or modify a variable from within a block.

    counter = 0
    4.times { counter += 1; puts "Counter now #{counter}"}
    # Counter now 1
    # Counter now 2
    # Counter now 3
    # Counter now 4
    counter                         # => 4

This is especially useful when you want to simulate inject or collect in conjunction with a strange iterator. You can create a storage object outside the block, and add things to it from within the block. This code simulates Enumerable#collect, but it collects the elements of an array in reverse order:

    accumulator = []
    [1, 2, 3].reverse_each { |x| accumulator << x + 1 }
    accumulator                     # => [4, 3, 2]

The accumulator variable is not within the scope of Array#reverse_each, but it is within the scope of the block.

7.5 Writing an Iterator Over a Data Structure

Problem

You’ve created a custom data structure, and you want to implement an each method for it, or you want to implement an unusual way of iterating over an existing data structure.

Solution

Complex data structures are usually constructed out of the basic data structures: hashes, arrays, and so on. All of the basic data structures have defined the each method. If your data structure is composed entirely of scalar values and these simple data structures, you can write a new each method in terms of the each methods of its components.

Here’s a simple tree data structure. A tree contains a single value, and a list of children (each of which is a smaller tree).
    class Tree
      attr_reader :value

      def initialize(value)
        @value = value
        @children = []
      end

      def <<(value)
        subtree = Tree.new(value)
        @children << subtree
        return subtree
      end
    end

Here’s code to create a specific Tree (Figure 7-1):

    t = Tree.new("Parent")
    child1 = t << "Child 1"
    child1 << "Grandchild 1.1"
    child1 << "Grandchild 1.2"
    child2 = t << "Child 2"
    child2 << "Grandchild 2.1"

Figure 7-1. A simple tree

How can we iterate over this data structure? Since a tree is defined recursively, it makes sense to iterate over it recursively. This implementation of Tree#each yields the value stored in the tree, then iterates over its children (the children are stored in an array, which already supports each) and recursively calls Tree#each on every child tree.

    class Tree
      def each
        yield value
        @children.each do |child_node|
          child_node.each { |e| yield e }
        end
      end
    end

The each method traverses the tree in a way that looks right:

    t.each { |x| puts x }
    # Parent
    # Child 1
    # Grandchild 1.1
    # Grandchild 1.2
    # Child 2
    # Grandchild 2.1

Discussion

The simplest way to build an iterator is recursively: to use smaller iterators until you’ve covered every element in your data structure. But what if those iterators aren’t there? More likely, what if they’re there but they give you elements in the wrong order? You’ll need to go down a level and write some loops.

Loops are somewhat déclassé in Ruby because iterators are more idiomatic, but when you’re writing an iterator you may have no choice but to use a loop.
Here’s a reprint of an iterator from Recipe 4.1, which illustrates how to use a while loop to iterate over an array from both sides:

    class Array
      def each_from_both_sides
        front_index = 0
        back_index = self.length-1
        while front_index <= back_index
          yield self[front_index]
          front_index += 1
          if front_index <= back_index
            yield self[back_index]
            back_index -= 1
          end
        end
      end
    end

    %w{Curses! been again! foiled I've}.each_from_both_sides { |x| puts x }
    # Curses!
    # I've
    # been
    # foiled
    # again!

Here are two more simple iterators. The first one yields each element multiple times in a row:

    module Enumerable
      def each_n_times(n)
        each { |e| n.times { yield e } }
      end
    end

    %w{Hello Echo}.each_n_times(3) { |x| puts x }
    # Hello
    # Hello
    # Hello
    # Echo
    # Echo
    # Echo

The next one returns the elements of an Enumerable in random order; see Recipe 4.10 for a more efficient way to do the shuffling.

    module Enumerable
      def each_randomly
        (sort_by { rand }).each { |e| yield e }
      end
    end

    %w{Eat at Joe's}.each_randomly { |x| puts x }
    # Eat
    # Joe's
    # at

See Also

• Recipe 4.1, “Iterating Over an Array”
• Recipe 4.10, “Shuffling an Array”
• Recipe 5.7, “Iterating Over a Hash”
• Recipe 7.6, “Changing the Way an Object Iterates”
• Recipe 7.8, “Stopping an Iteration”
• Recipe 7.9, “Looping Through Multiple Iterables in Parallel”

7.6 Changing the Way an Object Iterates

Problem

You want to use a data structure as an Enumerable, but the object’s implementation of #each doesn’t iterate the way you want. Since all of Enumerable’s methods are based on each, this makes them all useless to you.

Solution

Here’s a concrete example: a simple array.

    array = %w{bob loves alice}
    array.collect { |x| x.capitalize }
    # => ["Bob", "Loves", "Alice"]

Suppose we want to call collect on this array, but we don’t want collect to use each: we want it to use reverse_each.
Something like this hypothetical collect_reverse method:

    array.collect_reverse { |x| x.capitalize }
    # => ["Alice", "Loves", "Bob"]

Actually defining a collect_reverse method would add significant new code and only solve part of the problem. We could overwrite the array’s each implementation with a singleton method that calls reverse_each, but that’s hacky and it would surely have undesired side effects.

Fortunately, there’s an elegant solution with no side effects: wrap the object in an Enumerator. This gives you a new object that acts like the old object would if you’d swapped out its each method:

    require 'enumerator'
    reversed_array = array.to_enum(:reverse_each)
    reversed_array.collect { |x| x.capitalize }
    # => ["Alice", "Loves", "Bob"]

    reversed_array.each_with_index do |x, i|
      puts %{#{i}=>"#{x}"}
    end
    # 0=>"alice"
    # 1=>"loves"
    # 2=>"bob"

Note that you can’t use the Enumerator for our array as though it were the actual array. Only the methods of Enumerable are supported:

    reversed_array[0]
    # NoMethodError: undefined method `[]' for #<Enumerable::Enumerator:0x...>

Discussion

Whenever you’re tempted to reimplement one of the methods of Enumerable, try using an Enumerator instead. It’s like modifying an object’s each method, but it doesn’t affect the original object.

This can save you a lot of work. Suppose you have a tree data structure that provides three different iteration styles: each_prefix, each_postfix, and each_infix. Rather than implementing the methods of Enumerable for all three iteration styles, you can let each_prefix be the default implementation of each, and call tree.to_enum(:each_postfix) or tree.to_enum(:each_infix) if you need an Enumerable that acts differently.

A single underlying object can have multiple Enumerable objects.
Here’s a second Enumerable for our simple array, in which each acts like each_with_index does for the original array:

    array_with_index = array.enum_with_index
    array_with_index.each do |x, i|
      puts %{#{i}=>"#{x}"}
    end
    # 0=>"bob"
    # 1=>"loves"
    # 2=>"alice"

    array_with_index.each_with_index do |x, i|
      puts %{#{i}=>#{x.inspect}}
    end
    # 0=>["bob", 0]
    # 1=>["loves", 1]
    # 2=>["alice", 2]

When you require 'enumerator', Enumerable sprouts two extra enumeration methods, each_cons and each_slice. These make it easy to iterate over a data structure in chunks. An example is the best way to show what they do:

    sentence = %w{Well, now I've seen everything!}

    two_word_window = sentence.to_enum(:each_cons, 2)
    two_word_window.each { |x| puts x.inspect }
    # ["Well,", "now"]
    # ["now", "I've"]
    # ["I've", "seen"]
    # ["seen", "everything!"]

    two_words_at_a_time = sentence.to_enum(:each_slice, 2)
    two_words_at_a_time.each { |x| puts x.inspect }
    # ["Well,", "now"]
    # ["I've", "seen"]
    # ["everything!"]

Note how any arguments passed into to_enum are passed along as arguments to the iteration method itself.

In Ruby 1.9, the Enumerable::Enumerator class is part of the Ruby core; you don’t need the require statement. Also, each_cons and each_slice are built-in methods of Enumerable.

See Also

• Recipe 7.9, “Looping Through Multiple Iterables in Parallel”
• Recipe 20.6, “Running a Code Block on Many Objects Simultaneously”

7.7 Writing Block Methods That Classify or Collect

Problem

The basic block methods that come with the Ruby standard library aren’t enough for you. You want to define your own method that classifies the elements in an enumeration (like Enumerable#detect and Enumerable#find_all), or that does a transformation on each element in an enumeration (like Enumerable#collect).

Solution

You can usually use inject to write a method that searches or classifies an enumeration of objects.
With inject you can write your own versions of methods such as detect and find_all:

    module Enumerable
      def find_no_more_than(limit)
        inject([]) do |a,e|
          a << e if yield e
          return a if a.size >= limit
          a
        end
      end
    end

This code finds at most three of the even numbers in a list:

    a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    a.find_no_more_than(3) { |x| x % 2 == 0 }    # => [2, 4, 6]

If you find yourself needing to write a method like collect, it’s probably because, for your purposes, collect itself yields elements in the wrong order. You can’t use inject, because that yields elements in the same order as collect.

You need to find or write an iterator that yields elements in the order you want. Once you’ve done that, you have two options: you can write a collect equivalent on top of the iterator method, or you can use the iterator method to build an Enumerable object, and call its collect method (as seen in Recipe 7.6).

Discussion

We discussed these block methods in more detail in Chapter 4, because arrays are the simplest and most common enumerable data type. But almost any data structure can be enumerated, and a more complex data structure can be enumerated in more ways.

As you’ll see in Recipe 9.4, the Enumerable methods, like detect and inject, are actually implemented in terms of each. The detect and inject methods yield to the code block every element that comes out of each. The value of the yield statement is used to determine whether the element matches some criteria.

In a method like detect, the iteration may stop once it finds an element that matches. In a method like find_all, the iteration goes through all elements, collecting the ones that match.

Methods like collect work the same way, but instead of returning a subset of elements based on what the code block says, they collect the values returned by the code block in a new data structure, and return the data structure once the iteration is completed.
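
To make the two families concrete, here is a minimal sketch (not code from the standard library) of a detect-like method and a collect-like method written directly on top of an iterator. The module name BlockMethodSketch and the method names first_match and collect_with are hypothetical, chosen only for this illustration; the sketch assumes a Ruby recent enough to have public_send.

```ruby
# Hypothetical helper module, for illustration only.
module BlockMethodSketch
  # Like Enumerable#detect: the value of yield decides whether the
  # element matches, and the iteration stops at the first match.
  def self.first_match(enumerable)
    enumerable.each { |e| return e if yield(e) }
    nil
  end

  # Like Enumerable#collect: gathers the block's return values in a
  # new array, driven by whichever iterator method is named.
  def self.collect_with(object, iterator = :each)
    results = []
    object.public_send(iterator) { |e| results << yield(e) }
    results
  end
end

match = BlockMethodSketch.first_match([1, 2, 3, 4]) { |x| x > 2 }
# match is 3

names = BlockMethodSketch.collect_with(%w{bob loves alice}, :reverse_each) do |x|
  x.capitalize
end
# names is ["Alice", "Loves", "Bob"]
```

Taking the iterator name as a parameter keeps the collecting logic separate from the traversal order, which is the same separation an Enumerator gives you.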
If you’re using a particular object and you wish its collect method used a different iterator, then you should turn the object into an Enumerator and call its collect method. But if you’re writing a class and you want to expose a new collect-like method, you’ll have to define a new method.* In that case, the best solution is probably to expose a method that returns a custom Enumerator: that way, your users can use all the methods of Enumerable, not just collect.

* Of course, behind the scenes, your method could just create an appropriate Enumerator and call its collect implementation.

See Also

• Recipe 4.5, “Sorting an Array”
• Recipe 4.11, “Getting the N Smallest Items of an Array”
• Recipe 4.15, “Partitioning or Classifying a Set”
• Recipe 7.6, “Changing the Way an Object Iterates”
• If all you want is to make your custom data structure support the methods of Enumerable, see Recipe 9.4, “Implementing Enumerable: Write One Method, Get 22 Free”

7.8 Stopping an Iteration

Problem

You want to interrupt an iteration from within the code block you passed into it.

Solution

The simplest way to interrupt execution is to use break. A break statement will jump out of the closest enclosing loop defined in the current method:

    1.upto(10) do |x|
      puts x
      break if x == 3
    end
    # 1
    # 2
    # 3

Discussion

The break statement is simple but it has several limitations. You can’t use break within a code block defined with Proc.new or (in Ruby 1.9 and up) Kernel#proc. If this is a problem for you, use lambda instead:

    aBlock = Proc.new do |x|
      puts x
      break if x == 3
      puts x + 2
    end

    aBlock.call(5)
    # 5
    # 7

    aBlock.call(3)
    # 3
    # LocalJumpError: break from proc-closure

More seriously, you can’t use break to jump out of multiple loops at once. Once a loop has run, there’s no way to know whether it completed normally or by using break.

The simplest way around this problem is to enclose the code you want to skip within a catch block with a descriptive symbolic name.
You can then throw the corresponding symbol when you want to jump to the end of the catch block. This lets you skip out of any number of nested loops and method calls.

The throw/catch syntax isn’t exception handling; exceptions use a raise/rescue syntax. This is a special flow control construct designed to replace the use of exceptions for flow control (as sometimes happens in Java programs). It’s a bit like an old-style global GOTO, capable of suddenly moving execution to a faraway part of your program. It keeps your code more readable than a GOTO, though, because it’s restricted: a throw can only jump to the end of a corresponding catch block.

The best example of the catch..throw syntax is the Find.find function described in Recipe 6.12. When you pass a code block into Find.find, it yields up every directory and file in a certain directory tree. When your code block is given a directory, it can stop find from recursing into that directory by calling Find.prune, which throws a :prune symbol. Using break would stop the find operation altogether; throwing a symbol lets Find.prune know to just skip one directory.

Here’s a simplified view of the Find.find and Find.prune code:

    def find(*paths)
      paths.each do |p|
        catch(:prune) do
          # Process p as a file or directory...
        end
        # When you call Find.prune you'll end up here.
      end
    end

    def prune
      throw :prune
    end

When you call Find.prune, execution jumps to immediately after the catch(:prune) block. Find.find then starts processing the next file or directory.

See Also

• Recipe 6.12, “Walking a Directory Tree”
• ri Find

7.9 Looping Through Multiple Iterables in Parallel

Problem

You want to traverse multiple iteration methods simultaneously, probably to match up the corresponding elements in several different arrays.

Solution

The SyncEnumerator class, defined in the generator library, makes it easy to iterate over a bunch of arrays or other Enumerable objects in parallel.
Its each method yields a series of arrays, each array containing one item from each underlying Enumerable object:

    require 'generator'

    enumerator = SyncEnumerator.new(%w{Four seven}, %w{score years},
                                    %w{and ago})
    enumerator.each do |row|
      row.each { |word| puts word }
      puts '---'
    end
    # Four
    # score
    # and
    # ---
    # seven
    # years
    # ago
    # ---

    enumerator = SyncEnumerator.new(%w{Four and}, %w{score seven years ago})
    enumerator.each do |row|
      row.each { |word| puts word }
      puts '---'
    end
    # Four
    # score
    # ---
    # and
    # seven
    # ---
    # nil
    # years
    # ---
    # nil
    # ago
    # ---

You can reproduce the workings of a SyncEnumerator by wrapping each of your Enumerable objects in a Generator object. This code acts like SyncEnumerator#each, only it yields each individual item instead of arrays containing one item from each Enumerable:

    def interosculate(*enumerables)
      generators = enumerables.collect { |x| Generator.new(x) }
      done = false
      until done
        done = true
        generators.each do |g|
          if g.next?
            yield g.next
            done = false
          end
        end
      end
    end

    interosculate(%w{Four and}, %w{score seven years ago}) do |x|
      puts x
    end
    # Four
    # score
    # and
    # seven
    # years
    # ago

Discussion

Any object that implements the each method can be wrapped in a Generator object. If you’ve used Java, think of a Generator as being like a Java Iterator object. It keeps track of where you are in a particular iteration over a data structure.

Normally, when you pass a block into an iterator method like each, that block gets called for every element in the iterator without interruption. No code outside the block will run until the iterator is done iterating. You can stop the iteration by writing a break statement inside the code block, but you can’t restart a broken iteration later from the same place, unless you use a Generator.
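
The generator library used in this recipe belongs to the Ruby 1.8 standard library. For readers whose Ruby does not ship it (an assumption about your setup), the same pause-and-resume idea can be sketched with Enumerator#next, which hands out one element per call. This is an illustrative sketch rather than a drop-in replacement for Generator, and the variable names are made up for the example.

```ruby
words = %w{Four score and seven}
cursor = words.each            # no block given, so this returns an Enumerator

first  = cursor.next           # "Four"
second = cursor.next           # "score"

# Any other code can run here; the enumerator remembers its position.
interruption = "(pause)"

rest = []
loop { rest << cursor.next }   # Kernel#loop ends quietly on StopIteration

puts [first, second, interruption, *rest].join(" ")
# Four score (pause) and seven
```

Unlike a block passed to each, the calling code decides when the next element arrives, which is exactly what a break statement cannot give you.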
Think of an iterator method like each as a candy dispenser that pours out all its candy in a steady stream once you push the button. The Generator class lets you turn that candy dispenser into one which dispenses only one piece of candy every time you push its button. You can carry this new dispenser around and ration your candy more easily.

In Ruby 1.8, the Generator class uses continuations to achieve this trick. It sets bookmarks for jumping out of an iteration and then back in. When you call Generator#next the generator “pumps” the iterator once (yielding a single element), sets a bookmark, and returns control back to your code. The next time you call Generator#next, the generator jumps back to its previously set bookmark and “pumps” the iterator once more.

Ruby 1.9 uses a more efficient implementation based on threads. This implementation calls each Enumerable object’s each method (triggering the never-ending stream of candy), but it does it in a separate thread for each object. After each piece of candy comes out, Ruby freezes time (pauses the thread) until the next time you call Generator#next.

It’s simple to wrap an array in a generator, but if that’s all there were to generators, you wouldn’t need to mess around with Generators or even SyncEnumerators. It’s easy to simulate the behavior of SyncEnumerator for arrays by starting an index into each array and incrementing it whenever you want to get another item from a particular array. Generator methods are truly useful in their ability to turn any type of iteration into a single-item candy dispenser.

Suppose that you want to use the functionality of a generator to iterate over an array, but you have an unusual type of iteration in mind. For instance, consider an array that looks like this:

    l = ["junk1", 1, "junk2", 2, "junk3", "junk4", 3, "junk5"]

Let’s say you’d like to iterate over the list but skip the “junk” entries.
Wrapping the list in a generator object doesn’t work; it gives you all the entries:

    g = Generator.new(l)
    g.next                          # => "junk1"
    g.next                          # => 1
    g.next                          # => "junk2"

It’s not difficult to write an iterator method that skips the junk. Now, we don’t want an iterator method (we want a Generator object), but the iterator method is a good starting point. At least it proves that the iteration we want can be implemented in Ruby.

    def l.my_iterator
      each { |e| yield e unless e =~ /^junk/ }
    end

    l.my_iterator { |x| puts x }
    # 1
    # 2
    # 3

Here’s the twist: when you wrap an array in a Generator or a SyncEnumerator object, you’re actually wrapping the array’s each method. The Generator doesn’t just happen to yield elements in the same order as each: it’s actually calling each, but using continuation (or thread) trickery to pause the iteration after each call to Generator#next.

By defining an appropriate code block and passing it into the Generator constructor, you can make a generator object out of any piece of iteration code, not only the each method. The generator will know to call and interrupt that block of code, just as it knows to call and interrupt each when you pass an array into the constructor. Here’s a generator that iterates over our array the way we want:

    g = Generator.new { |g| l.each { |e| g.yield e unless e =~ /^junk/ } }
    g.next                          # => 1
    g.next                          # => 2
    g.next                          # => 3

The Generator constructor can take a code block that accepts the generator object itself as an argument. This code block performs the iteration that you’d like to have wrapped in a generator. Note the basic similarity of the code block to the body of the l#my_iterator method. The only difference is that instead of the yield keyword we call the Generator#yield function, which handles some of the work involved with setting up and jumping to the continuations (Generator#next handles the rest of the continuation work).
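
The block-taking constructor has a close analogue in the core Enumerator class on later Ruby versions. The following is a hedged sketch, assuming a Ruby where Enumerator.new accepts a block: the block receives a "yielder" object, and each value pushed onto it becomes one result of Enumerator#next. The names list and filtered are illustrative, and the junk test is written with String#start_with? so that the integer entries are never matched against a pattern.

```ruby
list = ["junk1", 1, "junk2", 2, "junk3", "junk4", 3, "junk5"]

# The block plays the role of the iterator body; yielder << e stands in
# for the yield keyword.
filtered = Enumerator.new do |yielder|
  list.each do |e|
    yielder << e unless e.is_a?(String) && e.start_with?("junk")
  end
end

values = [filtered.next, filtered.next, filtered.next]
# values is [1, 2, 3]; one more call to filtered.next raises StopIteration
```

Because the result is an Enumerator, it also responds to the methods of Enumerable, so filtered.to_a or filtered.collect work without any extra code.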
Once you see how this works, you can eliminate some duplicate code by wrapping the l#my_iterator method itself in a Generator:

    g = Generator.new { |g| l.my_iterator { |e| g.yield e } }
    g.next                          # => 1
    g.next                          # => 2
    g.next                          # => 3

Here’s a version of the interosculate method that can wrap methods as well as arrays. It accepts any combination of Enumerable objects and Method objects, turns each one into a Generator object, and loops through all the Generator objects, getting one element at a time from each:

    def interosculate(*iteratables)
      generators = iteratables.collect do |x|
        if x.is_a? Method
          Generator.new { |g| x.call { |e| g.yield e } }
        else
          Generator.new(x)
        end
      end
      done = false
      until done
        done = true
        generators.each do |g|
          if g.next?
            yield g.next
            done = false
          end
        end
      end
    end

Here, we pass interosculate an array and a Method object, so that we can iterate through two arrays in opposite directions:

    words1 = %w{Four and years}
    words2 = %w{ago seven score}
    interosculate(words1, words2.method(:reverse_each)) { |x| puts x }
    # Four
    # score
    # and
    # seven
    # years
    # ago

See Also

• Recipe 7.5, “Writing an Iterator Over a Data Structure”
• Recipe 7.6, “Changing the Way an Object Iterates”

7.10 Hiding Setup and Cleanup in a Block Method

Problem

You have a setup method that always needs to run before custom code, or a cleanup method that needs to run afterwards. You don’t trust the person writing the code (possibly yourself) to remember to call the setup and cleanup methods.

Solution

Create a method that runs the setup code, yields to a code block (which contains the custom code), then runs the cleanup code. To make sure the cleanup code always runs, even if the custom code raises an exception, use a begin/ensure block.

    def between_setup_and_cleanup
      setup
      begin
        yield
      ensure
        cleanup
      end
    end

Here’s a concrete example.
It adds a DOCTYPE and an HTML tag to the beginning of an HTML document. At the end, it closes the HTML tag it opened earlier. This saves you a little bit of work when you’re generating HTML files.

    def write_html(out, doctype=nil)
      doctype ||= %{<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">}
      out.puts doctype
      out.puts '<html>'
      begin
        yield out
      ensure
        out.puts '</html>'
      end
    end

    write_html($stdout) do |out|
      out.puts '<h1>Sorry, the Web is closed.</h1>'
    end
    # <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
    # <html>
    # <h1>Sorry, the Web is closed.</h1>
    # </html>

Discussion

This useful technique shows up most often when there are scarce resources (such as file handles or database connections) that must be closed when you’re done with them, lest they all get used up. A language that makes the programmer remember these resources tends to leak those resources, because programmers are lazy. Ruby makes it easy to be lazy and still do the right thing.

You’ve probably used this technique already, with the Kernel#open and File#open methods for opening files on disk. These methods accept a code block that manipulates an already open file. They open the file, call your code block, and close the file once you’re done:

    open('output.txt', 'w') do |out|
      out.puts 'Sorry, the filesystem is also closed.'
    end

Ruby’s standard cgi module takes the write_html example to its logical conclusion.* You can construct an entire HTML document by nesting blocks inside each other. Here’s a small Ruby CGI that outputs much the same document as the write_html example above.

* But your code will be more maintainable if you do HTML with templates instead of writing it in Ruby code.

    #!/usr/bin/ruby
    # closed_cgi.rb
    require 'cgi'
    c = CGI.new("html4")

    c.out do
      c.html do
        c.h1 { 'Sorry, the Web is closed.' }
      end
    end

Note the multiple levels of blocks: the block passed into CGI#out simply calls CGI#html to generate the DOCTYPE and the <HTML> tags. The <HTML> tags contain the result of a call to CGI#h1, which encloses some plain text in <H1> tags. The program produces this output:

    Content-Type: text/html
    Content-Length: 137

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"><HTML><H1>Sorry, the Web is closed.</H1></HTML>

The XmlMarkup class in Ruby’s builder gem works the same way: you can write Ruby code that resembles the structure of the document it creates:

    require 'rubygems'
    require 'builder'
    xml = Builder::XmlMarkup.new.message('type' => 'apology') do |b|
      b.content('Sorry, Web Services are closed.')
    end
    puts xml
    # <message type="apology">
    # <content>Sorry, Web Services are closed.</content>
    # </message>

See Also

• Recipe 6.13, “Locking a File,” uses this technique to create a method that locks a file, and automatically unlocks it when you’re done using it
• Recipe 11.9, “Creating and Modifying XML Documents”
• Recipe 20.11, “Avoiding Deadlock,” uses this technique to have your thread lock multiple resources in the right order, and unlock them when you’re done using them

7.11 Coupling Systems Loosely with Callbacks

Problem

You want to combine different types of objects without hardcoding them full of references to each other.

Solution

Use a callback system, in which objects register code blocks with each other to be executed as needed. An object can call out to its registered callbacks when it needs something, or it can send notification to the callbacks when it does something.

To implement a callback system, write a “register” or “subscribe” method that accepts a code block. Store the registered code blocks as Proc objects in a data structure: probably an array (if you only have one type of callback) or a hash (if you have multiple types). When you need to call the callbacks, iterate over the data structure and call each of the registered code blocks.

Here’s a mixin module that gives each instance of a class its own hash of “listener” callback blocks. An outside object can listen for a particular event by calling subscribe with the name of the event and a code block.
The dispatcher itself is responsible for calling notify with an appropriate event name at the appropriate time, and the outside object is responsible for passing in the name of the event it wants to “listen” for.

    module EventDispatcher
      def setup_listeners
        @event_dispatcher_listeners = {}
      end

      def subscribe(event, &callback)
        (@event_dispatcher_listeners[event] ||= []) << callback
      end

      protected
      def notify(event, *args)
        if @event_dispatcher_listeners[event]
          @event_dispatcher_listeners[event].each do |m|
            m.call(*args) if m.respond_to? :call
          end
        end
        return nil
      end
    end

Here’s a Factory class that keeps a set of listeners. An outside object can choose to be notified every time a Factory object is created, or every time a Factory object produces a widget:

    class Factory
      include EventDispatcher

      def initialize
        setup_listeners
      end

      def produce_widget(color)
        # Widget creation code goes here...
        notify(:new_widget, color)
      end
    end

Here’s a listener class that’s interested in what happens with Factory objects:

    class WidgetCounter
      def initialize(factory)
        @counts = Hash.new(0)
        factory.subscribe(:new_widget) do |color|
          @counts[color] += 1
          puts "#{@counts[color]} #{color} widget(s) created since I started watching."
        end
      end
    end

Finally, here’s the listener in action:

    f1 = Factory.new
    WidgetCounter.new(f1)
    f1.produce_widget("red")
    # 1 red widget(s) created since I started watching.

    f1.produce_widget("green")
    # 1 green widget(s) created since I started watching.

    f1.produce_widget("red")
    # 2 red widget(s) created since I started watching.

    # This won't produce any output, since our listener is listening to
    # another Factory.
    Factory.new.produce_widget("blue")

Discussion

Callbacks are an essential technique for making your code extensible. This technique has many names (callbacks, hook methods, plugins, publish/subscribe, etc.) but no matter what terminology is used, it’s always the same.
One object asks another to call a piece of code (the callback) when some condition is met. This technique works even when the two objects know almost nothing about each other. This makes it ideal for refactoring big, tightly integrated systems into smaller, loosely coupled systems.

In a pure listener system (like the one given in the Solution), the callbacks set up lines of communication that always move from the event dispatcher to the listeners. This is useful when you have a master object (like the Factory), from which numerous lackey objects (like the WidgetCounter) take all their cues.

But in many loosely coupled systems, information moves both ways: the dispatcher calls the callbacks and then uses the return results. Consider the stereotypical web portal: a customizable homepage full of HTML boxes containing sports scores, weather predictions, and so on. Since new boxes are always being added to the system, the core portal software shouldn’t have to know anything about a specific box. The boxes should also know as little about the core software as possible, so that changing the core doesn’t require a change to all the boxes.

A simple change to the EventDispatcher module makes it possible for the dispatcher to use the return values of the registered callbacks. The original implementation of EventDispatcher#notify called the registered code blocks, but ignored their return value. This version of EventDispatcher#notify yields the return values to a block passed in to notify:

    module EventDispatcher
      def notify(event, *args)
        if @event_dispatcher_listeners[event]
          @event_dispatcher_listeners[event].each do |m|
            yield(m.call(*args)) if m.respond_to? :call
          end
        end
        return nil
      end
    end

Here’s an insultingly simple portal rendering engine.
It lets boxes register to be rendered inside an HTML table, on one of two rows on the portal page:

    class Portal
      include EventDispatcher

      def initialize
        setup_listeners
      end

      def render
        puts '<table>'
        render_block = Proc.new { |box| puts "  <td>#{box}</td>" }
        [:row1, :row2].each do |row|
          puts ' <tr>'
          notify(row, &render_block)
          puts ' </tr>'
        end
        puts '</table>'
      end
    end

Here’s the rendering engine rendering a specific user’s portal layout. This user likes to see a stock ticker and a weather report on the left, and a news box on the right. Note that there aren’t even any classes for these boxes; they’re so simple they can be implemented as anonymous code blocks:

    portal = Portal.new
    portal.subscribe(:row1) { 'Stock Ticker' }
    portal.subscribe(:row1) { 'Weather' }
    portal.subscribe(:row2) { 'Pointless, Trivial News' }

    portal.render
    # <table>
    #  <tr>
    #   <td>Stock Ticker</td>
    #   <td>Weather</td>
    #  </tr>
    #  <tr>
    #   <td>Pointless, Trivial News</td>
    #  </tr>
    # </table>
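
The two-way pattern in this Discussion does not have to print results as it goes. As a minimal, self-contained sketch (ResultCollector, collect, and Sidebar are hypothetical names, not part of the recipe), a dispatcher could instead gather its callbacks' return values into an array and decide later what to do with them:

```ruby
# Hypothetical variant of the dispatcher idea: callbacks are registered
# per event, and collect returns whatever each callback returned.
module ResultCollector
  def subscribe(event, &callback)
    @listeners ||= Hash.new { |hash, key| hash[key] = [] }
    @listeners[event] << callback
  end

  # Call every callback registered for the event, in registration order,
  # and return their results as an array (empty if nobody subscribed).
  def collect(event, *args)
    return [] unless @listeners && @listeners.key?(event)
    @listeners[event].map { |callback| callback.call(*args) }
  end
end

class Sidebar
  include ResultCollector
end

sidebar = Sidebar.new
sidebar.subscribe(:box) { |user| "Weather for #{user}" }
sidebar.subscribe(:box) { |user| "Scores for #{user}" }

boxes = sidebar.collect(:box, 'alice')
# => ["Weather for alice", "Scores for alice"]
```

Returning an array keeps the dispatcher ignorant of what the boxes contain, just as yielding does; which style fits better probably depends on whether the caller wants to process results one at a time or all at once.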
If you want the registered listeners to be shared across all instances of a class, you can make listeners a class variable, and make subscribe a module method. This is most useful when you want listeners to be notified whenever a new instance of the class is created.

CHAPTER 8
Objects and Classes

Ruby is an object-oriented programming language; this chapter will show you what that really means. Like all modern languages, Ruby supports object-oriented notions like classes, inheritance, and polymorphism. But Ruby goes further than other languages you may have used. Some languages are strict and some are permissive; Ruby is one of the most permissive languages around.

Strict languages enforce strong typing, usually at compile time: a variable defined as an array can’t be used as another data type. If a method takes an array as an argument, you can’t pass in an array-like object unless that object happens to be a subclass of the array class or can be converted into an array.

Ruby enforces dynamic typing, or duck typing (“if it quacks like a duck, it is a duck”). A strongly typed language enforces its typing everywhere, even when it’s not needed. Ruby enforces its duck typing relative to a particular task. If a variable quacks like a duck, it is one, assuming you wanted to hear it quack. When you want “swims like a duck” instead, duck typing will enforce the swimming, and not the quacking.

Here’s an example. Consider the following three classes, Duck, Goose, and DuckRecording:

    class Duck
      def quack
        'Quack!'
      end

      def swim
        'Paddle paddle paddle...'
      end
    end

    class Goose
      def honk
        'Honk!'
      end

      def swim
        'Splash splash splash...'
      end
    end

    class DuckRecording
      def quack
        play
      end

      def play
        'Quack!'
      end
    end

If Ruby was a strongly typed language, a method that told a Duck to quack would fail when given a DuckRecording. The following code is written in the hypothetical language Strongly-Typed Ruby; it won’t work in real Ruby.

    def make_it_quack(Duck duck)
      duck.quack
    end

    make_it_quack(Duck.new)             # => "Quack!"
    make_it_quack(DuckRecording.new)
    # TypeException: object not of type Duck

If you were expecting a Duck, you wouldn’t be able to tell a Goose to swim:

    def make_it_swim(Duck duck)
      duck.swim
    end

    make_it_swim(Duck.new)              # => "Paddle paddle paddle..."
    make_it_swim(Goose.new)
    # TypeException: object not of type Duck

Since real Ruby uses duck typing, you can get a recording to quack or a goose to swim:

    def make_it_quack(duck)
      duck.quack
    end
    make_it_quack(Duck.new)             # => "Quack!"
    make_it_quack(DuckRecording.new)    # => "Quack!"

    def make_it_swim(duck)
      duck.swim
    end
    make_it_swim(Duck.new)              # => "Paddle paddle paddle..."
    make_it_swim(Goose.new)             # => "Splash splash splash..."

But you can’t make a recording swim or a goose quack:

    make_it_quack(Goose.new)
    # NoMethodError: undefined method `quack' for #<Goose:0x...>
    make_it_swim(DuckRecording.new)
    # NoMethodError: undefined method `swim' for #<DuckRecording:0x...>

Over time, strict languages develop workarounds for their strong typing (have you ever done a cast when retrieving something from a Java collection?), and then workarounds for the workarounds (have you ever created a parameterized Java collection using generics?). Ruby just doesn’t bother with any of it. If an object supports the method you’re trying to use, Ruby gets out of its way and lets it work.

Ruby’s permissiveness is more a matter of attitude than a technical advancement. Python lets you reopen a class after its original definition and modify it after the fact, but the language syntax doesn’t make many allowances for it. It’s sort of a dirty little secret of the language. In Ruby, this behavior is not only allowed, it’s encouraged. Some parts of the standard library add functionality to built-in classes when imported, just to make it easier for the programmer to write code. The Facets Core library adds dozens of convenience methods to Ruby’s standard classes.
Ruby is proud of this capability, and urges programmers to exploit it if it makes their lives easier.

Strict languages end up needing code generation tools that hide the restrictions and complexities of the language. Ruby has code generation tools built right into the language, saving you work while leaving complete control in your hands (see Chapter 10).

Is this chaotic? It can be. Does it matter? Only when it actually interferes with you getting work done. In this chapter and the next two, we’ll show you how to follow common conventions, and how to impose order on the chaos when you need it. With Ruby you can impose the right kind of order on your objects, tailored for your situation, not a one-size-fits-all that makes you jump through hoops most of the time.

These recipes are probably less relevant to the problems you’re trying to solve than the other ones in this book, but they’re not less important. This chapter and the next two provide a general-purpose toolbox for doing the dirty work of actual programming, whatever your underlying purpose or algorithm. These are the chapters you should turn to when you find yourself stymied by the Ruby language itself, or grinding through tedious makework that Ruby’s labor-saving techniques can eliminate. Every other chapter in this book uses the ideas behind these recipes.

8.1 Managing Instance Data

Problem

You want to associate a variable with an object. You may also want the variable to be readable or writable from outside the object.

Solution

Within the code for the object’s class, define a variable and prefix its name with an at sign (@). When an object runs the code, a variable by that name will be stored within the object.
An instance of the Frog class defined below might eventually have two instance variables stored within it, @name and @speaks_english:

    class Frog
      def initialize(name)
        @name = name
      end

      def speak
        # It's a well-known fact that only frogs with long names start out
        # speaking English.
        @speaks_english ||= @name.size > 6
        @speaks_english ? "Hi. I'm #{@name}, the talking frog." : 'Ribbit.'
      end
    end

    Frog.new('Leonard').speak     # => "Hi. I'm Leonard, the talking frog."

    lucas = Frog.new('Lucas')
    lucas.speak                   # => "Ribbit."

If you want to make an instance variable readable from outside the object, call the attr_reader method on its symbol:

    lucas.name
    # NoMethodError: undefined method `name' for #<Frog:0x...>

    class Frog
      attr_reader :name
    end
    lucas.name                    # => "Lucas"

Similarly, to make an instance variable readable and writable from outside the object, call the attr_accessor method on its symbol:

    lucas.speaks_english = false
    # => NoMethodError: undefined method `speaks_english=' for #<Frog:0x...>

    class Frog
      attr_accessor :speaks_english
    end
    lucas.speaks_english = true
    lucas.speak                   # => "Hi. I'm Lucas, the talking frog."

Discussion

Some programming languages have complex rules about when one object can directly access another object’s instance variables. Ruby has one simple rule: it’s never allowed. To get or set the value of an instance variable from outside the object that owns it, you need to call an explicitly defined getter or setter method.

Basic getter and setter methods look like this:

    class Frog
      def speaks_english
        @speaks_english
      end

      def speaks_english=(value)
        @speaks_english = value
      end
    end

But it’s boring and error-prone to write that yourself, so Ruby provides built-in decorator methods like Module#attr_reader and Module#attr_accessor. These methods use metaprogramming to generate custom getter and setter methods for your class. Calling attr_reader :speaks_english generates the getter method speaks_english and attaches it to your class.
Calling attr_accessor :speaks_english generates both the getter method speaks_english and the setter method speaks_english=.

There’s also an attr_writer decorator method, which only generates a setter method, but you won’t use it very often. It doesn’t usually make sense for an instance variable to be writable from the outside, but not readable. You’ll probably use it only when you plan to write your own custom getter method instead of generating one.

Another slight difference between Ruby and some other programming languages: in Ruby, instance variables (just like other variables) don’t exist until they’re defined. Below, note how the @speaks_english variable isn’t defined until the Frog#speak method gets called:

    michael = Frog.new("Michael")
    # => #<Frog:0x... @name="Michael">
    michael.speak                 # => "Hi. I'm Michael, the talking frog."
    michael
    # => #<Frog:0x... @name="Michael", @speaks_english=true>

It’s possible that one Frog object would have the @speaks_english instance variable set while another one would not. If you call a getter method for an instance variable that’s not defined, you’ll get nil. If this behavior is a problem, write an initialize that initializes all your instance variables.

Given the symbol for an instance variable, you can retrieve the value with Object#instance_variable_get, and set it with Object#instance_variable_set.

Because these methods ignore encapsulation, you should only use them within the class itself: say, within a call to Module#define_method.
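
To make the combination of define_method and instance_variable_get concrete, here is a minimal sketch of how a getter generator might be hand-rolled inside a class. The Toad class and the simple_reader helper are hypothetical names invented for illustration; this is not how Ruby itself implements attr_reader.

```ruby
class Toad
  # Hypothetical helper: a hand-rolled stand-in for attr_reader.
  # For each name, define a getter that reads the matching instance variable.
  def self.simple_reader(*names)
    names.each do |name|
      define_method(name) { instance_variable_get("@#{name}") }
    end
  end

  simple_reader :name, :warts

  def initialize(name, warts)
    @name = name
    @warts = warts
  end
end

toad = Toad.new('Bufo', 12)
toad.name                  # => "Bufo"
toad.warts                 # => 12
toad.respond_to?(:name=)   # => false
```

Because instance_variable_get is called from code that lives inside the class, the object’s data is still only reachable through methods the class chose to expose.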
This use of instance_variable_get violates encapsulation, since we’re calling it from outside the Frog class:

    michael.instance_variable_get("@name")          # => "Michael"
    michael.instance_variable_set("@name", 'Bob')
    michael.name                                    # => "Bob"

This use doesn’t violate encapsulation (though there’s no real need to call define_method here):

    class Frog
      define_method(:scientific_name) do
        species = 'vulgaris'
        species = 'loquacious' if instance_variable_get('@speaks_english')
        "Rana #{species}"
      end
    end
    michael.scientific_name                         # => "Rana loquacious"

See Also

• Recipe 10.10, “Avoiding Boilerplate Code with Metaprogramming”

8.2 Managing Class Data

Problem

Instead of storing a bit of data along with every instance of a class, you want to store a bit of data along with the class itself.

Solution

Instance variables are prefixed by a single at sign; class variables are prefixed by two at signs. This class contains both an instance variable and a class variable:

    class Warning
      @@translations = { :en => 'Wet Floor',
                         :es => 'Piso Mojado' }

      def initialize(language=:en)
        @language = language
      end

      def warn
        @@translations[@language]
      end
    end

    Warning.new.warn              # => "Wet Floor"
    Warning.new(:es).warn         # => "Piso Mojado"

Discussion

Class variables store information that’s applicable to the class itself, or applicable to every instance of the class. They’re often used to control, prevent, or react to the instantiation of the class. A class variable in Ruby acts like a static variable in Java.

Here’s an example that uses a class constant and a class variable to control when and how a class can be instantiated:

    class Fate
      NAMES = ['Klotho', 'Atropos', 'Lachesis'].freeze
      @@number_instantiated = 0

      def initialize
        if @@number_instantiated >= NAMES.size
          raise ArgumentError, 'Sorry, there are only three Fates.'
        end
        @name = NAMES[@@number_instantiated]
        @@number_instantiated += 1
        puts "I give you... #{@name}!"
      end
    end

    Fate.new
    # I give you... Klotho!
    # => #<Fate:0x... @name="Klotho">
    Fate.new
    # I give you... Atropos!
    # => #<Fate:0x... @name="Atropos">
    Fate.new
    # I give you... Lachesis!
    # => #<Fate:0x... @name="Lachesis">
    Fate.new
    # ArgumentError: Sorry, there are only three Fates.

It’s not considered good form to write setter or getter methods for class variables. You won’t usually need to expose any class-wide information apart from helpful constants, and those you can expose with class constants such as NAMES above.

If you do want to write setter or getter methods for class variables, you can use the following class-level equivalents of Module#attr_reader and Module#attr_writer. They use metaprogramming to define new accessor methods:*

    class Module
      def class_attr_reader(*symbols)
        symbols.each do |symbol|
          self.class.send(:define_method, symbol) do
            class_variable_get("@@#{symbol}")
          end
        end
      end

      def class_attr_writer(*symbols)
        symbols.each do |symbol|
          self.class.send(:define_method, "#{symbol}=") do |value|
            class_variable_set("@@#{symbol}", value)
          end
        end
      end

      def class_attr_accessor(*symbols)
        class_attr_reader(*symbols)
        class_attr_writer(*symbols)
      end
    end

* In Ruby 1.9, Object#send can’t be used to call private methods. You’ll need to replace the calls to send with calls to Object#funcall.

Here is Module#class_attr_reader being used to give the Fate class an accessor for its class variable:

    Fate.number_instantiated
    # NoMethodError: undefined method `number_instantiated' for Fate:Class

    class Fate
      class_attr_reader :number_instantiated
    end
    Fate.number_instantiated      # => 3

You can have both a class variable foo and an instance variable foo, but this will only end up confusing you. For instance, the accessor method foo must retrieve one or the other. If you call attr_accessor :foo and then class_attr_accessor :foo, the class version will silently overwrite the instance version.

As with instance variables, you can bypass encapsulation and use class variables directly with class_variable_get and class_variable_set.
Also as with instance variables, you should only do this from inside the class, usually within a define_method call.

See Also

• If you want to create a singleton, don’t mess around with class variables; instead, use the singleton library from Ruby’s standard library
• Recipe 8.18, “Implementing Class and Singleton Methods”
• Recipe 10.10, “Avoiding Boilerplate Code with Metaprogramming”

8.3 Checking Class or Module Membership

Problem

You want to see if an object is of the right type for your purposes.

Solution

If you plan to call a specific method on the object, just check to see whether the object responds to that method:

    def send_as_package(obj)
      if obj.respond_to? :package
        package = obj.package
      else
        $stderr.puts "Not sure how to package a #{obj.class}."
        $stderr.puts 'Trying generic packager.'
        package = Package.new(obj)
      end
      send(package)
    end

If you really can only accept objects of one specific class, or objects that include one specific module, use the is_a? predicate:

    def multiply_precisely(a, b)
      if a.is_a? Float or b.is_a? Float
        raise ArgumentError, "I can't do precise multiplication with floats."
      end
      a * b
    end

    multiply_precisely(4, 5)      # => 20
    multiply_precisely(4.0, 5)
    # ArgumentError: I can't do precise multiplication with floats.

Discussion

Whenever possible, you should use duck typing (Object#respond_to?) in preference to class typing (Object#is_a?). Duck typing is one of the great strengths of Ruby, but it only works if everyone uses it. If you write a method that only accepts strings, instead of accepting anything that supports to_str, then you’ve broken the duck typing illusion for everyone who uses your code.

Sometimes you can’t use duck typing, though, or sometimes you need to combine it with class typing. Sometimes two different classes define the same method (especially one of the operators) in completely different ways.
Duck typing makes it possible to silently do the right thing, but if you know that duck typing would silently do the wrong thing, a little class typing won’t hurt.

Here’s a method that uses duck typing to see whether an operation is supported, and class typing to cut short a possible problem before it occurs:

    def append_to_self(x)
      unless x.respond_to? :<<
        raise ArgumentError, "This object doesn't support the left-shift operator."
      end
      if x.is_a? Numeric
        raise ArgumentError,
          "The left-shift operator for this object doesn't do an append."
      end
      x << x
    end

    append_to_self('abc')         # => "abcabc"
    append_to_self([1, 2, 3])     # => [1, 2, 3, [...]]

    append_to_self({1 => 2})
    # ArgumentError: This object doesn't support the left-shift operator.

    append_to_self(5)
    # ArgumentError: The left-shift operator for this object doesn't do an append.
    5 << 5                        # => 160
    # That is, 5 * (2 ** 5)

An alternative solution approximates the functionality of Java’s interfaces. You can create a dummy module for a given capability, have all appropriate classes include it, and use is_a? to check for inclusion of the module. This requires that each participating class signal its ability to perform a certain task, but it doesn’t tie you to any particular class hierarchy, and it saves you from calling the wrong method just because it has the right name.

    module ShiftMeansAppend
      def <<(x)
      end
    end

    class String
      include ShiftMeansAppend
    end

    class Array
      include ShiftMeansAppend
    end

    def append_to_self(x)
      unless x.is_a? ShiftMeansAppend
        raise ArgumentError, "I can't trust this object's left-shift operator."
      end
      x << x
    end

    append_to_self 4
    # ArgumentError: I can't trust this object's left-shift operator.

    append_to_self '4'            # => "44"

See Also

• Recipe 1.12, “Testing Whether an Object Is String-Like”

8.4 Writing an Inherited Class

Problem

You want to create a new class that extends or modifies the behavior of an existing class.

Solution

If you’re writing a new method that conceptually belongs in the original class, you can reopen the class and append your method to the class definition. You should only do this if your method is generally useful, and you’re sure it won’t conflict with a method defined by some library you include in the future.

This code adds a scramble method to Ruby’s built-in String class (see Recipe 4.10 for a faster way to sort randomly):

    class String
      def scramble
        split(//).sort_by { rand }.join
      end
    end

    "I once was a normal string.".scramble
    # => "i arg cn lnws.Ioateosma n r"

If your method isn’t generally useful, or you don’t want to take the risk of modifying a class after its initial creation, create a subclass of the original class. The subclass can override its parent’s methods, or add new ones. This is safer because the original class, and any code that depended on it, is unaffected. This subclass of String adds one new method and overrides one existing one:

    class UnpredictableString < String
      def scramble
        split(//).sort_by { rand }.join
      end

      def inspect
        scramble.inspect
      end
    end

    str = UnpredictableString.new("It was a dark and stormy night.")
    # => " hsar gsIo atr tkd naaniwdt.ym"
    str
    # => "ts dtnwIktsr oydnhgi .mara aa"

Discussion

All of Ruby’s classes can be subclassed, though a few of them can’t be usefully subclassed (see Recipe 8.18 for information on how to deal with the holdouts).

Ruby programmers use subclassing less frequently than they would in other languages, because it’s often acceptable to simply reopen an existing class (even a built-in class) and attach a new method. We do this throughout this book, adding useful new methods to built-in classes rather than defining them in Kernel, or putting them in subclasses or utility classes. Libraries like Rails and Facets Core do the same.

This improves the organization of your code.
But the risk is that a library you include (or a library included by one you include) will define the same method in the same built-in class. Either the library will override your method (breaking your code), or you’ll override its method (breaking its code, which will break your code). There is no general solution to this problem short of adopting naming conventions, or always subclassing and never modifying preexisting classes.

You should certainly subclass if you’re writing a method that isn’t generally useful, or that only applies to certain instances of a class. For instance, here’s a method Array#sum that adds up the elements of an array:

    class Array
      def sum(start_at=0)
        inject(start_at) { |sum, x| sum + x }
      end
    end

This works for arrays that contain only numbers (or that contain only strings), but it will fail for other kinds of arrays.

    [79, 14, 2].sum               # => 95
    ['so', 'fa'].sum('')          # => "sofa"
    [79, 'so'].sum
    # TypeError: String can't be coerced into Fixnum

Maybe you should signal this by putting it in a subclass called NumericArray or SummableArray:

    class NumericArray < Array
      def sum
        inject(0) { |sum, x| sum + x }
      end
    end

The NumericArray class doesn’t actually do type checking to make sure it only contains numeric objects, but since it’s a different class, you and other programmers are less likely to use sum where it’s not appropriate.*

You should also subclass if you want to override a method’s behavior. In the UnpredictableString example, I overrode the inspect method in my subclass. If I’d just modified String#inspect, the rest of my program would have been thrown into confusion. Rarely is it acceptable to override a method in place: one example would be if you’ve written a drop-in implementation that’s more efficient.
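
As a sketch of overriding a parent’s method in a subclass while still relying on the parent’s implementation, here is a hypothetical CheckedNumericArray (the class name and the total method are illustrative, not from the recipe). It overrides Array#<< to reject non-numeric items and then defers to the original with super. It deliberately guards only <<; other ways of adding elements, such as push or concat, are left unchecked to keep the example short.

```ruby
class CheckedNumericArray < Array
  # Override Array#<< to reject non-numeric items, then defer to the parent.
  def <<(item)
    unless item.is_a?(Numeric)
      raise TypeError, "#{item.inspect} is not numeric"
    end
    super
  end

  # Add up the elements; safe as long as items were added through <<.
  def total
    inject(0) { |sum, x| sum + x }
  end
end

prices = CheckedNumericArray.new
prices << 79 << 14 << 2
prices.total              # => 95

plain = []
plain << 'so'             # Plain arrays are unaffected by the subclass.
```

Since the check lives in a subclass, code elsewhere that appends strings to ordinary arrays keeps working exactly as before.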

See Also

• Recipe 8.18, “Implementing Class and Singleton Methods,” shows you how to extend the behavior of a particular object after it’s been created
• http://www.rubygarden.org/ruby?TheOpenNatureOfRuby

* This isn’t a hard and fast rule. Array#sort won’t work on arrays whose elements can’t be mutually compared, but it would be a big inconvenience to put sort in a subclass of Array or leave it out of the Ruby standard library. You might feel the same way about sum; but then, you’re not the Ruby standard library.

8.5 Overloading Methods

Problem

You want to create two different versions of a method with the same name: two methods that differ in the arguments they take.

Solution

A Ruby class can have only one method with a given name. Within that single method, though, you can put logic that branches depending on how many and what kinds of objects were passed in as arguments.

Here’s a Rectangle class that represents a rectangular shape on a grid. You can instantiate a Rectangle in one of two ways: by passing in the coordinates of its top-left and bottom-right corners, or by passing in its top-left corner along with its length and width. There’s only one initialize method, but you can act as though there were two.

    # The Rectangle constructor accepts arguments in either of the following forms:
    #  Rectangle.new([x_top, y_left], length, width)
    #  Rectangle.new([x_top, y_left], [x_bottom, y_right])
    class Rectangle
      def initialize(*args)
        case args.size
        when 2
          @top_left, @bottom_right = args
        when 3
          @top_left, length, width = args
          @bottom_right = [@top_left[0] + length, @top_left[1] - width]
        else
          raise ArgumentError, "This method takes either 2 or 3 arguments."
        end

        # Perform additional type/error checking on @top_left and
        # @bottom_right...
      end
    end

Here’s the Rectangle constructor in action:

    Rectangle.new([10, 23], [14, 13])
    # => #<Rectangle:0x... @top_left=[10, 23], @bottom_right=[14, 13]>
    Rectangle.new([10, 23], 4, 10)
    # => #<Rectangle:0x... @top_left=[10, 23], @bottom_right=[14, 13]>
    Rectangle.new
    # => ArgumentError: This method takes either 2 or 3 arguments.
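
The Rectangle constructor branches on how many arguments it receives. Branching on what kind of argument was passed works the same way. The following is a minimal sketch under stated assumptions: the Duration class and its "m:ss" string format are hypothetical, and malformed strings are not handled gracefully.

```ruby
class Duration
  attr_reader :seconds

  # Accepts either of the following forms:
  #  Duration.new(90)       a number of seconds
  #  Duration.new("1:30")   a "minutes:seconds" string
  def initialize(value)
    if value.respond_to?(:to_str)
      # String-like argument: split it into minutes and seconds.
      minutes, secs = value.to_str.split(':', 2)
      @seconds = Integer(minutes, 10) * 60 + Integer(secs, 10)
    elsif value.respond_to?(:to_int)
      # Integer-like argument: treat it as a count of seconds.
      @seconds = value.to_int
    else
      raise ArgumentError, 'Expected a number of seconds or an "m:ss" string.'
    end
  end
end

Duration.new(90).seconds       # => 90
Duration.new("1:30").seconds   # => 90
```

The checks use respond_to? rather than is_a?, so any string-like or integer-like object is accepted.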

Discussion

In strongly typed languages like C++ and Java, you must often create multiple versions of the same method with different arguments. For instance, Java’s StringBuffer class implements over 10 variants of its append method: one that takes a boolean, one that takes a string, and so on.

Ruby’s equivalent of StringBuffer is StringIO, and its equivalent of the append method is StringIO#<<. In Ruby, that method can only be defined once, but it can take an object of any type. There’s no need to write different versions of the method for taking different kinds of object. If you need to do type checking (such as making sure the object has a string representation), you put it in the method body rather than in the method definition.

Ruby’s loose typing eliminates most of the need for method overloading. Its default arguments, variable-length argument lists, and (simulated) keyword arguments eliminate most of the remaining cases. What’s left? Mainly methods that can take two completely different sets of arguments, like the Rectangle constructor given in the Solution.

To handle these, write a method that takes a variable number of arguments, and give it some extra code at the front that figures out which set of arguments was passed. Rectangle#initialize rejects argument lists that are of the wrong length. Additional code could enforce duck typing to make sure that the arguments passed in are of the right type. See Recipe 10.16 for simple ways to do argument validation.

See Also

• Recipe 8.11, “Accepting or Passing a Variable Number of Arguments”
• Recipe 8.12, “Simulating Keyword Arguments”
• Recipe 10.16, “Enforcing Software Contracts”

8.6 Validating and Modifying Attribute Values

Problem

You want to let outside code set your objects’ instance variables, but you also want to impose some control over the values your variables are set to. You might want a chance to validate new values before accepting them.
Or you might want to accept values in a form convenient to the caller, but transform them into a different form for internal storage.

Solution

Define your own setter method for each instance variable you want to control. The setter method for an instance variable quantity would be called quantity=. When a user issues a statement like object.quantity = 10, the method object#quantity= is called with the argument 10.

It’s up to the quantity= method to decide whether the instance variable quantity should actually take the value 10. A setter method is free to raise an ArgumentError if it’s passed an invalid value. It may also modify the provided value, massaging it into the canonical form used by the class. If it can get an acceptable value, its last act should be to modify the instance variable.

I’ll define a class that keeps track of people’s first and last names. It uses setter methods to enforce two somewhat parochial rules: everyone must have both a first and a last name, and everyone’s first name must begin with a capital letter:

    class Name
      # Define default getter methods, but not setter methods.
      attr_reader :first, :last

      # When someone tries to set a first name, enforce rules about it.
      def first=(first)
        if first == nil or first.size == 0
          raise ArgumentError.new('Everyone must have a first name.')
        end
        first = first.dup
        first[0] = first[0].chr.capitalize
        @first = first
      end

      # When someone tries to set a last name, enforce rules about it.
      def last=(last)
        if last == nil or last.size == 0
          raise ArgumentError.new('Everyone must have a last name.')
        end
        @last = last
      end

      def full_name
        "#{@first} #{@last}"
      end

      # Delegate to the setter methods instead of setting the instance
      # variables directly.
      def initialize(first, last)
        self.first = first
        self.last = last
      end
    end

I’ve written the Name class so that the rules are enforced both in the constructor and after the object has been created:

    jacob = Name.new('Jacob', 'Berendes')
    jacob.first = 'Mary Sue'
    jacob.full_name               # => "Mary Sue Berendes"

    john = Name.new('john', 'von Neumann')
    john.full_name                # => "John von Neumann"
    john.first = 'john'
    john.first                    # => "John"
    john.first = nil
    # ArgumentError: Everyone must have a first name.

    Name.new('Kero, international football star and performance artist', nil)
    # ArgumentError: Everyone must have a last name.

Discussion

Ruby never lets one object access another object’s instance variables. All you can do is call methods. Ruby simulates instance variable access by making it easy to define getter and setter methods whose names are based on the names of instance variables. When you access object.my_var, you’re actually calling a method called my_var, which (by default) just happens to return a reference to the instance variable my_var.

Similarly, when you set a new value for object.my_var, you’re actually passing that value into a setter method called my_var=. That method might go ahead and stick your new value into the instance variable my_var. It might accept your value, but silently clean it up, convert it to another format, or otherwise modify it. It might be picky and reject your value altogether by raising an ArgumentError.

When you’re defining a class, you can have Ruby generate a setter method for one of your instance variables by calling Module#attr_writer or Module#attr_accessor on the symbol for that variable. This saves you from having to write code, but the default setter method lets anyone set the instance variable to any value at all:

    class SimpleContainer
      attr_accessor :value
    end

    c = SimpleContainer.new

    c.respond_to? "value="                    # => true

    c.value = 10; c.value                     # => 10

    c.value = "some random value"; c.value    # => "some random value"

    c.value = [nil, nil, nil]; c.value        # => [nil, nil, nil]

A lot of the time, this kind of informality is just fine. But sometimes you don’t trust the data coming in through the setter methods. That’s when you can define your own methods to stop bad data before it infects your objects.

Within a class, you have direct access to the instance variables. You can simply assign to an instance variable and the setter method won’t be triggered. If you do want to trigger the setter method, you’ll have to call it explicitly. Note how, in the Name#initialize method above, I call the first= and last= methods instead of assigning to @first and @last. This makes sure the validation code gets run for the initial values of every Name object. I can’t just say first = first, because first is a variable name in that method.

See Also

• Recipe 8.1, “Managing Instance Data”
• Recipe 13.14, “Validating Data with ActiveRecord”

8.7 Defining a Virtual Attribute

Problem

You want to create accessor methods for an attribute that isn’t directly backed by any instance variable: it’s a calculated value derived from one or more different instance variables.

Solution

Define accessor methods for the attribute in terms of the instance variables that are actually used. There need not be any relationship between the names of the accessor methods and the names of the instance variables.

The following class exposes four accessor methods: degrees, degrees=, radians, and radians=. But it only stores one instance variable: @radians.
    class Arc
      attr_accessor :radians

      def degrees
        @radians * 180 / Math::PI
      end

      def degrees=(degrees)
        @radians = degrees * Math::PI / 180
      end
    end

    arc = Arc.new
    arc.degrees = 180
    arc.radians                    # => 3.14159265358979
    arc.radians = Math::PI / 2
    arc.degrees                    # => 90.0

Discussion

Ruby accessor methods usually correspond to the names of the instance variables they access, but this is nothing more than a convention. Outside code has no way of knowing what your instance variables are called, or whether you have any at all, so you can create accessors for virtual attributes with no risk of outside code thinking they’re backed by real instance variables.

See Also

• Recipe 2.9, “Converting Between Degrees and Radians”

8.8 Delegating Method Calls to Another Object

Problem

You’d like to delegate some of an object’s method calls to a different object, or make one object capable of “impersonating” another.

Solution

If you want to completely impersonate another object, or delegate most of one object’s calls to another, use the delegate library. It generates custom classes whose instances can impersonate objects of any other class. These custom classes respond to all methods of the class they shadow, but they don’t do any work of their own apart from calling the same method on some instance of the “real” class.

Here’s some code that uses delegate to generate OrdinalNumber, a class that acts almost like a Fixnum. OrdinalNumber defines the same methods as Fixnum does, and it takes a genuine Fixnum as an argument to its constructor. It stores this object as a member, and when you call any of Fixnum’s methods on an OrdinalNumber object, it delegates that method call to the stored Fixnum. The only major exception is the to_s method, which I’ve decided to override.

    require 'delegate'

    # An integer represented as an ordinal number (1st, 2nd, 3rd...), as
    # opposed to a cardinal number (1, 2, 3...)
    # Generated by the DelegateClass to have all the methods of the Fixnum class.
    class OrdinalNumber < DelegateClass(Fixnum)
      def to_s
        delegate_s = __getobj__.to_s
        check = abs
        # 11, 12, and 13 take "th" even though they end in 1, 2, and 3.
        if check == 11 or check == 12 or check == 13
          suffix = "th"
        else
          case check % 10
          when 1 then suffix = "st"
          when 2 then suffix = "nd"
          when 3 then suffix = "rd"
          else suffix = "th"
          end
        end
        return delegate_s + suffix
      end
    end

    4.to_s                                        # => "4"
    OrdinalNumber.new(4).to_s                     # => "4th"
    OrdinalNumber.new(102).to_s                   # => "102nd"
    OrdinalNumber.new(11).to_s                    # => "11th"
    OrdinalNumber.new(-21).to_s                   # => "-21st"
    OrdinalNumber.new(5).succ                     # => 6
    OrdinalNumber.new(5) + 6                      # => 11
    OrdinalNumber.new(5) + OrdinalNumber.new(6)   # => 11

Discussion

The delegate library is useful when you want to extend the behavior of objects you don’t have much control over. Usually these are objects you’re not in charge of instantiating: they’re instantiated by factory methods, or by Ruby itself. With delegate, you can create a class that wraps an already existing object of another class and modifies its behavior. You can do all of this without changing the original class. This is especially useful if the original class has been frozen.

There are a few methods that delegate won’t delegate: most of the ones in Kernel.public_instance_methods. The most important one is is_a?. Code that explicitly checks the type of your object will be able to see that it’s not a real instance of the object it’s impersonating. Using is_a? instead of respond_to? is often bad Ruby practice, but it happens pretty often, so you should be aware of it.

The Forwardable module is a little more precise and a little less discerning: it lets you delegate any of an object’s methods to another object. A class that extends Forwardable can use the def_delegator decorator method, which takes as arguments an object symbol and a method symbol. It defines a new method that delegates to the method of the same name in the given object.
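To make def_delegator concrete, here is a minimal sketch. The Playlist class and its method names are invented for illustration and are not part of the recipe. It also uses def_delegator’s optional third argument, which gives the delegating method a name different from the method it calls.

```ruby
require 'forwardable'

# Hypothetical class: a playlist that wraps an array and exposes only a
# few of the array's methods.
class Playlist
  extend Forwardable

  def initialize
    @songs = []
  end

  # Playlist#size calls @songs.size.
  def_delegator :@songs, :size

  # With a third argument the new method gets its own name:
  # Playlist#add calls @songs.push, Playlist#opening_track calls @songs.first.
  def_delegator :@songs, :push, :add
  def_delegator :@songs, :first, :opening_track
end

playlist = Playlist.new
playlist.add('Overture')
playlist.add('Finale')
playlist.size                 # => 2
playlist.opening_track        # => "Overture"
playlist.respond_to?(:pop)    # => false
```

Nothing else from Array leaks through: only the three delegated methods exist on a Playlist.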
There’s also a def_delegators method, which takes multiple method symbols as arguments and defines a delegator method for each one. By calling def_delegator multiple times, you can have a single Forwardable delegate different methods to different subobjects.

Here I’ll use Forwardable to define a simple class that works like an array, but supports none of Array’s methods except the append operator, <<. Note how the << method defined by def_delegator is passed through to modify the underlying array.

    require 'forwardable'

    class AppendOnlyArray
      extend Forwardable
      def initialize
        @array = []
      end

      def_delegator :@array, :<<
    end

    a = AppendOnlyArray.new
    a << 4
    a << 5
    a.size
    # => undefined method `size' for #<AppendOnlyArray:0x...>

AppendOnlyArray is pretty useless, but the same principle makes Forwardable useful if you want to expose only a portion of a class’ interface. For instance, suppose you want to create a data structure that works like a Hash, but only supports random access. You don’t want to support keys, each, or any of the other ways of getting information out of a hash without providing a key.

You could subclass Hash, then redefine or delete all the methods that you don’t want to support. Then you could worry a lot about having missed some of those methods. Or you could define a class that extends Forwardable and define only the methods of Hash that you do want to support.

    class RandomAccessHash
      extend Forwardable
      def initialize
        @delegate_to = {}
      end

      def_delegators :@delegate_to, :[], "[]="
    end

    balances_by_account_number = RandomAccessHash.new

    # Load balances from a database or something.
    balances_by_account_number["101240A"] = 412.60
    balances_by_account_number["104918J"] = 10339.94
    balances_by_account_number["108826N"] = 293.01

Random access works if you know the key, but anything else is forbidden:

    balances_by_account_number["104918J"]    # => 10339.94

    balances_by_account_number.each do |number, balance|
      puts "I now know the balance for account #{number}: it's #{balance}"
    end
    # => NoMethodError: undefined method `each' for #<RandomAccessHash:0x...>

See Also

• An alternative to using SimpleDelegator to write delegator methods is to skip out on the methods altogether, and instead implement a method_missing which does the delegating. Recipe 2.13, “Simulating a Subclass of Fixnum,” uses this technique. You might especially find this recipe interesting if you’d like to make arithmetic on OrdinalNumber objects yield new OrdinalNumber objects instead of Fixnum objects.

8.9 Converting and Coercing Objects to Different Types

Problem

You have an object of one type and you want to use it as though it were of another type.

Solution

You might not have to do anything at all. Ruby doesn’t enforce type safety unless the programmer has explicitly written it in. If your original class defines the same methods as the class you were thinking of converting it to, you might be able to use your object as is.

If you do have to convert from one class to another, Ruby provides conversion methods for most common paths:

    "4".to_i                         # => 4
    4.to_s                           # => "4"
    Time.now.to_f                    # => 1143572140.90932
    { "key1" => "value1", "key2" => "value2" }.to_a
    # => [["key1", "value1"], ["key2", "value2"]]

If all else fails, you might be able to manually create an instance of the new class, and set its instance variables using the old data.

Discussion

Some programming languages have a “cast” operator that forces the compiler to treat an object of one type like an object of another type.
A cast is usually a programmer’s assertion that he knows more about the types of objects than the compiler. Ruby has no cast operator. From Ruby’s perspective, type checking is just an extra hoop you have to jump through. A cast operator would make it easier to jump through that hoop, but Ruby omits the hoop altogether.

Wherever you’re tempted to cast an object to another type, you should be able to just do nothing. If your object can be used as the other type, there’s no problem: if not, then casting it to that type wouldn’t have helped anyway.

Here’s a concrete example. You probably don’t need to convert a hash into an array just so you can pass it into an iteration method that expects an array. If that method only calls each on its argument, it doesn’t really “expect an array”: it expects a reasonable implementation of each. Ruby hashes provide that implementation just as well as arrays.

    def print_each(array)
      array.each { |x| puts x.inspect }
    end

    hash = { "pickled peppers" => "peck of", "sick sheep" => "sixth" }
    print_each(hash.to_a)
    # ["sick sheep", "sixth"]
    # ["pickled peppers", "peck of"]

    print_each(hash)
    # ["sick sheep", "sixth"]
    # ["pickled peppers", "peck of"]

Ruby does provide methods for converting one data type into another. These methods follow the naming convention to_[other type], and they usually create a brand new object of the new type, but containing the old data. They are generally used when you want to use some method of the new data type, or display or store the data in another format.

In the case of print_each, not converting the hash to an array gives the same results as converting, and the code is shorter and faster when it doesn’t do the conversion. But converting a hash into an array of key-value pairs does let you call methods defined by Array but not by Hash.
If what you really want is an array (something ordered, something you can modify with push and pop), there’s no reason not to convert to an array and stop using the hash.

    array = hash.to_a
    # => [["sick sheep", "sixth"], ["pickled peppers", "peck of"]]

    # Print out a tongue-twisting invoice.
    until array.empty?
      item, quantity = array.pop
      puts "#{quantity} #{item}"
    end
    # peck of pickled peppers
    # sixth sick sheep

Some methods convert one data type to another as a side effect: for instance, sorting a hash implicitly converts it into an array, since hashes have no notion of ordering.

    hash.sort
    # => [["pickled peppers", "peck of"], ["sick sheep", "sixth"]]

Number conversion and coercion

Most of the commonly used conversion methods in stock Ruby are in the number classes. This makes sense because arithmetic operations can give different results depending on the numeric types of the inputs. This is one place where Ruby’s conversion methods are used as a substitute for casting. Here, to_f is used to force Ruby to perform floating-point division instead of integer division:

    3/4                # => 0
    3/4.to_f           # => 0.75

Integers and floating-point numbers have to_i and to_f methods to convert back and forth between each other. BigDecimal or Rational objects define the same methods; they also define some brand new conversion methods: to_d to convert a number to BigDecimal, and to_r to convert a number to Rational. To convert to or from Rational objects you just have to require 'rational'. To convert to or from BigDecimal objects you must require 'bigdecimal' and also require 'bigdecimal/util'.

    require 'rational'
    Rational(1, 3).to_f      # => 0.333333333333333
    Rational(11, 5).to_i     # => 2
    2.to_r                   # => Rational(2, 1)

Here’s a table that shows how to convert between Ruby’s basic numeric types.
    From \ To     Integer                  Floating-point    BigDecimal                        Rational
    Integer       to_i (identity)          to_f              to_r.to_d                         to_r
    Float         to_i (decimal discard)   to_f (new)        to_d                              to_d.to_r (include bigdecimal/util)
    BigDecimal    to_i                     to_f              to_d (new)                        to_r (include bigdecimal/util)
    Rational      to_i (dec discard)       to_f (approx)     to_d (include bigdecimal/util)    to_r (identity)

Two cases deserve special mention. You can’t convert a floating-point number directly into a rational number, but you can do it through BigDecimal. The result will be imprecise, because floating-point numbers are imprecise.

    require 'bigdecimal'
    require 'bigdecimal/util'

    one_third = 1/3.0              # => 0.333333333333333
    one_third.to_r
    # NoMethodError: undefined method `to_r' for 0.333333333333333:Float
    one_third.to_d.to_r
    # => Rational(333333333333333, 1000000000000000)

Similarly, the best way to convert an Integer to a BigDecimal is to convert it to a rational number first.

    20.to_d
    # NoMethodError: undefined method `to_d' for 20:Fixnum
    20.to_r.to_d                   # => #<BigDecimal:...>

When it needs to perform arithmetic operations on two numbers of different types, Ruby uses a method called coerce. Every numeric type implements a coerce method that takes a single number as its argument. It returns an array of two numbers: the argument passed into coerce, followed by the object itself. Either or both numbers might undergo a conversion, but whatever happens, both the numbers in the return array must be of the same type. The arithmetic operation is performed on these two numbers, coerced into the same type.

This way, the authors of numeric classes don’t have to make their arithmetic operations support operations on objects of different types. If they implement coerce, they know that their arithmetic operations will only be passed in another object of the same type.

This is easiest to see for the Complex class.
Below, every input to coerce is transformed into an equivalent complex number so that it can be used in arithmetic operations along with the complex number i:

    require 'complex'
    i = Complex(0, 1)          # => Complex(0, 1)
    i.coerce(3)                # => [Complex(3, 0), Complex(0, 1)]
    i.coerce(2.5)              # => [Complex(2.5, 0), Complex(0, 1)]

This, incidentally, is why 3/4 uses integer division but 3/4.to_f uses floating-point division. 3.coerce(4) returns two integer objects, so the arithmetic methods of Fixnum are used. 3.coerce(4.0) returns two floating-point numbers, so the arithmetic methods of Float are used.

Other conversion methods

All Ruby objects define conversion methods to_s and inspect, which give a string representation of the object. Usually inspect is the more readable of the two formats.

    [1, 2, 3].to_s             # => "123"
    [1, 2, 3].inspect          # => "[1, 2, 3]"

Here’s a grab bag of other notable conversion methods found within the Ruby standard library. This should give you a picture of what Ruby conversion methods typically do.

• MatchData#to_a creates an array containing the match groups of a regular expression match.
• Matrix#to_a converts a mathematical matrix into a nested array.
• Enumerable#to_a iterates over any enumerable object and collects the results in an array.
• Net::HTTPHeader#to_hash returns a hash mapping the names of HTTP headers to their values.
• String#to_f and String#to_i parse strings into numeric objects. Requiring the bigdecimal/util library will define String#to_d, which parses a string into a BigDecimal object.
• Requiring the yaml library will define to_yaml methods for all of Ruby’s built-in classes: Array#to_yaml, String#to_yaml, and so on.
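A few of the conversions in the list above can be tried with nothing but core classes. This is only a small sketch; the sample strings and variable names are made up for illustration.

```ruby
# A regular expression match converts to an array holding the whole match
# followed by each match group.
match = /(\d+)-(\d+)/.match('pages 10-20')
groups = match.to_a           # => ["10-20", "10", "20"]

# Any Enumerable (here a Range) can be collected into an array.
numbers = (1..4).to_a         # => [1, 2, 3, 4]

# String#to_i and String#to_f parse as much of a leading number as they
# can find, and fall back to zero when there is none.
count = '42 apples'.to_i      # => 42
weight = '3.5kg'.to_f         # => 3.5
nothing = 'apples'.to_i       # => 0
```

Note that the string conversions never raise an error on bad input, which is convenient but can hide malformed data.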
See Also

• Recipe 1.12, “Testing Whether an Object Is String-Like”
• Recipe 2.1, “Parsing a Number from a String”
• Recipe 8.10, “Getting a Human-Readable Printout of Any Object”

8.10 Getting a Human-Readable Printout of Any Object

Problem

You want to look at a natural-looking rendition of a given object.

Solution

Use Object#inspect. Nearly all the time, this method will give you something more readable than simply printing out the object or converting it into a string.

    a = [1,2,3]
    puts a
    # 1
    # 2
    # 3
    puts a.to_s
    # 123
    puts a.inspect
    # [1, 2, 3]

    puts /foo/
    # (?-mix:foo)
    puts /foo/.inspect
    # /foo/

    f = File.open('foo', 'a')
    puts f
    # #<File:0x...>
    puts f.inspect
    # #<File:foo>

Discussion

Even very complex data structures can be inspected and come out looking just like they would in Ruby code to define that data structure. In some cases, you can even run the output of inspect through eval to recreate the object.

    periodic_table = [{ :symbol => "H", :name => "hydrogen", :weight => 1.007 },
                      { :symbol => "Rg", :name => "roentgenium", :weight => 272 }]
    puts periodic_table.inspect
    # [{:symbol=>"H", :name=>"hydrogen", :weight=>1.007},
    #  {:symbol=>"Rg", :name=>"roentgenium", :weight=>272}]

    eval(periodic_table.inspect)[0]
    # => {:symbol=>"H", :name=>"hydrogen", :weight=>1.007}

By default, an object’s inspect method works the same way as its to_s method.* Unless your classes override inspect, inspecting one of your objects will yield a boring and not terribly helpful string, containing only the object’s class name, object_id, and instance variables:

    class Dog
      def initialize(name, age)
        @name = name
        @age = age * 7   # Compensate for dog years
      end
    end

    spot = Dog.new("Spot", 2.1)
    spot.inspect
    # => "#<Dog:0x... @name=\"Spot\", @age=14.7>"

That’s why you’ll help out your future self by defining useful inspect methods that give relevant information about the objects you’ll be instantiating.
    class Dog
      def inspect
        ""
      end

      def to_s
        inspect
      end
    end

    spot.inspect
    # => ""

* Contrary to what ri Object#inspect says, Object#inspect does not delegate to the Object#to_s method: it just happens to work a lot like Object#to_s. If you only override to_s, inspect won’t be affected.

Or, if you believe in being able to eval the output of inspect:

    class Dog
      def inspect
        %{Dog.new("#{@name}", #{@age/7})}
      end
    end

    spot.inspect
    # => "Dog.new("Spot", 2.1)"
    eval(spot.inspect).inspect
    # => "Dog.new("Spot", 2.1)"

Just don’t automatically eval the output of inspect, because, as always, that’s dangerous:

    strange_dog_name = %{Spot", 0); puts "Executing arbitrary Ruby..."; puts("}
    spot = Dog.new(strange_dog_name, 0)
    puts spot.inspect
    # Dog.new("Spot", 0); puts "Executing arbitrary Ruby..."; puts("", 0)
    eval(spot.inspect)
    # Executing arbitrary Ruby...
    #
    # 0

8.11 Accepting or Passing a Variable Number of Arguments

Problem

You want to write a method that can accept any number of arguments. Or maybe you want to pass the contents of an array as arguments into such a method, rather than passing in the array itself as a single argument.

Solution

To accept any number of arguments to your method, prefix the last argument name with an asterisk.
When the method is called, all the “extra” arguments will be collected in a list and passed in as that argument:

    def sum(*numbers)
      puts "I'm about to sum the array #{numbers.inspect}"
      numbers.inject(0) { |sum, x| sum + x }
    end

    sum(1, 2, 10)
    # I'm about to sum the array [1, 2, 10]
    # => 13

    sum(2, -2, 2, -2, 2, -2, 2, -2, 2)
    # I'm about to sum the array [2, -2, 2, -2, 2, -2, 2, -2, 2]
    # => 2

    sum
    # I'm about to sum the array []
    # => 0

To pass an array of arguments into a method, use the asterisk signifier before the array you want to be turned into “extra” arguments:

    to_sum = []
    1.upto(10) { |x| to_sum << x }
    sum(*to_sum)
    # I'm about to sum the array [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    # => 55

Bad things happen if you forget the asterisk: your entire array is treated as a single “extra” argument:

    sum(to_sum)
    # I'm about to sum the array [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]
    # TypeError: Array can't be coerced into Fixnum

Discussion

Why make a method take a variable number of arguments, instead of just having it take a single array? It’s basically for the convenience of the user. Consider the Kernel#printf method, which takes one fixed argument (a format string), and then a variable number of inputs to the format string:

    printf('%s | %s', 'left', 'right')
    # left | right

It’s very rare that the caller of printf already has her inputs lying around in an array. Fortunately, Ruby is happy to create the array on the user’s behalf. If the caller does already have an array of inputs, it’s easy to pass the contents of that array as “extra” arguments by sticking the asterisk onto the appropriate variable name:

    inputs = ['left', 'right']
    printf('%s | %s', *inputs)
    # left | right

As you can see, a method can take a fixed number of “normal” arguments and then a variable number of “extra” arguments.
When defining such a method, just make sure that the last argument is the one you prefix with the asterisk:

    def format_list(header, footer='', *data)
      puts header
      puts (line = '-' * header.size)
      puts data.join("\n")
      puts line
      puts footer
    end

    cozies = 21
    gaskets = 10
    format_list("Yesterday's productivity numbers:", 'Congratulations!',
                "#{cozies} slime mold cozies", "#{gaskets} Sierpinski gaskets")
    # Yesterday's productivity numbers:
    # ---------------------------------
    # 21 slime mold cozies
    # 10 Sierpinski gaskets
    # ---------------------------------
    # Congratulations!

You can use the asterisk trick to call methods that don’t take a variable number of arguments. You just need to make sure that the array you’re using has enough elements to provide values for all of the method’s required arguments.

You’ll find this especially useful for constructors that take many arguments. The following code initializes four Range objects from four arrays of constructor arguments:

    ranges = [[1, 10], [1, 6, true], [25, 100, false], [6, 9]]
    ranges.collect { |l| Range.new(*l) }
    # => [1..10, 1...6, 25..100, 6..9]

8.12 Simulating Keyword Arguments

Problem

A function or method can accept many optional arguments. You want to let callers pass in only the arguments they have values for, but Ruby doesn’t support keyword arguments as Python and Lisp do.

Solution

Write your function to accept as its final argument a map of symbols to values. Consult the map as necessary to see what arguments were passed in.

    def fun_with_text(text, args={})
      text = text.upcase if args[:upcase]
      text = text.downcase if args[:downcase]
      if args[:find] and args[:replace]
        text = text.gsub(args[:find], args[:replace])
      end
      text = text.slice(0, args[:truncate_at]) if args[:truncate_at]
      return text
    end

Ruby has syntactic sugar that lets you define a hash inside a function call without putting it in curly brackets.
This makes the code look more natural:

    fun_with_text("Foobar", {:upcase => true, :truncate_at => 5})
    # => "FOOBA"
    fun_with_text("Foobar", :upcase => true, :truncate_at => 5)
    # => "FOOBA"

    fun_with_text("Foobar", :find => /(o+)/, :replace => '\1d', :downcase => true)
    # => "foodbar"

Discussion

This simple code works well in most cases, but it has a couple of shortcomings compared to “real” keyword arguments. These simulated keyword arguments don’t work like regular arguments because they’re hidden inside a hash. You can’t reject an argument that’s not part of the “signature,” and you can’t force a caller to provide a particular keyword argument.

Each of these problems is easy to work around (for instance, does a required argument really need to be a keyword argument?), but it’s best to define the workaround code in a mixin so you only have to do it once. The following code is based on a KeywordProcessor module by Gavin Sinclair:

    ###
    # This mix-in module lets methods match a caller's hash of keyword
    # parameters against a hash the method keeps, mapping keyword
    # arguments to default parameter values.
    #
    # If the caller leaves out a keyword parameter whose default value is
    # :MANDATORY (a constant in this module), then an error is raised.
    #
    # If the caller provides keyword parameters which have no
    # corresponding keyword arguments, an error is raised.
    #
    module KeywordProcessor
      MANDATORY = :MANDATORY

      def process_params(params, defaults)
        # Reject params not present in defaults.
        params.keys.each do |key|
          unless defaults.has_key? key
            raise ArgumentError, "No such keyword argument: #{key}"
          end
        end
        result = defaults.dup.update(params)

        # Ensure mandatory params are given.
        unfilled = result.select { |k,v| v == MANDATORY }.map { |k,v| k.inspect }
        unless unfilled.empty?
          msg = "Mandatory keyword parameter(s) not given: #{unfilled.join(', ')}"
          raise ArgumentError, msg
        end

        return result
      end
    end

Here’s KeywordProcessor in action.
Note how I set a default other than nil for a keyword argument, by defining it in the default value of args:

    class TextCanvas
      include KeywordProcessor

      def render(text, args={}.freeze)
        args = process_params(args, {:font => 'New Reykjavik Solemn',
                                     :size => 36,
                                     :bold => false,
                                     :x => :MANDATORY,
                                     :y => :MANDATORY }.freeze)
        # ...
        puts "DEBUG: Found font #{args[:font]} in catalog."
        # ...
      end
    end

    canvas = TextCanvas.new

    canvas.render('Hello', :x => 4, :y => 100)
    # DEBUG: Found font New Reykjavik Solemn in catalog.

    canvas.render('Hello', :x => 4, :y => 100, :font => 'Lacherlich')
    # DEBUG: Found font Lacherlich in catalog.

    canvas.render('Hello', :font => "Lacherlich")
    # ArgumentError: Mandatory keyword parameter(s) not given: :x, :y

    canvas.render('Hello', :x => 4, :y => 100, :italic => true)
    # ArgumentError: No such keyword argument: italic

Ruby 2.0 will, hopefully, have full support for keyword arguments.

See Also

• Recipe 8.8, “Delegating Method Calls to Another Object”
• The KeywordProcessor module is based on the one in “Emulating Keyword Arguments in Ruby”; I modified it to be less oriented around the initialize method (http://www.rubygarden.org/ruby?KeywordArguments)

8.13 Calling a Superclass’s Method

Problem

When overriding a class’s method in a subclass, you want to extend or decorate the behavior of the superclass, rather than totally replacing it.

Solution

Use the super keyword to call the superclass implementation of the current method. When you call super with no arguments, the arguments to your method are passed to the superclass method exactly as they were received by the subclass.

Here’s a Recipe class that defines (among other things) a cook method.

    class Recipe
      # ... The rest of the Recipe implementation goes here.
      def cook(stove, cooking_time)
        dish = prepare_ingredients
        stove << dish
        wait_for(cooking_time)
        return dish
      end
    end

Here’s a subclass of Recipe that tacks some extra behavior onto the recipe. It passes all of its arguments directly into super:

    class RecipeWithExtraGarlic < Recipe
      def cook(stove, cooking_time)
        5.times { add_ingredient(Garlic.new.chop) }
        super
      end
    end

A subclass implementation can also choose to pass arguments into super. This way, a subclass can accept different arguments from its superclass implementation:

    class BakingRecipe < Recipe
      def cook(cooking_time, oven_temperature=350)
        oven = Oven.new(oven_temperature)
        super(oven, cooking_time)
      end
    end

Discussion

You can call super at any time in the body of a method: before, during, or after calling other code. This is in contrast to languages like Java, where you must call super in the method’s first statement or never call it at all. If you need to, you can even call super multiple times within a single method.

Often you want to create a subclass method that exposes exactly the same interface as its parent. You can use the *args construct to make the subclass method accept any arguments at all, then call super with no arguments to pass all those arguments (as well as any attached code block) into the superclass implementation. Let the superclass deal with any problems with the arguments.
The String#gsub method exposes a fairly complicated interface, but the String subclass defined here doesn’t need to know anything about it:

    class MyString < String
      def gsub(*args)
        return "#{super} -- This string modified by MyString#gsub (TM)"
      end
    end

    str = MyString.new("Here's my string")
    str.gsub("my", "a")
    # => "Here's a string -- This string modified by MyString#gsub (TM)"

    str.gsub(/m| s/) { |match| match.strip.capitalize }
    # => "Here's MyString -- This string modified by MyString#gsub (TM)"

If the subclass method takes arguments but the superclass method takes none, be sure to invoke super with an empty pair of parentheses. Usually you don’t have to do this in Ruby, but super is not a real method call. If you invoke super without parentheses, it will pass all the subclass arguments into the superclass implementation, which won’t be able to handle them.

In the example below, calling just super would result in an ArgumentError: it would pass a numeric argument into String#succ!, which takes no arguments:

    class MyString
      def succ!(skip=1)
        skip.times { super() }
        self
      end
    end

    str = MyString.new('a')
    str.succ!(3)                  # => "d"

Invoking super works for class methods as well as instance methods:

    class MyFile < File
      def MyFile.ftype(*args)
        return "The type is #{super}."
      end
    end

    File.ftype("/bin")            # => "directory"
    MyFile.ftype("/bin")          # => "The type is directory."

8.14 Creating an Abstract Method

Problem

You want to define a method of a class, but leave it for subclasses to fill in the actual implementations.

Solution

Define the method normally, but have it do nothing except raise a NotImplementedError:

    class Shape2D
      def area
        raise NotImplementedError.new("#{self.class.name}#area is an abstract method.")
      end
    end

    Shape2D.new.area
    # NotImplementedError: Shape2D#area is an abstract method.
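One practical detail matters to code that calls a method like this: NotImplementedError descends from ScriptError rather than StandardError, so a bare rescue clause will not catch it. The sketch below illustrates the difference; the Shape class is a hypothetical stand-in for any class with an abstract method.

```ruby
# Hypothetical class with one abstract method.
class Shape
  def area
    raise NotImplementedError, "#{self.class.name}#area is an abstract method."
  end
end

# A bare rescue only handles StandardError and its subclasses, so the
# NotImplementedError passes straight through the inner rescue and is
# caught by the outer one, which names the class explicitly.
bare_rescue_caught = begin
  begin
    Shape.new.area
    true
  rescue
    true
  end
rescue NotImplementedError
  false
end
bare_rescue_caught            # => false

# Naming the exception class catches the error directly.
message = begin
  Shape.new.area
rescue NotImplementedError => e
  e.message
end
message                       # => "Shape#area is an abstract method."
```

If you want callers to be able to recover, it helps to document that they must rescue NotImplementedError by name.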
A subclass can redefine the method with a concrete implementation:

    class Square < Shape2D
      def initialize(length)
        @length = length
      end

      def area
        @length ** 2
      end
    end

    Square.new(10).area           # => 100

Discussion

Ruby doesn’t have a built-in notion of an abstract method or class, and though it has many built-in classes that might be considered “abstract,” it doesn’t enforce this abstractness the way C++ and Java do. For instance, you can instantiate an instance of Object or Numeric, even though those classes don’t do anything by themselves.

In general, this is in the spirit of Ruby. But it’s sometimes useful to define a superclass method that every subclass is expected to implement. The NotImplementedError error is the standard way of conveying that a method is not there, whether it’s abstract or just an unimplemented stub.

Unlike other programming languages, Ruby will let you instantiate a class that defines an abstract method. You won’t have any problems until you actually call the abstract method; even then, you can catch the NotImplementedError and recover. If you want, you can make an entire class abstract by making its initialize method raise a NotImplementedError. Then no one will be able to create instances of your class:*

    class Shape2D
      def initialize
        raise NotImplementedError.new("#{self.class.name} is an abstract class.")
      end
    end

    Shape2D.new
    # NotImplementedError: Shape2D is an abstract class.

* Of course, unless you freeze the class afterwards, someone else can reopen your class, define an empty initialize, and then create instances of your class.

We can do the same thing in less code by defining a decorator method of Class that creates an abstract method by the given name.

    class Class
      def abstract(*method_names)
        method_names.each do |method_name|
          define_method(method_name) do |*args|
            if method_name == :initialize
              msg = "#{self.class.name} is an abstract class."
            else
              msg = "#{self.class.name}##{method_name} is an abstract method."
            end
            raise NotImplementedError.new(msg)
          end
        end
      end
    end

Here’s an abstract class that defines an abstract method move:

    class Animal
      abstract :initialize, :move
    end

    Animal.new
    # NotImplementedError: Animal is an abstract class.

Here’s a concrete subclass that doesn’t bother to define an implementation for the abstract method:

    class Sponge < Animal
      def initialize
        @type = :Sponge
      end
    end

    sponge = Sponge.new
    sponge.move
    # NotImplementedError: Sponge#move is an abstract method.

Here’s a concrete subclass that implements the abstract method:

    class Cheetah < Animal
      def initialize
        @type = :Cheetah
      end

      def move
        "Running!"
      end
    end

    Cheetah.new.move              # => "Running!"

Abstract methods declared in a class are, by convention, eventually defined in the subclasses of that class. But Ruby doesn’t enforce this either. An abstract method has a definition; it just happens to be one that always throws an error.

Since Ruby lets you reopen classes and redefine methods later, the definition of a concrete method can happen later in time instead of further down the inheritance tree. The Sponge class defined above didn’t have a move method, but we can add one now:

    class Sponge
      def move
        "Floating on ocean currents!"
      end
    end

    sponge.move                   # => "Floating on ocean currents!"

You can create an abstract singleton method, but there’s not much point unless you intend to fill it in later. A class’s singleton methods are visible to its subclasses, but overriding one in a subclass leaves the original untouched. If you were to define Superclass.foo abstract, then define it for real as Subclass.foo, you would have accomplished little: Superclass.foo would still exist separately and would still be abstract.

8.15 Freezing an Object to Prevent Changes

Problem

You want to prevent any further changes to the state of an object.

Solution

Freeze the object with Object#freeze:

    frozen_string = 'Brrrr!'
    frozen_string.freeze
    frozen_string.gsub('r', 'a')  # => "Baaaa!"
    frozen_string.gsub!('r', 'a')
    # TypeError: can't modify frozen string

Discussion

When an object is frozen, its instance variables are permanently bound to their current values. The values themselves are not frozen: their instance variables can still be modified, to the extent they were modifiable before:

    sequences = [[1,2,3], [1,2,4], [1,4,9]].freeze
    sequences << [2,3,5]
    # TypeError: can't modify frozen array
    sequences[2] << 16                   # => [1, 4, 9, 16]

A frozen object cannot be unfrozen, and if cloned, the clone will also be frozen. Calling Object#dup (as opposed to Object#clone) on a frozen object yields an unfrozen object with the same instance variables.

    frozen_string.clone.frozen?          # => true
    frozen_string.dup.frozen?            # => false

Freezing an object does not prevent reassignment of any variables bound to that object.

    frozen_string = 'A new string.'
    frozen_string.frozen?                # => false

To prevent objects from changing in ways confusing to the user or to the Ruby interpreter, Ruby sometimes copies objects and freezes the copies. When you use a string as a hash key, Ruby actually copies the string, freezes the copy, and uses the copy as the hash key: that way, if the original string changes later on, the hash key isn’t affected.

Constant objects are often frozen as a second line of defense against the object being modified in place. You can freeze an object whenever you need a permanent reference to an object; this is most commonly seen with strings:

    API_KEY = "100f7vo4gg".freeze

    API_KEY[0] = 4
    # TypeError: can't modify frozen string

    API_KEY = "400f7vo4gg"
    # warning: already initialized constant API_KEY

Frozen objects are also useful in multithreaded code. For instance, Ruby’s internal file operations work from a frozen copy of a filename instead of using the filename directly.
If another thread modifies the original filename in the middle of an operation that’s supposed to be atomic, there’s no problem: Ruby wasn’t relying on the original filename anyway. You can adopt this copy-and-freeze pattern in multithreaded code to prevent a data structure you’re working on from being changed by another thread.

Another common programmer-level use of this feature is to freeze a class in order to prevent future modifications to it (by yourself, other code running in the same environment, or other people who use your code as a library). This is not quite the same as the final construct in Java (or sealed in C#), because you can still subclass a frozen class, and override methods in the subclass. Calling freeze only stops the in-place modification of a class. The simplest way to do it is to call freeze as the last statement in the class definition:

    class MyClass
      def my_method
        puts "This is the only method allowed in MyClass."
      end
      freeze
    end

    class MyClass
      def my_method
        "I like this implementation of my_method better."
      end
    end
    # TypeError: can't modify frozen class

    class MyClass
      def my_other_method
        "Oops, I forgot to implement this method."
      end
    end
    # TypeError: can't modify frozen class

    class MySubclass < MyClass
      def my_method
        "This is only one of the methods available in MySubclass."
      end

      def my_other_method
        "This is the other one."
      end
    end

    MySubclass.new.my_method
    # => "This is only one of the methods available in MySubclass."

See Also

• Recipe 4.7, “Making Sure a Sorted Array Stays Sorted,” defines a convenience method for making a frozen copy of an object
• Recipe 5.5, “Using an Array or Other Modifiable Object as a Hash Key”
• Recipe 8.16, “Making a Copy of an Object”
• Recipe 8.17, “Declaring Constants”

8.16 Making a Copy of an Object

Problem

You want to make a copy of an existing object: a new object that can be modified separately from the original.

Solution

Ruby provides two ways of doing this.
If you only want to have to remember one way, remember Object#clone:

    s1 = 'foo'                           # => "foo"
    s2 = s1.clone                        # => "foo"
    s1[0] = 'b'
    [s1, s2]                             # => ["boo", "foo"]

Discussion

Ruby has two object-copy methods: a quick one and a thorough one. The quick one, Object#dup, creates a new instance of an object’s class, then sets all of the new object’s instance variables so that they reference the same objects as the original does. Finally, it makes the new object tainted if the old object was tainted.

The downside of dup is that it creates a new instance of the object’s original class. If you open up a specific object and give it a singleton method, you implicitly create a metaclass, an anonymous subclass of the original class. Calling dup on the object will yield a copy that lacks the singleton methods. The other object-copy method, Object#clone, makes a copy of the metaclass and instantiates the copy, instead of instantiating the object’s original class.

    material = 'cotton'
    class << material
      def definition
        puts 'The better half of velour.'
      end
    end

    material.definition
    # The better half of velour.

    'cotton'.definition
    # NoMethodError: undefined method `definition' for "cotton":String

    material.clone.definition
    # The better half of velour.

    material.dup.definition
    # NoMethodError: undefined method `definition' for "cotton":String

Object#clone is also more strict about propagating Ruby’s internal flags: it will propagate both an object’s “tainted?” flag and its “frozen?” flag. If you want to make an unfrozen copy of a frozen object, you must use Object#dup.

Object#clone and Object#dup both perform shallow copies: they make copies of an object without also copying the objects its instance variables refer to. You’ll end up with two objects whose instance variables point to the same objects. In-place modifications to those shared objects will be visible through both copies.
This can cause problems if you’re not expecting it:

    class StringHolder
      attr_reader :string
      def initialize(string)
        @string = string
      end
    end

    s1 = StringHolder.new('string')
    s2 = s1.dup
    s3 = s1.clone

    s1.string[1] = 'p'
    s2.string                            # => "spring"
    s3.string                            # => "spring"

If you want to do a deep copy, an easy (though not particularly quick) way is to serialize the object to a binary string with Marshal, then load a new object from the string:

    class Object
      def deep_copy
        Marshal.load(Marshal.dump(self))
      end
    end

    s1 = StringHolder.new('string')
    s2 = s1.deep_copy
    s1.string[1] = 'p'
    s1.string                            # => "spring"
    s2.string                            # => "string"

Note that this will only work on an object that has no singleton methods:

    class << s1
      def definition
        puts "We hold strings so you don't have to."
      end
    end

    s1.deep_copy
    # TypeError: singleton can't be dumped

When an object is cloned or duplicated, Ruby creates a new instance of its class or superclass, but without calling the initialize method. If you want to define some code to run when an object is cloned or duplicated, define an initialize_copy method. This is a hook method that gives you a chance to modify the copy before Ruby passes it back to whoever called clone or dup. If you want to simulate a deep copy without using Marshal, this is your chance to modify the copy’s instance variables:

    class StringHolder
      def initialize_copy(from)
        @string = from.string.dup
      end
    end

    s1 = StringHolder.new('string')
    s2 = s1.dup
    s3 = s1.clone

    s1.string[1] = "p"
    s2.string                            # => "string"
    s3.string                            # => "string"

This table summarizes the differences between clone, dup, and the deep-copy technique that uses Marshal.

                               Object#clone           Object#dup             Deep copy with Marshal
    Same instance variables?   New references to      New references to      New objects
                               the same objects       the same objects
    Same metaclass?            Yes                    No                     Yes (a)
    Same singleton methods?    Yes                    No                     N/A (a)
    Same frozen state?         Yes                    No                     No
    Same tainted state?        Yes                    Yes                    Yes

    (a) Marshal can’t serialize an object whose metaclass is different from its original class.

See Also

• Recipe 13.2, “Serializing Data with Marshal”

8.17 Declaring Constants

Problem

You want to prevent a variable from being assigned a different value after its initial definition.

Solution

Declare the variable as a constant. You can’t absolutely prohibit the variable from being assigned a different value, but you can make Ruby generate a warning whenever that happens.

    not_a_constant = 3
    not_a_constant = 10

    A_CONSTANT = 3
    A_CONSTANT = 10
    # warning: already initialized constant A_CONSTANT

Discussion

A constant variable is one whose name starts with a capital letter. By tradition, Ruby constant names consist entirely of capital letters, numbers, and underscores. Constants don’t mesh well with Ruby’s philosophy of unlimited changeability: there’s no way to absolutely prevent someone from changing your constant. However, they are a useful signal to the programmers who come after you, letting them know not to redefine a constant without a very good reason.

Constants can occur anywhere in code. If they appear within a class or module, you can access them from outside the class or module with the double-colon operator (::). The name of the class or module qualifies the name of the constant, preventing confusion with other constants that may have the same name but be defined in different scopes.

    CONST = 4

    module ConstModule
      CONST = 6
    end

    class ConstHolder
      CONST = 8

      def my_const
        return CONST
      end
    end

    CONST                                # => 4
    ConstModule::CONST                   # => 6
    ConstHolder::CONST                   # => 8
    ConstHolder.new.my_const             # => 8

The thing that’s constant about a constant is its reference to an object. If you change the reference to point to a different object, you’ll get a warning. Unfortunately, there’s no way to tell Ruby to treat the redeclaration of a constant as an error.
    E = 2.718281828                      # => 2.718281828
    E = 6
    # warning: already initialized constant E
    E                                    # => 6

However, you can use Module#remove_const as a sneaky way to “undeclare” a constant. You can then declare the constant again, without even triggering a warning. Clearly, this is potent and potentially dangerous stuff:

    # This should make things a lot simpler.
    module Math
      remove_const(:PI)
      PI = 3
    end
    Math::PI                             # => 3

If a constant points to a mutable object like an array or a string, the object itself can change without triggering the constant warning. You can prevent this by freezing the object to which the constant points:

    RGB_COLORS = [:red, :green, :blue]   # => [:red, :green, :blue]
    RGB_COLORS << :purple                # => [:red, :green, :blue, :purple]

    RGB_COLORS = [:red, :green, :blue]
    # warning: already initialized constant RGB_COLORS
    RGB_COLORS                           # => [:red, :green, :blue]

    RGB_COLORS.freeze
    RGB_COLORS << :purple
    # TypeError: can't modify frozen array

Freezing operates on the object, not the reference. It does nothing to prevent a constant reference from being assigned to another object.

    HOURS_PER_DAY = 24
    HOURS_PER_DAY.freeze # This does nothing since Fixnums are already immutable.

    HOURS_PER_DAY = 26
    # warning: already initialized constant HOURS_PER_DAY
    HOURS_PER_DAY                        # => 26

See Also

• Recipe 8.15, “Freezing an Object to Prevent Changes”

8.18 Implementing Class and Singleton Methods

Problem

You want to associate a new method with a class (as opposed to the instances of that class), or with a particular object (as opposed to other instances of the same class).

Solution

To define a class method, prefix the method name with the class name in the method definition. You can do this inside or outside of the class definition.

The Regexp.is_valid? method, defined below, checks whether a string can be compiled into a regular expression.
It doesn’t make sense to call it on an already instantiated Regexp, but it’s clearly related functionality, so it belongs in the Regexp class (assuming you don’t mind adding a method to a core Ruby class).

    class Regexp
      def Regexp.is_valid?(str)
        begin
          compile(str)
          valid = true
        rescue RegexpError
          valid = false
        end
      end
    end

    Regexp.is_valid? "The horror!"       # => true
    Regexp.is_valid? "The)horror!"       # => false

Here’s a Fixnum.random method that generates a random number in a specified range:

    def Fixnum.random(min, max)
      raise ArgumentError, "min > max" if min > max
      return min + rand(max-min+1)
    end

    Fixnum.random(10, 20)                # => 13
    Fixnum.random(-5, 0)                 # => -5
    Fixnum.random(10, 10)                # => 10
    Fixnum.random(20, 10)
    # ArgumentError: min > max

To define a method on one particular object, prefix the method name with the variable name when you define the method:

    company_name = 'Homegrown Software'
    def company_name.legalese
      return "#{self} is a registered trademark of ConglomCo International."
    end

    company_name.legalese
    # => "Homegrown Software is a registered trademark of ConglomCo International."
    'Some Other Company'.legalese
    # NoMethodError: undefined method `legalese' for "Some Other Company":String

Discussion

In Ruby, a singleton method is a method defined on one specific object, and not available to other instances of the same class. This is kind of analogous to the Singleton pattern, in which all access to a certain class goes through a single instance, but the name is more confusing than helpful.

Class methods are actually a special case of singleton methods. The object on which you define a new method is the Class object itself.

Some common types of class methods are listed here, along with illustrative examples taken from Ruby’s standard library:

• Methods that instantiate objects, and methods for retrieving an object that implements the Singleton pattern. Examples: Regexp.compile, Date.parse, Dir.open, and Marshal.load (which can instantiate objects of many different types). Ruby’s standard constructor, the new method, is another example.
• Utility or helper methods that use logic associated with a class, but don’t require an instance of that class to operate. Examples: Regexp.escape, Dir.entries, File.basename.
• Accessors for class-level or Singleton data structures. Examples: Thread.current, Struct.members, Dir.pwd.
• Methods that implicitly operate on an object that implements the Singleton pattern. Examples: Dir.chdir, GC.disable and GC.enable, and all the methods of Process.

When you define a singleton method on an object other than a class, it’s usually to redefine an existing method for a particular object, rather than to define a brand new method. This behavior is common in frameworks, such as GUIs, where each individual object has customized behavior. Singleton method definition is a cheap substitute for subclassing when you only need to customize the behavior of a single object:

    class Button
      # A stub method to be overridden by subclasses or individual Button objects
      def pushed
      end
    end

    button_a = Button.new
    def button_a.pushed
      puts "You pushed me! I'm offended!"
    end

    button_b = Button.new
    def button_b.pushed
      puts "You pushed me; that's okay."
    end

    Button.new.pushed
    # (prints nothing)

    button_a.pushed
    # You pushed me! I'm offended!

    button_b.pushed
    # You pushed me; that's okay.

When you define a method on a particular object, Ruby acts behind the scenes to transform the object into an anonymous subclass of its former class. This new class is the one that actually defines the new method or overrides the methods of its superclass.

8.19 Controlling Access by Making Methods Private

Problem

You’ve refactored your code (or written it for the first time) and ended up with a method that should be marked for internal use only. You want to prevent outside objects from calling such methods.
Solution

Use private as a statement before a method definition, and the method will not be callable from outside the class that defined it. This class defines an initializer, a public method, and a private method:

    class SecretNumber
      def initialize
        @secret = rand(20)
      end

      def hint
        puts "The number is #{"not " if secret <= 10}greater than 10."
      end

      private
      def secret
        @secret
      end
    end

    s = SecretNumber.new
    s.secret
    # NoMethodError: private method `secret' called for
    # #<SecretNumber:0x...>

    s.hint
    # The number is greater than 10.

Unlike in many other programming languages, a private method in Ruby is accessible to subclasses of the class that defines it:

    class LessSecretNumber < SecretNumber
      def hint
        lower = secret-rand(10)-1
        upper = secret+rand(10)+1
        "The number is somewhere between #{lower} and #{upper}."
      end
    end

    ls = LessSecretNumber.new
    ls.hint
    # => "The number is somewhere between -3 and 16."
    ls.hint
    # => "The number is somewhere between -1 and 15."
    ls.hint
    # => "The number is somewhere between -2 and 16."

Discussion

Like many parts of Ruby that look like special language features, Ruby’s privacy keywords are actually methods. In this case, they’re methods of Module. When you call private, protected, or public, the current module (remember that a class is just a special kind of module) changes the rules it applies to newly defined methods from that point on.

Most languages that support method privacy make you put a keyword before every method saying whether it’s public, private, or protected. In Ruby, the special privacy methods act as toggles. When you call private, all methods you define after that point are declared as private, until the module definition ends or you call a different privacy method.
This makes it easy to group methods of the same privacy level, which is good general programming practice:

    class MyClass
      def public_method1
      end

      def public_method2
      end

      protected

      def protected_method1
      end

      private

      def private_method1
      end

      def private_method2
      end
    end

Private and protected methods work a little differently in Ruby than in most other programming languages. Suppose you have a class called Foo and a subclass SubFoo. In languages like Java, SubFoo has no access to any private methods defined by Foo. As seen in the Solution, Ruby provides no way to hide a class’s methods from its subclasses. In this way, Ruby’s private works like Java’s protected.

Suppose further that you have two instances of the Foo class, A and B. In languages like Java, A and B can call each other’s private methods. In Ruby, you need to use a protected method for that. This is the main difference between private and protected methods in Ruby.

In the example below, I try to add another type of hint to the LessSecretNumber class, one that lets you compare the relative magnitudes of two secret numbers. It doesn’t work because one LessSecretNumber can’t call the private methods of another LessSecretNumber:

    class LessSecretNumber
      def compare(other)
        if secret == other.secret
          comparison = "equal to"
        else
          comparison = secret > other.secret ? "greater than" : "less than"
        end
        "This secret number is #{comparison} the secret number you passed in."
      end
    end

    a = LessSecretNumber.new
    b = LessSecretNumber.new
    a.hint
    # => "The number is somewhere between 17 and 22."
    b.hint
    # => "The number is somewhere between 0 and 12."
    a.compare(b)
    # NoMethodError: private method `secret' called for
    # #<LessSecretNumber:0x...>

But if I make the secret method protected instead of private, the compare method starts working.
You can change the privacy of a method after the fact by passing its symbol into one of the privacy methods:

    class SecretNumber
      protected :secret
    end

    a.compare(b)
    # => "This secret number is greater than the secret number you passed in."
    b.compare(a)
    # => "This secret number is less than the secret number you passed in."

Instance variables are always private: accessible by subclasses, but not from other objects, even other objects of the same class. If you want to make an instance variable accessible to the outside, you should define a getter method with the same name as the variable. This method can be either protected or public.

You can trick a class into calling a private method from outside by passing the method’s symbol into Object#send, which skips the privacy check. (Ruby 1.9 and later keep this behavior and add Object#public_send for calls that should respect privacy.) You’d better have a really good reason for doing this.

    s.send(:secret)                      # => 19

See Also

• Recipe 8.2, “Managing Class Data,” has a pretty good reason for using the Object#send trick

CHAPTER 9
Modules and Namespaces

A Ruby module is nothing more than a grouping of objects under a single name. The objects may be constants, methods, classes, or other modules.

Modules have two uses. You can use a module as a convenient way to bundle objects together, or you can incorporate its contents into a class with Ruby’s include statement.

When a module is used as a container for objects, it’s called a namespace. Ruby’s Math module is a good example of a namespace: it provides an overarching structure for constants like Math::PI and methods like Math::log, which would otherwise clutter up the main Kernel namespace. We cover this most basic use of modules in Recipes 9.5 and 9.7.

Modules are also used to package functionality for inclusion in classes. The Enumerable module isn’t supposed to be used on its own: it adds functionality to a class like Array or Hash. We cover the use of modules as packaged functionality for existing classes in Recipes 9.1 and 9.4.
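As a rough sketch of the two uses side by side, here is one module acting purely as a namespace and another acting purely as a mixin. The names Geometry, Describable, and Circle are made up for this illustration; they aren’t part of Ruby or of any recipe in this book.

```ruby
# Hypothetical namespace module: groups a constant and a method under one name.
module Geometry
  TAU = 2 * Math::PI

  # A module-level method, called as Geometry.circumference(radius).
  def self.circumference(radius)
    TAU * radius
  end
end

# Hypothetical mixin module: not meant to be used on its own, only included.
module Describable
  def describe
    "#{self.class.name} with radius #{radius}"
  end
end

class Circle
  include Describable    # Circle instances gain the describe method
  attr_reader :radius

  def initialize(radius)
    @radius = radius
  end
end

Geometry.circumference(1) == Geometry::TAU   # => true
Circle.new(2).describe                       # => "Circle with radius 2"
```

The namespace module is never included anywhere; its contents are reached through the Geometry:: prefix. The mixin module is never called directly; it relies on the including class to supply radius.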
Module is actually the superclass of Class, so every Ruby class is also a module. Throughout this book we talk about using methods of Module from within classes. The same methods will work exactly the same way within modules. The only thing you can’t do with a module is instantiate an object from it:

    Class.superclass                     # => Module
    Math.class                           # => Module
    Math.new
    # NoMethodError: undefined method `new' for Math:Module

9.1 Simulating Multiple Inheritance with Mixins

Problem

You want to create a class that derives from two or more sources, but Ruby doesn’t support multiple inheritance.

Solution

Suppose you created a class called Taggable that lets you associate tags (short strings of informative metadata) with objects. Every class whose objects should be taggable could derive from Taggable.

This would work if you made Taggable the top-level class in your class structure, but that won’t work in every situation. Eventually you might want to do something like make a string taggable. One class can’t subclass both Taggable and String, so you’d have a problem.

Furthermore, it makes little sense to instantiate and use a Taggable object by itself: there is nothing there to tag! Taggability is more of a feature of a class than a full-fledged class of its own. The Taggable functionality only works in conjunction with some other data structure.

This makes it an ideal candidate for implementation as a Ruby module instead of a class. Once it’s in a module, any class can include it and use the methods it defines.

    require 'set' # Deals with a collection of unordered values with no duplicates

    # Include this module to make your class taggable. The name of the
    # setup method is prefixed with "taggable_" to reduce the risk of
    # namespace collision. You must call taggable_setup before you can
    # use any of this module's methods.
    module Taggable
      attr_accessor :tags

      def taggable_setup
        @tags = Set.new
      end

      def add_tag(tag)
        @tags << tag
      end

      def remove_tag(tag)
        @tags.delete(tag)
      end
    end

Here’s a taggable string class: it subclasses String, but it also includes the functionality of Taggable.

    class TaggableString < String
      include Taggable
      def initialize(*args)
        super
        taggable_setup
      end
    end

    s = TaggableString.new('It was the best of times, it was the worst of times.')
    s.add_tag 'dickens'
    s.add_tag 'quotation'
    s.tags                               # => #<Set: {"dickens", "quotation"}>

Discussion

A Ruby class can only have one superclass, but it can include any number of modules. These modules are called mixins. If you write a chunk of code that can add functionality to classes in general, it should go into a mixin module instead of a class.

The only objects that need to be defined as classes are the ones that get instantiated and used on their own (modules can’t be instantiated).

If you come from Java, you might think of a module as being the combination of an interface and its implementation. By including a module, your class implements certain methods, and announces that since it implements those methods it can be treated a certain way.

When a class includes a module with include, all of the module’s methods and constants are made available from within that class. They’re not copied, the way a method is when you alias it. Rather, the class becomes aware of the methods of the module. If a module’s methods are changed later (even during runtime), so are the methods of all the classes that include that module.

Module and class definitions have an almost identical syntax. If you find out after implementing a class that you should have done it as a module, it’s not difficult to translate the class into a module. The main problem areas will be methods defined both by your module and the classes that include it: especially methods like initialize.
Your module can define an initialize method, and it will be called by a class whose constructor includes a super call (see Recipe 9.8 for an example), but sometimes that doesn’t work. For instance, Taggable defines a taggable_setup method that takes no arguments. The String class, the superclass of TaggableString, has a constructor that takes a string argument. TaggableString can call super within its constructor to trigger both String#initialize and a hypothetical Taggable#initialize, but there’s no way a single super call can pass one argument to one method and zero arguments to another.

That’s why Taggable doesn’t define an initialize method.* Instead, it defines a taggable_setup method and (in the module documentation) asks everyone who includes the module to call taggable_setup within their initialize method. Your module can define a setup method named after the module (such as taggable_setup) instead of initialize, but you need to document it, or your users will be very confused.

* An alternative is to define Taggable#initialize to take a variable number of arguments, and then just ignore all the arguments. This only works because Taggable can initialize itself without any outside information.

It’s okay to expect that any class that includes your module will implement some methods you can’t implement yourself. For instance, all of the methods in the Enumerable module are defined in terms of a method called each, but Enumerable never actually defines each. Every class that includes Enumerable must define what each means within that class before it can use the Enumerable methods.
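To make that dependency concrete, here is a small sketch. The Countdown and Forgetful classes are hypothetical, invented only to show what happens when the expected each method is or isn’t supplied.

```ruby
# Hypothetical class: it supplies only each; Enumerable supplies the rest.
class Countdown
  include Enumerable

  def initialize(start)
    @start = start
  end

  # The one method Enumerable expects the including class to define.
  def each
    @start.downto(1) { |n| yield n }
  end
end

# Hypothetical class that includes Enumerable but never defines each.
class Forgetful
  include Enumerable
end

Countdown.new(3).to_a                    # => [3, 2, 1]
Countdown.new(3).select { |n| n.odd? }   # => [3, 1]

begin
  Forgetful.new.to_a                     # Enumerable#to_a tries to call each
rescue NoMethodError => e
  e.class                                # => NoMethodError
end
```

The error from Forgetful mentions a missing each, which gives no hint that Enumerable is the module that wanted it. That is the confusion a clearer default implementation is meant to avoid.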
If you have such undefined methods, it will cut down on confusion if you provide a default implementation that raises a helpful exception:

    module Complaint
      def gripe
        voice('In all my years I have never encountered such behavior...')
      end

      def faint_praise
        voice('I am pleased to notice some improvement, however slight...')
      end

      def voice(complaint_text)
        raise NotImplementedError,
          "#{self.class} included the Complaint module but didn't define voice!"
      end
    end

    class MyComplaint
      include Complaint
    end

    MyComplaint.new.gripe
    # NotImplementedError: MyComplaint included the Complaint module
    # but didn't define voice!

If two modules define methods with the same name, and a single class includes both modules, the class will have only one implementation of that method: the one from the module that was included last. The method of the same name from the other module is shadowed and will not be called. Here are two modules that define the same method:

    module Ayto
      def potato
        'Pohtayto'
      end
    end

    module Ahto
      def potato
        'Pohtahto'
      end
    end

One class can mix in both modules:

    class Potato
      include Ayto
      include Ahto
    end

But there can be only one potato method for a given class or module.*

    Potato.new.potato                    # => "Pohtahto"

This rule sidesteps the fundamental problem of multiple inheritance by letting the programmer explicitly choose which ancestor they would like to inherit a particular method from. Nevertheless, it’s good programming practice to give distinctive names to the methods in your modules. This reduces the risk of namespace collisions when a class mixes in more than one module. Collisions can occur, and the later module’s method will take precedence, even if one or both methods are protected or private.
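The lookup order behind this rule can be inspected with Module#ancestors. The following is a minimal sketch using hypothetical modules (Formal, Casual, Shy) and classes (Host, ShyHost) invented for the purpose; it also shows a private method in the later module winning the lookup.

```ruby
# Hypothetical modules that both define a method named greet.
module Formal
  def greet
    'Good evening.'
  end
end

module Casual
  def greet
    'Hey!'
  end
end

class Host
  include Formal
  include Casual   # included last, so Casual#greet is found first
end

# ancestors lists the lookup order: the class itself, then its modules
# in reverse order of inclusion, then the superclass chain.
Host.ancestors.first(3)                  # => [Host, Casual, Formal]
Host.new.greet                           # => "Hey!"

# A private method in the later module still takes precedence.
module Shy
  private

  def greet
    'psst'
  end
end

class ShyHost
  include Formal
  include Shy
end

begin
  ShyHost.new.greet                      # finds Shy#greet, which is private
rescue NoMethodError => e
  e.class                                # => NoMethodError
end
```

Even though Formal#greet is public and would have worked, Ruby stops at the first greet it finds in the ancestor list and then applies that method’s privacy rules.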
* You could get both methods by aliasing Potato#potato to another method after mixing in Ayto but before mixing in Ahto. There would still only be one Potato#potato method, and it would still be Ahto#potato, but the implementation of Ayto#potato would survive under a different name.

See Also

• If you want a real-life implementation of a Taggable-like mixin, see Recipe 13.18, “Adding Taggability with a Database Mixin”

9.2 Extending Specific Objects with Modules

Credit: Phil Tomson

Problem

You want to add instance methods from a module (or modules) to specific objects. You don’t want to mix the module into the object’s class, because you want certain objects to have special abilities.

Solution

Use the Object#extend method.

For example, let’s say we have a mild-mannered Person class:

    class Person
      attr_reader :name, :age, :occupation

      def initialize(name, age, occupation)
        @name, @age, @occupation = name, age, occupation
      end

      def mild_mannered?
        true
      end
    end

Now let’s create a couple of instances of this class.

    jimmy = Person.new('Jimmy Olsen', 21, 'cub reporter')
    clark = Person.new('Clark Kent', 35, 'reporter')
    jimmy.mild_mannered?                 # => true
    clark.mild_mannered?                 # => true

But it happens that some Person objects are not as mild-mannered as they might appear. Some of them have super powers.

    module SuperPowers
      def fly
        'Flying!'
      end

      def leap(what)
        "Leaping #{what} in a single bound!"
      end

      def mild_mannered?
        false
      end

      def superhero_name
        'Superman'
      end
    end

If we use include to mix the SuperPowers module into the Person class, it will give every person super powers. Some people are bound to misuse such power. Instead, we’ll use extend to give super powers only to certain people:

    clark.extend(SuperPowers)
    clark.superhero_name                 # => "Superman"
    clark.fly                            # => "Flying!"
    clark.mild_mannered?                 # => false
    jimmy.mild_mannered?                 # => true

Discussion

The extend method is used to mix a module’s methods into an object, while include is used to mix a module’s methods into a class.

The astute reader might point out that classes are actually objects in Ruby. Let us see what happens when we use extend in a class definition:

    class Person
      extend SuperPowers
    end

    # which is equivalent to:
    Person.extend(SuperPowers)

What exactly are we extending here? Within the class definition, extend is being called on the Person class itself: we could have also written self.extend(SuperPowers). We’re extending the Person class with the methods defined in SuperPowers. This means that the methods defined in the SuperPowers module have now become class methods of Person:

    Person.superhero_name                # => "Superman"
    Person.fly                           # => "Flying!"

This is not what we intended in this case. However, sometimes you do want to mix methods into a class, and calling extend on the class is an easy and powerful way to do it.

See Also

• Recipe 9.3, “Mixing in Class Methods,” shows how to mix in class methods with include

9.3 Mixing in Class Methods

Credit: Phil Tomson

Problem

You want to mix class methods into a class, instead of mixing in instance methods.

Solution

The simplest way to accomplish this is to call extend on the class object, as seen in the Discussion of Recipe 9.2. Just as you can use extend to add singleton methods to an object, you can use it to add class methods to a class. But that’s not always the best option. Your users may not know that your module provides or even requires some class methods, so they might not extend their class when they should. How can you make an include statement mix in class methods as well?
To begin, within your module, define a submodule called ClassMethods,* which contains the methods you want to mix into the class:

    module MyLib
      module ClassMethods
        def class_method
          puts "This method was first defined in MyLib::ClassMethods"
        end
      end
    end

* The name ClassMethods has no special meaning within Ruby: technically, you can call your submodule whatever you want. But the Ruby community has standardized on ClassMethods as the name of this submodule, and it’s used in many Ruby libraries, so you should use it too.

To make this code work, we must also define the included callback method within the MyLib module. This method is called every time the module is included in a class, and it’s passed the class object in which our module is being included. Within the callback method, we extend that class object with our ClassMethods module, making all of its instance methods into class methods. Continuing the example:

    module MyLib
      def self.included(receiver)
        puts "MyLib is being included in #{receiver}!"
        receiver.extend(ClassMethods)
      end
    end

Now we can include our MyLib module in a class, and get the contents of ClassMethods mixed in as genuine class methods:

    class MyClass
      include MyLib
    end
    # MyLib is being included in MyClass!

    MyClass.class_method
    # This method was first defined in MyLib::ClassMethods

Discussion

Module#included is a callback method that is automatically called during the inclusion of a module into a class. The default included implementation is an empty method. In the example, MyLib overrides it to extend the class that’s including the MyLib module with the contents of the MyLib::ClassMethods submodule.

The Object#extend method takes a Module object as a parameter. It mixes all the methods defined in the module into the receiving object.
Since classes are themselves objects, and the singleton methods of a Class object are just its class methods, calling extend on a class object fills it up with new class methods.

See Also

• Recipe 7.11, “Coupling Systems Loosely with Callbacks,” covers callbacks in general and shows how to write your own
• Recipe 10.6, “Listening for Changes to a Class,” covers Ruby’s other class and module callback methods

9.4 Implementing Enumerable: Write One Method, Get 22 Free

Problem

You want to give a class all the useful iterator and iteration-related features of Ruby’s arrays (sort, detect, inject, and so on), but your class can’t be a subclass of Array. You don’t want to define all those methods yourself.

Solution

Implement an each method, then include the Enumerable module. It defines 22 of the most useful iteration methods in terms of the each implementation you provide.

Here’s a class that keeps multiple arrays under the covers. By defining each, it can expose a large interface that lets the user treat it like a single array:

    class MultiArray
      include Enumerable

      def initialize(*arrays)
        @arrays = arrays
      end

      def each
        @arrays.each { |a| a.each { |x| yield x } }
      end
    end

    ma = MultiArray.new([1, 2], [3], [4])
    ma.collect                                # => [1, 2, 3, 4]
    ma.detect { |x| x > 3 }                   # => 4
    ma.map { |x| x ** 2 }                     # => [1, 4, 9, 16]
    ma.each_with_index { |x, i| puts "Element #{i} is #{x}" }
    # Element 0 is 1
    # Element 1 is 2
    # Element 2 is 3
    # Element 3 is 4

Discussion

The Enumerable module is the most common mixin module. It lets you add a lot of behavior to your class for a little investment. Since Ruby relies so heavily on iterator methods, and almost every data structure can be iterated over in some way, it’s no wonder that so many of the classes in Ruby’s standard library include Enumerable: Dir, Hash, Range, and String, just to name a few.

Here’s the complete list of methods you can get by including Enumerable.
Many of them are described elsewhere in this book, especially in Chapter 4. Perhaps the most useful are collect, inject, find_all, and sort_by.

    Enumerable.instance_methods.sort
    # => ["all?", "any?", "collect", "detect", "each_with_index", "entries",
    #     "find", "find_all", "grep", "include?", "inject", "map", "max",
    #     "member?", "min", "partition", "reject", "select", "sort", "sort_by",
    #     "to_a", "zip"]

Although you can get all these methods simply by implementing an each method, some of the methods won’t work unless your each implementation returns objects that can be compared to each other. For example, a data structure that contains both numbers and strings can’t be sorted, since it makes no sense to compare a number to a string:

    ma.sort                                   # => [1, 2, 3, 4]
    mixed_type_ma = MultiArray.new([1, 2, 3], ["a", "b", "c"])
    mixed_type_ma.sort
    # ArgumentError: comparison of Fixnum with String failed

The methods subject to this restriction are max, min, sort, and sort_by. Since you probably don’t have complete control over the types of the data stored in your data structure, the best strategy is probably to just let a method fail if the data is incompatible. This is what Array does:

    [1, 2, 3, "a", "b", "c"].sort
    # ArgumentError: comparison of Fixnum with String failed

One more example: in this one, I’ll make Module itself include Enumerable. My each implementation will iterate over the instance methods defined by a class or module. This makes it easy to find methods of a class that meet certain criteria.

    class Module
      include Enumerable
      def each
        instance_methods.each { |x| yield x }
      end
    end

    # Find all instance methods of String that modify the string in place.
    String.find_all { |method_name| method_name[-1] == ?! }
    # => ["sub!", "upcase!", "delete!", "lstrip!", "succ!", "gsub!",
    #     "squeeze!", "downcase!", "rstrip!", "slice!", "chop!", "capitalize!",
    #     "tr!", "chomp!", "next!", "swapcase!", "reverse!", "tr_s!", "strip!"]

    # Find all instance methods of Fixnum that take 2 arguments.
    sample = 0
    sample.class.find_all { |method_name| sample.method(method_name).arity == 2 }
    # => ["instance_variable_set", "between?"]

See Also

• Many of the recipes in Chapter 4 actually cover methods of Enumerable; see especially Recipe 4.12, “Building Up a Hash Using Injection”
• Recipe 9.1, “Simulating Multiple Inheritance with Mixins”

9.5 Avoiding Naming Collisions with Namespaces

Problem

You want to define a class or module whose name conflicts with an existing class or module, or you want to prevent someone else from coming along later and defining a class whose name conflicts with yours.

Solution

A Ruby module can contain classes and other modules, which means you can use it as a namespace.

Here’s some code from a physics library that defines a class called String within the StringTheory module. The real name of this class is its fully-qualified name: StringTheory::String. It’s a totally different class from Ruby’s built-in String class.

    module StringTheory
      class String
        def initialize(length=10**-33)
          @length = length
        end
      end
    end

    String.new                                # => ""
    StringTheory::String.new                  # => #<StringTheory::String ...>

Discussion

If you’ve read Recipe 8.17, you’ve already seen namespaces in action. The constants defined in a module are qualified with the module’s name. This lets Math::PI have a different value from Greek::PI.

You can qualify the name of any Ruby object this way: a variable, a class, or even another module. Namespaces let you organize your libraries, and make it possible for them to coexist alongside others.

Ruby’s standard library uses namespaces heavily as an organizing principle. An excellent example is REXML, the standard XML library.
It defines a REXML namespace that includes lots of XML-related classes like REXML::Comment and REXML::Instruction. Naming those classes Comment and Instruction would be a disaster: they’d get overwritten by other libraries’ Comment and Instruction classes. Since nothing about the generic-sounding names relates them to the REXML library, you might look at someone else’s code for a long time before realizing that the Comment objects have to do with XML.

Namespaces can be nested: see for instance rexml’s REXML::Parsers module, which contains classes like REXML::Parsers::StreamParser. Namespaces group similar classes in one place so you can find what you’re looking for; nested namespaces do the same for namespaces.

In Ruby, you should name your top-level module after your software project (SAX), or after the task it performs (XML::Parser). If you’re writing Yet Another implementation of something that already exists, you should make sure your namespace includes your project name (XML::Parser::SAX). This is in contrast to Java’s namespaces: they exist in its package structure, which follows a naming convention that includes a domain name, like org.xml.sax.

All code within a module is implicitly qualified with the name of the module. This can cause problems for a module like StringTheory, if it needs to use Ruby’s built-in String class for something. This should be fixed in Ruby 2.0, but you can also fix it by setting the built-in String class to a variable before defining your StringTheory::String class.
Here’s a version of the StringTheory module that can use Ruby’s built-in String class:

    module StringTheory2
      RubyString = String
      class String
        def initialize(length=10**-33)
          @length = length
        end
      end

      RubyString.new("This is a built-in string, not a StringTheory2::String")
    end
    # => "This is a built-in string, not a StringTheory2::String"

See Also

• Recipe 8.17, “Declaring Constants”
• Recipe 9.7, “Including Namespaces”

9.6 Automatically Loading Libraries as Needed

Problem

You’ve written a big library with multiple components. You’d like to split it up so that users don’t have to load the entire library into memory just to use part of it. But you don’t want to make your users explicitly require each part of the library they plan to use.

Solution

Split the big library into multiple files, and set up autoloading for the individual files by calling Kernel#autoload. The individual files will be loaded as they’re referenced.

Suppose you have a library, functions.rb, that provides two very large modules:

    # functions.rb
    module Decidable
      # ... Many, many methods go here.
    end

    module Semidecidable
      # ... Many, many methods go here.
    end

You can provide the same interface, but possibly save your users some memory, by splitting functions.rb into three files. The functions.rb file itself becomes a stub full of autoload calls:

    # functions.rb
    autoload :Decidable, "decidable.rb"
    autoload :Semidecidable, "semidecidable.rb"

The modules themselves go into the files mentioned in the new functions.rb:

    # decidable.rb
    module Decidable
      # ... Many, many methods go here.
    end

    # semidecidable.rb
    module Semidecidable
      # ... Many, many methods go here.
    end

The following code will work if all the modules are in functions.rb, but it will also work if functions.rb only contains calls to autoload:

    require 'functions'
    Decidable.class                           # => Module
    # More use of the Decidable module follows...
When Decidable and Semidecidable have been split into autoloaded modules, that code only loads the Decidable module. Memory is saved that would otherwise be used to contain the unused Semidecidable module.

Discussion

Refactoring a library to consist of autoloadable components takes a little extra planning, but it’s often worth it to improve performance for the people who use your library.

Each call to Kernel#autoload binds a symbol to the path of the Ruby file that’s supposed to define that symbol. If the symbol is referenced, that file is loaded exactly as though it had been passed as an argument into require. If the symbol is never referenced, the user saves some memory.

Since you can use autoload wherever you might use require, you can autoload built-in libraries when the user triggers some code that needs them. For instance, here’s some code that loads Ruby’s built-in set library as needed:

    autoload :Set, "set.rb"

    def random_set(size)
      max = size * 10
      set = Set.new
      set << rand(max) until set.size == size
      return set
    end

    # More code goes here...

If random_set is never called, the set library will never be loaded, and memory will be saved. As soon as random_set gets called, the set library is autoloaded, and the code works even though we never explicitly require 'set':

    random_set(10)
    # => #<Set: {...}>

    require 'set'                             # => false

9.7 Including Namespaces

Problem

You want to use the objects within a module without constantly qualifying the object names with the name of their module.

Solution

Use include to copy a module’s objects into the current namespace. You can then use them from the current namespace, without qualifying their names.

Instead of this:

    require 'rexml/document'
    REXML::Document.new(xml)

You might write this:

    require 'rexml/document'
    include REXML

    Document.new(xml)

Discussion

This is the exact same include statement you use to incorporate a mixin module into a class you’re writing.
It does the same thing here as when it includes a mixin: it copies the contents of a module into the current namespace.

Here, though, the point isn’t to add new functionality to a class or module: it’s to save you from having to do so much typing. This technique is especially useful with large library modules like Curses and the Rails libraries.

This use of include comes with the same caveats as any other: if you already have variables with the same names as the objects being included, the included objects will be copied in over them and clobber them.

You can, of course, import a namespace that’s nested within a namespace of its own. Instead of this:

    require 'rexml/parsers/pullparser'
    REXML::Parsers::PullParser.new("Some XML")

You might write this:

    require 'rexml/parsers/pullparser'
    include REXML::Parsers
    PullParser.new("Some XML")

See Also

• Recipe 11.3, “Extracting Data While Parsing a Document”

9.8 Initializing Instance Variables Defined by a Module

Credit: Phil Tomson

Problem

You have a mixin module that defines some instance variables. Given a class that mixes in the module, you want to initialize the instance variables whenever an instance of the class is created.

Solution

Define an initialize method in the module, and call super in your class’s constructor. Here’s a Timeable module that tracks when objects are created and how old they are:

    module Timeable
      attr_reader :time_created

      def initialize
        @time_created = Time.now
      end

      def age    # in seconds
        Time.now - @time_created
      end
    end

Timeable has an instance variable time_created, and an initialize method that assigns Time.now (the current time) to the instance variable.
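A short sketch of why the call to super matters, using a hypothetical Stamped module that mirrors the shape of Timeable (the Plain and Forgetful classes are likewise invented for illustration). A class with no initialize of its own picks up the module's constructor automatically, but a class that defines initialize without calling super silently skips it:

```ruby
# Hypothetical stand-in with the same shape as the Timeable module.
module Stamped
  attr_reader :time_created

  def initialize
    @time_created = Time.now
  end
end

# No initialize of its own: Stamped#initialize is found through the
# ancestor chain and runs when the object is created.
class Plain
  include Stamped
end

# Defines initialize but never calls super, so the module's
# initialize does not run and @time_created is never set.
class Forgetful
  include Stamped

  def initialize(name)
    @name = name
  end
end

plain_stamp     = Plain.new.time_created              # a Time object
forgetful_stamp = Forgetful.new("Fred").time_created  # nil
```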
Now let’s mix Timeable into another class that also defines an initialize method:

    class Character
      include Timeable
      attr_reader :name
      def initialize( name )
        @name = name
        super( )    # calls Timeable's initialize
      end
    end

    c = Character.new "Fred"
    c.time_created
    # => Mon Mar 27 18:34:31 EST 2006

Discussion

You can define and access instance variables within a module’s instance methods, but you can’t actually instantiate a module. A module’s instance variables only exist within objects of a class that includes the module. However, classes don’t usually need to know about the instance variables defined by the modules they include. That sort of information should be initialized and maintained by the module itself.

The Character#initialize method overrides the Timeable#initialize method, but you can use super to call the Timeable constructor from within the Character constructor. When a module is included in a class, that module becomes an ancestor of the class. We can test this in the context of the example above by calling Module#ancestors on the Character class:

    Character.ancestors                       # => [Character, Timeable, Object, Kernel]

When you call super from within a method (such as initialize), Ruby walks up the ancestor list to the next ancestor that defines a method with the same name, and calls that method. If that method calls super in turn, the search continues from there.

See Also

• Recipe 8.13, “Calling a Superclass’s Method”
• Sometimes an initialize method won’t work; see Recipe 9.3, “Mixing in Class Methods,” for when it won’t work, and how to manage without one
• Recipe 9.9, “Automatically Initializing Mixed-In Modules,” covers an even more complex case, when you want a module to perform some initialization, without making the class that includes it do anything at all beyond the initial include

9.9 Automatically Initializing Mixed-In Modules

Credit: Phil Tomson

Problem

You’ve written a module that gets mixed into classes. Your module has some initialization code that needs to run whenever the mixed-into class is initialized.
You do not want users of your module to have to call super in their initialize methods.

Solution

First, we need a way for classes to keep track of which modules they’ve included. We also need to redefine Class#new to call a module-level initialize method for each included module. Fortunately, Ruby’s flexibility lets us make changes to the built-in Class class (though this should never be done lightly):

    class Class
      def included_modules
        @included_modules ||= []
      end

      alias_method :old_new, :new
      def new(*args, &block)
        obj = old_new(*args, &block)
        self.included_modules.each do |mod|
          mod.initialize if mod.respond_to?(:initialize)
        end
        obj
      end
    end

Now every class has a list of included modules, accessible from the included_modules class method. We’ve also redefined the Class#new method so that it iterates through all the modules in included_modules, and calls the module-level initialize method of each.

All that’s missing is a way to add included modules to included_modules. We’ll put this code into an Initializable module. A module that wants to be initializable can mix this module into itself and define an initialize method:

    module Initializable

      def self.included(mod)
        mod.extend ClassMethods
      end

      module ClassMethods
        def included(mod)
          if mod.class != Module    # in case Initializable is mixed into a class
            puts "Adding #{self} to #{mod}'s included_modules" if $DEBUG
            mod.included_modules << self
          end
        end
      end
    end

The included callback method is called whenever this module is included in another module. We’re using the pattern shown in Recipe 9.3 to add an included callback method into the receiving module. If we didn’t do this, you’d have to use that pattern yourself for every module you wanted to be Initializable.

Discussion

That’s a lot of code, but here’s the payoff.
Let’s define a couple of modules which include Initializable and define initialize module methods:

    module A
      include Initializable
      def self.initialize
        puts "A's initialized."
      end
    end

    module B
      include Initializable
      def self.initialize
        puts "B's initialized."
      end
    end

We can now define a class that mixes in both modules. Instantiating the class instantiates the modules, with not a single super call in sight!

    class BothAAndB
      include A
      include B
    end

    both = BothAAndB.new
    # A's initialized.
    # B's initialized.

The goal of this recipe is very similar to Recipe 9.8. In that recipe, you call super in a class’s initialize method to call a mixed-in module’s initialize method. That recipe is a lot simpler than this one and doesn’t require any changes to built-in classes, so it’s often preferable to this one.

Consider a case like the BothAAndB class above. Using the techniques from Recipe 9.8, you’d need to make sure that both A and B had calls to super in their initialize methods, so that each module would get initialized. This solution moves all of that work into the Initializable module and the built-in Class class. The other drawback of the previous technique is that the user of your module needs to know to call super somewhere in their initialize method. Here, everything happens automatically.

This technique is not without its pitfalls. Anytime you redefine critical built-in methods like Class#new, you need to be careful: someone else may have already redefined it elsewhere in your program. Also, you won’t be able to define your own included method callback in a module which includes Initializable: doing so will override the callback defined by Initializable itself.

See Also

• Recipe 9.3, “Mixing in Class Methods”
• Recipe 9.8, “Initializing Instance Variables Defined by a Module”

CHAPTER 10
Reflection and Metaprogramming

In a dynamic language like Ruby, few pieces are static.
Classes can grow new methods and lose the ones they had before. Methods can be defined manually, or automatically with well-written code.

Probably the most interesting aspect of the Ruby programming philosophy is its use of reflection and metaprogramming to save the programmer from having to write repetitive code. In this chapter, we will teach you the ways and the joys of these techniques.

Reflection lets you treat classes and methods as objects. With reflection you can see which methods you can call on an object (Recipes 10.2 and 10.3). You can grab one of its methods as an object (Recipe 10.4), and call it or pass it in to another method as a code block. You can get references to the class an object implements and the modules it includes, and print out its inheritance structure (Recipe 10.1). Reflection is especially useful when you’re interactively examining an unfamiliar object or class structure.

Metaprogramming is to programming as programming is to doing a task by hand. If you need to sort a file of a hundred lines, you don’t open it up in a text editor and start shuffling the lines: you write a program to do the sort. By the same token, if you need to give a Ruby class a hundred similar methods, you shouldn’t just start writing the methods one at a time. You should write Ruby code that defines the methods for you (Recipe 10.10). Or you should make your class capable of intercepting calls to those methods: this way, you can implement the methods without ever defining them at all (Recipe 10.8).

Methods you’ve seen already, like attr_reader, use metaprogramming to define custom methods according to your specifications. Recipe 8.2 created a few more of these “decorator” methods; Recipe 10.16 in this chapter shows a more complex example of the same principle.

You can metaprogram in Ruby either by writing normal Ruby code that uses a lot of reflection, or by generating a string that contains Ruby code, and evaluating the string.
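The two styles can be put side by side in a small sketch. The Temperature class, its SCALES table, and the generated method names are all hypothetical, chosen only for illustration; the instance methods are generated with define_method (ordinary Ruby plus reflection), and the class methods are generated by building a string of source code and passing it to class_eval.

```ruby
class Temperature
  attr_reader :celsius

  def initialize(celsius)
    @celsius = celsius
  end

  # Hypothetical conversion table: scale name => [factor, offset].
  SCALES = { :fahrenheit => [1.8, 32.0], :kelvin => [1.0, 273.15] }

  # Technique 1: normal Ruby code and reflection. Each block becomes the
  # body of an instance method such as to_fahrenheit.
  SCALES.each do |scale, (factor, offset)|
    define_method("to_#{scale}") { @celsius * factor + offset }
  end

  # Technique 2: generate Ruby source as a string and evaluate it. Each
  # string defines a class method such as from_fahrenheit.
  SCALES.each do |scale, (factor, offset)|
    class_eval <<-RUBY
      def self.from_#{scale}(degrees)
        new((degrees - #{offset}) / #{factor})
      end
    RUBY
  end
end

boiling  = Temperature.new(100).to_fahrenheit
freezing = Temperature.from_kelvin(273.15).celsius
```

In the string version the numbers are interpolated into source text before Ruby ever parses it, which is part of what makes that style harder to reason about.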
Writing normal Ruby code with reflection is generally safer, but sometimes the reflection just gets to be too much and you need to evaluate a string. We provide a demonstration recipe for each technique (Recipes 10.10 and 10.11).

10.1 Finding an Object’s Class and Superclass

Problem

Given an object, you want an object corresponding to its class, or to the parent of its class.

Solution

Use the Object#class method to get the class of an object as a Class object. Use Class#superclass to get the parent Class of a Class object:

    'a string'.class                          # => String
    'a string'.class.name                     # => "String"
    'a string'.class.superclass               # => Object
    String.superclass                         # => Object
    String.class                              # => Class
    String.class.superclass                   # => Module
    'a string'.class.new                      # => ""

Discussion

Class objects in Ruby are first-class objects that can be assigned to variables, passed as arguments to methods, and modified dynamically. Many of the recipes in this chapter and Chapter 8 discuss things you can do with a Class object once you have it.

The superclass of the Object class is nil. This makes it easy to iterate up an inheritance hierarchy:

    class Class
      def hierarchy
        (superclass ? superclass.hierarchy : []) << self
      end
    end

    Array.hierarchy                           # => [Object, Array]

    class MyArray < Array
    end
    MyArray.hierarchy                         # => [Object, Array, MyArray]

While Ruby does not support multiple inheritance, the language allows mixin Modules that simulate it (see Recipe 9.1). The Modules included by a given Class (or another Module) are accessible from the Module#ancestors method.

A class can have only one superclass, but it may have any number of ancestors. The list returned by Module#ancestors contains the entire inheritance hierarchy (including the class itself), any modules the class includes, and the ever-present Kernel module, whose methods are accessible from anywhere because Object itself mixes it in.
    String.superclass                         # => Object
    String.ancestors                          # => [String, Enumerable, Comparable, Object, Kernel]
    Array.ancestors                           # => [Array, Enumerable, Object, Kernel]
    MyArray.ancestors                         # => [MyArray, Array, Enumerable, Object, Kernel]

    Object.ancestors                          # => [Object, Kernel]

    class MyClass
    end
    MyClass.ancestors                         # => [MyClass, Object, Kernel]

See Also

• Most of Chapter 8
• Recipe 9.1, “Simulating Multiple Inheritance with Mixins”

10.2 Listing an Object’s Methods

Problem

Given an unfamiliar object, you want to see what methods are available to call.

Solution

All Ruby objects implement the Object#methods method. It returns an array containing the names of the object’s public instance methods:

    Object.methods
    # => ["name", "private_class_method", "object_id", "new",
    #     "singleton_methods", "method_defined?", "equal?", ... ]

To get a list of the singleton methods of some object (usually, but not always, a class), use Object#singleton_methods:

    Object.singleton_methods                  # => []
    Fixnum.singleton_methods                  # => ["induced_from"]

    class MyClass
      def MyClass.my_singleton_method
      end

      def my_instance_method
      end
    end
    MyClass.singleton_methods                 # => ["my_singleton_method"]

To list the instance methods of a class, call instance_methods on the class. This lets you list the instance methods of a class without instantiating the class:

    ''.methods == String.instance_methods     # => true

The output of these methods is most useful when sorted:

    Object.methods.sort
    # => ["<", "<=", "<=>", "==", "===", "=~", ">", ">=",
    #     "__id__", "__send__", "allocate", "ancestors", ... ]

Ruby also defines some elementary predicates along the same lines. To see whether a class defines a certain instance method, call method_defined? on the class or respond_to? on an instance of the class. To see whether a class defines a certain class method, call respond_to? on the class:

    MyClass.method_defined? :my_instance_method     # => true
    MyClass.new.respond_to? :my_instance_method     # => true
    MyClass.respond_to? :my_instance_method         # => false
    MyClass.respond_to? :my_singleton_method        # => true

Discussion

It often happens that while you’re in an interactive Ruby session, you need to look up which methods an object supports, or what a particular method is called. Looking directly at the object is faster than looking its class up in a book. If you’re using a library like Rails or Facets, or your code has been adding methods to the built-in classes, it’s also more reliable.

Noninteractive code can also benefit from knowing whether a given object implements a certain method. You can use this to enforce an interface, allowing any object to be passed into a method so long as the argument implements certain methods (see Recipe 10.16).

If you find yourself using respond_to? a lot in an interactive Ruby session, you’re a good customer for irb’s autocomplete feature. Put the following line in your .irbrc file or equivalent:

    require 'irb/completion'
    # Depending on your system, you may also have to add the following line:
    IRB.conf[:use_readline] = true

Then you can type (for instance) “[1,2,3].”, hit the Tab key, and see a list of all the methods you can call on the array [1, 2, 3].

methods, instance_methods, and singleton_methods will only return public methods, and method_defined? will only return true if you give it the name of a public method. Ruby provides analogous methods for discovering protected and private methods, though these are less useful. All the relevant methods are presented in Table 10-1.

Table 10-1. Discovering protected and private methods

    Goal | Public | Protected | Private
    List the methods of an object | methods or public_methods | protected_methods | private_methods
    List the instance methods defined by a class | instance_methods or public_instance_methods | protected_instance_methods | private_instance_methods
    List the singleton methods defined by a class | singleton_methods | N/A | N/A
    Does this class define such-and-such an instance method? | method_defined? or public_method_defined? | protected_method_defined? | private_method_defined?
    Will this object respond to such-and-such an instance method? | respond_to? | N/A | N/A

Just because you can see the names of protected or private methods in a list doesn’t mean you can call the methods, or that respond_to? will find them:

    String.private_instance_methods.sort
    # => ["Array", "Float", "Integer", "String", "`", "abort", "at_exit",
    #     "autoload", "autoload?", "binding", "block_given?", "callcc", ... ]
    String.new.respond_to? :autoload?         # => false

    String.new.autoload?
    # NoMethodError: private method `autoload?' called for "":String

See Also

• To strip away irrelevant methods, see Recipe 10.3, “Listing Methods Unique to an Object”
• Recipe 10.4, “Getting a Reference to a Method,” shows how to assign a Method object to a variable, given its name; among other things, this lets you find out how many arguments a method takes
• See Recipe 10.6, “Listening for Changes to a Class,” to set up a hook to be called whenever a new method or singleton method is defined for a class
• Recipe 10.16, “Enforcing Software Contracts”

10.3 Listing Methods Unique to an Object

Problem

When you list the methods available to an object, the list is cluttered with extraneous methods defined in the object’s superclasses and mixed-in modules. You want to see a list of only the methods defined by that object’s direct class.

Solution

Subtract the instance methods defined by the object’s superclass. You’ll be left with only the methods defined by the object’s direct class (plus any methods defined on the object after its creation). The my_methods_only method defined below gives this capability to every Ruby object:

    class Object
      def my_methods_only
        my_super = self.class.superclass
        return my_super ? methods - my_super.instance_methods : methods
      end
    end

    s = ''
    s.methods.size                                 # => 143
    Object.instance_methods.size                   # => 41
    s.my_methods_only.size                         # => 102
    (s.methods - Object.instance_methods).size     # => 102

    def s.singleton_method( )
    end
    s.methods.size                                 # => 144
    s.my_methods_only.size                         # => 103

    class Object
      def new_object_method
      end
    end
    s.methods.size                                 # => 145
    s.my_methods_only.size                         # => 103

    class MyString < String
      def my_string_method
      end
    end
    MyString.new.my_methods_only                   # => ["my_string_method"]

Discussion

The my_methods_only technique removes methods defined in the superclass, the parent classes of the superclass, and in any mixin modules included by those classes. For instance, it removes the 40 methods defined by the Object class when it mixed in the Kernel module. It will not remove methods defined by mixin modules included by the class itself.

Usually these methods aren’t clutter, but there can be a lot of them (for instance, Enumerable defines 22 methods). To remove them, you can start out with my_methods_only, then iterate over the ancestors of the class in question and subtract out all the methods defined in modules:

    class Object
      def my_methods_only_no_mixins
        self.class.ancestors.inject(methods) do |mlist, ancestor|
          mlist = mlist - ancestor.instance_methods unless ancestor.is_a? Class
          mlist
        end
      end
    end

    [].methods.size                                # => 121
    [].my_methods_only.size                        # => 78
    [].my_methods_only_no_mixins.size              # => 57

See Also

• Recipe 10.1, “Finding an Object’s Class and Superclass,” explores ancestors in more detail

10.4 Getting a Reference to a Method

Problem

You want to turn the name of a method into a reference to the method itself.

Solution

Use the eponymous Object#method method:

    s = 'A string'
    length_method = s.method(:length)         # => #<Method: String#length>
    length_method.arity                       # => 0
    length_method.call                        # => 8

Discussion

The Object#methods introspection method returns an array of strings, each containing the name of one of the methods available to that object. You can pass any of these names into an object’s method method and get a Method object corresponding to that method of that object.

A Method object is bound to the particular object whose method method you called. Invoke the method’s Method#call method, and it’s just like calling the object’s method directly:

    1.succ                                    # => 2
    1.method(:succ).call                      # => 2

The Method#arity method indicates how many arguments the method takes. Arguments, including block arguments, are passed to call just as they would be to the original method:

    5.method('+').call(10)                    # => 15

    [1,2,3].method(:each).call { |x| puts x }
    # 1
    # 2
    # 3

A Method object can be stored in a variable and passed as an argument to other methods. This is useful for passing preexisting methods into callbacks and listeners:

    class EventSpawner

      def initialize
        @listeners = []
        @state = 0
      end

      def subscribe(&listener)
        @listeners << listener
      end

      def change_state(new_state)
        @listeners.each { |l| l.call(@state, new_state) }
        @state = new_state
      end
    end

    class EventListener
      def hear(old_state, new_state)
        puts "Method triggered: state changed from #{old_state} " +
          "to #{new_state}."
      end
    end

    spawner = EventSpawner.new
    spawner.subscribe do |old_state, new_state|
      puts "Block triggered: state changed from #{old_state} to #{new_state}."
    end

    spawner.subscribe &EventListener.new.method(:hear)
    spawner.change_state(4)
    # Block triggered: state changed from 0 to 4.
    # Method triggered: state changed from 0 to 4.

A Method can also be used as a block:

    s = "sample string"
    replacements = { "a" => "i", "tring" => "ubstitution" }

    replacements.collect(&s.method(:gsub))
    # => ["simple string", "sample substitution"]

You can’t obtain a reference to a method that’s not bound to a specific object, because the behavior of call would be undefined. You can get a reference to a class method by calling method on the class. When you do this, the bound object is the class itself: an instance of the Class class. Here’s an example showing how to obtain references to an instance and a class method of the same class:

    class Welcomer
      def Welcomer.a_class_method
        return "Greetings from the Welcomer class."
      end

      def an_instance_method
        return "Salutations from a Welcomer object."
      end
    end

    Welcomer.method("an_instance_method")
    # NameError: undefined method `an_instance_method' for class `Class'
    Welcomer.new.method("an_instance_method").call
    # => "Salutations from a Welcomer object."
    Welcomer.method("a_class_method").call
    # => "Greetings from the Welcomer class."

See Also

• Recipe 7.11, “Coupling Systems Loosely with Callbacks,” contains a more complex listener example

10.5 Fixing Bugs in Someone Else’s Class

Problem

You’re using a class that’s got a bug in one of its methods. You know where the bug is and how to fix it, but you can’t or don’t want to change the source file itself.

Solution

Extend the class from within your program and overwrite the buggy method with an implementation that fixes the bug. Create an alias for the buggy version of the method, so you can still access it if necessary.
Suppose you’re trying to use the buggy method in the Multiplier class defined below:

    class Multiplier
      def double_your_pleasure(pleasure)
        return pleasure * 3 # FIXME: Actually triples your pleasure.
      end
    end

    m = Multiplier.new
    m.double_your_pleasure(6)             # => 18

Reopen the class, alias the buggy method to another name, then redefine it with a correct implementation:

    class Multiplier
      alias :double_your_pleasure_BUGGY :double_your_pleasure
      def double_your_pleasure(pleasure)
        return pleasure * 2
      end
    end

    m.double_your_pleasure(6)             # => 12
    m.double_your_pleasure_BUGGY(6)       # => 18

Discussion

In many programming languages a class, function, or method can’t be modified after its initial definition. In other languages, this behavior is possible but not encouraged. For Ruby programmers, the ability to reprogram classes on the fly is just another technique for the toolbox, to be used when necessary. It’s most commonly used to add new code to a class, but it can also be used to deploy a drop-in replacement for a buggy or slow implementation of a method.

Since Ruby is (at least right now) a purely interpreted language, you should be able to find the source code of any Ruby class used by your program. If a method in one of those classes has a bug, you should be able to copy and paste the original Ruby implementation into your code and fix the bug in the new copy.* This is not an elegant technique, but it’s often better than distributing a slightly modified version of the entire class or library (that is, copying and pasting a whole file).

When you fix the buggy behavior, you should also send your fix to the maintainer of the software that contains the bug. The sooner you can get the fix out of your code, the better. If the software package is abandoned, you should at least post the fix online so others can find it.
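One way to make it easier to get the fix back out of your code is to apply the patch only while the bug is still present. Here’s a minimal sketch of that idea. It redefines the same Multiplier class from the Solution so that it runs on its own; the probe value (1) and the guard condition are assumptions chosen for this illustration, not part of any library.

```ruby
# Stand-in for the third-party class with the bug.
class Multiplier
  def double_your_pleasure(pleasure)
    pleasure * 3 # the bug: triples instead of doubling
  end
end

# Probe the behavior first. Once the maintainer ships a fixed version,
# this condition becomes false and the patch silently stops applying.
if Multiplier.new.double_your_pleasure(1) != 2
  class Multiplier
    # Keep the original implementation reachable under another name.
    alias_method :double_your_pleasure_BUGGY, :double_your_pleasure

    def double_your_pleasure(pleasure)
      pleasure * 2
    end
  end
end

m = Multiplier.new
m.double_your_pleasure(6)         # => 12
m.double_your_pleasure_BUGGY(6)   # => 18
```

The guard costs one extra method call at load time. In exchange, a stale patch can’t overwrite a correct upstream implementation.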
If a method isn’t buggy, but simply doesn’t do what you’d like it to do, add a new method to the class (or create a subclass) instead of redefining the old one. Methods you don’t know about may use the behavior of the method as it is. Of course, there could be methods that rely on the buggy behavior of a buggy method, but that’s less likely.

See Also

• Throughout this book we use techniques like this to work around bugs and performance problems in the Ruby standard library (although most of the bugs have been fixed in Ruby 1.9); see, for instance, Recipe 2.7, “Taking Logarithms,” Recipe 2.16, “Generating Prime Numbers,” and Recipe 6.18, “Deleting a File”
• Recipe 10.14, “Aliasing Methods”

* Bugs in Ruby C extensions are much more difficult to patch. You might be able to write equivalent Ruby code, but there’s probably a reason why the original code was written in C. Since C doesn’t share Ruby’s attitude towards redefining functions on the fly, you’ll need to fix the bug in the original C code and recompile the extension.

10.6 Listening for Changes to a Class

Credit: Phil Tomson

Problem

You want to be notified when the definition of a class changes. You might want to keep track of new methods added to the class, or existing methods that get removed or undefined. Being notified when a module is mixed into a class can also be useful.

Solution

Define the class methods method_added, method_removed, and/or method_undefined. Whenever the class gets a method added, removed, or undefined, Ruby will pass its symbol into the appropriate callback method.

The following example prints a message whenever a method is added, removed, or undefined. If the method “important” is removed, undefined, or redefined, it raises an exception.

    class Tracker
      def important
        "This is an important method!"
      end

      def self.method_added(sym)
        if sym == :important
          raise 'The "important" method has been redefined!'
        else
          puts %{Method "#{sym}" was (re)defined.}
        end
      end

      def self.method_removed(sym)
        if sym == :important
          raise 'The "important" method has been removed!'
        else
          puts %{Method "#{sym}" was removed.}
        end
      end

      def self.method_undefined(sym)
        if sym == :important
          raise 'The "important" method has been undefined!'
        else
          puts %{Method "#{sym}" was undefined.}
        end
      end
    end

If someone adds a method to the class, a message will be printed:

    class Tracker
      def new_method
        'This is a new method.'
      end
    end
    # Method "new_method" was (re)defined.

Short of freezing the class, you can’t prevent the important method from being removed, undefined, or redefined, but you can raise a stink (more precisely, an exception) if someone changes it:

    class Tracker
      undef :important
    end
    # RuntimeError: The "important" method has been undefined!

Discussion

The class methods we’ve defined in the Tracker class (method_added, method_removed, and method_undefined) are hook methods. Some other piece of code (in this case, the Ruby interpreter) knows to call any methods by that name when certain conditions are met. The Module class defines these methods with empty bodies: by default, nothing special happens when a method is added, removed, or undefined.

Given the code above, we will not be notified if our Tracker class later mixes in a module. We won’t hear about the module itself, nor about the new methods that are available because of the module inclusion.

    class Tracker
      include Enumerable
    end
    # Nothing!

Detecting module inclusion is trickier. Ruby provides a hook method Module#included, which is called on a module whenever it’s mixed into a class. But we want the opposite: a hook method that’s called on a particular class whenever it includes a module. Since Ruby doesn’t provide a hook method for module inclusion, we must define our own. To do this, we’ll need to change Module#include itself.
    class Module
      alias_method :include_no_hook, :include
      def include(*modules)
        # Run the old implementation.
        include_no_hook(*modules)

        # Then run the hook.
        modules.each do |mod|
          self.include_hook mod
        end
      end

      def include_hook(mod)
        # Do nothing by default, just like Module#method_added et al.
        # This method must be overridden in a subclass to do something useful.
      end
    end

Now when a module is included into a class, Ruby will call that class’s include_hook method. If we define a Tracker.include_hook class method, we can have Ruby notify us of inclusions:

    class Tracker
      def self.include_hook(mod)
        puts %{"#{mod}" was included in #{self}.}
      end
    end

    class Tracker
      include Enumerable
    end
    # "Enumerable" was included in Tracker.

See Also

• Recipe 9.3, “Mixing in Class Methods,” for more on the Module#included method
• Recipe 10.13, “Undefining a Method,” for the difference between removing and undefining a method

10.7 Checking Whether an Object Has Necessary Attributes

Problem

You’re writing a class or module that delegates the creation of some of its instance variables to a hook method. You want to make sure that the hook method actually created those instance variables.

Solution

Use the Object#instance_variables method to get a list of the instance variables. Check them over to make sure all the necessary instance variables have been defined.

This Object#must_have_instance_variables method can be called at any time:

    class Object
      def must_have_instance_variables(*args)
        # Index the names as strings ("@foo") so the lookup works whether
        # instance_variables returns strings or symbols.
        vars = instance_variables.inject({}) { |h, var| h[var.to_s] = true; h }
        args.each do |var|
          unless vars["@#{var}"]
            raise ArgumentError, %{Instance variable "@#{var}" not defined}
          end
        end
      end
    end

The best place to call this method is in initialize or some other setup method of a module.
Alternatively, you could accept values for the instance variables as arguments to the setup method:

    module LightEmitting
      def LightEmitting_setup
        must_have_instance_variables :light_color, :light_intensity
        @on = false
      end
      # Methods that use @light_color and @light_intensity follow...
    end

You can call this method from a class that defines a virtual setup method, to make sure that subclasses actually use the setup method correctly:

    class Request
      def initialize
        gather_parameters # This is a virtual method defined by subclasses
        must_have_instance_variables :action, :user, :authentication
      end

      # Methods that use @action, @user, and @authentication follow...
    end

Discussion

Although Object#must_have_instance_variables is defined and called like any other method, it’s conceptually a “decorator” method similar to attr_accessor and private. That’s why I didn’t use parentheses above, even though I called it with multiple arguments. The lack of parentheses acts as a visual indicator that you’re calling a decorator method, one that alters or inspects a class or object.

Here’s a similar method that you can use from outside the object. It basically implements a batch form of duck typing: instead of checking an object’s instance variables (which are only available inside the object), it checks whether the object supports all of the methods you need to call on it. It’s useful for checking from the outside whether an object is the “shape” you expect.

    class Object
      def must_support(*args)
        args.each do |method|
          unless respond_to?(method)
            raise ArgumentError, %{Must support "#{method}"}
          end
        end
      end
    end

    obj = "a string"
    obj.must_support :to_s, :size, "+".to_sym
    obj.must_support "+".to_sym, "-".to_sym
    # ArgumentError: Must support "-"

See Also

• Recipe 10.16, “Enforcing Software Contracts”

10.8 Responding to Calls to Undefined Methods

Problem

Rather than having Ruby raise a NoMethodError when someone calls an undefined method on an instance of your class, you want to intercept the method call and do something else with it.

Or you are faced with having to explicitly define a large (possibly infinite) number of methods for a class. You would rather define a single method that can respond to an infinite number of method names.

Solution

Define a method_missing method for your class. Whenever anyone calls a method that would otherwise result in a NoMethodError, the method_missing method is called instead. It is passed the symbol of the nonexistent method, and any arguments that were passed in.

Here’s a class that modifies the default error handling for a missing method:

    class MyClass
      def defined_method
        'This method is defined.'
      end

      def method_missing(m, *args)
        "Sorry, I don't know about any #{m} method."
      end
    end

    o = MyClass.new
    o.defined_method
    # => "This method is defined."
    o.undefined_method
    # => "Sorry, I don't know about any undefined_method method."

In the second example, I’ll define an infinitude of new methods on Fixnum by giving it a method_missing implementation. Once I’m done, Fixnum will answer to any method that looks like “plus_#” and takes no arguments.

    class Fixnum
      def method_missing(m, *args)
        if args.size > 0
          raise ArgumentError.new("wrong number of arguments (#{args.size} for 0)")
        end

        match = /^plus_([0-9]+)$/.match(m.to_s)
        if match
          self + match.captures[0].to_i
        else
          raise NoMethodError.
            new("undefined method `#{m}' for #{inspect}:#{self.class}")
        end
      end
    end

    4.plus_5                              # => 9
    10.plus_0                             # => 10
    -1.plus_2                             # => 1
    100.plus_10000                        # => 10100
    20.send(:plus_25)                     # => 45

    100.minus_3
    # NoMethodError: undefined method `minus_3' for 100:Fixnum
    100.plus_5(105)
    # ArgumentError: wrong number of arguments (1 for 0)

Discussion

The method_missing technique is frequently found in delegation scenarios, when one object needs to implement all of the methods of another object. Rather than defining each method, a class implements method_missing as a catch-all, and uses send to delegate the “missing” method calls to other objects. The built-in delegate library makes this easy (see Recipe 8.8), but for the sake of illustration, here’s a class that delegates almost all its methods to a string. Note that this class doesn’t itself subclass String.

    class BackwardsString
      def initialize(s)
        @s = s
      end

      def method_missing(m, *args, &block)
        result = @s.send(m, *args, &block)
        result.respond_to?(:to_str) ? BackwardsString.new(result) : result
      end

      def to_s
        @s.reverse
      end

      def inspect
        to_s
      end
    end

The interesting thing here is the call to Object#send. This method takes the name of another method, and calls that method with the given arguments. We can delegate any missing method call to the underlying string without even looking at the method name.

    s = BackwardsString.new("I'm backwards.")   # => .sdrawkcab m'I
    s.size                                      # => 14
    s.upcase                                    # => .SDRAWKCAB M'I
    s.reverse                                   # => I'm backwards.
    s.no_such_method
    # NoMethodError: undefined method `no_such_method' for "I'm backwards.":String

The method_missing technique is also useful for adding syntactic sugar to a class. If one method of your class is frequently called with a string argument, you can make object.string a shortcut for object.method("string").
Consider the Library class below, and its simple query interface:

    class Library < Array
      def add_book(author, title)
        self << [author, title]
      end

      def search_by_author(key)
        reject { |b| !match(b, 0, key) }
      end

      def search_by_author_or_title(key)
        reject { |b| !match(b, 0, key) && !match(b, 1, key) }
      end

      private

      def match(b, index, key)
        b[index].index(key) != nil
      end
    end

    l = Library.new
    l.add_book("James Joyce", "Ulysses")
    l.add_book("James Joyce", "Finnegans Wake")
    l.add_book("John le Carre", "The Little Drummer Boy")
    l.add_book("John Rawls", "A Theory of Justice")

    l.search_by_author("John")
    # => [["John le Carre", "The Little Drummer Boy"],
    #     ["John Rawls", "A Theory of Justice"]]
    l.search_by_author_or_title("oy")
    # => [["James Joyce", "Ulysses"], ["James Joyce", "Finnegans Wake"],
    #     ["John le Carre", "The Little Drummer Boy"]]

We can make certain queries a little easier to write by adding some syntactic sugar. It’s as simple as defining a wrapper method; its power comes from the fact that Ruby directs all unrecognized method calls to this wrapper method.

    class Library
      def method_missing(m, *args)
        search_by_author_or_title(m.to_s)
      end
    end

    l.oy
    # => [["James Joyce", "Ulysses"], ["James Joyce", "Finnegans Wake"],
    #     ["John le Carre", "The Little Drummer Boy"]]
    l.Fin
    # => [["James Joyce", "Finnegans Wake"]]
    l.Jo
    # => [["James Joyce", "Ulysses"], ["James Joyce", "Finnegans Wake"],
    #     ["John le Carre", "The Little Drummer Boy"],
    #     ["John Rawls", "A Theory of Justice"]]

You can also define a method_missing method on a class. This is useful for adding syntactic sugar to factory classes.
Here’s a simple factory class that makes it easy to create strings (as though this weren’t already easy):

    class StringFactory
      def StringFactory.method_missing(m, *args)
        return String.new(m.to_s, *args)
      end
    end

    StringFactory.a_string                # => "a_string"
    StringFactory.another_string          # => "another_string"

As before, an attempt to call an explicitly defined method will not trigger method_missing:

    StringFactory.superclass              # => Object

The method_missing method intercepts all calls to undefined methods, including the mistyped names of calls to “real” methods. This is a common source of bugs. If you run into trouble using your class, the first thing you should do is add debug statements to method_missing, or comment it out altogether.

If you’re using method_missing to implicitly define methods, you should also be aware that Object#respond_to? returns false when called with the names of those methods. After all, they’re not defined!

    25.respond_to? :plus_20               # => false

You can override respond_to? to fool outside objects into thinking you’ve got explicit definitions for methods you’ve actually defined implicitly in method_missing. Be very careful, though; this is another common source of bugs.

    class Fixnum
      def respond_to?(m)
        super or (m.to_s =~ /^plus_([0-9]+)$/) != nil
      end
    end

    25.respond_to? :plus_20               # => true
    25.respond_to? :succ                  # => true
    25.respond_to? :minus_20              # => false

See Also

• Recipe 2.13, “Simulating a Subclass of Fixnum”
• Recipe 8.8, “Delegating Method Calls to Another Object,” for an alternate implementation of delegation that’s usually easier to use

10.9 Automatically Initializing Instance Variables

Problem

You’re writing a class constructor that takes a lot of arguments, each of which is simply assigned to an instance variable.

    class RGBColor
      def initialize(red=0, green=0, blue=0)
        @red = red
        @green = green
        @blue = blue
      end
    end

You’d like to avoid all the typing necessary to do those variable assignments.
Solution

Here’s a method that initializes the instance variables for you. It takes as an argument the list of variables passed into the initialize method, and the binding of the variables to values.

    class Object
      private
      def set_instance_variables(binding, *variables)
        variables.each do |var|
          instance_variable_set("@#{var}", eval(var, binding))
        end
      end
    end

Using this method, you can eliminate the tedious variable assignments:

    class RGBColor
      def initialize(red=0, green=0, blue=0)
        set_instance_variables(binding, *local_variables)
      end
    end

    RGBColor.new(10, 200, 300)
    # => #<RGBColor:0x... @red=10, @green=200, @blue=300>

Discussion

Our set_instance_variables takes a list of argument names to turn into instance variables, and a Binding containing the values of those arguments as of the method call. For each argument name, an eval statement binds the corresponding instance variable to the corresponding value in the Binding. Since you control the names of your own variables, this eval is about as safe as it gets.

The names of a method’s arguments aren’t accessible from Ruby code, so how do we get that list? Through trickery. When a method is called, any arguments passed in are immediately bound to local variables. At the very beginning of the method, these are the only local variables defined. This means that calling Kernel#local_variables at the beginning of a method will get a list of all the argument names.
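The local_variables trick can be checked in isolation. The sketch below assumes a newer interpreter than the recipe targets: on Ruby 1.9 and later, local_variables returns symbols rather than strings (so eval needs an explicit to_s), and it also lists locals that are assigned later in the method body. Method#parameters is assumed to be available (Ruby 1.9.2 and later). The argument_names method is a hypothetical name used only here.

```ruby
# Nothing but the parameters is ever assigned in this method body,
# so the parameters are the only local variables reported.
def argument_names(first, second = 2, *rest)
  local_variables.map { |name| name.to_s }
end

argument_names(1)   # => ["first", "second", "rest"]

# Assumption: Ruby 1.9.2 or later. Method#parameters reports the
# parameter list directly, without relying on local_variables.
method(:argument_names).parameters
# => [[:req, :first], [:opt, :second], [:rest, :rest]]

# Variant of set_instance_variables that tolerates symbol names:
# eval wants a string, so convert each name first.
class Object
  private

  def set_instance_variables(binding, *variables)
    variables.each do |var|
      instance_variable_set("@#{var}", eval(var.to_s, binding))
    end
  end
end

class RGBColor
  attr_reader :red, :green, :blue

  def initialize(red = 0, green = 0, blue = 0)
    set_instance_variables(binding, *local_variables)
  end
end

color = RGBColor.new(10, 200, 255)
[color.red, color.green, color.blue]   # => [10, 200, 255]
```

Because later assignments also show up in local_variables on these versions, the trick is most reliable when the method body introduces no other locals.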
If your method accepts arguments that you don’t want to set as instance variables, simply remove their names from the result of Kernel#local_variables before passing the list into set_instance_variables:

    class RGBColor
      def initialize(red=0, green=0, blue=0, debug=false)
        set_instance_variables(binding, *local_variables-['debug'])
        puts "Color: #{red}/#{green}/#{blue}" if debug
      end
    end

    RGBColor.new(10, 200, 255, true)
    # Color: 10/200/255
    # => #<RGBColor:0x... @red=10, @green=200, @blue=255>

10.10 Avoiding Boilerplate Code with Metaprogramming

Problem

You’ve got to type in a lot of repetitive code that a trained monkey could write. You’re resentful at having to do this yourself, and angry that the repetitive code will clutter up your class listings.

Solution

Ruby is happy to be the trained monkey that writes your repetitive code. You can define methods algorithmically with Module#define_method.

Usually the repetitive code is a bunch of similar methods. Suppose you need to write code like this:

    class Fetcher
      def fetch(how_many)
        puts "Fetching #{how_many ? how_many : "all"}."
      end

      def fetch_one
        fetch(1)
      end

      def fetch_ten
        fetch(10)
      end

      def fetch_all
        fetch(nil)
      end
    end

You can define this exact same code without having to write it all out. Create a data structure that contains the differences between the methods, and iterate over that structure, defining a method each time with define_method.

    class GeneratedFetcher
      def fetch(how_many)
        puts "Fetching #{how_many ? how_many : "all"}."
      end

      [["one", 1], ["ten", 10], ["all", nil]].each do |name, number|
        define_method("fetch_#{name}") do
          fetch(number)
        end
      end
    end

    GeneratedFetcher.instance_methods - Object.instance_methods
    # => ["fetch_one", "fetch", "fetch_ten", "fetch_all"]

    GeneratedFetcher.new.fetch_one
    # Fetching 1.

    GeneratedFetcher.new.fetch_all
    # Fetching all.

This is less to type, less monkeyish, and it takes up less space in your class listing.
If you need to define more of these methods, you can add to the data structure instead of writing out more boilerplate.

Discussion

Programmers have always preferred writing new code to cranking out variations on old code. From lex and yacc to modern programs like Hibernate and Cog, we’ve always used tools to generate code that would be tedious to write out manually.

Instead of generating code with an external tool, Ruby programmers do it from within Ruby.* There are two officially sanctioned techniques. The nicer technique is to use define_method to create a method whose implementation can use the local variables available at the time it was defined.

* This would make a good bumper sticker: “Ruby programmers do it from within Ruby.”

The built-in decorator methods we’ve already seen use metaprogramming. The attr_reader method takes a string as an argument, and defines a method whose name and implementation is based on that string. The code that’s the same for every reader method is factored out into attr_reader; all you have to provide is the tiny bit that’s different every time.

Methods whose code you generated are indistinguishable from methods that you wrote out longhand. They will show up in method lists and in generated RDoc documentation (if you’re metaprogramming with string evaluations, as seen in the next recipe, you can even generate the RDoc documentation and put it at the beginning of a generated method).

Usually you’ll use metaprogramming the way attr_reader does: to attach new methods to a class or module. For this you should use define_method, if possible. However, the block you pass into define_method needs to itself be valid Ruby code, and this can be cumbersome.
Consider the following generated methods:

    class Numeric
      [["add", "+"], ["subtract", "-"],
       ["multiply", "*"], ["divide", "/"]].each do |method, operator|
        define_method("#{method}_2") do
          method(operator).call(2)
        end
      end
    end

    4.add_2                               # => 6
    10.divide_2                           # => 5

Within the block passed into define_method, we have to jump through some reflection hoops to get a reference to the operator we want to use. You can’t just write self operator 2, because operator isn’t an operator: it’s a variable containing an operator name. See the next recipe for another metaprogramming technique that uses string substitution instead of reflection.

Another of define_method’s shortcomings is that in Ruby 1.8, you can’t use it to define a method that takes a block. The following code will work in Ruby 1.9 but not in Ruby 1.8:

    define_method "call_with_args" do |*args, &block|
      block.call(*args)
    end

    call_with_args(1, 2) { |n1, n2| n1 + n2 }     # => 3
    call_with_args("mammoth") { |x| x.upcase }    # => "MAMMOTH"

See Also

• Metaprogramming is used throughout this book to generate a bunch of methods at once, or to make it easy to define certain kinds of methods; see, for instance, Recipe 4.7, “Making Sure a Sorted Array Stays Sorted”
• Because define_method is a private method, you can only use it within a class definition; Recipe 8.2, “Managing Class Data,” shows a case where it needs to be called outside of a class definition
• The next recipe, Recipe 10.11, “Metaprogramming with String Evaluations”
• Metaprogramming is a staple of Ruby libraries; it’s used throughout Rails, and in smaller libraries like delegate

10.11 Metaprogramming with String Evaluations

Problem

You’re trying to write some metaprogramming code using define_method, but there’s too much reflection going on for your code to be readable. It gets confusing and is almost as frustrating as having to write out the code in longhand.
Solution

You can define new methods by generating the definitions as strings and running them as Ruby code with one of the eval methods.

Here’s a reprint of the metaprogramming example from the previous recipe, which uses define_method:

    class Numeric
      [['add', '+'], ['subtract', '-'],
       ['multiply', '*'], ['divide', '/']].each do |method, operator|
        define_method("#{method}_2") do
          method(operator).call(2)
        end
      end
    end

The important line of code, method(operator).call(2), isn’t something you’d write in normal programming. You’d write something like self + 2 or self / 2, depending on which operator you wanted to apply. By writing your method definitions as strings, you can do metaprogramming that looks more like regular programming:

    class Numeric
      [['add', '+'], ['subtract', '-'],
       ['multiply', '*'], ['divide', '/']].each do |method, operator|
        module_eval %{
          def #{method}_2
            self.#{operator}(2)
          end
        }
      end
    end

    4.add_2                               # => 6
    10.divide_2                           # => 5

Discussion

You can do all of your metaprogramming with define_method, but the code doesn’t look a lot like the code you’d write in normal programming. You can’t set an instance variable with @foo = 4; you have to call instance_variable_set('@foo', 4).

The alternative is to generate a method definition as a string and execute the string as Ruby code. Most interpreted languages have a way of parsing and executing arbitrary strings as code, but it’s usually regarded as a toy or a hazard, and not given much attention. Ruby breaks this taboo.

The most common evaluation method used for metaprogramming is Module#module_eval. This method executes a string as Ruby code, within the context of a class or module. Any methods or class variables you define within the string will be attached to the class or module, just as if you’d typed the string within the class or module definition. Thanks to the variable substitutions, the generated string looks exactly like the code you’d type in manually.
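To see what the generated string actually looks like, it can help to build it in a variable before evaluating it. The following is a small sketch rather than anything from the standard library: Temperature, warmer, and cooler are hypothetical names invented for the illustration. It also passes the optional file name and line number arguments that module_eval accepts, so that backtraces from generated methods point at this file.

```ruby
# Hypothetical class used only for this illustration.
class Temperature
  attr_reader :degrees

  def initialize(degrees)
    @degrees = degrees
  end

  [['warmer', '+'], ['cooler', '-']].each do |name, operator|
    # After substitution this reads like hand-written Ruby, e.g.:
    #   def warmer(amount)
    #     Temperature.new(@degrees + amount)
    #   end
    source = <<-RUBY
      def #{name}(amount)
        Temperature.new(@degrees #{operator} amount)
      end
    RUBY

    # The second and third arguments are optional; they only affect
    # the location reported in error messages and backtraces.
    module_eval(source, __FILE__, __LINE__)
  end
end

Temperature.new(20).warmer(5).degrees   # => 25
Temperature.new(20).cooler(5).degrees   # => 15
```

Keeping the source in a variable also makes it easy to print the string while debugging, which is often the quickest way to find a substitution mistake.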
The following four pieces of code all define a new method String#last:

    class String
      def last(n)
        self[-n, n]
      end
    end
    "Here's a string.".last(7)            # => "string."

    class String
      define_method('last') do |n|
        self[-n, n]
      end
    end
    "Here's a string.".last(7)            # => "string."

    class String
      module_eval %{def last(n)
                      self[-n, n]
                    end}
    end
    "Here's a string.".last(7)            # => "string."

    String.module_eval %{def last(n)
                           self[-n, n]
                         end}
    "Here's a string.".last(7)            # => "string."

The instance_eval method is less popular than module_eval. It works just like module_eval, but it runs inside an instance of a class rather than the class itself. You can use it to define singleton methods on a particular object, or to set instance variables. Of course, you can also call define_method on a specific object.

The other evaluation method is just plain eval. This method executes a string exactly as though you had written it as Ruby code in the same spot:

    class String
      eval %{def last(n)
               self[-n, n]
             end}
    end
    "Here's a string.".last(7)            # => "string."

You must be very careful when you use the eval methods, lest the end-user of a program trick you into running arbitrary Ruby code. When you’re metaprogramming, though, it’s not usually a problem: the only strings that get evaluated are ones you constructed yourself from hardcoded data, and by the time your class is loaded and ready to use, the eval calls have already run. You should be safe unless your eval statement contains strings obtained from untrusted sources. This might happen if you’re creating a custom class, or modifying a class in response to user input.

10.12 Evaluating Code in an Earlier Context

Problem

You’ve written a method that evaluates a string as Ruby code. But whenever anyone calls the method, the objects referenced by your string go out of scope. Your string can’t be evaluated within a method.
For instance, here’s a method that takes a variable name and tries to print out the value of the variable.

    def broken_print_variable(var_name)
      eval %{puts "The value of #{var_name} is " + #{var_name}.to_s}
    end

The eval code only works when it’s run in the same context as the variable definition. It doesn’t work as a method, because your local variables go out of scope when you call a method.

    tin_snips = 5

    broken_print_variable('tin_snips')
    # NameError: undefined local variable or method `tin_snips' for main:Object

    var_name = 'tin_snips'
    eval %{puts "The value of #{var_name} is " + #{var_name}.to_s}
    # The value of tin_snips is 5

Solution

The eval method can execute a string of Ruby code as though you had written it in some other part of your application. This magic is made possible by Binding objects. You can get a Binding at any time by calling Kernel#binding, and pass it in to eval to recreate your original environment where it wouldn’t otherwise be available. Here’s a version of the above method that takes a Binding:

    def print_variable(var_name, binding)
      eval %{puts "The value of #{var_name} is " + #{var_name}.to_s}, binding
    end

    vice_grips = 10
    print_variable('vice_grips', binding)
    # The value of vice_grips is 10

Discussion

A Binding object is a bookmark of the Ruby interpreter’s state. It tracks the values of any local variables you have defined, whether you are inside a class or method definition, and so on.

Once you have a Binding object, you can pass it into eval to run code in the same context as when you created the Binding. All the local variables you had back then will be available. If you called Kernel#binding within a class definition, you’ll also be able to define new methods of that class, and set class and instance variables.
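Both halves of that paragraph can be tried out with a short sketch. The names below (make_counter_binding, Toolbox, DEFINITION_BINDING, hammer) are hypothetical and exist only for this illustration. The first part shows that a Binding keeps a method’s local variable alive and writable after the method has returned; the second shows a Binding captured inside a class body being used later to add a method to that class.

```ruby
# Returns a Binding that still "remembers" the local variable count.
def make_counter_binding
  count = 0
  binding
end

b = make_counter_binding
eval("count += 1", b)
eval("count += 1", b)
eval("count", b)        # => 2

# A Binding captured inside a class definition carries the class
# context with it, so a later eval can define methods on the class.
class Toolbox
  DEFINITION_BINDING = binding
end

eval("def hammer; 'bang'; end", Toolbox::DEFINITION_BINDING)
Toolbox.new.hammer      # => "bang"
```

Holding a Binding in a constant, as Toolbox does here, keeps everything it references alive for the life of the program, so this is better suited to experiments than to long-running code.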
Since a Binding object contains references to all the objects that were in scope when it was created, those objects can’t be garbage-collected until both they and the Binding object have gone out of scope.

See Also

• This trick is used in several places throughout this book; see, for example, Recipe 1.3, “Substituting Variables into an Existing String,” and Recipe 10.9, “Automatically Initializing Instance Variables”

10.13 Undefining a Method

Problem

You want to remove an already defined method from a class or module.

Solution

From within a class or module, you can use Module#remove_method to remove a method’s implementation, forcing Ruby to delegate to the superclass or a module included by a class.

In the code below, I subclass Array and override the << and [] methods to add some randomness. Then I decide that overriding [] wasn’t such a good idea, so I undefine that method and get the inherited Array behavior back. The override of << stays in place.

    class RandomizingArray < Array
      def <<(e)
        insert(rand(size), e)
      end

      def [](i)
        super(rand(size))
      end
    end

    a = RandomizingArray.new
    a << 1 << 2 << 3 << 4 << 5 << 6       # => [6, 3, 4, 5, 2, 1]

    # That was fun; now let's get some of those entries back.
    a[0]                                  # => 1
    a[0]                                  # => 2
    a[0]                                  # => 5
    #No, seriously, a[0].
    a[0]                                  # => 4
    #It's a madhouse! A madhouse!
    a[0]                                  # => 3
    #That does it!

    class RandomizingArray
      remove_method('[]')
    end

    a[0]                                  # => 6
    a[0]                                  # => 6
    a[0]                                  # => 6

    # But the overridden << operator still works randomly:
    a << 7                                # => [6, 3, 4, 7, 5, 2, 1]

Discussion

Usually you’ll override a method by redefining it to implement your own desired behavior. However, sometimes a class will override an inherited method to do something you don’t like, and you just want the “old” implementation back.

You can only use remove_method to remove a method from a class or module that explicitly defines it. You’ll get an error if you try to remove a method from a class that merely inherits that method.
To make a subclass stop responding to an inherited method, you should undefine the method with undef_method.

Using undef_method on a class prevents the appropriate method signals from reaching objects of that class, but it has no effect on the parent class.

    class RandomizingArray
      remove_method(:length)
    end
    # NameError: method `length' not defined in RandomizingArray

    class RandomizingArray
      undef_method(:length)
    end

    RandomizingArray.new.length
    # NoMethodError: undefined method `length' for []:RandomizingArray
    Array.new.length                      # => 0

As you can see, it’s generally safer to use undef_method on the class you actually want to change than to use remove_method on its parent or a module it includes.

You can use remove_method to remove singleton methods once you’re done with them. Since remove_method is private, using it to remove a singleton method requires some unorthodox syntax:

    my_array = Array.new
    def my_array.random_dump(number)
      number.times { self << rand(100) }
    end

    my_array.random_dump(3)
    my_array.random_dump(2)
    my_array                              # => [6, 45, 12, 49, 66]

    # That's enough of that.
    class << my_array
      remove_method(:random_dump)
    end
    my_array.random_dump(4)
    # NoMethodError: undefined method `random_dump' for [6, 45, 12, 49, 66]:Array

When you define a singleton method on an object, Ruby silently defines an anonymous subclass used only for that one object. In the example above, my_array is actually an anonymous subclass of Array that implements a method random_dump. Since the subclass has no name (my_array is a variable name, not a class name), there’s no way of using the class syntax. We must “append” onto the definition of the my_array object.

Class methods are just a special case of singleton methods, so you can also use remove_method to remove class methods. Ruby also provides a couple of related methods for removing things besides methods.
Module#remove_const undefines a constant so that it can be redefined with a different value, as seen in Recipe 8.17. Object#remove_instance_variable removes an instance variable from a single instance of a class:

    class OneTimeContainer
      def initialize(value)
        @use_just_once_then_destroy = value
      end

      def get_value
        remove_instance_variable(:@use_just_once_then_destroy)
      end
    end

    object_1 = OneTimeContainer.new(6)
    object_1.get_value                  # => 6
    object_1.get_value
    # NameError: instance variable @use_just_once_then_destroy not defined

    object_2 = OneTimeContainer.new('ephemeron')
    object_2.get_value                  # => "ephemeron"

You can’t remove a particular instance variable from all instances by modifying the class because the class is its own object, one which probably never defined that instance variable in the first place:

    class MyClass
      remove_instance_variable(:@use_just_once_then_destroy)
    end
    # NameError: instance variable @use_just_once_then_destroy not defined

You should definitely not use these methods to remove methods or constants in system classes or modules: that might make arbitrary parts of the Ruby standard library crash or act unreliably. As with all metaprogramming, it’s easy to abuse the power to remove and undefine methods at will.

See Also
• Recipe 8.17, “Declaring Constants”
• Recipe 10.5, “Fixing Bugs in Someone Else’s Class”

10.14 Aliasing Methods

Problem
You (or your users) frequently misremember the name of a method. To reduce the confusion, you want to make the same method accessible under multiple names.

Alternatively, you’re about to redefine a method and you’d like to keep the old version available.

Solution
You can create alias methods manually, but in most cases, you should let the alias command do it for you. In this example, I define an InventoryItem class that includes a price method to calculate the price of an item in quantity.
Since it’s likely that someone might misremember the name of the price method as cost, I’ll create an alias:

    class InventoryItem
      attr_accessor :name, :unit_price

      def initialize(name, unit_price)
        @name, @unit_price = name, unit_price
      end

      def price(quantity=1)
        @unit_price * quantity
      end

      # Make InventoryItem#cost an alias for InventoryItem#price
      alias :cost :price

      # The attr_accessor decorator created two methods called "unit_price" and
      # "unit_price=". I'll create aliases for those methods as well.
      alias :unit_cost :unit_price
      alias :unit_cost= :unit_price=
    end

    bacon = InventoryItem.new("Chunky Bacon", 3.95)
    bacon.price(100)                    # => 395.0
    bacon.cost(100)                     # => 395.0

    bacon.unit_price                    # => 3.95
    bacon.unit_cost                     # => 3.95
    bacon.unit_cost = 3.99
    bacon.cost(100)                     # => 399.0

Discussion
It’s difficult to pick the perfect name for a method: you must find the word or short phrase that best conveys an operation on a data structure, possibly an abstract operation that has different “meanings” depending on context.

Sometimes there will be no good name for a method and you’ll just have to pick one; sometimes there will be too many good names for a method and you’ll just have to pick one. In either case, your users may have difficulty remembering the “right” name of the method. You can help them out by creating aliases.

Ruby itself uses aliases in its standard library: for instance, for the method of Array that returns the number of items in the array. The terminology used in this area varies widely. Some languages use length or len to find the length of a list, and some use size.*

* Java uses both: length is a member of a Java array, and size is a method that returns the size of a collection.

Ruby compromises by calling its method Array#length, but also creating an alias called Array#size.* You can use either Array#length or Array#size because they do the same thing based on the same code.
* Throughout this book, we use Array#size instead of Array#length. We do this mainly because it makes the lines of code a little shorter and easier to fit on the page. This is probably not a concern for you, so use whichever one you’re comfortable with.

If you come to Ruby from Python, you can make yourself a little more comfortable by creating yet another alias for length:

    class Array
      alias :len :length
    end

    [1, 2, 3, 4].len                    # => 4

The alias command doesn’t make a single method respond to two names, or create a shell method that delegates to the “real” method. It makes an entirely separate copy of the old method under the new name. If you then modify the original method, the alias will not be affected.

This may seem wasteful, but it’s frequently useful to Ruby programmers, who love to redefine methods that aren’t working the way they’d like. When you redefine a method, it’s good practice to first alias the old method to a different name, usually the original name with an _old suffix. This way, the old functionality isn’t lost.

This code (very unwisely) redefines Array#length, creating a copy of the original method with an alias:

    class Array
      alias :length_old :length
      def length
        return length_old / 2
      end
    end

Note that the alias Array#size still works as it did before:

    array = [1, 2, 3, 4]
    array.length                        # => 2
    array.size                          # => 4
    array.length_old                    # => 4

Since the old implementation is still available, it can be aliased back to its original name once the overridden implementation is no longer needed.

    class Array
      alias :length :length_old
    end

    array.length                        # => 4

If you find this behavior confusing, your best alternative is to avoid alias altogether. Instead, define a method with the new name that simply delegates to the “real” method. Here I’ll modify the InventoryItem class so that cost delegates to price, rather than having alias create a copy of price and calling the copy cost.
    class InventoryItem
      def cost(*args)
        price(*args)
      end
    end

If I then decide to modify price to tack on sales tax, cost will not have to be modified or realiased.

    bacon.cost(100)                     # => 399.0

    require 'bigdecimal'
    require 'bigdecimal/util'

    class InventoryItem
      def price(quantity=1, sales_tax=BigDecimal("0.0725"))
        base_price = (unit_price * quantity).to_d
        price = (base_price + (base_price * sales_tax).round(2)).to_f
      end
    end

    bacon.price(100)                    # => 427.93
    bacon.cost(100)                     # => 427.93

We don’t even need to change the signature of the cost method to match that of price, since we used the *args construction to accept and delegate any arguments at all:

    bacon.cost(100, BigDecimal("0.05")) # => 418.95

See Also
• Recipe 2.9, “Converting Between Degrees and Radians”
• Recipe 4.7, “Making Sure a Sorted Array Stays Sorted”
• Recipe 17.14, “Running Multiple Analysis Tools at Once”

10.15 Doing Aspect-Oriented Programming

Problem
You want to “wrap” a method with new code, so that calling the method triggers some new feature in addition to the original code.

Solution
You can arrange for code to be called before and after a method invocation by using method aliasing and metaprogramming, but it’s simpler to use the glue gem or the AspectR third-party library. The latter lets you define “aspect” classes whose methods are called before and after other methods.

Here’s a simple example that traces calls to specific methods as they’re made:

    require 'aspectr'

    class Verbose < AspectR::Aspect
      def describe(method_sym, object, *args)
        "#{object.inspect}.#{method_sym}(#{args.join(",")})"
      end

      def before(method_sym, object, return_value, *args)
        puts "About to call #{describe(method_sym, object, *args)}."
      end

      def after(method_sym, object, return_value, *args)
        puts "#{describe(method_sym, object, *args)} has returned " +
          return_value.inspect + '.'
      end
    end

Here, I’ll wrap the push and pop methods of an array.
Every time I call those methods, the aspect code will run and some diagnostics will be printed.

    verbose = Verbose.new
    stack = []
    verbose.wrap(stack, :before, :after, :push, :pop)

    stack.push(10)
    # About to call [].push(10).
    # [10].push(10) has returned [[10]].

    stack.push(4)
    # About to call [10].push(4).
    # [10, 4].push(4) has returned [[10, 4]].

    stack.pop
    # About to call [10, 4].pop().
    # [10].pop() has returned [4].

Discussion
There’s a pattern that shows up again and again in Ruby (we cover it in Recipe 7.10). You write a method that performs some task-specific setup (like initializing a timer), runs a code block, then performs task-specific cleanup (like stopping the timer and printing out timing results). By passing in a code block to one of these methods you give it a new aspect: the same code runs as if you’d just called Proc#call on the code block, but now it’s got something extra: the code gets timed, or logged, or won’t run without authentication, or it automatically performs some locking.

Aspect-oriented programming lets you permanently add these aspects to previously defined methods, without having to change any of the code that calls them. It’s a good way to modularize your code, and to modify existing code without having to do a lot of metaprogramming yourself. Though less mature, the AspectR library has the same basic features as Java’s AspectJ.

The Aspect#wrap method modifies the methods of some other object or class. In the example above, the push and pop methods of the stack are modified: you could also modify the Array#push and Array#pop methods themselves, by passing in Array instead of stack.

Aspect#wrap aliases the old implementations to new names, and defines the method anew to include calls to a “pre” method (Verbose#before in the example) and/or a “post” method (Verbose#after in the example).
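To make the aliasing step concrete, here is a hand-rolled sketch of the same idea for a single object. It is not AspectR’s actual implementation: wrap_with_logging and the __unwrapped_ prefix are names made up for this illustration, and the sketch assumes the wrapped method is never called with a block.

```ruby
# Hypothetical helper, not part of AspectR: wraps one method of one
# object so that messages are appended to `log` before and after it runs.
def wrap_with_logging(object, method_name, log)
  old_name = "__unwrapped_#{method_name}"
  singleton = object.singleton_class

  # Keep the original implementation reachable under a new name...
  singleton.send(:alias_method, old_name, method_name)

  # ...then define the method anew, with "pre" and "post" steps around
  # a call to the renamed original.
  singleton.send(:define_method, method_name) do |*args|
    log << "before #{method_name}(#{args.join(',')})"
    result = send(old_name, *args)
    log << "after #{method_name}"
    result
  end
end

log = []
stack = []
wrap_with_logging(stack, :push, log)
stack.push(10)

stack   # => [10]
log     # => ["before push(10)", "after push"]
```

Because the wrapper lives in the object’s singleton class, other arrays are unaffected. Passing a class’s own methods through the same steps would instead change every instance, which is the distinction between wrapping stack and wrapping Array described above.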
You can wrap the same method with different aspects at the same time:

    class EvenMoreVerbose < AspectR::Aspect
      def useless(method_sym, object, return_value, *args)
        puts "More useless verbosity."
      end
    end

    more_verbose = EvenMoreVerbose.new
    more_verbose.wrap(stack, :useless, nil, :push)
    stack.push(60)
    # About to call [10].push(60).
    # More useless verbosity.
    # [10, 60].push(60) has returned [[10, 60]].

You can also undo the effects of a wrap call with Aspect#unwrap.

    verbose.unwrap(stack, :before, :after, :push, :pop)
    more_verbose.unwrap(stack, :useless, nil, :push)
    stack.push(100)                     # => [10, 60, 100]

Because they use aliasing under the covers, you can’t use AspectR or glue to attach aspects to operator methods like <<. If you do, AspectR (for instance) will try to define a method called __aop__singleton_<<, which isn’t a valid method name. You’ll need to do the alias yourself, using a method name like “old_lshift”, and define a new << method that makes the pre- and post-calls.

See Also
• The AspectR home page is at http://aspectr.sourceforge.net/
• Recipe 7.10, “Hiding Setup and Cleanup in a Block Method”
• Recipe 10.14, “Aliasing Methods”
• Recipe 20.4, “Synchronizing Access to an Object”

10.16 Enforcing Software Contracts
Credit: Maurice Codik

Problem
You want your methods to validate their arguments, using techniques like duck typing and range validation, without filling your code with tons of conditions to test arguments.

Solution
Here’s a Contracts module that you can mix in to your classes. Your methods can then define and enforce contracts.

    module Contracts
      def valid_contract(input)
        if @user_defined and @user_defined[input]
          @user_defined[input]
        else
          case input
          when :number
            lambda { |x| x.is_a? Numeric }
          when :string
            lambda { |x| x.respond_to? :to_str }
          when :anything
            lambda { |x| true }
          else
            lambda { |x| false }
          end
        end
      end

      class ContractViolation < StandardError
      end

      def define_data(inputs={}.freeze)
        @user_defined ||= {}
        inputs.each do |name, contract|
          @user_defined[name] = contract if contract.respond_to? :call
        end
      end

      def contract(method, *inputs)
        @contracts ||= {}
        @contracts[method] = inputs
        method_added(method)
      end

      def setup_contract(method, inputs)
        @contracts[method] = nil
        method_renamed = "__#{method}".intern
        conditions = ""
        inputs.flatten.each_with_index do |input, i|
          conditions << %{
            if not self.class.valid_contract(#{input.inspect}).call(args[#{i}])
              raise ContractViolation, "argument #{i+1} of method '#{method}' must " +
                    "satisfy the '#{input}' contract", caller
            end
          }
        end

        class_eval %{
          alias_method #{method_renamed.inspect}, #{method.inspect}
          def #{method}(*args)
            #{conditions}
            return #{method_renamed}(*args)
          end
        }
      end

      def method_added(method)
        inputs = @contracts[method]
        setup_contract(method, inputs) if inputs
      end
    end

You can call the define_data method to define contracts, and call the contract method to apply these contracts to your methods. Here’s an example:

    class TestContracts
      def hello(n, s, f)
        n.times { f.write "hello #{s}!\n" }
      end

The hello method takes as its arguments a positive number, a string, and a file-type object that can be written to. The Contracts module defines a :string contract for making sure an item is stringlike. We can define additional contracts as code blocks; these contracts make sure an object is a positive number, or an open object that supports the write method:

      extend Contracts

      writable_and_open = lambda do |x|
        x.respond_to?('write') and x.respond_to?('closed?') and not x.closed?
      end

      define_data(:writable => writable_and_open,
                  :positive => lambda {|x| x >= 0 })

Now we can call the contract method to create a contract for the three arguments of the hello method:

      contract :hello, [:positive, :string, :writable]
    end

Here it is in action:

    tc = TestContracts.new
    tc.hello(2, 'world', $stdout)
    # hello world!
    # hello world!

    tc.hello(-1, 'world', $stdout)
    # Contracts::ContractViolation: argument 1 of method 'hello' must satisfy the
    # 'positive' contract

    tc.hello(2, 3001, $stdout)
    # test-contracts.rb:22: argument 2 of method 'hello' must satisfy the
    # 'string' contract (Contracts::ContractViolation)

    closed_file = open('file.txt', 'w') { }
    tc.hello(2, 'world', closed_file)
    # Contracts::ContractViolation: argument 3 of method 'hello' must satisfy the
    # 'writable' contract

Discussion
The Contracts module uses many of Ruby’s metaprogramming features to make these runtime checks possible. The line of code that triggers it all is this one:

    contract :hello, [:positive, :string, :writable]

That line of code replaces the old implementation of hello with one that looks like this:

    def hello(n,s,f)
      if not (n >= 0)
        raise ContractViolation,
              "argument 1 of method 'hello' must satisfy the 'positive' contract",
              caller
      end
      if not (s.respond_to? :to_str)
        raise ContractViolation,
              "argument 2 of method 'hello' must satisfy the 'string' contract",
              caller
      end
      if not (f.respond_to?('write') and f.respond_to?('closed?')
              and not f.closed?)
        raise ContractViolation,
              "argument 3 of method 'hello' must satisfy the 'writable' contract",
              caller
      end
      return __hello(n,s,f)
    end

    def __hello(n,s,f)
      n.times { f.write "hello #{s}!\n" }
    end

The body of define_data is simple: it takes a hash that maps contract names to Proc objects, and adds each new contract definition to the @user_defined hash of custom contracts for this class.
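The lookup side of this can be sketched in isolation. The following is a simplified, standalone illustration of the idea that a named contract is just a callable stored in a hash; ContractRegistry is a hypothetical class invented for this sketch, not part of the Contracts module, and it omits the built-in :number, :string, and :anything contracts.

```ruby
# Hypothetical, simplified registry; not part of the Contracts module.
class ContractRegistry
  def initialize
    @user_defined = {}
  end

  # Store only the values that respond to #call, as define_data does.
  def define_data(inputs = {})
    inputs.each do |name, contract|
      @user_defined[name] = contract if contract.respond_to?(:call)
    end
  end

  # Translate a contract name into a callable. Unknown names get a
  # contract that always fails.
  def valid_contract(name)
    @user_defined.fetch(name) { lambda { |x| false } }
  end
end

registry = ContractRegistry.new
registry.define_data(:positive => lambda { |x| x.is_a?(Numeric) && x >= 0 },
                     :bogus    => "not callable")

registry.valid_contract(:positive).call(3)    # => true
registry.valid_contract(:positive).call(-1)   # => false
registry.valid_contract(:bogus).call(3)       # => false
```

The :bogus entry is silently dropped because a String has no call method, so looking it up falls through to the always-failing default.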
The contract method takes a method symbol and an array naming the contracts to impose on that method’s arguments. It registers a new set of contracts by storing them in the @contracts hash, keyed by the method symbol. When Ruby adds a method definition to the class, it automatically calls the Contracts::method_added hook, passing in the method name as the argument. Contracts::method_added checks whether or not the newly added method has a contract defined for it. If it finds one, it calls setup_contract.

All of the heavy lifting is done in setup_contract. This is how it works, step by step:
• Remove the method’s information in @contracts. This prevents an infinite loop when we redefine the method using alias_method later.
• Generate the new name for the method. In this example, we simply prepend two underscores to the original name.
• Create all of the code to test the types of the arguments. We loop through the arguments using Enumerable#each_with_index, and build up a string in the conditions variable that contains the code we need. The condition code uses the valid_contract method to translate a contract name (such as :number) to a Proc object that checks whether or not its argument satisfies that contract.
• Use class_eval to insert our code into the class that called extend Contracts. The code in the eval statement does the following:
  • Call alias_method to rename the newly added method to our generated name.
  • Define a new method with the original’s name that checks all of our conditions and then calls the renamed function to get the original functionality.

See Also
• Recipe 13.14, “Validating Data with ActiveRecord”
• Ruby also has an Eiffel-style Design by Contract library, which lets you define invariants on classes, and pre- and post-conditions on methods; it’s available as the dbc gem

CHAPTER 11
XML and HTML

XML and HTML are the most popular markup languages (textual ways of describing structured data).
HTML is used to describe textual documents, like you see on the Web. XML is used for just about everything else: data storage, messaging, configuration files, you name it. Just about every software buzzword forged over the past few years involves XML.

Java and C++ programmers tend to regard XML as a lightweight, agile technology, and are happy to use it all over the place. XML is a lightweight technology, but only compared to Java or C++. Ruby programmers see XML from the other end of the spectrum, and from there it looks pretty heavy. Simpler formats like YAML and JSON usually work just as well (see Recipe 13.1 or Recipe 13.2), and are easier to manipulate. But to shun XML altogether would be to cut Ruby off from the rest of the world, and nobody wants that. This chapter covers the most useful ways of parsing, manipulating, slicing, and dicing XML and HTML documents.

There are two standard APIs for manipulating XML: DOM and SAX. Both are overkill for most everyday uses, and neither is a good fit for Ruby’s code-block–heavy style. Ruby’s solution is to offer a pair of APIs that capture the style of DOM and SAX while staying true to the Ruby programming philosophy.* Both APIs are in the standard library’s REXML package, written by Sean Russell.

* REXML also provides the SAX2Parser and SAX2Listener classes, which implement the basic SAX2 API.

Like DOM, the Document class parses an XML document into a nested tree of objects. You can navigate the tree with Ruby accessors (Recipe 11.2) or with XPath queries (Recipe 11.4). You can modify the tree by creating your own Element and Text objects (Recipe 11.9). If even Document is too heavyweight for you, you can use the XmlSimple library to transform an XML file into a nested Ruby hash (Recipe 11.6).

With a DOM-style API like Document, you have to parse the entire XML file before you can do anything. The XML document becomes a large number of Ruby objects
nested under a Document object, all sitting around taking up memory. With a SAX-style parser like the StreamParser class, you can process a document as it’s parsed, creating only the objects you want. The StreamParser API is covered in Recipe 11.3.

The main problem with the REXML APIs is that they’re very picky. They’ll only parse a document that’s valid XML, or close enough to have an unambiguous representation. This makes them nearly useless for parsing HTML documents off the World Wide Web, since the average web page is not valid XML. Recipe 11.5 shows how to use the third-party tools RubyfulSoup and SGMLParser; they give a DOM- or SAX-style interface that handles even invalid XML.

• http://www.germane-software.com/software/rexml/
• http://www.germane-software.com/software/rexml/docs/tutorial.html

11.1 Checking XML Well-Formedness
Credit: Rod Gaither

Problem
You want to check that an XML document is well-formed before processing it.

Solution
The best way to see whether a document is well-formed is to try to parse it. The REXML library raises an exception when it can’t parse an XML document, so just try parsing it and rescue any exception.

The valid_xml? method below returns nil unless it’s given a valid XML document. If the document is valid, it returns a parsed Document object, so you don’t have to parse it again:

    require 'rexml/document'

    def valid_xml?(xml)
      begin
        REXML::Document.new(xml)
      rescue REXML::ParseException
        # Return nil if an exception is thrown
      end
    end

Discussion
To be useful, an XML document must be structured correctly or “well-formed.” For instance, an opening tag must either be self-closing or be paired with an appropriate closing tag.

As a file and messaging format, XML is often used in situations where you don’t have control over the input, so you can’t assume that it will always be well-formed.
Rather than just letting REXML throw an exception, you’ll need to handle ill-formed XML gracefully, providing options to retry or continue on a different path.

This bit of XML is not well-formed: it’s missing ending tags for both the pending and done elements:

    bad_xml = %{ Grocery Shopping Dry Cleaning }

    valid_xml?(bad_xml)                 # => nil

This bit of XML is well-formed, so valid_xml? returns the parsed Document object.

    good_xml = %{ Wheat Quadrotriticale }

    doc = valid_xml?(good_xml)
    doc.root.elements[1]                # => ...

When your program is responsible for writing XML documents, you’ll want to write unit tests that make sure you generate valid XML. You can use a feature of the Test::Unit library to simplify the checking. Since invalid XML makes REXML raise an exception, your unit test can use the assert_nothing_raised method to make sure your XML is valid:

    doc = nil
    assert_nothing_raised {doc = REXML::Document.new(source_xml)}

This is a simple, clean test to verify XML when using a unit test.

Note that valid_xml? doesn’t work perfectly: some invalid XML is unambiguous, which means REXML can parse it. Consider this truncated version of the valid XML example. It’s missing its closing tags, but there’s no ambiguity about which closing tag should come first, so REXML can parse the file and provide the closing tags:

    invalid_xml = %{ Wheat }

    (valid_xml? invalid_xml) == nil     # => false   # That is, it is "valid"
    REXML::Document.new(invalid_xml).write
    #
    # Wheat
    #

See Also
• Official information on XML can be found at http://www.w3.org/XML/
• The Wikipedia has a good description of the difference between Well-Formed and Valid XML documents at http://en.wikipedia.org/wiki/Xml#Correctness_in_an_XML_document
• Recipe 11.5, “Parsing Invalid Markup”
• Recipe 17.3, “Handling an Exception”

11.2 Extracting Data from a Document’s Tree Structure
Credit: Rod Gaither

Problem
You want to parse an XML file into a Ruby data structure, to traverse it or extract data from it.

Solution
Pass an XML document into the REXML::Document constructor to load and parse the XML. A Document object contains a tree of subobjects (of class Element and Text) representing the tree structure of the underlying document. The methods of Document and Element give you access to the XML tree data. The most useful of these methods is #each_element.

Here’s some sample XML and the load process. The document describes a set of orders, each of which contains a set of items. This particular document contains a single order for two items.

    orders_xml = %{ 105 02/10/2006 Corner Store }

    require 'rexml/document'
    orders = REXML::Document.new(orders_xml)

To process each order in this document, we can use Document#root to get the document’s root element and then call Element#each_element to iterate over the children of the root element (the order elements). This code repeatedly calls each_element to move down the document tree and print the details of each order in the document:

    orders.root.each_element do |order|    # each order under the root
      order.each_element do |node|         # number, date, customer, etc. in the order
        if node.has_elements?
          node.each_element do |child|     # each item in the order
            puts "#{child.name}: #{child.attributes['desc']}"
          end
        else
          # the contents of number, date, etc.
          puts "#{node.name}: #{node.text}"
        end
      end
    end
    # number: 105
    # date: 02/10/2006
    # customer: Corner Store
    # item: Red Roses
    # item: Candy Hearts

Discussion
Parsing an XML file into a Document gives you a tree-like data structure that you can treat kind of like an array of arrays. Starting at the document root, you can move down the tree until you find the data that interests you. In the example above, note how the structure of the Ruby code mirrors the structure of the original document. Every call to each_element moves the focus of the code down a level: from the root element to an order, to the children of that order, to an individual item.

There are many other methods of Element you can use to navigate the tree structure of an XML document. Not only can you iterate over the child elements, you can reference a specific child by indexing the parent as though it were an array. You can navigate through siblings with Element#next_element and Element#previous_element. You can move up the document tree with Element#parent:

    my_order = orders.root.elements[1]
    first_node = my_order.elements[1]
    first_node.name                     # => "number"
    first_node.next_element.name        # => "date"
    first_node.parent.name              # => "order"

This only scratches the surface; there are many other ways to interact with the data loaded from an XML source. For example, explore the convenience methods Element#each_element_with_attribute and Element#each_element_with_text, which let you select elements based on features of the elements themselves.
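As a brief sketch of those two convenience methods, here they are applied to a small standalone document. The XML below, including the rush attribute and the note element, is invented for this illustration and is not the orders document from the Solution.

```ruby
require 'rexml/document'

# A made-up document for illustration only.
doc = REXML::Document.new(%{
<order>
  <item desc="Red Roses" rush="yes"/>
  <item desc="Candy Hearts"/>
  <note>fragile</note>
</order>})

# Visit only the child elements whose "rush" attribute equals "yes".
rush_items = []
doc.root.each_element_with_attribute('rush', 'yes') do |element|
  rush_items << element.attributes['desc']
end
rush_items                          # => ["Red Roses"]

# Visit only the child elements whose text is exactly "fragile".
fragile = []
doc.root.each_element_with_text('fragile') do |element|
  fragile << element.name
end
fragile                             # => ["note"]
```

Both methods look only at the direct children of the element they are called on, so they combine naturally with the nested each_element calls shown earlier.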
See Also
• The RDoc documentation for the REXML::Document and REXML::Element classes
• The section “Tree Parsing XML and Accessing Elements” in the REXML Tutorial (http://www.germane-software.com/software/rexml/docs/tutorial.html#id2247335)
• If you want to start navigating the document at some point other than the root, an XPath statement is probably the simplest way to get where you want; see Recipe 11.4, “Navigating a Document with XPath”

11.3 Extracting Data While Parsing a Document
Credit: Rod Gaither

Problem
You want to process a large XML file without loading it all into memory.

Solution
The method REXML::Document.parse_stream gives you a fast and flexible way to scan a large XML file and process the parts that interest you.

Consider this XML document, the output of a hypothetical program that runs automated tasks. We want to parse the document and find the tasks that failed (that is, returned an error code other than zero).

    event_xml = %{ }

We can process the document as it’s being parsed by writing a class that includes the REXML::StreamListener module and responds to parsing events such as tag_start and tag_end. Here’s a listener that looks for tags with a nonzero value for their error attribute. It prints a message for every failed event it finds.
    require 'rexml/document'
    require 'rexml/streamlistener'

    class ErrorListener
      include REXML::StreamListener
      def tag_start(name, attrs)
        if attrs["error"] != nil and attrs["error"] != "0"
          puts %{Event "#{name}" failed for system "#{attrs["system"]}" } +
               %{with code #{attrs["error"]}}
        end
      end
    end

To actually parse the XML data, pass it along with the StreamListener into the method REXML::Document.parse_stream:

    REXML::Document.parse_stream(event_xml, ErrorListener.new)
    # Event "clean" failed for system "dev" with code 1
    # Event "backup" failed for system "dev" with code 2

Discussion
We could find the failed events in less code by loading the XML into a Document and running an XPath query. That approach would work fine for this example, since the document only contains four events. It wouldn’t work as well if the document were a file on disk containing a billion events. Building a Document means building an elaborate in-memory data structure representing the entire XML document. If you only care about part of a document (in this case, the failed events), it’s faster and less memory-intensive to process the document as it’s being parsed. Once the parser reaches the end of the document, you’re done.

The stream-oriented approach to parsing XML can be as simple as shown in this recipe, but it can also handle much more complex scenarios. Your StreamListener subclass can keep arbitrary state in instance variables, letting you track complex combinations of elements and attributes.

See Also
• The RDoc documentation for the REXML::StreamParser class
• The “Stream Parsing” section of the REXML Tutorial (http://www.germane-software.com/software/rexml/docs/tutorial.html#id2248457)
• Recipe 11.2, “Extracting Data from a Document’s Tree Structure”

11.4 Navigating a Document with XPath

Problem
You want to find or address sections of an XML document in a standard, programming-language–independent way.
Solution
The XPath language defines a way of referring to almost any element or set of elements in an XML document, and the REXML library comes with a complete XPath implementation. REXML::XPath provides three class methods for locating Element objects within parsed documents: first, each, and match.

Take as an example the following XML description of an aquarium. The aquarium contains some fish and a gaudy castle decoration full of algae. Due to an aquarium stocking mishap, some of the smaller fish have been eaten by larger fish, just like in those cartoon food chain diagrams. (Figure 11-1 shows the aquarium.)

    xml = %{ }

    require 'rexml/document'
    doc = REXML::Document.new xml

[Figure 11-1. The aquarium: a small blue fish, a large orange fish, a small green fish, a tiny red fish, and a gaudy castle with green algae]

We can use REXML::XPath.first to get the Element object corresponding to the first tag in the document:

    REXML::XPath.first(doc, '//fish')
    # =>

We can use match to get an array containing all the elements that are green:

    REXML::XPath.match(doc, '//[@color="green"]')
    # => [ ... , ]

We can use each with a code block to iterate over all the fish that are inside other fish:

    def describe(fish)
      "#{fish.attribute('size')} #{fish.attribute('color')} fish"
    end

    REXML::XPath.each(doc, '//fish/fish') do |fish|
      puts "The #{describe(fish.parent)} has eaten the #{describe(fish)}."
    end
    # The large orange fish has eaten the small green fish.
    # The small green fish has eaten the tiny red fish.

Discussion
Every element in a Document has an xpath method that returns the canonical XPath path to that element. This path can be considered the element’s “address” within the document.
In this example, a complex bit of Ruby code is replaced by a simple XPath expression:

    red_fish = doc.children[0].children[3].children[1].children[1]
    # =>

    red_fish.xpath
    # => "/aquarium/fish[2]/fish/fish"

    REXML::XPath.first(doc, red_fish.xpath)
    # =>

Even a brief overview of XPath is beyond the scope of this recipe, but here are some more examples to give you ideas:

    # Find the second green element.
    REXML::XPath.match(doc, '//[@color="green"]')[1]
    # =>

    # Find the color attributes of all small fish.
    REXML::XPath.match(doc, '//fish[@size="small"]/@color')
    # => [color='blue', color='green']

    # Count how many fish are inside the first large fish.
    REXML::XPath.first(doc, "count(//fish[@size='large'][1]//fish)")
    # => 2

The Elements class acts kind of like an array that supports XPath addressing. You can make your code more concise by passing an XPath expression to Elements#each, or using it as an array index.

    doc.elements.each('//fish') { |f| puts f.attribute('color') }
    # blue
    # orange
    # green
    # red

    doc.elements['//fish']
    # =>

Within an XPath expression, the first element in a list has an index of 1, not 0. The XPath expression //fish[@size='large'][1] matches the first large fish, not the second large fish, the way large_fish[1] would in Ruby code. Pass a number as an array index to an Elements object, and you get the same behavior as XPath:

    doc.elements[1]
    # => ...
    doc.children[0]
    # => ...

See Also
• The XPath standard, at http://www.w3.org/TR/xpath, has more XPath examples
• XPath and XPointer by John E. Simpson (O’Reilly)

11.5 Parsing Invalid Markup

Problem
You need to extract data from a document that’s supposed to be HTML or XML, but that contains some invalid markup.

Solution
For a quick solution, use Rubyful Soup, written by Leonard Richardson and found in the rubyful_soup gem. It can build a document model even out of invalid XML or HTML, and it offers an idiomatic Ruby interface for searching the document model.
It’s good for quick screen-scraping tasks or HTML cleanup.

    require 'rubygems'
    require 'rubyful_soup'

    invalid_html = 'A lot of tags are never closed.'
    soup = BeautifulSoup.new(invalid_html)
    puts soup.prettify
    # A lot of
    # tags are
    # never closed.
    #
    #

    soup.b.i # => never closed.
    soup.i # => never closed.
    soup.find(nil, :attrs=>{'class' => '2'}) # => never closed.
    soup.find_all('i') # => [never closed.]
    soup.b['class'] # => "1"
    soup.find_text(/closed/) # => "never closed."

If you need better performance, do what Rubyful Soup does and write a custom parser on top of the event-based parser SGMLParser (found in the htmltools gem). It works a lot like REXML’s StreamListener interface.

Discussion

Sometimes it seems like the authors of markup parsers do their coding atop an ivory tower. Most parsers simply refuse to parse bad markup, but this cuts off an enormous source of interesting data. Most of the pages on the World Wide Web are invalid HTML, so if your application uses other people’s web pages as input, you need a forgiving parser. Invalid XML is less common but by no means rare.

The SGMLParser class in the htmltools gem uses regular expressions to parse an XML-like data stream. When it finds an opening or closing tag, some data, or some other part of an XML-like document, it calls a hook method that you’re supposed to define in a subclass. SGMLParser doesn’t build a document model or keep track of the document state: it just generates events. If closing tags don’t match up or if the markup has other problems, it won’t even notice.

Rubyful Soup’s parser classes define SGMLParser hook methods that build a document model out of an ambiguous document. Its BeautifulSoup class is intended for HTML documents: it uses heuristics like a web browser’s to figure out what an ambiguous document “really” means. These heuristics are specific to HTML; to parse XML documents, you should use the BeautifulStoneSoup class.
You can also subclass BeautifulStoneSoup and implement your own heuristics.

Rubyful Soup builds a densely linked model of the entire document, which uses a lot of memory. If you only need to process certain parts of the document, you can implement the SGMLParser hooks yourself and get a faster parser that uses less memory.

Here’s an SGMLParser subclass that extracts URLs from a web page. It checks every A tag for an href attribute, and keeps the results in a set. Note the similarity to the LinkGrabber class defined in Recipe 11.13.

    require 'rubygems'
    require 'html/sgml-parser'
    require 'set'

    html = %{O'Reilly irrelevantRuby}

    class LinkGrabber < HTML::SGMLParser
      attr_reader :urls

      def initialize
        @urls = Set.new
        super
      end

      def do_a(attrs)
        url = attrs.find { |attr| attr[0] == 'href' }
        @urls << url[1] if url
      end
    end

    extractor = LinkGrabber.new
    extractor.feed(html)
    extractor.urls
    # => #

The equivalent Rubyful Soup program is quicker to write and easier to understand, but it runs more slowly and uses more memory:

    require 'rubyful_soup'

    urls = Set.new
    BeautifulStoneSoup.new(html).find_all('a').each do |tag|
      urls << tag['href'] if tag['href']
    end

You can improve performance by telling Rubyful Soup’s parser to ignore everything except A tags and their contents:

    puts BeautifulStoneSoup.new(html, :parse_only_these => 'a')
    #
    # O'Reilly
    # Ruby

But the fastest implementation will always be a custom SGMLParser subclass. If your parser is part of a full application (rather than a one-off script), you’ll need to find the best tradeoff between performance and code legibility.
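The hook-method style described above can be tried without installing any gems. What follows is a minimal sketch, not SGMLParser itself: TinyTagScanner, HrefCollector, and the sample markup are hypothetical, and the regular expressions handle only simple tags.

```ruby
require 'set'

# Hypothetical stand-in for an event-based parser. It scans for opening tags
# with a regular expression and calls a do_<tagname> hook if the subclass
# defines one. It builds no document model and never checks that tags match.
class TinyTagScanner
  TAG  = /<([a-zA-Z][a-zA-Z0-9]*)((?:\s+[^<>]*?)?)\s*\/?>/
  ATTR = /([a-zA-Z_:][-\w:.]*)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))/

  def feed(markup)
    markup.scan(TAG) do |name, raw_attrs|
      hook = "do_#{name.downcase}"
      next unless respond_to?(hook)
      # Turn the raw attribute text into [name, value] pairs.
      attrs = raw_attrs.scan(ATTR).map do |key, dquoted, squoted, bare|
        [key.downcase, dquoted || squoted || bare]
      end
      send(hook, attrs)
    end
  end
end

# Collects the href of every A tag, ignoring everything else.
class HrefCollector < TinyTagScanner
  attr_reader :urls

  def initialize
    @urls = Set.new
  end

  def do_a(attrs)
    href = attrs.assoc('href')
    @urls << href[1] if href
  end
end

# Assumed sample input: note the first A tag is never closed.
html = %{<a name="anchor"><a href="http://www.oreilly.com">O'Reilly</a>
         <b>irrelevant</b><a href="http://www.ruby-lang.org/">Ruby</a>}

collector = HrefCollector.new
collector.feed(html)
puts collector.urls.to_a
```

Since the scanner only reacts to tags it has a hook for, the unclosed first tag and the B tag cost nothing beyond the regular expression match, which is the tradeoff the Discussion describes: less memory and less structure.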
See Also

• Recipe 11.13, “Extracting All the URLs from an HTML Document”
• The Rubyful Soup documentation (http://www.crummy.com/software/RubyfulSoup/documentation.html)
• The htree library defines a forgiving HTML/XML parser that can convert a parsed document into a REXML Document object (http://cvs.m17n.org/~akr/htree/)
• The HTML TIDY library can fix up most invalid HTML so that it can be parsed by a standard parser; it’s a C library with Ruby bindings; see http://tidy.sourceforge.net/ for the library, and http://rubyforge.org/projects/tidy for the bindings

11.6 Converting an XML Document into a Hash

Problem

When you parse an XML document with Document.new, you get a representation of the document as a complex data structure. You’d like to represent an XML document using simple, built-in Ruby data structures.

Solution

Use the XmlSimple library, found in the xml-simple gem. It parses an XML document into a hash. Consider an XML document like this one:

    xml = %{ Phyllo dough Ice cream }

Here’s how you parse it with XmlSimple:

    require 'rubygems'
    require 'xmlsimple'
    doc = XmlSimple.xml_in xml

And here’s what it looks like:

    require 'pp'
    pp doc
    # {"icecubetray"=>[{"cube2"=>[{}], "cube1"=>[{}]}],
    #  "food"=>["Phyllo dough", "Ice cream"],
    #  "scale"=>"celcius",
    #  "temp"=>"-12"}

Discussion

XmlSimple is a lightweight alternative to the Document class. Instead of exposing a tree of Element objects, it exposes a nested structure of Ruby hashes and arrays. There are no performance savings (XmlSimple actually builds a Document class behind the scenes and iterates over it, so it’s about half as fast as Document), but the resulting object is easy to use. XmlSimple also provides several tricks that can make a document more concise and navigable. The most useful trick is the KeyAttr one.
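The idea behind KeyAttr, folding a list of hashes into a single hash keyed by one attribute, can be sketched in plain Ruby. The fold_by helper below is hypothetical and is not part of XmlSimple; it only illustrates the reshaping, including what happens in this sketch when two entries share a key.

```ruby
# Hypothetical helper: folds an array of hashes into one hash keyed by the
# value of key_attr, dropping key_attr from each entry.
def fold_by(items, key_attr)
  items.each_with_object({}) do |item, folded|
    rest = item.reject { |key, _| key == key_attr }
    folded[item[key_attr]] = rest   # a repeated key overwrites the earlier entry
  end
end

items = [
  { 'name' => 'Phyllo dough', 'type' => 'food' },
  { 'name' => 'Ice cream',    'type' => 'food' },
  { 'name' => 'Ice cream',    'type' => 'dessert' }
]

folded = fold_by(items, 'name')
folded['Phyllo dough']['type']   # => "food"
folded.size                      # => 2
```

The folded form trades a linear search for a direct lookup, at the cost of losing any entries whose key value is not unique.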
Suppose you had a better-organized freezer than the one above, a freezer in which everything had its own name attribute:*

    xml = %{ }

* Okay, it’s not really better organized. In fact, it’s exactly the same. But it sure looks cooler!

You could parse this data with just a call to XmlSimple.xml_in, but you get a more concise representation by specifying the name attribute as a KeyAttr argument. Compare:

    parsed1 = XmlSimple.xml_in xml
    pp parsed1
    # {"scale"=>"celcius",
    #  "item"=>
    #   [{"name"=>"Phyllo dough", "type"=>"food"},
    #    {"name"=>"Ice cream", "type"=>"food"},
    #    {"name"=>"Ice cube tray",
    #     "type"=>"container",
    #     "item"=>
    #      [{"name"=>"Ice cube", "type"=>"food"},
    #       {"name"=>"Ice cube", "type"=>"food"}]}],
    #  "temp"=>"-12"}

    parsed2 = XmlSimple.xml_in(xml, 'KeyAttr' => 'name')
    pp parsed2
    # {"scale"=>"celcius",
    #  "item"=>
    #   {"Phyllo dough"=>{"type"=>"food"},
    #    "Ice cube tray"=>
    #     {"type"=>"container",
    #      "item"=>{"Ice cube"=>{"type"=>"food"}}},
    #    "Ice cream"=>{"type"=>"food"}},
    #  "temp"=>"-12"}

The second parsing is also easier to navigate:

    parsed1["item"].detect { |i| i['name'] == 'Phyllo dough' }['type'] # => "food"
    parsed2["item"]["Phyllo dough"]["type"] # => "food"

But notice that the second parsing represents the ice cube tray as containing only one ice cube. This is because both ice cubes have the same name. When two tags at the same level have the same KeyAttr, one overwrites the other in the hash.

You can modify the data structure with normal Ruby hash and array methods, then write it back out to XML with XmlSimple.xml_out:

    parsed1["item"] << {"name"=>"Curry leaves", "type"=>"spice"}
    parsed1["item"].delete_if { |i| i["name"] == "Ice cube tray" }
    puts XmlSimple.xml_out(parsed1, "RootName"=>"freezer")
    #
    #
    #
    #
    #

Be sure to specify a RootName argument when you call xml_out. When it parses a file, XmlSimple removes one level of indirection by throwing away the name of your document’s root element.
You can prevent this by using the KeepRoot argument in your original call to xml_in. You’ll need an extra hash lookup to navigate the resulting data structure, but you’ll retain the name of your root element.

    parsed3 = XmlSimple.xml_in(xml, 'KeepRoot'=>true)
    # Now there's no need to add an extra root element when writing back to XML.
    XmlSimple.xml_out(parsed3, 'RootName'=>nil)

One disadvantage of XmlSimple is that, since it puts elements into a hash, it replaces the order of the original document with the random-looking order of a Ruby hash. This is fine for a document listing the contents of a freezer, where order doesn’t matter, but it would give interesting results if you tried to use it on a web page.

Another disadvantage is that, since an element’s attributes and children are put into the same hash, you have no reliable way of telling one from the other. Indeed, attributes and subelements may even end up in a list together, as in this example:

    pp XmlSimple.xml_in(%{ Body of temporary worker who knew too much })
    # {"scale"=>"celcius",
    #  "temp"=>["-12", "Body of temporary worker who knew too much"]}

See Also

• The XmlSimple home page at http://www.maik-schmidt.de/xml-simple.html has much more information about the options you can pass to XmlSimple.xml_in

11.7 Validating an XML Document

Credit: Mauro Cicio

Problem

You want to check whether an XML document conforms to a certain schema or DTD.

Solution

Unfortunately, as of this writing there are no stable, pure Ruby libraries that do XML validation. You’ll need to install a Ruby binding to a C library. The easiest one to use is the Ruby binding to the GNOME libxml2 toolkit. (There are actually two Ruby bindings to libxml2, so don’t get confused: we’re referring to the one you get when you install the libxml-ruby gem.)

To validate a document against a DTD, create a Dtd object and pass it into Document#validate. To validate against an XML Schema, pass in a Schema object instead.
Consider the following DTD, for a cookbook like this one:

    require 'rubygems'
    require 'libxml'

    dtd = XML::Dtd.new(%{ })

Here’s an XML document that looks like it conforms to the DTD:

    open('cookbook.xml', 'w') do |f|
      f.write %{ A recipe A difficult/common problem A smart solution A deep solution Pointers }
    end

But does it really? We can tell for sure with Document#validate:

    document = XML::Document.file('cookbook.xml')
    document.validate(dtd) # => true

Here’s a Schema definition for the same document. We can validate the document against the schema by making it into a Schema object and passing that into Document#validate:

    schema = XML::Schema.from_string %{ }
    document.validate(schema) # => true

Discussion

Programs that use XML validation are more robust and less complicated than nonvalidating versions. Before starting work on a document, you can check whether or not it’s in the format you expect. Most services that accept XML as input don’t have forgiving parsers, so you must validate your document before submitting it or it might fail without you even noticing.

One of the most popular and complete XML libraries around is the GNOME Libxml2 library. Despite its name, it works fine outside the GNOME platform, and has been ported to many different OSes. The Ruby project libxml (http://libxml.rubyforge.org) is a Ruby wrapper around the GNOME Libxml2 library. The project is not yet in a mature state, but it’s very active and the validation features are definitely usable. Not only does libxml support validation and a complete range of XML manipulation techniques, it can also improve your program’s speed by an order of magnitude, since it’s written in C instead of REXML’s pure Ruby.

Don’t confuse the libxml project with the libxml library. The latter is part of the XML::Tools project. It binds against the GNOME Libxml2 library, but it doesn’t expose that library’s validation features.
If you try the example code above but can’t find the XML::Dtd or the XML::Schema classes, then you’ve got the wrong binding. If you installed the libxml-ruby package on Debian GNU/Linux, you’ve got the wrong one. You need the one you get by installing the libxml-ruby gem. Of course, you’ll need to have the actual GNOME libxml library installed as well.

See Also

• The Ruby libxml project page (http://www.rubyforge.org/projects/libxml)
• The other Ruby libxml binding (the one that doesn’t do validation) is part of the XML::Tools project (http://rubyforge.org/projects/xml-tools/); don’t confuse the two!
• The GNOME libxml project homepage (http://xmlsoft.org/)
• Refer to http://www.w3.org/XML for the difference between a DTD and a Schema

11.8 Substituting XML Entities

Problem

You’ve parsed a document that contains internal XML entities. You want to replace the entities in the document with their values.

Solution

To perform entity substitution on a specific text element, call its value method. If it’s the first text element of its parent, you can call text on the parent instead.

Here’s a simple document that defines and uses two entities in a single text node. We can replace those entities with their values without changing the document itself:

    require 'rexml/document'

    str = %{ ]> &product; v&version; is the most advanced astronomy product on the market. }
    doc = REXML::Document.new str

    doc.root.children[0].value
    # => "\n Stargaze v2.3 is the most advanced astronomy product on the market.\n"

    doc.root.text
    # => "\n Stargaze v2.3 is the most advanced astronomy product on the market.\n"

    doc.root.children[0].to_s
    # => "\n &product; v&version; is the most advanced astronomy product on the market.\n"

    doc.root.write
    #
    # &product; v&version; is the most advanced astronomy product on the market.
    #

Discussion

Internal XML entities are often used to factor out data that changes a lot, like dates or version numbers.
But REXML only provides a convenient way to perform substitution on a single text node. What if you want to perform substitutions throughout the entire document?

When you call Document#write to send a document to some IO object, it ends up calling Text#to_s on each text node. As seen in the Solution, this method presents a “normalized” view of the data, one where entities are displayed instead of having their values substituted in.

We could write our own version of Document#write that presents an “unnormalized” view of the document, one with entity values substituted in, but that would be a lot of work. We could hack Text#to_s to work more like Text#value, or hack Text#write to call the value method instead of to_s. But it’s less intrusive to do the entity replacement outside of the write method altogether. Here’s a class that wraps any IO object and performs entity replacement on all the text that comes through it:

    require 'delegate'
    require 'rexml/text'

    class EntitySubstituter < DelegateClass(IO)
      def initialize(io, document, filter=nil)
        @document = document
        @filter = filter
        super(io)
      end

      def <<(s)
        super(REXML::Text::unnormalize(s, @document.doctype, @filter))
      end
    end

    output = EntitySubstituter.new($stdout, doc)
    doc.write(output)
    #
    #
    # ]>
    #
    # Stargaze v2.3 is the most advanced astronomy product on the market.
    #

Because it processes the entire output of Document#write, this code will replace all entity references in the document. This includes any references found in attribute values, which may or may not be what you want.

If you create a Text object manually, or set the value of an existing object, REXML assumes that you’re giving it unnormalized text, and normalizes it. This can be problematic if your text contains strings that happen to be the values of entities:

    text_node = doc.root.children[0]
    text_node.value = "&product; v&version; has a catalogue of 2.3 " +
      "million celestial objects."
    doc.write
    #
    #
    # ]>
    # &product; v&version; has a catalogue of &version; million celestial objects.

To avoid this, you can create a “raw” text node:

    text_node.raw = true
    doc.write
    #
    #
    # ]>
    # &product; v&version; has a catalogue of 2.3 million celestial objects.

    text_node.value
    # => "Stargaze v2.3 has a catalogue of 2.3 million celestial objects."

    text_node.to_s
    # => "&product; v&version; has a catalogue of 2.3 million celestial objects."

In addition to entities you define, REXML automatically processes five named character entities: the ones for left and right angle brackets, single and double quotes, and the ampersand. Each is replaced with the corresponding ASCII character.

    str = %{ ]> &copy; &year; Komodo Dragon &amp; Bob Productions }
    doc = REXML::Document.new str
    text_node = doc.root.children[0]

    text_node.value
    # => "&copy; 2006 Komodo Dragon & Bob Productions"
    text_node.to_s
    # => "&copy; &year; Komodo Dragon &amp; Bob Productions"

“&copy;” is an HTML character entity representing the copyright symbol, but REXML doesn’t know that. It only knows about the five XML character entities. Also, REXML only knows about internal entities: ones whose values are defined within the same document that uses them. It won’t resolve external entities.

See Also

• The section “Text Nodes” of the REXML tutorial (http://www.germane-software.com/software/rexml/docs/tutorial.html#id2248004)

11.9 Creating and Modifying XML Documents

Problem

You want to modify an XML document, or create a new one from scratch.

Solution

To create an XML document from scratch, just start with an empty Document object.

    require 'rexml/document'
    doc = REXML::Document.new

To add a new element to an existing document, pass its name and any attributes into its parent’s add_element method. You don’t have to create the Element objects yourself.
    meeting = doc.add_element 'meeting'
    meeting_start = Time.local(2006, 10, 31, 13)
    meeting.add_element('time', { 'from' => meeting_start,
                                  'to' => meeting_start + 3600 })

    doc.children[0] # => ...
    doc.children[0].children[0] # => "