Software Evolution and Maintenance


SOFTWARE EVOLUTION AND MAINTENANCE
A Practitioner’s Approach

PRIYADARSHI TRIPATHY
KSHIRASAGAR NAIK

Copyright © 2015 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate.
Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services please contact our Customer Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993 or fax 317-572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print, however, may not be available in electronic format.

Library of Congress Cataloging-in-Publication Data
Tripathy, Priyadarshi, 1958–
Software evolution and maintenance : a practitioner’s approach / Priyadarshi Tripathy, Kshirasagar Naik.
pages cm
Includes index.
ISBN 978-0-470-60341-3 (cloth)
1. Software maintenance. I. Naik, Kshirasagar, 1959– II. Title.
QA76.76.S64T75 2015
005.1′6–dc23 2014033541

Printed in the United States of America.
10 9 8 7 6 5 4 3 2 1

To our parents
Kunjabihari and Surekha Tripathy
Sukru and Teva Naik

CONTENTS

Preface
List of Figures
List of Tables

1 Basic Concepts and Preliminaries
  1.1 Evolution Versus Maintenance
    1.1.1 Software Evolution
    1.1.2 Software Maintenance
  1.2 Software Evolution Models and Processes
  1.3 Reengineering
  1.4 Legacy Systems
  1.5 Impact Analysis
  1.6 Refactoring
  1.7 Program Comprehension
  1.8 Software Reuse
  1.9 Outline of the Book
  References
  Exercises

2 Taxonomy of Software Maintenance and Evolution
  2.1 General Idea
    2.1.1 Intention-Based Classification of Software Maintenance
    2.1.2 Activity-Based Classification of Software Maintenance
    2.1.3 Evidence-Based Classification of Software Maintenance
  2.2 Categories of Maintenance Concepts
    2.2.1 Maintained Product
    2.2.2 Maintenance Types
    2.2.3 Maintenance Organization Processes
    2.2.4 Peopleware
  2.3 Evolution of Software Systems
    2.3.1 SPE Taxonomy
    2.3.2 Laws of Software Evolution
    2.3.3 Empirical Studies
    2.3.4 Practical Implications of the Laws
    2.3.5 Evolution of FOSS Systems
  2.4 Maintenance of COTS-Based Systems
    2.4.1 Why Maintenance of CBS Is Difficult?
    2.4.2 Maintenance Activities for CBSs
    2.4.3 Design Properties of Component-Based Systems
  2.5 Summary
  Literature Review
  References
  Exercises

3 Evolution and Maintenance Models
  3.1 General Idea
  3.2 Reuse-Oriented Model
  3.3 The Staged Model for Closed Source Software
  3.4 The Staged Model for Free, Libre, Open Source Software
  3.5 Change Mini-Cycle Model
  3.6 IEEE/EIA Maintenance Process
  3.7 ISO/IEC 14764 Maintenance Process
  3.8 Software Configuration Management
    3.8.1 Brief History
    3.8.2 SCM Spectrum of Functionality
    3.8.3 SCM Process
  3.9 CR Workflow
  3.10 Summary
  Literature Review
  References
  Exercises

4 Reengineering
  4.1 General Idea
  4.2 Reengineering Concepts
  4.3 A General Model for Software Reengineering
    4.3.1 Types of Changes
    4.3.2 Software Reengineering Strategies
    4.3.3 Reengineering Variations
  4.4 Reengineering Process
    4.4.1 Reengineering Approaches
    4.4.2 Source Code Reengineering Reference Model
    4.4.3 Phase Reengineering Model
  4.5 Code Reverse Engineering
  4.6 Techniques Used for Reverse Engineering
    4.6.1 Lexical Analysis
    4.6.2 Syntactic Analysis
    4.6.3 Control Flow Analysis
    4.6.4 Data Flow Analysis
    4.6.5 Program Slicing
    4.6.6 Visualization
    4.6.7 Program Metrics
  4.7 Decompilation Versus Reverse Engineering
  4.8 Data Reverse Engineering
    4.8.1 Data Structure Extraction
    4.8.2 Data Structure Conceptualization
  4.9 Reverse Engineering Tools
  4.10 Summary
  Literature Review
  References
  Exercises

5 Legacy Information Systems
  5.1 General Idea
  5.2 Wrapping
    5.2.1 Types of Wrapping
    5.2.2 Levels of Encapsulation
    5.2.3 Constructing a Wrapper
    5.2.4 Adapting a Program for Wrapper
    5.2.5 Screen Scraping
  5.3 Migration
  5.4 Migration Planning
  5.5 Migration Methods
    5.5.1 Cold Turkey
    5.5.2 Database First
    5.5.3 Database Last
    5.5.4 Composite Database
    5.5.5 Chicken Little
    5.5.6 Butterfly
    5.5.7 Iterative
  5.6 Summary
  Literature Review
  References
  Exercises

6 Impact Analysis
  6.1 General Idea
  6.2 Impact Analysis Process
    6.2.1 Identifying the SIS
    6.2.2 Analysis of Traceability Graph
    6.2.3 Identifying the Candidate Impact Set
  6.3 Dependency-Based Impact Analysis
    6.3.1 Call Graph
    6.3.2 Program Dependency Graph
  6.4 Ripple Effect
    6.4.1 Computing Ripple Effect
  6.5 Change Propagation Model
    6.5.1 Recall and Precision of Change Propagation Heuristics
    6.5.2 Heuristics for Change Propagation
    6.5.3 Empirical Studies
  6.6 Summary
  Literature Review
  References
  Exercises

7 Refactoring
  7.1 General Idea
  7.2 Activities in a Refactoring Process
    7.2.1 Identify What to Refactor
    7.2.2 Determine Which Refactorings Should be Applied
    7.2.3 Ensure that Refactoring Preserves the Behavior of the Software
    7.2.4 Apply the Refactorings to the Chosen Entities
    7.2.5 Evaluate the Impacts of the Refactorings on Quality
    7.2.6 Maintain Consistency of Software Artifacts
  7.3 Formalisms for Refactoring
    7.3.1 Assertions
    7.3.2 Graph Transformation
    7.3.3 Software Metrics
  7.4 More Examples of Refactorings
  7.5 Initial Work on Software Restructuring
    7.5.1 Factors Influencing Software Structure
    7.5.2 Classification of Restructuring Approaches
    7.5.3 Restructuring Techniques
  7.6 Summary
  Literature Review
  References
  Exercises

8 Program Comprehension
  8.1 General Idea
  8.2 Basic Terms
    8.2.1 Goal of Code Cognition
    8.2.2 Knowledge
    8.2.3 Mental Model
    8.2.4 Understanding Code
  8.3 Cognition Models for Program Understanding
    8.3.1 Letovsky Model
    8.3.2 Shneiderman and Mayer Model
    8.3.3 Brooks Model
    8.3.4 Soloway, Adelson, and Ehrlich Model
    8.3.5 Pennington Model
    8.3.6 Integrated Metamodel
  8.4 Protocol Analysis
  8.5 Visualization for Comprehension
  8.6 Summary
  Literature Review
  References
  Exercises

9 Reuse and Domain Engineering
  9.1 General Idea
    9.1.1 Benefits of Reuse
    9.1.2 Reuse Models
    9.1.3 Factors Influencing Reuse
    9.1.4 Success Factors of Reuse
  9.2 Domain Engineering
    9.2.1 Draco
    9.2.2 DARE
    9.2.3 FAST
    9.2.4 FORM
    9.2.5 KobrA
    9.2.6 PLUS
    9.2.7 PuLSE
    9.2.8 Koala
    9.2.9 RSEB
  9.3 Reuse Capability
  9.4 Maturity Models
    9.4.1 Reuse Maturity Model
    9.4.2 Reuse Capability Model
    9.4.3 RiSE Maturity Model
  9.5 Economic Models of Software Reuse
    9.5.1 Cost Model of Gaffney and Durek
    9.5.2 Application System Cost Model of Gaffney and Cruickshank
    9.5.3 Business Model of Poulin and Caruso
  9.6 Summary
  Literature Review
  References
  Exercises

Glossary
Index

PREFACE

karmany eva dhikaras te; ma phalesu kadachana; ma karmaphalahetur bhur; ma te sango stv akarmani.
Your right is to work only; but never to the fruits thereof; may you not be motivated by the fruits of actions; nor let your attachment to be towards inaction.
-- Bhagavad Gita

We have been witnessing stellar growth of the global software industry for three decades. As this century progresses the industry is engaged in fixing defects and enhancing and adding new features to the existing software applications.
In fact, more resources are spent on software maintenance than on actual software development. The imbalance between software development and maintenance is opening up new business opportunities for software off-shoring companies. It is also generating much research interest in developing methods and tools for improving software evolution and maintenance (SEAM).

Twenty-five years ago, the software industry was much smaller, and academia used to offer a single, comprehensive course entitled Software Engineering to educate undergraduate students in the nuts and bolts of software development and maintenance. Although software maintenance has been a part of the classical software engineering literature for decades, the subject has not been widely incorporated into the mainstream undergraduate curriculum. A few universities have started offering an option in software engineering comprising four specialized courses, namely, Requirements Specification, Software Design, Software Testing and Quality Assurance, and Software Evolution and Maintenance. In addition, some universities have introduced full undergraduate and graduate degree programs in software engineering.

Our survey of the subject of software evolution and maintenance reveals that a large body of work exists in disparate form, including research papers, technical reports, and reports of working groups. Moreover, there are many excellent books focusing on specific aspects of a course in software maintenance and evolution. However, there is no single book that presents the materials in a comprehensive manner. The absence of a comprehensive textbook explaining most of the aspects of software evolution and maintenance creates several problems for instructors and students alike. For example, an instructor needs to refer to many sources to prepare lecture materials.
Consequently, preparation takes the instructor much time, and students do not have access to all those sources. Our goal is to introduce the students and the instructors to a set of well-rounded educational materials covering the fundamental developments in software evolution and common maintenance practices in the industry. We intend to provide students a single, comprehensive textbook covering most of the topics in evolution and maintenance in enough detail that it is easy to get a handle on SEAM without reading a number of books and articles on the subject. We have not tried to specifically address the associated research challenges. Instead, we have presented the evolution theory and practice as a broad stepping stone which will enable the students and practitioners to understand and develop maintenance practices for complex software systems.

We decided to write this book based on our teaching, research, and industrial experiences in software maintenance. For the past 20 years, Sagar has been teaching software engineering, including software testing and maintenance, on a regular basis; Piyu managed software quality assurance teams in industry for the maintenance of routers, switches, wireless data networks, storage networks, and intrusion prevention appliances. Our combined experience has helped us in selecting and structuring the contents of this book to make it suitable as a text.

WHO SHOULD READ THIS BOOK?

We have written this book to introduce students, researchers, and software professionals to the fundamental developments in evolution models and common maintenance practices for software. Undergraduate students in software engineering, computer science, and computer engineering will be introduced to the subject matter in a step-by-step manner. Practitioners too will benefit from the structured presentation and comprehensive nature of the materials. Graduate students can use it as a reference.
After reading the whole book, the reader will have gained a thorough understanding of the following topics:
• Laws of software evolution and the means to control them
• Evolution and maintenance models, including maintenance of commercial off-the-shelf systems
• Reengineering techniques and processes for migration of legacy information systems
• Impact analysis and change propagation techniques
• Program comprehension and refactoring
• Reuse and domain engineering models

Each chapter gives a clear understanding of a particular topic in software evolution by discussing the main ideas with examples. It starts by explaining the basic concepts about the topic, thereby ensuring a common base of understanding; next, it expands the presentation by drilling deeper into the important aspects.

HOW SHOULD THIS BOOK BE READ?

This book consists of several independent topics in SEAM glued together. Chapters 1, 2, and 3 provide a basic understanding of the subject matter. Therefore, the first three chapters must be read in order. Next, depending upon the interest of the reader, one can choose any chapter to study without any difficulty. However, we recommend that the reader study Chapters 4 and 5 together in that order. This is because Chapter 4 (“Reengineering”) introduces basic concepts of reengineering, reverse engineering, and data reverse engineering, whereas Chapter 5 (“Legacy Information Systems”) discusses the migration of a system after it is reengineered. Therefore, in our opinion, the ordering will facilitate easier understanding of the materials, especially for those who are new to software evolution.

Notes for instructors

The book can be used as a text in an introductory course in SEAM. It is desirable to cover all the chapters in an introductory course in SEAM.
When used as a recommended text in a software engineering course, the following selected portions can help students imbibe the essential concepts in software evolution and maintenance:
• Chapter 1: All the sections
• Chapter 2: Sections 2.1 and 2.3
• Chapter 3: Sections 3.1, 3.2, 3.3, 3.4, 3.5, and 3.8
• Chapter 4: Sections 4.1 and 4.2
• Chapter 5: Sections 5.1 and 5.2
• Chapter 6: All the sections
• Chapter 7: Sections 7.1 and 7.2
• Chapter 8: Sections 8.1 and 8.2
• Chapter 9: Section 9.1

Supplementary materials for instructors are available at: http://ece.uwaterloo.ca/~snaik/mybook2.html

ACKNOWLEDGMENTS

While preparing this book, we received invaluable support of different kinds from many people, including researchers, the publisher, our family members, our friends, and our colleagues. First, we thank all the researchers who have been shaping this field ever since programs were first written. Without their published work, this book would not have seen the light of day. Second, we thank our editors, namely, George Telecki, Michael Christian, and Whitney A. Lesch, who gave us much professional guidance and patiently answered our various questions. The first author, Piyu Tripathy, would like to thank his former colleagues at Cisco Systems, Airvana Inc., NEC Laboratories America Inc., and present colleagues at Knowledge Trust.

Finally, the support of our parents, parents-in-law, and spouses deserves a special mention. I, Piyu Tripathy, thank my dear wife Leena, who has taken many household and family duties off my hands to give me the time that I needed to write this book; I would like to thank my newly arrived daughter Inu for asking me inquisitive questions about this book, which helped me in writing this preface. I, Sagar Naik, thank my loving wife Alaka for her invaluable support. I also thank my charming daughters, Monisha and Sameeksha, and exciting son, Siddharth, for their understanding while I was writing this book.
Finally, I heartily acknowledge all the support that my elder brother Gajapati extended to me. We are very pleased that now we have more time for our families.

Priyadarshi (Piyu) Tripathy
Knowledge Trust
Bhubaneswar, India

Kshirasagar (Sagar) Naik
University of Waterloo
Waterloo, Canada

LIST OF FIGURES

2.1 Groups or clusters and their types
2.2 Decision tree types. From Reference 15. © 2001 John Wiley & Sons
2.3 Overview of concept categories affecting software maintenance
2.4 Inputs and outputs of software evolution. From Reference 26. © 1988 John Wiley & Sons
2.5 S-type programs
2.6 P-type programs
2.7 E-type programs
2.8 E-type programs with feedback. From Reference 33. © 2006 John Wiley & Sons
2.9 Onion model of FOSS development structure
2.10 Growth of the major subsystems (development releases only) of the Linux OS. From Reference 57. © 2000 IEEE
3.1 Traditional SDLC model. From Reference 1. © 1988 John Wiley & Sons
3.2 The quick fix model. From Reference 2. © 1990 IEEE
3.3 The iterative enhancement model. From Reference 2. © 1990 IEEE
3.4 The full reuse model. From Reference 2. © 1990 IEEE
3.5 The simple staged model for the CSS life cycle. From Reference 6. © 2000 IEEE
3.6 The versioned staged model for the CSS life cycle. From Reference 6. © 2000 IEEE
3.7 The staged model for the FLOSS system. From Reference 9. © 2007 ACM
3.8 The change mini-cycle. From Reference 12. © 2008 Springer
3.9 Seven phases of IEEE maintenance process. From Reference 26. © 2004 IEEE
3.10 Problem identification phase
3.11 Analysis phase
3.12 Design phase
3.13 Implementation phase
3.14 System test phase
3.15 Acceptance test phase
3.16 Delivery phase
3.17 ISO/IEC 14764 iterative maintenance process. From Reference 26. © 2004 IEEE
3.18 Process implementation activity
3.19 Problem and modification activity
3.20 Modification implementation activity
3.21 Maintenance review/acceptance activity
3.22 Migration activity
3.23 Retirement activity
3.24 Technical dimensions of SCM systems
3.25 An evolution of a file with two branches
3.26 A process for implementing SCM
3.27 State transition diagram of a CR
4.1 Levels of abstraction and refinement. From Reference 5. © 1992 IEEE
4.2 Conceptual basis for the reengineering process. From Reference 5. © 1992 IEEE
4.3 General model of software reengineering. From Reference 5. © 1992 IEEE
4.4 Horseshoe model of reengineering. From Reference 7. © 1998 IEEE
4.5 Conceptual basis for reengineering strategies. From Reference 5. © 1992 IEEE
4.6 Source code reengineering reference model. From Reference 15. © 1990 IEEE
4.7 The interface nomenclature. From Reference 15. © 1990 IEEE. “(N)-” represents the Nth layer
4.8 Software reengineering process phases. From Reference 14. © 1992 IEEE
4.9 Replacement strategies for recoding
4.10 Relationship between reengineering and reverse engineering. From Reference 6. © 1990 IEEE
4.11 A block of code to compute the sum and product of all the even integers in the range [0, N) for N ≥ 3
4.12 The backward slice of code obtained from Figure 4.11 by using the criterion S < [7]; sum >
4.13 The forward slice of code obtained from Figure 4.11 by using the criterion S < [3]; product >
4.14 Relationship between decompilation and traditional reengineering. From Reference 83. © 2007
4.15 General architecture of the DBRE methodology. From Reference 95. © 1997 IEEE
4.16 Basic structure of reverse engineering tools. From Reference 6. © 1990 IEEE
5.1 Forward wrapper. From Reference 10. © 2006 ACM
5.2 Backward wrapper. From Reference 10. © 2006 ACM
5.3 Levels of encapsulation. From Reference 11. © 1996 IEEE
5.4 Modules of a wrapping framework
5.5 Portfolio analysis chi-square chart
5.6 Database first approach. From Reference 19. © 1999 IEEE
5.7 Database last approach. From Reference 19. © 1999 IEEE
5.8 Composite database approach. From Reference 19. © 1999 IEEE
5.9 Application gateway. From Reference 19. © 1999 IEEE
5.10 Information system gateway. From Reference 19. © 1999 IEEE
5.11 Migrating TempStore in Butterfly methodology. From Reference 19. © 1999 IEEE
5.12 The iterative system architecture methodology during reengineering. From Reference 29. © 2003 IEEE
5.13 The iterative migration process. From Reference 29. © 2003 IEEE
6.1 Impact analysis process. From Reference 6. © 2008 IEEE
6.2 Traceability in software work products. From Reference 22. © 1991 IEEE
6.3 Underlying graph for maintenance. From Reference 22. © 1991 IEEE
6.4 Determine work product impact. From Reference 22. © 1991 IEEE
6.5 Simple directed graph of SLOs. From Reference 12. © 2002 IEEE
6.6 In-degree and out-degree of SLO1. From Reference 12. © 2002 IEEE
6.7 Example of a call graph. From Reference 26. © 2003 IEEE
6.8 Execution trace
6.9 Example program. From Reference 31. © 1990 ACM
6.10 Program dependency graph of the program in Figure 6.9
6.11 Dynamic program slice for the code in Figure 6.9, test case X = −1, with respect to variable Y
6.12 Intramodule and intermodule change propagation. From Reference 36. © 2001 John Wiley & Sons
6.13 Change propagation model. From Reference 10. © 2004 IEEE
6.14 Change propagation flow for a simple example. From Reference 10. © 2004 IEEE
6.15 Program
7.1 Class diagram of a local area network (LAN) simulator. From Reference 6. © 2007 Springer
7.2 Applications of two refactorings. From Reference 6. © 2007 Springer
7.3 An example of a soft-goal graph for maintainability, with one leaf node. From Reference 11. © 2002 IEEE
7.4 An example of a program graph. From Reference 13. © 2006 Elsevier
7.5 Program graph obtained after applying push-down method refactoring to the program graph of Figure 7.4. From Reference 13. © 2006 Elsevier
7.6 An example of a VRML diagram of two classes C1 and C2. Circles denote methods and squares denote attributes
7.7 Illustration of the push-down method refactoring: (a) the class diagram before refactoring; (b) the class diagram after refactoring
7.8 An example of parameterizing a method. There are four methods in (a), whereas there is one method in (b) with one parameter
7.9 Factors which can influence software structure. From Reference 2. © 1989 IEEE
7.10 Broad classification of approaches to software structuring
7.11 System sandwich approach to software restructuring. The arrows represent the flow of data and/or commands
7.12 Illustration of system level remodularization. Bullets represent low level entities. Dotted shapes represent modules. Arrows represent progression from one level to the next
7.13 Illustration of entity level remodularization. Bullets represent low level entities. Dotted shapes represent modules
7.14 Dendrogram representation of Figure 7.12
8.1 Gaining general knowledge and software-specific knowledge
8.2 Letovsky’s program comprehension model
8.3 Shneiderman and Mayer program comprehension model
8.4 An overview of Brooks comprehension model
8.5 Soloway, Adelson, and Ehrlich comprehension model
8.6 Pennington model
8.7 Integrated Metamodel. From Reference 1. © 1995 IEEE
9.1 Feedback between domain and application engineering
9.2 Reuse capability. From Reference 40. © 1993 IEEE

LIST OF TABLES

2.1 Evidence-Based 12 Mutually Exclusive Maintenance Types
2.2 Impact of the Types
2.3 Summary of Evidence-Based Types of Software Maintenance
2.4 Staged Model Maintenance Task
2.5 Laws of Software Evolution
2.6 System Data to be Used in Question 14
3.1 Template of a Maintenance Plan
3.2 Modification Request Task Steps
3.3 Option Task Steps
3.4 Documentation Task Steps
3.5 Review and Approval Task Steps
3.6 Migration Plan Task Steps
3.7 Operation and Training Task Steps
3.8 Retirement Plan Task Steps
3.9 Change Request Schema Field Summary
3.10 Engineering Change Document Information
4.1 Reengineering Process Variations
4.2 Tasks: Analysis and Planning Phase
4.3 Commonly Used Software Metrics
5.1 Common Quantifiable Benefit Metrics
5.2 Chicken Little Migration Approach
5.3 Phases of Butterfly Methodology
5.4 Migration Activities in Phase 1
5.5 Migration Activities in Phase 2
5.6 Migration Activities in Phase 3
5.7 Migration Activities in Phase 4
5.8 Migration Activities in Phase 5
6.1 Relationships Represented by a Connectivity Matrix
6.2 Relationships Represented by a Reachability Matrix
6.3 Relationship with Distance Indicators
6.4 Laws of Software Evolution
6.5 Performance of Change Propagation Heuristics for the Five Software Systems
8.1 Tasks and Activities Requiring Code Understanding
8.2 Code Cognition Models
9.1 Reuse Maturity Model
9.2 Critical Success Factors
9.3 RiSE Maturity Model Levels: Organizational Factors [42]
9.4 RiSE Maturity Model Levels: Business Factors [42]
9.5 RiSE Maturity Model Levels: Technological Factors [42]
9.6 RiSE Maturity Model Levels: Processes Factors [42]
9.7 Relative Costs of Development Activities
9.8 Relative Reuse Cost (b)

1 BASIC CONCEPTS AND PRELIMINARIES

Another flaw in the human character is that everybody wants to build and nobody wants to do maintenance.
-- Kurt Vonnegut, Jr.

Software Evolution and Maintenance: A Practitioner’s Approach, First Edition. Priyadarshi Tripathy and Kshirasagar Naik. © 2015 John Wiley & Sons, Inc. Published 2015 by John Wiley & Sons, Inc.

1.1 EVOLUTION VERSUS MAINTENANCE

In 1965, Mark Halpern introduced the concept of software evolution to describe the growth characteristics of software [1]. Later, the term “evolution” in the context of application software was widely used. The concept further attracted the attention of researchers after Belady and Lehman published a set of principles determining evolution of software systems [2, 3]. The principles were very general in nature. In his landmark article entitled “The Maintenance ‘Iceberg’,” R. G. Canning compared software maintenance to an “iceberg” to emphasize the fact that software developers and maintenance personnel face a large number of problems [4]. A few years later, in 1976, Swanson introduced the term “maintenance” by grouping the maintenance activities into three basic categories: corrective, adaptive, and perfective [5]. In the early 1970s, IBM used the terms “maintenance engineers” or “maintainers” for those who made intentional modifications to running code that they had not developed themselves. The main reason for using nondevelopment personnel in maintenance work was to free up the software development engineers or programmers from support activities [6]. In this book, we will use maintainer, maintenance engineer, developer, and programmer interchangeably.

Bennett and Rajlich [7] researched the term “software evolution” and found that there is no widely accepted definition of the term. In addition, some researchers and practitioners used the phrases “software evolution” and “software maintenance” interchangeably.
However, key semantic differences exist between the two. The two are distinguished as follows:
• The concept of software maintenance means preventing software from failing to deliver the intended functionalities by means of bug fixing.
• The concept of software evolution means a continual change from a lesser, simpler, or worse state to a higher or better state ([8], p. 1).

Bennett and Xu [9] made further distinctions between the two as follows:
• All support activities carried out after delivery of software are put under the category of maintenance.
• All activities carried out to effect changes in requirements are put under the category of evolution.

In general, maintenance and evolution are differentiated as follows [10]:
• Maintenance of software systems primarily means fixing bugs while preserving their functionalities. Maintenance tasks are largely planned; bug fixing, for example, must be done and is a planned activity. Unplanned activities also become necessary, for example, when a new usage of the system emerges. Generally, maintenance does not involve making major changes to the architecture of the system. In other words, maintenance means keeping an installed system running with no change to its design [11].
• Evolution of software systems means creating new but related designs from existing ones. The objectives include supporting new functionalities, making the system perform better, and making the system run on a different operating system. As time passes, the stakeholders develop more knowledge about the system, so the system evolves in several ways: not only do new usages emerge, but the users also become more knowledgeable. As Mehdi Jazayeri observed: “Over time what evolves is not the software but our knowledge about a particular type of software” ([12], p. 3).

While we are on the topic of maintenance, it is useful to glance at the maintenance of physical systems.
Maintenance of physical systems often requires replacing broken and worn-out parts. For example, owners replace the worn-out tires and broken lamps of their cars. Similarly, a malfunctioning memory card is replaced with a good one. On the other hand, software maintenance is different than hardware maintenance. In hardware maintenance, a system or a component is returned to its original good state. On the other hand, in software maintenance, a software system is moved from www.it-ebooks.info EVOLUTION VERSUS MAINTENANCE 3 its original erroneous state to an expected good state [13]. Software maintenance comprises all activities associated with the process of changing software for the purposes of: r fixing bugs; and/orr improving the design of the system so that future changes to the system are less expensive. 1.1.1 Software Evolution Although the phrase “software evolution” had been used previously by other researchers, fundamental work in the field of software evolution was done by Lehman and his collaborators. Based on empirical studies [2, 14], Lehman and his collabo- rators formulated some observations and they introduced them as laws of evolution. The “laws” themselves have “evolved” from three in 1974 to eight by 1997 [15, 16]. Those laws are the results of studies of the evolution of large-scale proprietary or closed source software (CSS) systems. The laws concern a category of software systems called E-type systems. The eight laws are briefly explained as follows: 1. Continuing change. Unless a system is continually modified to satisfy emerging needs of users, the system becomes increasingly less useful. 2. Increasing complexity. Unless additional work is done to explicitly reduce the complexity of a system, the system will become increasingly more complex due to maintenance-related changes. 3. Self-regulation. 
The evolution process is self-regulating in the sense that the measures of products and processes that are produced during the evolution follow close to normal distributions.
4. Conservation of organizational stability. The average effective global activity rate on an evolving system is almost constant throughout the lifetime of the system. In other words, the average amount of additional effort needed to produce a new release is almost the same.
5. Conservation of familiarity. As a system evolves, all kinds of personnel, for example, developers and users, must gain a desired level of understanding of the system’s content and behavior to realize satisfactory evolution. A large incremental growth in a release reduces that understanding. Therefore, the average incremental growth in an evolving system remains almost the same.
6. Continuing growth. As time passes, the functional content of a system is continually increased to satisfy user needs.
7. Declining quality. Unless the design of a system is diligently fine-tuned and adapted to new operational environments, the system’s qualities will be perceived as declining over the lifetime of the system.
8. Feedback system. The system’s evolution process involves multi-loop, multi-agent, multi-level feedback among different kinds of activities. Developers must recognize those complex interactions in order to continually evolve an existing system to deliver more functionalities and higher levels of quality.

Circa 1988, Pirzada [17] was the first to study the differences between the evolution of the Unix operating system developed by Bell Laboratories and the systems studied by Lehman and Belady [18]. Pirzada argued that the differences in academic and industrial software development could lead to differences in the evolutionary pattern.
Circa 2000, after a gap of 12 years, an empirical study of the evolution of free and open source software (FOSS) was conducted by Godfrey and Tu [19]. The authors provided the trend of growth of the popular FOSS operating system Linux during 1994–1999. They showed the growth rate to be super-linear, that is, greater than linear. Robles et al. [20] later replicated the study of Godfrey and Tu and concluded that Lehman’s laws Nos. 3, 4, and 5 do not hold for large-scale FOSS systems such as Linux. These studies reveal the changing nature of both software and software development processes. Lehman’s studies mostly examined proprietary, monolithic systems developed by a team of developers within a company, whereas FOSS systems and their development follow a different evolution paradigm.

Remark: FOSS is available to all with relaxed or nonexistent copyrights. FOSS is commonly used as a synonym for free software even though “free” and “open” have different semantics. The term “free” means the freedom to modify and redistribute the system under the terms of the original agreement, while “open” means accessibility to the source code.

1.1.2 Software Maintenance

More likely than not, there are defects in delivered software applications, because defect removal and quality control processes are not perfect. Therefore, maintenance is needed to repair those defects in released software. E. Burton Swanson [5] initially defined three categories of software maintenance activities, namely, corrective, adaptive, and perfective. Those definitions were later incorporated into the standard Software Engineering – Software Life Cycle Processes – Maintenance [21], which introduced a fourth category called preventive maintenance. The reader may note that some researchers and developers view preventive maintenance as a subset of perfective maintenance.
Swanson’s classification of maintenance activities is intention based because the maintenance activities reflect the intents of the developer to carry out specific maintenance tasks on the system. In the intention-based classification of maintenance activities, the intention of an activity depends upon the motivations for the change. An alternative way of classifying modifications to software is to simply categorize the modifications in terms of activities performed [22]:

• Activities to make corrections. If there are discrepancies between the expected behavior of a system and the actual behavior, then some activities are performed to eliminate or reduce the discrepancies.
• Activities to make enhancements. A number of activities are performed to implement a change to the system, thereby changing the behavior or implementation of the system. This category of activities is further refined into three subcategories:
  – enhancements that modify existing requirements;
  – enhancements that create new requirements; and
  – enhancements that modify the implementation without changing the requirements.

Chapin et al. [6] expanded the typology of Swanson into an evidence-based classification of 12 different types of software maintenance: training, consultive, evaluative, reformative, updative, groomative, preventive, performance, adaptive, reductive, corrective, and enhancive. The three objectives for classifying the types of software maintenance are as follows:

• It is more informative to classify maintenance tasks based on objective evidence that can be verified with observations and/or comparisons of software before and after modifications.
This does not require accessing the knowledge of the personnel who originally developed the system.
• The granularity of the proposed classification can be made to accurately reflect the actual mix of activities observed in the practice of software maintenance and evolution.
• The classification groups are independent of hardware platform, operating system choice, design methodology, implementation language, organizational practices, and the availability of the personnel doing the original development.

Maintenance of COTS-Based Systems

Many present-day software systems are built from components previously developed for other systems or to be reused in many systems. In this approach, new components are developed by combining commercial off-the-shelf (COTS) components, custom-built (in-house) components, and open source software components. The components are obtained from a variety of sources and maintained by different vendors, possibly from different countries [23]. The motivations for performing software maintenance are the same for both component-based software systems (CBS) and custom-built software systems. However, there are noticeable differences between the activities in the two approaches. The major sources of the differences are as follows [24, 25]:

• Skills of system maintenance teams. Maintenance of CBS requires specialized skills to monitor and integrate COTS products. Those skills are different from the skills required to perform the more traditional maintenance functions of analyzing and modifying source code developed in-house. Maintainers view a CBS as a group of black-box components, and not as a compiled set of source code modules, thereby requiring a different set of maintenance skills. The differences in skills are neither pros nor cons, but it is important that the differences are taken into consideration for planning, staffing, and training.
• Infrastructure and organization.
Running a support group for in-house products is necessary to manage a large product. This additional cost may be shared with other projects.
• COTS maintenance cost. This cost includes the costs of purchasing components, licensing components, upgrading components, and training maintenance personnel. From the perspective of a system’s life cycle, much cost is shifted from in-house development to license and maintenance fees, thereby increasing the overall maintenance cost.
• Larger user community. COTS users are part of a broad community of users, and the community of users can be considered as a resource, which is a positive factor. However, being part of a community means having less control over changes and improvements to COTS products.
• Modernization. In general, vendors of COTS components keep pace with changing technology and continually update the components. As a result, the system does not become obsolete. However, the flip side is that the costs and risks of making changes keep increasing even if the application does not require any changes. In general, control over the evolution and maintenance of significant portions of the system is relinquished to third-party COTS developers. Those third-party developers may be motivated to pursue their own commercial self-interest. In addition, the third-party vendors control not only the nature of maintenance to be done on the products, but also when it is to be done. Therefore, reliance on third-party products impacts both the type and timing of the maintenance performed by COTS-based developers. In a nutshell, upgrades to products are, unfortunately, necessitated by technology and vendor economics.
• Split maintenance function. A COTS product is maintained by its vendor, whereas the overall system that uses the COTS product is maintained by the system’s host organization. As a result, multiple, independent maintenance teams exist.
The advantage of COTS-based development is that the system maintainers receive additional support from the COTS vendors. On the other hand, the drawback of the approach is that the different COTS pieces need tighter coordination, and the product vendors may stray in all directions with respect to functionality and standards.
• More complex planning. If a system depends upon multiple technologies and COTS products, the unpredictability and risk of change become high, and planning becomes complicated because coordination among a large number of vendors is more difficult.

1.2 SOFTWARE EVOLUTION MODELS AND PROCESSES

There is much confusion about the terms “software maintenance” and “software evolution.” The confusion is partly due to a lack of attention paid to models for sustaining software systems and partly due to considering maintenance to be just another activity in software development. For example, consider the classical Waterfall model for software development proposed by Winston Royce circa 1970 [26]. The final phase of the Waterfall model is known as maintenance, which implies that software maintenance is a part of software development. In this regard it is worth quoting Norman Schneidewind [27]: “The traditional view of the software life cycle has done a disservice to maintenance by depicting it solely as a single step at the end of the cycle” (p. 304). Therefore, software maintenance should have its own software maintenance life cycle (SMLC) model [28]. A number of SMLC models with some variations are available in the literature [8, 29–35]. Three common features of the SMLC models found in the literature are:

• understanding the code;
• modifying the code; and
• revalidating the code.

Other models view software development as an iterative process or are based on the idea of a change mini-cycle [7, 36–39], as explained in the following:

• Iterative models.
The iterative models share the idea that a complete set of requirements for a system cannot be completely understood, or that the developers do not know how to build the full system. Therefore, systems are constructed in builds, each of which is a refinement of the requirements of the previous build. A build is refined by considering feedback from users [40]. One may note that maintenance and evolution activities do not exist as distinct phases. Rather, they are closely intertwined.
• Change mini-cycle models. First proposed by Yau et al. [36] in the late 1970s, these models were recently revisited by Bennett et al. [7] and Mens [41], among others. These models consist of five major phases: change request, analyze and plan change, implement change, verify and validate, and documentation change. In this process model, several important activities were identified, such as program comprehension, impact analysis, refactoring, and change propagation.

A different kind of software evolution model, called the staged model of maintenance and evolution, has been proposed by Rajlich and Bennett [42]. The model is descriptive in nature, and its primary objective is to improve the understanding of how long-lived software evolves. The model considers four distinct, sequential stages of the lifetime of a system, as explained below:

1. Initial development. When the initial version of the system is produced, detailed knowledge about the system is fresh. Before delivery of the system, it undergoes many changes. Eventually, a system architecture emerges and soon it stabilizes.
2. Evolution. After the initial stability, it is easy to perform simple changes to the system. Significant changes involve higher cost and higher risk. In the period immediately following the initial delivery, knowledge about the system is still almost fresh in the minds of the developers.
It is possible that the development team as a whole does not exist, because many original developers have taken up new responsibilities in the organization and some might have left the organization. In general, many systems spend most of their lifespan in this stage, because the systems continue to be of importance to the organizations.
3. Servicing. When the knowledge about the system has significantly decreased, the developers mainly focus on maintenance tasks, such as fixing bugs, whereas architectural changes are rarely effected. The developers do not consider the system to be a key asset. In this stage, the effects of changes are very difficult to predict. Moreover, the costs and risks of making changes are very significant.
4. Phaseout. When even minimal servicing of a system is not an option, the system enters its very final stage. The organization decides to replace the system for various reasons: (i) it is too expensive to maintain the system; or (ii) there is a newer solution available. Therefore, the organization develops an exit strategy to move from the current system to a new system. Moving from an existing, difficult-to-maintain system to a modern solution has its own challenges involving wrapping and data migration. Once the new system has been running satisfactorily, sometimes in parallel with the old system, the old system is finally completely shut down.

Software Maintenance Standards

A well-defined process for software maintenance can be observed and measured, and thus improved. In addition, adoption of processes allows the dissemination of effective work practices more quickly than gaining personal experience. Process-centric software maintenance is more of an engineering activity, with predictable time and effort constraints, and less of an art. Therefore, software maintenance standards have been formulated by ISO and IEEE.
The maintenance standard document from ISO is called ISO/IEC 14764 [21], which is a part of the standard document ISO/IEC 12207 [43] for life cycle processes. The maintenance standard document from IEEE is called IEEE/EIA 1219 [44]. Both standards describe processes for managing and executing maintenance activities. The IEEE/EIA 1219 standard organizes the maintenance process in seven phases: problem identification, analysis, design, implementation, system test, acceptance test, and delivery. As a quick summary, the standard identifies the different phases and the sequence of their execution. Next, for each phase, the standard identifies the input and output deliverables, the supporting processes and the related activities, and a set of evaluation metrics. Both standards, namely ISO/IEC 14764 and IEEE/EIA 1219, use the same terminology to describe software maintenance, with slight differences in how they depict it. An iterative process has been described in ISO/IEC 14764 to manage and execute maintenance activities. The activities comprising the maintenance process are:

• process implementation;
• problem and modification analysis;
• modification implementation;
• maintenance review/acceptance;
• migration; and
• retirement.

Each of the aforementioned activities is made up of tasks described with specific inputs, outputs, and actions.

Software Configuration Management

Configuration management (CM) is the discipline of managing changes in large systems. The goal of CM is to manage and control the various extensions, adaptations, and corrections that are applied to a system over its lifetime. It handles the control of all products/configuration items and changes to those items. Software configuration management (SCM) is configuration management applied to software systems. SCM is the means by which the process of software evolution is managed.
SCM has been defined in the IEEE 1042 standard [45] as “software configuration management (SCM) is the discipline of managing and controlling change in the evolution of software systems.” SCM provides a framework for managing changes in a controlled manner. The purpose of SCM is to reduce communication errors among personnel working on different aspects of the software project by providing a central repository of information about the project and a set of agreed upon procedures for coping with changes. It ensures that the released software is not contaminated by uncontrolled or unapproved changes. Early SCM tools had limited capabilities in terms of functionality and applicability. However, modern SCM systems provide advanced capabilities through which many different artifacts are managed. For example, modern SCM systems support their users in building an executable program out of its versioned source files. Moreover, it must be possible to regenerate old versions of the software system. In general, an SCM system has four different elements, each element addressing a distinct user need as follows [46, 47]:

• Identification of software configurations. This includes the definitions of the different artifacts, their baselines or milestones, and the changes to the artifacts.
• Control of software configurations. This element is about controlling the ways artifacts or configurations are altered with the necessary technical and administrative support.
• Auditing software configurations. This element is about making the current status of the software system in its life cycle visible to management and determining whether or not the baselines meet their requirements.
• Accounting software configuration status. This element is about providing an administrative history of how the software system has been altered, by recording the activities necessitated by the other three SCM elements.
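
The four SCM elements can be made concrete with a small sketch. The following Python fragment is only an illustration under simplifying assumptions: the class `TinySCM`, its method names, and the in-memory storage are hypothetical and do not correspond to any real SCM tool or to the cited standards. It shows identification (versioned items and labeled baselines), control (only approved changes are accepted), auditing (checking a baseline against a list of required items), status accounting (a history log), and the regeneration of an old version from a baseline.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigItem:
    """One versioned artifact; versions[i] holds the content of version i + 1."""
    name: str
    versions: list = field(default_factory=list)


class TinySCM:
    """Toy in-memory illustration of the four SCM elements (hypothetical API)."""

    def __init__(self):
        self.items = {}      # identification: item name -> ConfigItem
        self.baselines = {}  # identification: label -> {item name: version number}
        self.log = []        # status accounting: history of recorded events

    def commit(self, name, content, approved):
        """Control: only approved changes may create a new version."""
        if not approved:
            self.log.append(("rejected", name))
            raise PermissionError(f"unapproved change to {name}")
        item = self.items.setdefault(name, ConfigItem(name))
        item.versions.append(content)
        self.log.append(("commit", name, len(item.versions)))
        return len(item.versions)

    def baseline(self, label):
        """Identification: freeze the latest version of every item under a label."""
        self.baselines[label] = {n: len(i.versions) for n, i in self.items.items()}
        self.log.append(("baseline", label))

    def checkout(self, label):
        """Regenerate the exact contents recorded in an old baseline."""
        frozen = self.baselines[label]
        return {n: self.items[n].versions[v - 1] for n, v in frozen.items()}

    def audit(self, label, required):
        """Auditing: list the required items that the baseline does not contain."""
        return sorted(set(required) - set(self.baselines[label]))


scm = TinySCM()
scm.commit("main.c", "v1 source", approved=True)
scm.baseline("R1.0")
scm.commit("main.c", "v2 source", approved=True)
old_release = scm.checkout("R1.0")  # still the contents frozen in R1.0
```

A real SCM system would add persistent storage, branching, merging, and access control; the sketch only shows how the four elements relate to one another.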
1.3 REENGINEERING

Hongji Yang and Martin Ward [48] defined software evolution as “ … the process of conducting continuous software reengineering” (p. 23). Reengineering implies a single cycle of taking an existing system and generating from it a new system, whereas evolution can go on forever. In other words, to a large extent, software evolution can be seen as repeated software reengineering. Reengineering is done to transform an existing “lesser or simpler” system into a new “better” system. Chikofsky and Cross II [49] define reengineering as “the examination and alteration of a subject system to reconstitute it in a new form and the subsequent implementation of the new form.” Therefore, reengineering includes some kind of reverse engineering activities to design an abstract view of a given system. The new abstract view is restructured, and forward engineering activities are performed to implement the system in its new form. The aforementioned process is captured by Jacobson and Lindström [50] with the following expression:

Reengineering = Reverse engineering + Δ + Forward engineering.

Let us analyze the right-hand portion of the above equation. The first element, “reverse engineering,” is the activity of defining a more abstract and easier to understand representation of the system. For example, the input to the reverse engineering process is the source code of the system, and the output is the system architecture. The core of reverse engineering is the process of examination of the system, and it is not a process of change. Therefore, it does not involve changing the software under examination. The third element, “forward engineering,” is the traditional process of moving from a high-level abstraction and logical, implementation-independent design to the physical implementation of the system. The second element, “Δ,” captures alterations performed to the original system.
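
The three terms of the expression can be illustrated on a very small scale. The sketch below is an assumption-laden toy, not the method of the cited authors: it uses Python's standard `ast` module (with `ast.unparse`, available from Python 3.9) as a stand-in for reverse and forward engineering tools, a call map as the “more abstract representation,” and a simple rename as the Δ. The sample program and the names `reverse_engineer`, `Rename`, and `forward_engineer` are hypothetical.

```python
import ast

# Hypothetical legacy program used only for illustration.
LEGACY_SOURCE = '''
def calc(a, b):
    return a + b

def report(x, y):
    print(calc(x, y))
'''


def reverse_engineer(source):
    """Reverse engineering: examine the code and derive a call map; nothing is changed."""
    tree = ast.parse(source)
    call_map = {}
    for func in (n for n in tree.body if isinstance(n, ast.FunctionDef)):
        callees = {
            node.func.id
            for node in ast.walk(func)
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
        }
        call_map[func.name] = sorted(callees)
    return tree, call_map


class Rename(ast.NodeTransformer):
    """The delta: an alteration applied to the intermediate representation.

    Simplification: every plain name equal to `old` is renamed, so a local
    variable with the same name would be renamed as well.
    """

    def __init__(self, old, new):
        self.old, self.new = old, new

    def visit_FunctionDef(self, node):
        self.generic_visit(node)  # rename references inside the body first
        if node.name == self.old:
            node.name = self.new
        return node

    def visit_Name(self, node):
        if node.id == self.old:
            node.id = self.new
        return node


def forward_engineer(tree):
    """Forward engineering: regenerate concrete source from the altered representation."""
    return ast.unparse(ast.fix_missing_locations(tree))


tree, call_map = reverse_engineer(LEGACY_SOURCE)         # reverse engineering
new_tree = Rename("calc", "add_amounts").visit(tree)    # delta
new_source = forward_engineer(new_tree)                  # forward engineering
```

In practice, the abstraction recovered by reverse engineering is usually far richer than a call map (for example, an architecture or a design model), and the Δ is a restructuring of that abstraction rather than a single rename.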
While performing reverse engineering on a large system, tools and methodologies are generally not stable. Therefore, a high-level organizational paradigm enables repetitions of processes so that maintenance engineers learn about the system. Benedusi et al. [51] have proposed a repeatable paradigm, called Goals/Models/Tools, that describes reverse engineering in three successive stages, namely, Goals, Models, and Tools.

Goals. In this phase, one analyzes the motivations for setting up the process to identify the information needs of the process and the abstractions to be produced.
Models. In this phase, one analyzes the abstractions to construct representation models that capture the information needed for their production.
Tools. In this phase, software tools are defined, acquired, enhanced, integrated, or constructed to: (i) execute the Models phase and (ii) transform the program models into the abstractions identified in the Goals phase.

It is important to note that fact-finding and information gathering from the source code are keys to the Goals/Models/Tools paradigm. In order to extract information that is not explicitly available in source code, automated analysis techniques, such as lexical analysis, syntactic analysis, control flow analysis, data flow analysis, and program slicing, are used to facilitate reverse engineering.

The increased use of data mining techniques in support systems has given rise to an interest in data reverse engineering (DRE) technology. DRE tackles the question of what information is stored and how this information can be used in a different context. DRE is defined by Peter Aiken as “the use of structured techniques to reconstitute the data assets of an existing system” [52]. The two vital aspects of the DRE process are to: (i) recover data assets that are useful or valuable and (ii) reconstitute the recovered data assets to make them more useful.
Therefore, DRE can be regarded as adding value to the existing data assets, making it easier for organizations to conduct business efficiently and effectively.

1.4 LEGACY SYSTEMS

A legacy software system is an old program that continues to be used because it still meets the users’ needs, in spite of the availability of newer technology or more efficient methods of performing the task. More often than not, a legacy system includes outdated procedures or terminology, and it is very difficult for new developers to understand the system. Organizations continue to use legacy systems because those are vital to them and the systems significantly resist modification and evolution to meet new and constantly changing business requirements [53, 54]. A legacy system falls in the Phaseout stage of the software evolution model of Rajlich and Bennett described earlier. Organizations in business for a long time generally possess a sizable number of legacy systems. To manage legacy systems, a number of options are available. Some commonly chosen options are as follows [55, 56]:

• Freeze. An organization decides to perform no further work on a legacy system. This implies that either the services of the system are no longer needed or a new system completely replaces a legacy system.
• Outsource. An organization may decide that supporting legacy software, or for that matter any software, is not its core business. As an alternative, it may outsource the support service to a specialist organization.
• Carry on maintenance. In this approach, the organization continues to maintain the system for another period of time, despite all the difficulties in doing so.
• Discard and redevelop. In this approach, the application is redeveloped from scratch, using new hardware and software platforms, new software architecture and databases, and modern tools. When the new system is available, the legacy system is simply discarded.
• Wrap.
In this approach, a legacy system is wrapped with a new software layer, thereby hiding the unwanted complexity of the existing data, individual programs, application systems, and interfaces. The old system performs the actual computations, but users interact with the system in better ways. The notion of “wrapper” was first introduced by Dietrich et al. at IBM in the late 1980s [57]. Wrapping is a black-box reengineering task, because only the legacy interface is analyzed while ignoring the system’s internals. A wrapper does not directly modify the source code, but it indirectly modifies the software functionality of the legacy component. Wrapping lets organizations reuse well-tested components that they trust and leverage their massive investments in the system. As a result, the lifetime of the legacy system is increased. Many researchers have proposed techniques for wrapping legacy systems [58–60].
• Migrate. In this approach, an operational legacy system is moved to a new hardware and/or software platform, while still retaining the legacy system’s functionality. The idea is to minimize any disruption to the existing business environment. Migration is the best alternative if wrapping is unsuitable and redevelopment is not acceptable due to substantial risk. Migration involves changes to the legacy system, including restructuring the system and enhancing the functionality of the system. However, it retains the basic functionality of the existing system without having to completely redevelop it.

Migration projects require careful planning for smooth execution. Harry M. Sneed [61] suggested five steps for a good plan: project justification, portfolio analysis, cost estimation, cost-benefit analysis, and contracting. Project justification is the first step in any planning. Justifying the project requires analysis of the existing products, the maintenance process, and the business value of the applications.
Portfolio analysis prioritizes applications to be reengineered according to their business value and technical quality. Cost estimation gives us an idea about the cost of the migration project. Cost-benefit analysis tells us the costs of the migration project and the expected returns. Contracting entails the identification of tasks and the distribution of efforts. Given the scale, complexity, and risk of failure of migration projects, a well-defined, easily implementable, detailed approach is essential to their success. Several migration approaches can be found in the literature: Cold Turkey, Database First, Database Last, Composite Database, Chicken Little, Butterfly, and Iterative [62–64].

1.5 IMPACT ANALYSIS

Impact analysis is the task of identifying portions of the software that can potentially be affected if a proposed change to the system is effected. The outcome of impact analysis can be used when planning for changes, making changes, and tracking the effects of changes in order to localize the sources of new faults. Impact analysis techniques can be categorized into two classes as follows [65]:

• Traceability analysis. In this approach, the high-level artifacts, such as requirements, design, code, and test cases related to the feature to be changed, are identified. A model of associations among artifacts, such that each artifact links to other artifacts, is constructed. This helps in locating the corresponding portions of the design, code, and test cases that need to be maintained.
• Dependency analysis. Dependency analysis attempts to assess the effects of a change on the semantic dependencies between program entities. This is achieved by identifying the syntactic dependencies that may signal the presence of such semantic dependencies [66]. The two dependency-based impact analysis techniques are [67]: call graph-based analysis and dependency graph-based analysis.
Dependency analysis is also known as source code analysis.

The following two additional notions are found to be keys to understanding impact analysis:

• Ripple effect analysis. Ripple effect analysis emphasizes tracing the repercussions in source code when the code is changed. It measures the impact of a change to a particular module on the rest of the program [68]. Impact can be stated in terms of the problems being created for the rest of the program because of the change. Analysis of ripple effect can provide information regarding what changes are occurring and where they are occurring. Measurement of ripple effect can provide knowledge about the system as a whole through its evolution: (i) the amount of increase or decrease of its complexity since the previous version; (ii) the levels of complexity of individual parts of a system in relation to other parts of the system; and (iii) the effect that a new module has on the complexity of a system as a whole when it is added.
• Change propagation. Change propagation activities ensure that a change made in one component is propagated properly throughout the entire system [69–71]. Misunderstanding, lack of experience, and unexpected dependencies are some reasons for failing to propagate changes throughout the development and maintenance cycles of source code. If a change is not propagated correctly, the project risks the introduction of new interface defects [72].

1.6 REFACTORING

Refactoring means performing changes to the structure of software to make it easier to comprehend and cheaper to change subsequently, without changing the observable behavior of the system. A similar idea for non-object-oriented systems is called restructuring. Refactoring is achieved through removal of duplicate code, simplification of code, and moving code to a different class, among others. Without continual refactoring, the internal structure of software will eventually deform beyond comprehension, due to periodic maintenance.
Therefore, regular refactoring helps the system to retain its basic structure [73]. In an agile software methodology, such as eXtreme Programming (XP), refactoring is continuously applied to: (i) make the architecture of the software stable; (ii) render the code readable; and (iii) make the tasks of integrating new functionalities into the system flexible.

An important characteristic of refactoring is that it must preserve the “observable behavior” of the system. Preservation of the observable behavior is verified by ensuring that all the tests that pass before refactoring also pass after refactoring. Regression testing is used to ensure that the system did not deviate from the original system during refactoring. Refactoring does not normally involve code transformation to implement new requirements. Rather, it can be performed without adding new requirements to the existing system. Another aspect of refactoring is to enhance the internal structure of the system. In addition, the concept of program restructuring can be applied to transform legacy code into a more structured form and migrate it to a different programming language. That is, restructuring and refactoring can be used to reengineer software systems.

Refactoring techniques put emphasis on the development of a list of basic refactorings, which can be combined to form complex refactorings [74, 75]. The original list of basic refactorings contained transformations on object-oriented code: (i) add a class, method, or attribute; (ii) rename a class, method, or attribute; (iii) move an attribute or method up or down the hierarchy; (iv) remove a class, method, or attribute; and (v) extract chunks of code into separate methods. Most complex refactoring scenarios require small code changes for the refactorings to work correctly. Primitive refactorings are rarely used in isolation.
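
As a small illustration of basic refactoring (v), extracting chunks of code into separate units, and of verifying that the observable behavior is preserved, consider the following Python sketch. The invoice functions, the helper names, and the regression cases are hypothetical examples, not taken from the cited literature.

```python
def invoice_total_before(prices, tax_rate):
    """Original version: subtotal and tax logic are tangled in one function."""
    subtotal = 0.0
    for price in prices:
        subtotal += price
    tax = subtotal * tax_rate
    return round(subtotal + tax, 2)


def subtotal_of(prices):
    """Extracted chunk: the summation loop, moved unchanged."""
    subtotal = 0.0
    for price in prices:
        subtotal += price
    return subtotal


def tax_on(amount, tax_rate):
    """Extracted chunk: the tax computation."""
    return amount * tax_rate


def invoice_total_after(prices, tax_rate):
    """Refactored version: same observable behavior, clearer structure."""
    subtotal = subtotal_of(prices)
    return round(subtotal + tax_on(subtotal, tax_rate), 2)


# Hypothetical regression cases: inputs on which both versions must agree.
REGRESSION_CASES = [([], 0.2), ([10.0], 0.0), ([19.99, 5.01], 0.07), ([1, 2, 3], 0.5)]


def behavior_preserved(before, after, cases):
    """Regression check: the refactored code returns the same results on every case."""
    return all(before(*case) == after(*case) for case in cases)


preserved = behavior_preserved(invoice_total_before, invoice_total_after, REGRESSION_CASES)
```

The check compares outputs only on the listed cases, so it provides evidence of behavior preservation rather than a proof; in practice, the existing regression test suite of the system plays this role.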

1.7 PROGRAM COMPREHENSION

The purpose of program comprehension is to understand an existing software system for planning, designing, coding, and testing changes. T. A. Corbi [76] observed in 1989 that program comprehension accounts for 50% of the total effort expended throughout the life cycle of a software system. Therefore, a good understanding of the software is key to raising its quality by means of maintenance at a lower cost. In terms of concrete activities, program comprehension involves building mental models of an underlying system at different levels of abstraction, varying from low-level models of the code to very high-level models of the underlying application domain [77]. Mental models have been studied by cognitive scientists to understand how human beings know, perceive, make decisions, and construct behavior in the real world [78, 79]. In the domain of program comprehension, a mental model describes a programmer’s mental representation of the program to be comprehended. Program comprehension involves constructing a mental model of the program by applying various cognitive processes.

A key step in developing mental models is generating hypotheses, or conjectures, and investigating their validity. Hypotheses are a way for a programmer to understand code in an incremental manner. After gaining some understanding of the code, the programmer forms a hypothesis and verifies it by reading code. Verification of a hypothesis results in either accepting or rejecting it. Sometimes, a hypothesis may not be completely correct because of the programmer’s incomplete understanding of the code. By continuously formulating new hypotheses and verifying them, the programmer understands more and more code in increasing detail. One can apply several strategies to arrive at meaningful hypotheses, such as bottom–up, top–down, and opportunistic combinations of the two.
A bottom–up strategy works by beginning with the code, whereas a top–down strategy operates by working from a high-level goal. A strategy is formulated by identifying actions to achieve a goal. Strategies guide two mechanisms, namely, chunking and cross-referencing, to produce higher-level abstraction structures. Chunking creates new, higher-level abstraction structures from lower-level structures. Cross-referencing means being able to make links between elements of different abstraction levels. This helps in building a mental model of the program under study at different levels of abstraction.

In general, understanding a program involves a knowledge base, which represents the expertise and background knowledge a programmer brings to the table, a mental model, and an assimilation process [80]. The assimilation process guides the programmer to look at certain pieces of information, such as a code segment or a comment, and move forward/backward while reading the code. The assimilation process can work in three ways: top–down, bottom–up, and opportunistic.

1.8 SOFTWARE REUSE

The 1968 NATO (North Atlantic Treaty Organization) conference on software engineering is viewed as having germinated the ever-growing field of software engineering [81]. The conference focused on the software crisis: the problem of building large, reliable software systems in a controlled way. The term software crisis was coined at that conference. Even at that first forum on software systems, software reuse was put forward as a means of tackling the software crisis. The idea of software reuse was first introduced by Doug McIlroy in a seminal paper [82] in 1968. He proposed to realize reuse by means of library components and automated ways for customizing components to varying degrees of robustness and precision.
Other significant early reuse research developments include David Parnas’s idea of program families [83] and Jim Neighbors’ introduction of the concepts of domain analysis [84]. A program family is a set of programs whose common properties are so extensive that it becomes advantageous to study the common properties of these programs before analyzing individual differences. On the other hand, domain analysis is an activity of identifying objects and operations of a class of similar systems in a particular problem domain.

Simply stated, software reuse means using existing software knowledge or artifacts during the development of a new system. Reusable assets can include both artifacts and software knowledge. Note that reuse is not constrained to source code fragments. Rather, Capers Jones identified four broad types of artifacts for reuse [85]:

• Data reuse. This involves a standardization of data formats. Reusable functions imply a standard data interchange format. Therefore, one of the critical precursors to fully reusable software is reusable data.
• Architectural reuse. This consists of standardizing a set of design and programming conventions dealing with the logical organization of software. The goal is to define a complete set of functional elements which will be needed to create new systems from standard components.
• Design reuse. This deals with the reuse of abstract designs that do not include implementation details. These are then implemented specifically to fit the application requirements.
• Program reuse. This deals with reusing running code. The software units that are reused may be of different sizes. A whole software system may be reused by incorporating it without change into another system (COTS product reuse).

Reusability is a property of software assets, which indicates the degree to which the software can be reused.
For a software component to be reusable, it must exhibit the following characteristics: high cohesion, low coupling, adaptability, understandability, reliability, and portability. Those characteristics encourage the component’s reuse in similar situations. There are two advantages of reusing previously written code [86–88]:

• Better quality. If previously tested modules are reused in a new software project, the reused modules are likely to have fewer faults than new modules. This reduces the overall failure rate of the new software.
• Increased productivity. Organizations can save time and other resources by reusing operational modules, thereby increasing their overall productivity. However, the amount of the increase depends upon the size and complexity of the components being reused and the overall size and complexity of the new software which reuses those components.

The development cost of any software project is only about 40% of the total cost over its entire life cycle [89]. Significant maintenance benefit also results from reusing quality software. The empirical study conducted by Stephen R. Schach shows that the cost savings during maintenance, as a consequence of reuse, are nearly twice the corresponding savings during development [90].

1.9 OUTLINE OF THE BOOK

Having given a brief introduction to software evolution and maintenance, we now provide an outline of the remaining chapters. Each chapter focuses on a specific topic in software evolution and maintenance, and it explains the topic by covering its technical, process, model, and/or practical aspects. Consequently, the reader gains a broad understanding of the main concepts in software evolution and maintenance.

In Chapter 2 we explain three major maintenance classification schemes based on intention, activity, and evidence. Then we describe Lehman’s classification of properties of closed source software (CSS) of type S (Specified), P (Problem), and E (Evolving).
We introduce the eight laws of software evolution for E-type CSS systems, together with empirical studies and their practical implications. We discuss the origin of the FOSS movement and the differences between CSS and FOSS systems with respect to: (i) team structure; (ii) processes; (iii) releases; and (iv) global factors. In addition, we discuss the empirical research results about the Linux FOSS system to study the laws of evolution, originally proposed for CSS systems. We conclude this chapter with a brief discussion on maintenance of commercial off-the-shelf (COTS)-based systems.

Chapter 3 introduces three types of evolution models, namely, reuse-oriented, staged, and change mini-cycle. Next, we discuss the IEEE/EIA 1219 and the ISO/IEC 14764 maintenance processes. We explain a framework to make a plan for software configuration management (SCM) to control software evolution processes. We close this chapter with a presentation of a state transition model to track individual change requests as they flow through the organization.

Chapter 4 introduces the concepts of software reengineering based on three basic principles: abstraction, refinement, and alteration. We discuss five basic reengineering approaches: big bang, incremental, partial, iterative, and evolutionary. Next, we discuss two specific models for software reengineering: the source code reengineering reference model and the phase reengineering model. With the reengineering approaches and models in place, we introduce the concepts and objectives of reverse engineering with an introduction to the Goals/Models/Tools paradigm that divides a process for reverse engineering into three successive phases: Goals, Models, and Tools. In addition, we examine some low-level reverse engineering tasks, such as decompilation and disassembly. Data reverse engineering (DRE) for data-oriented applications is explained toward the end of the chapter.

Chapter 5 identifies the problems an organization faces in dealing with legacy information systems and discusses six viable solutions to the problems: freeze, outsource, carry on, discard and redevelop, wrap, and migrate. We study four types of wrapping techniques in detail: database wrapper, system service wrapper, application wrapper, and function wrapper. In addition, we discuss five different levels of encapsulation: process level, transaction level, program level, module level, and procedural level. Next, we focus our attention on migration of information systems. First we discuss the migration issues, followed by 13 steps for migration planning to minimize the risk of the modernization effort. We discuss seven available migration approaches: Cold Turkey, Database First, Database Last, Composite Database, Chicken Little, Butterfly, and Iterative.

Chapter 6 presents the fundamentals of impact analysis, including the related concepts of ripple effect and change propagation. The reader learns the strengths and limitations of impact analysis techniques. We have selected topics to provide a foundation for enduring value of impact analysis and change propagation.

In Chapter 7, we introduce to the reader different refactoring activities. Different formalisms and techniques to support these activities are discussed. In addition, we discuss the initial work on software restructuring, such as elimination-of-goto, system sandwich, localization and information hiding, and clustering approaches.

Chapter 8 considers the issues and solutions that underpin program understanding during maintenance. We discuss different models proposed by different researchers. In addition, the concept of protocol analysis is introduced to the readers. The chapter ends with a brief discussion of visualization for software comprehension.

In Chapter 9, we introduce the readers to reuse and domain engineering.
Software reuse has the potential to reduce the maintenance cost more than the development cost of software projects. We present a taxonomy of reuse, followed by a detailed description of domain and application engineering concepts, including real domain engineering approaches: DARE, FAST, and Koala. Finally, we discuss maturity and cost models associated with reuse.

In the glossary section we have defined all the keywords that have been used in the book. The reader will find about 10 practice questions at the end of each chapter. A carefully chosen list of references is given at the end of each chapter for those who are more curious about the details of some of the topics. Finally, each of the following chapters contains a section on further reading. The further reading section provides pointers to more advanced materials concerning the topics of the chapter.

REFERENCES

[1] M. I. Halpern. 1965. Machine independence: its technology and economics. Communications of the ACM, 8(12), 782–785.
[2] L. A. Belady and M. M. Lehman. 1976. A model of large program development. IBM Systems Journal, 15(1), 225–252.
[3] M. M. Lehman. 1980. Programs, life cycles, and laws of software evolution. Proceedings of the IEEE, September, 1060–1076.
[4] R. G. Canning. 1972. The maintenance “iceberg”. EDP Analyzer, 10(10), 1–14.
[5] E. B. Swanson. 1976. The Dimensions of Maintenance. Proceedings of the 2nd International Conference on Software Engineering (ICSE), October 1976, San Francisco, CA. IEEE Computer Society Press, Los Alamitos, CA. pp. 492–497.
[6] N. Chapin, J. F. Hale, K. M. Khan, J. F. Ramil, and W. G. Tan. 2001. Types of software evolution and software maintenance. Journal of Software Maintenance and Evolution: Research and Practice, 13, 3–30.
[7] K. H. Bennett and V. T. Rajlich. 2000. Software Maintenance and Evolution: A Roadmap. ICSE, The Future of Software Engineering, June 2000, Limerick, Ireland. ACM, New York. pp. 73–87.
[8] L. J. Arthur. 1988.
Software Evolution: The Software Maintenance Challenge. John Wiley & Sons.
[9] K. H. Bennett and J. Xu. 2003. Software Services and Software Maintenance. Proceedings of the 7th European Conference on Software Maintenance and Reengineering, March 2003, Benevento, Italy. IEEE Computer Society Press, Los Alamitos, CA. pp. 3–12.
[10] M. W. Godfrey and D. M. German. 2008. The Past, Present, and Future of Software Evolution. Proceedings of the 2008 Frontiers of Software Maintenance (FoSM), October 2008, Beijing, China. IEEE Computer Society Press, Los Alamitos, CA. pp. 129–138.
[11] D. L. Parnas. 1994. Software Aging. Proceedings of 16th International Conference on Software Engineering, May 1994, Sorrento, Italy. IEEE Computer Society Press, Los Alamitos, CA. pp. 279–287.
[12] M. Jazayeri. 2005. Species Evolve, Individuals Age. Proceedings of 8th International Workshop on Principles of Software Evolution (IWPSE), September 2005, Lisbon, Portugal. IEEE Computer Society Press, Los Alamitos, CA. pp. 3–9.
[13] P. Stachour and D. C. Brown. 2009. You don’t know jack about software maintenance. Communications of the ACM, 52(11), 54–58.
[14] M. M. Lehman, D. E. Perry, and J. F. Ramil. 1998. On Evidence Supporting the Feast Hypothesis and the Laws of Software Evolution. Proceedings of the 5th International Software Metrics Symposium (Metrics), November 1998. IEEE Computer Society Press, Los Alamitos, CA. pp. 84–88.
[15] M. M. Lehman and J. F. Ramil. 2006. Software evolution. In: Software Evolution and Feedback (Eds. N. H. Madhavji, J. F. Ramil, and D. Perry). John Wiley, West Sussex, England.
[16] M. M. Lehman, J. F. Ramil, P. D. Wernick, D. E. Perry, and W. M. Turski. 1997. Metrics and Laws of Software Evolution—The Nineties View. Proceedings of the 4th International Symposium on Software Metrics (Metrics 97), November 1997. IEEE Computer Society Press, Los Alamitos, CA. pp. 20–32.
[17] S. S. Pirzada. 1988.
A statistical examination of the evolution of the Unix system. PhD Thesis, Department of Computing, Imperial College, London, England.
[18] M. M. Lehman and L. A. Belady. 1985. Program Evolution: Processes of Software Change. Academic Press, London.
[19] M. W. Godfrey and Q. Tu. 2000. Evolution in Open Source Software: A Case Study. Proceedings of the International Conference on Software Maintenance, October 2000. IEEE Computer Society Press, Los Alamitos, CA. pp. 131–142.
[20] G. Robles, J. J. Amor, J. M. Gonzalez-Barahona, and I. Herraiz. 2005. Evolution and Growth in Large Libre Software Projects. Proceedings of the 8th International Workshop on Principles of Software Evolution (IWPSE), September 2005, Lisbon, Portugal. IEEE Computer Society Press, Los Alamitos, CA. pp. 165–174.
[21] ISO/IEC 14764:2006 and IEEE Std 14764-2006. 2006. Software Engineering–Software Life Cycle Processes–Maintenance. Geneva, Switzerland.
[22] B. A. Kitchenham, G. H. Travassos, A. N. Mayrhauser, F. Niessink, N. F. Schneidewind, J. Singer, S. Takada, R. Vehvilainen, and H. Yang. 1999. Towards an ontology of software maintenance. Journal of Software Maintenance and Evolution: Research and Practice, 11, 365–389.
[23] G. Ramesh and R. Bhattiprolu. 2006. Software Maintenance. Tata McGraw-Hill, New Delhi.
[24] M. Vigder and A. Kark. 2006. Maintaining COTS-Based Systems: Start with Design. Proceedings of the 5th International Conference on Commercial-Off-The-Shelf (COTS)-Based Software Systems, February 2006, Orlando, Florida. IEEE Computer Society Press, Los Alamitos, CA. pp. 11–18.
[25] D. Hybertson, A. Ta, and W. Thomas. 1997. Maintenance of COTS-intensive software systems. Journal of Software Maintenance and Evolution: Research and Practice, 9, 203–216.
[26] W. W. Royce. 1970. Managing the Development of Large Software Systems: Concepts and Techniques. Proceedings of IEEE WESCON, August 1970, pp. 1–9. Republished in ICSE, Monterey, CA, 1987, pp. 328–338.
[27] N. Schneidewind. 1987.
The state of software maintenance. IEEE Transactions on Software Engineering, March, 303–309.
[28] N. Chapin. 1988. Software Maintenance Life Cycle. Proceedings of the International Conference on Software Maintenance (ICSM), October 1988, Phoenix, Arizona. IEEE Computer Society Press, Los Alamitos, CA. pp. 6–13.
[29] W. K. Sharpley. 1977. Software Maintenance Planning for Embedded Computer Systems. Proceedings of the IEEE COMPSAC, November 1977. IEEE Computer Society Press, Los Alamitos, CA. pp. 520–526.
[30] G. Parikh. 1982. The world of software maintenance. In: Techniques of Program and System Maintenance (Ed. G. Parikh), pp. 9–13. Little, Brown and Company, Boston, MA.
[31] J. Martin and C. L. McClure. 1983. Software Maintenance: The Problem and Its Solution. Prentice-Hall, Englewood Cliffs, NJ.
[32] S. Chen, K. G. Heisler, W. T. Tsai, X. Chen, and E. Leung. 1990. A model for assembly program maintenance. Journal of Software Maintenance: Research and Practice, March, pp. 3–32.
[33] D. R. Harjani and J. P. Queille. 1992. A Process Model for the Maintenance of Large Space Systems Software. Proceedings of the International Conference on Software Maintenance (ICSM), November 1992, Orlando, FL. IEEE Computer Society Press, Los Alamitos, CA. pp. 127–136.
[34] S. S. Yau, R. A. Nicholl, J. Tsai, and S. Liu. 1988. An integrated life-cycle model for software maintenance. IEEE Transactions on Software Engineering, August, pp. 1128–1144.
[35] S. S. Yau and J. S. Collofello. 1980. Some stability measures for software maintenance. IEEE Transactions on Software Engineering, November, pp. 545–552.
[36] S. S. Yau, J. S. Collofello, and T. MacGregor. 1978. Ripple Effect Analysis of Software Maintenance. COMPSAC, Chicago, Illinois. IEEE Computer Society Press, Piscataway, NJ. pp. 60–65.
[37] B. W. Boehm. 1988. A spiral model of software development and maintenance. IEEE Computer, May, pp. 61–72.
[38] V. R. Basili. 1990.
Viewing maintenance as reuse-oriented software development. IEEE Software, January, pp. 19–25.
[39] R. C. Martin. 2002. Agile Software Development: Principles, Patterns, and Practices. Prentice-Hall.
[40] T. Gilb. 1988. Principles of Software Engineering Management. Addison-Wesley, Reading, MA.
[41] T. Mens. 2008. Introduction and roadmap: history and challenges of software evolution. In: Software Evolution (Eds. T. Mens and S. Demeyer). Springer Verlag, Berlin.
[42] V. T. Rajlich and K. H. Bennett. 2000. A staged model for the software life cycle. IEEE Computer, July, pp. 2–8.
[43] ISO/IEC 12207:2006 and IEEE Std 12207-2006. 2008. System and Software Engineering–Software Life Cycle Processes. Geneva, Switzerland.
[44] IEEE Standard 1219-1998. 1998. Standard for Software Maintenance. IEEE Computer Society Press, Los Alamitos, CA.
[45] IEEE Std 1042-1987. 1988. IEEE Guide to Software Configuration Management. IEEE, Inc., New York, NY.
[46] K. Narayanaswamy and W. Scacchi. 1987. Maintaining configuration of evolving software systems. IEEE Transactions on Software Engineering, March, 13(3), 324–334.
[47] D. Leblang. 1994. The CM challenge: configuration management that works. In: Configuration Management, Chapter 1 (Ed. W. F. Tichy). John Wiley, Chichester.
[48] H. Yang and M. Ward. 2003. Successful Evolution of Software Systems. Artech House, Boston, MA.
[49] E. J. Chikofsky and J. H. Cross II. 1990. Reverse engineering and design recovery. IEEE Software, January, pp. 13–17.
[50] I. Jacobson and F. Lindström. 1991. Re-engineering of Old Systems to an Object-oriented Architecture. Proceedings of the ACM Conference on Object Oriented Programming Systems Languages and Applications, October 1991. ACM Press, New York, NY. pp. 340–350.
[51] P. Benedusi, A. Cimitile, and U. De Carlini. 1992. Reverse engineering processes, design document production, and structure charts. Journal of Systems Software, 19, 225–245.
[52] P. Aiken.
1996. Data Reverse Engineering: Slaying the Legacy Dragon. McGraw-Hill, Boston, New York, NY.
[53] K. H. Bennett. 1995. Legacy systems: coping with success. IEEE Software, January, pp. 19–23.
[54] M. Brodie and M. Stonebraker. 1995. Migrating Legacy Systems. Morgan Kaufmann, San Mateo, CA.
[55] A. Cimitile, H. Müller, and R. Klösch (Eds.). 1997. Pulling Together. Proceedings of the International Conference on Software Engineering, Workshop on Migration Strategies for Legacy Systems. Available as Technical Report TUV-1841-97-06 from Technical University of Vienna, Vienna, Austria.
[56] K. Bennett, M. Ramage, and M. Munro. 1999. Decision Model for Legacy Systems. IEEE Proceedings on Software, June, pp. 153–159.
[57] W. C. Dietrich Jr., L. R. Nackman, and F. Gracer. 1989. Saving a legacy with objects. Proceedings of the 1989 ACM OOPSLA Conference on Object-Oriented Programming, 24(10), 77–83. ACM SIGPLAN Notices, ACM, New York, NY.
[58] H. M. Sneed. 1996. Encapsulating Legacy Software for Use in Client/Server Systems. 3rd Working Conference on Reverse Engineering, Washington, DC. IEEE Computer Society Press, Los Alamitos, CA. pp. 104–119.
[59] S. Comella-Dorda, K. Wallnau, R. C. Seacord, and J. Robert. 2000. A Survey of Black-box Modernization Approaches for Information Systems. Proceedings of the International Conference on Software Maintenance, October 2000, San Jose, CA. IEEE Computer Society Press, Los Alamitos, CA. pp. 173–183.
[60] F. P. Coyle. 2000. Legacy integration—changing perspectives. IEEE Software, March/April, 37–41.
[61] H. M. Sneed. 1995. Planning the reengineering of legacy systems. IEEE Software, January, pp. 24–34.
[62] J. Bisbal, D. Lawless, B. Wu, J. Grimson, V. Wade, R. Richardson, and D. O’Sullivan. 1997. A survey of research into legacy system migration. Technical Report TCD-CS-1997-01, Computer Science Department, Trinity College, Dublin, January, pp. 39.
[63] M. Battaglia, G. Savoia, and J. Favaro. 1998.
Renaissance: A Method to Migrate from Legacy to Immortal Software Systems. Proceedings of Second Euromicro Conference on Software Maintenance and Reengineering, 1998, Florence, Italy. IEEE Computer Society Press, Los Alamitos, CA. pp. 197–200.
[64] A. Bianchi, D. Caivano, V. Marengo, and G. Visaggio. 2003. Iterative reengineering of legacy systems. IEEE Transactions on Software Engineering, March, 225–241.
[65] S. A. Bohner and R. S. Arnold. 1996. An introduction to software change impact analysis. In: Software Change Impact Analysis (Eds. S. A. Bohner and R. S. Arnold). IEEE Computer Society Press, Los Alamitos, CA.
[66] A. Podgurski and L. Clarke. 1990. A formal model of program dependencies and its implications for software testing, debugging, and maintenance. IEEE Transactions on Software Engineering, September, 16(9), 965–979.
[67] M. J. Harrold and B. Malloy. 1993. A unified interprocedural program representation for maintenance environment. IEEE Transactions on Software Engineering, 19(6), 584–593.
[68] S. Black. 2008. Deriving an approximation algorithm for automatic computation of ripple effect measures. Information and Software Technology, 50, 723–736.
[69] V. Rajlich. 1997. A Model for Change Propagation Based on Graph Rewriting. Proceedings of the International Conference on Software Maintenance (ICSM), October 1997, Bari, Italy. IEEE Computer Society Press, Los Alamitos, CA. pp. 84–91.
[70] A. E. Hassan and R. C. Holt. 2004. Predicting Change Propagation in Software Systems. Proceedings of the International Conference on Software Maintenance (ICSM), October 2004, Chicago, USA. IEEE Computer Society Press, Los Alamitos, CA. pp. 284–293.
[71] N. Ibrahim, W. M. N. Kadir, and S. Deris. 2008. Comparative Evaluation of Change Propagation Approaches Towards Resilient Software Evolution. Proceedings of the Third International Conference on Software Engineering Advances, pp. 198–204.
[72] D. E.
Perry and W. M. Evangelist. 1987. An Empirical Study of Software Interface Faults—An Update. Proceedings of the Twentieth Annual Hawaii International Conference on Systems Sciences, January 1987, Volume II, pp. 113–126.
[73] M. Fowler. 1999. Refactoring: Improving the Design of Existing Code. Addison-Wesley.
[74] W. F. Opdyke. 1992. Refactoring: A program restructuring aid in designing object-oriented application framework. PhD thesis, University of Illinois at Urbana-Champaign.
[75] S. Demeyer. 2008. Object-oriented reengineering. In: Software Evolution (Eds. T. Mens and S. Demeyer). Springer Verlag, Berlin.
[76] T. A. Corbi. 1989. Program understanding: challenge for the 1990s. IBM Systems Journal, 28(2), pp. 294–306.
[77] H. A. Müller. 1996. Understanding Software Systems Using Reverse Engineering Technologies: Research and Practice. Department of Computer Science, University of Victoria. Available at http://www.rigi.csc.uvic.ca/uvicrevtut/uvicrevtut.html.
[78] P. N. Johnson-Laird. 1983. Mental Models. Harvard University Press, Cambridge, MA.
[79] K. J. W. Craik. 1943. The Nature of Explanation. Cambridge University Press, Cambridge, UK.
[80] S. Letovsky. 1986. Cognitive Processes in Program Comprehension. Proceedings of the First Workshop in Empirical Studies of Programmers, pp. 58–79.
[81] P. Naur, B. Randell, and J. N. Buxton (Eds.). 1969. Software engineering. Report on a Conference by the NATO Science Committee, NATO Scientific Affairs Division, Brussels, Belgium. Available through Petrocelli-Charter, New York.
[82] M. D. McIlroy. 1969. Mass Produced Software Components. Proceedings of Software Engineering Concepts and Techniques, 1968 NATO Conference on Software Engineering (Eds. P. Naur, B. Randell, and J. N. Buxton), pp. 138–155. Petrocelli-Charter, New York, NY.
[83] D. L. Parnas. 1976. On the design and development of program families. IEEE Transactions on Software Engineering, 2(1), 1–9.
[84] J. M.
Neighbors. 1980. Software construction using components. Technical Report 160, Department of Information and Computer Sciences, University of California, Irvine.
[85] T. C. Jones. 1984. Reusability in programming: a survey of the state of the art. IEEE Transactions on Software Engineering, 10(5), 488–494.
[86] J. E. Gaffney and T. A. Durek. 1989. Software reuse - key to enhanced productivity: some quantitative models. Information and Software Technology, 31(5), 258–267.
[87] R. D. Banker and R. J. Kauffman. 1991. Reuse and productivity in integrated computer-aided software engineering: an empirical study. MIS Quarterly, 15(3), 374–401.
[88] V. R. Basili, L. C. Briand, and W. L. Melo. 1996. How reuse influences productivity in object-oriented systems. Communications of the ACM, 39(10), pp. 104–116.
[89] Gartner Group Inc. 1991. Software engineering strategies. Strategic Analysis Report, Stamford, CT, April.
[90] S. R. Schach. 1994. The economic impact of software reuse on maintenance. Journal of Software Maintenance and Evolution: Research and Practice, July/August, 6(4), pp. 185–196.

EXERCISES

1. Discuss the differences between software evolution and software maintenance.
2. Explain why a software system which is used in a real-world environment must be changed to not become progressively less useful.
3. What are some characteristics of maintaining software as opposed to developing new software systems?
4. You are asked to make a change to a system that leaves its functional specification unchanged but affects the design and source code of the system. This can be any of the four types of maintenance mentioned earlier except one. Identify the exception and justify your answer.
   (a) Corrective maintenance
   (b) Adaptive maintenance
   (c) Perfective maintenance
   (d) Preventive maintenance
5. Discuss the major differences between COTS-based software development and traditional in-house software development activities.
6.
One of the key sources of risk in COTS-based development is the reliance on one or more third-party software vendors. However, this dependence can also present new challenges for the evolution of such systems. Which of the following evolution challenges can be directly attributed to reliance on the vendor?
   (a) Lack of control over when errors in components are fixed.
   (b) Number and complexity of inter-component interfaces.
   (c) Diversity of inter-component interfaces.
   (d) Lack of experience and tools for evolving COTS-based systems.
7. What are the objectives of SCM?
8. A feature of any complex change to an existing software system is that it is likely to introduce new defects, even if the aim of the change is to remove defects. When considering whether or not to implement a change request, should this feature be considered as a cost, benefit, or risk associated with the change request?
9. System A is a mission critical legacy system that captures and stores detailed data on product sales. Data from system A must be regularly extracted and loaded into a new system (B), which is to be used to help managers understand the changes in sales patterns from week to week. Initial estimates suggest that the data for 1 week can be extracted and transformed in around 3 hours. What migration frequency would you choose for this new application?
   (a) Migrate on update.
   (b) Migrate daily, every evening at 2.00 a.m.
   (c) Migrate weekly, every Sunday evening at 2.00 a.m.
   (d) Migrate monthly, on the last Sunday of every month at 2.00 a.m.
10. What are some of the risks of not doing an impact analysis before effecting a change?
11. What actions can be taken to minimize the impact of fixing defects?
12. What problems do maintainers face when rewriting or reengineering a piece of code? What are the causes of those problems?
13. Explain the term hypotheses in the context of program understanding.
14.
What benefits can be derived from reusing software?

2 TAXONOMY OF SOFTWARE MAINTENANCE AND EVOLUTION

Evolution is not a force but a process. Not a cause but a law.
—John Morley

2.1 GENERAL IDEA

In the early 1970s, the term “maintenance” was used at IBM to refer to tasks for making intentional modifications to existing software. Those who performed maintenance tasks had not carried out the software development work. The idea behind having a different set of personnel to carry out maintenance work was to free the development engineers from support activities. The aforementioned model continued to influence the activities that are collectively known as “software maintenance.” Around 1972, in his landmark article “The Maintenance ‘Iceberg’,” R. G. Canning [1] used the iceberg metaphor to describe the enormous mass of potential problems facing practitioners of software maintenance. Practitioners took a narrow view of maintenance as correcting errors and expanding the functionalities of the system. In other words, maintenance consisted of two kinds of activities: correcting errors and enhancing functionalities of the software. Hence, maintenance can be inappropriately seen as a continuation of software development with an extra input, namely the existing software system [2].

The ISO/IEC 14764 standard [3] defines software maintenance as “… the totality of activities required to provide cost-effective support to a software system. Activities are performed during the pre-delivery stage as well as the post-delivery stage” (p. 4). Post-delivery activities include changing software, providing training, and operating a help desk.
Pre-delivery activities include planning for post-delivery operations. During the development process, maintainability is specified, reviewed, and controlled. If this is done successfully, the maintainability of the software will improve during the post-delivery stage. The standard further defines software maintainability as “… the capability of the software product to be modified. Modification may include corrections, improvements or adaptation of the software to changes in environment, and in requirements and functional specification” (p. 3).

A major difference exists between software maintenance and software development: maintenance is event driven, whereas development is requirements driven [4]. A process for software development begins with the objective of designing and implementing a system to deliver certain functional and nonfunctional requirements. On the other hand, a maintenance task is scheduled in response to an event. Reception of a change request from a customer is one kind of event that can trigger software maintenance. Similarly, recognition of the need to fix a set of bugs is another kind of event. Events originate both from the customers and from within the development organization. Generally, the inputs that invoke maintenance activities are unscheduled events; execution of the actual maintenance activities might be scheduled according to a plan, but the events that initiate maintenance activities occur randomly. A maintenance activity accepts some existing artifacts as inputs and generates some new and/or modified artifacts.

Now we further explain the idea of a maintenance activity taking in an input and producing an output. In general, an investigation activity is the first activity in a maintenance process. In an investigation activity, a maintenance engineer evaluates the nature of the event, say, a change request (CR). Finding the impact of executing the CR is an example of an investigation activity.
Upon completion of the first activity, the organization decides whether or not to proceed with the modification activity.

In the following subsections, we explain maintenance activities from three viewpoints:
• Intention-based classification of software maintenance activities;
• Activity-based classification of software maintenance activities; and
• Evidence-based classification of software maintenance activities.

2.1.1 Intention-Based Classification of Software Maintenance

In the intention-based classification, one categorizes maintenance activities into four groups based on what we intend to achieve with those activities [5–7]. Based on the Standard for Software Engineering–Software Maintenance, ISO/IEC 14764 [3], the four categories of maintenance activities are corrective, adaptive, perfective, and preventive, as explained in the following.

Corrective maintenance. The purpose of corrective maintenance is to correct failures: processing failures and performance failures. A program producing a wrong output is an example of a processing failure. Similarly, a program not being able to meet real-time requirements is an example of a performance failure. The process of corrective maintenance includes isolation and correction of defective elements in the software. The software product is repaired to satisfy requirements. A variety of situations can be described as corrective maintenance, such as correcting a program that aborts or produces incorrect results. Basically, corrective maintenance is a reactive process, which means that it is performed after defects are detected in the system.

Adaptive maintenance. The purpose of adaptive maintenance is to enable the system to adapt to changes in its data environment or processing environment. This process modifies the software to properly interface with a changing or changed environment.
Adaptive maintenance includes system changes, additions, deletions, modifications, extensions, and enhancements to meet the evolving needs of the environment in which the system must operate. Some generic examples are: (i) changing the system to support a new hardware configuration; (ii) converting the system from batch to online operation; and (iii) changing the system to be compatible with other applications. A more concrete example is: an application on a smartphone can be enhanced to support WiFi-based communication in addition to its present Third Generation (3G) cellular communication.

Perfective maintenance. The purpose of perfective maintenance is to make a variety of improvements, namely, to user experience, processing efficiency, and maintainability. For example, the program outputs can be made more readable for better user experience; the program can be modified to make it faster, thereby increasing the processing efficiency; and the program can be restructured to improve its readability, thereby increasing its maintainability. In general, activities for perfective maintenance include restructuring of the code, creating and updating documentation, and tuning the system to improve performance. It is also called “maintenance for the sake of maintenance” or “reengineering” [8].

Preventive maintenance. The purpose of preventive maintenance is to prevent problems from occurring by modifying software products. Basically, one should look ahead, identify future risks and unknown problems, and take actions so that those problems do not occur. For example, good programming styles can reduce the impact of change, thereby reducing the number of failures [9]. Therefore, the program can be restructured to achieve good styles to make later program comprehension easier. Preventive maintenance is very often performed on safety critical and highly available software systems [10–13].
The concept of “software rejuvenation” is a preventive maintenance measure to prevent, or at least postpone, the occurrence of failures due to continuously running the software system. Software rejuvenation is a proactive fault management technique aimed at cleaning up the internal state of the system to prevent the occurrence of more severe crashes in the future. It involves occasionally terminating an application or a system, cleaning its internal state, and restarting it. Rejuvenation may increase the downtime of the application; however, it prevents the occurrence of more severe and costly failures. In a safety critical environment, the necessity of performing preventive maintenance is evident from the example of the control software for the Patriot missile: “On 21 February, the office sent out a warning that ‘very long running time’ could affect the targeting accuracy. The troops were not told, however, how many hours ‘very long’ was or that it would help to switch the computer off and on again after 8 hours.” (p. 1347 of Ref. [14]). The purpose of preventive maintenance activities on a safety critical software system is to eliminate hazards or reduce their associated risks to an acceptable level. Note that a hazard is a state of a system or a physical situation which, when combined with certain environment conditions, could lead to an accident. A hazard is a prerequisite for an accident or mishap.

2.1.2 Activity-Based Classification of Software Maintenance

In the intention-based classification of maintenance activities, the intention of an activity depends upon the reason for the change [7]. On the other hand, Kitchenham et al. [4] organize maintenance modification activities based on the maintenance activity. The authors classify the maintenance modification activities into two categories: corrections and enhancements.
• Corrections. Activities in this category are designed to fix defects in the system, where a defect is a discrepancy between the expected behavior and the actual behavior of the system.
• Enhancements. Activities in this category are designed to effect changes to the system. The changes to the system do not necessarily modify the behavior of the system. This category of activities is further divided into three subcategories as follows:
  – enhancement activities that modify some of the existing requirements implemented by the system;
  – enhancement activities that add new system requirements; and
  – enhancement activities that modify the implementation without changing the requirements implemented by the system.

Now one can find a mapping between Swanson’s terminology and Kitchenham’s terminology. Enhancement activities that change the implementations of existing requirements are similar to Swanson’s idea of perfective maintenance. Enhancement activities that add new requirements to a system are similar to the idea of adaptive maintenance. Enhancement activities that do not impact requirements but merely affect the system implementation appear to be similar to preventive maintenance.

2.1.3 Evidence-Based Classification of Software Maintenance

The intention-based classification of maintenance activities was further refined by Chapin et al. [15]. The objectives of the classification are as follows:
• base the classification on objective evidence that can be measured from observations and comparisons of software before and after modifications;
• set the coarseness of the classification to truly reflect a representative mix of observed activities;
• make the classification independent of the execution and development environments: hardware, operating system (OS), organizational practices, design methodology, implementation language, and personnel involved in maintenance.
Modifications performed, detected, or observed on four aspects of the system being maintained are used as the criteria to cluster the types of maintenance activities:
• the whole software;
• the external documentation;
• the properties of the program code; and
• the system functionality experienced by the customer.

Classification of maintenance activities is based on changes in the aforementioned four kinds of entities. Evidence of changes to those entities is gathered by comparing the appropriate portions of the software before the activity with the appropriate parts after the execution of the activity. In general, software maintenance or evolution involves many activities that may result in some kind of modification to the software. A dominant categorization of activities emerges from all the modifications made, detected, or observed. The classification proposed by Chapin [16] was exhaustive in nature. His mutually exclusive types were grouped into clusters as illustrated in Figure 2.1. The definitions of the 12 types of maintenance activities are given in Table 2.1.

[FIGURE 2.1 Groups or clusters and their types. Clusters and their types, from low impact to high impact: Support interface (Training, Consultive, Evaluative); Documentation (Reformative, Updative); Software properties (Groomative, Preventive, Performance, Adaptive); Business rules (Reductive, Corrective, Enhancive).]

TABLE 2.1 Evidence-Based 12 Mutually Exclusive Maintenance Types
Types of Maintenance: Definitions
Training: This means training the stakeholders about the implementation of the system.
Consultive: In this type, cost and length of time are estimated for maintenance work, personnel run a help desk, customers are assisted to prepare maintenance work requests, and personnel make expert knowledge about the available resources and the system available to others in the organization to improve efficiency.
Evaluative: In this type, common activities include reviewing the program code and documentation, examining the ripple effect of a proposed change, designing and executing tests, examining the programming support provided by the operating system, and finding the required data and debugging.
Reformative: Ordinary activities in this type improve the readability of the documentation, make the documentation consistent with other changes in the system, prepare training materials, and add entries to a data dictionary.
Updative: Ordinary activities in this type are substituting out-of-date documentation with up-to-date documentation, making semi-formal models, say, in UML, to document current program code, and updating the documentation with test plans.
Groomative: Ordinary activities in this type are substituting components and algorithms with more efficient and simpler ones, modifying the conventions for naming data, changing access authorizations, compiling source code, and doing backups.
Preventive: Ordinary activities in this type perform changes to enhance maintainability and establish a base for making a future transition to an emerging technology.
Performance: Activities in the performance type produce results that impact the user. Those activities improve system up time and replace components and algorithms with faster ones.
Adaptive: Ordinary activities in this type port the software to a different execution platform and increase the utilization of COTS components.
Reductive: Ordinary activities in this type drop some data generated for the customer, decrease the amount of data input to the system, and decrease the amount of data produced by the system.
Corrective: Ordinary activities in this type are correcting identified bugs, adding defensive programming strategies, and modifying the ways exceptions are handled.
Enhancive: Ordinary activities in this type are adding and modifying business rules to enhance the system’s functionality available to the customer and adding new data flows into or out of the software.

TABLE 2.2 Impact of the Types
Impact on business: Low ← → High (shown by diamonds). Impact on software: Low (top) to High (bottom).
Cluster and Type
Support interface
  Training ⋄⋄⋄⋄⋄
  Consultive ⋄⋄⋄⋄
  Evaluative ⋄⋄
Documentation
  Reformative ⋄⋄
  Updative ⋄⋄
Software properties
  Groomative ⋄⋄
  Preventive ⋄⋄⋄
  Performance ⋄⋄⋄
  Adaptive ⋄⋄
Business rules
  Reductive ⋄
  Corrective ⋄⋄⋄
  Enhancive ⋄⋄⋄⋄⋄⋄
Source: From Reference 15. © 2001 John Wiley & Sons.

The impacts of the different types or clusters of maintenance activities are summarized in Table 2.2. The first dimension of the impact of the evolution is the customer’s ability to perform its business functions effectively while continuing to use the system. The impact is represented as low or high. The number of diamonds in a row indicates the more probable range of impact on the business process of the customer. As an example, if the software is enhanced with new functionalities, then the customer is more likely to be able to achieve its business goals than if only the noncode documentation is modified. The second dimension is the software itself. This is arranged from top to bottom. As an example, rewriting a few pages of a user’s manual has almost no impact on the software compared to rewriting a block of code to fix a defect. Figure 2.1 illustrates the relationship among the different types of activities in terms of impacts.

Decision Tree-Based Criteria

The classification of maintenance activities has been illustrated in Figure 2.1.
The classification is based on modifications performed (modifications deliberately done, modifications observed, modifications manifested, and modifications detected) in various physical and conceptual entities:
• A: the whole software system.
• B: the program code.
• C: the functionalities experienced by customers.

The idea of objective evidence about activities is key to making the classification. The fundamental acts of observation of behavior and comparison of behavior of two entities reveal the evidence, and evidence of change-producing activities serves as the criteria. Note that observation is performed on the artifacts and the activities operating on them. On the other hand, comparison is performed on the relevant parts of the software before and after a maintenance activity is performed. Activities are classified into different types by applying a two-step decision process:
• First, apply criteria-based decisions to make the clusters of types.
• Next, apply the type decisions to identify one type within the cluster.

Figure 2.2 summarizes the aforementioned two steps in a hierarchical manner. The decision tree shown in Figure 2.2 is an objective evidence-based classification of maintenance types. The three criteria decisions involving A, B, and C lead to the right cluster of types. The type decisions characterize the types within each cluster. In a maintenance process, we are interested in the impacts of maintenance activities not only on the software but also on the business processes. Due to the impact characteristics summarized in Table 2.2, one reads Figure 2.2 by beginning on the left-hand side and moving toward the right for increasing impact on the software and/or the business processes. First, clusters of maintenance types are identified by using the criteria (namely, A, B, and C) decisions. Next, the types within a cluster are identified by using type decisions.

[FIGURE 2.2 Decision tree types. From Reference 15. © 2001 John Wiley & Sons. Criteria decisions: A: Was software changed? B: Was source code changed? C: Was function changed? Clusters and type decisions: Support interface (A-1: Training, A-2: Consultive, A-3: Evaluative*); Documentation (B-1: Reformative, B-2: Updative*); Software properties (C-1: Groomative, C-2: Preventive, C-3: Performance, C-4: Adaptive*); Business rules (D-1: Reductive, D-2: Corrective, D-3: Enhancive*). Notes: Types are read from left to right, bottom to top. Questions are listed in Table 2.3. The type name applies when the type decision at the left of it is “Yes.” * indicates the default type in the cluster.]

TABLE 2.3 Summary of Evidence-Based Types of Software Maintenance
Criteria (Type): Type Decision Question
A-1 (Training): To train the stakeholders, did the activities utilize the software as subject?
A-2 (Consultive): As a basis for consultation, did the activities employ the software?
A-3 (Evaluative): Did the activities evaluate the software?
B-1 (Reformative): To meet stakeholder needs, did the activities modify the noncode documentation?
B-2 (Updative): To conform to implementation, did the activities modify the noncode documentation?
C-1 (Groomative): Was maintainability or security changed by the activities?
C-2 (Preventive): Did the activities constrain the scope of future maintenance activities?
C-3 (Performance): Were performance properties or characteristics modified by the activities?
C-4 (Adaptive): Were different technology or resources used by the activities?
D-1 (Reductive): Did the activities constrain, reduce, or remove some functionalities experienced by the customer?
D-2 (Corrective): Did the activities fix bugs in customer-experienced functionality?
D-3 (Enhancive): Did the activities substitute, add to, or expand some functionalities experienced by the customers?
To make type decisions, one asks questions about specific evidence. A type is only applicable if the answer to the type decision question is “Yes.” For specific type decision questions, the reader is referred to Table 2.3. Sometimes, objective evidence may be found to be ambiguous. In that case, clusters have their designated default types for use. The overall default type is evaluative if there are ambiguities in an activity. Ambiguities can be present in observations and the available documentation evidence. Trivially, no type of software maintenance has been conducted if observations indicate that no activities are performed on the software, except merely executing it. Since software maintenance involves many activities, a variety of questions are answered to determine their types, as illustrated in Figure 2.2, in the structure of a decision tree. All three cluster decision criteria A, B, and C, shown in the three dotted boxes of Figure 2.2 and listed below as well, must be asked:
• A: Was software changed?
• B: Was source code changed?
• C: Was function changed?

A “No” response to any of the aforementioned questions leads to a cluster, where one or more type decisions result in a “Yes.” As an example, if the answer to question A is “No,” then at least one question in the box labeled “Support interface” produces a “Yes” answer. In addition, if the answer to question C is “Yes,” then at least one question in the box labeled “Business rules” produces a “Yes” answer. The basic idea is to move upward from the bottom of the rightmost column of leaves in Figure 2.1 as far up as possible. The aforementioned traversal is similar to traversing the tree shown in Figure 2.2. Next, we explain the four kinds of clusters, namely, support interface, documentation, software properties, and business rules, one by one.
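The cluster decision can be sketched in a few lines of code. The following is a minimal sketch, assuming that the answers to criteria A, B, and C are available as booleans and are evaluated in that order; the function and parameter names are hypothetical and are not part of the classification itself.

```python
def select_cluster(software_changed: bool,
                   code_changed: bool,
                   function_changed: bool) -> str:
    """Map the answers to criteria decisions A, B, and C to a cluster."""
    if not software_changed:      # A: Was software changed? -> "No"
        return "Support interface"
    if not code_changed:          # B: Was source code changed? -> "No"
        return "Documentation"
    if not function_changed:      # C: Was function changed? -> "No"
        return "Software properties"
    return "Business rules"       # A, B, and C were all answered "Yes"
```

For instance, a call such as select_cluster(True, True, False) describes an activity that changed the code without changing the function, and it selects the software properties cluster.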
Support Interface Cluster

This cluster relates to modifications in how service and/or maintenance personnel interact with stakeholders and customers. The support interface cluster is invoked if the answer to the A criterion decision question “Was software changed?” is “No.” It consists of maintenance type decisions in the order of A-1, A-2, and A-3, because of their increasing impact. The default type here is evaluative. In the following, we explain the three type decisions one by one.
• Type decision A-1. “To train the stakeholders, did the activities utilize the software as subject?” is the A-1-type decision. In the training type, common activities include: (i) in-class lessons for customers and (ii) a variety of training spanning from on-site to web-based training using training materials from the documentation. The idea is to provide training to the stakeholders in the details of the system that has been implemented by the software.
• Type decision A-2. This type decision is “As a basis for consultation, did the activities employ the software?,” and it is of consultive type. This type decision is commonly performed as it involves such activities as estimating the cost of the planned maintenance task, providing support from a help desk, helping customers in preparing a CR, and making specific knowledge about the system or resources available to others in the organization.
• Type decision A-3. This decision, which is of evaluative type, is “Did the activities evaluate the software?” A-3 is a commonly used decision as it includes the following activities: searching, examining, auditing, diagnostic testing, regression testing, understanding the software without modifying it, and computing different types of metrics.

Documentation Cluster

For the A criterion decision, if the assessment of the objective evidence is “Yes,” then it implies that the software was modified. Next, we analyze the B decision, which is about source code.
The documentation cluster is invoked by a “No” answer to the B criterion question “Did the activities change the code?” It concerns modifications in the documentation except source code. The cluster comprises two decisions, namely, B-1 and B-2. Documentation cluster activities normally appear after the support interface cluster.
• Type decision B-1. The B-1 decision is “To meet stakeholder needs, did the activities modify the noncode documentation?” Ordinary activities in the reformative type improve the readability of the documentation, change the documentation to incorporate the effects of modifications in the manuals, prepare training materials, and change the style of the documentation for noncode entities. In other words, it involves reformulation of documentation for noncode entities by modifying its style while preserving the code.
• Type decision B-2. The B-2 decision is “To conform to implementation, did the activities modify the noncode documentation?” This updative type involves activities for replacing out-of-date documentation with up-to-date documentation, making semi-formal models to describe current source code, and combining test plans with the documentation, without modifying the code. Out of the two types, the default type is updative.

Software Properties Cluster

The code is said to be modified if the B criterion decision produces a “Yes” outcome. Next, one analyzes decision C, which queries about modifications in the functionality of the system observed by the user. The software properties cluster is invoked by a “No” answer to the C criterion decision “Did the activities change the customer-experienced functionality?” This cluster comprises four type decisions, namely, C-1, C-2, C-3, and C-4, with increasing impact in that order. The cluster concerns modifications in the attributes of the software without involving modifications in the functionality delivered by the software.
Activities in this group commonly follow the documentation cluster and the support interface cluster. The default type here is adaptive, and the details of the type decisions in this cluster are as follows:
• Type decision C-1. “Was maintainability or security changed by the activities?” is the C-1-type decision, and it is of groomative type. The decision involves “anti-regression” activities for source code grooming, such as substituting algorithms and modules with better ones, modifying conventions for data naming, making backups, changing access authorizations, altering code understandability, and recompiling the code.
• Type decision C-2. C-2 is a preventive type decision asking the question “Did the activities constrain the scope of future maintenance activities?” Activities of this type make modifications to the code without changing the existing functionality experienced by the customers, the resources utilized, or the existing technology. The impacts of such activities are generally not visible to the customer. The common activities are making changes to improve maintainability.
• Type decision C-3. “Were performance properties or characteristics modified by the activities?” is the C-3-type decision, and it is of performance type. This involves improving system up time, substituting algorithms and modules with more efficient ones, reducing the demand for storage, and improving the system’s robustness and reliability. The customer often observes the changes in those properties.
• Type decision C-4. “Were different technology or resources used by the activities?” is the C-4-type decision, and it is of adaptive type. This type includes activities such as porting the software to a new execution platform, increasing the utilization of commercial off-the-shelf (COTS) components, changing the supported communication protocols, and moving to object-oriented technologies. Those activities can change customer-perceivable system properties, but similar to C-1, C-2, and C-3, type C-4 does not modify the functionality experienced by the customers. Type C-4 is the default in this group.

Business Rules Cluster

This cluster is invoked by a “Yes” answer to the C criterion decision question “Did the activities change the customer-experienced functionality?” This cluster comprises the D-1, D-2, and D-3-type decisions. These types of activities occur most frequently, and activities from other clusters are needed to support these activities. This cluster involves the user- and business-level functionalities.
• Type decision D-1. Type decision D-1 is “Did the activities constrain, reduce, or remove some functionalities experienced by the customers?” and it is of reductive type. Activities of this type delete portions or all of the modules to constrain or remove some business rules. When organizations merge, their business rules undergo such actions as elimination, restriction, and reduction.
• Type decision D-2. Type decision D-2 is “Did the activities fix bugs in customer-experienced functionality?” and it is of corrective type. The major tasks fix defects, introduce defensive programming, and modify the ways exceptions are handled.
• Type decision D-3. Type decision D-3 is “Did the activities substitute, add to, or expand some functionalities experienced by the customers?,” and it is of enhancive type. This type implements modifications by enhancing the business rules to support more functionalities of the system. The major tasks are to add new subsystems and algorithms and modify the current ones to enhance their scope. The changes may affect customer experience of system functionality. D-3 is the default type in this group.
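The ordering of the type decisions described above lends itself to a small lookup. The sketch below is an illustration only: it assumes that the evidence for an activity has already been reduced to the set of type decisions answered “Yes,” and it selects the dominant type as the one highest in the order A-1 through D-3. Treating an empty set of answers as ambiguous evidence, and therefore returning the overall default type, is an assumption of this sketch. The names TYPE_ORDER and dominant_type are hypothetical.

```python
# Type decisions listed in order of increasing impact (A-1 lowest, D-3 highest).
TYPE_ORDER = [
    ("A-1", "Training"), ("A-2", "Consultive"), ("A-3", "Evaluative"),
    ("B-1", "Reformative"), ("B-2", "Updative"),
    ("C-1", "Groomative"), ("C-2", "Preventive"),
    ("C-3", "Performance"), ("C-4", "Adaptive"),
    ("D-1", "Reductive"), ("D-2", "Corrective"), ("D-3", "Enhancive"),
]


def dominant_type(yes_decisions, default="Evaluative"):
    """Return the highest-impact type among the decisions answered "Yes".

    yes_decisions: iterable of decision labels, for example {"A-3", "C-3"}.
    default: type returned when no decision was answered "Yes"
             (assumption: this case is treated as ambiguous evidence).
    """
    answered = set(yes_decisions)
    dominant = default
    for label, name in TYPE_ORDER:
        if label in answered:
            dominant = name  # later entries in TYPE_ORDER override earlier ones
    return dominant
```

With evidence of both evaluative (A-3) and performance (C-3) activities, dominant_type({"A-3", "C-3"}) selects the performance type, because C-3 appears later in the ordering than A-3.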
Example: A maintenance engineer, after analyzing all the documentation along with the program code, modified the program code for one component without modifying other documentation, built the rewritten component, executed the regression test suite, checked it into the version control, and embedded it into the production system. The only consequence the customer observed was improved latency.

Question: Identify the type of software maintenance performed by the engineer [15].

From the activities reported it is apparent that criteria A and B evaluate to “Yes,” whereas criterion C evaluates to “No.” The given evidence leads to a “Yes” decision for the performance type in the Software Properties cluster. In addition, the evidence leads us to the evaluative type in the Support Interface cluster. In Figure 2.1, we identify the two types, namely, performance and evaluative, and note that the first type is higher up than the second one. Because the performance type is higher up than the evaluative type, one expects evidence of the consultive, training, reformative, updative, preventive, and groomative types, but not of the adaptive, enhancive, corrective, or reductive types. Therefore, performance is the dominant maintenance type for this example.

2.2 CATEGORIES OF MAINTENANCE CONCEPTS

In the previous section, we discussed different views of software maintenance. In this section, we describe some ways to organize maintenance activities. Organization of maintenance activities is conceptually similar to the organization of activities for software development. However, maintenance activities focus on product correction and adaptation, whereas development activities transform high-level requirements into working code. In this section, the key factors influencing the maintenance process are identified.
The domain concepts that influence the software maintenance process can be classified into four categories:
• the product to be maintained;
• the types of maintenance to be performed;
• the maintenance organization processes to be followed; and
• the peopleware involved, that is, the people in the maintenance organization and in the customer/client organization.

The four categories and the concepts that influence the maintenance process have been illustrated in Figure 2.3. The concepts in each category are defined to understand the relationships among maintenance concepts. Next, we describe the characteristics of these concepts and their impacts on maintenance activities. This categorization enables one to know to what degree methods, tools, and skills for maintenance differ from those for software development [4]. In the following subsections, we explain the four dimensions of maintenance as depicted in Figure 2.3.

2.2.1 Maintained Product

The maintained product dimension is characterized by three concepts: product, product upgrade, and artifacts. We explain the characteristics of these concepts and their impacts on the maintenance process.

Product. A product is a coherent collection of several different artifacts. Source code is the key component of a software product. Other artifacts of interest include print manuals and online help.

Product upgrade. Baseline is an arrangement of related entities that make up a particular software configuration.
Any change or upgrade made to a software product relates to a specific baseline. Note that an upgrade can create a new version of the system being maintained, a patch code for an object, or even a notice explaining a restriction on the use of the system. A restriction notice can be a release note saying that the product may not work with a specific hardware version.

FIGURE 2.3 Overview of concept categories affecting software maintenance. [Figure: the four categories (maintained product, maintenance types, organization processes, and peopleware) and their constituent concepts.]

Artifact. A number of different artifacts are used in the design of a software product and, similarly, a number of artifacts exist alongside a software product. One can find the following types of artifacts: textual and graphical documents, commercial off-the-shelf (COTS) products, and object code components. Textual documents, such as requirements specifications, plans, designs, and source code listings, are readily understood by human readers.

The key elements of the maintained product are size, age, application type, composition, and quality. The characteristics of these elements that affect maintenance performance are explained below.

The size of a software system affects the number of personnel needed to maintain the system. A small product can be maintained by a single person, whereas a medium-size product needs a team. A large product may need multiple maintenance teams.
Maintenance activities on relatively large systems are generally less efficient than activities on small systems, because in a large product it is more difficult to: (i) conduct root cause analysis of some problems and (ii) identify ripple effects on the modules to support large enhancements.

The age of a software product, also known as software geriatrics, is the number of calendar years elapsed since its first release. The age of a software product can affect maintenance activities in various ways [17, 18]. Very old products are difficult to maintain for the following reasons: (i) maintenance personnel are hard to find for very old products, because of changes in development technologies; (ii) tools, namely, compilers and code analyzers, are hard to find for very old systems; and (iii) the original or up-to-date development documentation may not be available for old products.

The life cycle stages of a software product are listed in the first column of Table 2.4. The maintenance life cycle starts after the initial development and ends when the product is withdrawn from the market, as discussed in Section 3.3. The second column lists the maintenance tasks that correspond to each stage. The third column gives the typical size of the user population associated with the product in that stage of its life cycle. Note that large enhancements at the evolution stage can lead to several patch releases at the servicing stage because of the need to fix faults (see Lehman’s fifth law, “conservation of familiarity”). Table 2.4 thus shows how the maintenance tasks performed by an organization relate to the maturity of a system.

Knowledge about the application domain influences the productivity of software development and maintenance activities [19].
For example, a computer engineer will have stronger knowledge of IP (Internet Protocol) networking than an aerospace engineer. Hence, a computer engineer will be more productive in maintaining an IP network, whereas an aerospace engineer will be more productive in maintaining a software product for designing airplane wings. In addition, application domains put constraints on the maintenance of products and artifacts. For example, while maintaining a safety critical system, such as air traffic control, one must preserve, and even increase, the product’s reliability. On the other hand, in another application domain the concept of time-to-market may cause early deployment of a newer version of a system. Therefore, different application domains affect the same aspect of maintenance to different degrees.

The level of abstraction of the product component determines the skills required by the maintenance personnel and the tools they need to support the component. If the product has been derived from an in-house design, the maintenance personnel need access to the Computer-Aided Software Engineering (CASE) tools used by the software developers. On the other hand, if the product is composed of COTS components obtained from third parties, the maintenance engineers need integration and acceptance testing skills, and the required skills include development of supporting components such as wrappers, glue, and tailoring.

TABLE 2.4 Staged Model Maintenance Task

Life Cycle Stage       Maintenance Task             User Population
Initial development    –                            –
Evolution              Corrections, enhancements    Small
Servicing              Corrections                  Growing
Phaseout               Corrections                  Maximum
Closedown              Corrections                  Declining

The quality of the product initially delivered to the customer places constraints on the subsequent maintenance activities. Intuitively, good quality artifacts are easier to maintain than poor quality ones.
In the absence of communication between maintenance personnel and the original developers, the quality of the product essentially determines the level of difficulty of maintaining the product. Documentation is often poor or even nonexistent for old products, so maintenance personnel need specialized tools to reengineer the system.

2.2.2 Maintenance Types

We discussed different types of maintenance activities in Section 2.1. In this subsection, different types of maintenance activities are defined, followed by the impacts of those activities on maintenance performance.

Activity. A number of different broad classes of maintenance activities are performed on software products, including investigation, modification, management, and quality assurance. An activity may be composed of several smaller sub-activities. Usually, an activity accepts some artifacts as inputs and produces new or changed artifacts. In the following, we briefly explain the four kinds of activities.

Investigation activity. Activities of this kind evaluate the impact of making a certain change to the system.

Modification activity. Activities of this kind change the system’s implementation.

Management activity. Activities of this kind relate to the configuration control of the maintained system or the management of the maintenance process.

Quality assurance activity. Activities of this kind ensure that modifications performed on a system do not damage the integrity of the system. Regression testing is an example of a quality assurance activity.

Maintenance personnel require different levels of understanding of the product, gained by means of development tools, so that they can execute the different maintenance modifications. For example, a corrective maintenance activity requires the ability to find the precise location of the faulty code and perform localized modifications.
The maintenance engineers need to reproduce the bug in the test environment and may want to use tools to step through the suspected portion of code. On the other hand, an activity for quality or functionality enhancement requires a broad comprehension of large portions of the system. In addition, the maintenance engineer needs the development environment, such as version control, to check out and check in the code. The maintenance engineer may require re-engineering tools if the documentation is poor.

The size and priority of the change are important features that impact the productivity and efficiency of the activities. Specifically, an enhancement with a large scope will involve a large number of maintenance personnel and will incur more communication and coordination overhead. The priority of a maintenance activity affects the length of time needed for the change to be delivered to clients.

The efficiency and quality of investigation activities depend upon the knowledge that the maintenance engineers possess about the current status of the release, outstanding issues, and any planned modifications to the portion of the system involved. The effectiveness of the configuration control mechanism and the control of the change process impact the availability of that information. To identify the status of each system component, a good configuration control process is required.

2.2.3 Maintenance Organization Processes

Two different levels of maintenance processes are followed within a maintenance organization:
• the individual-level maintenance processes followed by maintenance personnel to implement a specific CR and
• the organization-level process followed to manage the CRs from maintenance personnel, users, and customers/clients; this higher level of process is referred to as the maintenance organization process.
The software maintenance concepts defined below are used to modify one or more artifacts to implement a CR:

Development technology. This refers to the technology that was used to develop the original system and its constituent artifacts. The development technology constrains the maintenance procedures.

Paradigm. This refers to the philosophy adopted at the time of developing the original system. Some examples are the procedural paradigm and the object-oriented paradigm. This too constrains the possible maintenance procedures.

Procedure. A procedure can be a method, a technique, or a script. From a set of procedures, one is chosen to perform a specific activity.

Method. A method is a general, systematic procedure giving clear steps and heuristics to implement some activities.

Technique. A technique is a less rigorously defined procedure to realize an activity.

Script. A script is a general guideline to construct or amend a specific type of document.

Artifacts include plans, documents, system representations, and source and object code items. Artifacts are created during the software development process and changed during maintenance. A multitude of different scripts, techniques, and methods are used to create and change such artifacts. The performance of maintenance activities is impacted by the selection of development paradigm and development technology. It is also impacted by the degree of automation of procedures. In general, the development paradigm and development technologies put limitations on maintenance activities and skill requirements. For example, if the service-oriented architecture (SOA) paradigm is used for the software development process, then the maintenance engineers must be well versed in SOA.

Next, we briefly define the concepts of the maintenance organization processes and then discuss the impact of these concepts on maintenance.

Service-level agreement (SLA).
This is a contract between the customers and the providers of a maintenance service. Performance targets for the maintenance services are specified in the SLA.

Maintenance management. This process is used to manage the maintenance service, which is not the same as managing individual CRs. An organization process is set up and run by the senior management. They create a structure of the maintenance team so that the service-level agreement can be executed. In addition to fulfilling the roles of regular processes, such as project management and quality assurance, maintenance management handles events, change control, and configuration control.

Event management. The stream of events, namely, all the CRs from various sources, received by the maintenance organization, is handled in an event management process.

Change control. Evaluation of results of investigations of maintenance events is performed in a process called change control. Based on the evaluation, the organization approves a system change.

Configuration management. A system’s integrity is maintained by means of a configuration management process. Integrity of a product is maintained in terms of its modification status and version number.

Maintenance organization structure. This is the hierarchy of roles assigned to maintenance personnel to perform administrative tasks.

Maintenance event. This is a problem report or a CR originating from within the maintenance organization or from the customers.

Investigation report. This is the outcome of assessing the cause and impacts of a maintenance event.

Three major elements of a maintenance organization are event management, configuration management, and change control system, as explained before in this section. A maintenance organization handles maintenance requests from users, customers, and maintainers.
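As a minimal sketch of how the concepts defined above relate to one another, the Python fragment below models a maintenance event, the investigation report produced for it, and a change control decision. All class, function, and field names are hypothetical, and the approval rule (a simple effort budget) is an assumption made only for illustration; the text does not prescribe any particular approval policy.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass(frozen=True)
class MaintenanceEvent:
    """A problem report or CR received by the maintenance organization."""
    event_id: int
    source: str          # e.g. "user", "customer", or "maintainer"
    description: str


@dataclass(frozen=True)
class InvestigationReport:
    """Outcome of assessing the cause and impacts of a maintenance event."""
    event: MaintenanceEvent
    cause: str
    impacted_components: tuple
    estimated_effort_days: float   # hypothetical field, used by the toy policy


def investigate(event, cause, impacted_components, effort_days):
    """Investigation activity: record the cause and impact of one event."""
    return InvestigationReport(event, cause, tuple(impacted_components), effort_days)


def change_control(report, effort_budget_days):
    """Change control: evaluate an investigation report.

    Assumed toy policy: approve when the estimated effort fits the budget.
    """
    if report.estimated_effort_days <= effort_budget_days:
        return Decision.APPROVED
    return Decision.REJECTED


def process_events(events_with_findings, effort_budget_days):
    """Event management: take each incoming event through investigation
    and change control, returning the decision per event id."""
    decisions = {}
    for event, (cause, impacted, effort) in events_with_findings:
        report = investigate(event, cause, impacted, effort)
        decisions[event.event_id] = change_control(report, effort_budget_days)
    return decisions
```

Calling `process_events` with a list of `(event, (cause, impacted_components, effort_days))` pairs returns one decision per event. A real change control process would weigh far more than effort, so the budget rule stands in only as a placeholder for that evaluation.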
If any of the three elements is not adequate, maintenance will be inefficient and product quality is likely to be compromised. The efficiency of maintenance activities is highly influenced by support tools. Tools such as ManageEngine and Zendesk are available to assist event management. The type and volume of CRs affect the performance of the maintenance organization. As an example, if many defects are reported in a short time, there may not be enough resources to carry out modifications.

After an initial investigation of a CR, a management process is put in place for approving change activities. Approval of a CR is normally the responsibility of a change control board. A change control board is organized as a formal process with meetings between maintenance managers, clients, users, and customers. A proposed modification activity is scheduled only after the modification is approved by the board and an SLA is signed with the client. The level of formality adopted in change control board procedures can affect the quality and efficiency of performing changes. A formal change control board generally slows down the maintenance process but is better at protecting the integrity of the systems being maintained.

SLAs describe the maintenance organization’s performance targets. They can be used by maintenance personnel as guidance for meeting customers’ expectations. SLAs should be based on results rather than effort, and maintenance organizations must be prepared to meet their SLAs. In general, maintenance organizations use three different support levels to organize the staff:
• Level 1. This group files problem reports and identifies the technical support person who can best assist the person reporting a problem.
• Level 2. This level includes experts who know how to communicate with users and analyze their problems. These people recommend quick fixes and temporary workarounds.
• Level 3.
This level includes programmers who can perform actual changes to the product software.

Note that not all maintenance work causes changes in the system. In many situations, users may need advice on how to continue to use the system or in what ways they can bypass a problem.

2.2.4 Peopleware

Maintenance activities cannot ignore the human element, because software production and maintenance are human-intensive activities. The three people-centric concepts related to maintenance are as follows:

Maintenance organization. This is the organization that maintains the product(s).

Client organization. A client organization uses the maintained system and has a clear relationship with the maintenance organization. This relationship is described in the SLA.

Human resource. Human resource includes personnel from the maintenance and client organizations. Maintenance organization personnel include managers and engineers, whereas client organization personnel include customers and users.

The management negotiates with the customers to agree on the SLA, the scheduling of requirement enhancements, and cost. In general, maintenance tasks are perceived to be less challenging, and hence less well rewarded, than original work. Often maintenance tasks are partly assigned to newly recruited programmers, which has a significant impact on productivity and quality. A novice programmer may introduce new defects while resolving an incident because of a lack of understanding of the whole system. Normally, more skilled maintenance personnel produce more work, and work of better quality.

Separation between development staff and maintenance staff impacts the maintenance process. On one hand, there may be no real separation between maintenance and development. In such a scenario, the product undergoes continual evolution. The developers incorporate maintenance activities into a continuing process for planned enhancements.
In this case, the tools and procedures are the same for both the development and maintenance activities. On the other hand, there are maintenance organizations which operate with minimal interactions with the development departments. Occasionally, the maintenance group may not be located in the same organization that produced the software. In such a scenario, maintenance engineers may need specially designed tools. Maintenance managers need to focus on the aforementioned issues when signing SLAs.

Finally, the following user and customer issues affect maintenance:
• Size. The size of the customer base and the number of licenses they hold affect the amount of effort needed to support a system.
• Variability. High variability in the customer base impacts the scope of maintenance tasks.
• Common goals. The extent to which the users and the customer have common goals affects the SLAs. Ultimately, customers fund maintenance activities. If the customers do not have a good understanding of the requirements of the actual users, some SLAs may not be appropriate to the end users.

2.3 EVOLUTION OF SOFTWARE SYSTEMS

The term evolution was used by Mark I. Halpern circa 1965 to define the dynamic growth of software [20]. It attracted more attention in the 1980s after Meir M. Lehman proposed eight broad principles about how certain types of software systems evolve [21–24]. Bennett and Rajlich [25] researched the term “software evolution,” but found no widely accepted definition of the term. However, some researchers and practitioners used the term software evolution as a substitute for the term software maintenance.

FIGURE 2.4 Inputs and outputs of software evolution. From Reference 26. © 1988 John Wiley & Sons. [Figure: the inputs to software evolution are existing software, people, machines (CPUs), and money (resources); the output is revised software.]
Lowell Jay Arthur distinguished the two terms as follows:
• Maintenance means preserving software from decline or failure.
• Evolution means continuously changing software from a worse state to a better state (p. 1 of Ref. [26]).

Software evolution is like a computer program, with inputs, processes, and outputs (p. 246 of Ref. [26]) (see Figure 2.4). Keith H. Bennett and Jie Xu [27] use “maintenance” for all post-delivery support, whereas they use “evolution” to refer to perfective modifications, that is, modifications triggered by changes in requirements. Ned Chapin defines software evolution as: “the applications of software maintenance activities and processes that generate a new operational software version with a changed customer-experienced functionality or properties from a prior operational version, where the time period between versions may last from less than a minute to decades, together with the associated quality assurance activities and processes, and with the management of the activities and processes” (p. 21 of Ref. [15]).

The majority of software maintenance changes are concerned with evolutions triggered by user requests for changes in the requirements. The following are the key properties of software evolution as desired by the stakeholders:
• Changes are accomplished quickly and in a cost-effective manner.
• The reliability of the software should not be degraded by those changes.
• The maintainability of the system should not degrade. Otherwise, future changes will be more expensive to carry out.

Software evolution is studied with two broad, complementary approaches, namely, explanatory and process improvement, which describe the what and how aspects, respectively, of software evolution.
• Explanatory (what/why). This approach attempts to explain the causes of software evolution, the processes used, and the effects of software evolution. The explanatory approach studies evolution from a scientific viewpoint. In this approach, the nature of the evolution phenomenon is studied, and one strives to understand its driving factors and impacts.
• Process improvement (how). This approach attempts to manage the effects of software evolution by developing better methods and tools, namely, for design, maintenance, refactoring, and reengineering. The process improvement approach studies evolution from an engineering viewpoint. It focuses on the more pragmatic aspects that assist the developers in their daily routines. Therefore, methods, tools, and activities that provide the means to direct, implement, and control software evolution are at the core of the process improvement approach.

2.3.1 SPE Taxonomy

The abbreviation SPE refers to S (Specified), P (Problem), and E (Evolving) programs. Circa 1980, Meir M. Lehman [24] proposed an SPE classification scheme to explain the ways in which programs vary in their evolutionary characteristics. The classification scheme is characterized by: (i) how a program interacts with its environment and (ii) the degree to which the environment and the underlying problem that the program addresses can change. He observed a key difference between software developed to meet a fixed set of requirements and software developed to solve a real-world problem which changes with time. The observation leads to the identification of types S (Specified), P (Problem), and E (Evolving) programs. In the following, we explain the SPE concepts in detail.

S-type programs: S-type programs have the following characteristics:
• All the nonfunctional and functional program properties that are important to its stakeholders are formally and completely defined.
• Correctness of the program with respect to its formal specification is the only criterion of the acceptability of the solution to its stakeholders.
A formal definition of the problem is viewed as the specification of the program. S-type programs solve problems that are fully defined in abstract and closed ways. Examples of S-type programs include calculating the lowest common multiple of two integers and performing matrix addition, multiplication, and inversion [28]. The problem is completely defined, and there are one or more correct solutions to the problem as stated. The solution is well known, so the developer is concerned not with the correctness of the solution but with the correctness of the implementation of the solution. As illustrated in Figure 2.5, the specification directs and controls the programmers in creating the program that defines the desired solution. The problem statement, the program, and the solution may relate to the real world, and the real world can change. However, if the real world changes, the original problem turns into a completely new problem that must be respecified. That new problem then has a new program to provide its solution. It may be possible and time saving to derive a new program from the old one, but it is a different program that defines a solution to a different problem. The program remains almost the same in the sense that it does not accommodate changes in the problem that generates it [29]. In the real world, S-type systems are rare. However, the concept is important because it shows that, under some conditions, software does not evolve.

FIGURE 2.5 S-type programs. [Figure: labels are real world, formal statement of problem, requirements specification, program, solution, compare, static, and change.]

P-type programs: With many real problems, the system outputs are accurate to a constrained level of precision. The concept of correctness is difficult to define in those programs. Therefore, approximate solutions are developed for pragmatic reasons. Numerical problems, except computations with integers and rational numbers, are resolved through approximations.
For example, consider a program to play chess. Since the rules of chess are completely defined, the problem can be completely specified. At each step of the game, a solution might involve calculating the various moves and their impacts to determine the next best move. However, complete implementation of such a solution may not be possible, because the number of moves is too large to be evaluated in a given time duration. Therefore, one must develop an approximate solution that is more practical while being acceptable.

In order to develop this type of solution, we describe the problem in an abstract way and write the requirement specification accordingly. A program developed this way is of P-type because it is based on a practical abstraction of the problem, instead of relying on a completely defined specification. Even though an exact solution may exist, the solution produced by a P-type program is tempered by the environment in which it must be produced. The solution of a P-type program is accepted if the program outcomes make sense to the stakeholder(s) in the world in which the problem is embedded. As illustrated in Figure 2.6, P-type programs are more dynamic than S-type programs. P-type programs are likely to change in an incremental fashion. If the output of the solution is unacceptable, then the problem abstraction may be changed and the requirements modified to make the new solution more realistic. Note that the program resulting from the changes cannot be considered a new solution to a new problem. Rather, it is a modification of the old solution to better fit the existing problem. In addition, the real world may change, in which case the problem changes too.

FIGURE 2.6 P-type programs. [Figure: labels are real world, abstraction, formal statement of problem, requirements specification, program, solution, compare, and change.]
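The contrast between S-type and P-type programs can be sketched in a few lines of Python. The first part treats the lowest common multiple as an S-type problem: the specification is stated as an executable predicate, and the implementation is judged only against it. The second part stands in for the chess example with a much smaller game (a pile of stones from which each player removes one to three, the player taking the last stone winning); this toy game, the function names, and the neutral score returned at the search cutoff are all assumptions chosen for illustration and are not taken from the text. The point is only that bounding the search depth trades exactness for practicality, as a P-type program does.

```python
import math


# --- S-type: the formal specification is the only acceptance criterion ---

def lcm(a: int, b: int) -> int:
    """Lowest common multiple of two positive integers."""
    return a * b // math.gcd(a, b)


def satisfies_lcm_spec(a: int, b: int, m: int) -> bool:
    """Specification: m is a positive common multiple of a and b,
    and no smaller positive integer is a common multiple."""
    if m <= 0 or m % a or m % b:
        return False
    return all(k % a or k % b for k in range(1, m))


# --- P-type: an exact answer exists, but the search is bounded ---

def negamax(stones: int, depth: int) -> int:
    """Value of the position for the player to move.

    +1 means a proven win, -1 a proven loss, and 0 means the depth
    budget ran out before the outcome could be established.
    """
    if stones == 0:
        return -1          # the opponent took the last stone
    if depth == 0:
        return 0           # cutoff: fall back to a neutral estimate
    return max(-negamax(stones - take, depth - 1)
               for take in (1, 2, 3) if take <= stones)


def best_move(stones: int, depth: int) -> int:
    """Pick the move that looks best within the given search depth."""
    moves = [take for take in (1, 2, 3) if take <= stones]
    return max(moves, key=lambda take: -negamax(stones - take, depth - 1))
```

With a generous depth, `negamax` reproduces the exact game value, while a small depth returns 0 for positions it cannot resolve. Whether such an answer is acceptable is then a matter of stakeholder judgment rather than of conformance to a complete specification.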
E-type programs: An E-type program is one that is embedded in the real world and changes as the world does. These programs mechanize a human or societal activity, make simplifying assumptions, and interface with the external world by requiring or providing services. An E-type system is to be regularly adapted to: (i) stay true to its domain of application; (ii) remain compatible with its executing environment; and (iii) meet the goals and expectations of its stakeholders [30].

Figure 2.7 illustrates the dependence of an E-type program on its environment and the consequent changeability. The acceptance of an E-type program entirely depends upon the stakeholders’ opinion and judgment of the solution. The descriptions of E-type programs cannot be completely formalized to permit the demonstration of correctness, and their operational domains are potentially unbounded. The first characteristic of an E-type program is that the outcome of executing the program is not definitely predictable. Therefore, for E-type programs, the concept of correctness is left up to the stakeholders. That is, the criterion of acceptability is the stakeholders’ satisfaction with each program execution [31]. An E-type program’s second characteristic is that program execution changes its operational domain, and the evolution process is viewed as a feedback system [32]. Figure 2.8 [33] succinctly illustrates the feedback process.

FIGURE 2.7 E-type programs. [Figure: labels are real world, abstraction, formal statement of problem, requirements specification, program, solution, compare, and change.]

2.3.2 Laws of Software Evolution

Lehman and his colleagues have postulated eight “laws” over 20 years starting from the mid-1970s to explain some key observations about the evolution of E-type software systems [34, 35]. The laws themselves have evolved from three in 1974 to eight by 1997, as listed in Table 2.5.
The eight laws are the results of empirical studies of the evolution of large-scale proprietary software, also called closed source software (CSS), in a variety of corporate settings. The laws primarily relate to perfective maintenance. These laws are largely based on the concept of feedback existing in the software environment. Their descriptions of the phenomena are intertwined, and the laws are not to be studied separately. The numbering of the laws has no significance apart from the sequence of their development.

FIGURE 2.8 E-type programs with feedback. From Reference 33. © 2006 John Wiley & Sons. [Figure: labels are program domain, program concept, bounding, development steps 1 through i, program, operational program, and exogenous change.]

TABLE 2.5 Laws of Software Evolution

Names of the Laws                                    Brief Descriptions
I. Continuing change (1974)                          E-type programs must be continually adapted, else they become progressively less satisfactory.
II. Increasing complexity (1974)                     As an E-type program evolves, its complexity increases unless work is done to maintain or reduce it.
III. Self-regulation (1974)                          The evolution process of E-type programs is self-regulating, with the time distribution of measures of processes and products being close to normal.
IV. Conservation of organizational stability (1978)  The average effective global activity rate in an evolving E-type program is invariant over the product’s lifetime.
V. Conservation of familiarity (1978)                The average content of successive releases is constant during the life cycle of an evolving E-type program.
VI. Continuing growth (1991)                         To maintain user satisfaction with the program over its lifetime, the functional content of an E-type program must be continually increased.
VII. Declining quality (1996)                        An E-type program is perceived by its stakeholders to have declining quality if it is not maintained and adapted to its environment.
VIII. Feedback system (1971–1996)                    The evolution processes in E-type programs constitute multi-agent, multi-level, multi-loop feedback systems.

Source: Adapted from Lehman et al. [34]. ©1997 IEEE.

Lehman’s laws were not meant to be used in a mathematical sense, as, say, Newton’s laws are used in physics. Rather, those were intended to capture stable, long-term knowledge about the common features of changing software systems, in the same sense that social scientists use laws to characterize general principles applying to some classes of social situations [30]. The term “laws” was used because the observed phenomena were beyond the influence of managers and developers. The laws were an attempt to study the nature of software evolutions and the evolutionary trajectory likely taken by software.

First Law. Continuing change: E-type programs must be continually adapted, else they become progressively less satisfactory. Many assumptions are embedded in an E-type program. A subset of those assumptions may be complete and valid at the initial release of the product; that is, the program performed satisfactorily even if not all assumptions were satisfied. As users continue to use a system over time, they gain more experience, and their needs and expectations grow. As the application’s environment changes in terms of the number of sophisticated users, a growing number of assumptions become invalid. Consequently, new requirements and new CRs will emerge. In addition, changes in the real world will occur and the application will be impacted, requiring changes to be made to the program to restore it to an acceptable model.
When the updated and modified program is reintroduced into the operational domain, it continues to satisfy user needs for a while; next, more changes occur in the operational environment, additional user needs are identified, and additional CRs are made. As a result, the evolution process moves into a vicious cycle.

Second Law Increasing complexity: As an E-type program evolves, its complexity increases unless work is done to maintain or reduce it. As the program evolves, its complexity grows because of the imposition of change after change on the program. In order to incorporate new changes, more objects, modules, and sub-systems are added to the system. As a consequence, there is a marked increase in: (i) the effort expended to ensure an adequate and correct interface between the old and new elements; (ii) the number of errors and omissions; and (iii) the possibility of inconsistency in their assumptions. Such increases lead to a decline in the product quality and in the evolution rate, unless additional work is performed to arrest the decline. The only way to prevent this is to invest in preventive maintenance, where one spends time improving the structure of the software without adding to its functionality.

Third Law Self-regulation: The evolution process of E-type programs is self-regulating, with the time distribution of measures of processes and products being close to normal. This law states that large programs have dynamics of their own; attributes such as size, time between releases, and the number of reported faults are approximately invariant from release to release because of fundamental structural and organizational factors. In an industrial setup, E-type programs are designed and coded by a team of experts working in a larger context comprising a variety of management entities, namely, finance, business, human resource, sales, marketing, support, and user process.
The various groups within the large organization apply constraining information controls and reinforcing information controls influenced by past and present performance indicators. Their actions control, check, and balance the resource usage, which is a kind of feedback-driven growth and stabilization mechanism. This establishes a self-controlled dynamic system whose process and product parameters are normally distributed as a result of a huge number of largely independent implementation and managerial decisions [36].

Fourth Law Conservation of organizational stability: The average effective global activity rate in an evolving E-type program is invariant over the product’s lifetime. This law suggests that most large software projects work in a “stationary” state, which means that changes in resources or staffing have small effects on the long-term evolution of the software. To a certain extent, management certainly does control resource allocation and planning of activities. However, as suggested by the third law, program evolution is essentially independent of management decisions. In some instances, as indicated by Brooks [37], situations may arise where additional resources reduce the effective rate of productivity output due to higher communication overhead or a decrease in process quality. In reality, activities during the life cycle of a system are not exclusively decided by management but by a wide spectrum of controls and feedback inputs [38].

Fifth Law Conservation of familiarity: The average content of successive releases is constant during the life cycle of an evolving E-type program. As an E-type system evolves, both developers and users must try to develop mastery of its content and behavior. Thus, after every major release, established familiarity with the application and the system in general is counterbalanced by a decline in the detailed knowledge and mastery of the system.
This would be expected to produce a temporary slowdown in the growth rate of the system, as it is recognized that the system must be cleaned up to simplify the process of re-familiarization. In practice, adding new features to a program invariably introduces new program faults due to unfamiliarity with the new functionality and the new operational environment. The more changes are made in a new release, the more faults will be introduced. The law suggests that one should not include a large number of features in a new release without taking into account the need for fixing the newly introduced faults. Conservation of familiarity implies that maintenance engineers need to have the same high level of understanding of a new release even if more functionalities have been added to it.

Sixth Law Continuing growth: To maintain user satisfaction with the program over its lifetime, the functional content of an E-type program must be continually increased. It is useful to note that programs exhibit finite behaviors, which implies that they have limited properties relative to the potential of the application domain. Properties excluded by the limitedness of the programs eventually become a source of performance constraints, errors, and irritation. To eliminate all those negative attributes, the system must grow.

It is important to distinguish this law from the first law, which focuses on “Continuing Change.” The first law captures the fact that an E-type software’s operational domain undergoes continual changes. Those changes are partly driven by installation and operation of the system and partly by other forces; an example of other forces is the human desire for improvement and perfection. These two laws, the first and the sixth, reflect distinct phenomena and different mechanisms. When phenomena are observed, it is often difficult to determine which of the two laws underlies the observation.
Seventh Law Declining quality: An E-type program is perceived by its stakeholders to have declining quality if it is not maintained and adapted to its environment. This law directly follows from the first and the sixth laws. An E-type program must undergo changes in the forms of adaptations and extensions to remain satisfactory in a changing operational domain. Those changes are very likely to degrade the performance and will potentially inject more faults into the evolving program. In addition, the complexity (e.g., the cyclomatic measure) of the program in terms of interactions between its components increases, and the program structure deteriorates. This increase in complexity over time is called entropy. The average rate at which software entropy increases is about 1–3 per calendar year [17]. There is a significant decline in stakeholder satisfaction because of growing entropy, declining performance, increasing number of faults, and mismatch of operational domains. The aforementioned factors also cause a decline in software quality from the user’s perspective [39]. The decline of software quality over time is related to the growth in entropy associated with software product aging [18] or code decay [8]. Therefore, it is important to continually undertake preventive measures to reduce the entropy by improving the software’s overall architecture, high-level and low-level design, and coding.

Eighth Law Feedback system: The evolution processes in E-type programs constitute multi-agent, multi-level, multi-loop feedback systems. Several laws of software evolution refer to the role of information feedback in the life cycles of software.
The eighth law is based on the observation that the evolution process of E-type software constitutes a multi-level, multi-loop, multi-agent feedback system: (i) multi-loop means that it is an iterative process; (ii) multi-level refers to the fact that it occurs in more than one aspect of the software and its documentation; and (iii) a multi-agent software system is a computational system where software agents cooperate and compete to achieve some individual or collective tasks. Feedback will determine and constrain the manner in which the software agents communicate among themselves to change their behavior [40].

Remark: There are two types of aging in software life cycles: software process execution aging and software product aging. The first one manifests in degradation in performance or transient failures in continuously running the software system. The second one manifests in degradation of quality of software code and documentation due to frequent changes. The following aging-related symptoms in software were identified by Visaggio [41]:

- Pollution. Pollution means that there are many modules or components in a system which are not used in the delivery of the business functions of the system.
- Embedded knowledge. Embedded knowledge is the knowledge about the application domain that has been spread throughout the program such that the knowledge cannot be precisely gathered from the documentation.
- Poor lexicon. Poor lexicon means that the component identifiers have little lexical meaning or are incompatible with the commonly understood meaning of the components that they identify.
- Coupling. Coupling means that the programs and their components are linked by an elaborate network of control flows and data flows.
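Two of the symptoms above, pollution and coupling, can be illustrated on a module dependency graph. The following Python sketch is only an illustration under stated assumptions: the graph representation, the function names (`pollution_candidates`, `coupling_density`), the density measure, and the example modules are all hypothetical and do not come from Visaggio [41].

```python
from collections import deque


def reachable(deps, entry_points):
    """Return every module reachable from the entry points via dependency edges."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        module = queue.popleft()
        for dep in deps.get(module, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def all_modules(deps):
    """Collect every module named as a source or a target of a dependency."""
    return set(deps) | {d for targets in deps.values() for d in targets}


def pollution_candidates(deps, entry_points):
    """Modules never reached from any business entry point (assumed proxy for pollution)."""
    return all_modules(deps) - reachable(deps, entry_points)


def coupling_density(deps):
    """Fraction of possible directed inter-module links that are present (assumed proxy for coupling)."""
    n = len(all_modules(deps))
    if n < 2:
        return 0.0
    # Count distinct outgoing links per module, ignoring self-references.
    edges = sum(len(set(targets) - {m}) for m, targets in deps.items())
    return edges / (n * (n - 1))


# Hypothetical system: each module maps to the modules it depends on.
deps = {
    "billing": ["ledger", "tax"],
    "ledger": ["storage"],
    "tax": [],
    "storage": [],
    "old_report": ["storage"],
}

print(sorted(pollution_candidates(deps, ["billing"])))
print(coupling_density(deps))
```

In this toy graph the only business entry point is assumed to be `billing`, so any module that cannot be reached from it is flagged as a pollution candidate. A real analysis would need a dependency extractor and a domain-specific notion of which entry points deliver business functions.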
Remark: The code is said to have decayed if it is very difficult to change it, as reflected by the following three key responses: (i) the cost of the change, which is effectively only the personnel cost of the developers who implement it; (ii) the calendar or clock time to make the changes; and (iii) the quality of the changed software. It is important to note that code decay is the antithesis of evolution in the sense that while the evolution process is intended to make the code better, changes are generally degenerative, thereby leading to code decay.

2.3.3 Empirical Studies

Empirical studies are aimed at acquiring knowledge about the effectiveness of processes, methods, techniques, and tools used in software development and maintenance. Similarly, the laws of software evolution are prime candidates for empirical studies, because we want to know to what extent they hold. Circa 1976, Belady and Lehman [22] studied 20 releases of the OS/360 operating system. The results of their study led them to postulate five laws of software evolution: continuing change, increasing complexity, self-regulation, conservation of organizational stability, and conservation of familiarity. Those laws were further developed in an article published in 1980 [24]. Yuen [36, 42] further studied their five laws of evolution. He re-examined three different systems from Belady and Lehman [22] and several other systems, and examined a variety of dependent variables. The number and percentage of modules handled are examples of dependent variables. After re-examining the data from previous studies, he observed that the characteristics observed for OS/360 did not necessarily hold for other systems. Specifically, the first two laws were supported, while the remaining three laws were not. Yuen, a collaborator of Lehman, notes that these three laws are based more on the human organizations involved in the maintenance process than on the properties of the software itself.
Later, in a project entitled FEAST (Feedback, Evolution, And Software Technology), Lehman and his colleagues studied evolution of releases from four CSS systems: (i) two operating systems (OS/360 of IBM and VME OS of ICL); (ii) one financial system (Logica’s FW banking transaction system); and (iii) a real-time telecommunication system (Lucent Technologies). Their results are summarized as a set of growth curves, as described by Lehman, Perry, and Ramil [43]. The studies suggest that during the maintenance process a system tracks a growth curve that can be approximated either as linear or inverse square [44]. The inverse square model represents the growth phenomena as an inverse square of the continuing effort. Those trends increase the confidence of validity of the following six laws: Continuing change (I), Increasing complexity (II), Self-regulation (III), Conservation of familiarity (V), Continuing growth (VI), and Feedback system (VIII). Confidence in the seventh law “Declining quality” is based on the theoretical analysis, whereas the fourth law “Conservation of organizational stability” is neither supported nor falsified based on the metric presented. In 1982, there was an independent study by Lawrence [45], who took a statistical approach and observed some evidence supporting laws I and II, while laws III–V were not supported by the data.

Inverse square law. According to the fourth law, the incremental effort, denoted by $E$, expended on each release stays almost the same during the evolution of the system. Let $\Delta_i$ denote the incremental change of the system size and assume that it is solely due to the effort $E$ expended on the $i$th release. To relate $E$ and $\Delta_i$, we need to consider a conceptual factor $s_i$, which behaves in this context like mass in dynamic physical systems. A larger $s_i$ implies a greater resistance to change, and a smaller $\Delta_i$ will be obtained from expending effort $E$. Thus, $\Delta_i = E/s_i$ is the first, simple relation that comes to mind. Assume that the size of the system is expressed in terms of the number of modules, and complexity is expressed in terms of the number of intermodule interactions. By considering the number of intermodule interactions to be proportional to the square of the number of modules, one can obtain the relationship $\Delta_i = E/s_i^2$, which is called the inverse square law. We assume that the law is valid and obtain the following expressions:

$$s_1 = s_1, \qquad s_2 = s_1 + E/s_1^2, \qquad s_3 = s_2 + E/s_2^2, \qquad \ldots$$

By means of substitution we get:

$$s_1 = s_1, \qquad s_2 = s_1 + E/s_1^2, \qquad s_3 = s_2 + E/s_2^2 = s_1 + E\left(1/s_1^2 + 1/s_2^2\right), \qquad \ldots$$

Similarly, solving for the effort at each release, we get:

$$E_1 = \frac{s_2 - s_1}{1/s_1^2}, \qquad E_2 = \frac{s_3 - s_1}{1/s_1^2 + 1/s_2^2}, \qquad \ldots$$

where the right-hand sides of the equations contain data about releases and nothing else. Thus, one can compute the mean of $n$ values of $E$ as $\bar{E} = \left(\sum_{i=1}^{n} E_i\right)/n$. $\bar{E}$ can be interpreted as the mean effort required per release, which can be considered to be a good approximation of the “constant” effort throughout the system evolution. Employing $\bar{E}$, system evolution is described by using the inverse square law as follows:

$$s_1 = s_1, \qquad s_2 = s_1 + \bar{E}/s_1^2, \qquad s_3 = s_2 + \bar{E}/s_2^2, \qquad \ldots, \qquad s_n = s_{n-1} + \bar{E}/s_{n-1}^2$$

The aforementioned relationship is consistent with the view that increasing complexity, captured in the second law, restrains growth. The inverse square law has an interesting property. By showing that the model closely matches the evolution pattern of a system, one may accurately predict the size of the subsequent releases after the data about the first few releases are available.

2.3.4 Practical Implications of the Laws

Based on the eight laws, Lehman suggested more than 50 rules for management, control, and planning [35] of software evolution.
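As a brief aside before the rules are categorized, the inverse square law from the end of Section 2.3.3 can be exercised numerically. The Python sketch below is a minimal illustration, assuming that release sizes are given as a list of module counts; the function names (`estimate_effort`, `predict_sizes`) and the sample numbers are hypothetical and are not taken from the FEAST data.

```python
def estimate_effort(sizes):
    """Estimate the mean effort per release from observed sizes s_1, s_2, ...

    For each k >= 1 this computes
        E_k = (s_{k+1} - s_1) / (1/s_1^2 + ... + 1/s_k^2)
    and returns the mean of all E_k.
    """
    if len(sizes) < 2:
        raise ValueError("at least two releases are needed")
    estimates = []
    inverse_square_sum = 0.0
    for k in range(1, len(sizes)):
        inverse_square_sum += 1.0 / sizes[k - 1] ** 2
        estimates.append((sizes[k] - sizes[0]) / inverse_square_sum)
    return sum(estimates) / len(estimates)


def predict_sizes(first_size, effort, n_releases):
    """Generate n_releases sizes with s_{i+1} = s_i + effort / s_i^2."""
    sizes = [float(first_size)]
    for _ in range(n_releases - 1):
        previous = sizes[-1]
        sizes.append(previous + effort / previous ** 2)
    return sizes


# Hypothetical data: sizes (in modules) of the first few observed releases.
observed = [100.0, 105.0, 109.4, 113.7]
mean_effort = estimate_effort(observed)

# Project the next releases, starting again from the first observed size.
projection = predict_sizes(observed[0], mean_effort, 8)
print(mean_effort)
print(projection)
```

If the projected sizes track the observed ones for the early releases, the later entries of `projection` serve as size forecasts in the sense described above; a poor match would suggest that the inverse square model does not fit the system.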
Those 50+ rules are put into three broad categories: assumptions management, evolution management, and release management [46].

Assumptions management. Several assumptions are made by different personnel involved throughout the life cycle of a project. When a software project fails, the primary source of failure can be traced back to those assumptions. It is generally found that some of those assumptions were never valid in the first place, or it is more likely that some of the assumptions became invalid as a result of changes outside the software system. Therefore, management of assumptions plays a key role in successful execution of projects involving E-type software. The following is a list of activities for managing assumptions.

- Identify and capture the assumptions pertinent to the project. The difficulty lies in completely identifying all the assumptions.
- Initiate periodic reviews to assess any need to correct or update the list of assumptions.
- Review and revalidate the assumptions whenever a change occurs in the specification, design, implementation, or operational domain.
- Where software operates in a rapidly changing environment, complement the detailed assumption review process with rewriting of appropriate components of the software.
- Develop and use tools to track all the above activities.

Evolution management. In this section, the discussion has mainly referred to the evolution of software as reflected in a series of releases or upgrades. Recommendations relating to the evolution and maintenance process include the following:

- Consistently assess and pursue antiregressive work such as complexity control, restructuring, and full documentation. The phrase antiregressive work means the work to be performed to reduce a program’s complexity with no modifications to the user-perceived functionality delivered by the system.
  As part of the development and maintenance responsibility, carry out antiregressive activities. This may not have an immediate impact on stakeholders, but this will facilitate future evolvability.
- Ensure that documentation includes identification and recording of assumptions.
- Assess the trends in the evolution of the functional and nonfunctional requirements of the software product in advance. Review those trends during the release planning while taking the operational domain into consideration.
- Involve application and operational domain specialists in the assessment.
- Use tools to support data collection, modeling, and related activities.
- Acquire, plot, model, and interpret historical evolution metrics to project trends, patterns, growth, and their rate of changes in order to improve planning and processes.
- When validating incremental growth, assess the impact on the unchanged parts of the system and assumptions.
- Establish baselines of key measures over time to support evolution and maintenance planning and control.

Release management. A software release can be categorized as safe, risky, or unsafe according to the condition described as follows. Let $m$ be the mean of the incremental growth $m_i$ of the system in going from release $i$ to release $i + 1$ and $s$ be the standard deviation of the incremental growth. The release is safe if the content of the $i$th desired release (say, $m_i$) is less than or equal to $m$. The release is said to be risky if the content of the desired release is greater than $m$ but less than $m + 2s$. Finally, the release is unsafe if the content of the desired release is close to or greater than $m + 2s$. Based on the aforementioned concepts of safety, concrete activities for release management are as follows:

- Ensure that the release is safe.
- When the release is not safe, distribute the growth across several releases to make individual releases safe.
- If excessive functional increments are unavoidable, plan for follow-on clean-up releases with a focus on fixing defects and updating documentation.
- Follow established software engineering principles, namely, information hiding, to minimize the spread of changes between system elements.
- By allocating resources, put emphasis on antiregressive work, namely, restructuring, eliminating dead code, and reengineering.
- Consider the alternation of enhancement and extension with clean-up and restructuring releases.

The model discussed in this section concentrated on systems developed under the industrial software process paradigm, namely, closed source software (CSS) and extensions of the waterfall model. The discussion has mainly referred to the evolution of software as reflected in a series of releases or upgrades. But the general validity of the laws of Lehman in the context of newer paradigms, such as open source, agile programming, and COTS-based development, cannot be taken for granted. Hence, we discuss the evolution of free and open source software (FOSS) systems next.

2.3.5 Evolution of FOSS Systems

FOSS is a class of software that is both free software and open source. It is liberally licensed to grant users the right to use, copy, study, change, and improve its design through availability of its source code. The FOSS movement is attributed to Richard M. Stallman, who started the GNU project circa 1984, and a supporting organization, the Free Software Foundation [47]. It is often emphasized that free software is a “matter of liberty not price” [48]. FOSS systems, also referred to as FLOSS (Free/Libre/Open Source Software) systems, have attracted much academic and commercial interest due to the accessibility of large amounts of code and other free artifacts. Gradually, more and more software systems were developed by the Open Source Community (OSC).
Compared with CSS development methods, FOSS development has many new characteristics. Eric Raymond concisely documented the FOSS approach in an article entitled “The Cathedral and the Bazaar” [49]. In this section, we briefly describe the differences between the evolutions of FOSS-based software and CSS-based software in terms of: (i) team structure; (ii) process; (iii) releases; and (iv) global factors [50–52].

Team structure. In traditional CSS development, organizations often have dedicated groups to carry out evolution tasks. These groups are staffed with specialist maintenance personnel. In contrast, FOSS development is very different. Even though several FOSS communities have core teams to manage evolution activities on a daily basis, most of the work is done voluntarily. An onion model of FOSS development has been illustrated in Figure 2.9 [53, 54]. According to this model, a core sustainable community consists of a small group of key members, additional contributing developers, and a large number of active users who report defects. The outer layer represents those users who are not actively involved in the development process. The onion model has three primary characteristics: (i) a small core team; (ii) contributing developers add and maintain features; and (iii) active users take ownership of system testing and defect reporting. In the FOSS evolution model, numerous nomadic volunteers work together as a community. Therefore, one should consider the turnover of people when evaluating the evolution of FOSS-based software.

FIGURE 2.9 Onion model of FOSS development structure
(Diagram layers, from innermost to outermost: core members, contributing developers, active users, passive users.)

Process. The FOSS development process is lighter than the CSS development process followed in companies, where requirement documents and design specifications are indispensable. There are strict rules about coding style, documentation, and defect fixes.
On the other hand, in FOSS development, requirement specification and detailed design documentation take a back seat, at least from the user’s perspective. Though there are standards for coding and documenting, these are adhered to less strictly than in traditional CSS development. In FOSS development, source code comprises the main artifact for disseminating knowledge among the developers. Therefore, FOSS activities are largely confined to coding and testing. To overcome the ensuing difficulties due to not following a document-driven development process, an array of supplementary information is provided: release notes, defect databases, configuration management facilities, and email lists. For some projects, some developers act as the “gate keepers” for any revision to the code. Each project community makes its own rules to regulate the submission of bug fixes and new functionalities. Unlike in CSS-based projects, systematic testing is not always present. However, in FOSS projects, due to the large number of developers and beta-testers, almost all the issues can be quickly characterized and the fix is obvious to someone. As observed by Eric S. Raymond [49], “given enough eyeballs, all bugs are shallow.” This is known as Linus’s Law: the more widely available the source code is for public testing, scrutiny, and experimentation, the more rapidly all forms of defects are discovered.

Remark: Linus’s law was attributed by Eric S. Raymond to a Finnish software engineer, Linus Benedict Torvalds. Torvalds was the lead of the Linux kernel project and later became the chief architect of the Linux operating system and the project coordinator.

Releases. A key attribute of FOSS is that code is shared with almost no constraints. Compared to CSS development, FOSS development generally does not have a schedule for regular releases.
However, larger FOSS projects do have stated goals, and releases are generally scheduled in terms of functionalities to be delivered. In general, there are two related streams of source code: (i) a stable stream of code for distribution and (ii) a development stream. The latter is the one currently being modified and improved. At some instant, the development stream becomes stable and is released. The development stream is frozen for a few days prior to a milestone release to identify critical problems. When it is determined that critical problems have been resolved and the code is indeed stable, then the code is released [55]. However, it is a common behavior in many FOSS projects to follow the rule: “release early, release often.” The above rule means that the code is available to the public well before it is stable.

Global factors. In the FOSS development paradigm, developers working on even a very small project might be living in many countries around the world, due to the pervasive use of the world-wide web (WWW). With globalization on the rise, the collaborators hail from many countries with a variety of cultural backgrounds. FOSS development becomes very challenging because of the need to coordinate the geographically distributed developers. Though many companies have their development teams spread out in many countries, most of the traditional companies develop systems with their local teams and, occasionally, the system is tested at an overseas location.

Empirical Studies of FOSS Evolution

Circa 1988, Pirzada [56] analyzed the dissimilarities between the systems studied by Lehman and Belady [32] and the evolution of the Unix operating system. It was argued that the differences between commercial development and development for academic purposes could lead to differences in their evolutionary trajectories. Circa 2000, an empirical study of FOSS evolution was conducted by Godfrey and Tu [57].
They provided the trend of growth between 1994 and 1999 for the Linux operating system (OS), which is a popular FOSS system, and showed its rate of growth to be superlinear. Specifically, they found that the size of Linux followed a quadratic growth trend, and at that time the OS was more than 2 million lines of code (LOC). Circa 2002, Schach et al. [58] studied the evolution of 365 versions of Linux and showed that module coupling, that is, interconnection of modules, has been growing exponentially. They argued that unless efforts to alleviate this situation are undertaken, the Linux OS would become unmaintainable. Their argument was fully consistent with Lehman’s sixth and seventh laws. However, it appears to be at odds with the third and the fifth laws, which are self-regulation and conservation of familiarity, respectively. Robles et al. [59], while replicating Godfrey and Tu’s study, concluded that Lehman’s fourth law (conservation of organizational stability) does not fit well with large-scale FOSS systems such as Linux. This behavior of exponential growth may be considered as an anomaly, as pointed out by Lehman et al. [60]. The evolutionary behavior of other FOSS systems such as GCC, Linux kernel, Apache, Brocade library, and Zlib appears to follow Lehman’s laws for software evolution [61, 62].

FIGURE 2.10 Growth of the major subsystems (development releases only) of the Linux OS. From Reference 57. © 2000 IEEE
(Plot: total uncommented LOC, from 0 to 1,200,000, against release date; one curve per subsystem: drivers, arch, include, net, fs, kernel, mm, ipc, lib, init.)

Godfrey and Tu observed that the growth rate was higher for the device-driver subsystem of Linux, as can be seen from Figure 2.10. As a matter of fact, the device drivers appear to be mutually independent. Therefore, adding a new driver does not raise the subsystem’s complexity.
Another characteristic of Linux is that the system gives a false impression that it is larger than it really is. The “larger-than-real-size” impression is observed due to the fact that certain features, say, supporting different CPU types, are implemented with code replication (software clones). Moreover, the participation of an unrestricted pool of (novice) developers may explain the presence of these software clones. In order to have a better understanding of how a system is evolving, it is necessary to analyze and understand each subsystem within and across software releases, irrespective of FOSS or CSS systems. This observation has been reported by Gall et al. based on the study of a CSS: a large telecommunication switching system [63].

2.4 MAINTENANCE OF COTS-BASED SYSTEMS

Component-based development has an intuitive underlying idea. Instead of developing a system by programming it entirely from scratch, develop it by using preexisting building blocks, components, and plug them together as required to build the target system. Components are nearly independent and replaceable parts of a system. Special components called commercial off-the-shelf (COTS) components can be purchased on a component market. Often, these types of components are delivered without their source code. The use of COTS components is increasing in modern software development because of the following reasons: (i) there is significant gain in productivity due to reusing commercial components; (ii) the time to market the product decreases; (iii) the product quality is expected to be high, assuming that the COTS components have been well tested; and (iv) there is efficient use of human resources due to the fact that development personnel will be freed up for other tasks. However, many difficulties are to be overcome while using COTS compared to using in-house components.
The black-box nature of COTS components prevents system integrators from modifying the components to better meet users’ requirements. Moreover, the integrators have no visibility of the details of the components, their evolutions, or their maintenance; rather, they are solely dependent on the developers and suppliers of the components. The only source code being written and modified by the integrators is what is needed for integrating the COTS-based systems. This includes code for tailoring and wrapping the individual components, as well as the “glue” code required to put the components together [64]. Wrapper code X combined with another piece of code Y determines how code Y is executed. The wrapper acts as an interface between its caller and the wrapped code Y. Wrapping may be done for compatibility. A glue component is basically designed to combine the services from many components to provide a higher level of service. Component tailoring means enhancing the functionality of a component, and it is done by adding new elements to a component. Note that the source code is not changed by this activity. “Scripting” is an example of tailoring, because a program can be enhanced by having some event trigger a script.

Irrespective of a software system being in-house developed or COTS based, maintenance is the most expensive phase of the system’s life cycle. There are key differences in the activities executed to maintain component-based software (CBS), even though the motivations behind system maintenance remain the same. The differences are due to the following major sources:

- Maintainers perceive a CBS system as an interacting group of large-scale black-box components, instead of a compiled set of source modules.
  The two views require different maintenance skill sets.
• Most of the source code implementing the wrapper, glue, and tailoring modules is used to integrate the system, instead of delivering services and functions.
• The maintenance organization largely loses control over the precise evolution of the system, because COTS developers focus on their own business interests.

2.4.1 Why Maintenance of CBS Is Difficult?

The cost of maintaining COTS-based software systems represents a significant fraction of the total cost of developing software products. Studies show that CBSs incur more maintenance cost than in-house built software [65]. As a first step, reducing the cost of COTS-based software requires an understanding of what makes CBS maintenance difficult. In the following, we provide a list of those difficulties [66]. Next, we explain those difficulties one by one.
• Frozen functionality
• Incompatibility of upgrades
• Trojan horses
• Unreliable COTS components
• Defective middleware.

Frozen functionality. The functionalities of a COTS component become frozen when the vendor or the supplier stops enhancing the product or discontinues support for it. The host system becomes unmaintainable when its components become frozen. The host organization will have a serious problem if periodic updates are required to be performed on those components. The term host organization refers to an individual, group, or organization that applies components as a part of some software system, called the host system. To find a solution, an integrator is faced with the following options: (i) attempt to implement the frozen functionalities; (ii) acquire a new but similar component from a different vendor; and (iii) acquire the source code from the present vendor to maintain it.
The first option is the most difficult choice, unless the host organization has the necessary domain knowledge. The second one is likely to be chosen if there are competing alternative vendors. Otherwise, the third one is the only option available. In order to exercise the third option, the integrator should develop a good understanding of the domain to maintain the component source code.

Incompatibility of upgrades. The host organization integrates the components and upgrades the software product to meet the needs of its customers. If a modified component becomes inconsistent with the remaining components of the host system, then integration of the modified component with the host system may not be possible. For example, the new version of a component may require new data formats, which, in turn, requires modifications to the contents and formats of the current files that were generated by earlier versions of the COTS software. The problem then becomes similar to the frozen functionality problem. If the solutions to the frozen functionality problem are not available, then, as a fourth solution, one may build wrappers around a component to prevent it from exhibiting the incompatibility-creating behavior. If wrappers alone are insufficient to eliminate incompatibilities, one can rewrite the “glue” connecting the newly acquired components with the existing ones. Finally, if wrappers and the modified “glue” do not solve the upgrade problem, then the integrator may consider downgrading the functionalities of the system. Not upgrading to the next version can produce the following consequences:
• There may not be continued vendor support for prior versions.
• The host organization may be unable to purchase more copies of the version in operation. Additional copies of a product may be needed when the system is being incrementally deployed.

Trojan horses.
A software Trojan horse is a piece of code that has been programmed into a component to make it behave in a malicious way. For instance, deleting all files after switching to a privileged directory can be considered an example of Trojan horse functionality. Determining a functionality to be a Trojan horse is a difficult task. Making the Trojan horse dynamically context sensitive is one way to hide it. For instance, “delete all files” can be a valid command if it refers to entities in a temporary directory. On the other hand, it can have devastating consequences if it is executed in a system context. Therefore, deleting system files can be classified as Trojan horse behavior, whereas deleting temporary files is a normal function. Detection of malicious behavior is difficult enough even with full access to program code. Detection of suspicious actions in a running component will require capturing the requests emanating from the component and verifying their contexts. For COTS components, this can be done at the wrapper level. Note that this approach to detecting malicious behavior is too expensive, because too many calls and context checks are involved. In a running component, Trojan horses go largely undetected. It is an issue that programmers must be aware of while substituting an old component with a new one. Component substitution requires specialized procedures for testing and certifying COTS components. Each time some COTS products are changed, the components and the host system may have to be recertified.

Unreliable COTS components. The scenario of incompatible upgrades has been discussed before in this section. Now we consider unreliable COTS components. Though incompatibility and unreliability are related, there is a distinction between the two. Today, no uniform standard exists to test software components to certify their reliability [67].
By paying software certification laboratories (SCL) to grant software certificates, independent vendors partially shift their responsibility to the SCL. However, there are several ramifications of using services from SCLs: cost, liability issues, developer resources needed to access SCLs, and applicability to safety-critical systems [68]. It may be argued that products with better reliability can be produced with good processes, which can be graded with the Capability Maturity Model (CMM), the Test Process Improvement (TPI) model, and the Test Maturity Model (TMM) [69]. However, process quality does not guarantee product quality, and a vendor may not reveal the maturity level of their process. Though software reliability models have existed for decades [70], generic assumptions are made in those models about execution environments, defect rates, severity of defects, and sizes of faults. Those assumptions may not reflect the individual peculiarities of different environments; consequently, it is difficult to apply the models. Therefore, the dependability of a component is not known to its customers. Even if a score for dependability is provided by the vendor, it is likely that the score was computed based on intricacies that do not broadly reflect the customer’s execution environment.

Defective middleware. COTS components are primarily integrated by analyzing the syntax and semantics of their interfaces. However, the integrators have several means for integrating COTS components into a host system, and designing middleware is a straightforward approach. Whenever concerns exist regarding the behavior of a COTS component in the context of a whole system, it is prudent to write middleware to ensure that certain constraints are satisfied. For example, wrappers are a kind of middleware which can be used to constrain the functionalities of components.
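As a rough illustration of a wrapper that constrains a component, the Python sketch below validates and normalizes the input before calling a black-box component, and then filters and rounds the output. The component `VendorGeocoder`, the wrapper `GeocoderWrapper`, and all field names and limits are hypothetical and chosen only for this example; a real wrapper would be shaped by what testing reveals about the actual component.

```python
class VendorGeocoder:
    """Hypothetical stand-in for a black-box COTS component."""

    def lookup(self, address: str) -> dict:
        # Pretend vendor behavior: returns more fields than the host needs.
        return {"address": address, "lat": 45.42, "lon": -75.69, "debug": "internal"}


class GeocoderWrapper:
    """Constrains inputs and outputs of the component without changing its code."""

    MAX_LEN = 200                      # assumed limit, for illustration only
    ALLOWED_OUTPUT = ("lat", "lon")    # fields the host system is allowed to see

    def __init__(self, component):
        self._component = component

    def lookup(self, address: str) -> dict:
        # Restrict the inputs: reject values the host does not want to forward.
        if not isinstance(address, str) or not address.strip():
            raise ValueError("address must be a non-empty string")
        if len(address) > self.MAX_LEN:
            raise ValueError("address is too long")

        # Preprocess the inputs: collapse runs of whitespace.
        cleaned = " ".join(address.split())

        raw = self._component.lookup(cleaned)

        # Restrict the outputs: fail loudly if an expected field is missing,
        # and drop every field that is not explicitly allowed.
        missing = [key for key in self.ALLOWED_OUTPUT if key not in raw]
        if missing:
            raise RuntimeError(f"component omitted fields: {missing}")

        # Post-process the outputs: normalize types and precision.
        return {key: round(float(raw[key]), 4) for key in self.ALLOWED_OUTPUT}


if __name__ == "__main__":
    wrapper = GeocoderWrapper(VendorGeocoder())
    print(wrapper.lookup("  10   Main  St "))
```

Because callers only ever see `GeocoderWrapper`, the checks can be tightened or relaxed as in-house testing uncovers more about the component, without touching the callers.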
The main ideas in wrapper design are to: (i) restrict the inputs; (ii) perform preprocessing on the inputs; (iii) restrict the outputs of a component; or (iv) perform post-processing on the outputs. All those kinds of processing have the potential of modifying the semantics of a component. The key problem in designing wrappers is that it is not completely known what behavior to protect against. Querying the vendor could elicit some bits and pieces of information, but it is prudent to thoroughly test a component in its real environment. One can combine vendor-supplied information with the results of in-house testing to design better wrappers. However, wrappers can be complex, incomplete, and unreliable. Wrappers are discussed in great detail in Section 5.2.

2.4.2 Maintenance Activities for CBSs

It is necessary to identify the activities of the maintenance and management personnel to effectively manage COTS-based systems. Strategies can then be formulated to facilitate those activities. Vigder and Kark [71] have surveyed several organizations maintaining systems with a significant portion of COTS elements. In their study, they identified the following cost-drivers:
• Component reconfiguration
• Testing and debugging
• Monitoring of systems
• Enhancing functionality for users
• Configuration management.

Component reconfiguration. Component reconfiguration means adding, removing, and replacing components of a system. The following actions lead to a system being reconfigured: (i) add new components to increase the capability of the system; (ii) delete components as requirements change; (iii) replace a component with a newer version; (iv) replace an old component with a better one; and (v) replace an in-house built component with COTS components. Component vendors often release software updates many times a year. Therefore, integration of the enhanced components into the host system becomes an expensive task.
The host organization continuously evaluates new component releases from the vendors. It establishes criteria by considering capabilities, risk, and cost to make a decision on a system upgrade. If a decision is made to upgrade the system, then: (i) the components are analyzed within the context of the current host system and (ii) a new cycle for system integration and testing is planned. To determine the differences between the old version and the new version, system interfaces and behavior are tested. It is likely that the assumptions of the enhanced COTS system are not consistent with those of the other COTS elements. Hence, rigorous regression testing must be performed. In summary, performing component reconfiguration is a time-consuming process that requires an organization to move through a full release cycle: evaluate the product, obtain a design, perform integration, and execute system regression tests [72].

Testing and debugging. Every organization follows its own methodologies, strategies, processes, and techniques for performing testing and debugging on in-house developed code. However, testing and debugging of CBSs pose new challenges due to the absence of visibility into the internal details of third-party developed components. For example, fixing defects in an in-house developed system typically involves: (i) executing the system with a debugger to locate the problem and (ii) modifying the source code to fix the defects. On the other hand, maintenance personnel cannot modify the source code of COTS components. Rather, they become dependent upon the component vendors to understand the internal details of the product. Consequently, maintenance personnel and COTS vendors frequently exchange detailed messages.

System monitoring. While a system is in operation, maintainers need to closely monitor the system to be able to better understand the system performance.
Continuous monitoring enables the maintenance personnel to enhance the performance of the system, measure usage of resources, keep track of the anomalies in the system behavior, and perform root cause analysis of system failures. Monitoring for the purpose of maintenance is a difficult task because of the low visibility of the internal operations of COTS software.

Enhancing the functionalities for users. COTS products are designed and implemented in a broad sense so that they can be adapted to various applications. Integration personnel need to customize and tailor COTS functionality to satisfy their user community. Therefore, successful host systems exhibit the properties of efficient modifiability and tailorability to incorporate evolving and new user requirements. Tailoring involves a continual process of configuring and customizing products, combining services of multiple products, and adding new components to products. In the absence of access to source code, tailoring is done by means of two techniques:
• write additional glue code to hold the system together and provide enhanced functionality; and
• use vendor-supplied tailoring techniques to customize the products.

Configuration management. Configuration management of CBS systems is done at two levels: (i) source-code level to manage the in-house software, namely, wrapper, glue, and tailoring code developed by the personnel performing system integration and (ii) component level to manage COTS components procured from third-party vendors [73]. The following five activities are specifically done for CBS products:
• Track the versions of the COTS products, and retain the following details for each component in the version archive:
  – Save the name of the developer of the component if available.
  – Save the contact information of the person or organization supplying the component.
  – Archive the source code of the component if available.
  – Archive the working versions of all tools, namely, compilers and linkers, necessary to rebuild a component.
  – Record a detailed rationale for including the component in the system, including any previous use of the component and known facts about its quality attributes. For instance, BSD Unix was used by Sun Microsystems to build their proprietary OS. Therefore, the information “BSD Unix” is an instance of “previous use.”
  – Obtain the contact information of some of those using the component.
• Perform configuration management on the individually tailored COTS elements.
• Track the configuration history of a product at all its deployment sites.
• Find the compatible versions of the various COTS elements.
• Manage support and licenses for each COTS element.

Configuration management is a key activity over the life cycle of a large system. For COTS components, configuration management needs to be performed for each COTS software product and each platform on which the product is installed. The lists of those COTS software products and platforms are needed while (i) distributing software upgrades or fixes to multiple sites or (ii) restoring configurations that have been broken.

2.4.3 Design Properties of Component-Based Systems

The architecture of a CBS has significant impact on its maintainability. Component maintainability properties, such as minimal component coupling and visibility, cannot be enhanced after building a CBS. Rather, one must consider these properties at the time of the initial development of the CBS. The main areas influencing CBS maintainability are the choice of the components and the architecture and design used to perform system integration on the components.

Component Selection

Though system integrators do not design and implement individual components, they do have a say while components are selected to be integrated into the host system.
The CBS integrator must consider the CBS evolution factor when designing criteria for component selection. A number of attributes of components affect the evolution and maintenance of CBSs. These attributes are discussed in what follows.

Openness of components. A component is considered to be open if it is designed to be visible, extensible, adaptable, and easily integrated into a variety of different host systems. In general, the more open a component is, the easier it is for integrators and maintainers to monitor, manage, extend, replace, test, and integrate it. The factors that make a component open are adherence to standards, availability of source code, and ability to inter-operate with products from other vendors. Source code can be made available through an original equipment manufacturer (OEM) partnership.

Tailorability of components. Tailoring the functionality of the components to meet the evolving user requirements is a kind of maintenance effort for CBS systems. One criterion for component selection can be whether or not the component can be tailored to satisfy the end users’ requirements. Though components are seen as black-boxes, vendors can apply many techniques to make components tailorable. Examples of tailoring techniques are scripting interfaces and extensible frameworks through the use of plug-ins and inheritance.

Available support community. Host system builders need much support from external organizations to build and maintain commercial products. The external support comes from the user community and the vendors. Noting that external support is key to system maintenance, the host system builders need to evaluate the support available during the component evaluation process.

Design Properties of Maintainable CBS

By analyzing and partly resolving the issues of maintainability in the design phase, one can build a system that facilitates the maintenance of CBSs.
This requires the development of a set of criteria to evaluate maintainability. The following design attributes of a maintainable CBS have been identified by Vigder and Kark [71]:
• Encapsulated component collaborations
• Controlled component interfaces
• Controlled component dependencies
• Minimal component coupling
• Consistent failure handling
• High level of visibility
• Minimal build and deployment effort.

Encapsulated component collaborations. Collaborations are time-sequenced coordinated actions. Collaborations among components can involve many data and behavioral dependencies. A key design objective is to make collaborations explicit and encapsulate each collaboration within a single object, often called a mediator. A mediator can be implemented as part of the glue code, and it should be designed to (i) translate and transform data formats to enable data transfer and (ii) manage event sequencing for the components. By encapsulating collaborations, the CBS becomes more maintainable, because the following activities are supported [74]:
• Product reconfiguration. It is much easier to understand and manage component dependencies by encapsulating component interactions within a separate object.
• Troubleshooting. Numerous problems in CBS are related to sequences of interactions among components. Problem isolation becomes easier if most of the interaction sequences happen within a single mediator.
• Modifying and adding services. By combining services from different components, mediators implement many business processes. Updating business processes becomes easier by confining services to mediators.
• System monitoring. Mediators can include instrumentation code for monitoring system behavior.

Controlled component interfaces. There are two main reasons to use integrator-controlled interfaces on COTS components: (i) facilitate component reconfiguration; and (ii) add visibility.
As new components and component versions are combined with a system, an integrator-controlled interface can reduce the impact of frequent reconfigurations by means of isolation. In addition, management, instrumentation, and monitoring functionality can be included in an integrator-controlled interface. If integration personnel directly use interfaces supplied by vendors, they might face difficulties in: (i) reducing the consequences of reconfigurations and (ii) determining the dependencies between components. Integrators can use a number of approaches to turn interfaces into first-class objects and manipulate them. A first-class object is one that can be dynamically created, destroyed, or passed as an argument. One approach to creating a first-class object is designing a wrapper around all the components; a wrapper can be designed by using an adapter design pattern. The adapter design pattern translates one interface for a class into a compatible interface. An adapter allows classes that normally could not work together because of incompatible interfaces to do so, by providing its own interface to clients while using the original interface. As the underlying component is modified, the ripple effect on the other components can be minimized. A second approach to creating a first-class object is to use standardized interfaces for COTS products.

Controlled component dependencies. The integrator identifies mutually dependent components and realizes strategies for controlling and managing those dependencies. Complex dependencies among components produce a fragile system. It becomes difficult to upgrade, add, and delete elements in a fragile system. To make maintenance easier for CBS systems, designers must reduce dependencies between components. Some dependencies are explicit and some are implicit. Flow of data through a visible interface is an example of an explicit dependency. Conflicting assumptions made by different software components are examples of implicit dependencies.
Dependencies can also result from resource contentions; an example of resource contention is when many components try to use the same TCP port number at the same time. Dependencies between COTS products include the following three broad kinds of dependencies:
• Syntactic dependencies. These occur when components make assumptions about the interface signature of the component.
• Behavioral dependencies. These occur when components are involved in two-way interactions or multi-way collaborations.
• Resource dependencies. These occur when multiple components compete for the same resources.
The designer must record, update, and verify all cross-component dependencies to perform risk analyses before a system upgrade involving those components. In addition, integrators can provide mechanisms for managing those dependencies: identification of deployment-time versions and verification of satisfaction of dependencies.

Minimal component coupling. It becomes more difficult to substitute a component if coupling of the component with other components is very high. Minimal component coupling is realized by: (i) constraining and controlling component access and (ii) isolating the resources used by separate components. A wrapper around a component can suppress undesired functionalities that introduce additional dependencies.

Consistent failure handling. In their tasks of testing, debugging, and monitoring system behavior, maintainers are assisted by failure handling. Therefore, integrators need consistent, complete, and effective means to detect and handle failures. A component can behave in an unpredictable manner if it is provided with faulty inputs. Therefore, it is important to identify and isolate faults early, before they propagate as errors through other components.
A consistent failure handling mechanism enables maintainers to detect errors when they occur, identify the root cause of the failure, and minimize the impact of the failure. For example, a consistent failure handling mechanism for routines includes a status output argument, which is used to return error status codes. Status codes may be passed to another routine (viz. errormsgtext()) to extract an error message text from a message catalogue. Errors can be detected by the wrappers and handled in the glue code holding the system together.

High level of visibility. Visibility means that maintenance engineers are able to monitor the system, including the behavior of wrappers and glue components. Visibility can be added to the system by the integrators in several ways. Instrumentation and monitoring capabilities should be integrated into wrappers and glue code to support monitoring of interactions as part of the overall system design. Monitoring tools can support additional capabilities to gain visibility into a running system, namely, monitoring the input and output behavior of communication protocols with sniffer code.

Minimal build and deployment effort. CBSs are often built with many components that require frequent tailoring and reconfigurations. Therefore, the build process to install the software at all deployment sites is complex. On the other hand, it should be easily achievable to replace products or add new functionalities without going through an expensive build process. COTS elements may involve intricacies in their deployment processes, thereby contradicting the assumptions made by other products. As a result, the build itself can become an expensive and complex part of system maintenance [75, 76]. A build process becomes complicated if too many modules are new and have complex interfaces. A tool for version control is highly recommended for automating the build process.
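The status-code convention and the instrumentation of wrappers described in this section can be sketched as follows. This is only a minimal Python illustration under stated assumptions: the routine name `errormsgtext()` comes from the text, whereas the `Status` codes, the message catalogue contents, the wrapper `call_component()`, the glue routine `describe_result()`, and the logger name are hypothetical.

```python
import logging
from enum import IntEnum

# Instrumentation hook: wrappers report every interaction to this logger.
log = logging.getLogger("cbs.monitor")  # hypothetical logger name


class Status(IntEnum):
    """Hypothetical status codes shared by all wrappers in the host system."""
    OK = 0
    BAD_INPUT = 1
    COMPONENT_FAILURE = 2


# Hypothetical message catalogue, keyed by status code.
MESSAGE_CATALOGUE = {
    Status.OK: "operation completed",
    Status.BAD_INPUT: "input rejected by wrapper",
    Status.COMPONENT_FAILURE: "component raised an unexpected error",
}


def errormsgtext(status):
    """Look up the message text for a status code in the catalogue."""
    return MESSAGE_CATALOGUE.get(status, "unknown status")


def call_component(component_fn, arg):
    """Wrapper: detect failures and report them as (status, result)."""
    if arg is None:
        # Isolate the faulty input before it reaches the component.
        log.warning("rejected input for %s", getattr(component_fn, "__name__", "component"))
        return Status.BAD_INPUT, None
    try:
        result = component_fn(arg)
    except Exception:
        # Any component exception is mapped to one consistent status code.
        log.exception("component call failed")
        return Status.COMPONENT_FAILURE, None
    log.info("component call succeeded")
    return Status.OK, result


def describe_result(component_fn, arg):
    """Glue code: handle the status returned by the wrapper in one place."""
    status, result = call_component(component_fn, arg)
    if status is not Status.OK:
        return f"error {int(status)}: {errormsgtext(status)}"
    return f"result: {result}"


if __name__ == "__main__":
    print(describe_result(lambda x: x * 2, 21))
    print(describe_result(lambda x: x * 2, None))
    print(describe_result(lambda x: 1 / 0, 3))
```

Because every wrapper returns the same kind of status, the glue code needs only one error path, and the log records give maintainers a view of each interaction without access to the component's source.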
2.5 SUMMARY

This chapter began with definitions of software maintenance from the perspectives of researchers, practitioners, and standardization groups. We differentiated development from maintenance: developing new software is a requirement-driven activity, whereas maintaining an existing system is event driven. For example, when a request for change is received, the maintenance organization may modify the system. Therefore, the inputs that drive maintenance changes are random events, originating, for example, from a user in the form of a CR. Software maintenance consists of two primary activities: correcting errors and enhancing functionality of the software. Hence, it can be seen as continued development. We identified and explained the following classifications of maintenance activities:
• intention-based classification of software maintenance activities;
• activity-based classification of software maintenance activities; and
• evidence-based classification of software maintenance activities.
To explain intention-based classification of maintenance tasks, we introduced Swanson’s approach [5], which defined three kinds of maintenance activities: corrective, adaptive, and perfective. On the other hand, the maintenance classification of Kitchenham et al. [4] consists of two broad kinds of activities: corrections and enhancements. The category for enhancement is subdivided into three types as follows: (i) modifications that change some of the current requirements; (ii) modifications that add new requirements to the system; and (iii) modifications that alter the implementation but not the requirements. The evidence-based classification of Chapin [16] consists of 12 types of software maintenance tasks: training, consultive, evaluative, reformative, updative, groomative, preventive, performance, adaptive, reductive, corrective, and enhancive.

Next we explained various concepts that influence software maintenance processes.
Those concepts were studied under four categories: (i) maintained product; (ii) maintenance types; (iii) organization process; and (iv) peopleware. Then, we described the characteristics of those concepts and their impact on maintenance activities. This discussion clarifies the differences between the methods, tools, and skills used for maintenance and those used for software development.

Next, we studied various ways in which researchers define software evolution, and differentiated it from software maintenance. Keith H. Bennett and Jie Xu [27] use the term maintenance to refer to all post-delivery support activities, and evolution to refer to perfective changes, that is, those driven by changes in requirements. In addition, the authors further state that evolution addresses both functional and nonfunctional requirements. On the other hand, Ned Chapin defines software evolution as the application of software maintenance activities and processes that generate a new operational software version with a changed customer-experienced functionality or properties from a prior operational version [15].

We described Lehman’s classification of properties of CSS of S-type (Specified), P-type (Problem), and E-type (Evolving). The S-type programs implement solutions to the problems that can be completely and unambiguously specified, for which, in theory at least, a program implementation can be proven correct with respect to the specification. The definition of S-type requires that the program be correct in the full mathematical sense related to the specification. A P-type program is based on a practical abstraction of the problem, rather than a completely defined specification. Even though the exact solution may exist, the solution produced by a P-type program is tempered by the environment in which it must be produced.
The solution produced by a P-type program is acceptable if the results make sense to the stakeholder(s) in the world in which the problem is embedded. Finally, the distinctive attributes of E-type systems are as follows:
• The complex and large problems addressed by E-type systems cannot be completely and formally specified.
• The system has an incomplete model of the execution environment that embeds the program.
• The system makes a large number of simplifications and assumptions about the real world.
• Program execution modifies the operation domain.
• The development and evolution processes for E-type software are feedback driven.

Next, we discussed the following eight laws of software evolution for E-type CSS systems, including empirical studies and their practical implications.
I. Continuing change. E-type programs must be continually adapted, else they become progressively less satisfactory.
II. Increasing complexity. As an E-type program evolves, its complexity increases unless work is done to maintain or reduce it.
III. Self-regulation. The evolution process of E-type programs is self-regulating, with the time distribution of measures of processes and products being close to normal.
IV. Conservation of organizational stability. The average effective global activity rate in an evolving E-type program is invariant over the product’s lifetime.
V. Conservation of familiarity. The average content of successive releases is constant during the life cycle of an evolving E-type program.
VI. Continuing growth. To maintain user satisfaction with the program over its lifetime, the functional content of an E-type program must be continually increased.
VII. Declining quality. An E-type program is perceived by its stakeholders to have declining quality if it is not maintained and adapted to its environment.
VIII. Feedback system. The evolution processes in E-type programs constitute multi-agent, multi-level, multi-loop feedback systems.
Next, we described the origin of the FOSS movement and the differences between CSS and FOSS systems with respect to team structure, process, releases, and global factors. In addition, we discussed the empirical research results about the Linux FOSS system to study the laws of evolution, originally proposed for CSS systems. We concluded this chapter with a discussion on maintenance of COTS-based systems.

LITERATURE REVIEW

The book by Dennis D. Smith (“Designing Maintainable Software,” Springer, New York, 1999) is an excellent starting point for understanding the many issues related to maintenance. With theoretical reasoning and observation, the book explains how maintainers go about problem solving. The author provides helpful tips for maintainers regarding cognitive structures, naming conventions, and the use of truncation. Though the book does not cover evolution, it clearly explains software evolvability. Software maintenance is usually considered in terms of corrections, improvements, and enhancements. The article by Dewayne E. Perry (“Dimensions of Software Evolution” by D. E. Perry, International Conference on Software Maintenance, Victoria, BC, IEEE Computer Society Press, Los Alamitos, CA, September, 1994, pp. 296–303) looks at three other dimensions, namely, domain, experience, and process, to gain insights into the sources of software evolution. These three dimensions are interrelated and interact with each other in a number of ways. One will be able to understand and manage effectively the evolution of software systems only when there is a deep understanding of these dimensions. This idea was further extended by Ciraci et al. (“A Taxonomy for a Constructive Approach to Software Evolution,” by S. Ciraci, P. Broek, and M. Aksit, Journal of Software, Vol. 2, No. 2, August, 2007, pp. 84–96) to 24 feasible contexts for software evolution.
The taxonomy is based on the fact that a change in one of these sources (e.g., domain, process, or experience) has occurred or is expected to occur.

The typology of Swanson has been influential among researchers and practitioners [7]. However, very few researchers and practitioners have followed Swanson’s typology strictly; rather, many have given new meanings to its terms. For example, the standard proposed by IEEE [77] defines the three terms “corrective,” “adaptive,” and “perfective” in ways that are not completely compatible with Swanson’s. The article by Chapin et al. [15] identifies and compares the differences in tabular form, along with the definitions of 12 maintenance types: training, consultive, evaluative, reformative, updative, groomative, preventive, performance, adaptive, reductive, corrective, and enhancive. The authors excluded the “perfective” type from their classification and coined a new term, “groomative.”

The reader is urged to study the article on preventive maintenance by Kajko-Mattsson [13]. The author carried out a comprehensive literature study on preventive maintenance in both software and hardware engineering. In addition, Kajko-Mattsson et al. [78] have discussed a comprehensive taxonomy of activities performed for the corrective type.

In the article by Jim Buckley et al. [79], the authors took a complementary view toward a taxonomy of maintenance by focusing more on the how, when, what, and where aspects of software changes. The article proposed four logical themes and 15 dimensions. The four logical themes are: (i) temporal properties (When is the change made?); (ii) object of change (Where is a change made?); (iii) system properties (What is being changed?); and (iv) change support (How is the change accomplished?).
The 15 dimensions are: time to change, change history, change frequency, anticipation, artifacts, granularity, impact, change propagation, availability, activeness, openness, safety, degree of freedom, degree of formality, and change type.

The revised form of the SPE taxonomy, called SPE+, was proposed by Stephen Cook et al. [30] to address some ambiguities and weaknesses in the original taxonomy. For example, Lehman did not choose to characterize the P-type software. Kemerer and Slaughter [80] showed that problems in software maintenance can be traced to an absence of understanding of: (i) the maintenance process and (ii) the cause–effect relationships between software maintenance practices and outcomes. Their study focused on the effort expended, modification performed, and cost incurred to evolve the software.

Many researchers are further studying new topics related to laws of software evolution. Readers interested in a more detailed discussion of the topic may read the following books:
N. H. Madhavji, J. F. Ramil, and D. E. Perry, Ed. Software Evolution and Feedback – Theory and Practice, John Wiley, West Sussex, England, 2006.
T. Mens and S. Demeyer, Ed. Software Evolution, Springer-Verlag, Berlin Heidelberg, 2008.

The first book focuses on the what and why aspects of software evolution, that is, on the nature of the software evolution phenomenon with emphasis on nontechnical aspects, such as complexity theory, social interactions, and human psychology. This book provides a depth of material in the field of software evolution and feedback. Specifically, it describes the phenomenological and technological underpinnings of software evolution, and it explains the impact of feedback on development and maintenance of software. Part I (Chapters 1–16) is “evolution” centered, whereas Part II (Chapters 17–27) is “feedback” centered, though both topics are often discussed in the same chapter.
Within these partitions, the chapters are organized from more conceptual to more concrete content.

The second book, entitled “Software Evolution,” focuses on the how aspects of software evolution: methods, activities, tools, and technology that give the means to manage software evolution. The book is structured into three parts. The first part focuses on: (i) analysis of release histories and version repositories and (ii) improvement of evolution by fixing defects and eliminating redundancies in software. The second part explains how one can reengineer a legacy system into a modern system that is easier to maintain. The third part discusses the relation between evolution and other main subjects in software development.

Those interested in knowing more details about the comparative empirical study of FOSS and CSS may refer to the article by Paulson et al. [61]. The authors studied and compared the evolution of three CSS and three FOSS projects. The three well-known FOSS projects are the Linux kernel, the GCC compiler, and the Apache HTTP web server. The authors chose to consider only the kernel part of Linux because, apparently, the three CSS systems were more comparable with the kernel than with the whole Linux system. The three CSS projects are from the embedded real-time system domain, described as “software protocol stacks in wireless telecommunication device.” The five hypotheses studied in this article were: (i) FOSS grows more quickly than proprietary, that is CSS, systems; (ii) FOSS systems foster more creativity; (iii) FOSS systems are less complex than CSS systems; (iv) FOSS systems have fewer bugs, and those bugs can be located and fixed more rapidly; and (v) FOSS systems are better modularized. Out of those five hypotheses, only (ii) and (iv) were supported with measurements. The following measurements were used to test the hypotheses:
1. For hypothesis (i), count the number of lines of code added over time.
This measurement captures the size (or growth) metric.
2. For hypothesis (ii), count the number of functions added over time. This measurement reflects the creativity shown over time.
3. For hypothesis (iii), measure the average complexity of all the functions and the average complexity of all the newly added functions.
4. For hypothesis (iv), count the number of functions changed to fix bugs and represent the number of functions changed to fix bugs as a percentage of the total number of functions.
5. For hypothesis (v), compute the correlation between the number of functions added and the number of functions changed.

REFERENCES

[1] R. G. Canning. 1972. The maintenance ‘iceberg’. EDP Analyzer, 10(10), 1–14.
[2] T. M. Pigoski. 2001. Chapter 6, Software Maintenance, SWEBOK: A Project of the Software Engineering Coordinating Committee (Trial Version 1.00). IEEE Computer Society Press, Los Alamitos, CA.
[3] ISO/IEC 14764:2006 and IEEE Std 14764-2006. 2006. Software Engineering – Software Life Cycle Processes – Maintenance. Geneva, Switzerland.
[4] B. A. Kitchenham, G. H. Travassos, A. N. Mayrhauser, F. Niessink, N. F. Schneidewind, J. Singer, S. Takada, R. Vehvilainen, and H. Yang. 1999. Towards an ontology of software maintenance. Journal of Software Maintenance and Evolution: Research and Practice, 11, 365–389.
[5] E. B. Swanson. 1976. The Dimensions of Maintenance. Proceedings of the 2nd International Conference on Software Engineering (ICSE), October 1976, San Francisco, CA. IEEE Computer Society Press, Los Alamitos, CA. pp. 492–497.
[6] B. P. Lientz and E. B. Swanson. 1980. Software Maintenance Management. Addison-Wesley, Reading, MA.
[7] E. B. Swanson and N. Chapin. 1995. Interview with E. Burton Swanson. Journal of Software Maintenance and Evolution: Research and Practice, 7(5), 303–315.
[8] S. G. Eick, T. L. Graves, A. F. Karr, J. S. Marron, and A. Mockus. 2001. Does code decay? Assessing evidence from change management data.
IEEE Transactions on Software Engineering, January, 1–12.
[9] K. J. Lieberherr and I. M. Holland. 1989. Tools for Preventive Software Maintenance. Proceedings of International Conference on Software Maintenance (ICSM), October 1989, Miami, FL. IEEE Computer Society Press, Los Alamitos, CA. pp. 1–12.
[10] Y. Huang, C. Kintala, N. Kolettis, and N. D. Fulton. 1995. Software Rejuvenation: Analysis, Module and Applications. Proceedings of the 25th Symposium on Fault Tolerant Computing, June 1995, Pasadena, CA. IEEE Computer Society Press, Los Alamitos, CA. pp. 381–390.
[11] S. Garg, A. Puliafito, M. Telek, and K. Trivedi. 1998. Analysis of preventive maintenance in transactions based software systems. IEEE Transactions on Computers, January, 96–107.
[12] M. Grottke and K. S. Trivedi. 2007. Fighting bugs: remove, retry, replicate, and rejuvenate. IEEE Computer, February, 107–109.
[13] M. K. Mattsson. 2001. Can We Learn Anything from Hardware Preventive Maintenance? Seventh IEEE International Conference on Engineering of Complex Computer Systems, June 2001, Skovde, Sweden. IEEE Computer Society Press, Los Alamitos, CA. pp. 106–111.
[14] E. Marshall. 1992. Fatal error: how Patriot overlooked a Scud. Science, March 13, p. 1347.
[15] N. Chapin, J. F. Hale, K. M. Khan, J. F. Ramil, and W. G. Tan. 2001. Types of software evolution and software maintenance. Journal of Software Maintenance and Evolution: Research and Practice, 13, 3–30.
[16] N. Chapin. 2000. Software Maintenance Types – A Fresh View. Proceedings of the International Conference on Software Maintenance (ICSM), October 2000, San Jose, CA. IEEE Computer Society Press, Los Alamitos, CA. pp. 247–252.
[17] C. Jones. 2007. Geriatric issues of aging software. CrossTalk: The Journal of Defense Software Engineering, December, 4–8.
[18] D. L. Parnas. 1994. Software Aging.
Proceedings of 16th International Conference on Software Engineering, May 1994, Sorrento, Italy. IEEE Computer Society Press, Los Alamitos, CA. pp. 279–287.
[19] K. Maxwell, L. V. Wassenhove, and S. Dutta. 1996. Software development productivity of European space, military and industrial applications. IEEE Transactions on Software Engineering, October, 706–718.
[20] M. I. Halpern. 1965. Machine independence: its technology and economics. Communications of the ACM, 8(12), 782–785.
[21] R. F. Couch. 1971. Evolution of a toll MIS – Bell Canada. Management Information Systems: Selected Papers from MIS Copenhagen 70 – An IAG Conference (Eds W. Goldberg, T. H. Nielsen, E. Johnson, and H. Josefsen), pp. 163–188. Auerbach Publisher Inc., Princeton, NJ.
[22] L. A. Belady and M. M. Lehman. 1976. A model of large program development. IBM Systems Journal, 15(1), 225–252.
[23] P. Wegner. 1978. Research Direction in Software Technology. Proceedings of the 3rd International Conference on Software Engineering (ICSE), May 1978, Atlanta, Georgia. IEEE Computer Society Press, Los Alamitos, CA. pp. 243–259.
[24] M. M. Lehman. 1980. Programs, life cycles, and laws of software evolution. Proceedings of the IEEE, September, 1060–1076.
[25] K. H. Bennett and V. T. Rajlich. 2000. Software Maintenance and Evolution: A Roadmap. ICSE, The Future of Software Engineering, June 2000, Limerick, Ireland. ACM, New York. pp. 73–87.
[26] L. J. Arthur. 1988. Software Evolution: The Software Maintenance Challenge. John Wiley & Sons.
[27] K. H. Bennett and J. Xu. 2003. Software Services and Software Maintenance. Proceedings of 7th European Conference on Software Maintenance and Reengineering, March 2003, Benevento, Italy. IEEE Computer Society Press, Los Alamitos, CA. pp. 3–12.
[28] M. M. Lehman and J. Ramil. 2002. Software evolution and software evolution processes. Annals of Software Engineering, 14, 275–309.
[29] S. L. Pfleeger. 1998. The nature of system change.
IEEE Software, May/June, 87–90.
[30] S. Cook, R. Harrison, M. M. Lehman, and P. Wernick. 2006. Evolution in software systems: foundations of SPE classification scheme. Journal of Software Maintenance and Evolution: Research and Practice, 18, 1–35.
[31] M. M. Lehman. 1996. Feedback in the software evolution process. Information and Software Technology, 38, 681–686.
[32] M. M. Lehman and L. A. Belady. 1985. Program Evolution: Processes of Software Change. Academic Press, London.
[33] M. M. Lehman and J. F. Ramil. 2006. Software evolution. In: Software Evolution and Feedback (Eds N. H. Madhavji, J. F. Ramil, and D. Perry). John Wiley & Sons, West Sussex.
[34] M. M. Lehman, J. F. Ramil, P. D. Wernick, D. E. Perry, and W. M. Turski. 1997. Metrics and Laws of Software Evolution – The Nineties View. Proceedings of 4th International Symposium on Software Metrics (Metrics 97), November 1997. IEEE Computer Society Press, Los Alamitos, CA. pp. 20–32.
[35] M. M. Lehman. 2001. Rules and tools for software evolution planning and management. Annals of Software Engineering, 11, 15–44.
[36] C. K. S. Chong Hok Yuen. 1987. A Statistical Rationale for Evolution Dynamics Concepts. Proceedings of the International Conference on Software Maintenance, September 1987, Austin, Texas. IEEE Computer Society Press, Los Alamitos, CA. pp. 156–164.
[37] F. Brooks. 1993. The Mythical Man Month (2nd Ed.). Addison-Wesley, Reading, MA.
[38] M. M. Lehman. 1996. Laws of Software Evolution Revisited. Proceedings of the 5th European Workshop on Software Process Technology, Lecture Notes in Computer Science, Vol. 1149. Springer, London, pp. 108–124.
[39] D. A. Garvin. 1984. What does product quality mean? Sloan Management Review, Fall, 25–45.
[40] N. H. Madhavji, J. F. Ramil, and D. E. Perry (Eds). 2006. Software Evolution and Feedback: Theory and Practice. John Wiley & Sons, West Sussex.
[41] G. Visaggio. 2001. Ageing of a data-intensive legacy system: symptoms and remedies.
Journal of Software Maintenance and Evolution: Research and Practice, 13, 281–308.
[42] C. K. S. Chong Hok Yuen. 1988. On Analyzing Maintenance Process Data at the Global and Detailed Levels: A Case Study. Proceedings of the International Conference on Software Maintenance, October 1988, Phoenix, Arizona. IEEE Computer Society Press, Los Alamitos, CA. pp. 248–255.
[43] M. M. Lehman, D. E. Perry, and J. F. Ramil. 1998. On Evidence Supporting the FEAST Hypothesis and the Laws of Software Evolution. Proceedings of the 5th International Software Metrics Symposium (Metrics), November 1998. IEEE Computer Society Press, Los Alamitos, CA. pp. 84–88.
[44] W. M. Turski. 1996. Reference model for growth of software systems. IEEE Transactions on Software Engineering, August, 599–600.
[45] M. Lawrence. 1982. An Examination of Evolution Dynamics. Proceedings of International Conference on Software Engineering (ICSE), September 1982. IEEE Computer Society Press, Los Alamitos, CA. pp. 188–196.
[46] M. M. Lehman and J. F. Ramil. 2003. Software evolution: background, theory, practice. Information Processing Letters, 88(1–2), 33–44.
[47] S. Williams. 2002. Free as in Freedom: Richard Stallman’s Crusade for Free Software. O’Reilly & Associates, Inc., Sebastopol, CA.
[48] R. M. Stallman, L. Lessig, and G. Gay. 2002. Free Software, Free Society. Free Software Foundation, Cambridge, MA.
[49] E. S. Raymond. 2001. The Cathedral and the Bazaar: Musings on Linux and Open Source by an Accidental Revolutionary. O’Reilly & Associates, Inc., Sebastopol, CA.
[50] Y. Wang, D. Guo, and H. Shi. 2007. Measuring the evolution of open source software systems with their communities. ACM SIGSOFT Software Engineering Notes, 32(6), 1–7.
[51] J. F. Ramil, A. Lozano, and M. Wermelinger. 2008. Empirical studies of open source evolution. In: Software Evolution (Eds T. Mens and S. Demeyer). Springer, Berlin.
[52] W. Scacchi. 2006.
Understanding open source software evolution. In: Software Evolution and Feedback (Eds N. H. Madhavji, J. F. Ramil, and D. Perry), pp. 181–206. John Wiley & Sons, West Sussex.
[53] K. Crowston and J. Howison. 2005. The social structure of free and open source software development. First Monday, 10(2).
[54] M. Aberdour. 2007. Achieving quality in open source software. IEEE Software, 24(1), 58–64.
[55] A. Mockus, R. T. Fielding, and J. D. Herbsleb. 2002. Two case studies of open source software development: Apache and Mozilla. ACM Transactions on Software Engineering and Methodology, July, 309–346.
[56] S. S. Pirzada. 1988. A statistical examination of the evolution of the Unix system. PhD Thesis, Department of Computing, Imperial College, London, England.
[57] M. W. Godfrey and Q. Tu. 2000. Evolution in Open Source Software: A Case Study. Proceedings of the International Conference on Software Maintenance (ICSM), October 2000. IEEE Computer Society Press, Los Alamitos, CA. pp. 131–142.
[58] S. R. Schach, B. Jin, D. R. Wright, G. Z. Heller, and A. J. Offutt. 2002. Maintainability of the Linux kernel. IEE Proceedings – Software, 149(1), 18–22.
[59] G. Robles, J. J. Amor, J. M. Gonzalez-Barahona, and I. Herraiz. 2005. Evolution and Growth in Large Libre Software Projects. Proceedings of Eighth International Workshop on Principles of Software Evolution (IWPSE), September 2005, Lisbon, Portugal. IEEE Computer Society Press, Los Alamitos, CA. pp. 165–174.
[60] M. M. Lehman, J. F. Ramil, and U. Sandler. 2001. An Approach to Modelling Long-term Growth Trends in Software Systems. Proceedings of the International Conference on Software Maintenance (ICSM), November 2001, Florence, Italy. IEEE Computer Society Press, Los Alamitos, CA. pp. 219–228.
[61] J. W. Paulson, G. Succi, and A. Eberlein. 2004. An empirical study of open-source and closed-source software products. IEEE Transactions on Software Engineering, April, 246–256.
[62] C. K.
Roy and J. R. Cordy. 2006. Evaluating the Evolution of Small Scale Open Source Software Systems. 15th International Conference on Computing, Mexico City, November 2006. IEEE Computer Society Press, Los Alamitos, CA, Research in Computing Science, Vol. 23, pp. 123–136.
[63] H. Gall, M. Jazayeri, R. Klösch, and G. Trausmuth. 1997. Software Evolution Observed Based on the Product Release History. Proceedings of the International Conference on Software Maintenance (ICSM), October 1997, Bari, Italy. IEEE Computer Society Press, Los Alamitos, CA. pp. 160–166.
[64] M. R. Vigder and J. Dean. 1997. An Architectural Approach to Building Systems from COTS Software Components. Proceedings of the 22nd Annual Software Engineering Workshop, December 1997, Greenbelt, MD. pp. 99–113.
[65] D. Reifer, V. Basili, B. Boehm, and B. Clark. 2003. Eight lessons learned during COTS-based systems maintenance. IEEE Software, September/October, 94–96.
[66] J. Voas. 1998. Maintaining components-based systems. IEEE Software, July/August, 22–27.
[67] S. Beydeda and V. Gruhn (Eds). 2005. Testing Commercial-off-the-Shelf Components and Systems. Springer, Germany.
[68] J. Morris, G. Lee, K. Parker, G. Bundell, and C. Lam. 2001. Software component certification. IEEE Computer, September, 30–36.
[69] K. Naik and P. Tripathy. 2008. Software Testing and Quality Assurance: Theory and Practice. John Wiley & Sons, Inc., Hoboken.
[70] C. V. Ramamoorthy and F. B. Bastani. 1982. Software reliability: status and perspectives. IEEE Transactions on Software Engineering, July, 354–371.
[71] M. Vigder and A. Kark. 2006. Maintaining COTS-based Systems: Start with Design. Proceedings of the 5th International Conference on Commercial-Off-The-Shelf (COTS)-Based Software Systems, February 2006, Orlando, FL. IEEE Computer Society Press, Los Alamitos, CA. pp. 11–18.
[72] M. Vigder and J. Dean. 2000. Maintenance of COTS-based systems.
National Research Council of Canada, Institute for Information Technology, Ottawa, Ontario, Canada, NRC Report 43626, p. 6.
[73] D. J. Carney, S. A. Hissam, and D. Plakosh. 2000. Complex COTS-based software systems: practical steps for their maintenance. Journal of Software Maintenance and Evolution: Research and Practice, 12, 357–376.
[74] M. Krieger, M. Vigder, J. Dean, and M. Siddiqui. 2003. Coordination in COTS-based Development. Second International Conference on COTS-Based Software Systems (ICCBSS), February 2003, Ottawa, Canada, LNCS-2580. Springer, pp. 123–133.
[75] D. Garlan, A. Robert, and J. Ockerbloom. 1995. Architectural mismatch: why reuse is so hard. IEEE Software, November, 17–26.
[76] D. Garlan, A. Robert, and J. Ockerbloom. 2009. Architectural mismatch: why reuse is still so hard. IEEE Software, July/August, 66–69.
[77] IEEE Standard 1219-1998. 1998. Standard for Software Maintenance. IEEE Computer Society Press, Los Alamitos, CA.
[78] M. K. Mattsson, U. Westblom, S. Forssander, G. Andersson, M. Medin, S. Ebarasi, T. Fahlgren, S. E. Johansson, S. Tornquist, and M. Holmgren. 2001. Taxonomy of Problem Management Activities. Proceedings of the 5th European Conference on Software Maintenance and Reengineering, March 2001, Lisbon, Portugal. IEEE Computer Society Press, Los Alamitos, CA. pp. 1–10.
[79] J. Buckley, T. Mens, M. Zenger, A. Rashid, and G. Kniesel. 2005. Towards a taxonomy of software change. Journal of Software Maintenance and Evolution: Research and Practice, 17(5), 309–332.
[80] C. F. Kemerer and S. Slaughter. 1999. An empirical approach to studying evolution. IEEE Transactions on Software Engineering, July/August, 493–509.

EXERCISES

1. Explain why it is important to make distinctions among the different types of software maintenance.
2. What are the causes of adaptive maintenance problems? What are the consequences of these problems?
3.
Why should you fix something that is not broken? What are some reasons for performing perfective maintenance?
4. What are some common justifications for not doing perfective maintenance?
5. What is software aging? What are the common causes of software aging? How can those causes be eliminated? Discuss the answers in detail. (Hint: rejuvenate, software maintainability)
6. Explain the concept of software rejuvenation. Discuss the pros and cons of software rejuvenation.
7. Discuss the differences between hardware and software maintenance.
8. For each of the following situations, explain whether it is a hazard or a mishap.
(a) Water in a swimming pool becomes electrified.
(b) A room fills with carbon dioxide.
(c) A car stops abruptly.
(d) A long distance telephone company suffers an outage.
(e) A nuclear weapon is destroyed in an unplanned manner.
9. Compare the software maintenance classification based on intention, activity, and evidence.
10. Explain the rationale behind Lehman’s laws. Under what circumstances might those laws not hold?
11. What is software entropy? Explain its relationship with software aging.
12. What is code decay? Discuss the symptoms of code decay. What are the causes of code decay?
13. Explain the term software cloning and discuss its significance with respect to software evolution.
14. Explain the inverse square model of system evolution. For the system data given in Table 2.6, plot the graph for the actual and calculated (inverse square model) sizes for the system.

TABLE 2.6 System Data to be Used in Question 14
RSN   Size    RSN   Size    RSN   Size
1     977     8     1800    15    2151
2     1344    9     1595    16    2091
3     1390    10    1897    17    2095
4     1226    11    1832    18    2101
5     1246    12    1897    19    2312
6     1492    13    1902    20    2167
7     1581    14    2087    21    2315
Source: Data taken from Reference 44. © 1996 IEEE.

15. Discuss the differences between FOSS and CSS software evolution.
16.
Suppose that you wish to construct a system by combining the functionality of two COTS-based products. Requests must be handled by sending them first to one of the COTS products, and then sending the results from that product to the second. The results from the second product can then be returned to the original requester. Which standard middleware component would you use in building this system?
(a) glue
(b) wrapper
(c) mediator
(d) tailoring
17. Identify the cluster and evidence maintenance type for the following scenario. Explain your answer.
“A maintainer was assigned to install three upgraded COTS components, the first of which implements a hardware upgrade. Each COTS component was received from a different vendor, but all are used in one system. In attempting to install a test version of the system, the maintainer found that one of the upgrades was incompatible with the other two, and that those other two would not work with the existing in-use version of the third. After considerable diagnostic test runs, and obtaining an “it’s your problem” response from the vendor of the third component, the maintainer got approval to write a wrapper for one of the upgraded COTS components in order to fit with the continued use of the existing version of the third component. After successful regression testing, the maintainer had the configuration management data updated and the new wrapper recorded. The change, since it would be nearly transparent to the customer, was put into production at the time of the next scheduled release of a version of the systems, and the document was updated.”

3 EVOLUTION AND MAINTENANCE MODELS

People seldom improve when they have no other model but themselves to copy after.
– Oliver Goldsmith

3.1 GENERAL IDEA

A software production process comprises a set of activities spanning conception to retirement.
There are many software processes, differing primarily in their classifications of phases and activities. One traditional software development life cycle (SDLC) is shown in Figure 3.1, which comprises two discrete phases, namely, development and maintenance, the latter commonly approaching two-thirds of the product life span. As this diagram shows, about one-fourth to one-third of all software life cycle costs are attributed to software development, and the remaining cost is due to operations and maintenance. Note that the percentages in Figure 3.1 indicate relative costs.

FIGURE 3.1 Traditional SDLC model. Development phase: requirements definition 3%, preliminary design 3%, detailed design 5%, implementation 7%, testing 15%. Maintenance phase: operations and maintenance 67%. From Reference 1. © 1988 John Wiley & Sons

As listed below [1], software maintenance has unique characteristics, although many activities related to maintaining and developing software are similar:
• Constraints of an existing system. Maintenance is performed on an operational system. Therefore, all modifications must be compatible with the constraints of the existing software architecture, design, and code.
• Shorter time frame. A maintenance activity may span from a few hours to a few months, whereas software development may span 1 or more years.
• Available test data. In software development, test cases are designed from scratch, whereas software maintenance can select a subset of these test cases and execute them as regression tests. Thus, the challenge is to select appropriate test cases from the existing test suite. In addition to the regression test cases, new test cases need to be created to adequately test the code changes.
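The test selection problem described in the last item can be made concrete with a small sketch. The Python fragment below is an illustration only: it assumes a hypothetical coverage map from test names to the source modules each test exercises, and neither that map nor the function name `select_regression_tests` comes from the text. It selects the existing tests that touch a changed module and reports changed modules that no existing test exercises, which is where new test cases would be needed.

```python
def select_regression_tests(coverage, changed_modules):
    """Return (selected_tests, uncovered_modules).

    coverage: dict mapping test name -> set of module names it exercises
              (hypothetical data; a real project might derive it from a
              coverage tool or from traceability records).
    changed_modules: set of module names modified by the maintenance change.
    """
    changed = set(changed_modules)
    # Keep every existing test that exercises at least one changed module.
    selected = sorted(t for t, mods in coverage.items() if mods & changed)
    # Changed modules that no existing test exercises need new test cases.
    covered = set().union(*coverage.values()) if coverage else set()
    uncovered = sorted(changed - covered)
    return selected, uncovered


if __name__ == "__main__":
    coverage = {
        "test_login": {"auth", "session"},
        "test_report": {"report", "db"},
        "test_export": {"report", "export"},
    }
    selected, uncovered = select_regression_tests(coverage, {"report", "billing"})
    print(selected)   # ['test_export', 'test_report']
    print(uncovered)  # ['billing']
```

Selecting by module overlap is only one possible criterion; it is used here because it keeps the sketch short, not because the text prescribes it.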
Therefore, software maintenance should have its own software maintenance life cycle (SMLC) model, as it involves many unique activities. On the other hand, software maintenance has many similarities with software development, with a focus on product enhancement and correction, in addition to transforming requirements into software functionality. In this chapter, three maintenance models will be explained: reuse, simple staged, and change mini-cycle, representing, respectively, the old, relatively new, and still in research models. We examine in detail two standards, IEEE/EIA 1219 and ISO/IEC 14764, to manage and execute software maintenance activities.

Software maintenance is at the heart of an evolving software product. Evolution, change, and system configuration complicate maintenance activities. The software product that is released to a customer is in the form of executable code, whereas the corresponding “product” within the supplier organization is source code. Source code can be modified without affecting the executable version in use. Thus, strict control must be kept; otherwise, an exact source code representation of a particular executable version may not exist. In addition, documentation associated with the executable code must be compatible with it; otherwise, the customer may not be able to understand the system. Therefore, tight documentation control is necessary. In other words, the set of products that are released to the customer must be controlled. Software configuration management (SCM) is the means by which the process of software evolution is controlled. SCM provides a framework for managing changes in an efficient way. The functionalities and best practices of SCM are discussed in this chapter. In addition, we discuss a state transition model of a modification (or change) request as it flows through the organization.
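To illustrate why the source code and the released executable must be controlled together, here is a minimal sketch; it does not describe any particular SCM tool. It assumes a hypothetical in-memory `ReleaseRegistry` that records, for each released version, a digest of the source and of the executable. All names are invented for illustration. With such a record, a maintainer can check whether the source at hand is exactly the source of the executable a customer is running.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 hex digest used as a content fingerprint."""
    return hashlib.sha256(data).hexdigest()


class ReleaseRegistry:
    """Hypothetical baseline record: version -> (source digest, executable digest)."""

    def __init__(self):
        self._baselines = {}

    def record_release(self, version: str, source: bytes, executable: bytes) -> None:
        if version in self._baselines:
            # A released baseline is immutable; a change requires a new version.
            raise ValueError(f"version {version} is already released")
        self._baselines[version] = (digest(source), digest(executable))

    def source_matches(self, version: str, source: bytes) -> bool:
        """True if `source` is exactly the source recorded for `version`."""
        src_digest, _ = self._baselines[version]
        return src_digest == digest(source)


if __name__ == "__main__":
    registry = ReleaseRegistry()
    registry.record_release("1.0", b"print('v1')", b"\x7fELF-v1")
    # The supplier's working copy may have been modified since the release.
    print(registry.source_matches("1.0", b"print('v1')"))          # True
    print(registry.source_matches("1.0", b"print('v1 patched')"))  # False
```

Refusing to overwrite a recorded version is the design choice that matters here: once a version is in use at a customer site, its source baseline has to stay recoverable.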
3.2 REUSE-ORIENTED MODEL

One obtains a new version of an old system by modifying one or several components of the old system and possibly adding new components. As a consequence, the new system is likely to reuse many components of the old system. A new version of the system can be created after the maintenance activities are implemented on some of the old system’s components. Based on this concept, three process models for maintenance have been proposed by Basili [2]:
• Quick fix model. In this model, necessary changes are quickly made to the code and then to the accompanying documentation (Figure 3.2).
• Iterative enhancement model. In this model, as illustrated in Figure 3.3, first changes are made to the highest level documents. Eventually, changes are propagated down to the code level.
• Full reuse model. In this model, as illustrated in Figure 3.4, a new system is built from components of the old system and others available in the repository.

The old system is reused by all three of the aforementioned models, and, therefore, they belong to the reuse-oriented paradigm. The models assume that the descriptions of the existing system are complete and consistent.

FIGURE 3.2 The quick fix model (old system and new system, each with requirements, design, code, and test). From Reference 2. © 1990 IEEE

FIGURE 3.3 The iterative enhancement model (old system and new system, each with requirements, design, code, test, and analysis). From Reference 2. © 1990 IEEE

FIGURE 3.4 The full reuse model (old system and new system, each with requirements, design, code, and test, plus a repository of {Ri}, {Di}, {Ci}, {Ti}). From Reference 2. © 1990 IEEE

Quick fix model. This model embodies a commonly used approach to software maintenance. In this model, as illustrated in Figure 3.2, (i) source code is modified to fix the problem; (ii) necessary changes are made to the relevant documents; and (iii) the new code is recompiled to produce a new version. Often changes to the source code are made with no prior investigation such as analysis of impact of the changes, ripple effects of the changes, and regression testing. Moreover, resource constraints often entail that modifications performed to the code are not documented.

Iterative enhancement model. This model is based on the Japanese principle of Kaizen, which means the incremental and progressive improvement of practices. Iterative and incremental development methodologies were practiced in the early 1950s, before Winston Royce’s Waterfall model [3] was widely used. An alternative approach to software maintenance is suggested by the iterative and incremental models. Those two models have the following ideas in common: (i) it is difficult to fully comprehend a large set of requirements for a system and (ii) developers may find it difficult to build the full system in one go. Therefore, a complete system is developed in progressively larger builds, where one build refines the requirements of the preceding build by taking user inputs into account [4]. The iterative enhancement model, illustrated in Figure 3.3, shows how changes flow from the very top-level documents to the lowest-level documents. The model works as follows:
• It begins with the existing system’s artifacts, namely, requirements, design, code, test, and analysis documents.
• It revises the highest-level documents affected by the changes and propagates the changes down through the lower-level documents.
• The model allows maintainers to redesign the system, based on the analysis of the existing system.

Remark: The terms iteration and increment are liberally used when discussing iterative and incremental development. However, they are not synonyms in the field of software engineering.
On the one hand, iteration implies that a process is basically cyclic, meaning that the activities of the process are repeatedly executed in a structured manner. On the other hand, increment implies some quantifiable outcome of an iteration. Iterative development is based on scheduling strategies in which time is set aside to improve and revise parts of the system under development. Incremental development is based on staging and scheduling strategies in which parts of the system are developed at different times and/or paces and integrated as they are completed.

The model is effectively a three-phase cycle: analysis, characterization of proposed enhancements, and redesign and implementation. A new build is constructed by starting with an analysis of the existing system’s requirements, followed by design, coding, and testing. Next, the documents at all levels that are affected by the changes are modified. Reuse, as explained in Chapter 9, is explicitly supported by the model. The model also accommodates the quick fix model. The key advantage of the iterative enhancement model is that documentation is kept up-to-date with changes made to the code.

With replicated controlled experiments, Visaggio [5] compared the iterative model and the quick fix model with respect to maintainability. It has been shown that the maintainability of systems degrades faster with the quick fix model. In addition, organizations adopting the iterative enhancement model perform maintenance modifications faster than those adopting the quick fix model. In general, an organization may adopt the quick fix model when it is short of time. Therefore, the latter observation is counterintuitive.

Full reuse model. The model illustrated in Figure 3.4 shows maintenance as a special case of reuse-based software development.
The main assumption in this model is the availability of a repository of artifacts describing the earlier versions of the present and similar systems. Full reuse comprises two major steps:

- perform requirement analysis and design of the new system; and
- use the appropriate artifacts, such as requirements, design, code, and test, from any earlier versions of the old system.

In the full reuse model, reuse is explicit and the following activities are performed:

- identify the components of the old system that are candidates for reuse;
- understand the identified system components;
- modify the old system components to support the new requirements; and
- integrate the modified components to form the newly developed system.

3.3 THE STAGED MODEL FOR CLOSED SOURCE SOFTWARE

Rajlich and Bennett [6] have defined a simple staged model to represent the traditional commercial Closed Source Software (CSS) life cycle. Their model comprises a sequence, as illustrated in Figure 3.5, of five stages:

- Initial development. Develop the first functioning version of the software.
- Evolution. The developers improve the functionalities and capabilities of the software to meet the needs and expectations of the customer.
- Servicing. The developers only fix minor and emergency defects, and no major functionality is included.
- Phaseout. In this phase, no more servicing is undertaken, while the vendors seek to generate revenue as long as possible.
- Closedown. The software is withdrawn from the market, and customers are directed to migrate to a replacement.

FIGURE 3.5 The simple staged model for the CSS life cycle (initial development, evolution, servicing, phaseout, and closedown; transitions: first running version, loss of evolvability, servicing discontinued, switchoff; loops: evolution changes, servicing patches). From Reference 6. © 2000 IEEE

Initial development. Software developers build the first version of the system from scratch to satisfy the initial requirements.
The initial development includes design, initial coding, and testing. Generally, no releases are made public to the customers in this stage. The first version may lack some functionality, but it lays two important foundations for future iterations, namely, the software architecture and the team knowledge:

- The software architecture. The components of the software, the interactions among them, and their desired properties, such as efficiency and functionality, remain intact through the remainder of the life cycle of the system.
- The team knowledge. During initial development, the software engineering team acquires knowledge about the application domain, user requirements, business process, data formats, algorithms, weaknesses and strengths of the software architectures, and execution environment. For the subsequent stages of the life cycle of the software system, this knowledge is considered to be crucial.

Evolution. The software system moves to the evolution stage after the initial development is successful. Software developers extend the functionalities and capabilities of the system to meet the needs and expectations of the customers. In this stage: (i) quick patches and new releases are dispatched to the customers and (ii) feedback from the customers is received for additional enhancement to the software system. Customer demands for additional functionalities and competitive products from other vendors cause the system to evolve. In addition, evolution of the system may occur due to changes in the operating environment and the business practice. An example of change in the business practice is to target enterprise markets instead of the service provider market segment.

Sometimes, the developing company releases the software system right after the initial development. However, often a system is released in its evolution phase after it has undergone many quality improvement cycles.
For example, reliability and stability are improved during a system’s quality improvement cycle. The exact release date for the product is based on several factors such as timeliness, quality, innovation, and business goals of the company [7].

Servicing. For software to evolve easily, it has to have an appropriate architecture and the software team has to have the necessary expertise. When either architectural integrity or the team’s expertise in the architecture is missing, the software ceases to evolve easily, and it makes a transition to its servicing stage. The system is viewed to have aged or decayed in the servicing stage. In this stage, the software is considered to have matured and simple modifications are made to the source code, without providing user-perceivable enhancements. Changes in this stage are expensive and difficult. Therefore, software developers minimize the number of changes or use wrappers as a way to effect changes. Each of these changes further weakens the system architecture, thereby increasing the need for further servicing. Chapin et al. [8] refer to the servicing stage as the real maintenance phase. After considering the economic profitability of the system, a decision is made to transition the system from the evolution stage to the servicing stage. When new revenues from a software system do not justify the cost of performing modifications, the system is designated as a legacy system and it is no longer evolved.

Phaseout. During the phaseout stage, the supplier may decide not to perform any more servicing. The software may still be in use, but because change requests (CRs) are no longer honored, it becomes increasingly outdated. The users must work around the known deficiencies of the system more often. Going back to an earlier servicing stage becomes very difficult because of the increasingly large number of CRs. Eventually, the software system becomes a legacy application.

Closedown.
During the final shutdown, the vendor withdraws the software product from the market and recommends alternative solutions to the customers. The supplier may have certain pending contractual responsibilities, namely legal obligations and source code retention. In the area of outsourced software, source code retention is an important responsibility. If the software is still found to be businessworthy to its stakeholders as it moves from the phaseout to the closedown stage, the system is called a legacy system. For a legacy system, it is prudent to move to a newer system which provides similar functionalities, without exhibiting the poor quality of the legacy system.

One version of the staged model for CSS is called the versioned staged model, and it is illustrated in Figure 3.6. The model shown in Figure 3.6 has essentially the same stages as found in Figure 3.5, but separate evolution tracks from the initial development are found in Figure 3.6. The evolution process is the backbone of the model. Each evolution track includes servicing, phaseout, and closedown. At certain time frames, a version of the software is completed and released to the customers. The evolution of the software does not stop at that point; rather, it continues and eventually produces the next version. The released version is no longer evolved but only serviced.

FIGURE 3.6 The versioned staged model for the CSS life cycle (initial development feeds evolution of version 1; each version has its own servicing, phaseout, and closedown track, while evolution of a new version leads to version 2 and onward). From Reference 6. © 2000 IEEE
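The versioned staged model described above can be sketched as a small tracker. This is only an illustration under stated assumptions: the class `VersionedProduct`, its methods, and the integer version numbers are hypothetical names introduced here, not part of the model in Reference 6. The sketch keeps one evolving version and gives every released version its own servicing, phaseout, and closedown track.

```python
from enum import Enum
from typing import Dict


class Stage(Enum):
    EVOLUTION = "evolution"
    SERVICING = "servicing"
    PHASEOUT = "phaseout"
    CLOSEDOWN = "closedown"


# Order of stages on a released version's track (no way back in the CSS model).
NEXT_STAGE = {
    Stage.SERVICING: Stage.PHASEOUT,
    Stage.PHASEOUT: Stage.CLOSEDOWN,
}


class VersionedProduct:
    """Hypothetical tracker: one evolving version plus the tracks of released ones."""

    def __init__(self) -> None:
        self.evolving_version = 1           # version currently being evolved
        self.tracks: Dict[int, Stage] = {}  # released version -> current stage

    def release(self) -> int:
        """Release the evolving version; it is only serviced from now on."""
        released = self.evolving_version
        self.tracks[released] = Stage.SERVICING
        self.evolving_version += 1          # evolution continues toward the next version
        return released

    def advance(self, version: int) -> Stage:
        """Move a released version one stage further along its track."""
        current = self.tracks[version]
        if current not in NEXT_STAGE:
            raise ValueError(f"version {version} is already closed down")
        self.tracks[version] = NEXT_STAGE[current]
        return self.tracks[version]

    def stage_of(self, version: int) -> Stage:
        if version == self.evolving_version:
            return Stage.EVOLUTION
        return self.tracks[version]


product = VersionedProduct()
product.release()    # version 1 is released and enters servicing
product.advance(1)   # version 1 moves on to phaseout
print(product.stage_of(1), product.stage_of(2))
```

Keeping the evolving version separate from the released tracks mirrors the idea that evolution is the backbone of the model, while each release follows its own path to closedown.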
Many organizations use a three-part numbering scheme (version, release, build), where version reflects the strategic changes made to the system during evolution, release reflects the servicing patches, and build reflects the, say, daily internal build of the software.

3.4 THE STAGED MODEL FOR FREE, LIBRE, OPEN SOURCE SOFTWARE

Capiluppi et al. [9] revised the staged model for its applicability to Free, Libre, Open Source Software (FLOSS) systems, as shown in Figure 3.7. The authors provide empirical evidence to justify the FLOSS model. The model benefits developers by characterizing FLOSS systems in terms of stages and indicating which stage the system is currently in and to which stage the system is more likely to transition.

Three major differences are identified between CSS systems and FLOSS systems. The first one is related to the availability of releases. CSS systems are made available to the customers in a running condition only after they have been sufficiently tested. On the other hand, a FLOSS system is posted on the versioning system repositories long before the official release. Therefore, binaries as well as source code can be downloaded not only by end users but by developers as well. The revised model shown in Figure 3.7 reflects the aforementioned difference between FLOSS and CSS systems. In Figure 3.7, the rectangle with the label “Initial development” has been visually highlighted because it can be the only initial development stage in the evolution of FLOSS systems. In other words, it does not have any evolution track for FLOSS systems.

FIGURE 3.7 The staged model for the FLOSS system (initial development, evolution, servicing, phaseout, and closedown; transitions: first running version, loss of evolvability, servicing discontinued, switchoff; loops: evolution changes, servicing patches). From Reference 9. © 2007 ACM

The second difference concerns the transition from the evolution to the servicing stage.
Based on the empirical data from several FLOSS systems, it was observed that a new development stage can be reached following a phase with few enhancements. With some systems that were analyzed, after a transition from evolution to servicing, a new period of evolution was observed. This possibility is depicted in Figure 3.7 as a broken arc from the servicing stage to the evolution stage.

The third difference is the possibility of a transition from the phaseout stage to the evolution stage for FLOSS systems. A case study of a FLOSS system was illustrated by Capiluppi et al. [9]. In the said case study, a new team of developers took over the maintenance task that was abandoned by the previous development team. In general, the active developers of FLOSS systems are frequently replaced by new developers. Therefore, the dashed line in Figure 3.7 exhibits this possibility of a transition from the phaseout stage to the evolution stage.

3.5 CHANGE MINI-CYCLE MODEL

Software change is a fundamental ingredient of software evolution and maintenance. Let us revisit the first law of software evolution, which is stated as “A program undergoes continuing changes or becomes less useful. The change process continues until it becomes cost-effective to replace the program with a re-created version.” The CSS staged model discussed earlier is based on the above fundamental premise. The difficulty of software changes distinguishes the two stages: evolution and servicing. Whereas substantial software changes are allowed in the evolution stage, only limited changes are permitted in the servicing stage. Note that iterative modification is the primitive building block from which both the evolution and servicing stages are derived.

FIGURE 3.8 The change mini-cycle (change request; analyze and plan change, with program comprehension and change impact analysis; implement change, with restructuring and change propagation; verify and validate; documentation change; with paths for request rejected, further changes required, and select a new change request). From Reference 12. © 2008 Springer

Software change is a process that may introduce new requirements to the existing system. In addition, there may be a need to alter the software system if the requirements are not correctly implemented. In order to capture this, an evolutionary model, known as the change mini-cycle (Figure 3.8), was proposed by Yau et al. [10] in the late 1970s and revisited by other researchers, namely, Bennett et al. [11] and Mens [12]. The change mini-cycle model consists of five major phases: CR, analyze and plan change, implement change, verify and validate, and documentation change. In this process model, new significant activities were identified to reflect the fact that software changes are rarely isolated. Examples of those new activities are change impact analysis and change propagation. These activities continue to be the subjects of research.

Change request. A CR generally originates from the management, users of the software, or customers. A CR may take one of the following two forms: defect report and enhancement request.

- A defect report describes the defect and software system actions that are out of line with requirements.
- An enhancement request describes a change to the requirements, functionality, or quality of the system.

The above two forms reflect practitioners’ concerns that can be traced back to the 1972 article “That maintenance ‘iceberg’” by Canning [13]. According to the said article, practitioners view maintenance narrowly as correcting errors and broadly as expanding and extending software functionality. In this book, CR refers to both the aforementioned views. The CR document must capture a minimal set of information about changes to software, hardware, and documentation.

Analyze and plan change. In the second phase, program comprehension and impact analysis are conducted.
Program comprehension [14] is essential to understanding which parts of the software will be affected by a CR. Program comprehension is basically a process of acquiring useful information from source code. One such piece of information is the location of a domain-specific concept in the source code [15, 16]. The code implementing the concepts may need to be changed in order to provide a solution to the CR. Concepts are units of human knowledge that can be processed by the human mind in one instance. As an example, let us consider the CR “Add a debit card payment issued by Chautauqua bank to the ATM system.” In order to change the implementation, the maintenance engineer must locate those system components that implement the concepts “debit card,” “payment,” and “issued” embedded in the CR. The idea here is to identify the set of system components that are thought to be initially affected by the CR [17]. The identified system components are called the Starting Impact Set (SIS), which is discussed in Chapter 6. A thorough discussion of program comprehension is given in Chapter 8.

Impact analysis is conducted to identify the potential consequences of a change and estimate the resources needed to accomplish the change [18]. By means of impact analysis, a software system is analyzed by maintenance personnel to identify the software components that will be affected by a CR. In this analysis, one first decides whether the components that are neighbors of the SIS also need to be modified due to the ripple effect [19]. A neighboring component is added to the set if it needs to be modified. For each newly added component, its neighboring components that will need to be modified are identified and added to the set. The process of identifying new components to be modified is repeated until it is found that a modification will not impact new neighboring components. The resulting set of components estimated to be modified is known as the estimated impact set (EIS).
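The iterative growth of the SIS into the EIS described above can be sketched as a worklist traversal over a component graph. This is a minimal sketch, not a prescribed algorithm: the function name `estimated_impact_set`, the `needs_change` predicate (standing in for the maintainer’s judgment about the ripple effect), and the ATM component names are all assumptions made for illustration.

```python
from collections import deque
from typing import Callable, Dict, Iterable, Set


def estimated_impact_set(
    neighbors: Dict[str, Set[str]],
    starting_impact_set: Iterable[str],
    needs_change: Callable[[str, str], bool],
) -> Set[str]:
    """Grow the SIS into an EIS by repeatedly examining neighboring components."""
    eis = set(starting_impact_set)
    worklist = deque(eis)
    while worklist:
        component = worklist.popleft()
        for neighbor in neighbors.get(component, set()):
            # A neighbor joins the set only if the change ripples into it.
            if neighbor not in eis and needs_change(component, neighbor):
                eis.add(neighbor)
                worklist.append(neighbor)  # its own neighbors are examined later
    return eis


# Hypothetical component graph for an ATM system.
neighbors = {
    "CardReader": {"PaymentService"},
    "PaymentService": {"CardReader", "Ledger", "ReceiptPrinter"},
    "Ledger": {"PaymentService", "AuditLog"},
    "ReceiptPrinter": {"PaymentService"},
    "AuditLog": {"Ledger"},
}

# Assumed maintainer judgments: (changed component, neighbor that must also change).
ripples = {("CardReader", "PaymentService"), ("PaymentService", "Ledger")}

eis = estimated_impact_set(neighbors, {"CardReader"}, lambda a, b: (a, b) in ripples)
print(sorted(eis))
```

The loop stops when no newly added component causes further neighbors to be added, which matches the termination condition stated in the text.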
The objectives of impact analysis are as follows:

- to determine the set of system components to be affected, given the SIS identified by the program comprehension activity;
- to develop accurate estimates of the resources needed to accomplish the implementation task; and
- to analyze the costs and benefits of the CR and make a decision on whether or not to implement the CR.

Software developers use the information gathered from impact analysis in planning how to implement a CR. Moreover, the goal of impact analysis is to minimize unexpected side effects of change. A side effect is an error or an undesirable behavior that occurs as a result of a modification in the software [20]. Chapter 6 discusses impact analysis in greater detail.

Implement change. The CR is implemented after the feasibility of a change is established. However, before the implementation of the CR, restructuring or refactoring of the software is performed in order to accommodate the requested modification. Refactoring is essentially the object-oriented variant of restructuring [21]. Restructuring is often required in software maintenance; otherwise, systems lose structure. Restructuring restores order so that the software is easier to understand and change; the restructured product is less susceptible to error when future changes are made. Refactoring, discussed in Chapter 7, improves the structure of the software without changing its behavior.

Implementing a change comprises a number of steps, each focusing on one specific software component after the completion of refactoring. If a component is changed, it may cease to be compatible with the components with which it interacts. Therefore, secondary changes must be made in the interacting components, thereby creating a ripple effect throughout the system.
The aforementioned activity, generally called a change propagation activity [22, 23], ensures that a modification performed in one component is completely reflected throughout the entire system. Chapter 6 discusses change propagation in greater detail.

Verify and validate. In this phase, the software system is verified and validated in order to assure that the integrity of the system has not been compromised. This activity includes code review, regression testing, and execution of new tests if necessary. Regression testing comprises a subset of the unit-, integration-, and system-level tests [24]. If the results are unsatisfactory, the implemented change is rejected; the cause is then investigated and further changes are implemented.

Documentation change. The final phase of the change mini-cycle deals with updating the program documentation. It is time to complete the documentation aspect, which may include updating the requirements, functional specifications, and design specifications to be consistent with the code. In addition, user manuals and installation and troubleshooting guides are updated accordingly.

3.6 IEEE/EIA MAINTENANCE PROCESS

The IEEE/EIA 1219 standard [25] explains a process for executing and managing activities for software maintenance. The standard basically explains maintenance as a fundamental life cycle process and describes maintenance as the process of a software product undergoing “modification to code and associated documentation due to a problem or the need for improvement. The objective is to modify the existing software product while preserving its integrity” (p. 6–1 of Reference 26). The standard focuses on a seven-phase activity model of maintenance as illustrated in Figure 3.9.
The seven phases are listed below:

- Identification of problems
- Analysis
- Design
- Implementation
- System test
- Acceptance test
- Delivery

FIGURE 3.9 Seven phases of IEEE maintenance process (a modification request (MR) passes through problem identification, analysis, design, implementation, system test, acceptance test, and delivery). From Reference 26. © 2004 IEEE

Each of the seven activities has five associated attributes as follows:

- Activity definition. This refers to the implementation process of the activity.
- Input. This refers to the items that are required as input to the activity.
- Output. This refers to the items that are produced by the activity.
- Control. This refers to those items that provide control over the activity.
- Metrics. This refers to the items that are measured during the execution of the activity.

Problem identification. A request for change to the software is normally made by the users of the software system or the customers, and it starts the maintenance process. The request for change is submitted in the form of a modification request (MR) for a correction or for an enhancement. It may be noted that MR and CR are used interchangeably in maintenance literature. The maintenance (or sustaining) organization: (i) determines the type of request; (ii) determines the appropriate maintenance category: corrective, adaptive, or perfective; (iii) assigns a priority level; and (iv) assigns a unique identification number. Activities included in this phase are as follows: (i) reject or accept the MR; (ii) identify and estimate the resources needed to change the system; and (iii) put the MR in a batch of changes scheduled for implementation. The process of collecting and reviewing metrics, such as the number of MRs submitted and the number of MRs rejected, begins in this phase. For the problem identification phase, the input, output, control, and metrics have been summarized in Figure 3.10.

FIGURE 3.10 Problem identification phase. Input: modification request (MR). Output: validated MR; process determinations. Control: uniquely identify MR; enter MR into repository. Metrics: no. of MR submittals; no. of duplicate MRs; time expended for problem validation.

Analysis. The inputs to this phase are a validated MR, an initial resource estimation, repository information, and project documentation. The repository is the location in which all software-related artifacts are stored. The process is viewed to have two major components: feasibility analysis and detailed analysis. First, feasibility analysis is performed to (i) determine the impact of the change, (ii) investigate other possible solutions including prototyping, (iii) assess both short-term and long-term costs, and (iv) determine the benefits of making the change. After selecting a specific approach, the second component, detailed analysis, is undertaken. Detailed analysis identifies (i) firm modification requirements, (ii) the software components involved, (iii) an overall test strategy, and (iv) an implementation plan. The standard puts emphasis on at least three levels of tests: unit, integration, and acceptance. In addition, regression tests are associated with each of the three levels of tests. Figure 3.11 summarizes input, control, metrics, and output for the analysis phase. Upon completion of the analysis phase, a number of actions are taken: (i) risk analysis is performed; (ii) the preliminary resource estimate is updated; and (iii) by involving the customer, it is decided whether or not to proceed on to the next phase. If it is decided to move on to the next phase, the phase deliverables, including a detailed analysis report, are specified.
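The attributes of a phase can be recorded as plain data, which makes the handoff between phases easy to check. The following is a sketch only: the `Phase` class and the `missing_inputs` helper are hypothetical names, the item names are taken from the summaries in Figures 3.10 and 3.11 (abbreviated for the analysis phase), and their grouping into input, output, control, and metrics is a reading of those figures rather than text quoted from the standard.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Phase:
    """One maintenance phase described by its input, output, control, and metrics."""
    name: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]
    controls: Tuple[str, ...]
    metrics: Tuple[str, ...]


def missing_inputs(phase: Phase, available: Set[str]) -> List[str]:
    """Return the inputs of a phase that are not yet available."""
    return [item for item in phase.inputs if item not in available]


problem_identification = Phase(
    name="Problem identification",
    inputs=("Modification request (MR)",),
    outputs=("Validated MR", "Process determinations"),
    controls=("Uniquely identify MR", "Enter MR into repository"),
    metrics=("No. of MR submittals", "No. of duplicate MRs",
             "Time expended for problem validation"),
)

analysis = Phase(
    name="Analysis",
    inputs=("Validated MR", "Project document", "Repository information"),
    outputs=("Feasibility report for MR", "Detailed analysis report",
             "Test strategy", "Implementation plan"),
    controls=("Conduct technical review", "Verify test strategy"),
    metrics=("Requirement changes", "Elapsed time (schedule)"),
)

# Before problem identification has run, the analysis phase lacks a validated MR.
available = {"Modification request (MR)", "Project document", "Repository information"}
print(missing_inputs(analysis, available))

# The outputs of problem identification become available to the analysis phase.
available.update(problem_identification.outputs)
print(missing_inputs(analysis, available))
```

A check of this kind reflects the idea that the validated MR produced by problem identification is what allows the analysis phase to begin.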
The standard suggests several metrics to be gathered, such as the number of requirement changes, elapsed time, and the error rate generated.

FIGURE 3.11 Analysis phase. Input: validated MR; project document; repository information. Output: feasibility report for MR; detailed analysis report; preliminary modification report; test strategy; implementation plan. Control: conduct technical review; verify that documentation is updated; verify test strategy; identify safety and security issues. Metrics: requirement changes; documentation error rates; effort per function area; elapsed time (schedule); error rates generated by priority and type.

Design. A modification to the system begins in this phase based on the information gathered up to this point. The information includes system and project documentation, the output of the analysis phase, existing software, and repository information. Activities of this phase are as follows: (i) identify the affected software components; (ii) modify the software components; (iii) document the changes; (iv) create a test suite for the new design; and (v) select test cases for regression testing. This phase provides a revised design baseline, revised test plans, an up-to-date detailed analysis report, revised risk analysis, and verified requirements. Figure 3.12 summarizes the input, output, metrics, and control for the design phase.

Implementation. The design phase produces the primary inputs to this phase. The activities executed in this phase are: writing new code and performing unit testing, integrating changed code, conducting integration and regression testing, performing risk analysis, and reviewing the system for test readiness. To assess whether or not the system is ready for system-level testing, a review is performed in this phase. In this phase, risk analysis and reviews are periodically performed, rather than at the end of the phase.
Multiple reviews need to be performed because a large percentage of design issues, performance issues, risks, and costs are exposed while changing the system. All documentation, including the software, design, test, user, and training information, is updated. For the implementation phase, the input, output, metrics, and control are summarized in Figure 3.13.

FIGURE 3.12 Design phase. Input: project document; analysis phase output; source code database. Output: revised modification list; updated design baseline; updated test plan; revised detail analysis; verified requirements; revised implementation plan; documented constraints and risks. Control: conduct software inspection; verify that design is documented; complete traceability of requirements to design. Metrics: software complexity; design changes; effort per function area; elapsed time (schedule); test plan and procedure changes; error rates generated by priority and type; number of lines of code added, deleted, modified, and tested.

FIGURE 3.13 Implementation phase. Input: project document; system document; results of design phase; source code. Output: updated software; updated design documents; updated test documents; updated user documents; updated training materials; verified requirements; statement of risk; test readiness review report. Control: conduct software inspections; ensure that unit and integration testing are performed; ensure software placed under CM control; ensure training and documentation have been updated; verify traceability of design to code. Metrics: function points; source lines of code; error rates generated by priority and type.

System test. In this phase, tests are performed on the full system to ensure that the modified system complies with the original requirements as well as the new modifications.
System-level testing comprises a broad spectrum of testing activities: functionality testing, robustness testing, stability testing, load testing, performance testing, security testing, and regression testing. Regression testing is conducted to validate that no new faults have been introduced. Quite often, during the maintenance process, the sustaining test engineers execute the system test cases [24]. Finally, the maintenance personnel verify whether or not the system is ready to perform acceptance testing. This phase accepts as its input a system test plan consisting of detailed test cases, a test readiness review report, and an updated system. This phase provides a test report, a fully integrated tested system, and a test readiness review report. For the system test phase, the input, output, metrics, and control are summarized in Figure 3.14.

Acceptance test. Acceptance testing is performed on a completely integrated system, and it involves customers, users, or their representatives.
The main objective of acceptance testing is to assess the overall quality of the system, rather than to actively identify defects. By contrast, the objective of system testing is to search for defects [24]. An important concept in acceptance testing is the customer’s expectation from the system. The primary inputs to this phase are the test readiness review report, a fully integrated system, and a test plan with detailed test cases for acceptance testing. At the end of acceptance testing, a test report is generated. The report explains the status of the criteria that were agreed upon for successful completion of acceptance testing. The status report is communicated to the committee responsible for review. The customer chairs the review committee to evaluate the exit criteria and the test report to make sure that the system is ready for a release. For the acceptance test phase, the input, output, metrics, and control are summarized in Figure 3.15.

FIGURE 3.14 System test phase. Input: updated system; updated software documentation; test readiness review report. Output: tested integrated system; test report; test readiness review report. Control: CM control of code and listings; CM control of MRs; CM control of test documentation. Metrics: error rates generated by priority and type; errors corrected; errors generated.

FIGURE 3.15 Acceptance test phase. Input: test readiness review report; fully integrated system; acceptance test plan; acceptance test cases; acceptance test procedures. Output: new system baseline; functional configuration audit report; acceptance test report. Control: execute acceptance tests; report test results; conduct functional audit; establish new baseline; place acceptance test documentation under CM. Metrics: error rates generated by priority and type; errors corrected; errors generated.

Delivery.
In this phase, the changed system is released to customers for installation and operation. Included in this phase are the following activities: notify the user community, perform installation and training, and develop an archival version of the system for backup. For the delivery phase, the input, output, metrics, and control are summarized in Figure 3.16.

FIGURE 3.16 Delivery phase. Input: tested and accepted system. Output: physical configuration audit report; version description document. Control: arrange physical configuration audit; complete version description document; complete updates to status accounting database. Metrics: documentation changes (training manuals, operation guidelines, version description document).

Guidelines on maintenance practices are also recommended by the standard in its appendices. For example, guidelines for maintenance practices include a guideline to make a maintenance plan; Table 3.1 shows the key sections of a maintenance plan.

3.7 ISO/IEC 14764 MAINTENANCE PROCESS

The document ISO/IEC 14764 [27] is an international standard for software maintenance, and it describes maintenance using the same concepts as IEEE/EIA 1219 except that they are depicted slightly differently. An iterative process to execute and manage maintenance activities is described in the document. The basic structure of an ISO process is made up of activities, and an activity is made up of tasks. The maintenance process describes the activities necessary to change operational software without breaking its integrity.

Upon an activation of the maintenance process, plans and procedures are developed and resources are allocated to carry out maintenance. In response to a CR, code is modified in conjunction with the relevant documentation. Modification of the running software without losing the system’s integrity is considered to be the overall objective of maintenance.
The maintenance process enables the software product to migrate from its initial environment at its inception to new environments. The maintenance process is terminated upon the eventual decommissioning of the product, commonly known as retirement. The maintenance process comprises the following high-level activities:
1. Process implementation
2. Problem and modification analysis
3. Modification implementation
4. Maintenance review and acceptance
5. Migration
6. Retirement

The maintenance process activities developed by ISO/IEC are shown in Figure 3.17. Each of these activities is made up of tasks, and each task describes a specific action with inputs and outputs. A task specifies what to do, but not how to do it [28]. Inputs are the items that the maintenance activity uses to generate outputs. Controls provide guidance so that the maintenance activity produces the desired outputs. Outputs are the objects generated by the maintenance activity. Support refers to the items that support the maintenance activity.

FIGURE 3.17 ISO/IEC 14764 iterative maintenance process. From Reference 26. © 2004 IEEE

TABLE 3.1 Template of a Maintenance Plan

1. Introduction
   This section outlines the goals, purpose, and general scope of the maintenance effort. Also, deviations from the standard are identified.
2. References
   The documents that impose constraints on the maintenance effort are identified in this section. In addition, other documents supporting maintenance activities are identified.
3. Definitions
   All terms required to understand the maintenance plan are defined in this section. If some terms are already defined in other documents, then references are provided to those documents.
4. Software Maintenance Overview
   This section briefly describes the following aspects of the maintenance process:
   4.1 Organization
   4.2 Scheduling Priorities
   4.3 Resource Summary
   4.4 Responsibilities
   4.5 Tools, Techniques, and Methods
5. Software Maintenance Process
   This section describes the actions to be executed in each phase of the maintenance process. Each action is described in the form of input, output, process, and control.
   5.1 Problem Identification/Classification and Prioritization
   5.2 Analysis
   5.3 Design
   5.4 Implementation
   5.5 System Testing
   5.6 Acceptance Testing
   5.7 Delivery
6. Software Maintenance Reporting Requirements
   This section briefly describes the process for gathering information and disseminating it to members of the maintenance organization.
7. Software Maintenance Administrative Requirements
   Describes the standard practices and rules for anomaly resolution and reporting.
   7.1 Anomaly Resolution and Reporting
   7.2 Deviation Policy
   7.3 Control Procedures
   7.4 Standards, Practices, and Conventions
   7.5 Performance Tracking
   7.6 Quality Control of Plan
8. Software Maintenance Documentation Requirements
   Describes the procedures to be followed in recording and presenting the outputs of the maintenance process.

Source: From Reference 25. © 1998 IEEE.

Process implementation. This activity establishes the plans and procedures to be followed. A maintenance plan is made concurrently with the plan for development. Figure 3.18 graphically summarizes the process implementation activity with the input, output, control, and support items. The process implementation activity consists of three major tasks, explained in the following:

Maintenance plan: The maintenance plan describes a strategy to maintain the system, whereas the maintenance procedures describe in detail how to actually accomplish maintenance.
The plan also describes how to: (i) organize and staff the maintenance team; (ii) assign responsibilities among team members; and (iii) schedule resources. The main idea is to provide cost-effective support to the maintenance team.

FIGURE 3.18 Process implementation activity

FIGURE 3.19 Problem and modification activity

Modification requests: Users submit modification (or change) requests to communicate with the maintainer. The maintainer establishes procedures to receive, record, and track user requests for modifications and to provide feedback to the users. The problem resolution process is initiated whenever an MR is received. MRs are classified by the maintainer as either problem reports (corrective) or enhancement (adaptive and perfective) requests. The maintenance process prioritizes and tracks these requests individually, because they correspond to different types of maintenance. A thorough discussion of MR workflow is given in Section 3.9.
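The receiving, classifying, and tracking of MRs described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not a procedure prescribed by ISO/IEC 14764: the names `ModificationRequest` and `MRLog`, the status strings, and the numeric priority scale are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class MaintenanceType(Enum):
    CORRECTIVE = "corrective"
    ADAPTIVE = "adaptive"
    PERFECTIVE = "perfective"


@dataclass
class ModificationRequest:
    mr_id: int
    summary: str
    mtype: MaintenanceType
    priority: int = 3            # hypothetical scale: 1 = most urgent
    status: str = "received"     # hypothetical status label

    @property
    def category(self) -> str:
        # Corrective MRs are problem reports; adaptive and perfective
        # MRs are enhancement requests.
        if self.mtype is MaintenanceType.CORRECTIVE:
            return "problem report"
        return "enhancement request"


class MRLog:
    """Receives, records, and tracks each MR individually."""

    def __init__(self):
        self._requests = {}
        self._next_id = 1

    def receive(self, summary, mtype, priority=3):
        mr = ModificationRequest(self._next_id, summary, mtype, priority)
        self._requests[mr.mr_id] = mr
        self._next_id += 1
        return mr

    def update_status(self, mr_id, status):
        self._requests[mr_id].status = status

    def by_priority(self):
        # Most urgent first; ties are broken by order of arrival.
        return sorted(self._requests.values(),
                      key=lambda mr: (mr.priority, mr.mr_id))


log = MRLog()
port = log.receive("Support new OS release", MaintenanceType.ADAPTIVE, priority=2)
crash = log.receive("Crash on startup", MaintenanceType.CORRECTIVE, priority=1)
log.update_status(crash.mr_id, "under analysis")

for mr in log.by_priority():
    print(mr.mr_id, mr.category, mr.status)
```

Keeping the maintenance type separate from the priority reflects the point made above: the two kinds of request are tracked individually, and the classification does not by itself determine how urgently an MR is handled.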
Configuration management (CM): The software product and any changes made to it during its maintenance lifespan need to be controlled. Change control is performed by implementing and enforcing an approved SCM process. The SCM process is implemented by developing and following a configuration management plan (CMP) and the corresponding procedures. SCM is discussed in detail in Section 3.8.

Problem and modification analysis. This activity is invoked after the software system transitions from the development stage to the maintenance stage, and it is called iteratively whenever the need for modification arises, as depicted in Figure 3.17. The maintainer analyzes the MR to identify its impact on the organization, the existing system, and the interfacing systems. Further, the maintainer (i) develops and documents potential solutions and (ii) obtains approval from upper management to implement the solutions. Figure 3.19 graphically summarizes the problem and modification analysis activity with the input, output, control, and support items. This activity comprises five tasks, discussed in the following paragraphs.

MR analysis: The maintainer analyzes the MR to determine the impact on the organization, hardware, the existing system, other interfacing systems, documentation, data structures, and humans (operators, maintainers, and users). The overall objective of impact analysis is to determine all the entities that will be modified or affected if the MR is implemented. The steps of impact analysis are given in Table 3.2.

Verification: If the MR is corrective, the maintainer must reproduce the problem in the laboratory environment and document the test results in order to determine the validity of the MR. The maintainer designs a test strategy to verify and replicate the problem. For adaptive and perfective maintenance tasks, verification is not required.
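The goal of impact analysis stated under MR analysis, finding every entity that is modified or affected by a change, can be sketched as a reachability computation over a dependency graph. The following Python sketch is one simple way to approximate ripple effects, assuming that dependencies between entities are already known; the function name `impact_set` and the example entities are hypothetical.

```python
from collections import deque


def impact_set(depends_on, changed):
    """Return every entity directly or transitively affected by `changed`.

    `depends_on` maps each entity to the set of entities it depends on.
    """
    # Invert the graph: for each entity, record who depends on it.
    dependents = {}
    for entity, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(entity)

    # Breadth-first traversal from the changed entities.
    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for user in dependents.get(current, ()):
            if user not in affected:
                affected.add(user)
                queue.append(user)
    return affected


# Hypothetical system; the entity names are illustrative only.
depends_on = {
    "billing_ui": {"billing_service"},
    "billing_service": {"tax_rules", "customer_db"},
    "report_job": {"customer_db"},
    "tax_rules": set(),
    "customer_db": set(),
}

print(sorted(impact_set(depends_on, ["tax_rules"])))
# ['billing_service', 'billing_ui', 'tax_rules']
```

A graph of this kind covers only the software entities. The impact on schedules, budget, staffing, and users still has to be assessed separately.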
TABLE 3.2 Modification Request Task Steps

1. Decide whether or not the maintainer is adequately staffed to make the proposed changes.
2. Decide whether or not the maintenance program has received adequate budget.
3. Decide whether or not enough resources are available and whether the proposed change will affect some current or future projects.
4. Determine the operational issues to be considered.
5. Determine handling priority.
6. Classify the type of maintenance.
7. Determine the impact to current and future users.
8. Determine safety and security implications.
9. Identify ripple effects.
10. Determine any hardware or software constraints that may result from the proposed changes.
11. Estimate the values of the benefits of making the changes.
12. Determine the impact on existing schedules.
13. Document the risks resulting from the impact analysis.
14. Estimate the evaluation to be performed.
15. Estimate the cost of management to execute the modification.
16. Place developed artifacts under CM.

Options: Based on the analysis performed, the maintainer must outline two or more alternative solutions to the MR. The alternative solutions report must include the cost, effort, and schedule for implementing each solution. The maintainer must perform the task steps shown in Table 3.3 to identify alternative solutions to the MR.

Documentation: After the analysis is complete and the alternative solutions are identified, the maintainer documents the MRs, the analysis results, and the implementation option report. The maintainer may use the task steps shown in Table 3.4 to write this document.

Approval: The maintainer submits the analysis report to the appropriate authority in the organization to seek approval for the selected change option. Upon approval, the maintainer updates the requirements if the MR is an enhancement (improvement).

TABLE 3.3 Option Task Steps

1. The MR is assigned a work priority.
2. Explore a work-around for the problem. If a work-around exists, provide it to the user.
3. Identify concrete requirements for the planned modification.
4. Calculate the magnitude and size of the planned modification.
5. Identify a variety of options to execute the planned modification.
6. Estimate the impacts of the options on the users and system hardware.
7. Analyze the risks of each option.
8. Document the outcomes of risk analysis for each of the proposed options.
9. Develop a widely acceptable plan to implement the modification.

TABLE 3.4 Documentation Task Steps

1. Ensure that the analyses of results have been completed and the documentation updated. If documentation does not exist, develop new documentation.
2. For accuracy, review the planned strategy to perform tests and review the schedule.
3. Review resource estimates for accuracy.
4. Revise the database for storing accounting status.
5. Describe a procedure to decide whether or not to approve the MR.

Modification implementation. In this activity, maintainers (i) identify the items to be modified and (ii) execute a development process to actually implement the modifications. The maintainer determines the type of documentation, the software units, and the version of the software that are to be changed. Though development becomes part of the modification activity, it is tailored to eliminate the activities that do not apply to the maintenance effort, such as requirement elicitation and architectural design. To ensure that the modified or newly added requirements are correctly implemented, test plans and procedures are included in the development process. In addition, it is ensured that the requirements that have not been modified are not affected by the new implementation. The inputs to this activity include all the analysis work performed in previous activities, and the output is a new software baseline. Figure 3.20 shows the modification implementation activity.
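The requirement that unmodified requirements are not affected by the new implementation is essentially a regression check. The Python sketch below shows one possible form of such a check, assuming that each test case is tagged with the requirement it exercises. The function `regression_failures`, the requirement labels, and the two pricing functions are hypothetical and serve only as an illustration.

```python
def regression_failures(baseline, modified, cases, changed_requirements):
    """Return the cases for unmodified requirements whose behavior changed.

    `cases` is a list of (requirement, argument) pairs.
    """
    failures = []
    for requirement, argument in cases:
        if requirement in changed_requirements:
            continue  # intended changes are verified by new tests instead
        if baseline(argument) != modified(argument):
            failures.append((requirement, argument))
    return failures


# Hypothetical example: requirement R2 (bulk price) was modified,
# requirement R1 (regular price below 10 units) was not.
def price_v1(quantity):
    return quantity * 10 if quantity < 10 else quantity * 9


def price_v2(quantity):
    return quantity * 10 if quantity < 10 else quantity * 8


cases = [("R1", 1), ("R1", 9), ("R2", 10), ("R2", 50)]
print(regression_failures(price_v1, price_v2, cases, {"R2"}))  # []
```

An empty result means that the baseline and the modified version agree on every case tied to an unmodified requirement. It says nothing about whether the modified requirement itself is implemented correctly, which is the job of the updated test plans and procedures.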
FIGURE 3.20 Modification implementation activity

FIGURE 3.21 Maintenance review/acceptance activity

Maintenance review/acceptance. This activity ensures that (i) the changes made to the software are correct and (ii) the changes are made according to accepted standards and methodologies. The activity is augmented with the following processes: (i) a process for quality management; (ii) a process to verify the product; (iii) a process to validate the product; and (iv) a process to review the product. The maintenance plan should have documented how these supporting processes were tailored to address the characteristics of the specific software product. The inputs to this activity include the modified software and the test results. Figure 3.21 summarizes the maintenance review/acceptance activity with the input, output, control, and support items. The maintenance review/acceptance activity consists of two major tasks: review and approval. The task steps for both review and approval are enumerated in Table 3.5.
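One of the review steps in Table 3.5 is to ensure that only the required software components were changed. A minimal Python sketch of such a check is given below, assuming that the old and new baselines are available as mappings from file names to file contents and that the approved MR names the files allowed to change. The function names and the example files are hypothetical.

```python
import hashlib


def fingerprint(files):
    """Map each file name to the SHA-256 digest of its content."""
    return {name: hashlib.sha256(data).hexdigest()
            for name, data in files.items()}


def unapproved_changes(old_files, new_files, approved):
    """Return the files added, removed, or modified without approval."""
    old, new = fingerprint(old_files), fingerprint(new_files)
    changed = {name for name in old.keys() | new.keys()
               if old.get(name) != new.get(name)}
    return changed - set(approved)


# Hypothetical baselines; only parser.c is named in the approved MR.
old_baseline = {"parser.c": b"int parse();", "lexer.c": b"int lex();", "README": b"v1"}
new_baseline = {"parser.c": b"int parse(void);", "lexer.c": b"int lex(void);", "README": b"v1"}

print(unapproved_changes(old_baseline, new_baseline, approved={"parser.c"}))
# {'lexer.c'}
```

In practice the comparison would normally be done with the SCM tool in use. The sketch only shows the underlying idea of comparing two baselines against the approved change set.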
TABLE 3.5 Review and Approval Task Steps

Review Task Steps
1. Track the MRs from requirement specification to coding.
2. Ensure that the code is testable.
3. Ensure that the code conforms to coding standards.
4. Ensure that only the required software components were changed.
5. Ensure that the new code is correctly integrated with the system.
6. Ensure that documentations are accurately updated.
7. CM personnel build software items for testing.
8. Perform testing by an independent test organization.
9. Perform system test on a fully integrated system.
10. Develop test report.

Approval Task Steps
1. Obtain quality assurance approval.
2. Verify that the process has been followed.
3. CM prepares the delivery package.
4. Conduct functional and physical configuration audit.
6. Notify operators.
7. Perform installation and training at the operator’s facility.

FIGURE 3.22 Migration activity

Migration. This refers to the process of moving a software system from one technological environment to a different one that is considered to be better. Migration is effected in two broad phases: (i) identify the actions required to achieve migration and (ii) design and document the concrete steps to be executed to effect migration. Figure 3.22 summarizes the migration activity with the input, output, control, and support items. This activity comprises seven tasks discussed in the following paragraphs.
Migration standard: During the migration of a software product from an old to a new operational environment, the maintainer must ensure that any additional software products or data produced or modified adhere to standard ISO/IEC 12207 [29]. As a part of the standard tasks, the maintainer (i) identifies all the software elements or data that were changed or added and (ii) ensures that the tasks were performed according to standard ISO/IEC 12207.

Migration plan: For successful migration, a plan must be developed, documented, reviewed, and executed. The maintainer performs the task steps shown in Table 3.6 to write this document. The plan is developed in collaboration with the customers, and it addresses the following:
• Requirements analysis and definition of migration
• Development of migration tools
• Conversion of software product and data
• Execution of migration
• Verification of migration
• Support for backward compatibility with the old execution environment

TABLE 3.6 Migration Plan Task Steps

1. Analyze the requirements for migration.
2. Perform an impact analysis of migrating the software system.
3. Make a schedule to execute migration.
4. Determine all requirements for data collection to perform post-operation review.
5. Identify and record the migration effort.
6. Identify and reduce risks.
7. Identify the required tools to support migration.
8. Determine how the old environment is going to be supported.
9. Acquire and/or design new tools to support migration.
10. Partition software products and data for conversion in an incremental manner.
11. Prioritize the activities involving data conversion and software products.
12. Execute software products and data conversions.
13. Perform migration of software products and data to the new environment.
14. Operate the migrated system and the old system in parallel as much as possible.
15. Perform testing to ensure the success of migration.
16. Should there be a need, continue to provide support for the old environment.

Notification of intent: The maintainer explains to the users: (i) why support for the old environment has been discontinued; (ii) the new environment and when it will be supported; and (iii) the availability of other options, if any, upon the removal of the old environment.

Implement operations and training: Once a software product has been improved by modification and tested by the maintainer, it is installed in an operational environment to run concurrently with the old system. By running the old and the new system in parallel, users get an opportunity to become familiar with the new system, so that the transition from the old to the new system becomes smoother. In addition, parallel operation lets the maintainer compare and understand the input/output relationships of the old and the new system. Training should also be provided to the users during this period. The maintainer can perform the steps listed in Table 3.7 as part of this task.

TABLE 3.7 Operation and Training Task Steps

Parallel Operations Task Steps
1. Survey the site.
2. Install hardware equipment.
3. Install the software system.
4. Run basic tests to ensure that hardware and software have been correctly installed.
5. Run both the new and old systems in parallel, under the desired operational load.
6. Gather data from the old and the new systems.
7. Analyze the collected data.

Training Task Steps
1. Identify the requirements for migration training.
2. Schedule the requirements for migration training.
3. Review the migration training.
4. Update the plan to provide training.

Notification of completion: After the completion of training and the parallel operation of both the new and the old system for an appropriate number of hours, the maintainer notifies all the sites that the new system will become operational and that the old system can be shut down and uninstalled.
Essentially, the following task steps are performed by the maintainer:
1. Announce the migration.
2. Document the site-specific issues and make a plan to resolve them.
3. Archive the old system, including data and software.
4. Remove the old equipment.

Post-operation review: Following the installation and operation of a changed system, a review is performed to assess the impact of changing the system in the new environment. The review reports are sent to the competent parties for information, guidance, and further actions. The maintainer executes the following steps as part of the task:
1. Analyze the results of running the two systems concurrently.
2. Identify potential risk areas.
3. Summarize the lessons learned.
4. Produce a report on impact analysis.

Data archival: Data associated with the old environment are made accessible to comply with the contractual requirements for data protection and audit. The maintainer performs the following steps as part of the task:
1. The old data and software are archived.
2. The old data and software are put on multiple media.
3. The media are saved in secure places.

Retirement. A software product is retired when it is considered to have reached the end of its useful life. An economic analysis is performed to support the decision to retire the product, and it is included in the retirement plan. Sometimes the work performed by the product is no longer needed; therefore, the retired product is not replaced. In other cases, a new software product has already been developed to replace the current system. In either case, the software system must be removed from service in an orderly manner. In addition, consideration is given to accessing the data produced by the software to be retired.
Figure 3.23 summarizes the retirement activity with the input, output, control, and support items. All artifacts from the retirement activity are controlled with CM. This activity comprises five tasks discussed in the following paragraphs.

Retirement plan: In order to ensure a successful retirement, a retirement plan is developed, documented, reviewed, and executed. The maintainer performs the task steps shown in Table 3.8 to write this document. The plan is developed in collaboration with the customers to address the following:
• Transition to any new software system.
• Withdrawal of partial or full support after a grace period.
• Responsibility for any future contractual support.
• Archiving the software system, including all the documentation.
• Accessibility to archived data.

FIGURE 3.23 Retirement activity

TABLE 3.8 Retirement Plan Task Steps

1. Analyze the requirements to retire the systems.
2. Determine what impacts the retiring software will have.
3. Identify a product that will replace the software to be retired.
4. Make a schedule to retire the software.
5. Determine the need for residual support in the future.
6. Identify and describe the retirement effort.

Notification of intent: The maintainer conveys to the users: (i) the reason for discontinuing support for the product; (ii) a note about the replacement or upgrade for the phased-out system, with an availability date; and (iii) a list of the other options available, if any, upon the removal of the old environment.
Implement parallel operations and training: If there is a replacement system for the software product to be retired, it is installed in an operational environment to run concurrently with the old system. By running the new and the old system in parallel, users get an opportunity to become familiar with the new system, so that the transition from the old to the new system becomes smoother. In addition, parallel operation lets the maintainer compare and understand the input/output relationships between the new system and the old system. Training is also provided to the users during this period.

Notification of completion: The maintainer notifies all the sites that the new system will become operational and that the old system can be shut down. The old system is generally shut down after the new system has been in operation for a certain length of time. The maintainer performs the following steps as part of the task:
1. Make an announcement about the changes.
2. Identify issues specific to individual sites and describe how those will be resolved.
3. Store the old data and software in an archive.
4. Disconnect and move out the old hardware infrastructure.

Data archival: Data associated with or used in the old environment will be made accessible according to contractual requirements involving data protection and audit. The maintainer executes the following steps as part of this task:
1. Archive the old data and software.
2. Make multiple copies of the old data and software.
3. Keep the media in safe places.

3.8 SOFTWARE CONFIGURATION MANAGEMENT

Large, complex systems undergo many more changes than relatively small systems, and management of changes in large systems is nontrivial. Therefore, the concept of CM was developed to manage changes in large systems.
The goal of CM is to manage and control the numerous corrections, extensions, and adaptations that are applied to a system over its lifetime. It handles the control of all product items and changes to those items. SCM, in turn, is CM applied to software products. In this case the product items include documents, executable software, source code, hardware, and disks. SCM has been defined by Bersoff, Henderson, and Siegel [30] as the discipline of identifying the configuration of a system at discrete points in time for the purpose of systematically controlling changes to this configuration and maintaining the integrity and traceability of this configuration throughout the system life cycle. SCM brings two kinds of benefits to an organization:
• SCM ensures that development processes are traceable and systematic so that all changes are precisely managed. Consequently, the product is always in a well-defined state [31].
• SCM enhances the quality of the delivered system and the productivity of the maintainers.

CM is an essential part of the software development and maintenance environment. It ensures that the released software is not contaminated by uncontrolled or unapproved changes. The objectives of SCM are to:
• Uniquely identify every version of every piece of software at various points in time.
• Retain past versions of documentation and software.
• Provide an audit trail for all modifications performed.
• Maintain the traceability and integrity of the system changes throughout the software life cycle.

Projects benefit from effective SCM as follows [32]:
1. Confusion is reduced and order is established.
2. The activities necessary to maintain product integrity are organized.
3. Correct product configurations are ensured.
4. Quality is ensured, and better-quality software requires less maintenance effort.
5. Productivity is improved, because analysts and programmers know exactly where to find any piece of the software.
6. Liability is reduced by documenting the trail of actions.
7. Life cycle cost is reduced.
8. Conformance with requirements is enabled.
9. A reliable working environment is provided.
10. Compliance with standards is enhanced.
11. Accounting of status is enhanced.

3.8.1 Brief History

A need for CM was originally felt in the aerospace industry in the 1950s, with the primary purpose of guaranteeing reproducibility of aircraft and managing engineering changes (ECs). In the 1970s, large-scale computer software began to pose many of those same change management problems. It became apparent that software maintenance engineers could borrow CM techniques from the aerospace industry to manage software modifications. In the beginning, punch cards with different colors were used to indicate changes. In the late 1960s, maintenance personnel used “corrective cards” to indicate changes to the UNIVAC-1100 EXEC-8 operating system. At that time, development of operating systems benefited from SCM. In the 1970s and the 1980s, SCM emerged as a distinct discipline. With the advancements in user-friendly software development environments such as Unix, specialized computer software tools were built. For example, the Unix-based software tool Make [33] accepts descriptions of system configurations and can automatically construct the system from its descriptions. The source code control system (SCCS) [34] and the revision control system (RCS) [35] tools permit the maintainer to keep track of all the textual alterations made to a file. Gradually, software products became candidates for configuration control. This resulted in a need to manage user workspaces, which was duly supported by newer SCM systems. In other words, SCM functionalities continued to evolve.
In the 1980s, delta algorithms based on text matching were developed so that SCM tools could store just the differences among versions instead of entire versions of software products. In the 1990s, developers felt an increasing need to manage nontextual objects. Consequently, novel algorithms were developed for efficient storage and retrieval of nontextual objects. By the year 2000, because disk storage had become inexpensive, CPUs had become fast, and nontextual objects had become common, the storage of deltas became unimportant and many new tools simply used compression such as zip. These days SCM systems support the management of evolution of a broad range of software systems that are being modified by a large number of maintenance personnel working in different countries and utilizing a variety of machines.

3.8.2 SCM Spectrum of Functionality

These days a broad range of high-level functionalities are supported by SCM systems. Estublier et al. [31] classified the functionalities into three broad areas: product, tool, and process. Each area is then decomposed into a number of technical dimensions, as shown in Figure 3.24. Those technical dimensions are briefly explained in the following.

Identification. The items whose configurations need to be managed are identified in this function. The identified items include specification, design, documents, data, drawings, source code, executable code, test plan, test script, hardware components, and components of the software development environment, such as compilers, debuggers, and emulators. Project plan and customer requirements should also be included. To accurately identify products, including their configuration and version levels, a schema of names and numbers is designed. Finally, for all configuration items and systems, a baseline configuration is established.
If there is a need to make changes to the baseline, the changes are made with the concurrence of the configuration control organization.

FIGURE 3.24 Technical dimensions of SCM systems (areas: product, tool, process; dimensions: identification, version control, system models and selection, workspace control, building, change management, status accounting, auditing)

Version control. To avoid confusion during the process of artifact evolution, a new identifier is assigned to the artifact every time the artifact is modified. It is important to note that assigning a new identifier for every modification of the same artifact may hide important relations among the uniquely identified artifacts. As an example, one may be interested in recording the fact that a given artifact fixes a subset of the defects found in an earlier release. This kind of relation can be recorded by means of the version control (VC) functionality of SCM by: (i) interpreting software artifacts as configuration items and (ii) identifying the relations, if any, among the configuration items. The basic idea of VC is to keep two kinds of copies of a file: master copies and working copies. Master copies are stored in a centralized repository. Software developers check out working copies from the repository, modify the working copies, and, finally, check in the working copies into the repository. Checking in a file means committing the changes made to the working copy. The VC system creates a new version in the repository every time a file is committed. As time passes, all versions of the file are stored in the repository. Because SCM tools can store just the differences among versions or compress them, storing many versions of a file does not excessively waste storage space. Conflicts can arise if many software developers want to work on the same version of a file.
However, conflicts can be resolved by means of two techniques: lock-modify-unlock and copy-modify-merge [36]. The former model requires developers to obtain a lock on the file they want to modify. While a developer is holding the lock on a file, no other developer can modify the file. However, a locked file can still be checked out for viewing and compilation. When there is no further need to keep a file locked, the developer commits the modifications and the lock is released. In the copy-modify-merge model, on the other hand, developers are at liberty to change their working copies. If different developers make conflicting changes, the VC system flags the conflicts so that the developers can resolve them.

In addition, VC must support parallel development by allowing branching of versions. For example, consider the scenario: (i) an organization is currently developing the next version of their already released application; and (ii) a report about a major defect is received from the end users. Now the development group has the option to retrieve the released version and create a branch, as illustrated in Figure 3.25, to fix the defect. The figure illustrates how a file evolves with two branches, where the main path is called the trunk. As shown in the figure, branch changes are incorporated by merging with the trunk.

[FIGURE 3.25 An evolution of a file with two branches. The diagram labels the trunk, a branch, and a merge.]

System models and selection. Files are discrete entities that contain descriptions of well-defined items of a project, namely, requirement specification, test cases, design, code, test results, and defect reports. However, it is neither efficient nor effective to manage a project file-by-file. Consequently, a need to support aggregate artifacts arises so that maintenance personnel can enforce consistency in large projects by means of relationships among artifacts and attributes.
Relationships among artifacts and attributes are captured by developing models which support the idea of software configurations. Intuitively, a configuration means an aggregate of versionable items. The general idea of configuration raises a need for enabling users to have selective access to parts and versions of such aggregated artifacts. By default, SCCS and RCS keep in the workspace the most recent version of the principal variant. Next, all artifacts that are exceptions to the default placement rule can be explicitly fetched by the user.

Workspace control. Workspaces are implemented by SCM systems to give users an isolated place. In their own workspaces, users perform the usual tasks of editing and managing their artifacts. Such an environment, which enables the maintainer to make and test the changes in an isolated manner, is called a workspace. In an SCM system, software versions are stored in a repository that cannot be directly modified. Rather, when a need to modify some files arises, the files are copied into a workspace. One can realize a workspace in two ways: (i) it can be as simple as the home directory of the programmer who wants to modify the files and (ii) it can be a complex mechanism such as an integrated development environment and a database. In general, three basic functions are performed in a workspace:

• Sandbox: Checked out files are put in a workspace to be freely edited. In addition, it is not necessary to lock the original files in the repository.
• Building: An SCM system generally stores the differences between successive versions to save space. Therefore, the workspace expands the deltas into full-fledged source files. In addition, the workspace stores the derived binaries.
• Isolation: Every developer maintains at least one workspace. Therefore, the developer makes modifications to the source code, compiles the files, performs tests, and debugs code without impacting the work of other developers.
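The check-out, check-in, and lock-modify-unlock behavior described in this subsection can be illustrated with a small sketch. It is only a toy model under stated assumptions: the class name Repository and its methods are hypothetical, versions are kept as full copies in memory rather than as deltas, and no real SCM tool is being reproduced.

```python
class Repository:
    """Toy centralized repository (hypothetical): each check-in adds a version."""

    def __init__(self):
        self._versions = {}  # file name -> list of contents, oldest first
        self._locks = {}     # file name -> developer currently holding the lock

    def add(self, name, content):
        """Place the first master copy of a file under version control."""
        self._versions[name] = [content]

    def check_out(self, name, developer, lock=False):
        """Return a working copy of the latest version; optionally take the lock."""
        if lock:
            holder = self._locks.get(name)
            if holder not in (None, developer):
                raise PermissionError(f"{name} is locked by {holder}")
            self._locks[name] = developer
        return self._versions[name][-1]

    def check_in(self, name, developer, content):
        """Commit a working copy as a new version and release the lock, if any."""
        holder = self._locks.get(name)
        if holder not in (None, developer):
            raise PermissionError(f"{name} is locked by {holder}")
        self._versions[name].append(content)
        self._locks.pop(name, None)
        return len(self._versions[name])  # number of versions now stored

    def history(self, name):
        """Return all stored versions of a file, oldest first."""
        return list(self._versions[name])


repo = Repository()
repo.add("main.c", "int main() { return 0; }")

# Alice takes the lock; Bob can still check out a copy for viewing.
alice_copy = repo.check_out("main.c", "alice", lock=True)
bob_copy = repo.check_out("main.c", "bob")

try:
    repo.check_in("main.c", "bob", bob_copy + "\n/* bob */")
    bob_blocked = False
except PermissionError:
    bob_blocked = True  # Bob cannot commit while Alice holds the lock

version_count = repo.check_in("main.c", "alice", alice_copy + "\n/* fix */")
```

In this sketch each developer's local variable plays the role of a workspace: edits stay isolated until check-in, and the repository itself is changed only through check_in.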
The aforementioned features are generally available in modern software development environments. SCM systems provide a centralized facility to manage these features.

Building. Efficiency is a key requirement of SCM systems so that developers can quickly build an executable file from the versioned source files. A second requirement of SCM systems is that they must enable the building of old versions of the system for recovery, testing, maintenance, or additional release purposes. Third, most SCM systems support the building of software, and the build process and its products are assessed for quality assurance. Outputs of the build process become quality assurance records, and the records may be needed for future reference. The make [33] application on the Unix operating system, originally developed by researchers at Bell Laboratories, is a classical example of a build tool. The tool remains popular for system building. For instance, commercial SCM systems such as ClearCase [37] continue to rely on variants of make.

Change management. SCM systems must: (i) enable users to understand the impact of modifications; (ii) enable users to identify the products to which a specific modification applies; and (iii) provide maintenance personnel with tools for change management so that all activities from specifying requirements to coding can be traced. In the beginning, CRs were managed in paper form. These days, however, CR handling is automated: CRs are saved in the SCM repository and are linked with the actual modifications. This topic is further discussed in Section 3.9.

Status accounting. To be able to quantify the properties of the software being developed and the process being used, it is necessary to gather statistics, and this can be done at the SCM level.
The primary purpose of status accounting is to: (i) keep formal records of already existing configurations and (ii) produce periodic reports about the status of the configurations. These records: (i) describe the product correctly; (ii) are used to verify the configuration of the system for testing and delivery; and (iii) maintain a history of CRs, including both the approved ones and the rejected ones. A history of CRs includes the answers to the following questions:

• Why are changes made?
• When are the changes made?
• Who makes the changes?
• What changes are made?

Status accounting is useful in communicating important details of the project and configuration items to the stakeholders of the project. For example, maintenance personnel can view what files or fixes were part of what baseline systems. Another example is that project managers can trace completion of problem reports. Status accounting reduces the need to produce extensive reports, which include item delta report, transaction log, and modification log. Other common reports include resource usage, change in process, change in deviations, and status of all configuration items [38]. Examples of status accounting include the number of CRs per software configuration item and the average time needed to implement a CR.

Auditing. SCM systems need to provide the following features: (i) roll back to earlier stable points and (ii) identify which modifications were performed, why those modifications were performed, and who performed those modifications. In other words, ideally, an SCM system behaves as a searchable archive of all things that happened in the past. By means of auditing, the organization maintains the integrity of the baselines and release configurations for all products. Two kinds of audits are performed before software is released: audit for functional configuration and audit for physical configuration.
The former determines whether or not the software satisfies the user requirement specification and the system requirement specification. On the other hand, the latter verifies if the reference and design documents accurately represent the software. Overall, a configuration audit tries to find answers to the following:

• To what extent are the requirements satisfied by the modified system?
• Does the software release under consideration reflect the MRs?

The activities to perform a configuration audit are as follows:

• Procedures and an audit schedule are defined.
• The personnel to perform the audits are identified.
• Established baselines are audited.
• Audit reports are generated.

3.8.3 SCM Process

There is a large gap between mere understanding of the capabilities of SCM and successfully applying SCM in practice. As is the case with large software projects, planning is critical to the successful application of SCM. By means of a plan, a configuration baseline is established. Baselining is the process by which a given set of configurable items formally becomes publicly available at a standard location, such as in a repository, to the people who are authorized to use it. Following the identification of the initial configuration, a configuration control process, as illustrated in Figure 3.26 [32], is invoked. Figure 3.26 shows the three major SCM implementation activities: planning, baseline development, and configuration control.

Planning. Planning begins with two activities: (i) defining the SCM process and (ii) establishing procedures to control and document changes. A key step during planning is the identification of the stakeholders. All those who influence a system’s behavior and all those who are impacted by the system are stakeholders of the system [39].
The stakeholders in CM are the maintainers, development engineers, sustaining test engineers, quality assurance auditors, users, and the management. The stakeholders are also known as configuration control board (CCB) members. Not all changes are reviewed by the board. Rather, small groups review and approve most of the changes. Therefore, those groups need to be identified in the planning phase. Various SCM tools are used to maintain configuration history and facilitate the SCM process flow. Examples of such tools are the Concurrent Versions System (CVS) and ClearCase [37].

[FIGURE 3.26 A process for implementing SCM. Plan SCM program: define SCM process; identify stakeholders; develop or procure SCM tools. Establish baselines: identify items to control; identify baselines; develop schema of identifiers. Control, document, and audit configuration: evaluate proposed changes and approve/disapprove; track approved changes to closure; update baselines and history and publish reports; audit by comparing with the documented configuration.]

Establishing baselines. Once an SCM program plan is in place, the next step in implementing effective SCM is to identify the items that are the subject of configuration control. Some examples of those items are code, data, and documents. With the configuration items identified, a software baseline library is established to make the set of configurable items publicly available. The library, called the repository, is the heart of the SCM system. The repository is the central place that contains all configurations that have been made public. In other words, the repository has information about all the baselined items. The process of baselining (or re-baselining for a change) involves the following activities:

1. Create a snapshot of the current version of the product and its configuration items and allocate a configuration identifier to the entire configuration.
2. Allocate version numbers to the configuration items and check in the configuration.
3. Store the approved authority information as part of the metadata in the repository.
4. Broadcast all the above information to the stakeholders.
5. To accurately identify the configuration version, design a schema of words, numbers, or letters for common types of configuration items. In addition, project requirements may dictate specific nomenclature.

Controlling, documenting, and auditing. After establishing a baseline, it is important to: (i) keep the actual and the documented configuration identical and (ii) ensure that the baseline complies with a project’s configuration described in the requirements document. These requirements of a baseline are realized by means of a four-step iterative process illustrated in Figure 3.26. The stakeholders specified in the SCM plan review and evaluate all changes to the configuration. After their evaluations, both approvals and disapprovals are documented. Approved changes are tracked until they are verified, as discussed in Section 3.9. Next, the appropriate baseline is revised in conjunction with all relevant documents, and reports are generated. At regular intervals, records and products are audited to verify that:

• there is acceptable matching between the documented configuration and the actual configuration;
• the configuration conforms with the requirements of the project; and
• documentation of all change activities is complete and up-to-date.

The three steps in the cycle, namely, controlling, documenting, and auditing, are repeatedly executed throughout the lifetime of the project.

3.9 CR WORKFLOW

A CR, also called an MR, is a vehicle for recording information about a system defect, requested enhancement, or quality improvement. In other words, defect reports or enhancement requests are documented as a CR. CRs are placed under the control of a change management system.
Change management systems control changes by means of an automated workflow. The basic objective of change management is to uniquely identify, describe, and track the status of each requested change. It is a methodology for controlling changes to evolving systems. The objectives of change management are as follows [1]:

• Provide a common method for communication among stakeholders.
• Uniquely identify and track the status of each CR. This feature simplifies progress reporting and provides better control over changes.
• Maintain a database about all changes to the system. This information can be used for monitoring and measuring metrics.

A CR describes the desires and needs of users which the system is expected to implement. While describing a CR, two factors need to be taken into account:

• Correctness of CRs: CRs need to be unambiguously described so that it is easy to review them for their correctness. The “form” of a CR is key to effective interactions between the software development organization and the users. The “form” should document essential information about changes to software, hardware, and documentation.
• Clear communication of CRs to the stakeholders: CRs need to be clearly communicated to the stakeholders, including the maintainers, so that those CRs are not interpreted in a different way. The people who might be collecting CRs may not be part of the maintenance group. For example, the marketing people may be collecting CRs. Therefore, there may not be direct communication between the teams actually carrying out the changes to the system and the end users. It becomes counterproductive for different teams to interpret CRs differently.

The results of interpreting a CR in different ways are as follows:

• The team carrying out actual changes to the system and the team performing tests may develop contradicting views about the new system’s quality.
• The changed system may not meet the needs and desires of the end users.
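The objectives of change management listed earlier in this section (a unique identifier for each CR, status tracking, and a database of changes usable for metrics) can be made concrete with a small sketch. Everything here is an illustrative assumption: the names ChangeRequest and ChangeRequestStore are hypothetical, the store is an in-memory dictionary rather than a real database, and only a handful of fields are modeled.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import count


@dataclass
class ChangeRequest:
    """A minimal CR record (hypothetical); a real schema carries more fields."""
    change_request_id: int
    description: str
    component: str
    state: str = "Submit"  # a newly filed CR starts in the Submit state


class ChangeRequestStore:
    """Centralized store that assigns unique identifiers and tracks status."""

    def __init__(self):
        self._next_id = count(1)  # source of unique identifiers
        self._crs = {}            # change_request_id -> ChangeRequest

    def submit(self, description, component):
        """Record a new CR under a fresh identifier and return it."""
        cr = ChangeRequest(next(self._next_id), description, component)
        self._crs[cr.change_request_id] = cr
        return cr

    def status(self, cr_id):
        """Report the current state of a CR."""
        return self._crs[cr_id].state

    def crs_per_component(self):
        """A simple metric: number of CRs recorded against each component."""
        return Counter(cr.component for cr in self._crs.values())


store = ChangeRequestStore()
first = store.submit("Login fails after password reset", "auth")
second = store.submit("Add CSV export to reports", "reporting")
third = store.submit("Session times out too early", "auth")
per_component = store.crs_per_component()
```

Because every stakeholder reads the same record through the same identifier, differences in interpretation have a single place to surface.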
CRs need to be represented in an unambiguous manner and made available in a centralized repository. Wide availability of CRs to all the stakeholders is likely to reveal differences in interpretations by different groups. Next, a formal model is described to represent CRs for analysis and review.

The life cycle of a CR is illustrated in Figure 3.27 by means of a state-transition diagram. Each state represents a distinct stage in the life cycle of a CR.

[FIGURE 3.27 State transition diagram of a CR. States: Submit, Review, Analysis, Commit, Implement, Verification, Closed, and Decline.]

The model shows the evolution of a CR via the following major states: Submit, Review, Analysis, Commit, Implement, Verification, and Closed. Specific actions are associated with each state, and the state of a CR is updated upon the completion of those actions. For several reasons, the status of a CR may be changed to the Decline state from Review, Analysis, Implement, or Verification. For instance, the marketing team may conclude that realization of a CR may not fetch more business. The motivation for describing CRs by means of state diagrams is to enable their easy tracking. For ease of implementation and management of CRs, a general schema, as shown in Table 3.9, can be used. Once such a schema is implemented with a back-end database and a front-end graphical user interface, CRs can be stored in a database. Later, for the purposes of tracking and reporting the status of the CRs, queries are generated.

Submit state. This is the initial state of a newly submitted CR. Usually, end users, customers, and marketing managers are the prime sources of CRs. When a new CR is filed, the following fields, described in Table 3.9, are instantiated: change_request_id, priority, description, maintenance_type, component, note, product, and customer. Based on the priority level of a CR, it is moved from Submit to Review.
Usually, a marketing manager assumes the responsibility of this initial handling of a CR and becomes the “owner” of the CR.

TABLE 3.9 Change Request Schema Field Summary

| Field Name | Description |
| --- | --- |
| change_request_id | A unique identifier of the CR |
| title | A concise summary of the CR |
| description | A short description of the CR |
| maintenance_type | Classification of the maintenance type in terms of a member of {Corrective, Adaptive, Perfective, Preventive} |
| product | Product name |
| component | Component where the change is needed, or where the problem occurred |
| state | Present state of the CR in terms of a member of {Submit, Review, Analysis, Commit, Implementation, Verification, Closed, Declined} |
| customer | Name of the customer making the CR |
| problem_origin | The origin of the problem |
| impacts | Components that are affected by a change and its ripple effect |
| resolution | Documentation of what was changed, how, and why |
| note | Additional information provided by the submitter for subsequent decision making |
| software_release | The version number of the product release in which the CR is likely to be effective |
| committed_release | The version number of the product release in which the CR will be effective |
| priority | Priority of CR, which is an element of a set, namely, {normal, high} |
| severity | Severity of CR, which is an element of a set, namely, {normal, critical} |
| marketing_justification | The business justification for the CR to exist |
| time_to_implement | The time, in person-weeks, required to effect the change |
| eng_assigned | The engineering personnel assigned to analyze the CR |
| functional_spec_title | Title of the specification for functional requirements |
| functional_spec_name | Name of the file describing the functional requirements |
| functional_spec_version | The most recent version number of the specification of functional requirements |
| decline_note | The reason for declining the CR |
| ec_number | The identifier of the EC document |
| attachment | Attachment to further describe the CR (if any) |
| tc_id | Identifiers of test cases used in effecting the CR |
| tc_results | The result of testing: {Untested, Passed, Failed, Blocked, Invalid} |
| verification_method | Record the methods of verification of the CR: analysis, testing, inspection, and/or demonstration |
| verification_status | The verification state of the CR: passed, failed, or incomplete |
| compliance | The level of compliance: {Non-compliance, Partial Compliance, Compliance} |
| testing_note | Reports from the test personnel, possibly describing the demonstration given to the customers, analysis performed on the change, or inspection of the code performed by test personnel |
| defect_id | Defect identifier. If “Failed” is assigned to the tc_results field, the defect identifier is associated with the failed test to indicate the defect causing the failure. The defect identifier is obtained from the test database. |

Review state. Generally, a marketing manager handles the CR in the Review state by coordinating the following activities:

• Check whether the newly generated CR is a duplicate of an existing CR. If it is found to be a duplicate of an existing CR, the request is moved to the Decline state with a short explanation and a link to the original CR. Should there be any ambiguity in the description of the CR, the submitter is asked to provide more details, which are recorded in the note and the description fields.
• Accept the assigned priority level of the CR or modify it.
• Re-evaluate the maintenance_type of the CR initially estimated by the submitter, and accept or modify it.
• Determine the level of severity of the CR: normal or critical. If it is critical, then the upper management may want to complete the review immediately. Note that a severity level and a priority level are independently assigned.
• Determine a software release in which the CR is to be reflected.
• Give a marketing rationale for the CR.
• Move the CR to the Analysis state for further actions.

In summary, the following fields are updated in the Review state: priority, severity, maintenance_type, decline_note, software_release, marketing_justification, description, and note.

Analysis state. In this stage, impact analysis is conducted to understand the CR and to estimate the time required to implement it. In addition, a high-level functional specification for the CR is prepared. If it is decided that it is not possible or desirable to implement the CR, then Decline becomes the next state of the CR. Otherwise, the CR is moved to the Commit state. In the Commit state, the program manager controls the CR by becoming its owner. While in the Analysis state, the owner, who is typically the director of software engineering, updates the following fields: component, problem_origin, impacts, time_to_implement, attachment, functional_spec_title, functional_spec_name, functional_spec_version, and eng_assigned.

Commit state. The CR stays in the Commit state until it is committed to a specific release of the product. In this state, the program manager is the owner of the CR. All the CRs that are desired to be in a specific software release are reviewed. Some CRs may be re-assigned to a later release after consultations with customers, the marketing division, and the director of software engineering. After committing a CR to a particular release, the CR is moved to the Implement state and all the functional specifications are frozen for development and test design purposes.
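The state-transition model of Figure 3.27 lends itself to a table-driven sketch. The following is a minimal illustration, not the authors' implementation: the names TRANSITIONS and move are hypothetical, and the table encodes only the moves stated in this section, namely the main path from Submit to Closed, the moves to Decline from Review, Analysis, Implement, and Verification, and the return from Decline to Submit.

```python
# Allowed moves between CR states, as described in the text for Figure 3.27.
TRANSITIONS = {
    "Submit": {"Review"},
    "Review": {"Analysis", "Decline"},
    "Analysis": {"Commit", "Decline"},
    "Commit": {"Implement"},
    "Implement": {"Verification", "Decline"},
    "Verification": {"Closed", "Decline"},
    "Decline": {"Submit"},  # marketing may resubmit after negotiation
    "Closed": set(),        # terminal state
}


def move(state, new_state):
    """Return new_state if the move is allowed, otherwise raise ValueError."""
    if new_state not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition: {state} -> {new_state}")
    return new_state


# Walk one CR along the main path from Submit to Closed.
state = "Submit"
for target in ["Review", "Analysis", "Commit", "Implement", "Verification", "Closed"]:
    state = move(state, target)
```

Keeping the allowed moves in one table makes it easy to reject an illegal update, such as moving a CR from Commit directly to Decline, before it reaches the CR database.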
In the Commit state: (i) the engineering team begins modifying the software component documentation, namely, data and control flow diagrams and schematics; (ii) test personnel review the CR and the associated functional specification to ensure that the CR is testable; (iii) test personnel write new test cases for the CR; and (iv) test personnel select regression tests. The committed_release field is the only field updated in the Commit state.

Implement state. The Implement stage is controlled by the director of software engineering. A number of different scenarios can occur in this stage as follows:

• The CR can be declined if its implementation is infeasible.
• If the CR is infeasible in its current form, the director of software engineering may assign an EC number and provide an explanation, and the EC document is linked with the CR definition. Table 3.10 shows how to organize an EC document.
• If the CR or its modified version is doable, the software engineering group writes code and performs unit tests. The CR is moved from Implement to Verification after the product is available for system-level testing.

TABLE 3.10 Engineering Change Document Information

| Field | Description |
| --- | --- |
| EC number | A unique number. |
| Requirement(s) affected | Identifiers of CRs and their titles. |
| Description of problem/issue | Brief description of the issue. |
| Description of change required | Description of changes needed to the original CR description. |
| Secondary technical impact | Description of the impact the EC will have on the system. |
| Customer impacts | Description of the impact the EC will have on the end customer. |
| Change recommended by | Name of the engineer(s). |
| Change approved by | Name of the approver(s). |

Source: From Reference 24. © 2008 John Wiley & Sons.

In the Implement state the following fields are updated: decline_note, ec_number, attachment, and resolution.

Verification state. In the Verification state, activities are largely controlled by the sustaining test manager.
To assign a test verdict, verification can be performed by one or more methods: demonstration, analysis, inspection, and testing. If verification is performed by testing, then the software is executed with a set of tests. Inspection means reviewing the code to detect defects. Analysis is performed by means of statistical and/or mathematical tools. Demonstration implies showing the system in live operation. A status of verification is provided in terms of the degree of compliance of the modified system to the CR: noncompliance, partial compliance, or full compliance. If the testing method is not used, then a note explains the details of the demonstration, the inspection, and/or the analysis performed. Shortfalls in the realization of the CR, in the form of an incomplete or only partly accurate implementation, are specified in an EC document. It is very difficult to correct any deviations or errors discovered at this stage. Therefore, a pragmatic approach to dealing with the deficiency is to produce an EC document, after negotiating with the customer, to revise the CR, and possibly generate a new CR for future consideration. As an extreme decision, the sustaining test manager may decline to accept the modifications made to the code by assigning an EC number and an explanation, followed by a state change to Decline. On the other hand, after ensuring that the implementation passed the required tests, the sustaining test manager moves the CR to the Closed state.

In the Verification state, the following fields are instantiated: decline_note, ec_number, attachment, verification_method, verification_status, compliance, tc_id, tc_results, defect_id, and testing_note.

Closed state. After successfully verifying that the CR has been incorporated into the software, the CR is moved from Verification to the Closed state. This is done by the owner of the CR in the Verification state who is, in general, the sustaining test manager.

Decline state.
The Decline state is controlled by the marketing department. A CR is in this state due to one or more of the following causes:

• Because of insufficient business impact of the CR, the marketing department decides to reject the CR.
• It is technically infeasible to implement the CR.
• The sustaining test manager concludes that changes made to the software to effect the CR could not be satisfactorily verified. An explanation is provided in the form of an EC number.

The CR may be moved to Submit by the marketing group after negotiating with the customer. Negotiations with the customer may lead to a reduction in the scope of the CR. The EC information is used as a basis for negotiation with the customer.

3.10 SUMMARY

This chapter began with three well-known reuse-oriented paradigms: quick fix, iterative enhancement, and full reuse. All three models assume that a set of documents completely and accurately describes the existing system. The first model makes the necessary changes to the code first, followed by changes to the relevant documentation. The second model modifies the top-level documents impacted by the modifications and then propagates those changes down to the code level. The third model builds a new system from components of the old system and other components available in the repository.

Next, we studied a simple staged model for CSS development, which comprises five major stages: Initial development, Evolution, Servicing, Phaseout, and Closedown. In this view of the software life cycle, maintenance is actually a series of distinct stages, each with different activities. The software evolution process is the backbone of the model. It continues in an iterative fashion and eventually produces the next version of the software. We discussed the applicability of the staged model to FLOSS systems.
The www.it-ebooks.info 126 EVOLUTION AND MAINTENANCE MODELS model benefits developers to characterize FLOSS systems in terms of stages, and identify the current stage of a system and the new stage to which it is more likely to move. Three major differences between CSS and FLOSS systems were identified and discussed. Next, we described an evolutionary model known as change mini-cycle, which consists of five major phases: CR, analyze and plan change, implement change, verify and validate, and document change. In this model, several interesting activities were identified, such as program comprehension, impact analysis, refactoring, and change propagation, which continue to be the subjects of intense research. With the three evolution models in place, we examined two standards: IEEE/EIA 1219 and ISO/IEC 14764. Both the standards describe the process for managing and executing software maintenance activities. Next, we discussed the management of system evolution by focusing on SCM. Next, a state-transition model was given to monitor individual CRs as those move through the software organization. Certain actions are completed in each state of the model. Finally, a sample schema to manage CRs was presented in detail. LITERATURE REVIEW In the classical Waterfall model for software development proposed by Royce in circa 1970 [40], the final phase is Operation and Maintenance. This model considers software maintenance as another task of software development. On the other hand, J. R. McKee, in his article “Maintenance as a function of design,” AFIP National Conference Proceeding, 53, 1984, pp. 187–193, suggests software maintenance to be 2nd, 3rd, … , nth round of development. In this regard it is worth quoting Norman Schneidewind from his article “The state of software maintenance,” IEEE Transac- tions on Software Engineering, 13(3), 1987, pp. 
303–309: “The traditional view of the software life cycle has done a disservice to maintenance by depicting it solely as a single step at the end of the cycle” (p. 304). Victor Basili [2] argues that software maintenance is continued development, using the same knowledge, methods, and tools used for software development. Based on this view he developed the reuse- oriented software development process. A more complicated idea of using SDLC is proposed by Barry Boehm (“Software engineering,” IEEE Transaction on Comput- ers, December 1976, pp. 1226–1241) based on his spiral model [41]. Ned Chapin, in his article “Software maintenance life cycle,” Proceedings of Conference on Software Maintenance, Phoenix, Arizona, IEEE Computer Society Press, Los Alamitos, CA. 1988, pp. 6-13, argued that the SDLC model is not compatible with a software main- tenance model. The use of SDLC will generate an inappropriate expectation set of metric requirements, such as effort needed, selection of tools, management support, and complexity of relevant task. Therefore, he prefers software maintenance to have its own SMLC model. A number of proposals based on the SMLC model have been published with some variations among them. Three common features of SMLC models found in literature www.it-ebooks.info LITERATURE REVIEW 127 are: (i) understanding the code; (ii) modifying the code; and (iii) revalidating the software. In the following we list up those advocating the SMLC model: G. Parikh. 1982. ‘The world of software maintenance. In: Techniques of Program and System Maintenance (Ed. G. Parikh), pp. 9–13. Little, Brown and Company, Boston, MA. J. Martin and C. L. McClure. 1983. Software Maintenance: The Problem and Its Solution. Prentice-Hall, Englewood Cliffs, NJ. S. Chen, K. G. Heisler, W. T. Tsai, X. Chen, and E. Leung. 1990. A model for assembly program maintenance. Journal of Software Maintenance: Research and Practice, March, 3–32. D. R. Harjani and J. P. Queille. 1992. 
A Process Model for the Maintenance of Large Space Systems Software. Proceedings of Conference on Software Maintenance, November 1992, Orlando, FL. IEEE Computer Society Press, Los Alamitos, CA. pp. 127–136.
W. K. Sharpley. 1977. Software Maintenance Planning for Embedded Computer Systems. Proceedings of the IEEE COMPSAC, November 1977, IEEE Computer Society Press, Los Alamitos, CA, pp. 520–526.
S. S. Yau, R. A. Nicholi, J. Tsai, and S. Liu. 1988. An integrated life-cycle model for software maintenance. IEEE Transactions on Software Engineering, August, 1128–1144.
S. S. Yau and J. S. Collofello. 1980. Some stability measures for software maintenance. IEEE Transactions on Software Engineering, November, 545–552.
L. J. Arthur. 1988. Software Evolution: The Software Maintenance Challenge. John Wiley & Sons, New York, NY.

The model proposed by Sharpley focused on corrective maintenance activities through problem verification, problem diagnosis, reprogramming, and baseline revalidation. On the other hand, Arthur’s model covers corrective, adaptive, and perfective maintenance activities and consists of several phases: (i) change management; (ii) impact analysis; (iii) system release planning; (iv) design change; (v) coding; (vi) testing; and (vii) system release. Neal Febbraro and V. Rajlich presented an agile model of incremental change and development process that consists of repeated incremental change (The Role of Incremental Change in Agile Software Processes. Proceedings Agile, August 2007, Washington, D.C. IEEE Computer Science Press, Los Alamitos. pp. 92–103). At Iona Technologies, XP has been used successfully for maintenance of a CORBA-based middleware product called Orbix, as reported in the article (“Using extreme programming in a maintenance environment,” IEEE Software, November/December, 2001, pp. 42–50) by C. Poole and J. W. Huisman.
According to Kent Beck, one of the proponents of XP, “Maintenance is really the normal state of an XP project” (Extreme Programming Explained. Addison-Wesley, Reading, MA, 1999).

By following a development process an organization develops new software products, whereas a maintenance process is seen as providing subsequent services. The line between services and products is not a crisp one. On the one hand, adaptive maintenance is viewed as a hybrid of services and product. On the other hand, corrective maintenance is a product-intensive service, and software operation can be considered a pure service. Basically, an intangible set of activities and/or benefits is bundled and sold as a service by an organization to its customers. Therefore, to deliver high-quality results from maintenance tasks, two quality dimensions must be considered: functional quality and technical quality. Organizations improve their maintenance services by improving: (i) the technical quality of their work and (ii) the functional quality of software maintenance. In this context it is valuable to read and practice the process proposed by Niessink and van Vliet in their article “Software maintenance from a service perspective,” published in the Journal of Software Maintenance: Research and Practice, March/April 2000, pp. 103–120. Their process can be summarized in the form of four items:

1. Customer expectations are translated into maintenance service agreements.
2. Maintenance activities are planned and implemented by using the service agreements as a basis.
3. Planning and procedures guide the maintenance activities.
4. The communication concerning the delivered services is managed.

In addition, it is highly recommended for information technology (IT) professionals to study the following two standards:

1. The IT Service Capability Maturity Model, Version 1.0RC1, January 28, 2005, http://www.itservicecmm.org
2.
The IT Infrastructure Library (ITIL): An Introduction, Central Computer and Telecommunication Agency (CCTA), HMSO books, Norwich, England, 1993.

The British government developed ITIL through CCTA and it was maintained by the Netherlands IT Examinations Institute (EXIN). ITIL aims at establishing best standards and practices for IT service delivery. To explain the “best practices” in the delivery of IT service, nine sets of infrastructure library booklets have been developed. The nine sets of booklets cover two broad topics: (i) provision of IT services and management of IT infrastructure and (ii) environment. The former is addressed in the first six sets of booklets, and the latter in the remaining three. Cabling, building, and service facilities are part of the environment. The IT service capability maturity model has many similarities with the Software CMM. IT services are delivered by installing, maintaining, managing, or operating the IT needs of a customer. In other words, software maintenance is one element of a whole gamut of deliverable IT services.

The software maintenance maturity model SMmm was designed by Alain April and Alain Abran (Software Maintenance Management: Evaluation and Continuous Improvement. Wiley-IEEE Computer Society Press, April 2008) as a customer-focused reference model for either (i) auditing the software maintenance capability of a software maintenance service supplier or outsourcer or (ii) improving internal software maintenance organizations. It includes 4 process domains, 18 key process areas (KPAs), 74 roadmaps, and 443 practices. In addition, the reader is encouraged to study the corrective maintenance maturity model article by Mira Kajko-Mattsson, “Motivating the Corrective Maintenance Maturity Model (CM3). Seventh IEEE International Conference on Engineering of Complex Computer Systems, 2001, Skovde, Sweden. IEEE Computer Society Press, Piscataway, NJ. pp.
112–117.” Those who are interested in a much more detailed treatment of SCM may peruse the article by Estublier, Leblang, Hoek, Conradi, Clemm, Tichy, and Wiborg-Weber (“Impact of software engineering research on the practice of software configuration management,” ACM Transactions on Software Engineering and Methodology, 14(4), October 2005, pp. 383–430). They discussed the evolution of SCM technology, with emphasis on the impact of industrial and university research.

REFERENCES

[1] L. J. Arthur. 1988. Software Evolution: The Software Maintenance Challenge. John Wiley & Sons, New York, NY.
[2] V. R. Basili. 1990. Viewing maintenance as reuse-oriented software development. IEEE Software, January, 19–25.
[3] C. Larman and V. R. Basili. 2003. Iterative and incremental development: a brief history. IEEE Computer, June, 47–55.
[4] T. Gilb. 1988. Principles of Software Engineering Management. Addison-Wesley, Reading, MA.
[5] G. Visaggio. 1999. Assessing the maintenance process through replicated controlled experiments. The Journal of Systems and Software, January, 187–197.
[6] V. T. Rajlich and K. H. Bennett. 2000. A staged model for the software life cycle. IEEE Computer, July, 2–8.
[7] E. Yourdon. 1995. When good enough software is best. IEEE Software, May, 79–81.
[8] N. Chapin, J. E. Hale, K. M. Khan, J. F. Ramil, and W. G. Tan. 2001. Types of software evolution and software maintenance. Journal of Software Maintenance and Evolution: Research and Practice, January/February, 3–30.
[9] A. Capiluppi, J. M. G. Barahona, I. Herraiz, and G. Robles. 2007. Adapting the Staged Model for Software Evolution to Free/Libre/Open Source Software. IWPSE, September 2007, Dubrovnik, Croatia. ACM, New York. pp. 79–82.
[10] S. S. Yau, J. S. Collofello, and T. MacGregor. 1978. Ripple Effect Analysis of Software Maintenance. COMPSAC, November 1978, Chicago, Illinois. IEEE Computer Society Press, Piscataway, NJ. pp. 60–65.
[11] K. H. Bennett and V. T. Rajlich.
2000. Software Maintenance and Evolution: A Roadmap. ICSE, The Future of Software Engineering, June 2000, Limerick, Ireland. ACM, New York. pp. 73–87.
[12] T. Mens. 2008. Introduction and roadmap: history and challenges of software evolution. In: Software Evolution (Eds T. Mens and S. Demeyer). Springer-Verlag, Berlin.
[13] R. G. Canning. 1972. That maintenance ‘iceberg’. EDP Analyzer, October, 1–14.
[14] A. V. Mayrhauser and A. M. Vans. 1995. Program comprehension during software maintenance and evolution. IEEE Computer, August, 44–55.
[15] V. T. Rajlich and N. Wilde. 2002. The Role of Concepts in Program Comprehension. IWPC, June 2002, Paris, France. IEEE Computer Society Press, Piscataway, NJ. pp. 271–278.
[16] T. J. Biggerstaff, B. G. Mitbander, and D. E. Webster. 1994. Program understanding and the concept assignment problem. Communications of the ACM, May, 72–82.
[17] G. Antoniol, G. Canfora, G. Casazza, and A. De Lucia. 2000. Identifying the Starting Impact Set of a Maintenance Request: A Case Study. CSMR, February 2000, Zurich, Switzerland. IEEE Computer Society Press, Los Alamitos, CA. pp. 227–230.
[18] S. A. Bohner and R. S. Arnold. 1996. Software Change Impact Analysis (Eds S. A. Bohner and R. S. Arnold). IEEE Computer Society Press, Los Alamitos, CA.
[19] W. P. Stevens, G. J. Myers, and L. Constantine. 1974. Structured design. IBM Systems Journal, 13(2), 115–139.
[20] D. P. Freedman and G. M. Weinberg. 1981. A checklist for potential side effects of a maintenance change. In: Techniques of Program and System Maintenance (Ed. G. Parikh), pp. 93–100. QED Information Sciences Inc., Wellesley, MA, USA.
[21] T. Mens. 2004. A survey of software refactoring. IEEE Transactions on Software Engineering, February, 126–139.
[22] V. T. Rajlich. 1997. A Model for Change Propagation Based on Graph Rewriting. ICSM, October 1997, Bari, Italy. IEEE Computer Society Press, Los Alamitos, CA. pp. 84–91.
[23] A. E.
Hassan and R. C. Holt. 2004. Predicting Change Propagation in Software System. ICSM, September 2004, Chicago, Illinois. IEEE Computer Society Press, Los Alamitos, CA. pp. 284–293.
[24] S. Naik and P. Tripathy. 2008. Software Testing and Quality Assurance: Theory and Practice. John Wiley & Sons, Hoboken, NJ. pp. 93–100.
[25] IEEE Standard 1219-1998. 1998. Standard for Software Maintenance. IEEE Computer Society Press, Los Alamitos, CA.
[26] SWEBOK. 2004. Guide to the Software Engineering Body of Knowledge. IEEE Computer Society Press, Los Alamitos, CA.
[27] ISO/IEC 14764:2006 and IEEE Std 14764-2006. 2006. Software Engineering - Software Life Cycle Processes - Maintenance. Geneva, Switzerland.
[28] T. M. Pigoski. 1996. Practical Software Maintenance. John Wiley & Sons, New York, NY.
[29] ISO/IEC 12207:2006 and IEEE Std 12207-2006. 2008. System and Software Engineering - Software Life Cycle Processes. Geneva, Switzerland.
[30] E. Bersoff, V. Henderson, and S. Siegel. 1980. Software Configuration Management: An Investment in Product Integrity. Prentice Hall, Englewood Cliffs, NJ.
[31] J. Estublier, D. Leblang, A. V. Hoek, R. Conradi, G. Clemm, W. Tichy, and D. Wiborg-Weber. 2005. Impact of software engineering research on the practice of software configuration management. ACM Transactions on Software Engineering and Methodology, October, 383–430.
[32] Software Technology Support Center. 2005. Configuration management fundamentals. CrossTalk: A Journal of Defense Software Engineering, July, 10–15.
[33] S. Feldman. 1979. Make: a program for maintaining computer programs. Software: Practice and Experience, 9, 255–265.
[34] M. Rochkind. 1975. The source code control system. IEEE Transactions on Software Engineering, December, 364–370.
[35] W. Tichy. 1985. Rcs: a system for version control. Software: Practice and Experience, 15, 637–654.
[36] P. Louridas. 2006. Version control. IEEE Software, January–February, 104–107.
[37] ClearCase. 2009.
IBM Rational, October. Available at http://www.ibm.com/software/awdtools/clearcase/.
[38] M. Ben-Menachem. 1994. Software Configuration Management Guidebook. McGraw-Hill, New York, NY.
[39] M. Glinz and R. J. Wieringa. 2007. Stakeholders in requirements engineering. IEEE Software, March–April, 18–20.
[40] W. W. Royce. 1970. Managing the Development of Large Software System: Concepts and Techniques. Proceedings of IEEE WESCON, August 1970. pp. 1–9; Republished in 1987 by ICSE, Monterey, CA. 1987, pp. 328–338.
[41] B. W. Boehm. 1988. A spiral model of software development and maintenance. IEEE Computer, May, 61–72.

EXERCISES

1. Define the terms process, life-cycle, and model.
2. Discuss the different ways in which the following characteristics change from one stage to another in the CSS life cycle model.
   (a) Staff expertise
   (b) Software architecture
   (c) Software decay
   (d) Economics
3. Discuss the similarities and differences between the staged models of CSS systems and FLOSS systems.
4. What is the difference between change propagation and change impact analysis?
5. Discuss the role of concept in program comprehension.
6. What is ripple effect? In what way is it different from side effect?
7. Explain why it is important to conduct program comprehension before impact analysis.
8. What are the advantages of the quick fix model and why is it still used?
9. Explain the differences between the incremental and the iterative development models.
10. What are the drawbacks of the iterative enhancement model?
11. Discuss the differences between the iterative enhancement and the full reuse models.
12. Discuss the major differences between the IEEE 1219 and the ISO/IEC 14764 standards for software maintenance procedure.
13. How would each of the following groups use the information contained in a CR?
   (a) Maintainers
   (b) Management
   (c) Quality assurance auditors
   (d) Sustaining test engineers
   (e) Customers
14.
What are some of the factors that you would think about when reviewing a CR?
15. Why should a maintainer categorize CRs into different groups? What are some of the factors that should be considered when categorizing those CRs?

4 REENGINEERING

Neither situation nor people can be altered by the interference of an outsider. If they are to be altered, that alteration must come from within.
- Phyllis Bottome

4.1 GENERAL IDEA

Reengineering is the examination, analysis, and restructuring of an existing software system to reconstitute it in a new form and the subsequent implementation of the new form. The goal of reengineering is to:

• understand the existing software system artifacts, namely, specification, design, implementation, and documentation; and
• improve the functionality and quality attributes of the system. Some examples of quality attributes are evolvability, performance, and reusability.

Fundamentally, a new system is generated from an operational system, such that the target system possesses better quality factors. The desired software quality factors include reliability, correctness, integrity, efficiency, maintainability, usability, flexibility, testability, interoperability, reusability, and portability [1]. In other words, reengineering is done to convert an existing “bad” system into a “good” system [2]. Of course there are risks involved in this transformation, and the primary risks are: (i) the target system may not have the same functionality as the existing system; (ii) the target system may be of lower quality than the original system; and (iii) the benefits are not realized in the required time frame [3].
Software systems are reengineered by keeping one or more of the following four general objectives in mind [4]:

• Improving maintainability
• Migrating to a new technology
• Improving quality
• Preparing for functional enhancement

Improving maintainability. Let us revisit Lehman’s second law, namely, increasing complexity: “As an E-type program evolves, its complexity increases unless work is done to maintain or reduce it.” As systems grow and evolve, the cost of maintenance increases because changes become difficult and time consuming. Consequently, it is unrealistic to avoid reengineering in the long run. The system is redesigned with more explicit interfaces and more relevant functional modules. In addition, both external and internal documentation is brought up to date. All those activities lead to better maintainability of the system.

Migrating to a new technology. Lehman’s first law, namely, continuing change, states that “E-type programs must be continually adapted, else they become progressively less satisfactory.” A program is continually adapted to make it compatible with its operating environment. In the fast-paced information technology industry, new (and sometimes cheaper) execution platforms include new features, which quickly make the current system outdated, and maybe more expensive. Compatibility of the newer system with the old one is likely to be an issue, because vendors have less motivation to provide support for older parts, both software and hardware, that become incompatible and more expensive. Moreover, as systems evolve, expertise of employees migrates to newer technologies, with fewer staff to maintain the old system. Consequently, organizations with perfectly working software that continues to meet their business needs are forced to migrate to a modern execution platform that includes newer hardware, operating system, and/or language.

Improving quality.
Lehman’s seventh law, namely, declining quality, states that “Stakeholders will perceive an E-type program to have declining quality unless it is rigorously maintained and adapted to its environment.” As time passes, users make more and more change requests to modify the system. Each change causes “ripple effects,” implying that one change causes more problems to be fixed. As the system is continually modified as a result of maintenance activities, the reliability of the software gradually decreases to an unacceptable level. Therefore, the system must be reengineered to achieve greater reliability.

Preparing for functional enhancement. Lehman’s sixth law, namely, continuing growth, states that “The functionality of an E-type program must be continually increased to maintain user satisfaction with the program over its lifetime.” This law reflects the fact that all programs, being finite, limit the functionalities to a finite selection from a potentially unbounded set. Properties excluded by the bounds eventually become a source of performance limitations, dissatisfaction, and error. These properties, in terms of functionalities, must be implemented in the application to satisfy the stakeholders. In general, reengineering is not performed to add functionalities to a system; rather, a system is reengineered as a preparatory step before its functionalities are enhanced. For example, if the objective is to transform programs from a procedural to an object-oriented form in order to distribute them in a client-server architecture, then one can, at the same time, plan to reduce the maintenance costs by using a language that makes the system more amenable to change.

4.2 REENGINEERING CONCEPTS

A good comprehension of the software development processes is useful in making a plan for reengineering. Several concepts applied during software development are key to reengineering of software.
For example, abstraction and refinement are key concepts used in software development, and both concepts are equally useful in reengineering. It may be recalled that abstraction enables software maintenance personnel to reduce the complexity of understanding a system by: (i) focusing on the more significant information about the system and (ii) hiding the details that are irrelevant at the moment. Refinement, on the other hand, is the reverse of abstraction. The principles of abstraction and refinement are explained as follows [5]:

Principle of abstraction. The level of abstraction of the representation of a system can be gradually increased by successively replacing the details with abstract information. By means of abstraction one can produce a view that focuses on selected system characteristics by hiding information about other characteristics.

Principle of refinement. The level of abstraction of the representation of the system is gradually decreased by successively replacing some aspects of the system with more details.

New software is created by going downward from the top, highest level of abstraction to the bottom, lowest level. This downward movement is known as forward engineering. Forward engineering follows a sequence of activities: formulating concepts about the system, identifying requirements, designing the system, and implementing the design. On the other hand, the upward movement through the layers of abstraction is called reverse engineering. Reverse engineering of software systems is a process comprising the following steps: (i) analyze the software to determine its components and the relationships among the components and (ii) represent the system at a higher level of abstraction or in another form [6].
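The two reverse engineering steps can be illustrated with a minimal Python sketch. It is only one possible reading of the steps, under stated assumptions: the "components" are taken to be top-level functions, the "relationships" are direct calls between them, and the higher-level representation is a simple call graph. The source string `LEGACY_SOURCE` and the function names in it are hypothetical and do not come from the text.

```python
import ast

# Hypothetical legacy source, used only for illustration.
LEGACY_SOURCE = '''
def read_order(path):
    return parse(open_file(path))

def open_file(path):
    return path

def parse(text):
    return text.split(",")
'''

def extract_calls(source):
    """Step (i): find the components (top-level functions) and, for each,
    the names it calls directly."""
    tree = ast.parse(source)
    calls = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            callees = set()
            for inner in ast.walk(node):
                # Only plain-name calls such as f(x); method calls are ignored.
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    callees.add(inner.func.id)
            calls[node.name] = callees
    return calls

def to_call_graph(calls):
    """Step (ii): represent the system more abstractly, keeping only
    relationships between components defined in the analyzed source."""
    defined = set(calls)
    return {name: sorted(callees & defined) for name, callees in calls.items()}

call_graph = to_call_graph(extract_calls(LEGACY_SOURCE))
for component, used in call_graph.items():
    print(component, "->", used)
```

The call graph hides statement-level detail and keeps only which component depends on which, which is the kind of information loss the principle of abstraction describes.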
Some examples of reverse engineering are: (i) decompilation, in which object code is translated into a high-level program; (ii) architecture extraction, in which the design of a program is derived; (iii) document generation, in which information is produced from, say, source code, for better understanding of the program; and (iv) software visualization, in which some aspect of a program is depicted in an abstract way.

[Figure labels: Conceptual, Requirements, Design, Implementation; arrows: Abstraction, Refinement]
FIGURE 4.1 Levels of abstraction and refinement. From Reference 5. © 1992 IEEE

The concepts of abstraction and refinement are used to create models of software development as sequences of phases, where the phases map to specific levels of abstraction or refinement, as shown in Figure 4.1. The four levels, namely, conceptual, requirements, design, and implementation, are described one by one below:

• Conceptual level. At the highest level of abstraction, the software is described in terms of very high-level concepts and its reason for existence (why?). In other words, this level addresses the “why” aspects of the system, by answering the question: “Why does the system exist?”
• Requirements level. At this level, functional characteristics (what?) of the system are described at a high level, while leaving the details out. In other words, this level addresses the “what” aspects of the system by answering the question: “What does the system do?”
• Design level. At the design-refinement level, system characteristics (what and how?), namely, major components, the architectural style putting the components together, the interfaces among the components, algorithms, major internal data structures, and databases are described in detail.
In other words, this level addresses more of the “what” and “how” aspects of the system by answering the questions: (i) “What are the characteristics of the system?” and (ii) “How is the system going to possess the characteristics to deliver the functionalities?”
• Implementation level. This is the lowest level of abstraction in the hierarchy. At this level, the system is described at a very low level in terms of implementation details in a high-level language. In other words, this level addresses “how” exactly the system is implemented.

In summary, the refinement process can be represented as why? → what? → what and how? → how? and the abstraction process can be represented as how? → what and how? → what? → why?

A concept, a requirement, a design, and an implementation of a program usually denote different levels of abstraction. Moving from one level to another involves a process of crossing levels of abstraction. Usually a specification is more abstract than its implementation; therefore, the cycle of abstraction and refinement can be represented as follows: concrete → more abstract → abstract → highly abstract → abstract → less abstract → concrete. Abstraction and refinement are important concepts, and these are useful in reengineering as well as in forward engineering.

[Figure labels: Existing system, Target system; arrows: Abstraction, Alteration, Refinement]
FIGURE 4.2 Conceptual basis for the reengineering process. From Reference 5. © 1992 IEEE

In addition to the two principles of abstraction and refinement, an optional principle called alteration underlies many reengineering methods. The principle of alteration is defined as follows:

Principle of alteration. The making of some changes to a system representation is known as alteration. Alteration does not involve any change to the degree of abstraction; it may involve the modification, deletion, and addition of information.
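The relationship among the four levels and the three principles can be sketched in a few lines of Python. This is an illustrative model only, not part of the source: the names `Level`, `QUESTION`, `abstract`, `refine`, `alter`, and `walk` are assumptions introduced here, and the numeric ordering of the levels is simply a convenient encoding of "more abstract" versus "less abstract."

```python
from enum import IntEnum

class Level(IntEnum):
    """The four levels; a larger value means a higher level of abstraction."""
    IMPLEMENTATION = 0
    DESIGN = 1
    REQUIREMENTS = 2
    CONCEPTUAL = 3

# The question each level answers, as described in the text.
QUESTION = {
    Level.CONCEPTUAL: "why?",
    Level.REQUIREMENTS: "what?",
    Level.DESIGN: "what and how?",
    Level.IMPLEMENTATION: "how?",
}

def abstract(level):
    """Abstraction: move one level up (stays put at the top)."""
    return Level(min(level + 1, Level.CONCEPTUAL))

def refine(level):
    """Refinement: move one level down (stays put at the bottom)."""
    return Level(max(level - 1, Level.IMPLEMENTATION))

def alter(level):
    """Alteration: the representation changes, its level does not."""
    return level

def walk(start, step, count):
    """Apply `step` `count` times, recording the question at each level."""
    path = [QUESTION[start]]
    level = start
    for _ in range(count):
        level = step(level)
        path.append(QUESTION[level])
    return path

refinement_path = walk(Level.CONCEPTUAL, refine, 3)
abstraction_path = walk(Level.IMPLEMENTATION, abstract, 3)
print(" -> ".join(refinement_path))
print(" -> ".join(abstraction_path))
```

Modeling alteration as the identity on levels makes the point of the definition explicit: whatever is changed in the representation, the move is horizontal.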
Figure 4.2 shows the use of the three fundamental principles to explain reengineering characteristics. An important conceptual basis for the reengineering process is the sequence {abstraction, alteration, refinement}. Reengineering principles are represented by means of arrows. Specifically, abstraction is represented by an up arrow, alteration by a horizontal arrow, and refinement by a down arrow. The arrows depicting refinement and abstraction are slanted, thereby indicating the increase and decrease, respectively, of system information. It may be noted that alteration is nonessential for reengineering. Generally, the path from abstraction to refinement is via alteration. Alteration is guided by reengineering strategies as discussed in Section 4.3.2.

Another term closely related to “alteration” is restructuring, which was introduced in Chapter 3 and is discussed in Chapter 7. In the reengineering context, the term “restructuring” is defined as the transformation from one representation form to another at the same relative abstraction level while preserving the subject system’s external behavior [6]. Restructuring is often used as a form of preventive maintenance.

4.3 A GENERAL MODEL FOR SOFTWARE REENGINEERING

The reengineering process accepts as input the existing code of a system and produces the code of the renovated system. On the one hand, the reengineering process may be as straightforward as using a tool to translate the source code from the given language to source code in another language. For example, a program written in BASIC can be translated into a new program in C.
On the other hand, the reengineering process may be very complex, as explained below:

• recreate a design from the existing source code;
• find the requirements of the system being reengineered;
• compare the existing requirements with the new ones;
• remove those requirements that are not needed in the renovated system;
• make a new design of the desired system; and
• code the new system.

Founded on the different levels of abstraction used in the development of software, Figure 4.3, originally proposed by Eric J. Byrne [5], depicts the processes for all abstraction levels of reengineering. This model suggests that reengineering is a sequence of three activities (reverse engineering, redesign, and forward engineering) strongly founded in three principles, namely, abstraction, alteration, and refinement, respectively.

A visual metaphor called horseshoe, as depicted in Figure 4.4, was developed by Kazman et al. [7] to describe a three-step architectural reengineering process. Three distinct segments of the horseshoe are the left side, the top part, and the right side. Those three parts denote the three steps of the reengineering process. The first step, represented on the left side, aims at extracting the architecture from the source code by using the abstraction principle. The second step, represented on the top, involves architecture transformation toward the target architecture by using the alteration principle. Finally, the third step, represented on the right side, involves the generation of the new architecture by means of refinement. One can look at the horseshoe bottom-up to notice how reengineering progresses at different levels of abstraction: source code, functional model, and architectural design.
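The three-activity view of reengineering can be made concrete with a small Python sketch in which each activity is a function and reengineering is their composition. Everything in it is a toy assumption made for illustration: an "implementation" is reduced to a mapping from module names to function names, a "design" to a flat list of responsibilities, and the function names and the rule that drops `print_fax_cover` are hypothetical.

```python
from typing import Dict, List

# Toy representations (assumptions, not from the text):
# an implementation maps module names to the functions they contain;
# a design is just the list of responsibilities, without file layout.
Implementation = Dict[str, List[str]]
Design = List[str]

def reverse_engineer(impl: Implementation) -> Design:
    """Abstraction: discard the module layout, keep the responsibilities."""
    return sorted({fn for fns in impl.values() for fn in fns})

def redesign(design: Design) -> Design:
    """Alteration: change the design without changing its level of abstraction.
    Here a hypothetical obsolete responsibility is removed."""
    return [fn for fn in design if fn != "print_fax_cover"]

def forward_engineer(design: Design) -> Implementation:
    """Refinement: add detail by assigning each responsibility to a module.
    Grouping by the verb prefix is an arbitrary illustrative choice."""
    impl: Implementation = {}
    for fn in design:
        module = fn.split("_")[0]
        impl.setdefault(module, []).append(fn)
    return impl

def reengineer(impl: Implementation) -> Implementation:
    """Reverse engineering, then redesign, then forward engineering."""
    return forward_engineer(redesign(reverse_engineer(impl)))

legacy = {"misc": ["read_order", "print_invoice", "print_fax_cover", "read_customer"]}
target = reengineer(legacy)
print(target)
```

In this sketch the left side of the horseshoe corresponds to `reverse_engineer`, the top to `redesign`, and the right side to `forward_engineer`; a real process would of course operate on far richer representations at each level.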
[Figure labels: Existing system and Target system, each with the levels Conceptual, Requirements, Design, Implementation; Reverse engineering (abstraction); Forward engineering (refinement); Alteration: Rethink, Respecify, Redesign, Recode]
FIGURE 4.3 General model of software reengineering. From Reference 5. © 1992 IEEE

[Figure labels: Legacy source, New system source; Source text representation, Code structure representation, Functional level representation, Architectural representation, Concepts; Base architecture, Desired architecture; Architecture recovery/conformance, Architecture transformation, Architecture based development; Design pattern and styles, Program plans, Code styles]
FIGURE 4.4 Horseshoe model of reengineering. From Reference 7. © 1998 IEEE

Now, we are in a position to revisit three definitions of reengineering.

• The definition by Chikofsky and Cross II [6]: Software reengineering is the analysis and alteration of an operational system to represent it in a new form and to obtain a new implementation from the new form. Here, a new form means a representation at a higher level of abstraction.
• The definition by Byrne [5]: Reengineering of a software system is a process for creating new software from existing software so that the new system is better than the original system in some ways.
• The definition by Arnold [3]: Reengineering of a software system is an activity that: (i) improves the comprehension of the software system or (ii) raises the quality levels of the software, namely, performance, reusability, and maintainability.
In summary, it is evident that reengineering entails: (i) the creation of a more abstract view of the system by means of some reverse engineering activities; (ii) the restructuring of the abstract view; and (iii) implementation of the system in a new form by means of forward engineering activities. This process is formally captured by Jacobson and Lindström [8] with the following expression:

Reengineering = Reverse engineering + Δ + Forward engineering.

Referring to the right-hand side of the above equation, the first element, namely, “reverse engineering,” is an activity to create an easier to understand and more abstract form of the system. The third element, namely, “forward engineering,” is the traditional process of moving from a high-level abstraction and logical, implementation-independent design to the physical implementation of the system.

The second element “Δ” captures alterations made to the original system. Two major dimensions of alteration are change in functionality and change in implementation technique. A change in functionality comes from a change in the business rules [9]. Thus, a modification of the business rules results in modifications of the system. Moreover, a change of functionality does not affect how the system is implemented, that is, how forward engineering is carried out. Next, concerning a change of implementation technique, an end-user of a system never knows whether the system is implemented in an object-oriented language or a procedural language. Often, reengineering is associated with the introduction of a new development technology, for example, model-driven engineering [10]. A variant of reengineering in which the transformation is driven by a major technology change is called migration. Migration of legacy information systems (LIS) is discussed in Chapter 5. Another common term used by practitioners of reengineering is rehosting.
Rehosting means reengineering of source code without addition or reduction of features in the transformed targeted source code [11]. Rehosting is most effective when the user is satisfied with the system’s functionality, but looks for better qualities of the system. Examples of better qualities are improved efficiency of execution and reduced maintenance costs.

To modify a system’s characteristics, alteration is performed at an abstraction level with much detail about those characteristics. For example, if there is a need to translate the source code of a system to a new programming language, there is no need to perform reverse engineering. Rather, alteration, that is, recoding, is done at the source code level. However, at a higher level of abstraction, say, architecture design, reverse engineering is involved and the amount of alteration to be done is increased. Similarly, to respecify requirements by identifying the functional characteristics of the system, reverse engineering techniques are applied to the source code.

4.3.1 Types of Changes

The model in Figure 4.3 suggests that an existing system can be reengineered by following one of four paths. The selection of a specific path for reengineering depends partly on the characteristics of the system to be modified. For a given characteristic to be altered, the abstraction level of the information about that characteristic plays a key role in path selection. Based on the type of changes required, system characteristics are divided into four groups: rethink, respecify, redesign, and recode. It is worth noting that, on the one hand, modifications performed to characteristics within one group do not cause any changes at a higher level of abstraction. On the other hand, modifications within a particular group do result in modifications to lower levels of abstraction. For instance, one requirement is reflected in many design components, and one design component is realized by a block of source code.
Hence, a small change in a design component may require several modifications to the code. However, the change to the design component should not influence the requirements. Next, the characteristics of each group are discussed in what follows.

Recode. Implementation characteristics of the source program are changed by recoding it. Source code level changes are performed by means of rephrasing and program translation. In the latter approach, a program is transformed into a program in a different language. On the other hand, rephrasing keeps the program in the same language [12]. Examples of translation scenarios are compilation, decompilation, and migration. By means of compilation, one transforms a program written in a high-level language into assembly or machine code. Decompilation is a form of transformation in which high-level source code is discovered from an executable program. In migration, a program is transformed into a program in another language while retaining the program’s abstraction level. The language of the new program need not be completely different from the original program’s language; rather, it can be a variation of the first language. Examples of rephrasing scenarios are normalization, optimization, refactoring, and renovation. Normalization reduces a program to a program in a sublanguage, that is, to a subset of the language, with the purpose of decreasing its syntactic complexity. Elimination of GOTO and module flattening in a program are examples of program normalization. Optimization is a transformation that improves the execution time or space performance of a program. Refactoring is a transformation that improves the design of a program by restructuring it, so that the new program is easier to understand.

Redesign. The design characteristics of the software are altered by redesigning the system.
Common changes to the software design include: (i) restructuring the architecture; (ii) modifying the data model of the system; and (iii) replacing a procedure or an algorithm with a more efficient one.

Respecify. This means changing the requirement characteristics of the system in two ways: (i) change the form of the requirements and (ii) change the scope of the requirements. The former refers to changing only the form of existing requirements, that is, taking the informal requirements expressed in a natural language and generating a formal specification in a formal description language, such as the Specification and Description Language (SDL) or Unified Modeling Language (UML). The latter type of changes includes such changes as adding new requirements, deleting some existing requirements, and altering some existing requirements.

Rethink. Rethinking a system means manipulating the concepts embodied in an existing system to create a system that operates in a different problem domain. It involves changing the conceptual characteristics of the system, and it can lead to the system being changed in a fundamental way. Moving from the development of an ordinary cellular phone to the development of a smartphone system is an example of Rethink.

4.3.2 Software Reengineering Strategies

Three strategies that specify the basic steps of reengineering are rewrite, rework, and replace. The three strategies are founded on three fundamental principles in software engineering, namely, abstraction, alteration, and refinement. The rewrite strategy is based on the principle of alteration. The rework strategy is based on the principles of abstraction, alteration, and refinement. Finally, the replace strategy is based on the principles of abstraction and refinement.

FIGURE 4.5 Conceptual basis for reengineering strategies: (a) Rewrite (alteration); (b) Rework (abstraction, alteration, refinement); (c) Replace (abstraction, refinement). From Reference 5.
© 1992 IEEE

Rewrite strategy. This strategy reflects the principle of alteration. By means of alteration, an operational system is transformed into a new system while preserving the abstraction level of the original system. For example, the Fortran code of a system can be rewritten in the C language. The rewrite strategy is illustrated in Figure 4.5a.

Rework strategy. The rework strategy applies all three principles. First, by means of the principle of abstraction, obtain a system representation with less detail than what is available at a given level. For example, one can create an abstraction of source code in the form of a high-level design. Next, the reconstructed system model is transformed into the target system representation, by means of alteration, without changing the abstraction level. Finally, by means of refinement, an appropriate new system representation is created at a lower level of abstraction. The main ideas in rework are illustrated in Figure 4.5b. Now, let us consider an example originally given by Byrne [5]. Let the goal of a reengineering project be to restructure the control flow of a program. Specifically, we want to replace the unstructured control flow constructs, namely GOTOs, with more commonly used structured constructs, say, a “for” loop. A classical, rework strategy-based approach to doing that is as follows:
• Application of abstraction: By parsing the code, generate a control flow graph (CFG) for the given system.
• Application of alteration: Apply a restructuring algorithm to the CFG to produce a structured CFG.
• Application of refinement: Translate the new, structured CFG back into the original programming language.

Replace strategy. The replace strategy applies two principles, namely, abstraction and refinement.
To change a certain characteristic of a system: (i) the system is reconstructed at a higher level of abstraction by hiding the details of the characteristic and (ii) a suitable representation for the target system is generated at a lower level of abstraction by applying refinement. Figure 4.5c succinctly represents the replace strategy. Let us reconsider the GOTO example given by Byrne [5]. By means of abstraction, a program is represented at a higher level without using control flow concepts. For instance, a module’s behavior can be described by its net effect, with no mention of control flow. Next, by means of refinement, the system is represented at a lower level of abstraction with a new structured control flow. In summary, the original unstructured control flow is replaced with a structured control flow.

4.3.3 Reengineering Variations

Three reengineering strategies and four broad types of changes were discussed in the preceding sections: (i) rewrite, rework, and replace are the three reengineering strategies and (ii) rethink, respecify, redesign, and recode are the four types of changes. The reengineering strategies and the change types can be combined to create different process variations. Three process factors cause variability in reengineering processes:
• the level of abstraction of the system representation under consideration;
• the kind of change to be made: rethink, respecify, redesign, and recode; and
• the reengineering strategy to be applied: rewrite, rework, and replace.

Possible variations in reengineering processes have been identified in Table 4.1. The table is interpreted by asking questions of the following type: If A is the abstraction level of the representation of the system to be reengineered and the plan is to make a B type of change, can I use strategy C?
The table shows 30 reengineering process variations. Out of the 30 variations, 24 variations are likely to produce acceptable solutions.

TABLE 4.1 Reengineering Process Variations

Starting                                   Reengineering Strategy
Abstraction Level     Type of Change    Rewrite    Rework    Replace
Implementation level  Recode            Yes        Yes       Yes
                      Redesign          Bad        Yes       Yes
                      Respecify         Bad        Yes       Yes
                      Rethink           Bad        Yes*      Yes*
Design level          Recode            No         No        No
                      Redesign          Yes        Yes       Yes
                      Respecify         Bad        Yes       Yes
                      Rethink           Bad        Yes*      Yes*
Requirement level     Recode            No         No        No
                      Redesign          No         No        No
                      Respecify         Yes        Yes       Yes
                      Rethink           Bad        Yes*      Yes*
Conceptual level      Recode            No         No        No
                      Redesign          No         No        No
                      Respecify         No         No        No
                      Rethink           Yes        Yes*      Yes*

Source: From Reference 5. © 1992 IEEE.
Yes: One can produce a target system.
Yes*: Same as Yes, but the starting degree of abstraction is lower than the uppermost degree of abstraction within the conceptual abstraction level.
No: One cannot start at abstraction level A, make B type of changes by using strategy C, because the starting abstraction level is higher than the abstraction level required by the particular type of change.
Bad: A target system can be created, but the likelihood of achieving a good result is low.

4.4 REENGINEERING PROCESS

An ordered set of activities designed to perform a specific task is called a process. For ease of understanding and communication, processes are described by means of process models. For example, in the software development domain, the Waterfall process model is widely used in developing well-understood software systems. Process models are used to comprehend, evaluate, reason about, and improve processes. Intuitively, process models are described by means of important relationships among data objects, human roles, activities, and tools. In this section, we discuss two process models for software reengineering.
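
As a brief check on the counts reported for Table 4.1 in Section 4.3.3, the table can be transcribed into a small lookup structure and the entries tallied. The following Python sketch is only an illustration: the names TABLE_4_1, STRATEGIES, and outcome are hypothetical and do not come from Reference 5, and the cell values are copied from the table as printed.

```python
# Hypothetical transcription of Table 4.1: (starting level, type of change)
# maps to the outcomes for the (rewrite, rework, replace) strategies.
STRATEGIES = ("rewrite", "rework", "replace")

TABLE_4_1 = {
    ("implementation", "recode"):    ("Yes", "Yes",  "Yes"),
    ("implementation", "redesign"):  ("Bad", "Yes",  "Yes"),
    ("implementation", "respecify"): ("Bad", "Yes",  "Yes"),
    ("implementation", "rethink"):   ("Bad", "Yes*", "Yes*"),
    ("design", "recode"):            ("No",  "No",   "No"),
    ("design", "redesign"):          ("Yes", "Yes",  "Yes"),
    ("design", "respecify"):         ("Bad", "Yes",  "Yes"),
    ("design", "rethink"):           ("Bad", "Yes*", "Yes*"),
    ("requirement", "recode"):       ("No",  "No",   "No"),
    ("requirement", "redesign"):     ("No",  "No",   "No"),
    ("requirement", "respecify"):    ("Yes", "Yes",  "Yes"),
    ("requirement", "rethink"):      ("Bad", "Yes*", "Yes*"),
    ("conceptual", "recode"):        ("No",  "No",   "No"),
    ("conceptual", "redesign"):      ("No",  "No",   "No"),
    ("conceptual", "respecify"):     ("No",  "No",   "No"),
    ("conceptual", "rethink"):       ("Yes", "Yes*", "Yes*"),
}


def outcome(level: str, change: str, strategy: str) -> str:
    """Answer: starting at `level`, to make a `change`, can I use `strategy`?"""
    return TABLE_4_1[(level, change)][STRATEGIES.index(strategy)]


# Every entry other than "No" is a possible process variation;
# every variation other than "Bad" is likely to give an acceptable result.
variations = [cell for row in TABLE_4_1.values() for cell in row if cell != "No"]
acceptable = [cell for cell in variations if cell != "Bad"]

print(len(variations), len(acceptable))            # prints: 30 24
print(outcome("design", "respecify", "rewrite"))   # prints: Bad
```

Counting the entries this way reproduces the 30 variations and the 24 acceptable ones stated for the table.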
By understanding and following a process model for software reengineering, one can similarly achieve improvements in how software is reengineered. The process of reengineering a large software system is a complex endeavor. For ease of performing reengineering, the process can be specialized in many ways by developing several variations. In a reengineering process, the concept of approach impacts the overall process structure. If a particular process model requires fine-tuning for certain project goals, those approaches need to be clearly understood. Five major approaches will be explained in the following subsections.

4.4.1 Reengineering Approaches

There are five basic approaches to reengineering software systems. Each approach advocates a different path to perform reengineering [13, 14]. Several considerations are made while selecting a particular reengineering approach:
• objectives of the project;
• availability of resources;
• the present state of the system being reengineered; and
• the risks in the reengineering project.

The five approaches are different in two aspects: (i) the extent of reengineering performed and (ii) the rate of substitution of the operational system with the new one. The five approaches have their own risks and benefits. In the following, we introduce the five basic approaches to software reengineering one by one.

Big Bang approach. The “Big Bang” approach replaces the whole system at once. Once a reengineering effort is initiated, it is continued until all the objectives of the project are achieved and the target system is constructed. This approach is generally used if reengineering cannot be done in parts. For example, if there is a need to move to a different system architecture, then all components affected by such a move must be changed at once. The consequent advantage is that the system is brought into its new environment all at once.
On the other hand, the disadvantage of Big Bang is that the reengineering project becomes a monolithic task, which may not be desirable in all situations. In addition, for large systems the Big Bang approach consumes too many resources at once and takes a long stretch of time before the new system is visible.

Incremental approach. As the name indicates, by means of this approach a system is reengineered gradually, one step closer to the target system at a time. Thus, for a large system, several new interim versions are produced and released. Successive interim versions satisfy increasingly more project goals than their preceding versions. The desired system is said to be generated after all the project goals are achieved. The advantages of this approach are as follows: (i) locating errors becomes easier, because one can clearly identify the newly added components and (ii) it becomes easy for the customer to notice progress, because interim versions are released. The incremental approach incurs a lower risk than the “Big Bang” approach due to the fact that as a component is reengineered, the risks associated with the corresponding code can be identified and monitored. The disadvantages of the incremental approach are as follows: (i) with multiple interim versions and their careful version controls, the incremental approach takes much longer to complete; and (ii) even if there is a need, the entire architecture of the system cannot be changed.

Partial approach. In this approach, only a part of the system is reengineered and then it is integrated with the non-reengineered portion of the system. One must decide whether to use a “Big Bang” approach or an “Incremental” approach for the portion to be reengineered.
The following three steps are followed in the partial approach:
• In the first step, the existing system is partitioned into two parts: one part is identified to be reengineered and the remaining part to be not reengineered.
• In the second step, reengineering work is performed using either the “Big Bang” or the “Incremental” approach.
• In the third step, the two parts, namely, the not-to-be-reengineered part and the reengineered part of the system, are integrated to make up the new system.

The partial approach described above has the advantage of reducing the scope of reengineering to a level that best matches an organization’s current need and desire to spend a certain amount of resources. A reduced scope implies that the selected portions of a system to be modified are those that are urgently in need of reengineering. A reduced scope of reengineering takes less time and costs less. A disadvantage of the partial approach is that modifications are not performed to the interface between the portion modified and the portion not modified.

Iterative approach. The reengineering process is applied on the source code of a few procedures at a time, with each reengineering operation lasting for a short time. This process is repeatedly executed on different components in different stages. During the execution of the process, ensure that the four types of components can coexist: old components not reengineered, components currently being reengineered, components already reengineered, and new components added to the system. Their coexistence is necessary for the operational continuity of the system. There are two advantages of the iterative reengineering process: (i) it guarantees the continued operation of the system during the execution of the reengineering process and (ii) the maintainers’ and the users’ familiarities with the system are preserved.
The disadvantage of this approach is the need to keep track of the four types of components during the reengineering process. In addition, both the old and the newly reengineered components need to be maintained.

Evolutionary approach. Similar to the “Incremental” approach, in the “Evolutionary” approach components of the original system are substituted with reengineered components. However, in this approach, the existing components are grouped by functions and reengineered into new components. Software engineers focus their reengineering efforts on identifying functional objects irrespective of the locations of those components within the current system. As a result, the new system is built with functionally cohesive components as needed. There are two advantages of the “Evolutionary” approach: (i) the resulting design is more cohesive and (ii) the scope of individual components is reduced. As a result, the “Evolutionary” approach works well to convert an operational system into an object-oriented system. A major disadvantage of the approach is as follows: all functions with strong similarities must first be identified throughout the operational system; next, those functions are refined as one unit in the new system.

4.4.2 Source Code Reengineering Reference Model

The Source Code Reengineering Reference Model (SCORE/RM) is useful in understanding the process of reengineering of software. The model was proposed by Colbrook, Smythe, and Darlison [15]. The framework, depicted in Figure 4.6, consists of four kinds of elements: function, documentation, repository database, and metrication. The function element is divided into eight layers, namely, encapsulation, transformation, normalization, interpretation, abstraction, causation, regeneration, and certification.
The eight layers provide a detailed approach to (i) rationalizing the system to be reengineered by removing redundant data and altering the control flow; (ii) comprehending the software’s requirements; and (iii) reconstructing the software according to established practices. The top six of the eight layers shown in Figure 4.6 constitute a process for reverse engineering, and the bottom three layers constitute a process for forward engineering. Both the processes include causation, because it represents the derivation of requirements specification for the software. Improvements in the software as a result of reengineering are quantified by means of the metrication element. The metrication element is described in terms of the relevant software metrics before executing a layer and after executing the same layer. The specification, constraints, and implementation details of both the old and the new versions of the software are described in the documentation element. The repository database is the information store for the entire reengineering process, containing the following kinds of information: metrication, documentation, and both the old and the new source codes.

[Figure 4.6 labels: original source code; function layers Encapsulation, Transformation, Normalization, Interpretation, Abstraction, Causation, Regeneration, and Certification; transformed source code; Metrication; Documentation; Repository database.]
FIGURE 4.6 Source code reengineering reference model. From Reference 15. © 1990 IEEE

The interfaces among the elements are shown in Figure 4.7. For simplicity, any layer is referred to as (N)-layer, while its next lower and next higher layers are referred to as (N − 1)-layer and the (N + 1)-layer, respectively. The three types of interfaces are explained as follows:
• Metrication/function, (N)-MF: the structures describing the metrics and their values.
• Documentation/function, (N)-DF: the structures describing the documentation.
• Function/function, (N)-FF: the representation structures for source code passed between the layers.

[Figure 4.7 labels: (N−1)-layer, (N)-layer, and (N+1)-layer; (N−1)-function, (N)-function, and (N+1)-function; (N)-documentation; (N)-metrication; interfaces (N−1)-FF, (N)-FF, (N)-MF, and (N)-DF.]
FIGURE 4.7 The interface nomenclature. From Reference 15. © 1990 IEEE. “(N)-” represents the Nth layer

The functions of the individual layers are discussed in the following.

Encapsulation: This is the first layer of reverse engineering. In this layer, a reference baseline is created from the original source code. The goal of the reference baseline is to uniquely identify a version of the software and to facilitate its reengineering. The following functions are expected of this layer:
• Configuration management. The changes to the software undergoing maintenance are recorded by following a well-documented and defined procedure for later use in the new source code. This step requires strong support from upper management by allocating resources.
• Analysis. The portions of the software requiring reengineering are evaluated. In addition, cost models for the tangible benefits are put in place.
• Parsing. The source code of the system to be reengineered is translated into an intermediate language (IL). The IL can have several dialects, depending upon the relationship between the languages for the new code and the original code. All the reengineering algorithms act upon the IL representation of the source code.
• Test generation. This refers to the design of certification tests and their results for the original source code. Certification tests are basically acceptance tests to be used as baseline tests. The “correctness” of the newly derived software will be evaluated by means of the baseline tests.

Transformation: To make the code structured, its control flow is changed.
This layer performs the following functions:
• Rationalization of control flow. The control flow is altered to make code structured.
• Isolation. All the external interfaces and referenced software are identified.
• Procedural granularity. This refers to the sizing of the procedures, by using the ideas of high cohesion and low coupling.

Normalization: In this stage data and their structures are scrutinized by means of the following functions:
• Data reduction. Duplicate data are eliminated. To be consistent with the requirements of the program, databases are modified.
• Data representation. The life histories of the data entities are now generated. The life histories describe how data are changed and reveal which control structures act on the data.

Interpretation: The process of deriving the meaning of a piece of software is started in this layer. The interpretation layer performs the following functions:
• Functionalization. This is additional rationalization of the data and control structure of the code, which (i) eliminates global variables and/or (ii) introduces recursion and polymorphic functions.
• Program reading. This means annotating the source code with logical comments.

Abstraction: The annotated and rationalized source code is examined by means of abstractions to identify the underlying object hierarchies. The abstraction layer performs the following functions:
• Object identification. The main idea in object identification is (i) separate the data operators and (ii) group those data operators with the data they manipulate.
• Object interpretation. Application domain meanings are mapped to the objects identified above. It is the different implementations of those objects that produce differences between the renovated code and the original code.

Causation: This layer performs the following functions:
• Specification of actions. This refers to the services provided to the user.
• Specification of constraints.
This refers to the limitations within which the software correctly operates.
• Modification of specification. The specification is extended and/or reduced to accurately reflect the user’s requirements.

Regeneration: Regeneration means reimplementing the source code using the requirements and the functional specifications. The layer performs the following functions:
• Generation of design. This refers to the production and documentation of the detailed design.
• Generation of code. This means generating new code by reusing portions of the original code and using standard libraries.
• Test generation. New tests are generated to perform unit and integration tests on the source code developed and reused.

Certification: The newly generated software is analyzed to establish that it is (i) operating correctly; (ii) performing the specified requirements; and (iii) consistent with the original code. The layer performs the following functions:
• Validation and Verification. The new system is tested to show its correctness.
• Conformance. Tests are performed to show that the renovated source code performs at the minimum all those functionalities that were performed by the original source code. It is not known what form the equivalence relationship must take, particularly when modification of the specification is likely to have occurred during reengineering.

4.4.3 Phase Reengineering Model

The phase model of software reengineering was originally proposed by Byrne and Gustafson [14] circa 1992. The model comprises five phases: analysis and planning, renovation, target system testing, redocumentation, and acceptance testing and system transition, as depicted in Figure 4.8. The labels on the arcs denote the possible information that flows from the tail entities of the arcs to the head entities. A major process activity is represented by each phase.
Tasks represent a phase’s activities, and tasks can be further decomposed to reveal the detailed methodologies.

Analysis and planning. The first phase of the model is analysis and planning. Analysis addresses three technical issues and one economic issue. The first technical issue concerns the present state of the system to be reengineered and understanding its properties. The second technical issue concerns the identification of the need for the system to be reengineered. The third technical issue concerns the specification of the characteristics of the new system to be produced. Specifically, one identifies (i) the characteristics of the system that need to be modified and (ii) the functionalities of the system that need to be modified. The identified modifications are analyzed and eventually integrated into the system. In addition, the expected characteristics of the new system must be understood in order to plan the required work.

[Figure 4.8 shows the five phases (analysis and planning, renovation, target system testing, redocumentation, and acceptance and system transition) connected by arcs labeled with the information that flows between them, for example the project plan, the initial test plan, the documentation plan, the target system source code, the change record, and the acceptance test plan.]
FIGURE 4.8 Software reengineering process phases. From Reference 14. © 1992 IEEE

TABLE 4.2 Tasks: Analysis and Planning Phase

Task                                        Description
Implementation motivations and objectives   List the motivations for reengineering the system. List the objectives to be achieved.
Analyze environment                         Identify the differences between the existing and the target environments. Differences can influence system changes.
Collect inventory                           Form a baseline for knowledge about the operational system by locating all program files, documents, test plans, and history of maintenance.
Analyze implementation                      Analyze the source code and record the details of the code.
Define approach                             Choose an approach to reengineer the system.
Define project procedures and standards     Procedures outline how to perform reviews and report problems. Standards describe the acceptable formats of the outputs of processes.
Identify resources                          Determine what resources are going to be used; ensure that resources are ready to be used.
Identify tools                              Determine and obtain tools to be used in the reengineering project.
Data conversion planning                    Make a plan to effect changes to databases and files.
Test planning                               Identify test objectives and test procedures, and evaluate the existing test plan. Design new tests if there is a need.
Define acceptance criteria                  By means of negotiations with the customers, identify acceptance criteria for the target system.
Documentation planning                      Evaluate the existing documentation. Develop a plan to redocument the target system.
Plan system transition                      Develop an end-of-project plan to put the new system into operation and phase out the old one.
Estimation                                  Estimate the resource requirements of the project: effort, cost, duration, and staffing.
Define organizational structure             Identify personnel for the project, and develop a project organization.
Scheduling                                  Develop a schedule, including dependencies, for project phases and tasks.

Source: From Reference 14. © 1992 IEEE.

The economic issue concerns a cost and benefit analysis of the reengineering project. The economics of reengineering must be compared with the costs, benefits, and risks of developing a new system as well as the costs and risks of maintaining an old system [16].
Planning includes (i) understanding the scope of the work; (ii) identifying the required resources; (iii) identifying the tasks and milestones; (iv) estimating the required effort; and (v) preparing a schedule. The tasks to be performed in this phase are listed in Table 4.2.

Renovation. An operational system is modified into the target system in the renovation phase. Two main aspects of a system are considered in this phase: (i) representation of the system and (ii) representation of external data. In general, the former refers to source code, but it may include the design model and the requirement specification of the existing system. On the other hand, the latter refers to the database and/or data files used by the system. Often the external data are reengineered, and this is known as data reengineering. Data reengineering is discussed in Section 4.8.

[Figure 4.9 labels: Design, Source code, and Object code for both the existing implementation and the target implementation, ordered from more abstract to less abstract; recode; reverse engineering; forward engineering; compilation; decompilation (reverse engineering); paths numbered 1, 2, and 3.]
FIGURE 4.9 Replacement strategies for recoding

An operational system can be renovated in many ways, depending upon the objectives of the project, the approach followed, and the starting representation of the system. It may be noted that the starting representation can be source code, design, or requirements. Table 4.1 in Section 4.3.3 recommends several alternatives to renovate a system. Selection of a specific renovation approach is a management decision. Let us consider an example of a project in which the objective is to recode the system from Fortran to C. Figure 4.9 shows the three possible replacement strategies. First, to perform source-to-source translation, program migration is used.
Program migration accepts the source code of the system to be reengineered as input and produces new source code for the target system as output. Second, a high-level design is constructed from the operational source code, say, in Fortran, and the resulting design is reimplemented in the target language, C in this case. Finally, a mix of compilation and decompilation is used to obtain the system implementation in C: (i) compile the Fortran code to obtain object code and (ii) decompile the object code to obtain a C version of the program. For all three approaches, the end effects are the same, but the tasks to be executed differ from one replacement strategy to another.

Target system testing. In this phase, faults are detected in the target system. Those faults might have been introduced during reengineering. Fault detection is performed by applying the target system test plan to the target system. The same testing strategies, techniques, methods, and tools that are used in software development are used during reengineering. For example, apply the existing system-level test cases to both the existing and the new systems. Assuming that the two systems have identical requirements, the test results from the two systems must be the same.

Redocumentation. In the redocumentation phase, documents are rewritten, updated, and/or replaced to reflect the target system. Documents are revised according to the redocumentation plan. The two major tasks within this phase are (i) analyze new source code and (ii) create documentation. Documents requiring revision are the requirement specification, design documentation, a report justifying the design decisions, assumptions made in the implementation, configuration, user and reference manuals, on-line help, and the document describing the differences between the existing and the target systems.
Different documents require different redocumentation tasks. A redocumentation task comprises detailed subtasks to make a plan, actually update the document, and review the document.

Acceptance and system transition. In this final phase, the reengineered system is evaluated by performing acceptance testing. Acceptance criteria should already have been established at the beginning of the project. Should the reengineered system pass those tests, preparation begins to transition to the new system. On the other hand, if the reengineered system fails some tests, the faults must be fixed; in some cases, those faults are fixed after the target system is deployed. Upon completion of the acceptance tests, the reengineered system is made operational, and the old system is put out of service. System transition is guided by the previously developed transition plan.

4.5 CODE REVERSE ENGINEERING

Reverse engineering was first applied in electrical engineering to produce schematics from an electrical circuit. It was defined as the process of developing a set of specifications for a complex hardware system by an orderly examination of specimens of that system [17]. In the context of software engineering, Chikofsky and Cross II [6] defined reverse engineering as a process to (i) identify the components of an operational software system; (ii) identify the relationships among those components; and (iii) represent the system at a higher level of abstraction or in another form. In other words, by means of reverse engineering one derives information from the existing software artifacts and transforms it into abstract models that can be easily understood by maintenance personnel.
The factors that necessitate reverse engineering are as follows [18]:

• The original programmers have left the organization.
• The language of implementation has become obsolete, and the system needs to be migrated to a newer one.
• There is insufficient documentation of the system.
• The business relies on software, which many cannot understand.
• The company acquired the system as part of a larger acquisition and lacks access to all the source code.
• The system requires adaptations and/or enhancements.
• The software does not operate as expected.

The above factors imply that a combination of both high-level and low-level reverse engineering steps needs to be applied. High-level reverse engineering means creating abstractions of source code in the form of design, architecture, and/or documentation. Low-level reverse engineering, discussed in Section 4.7, means creating source code from object code or assembly code.

[Figure 4.10: Relationship between reengineering and reverse engineering. Three levels of abstraction (requirements, design, implementation) are connected by forward engineering, reverse engineering, design recovery, reengineering, and restructuring. From Reference 6. © 1990 IEEE.]

Reverse engineering is performed to achieve two key objectives: redocumentation of artifacts and design recovery. The former aims at revising the current description of components or generating alternative views at the same abstraction level. Examples of redocumentation are pretty printing and drawing CFGs. On the other hand, the latter creates design abstractions from code, expert knowledge, and existing documentation [19]. In design recovery, domain knowledge, external information, and deduction or fuzzy reasoning are added to the observations of the subject system to identify meaningful higher-level abstractions beyond those obtained directly by examining the system itself.
The relationship between forward engineering, reengineering, and reverse engineering is shown in Figure 4.10. Although difficulties faced by software maintenance personnel gave rise to the idea of software reverse engineering, it can be used to solve problems in related areas as well. Six objectives of reverse engineering, as identified by Chikofsky and Cross II [6], are generating alternative views, recovering lost information, synthesizing higher levels of abstractions, detecting side effects, facilitating reuse, and coping with complexity. If source code is the only reliable representation of a system, following the IEEE Standard for Software Maintenance [20], one can perform reverse engineering on the system to understand it. Reverse engineering has been effectively applied in the following problem areas:

• redocumenting programs [21];
• identifying reusable assets [22–25];
• discovering design architectures [7, 26–30];
• recovering design patterns [31, 32];
• building traceability between code and documentation [33–35];
• finding objects in procedural programs [36];
• deriving conceptual data models [37–40];
• detecting duplications and clones [41–44];
• cleaning up code smells [45];
• aspect-oriented software development [46];
• computing change impact [47];
• transforming binary code into source code [48];
• redesigning user interfaces [49–51];
• parallelizing largely sequential programs [52];
• translating a program to another language [53, 54];
• migrating data [55];
• extracting business rules [9, 56, 57];
• wrapping legacy code [58];
• auditing security and vulnerability [59, 60]; and
• extracting protocols of network applications [61].
Six key steps in reverse engineering, as documented in the IEEE Standard for Software Maintenance [20], are:

• partition source code into units;
• describe the meanings of those units and identify the functional units;
• create the input and output schematics of the units identified before;
• describe the connected units;
• describe the system application; and
• create an internal structure of the system.

The first three of the six steps involve local analysis, because those are performed at the unit level. On the other hand, the remaining three steps involve global analysis, because those steps are performed at the system level.

A high-level organizational paradigm is found to be useful when setting up a reverse engineering process, as advocated by Benedusi et al. [21, 62]. The high-level paradigm plays two roles: (i) define a framework to use the available methods and tools and (ii) allow the process to be repeatable. They propose a paradigm, namely, Goals/Models/Tools, which partitions a process for reverse engineering into three ordered stages: Goals, Models, and Tools. Next, the three stages are explained one by one.

Goals. In this phase, the reasons for setting up a process for reverse engineering are identified and analyzed. Analyses are performed to identify the information needs of the process and the abstractions to be created by the process. The team setting up the process first acquires a good understanding of the forward engineering activities and the environment where the products of the reverse engineering process will be used. Results of the aforementioned comprehension are used to accurately identify (i) the information to be generated and (ii) the formalisms to be used to represent the information.
For example, the design documents to be generated from source code are as follows:

• Low-level documents give both an overview and detailed descriptions of individual modules; detailed descriptions include the structures of the modules in terms of control flow and data structures.
• High-level documents give (i) a general description of the software product and (ii) a detailed description of its structuring in terms of modules, their interconnections, and the flow of information between modules.

Models. In this phase, the abstractions identified in the Goals stage are analyzed to create representation models. Representation models include information required for the generation of abstractions. Activities in this phase are:

• identify the kinds of documents to be generated;
• to produce those documents, identify the information and their relations to be derived from source code;
• define the models to be used to represent the information and their relationships extracted from source code; and
• to produce the desired documents from those models, define the abstraction algorithm for reverse engineering.

The important properties of a reverse engineering model are expressive power, language independence, compactness, richness of information content, granularity, and support for information-preserving transformation.

Tools. In this phase, tools needed for reverse engineering are identified, acquired, and/or developed in-house. Those tools are grouped into two categories: (i) tools to extract information and generate program representations according to the identified models and (ii) tools to extract information and produce the required documents. Extraction tools generally work on source code to reconstruct design documents. Therefore, those tools are ineffective in producing inputs for an abstraction process aiming to produce high-level design documents.
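
As a rough illustration of the two tool categories, the following sketch assumes that the subject system is written in Python and that the chosen representation model is simply a mapping from each function to the names it calls. The sample source, the function names `extract_model` and `generate_document`, and the output format are all hypothetical; the sketch relies only on Python's standard `ast` module to parse the source and is not a tool described in the references.

```python
import ast

# Hypothetical subject system; it is only parsed, never executed.
SUBJECT_SOURCE = '''
def read_input(path):
    return open(path).read()

def parse(text):
    return text.split()

def main(path):
    words = parse(read_input(path))
    print(len(words))
'''

def extract_model(source):
    """Extraction step: map each function to the sorted names it calls directly."""
    model = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            calls = {
                c.func.id
                for c in ast.walk(node)
                if isinstance(c, ast.Call) and isinstance(c.func, ast.Name)
            }
            model[node.name] = sorted(calls)
    return model

def generate_document(model):
    """Document step: render the model as a low-level, per-function description."""
    lines = []
    for name in sorted(model):
        callees = ", ".join(model[name]) or "(none)"
        lines.append(f"{name}: calls {callees}")
    return "\n".join(lines)

model = extract_model(SUBJECT_SOURCE)
print(generate_document(model))
```

The resulting document stays at the level of individual functions, which is consistent with the observation that tools working directly on source code yield low-level rather than high-level design documents. Method calls such as `text.split()` are deliberately ignored to keep the model small.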
4.6 TECHNIQUES USED FOR REVERSE ENGINEERING

Fact-finding and information gathering from the source code are the keys to the Goals/Models/Tools paradigm. In order to extract information which is not explicitly available in source code, automated analysis techniques are used. The well-known analysis techniques that facilitate reverse engineering are lexical analysis, syntactic analysis, control flow analysis, data flow analysis, program slicing, visualization, and program metrics. In the following sections, we explain these techniques one by one.

4.6.1 Lexical Analysis

Lexical analysis is the process of decomposing the sequence of characters in the source code into its constituent lexical units. Various useful representations of program information are enabled by lexical analysis. Perhaps the most widely used program information is the cross reference listing. A program performing lexical analysis is called a lexical analyzer, and it is a part of a programming language’s compiler. Typically it uses rules describing lexical program structures that are expressed in a mathematical notation called regular expressions. Modern lexical analyzers are automatically built using tools called lexical analyzer generators, such as lex and flex (fast lexical analyzer) [63].

4.6.2 Syntactic Analysis

The next most complex form of automated program analysis is syntactic in nature. Compilers and other tools such as interpreters determine the expressions, statements, and modules of a program. Syntactic analysis is performed by a parser. Here, too, the requisite language properties are expressed in a mathematical formalism called context-free grammars. Usually, these grammars are described in a notation called Backus–Naur Form (BNF). In the BNF notation, the various program parts are defined by rules in terms of their constituents.
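
To make the two formalisms concrete, here is a minimal hand-written sketch for a toy expression language: the lexical rules are given as regular expressions, and the syntax is given as a small BNF grammar that a recursive-descent parser follows. The grammar, the token names (`NUM`, `ID`, `OP`), and the nested-tuple output are assumptions made for illustration; generated analyzers built with lex or yacc work on the same principles but are not shown here.

```python
import re

# Lexical rules as regular expressions (hypothetical toy language).
TOKEN_RE = re.compile(
    r"\s*(?:(?P<NUM>\d+)|(?P<ID>[A-Za-z_]\w*)|(?P<OP>[-+*/()]))"
)

def tokenize(text):
    """Decompose a character sequence into (kind, lexeme) lexical units."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens

# Syntax in BNF (left recursion is realized below by iteration):
#   <expr>   ::= <term>   | <expr> "+" <term>   | <expr> "-" <term>
#   <term>   ::= <factor> | <term> "*" <factor> | <term> "/" <factor>
#   <factor> ::= NUM | ID | "(" <expr> ")"
def parse(tokens):
    """Return a nested tuple (operator, left, right) for the token list."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():
        node = term()
        while peek() in (("OP", "+"), ("OP", "-")):
            op = advance()[1]
            node = (op, node, term())
        return node

    def term():
        node = factor()
        while peek() in (("OP", "*"), ("OP", "/")):
            op = advance()[1]
            node = (op, node, factor())
        return node

    def factor():
        kind, lexeme = peek()
        if kind in ("NUM", "ID"):
            advance()
            return lexeme
        if (kind, lexeme) == ("OP", "("):
            advance()
            node = expr()
            if peek() != ("OP", ")"):
                raise SyntaxError("expected ')'")
            advance()
            return node
        raise SyntaxError(f"unexpected token {lexeme!r}")

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return tree

print(parse(tokenize("a + 2 * (b - 1)")))
```

Each grammar rule maps onto one parsing function, so the structure of the parser mirrors the structure of the BNF. The grouping parentheses guide the parse but do not appear in the resulting tree.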
Similar to lexical analyzers, parsers can be automatically constructed from a description of the grammatical properties of a programming language. YACC is one of the most commonly used parser generators [63]. Two types of representations are used to hold the results of syntactic analysis: parse tree and abstract syntax tree. The former is the more primitive one of the two. It is similar to the parsing diagrams used to show how a natural language sentence is broken up into its constituents. However, a parse tree contains details unrelated to actual program meaning, such as the punctuation, whose role is to direct the parsing process. For instance, grouping parentheses are implicit in the tree structure and can therefore be pruned from the parse tree. Removal of those extraneous details produces a structure called an Abstract Syntax Tree (AST). An AST contains just those details that relate to the actual meaning of a program. Because an AST is a tree, nodes of the tree can be visited in a pre-set manner, such as depth-first order, and the information contained in each node is delivered to the analyzer. Many tools have been based on the AST concept; to understand a program, an analyst makes a query in terms of the node types. The query is interpreted by a tree walker to deliver the requested information.

4.6.3 Control Flow Analysis

After determining the structure of a program, control flow analysis (CFA) can be performed on it [64]. The two kinds of CFA are intraprocedural analysis and interprocedural analysis. The former shows the order in which statements are executed within a subprogram, whereas the latter shows the calling relationship among program units. Intraprocedural analysis is performed by generating CFGs of subprograms. The idea of basic blocks is central to constructing a CFG.
A basic block is a maximal sequence of program statements such that execution enters at the top of the block and leaves only at the bottom via a conditional or an unconditional branch statement. A basic block is represented with one node in the CFG, and an arc indicates possible flow of control from one node to another. A CFG can directly be constructed from an AST by walking the tree to determine basic blocks and then connecting the blocks with control flow arcs. A CFG shows an abstract view of the ways in which a subprogram can execute.

Interprocedural analysis is performed by constructing a call graph [65, 66]. Calling relationships between subroutines in a program are represented as a call graph, which is basically a directed graph. Specifically, a procedure in the source code is represented by a node in the graph, and an edge from node f to node g indicates that procedure f calls procedure g. Call graphs can be static or dynamic. A dynamic call graph is an execution trace of the program. Thus a dynamic call graph is exact, but it only describes one run of the program. On the other hand, a static call graph represents every possible run of the program.

4.6.4 Data Flow Analysis

Although CFA is useful, many questions cannot be answered by means of CFA. For example, CFA cannot answer the question: Which program statements are likely to be impacted by the execution of a given assignment statement? To answer this kind of question, an understanding of definitions (defs) of variables and references (uses) of variables is required. Normally, if a variable appears on the left-hand side of an assignment statement, then the variable is said to be defined. Conversely, if a variable appears on the right-hand side of an assignment statement, then it is said to be referenced in that statement. Data flow analysis (DFA) concerns how values of defined variables flow through and are used in a program [67].
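
The notions of definition and reference can be illustrated with a small sketch that pairs each definition with the later statements its value reaches. It assumes a straight-line Python program made only of assignment statements, so it ignores branches and loops, which a real DFA would handle by working over the CFG. The sample program and the names `defs_and_uses` and `def_use_pairs` are hypothetical.

```python
import ast

# Hypothetical straight-line program; line numbers start at 1.
PROGRAM = """\
a = 1
b = a + 2
a = b * 3
c = a + b
"""

def defs_and_uses(source):
    """Return (line, defined_names, used_names) for each assignment statement."""
    facts = []
    for stmt in ast.parse(source).body:
        if isinstance(stmt, ast.Assign):
            names = [n for n in ast.walk(stmt) if isinstance(n, ast.Name)]
            # Left-hand side names carry a Store context, right-hand side a Load.
            defined = {n.id for n in names if isinstance(n.ctx, ast.Store)}
            used = {n.id for n in names if isinstance(n.ctx, ast.Load)}
            facts.append((stmt.lineno, defined, used))
    return facts

def def_use_pairs(facts):
    """Pair each definition with later uses reached before a redefinition.

    Valid for straight-line code only (an assumption of this sketch).
    """
    pairs = []
    for i, (def_line, defined, _) in enumerate(facts):
        for var in sorted(defined):
            for use_line, redefined, used in facts[i + 1:]:
                if var in used:
                    pairs.append((var, def_line, use_line))
                if var in redefined:
                    break  # the earlier value no longer flows past this point
    return pairs

pairs = def_use_pairs(defs_and_uses(PROGRAM))
for var, def_line, use_line in pairs:
    print(f"{var}: defined at line {def_line}, used at line {use_line}")
```

The pairs for a given assignment are exactly the statements directly impacted by it in this restricted setting: the definition of `a` at line 1 reaches line 2 but not line 4, because `a` is redefined at line 3.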
CFA can detect the possibility of loops, whereas DFA can determine data flow anomalies [68]. One example of data flow anomaly is that an undefined variable is referenced. Another example of data flow anomaly is that a variable is successively defined without being referenced in between. DFA enables the identification of code that can never execute, variables that might not be defined before they are used, and statements that might have to be altered when a bug is fixed.

4.6.5 Program Slicing

Originally introduced by Mark Weiser, program slicing has served as the basis of numerous tools [69]. The slice of a program for a given variable at a given line of code is the portion of the program that gives a value to the variable at that point. Therefore, if one determines during debugging that the value of a variable at a specific
[1] int i;
[2] int sum = 0;
[3] int product = 1;
[4] for(i = 0;((i