      inssort2<E,Comp>(&A[j], n-j, i);
  inssort2<E,Comp>(A, n, 1);
}

Figure 7.7 An implementation for Shell Sort.

It is not necessarily the case that this will be true, but it is almost always true in practice. When the final Insertion Sort is conducted, the list should be "almost sorted," yielding a relatively cheap final Insertion Sort pass.

Some choices for increments will make Shellsort run more efficiently than others. In particular, the choice of increments described above (2^k, 2^(k-1), ..., 2, 1) turns out to be relatively inefficient. A better choice is the following series based on division by three: (..., 121, 40, 13, 4, 1).

The analysis of Shellsort is difficult, so we must accept without proof that the average-case performance of Shellsort (for "divisions by three" increments) is O(n^1.5). Other choices for the increment series can reduce this upper bound somewhat. Thus, Shellsort is substantially better than Insertion Sort, or any of the Θ(n²) sorts presented in Section 7.2. In fact, Shellsort is not terrible when compared with the asymptotically better sorts to be presented whenever n is of medium size (though it tends to be a little slower than these other algorithms when they are well implemented). Shellsort illustrates how we can sometimes exploit the special properties of an algorithm (in this case Insertion Sort) even if in general that algorithm is unacceptably slow.

7.4 Mergesort

A natural approach to problem solving is divide and conquer. In terms of sorting, we might consider breaking the list to be sorted into pieces, process the pieces, and then put them back together somehow. A simple way to do this would be to split the list in half, sort the halves, and then merge the sorted halves together. This is the idea behind Mergesort.

    36 20 17 13 28 14 23 15
    20 36|13 17|14 28|15 23
    13 17 20 36|14 15 23 28
    13 14 15 17 20 23 28 36

Figure 7.8 An illustration of Mergesort. The first row shows eight numbers that are to be sorted.
Mergesort will recursively subdivide the list into sublists of one element each, then recombine the sublists. The second row shows the four sublists of size 2 created by the first merging pass. The third row shows the two sublists of size 4 created by the next merging pass on the sublists of row 2. The last row shows the final sorted list created by merging the two sublists of row 3.

Mergesort is one of the simplest sorting algorithms conceptually, and has good performance both in the asymptotic sense and in empirical running time. Surprisingly, even though it is based on a simple concept, it is relatively difficult to implement in practice. Figure 7.8 illustrates Mergesort. A pseudocode sketch of Mergesort is as follows:

    List mergesort(List inlist) {
      if (inlist.length() <= 1) return inlist;
      List L1 = half of the items from inlist;
      List L2 = other half of the items from inlist;
      return merge(mergesort(L1), mergesort(L2));
    }

Before discussing how to implement Mergesort, we will first examine the merge function. Merging two sorted sublists is quite simple. Function merge examines the first element of each sublist and picks the smaller value as the smallest element overall. This smaller value is removed from its sublist and placed into the output list. Merging continues in this way, comparing the front elements of the sublists and continually appending the smaller to the output list until no more input elements remain.

Implementing Mergesort presents a number of technical difficulties. The first decision is how to represent the lists. Mergesort lends itself well to sorting a singly linked list because merging does not require random access to the list elements. Thus, Mergesort is the method of choice when the input is in the form of a linked list. Implementing merge for linked lists is straightforward, because we need only remove items from the front of the input lists and append items to the output list.
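The merge step just described can be sketched for singly linked lists. This is a minimal illustration using a bare node struct of our own (the text's linked-list class is not shown here):

```cpp
#include <cstddef>

struct Node {   // Minimal singly linked list node (for illustration only)
  int val;
  Node* next;
};

// Merge two sorted lists by repeatedly moving the smaller front
// element to the output list, as described in the text.
Node* merge(Node* a, Node* b) {
  Node dummy{0, nullptr};          // Temporary head node simplifies appending
  Node* tail = &dummy;
  while (a != nullptr && b != nullptr) {
    if (a->val <= b->val) { tail->next = a; a = a->next; }
    else                  { tail->next = b; b = b->next; }
    tail = tail->next;
  }
  tail->next = (a != nullptr) ? a : b; // Append whatever input remains
  return dummy.next;
}
```

Taking `<=` when the front values are equal keeps the merge stable: ties go to the first list.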
Breaking the input list into two equal halves presents some difficulty. Ideally we would just break the lists into front and back halves. However, even if we know the length of the list in advance, it would still be necessary to traverse halfway down the linked list to reach the beginning of the second half. A simpler method, which does not rely on knowing the length of the list in advance, assigns elements of the input list alternating between the two sublists. The first element is assigned to the first sublist, the second element to the second sublist, the third to the first sublist, the fourth to the second sublist, and so on. This requires one complete pass through the input list to build the sublists.

    template <typename E, typename Comp>
    void mergesort(E A[], E temp[], int left, int right) {
      if (left == right) return;          // List of one element
      int mid = (left+right)/2;
      mergesort<E,Comp>(A, temp, left, mid);
      mergesort<E,Comp>(A, temp, mid+1, right);
      for (int i=left; i<=right; i++)     // Copy subarray to temp
        temp[i] = A[i];
      // Do the merge operation back to A
      int i1 = left; int i2 = mid + 1;
      for (int curr=left; curr<=right; curr++) {
        if (i1 == mid+1)                  // Left sublist exhausted
          A[curr] = temp[i2++];
        else if (i2 > right)              // Right sublist exhausted
          A[curr] = temp[i1++];
        else if (Comp::prior(temp[i1], temp[i2]))
          A[curr] = temp[i1++];
        else A[curr] = temp[i2++];
      }
    }

Figure 7.9 Standard implementation for Mergesort.

When the input to Mergesort is an array, splitting the input into two subarrays is easy if we know the array bounds. Merging is also easy if we merge the subarrays into a second array. Note that this approach requires twice the amount of space as any of the sorting methods presented so far, which is a serious disadvantage for Mergesort. It is possible to merge the subarrays without using a second array, but this is extremely difficult to do efficiently and is not really practical. Merging the two subarrays into a second array, while simple to implement, presents another difficulty.
The merge process ends with the sorted list in the auxiliary array. Consider how the recursive nature of Mergesort breaks the original array into subarrays, as shown in Figure 7.8. Mergesort is recursively called until subarrays of size 1 have been created, requiring log n levels of recursion. These subarrays are merged into subarrays of size 2, which are in turn merged into subarrays of size 4, and so on. We need to avoid having each merge operation require a new array. With some difficulty, an algorithm can be devised that alternates between two arrays. A much simpler approach is to copy the sorted sublists to the auxiliary array first, and then merge them back to the original array. Figure 7.9 shows a complete implementation for mergesort following this approach.

An optimized Mergesort implementation is shown in Figure 7.10. It reverses the order of the second subarray during the initial copy. Now the current positions of the two subarrays work inwards from the ends, allowing the end of each subarray to act as a sentinel for the other.

    template <typename E, typename Comp>
    void mergesort(E A[], E temp[], int left, int right) {
      if ((right-left) <= THRESHOLD) {    // Small list
        inssort<E,Comp>(&A[left], right-left+1);
        return;
      }
      int i, j, k, mid = (left+right)/2;
      mergesort<E,Comp>(A, temp, left, mid);
      mergesort<E,Comp>(A, temp, mid+1, right);
      // Do the merge operation. First, copy 2 halves to temp.
      for (i=mid; i>=left; i--) temp[i] = A[i];
      for (j=1; j<=right-mid; j++) temp[right-j+1] = A[j+mid];
      // Merge sublists back to A
      for (i=left,j=right,k=left; k<=right; k++)
        if (Comp::prior(temp[i], temp[j])) A[k] = temp[i++];
        else A[k] = temp[j--];
    }

Figure 7.10 Optimized implementation for Mergesort.

Unlike the previous implementation, no test is needed to check for when one of the two subarrays becomes empty. This version also uses Insertion Sort to sort small subarrays.

Analysis of Mergesort is straightforward, despite the fact that it is a recursive algorithm.
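The analysis can be compressed into a standard recurrence (our summary, assuming n is a power of two and letting cn bound the work to split and merge one level):

```latex
\begin{aligned}
T(n) &= 2T(n/2) + cn, \qquad T(1) = c \\
     &= 4T(n/4) + 2cn \\
     &= \cdots = 2^{\log n}\,T(1) + cn\log n = \Theta(n \log n).
\end{aligned}
```

Each of the log n unfoldings contributes cn, which matches the level-by-level accounting given next.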
The merging part takes time Θ(i) where i is the total length of the two subarrays being merged. The array to be sorted is repeatedly split in half until subarrays of size 1 are reached, at which time they are merged to be of size 2, and these are merged to subarrays of size 4, and so on, as shown in Figure 7.8. Thus, the depth of the recursion is log n for n elements (assume for simplicity that n is a power of two). The first level of recursion can be thought of as working on one array of size n, the next level working on two arrays of size n/2, the next on four arrays of size n/4, and so on. The bottom of the recursion has n arrays of size 1. Thus, n arrays of size 1 are merged (requiring Θ(n) total steps), n/2 arrays of size 2 (again requiring Θ(n) total steps), n/4 arrays of size 4, and so on. At each of the log n levels of recursion, Θ(n) work is done, for a total cost of Θ(n log n). This cost is unaffected by the relative order of the values being sorted, thus this analysis holds for the best, average, and worst cases.

7.5 Quicksort

While Mergesort uses the most obvious form of divide and conquer (split the list in half then sort the halves), it is not the only way that we can break down the sorting problem. And we saw that doing the merge step for Mergesort when using an array implementation is not so easy. So perhaps a different divide and conquer strategy might turn out to be more efficient?

Quicksort is aptly named because, when properly implemented, it is the fastest known general-purpose in-memory sorting algorithm in the average case. It does not require the extra array needed by Mergesort, so it is space efficient as well. Quicksort is widely used, and is typically the algorithm implemented in a library sort routine such as the UNIX qsort function. Interestingly, Quicksort is hampered by exceedingly poor worst-case performance, thus making it inappropriate for certain applications.
Before we get to Quicksort, consider for a moment the practicality of using a Binary Search Tree for sorting. You could insert all of the values to be sorted into the BST one by one, then traverse the completed tree using an inorder traversal. The output would form a sorted list. This approach has a number of drawbacks, including the extra space required by BST pointers and the amount of time required to insert nodes into the tree. However, this method introduces some interesting ideas. First, the root of the BST (i.e., the first node inserted) splits the list into two sublists: The left subtree contains those values in the list less than the root value while the right subtree contains those values in the list greater than or equal to the root value. Thus, the BST implicitly implements a "divide and conquer" approach to sorting the left and right subtrees. Quicksort implements this concept in a much more efficient way.

Quicksort first selects a value called the pivot. (This is conceptually like the root node's value in the BST.) Assume that the input array contains k values less than the pivot. The records are then rearranged in such a way that the k values less than the pivot are placed in the first, or leftmost, k positions in the array, and the values greater than or equal to the pivot are placed in the last, or rightmost, n−k positions. This is called a partition of the array. The values placed in a given partition need not (and typically will not) be sorted with respect to each other. All that is required is that all values end up in the correct partition. The pivot value itself is placed in position k. Quicksort then proceeds to sort the resulting subarrays now on either side of the pivot, one of size k and the other of size n−k−1. How are these values sorted? Because Quicksort is such a good algorithm, using Quicksort on the subarrays would be appropriate.
Unlike some of the sorts that we have seen earlier in this chapter, Quicksort might not seem very "natural" in that it is not an approach that a person is likely to use to sort real objects. But it should not be too surprising that a really efficient sort for huge numbers of abstract objects on a computer would be rather different from our experiences with sorting a relatively few physical objects.

The C++ code for Quicksort is shown in Figure 7.11.

    template <typename E, typename Comp>
    void qsort(E A[], int i, int j) { // Quicksort
      if (j <= i) return; // Don't sort 0 or 1 element
      int pivotindex = findpivot(A, i, j);
      swap(A, pivotindex, j);          // Put pivot at end
      // k will be the first position in the right subarray
      int k = partition<E,Comp>(A, i-1, j, A[j]);
      swap(A, k, j);                   // Put pivot in place
      qsort<E,Comp>(A, i, k-1);
      qsort<E,Comp>(A, k+1, j);
    }

Figure 7.11 Implementation for Quicksort.

Parameters i and j define the left and right indices, respectively, for the subarray being sorted. The initial call to Quicksort would be qsort(array, 0, n-1). Function partition will move records to the appropriate partition and then return k, the first position in the right partition. Note that the pivot value is initially placed at the end of the array (position j). Thus, partition must not affect the value of array position j. After partitioning, the pivot value is placed in position k, which is its correct position in the final, sorted array. By doing so, we guarantee that at least one value (the pivot) will not be processed in the recursive calls to qsort. Even if a bad pivot is selected, yielding a completely empty partition to one side of the pivot, the larger partition will contain at most n−1 elements.

Selecting a pivot can be done in many ways. The simplest is to use the first key. However, if the input is sorted or reverse sorted, this will produce a poor partitioning with all values to one side of the pivot.
It is better to pick a value at random, thereby reducing the chance of a bad input order affecting the sort. Unfortunately, using a random number generator is relatively expensive, and we can do nearly as well by selecting the middle position in the array. Here is a simple findpivot function:

    template <typename E>
    inline int findpivot(E A[], int i, int j) { return (i+j)/2; }

We now turn to function partition. If we knew in advance how many keys are less than the pivot, partition could simply copy elements with key values less than the pivot to the low end of the array, and elements with larger keys to the high end. Because we do not know in advance how many keys are less than the pivot, we use a clever algorithm that moves indices inwards from the ends of the subarray, swapping values as necessary until the two indices meet. Figure 7.12 shows a C++ implementation for the partition step.

    template <typename E, typename Comp>
    inline int partition(E A[], int l, int r, E& pivot) {
      do { // Move the bounds inward until they meet
        while (Comp::prior(A[++l], pivot));            // Move l right and
        while ((l < r) && Comp::prior(pivot, A[--r])); // r left
        swap(A, l, r);        // Swap out-of-place values
      } while (l < r);        // Stop when they cross
      return l;               // Return first position in right partition
    }

Figure 7.12 The Quicksort partition implementation.

Figure 7.13 illustrates partition. Initially, variables l and r are immediately outside the actual bounds of the subarray being partitioned. Each pass through the outer do loop moves the counters l and r inwards, until eventually they meet. Note that at each iteration of the inner while loops, the bounds are moved prior to checking against the pivot value. This ensures that progress is made by each while loop, even when the two values swapped on the last iteration of the do loop were equal to the pivot. Also note the check that r > l in the second while
loop. This ensures that r does not run off the low end of the partition in the case where the pivot is the least value in that partition. Function partition returns the first index of the right partition so that the subarray bound for the recursive calls to qsort can be determined.

    Initial:  72  6 57 88 85 42 83 73 48 60
    Pass 1:   l stops at 72, r stops at 48
    Swap 1:   48  6 57 88 85 42 83 73 72 60
    Pass 2:   l stops at 88, r stops at 42
    Swap 2:   48  6 57 42 85 88 83 73 72 60
    Pass 3:   l and r meet at 85 (position 4)

Figure 7.13 The Quicksort partition step. The first row shows the initial positions for a collection of ten key values. The pivot value is 60, which has been swapped to the end of the array. The do loop makes three iterations, each time moving counters l and r inwards until they meet in the third pass. In the end, the left partition contains four values and the right partition contains six values. Function qsort will place the pivot value into position 4.

Figure 7.14 illustrates the complete Quicksort algorithm.

    72  6 57 88 60 42 83 73 48 85    Pivot = 60
    48  6 57 42 60 88 83 73 72 85    Pivot = 6 | Pivot = 73
    ...
     6 42 48 57 60 72 73 83 85 88    Final Sorted Array

Figure 7.14 An illustration of Quicksort.

To analyze Quicksort, we first analyze the findpivot and partition functions operating on a subarray of length k. Clearly, findpivot takes constant time. Function partition contains a do loop with two nested while loops. The total cost of the partition operation is constrained by how far l and r can move inwards. In particular, these two bounds variables together can move a total of s steps for a subarray of length s. However, this does not directly tell us how much work is done by the nested while loops. The do loop as a whole is guaranteed to move both l and r inward at least one position on each first pass.
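The trace in Figure 7.13 can be checked mechanically. The following self-contained sketch specializes the partition logic to int arrays, with operator< standing in for Comp::prior (the name partition_demo and the plain-int specialization are ours):

```cpp
#include <utility>

// Partition step on an int array. As in the text, l and r enter just
// outside the subarray bounds; the return value is the first position
// of the right partition.
int partition_demo(int A[], int l, int r, int pivot) {
  do {                                   // Move the bounds inward until they meet
    while (A[++l] < pivot);              // Move l right and
    while ((l < r) && (pivot < A[--r])); // r left
    std::swap(A[l], A[r]);               // Swap out-of-place values
  } while (l < r);                       // Stop when they cross
  return l;                              // First position in right partition
}
```

Running it on the figure's ten values with pivot 60 (swapped to the end) returns position 4, matching the four-value left partition shown in the figure.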
Each while loop moves its variable at least once (except in the special case where r is at the left edge of the array, but this can happen only once). Thus, we see that the do loop can be executed at most s times, the total amount of work done moving l and r is s, and each while loop can fail its test at most s times. The total work for the entire partition function is therefore Θ(s).

Knowing the cost of findpivot and partition, we can determine the cost of Quicksort. We begin with a worst-case analysis. The worst case will occur when the pivot does a poor job of breaking the array, that is, when there are no elements in one partition, and n−1 elements in the other. In this case, the divide and conquer strategy has done a poor job of dividing, so the conquer phase will work on a subproblem only one less than the size of the original problem. If this happens at each partition step, then the total cost of the algorithm will be

    Σ_{k=1}^{n} k = Θ(n²).

In the worst case, Quicksort is Θ(n²). This is terrible, no better than Bubble Sort.² When will this worst case occur? Only when each pivot yields a bad partitioning of the array. If the pivot values are selected at random, then this is extremely unlikely to happen. When selecting the middle position of the current subarray, it is still unlikely to happen. It does not take many good partitionings for Quicksort to work fairly well.

²The worst insult that I can think of for a sorting algorithm.

Quicksort's best case occurs when findpivot always breaks the array into two equal halves. Quicksort repeatedly splits the array into smaller partitions, as shown in Figure 7.14. In the best case, the result will be log n levels of partitions, with the top level having one array of size n, the second level two arrays of size n/2, the next with four arrays of size n/4, and so on.
Thus, at each level, all partition steps for that level do a total of n work, for an overall cost of n log n work when Quicksort finds perfect pivots.

Quicksort's average-case behavior falls somewhere between the extremes of worst and best case. Average-case analysis considers the cost for all possible arrangements of input, summing the costs and dividing by the number of cases. We make one reasonable simplifying assumption: At each partition step, the pivot is equally likely to end in any position in the (sorted) array. In other words, the pivot is equally likely to break an array into partitions of sizes 0 and n−1, or 1 and n−2, and so on. Given this assumption, the average-case cost is computed from the following equation:

    T(n) = cn + (1/n) Σ_{k=0}^{n−1} [T(k) + T(n−1−k)],    T(0) = T(1) = c.

This equation is in the form of a recurrence relation. Recurrence relations are discussed in Chapters 2 and 14, and this one is solved in Section 14.2.4. This equation says that there is one chance in n that the pivot breaks the array into subarrays of size 0 and n−1, one chance in n that the pivot breaks the array into subarrays of size 1 and n−2, and so on. The expression "T(k) + T(n−1−k)" is the cost for the two recursive calls to Quicksort on two arrays of size k and n−1−k. The initial cn term is the cost of doing the findpivot and partition steps, for some constant c. The closed-form solution to this recurrence relation is Θ(n log n). Thus, Quicksort has average-case cost Θ(n log n).

This is an unusual situation, in which the average-case cost and the worst-case cost have asymptotically different growth rates. Consider what "average case" actually means. We compute an average cost for inputs of size n by summing up for every possible input of size n the product of the running time cost of that input times the probability that that input will occur. To simplify things, we assumed that every permutation is equally likely to occur.
Thus, finding the average means summing up the cost for every permutation and dividing by the number of inputs (n!). We know that some of these n! inputs cost O(n²). But the sum of all the permutation costs has to be (n!)(O(n log n)). Given the extremely high cost of the worst inputs, there must be very few of them. In fact, there cannot be a constant fraction of the inputs with cost O(n²). Even, say, 1% of the inputs with cost O(n²) would lead to an average cost of O(n²). Thus, as n grows, the fraction of inputs with high cost must be going toward a limit of zero. We can conclude that Quicksort will have good behavior if we can avoid those very few bad input permutations.

The running time for Quicksort can be improved (by a constant factor), and much study has gone into optimizing this algorithm. The most obvious place for improvement is the findpivot function. Quicksort's worst case arises when the pivot does a poor job of splitting the array into equal size subarrays. If we are willing to do more work searching for a better pivot, the effects of a bad pivot can be decreased or even eliminated. One good choice is to use the "median of three" algorithm, which uses as a pivot the middle of three randomly selected values. Using a random number generator to choose the positions is relatively expensive, so a common compromise is to look at the first, middle, and last positions of the current subarray. However, our simple findpivot function that takes the middle value as its pivot has the virtue of making it highly unlikely to get a bad input by chance, and it is quite cheap to implement. This is in sharp contrast to selecting the first or last element as the pivot, which would yield bad performance for many permutations that are nearly sorted or nearly reverse sorted.

A significant improvement can be gained by recognizing that Quicksort is relatively slow when n is small.
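Before moving on, the median-of-three selection just described (first, middle, and last positions of the subarray) can be sketched as follows. This variant is our own sketch for plain ints; the text's version would compare with Comp::prior:

```cpp
// Return the index of the median of A[i], A[mid], and A[j], where
// mid = (i+j)/2. A sketch of the "median of three" pivot choice using
// the first, middle, and last positions of the current subarray.
inline int findpivot_med3(int A[], int i, int j) {
  int mid = (i + j) / 2;
  int a = A[i], b = A[mid], c = A[j];
  if ((a <= b && b <= c) || (c <= b && b <= a)) return mid; // Middle is median
  if ((b <= a && a <= c) || (c <= a && a <= b)) return i;   // First is median
  return j;                                                 // Last is median
}
```

Like the plain middle-position findpivot, this defeats the sorted and reverse-sorted worst cases for first-key pivots, at the cost of a few extra comparisons per call.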
Quicksort's cost on small arrays might not seem to be relevant if most of the time we sort large arrays, nor should it matter how long Quicksort takes in the rare instance when a small array is sorted because it will be fast anyway. But you should notice that Quicksort itself sorts many, many small arrays! This happens as a natural by-product of the divide and conquer approach.

A simple improvement might then be to replace Quicksort with a faster sort for small numbers, say Insertion Sort or Selection Sort. However, there is an even better, and still simpler, optimization. When Quicksort partitions are below a certain size, do nothing! The values within that partition will be out of order. However, we do know that all values in the array to the left of the partition are smaller than all values in the partition. All values in the array to the right of the partition are greater than all values in the partition. Thus, even if Quicksort only gets the values to "nearly" the right locations, the array will be close to sorted. This is an ideal situation in which to take advantage of the best-case performance of Insertion Sort. The final step is a single call to Insertion Sort to process the entire array, putting the elements into final sorted order. Empirical testing shows that the subarrays should be left unordered whenever they get down to nine or fewer elements.

The last speedup to be considered reduces the cost of making recursive calls. Quicksort is inherently recursive, because each Quicksort operation must sort two sublists. Thus, there is no simple way to turn Quicksort into an iterative algorithm. However, Quicksort can be implemented using a stack to imitate recursion, as the amount of information that must be stored is small. We need not store copies of a subarray, only the subarray bounds. Furthermore, the stack depth can be kept small if care is taken on the order in which Quicksort's recursive calls are executed.
We can also place the code for findpivot and partition inline to eliminate the remaining function calls. Note however that by not processing sublists of size nine or less as suggested above, about three quarters of the function calls will already have been eliminated. Thus, eliminating the remaining function calls will yield only a modest speedup.

7.6 Heapsort

Our discussion of Quicksort began by considering the practicality of using a binary search tree for sorting. The BST requires more space than the other sorting methods and will be slower than Quicksort or Mergesort due to the relative expense of inserting values into the tree. There is also the possibility that the BST might be unbalanced, leading to a Θ(n²) worst-case running time. Subtree balance in the BST is closely related to Quicksort's partition step. Quicksort's pivot serves roughly the same purpose as the BST root value in that the left partition (subtree) stores values less than the pivot (root) value, while the right partition (subtree) stores values greater than or equal to the pivot (root).

A good sorting algorithm can be devised based on a tree structure more suited to the purpose. In particular, we would like the tree to be balanced, space efficient, and fast. The algorithm should take advantage of the fact that sorting is a special-purpose application in that all of the values to be stored are available at the start. This means that we do not necessarily need to insert one value at a time into the tree structure.

Heapsort is based on the heap data structure presented in Section 5.5. Heapsort has all of the advantages just listed. The complete binary tree is balanced, its array representation is space efficient, and we can load all values into the tree at once, taking advantage of the efficient buildheap function. The asymptotic performance of Heapsort is Θ(n log n) in the best, average, and worst cases.
It is not as fast as Quicksort in the average case (by a constant factor), but Heapsort has special properties that will make it particularly useful when sorting data sets too large to fit in main memory, as discussed in Chapter 8.

A sorting algorithm based on max-heaps is quite straightforward. First we use the heap building algorithm of Section 5.5 to convert the array into max-heap order. Then we repeatedly remove the maximum value from the heap, restoring the heap property each time that we do so, until the heap is empty. Note that each time we remove the maximum element from the heap, it is placed at the end of the array. Assume the n elements are stored in array positions 0 through n−1. After removing the maximum value from the heap and readjusting, the maximum value will now be placed in position n−1 of the array. The heap is now considered to be of size n−1. Removing the new maximum (root) value places the second largest value in position n−2 of the array. After removing each of the remaining values in turn, the array will be properly sorted from least to greatest. This is why Heapsort uses a max-heap rather than a min-heap as might have been expected. Figure 7.15 illustrates Heapsort. The complete C++ implementation is as follows:

    template <typename E, typename Comp>
    void heapsort(E A[], int n) { // Heapsort
      E maxval;
      heap<E,Comp> H(A, n, n);    // Build the heap
      for (int i=0; i<n; i++)     // Now sort
        maxval = H.removefirst(); // Removal places max at end of heap
    }

    template <typename E, typename getKey>
    void binsort(E A[], int n) {
      List<E> B[MaxKeyValue];
      E item;
      for (int i=0; i<n; i++) B[getKey::key(A[i])].append(A[i]);
      for (int i=0, j=0; i<MaxKeyValue; i++)
        for (B[i].moveToStart(); B[i].getValue(item); B[i].next())
          A[j++] = item;
    }

    template <typename E, typename getKey>
    void radix(E A[], E B[], int n, int k, int r, int cnt[]) {
      // cnt[i] stores number of records in bin[i]
      int j;
      for (int i=0, rtoi=1; i<k; i++, rtoi*=r) { // For k digits
        for (j=0; j<r; j++) cnt[j] = 0;          // Initialize cnt
        // Count the number of records for each bin on this pass
        for (j=0; j<n; j++) cnt[(getKey::key(A[j])/rtoi)%r]++;
        // Index B: cnt[j] will be index for last slot of bin j
        for (j=1; j<r; j++) cnt[j] = cnt[j-1] + cnt[j];
        // Put records into bins, working from bottom of each bin
        for (j=n-1; j>=0; j--) B[--cnt[(getKey::key(A[j])/rtoi)%r]] = A[j];
        for (j=0; j<n; j++) A[j] = B[j];         // Copy B back to A
      }
    }

    for (int j=n-1; j>i; j--)

Consider the effect of replacing this with the following statement:

    for (int j=n-1; j>0; j--)

Would the new implementation work correctly? Would the change affect the asymptotic complexity of the algorithm? How would the change affect the running time of the algorithm?
7.4 When implementing Insertion Sort, a binary search could be used to locate the position within the first i−1 elements of the array into which element i should be inserted. How would this affect the number of comparisons required? How would using such a binary search affect the asymptotic running time for Insertion Sort?

7.5 Figure 7.5 shows the best-case number of swaps for Selection Sort as Θ(n). This is because the algorithm does not check to see if the ith record is already in the ith position; that is, it might perform unnecessary swaps.
(a) Modify the algorithm so that it does not make unnecessary swaps.
(b) What is your prediction regarding whether this modification actually improves the running time?
(c) Write two programs to compare the actual running times of the original Selection Sort and the modified algorithm. Which one is actually faster?

7.6 Recall that a sorting algorithm is said to be stable if the original ordering for duplicate keys is preserved. Of the sorting algorithms Insertion Sort, Bubble Sort, Selection Sort, Shellsort, Mergesort, Quicksort, Heapsort, Binsort, and Radix Sort, which of these are stable, and which are not? For each one, describe either why it is or is not stable. If a minor change to the implementation would make it stable, describe the change.

7.7 Recall that a sorting algorithm is said to be stable if the original ordering for duplicate keys is preserved. We can make any algorithm stable if we alter the input keys so that (potentially) duplicate key values are made unique in a way that the first occurrence of the original duplicate value is less than the second occurrence, which in turn is less than the third, and so on. In the worst case, it is possible that all n input records have the same key value. Give an
algorithm to modify the key values such that every modified key value is unique, the resulting key values give the same sort order as the original keys, the result is stable (in that the duplicate original key values remain in their original order), and the process of altering the keys is done in linear time using only a constant amount of additional space.

7.8 The discussion of Quicksort in Section 7.5 described using a stack instead of recursion to reduce the number of function calls made.
(a) How deep can the stack get in the worst case?
(b) Quicksort makes two recursive calls. The algorithm could be changed to make these two calls in a specific order. In what order should the two calls be made, and how does this affect how deep the stack can become?

7.9 Give a permutation for the values 0 through 7 that will cause Quicksort (as implemented in Section 7.5) to have its worst case behavior.

7.10 Assume L is an array, length(L) returns the number of records in the array, and qsort(L, i, j) sorts the records of L from i to j (leaving the records sorted in L) using the Quicksort algorithm. What is the average-case time complexity for each of the following code fragments?

    (a) for (i=0; i 1) { SPLITk(L, sub); // SPLITk places sublists into sub for (i=0; i

L[⌈√n⌉], we will step to the right by √n until we reach a value in L that is greater than K. We are now within √n positions of K. Assume (for now) that it takes a constant number of comparisons to bracket K within a sublist of size √n. We then take this sublist and repeat the process recursively. That is, at the next level we compute an interpolation to start somewhere in the subarray. We then step to the left or right (as appropriate) by steps of size √√n.

What is the cost for QBS? Note that √(2^(log n)) = 2^((log n)/2), and we will be repeatedly taking square roots of the current sublist size until we find the item that we are looking for.
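The probing idea (start from an estimated position, then jump by √n-sized steps and finish within a √n-sized block) can be illustrated with a bare-bones sketch. This simplified version is our own; it omits the interpolation step and the recursion of full QBS, and just shows the √n jumping followed by a scan of the bracketed block:

```cpp
#include <cmath>

// Jump search in a sorted array A[0..n-1]: step forward by sqrt(n)
// until the probed value is >= K, then scan the sqrt(n)-sized block
// that must contain K if it is present. Returns K's index, or -1.
int jumpsearch(const int A[], int n, int K) {
  int step = (int)std::sqrt((double)n);
  if (step < 1) step = 1;
  int lo = 0;
  while (lo < n && A[lo] < K) lo += step;  // Jump by sqrt(n)
  int start = lo - step + 1;               // Back up to the bracketed block
  if (start < 0) start = 0;
  int end = (lo < n) ? lo : n - 1;
  for (int i = start; i <= end; i++)       // Scan within the block
    if (A[i] == K) return i;
  return -1;
}
```

This makes O(√n) probes, far worse than QBS's O(log log n); the point here is only the bracketing step, which QBS then applies recursively to sublists of size √n, √√n, and so on.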
Because n = 2^(log n), and we can cut log n in half only log log n times, the cost is Θ(log log n) if the number of probes on jump search is constant. Say that the number of comparisons needed is i, in which case the cost is i (since we have to do i comparisons). If Pi is the probability of needing exactly i probes, then

∑_{i=1}^{√n} i · P(need exactly i probes) = 1·P1 + 2·P2 + 3·P3 + ... + √n·P√n

316 Chap. 9 Searching

We now show that this is the same as

∑_{i=1}^{√n} P(need at least i probes)
= 1 + (1 − P1) + (1 − P1 − P2) + ... + P√n
= (P1 + ... + P√n) + (P2 + ... + P√n) + (P3 + ... + P√n) + ...
= 1·P1 + 2·P2 + 3·P3 + ... + √n·P√n

We require at least two probes to set the bounds, so the cost is

2 + ∑_{i=3}^{√n} P(need at least i probes).

We now take advantage of a useful fact known as Čebyšev's Inequality. Čebyšev's inequality states that P(need exactly i probes), or Pi, is

Pi ≤ p(1 − p)n / ((i − 2)²n) ≤ 1 / (4(i − 2)²)

because p(1 − p) ≤ 1/4 for any probability p. This assumes uniformly distributed data. Thus, the expected number of probes is

2 + ∑_{i=3}^{√n} 1/(4(i − 2)²) < 2 + (1/4) ∑_{i=1}^{∞} 1/i² = 2 + (1/4)(π²/6) ≈ 2.4112

Is QBS better than binary search? Theoretically yes, because O(log log n) grows slower than O(log n). However, we have a situation here which illustrates the limits of the model of asymptotic complexity in some practical situations. Yes, c1 log n does grow faster than c2 log log n. In fact, it is exponentially faster! But even so, for practical input sizes, the absolute cost difference is fairly small. Thus, the constant factors might play a role. First we compare lg lg n to lg n.

n      lg n   lg lg n   Difference
16     4      2         2
256    8      3         2.7
2^16   16     4         4
2^32   32     5         6.4

It is not always practical to reduce an algorithm's growth rate. There is a "practicality window" for every problem, in that we have a practical limit to how big an input we wish to solve for.
If our problem size never grows too big, it might not matter if we can reduce the cost by an extra log factor, because the constant factors in the two algorithms might differ by more than the log of the log of the input size. For our two algorithms, let us look further and check the actual number of comparisons used. For binary search, we need about lg n − 1 total comparisons. Quadratic binary search requires about 2.4 lg lg n comparisons. If we incorporate this observation into our table, we get a different picture about the relative differences.

n      lg n − 1   2.4 lg lg n   Difference
16     3          4.8           worse
256    7          7.2           ≈ same
64K    15         9.6           1.6
2^32   31         12            2.6

But we still are not done. This is only a count of raw comparisons. Binary search is inherently much simpler than QBS, because binary search only needs to calculate the midpoint position of the array before each comparison, while quadratic binary search must calculate an interpolation point, which is more expensive. So the constant factors for QBS are even higher.

Not only are the constant factors worse on average, but QBS is far more dependent than binary search on good data distribution to perform well. For example, imagine that you are searching a telephone directory for the name "Young." Normally you would look near the back of the book. If you found a name beginning with 'Z,' you might look just a little ways toward the front. If the next name you find also begins with 'Z,' you would look a little further toward the front. If this particular telephone directory were unusual in that half of the entries begin with 'Z,' then you would need to move toward the front many times, each time eliminating relatively few records from the search. In the extreme, the performance of interpolation search might not be much better than sequential search if the distribution of key values is badly calculated. While it turns out that QBS is not a practical algorithm, this is not a typical situation.
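The comparison counts in this table are easy to reproduce. A minimal sketch (the helper names here are mine, not the book's):

```cpp
#include <cassert>
#include <cmath>

// Approximate comparison counts discussed above: about lg(n) - 1 for
// binary search, and about 2.4 lg lg n for quadratic binary search.
double binaryComparisons(double n) { return std::log2(n) - 1.0; }
double qbsComparisons(double n) { return 2.4 * std::log2(std::log2(n)); }
```

For n = 2^16 these give 15 and 9.6, matching the table above.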
Fortunately, algorithm growth rates are usually well behaved, so that asymptotic algorithm analysis nearly always gives us a practical indication of which of two algorithms is better.

9.2 Self-Organizing Lists

While ordering of lists is most commonly done by key value, this is not the only viable option. Another approach to organizing lists to speed search is to order the records by expected frequency of access. While the benefits might not be as great as when organized by key value, the cost to organize (at least approximately) by frequency of access can be much cheaper, and thus can speed up sequential search in some situations.

Assume that we know, for each key ki, the probability pi that the record with key ki will be requested. Assume also that the list is ordered so that the most frequently requested record is first, then the next most frequently requested record, and so on. Search in the list will be done sequentially, beginning with the first position. Over the course of many searches, the expected number of comparisons required for one search is

Cn = 1·p0 + 2·p1 + ... + n·p(n−1).

In other words, the cost to access the record in L[0] is 1 (because one key value is looked at), and the probability of this occurring is p0. The cost to access the record in L[1] is 2 (because we must look at the first and the second records' key values), with probability p1, and so on. For n records, assuming that all searches are for records that actually exist, the probabilities p0 through p(n−1) must sum to one. Certain probability distributions give easily computed results.

Example 9.1 Calculate the expected cost to search a list when each record has equal chance of being accessed (the classic sequential search through an unsorted list). Setting pi = 1/n yields

Cn = ∑_{i=1}^{n} i/n = (n + 1)/2.

This result matches our expectation that half the records will be accessed on average by normal sequential search.
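The formula for Cn translates directly into a few lines of C++. This is a sketch for checking such calculations; expectedCost is a name chosen here, not from the book:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Expected number of comparisons for sequential search, where record i
// (0-based) costs i+1 comparisons and is requested with probability p[i].
double expectedCost(const std::vector<double>& p) {
    double cost = 0.0;
    for (std::size_t i = 0; i < p.size(); i++)
        cost += (i + 1) * p[i];
    return cost;
}
```

With pi = 1/n and n = 99 this returns (n + 1)/2 = 50, as Example 9.1 predicts.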
If the records truly have equal access probabilities, then ordering records by frequency yields no benefit. We saw in Section 9.1 the more general case where we must consider the probability (labeled pn) that the search key does not match that for any record in the array. In that case, in accordance with our general formula, we get

(1 − pn)(n + 1)/2 + pn·n = (n + 1 − n·pn − pn + 2n·pn)/2 = (n + 1 + pn(n − 1))/2.

Thus, (n + 1)/2 ≤ Cn ≤ n, depending on the value of pn. A geometric probability distribution can yield quite different results.

Example 9.2 Calculate the expected cost for searching a list ordered by frequency when the probabilities are defined as

pi = 1/2^(i+1) if 0 ≤ i ≤ n − 2,
pi = 1/2^(n−1) if i = n − 1.

Then,

Cn ≈ ∑_{i=0}^{n−1} (i + 1)/2^(i+1) = ∑_{i=1}^{n} (i/2^i) ≈ 2.

For this example, the expected number of accesses is a constant. This is because the probability of accessing the first record is high (one half), the second is much lower (one quarter) but still much higher than for the third record, and so on. This shows that for some probability distributions, ordering the list by frequency can yield an efficient search technique.

In many search applications, real access patterns follow a rule of thumb called the 80/20 rule. The 80/20 rule says that 80% of the record accesses are to 20% of the records. The values of 80 and 20 are only estimates; every data access pattern has its own values. However, behavior of this nature occurs surprisingly often in practice (which explains the success of caching techniques widely used by web browsers for speeding access to web pages, and by disk drive and CPU manufacturers for speeding access to data stored in slower memory; see the discussion on buffer pools in Section 8.3). When the 80/20 rule applies, we can expect considerable improvements to search performance from a list ordered by frequency of access over standard sequential search in an unordered list.

Example 9.3 The 80/20 rule is an example of a Zipf distribution.
Naturally occurring distributions often follow a Zipf distribution. Examples include the observed frequency for the use of words in a natural language such as English, and the size of the population for cities (i.e., view the relative proportions for the populations as equivalent to the "frequency of use"). Zipf distributions are related to the Harmonic Series defined in Equation 2.10. Define the Zipf frequency for item i in the distribution for n records as 1/(i·Hn) (see Exercise 9.4). The expected cost for the series whose members follow this Zipf distribution will be

Cn = ∑_{i=1}^{n} i/(i·Hn) = n/Hn ≈ n/log_e n.

When a frequency distribution follows the 80/20 rule, the average search looks at about 10-15% of the records in a list ordered by frequency.

This is potentially a useful observation: typical "real-life" distributions of record accesses, if the records were ordered by frequency, would require that we visit on average only 10-15% of the list when doing sequential search. This means that if we had an application that used sequential search, and we wanted to make it go a bit faster (by a constant amount), we could do so without a major rewrite to the system to implement something like a search tree. But that is only true if there is an easy way to (at least approximately) order the records by frequency.

In most applications, we have no means of knowing in advance the frequencies of access for the data records. To complicate matters further, certain records might be accessed frequently for a brief period of time, and then rarely thereafter. Thus, the probability of access for records might change over time (in most database systems, this is to be expected). Self-organizing lists seek to solve both of these problems.

Self-organizing lists modify the order of records within the list based on the actual pattern of record access. Self-organizing lists use a heuristic for deciding how to reorder the list.
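The expected costs claimed in Examples 9.2 and 9.3 can be checked numerically. A sketch (the function names are mine):

```cpp
#include <cassert>
#include <cmath>

// Expected sequential-search cost under the geometric distribution of
// Example 9.2: p_i = 1/2^(i+1) for 0 <= i <= n-2, and p_{n-1} = 1/2^(n-1).
double geometricCost(int n) {
    double cost = 0.0;
    for (int i = 0; i <= n - 2; i++)
        cost += (i + 1) / std::pow(2.0, i + 1);
    return cost + n / std::pow(2.0, n - 1);   // cost n for the last record
}

// Expected cost under the Zipf distribution p_i = 1/((i+1) * H_n);
// the sum collapses to n / H_n.
double zipfCost(int n) {
    double Hn = 0.0;
    for (int i = 1; i <= n; i++) Hn += 1.0 / i;
    return n / Hn;
}
```

geometricCost approaches the constant 2 as n grows. For n = 1000, zipfCost is about 134, that is, a search visits roughly 13% of the list, consistent with the 10-15% figure quoted above.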
These heuristics are similar to the rules for managing buffer pools (see Section 8.3). In fact, a buffer pool is a form of self-organizing list. Ordering the buffer pool by expected frequency of access is a good strategy, because typically we must search the contents of the buffers to determine if the desired information is already in main memory. When ordered by frequency of access, the buffer at the end of the list will be the one most appropriate for reuse when a new page of information must be read. Below are three traditional heuristics for managing self-organizing lists:

1. The most obvious way to keep a list ordered by frequency would be to store a count of accesses to each record and always maintain records in this order. This method will be referred to as count. Count is similar to the least frequently used buffer replacement strategy. Whenever a record is accessed, it might move toward the front of the list if its number of accesses becomes greater than a record preceding it. Thus, count will store the records in the order of frequency that has actually occurred so far. Besides requiring space for the access counts, count does not react well to changing frequency of access over time. Once a record has been accessed a large number of times under the frequency count system, it will remain near the front of the list regardless of further access history.

2. Bring a record to the front of the list when it is found, pushing all the other records back one position. This is analogous to the least recently used buffer replacement strategy and is called move-to-front. This heuristic is easy to implement if the records are stored using a linked list. When records are stored in an array, bringing a record forward from near the end of the array will result in a large number of records (slightly) changing position. Move-to-front's cost is bounded in the sense that it requires at most twice the number of accesses required by the optimal static ordering for n records when at least n searches are performed. In other words, if we had known the series of (at least n) searches in advance and had stored the records in order of frequency so as to minimize the total cost for these accesses, this cost would be at least half the cost required by the move-to-front heuristic. (This will be proved using amortized analysis in Section 14.3.) Finally, move-to-front responds well to local changes in frequency of access, in that if a record is frequently accessed for a brief period of time it will be near the front of the list during that period of access. Move-to-front does poorly when the records are processed in sequential order, especially if that sequential order is then repeated multiple times.

3. Swap any record found with the record immediately preceding it in the list. This heuristic is called transpose. Transpose is good for list implementations based on either linked lists or arrays. Frequently used records will, over time, move to the front of the list. Records that were once frequently accessed but are no longer used will slowly drift toward the back. Thus, it appears to have good properties with respect to changing frequency of access. Unfortunately, there are some pathological sequences of access that can make transpose perform poorly. Consider the case where the last record of the list (call it X) is accessed. This record is then swapped with the next-to-last record (call it Y), making Y the last record. If Y is now accessed, it swaps with X. A repeated series of accesses alternating between X and Y will continually search to the end of the list, because neither record will ever make progress toward the front. However, such pathological cases are unusual in practice. A variation on transpose would be to move the accessed record forward in the list by some fixed number of steps.
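The move-to-front and transpose heuristics are easy to simulate. In the following sketch, the self-organizing list is simply a string of one-character keys, and the cost of an access is counted as the 1-based position of the record found:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Process the accesses against list, reorganizing with move-to-front.
// Returns the total number of comparisons; list is modified in place.
int moveToFront(std::string& list, const std::string& accesses) {
    int total = 0;
    for (char key : accesses) {
        int pos = (int)list.find(key);
        total += pos + 1;                    // comparisons to reach the key
        list.erase(list.begin() + pos);
        list.insert(list.begin(), key);      // bring record to the front
    }
    return total;
}

// Same, but reorganize with transpose: swap the accessed record with
// its immediate predecessor (if it has one).
int transpose(std::string& list, const std::string& accesses) {
    int total = 0;
    for (char key : accesses) {
        int pos = (int)list.find(key);
        total += pos + 1;
        if (pos > 0) std::swap(list[pos], list[pos - 1]);
    }
    return total;
}
```

Running these on the scenario of Example 9.4 below (initial list ABCDEFGH, access pattern FDFGEGFADFGE) reproduces the totals of 54 and 62 comparisons reported there.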
Example 9.4 Assume that we have eight records, with key values A to H, and that they are initially placed in alphabetical order. Now, consider the result of applying the following access pattern: F D F G E G F A D F G E. Assume that when a record's frequency count goes up, it moves forward in the list to become the last record with that value for its frequency count. After the first two accesses, F will be the first record and D will be the second. The final list resulting from these accesses will be F G D E A B C H, and the total cost for the twelve accesses will be 45 comparisons.

If the list is organized by the move-to-front heuristic, then the final list will be E G F D A B C H, and the total number of comparisons required is 54. Finally, if the list is organized by the transpose heuristic, then the final list will be A B F D G E C H, and the total number of comparisons required is 62.

While self-organizing lists do not generally perform as well as search trees or a sorted list, both of which require O(log n) search time, there are many situations in which self-organizing lists prove a valuable tool. Obviously they have an advantage over sorted lists in that they need not be sorted. This means that the cost to insert a new record is low, which could more than make up for the higher search cost when insertions are frequent. Self-organizing lists are simpler to implement than search trees and are likely to be more efficient for small lists. Nor do they require additional space. Finally, in the case of an application where sequential search is "almost" fast enough, changing an unsorted list to a self-organizing list might speed the application enough at a minor cost in additional code.

As an example of applying self-organizing lists, consider an algorithm for compressing and transmitting messages. The list is self-organized by the move-to-front rule. Transmission is in the form of words and numbers, by the following rules: 1.
If the word has been seen before, transmit the current position of the word in the list. Move the word to the front of the list.

2. If the word is seen for the first time, transmit the word. Place the word at the front of the list.

Both the sender and the receiver keep track of the position of words in the list in the same way (using the move-to-front rule), so they agree on the meaning of the numbers that encode repeated occurrences of words. Consider the following example message to be transmitted (for simplicity, ignore case in letters).

The car on the left hit the car I left.

The first three words have not been seen before, so they must be sent as full words. The fourth word is the second appearance of "the," which at this point is the third word in the list. Thus, we only need to transmit the position value "3." The next two words have not yet been seen, so must be sent as full words. The seventh word is the third appearance of "the," which coincidentally is again in the third position. The eighth word is the second appearance of "car," which is now in the fifth position of the list. "I" is a new word, and the last word "left" is now in the fifth position. Thus the entire transmission would be

The car on 3 left hit 3 5 I 5.

Figure 9.1 The bit array for the set of primes in the range 0 to 15. The bit at position i is set to 1 if and only if i is prime.

This approach to compression is similar in spirit to Ziv-Lempel coding, which is a class of coding algorithms commonly used in file compression utilities. Ziv-Lempel coding replaces repeated occurrences of strings with a pointer to the location in the file of the first occurrence of the string. The codes are stored in a self-organizing list in order to speed up the time required to search for a string that has previously been seen.
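A sketch of the sender's side of this scheme (mtfEncode is a name invented here; it assumes words are separated by spaces and ignores case, as the example does):

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Encode a message with the move-to-front scheme described above:
// a word seen before is sent as its current 1-based list position;
// a new word is sent verbatim. Either way it then moves to the front.
std::string mtfEncode(const std::string& message) {
    std::vector<std::string> list;           // self-organizing word list
    std::istringstream in(message);
    std::string word, out;
    while (in >> word) {
        std::string key = word;
        for (char& c : key)                  // ignore case in letters
            c = (char)std::tolower((unsigned char)c);
        auto it = std::find(list.begin(), list.end(), key);
        if (!out.empty()) out += ' ';
        if (it == list.end()) {
            out += word;                                   // new: send the word
        } else {
            out += std::to_string(it - list.begin() + 1);  // repeat: send position
            list.erase(it);
        }
        list.insert(list.begin(), key);      // move to front
    }
    return out;
}
```

Applied to the message above (without the final period), mtfEncode produces exactly the transmission shown.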
9.3 Bit Vectors for Representing Sets

Determining whether a value is a member of a particular set is a special case of searching for keys in a sequence of records. Thus, any of the search methods discussed in this book can be used to check for set membership. However, we can also take advantage of the restricted circumstances imposed by this problem to develop another representation.

In the case where the set values fall within a limited range, we can represent the set using a bit array with a bit position allocated for each potential member. Those members actually in the set store a value of 1 in their corresponding bit; those members not in the set store a value of 0 in their corresponding bit. For example, consider the set of primes between 0 and 15. Figure 9.1 shows the corresponding bit array. To determine if a particular value is prime, we simply check the corresponding bit. This representation scheme is called a bit vector or a bitmap. The mark array used in several of the graph algorithms of Chapter 11 is an example of such a set representation.

If the set fits within a single computer word, then set union, intersection, and difference can be performed by logical bit-wise operations. The union of sets A and B is the bit-wise OR function (whose symbol is | in C++). The intersection of sets A and B is the bit-wise AND function (whose symbol is & in C++). For example, if we would like to compute the set of numbers between 0 and 15 that are both prime and odd numbers, we need only compute the expression

0011010100010100 & 0101010101010101.

The set difference A − B can be implemented in C++ using the expression A&~B (~ is the symbol for bit-wise negation). For larger sets that do not fit into a single computer word, the equivalent operations can be performed in turn on the series of words making up the entire bit vector.

This method of computing sets from bit vectors is sometimes applied to document retrieval.
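The prime/odd computation above can be written out directly. A sketch using a 16-bit word (the constant names are mine):

```cpp
#include <cassert>
#include <cstdint>

// Sets over {0,...,15} as bit vectors: bit i is 1 iff i is in the set.
const std::uint16_t PRIMES = (1u << 2) | (1u << 3) | (1u << 5) |
                             (1u << 7) | (1u << 11) | (1u << 13);
const std::uint16_t ODDS = 0xAAAA;   // bits 1, 3, 5, ..., 15

bool member(std::uint16_t set, int i) { return (set >> i) & 1; }
```

PRIMES & ODDS is then the set of odd primes, and the set difference PRIMES & ~ODDS leaves only the even prime, 2.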
Consider the problem of picking from a collection of documents those few which contain selected keywords. For each keyword, the document retrieval system stores a bit vector with one bit for each document. If the user wants to know which documents contain a certain three keywords, the corresponding three bit vectors are AND'ed together. Those bit positions resulting in a value of 1 correspond to the desired documents. Alternatively, a bit vector can be stored for each document to indicate those keywords appearing in the document. Such an organization is called a signature file. The signatures can be manipulated to find documents with desired combinations of keywords.

9.4 Hashing

This section presents a completely different approach to searching arrays: by direct access based on key value. The process of finding a record using some computation to map its key value to a position in the array is called hashing. Most hashing schemes place records in the array in whatever order satisfies the needs of the address calculation, thus the records are not ordered by value or frequency. The function that maps key values to positions is called a hash function and will be denoted by h. The array that holds the records is called the hash table and will be denoted by HT. A position in the hash table is also known as a slot. The number of slots in hash table HT will be denoted by the variable M, with slots numbered from 0 to M − 1. The goal for a hashing system is to arrange things such that, for any key value K and some hash function h, i = h(K) is a slot in the table such that 0 ≤ h(K) < M.

// Insert e into hash table HT
template <typename Key, typename E>
void hashdict<Key, E>::hashInsert(const Key& k, const E& e) {
  int home;                       // Home position for e
  int pos = home = h(k);          // Init probe sequence
  for (int i=1; EMPTYKEY != (HT[pos]).key(); i++) {
    pos = (home + p(k, i)) % M;   // probe
    Assert(k != (HT[pos]).key(), "Duplicates not allowed");
  }
  KVpair<Key, E> temp(k, e);
  HT[pos] = temp;
}

Figure 9.6 Insertion method for a dictionary implemented by a hash table.
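As a concrete instance of the loop in Figure 9.6, here is a self-contained sketch with the probe function fixed to p(k, i) = i (the linear probing discussed below) and hash function h(K) = K mod 10 on a ten-slot table; EMPTYKEY = −1 is an assumption of this sketch:

```cpp
#include <cassert>
#include <vector>

const int EMPTYKEY = -1;   // assumed sentinel for an empty slot

// Insert key k into HT, resolving collisions with the probe p(k, i) = i.
void hashInsert(std::vector<int>& HT, int k) {
    int M = (int)HT.size();
    int home = k % M;               // home position for k
    int pos = home;
    for (int i = 1; HT[pos] != EMPTYKEY; i++)
        pos = (home + i) % M;       // next slot on the probe sequence
    HT[pos] = k;
}
```

Inserting 1001, 9050, 9877, 2037, and 1059 with this function reproduces the layout of Figure 9.8 later in this section: 2037 collides with 9877 at slot 7 and probes to slot 8.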
The collision resolution policy visits a sequence of slots that could potentially hold the record. The first slot in the sequence will be the home position for the key. If the home position is occupied, then the collision resolution policy goes to the next slot in the sequence. If this is occupied as well, then another slot must be found, and so on. This sequence of slots is known as the probe sequence, and it is generated by some probe function that we will call p. The insert function is shown in Figure 9.6.

Method hashInsert first checks to see if the home slot for the key is empty. If the home slot is occupied, then we use the probe function, p(k, i), to locate a free slot in the table. Function p has two parameters, the key k and a count i for where in the probe sequence we wish to be. That is, to get the first position in the probe sequence after the home slot for key K, we call p(K, 1). For the next slot in the probe sequence, call p(K, 2). Note that the probe function returns an offset from the original home position, rather than a slot in the hash table. Thus, the for loop in hashInsert is computing positions in the table at each iteration by adding the value returned from the probe function to the home position. The ith call to p returns the ith offset to be used.

Searching in a hash table follows the same probe sequence that was followed when inserting records. In this way, a record not in its home position can be recovered. A C++ implementation for the search procedure is shown in Figure 9.7.

The insert and search routines assume that at least one slot on the probe sequence of every key will be empty. Otherwise, they will continue in an infinite loop on unsuccessful searches. Thus, the dictionary should keep a count of the number of records stored, and refuse to insert into a table that has only one free slot.

The discussion on bucket hashing presented a simple method of collision resolution. If the home position for the record is occupied, then move down the bucket until a free slot is found.
This is an example of a technique for collision resolution known as linear probing. The probe function for simple linear probing is p(K, i) = i.

// Search for the record with Key K
template <typename Key, typename E>
E hashdict<Key, E>::hashSearch(const Key& k) const {
  int home;                       // Home position for k
  int pos = home = h(k);          // Initial position is home slot
  for (int i = 1; (k != (HT[pos]).key()) &&
                  (EMPTYKEY != (HT[pos]).key()); i++)
    pos = (home + p(k, i)) % M;   // Next on probe sequence
  if (k == (HT[pos]).key())       // Found it
    return (HT[pos]).value();
  else return NULL;               // k not in hash table
}

Figure 9.7 Search method for a dictionary implemented by a hash table.

That is, the ith offset on the probe sequence is just i, meaning that the ith step is simply to move down i slots in the table. Once the bottom of the table is reached, the probe sequence wraps around to the beginning of the table. Linear probing has the virtue that all slots in the table will be candidates for inserting a new record before the probe sequence returns to the home position.

While linear probing is probably the first idea that comes to mind when considering collision resolution policies, it is not the only one possible. Probe function p allows us many options for how to do collision resolution. In fact, linear probing is one of the worst collision resolution methods. The main problem is illustrated by Figure 9.8. Here, we see a hash table of ten slots used to store four-digit numbers, with hash function h(K) = K mod 10. In Figure 9.8(a), four numbers have been placed in the table, leaving six slots remaining.

The ideal behavior for a collision resolution mechanism is that each empty slot in the table will have equal probability of receiving the next record inserted (assuming that every slot in the table has equal probability of being hashed to initially). In this example, assume that the hash function gives each slot (roughly) equal probability of being the home position for the next key.
However, consider what happens to the next record if its key has its home position at slot 0. Linear probing will send the record to slot 2. The same will happen to records whose home position is at slot 1. A record with home position at slot 2 will remain in slot 2. Thus, the probability is 3/10 that the next record inserted will end up in slot 2. In a similar manner, records hashing to slots 7 or 8 will end up in slot 9. However, only records hashing to slot 3 will be stored in slot 3, yielding one chance in ten of this happening. Likewise, there is only one chance in ten that the next record will be stored in slot 4, one chance in ten for slot 5, and one chance in ten for slot 6. Thus, the resulting probabilities are not equal.

To make matters worse, if the next record ends up in slot 9 (which already has a higher than normal chance of happening), then the following record will end up in slot 2 with probability 6/10. This is illustrated by Figure 9.8(b).

Figure 9.8 Example of problems with linear probing. (a) Four values are inserted in the order 1001, 9050, 9877, and 2037 using hash function h(K) = K mod 10. (b) The value 1059 is added to the hash table.

This tendency of linear probing to cluster items together is known as primary clustering. Small clusters tend to merge into big clusters, making the problem worse. The objection to primary clustering is that it leads to long probe sequences.

Improved Collision Resolution Methods

How can we avoid primary clustering? One possible improvement might be to use linear probing, but to skip slots by a constant c other than 1. This would make the probe function p(K, i) = ci, and so the ith slot in the probe sequence will be (h(K) + ic) mod M. In this way, records with adjacent home positions will not follow the same probe sequence.
For example, if we were to skip by twos, then our offsets from the home slot would be 2, then 4, then 6, and so on.

One quality of a good probe sequence is that it will cycle through all slots in the hash table before returning to the home position. Clearly linear probing (which "skips" slots by one each time) does this. Unfortunately, not all values for c will make this happen. For example, if c = 2 and the table contains an even number of slots, then any key whose home position is in an even slot will have a probe sequence that cycles through only the even slots. Likewise, the probe sequence for a key whose home position is in an odd slot will cycle through the odd slots. Thus, this combination of table size and linear probing constant effectively divides the records into two sets stored in two disjoint sections of the hash table. So long as both sections of the table contain the same number of records, this is not really important. However, just from chance it is likely that one section will become fuller than the other, leading to more collisions and poorer performance for those records. The other section would have fewer records, and thus better performance. But the overall system performance will be degraded, as the additional cost to the side that is more full outweighs the improved performance of the less-full side.

Constant c must be relatively prime to M to generate a linear probing sequence that visits all slots in the table (that is, c and M must share no factors). For a hash table of size M = 10, if c is any one of 1, 3, 7, or 9, then the probe sequence will visit all slots for any key. When M = 11, any value for c between 1 and 10 generates a probe sequence that visits all slots for every key.

Consider the situation where c = 2 and we wish to insert a record with key k1 such that h(k1) = 3. The probe sequence for k1 is 3, 5, 7, 9, and so on.
If another key k2 has home position at slot 5, then its probe sequence will be 5, 7, 9, and so on. The probe sequences of k1 and k2 are linked together in a manner that contributes to clustering. In other words, linear probing with a value of c > 1 does not solve the problem of primary clustering. We would like to find a probe function that does not link keys together in this way. We would prefer that the probe sequence for k1 after the first step on the sequence should not be identical to the probe sequence of k2. Instead, their probe sequences should diverge.

The ideal probe function would select the next position on the probe sequence at random from among the unvisited slots; that is, the probe sequence should be a random permutation of the hash table positions. Unfortunately, we cannot actually select the next position in the probe sequence at random, because then we would not be able to duplicate this same probe sequence when searching for the key. However, we can do something similar called pseudo-random probing. In pseudo-random probing, the ith slot in the probe sequence is (h(K) + r_i) mod M, where r_i is the ith value in a random permutation of the numbers from 1 to M − 1. All insertion and search operations use the same random permutation. The probe function is

p(K, i) = Perm[i − 1],

where Perm is an array of length M − 1 containing a random permutation of the values from 1 to M − 1.

Example 9.9 Consider a table of size M = 101, with Perm[1] = 5, Perm[2] = 2, and Perm[3] = 32. Assume that we have two keys k1 and k2 where h(k1) = 30 and h(k2) = 35. The probe sequence for k1 is 30, then 35, then 32, then 62. The probe sequence for k2 is 35, then 40, then 37, then 67. Thus, while k2 will probe to k1's home position as its second choice, the two keys' probe sequences diverge immediately thereafter.

Another probe function that eliminates primary clustering is called quadratic probing.
Here the probe function is some quadratic function

p(K, i) = c1·i² + c2·i + c3

for some choice of constants c1, c2, and c3. The simplest variation is p(K, i) = i² (i.e., c1 = 1, c2 = 0, and c3 = 0). Then the ith value in the probe sequence would be (h(K) + i²) mod M. Under quadratic probing, two keys with different home positions will have diverging probe sequences.

Example 9.10 Given a hash table of size M = 101, assume for keys k1 and k2 that h(k1) = 30 and h(k2) = 29. The probe sequence for k1 is 30, then 31, then 34, then 39. The probe sequence for k2 is 29, then 30, then 33, then 38. Thus, while k2 will probe to k1's home position as its second choice, the two keys' probe sequences diverge immediately thereafter.

Unfortunately, quadratic probing has the disadvantage that typically not all hash table slots will be on the probe sequence. Using p(K, i) = i² gives particularly inconsistent results. For many hash table sizes, this probe function will cycle through a relatively small number of slots. If all slots on that cycle happen to be full, then the record cannot be inserted at all! For example, if our hash table has three slots, then records that hash to slot 0 can probe only to slots 0 and 1 (that is, the probe sequence will never visit slot 2 in the table). Thus, if slots 0 and 1 are full, then the record cannot be inserted even though the table is not full. A more realistic example is a table with 105 slots. The probe sequence starting from any given slot will only visit 23 other slots in the table. If all 24 of these slots should happen to be full, even if other slots in the table are empty, then the record cannot be inserted because the probe sequence will continually hit only those same 24 slots.

Fortunately, it is possible to get good results from quadratic probing at low cost. The right combination of probe function and table size will visit many slots in the table.
In particular, if the hash table size is a prime number and the probe function is p(K, i) = i^2, then at least half the slots in the table will be visited. Thus, if the table is less than half full, we can be certain that a free slot will be found. Alternatively, if the hash table size is a power of two and the probe function is p(K, i) = (i^2 + i)/2, then every slot in the table will be visited by the probe function.

Both pseudo-random probing and quadratic probing eliminate primary clustering, which is the problem of keys sharing substantial segments of a probe sequence. If two keys hash to the same home position, however, then they will always follow the same probe sequence for every collision resolution method that we have seen so far. The probe sequences generated by pseudo-random and quadratic probing (for example) are entirely a function of the home position, not the original key value. This is because function p ignores its input parameter K for these collision resolution methods. If the hash function generates a cluster at a particular home position, then the cluster remains under pseudo-random and quadratic probing. This problem is called secondary clustering.

To avoid secondary clustering, we need to have the probe sequence make use of the original key value in its decision-making process. A simple technique for doing this is to return to linear probing by a constant step size for the probe function, but to have that constant be determined by a second hash function, h2. Thus, the probe sequence would be of the form p(K, i) = i * h2(K). This method is called double hashing.

Example 9.11 Assume a hash table has size M = 101, and that there are three keys k1, k2, and k3 with h(k1) = 30, h(k2) = 28, h(k3) = 30, h2(k1) = 2, h2(k2) = 5, and h2(k3) = 5. Then, the probe sequence for k1 will be 30, 32, 34, 36, and so on. The probe sequence for k2 will be 28, 33, 38, 43, and so on. The probe sequence for k3 will be 30, 35, 40, 45, and so on.
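Both the coverage claims and the double-hashing sequences above can be checked directly. The two helpers below are illustrative sketches (names and signatures are mine): coverage counts the distinct slots a probe function reaches over a full cycle, and dhProbe computes the ith slot of a double-hashing probe sequence p(K, i) = i * h2(K), with the step h2(K) passed in as in Example 9.11.

```cpp
#include <cassert>
#include <set>

// Count the distinct slots visited by a probe function over a full cycle,
// starting from home position 0 (shifting the home does not change the count).
// quadratic: p(K,i) = i*i;  triangular: p(K,i) = (i*i + i)/2.
int coverage(int M, bool triangular) {
    std::set<int> slots;
    for (int i = 0; i < M; i++) {
        int off = triangular ? (i * i + i) / 2 : i * i;
        slots.insert(off % M);
    }
    return (int)slots.size();
}

// ith slot of a double-hashing probe sequence, given step = h2(K)
int dhProbe(int home, int step, int i, int M) {
    return (home + i * step) % M;
}
```

For a prime table size such as M = 101, quadratic probing reaches exactly (M + 1)/2 slots, matching the "at least half" claim; for a power-of-two table size with the triangular probe function, every slot is reached.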
Thus, none of the keys share substantial portions of the same probe sequence. Of course, if a fourth key k4 has h(k4) = 28 and h2(k4) = 2, then it will follow the same probe sequence as k1. Pseudo-random or quadratic probing can be combined with double hashing to solve this problem.

A good implementation of double hashing should ensure that all of the probe sequence constants are relatively prime to the table size M. This can be achieved easily. One way is to select M to be a prime number, and have h2 return a value in the range 1 <= h2(K) <= M - 1. Another way is to set M = 2^m for some value m and have h2 return an odd value between 1 and 2^m.

Figure 9.9 shows an implementation of the dictionary ADT by means of a hash table. The simplest hash function is used, with collision resolution by linear probing, as the basis for the structure of a hash table implementation. A suggested project at the end of this chapter asks you to improve the implementation with other hash functions and collision resolution policies.

9.4.4 Analysis of Closed Hashing

How efficient is hashing? We can measure hashing performance in terms of the number of record accesses required when performing an operation. The primary operations of concern are insertion, deletion, and search. It is useful to distinguish between successful and unsuccessful searches. Before a record can be deleted, it must be found. Thus, the number of accesses required to delete a record is equivalent to the number required to successfully search for it. To insert a record, an empty slot along the record's probe sequence must be found. This is equivalent to the cost of an unsuccessful search for the record.
// Dictionary implemented with a hash table
template <typename Key, typename E>
class hashdict : public Dictionary<Key,E> {
private:
  KVpair<Key,E>* HT;  // The hash table
  int M;              // Size of HT
  int currcnt;        // The current number of elements in HT
  Key EMPTYKEY;       // User-supplied key value for an empty slot

  int p(Key K, int i) const // Probe using linear probing
    { return i; }

  int h(int x) const { return x % M; }  // Poor hash function
  int h(char* x) const {                // Hash function for character keys
    int i, sum;
    for (sum=0, i=0; x[i] != '\0'; i++) sum += (int) x[i];
    return sum % M;
  }

  void hashInsert(const Key&, const E&);
  E hashSearch(const Key&) const;

public:
  hashdict(int sz, Key k) {       // "k" defines an empty slot
    M = sz; EMPTYKEY = k; currcnt = 0;
    HT = new KVpair<Key,E>[sz];   // Make HT of size sz
    for (int i=0; i<M; i++)
      (HT[i]).setKey(EMPTYKEY);   // Initialize HT with empty slots
  }
  // ... (remaining members are not included in this excerpt)
};

template <typename Key, typename E>
class TTNode {        // 2-3 tree node structure
public:
  E lval;             // The node's left record
  Key lkey;           // Left record's key
  E rval;             // The node's right record
  Key rkey;           // Right record's key
  TTNode* left;       // Pointer to left child
  TTNode* center;     // Pointer to middle child
  TTNode* right;      // Pointer to right child

  TTNode() { center = left = right = NULL; lkey = rkey = EMPTYKEY; }
  TTNode(Key lk, E lv, Key rk, E rv,
         TTNode* p1, TTNode* p2, TTNode* p3) {
    lkey = lk; rkey = rk; lval = lv; rval = rv;
    left = p1; center = p2; right = p3;
  }
  ~TTNode() { }
  bool isLeaf() { return left == NULL; }
  TTNode* add(TTNode* it);
};

Figure 10.10 The 2-3 tree node implementation.

The techniques of Section 5.3.1 can be applied here to implement separate internal and leaf node types.

From the defining rules for 2-3 trees we can derive relationships between the number of nodes in the tree and the depth of the tree. A 2-3 tree of height k has at least 2^(k-1) leaves, because if every internal node has two children it degenerates to the shape of a complete binary tree. A 2-3 tree of height k has at most 3^(k-1) leaves, because each internal node can have at most three children. Searching for a value in a 2-3 tree is similar to searching in a BST.
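This BST-like search can be sketched against the TTNode fields above. The code below is a reconstruction, not the book's Figure 10.11: it simplifies to int keys and int records, uses -1 as the "empty key / not found" sentinel, and follows the convention that a node holding one key uses its left and center children.

```cpp
#include <cassert>
#include <cstddef>

// Simplified 2-3 tree node mirroring the TTNode fields (int keys/records;
// rkey == -1 means the right record slot is unused).
struct TTNode {
    int lkey, rkey;              // left and right keys
    int lval, rval;              // the associated records
    TTNode *left, *center, *right;
    TTNode(int lk, int lv, int rk, int rv,
           TTNode* l, TTNode* c, TTNode* r)
        : lkey(lk), rkey(rk), lval(lv), rval(rv),
          left(l), center(c), right(r) {}
    bool isLeaf() const { return left == nullptr; }
};

// Search for key k; return its record, or -1 if k is not in the tree.
int findhelp(const TTNode* root, int k) {
    if (root == nullptr) return -1;            // fell off the tree: not found
    if (k == root->lkey) return root->lval;    // found as the left record
    if (root->rkey != -1 && k == root->rkey)
        return root->rval;                     // found as the right record
    if (k < root->lkey)                        // branch on the stored keys
        return findhelp(root->left, k);
    else if (root->rkey == -1 || k < root->rkey)
        return findhelp(root->center, k);
    else
        return findhelp(root->right, k);
}
```

A small hand-built tree (a hypothetical example, not Figure 10.9) with root keys 18 and 33 behaves as the text describes: a search for a key between 18 and 33 descends into the middle subtree, and a search for an absent key runs off the bottom of a leaf.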
Search begins at the root. If the root does not contain the search key K, then the search progresses to the only subtree that can possibly contain K. The value(s) stored in the root node determine which is the correct subtree. For example, if searching for the value 30 in the tree of Figure 10.9, we begin with the root node. Because 30 is between 18 and 33, it can only be in the middle subtree. Searching the middle child of the root node yields the desired record. If searching for 15, then the first step is again to search the root node. Because 15 is less than 18, the first (left) branch is taken. At the next level, we take the second branch to the leaf node containing 15. If the search key were 16, then upon encountering the leaf containing 15 we would find that the search key is not in the tree. Figure 10.11 is an implementation for the 2-3 tree search method.

// Find the record that matches a given key value
template