15 EECS 280 Container ADTs


EECS 280 Programming and Introductory Data Structures Container ADTs Slides by Andrew DeOrio, Jeff Ringenberg and Brian Noble 1 Abstract Data Types Review Abstract data types provide two main advantages: Information hiding: we don't need to know the details of how the object is represented, nor do we need to know how the operations on those objects are implemented. Encapsulation: the objects and their operations are defined in the same place; the ADT combines both data and operation in one entity. 2 Containers The purpose of a container is to hold other objects For example, an array can hold integers: int array[10]; a[4] = 517; What if we want more features in our container? Say, a set of integers. This is a “set” in the mathematical sense: A collection of zero or more integers, with no duplicates. 3 Set abstraction Let's create an abstraction that holds a set of integers There are four operations on this set that we will define: Insert a value into the set Remove a value from the set Query to see if a value is in the set Count the number of elements in the set 4 IntSet Declaration class IntSet { public: IntSet(); void insert(int v); void remove(int v); bool query(int v) const; int size() const; //... }; 5 IntSet Declaration class IntSet { //... //EFFECTS: returns true if v is in set, //false otherwise bool query(int v) const; //EFFECTS: returns |set| int size() const; //... }; 6 IntSet Representation class IntSet { public: IntSet(); void insert(int v); void remove(int v); bool query(int v) const; int size() const; }; The class is incomplete because we haven't chosen a representation for sets Choosing a representation involves two things: 1. Deciding what concrete data elements will be used to represent the values of the set 2. Providing an implementation for each method 7 IntSet Representation class IntSet { public: IntSet(); void insert(int v); void remove(int v); bool query(int v) const; int size() const; }; Despite not having a representation for a set, the (incomplete) declaration is all that a customer of the IntSet abstraction needs to know since it has: The general overview of the ADT. The specification of each method. 8 IntSet Representation Start with a representation invariant for the set: Use an array Represent a set of size N as an unordered array of integers with no duplicates, stored in the first N slots of the array. int elts_size: equal to the number of elements currently in the array. 9 IntSet Representation Since this is an array, and arrays have maximum sizes, we have to choose a maximum size int ELTS_CAPACITY = 100; 100 is arbitrary, but soon we'll get to dynamic memory Include the size in the OVERVIEW //OVERVIEW: mutable set of ints, |set|<=ELTS_CAPACITY Account for full sets in the REQUIRES clause for insert: //REQUIRES: set is not full //MODIFIES: this //EFFECTS: set=set+{v} void insert(int v); 10 IntSet::size() class IntSet { //OVERVIEW: mutable set of ints,|set|<=ELTS_CAPACITY int elts[ELTS_CAPACITY]; int elts_size; public: static const int ELTS_CAPACITY = 100; void insert(int v); void remove(int v); bool query(int v) const; int size() const; }; 11 static Member Variables class IntSet { public: static const int ELTS_CAPACITY = 100; }; static means "there's only one" All instances of the class share this member variable A lot like a global variable const means "you can't change this" 12 static is Confusing! static int add(int a, int b) { return a + b; } Static function: visible only inside one file (internal linkage) class IntSet { public: static const int ELTS_CAPACITY = 100; }; Static member variable: all instances of the IntSet class share this member variable 13 IntSet::size() class IntSet { int elts[ELTS_CAPACITY]; int elts_size; public: static const int ELTS_CAPACITY = 100; void insert(int v); void remove(int v); bool query(int v) const; int size() const; }; int IntSet::size() const { return elts_size; } 14 Given this representation, and the representation invariants, we can write the methods. Because our rep invariant says that elts_size is always the size of the set, we can return it directly Searching IntSet Next, consider the three final member functions: query: search the array looking for a specific number remove: search the array for a number; if it exists, remove it insert: search the array for a number; if it doesn't exist, add it All three of these have search in common. One might be tempted to just write insert and remove in terms of query, but remove won't work that way. Query only tells us whether the element exists, not where – we need one more method 15 IntSet::indexOf() class IntSet { int elts[ELTS_CAPACITY]; int elts_size; int indexOf(int v) const; // EFFECTS: returns the index of // v if it exists in the // array, ELTS_CAPACITY otherwise public: static const int ELTS_CAPACITY = 100; void insert(int v); void remove(int v); bool query(int v) const; int size() const; }; 16 This member function must be private. This is because it exposes details about the concrete representation. It is inappropriate to expose these details to a client of this class. class IntSet { int elts[ELTS_CAPACITY]; int elts_size; int indexOf(int v) const; public: static const int ELTS_CAPACITY = 100; void insert(int v); void remove(int v); bool query(int v) const; int size() const; }; int IntSet::indexOf(int v) const { for (int i = 0; i < elts_size; ++i) { if (elts[i] == v) return i; } return ELTS_CAPACITY; } IntSet::indexOf() 17 IntSet::query() class IntSet { // OVERVIEW omitted for space int elts[ELTS_CAPACITY]; int elts_size; int indexOf(int v) const; public: static const int ELTS_CAPACITY = 100; void insert(int v); void remove(int v); bool query(int v) const; int size() const; }; bool IntSet::query(int v) const { return indexOf(v) != ELTS_CAPACITY; } 18 With indexOf, query is easy! IntSet::insert() The code for insert is not much more difficult than query: 1. First look for the indexOf the element to insert 2. If it doesn’t exist, we need to add this element to the “end” of  the array 3. Using elts_size, the current “end” is elts[elts_size-1] 4. Place the element in the next slot and update elts_size 19 IntSet::insert() class IntSet { int elts[ELTS_CAPACITY]; int elts_size; int indexOf(int v) const; public: static const int ELTS_CAPACITY = 100; void insert(int v); void remove(int v); bool query(int v) const; int size() const; }; void IntSet::insert(int v) { if (query(v)) return; assert(elts_size < ELTS_CAPACITY); //REQUIRES! elts[elts_size++] = v; } 20 Take a couple minutes to figure this out. IntSet::remove () remove works similarly to insert. 1. If the element (called the victim) is in the array we have to remove it leaving a "hole" in the array 2. Instead of moving each element after the victim to the left by one position, pick up the current "last" element and move it to the hole This again breaks the invariant on elts_size, so we must fix it 21 IntSet::remove () void IntSet::remove(int v) { int victim = indexOf(v); if (victim == ELTS_CAPACITY) return;//not found elts[victim] = elts[--elts_size]; } 22 Take a couple minutes to figure this out Initialization Exercise Consider the newly-created set int main() { IntSet s; } Problem: On creation, s's data members are uninitialized! This means that the value of elts_size could be anything, but our representational invariant says it must be zero! Exercise: implement the IntSet default constructor 23 Exercise Add a print() member function "Promise" not to modify any member variables class IntSet { int elts[ELTS_CAPACITY]; int elts_size; public: static const int ELTS_CAPACITY = 100; void insert(int v); void remove(int v); bool query(int v) const; int size() const; }; 25 Using IntSet int main () { IntSet is; is.insert(7); is.insert(4); is.insert(7); is.print(); is.remove(7); is.print(); } 27 ./a.out { 7 4 } { 4 } IntSet efficiency int IntSet::indexOf(int v) const { for (int i = 0; i < elts_size; ++i) { if (elts[i] == v) return i; } return ELTS_CAPACITY; } Question: How many elements of the array must indexOf examine in the worst case if there are 10 elements? If there are 90 elements? 28 IntSet efficiency We say the time for indexOf grows linearly with the size of the set. If there are N elements in the set, we have to examine all N of them in the worst case. For large sets that perform lots of queries, this might be too expensive. Luckily, we can replace this implementation with a different one that can be more efficient. The only change we need to make is to the representation – the abstraction can stay precisely the same. 29 Improving IntSet efficiency Still use an array (of 100 elements) to store the elements of the set and the values will still occupy the first elts_size slots. However, now we'll keep the elements in sorted order The constructor and size methods don’t need to change at all  since they just use the elts_size field. query also doesn't need to change. If the index exists in the array’s legal bounds, then it’s there. 30 Improving IntSet efficiency However, the others all do need to change. We'll start with the easiest one: remove() Recall the old version that moved the last element from the end to somewhere in the middle, this will break the new “sorted”  invariant. Instead of doing a swap, we have to "squish" the array together to cover up the hole. 31 1 2 3 4 5 6 7 1 2 4 5 67 1 2 3 4 5 6 7 1 2 5 6 74 Improving IntSet efficiency How are we going to do the “squish”? Move the element next to the hole to the left leaving a new hole Keep moving elements until the hole is “off the end” of the  elements 32 Improving IntSet efficiency void IntSet::remove(int v) { int gap = indexOf(v); if (gap == ELTS_CAPACITY) return; //not found --elts_size; //one less element while (gap < elts_size) { //there are elts to our right elts[gap] = elts[gap+1]; ++gap; } } 33 Take a couple minutes to figure this out Improving IntSet efficiency We also have to change insert since it currently just places the new element at the end of the array. This also will break the new “sorted” invariant. 34 1 2 34 5 6 7 1 2 5 6 7 34+ 1 2 3 4 5 6 7 Improving IntSet efficiency How are we going to do the insert? Start by moving the last element to the right by one position. Repeat this process until the correct location is found to insert the new element. Stop if the start of the array is reached or the element is sorted. We'll need a new loop variable to track this movement called cand(idate). It's invariant is that it always points to the next element that might have to move to the right. 35 Improving IntSet efficiency void IntSet::insert(int v) { if (indexOf(v) != ELTS_CAPACITY) return;//present! assert(elts_size < ELTS_CAPACITY); //REQUIRES int cand = elts_size-1; // largest element while ((cand >= 0) && elts[cand] > v) { elts[cand+1] = elts[cand]; --cand; } // Now, cand points to the left of the "gap" elts[cand+1] = v; ++elts_size; // repair invariant } 36 Take a couple minutes to figure this out Improving IntSet efficiency void IntSet::insert(int v) { if (indexOf(v) != ELTS_CAPACITY) return;//present! assert(elts_size <= ELTS_CAPACITY) //enough room? int cand = elts_size-1; // largest element while ((cand >= 0) && elts[cand] > v) { elts[cand+1] = elts[cand]; --cand; } // Now, cand points to the left of the "gap". elts[cand+1] = v; ++elts_size; // repair invariant } 37 We are using the "short- circuit" property of &&. If cand is not greater than or equal to zero, we never evaluate the right-hand clause. Improving IntSet efficiency Question: Do we have to change indexOf? int IntSet::indexOf(int v) const { for (int i = 0; i < elts_size; ++i) { if (elts[i] == v) return i; } return ELTS_CAPACITY; } 38 Improving IntSet efficiency Question: Do we have to change indexOf? Answer: No, but it can be made more efficient with the new representation int IntSet::indexOf(int v) const { for (int i = 0; i < elts_size; ++i) { if (elts[i] == v) return i; } return ELTS_CAPACITY; } 39 Improving IntSet efficiency Suppose we are looking for foo. Compare foo against the middle element of the array and there are three possibilities: 1. foo is equal to the middle element. 2. foo is less than the element. 3. foo is greater than the element. If it's case 1, we're done. If it's case 2, then if foo is in the array, it must be to the left of the middle element If it's case 3, then if foo is in the array, it must be to the right of the middle element. 40 Improving IntSet efficiency Compare foo against the middle element of the array and there are three possibilities: 1. foo is equal to the middle element. 2. foo is less than the element. 3. foo is greater than the element. The comparison with the middle element eliminates at least half of the array from consideration! Then, we repeat the same thing over again. You could write this "repetition" as either a tail-recursive program or an iterative one. Most programmers find the iterative version more natural, so we'll write it iteratively, too. 41 Improving IntSet efficiency First, we need to find the “bounds” of the array. The “leftmost” element is always zero, but the “rightmost”  element is elts_size-1. int IntSet::indexOf(int v) const { int left = 0; int right = elts_size-1; ... } 42 Improving IntSet efficiency It's possible that the segment we are examining is empty and we return ELTS_CAPACITY since the element is missing. A nonempty array has at least one element in it, so right is at least as large as left (right >= left). int IntSet::indexOf(int v) const { int left = 0; int right = elts_size-1; while (right >= left) { ... } return ELTS_CAPACITY; } 43 Improving IntSet efficiency Next, find the "middle" element. We do this by finding out the size of our segment (right - left + 1), then divide it by two, and add it to left. int IntSet::indexOf(int v) const { int left = 0; int right = elts_size-1; while (right >= left) { int size = right - left + 1; int middle = left + size/2; ... } return ELTS_CAPACITY; } 44 Improving IntSet efficiency Next, find the "middle" element. We do this by finding out the size of our segment (right - left + 1), then divide it by two, and add it to left. int IntSet::indexOf(int v) const { int left = 0; int right = elts_size-1; while (right >= left) { int size = right - left + 1; int middle = left + size/2; ... } return ELTS_CAPACITY; } 45 If there is an odd number of elements, this will be the "true" middle. If there are an even number, it will be the element to the “right" of  true middle. Improving IntSet efficiency Then, we compare against the middle element. If that's the one we are looking for, we are done. int IntSet::indexOf(int v) const { int left = 0; int right = elts_size-1; while (right >= left) { int size = right - left + 1; int middle = left + size/2; if (elts[middle] == v) return middle; ... } return ELTS_CAPACITY; } 46 Improving IntSet efficiency If we are not looking at the element, the true element (if it exists) must be in either the “smaller” half or the “larger” half. If it is in the smaller half, than we can eliminate all elements at index middle and higher, so we move “right” to middle-1. Likewise, if it would be in the larger half, we move “left” to  middle+1, and we continue looking. 47 Improving IntSet efficiency int IntSet::indexOf(int v) const { int left = 0; int right = elts_size-1; while (right >= left) { int size = right - left + 1; int middle = left + size/2; if (elts[middle] == v) return middle; else if (elts[middle] < v) left = middle+1; else right = middle-1; } return ELTS_CAPACITY; } 48 Take a couple minutes to figure this out. Try using the inputs: v=3, v=8, and v=-1 with the array below. 1 2 3 4 5 6 7 8 90 Improving IntSet efficiency Since you eliminate half of the array with each comparison, this is a much more efficient. If the array has N elements, you'll need “about” log2(N) comparisons to search it. This is really cool, because log2(100) is less than 7 – so, we need only 7 comparisons in the worst case. Also, if you double the size of the array, you need only one extra comparison to do the search. This is called a binary search 49 Improving IntSet efficiency insert and remove are still linear, because they may have to "swap" an element to the beginning/end of the array. Here is the summary of asymptotic performance of each function: Unsorted Sorted insert O(N) O(N) remove O(N) O(N) query O(N) O(log N) 50 All the unsorted versions still require the unsorted indexOf() which is O(N) Improving IntSet efficiency Unsorted Sorted insert O(N) O(N) remove O(N) O(N) query O(N) O(log N) If you are going to do more searching than inserting/removing, you should use the "sorted array" version, because query is faster there. However, if query is relatively rare, you may as well use the "unsorted" version. It's "about the same as" the sorted version for insert and remove, but it's MUCH simpler! 51 Log(n) run time. So what? 52 0 20 40 60 80 100 120 140 1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106 113 120 127 O(n) O(log n) Log(n) run time. So what? 52 0 20 40 60 80 100 120 140 1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106 113 120 127 O(n) O(log n)
还剩50页未读

继续阅读

下载pdf到电脑,查找使用更方便

pdf的实际排版效果,会与网站的显示效果略有不同!!

需要 3 金币 [ 分享pdf获得金币 ] 0 人已下载

下载pdf

pdf贡献者

qiuhuaiyao

贡献于2014-10-27

下载需要 3 金币 [金币充值 ]
亲,您也可以通过 分享原创pdf 来获得金币奖励!
下载pdf