The Collection Classes

The .NET Framework contains several predefined classes for collections that will be useful for most applications. The collection classes generally fall into one of three different categories: generic collections, bit collections, and specialized collections, which are collections designed for a specific purpose.

The generic collection classes contain some of the well-known collection algorithms. These include the list, stack, queue, dictionary, and hash table. As mentioned earlier, these classes each work with any .NET Framework data type.

For this hour, you're going to create a simple console-based managed C++ application. This will allow you to concentrate on working with collections without having to spend time working on a user interface. To begin, click New, Project from the File menu in Visual Studio .NET. Select Visual C++ Projects from the list of project types and select the Managed C++ Application template. Give your project the name CollectionTest and click OK to create the project.

In order to work with any of the .NET collections, you need to use the System::Collections namespace. This adds the necessary definitions for the various collection classes and interfaces you will be working with. In the CollectionTest.cpp file, insert the namespace directive for the System::Collections namespace, as shown here:

using namespace System::Collections;

Place this after the namespace directive for the System namespace.

`System::Array`

The first collection you're going to work with is the Array class. The Array class is the one collection that doesn't belong to the System::Collections namespace but is still considered a collection due to the fact that it implements the IList interface, as mentioned earlier.

The Array class also differs from the other collection classes in the way it is created. Other collections can be created by using the C++ new keyword, but in order to create an Array object, you must call the CreateInstance method. The reason for this distinction is because the Array class supports what's known as late binding. In early binding, which is used by the classes in the System::Collections namespace, the compiler determines the type of object the collection uses and can determine the references to the object at compile time. This can have a tremendous performance boost when using the collection. However, the Array class uses late binding, which in most cases can be slower. So, why was it designed like this in the first place? The reason is that late binding is very versatile in that it can work with differing object types during runtime. For instance, the Array function Copy can copy arrays of differing types and handles the type casting automatically.

As mentioned, you create an array using the CreateInstance function. The first parameter to this function is the data type of the elements in this array. The second parameter is the number of elements this array contains. The reason that the number of elements the array requires is needed is due to the very nature of the traditional array data type. When an array is created, a linear block of fixed memory is created based on the type of data and the number of elements in that array. Therefore, you can easily walk through the elements of the array by using an index value without having to keep pointers to each element. The advantage to this is quick lookup times when accessing an element using an index. Also, in an ordered array (an array whose elements are organized in a logical manner, such as alphabetically), you can do quick binary searches for elements. Arrays, however, are not without their disadvantages. Because the memory is allocated only once when the object is created, you cannot insert or delete any of the elements. If you do need to insert an element, the operation will be costly because you need to create another array object that is bigger than the original and then perform a deep copy on each element to the new array.

Open the CollectionTest.cpp file if it isn't open and create a function called TestArray() with no return type and no parameters. For this function and the rest of the collection tests during this hour, you will create the collection, add elements, output some collection statistics, and finally output the values of the collection. Therefore, create an Array object that contains String values and contains six elements. You can follow Listing 16.1 for your implementation of this function. For setting the values of the array, the Array class provides a function named SetValue that's called with the element value and the index to place this value within the array.

Listing 16.1 Adding and Viewing Properties of the `Array` Class

 1: include "stdafx.h"
 2:
 3: #using <mscorlib.dll>
 4: #include <tchar.h>
 5: #include <windows.h>
 6:
 7: using namespace System;
 8: using namespace System::Collections;
 9:
10: void TestArray()
11: {
12:     // create Array with 5 elements
13:     Array* pArray;
14:     pArray = Array::CreateInstance( __typeof(String), 6);
15:
16:     // populate array
17:     pArray->SetValue( S"Doug", 0 );
18:     pArray->SetValue( S"Heather", 1 );
19:     pArray->SetValue( S"Kevin", 2 );
20:     pArray->SetValue( S"Julie", 3 );
21:     pArray->SetValue( S"Daniel", 4 );
22:     pArray->SetValue( S"Lynne", 5 );
23:
24:     // print array statistics
25:     Console::WriteLine( "\tVariable: pArray" );
26:     Console::WriteLine( "\tType: {0}", pArray->GetType()->ToString() );
27:     Console::WriteLine( "\tCount:    {0}", pArray->Length.ToString() );
28: }
29:
30: // This is the entry point for this application
31: int _tmain(void)
32: {
33:     Console::WriteLine();
34:     TestArray();
35: }

The next step is to display some statistics about the collection. This is meant to show how the different classes represent the elements they contain and to allow you to make simple comparisons. Because all the collection classes you are using inherit from the base Object class, you can display the type of the collection by calling the GetType method implemented within the Object class. This can be seen on line 26 of Listing 16.1.

To display a count of the number of elements within the array, you can access the Length property implemented by the Array class, as shown on line 27 of Listing 16.1.

Enumerating Collections

Before you start working with the other collection types, you'll need to add a function that enumerates through all of the collection elements and displays their values in the console. As mentioned earlier, most collections implement the IEnumerable interface. This interface only contains one method, which simply returns an instance of a different object. This new object is the real workhorse of collection enumeration. To get the enumerator object, you call the GetEnumerator method within the collection class, which then returns an IEnumerator object.

The IEnumerator object is implemented by the collection class itself because the collection only knows how it is internally organized. For those not familiar with interfaces and COM, this can be somewhat confusing, especially if you've done a lot of inheritance programming in the past. Each collection class returns an interface named IEnumerator with a call to GetEnumerator, but the internal implementation of each of these objects is different. In other words, if an object declares that it inherits from an interface, it is up to that object to actually implement the methods that the interface contains. More technically, the interface methods within an interface definition are pure virtual functions that must be overridden if an object is to inherit that interface.

The IEnumerator interface contains two methods and one property, as mentioned earlier. However, even though the internal implementation of these functions and this property is different among collection classes, the way a client uses the enumerator stays the same no matter what the collection is.

For this hour, you are going to create a function that prints out the values within a collection in the console. Because I mentioned that enumerating over collections is the same, this one function can be used by all the collection classes you will be working with, except for one, which will be explained later.

Underneath the TestArray function, create a function named PrintCollection that returns void and takes a single pointer to an IEnumerable object. The code for this function can be seen in Listing 16.2. To enumerate over the collection, you first need to call the GetEnumerator function to obtain a pointer to the IEnumerator object. Once you have this object, you can then enumerate over all of the values within the collection by calling MoveNext on the enumerator object to advance to the next value in the collection.

You need to be aware of a couple things when using enumerator objects. First of all, before you can access any of the values within the collection through the enumerator object, you must always call MoveNext first. Before that function is called, the enumerator is in an unknown state and will throw an exception. Also, once the MoveNext function returns a false value, you cannot access the collection values unless you reset the enumerator by calling the Reset function.

To print out the variables within the collection, use a while loop that uses the value returned from the MoveNext function as its conditional statement. If the MoveNext function returns true, you can access the element within the collection. This is done by using the Current property implemented by the collection.

After you have added the PrintCollection function, call it from within your TestArray function to print the array values using your array variable name as the parameter. The finished code for the TestArray function and the PrintCollection function can be seen in Listing 16.2.

Listing 16.2 Enumerating and Outputting the Values of a Collection

 1: void TestArray()
 2: {
 3:     // create Array with 5 elements
 4:     Array* pArray;
 5:     pArray = Array::CreateInstance( __typeof(String), 6);
 6:
 7:     // populate array
 8:     pArray->SetValue( S"Doug", 0 );
 9:     pArray->SetValue( S"Heather", 1 );
10:     pArray->SetValue( S"Kevin", 2 );
11:     pArray->SetValue( S"Julie", 3 );
12:     pArray->SetValue( S"Daniel", 4 );
13:     pArray->SetValue( S"Lynne", 5 );
14:     Console::Write( "\tValues: " );
15:     PrintCollection( pArray );
16:
17:     // print array statistics
18:     Console::WriteLine( "\tVariable: pArray" );
19:     Console::WriteLine( "\tType: {0}", pArray->GetType()->ToString() );
20:     Console::WriteLine( "\tCount:    {0}", pArray->Length.ToString() );
21: }
22:
23: void PrintCollection( IEnumerable* pIEnum )
24: {
25:     System::Collections::IEnumerator* pEnum = pIEnum->GetEnumerator();
26:     while ( pEnum->MoveNext() )
27:     {
28:         Console::Write( "{0} ", pEnum->Current );
29:     }
30:
31:     Console::WriteLine();
32: }

After you build and run your application, your output should look similar to Figure 16.1.

Figure 16.1. The results of creating an array and enumerating its values.

graphics/16fig01.jpg

The `ArrayList` Collection Class

The Array object you just learned about is good for a wide variety of applications and is fast when you known which index you need to access. However, the main disadvantage of using an Array object is the fixed size. With an array, once it is created, you cannot increase or add other elements anywhere in the array, nor can you remove any elements. The .NET Framework contains a different type of collection that has the same index-based lookup performance while still allowing you to add and delete members from the collection.

The ArrayList collection, like the Array class, implements the IList interface. However, the Array class is a fixed size. If you were to call the IList function Add on the Array object, a System.NotSupportedException would be thrown and the exception message would tell you that the collection is a fixed size. However, the ArrayList lets you add and remove elements at will while still maintaining fast lookups using an index value.

Of course, with the advantages also come the disadvantages. Although the ability to insert and remove elements is a good advantage, if you perform many of these operations repeatedly and the size of the array fluctuates, the performance factor starts to look a little less attractive. This is due to the way an array list is implemented internally. When you declare an ArrayList object, a predetermined number of memory blocks are set aside to be used when you start adding elements. Once you get to the point where the array is filled, more blocks of memory need to be created to hold additional values. For example, if you create an ArrayList object, 16 elements in the array are actually created but not indexable. Once you add the 17th element to the array list, another 16 blocks is allocated.

The other disadvantage, which is somewhat related to the repeated memory allocations, concerns memory storage. Once you add another element, which causes more blocks to be allocated, the array remains at that size regardless of how many elements are later removed. For example, if you added 100 elements, 112 blocks in the array are created. If you then removed 99 of those elements, the array size still remains at 112, even though you may never use these blocks again. Figure 16.2 shows how an ArrayList object is internally organized and how it expands its capacity to allow for more elements.

Figure 16.2. The internal structure of the `ArrayList` collection.

graphics/16fig02.gif

Add another function to CollectionTest.cpp, named TestArrayList(), that accepts zero parameters and returns void. To create an array list, simply declare a pointer variable of type ArrayList and assign to it a new instance of an ArrayList object using the new keyword. To add elements to an array list, call the member function Add with a single parameter that's the type of object you wish to add. If you need help, Listing 16.3 demonstrates how this is done.

Listing 16.3 Working with the `ArrayList` Class

 1: void TestArrayList()
 2: {
 3:   // create array list
 4:   ArrayList* pArrayList = new ArrayList();
 5:
 6:   // populate array list
 7:   pArrayList->Add(S"Doug");
 8:   pArrayList->Add(S"Heather");
 9:   pArrayList->Add(S"Kevin");
10:   pArrayList->Add(S"Julie");
11:   pArrayList->Add(S"Daniel");
12:   pArrayList->Add(S"Lynne");
13:
14:   // print arraylist statistics
15:   Console::WriteLine( "\tVariable: pArrayList" );
16:   Console::WriteLine( "\tType: {0}", pArrayList->GetType()->ToString() );
17:   Console::WriteLine( "\tCount:    {0}", pArrayList->Count.ToString() );
18:   Console::WriteLine( "\tCapacity: {0}", pArrayList->Capacity.ToString() );
19:   Console::Write( "\tValues:" );
20:   PrintCollection( pArrayList );
21: }

To view some of the statistics of an ArrayList object, you can copy the same code you used for the TestArray function, making sure to change the name of the variable you are using. However, there is one other property that ArrayList has that Array does not: the Capacity property. The capacity of an array list denotes the number of elements the array can hold before it has to allocate more. By default, the capacity is set to 16, but you are free to change that value by calling set_Capacity, which can be useful if you have any idea how large the array list may become.

Because the ArrayList class implements the IEnumerable interface, you can use the PrintCollection function you created earlier to enumerate through all the elements in the collection. Also, take note that the array list will only allow you to enumerate through the defined values. In other words, even thought the capacity of an array list is more than the number of elements you have added, they are not available because they have not been defined.

The Stack and the Queue

If you've ever taken college-level computer science classes, you probably had to implement either of the two classic data structures: the stack or the queue. The reason that these two collection classes are placed under the same heading is due to their similarity.

A stack is implemented like it sounds�as a stack of objects. Take, for instance, a stack of blocks that you have built. You have one block in your hand and place it on the stack of other blocks. Now, what is the first block you have to take off to get to the other blocks? Obviously, it's the last one you just placed on the stack. This is known as a Last-In First-Out (LIFO) structure.

A queue is just the opposite. To visualize a queue, think of a line of patrons outside of a movie theatre. If you've ever had to stand in line, you know that the first person there will be the first person to get served. This is known as a First-In First-Out (FIFO) structure.

Both data structures are implemented as a list but do not implement the IList interface. The reason for this is that stacks and queues are not indexable and therefore cannot have random access to all the elements within that collection. Instead, you add and remove elements one by one. Figure 16.3 shows a graphical representation of a stack and a queue.

Figure 16.3. The stack and the queue.

graphics/16fig03.gif

The processes of adding and removing items from a stack and queue are similar. The only difference is in the name of the functions. With a stack, you add objects by pushing them with the function Push, and you remove them by calling Pop. However, with a queue, you add items onto the queue by calling Enqueue and remove items with Dequeue.

Create two functions, named TestStack and TestQueue, where both functions take zero parameters and do not have a return value. Using the Push function for the stack and the Enqueue function for the queue, add items onto the appropriate structures. As before, also output the type of the collection and the number of items in the collection. You can use Listing 16.4 as a guide.

The last thing to do is to call the PrintCollection function for the stack and the queue. Compile and run your application and take note of the way the values in each collection are displayed. If everything has worked correctly, you'll notice that the stack collection is in reverse order. This is due to the nature of the stack, as described earlier. The last item you added is actually the first element in the list, and it's the item that will be removed when you call the Pop function.

Listing 16.4 Creating the Stack and the Queue

 1: void TestStack()
 2: {
 3:     // create stack
 4:     Stack* pStack = new Stack();
 5:
 6:     // populate stack
 7:     pStack->Push( S"Doug" );
 8:     pStack->Push( S"Heather" );
 9:     pStack->Push( S"Kevin" );
10:     pStack->Push( S"Julie" );
11:     pStack->Push( S"Daniel" );
12:     pStack->Push( S"Lynne" );
13:
14:     pStack->Pop();
15:
16:     // print stack statistics
17:     Console::WriteLine( "\tVariable: pStack" );
18:     Console::WriteLine( "\tType: {0}", pStack->GetType()->ToString() );
19:    Console::WriteLine( "\tCount:    {0}", pStack->Count.ToString() );
    Console::Write( "\tValues:" );

    PrintCollection( pStack );
}

void TestQueue()
{
    // create queue
    Queue* pQueue = new Queue();
    // populate queue
    pQueue->Enqueue( S"Doug" );
    pQueue->Enqueue( S"Heather" );
    pQueue->Enqueue( S"Kevin" );
    pQueue->Enqueue( S"Julie" );
    pQueue->Enqueue( S"Daniel" );
    pQueue->Enqueue( S"Lynne" );

    // print queue statistics and values
    Console::WriteLine( "\tVariable: pQueue" );
    Console::WriteLine( "\tType: {0}", pQueue->GetType()->ToString() );
    Console::WriteLine( "\tCount:    {0}", pQueue->Count.ToString() );
    Console::Write( "\tValues:" );
    PrintCollection( pQueue );
}

The `Hashtable` Collection Class

Whereas most collections so far have resembled that of a list, the hash table collection does not. Instead, a hash table is known as a dictionary collection. Dictionaries are ordered as one large collection of key and value pairs. In other words, if you were to look up a word in a real dictionary, you would find that it is associated with a definition. In this example, the key is the word and the value is the definition.

The Hashtable class within the .NET Framework allows you to associate any type of managed object with another managed object. If you have ever browsed through the online .NET Framework documentation, you may have noticed that many objects have a function named GetHashCode. This function is designed so that the object can be placed within a dictionary collection, such as a hash table. The GetHashCode function returns a pseudo-unique integer that identifies a key. The integer is considered "pseudo-unique" because implementing a hash function is more of an art than an exact method. In other words, given a large amount of keys, it is possible that at least two of the keys will have the same hash code. This hash code is used as an index into the hash table. The advantages of using a hash code coupled with the hash table is that lookups are much faster than what's possible by searching value by value in other collections.

Create a function within your main file called TestHashtable that also takes zero parameters with no return value (void). To add elements to the Hashtable collection, you call the Add function, giving it an object for the key and an object for the value. For this example, associate each value with an integer key. Listing 16.5 shows how this is done.

Listing 16.5 Creating and Adding Items to a Hash Table

 1: void TestHashTable()
 2: {
 3:     Hashtable* pHashTable = new Hashtable();
 4:
 5:     pHashTable->Add( __box(1), S"Doug" );
 6:     pHashTable->Add( __box(2), S"Heather" );
 7:     pHashTable->Add( __box(3), S"Kevin" );
 8:     pHashTable->Add( __box(4), S"Julie" );
 9:     pHashTable->Add( __box(5), S"Daniel" );
10:     pHashTable->Add( __box(6), S"Lynne" );
11:
12:     // print Hashtable statistics and values
13:     Console::WriteLine( "\tVariable: pHashTable" );
14:     Console::WriteLine( "\tType: {0}", pHashTable->GetType()->ToString());
15:     Console::WriteLine( "\tCount:    {0}", pHashTable->Count.ToString() );
16:     Console::Write( "\tValues: " );
17:     PrintDictionaryCollection( pHashTable );
18: }

You may be wondering why the integers in Listing 16.5 are surrounded by the __box keyword. Because the Hashtable class expects a managed object for the key, you have to convert the integer to a managed type so that it can work within the .NET runtime. The __box keyword copies a value object into a managed object.

If you were to call the PrintCollection function for the Hashtable object you just created, you might notice some peculiarities. Instead of outputting the values you placed within the hash table, the values that get outputted are instead the names of a .NET Framework object type: the System::Collections::DictionaryEntry object. Each element within a hash table first gets placed into a DictionaryEntry object, along with its associated key, before it is placed in the collection. Therefore, when you enumerate the values, you are actually retrieving the DictionaryEntry objects instead of the value objects you placed in the collection. Therefore, create another function named PrintDictionaryCollection with a single IEnumerable* parameter.

DictionaryEntry contains two properties you can access to get the hash table element's key and value. These properties are Key and Value, respectively. Therefore, rather than outputting the Current property as you did before, output the Key and the Value properties instead. This is shown in Listing 16.6 .

Listing 16.6 Enumerating and Printing Values Within a Dictionary Collection

1: void PrintDictionaryCollection( IEnumerable* pIEnum )
2: {
3:     IDictionaryEnumerator* pEnum =
4:      dynamic_cast<IDictionaryEnumerator*>(pIEnum->GetEnumerator());
5:     while ( pEnum->MoveNext() )
6:     {
7:         Console::Write( "{0}={1} ", pEnum->Key, pEnum->Value );
8:     }
9: }

Top