EECS 311: DATA STRUCTURES

Assignment 3: Quad Trees

Goal

The goals of this assignment are:

Two-dimensional data arises in many contexts, particularly problems involving spatial information, such as geographical information systems (GIS) and video games. A common task is finding all points within a given region or near a given point. The techniques that work with scalar data don't do so well here. Red-black trees don't work because 2D points can't be given an ordering (operator<) that keeps nearby points together. Hashtables don't work because they don't keep similar data close together at all. And of course regular vectors have poor O(N) search times.

One solution is quad trees. These are described in Exercises 12.38 through 12.40 in Chapter 12, and online in many places. (For 3D data, a similar data structure is the octree.)

Your job is to implement quad trees and range queries, and test their performance on a moderately sized dataset.

Go slowly. Be sure the first subtask is working before going to the next. Get code for the first subtask approved in the Code Critic before submitting the code for the next subtask.

The Debugger is your friend. You will almost certainly have bugs in your initial coding. Use the debugger to trace through your code to uncover why points get lost or your program crashes.

But don't just write code, watch it break, and try tracing each step with the debugger. Work out yourself, by hand, in detail, exactly how the quad trees should grow with the test cases I've given below. When will splitting occur? What quadrant will a point go into? Once you do that, you'll quickly see in the debugger when the code is going down the wrong track. I speak from experience here.

Tasks

Set up

Create a new project directory called quadtrees. Copy the Magic Makefile into it.

PA3 Task 1: Create a range class

Define a Range2D structure in the files range2d.h and range2d.cpp. This will be used to represent a range of 2-dimensional data, e.g., all points whose X value is between 1 and 10 and all Y values between 25 and 35. This structure is worth developing and testing independently.

Here's the public API:

For testing purposes:

Put the following tests (and others, if you want) in tests.cpp. Note the use of back_inserter to get a suitable iterator for the vector.

TEST(rangeConstructor)
{
  Range2D range( 3, 2, 6, 4 );
  CHECK_EQUAL( 3, range.xmin );
  CHECK_EQUAL( 2, range.ymin );
  CHECK_EQUAL( 6, range.xmax );
  CHECK_EQUAL( 4, range.ymax );
}

TEST(rangeGrow)
{
  Range2D range( 3, 2, 6, 4 );
  range.resize( 2 );
  CHECK_EQUAL( 1.5, range.xmin );
  CHECK_EQUAL( 1, range.ymin );
  CHECK_EQUAL( 7.5, range.xmax );
  CHECK_EQUAL( 5, range.ymax );
}

TEST(rangeShrink)
{
  Range2D range( 3, 2, 6, 4 );
  range.resize( 0.5 );
  CHECK_EQUAL( 3.75, range.xmin );
  CHECK_EQUAL( 2.5, range.ymin );
  CHECK_EQUAL( 5.25, range.xmax );
  CHECK_EQUAL( 3.5, range.ymax );
}

TEST(rangeContains)
{
  Range2D range( 3, 1, 5, 3 );
  CHECK( range.contains( 4, 2 ) );
  CHECK( range.contains( 3, 1 ) );
  CHECK( range.contains( 5, 3 ) );
  CHECK( !range.contains( 2, 5 ) );
  CHECK( !range.contains( 3, 0 ) );
  CHECK( !range.contains( 6, 2 ) );
  CHECK( !range.contains( 3, 4 ) );
}

TEST(rangeCollect)
{
  Range2D range( 3, 1, 5, 3 );
  IntData data;
  data.push_back( IntDatum( 2, 5, 1 ) );
  data.push_back( IntDatum( 4, 2, 2 ) );
  data.push_back( IntDatum( 3, 0, 3 ) );
  data.push_back( IntDatum( 3, 1, 4 ) );
  data.push_back( IntDatum( 6, 2, 5 ) );
  data.push_back( IntDatum( 5, 3, 6 ) );
  data.push_back( IntDatum( 3, 4, 7  ) );

  IntData results;
  range.collect( data.begin(), data.end(), back_inserter( results ) );

  CHECK_EQUAL( 3, results.size() );
  CHECK_EQUAL( 2, results[0].value );
  CHECK_EQUAL( 4, results[1].value );
  CHECK_EQUAL( 6, results[2].value );
}

TEST(rangeIntersects)
{
  Range2D range( 3, 1, 6, 4 );
  CHECK( range.intersects ( range ) );
  CHECK( range.intersects( Range2D( 4, 2, 7, 5 ) ) );
  CHECK( range.intersects( Range2D( 2, 0, 4, 2 ) ) );
  CHECK( range.intersects( Range2D( 2, 0, 7, 5 ) ) );
  CHECK( range.intersects( Range2D( 4, 2, 5, 3 ) ) );
  CHECK( !range.intersects( Range2D( 4, 5, 6, 7 ) ) );
  CHECK( !range.intersects( Range2D( 8, 3, 10, 6 ) ) );
}

Submit to Code Critic: your range2d.h and range2d.cpp, clearly marking which is which. Don't send test code, unless you added any new tests. If so, send just the new tests.

PA3 Task 2: Create a quad tree class

Create a file quadtree.h. In it, define a generic templated QuadTree structure. This will be used to store data associated with 2-dimensional coordinates. The data might be anything, from a string or number to an arbitrary data structure.

Unlike a 2-d tree (chapter 12, section 12.6), a quad tree needs to know what range of points it will contain. The range can be as large as we need, though it's best to keep it as small as possible.

Exercises 12.38 through 12.40 and Figure 12.53 show the basic idea. Initially the tree contains one node that's empty. Adding one point to a node is easy. If another point is added to the node, however, 4 child nodes are created, one for each quadrants (hence the name), the old point is moved to itse appropriate subquadrant, and then, recursively, the new point is added to its appropriate subquadrant.

Note that if the new point is very close to the old point, it will probably go to the same subquadrant, and the splitting process will repeat, until subquadrants become small enough that the points go into different ones.

This raises the issue of what to do with equal points. If we just let splitting occur, we'd have infinite recursion. Instead, use this variation of the basic quad tree algorithm. Each node has

When an attempt is made to add a new item to a node:

Here's the public API:

Then add the following tests. Make sure you understand what these tests are checking for. Post a question if you don't.

TEST(emptyTree)
{
  QuadTree<IntDatum> tree( 0, 0, 10, 10 );
  CHECK_EQUAL( 0, tree.size() );
}

TEST(addOnePoint)
{
  QuadTree<IntDatum> tree( 0, 0, 10, 10 );
  IntDatum datum( 4, 2, 1 );
  tree.add( datum );
  CHECK_EQUAL( 1, tree.size() );
  
  IntData results;
  CHECK_EQUAL( 1, tree.get( 4, 2, back_inserter( results ) ) );
  CHECK_EQUAL( datum.value, results[0].value );
}

TEST(addSeveralPoints)
{
  QuadTree<IntDatum> tree( 0, 0, 20, 20 );
  tree.add( IntDatum( 4, 2, 1 ) );
  tree.add( IntDatum( 12, 8, 2 ) );
  tree.add( IntDatum( 1, 3, 3 ) );
  tree.add( IntDatum( 2, 1, 4 ) );
  tree.add( IntDatum( 10, 10, 5 ) );
  CHECK_EQUAL( 5, tree.size() );

  IntData results;
  IntInserter inserter = back_inserter( results );

  CHECK_EQUAL( 1, tree.get( 4, 2, inserter ) );
  CHECK_EQUAL( 1, tree.get( 12, 8, inserter ) );
  CHECK_EQUAL( 1, tree.get( 1, 3, inserter ) );
  CHECK_EQUAL( 1, tree.get( 2, 1, inserter ) );
  CHECK_EQUAL( 1, tree.get( 10, 10, inserter ) );

  sort( results.begin(), results.end() );
  IntData::iterator iter = results.begin();
  CHECK_EQUAL( 1, (iter++)->value );
  CHECK_EQUAL( 2, (iter++)->value );
  CHECK_EQUAL( 3, (iter++)->value );
  CHECK_EQUAL( 4, (iter++)->value );
  CHECK_EQUAL( 5, (iter++)->value );
}

TEST(addDuplicates)
{
  typedef QuadTree<IntDatum> Tree;
  Tree tree( 0, 0, 10, 10 );
  tree.add( IntDatum( 4, 2, 1 ) );
  tree.add( IntDatum( 4, 2, 2 ) );
  tree.add( IntDatum( 4, 2, 3 ) );
  CHECK_EQUAL( 3, tree.size() );

  IntData results;
  CHECK_EQUAL( 3, tree.get( 4, 2, back_inserter( results ) ) );
  CHECK_EQUAL( 1, results[0].value );
  CHECK_EQUAL( 2, results[1].value );
  CHECK_EQUAL( 3, results[2].value );
}

Submit to Code Critic: your quadtree.h file. If you created any new new test code, include that too.

PA3 Task 3: Support range queries

Add the ability to do range queries, i.e., to ask for all points in the quad tree within some rectangular range. A big advantage of quad trees is that only subtrees whose quadrants intersect with the range need to be checked. So if the query has a small range, the tree search will behave somewhat like binary search, rapidly eliminating and never inspect irrelevant distant points.

Add the following to the quad tree API:

TEST(rangeQuery)
{
  QuadTree<IntDatum> tree( 0, 0, 20, 20 );
  tree.add( IntDatum( 4, 2, 1 ) );
  tree.add( IntDatum( 12, 8, 2 ) );
  tree.add( IntDatum( 1, 3, 3 ) );
  tree.add( IntDatum( 2, 1, 4 ) );
  tree.add( IntDatum( 10, 10, 5 ) );

  CHECK_EQUAL( 5, tree.size() );

  IntData results;
  IntInserter inserter = back_inserter( results );
  tree.query( Range2D( 0, 0, 20, 20 ), inserter );
  CHECK_EQUAL( 5, results.size() );

  results.clear();
  tree.query( Range2D( 0, 0, 8, 8 ), inserter );
  CHECK_EQUAL( 3, results.size() );

  results.clear();
  tree.query( Range2D( 10, 10, 20, 20 ), inserter );
  CHECK_EQUAL( 1, results.size() );

  CHECK_EQUAL( 5, results[ 0 ].value );

  results.clear();
  tree.query( Range2D( 10, 10, 10, 10 ), inserter  );
  CHECK_EQUAL( 1, results.size() );
}

Submit to Code Critic: whatever code you added or changed. Don't send the entire files or the test code.

PA3 Task 4: load a large dataset

Now it's time to see how a quad tree compares to scanning a vector.

The plain text file zipcodes.txt contains the location of over 43,000 US Zip Codes. A sample line is:

01543	42.380877	-71.96427	MA	Rutland

There's one zip code per line, and the elements are tab-separated. The above says that the center of the area with Zip Code 01543 in Rutland, Massachusetts is latitude 42.380877 and longitude -71.96427.

Download this file and put it in your project directory.

In the files zipcode.h and zipcode.cpp define a ZipCode structure with:

Then put the following timing code in your tests.cpp. This code loads the zip codes into a simple vector, then stores them in your quad tree, then compares querying with several different ranges.

At the start of your tests file, after the includes but before the actual tests, put

typedef std::vector<ZipCode> ZipCodes;
typedef QuadTree<ZipCode> ZipTree;
typedef std::back_insert_iterator< ZipCodes > ZipInserter;

// Prints the time to retrieve all zip codes inside a range from a
// vector and from a quad tree.
void timeRetrieval( Range2D range, const ZipCodes &zipCodes, const ZipTree &zipTree )
{
  std::clock_t start, end;
  ZipCodes results;
  ZipInserter inserter = back_inserter( results );
  unsigned int count;

  start = std::clock();
  for (int i = 0; i < 1000; ++i) {
    results.clear();
    range.collect( zipCodes.begin(), zipCodes.end(), inserter );
  }
  end = std::clock();
  std::cout << "1000 vector scans for " << results.size() << " Chicago codes: "
            << (end - start) << " ticks" << std::endl;
  count = results.size();

  start = std::clock();
  for (int i = 0; i < 1000; ++i) {
    results.clear();
    zipTree.query( range, inserter );
  }
  end = std::clock();
  std::cout << "1000 quad tree queries for " << results.size() << " Chicago codes: "
            << (end - start) << " ticks" << std::endl;

  CHECK_EQUAL( count, results.size() );
}

Then add this to your tests:

TEST(timing)
{
  ZipCodes zipCodes;
  std::clock_t start, end;

  start = std::clock();
  ZipCode::readFile( "zipcodes.txt", back_inserter( zipCodes ) );
  end = std::clock();

  std::cout << "Time to load " << zipCodes.size() << " zip codes: "
    << (end - start) << " ticks" << std::endl;

  CHECK_EQUAL( 43191, zipCodes.size() );


  ZipTree zipTree( -10, -180, 75, -60 );
  start = std::clock();
  for ( ZipCodes::iterator iter = zipCodes.begin();
        iter != zipCodes.end();
        ++iter ) {
    zipTree.add( *iter );
  }
  end = clock();
  std::cout << "Time to build tree: " << (end - start) << " ticks" << std::endl;

  CHECK_EQUAL( zipCodes.size(), zipTree.size() );

  Range2D range( 41.34,	-87.89, 42.1,	-87.55 );
  timeRetrieval( range, zipCodes, zipTree );

  range.resize( 0.5 );
  timeRetrieval( range, zipCodes, zipTree );
}

Add a few other ranges, including looking for just one specific point, using a range with whose lower and upper bounds are equal. Collect the timings on all of these. Run 1000 times to get measurable results.

Submit to Code Critic: your zipcode.h and zipcode.cpp files, and any new tests you wrote.

Wrap Up

When all of the above has been completed, and approved as necessary via the Code Critic, check over your code and send it in, along with a text or Word file reporting on the times from the last task.

Make sure your program runs standalone, with no user input of any kind. That includes not stopping at the end with a "hit any key to continue" message. If I can't run your program automatically from a script, it will be returned.

Your code should compile without errors or warnings.

Be sure your test code file has an opening header comment that identifies the authorship of the code, as follows:

// EECS311 PA3 solution: Quad trees
// Date: November 12, 2009
// Authors: T.H.E.Hacker
// 
// Except for code provided by the instructor or textbook, this code was written
// solely by the authors listed above, and not copied from another student
// or external source.

Run make handin to generate a Zip archive of all your source code. By design, the Magic Makefile does not include the word list file. Do not include the word list file.

Email your analysis document and the Zip archive produced by the Magic Makefile to me. Use the Subject EECS 311 PA 3 Quad Tree.


Valid HTML 4.01 Strict