Modelling large trees and hierarchies

From: Paul Arnold <paula_at_pivetal.com>
Date: 16 Jun 2004 13:36:17 -0700
Message-ID: <b6ee5aa3.0406161236.6b2e1205_at_posting.google.com>



We are currently in the process of developing a client/server based generic tree application with which we want to be able to model and render any size, shape and depth of tree.  

Our primary objective is to provide the ability to model large data trees with some 20 million+ nodes whilst still allowing adequate rendering and query times from the client.  

We have just bought Joe Celko's "Tree's and Hierarchies in SQL fo Smarties" and have picked up some very useful tips.  

We are definitely looking at using some hybrid model but are undecided as to what combination would suit us best.  

Whilst constructing the tree we have the following information available to us, that we could/may include in our database structure:  

  1. The Node's parent.
  2. The Node's depth/level.
  3. The Node's path to the root node.
  4. The Node's child nodes count.
  5. The Node's descendant nodes count.
  6. The Node's position relative to it's sibling nodes.

This infomation would allow us to combine many of the models Joe Celko discusses, including:  

  1. The Nested Sets/Interval model - allows us to quickly find subtrees.
  2. The Adjacency List model - allows us to quickly find immediate child nodes.
  3. The Materialized Path model - allows us to quickly find the ancestors of the node.
  4. The Depth Model - allows us to quickly find nodes related to their levels/depth.

It is also quite likely that we will want to change the structure of the tree fairly frequently. As an indication, I can see update rates of every 15 minutes or less.  

To us, disk space and hardware are cheap, so we are not concerned with storage overhead. We just need optimal query performance.  

Types of queries we need are for example:  

Given a node...

1) Get it's immediate children.
2) Get it's ancestors.
3) Get all of it's descendants.
4) Get it's descendants to a certain depth.
5) Get it's siblings 

etc. etc.

Any suggestions/foresight/tips that may help us in the database modelling would be most appreciated? Received on Wed Jun 16 2004 - 22:36:17 CEST

Original text of this message