Re: Removing duplicate subtrees from CONNECT-BY query

From: Stephane Faroult <sfaroult_at_roughsea.com>
Date: Sun, 07 Oct 2007 22:24:16 +0200
Message-ID: <47094070.6030005@roughsea.com>

Riku,

Great question. Unfortunately, CONNECT merely refers to a particular way of traversing the rows, and there is, to my knowledge, nothing to tell Oracle to prune a subdirectory it has already visited - it would mean remembering all the nodes, ouch. Duplicates are just a natural thing in trees. We all have ancestors who appear several times in our family tree. You could of course try to manage it yourself with a cursor, a PL/SQL table and a stored procedure but I think you mentioned performance ...

If I understand well your problem, the question is "what are the descendants of <some thing that identify a row here> (condition #1) that also verify <something that tells you you have what you want> (condition #2).
What about reversing the problem? If condition #2 is selective enough, perhaps it would be simpler to search for condition #2 first, and check that there is a row among the "ancestors" of the row found that verify condition #1? I assume implicitly, of course, that you don't want to do this for millions of rows, because then all things considered a brutal DISTINCT on the result set may well be the most efficient ...

Short of that, I would question the design ...

HTH Stéphane Faroult

Riku Räsänen wrote:
> Hello,
>
> Certain application has a table with huge hierarchy, and a subtree of
> the hierarchy is allowed to exist in several places in the tree. No
> loops are allowed though.
>
> The problem is that this hierarchy has to be searched effectively.
> Querying a certain top-level hierarchy returns over 600 000 DISTINCT
> nodes. With the duplicate subtrees, the query returns almost 6 000 000
> nodes, where certain node appears 2000 times in the result.
>
> Of course the requirement is that the search has to be effective and
> even this worst case of 6M rows should be handled in reasonable time
> (this is an OLTP application). This "subtree allowed to appear several
> times" is completely new case for me in the land of CONNECT BY's. Did
> RTFM, did "use the Google, Luke" etc, but did not find anything sensible.
>
> So the question is: What is the most efficient way of removing the
> duplicates of this resultset? Currently there is no way to identify
> duplicate subtrees from the data except that the same ID appears
> multiple times. Is there a way to make CONNECT BY -operator to
> identify and prune the duplicate subtrees?

--
http://www.freelists.org/webpage/oracle-l

Received on Sun Oct 07 2007 - 15:24:16 CDT