Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
![]() |
![]() |
Home -> Community -> Mailing Lists -> Oracle-L -> Re: Removing duplicate subtrees from CONNECT-BY query
Riku,
Great question. Unfortunately, CONNECT merely refers to a particular way of traversing the rows, and there is, to my knowledge, nothing to tell Oracle to prune a subdirectory it has already visited - it would mean remembering all the nodes, ouch. Duplicates are just a natural thing in trees. We all have ancestors who appear several times in our family tree. You could of course try to manage it yourself with a cursor, a PL/SQL table and a stored procedure but I think you mentioned performance ...
If I understand well your problem, the question is "what are the
descendants of <some thing that identify a row here> (condition #1) that
also verify <something that tells you you have what you want> (condition
#2).
What about reversing the problem? If condition #2 is selective enough,
perhaps it would be simpler to search for condition #2 first, and check
that there is a row among the "ancestors" of the row found that verify
condition #1? I assume implicitly, of course, that you don't want to do
this for millions of rows, because then all things considered a brutal
DISTINCT on the result set may well be the most efficient ...
Short of that, I would question the design ...
HTH Stéphane Faroult
Riku Räsänen wrote:
> Hello,
>
> Certain application has a table with huge hierarchy, and a subtree of
> the hierarchy is allowed to exist in several places in the tree. No
> loops are allowed though.
>
> The problem is that this hierarchy has to be searched effectively.
> Querying a certain top-level hierarchy returns over 600 000 DISTINCT
> nodes. With the duplicate subtrees, the query returns almost 6 000 000
> nodes, where certain node appears 2000 times in the result.
>
> Of course the requirement is that the search has to be effective and
> even this worst case of 6M rows should be handled in reasonable time
> (this is an OLTP application). This "subtree allowed to appear several
> times" is completely new case for me in the land of CONNECT BY's. Did
> RTFM, did "use the Google, Luke" etc, but did not find anything sensible.
>
> So the question is: What is the most efficient way of removing the
> duplicates of this resultset? Currently there is no way to identify
> duplicate subtrees from the data except that the same ID appears
> multiple times. Is there a way to make CONNECT BY -operator to
> identify and prune the duplicate subtrees?
-- http://www.freelists.org/webpage/oracle-lReceived on Sun Oct 07 2007 - 15:24:16 CDT
![]() |
![]() |