Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
Home -> Community -> Usenet -> c.d.o.server -> Re: Fast bitwise search
Richard Kuhler <noone_at_nowhere.com> wrote:
> ctcgag_at_hotmail.com wrote:
> > Richard Kuhler <noone_at_nowhere.com> wrote:
> >
> >>kuntz.1507173_at_bloglines.com wrote:
> >>
> >>>we have a large table that contains a bitmask field:
> >>>MASK NUMBER(19)
> >>>
> >>>The application then performs a query that includes BITAND(MASK, ?) >
> >>>0 condition.
> >>>
> >>>Is there any way to use an index for this condition. I was thinking
> >>>about bitmap indexes for this but could not figure out how to use them
> >>>here.
> >>
> >>Why in the world would you bundle data up into a column like that? I
> >>can't imagine why the correct solution isn't to break those bits out
> >>into individual columns with descriptive names so people can actually
> >>see what information is stored there. Then you could use bitmap
> >>indexes on them if performance dictated it (keeping in mind the
> >>problems they create).
> >
> > I don't know why the OP would bundle data up like that, but I do it
> > because N+512 columns in a table is just too many, and N+2048 columns
> > is way too many. (Of course, I have little hopes of getting bitmap
> > indices to work effectively on this data anyway.)
> >
> > Xho
>
> Wow, you really have a tuple with 2048 distinct and meaningful
> attributes?
They certainly are distinct. I don't know about meaningful. Bit #478 being on means that the underlying object has a sub-component which hashes to 478 (or maybe to 479, I can never remember).
> What is this application if I might ask?
Cheminformatics (and some bioinformatics). Some queries in this field have the unfortunate property that testing a query against an object to determine if they match is an NP-complete task, and can take anywhere from 0.01 second to 10 seconds per row. But you can come up with clever ways to create bitstrings from both object and query such that, for each bit that is "on" in the query, there is a 100% chance of it also being on in a matching object while only a 50% (or so) chance of it being on in a non-matching object. If you test hundreds or thousands of bits, almost all the non-matching objects fall out, leaving you to do the slow, accurate, NP-complete check on only a handful of objects instead of millions.
> Even if you
> convinced me that they absolutely can't be columns, I'd wonder why they
> can't be rows in a related table.
In my case the entire reason for using the bitstring is for performance, and I very much doubt doing this would increase performance.
Not that this has much to do with the original question any more, but you did ask :)
Xho
-- -------------------- http://NewsReader.Com/ -------------------- Usenet Newsgroup Service $9.95/Month 30GBReceived on Wed Jun 02 2004 - 10:23:52 CDT